The neighborhood \(N_i\) of a sample \(s_i\) is the set of samples whose similarity to \(s_i\) is greater than zero. Segmentation uses equality (==) for discrete features and less-than-or-equal (<=) for continuous features. Feature values of NA and NaN are also supported; they are matched using is.na() and is.nan().
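The comparison rules described above can be illustrated with a small base-R sketch. It does not call mmb and is not the package's implementation. The variable names (`ref_width`, `keep_width`, `keep_species`) and the choice of "setosa" as the discrete reference value are assumptions made for illustration.

```r
# Base-R illustration only; mmb::neighborhood() may differ in its internals.
ref_width <- mean(iris$Sepal.Width)

# Continuous feature: keep rows whose value is <= the reference value.
keep_width <- iris$Sepal.Width <= ref_width

# Discrete feature: keep rows whose value equals the reference value.
keep_species <- iris$Species == "setosa"

nbh_width <- iris[keep_width, ]                 # one continuous condition
nbh_both  <- iris[keep_width & keep_species, ]  # both conditions combined

# Missing values cannot be matched with == or <=, because any comparison
# with NA yields NA. is.na() and is.nan() test for them directly.
x <- c(1.5, NA, NaN)
is.na(x)   # TRUE for NA and for NaN
is.nan(x)  # TRUE for NaN only
```

Combining the logical vectors with `&` mirrors the idea that a row belongs to the neighborhood only if it satisfies every selected feature's condition.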

neighborhood(df, features, selectedFeatureNames = c(), retainMinValues = 0)

Arguments

df

data.frame to select the neighborhood from

features

data.frame of Bayes-features, used to segment/select the rows that should make up the neighborhood.

selectedFeatureNames

vector of names of features to use to demarcate the neighborhood. If empty, uses all features' names.

retainMinValues

Default 0. The number of samples to retain during segmentation. When separating a neighborhood, this value should typically be 0, so that no samples outside the neighborhood are included. For very sparse data or a large number of variables, however, it can still make sense to retain some samples.
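The following base-R sketch shows one plausible reading of such a retention threshold: a segmentation step is skipped if it would leave fewer rows than the threshold. The helper `segment_with_min` and its parameters are hypothetical and are not part of mmb; the package's actual retention strategy may differ.

```r
# Hypothetical helper, not part of mmb: apply each condition in turn, but
# skip a condition that would leave fewer than `retain_min` rows.
segment_with_min <- function(df, conditions, retain_min = 0) {
  for (cond in conditions) {
    keep <- cond(df)
    keep[is.na(keep)] <- FALSE          # treat undecidable rows as non-matching
    if (sum(keep) >= retain_min) {
      df <- df[keep, , drop = FALSE]
    }
  }
  df
}

conditions <- list(
  function(d) d$Sepal.Width <= 3.0,
  function(d) d$Petal.Length <= 0.5     # matches no row in iris
)

# With a threshold of 0, every condition is applied, even if nothing remains.
strict <- segment_with_min(iris, conditions, retain_min = 0)

# With a threshold of 10, the second condition is skipped.
lenient <- segment_with_min(iris, conditions, retain_min = 10)
```

In this sketch a threshold of 0 yields an exact but possibly empty neighborhood, whereas a positive threshold trades exactness for a non-empty result.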

Value

data.frame containing the rows that were selected as the neighborhood. The row names of df are guaranteed to be preserved.

Examples

nbh <- mmb::neighborhood(
  df = iris,
  features = mmb::createFeatureForBayes(
    name = "Sepal.Width", value = mean(iris$Sepal.Width)))
#> Warning: No explicit feature selection, using all.
print(nrow(nbh))
#> [1] 83
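Because row names are preserved, the rows of a neighborhood can be traced back to the original data. The sketch below uses plain logical subsetting in base R as a stand-in for the function call, so it runs without mmb installed. The names `keep`, `subset_df` and `original_rows` are illustrative assumptions.

```r
# Base-R stand-in for a neighborhood on Sepal.Width; not the mmb implementation.
keep <- iris$Sepal.Width <= mean(iris$Sepal.Width)
subset_df <- iris[keep, ]

# The row names still refer to the positions in the original data.frame.
head(rownames(subset_df))

# Look up the same rows in the original data by row name.
original_rows <- iris[rownames(subset_df), ]
```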