Segment a dataset by a single sample and compute vicinities for it and the remaining samples in the neighborhood.

Given some data and one sample \(s_i\) from it, constructs the neighborhood \(N_i\) of that sample and assigns centralities to all other samples in that neighborhood to it. Samples that lie outside the neighborhood are assigned a vicinity of zero. Uses mmb::neighborhood() and mmb::centralities().

vicinitiesForSample(
  df,
  sampleFromDf,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  doEcdf = FALSE,
  ecdfMinusOne = FALSE,
  retainMinValues = 0
)

Arguments

df	data.frame that holds the data (and also the sample to use to define the neighborhood). Each sample in this data.frame is assigned a vicinity.
sampleFromDf	data.frame a single row from the given data.frame. This is used to select a neighborhood from the given data.
selectedFeatureNames	vector of names of features to use to compute the vicinity/centrality. This is passed to `mmb::neighborhood()`.
shiftAmount	numeric DEFAULT 0.1 optional amount to shift each features probability by. This is useful for when the centrality not necessarily must be an actual probability and too many features are selected. To obtain actual probabilities, this needs to be 0, and you must use the ECDF.
doEcdf	boolean DEFAULT FALSE whether to use the ECDF instead of the EPDF to find the likelihood of continuous values.
ecdfMinusOne	boolean DEFAULT FALSE only has an effect if the ECDF is used. If true, uses 1 minus the ECDF to find the probability of a continuous value. Depending on the interpretation of what you try to do, this may be of use.
retainMinValues	DEFAULT 0 the amount of samples to retain during segmentation. For separating a neighborhood, this value typically should be 0, so that no samples are included that are not within it. However, for very sparse data or a great amount of variables, it might still make sense to retain samples.

Value

data.frame with a single column 'vicinity' and the same rownames as the given data.frame. Each row then holds the vicinity for the corresponding row.

Examples

vic <- mmb::vicinitiesForSample(
  df = iris, sampleFromDf = iris[1,], shiftAmount = 0.1)
#> Warning: No explicit feature selection, using all.
#> Warning: No explicit feature selection, using all.
#> Warning: Density estimation failed: Error in bw.SJ(x, method = "ste"): sample is too sparse to find TD
#> Warning: provided 11 variables to replace 1 variables
vic$vicinity
#>   [1] 1.980087 1.980087 1.980087 0.000000 0.000000 0.000000 0.000000 0.000000
#>   [9] 1.980087 0.000000 0.000000 0.000000 1.980087 1.980087 0.000000 0.000000
#>  [17] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#>  [25] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#>  [33] 0.000000 0.000000 0.000000 1.980087 0.000000 0.000000 1.980087 0.000000
#>  [41] 0.000000 0.000000 1.980087 0.000000 0.000000 0.000000 0.000000 1.980087
#>  [49] 0.000000 1.980087 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#>  [57] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#>  [65] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#>  [73] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#>  [81] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#>  [89] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#>  [97] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [105] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [113] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [121] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [129] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [137] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> [145] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

# Plot the ordered samples to get an idea which ones have a vicinity > 0
plot(x=rownames(vic), y=vic$vicinity)