metrics_as_scores.data package
Submodules
metrics_as_scores.data.pregenerate module
This module contains top-level functions that are used in highly parallel scenarios for pre-generating densities for your own datasets, either from previously computed fits of random variables or as empirical densities.
- metrics_as_scores.data.pregenerate.generate_densities(dataset: Dataset, clazz: type[Density] = Empirical, unique_vals: Optional[bool] = None, resample_samples=250000, dist_transform: DistTransform = DistTransform.NONE, num_jobs: Optional[int] = None) → dict[str, Density]
 Generates a set of Density objects for a certain DistTransform. For each combination of density type and transform, we will later save one file that is then used in the web application, because generating these densities on the fly would take too long.
- dataset: Dataset
  Required for obtaining quantity types, contexts, and filtered data.
- clazz: type[Density]
  The type of empirical density to generate densities for.
- unique_vals: Optional[bool]
  Used to conditionally add some jitter to the data so that all data points are unique. This is automatically set to True if the class is Empirical, because that class is for continuous RVs. If the data is not continuous (real-valued), setting this to True makes it so.
- resample_samples: int
  Unsigned integer, passed forward to the chosen Density type (clazz).
- dist_transform: DistTransform
  The chosen transformation for the data.
- Return type:
  dict[str, Density]
- Returns:
  A dictionary where the key is made of the context and quantity type, and the value is the generated density (of type clazz).
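The jitter that unique_vals refers to can be illustrated with a small helper. This is a minimal sketch, assuming uniform noise that is much smaller than the spacing of the (integer-valued) data; the function add_unique_jitter and its scale and seed parameters are hypothetical and not part of metrics_as_scores.

```python
import random


def add_unique_jitter(values: list[float], scale: float = 1e-6, seed: int = 0) -> list[float]:
    """Hypothetical helper: perturb each value by a tiny uniform offset so
    that no two data points coincide anymore."""
    rng = random.Random(seed)
    jittered = [v + rng.uniform(-scale, scale) for v in values]
    # Re-draw in the (very unlikely) case that duplicates remain.
    while len(set(jittered)) < len(jittered):
        jittered = [v + rng.uniform(-scale, scale) for v in values]
    return jittered


# Discrete data with many ties becomes a set of unique, nearly identical values.
data = [1.0, 1.0, 2.0, 2.0, 2.0, 3.0]
unique = add_unique_jitter(data)
```

Keeping the noise tiny preserves the shape of the distribution while breaking ties, which is what continuous density estimators expect.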
- metrics_as_scores.data.pregenerate.fits_to_MAS_densities(dataset: Dataset, distns_dict: dict[int, FitResult], dist_transform: DistTransform, use_continuous: bool) → dict[str, Union[Parametric, Parametric_discrete]]
 Converts previously produced parametric fits to Density objects that can be loaded and used in the web application. Similar to generate_densities(), this method also returns a dictionary with the generated (here: parametric) densities.
- dataset: Dataset
  Required for obtaining quantity types, contexts, and filtered data.
- distns_dict: dict[int, FitResult]
  Dictionary with all fit results for one data transform. The int key is just the previously used grid index and is not relevant here.
- dist_transform: DistTransform
  The chosen transformation for the data.
- use_continuous: bool
  Selects whether densities are generated from continuous (True) or discrete (False) RVs.
- Return type:
  dict[str, Union[Parametric, Parametric_discrete]]
- Returns:
  A dictionary where the key is made of the context and quantity type, and the value is the generated Parametric or Parametric_discrete density.
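Selecting fits by use_continuous and keying them by context and quantity type might look roughly like the following. This is a hedged sketch over plain dictionaries, not the package's implementation: it assumes that the rv field of a fit result holds a generator class name such as 'norm_gen' (the naming used for the keys of Continuous_RVs_dict and Discrete_RVs_dict), and the key format f"{context}_{qtype}", the name sets, and the helpers fit_failed and select_fits are hypothetical.

```python
from typing import Any, Optional

# Hypothetical subsets of generator names; the package itself would rely on
# the keys of Continuous_RVs_dict and Discrete_RVs_dict instead.
CONTINUOUS_NAMES = {"norm_gen", "expon_gen", "gamma_gen"}
DISCRETE_NAMES = {"poisson_gen", "geom_gen", "nbinom_gen"}


def fit_failed(params: Optional[dict[str, Any]]) -> bool:
    # Treat both "params is None" and "some parameter is None" as a failed fit.
    return params is None or any(v is None for v in params.values())


def select_fits(distns_dict: dict[int, dict[str, Any]], use_continuous: bool) -> dict[str, list[dict[str, Any]]]:
    """Group successful fits of the requested kind by a context/quantity-type key."""
    wanted = CONTINUOUS_NAMES if use_continuous else DISCRETE_NAMES
    grouped: dict[str, list[dict[str, Any]]] = {}
    for fit_result in distns_dict.values():  # the int grid index is ignored
        if fit_result["rv"] not in wanted or fit_failed(fit_result["params"]):
            continue
        key = f"{fit_result['context']}_{fit_result['qtype']}"  # hypothetical key format
        grouped.setdefault(key, []).append(fit_result)
    return grouped


fits = {
    0: {"rv": "norm_gen", "context": "ctxA", "qtype": "LOC", "params": {"loc": 1.0, "scale": 2.0}},
    1: {"rv": "poisson_gen", "context": "ctxA", "qtype": "LOC", "params": {"mu": 3.0}},
    2: {"rv": "expon_gen", "context": "ctxB", "qtype": "LOC", "params": None},
}
continuous = select_fits(fits, use_continuous=True)
discrete = select_fits(fits, use_continuous=False)
```

How a single density is chosen when several fits share a key is not described in this section, so the sketch stops at grouping.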
- metrics_as_scores.data.pregenerate.generate_empirical(dataset: Dataset, densities_dir: Path, clazz: Union[Empirical, KDE_approx], transform: DistTransform) → None
 Generates a set of empirical (continuous) densities for a given density type (Empirical or KDE_approx) and data transform.
- dataset: Dataset
  Required for obtaining quantity types, contexts, and filtered data.
- densities_dir: Path
  The directory to store the generated densities in. The name of the resulting file is a key made of the density type and data transform used.
- clazz: Union[Empirical, KDE_approx]
  The type of density you wish to create.
- transform: DistTransform
  The chosen transformation for the data.
- Returns:
 This method does not return anything but only writes the result to disk.
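Writing one file per combination of density type and transform could look like the following. This is a minimal sketch assuming pickle as the storage format and a made-up file-name scheme (densities_<type>_<transform>.pickle); density_file, save_densities, and the stand-in DistTransform enum are hypothetical and not the package's actual implementation.

```python
import pickle
import tempfile
from enum import Enum
from pathlib import Path


class DistTransform(Enum):
    # Stand-in for the package's DistTransform; only NONE is shown, and its value is made up.
    NONE = "none"


def density_file(densities_dir: Path, clazz_name: str, transform: DistTransform) -> Path:
    """Hypothetical naming scheme: one file per (density type, transform) pair."""
    return densities_dir / f"densities_{clazz_name}_{transform.name}.pickle"


def save_densities(densities: dict[str, object], densities_dir: Path,
                   clazz_name: str, transform: DistTransform) -> Path:
    # Serialize the whole dictionary (context/quantity-type key -> density) at once.
    path = density_file(densities_dir, clazz_name, transform)
    with path.open("wb") as fp:
        pickle.dump(densities, fp)
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = save_densities({"ctxA_LOC": [1.0, 2.0]}, Path(tmp), "Empirical", DistTransform.NONE)
    file_name = out.name
    with out.open("rb") as fp:
        restored = pickle.load(fp)
```

Because the file name is derived only from the density type and the transform, the web application can locate the matching file without scanning the directory.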
- metrics_as_scores.data.pregenerate.generate_parametric(dataset: Dataset, densities_dir: Path, fits_dir: Path, clazz: Union[Parametric, Parametric_discrete], transform: DistTransform) → None
 Generates a set of parametric densities for a given density type (Parametric or Parametric_discrete) and data transform.
- dataset: Dataset
  Required for obtaining quantity types, contexts, and filtered data.
- densities_dir: Path
  The directory to store the generated densities in. The name of the resulting file is a key made of the density type and data transform used.
- fits_dir: Path
  The directory containing the previously computed parametric fits.
- clazz: Union[Parametric, Parametric_discrete]
  The type of density you wish to create.
- transform: DistTransform
  The chosen transformation for the data.
- Returns:
 This method does not return anything but only writes the result to disk.
- metrics_as_scores.data.pregenerate.generate_empirical_discrete(dataset: Dataset, densities_dir: Path, transform: DistTransform) → None
 Generates discrete empirical densities for a given data transform. Only the type Empirical_discrete is used for this.
- dataset: Dataset
  Required for obtaining quantity types, contexts, and filtered data.
- densities_dir: Path
  The directory to store the generated densities in. The name of the resulting file is a key made of the density type and data transform used.
- transform: DistTransform
  The chosen transformation for the data.
- Returns:
 This method does not return anything but only writes the result to disk.
metrics_as_scores.data.pregenerate_distns module
This module contains a single function that is used in highly parallel scenarios for fitting continuous and discrete random variables to data.
- metrics_as_scores.data.pregenerate_distns.generate_parametric_fits(ds: Dataset, num_jobs: int, fitter_type: type[Fitter], dist_transform: DistTransform, selected_rvs_c: list[type[rv_continuous]], selected_rvs_d: list[type[rv_discrete]], data_dict: dict[str, NDArray], transform_values_dict: dict[str, float], data_discrete_dict: dict[str, NDArray], transform_values_discrete_dict: dict[str, float]) → list[FitResult]
 The thinking is this: a continuous distribution can always be fitted to a data series, whether the data is discrete or continuous. The converse is not true: we must not fit a discrete distribution if the data is known to be continuous. Therefore, we do the following:
  - Regardless of the data, always attempt to fit a continuous RV.
  - For all discrete data, also attempt to fit a discrete RV.
 That means that for discrete data, we will have two kinds of fitted RVs. Also, when fitting a continuous RV to discrete data, we add jitter to the data.
- ds: Dataset
  The data, needed for obtaining quantity types and contexts. Also passed forward to fit().
- num_jobs: int
  Degree of parallelization used.
- fitter_type: type[Fitter]
  The class of the fitter to use, either Fitter or FitterPymoo.
- dist_transform: DistTransform
  The transform to generate parametric fits for. Later, we will save a single file per transform, containing all related fits.
- selected_rvs_c: list[type[rv_continuous]]
  Continuous RVs to attempt to fit.
- selected_rvs_d: list[type[rv_discrete]]
  Discrete RVs to attempt to fit.
- data_dict: dict[str, NDArray[Shape["*"], Float]]
  A dictionary where the key consists of the context and the quantity type. Each entry contains a 1-D array of data used for fitting.
- transform_values_dict: dict[str, float]
  Similar to data_dict, this dictionary contains the transformation value that was used to transform the data in the 1-D array.
- data_discrete_dict: dict[str, NDArray[Shape["*"], Float]]
  Like data_dict, but for discrete RVs fitted to discrete data.
- transform_values_discrete_dict: dict[str, float]
  Like transform_values_dict, but for the discrete data.
- Returns:
  A list of FitResult objects.
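The fitting strategy described for generate_parametric_fits() can be sketched as a plan of fit tasks. This is a minimal, hypothetical illustration: RVs are represented by name only, FitTask and plan_fits are not part of metrics_as_scores, and the real function runs actual fits with the given degree of parallelization instead of only listing them.

```python
from typing import NamedTuple


class FitTask(NamedTuple):
    key: str          # context/quantity-type key, as in data_dict
    rv_name: str      # name of the RV to attempt
    continuous: bool  # True if a continuous RV is fitted
    jitter: bool      # True if jitter must be added before fitting


def plan_fits(data_keys: list[str], discrete_keys: set[str],
              rvs_c: list[str], rvs_d: list[str]) -> list[FitTask]:
    tasks: list[FitTask] = []
    for key in data_keys:
        is_discrete = key in discrete_keys
        # A continuous RV can always be fitted; discrete data then needs jitter.
        for rv in rvs_c:
            tasks.append(FitTask(key, rv, True, is_discrete))
        # A discrete RV is only fitted to data known to be discrete.
        if is_discrete:
            for rv in rvs_d:
                tasks.append(FitTask(key, rv, False, False))
    return tasks


tasks = plan_fits(["ctxA_LOC", "ctxA_DENS"], {"ctxA_LOC"}, ["norm_gen", "expon_gen"], ["poisson_gen"])
```

For the discrete series this yields two continuous fits on jittered data plus one discrete fit; the continuous series only gets the two continuous fits.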
metrics_as_scores.data.pregenerate_fit module
This is an extra module that defines its functions at module (global) level, so that they can be used with multiprocessing effortlessly. The main fit() function is defined here.
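Why module-level definitions matter for multiprocessing can be shown with the standard pickle module, which serializes functions by reference (module plus qualified name). This is a minimal illustration, not code from metrics_as_scores; make_local_fitter and is_picklable are hypothetical helpers.

```python
import pickle
import statistics


def make_local_fitter():
    # A function defined inside another function is not reachable by
    # module + qualified name, so pickle cannot serialize it by reference.
    def local_fit(values: list[float]) -> float:
        return statistics.fmean(values)
    return local_fit


def is_picklable(obj: object) -> bool:
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# statistics.fmean is a module-level function of an importable module.
top_level_ok = is_picklable(statistics.fmean)
local_ok = is_picklable(make_local_fitter())
lambda_ok = is_picklable(lambda values: statistics.fmean(values))
```

The standard library's multiprocessing pickles the callables it sends to worker processes, so a nested function or a lambda fails where a top-level function works.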
- metrics_as_scores.data.pregenerate_fit.Continuous_RVs_dict: dict[str, type[scipy.stats._distn_infrastructure.rv_continuous]]
 Value: maps the name of each of the following generator classes in scipy.stats._continuous_distns to the class itself (for example, 'norm_gen' maps to <class 'scipy.stats._continuous_distns.norm_gen'>): alpha_gen, anglit_gen, arcsine_gen, argus_gen, beta_gen, betaprime_gen, bradford_gen, burr12_gen, burr_gen, cauchy_gen, chi2_gen, chi_gen, cosine_gen, crystalball_gen, dgamma_gen, dweibull_gen, erlang_gen, expon_gen, exponnorm_gen, exponpow_gen, exponweib_gen, f_gen, fatiguelife_gen, fisk_gen, foldcauchy_gen, foldnorm_gen, gamma_gen, gausshyper_gen, genexpon_gen, genextreme_gen, gengamma_gen, genhalflogistic_gen, genhyperbolic_gen, geninvgauss_gen, genlogistic_gen, gennorm_gen, genpareto_gen, gibrat_gen, gilbrat_gen, gompertz_gen, gumbel_l_gen, gumbel_r_gen, halfcauchy_gen, halfgennorm_gen, halflogistic_gen, halfnorm_gen, hypsecant_gen, invgamma_gen, invgauss_gen, invweibull_gen, johnsonsb_gen, johnsonsu_gen, kappa3_gen, kappa4_gen, ksone_gen, kstwo_gen, kstwobign_gen, laplace_asymmetric_gen, laplace_gen, levy_gen, levy_l_gen, loggamma_gen, logistic_gen, loglaplace_gen, lognorm_gen, lomax_gen, maxwell_gen, mielke_gen, moyal_gen, nakagami_gen, ncf_gen, nct_gen, ncx2_gen, norm_gen, norminvgauss_gen, pareto_gen, pearson3_gen, powerlaw_gen, powerlognorm_gen, powernorm_gen, rayleigh_gen, rdist_gen, recipinvgauss_gen, reciprocal_gen, rice_gen, semicircular_gen, skew_norm_gen, skewcauchy_gen, studentized_range_gen, t_gen, trapezoid_gen, triang_gen, truncexpon_gen, truncnorm_gen, truncpareto_gen, truncweibull_min_gen, tukeylambda_gen, uniform_gen, vonmises_gen, wald_gen, weibull_max_gen, weibull_min_gen, wrapcauchy_gen.
 Dictionary of continuous random variables that are supported by scipy.stats. Note this is a dictionary of types, rather than instances.
- metrics_as_scores.data.pregenerate_fit.Discrete_RVs_dict: dict[str, type[scipy.stats._distn_infrastructure.rv_discrete]]
 Value: maps the name of each of the following generator classes in scipy.stats._discrete_distns to the class itself (for example, 'poisson_gen' maps to <class 'scipy.stats._discrete_distns.poisson_gen'>): bernoulli_gen, betabinom_gen, binom_gen, boltzmann_gen, dlaplace_gen, geom_gen, hypergeom_gen, logser_gen, nbinom_gen, nchypergeom_fisher_gen, nchypergeom_wallenius_gen, nhypergeom_gen, planck_gen, poisson_gen, randint_gen, skellam_gen, yulesimon_gen, zipf_gen, zipfian_gen.
 Dictionary of discrete random variables that are supported by scipy.stats. Note this is a dictionary of types, rather than instances.
- metrics_as_scores.data.pregenerate_fit.get_data_tuple(ds: Dataset, qtype: str, dist_transform: DistTransform, continuous_transform: bool = True) → list[tuple[str, NDArray]]
 This method is part of the workflow for computing parametric fits. For a specific type of quantity and transform, it creates datasets for all available contexts.
- ds: Dataset
  The dataset to obtain the data from.
- qtype: str
  The type of quantity to get datasets for.
- dist_transform: DistTransform
  The chosen distribution transform.
- continuous_transform: bool
  Whether the transform is real-valued or must be converted to integer.
- Return type:
  list[tuple[str, NDArray[Shape["*"], Float]]]
- Returns:
  A list of two-element tuples. The first element is a key that identifies the context, the quantity type, and whether the data was computed using unique values (see Dataset.transform()). The second element is the corresponding 1-D data array.
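The shape of the return value (one key/array tuple per context) can be sketched with plain lists. This is a hypothetical stand-in, not the package's function: the input layout keyed by (context, qtype), the key format with a trailing u/n flag for unique values, and the name get_data_tuple_sketch are all assumptions made for illustration.

```python
def get_data_tuple_sketch(data: dict[tuple[str, str], list[float]], qtype: str,
                          unique: bool) -> list[tuple[str, list[float]]]:
    """Hypothetical stand-in: one (key, 1-D data) tuple per context for a quantity type."""
    result: list[tuple[str, list[float]]] = []
    for (context, q), values in sorted(data.items()):
        if q != qtype:
            continue  # only the requested quantity type
        # Hypothetical key format: context, quantity type, and a unique-values flag.
        key = f"{context}_{qtype}_{'u' if unique else 'n'}"
        result.append((key, list(values)))
    return result


data = {("ctxB", "LOC"): [3.0], ("ctxA", "LOC"): [1.0, 2.0], ("ctxA", "DENS"): [0.5]}
tuples = get_data_tuple_sketch(data, "LOC", unique=False)
```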
- class metrics_as_scores.data.pregenerate_fit.FitResult
 Bases: TypedDict
 This class is derived from TypedDict and holds all properties related to a single fit result, that is, a single specific configuration that was fit to a 1-D array of data.
- grid_idx: int
- dist_transform: str
- transform_value: Optional[float]
- params: dict[str, Union[float, int]]
- context: str
- qtype: str
- rv: str
- type: str
- stat_tests: StatisticalTestJson
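A TypedDict with the same fields can be declared and filled as follows. This is a sketch that mirrors the documented field names and types; FitResultSketch is a hypothetical name, stat_tests is typed loosely as dict[str, Any] because StatisticalTestJson is not shown here, and the example values are made up.

```python
from typing import Any, Optional, TypedDict, Union


class FitResultSketch(TypedDict):
    # Field names and types follow the documented FitResult fields;
    # stat_tests is a loose stand-in for StatisticalTestJson.
    grid_idx: int
    dist_transform: str
    transform_value: Optional[float]
    params: dict[str, Union[float, int]]
    context: str
    qtype: str
    rv: str
    type: str
    stat_tests: dict[str, Any]


# Made-up example values; at runtime a TypedDict instance is a plain dict.
result: FitResultSketch = {
    "grid_idx": 0,
    "dist_transform": "NONE",
    "transform_value": None,
    "params": {"loc": 0.0, "scale": 1.0},
    "context": "ctxA",
    "qtype": "LOC",
    "rv": "norm_gen",
    "type": "continuous",
    "stat_tests": {},
}
```

Since TypedDict values are ordinary dictionaries, fit results like this can be pickled or dumped to JSON without custom serialization code.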
 
- metrics_as_scores.data.pregenerate_fit.fit(ds: Dataset, fitter_type: type[Fitter], grid_idx: int, row, dist_transform: DistTransform, the_data: NDArray[Shape["*"], float64], the_data_unique: NDArray[Shape["*"], float64], transform_value: Optional[float], write_temporary_results: bool = False) → FitResult
 This is the main stand-alone function that computes a single parametric fit to a single 1-D array of data. It is used in parallel contexts and is therefore defined at module level, so that it can be serialized.
- ds: Dataset
  The data, needed for obtaining quantity types and contexts.
- fitter_type: type[Fitter]
  The class of the fitter to use, either Fitter or FitterPymoo.
- grid_idx: int
  Only used so that it can be stored in the FitResult; this method itself does not have access to the grid.
- dist_transform: DistTransform
  The transform to generate parametric fits for. Later, we will save a single file per transform, containing all related fits.
- the_data: NDArray[Shape["*"], Float]
  The 1-D data used for fitting the RV.
- the_data_unique: NDArray[Shape["*"], Float]
  1-D array of data. For continuous data, it is the same as the_data. For discrete data, this array contains a slight jitter so as to make all data points unique. This array is used for conducting statistical goodness-of-fit tests.
- Returns:
  The FitResult. If the RV could not be fitted, the parameters in the fit result have a value of None, so that this method does not throw exceptions. In case of a failure, no statistical tests are computed either.
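The no-exceptions contract described under Returns can be sketched as a wrapper that catches fitting errors. This is a hypothetical illustration: safe_fit and fit_exponential are not part of metrics_as_scores, the toy exponential fit stands in for the real Fitter, the stat_tests value is a placeholder, and the sketch assumes that a failed fit is signalled by params=None.

```python
import math
from typing import Any, Callable, Optional


def safe_fit(fit_fn: Callable[[list[float]], dict[str, float]],
             data: list[float], rv: str) -> dict[str, Any]:
    """Hypothetical wrapper: never raises; failed fits yield params=None and no stat tests."""
    try:
        params: Optional[dict[str, float]] = fit_fn(data)
    except Exception:
        params = None
    # Placeholder for real goodness-of-fit tests, skipped entirely on failure.
    stat_tests = None if params is None else {"n": len(data)}
    return {"rv": rv, "params": params, "stat_tests": stat_tests}


def fit_exponential(data: list[float]) -> dict[str, float]:
    # Maximum-likelihood rate of an exponential distribution (loc fixed at 0): 1 / mean.
    mean = math.fsum(data) / len(data)  # raises ZeroDivisionError for empty data
    return {"rate": 1.0 / mean}


ok = safe_fit(fit_exponential, [1.0, 2.0, 3.0], "expon_gen")
failed = safe_fit(fit_exponential, [], "expon_gen")
```

Returning a result instead of raising keeps one failing configuration from aborting a large batch of parallel fits.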
Module contents
This package contains functions needed for the large-scale fitting of continuous and discrete random variables to your own data, as well as functions for pre-generating densities for your own datasets, which are then used in the interactive web application.