metrics_as_scores.distribution package

Submodules

metrics_as_scores.distribution.distribution module

This module contains the base class for all densities as used in the web application, as well as all of its concrete implementations. Also, it contains enumerations and typings that describe datasets.

class metrics_as_scores.distribution.distribution.DistTransform(value)[source]

Bases: StrEnum

This is an enumeration of transforms applicable to distributions of a quantity. A transform first computes the desired ideal (transform) value from the given density (e.g., the expectation) and then transforms the initial distribution of values into a distribution of distances.

NONE = '<none>'

Do not apply any transform.

EXPECTATION = 'E[X] (expectation)'

Compute the expectation of the random variable, i.e., \(\mathbb{E}[X]=\int_{-\infty}^{\infty}x\,f_X(x)\,dx\) for a continuous random variable.

MEDIAN = 'Median (50th percentile)'

Compute the median (50th percentile) of the random variable. The median is defined as the value that splits a probability distribution into a lower and higher half.

MODE = 'Mode (most likely value)'

The mode of a random variable is the most frequently occurring value, i.e., the value with the highest probability (density).

INFIMUM = 'Infimum (min. observed value)'

The infimum is the lowest observed value of some empirical random variable.

SUPREMUM = 'Supremum (max. observed value)'

The supremum is the highest observed value of some empirical random variable.
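The transform values behind these enumeration members can be sketched as follows. This is a hypothetical illustration using NumPy/SciPy directly (requires SciPy >= 1.9 for `keepdims`); the package's actual computation differs, e.g., it estimates the continuous expectation via a Gaussian KDE (see the notes on `Dataset.transform()`).

```python
import numpy as np
from scipy import stats

# Hypothetical observations of some quantity.
data = np.array([1, 2, 2, 3, 3, 3, 4, 7, 9], dtype=float)

transform_values = {
    'EXPECTATION': float(np.mean(data)),
    'MEDIAN': float(np.median(data)),
    'MODE': float(stats.mode(data, keepdims=False).mode),
    'INFIMUM': float(np.min(data)),
    'SUPREMUM': float(np.max(data)),
}

# The second step: transform the values into distances from the ideal value.
ideal = transform_values['MEDIAN']
distances = np.abs(data - ideal)
```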

class metrics_as_scores.distribution.distribution.JsonDataset[source]

Bases: TypedDict

This class is the base class for the LocalDataset and the KnownDataset. Each manifest should have a name, id, description, and author.

name: str
desc: str
id: str
author: list[str]
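A minimal manifest conforming to these fields might look as follows. The `TypedDict` is re-declared here only for illustration, and all values are made up:

```python
from typing import TypedDict

# Mirrors the documented fields of JsonDataset (illustrative re-declaration).
class JsonDataset(TypedDict):
    name: str
    desc: str
    id: str
    author: list[str]

# A hypothetical manifest:
manifest: JsonDataset = {
    'name': 'Example Dataset',
    'desc': 'A small, hypothetical example manifest.',
    'id': 'example-dataset',
    'author': ['Jane Doe'],
}
```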
class metrics_as_scores.distribution.distribution.LocalDataset[source]

Bases: dict

This dataset extends the JsonDataset and adds properties that are filled out when locally creating a new dataset.

origin: str
colname_data: str
colname_type: str
colname_context: str
qtypes: dict[str, Literal['continuous', 'discrete']]
desc_qtypes: dict[str, str]
contexts: list[str]
desc_contexts: dict[str, str]
ideal_values: dict[str, Union[int, float]]
name: str
desc: str
id: str
author: list[str]
class metrics_as_scores.distribution.distribution.KnownDataset[source]

Bases: dict

This dataset extends the JsonDataset with properties that are known about datasets that are available to Metrics As Scores online.

info_url: str
download: str
size: int
size_extracted: int
name: str
desc: str
id: str
author: list[str]
class metrics_as_scores.distribution.distribution.Density(range: tuple[float, float], pdf: Callable[[float], float], cdf: Callable[[float], float], ppf: Optional[Callable[[float], float]] = None, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]

Bases: ABC

This is the abstract base class for parametric and empirical densities. A Density represents a concrete instance of some random variable and its PDF, CDF, and PPF. It also stores information about how this concrete instance came to be (e.g., by some concrete transform).

This class provides a set of common getters and setters and also provides some often needed conveniences, such as computing the practical domain. As for the PDF, CDF, and PPF, all known sub-classes have a specific way of obtaining these, and this class’ responsibility lies in vectorizing these functions.

__init__(range: tuple[float, float], pdf: Callable[[float], float], cdf: Callable[[float], float], ppf: Optional[Callable[[float], float]] = None, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs) None[source]
range: tuple[float, float]

The range of the data.

pdf: Callable[[float], float]

The probability density function.

cdf: Callable[[float], float]

The cumulative distribution function.

ppf: Callable[[float], float]

The percent point (quantile) function.

ideal_value: float

Some quantities have an ideal value. It can be provided here.

dist_transform: DistTransform

The data transform that was applied while obtaining this density.

transform_value: float

Optional transform value that was applied during transformation.

qtype: str

The type of quantity for this density.

context: str

The context of this quantity.

property qtype: Optional[str]

Getter for the quantity type.

property context: Optional[str]

Getter for the context.

property ideal_value: Optional[float]

Getter for the ideal value (if any).

property dist_transform: DistTransform

Getter for the data transformation.

property transform_value: Optional[float]

Getter for the used transformation value (if any).

_min_max(x: float) float[source]

Used to safely vectorize a CDF, such that it returns 0.0 when x lies below our range and 1.0 when x lies beyond it.

x: float

The x to obtain the CDF’s y for.

Returns:

A value in the range \([0,1]\).
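Such a safe vectorization can be sketched as follows. The helper name and signature are illustrative, not the actual implementation:

```python
import numpy as np

def clamped_cdf(cdf, lo, hi):
    # Vectorized CDF that returns 0.0 below the range [lo, hi] and 1.0 above it.
    def f(x):
        x = np.asarray(x, dtype=float)
        return np.where(x < lo, 0.0, np.where(x > hi, 1.0, cdf(x)))
    return f

# A toy CDF that is only meaningful on [0, 1]:
f = clamped_cdf(lambda x: x, 0.0, 1.0)
```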

compute_practical_domain(cutoff: float = 0.995) tuple[float, float][source]

It is quite common that domains extend into distant regions to accommodate even the farthest outliers. This is often counter-productive, especially in the web application, where we usually want to show most of the distribution. We therefore compute a practical range that cuts off the most extreme outliers. This is useful for showing a sensible default window.

cutoff: float

The percentage of values to include. The CDF is optimized to find some x at which it reaches the cutoff; for the lower bound, the cutoff is subtracted from the CDF.

Return type:

tuple[float, float]

Returns:

The practical domain, cut off for both directions.
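One plausible reading of this optimization, sketched with a root finder on a known CDF. The use of `brentq` and the `search_range` parameter are assumptions for illustration, not the actual implementation:

```python
from scipy.optimize import brentq
from scipy.stats import norm

def practical_domain(cdf, search_range, cutoff=0.995):
    # Find lo/hi such that cdf(lo) = 1 - cutoff and cdf(hi) = cutoff.
    lo = brentq(lambda x: cdf(x) - (1.0 - cutoff), *search_range)
    hi = brentq(lambda x: cdf(x) - cutoff, *search_range)
    return lo, hi

# For a standard normal, the practical domain is roughly [-2.576, 2.576].
lo, hi = practical_domain(norm.cdf, search_range=(-10.0, 10.0))
```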

property practical_domain: tuple[float, float]

Getter for the practical domain. This is a lazy getter that only computes the practical domain if it was not done before.

compute_practical_range_pdf() tuple[float, float][source]

Similar to compute_practical_domain(), this method computes a practical range for the PDF. This method determines the location of the PDF’s highest mode.

Returns:

Returns a tuple whose first element is always 0.0 and whose second element is the y-value of the highest mode (i.e., it returns the mode’s density, not x, its location).

property practical_range_pdf: tuple[float, float]

Lazy getter for the practical range of the PDF.

__call__(x: Union[float, list[float], NDArray[Shape["*"], Float64]]) → NDArray[Shape["*"], Float64][source]

Allow objects of type Density to be callable. Calls the vectorized CDF under the hood.

class metrics_as_scores.distribution.distribution.KDE_integrate(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], ideal_value: ~typing.Optional[float] = None, dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, transform_value: ~typing.Optional[float] = None, qtype: ~typing.Optional[str] = None, context: ~typing.Optional[str] = None, **kwargs)[source]

Bases: Density

The purpose of this class is to use an empirical (typically Gaussian) PDF and to also provide a smooth CDF that is obtained by integrating the PDF: \(F_X(x)=\int_{-\infty}^{x} f_X(t) dt\). While this kind of CDF is smooth and precise, evaluating it is obviously slow. Therefore, KDE_approx is used in practice.
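The integral-based CDF can be sketched like this (an illustrative stand-in using scipy's `gaussian_kde` and `quad`; sample sizes and seeds are arbitrary). Evaluating it point-by-point shows why this approach is slow:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=500)
kde = gaussian_kde(data)   # empirical (Gaussian) PDF

def cdf(x):
    # F_X(x) = integral of the KDE's PDF from -inf to x: precise and smooth,
    # but every evaluation requires numerical integration, hence slow.
    return quad(lambda t: kde.evaluate(t)[0], -np.inf, x)[0]
```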

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], ideal_value: ~typing.Optional[float] = None, dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, transform_value: ~typing.Optional[float] = None, qtype: ~typing.Optional[str] = None, context: ~typing.Optional[str] = None, **kwargs) None[source]
range: tuple[float, float]

The range of the data.

pdf: Callable[[float], float]

The probability density function.

cdf: Callable[[float], float]

The cumulative distribution function.

ppf: Callable[[float], float]

The percent point (quantile) function.

ideal_value: float

Some quantities have an ideal value. It can be provided here.

dist_transform: DistTransform

The data transform that was applied while obtaining this density.

transform_value: float

Optional transform value that was applied during transformation.

qtype: str

The type of quantity for this density.

context: str

The context of this quantity.

init_ppf(cdf_samples: int = 100) KDE_integrate[source]

Initializes the PPF. We get x and y from the CDF. Then, we swap the two and interpolate a PPF. Since obtaining each y from the CDF means we need to compute an integral, be careful with setting a high number of cdf_samples.

cdf_samples: int

The number of samples to take from the CDF (which is computed by integrating the PDF, so be careful).
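The swap-and-interpolate idea can be sketched as follows, using a known CDF in place of the integral-based one for illustration:

```python
import numpy as np
from scipy.stats import norm

# Sample the CDF on a grid of x-values, then swap the roles of x and y
# to interpolate the PPF (quantile function).
xs = np.linspace(-6.0, 6.0, 100)
ys = norm.cdf(xs)                 # 100 (x, F(x)) pairs

def ppf(q):
    return np.interp(q, ys, xs)   # inverse of the CDF by interpolation
```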

class metrics_as_scores.distribution.distribution.KDE_approx(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], resample_samples: int = 200000, compute_ranges: bool = False, ideal_value: ~typing.Optional[float] = None, dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, transform_value: ~typing.Optional[float] = None, qtype: ~typing.Optional[str] = None, context: ~typing.Optional[str] = None, **kwargs)[source]

Bases: Density

This kind of density uses Kernel Density Estimation to obtain a PDF, and an empirical CDF (ECDF) to provide a cumulative distribution function. The advantage is that both PDF and CDF are fast. The PPF is the inverted and interpolated CDF, so it is fast, too. The data used for the PDF is limited to 10_000 samples using deterministic sampling without replacement. The data used for the CDF is obtained by sampling a large number (typically 200_000) of data points from the Gaussian KDE, in order to make it smooth.
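A rough sketch of this construction (sample sizes and seeds are arbitrary; the actual class additionally performs deterministic sub-sampling for the PDF and conducts statistical tests):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
data = rng.normal(size=2_000)

# PDF from a Gaussian KDE; CDF as an ECDF over a large KDE resample.
kde = gaussian_kde(data)
resampled = np.sort(kde.resample(size=200_000, seed=2).ravel())

def ecdf(x):
    # Fraction of resampled points <= x (a right-continuous step function).
    return np.searchsorted(resampled, x, side='right') / resampled.size
```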

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], resample_samples: int = 200000, compute_ranges: bool = False, ideal_value: ~typing.Optional[float] = None, dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, transform_value: ~typing.Optional[float] = None, qtype: ~typing.Optional[str] = None, context: ~typing.Optional[str] = None, **kwargs) None[source]

For the other parameters, please refer to Density.__init__().

resample_samples: int

The number of samples to take from the Gaussian KDE. These samples are then used to estimate an as-smooth-as-possible CDF (and the PPF thereof).

compute_ranges: bool

Whether or not to compute the practical domain of the data and the practical range of the PDF. Both of these use optimization to find the results.

property pval: float

Shortcut getter for the jittered, two-sample KS-test’s p-value.

property stat: float

Shortcut getter for the jittered, two-sample KS-test’s test statistic (D-value).

class metrics_as_scores.distribution.distribution.Empirical(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], compute_ranges: bool = False, ideal_value: ~typing.Optional[float] = None, dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, transform_value: ~typing.Optional[float] = None, qtype: ~typing.Optional[str] = None, context: ~typing.Optional[str] = None, **kwargs)[source]

Bases: Density

This kind of density does not apply any smoothing to the CDF; rather, it uses a straightforward ECDF of the data as given. The PDF is determined using Gaussian KDE.

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], compute_ranges: bool = False, ideal_value: ~typing.Optional[float] = None, dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, transform_value: ~typing.Optional[float] = None, qtype: ~typing.Optional[str] = None, context: ~typing.Optional[str] = None, **kwargs) None[source]

For the other parameters, please refer to Density.__init__().

compute_ranges: bool

Whether or not to compute the practical domain of the data and the practical range of the PDF. Both of these use optimization to find the results.

class metrics_as_scores.distribution.distribution.Empirical_discrete(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], ideal_value: ~typing.Optional[float] = None, dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, transform_value: ~typing.Optional[float] = None, qtype: ~typing.Optional[str] = None, context: ~typing.Optional[str] = None, **kwargs)[source]

Bases: Empirical

Inherits from Empirical and is used when the underlying quantity is discrete rather than continuous. As its PDF, this class uses a PMF that is determined by the frequencies of each discrete datum.
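Deriving a PMF from frequencies can be sketched as follows (hypothetical data; only an illustration of the idea):

```python
import numpy as np

# Hypothetical discrete data; the PMF follows from relative frequencies.
data = np.array([0, 1, 1, 2, 2, 2, 5])
values, counts = np.unique(data, return_counts=True)
pmf = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))
```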

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], ideal_value: ~typing.Optional[float] = None, dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, transform_value: ~typing.Optional[float] = None, qtype: ~typing.Optional[str] = None, context: ~typing.Optional[str] = None, **kwargs) None[source]

For the other parameters, please refer to Density.__init__().

compute_ranges: bool

Whether or not to compute the practical domain of the data and the practical range of the PDF. Both of these use optimization to find the results.

property is_fit: bool

Returns True if the given data is valid.

static unfitted(dist_transform: DistTransform) Empirical_discrete[source]

Used to return an explicit unfit instance of Empirical_discrete. This is used when, for example, continuous (real) data is given to the constructor. We still need an instance of this density in the web application to show an error (e.g., that there are no discrete empirical densities for continuous data).

class metrics_as_scores.distribution.distribution.Parametric(dist: rv_generic, dist_params: tuple, range: tuple[float, float], stat_tests: dict[str, float], use_stat_test: Literal['cramervonmises_jittered', 'cramervonmises_ordinary', 'cramervonmises_2samp_jittered', 'cramervonmises_2samp_ordinary', 'epps_singleton_2samp_jittered', 'epps_singleton_2samp_ordinary', 'ks_1samp_jittered', 'ks_1samp_ordinary', 'ks_2samp_jittered', 'ks_2samp_ordinary'] = 'ks_2samp_jittered', compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]

Bases: Density

This density encapsulates a parameterized and previously fitted random variable. Random variables in scipy.stats come with PDF/PMF, CDF, PPF, etc., so we just use these and forward calls to them.
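The fit-and-forward pattern can be sketched with a single scipy.stats distribution (the package fits many candidate distributions and stores the parameters as a tuple; this is only an illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=3.0, size=5_000)

# Fit one candidate distribution via maximum likelihood, then freeze it
# and forward PDF/CDF/PPF calls to the frozen random variable.
params = stats.norm.fit(data)      # (loc, scale); order matters
frozen = stats.norm(*params)

p = frozen.cdf(10.0)   # forwarded CDF call
m = frozen.ppf(0.5)    # forwarded PPF call (the median)
```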

__init__(dist: rv_generic, dist_params: tuple, range: tuple[float, float], stat_tests: dict[str, float], use_stat_test: Literal['cramervonmises_jittered', 'cramervonmises_ordinary', 'cramervonmises_2samp_jittered', 'cramervonmises_2samp_ordinary', 'epps_singleton_2samp_jittered', 'epps_singleton_2samp_ordinary', 'ks_1samp_jittered', 'ks_1samp_ordinary', 'ks_2samp_jittered', 'ks_2samp_ordinary'] = 'ks_2samp_jittered', compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs) None[source]

For the other parameters, please refer to Density.__init__().

dist: rv_generic

An instance of the random variable to use.

dist_params: tuple

A tuple of parameters for the random variable. The order of the parameters is important since it is not a dictionary.

stat_tests: dict[str, float]

A (flattened) dictionary of previously conducted statistical tests. This is used later to choose some best-fitting parametric density by a specific test.

use_stat_test: StatTest_Types

The name of the chosen statistical test used to determine the goodness of fit.

compute_ranges: bool

Whether or not to compute the practical domain of the data and the practical range of the PDF. Both of these use optimization to find the results.

static unfitted(dist_transform: DistTransform) Parametric[source]

Used to return an explicit unfit instance of Parametric. This is used when not a single maximum-likelihood fit was successful for a number of random variables. We still need an instance of this density in the web application to show an error (e.g., that it was not possible to fit any random variable to the selected quantity).

property use_stat_test: Literal['cramervonmises_jittered', 'cramervonmises_ordinary', 'cramervonmises_2samp_jittered', 'cramervonmises_2samp_ordinary', 'epps_singleton_2samp_jittered', 'epps_singleton_2samp_ordinary', 'ks_1samp_jittered', 'ks_1samp_ordinary', 'ks_2samp_jittered', 'ks_2samp_ordinary']

Getter for the selected statistical test.

property pval: float

Shortcut getter for the p-value of the selected statistical test.

property stat: float

Shortcut getter for the test statistic of the selected statistical test.

property is_fit: bool

Returns True if this instance is not an explicitly unfit instance.

property practical_domain: tuple[float, float]

Overridden to return a practical domain of \([0,0]\) in case this instance is unfit.

property practical_range_pdf: tuple[float, float]

Overridden to return a practical PDF range of \([0,0]\) in case this instance is unfit.

property dist_name: str

Shortcut getter for the name of this density’s random variable’s class.

pdf(x: NDArray[Shape["*"], Float64]) → NDArray[Shape["*"], Float64][source]

Overridden to call the encapsulated distribution’s PDF. If this density is unfit, always returns an array of zeros of same shape as the input.

cdf(x: NDArray[Shape["*"], Float64]) → NDArray[Shape["*"], Float64][source]

Overridden to call the encapsulated distribution’s CDF. If this density is unfit, always returns an array of zeros of same shape as the input.

ppf(x: NDArray[Shape["*"], Float64]) → NDArray[Shape["*"], Float64][source]

Overridden to call the encapsulated distribution’s PPF. If this density is unfit, always returns an array of zeros of same shape as the input.

compute_practical_domain(cutoff: float = 0.9985) tuple[float, float][source]

Overridden to exploit having available a PPF of a fitted random variable. It can be used to find the practical domain instantaneously instead of having to solve an optimization problem.

cutoff: float

The percentage of values to include. The CDF is optimized to find some x at which it reaches the cutoff; for the lower bound, the cutoff is subtracted from the CDF. Note that the default cutoff was adjusted here to extend a little beyond what is good for other types of densities.

Return type:

tuple[float, float]

Returns:

The practical domain, cut off for both directions. If this random variable is unfit, returns Density’s compute_practical_domain().

class metrics_as_scores.distribution.distribution.Parametric_discrete(dist: rv_generic, dist_params: tuple, range: tuple[float, float], stat_tests: dict[str, float], use_stat_test: Literal['cramervonmises_jittered', 'cramervonmises_ordinary', 'cramervonmises_2samp_jittered', 'cramervonmises_2samp_ordinary', 'epps_singleton_2samp_jittered', 'epps_singleton_2samp_ordinary', 'ks_1samp_jittered', 'ks_1samp_ordinary', 'ks_2samp_jittered', 'ks_2samp_ordinary'] = 'ks_2samp_jittered', compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]

Bases: Parametric

This type of density inherits from Parametric and is its counterpart for discrete (integral) data. It adds an explicit function for the probability mass and makes the inherited PDF return the PMF’s result.

pmf(x: NDArray[Shape["*"], Float64]) → NDArray[Shape["*"], Float64][source]

Implemented to call the encapsulated distribution’s PMF. If this density is unfit, always returns an array of zeros of same shape as the input.

pdf(x: NDArray[Shape["*"], Float64]) → NDArray[Shape["*"], Float64][source]

Overridden to return the result of pmf(). Note that in any case, a density’s pdf() is called (i.e., callers never call the PMF directly). It is therefore easier to catch these calls and redirect them to the PMF.

static unfitted(dist_transform: DistTransform) Parametric_discrete[source]

Used to return an explicit unfit instance of Parametric_discrete. This is used when not a single maximum-likelihood fit was successful for a number of random variables. We still need an instance of this density in the web application to show an error (e.g., that it was not possible to fit any random variable to the selected quantity).

class metrics_as_scores.distribution.distribution.Dataset(ds: LocalDataset, df: DataFrame)[source]

Bases: object

This class encapsulates a local (self created) dataset and provides help with transforming it, as well as giving some convenience getters.

__init__(ds: LocalDataset, df: DataFrame) None[source]
property quantity_types: list[str]

Shortcut getter for the manifest’s quantity types.

contexts(include_all_contexts: bool = False) Iterable[str][source]

Returns the manifest’s defined contexts as a generator. Sometimes we need to ignore the context and aggregate a quantity type across all defined contexts. Then, a virtual context called __ALL__ is used.

include_all_contexts: bool

Whether to also yield the virtual __ALL__-context.

property ideal_values: dict[str, Union[float, int, NoneType]]

Shortcut getter for the manifest’s ideal values.

is_qtype_discrete(qtype: str) bool[source]

Returns whether a given quantity type is discrete.

qytpe_desc(qtype: str) str[source]

Returns the description associated with a quantity type.

context_desc(context: str) Optional[str][source]

Returns the description associated with a context (if any).

property quantity_types_continuous: list[str]

Returns a list of quantity types that are continuous (real-valued).

property quantity_types_discrete: list[str]

Returns a list of quantity types that are discrete (integer-valued).

data(qtype: str, context: Union[str, None, Literal['__ALL__']] = None, unique_vals: bool = True, sub_sample: Optional[int] = None) → NDArray[Shape["*"], Float64][source]

This method is used to select a subset of the data that is specific to at least a type of quantity, and optionally to a context, too.

qtype: str

The name of the quantity type to get data for.

context: Union[str, None, Literal['__ALL__']]

You may specify a context to further filter the data by. Data is always specific to a quantity type, and sometimes to a context. If no context-based filtering is desired, pass None or __ALL__.

unique_vals: bool

If True, some small jitter will be added to the data in order to make it unique.

sub_sample: int

Optional unsigned integer giving the number of samples to take in case the dataset is very large. It is only applied if this number is smaller than the data’s size.

num_observations() Iterable[tuple[str, str, int]][source]

Returns the number of observations for each quantity type in each context.

Return type:

Iterable[tuple[str, str, int]]

Returns:

An iterable generator of tuples, where the first element is the context, the second the quantity type, and the third the number of observations.

has_sufficient_observations(raise_if_not: bool = True) bool[source]

Helper method to check whether each quantity type in each context has at least two observations.

raise_if_not: bool

If set to True, will raise an exception instead of returning False in case there are insufficiently many observations. The exception is more informative, as it includes the context and quantity type.

Return type:

bool

Returns:

A boolean indicating whether this Dataset has sufficiently many observations for each and every quantity type.

static transform(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], dist_transform: ~metrics_as_scores.distribution.distribution.DistTransform = DistTransform.NONE, continuous_value: bool = True) tuple[float, nptyping.ndarray.NDArray][source]

Transforms a distribution using an ideal value. The resulting data, therefore, is a distribution of distances from the designated ideal value.

Given a distribution \(X\) and an ideal value \(i\), the distribution of distances is defined as \(D=\left|X-i\right|\).

data: NDArray[Shape["*"], Float]

1-D array of float data, the data to be transformed. The data may also hold integers (or floats that are practically integers).

dist_transform: DistTransform

The transform to apply. If DistTransform.NONE, the data is returned as is, None as the transform value. Any of the other transforms are determined from the data (see notes).

continuous_value: bool

Whether the to-be-determined ideal value should be continuous. For example, when using the expectation (mean) as transform, even for a discrete distribution the result is likely to be a float. Setting continuous_value to False will round the found mean to the nearest integer, such that the resulting distribution \(D\) is of integral nature, too.

Return type:

tuple[float, NDArray[Shape["*"], Float]]

Returns:

A tuple holding the applied transform value (if the chosen transform was not DistTransform.NONE) and the array of distances.

Notes

The expectation (mean), in the continuous case, is determined by estimating a Gaussian kernel using gaussian_kde and then integrating it over the Density.practical_domain(). In the discrete case, we use the rounded mean of the data. Mode and median are computed similarly in the continuous and discrete cases, except that for the discrete mode we use scipy.stats.mode(). Supremum and infimum are simply computed from the data (and rounded in the discrete case).
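The distance transform itself can be sketched as follows. This is a simplified stand-in for the actual method, which also estimates the transform value according to the chosen DistTransform:

```python
import numpy as np

def transform(data, ideal, continuous_value=True):
    # D = |X - i|; optionally round the ideal value for discrete data.
    i = float(ideal) if continuous_value else float(np.rint(ideal))
    return i, np.abs(data - i)

data = np.array([1.0, 2.0, 4.0, 7.0])
# Use the (rounded) mean as the ideal value for an integral quantity.
i, distances = transform(data, ideal=np.mean(data), continuous_value=False)
```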

analyze_groups(use: Literal['anova', 'kruskal'], qtypes: Iterable[str], contexts: Iterable[str], unique_vals: bool = True) DataFrame[source]

For each given type of quantity, this method performs an ANOVA (or Kruskal-Wallis test) across all given contexts.

use: Literal['anova', 'kruskal']

Indicates which method for comparing groups to use. We can either conduct an ANOVA or a Kruskal-Wallis test.

qtypes: Iterable[str]

An iterable of quantity types to conduct the analysis for. For each given type, a separate analysis is performed and the result appended to the returned data frame.

contexts: Iterable[str]

An iterable of contexts across which each of the quantity types shall be analyzed.

unique_vals: bool

Passed to self.data(). If True, then small, random, and unique noise is added to the data before it is analyzed. This effectively deduplicates any samples in the data (if any).

Return type:

pd.DataFrame

Returns:

A data frame with the columns qtype (name of the quantity type), stat (ANOVA test statistic), pval, and across_contexts, where the latter is a semicolon-separated list of contexts the quantity type was compared across.
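The underlying group comparison corresponds roughly to SciPy's tests, as sketched below (hypothetical data; the method additionally assembles the results into a data frame):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical samples of one quantity type across three contexts.
groups = [rng.normal(loc=m, size=200) for m in (0.0, 0.0, 1.0)]

stat_a, pval_a = stats.f_oneway(*groups)   # use='anova'
stat_k, pval_k = stats.kruskal(*groups)    # use='kruskal'
```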

analyze_TukeyHSD(qtypes: Iterable[str]) DataFrame[source]

Calculate all pairwise comparisons for the given quantity types with Tukey’s Honest Significance Test (HSD) and return the confidence intervals. For each type of quantity, this method performs all pairwise comparisons of its associated contexts. For example, given a quantity \(Q\) and its contexts \(C_1,C_2,C_3\), this method will examine the pairs \(\left[\{C_1,C_2\},\{C_1,C_3\},\{C_2,C_3\}\right]\). For a single type of quantity, this test is useful to understand how differently the quantity manifests across contexts. For multiple quantities, it also allows understanding how contexts distinguish themselves from one another, holistically.

qtypes: Iterable[str]

An iterable of quantity types to conduct the analysis for. For each given type, a separate analysis is performed and the result appended to the returned data frame.

Return type:

pd.DataFrame

Returns:

A data frame with the columns group1, group2, meandiff, p-adj, lower, upper, and reject. For details see statsmodels.stats.multicomp.pairwise_tukeyhsd().

analyze_distr(qtypes: Iterable[str], use_ks_2samp: bool = True, ks2_max_samples=40000) DataFrame[source]

Performs the two-sample Kolmogorov–Smirnov test or Welch’s t-test for two or more types of quantity. The test is performed for all unique pairs of quantity types.

qtypes: Iterable[str]

An iterable of quantity types to test in a pair-wise manner.

use_ks_2samp: bool

If True, use the two-sample Kolmogorov–Smirnov test; otherwise, Welch’s t-test.

ks2_max_samples: int

Unsigned integer used to limit the number of samples used in the KS2-test. For numbers larger than the default, it may not be possible to compute the test exactly.

Return type:

pd.DataFrame

Returns:

A data frame with columns qtype, stat, pval, group1, and group2.
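The two underlying tests can be sketched as follows (hypothetical data; the method additionally pairs up quantity types and builds the result data frame):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(loc=0.0, size=1_000)   # hypothetical quantity type 1
b = rng.normal(loc=0.5, size=1_000)   # hypothetical quantity type 2

# use_ks_2samp=True: two-sample Kolmogorov-Smirnov test.
ks_stat, ks_pval = stats.ks_2samp(a, b)
# use_ks_2samp=False: Welch's t-test (unequal variances).
t_stat, t_pval = stats.ttest_ind(a, b, equal_var=False)
```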

metrics_as_scores.distribution.fitting module

This module is concerned with fitting distributions to data. It supports both discrete and continuous distributions. Fitting is done by a common helper that unifies the way either type of distribution is fitted.

metrics_as_scores.distribution.fitting.Continuous_RVs: list[scipy.stats._distn_infrastructure.rv_continuous] = [alpha_gen, anglit_gen, arcsine_gen, argus_gen, beta_gen, betaprime_gen, bradford_gen, burr_gen, burr12_gen, cauchy_gen, chi_gen, chi2_gen, cosine_gen, crystalball_gen, dgamma_gen, dweibull_gen, erlang_gen, expon_gen, exponnorm_gen, exponpow_gen, exponweib_gen, f_gen, fatiguelife_gen, fisk_gen, foldcauchy_gen, foldnorm_gen, gamma_gen, gausshyper_gen, genexpon_gen, genextreme_gen, gengamma_gen, genhalflogistic_gen, genhyperbolic_gen, geninvgauss_gen, genlogistic_gen, gennorm_gen, genpareto_gen, gibrat_gen, gilbrat_gen, gompertz_gen, gumbel_l_gen, gumbel_r_gen, halfcauchy_gen, halfgennorm_gen, halflogistic_gen, halfnorm_gen, hypsecant_gen, invgamma_gen, invgauss_gen, invweibull_gen, johnsonsb_gen, johnsonsu_gen, kappa3_gen, kappa4_gen, ksone_gen, kstwo_gen, kstwobign_gen, laplace_gen, laplace_asymmetric_gen, levy_gen, levy_l_gen, loggamma_gen, logistic_gen, loglaplace_gen, lognorm_gen, reciprocal_gen, lomax_gen, maxwell_gen, mielke_gen, moyal_gen, nakagami_gen, ncf_gen, nct_gen, ncx2_gen, norm_gen, norminvgauss_gen, pareto_gen, pearson3_gen, powerlaw_gen, powerlognorm_gen, powernorm_gen, rayleigh_gen, rdist_gen, recipinvgauss_gen, reciprocal_gen, rice_gen, semicircular_gen, skewcauchy_gen, skew_norm_gen, studentized_range_gen, t_gen, trapezoid_gen, trapezoid_gen, triang_gen, truncexpon_gen, truncnorm_gen, truncpareto_gen, truncweibull_min_gen, tukeylambda_gen, uniform_gen, vonmises_gen, vonmises_gen, wald_gen, weibull_max_gen, weibull_min_gen, wrapcauchy_gen]

List of continuous random variables that are supported by scipy.stats. Note that this list contains instances of the generator classes, rather than the types themselves.

metrics_as_scores.distribution.fitting.Discrete_RVs: list[scipy.stats._distn_infrastructure.rv_discrete] = [bernoulli_gen, betabinom_gen, binom_gen, boltzmann_gen, dlaplace_gen, geom_gen, hypergeom_gen, logser_gen, nbinom_gen, nchypergeom_fisher_gen, nchypergeom_wallenius_gen, nhypergeom_gen, planck_gen, poisson_gen, randint_gen, skellam_gen, yulesimon_gen, zipf_gen, zipfian_gen]

List of discrete random variables that are supported by scipy.stats. Note that this list contains instances of the generator classes, rather than the types themselves.
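Since both lists hold instances, the generator class of an entry is recovered via type(). A minimal sketch, using a few well-known scipy distributions as stand-ins for the full lists:

```python
from scipy import stats

# The entries of Continuous_RVs/Discrete_RVs are generator *instances*
# (e.g., stats.norm), not classes; the class is recovered with type().
sample_rvs = [stats.norm, stats.expon, stats.poisson]

for rv in sample_rvs:
    kind = "continuous" if isinstance(rv, stats.rv_continuous) else "discrete"
    print(type(rv).__name__, kind)  # e.g., 'norm_gen continuous'
```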

metrics_as_scores.distribution.fitting.Discrete_Problems: dict[str, type[metrics_as_scores.distribution.fitting_problems.MixedVariableDistributionFittingProblem]] = {'bernoulli_gen': Fit_bernoulli_gen, 'betabinom_gen': Fit_betabinom_gen, 'binom_gen': Fit_binom_gen, 'boltzmann_gen': Fit_boltzmann_gen, 'dlaplace_gen': Fit_dlaplace_gen, 'geom_gen': Fit_geom_gen, 'hypergeom_gen': Fit_hypergeom_gen, 'logser_gen': Fit_logser_gen, 'nbinom_gen': Fit_nbinom_gen, 'nchypergeom_fisher_gen': Fit_nchypergeom_fisher_gen, 'nchypergeom_wallenius_gen': Fit_nchypergeom_wallenius_gen, 'nhypergeom_gen': Fit_nhypergeom_gen, 'planck_gen': Fit_planck_gen, 'poisson_gen': Fit_poisson_gen, 'randint_gen': Fit_randint_gen, 'skellam_gen': Fit_skellam_gen, 'yulesimon_gen': Fit_yulesimon_gen, 'zipf_gen': Fit_zipf_gen, 'zipfian_gen': Fit_zipfian_gen}

Dictionary of fitting problems used by pymoo for fitting discrete distributions. Metrics As Scores only supports fitting a discrete random variable through pymoo if a corresponding problem is defined for it. However, many, if not most, are covered. In case a problem is missing, the ordinary Fitter can be used (which relies on differential evolution).
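The keys of this dictionary are the type names of scipy's discrete generator instances, so looking up the matching problem goes through type(). A small sketch of the lookup itself (only scipy is needed; in practice the result would be used to index Discrete_Problems):

```python
from scipy import stats

# The dictionary keys are the *type names* of scipy's discrete generators,
# e.g. 'poisson_gen' for stats.poisson.
def problem_key(rv: stats.rv_discrete) -> str:
    return type(rv).__name__

key = problem_key(stats.poisson)
print(key)  # 'poisson_gen'
```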

class metrics_as_scores.distribution.fitting.DesignSpaceTerminationFixed(tol=0.005, **kwargs)[source]

Bases: DesignSpaceTermination

class metrics_as_scores.distribution.fitting.SingleObjectiveTermination(xtol=1e-08, cvtol=1e-08, ftol=1e-06, period=75, max_time: int = 600, **kwargs)[source]

Bases: DefaultTermination

This class is used as termination criterion for the FitterPymoo.

__init__(xtol=1e-08, cvtol=1e-08, ftol=1e-06, period=75, max_time: int = 600, **kwargs) None[source]
class metrics_as_scores.distribution.fitting.Fitter(dist: type[Union[scipy.stats._distn_infrastructure.rv_continuous, scipy.stats._distn_infrastructure.rv_discrete]])[source]

Bases: object

This class is used as a generic approach to fit any random variable to any kind of data. However, it is recommended to use the more stable FitterPymoo.

Practical_Ranges = {'bernoulli_gen': {'p': [0.0, 1.0]}, 'betabinom_gen': {'a': [5e-308, 1000.0], 'b': [5e-308, 1000.0], 'n': [0, 5000]}, 'binom_gen': {'n': [1, 25000], 'p': [0.0, 1.0]}, 'boltzmann_gen': {'N': [1, 25000], 'lambda': [0.0, 100000.0]}, 'dlaplace_gen': {'a': [5e-308, 10000.0]}, 'geom_gen': {'p': [0.0, 1.0]}, 'hypergeom_gen': {'M': [1, 25000], 'N': [0, 25000], 'n': [0, 25000]}, 'logser_gen': {'p': [0.0, 1.0]}, 'nbinom_gen': {'n': [0, 25000], 'p': [0.0, 1.0]}, 'nchypergeom_fisher_gen': {'M': [0, 25000], 'N': [0, 25000], 'n': [0, 25000], 'odds': [5e-308, 10000.0]}, 'nchypergeom_wallenius_gen': {'M': [0, 25000], 'N': [0, 25000], 'n': [0, 25000], 'odds': [5e-308, 10000.0]}, 'nhypergeom_gen': {'M': [0, 25000], 'n': [0, 25000], 'r': [0, 25000]}, 'planck_gen': {'lambda': [5e-308, 100.0]}, 'poisson_gen': {'mu': [0.0, 1000000.0]}, 'randint_gen': {'high': [-25000, 25000], 'low': [-25000, 25000]}, 'skellam_gen': {'mu1': [5e-308, 5000.0], 'mu2': [5e-308, 5000.0]}, 'yulesimon_gen': {'alpha': [5e-308, 20000.0]}, 'zipf_gen': {'a': [1.000000000001, 20000.0]}, 'zipfian_gen': {'a': [0.0, 20000.0], 'n': [0, 25000]}}

A dictionary of practical bounds for the parameters of discrete distributions. It is used by the Fitter when using differential evolution to optimize the fit of a distribution. Note that the FitterPymoo does not use these. Instead, it relies on separate problems that are defined for each discrete random variable.
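The role of these bounds can be sketched with plain scipy: minimize the negative log-likelihood with differential evolution, constrained to the practical range documented for poisson_gen. (This mirrors what the Fitter does but is not its actual implementation; the zero lower bound is nudged to a small positive value here to keep the likelihood finite.)

```python
import numpy as np
from scipy import stats
from scipy.optimize import differential_evolution

rng = np.random.default_rng(42)
data = rng.poisson(lam=3.0, size=500)

# Negative log-likelihood of a Poisson(mu) over the sample.
def nll(theta):
    return -np.sum(stats.poisson.logpmf(data, theta[0]))

# Bounds follow the practical range for 'poisson_gen' ([0, 1e6]),
# with the lower end nudged away from zero.
result = differential_evolution(nll, bounds=[(1e-6, 1e6)], seed=1337)
mu_hat = result.x[0]  # close to the sample mean, the Poisson MLE
```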

__init__(dist: type[Union[scipy.stats._distn_infrastructure.rv_continuous, scipy.stats._distn_infrastructure.rv_discrete]]) None[source]
dist: type[Union[rv_continuous, rv_discrete]]

Specify the class of the random variable you want to fit.

property is_discrete: bool

Shortcut getter to return whether the used random variable is discrete.

property is_continuous: bool

Shortcut getter to return whether the used random variable is continuous.

fit(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs) dict[str, Union[float, int]][source]

Convenience method to fit the random variable. If it is continuous, calls rv_continuous.fit(), which uses maximum likelihood estimation. If it is discrete, uses differential evolution to find an estimate.

data: NDArray[Shape["*"], Float]

The data to fit to.

Returns:

A dictionary with named parameters and their values.
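For the continuous case, the returned dictionary essentially wraps rv_continuous.fit(); a minimal sketch of that mapping with plain scipy (the parameter names are the distribution's shapes plus loc and scale; norm has no shape parameters):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=2000)

# rv_continuous.fit() performs maximum likelihood estimation; zipping the
# resulting tuple with parameter names mirrors the returned dictionary.
params = stats.norm.fit(data)
named = dict(zip(("loc", "scale"), params))
```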

class metrics_as_scores.distribution.fitting.FitterPymoo(dist: type[Union[scipy.stats._distn_infrastructure.rv_continuous, scipy.stats._distn_infrastructure.rv_discrete]])[source]

Bases: Fitter

This class inherits from Fitter and is its modern successor: it uses pymoo, together with a set of specially defined fitting problems, to fit discrete random variables.

__init__(dist: type[Union[scipy.stats._distn_infrastructure.rv_continuous, scipy.stats._distn_infrastructure.rv_discrete]]) None[source]
dist: type[Union[rv_continuous, rv_discrete]]

Specify the class of the random variable you want to fit.

fit(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], max_samples=10000, minimize_seeds=[1337, 3735928559, 45640321], verbose: bool = True, stop_after_first_res: bool = True) dict[str, Union[float, int]][source]

Fits the random variable to the given data. For continuous random variables, this calls Fitter.fit(). For discrete random variables, however, it solves a mixed-variable problem using a genetic algorithm. The documentation below applies to fitting a discrete random variable.

data: NDArray[Shape["*"], Float]

The 1-D data to fit the random variable to.

max_samples: int

Used to deterministically sub-sample the data if it contains more observations than this.

minimize_seeds: list[int]

A list of integer seeds. The optimization is run once per seed, and the result with the smallest objective value (the best fit) is returned.

verbose: bool

Passed to minimize().

stop_after_first_res: bool

Whether to stop after the first successful minimization. If True, no further fits are computed for the remaining seeds. This is the default, as the optimization usually succeeds and subsequent successful runs are rarely better, and then only insignificantly.

Raises:

Exception: If the optimization does not find a single solution.

Return type:

dict[str, Union[float, int]]

Returns:

A dictionary with parameter names and values of the best found solution.
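The multi-seed strategy can be sketched without pymoo: run the optimizer once per seed, keep the best objective value, and optionally stop after the first success. Here scipy's differential evolution stands in for the genetic algorithm, fitting a geometric distribution; the real implementation solves a pymoo mixed-variable problem instead.

```python
import numpy as np
from scipy import stats
from scipy.optimize import differential_evolution

rng = np.random.default_rng(1)
data = rng.geometric(p=0.25, size=400)

# Negative log-likelihood of a geometric distribution with parameter p.
def nll(theta):
    return -np.sum(stats.geom.logpmf(data, theta[0]))

best = None
for seed in [1337, 0xDEADBEEF, 45640321]:  # mirrors minimize_seeds
    res = differential_evolution(nll, bounds=[(1e-6, 1.0)], seed=seed)
    if best is None or res.fun < best.fun:
        best = res
    if best.success:
        break  # stop_after_first_res: later seeds rarely improve the fit
p_hat = best.x[0]
```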

class metrics_as_scores.distribution.fitting.TestJson[source]

Bases: TypedDict

Used for serializing a StatisticalTest to JSON. This class represents the result of a single test.

pval: float
stat: float
class metrics_as_scores.distribution.fitting.StatisticalTestJson[source]

Bases: TypedDict

Used for serializing a StatisticalTest to JSON. This class represents a set of tests and their results.

tests: dict[str, metrics_as_scores.distribution.fitting.TestJson]
discrete_data1: bool
discrete_data2: bool
class metrics_as_scores.distribution.fitting.StatisticalTest(data1: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], cdf: ~typing.Callable[[~typing.Union[float, int]], float], ppf_or_data2: ~typing.Union[~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], ~typing.Callable[[~typing.Union[float, int]], float]], data2_num_samples: ~typing.Optional[int] = None, method='auto', stat_tests=[<function cramervonmises>, <function cramervonmises_2samp>, <function ks_1samp>, <function ks_2samp>, <function epps_singleton_2samp>], max_samples: int = 10000)[source]

Bases: object

This class is used to conduct various statistical goodness-of-fit tests. Since not every test is always applicable (e.g., the Kolmogorov–Smirnov test is not applicable to discrete data), a variety of tests is conducted, and the most suitable test (and its results) is then typically selected at runtime.

__init__(data1: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], cdf: ~typing.Callable[[~typing.Union[float, int]], float], ppf_or_data2: ~typing.Union[~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], ~typing.Callable[[~typing.Union[float, int]], float]], data2_num_samples: ~typing.Optional[int] = None, method='auto', stat_tests=[<function cramervonmises>, <function cramervonmises_2samp>, <function ks_1samp>, <function ks_2samp>, <function epps_singleton_2samp>], max_samples: int = 10000) None[source]

Initializes a new StatisticalTest. Works with real- and integer-valued data. For the latter, additional tests are performed on data with a deterministic jitter added, as these are often more representative.

data1: NDArray[Shape["*"], Float]

A 1-D array of the data, the first sample.

cdf: Callable[[Union[float, int]], float]

A CDF used to determine if data1 comes from it. Used in the one-sample tests.

ppf_or_data2: Union[NDArray[Shape["*"], Float], Callable[[Union[float, int]], float]]

Either a 1-D array holding the second sample, or a PPF that can be used to generate a second sample by inverse sampling.

data2_num_samples: int

The number of samples to draw for the second sample if ppf_or_data2 is a PPF.

method: str

Passed to the statistical tests that support a method-parameter.

stat_tests: list[Callable] (default: [cramervonmises, cramervonmises_2samp, ks_1samp, ks_2samp, epps_singleton_2samp])

List of statistical tests to conduct.

max_samples: int

The maximum number of data points in either sample. Also limits data2_num_samples.

__iter__() Sequence[tuple[str, Any]][source]

Implemented so that this instance can be transformed into a dictionary and subsequently serialized to JSON.

static from_dict(d: MutableMapping[str, Any], key_prefix: str = 'stat_tests_tests') dict[str, float][source]

Re-creates the dictionary of tests from an instance that was previously serialized to JSON.
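The ingredients can be sketched with plain scipy: a one-sample test of data1 against the CDF, and a two-sample test against data generated from the PPF by inverse sampling, gathered into a TestJson-like dictionary (the key names here are illustrative, not the class's actual keys):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data1 = rng.normal(size=800)                   # first sample
data2 = stats.norm.ppf(rng.uniform(size=800))  # inverse sampling via the PPF

res1 = stats.ks_1samp(data1, stats.norm.cdf)   # one-sample, against the CDF
res2 = stats.ks_2samp(data1, data2)            # two-sample

# Shape mirrors TestJson/StatisticalTestJson: per test, a statistic and p-value.
tests = {
    "ks_1samp": {"stat": float(res1.statistic), "pval": float(res1.pvalue)},
    "ks_2samp": {"stat": float(res2.statistic), "pval": float(res2.pvalue)},
}
```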

metrics_as_scores.distribution.fitting_problems module

This module contains pymoo fitting problems that allow fitting distributions to almost arbitrary discrete data. The discrete random variables in scipy do not have a fit()-method, as their fitting often requires a global search. Also, many distributions require discrete parameters, or a mixture of real and integer parameters. The problems in this module provide a generalized way for pymoo to find parameters for all of scipy’s discrete random variables.

class metrics_as_scores.distribution.fitting_problems.MixedVariableDistributionFittingProblem(dist: ~scipy.stats._distn_infrastructure.rv_discrete, data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], vars: dict[str, pymoo.core.variable.Variable], n_ieq_constr: int = 0, **kwargs)[source]

Bases: ElementwiseProblem

This is the base class for fitting all of scipy’s discrete random variables. For each distribution, it accepts a dictionary of the parameters to find optimal values for.

__init__(dist: ~scipy.stats._distn_infrastructure.rv_discrete, data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], vars: dict[str, pymoo.core.variable.Variable], n_ieq_constr: int = 0, **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

_evaluate(X, out, *args, **kwargs) dict[source]

This is an internal method that evaluates the discrete random variable’s negative log-likelihood, given the currently set values for all of its variables (stored in X). This method is called by pymoo, so be sure to check out their references, too. Note that the X-dictionary is used to build \(\theta\), the vector of parameters for the random variable. The order of the parameters in that vector depends on the order of self.vars. This method usually does not need to be overridden, except when, e.g., (in-)equality constraints must be evaluated (that is, whenever something other than ‘F’ in the out-dictionary must be set).

X: dict[str, Any]

The (ordered) dictionary with the variables’ names and values.

out: dict[str, Any]

A dictionary used by pymoo to store results in; e.g., in ‘F’ it stores the result of the evaluation, and in ‘G’ it stores the inequality constraints’ values.

Returns:

Returns the out-dictionary. However, the dictionary is accessed by reference, so this method does not have to return anything.

Return type:

dict[Any,Any]
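Conceptually, and without the pymoo machinery, the method boils down to the following sketch (the distribution and variable name here are illustrative):

```python
import numpy as np
from scipy import stats

# Plain-Python sketch of _evaluate: build theta in the order of the
# variable dictionary and store the negative log-likelihood under 'F'.
def evaluate(X: dict, out: dict, data, dist=stats.poisson) -> dict:
    theta = [X[name] for name in X]  # order follows self.vars
    out["F"] = float(-np.sum(dist.logpmf(data, *theta)))
    return out                       # out is also modified in place

data = np.array([2, 3, 3, 4, 2, 5])
out = evaluate({"mu": 3.0}, {}, data)
```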

class metrics_as_scores.distribution.fitting_problems.Fit_bernoulli_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Bernoulli distribution using a pymoo problem. It uses scipy’s bernoulli_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • p: (float) \(\left[0,1\right]\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_betabinom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Beta-Binomial distribution using a pymoo problem. It uses scipy’s betabinom_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • n: (int) \(\left(0,1e^{4}\right)\)

  • a: (float) \(\left(5e^{-308},1e^3\right)\)

  • b: (float) \(\left(5e^{-308},1e^3\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_binom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Binomial distribution using a pymoo problem. It uses scipy’s binom_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • n: (int) \(\left(1,25e^{3}\right)\)

  • p: (float) \(\left(0,1\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_boltzmann_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Boltzmann distribution using a pymoo problem. It uses scipy’s boltzmann_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • lambda [\(\lambda\)]: (float) \(\left(0,1e^{5}\right)\)

  • N: (int) \(\left(1,25e^3\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_dlaplace_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the discrete Laplacian distribution using a pymoo problem. It uses scipy’s dlaplace_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • a: (float) \(\left(5e^{-308},1e^4\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_geom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Geometric distribution using a pymoo problem. It uses scipy’s geom_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • p: (float) \(\left(0,1\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_hypergeom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Hypergeometric distribution using a pymoo problem. It uses scipy’s hypergeom_gen as the base distribution.

Notes

This problem does override _evaluate() and has four inequality constraints. These are:

  • \(n\geq0\) (or \(-n\leq0\))

  • \(N\geq0\) (or \(-N\leq0\))

  • \(n\leq M\) (or \(n-M\leq0\))

  • \(N\leq M\) (or \(N-M\leq0\))

Calls the super constructor with these variables (in this order):

  • M: (int) \(\left(1,25e^{3}\right)\)

  • n: (int) \(\left(0,25e^{3}\right)\)

  • N: (int) \(\left(0,25e^{3}\right)\)
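Rewritten in the g(x) ≤ 0 form that pymoo expects, the four constraints above can be sketched as follows (a plain-Python illustration; the actual problem sets these values via the out['G'] entry of the overridden _evaluate()):

```python
# The four inequality constraints from above, each expressed as g(x) <= 0:
# -n <= 0, -N <= 0, n - M <= 0, N - M <= 0.
def hypergeom_constraints(M: int, n: int, N: int) -> list:
    return [-n, -N, n - M, N - M]

# A parameter vector is feasible iff every constraint value is <= 0.
feasible = all(g <= 0 for g in hypergeom_constraints(M=100, n=30, N=50))
```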

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

_evaluate(X, out, *args, **kwargs) dict[source]

Overridden to also evaluate the inequality constraints. For all other documentation, check out MixedVariableDistributionFittingProblem._evaluate().

class metrics_as_scores.distribution.fitting_problems.Fit_logser_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Logarithmic Series distribution using a pymoo problem. It uses scipy’s logser_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • p: (float) \(\left(0,1\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_nbinom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Negative Binomial distribution using a pymoo problem. It uses scipy’s nbinom_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • n: (int) \(\left(0,25e^{3}\right)\)

  • p: (float) \(\left[0,1\right]\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of a type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the parameter names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_nchypergeom_fisher_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting Fisher’s Non-central Hypergeometric distribution using a pymoo problem. It uses scipy’s nchypergeom_fisher_gen as the base distribution.

Notes

This problem does override _evaluate() and has four inequality constraints. These are:

  • \(N\leq M\) (or \(N-M\leq0\))

  • \(n\leq M\) (or \(n-M\leq0\))

  • \(\max{(\text{data})}\leq N\) (or \(\max{(\text{data})}-N\leq0\))

  • \(\max{(\text{data})}\leq n\) (or \(\max{(\text{data})}-n\leq0\))

Calls the super constructor with these variables (in this order; note that \(k=\) data.size):

  • M: (int) \(\left(1,5\times k\right)\)

  • n: (int) \(\left(1,5\times k\right)\)

  • N: (int) \(\left(1,5\times k\right)\)

  • odds: (float) \(\left(5e^{-308},1e^{4}\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

_evaluate(X, out, *args, **kwargs) dict[source]

Overridden to also evaluate the inequality constraints. For all other documentation, see MixedVariableDistributionFittingProblem._evaluate().

class metrics_as_scores.distribution.fitting_problems.Fit_nchypergeom_wallenius_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: Fit_nchypergeom_fisher_gen

This class allows fitting Wallenius’ noncentral hypergeometric distribution using a pymoo problem. It uses scipy’s nchypergeom_wallenius_gen as the base distribution.

Notes

This distribution has the same parameters and constraints as nchypergeom_fisher_gen, implemented by the problem Fit_nchypergeom_fisher_gen, from which this problem directly inherits.

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_nhypergeom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the negative hypergeometric distribution using a pymoo problem. It uses scipy’s nhypergeom_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • M: (int) \(\left(0,2.5\times10^{4}\right)\)

  • n: (int) \(\left(0,2.5\times10^{4}\right)\)

  • r: (int) \(\left(0,2.5\times10^{4}\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_planck_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Planck (discrete exponential) distribution using a pymoo problem. It uses scipy’s planck_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • p: (float) \(\left(5\times10^{-308},10^{2}\right)\)
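The Planck distribution is a discrete exponential with pmf \(f(k)=(1-e^{-\lambda})e^{-\lambda k}\) for \(k=0,1,2,\ldots\), so its single parameter even has a closed-form MLE, \(\hat{\lambda}=\ln(1+1/\bar{x})\), which a search over the one variable should recover. A small numpy check (illustrative only, not part of the package; the crude grid search stands in for the pymoo optimizer):

```python
import numpy as np

def planck_nll(lam, data):
    """Negative log-likelihood of Planck(lam):
    pmf(k) = (1 - exp(-lam)) * exp(-lam * k), k = 0, 1, 2, ..."""
    k = np.asarray(data, dtype=float)
    return float(-np.sum(np.log1p(-np.exp(-lam)) - lam * k))

data = np.array([0, 1, 0, 2, 1, 0, 3])
lam_hat = np.log1p(1.0 / data.mean())   # closed-form MLE; ln(2) for this sample
grid = np.linspace(0.05, 5.0, 400)      # crude 1-D search stand-in
lam_best = grid[np.argmin([planck_nll(l, data) for l in grid])]
```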

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_poisson_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Poisson distribution using a pymoo problem. It uses scipy’s poisson_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • mu [\(\mu\)]: (float) \(\left(0,10^{6}\right)\)
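For Poisson the optimizer searches for a quantity with a well-known closed-form MLE: the sample mean. A quick numpy sanity check that the mean indeed minimizes the negative log-likelihood (illustrative only, not part of the package):

```python
import numpy as np
from math import lgamma

def poisson_nll(mu, data):
    """Negative log-likelihood of a Poisson(mu) sample,
    including the constant log(k!) terms."""
    return float(sum(mu - k * np.log(mu) + lgamma(k + 1) for k in data))

data = np.array([2, 4, 3, 5, 1, 3])
mu_hat = data.mean()  # closed-form MLE the search should approach (3.0 here)
```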

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_randint_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the discrete uniform distribution using a pymoo problem. It uses scipy’s randint_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • low: (int) \(\left(-2.5\times10^{4},2.5\times10^{4}\right)\)

  • high: (int) \(\left(-2.5\times10^{4},2.5\times10^{4}\right)\)
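Note that scipy’s randint_gen treats high as exclusive: the support is \(\{\text{low},\ldots,\text{high}-1\}\). The MLE therefore pins both bounds to the observed extremes, which a search over the two integer variables should recover. A minimal sketch (the helper name is hypothetical, not part of the package):

```python
import numpy as np

def randint_mle(data):
    """Closed-form MLE for the discrete uniform on {low, ..., high - 1};
    scipy's randint_gen excludes `high`, hence the +1."""
    data = np.asarray(data)
    return int(data.min()), int(data.max()) + 1

low, high = randint_mle([3, 7, 5, 4, 7, 3])
uniform_p = 1.0 / (high - low)  # probability of each support point
```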

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_skellam_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Skellam distribution using a pymoo problem. It uses scipy’s skellam_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • mu1 [\(\mu_1\)]: (float) \(\left(5\times10^{-308},5\times10^{3}\right)\)

  • mu2 [\(\mu_2\)]: (float) \(\left(5\times10^{-308},5\times10^{3}\right)\)
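Because the Skellam distribution is the difference of two independent Poisson variables, its mean is \(\mu_1-\mu_2\) and its variance is \(\mu_1+\mu_2\). Method-of-moments estimates therefore make good starting points (or sanity checks) for the two search variables. A small numpy sketch (illustrative only; the helper name is hypothetical):

```python
import numpy as np

def skellam_moment_estimates(data):
    """Method-of-moments estimates for Skellam(mu1, mu2):
    mean = mu1 - mu2 and variance = mu1 + mu2."""
    data = np.asarray(data, dtype=float)
    m, v = data.mean(), data.var()
    mu1 = max((v + m) / 2.0, 1e-12)  # clamp into the positive search range
    mu2 = max((v - m) / 2.0, 1e-12)
    return mu1, mu2

mu1, mu2 = skellam_moment_estimates([2, -1, 0, 3, 1, -2, 1, 0])
```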

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_yulesimon_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Yule–Simon distribution using a pymoo problem. It uses scipy’s yulesimon_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • alpha [\(\alpha\)]: (float) \(\left(5\times10^{-308},2\times10^{4}\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_zipf_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Zipf (zeta) distribution using a pymoo problem. It uses scipy’s zipf_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • a: (float) \(\left(1+10^{-12},2\times10^{4}\right)\)

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

class metrics_as_scores.distribution.fitting_problems.Fit_zipfian_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Bases: MixedVariableDistributionFittingProblem

This class allows fitting the Zipfian distribution using a pymoo problem. It uses scipy’s zipfian_gen as the base distribution.

Notes

Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):

  • a: (float) \(\left(0,2\times10^{4}\right)\)

  • n: (int) \(\left(0,2.5\times10^{4}\right)\)
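Unlike the Zipf (zeta) distribution, whose normalizing constant \(\zeta(a)\) diverges for \(a\leq1\), the Zipfian distribution has finite support \(\{1,\ldots,n\}\) and is normalizable for any \(a\geq0\), which is why the lower bound of a can be 0 here. A small numpy illustration of the pmf (not part of the package):

```python
import numpy as np

def zipfian_pmf(a, n):
    """pmf of the finite Zipfian distribution on {1, ..., n}:
    p(k) = k**(-a) / H(n, a), with H(n, a) the generalized harmonic number."""
    k = np.arange(1, n + 1, dtype=float)
    w = k ** (-a)
    return w / w.sum()

p = zipfian_pmf(a=1.0, n=4)  # weights 1, 1/2, 1/3, 1/4, normalized by 25/12
```

At a = 0 the pmf degenerates to the discrete uniform on the support, so the whole lower-bound region remains well defined.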

__init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]

Constructor for fitting any discrete random variable with one or more parameters, each of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real).

Parameters:
  • dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.

  • vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one to the variable names defined for the random variable.

  • data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.

  • n_ieq_constr (int) – The number of inequality constraints. If there are any, the problem also overrides Problem._evaluate() and sets a value for each constraint.

Module contents

This package holds the main functionality for representing data and features as densities (e.g., discrete/continuous, parametric/empirical). It also contains the fitting problems and the fitters themselves.