metrics_as_scores.distribution package
Submodules
metrics_as_scores.distribution.distribution module
This module contains the base class for all densities used in the web application, as well as all of its concrete implementations. It also contains enumerations and typings that describe datasets.
- class metrics_as_scores.distribution.distribution.DistTransform(value)[source]
Bases:
StrEnum
This is an enumeration of transforms applicable to distributions of a quantity. A transform first computes the desired ideal (transform) value from the given density (e.g., the expectation) and then transforms the initial distribution of values into a distribution of distances.
- NONE = '<none>'
Do not apply any transform.
- EXPECTATION = 'E[X] (expectation)'
Compute the expectation of the random variable, i.e., \(\mathbb{E}[X]=\int_{-\infty}^{\infty}x\,f_X(x)\,dx\) for a continuous random variable.
- MEDIAN = 'Median (50th percentile)'
Compute the median (50th percentile) of the random variable. The median is defined as the value that splits a probability distribution into a lower and higher half.
- MODE = 'Mode (most likely value)'
The mode of a random variable is the most frequently occurring value, i.e., the value with the highest probability (density).
- INFIMUM = 'Infimum (min. observed value)'
The infimum is the lowest observed value of some empirical random variable.
- SUPREMUM = 'Supremum (max. observed value)'
The supremum is the highest observed value of some empirical random variable.
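The two-step idea behind these transforms can be sketched in a few lines. Here, `Transform` and `to_distances` are hypothetical stand-ins written for illustration, not part of the package:

```python
from enum import Enum
import statistics

# Hypothetical mirror of DistTransform, illustrating the two steps:
# 1) derive an ideal (transform) value from the data,
# 2) map every observation to its absolute distance from that value.
class Transform(Enum):
    NONE = "none"
    EXPECTATION = "expectation"
    MEDIAN = "median"
    MODE = "mode"
    INFIMUM = "infimum"
    SUPREMUM = "supremum"

def to_distances(data, transform):
    if transform is Transform.NONE:
        # No transform: data is passed through, no transform value.
        return None, list(data)
    ideal = {
        Transform.EXPECTATION: statistics.mean,
        Transform.MEDIAN: statistics.median,
        Transform.MODE: statistics.mode,
        Transform.INFIMUM: min,
        Transform.SUPREMUM: max,
    }[transform](data)
    # Distribution of distances: D = |X - ideal|.
    return ideal, [abs(x - ideal) for x in data]

ideal, distances = to_distances([1, 2, 2, 7], Transform.MEDIAN)
# ideal == 2.0, distances == [1.0, 0.0, 0.0, 5.0]
```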
- class metrics_as_scores.distribution.distribution.JsonDataset[source]
Bases:
TypedDict
This class is the base class for the LocalDataset and the KnownDataset. Each manifest should have a name, id, description, and author.
- name: str
- desc: str
- id: str
- author: list[str]
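A minimal sketch of what such a manifest looks like, re-creating the TypedDict shape locally for illustration (the real class lives in `metrics_as_scores.distribution.distribution`; the values below are fictitious):

```python
from typing import TypedDict

# Minimal re-creation of the JsonDataset shape, for illustration only.
class JsonDataset(TypedDict):
    name: str
    desc: str
    id: str
    author: list[str]

# A fictitious manifest; real manifests are stored as JSON alongside a dataset.
manifest: JsonDataset = dict(
    name="Example Dataset",
    desc="A fictitious manifest used for illustration.",
    id="example-dataset",
    author=["Jane Doe"])
```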
- class metrics_as_scores.distribution.distribution.LocalDataset[source]
Bases:
dict
This dataset extends the JsonDataset and adds properties that are filled out when locally creating a new dataset.
- origin: str
- colname_data: str
- colname_type: str
- colname_context: str
- qtypes: dict[str, Literal['continuous', 'discrete']]
- desc_qtypes: dict[str, str]
- contexts: list[str]
- desc_contexts: dict[str, str]
- ideal_values: dict[str, Union[int, float]]
- name: str
- desc: str
- id: str
- author: list[str]
- class metrics_as_scores.distribution.distribution.KnownDataset[source]
Bases:
dict
This dataset extends the JsonDataset with properties that are known about the datasets available to Metrics As Scores online.
- info_url: str
- download: str
- size: int
- size_extracted: int
- name: str
- desc: str
- id: str
- author: list[str]
- class metrics_as_scores.distribution.distribution.Density(range: tuple[float, float], pdf: Callable[[float], float], cdf: Callable[[float], float], ppf: Optional[Callable[[float], float]] = None, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]
Bases:
ABC
This is the abstract base class for parametric and empirical densities. A Density represents a concrete instance of some random variable and its PDF, CDF, and PPF. It also stores information about how this concrete instance came to be (e.g., by some concrete transform).
This class provides a set of common getters and setters and also offers some frequently needed conveniences, such as computing the practical domain. As for the PDF, CDF, and PPF, all known sub-classes have a specific way of obtaining these, and this class’ responsibility lies in vectorizing these functions.
- __init__(range: tuple[float, float], pdf: Callable[[float], float], cdf: Callable[[float], float], ppf: Optional[Callable[[float], float]] = None, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs) None [source]
- range:
tuple[float, float]
The range of the data.
- pdf:
Callable[[float], float]
The probability density function.
- cdf:
Callable[[float], float]
The cumulative distribution function.
- ppf:
Callable[[float], float]
The percent point (quantile) function.
- ideal_value:
float
Some quantities have an ideal value. It can be provided here.
- dist_transform:
DistTransform
The data transform that was applied while obtaining this density.
- transform_value:
float
Optional transform value that was applied during transformation.
- qtype:
str
The type of quantity for this density.
- context:
str
The context of this quantity.
- property qtype: Optional[str]
Getter for the quantity type.
- property context: Optional[str]
Getter for the context.
- property ideal_value: Optional[float]
Getter for the ideal value (if any).
- property dist_transform: DistTransform
Getter for the data transformation.
- property transform_value: Optional[float]
Getter for the used transformation value (if any).
- _min_max(x: float) float [source]
Used to safely vectorize a CDF, such that it returns 0.0 when x lies below our range and 1.0 when x lies beyond it.
- x:
float
The x to obtain the CDF’s y for.
- Returns:
A value in the range \([0,1]\).
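The clamping behind such a safe CDF can be sketched as follows. Here, `min_max` and `std_norm_cdf` are hypothetical stand-ins (the standard-normal CDF is built from `math.erf` to keep the sketch self-contained):

```python
import math

# Stand-in CDF: standard normal, expressed via the error function.
def std_norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Sketch of the clamping idea: below the range return 0.0, beyond it
# return 1.0, otherwise delegate to the actual CDF.
def min_max(cdf, lo, hi):
    def f(x):
        if x < lo:
            return 0.0
        if x > hi:
            return 1.0
        return cdf(x)
    return f

safe_cdf = min_max(std_norm_cdf, lo=-3.0, hi=3.0)
ys = [safe_cdf(x) for x in (-10.0, 0.0, 10.0)]  # [0.0, 0.5, 1.0]
```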
- compute_practical_domain(cutoff: float = 0.995) tuple[float, float] [source]
It is quite common that domains extend into distant regions to accommodate even the farthest outliers. This is often counter-productive, especially in the web application, where we usually want to show most of the distribution. We therefore compute a practical range that cuts off the most extreme outliers, which is useful for showing a sensible default window.
- cutoff:
float
The percentage of values to include. The CDF is optimized to find some x at which it reaches the cutoff. For the lower bound, the cutoff is subtracted from the CDF.
- Return type:
tuple[float, float]
- Returns:
The practical domain, cut off in both directions.
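The underlying idea, finding where the CDF reaches the cutoff, can be sketched without the library's optimizer. Plain bisection suffices here because a CDF is monotone; `invert_cdf` and `practical_domain` are hypothetical helpers written for this sketch:

```python
import math

# Find x with cdf(x) = target by bisection (CDFs are monotone).
def invert_cdf(cdf, target, lo, hi, iters=80):
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Practical domain: lower bound at 1 - cutoff, upper bound at cutoff.
def practical_domain(cdf, lo, hi, cutoff=0.995):
    return (invert_cdf(cdf, 1.0 - cutoff, lo, hi),
            invert_cdf(cdf, cutoff, lo, hi))

std_norm_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
a, b = practical_domain(std_norm_cdf, -50.0, 50.0)  # roughly (-2.58, 2.58)
```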
- property practical_domain: tuple[float, float]
Getter for the practical domain. This is a lazy getter that only computes the practical domain if it was not done before.
- compute_practical_range_pdf() tuple[float, float] [source]
Similar to compute_practical_domain(), this method computes a practical range for the PDF. It determines the location of the PDF’s highest mode.
- Returns:
Returns a tuple where the first element is always 0.0 and the second is the y-value of the highest mode (i.e., it returns the mode’s density, not its location x).
- property practical_range_pdf: tuple[float, float]
Lazy getter for the practical range of the PDF.
- class metrics_as_scores.distribution.distribution.KDE_integrate(data: NDArray[Shape["*"], Float], ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]
Bases:
Density
The purpose of this class is to use an empirical (typically Gaussian) PDF and to also provide a smooth CDF that is obtained by integrating the PDF: \(F_X(x)=\int_{-\infty}^{x} f_X(t) dt\). While this kind of CDF is smooth and precise, evaluating it is obviously slow. Therefore, KDE_approx is used in practice.
- __init__(data: NDArray[Shape["*"], Float], ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs) None [source]
- range:
tuple[float, float]
The range of the data.
- pdf:
Callable[[float], float]
The probability density function.
- cdf:
Callable[[float], float]
The cumulative distribution function.
- ppf:
Callable[[float], float]
The percent point (quantile) function.
- ideal_value:
float
Some quantities have an ideal value. It can be provided here.
- dist_transform:
DistTransform
The data transform that was applied while obtaining this density.
- transform_value:
float
Optional transform value that was applied during transformation.
- qtype:
str
The type of quantity for this density.
- context:
str
The context of this quantity.
- init_ppf(cdf_samples: int = 100) KDE_integrate [source]
Initializes the PPF. We get x and y from the CDF, then swap the two and interpolate a PPF. Since obtaining each y from the CDF means computing an integral, be careful with setting a high number of cdf_samples.
- cdf_samples:
int
The number of samples to take from the CDF (which is computed by integrating the PDF, so be careful).
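The swap-and-interpolate construction can be sketched as follows (`make_ppf` is a hypothetical helper; a uniform distribution serves as the test case because its CDF and PPF are both the identity):

```python
# Sketch of the init_ppf idea: sample the CDF at cdf_samples points, then
# swap x and y and interpolate linearly to obtain an approximate PPF.
def make_ppf(cdf, lo, hi, cdf_samples=100):
    xs = [lo + i * (hi - lo) / (cdf_samples - 1) for i in range(cdf_samples)]
    ys = [cdf(x) for x in xs]
    def ppf(q):
        for i in range(1, len(ys)):
            if ys[i] >= q:
                # Linear interpolation between the bracketing samples.
                t = (q - ys[i - 1]) / (ys[i] - ys[i - 1])
                return xs[i - 1] + t * (xs[i] - xs[i - 1])
        return xs[-1]
    return ppf

# Uniform distribution on [0, 1]: CDF(x) = x, hence PPF(q) = q.
ppf = make_ppf(lambda x: x, 0.0, 1.0)
```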
- class metrics_as_scores.distribution.distribution.KDE_approx(data: NDArray[Shape["*"], Float], resample_samples: int = 200000, compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]
Bases:
Density
This kind of density uses Kernel Density Estimation to obtain a PDF, and an empirical CDF (ECDF) to provide a cumulative distribution function. The advantage is that both PDF and CDF are fast. The PPF is the inverted and interpolated CDF, so it is fast, too. The data used for the PDF is limited to 10_000 samples using deterministic sampling without replacement. The data used for the CDF is obtained by sampling a large number (typically 200_000) of data points from the Gaussian KDE, in order to make it smooth.
- __init__(data: NDArray[Shape["*"], Float], resample_samples: int = 200000, compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs) None [source]
For the other parameters, please refer to Density.__init__().
- resample_samples:
int
The amount of samples to take from the Gaussian KDE. These samples are then used to estimate an as-smooth-as-possible CDF (and PPF thereof).
- compute_ranges:
bool
Whether or not to compute the practical domain of the data and the practical range of the PDF. Both of these use optimization to find the results.
- property pval: float
Shortcut getter for the jittered, two-sample KS-test’s p-value.
- property stat: float
Shortcut getter for the jittered, two-sample KS-test’s test statistic (D-value).
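The two ingredients of this kind of density can be sketched in a few lines. These are hand-rolled stand-ins for a Gaussian KDE (the library uses scipy's `gaussian_kde`) and an ECDF; the resampling step that smooths the library's CDF is omitted for brevity:

```python
import math

# Stand-in for a Gaussian KDE: the PDF is a sum of Gaussian bumps, one per
# observation (bandwidth fixed here for simplicity).
def gaussian_kde_pdf(data, bandwidth=0.5):
    n = len(data)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def pdf(x):
        return scale * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return pdf

# Empirical CDF: the fraction of observations less than or equal to x.
def ecdf(data):
    s = sorted(data)
    n = len(s)
    def cdf(x):
        return sum(1 for d in s if d <= x) / n
    return cdf

data = [1.0, 2.0, 2.0, 3.0]
pdf, cdf = gaussian_kde_pdf(data), ecdf(data)
# cdf(2.0) == 0.75; the PDF peaks near the repeated value 2.0
```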
- class metrics_as_scores.distribution.distribution.Empirical(data: NDArray[Shape["*"], Float], compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]
Bases:
Density
This kind of density does not apply any smoothing for CDF, but rather uses a straightforward ECDF for the data as given. The PDF is determined using Gaussian KDE.
- __init__(data: NDArray[Shape["*"], Float], compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs) None [source]
For the other parameters, please refer to Density.__init__().
- compute_ranges:
bool
Whether or not to compute the practical domain of the data and the practical range of the PDF. Both of these use optimization to find the results.
- class metrics_as_scores.distribution.distribution.Empirical_discrete(data: NDArray[Shape["*"], Float], ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]
Bases:
Empirical
Inherits from Empirical and is used when the underlying quantity is discrete rather than continuous. As PDF, this class uses a PMF that is determined by the frequencies of each discrete datum.
- __init__(data: NDArray[Shape["*"], Float], ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs) None [source]
For the other parameters, please refer to Density.__init__().
- compute_ranges:
bool
Whether or not to compute the practical domain of the data and the practical range of the PDF. Both of these use optimization to find the results.
- property is_fit: bool
Returns True if the given data is valid.
- static unfitted(dist_transform: DistTransform) Empirical_discrete [source]
Used to return an explicit unfit instance of Empirical_discrete. This is used when, for example, continuous (real) data is given to the constructor. We still need an instance of this density in the web application to show an error (e.g., that there are no discrete empirical densities for continuous data).
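The frequency-based PMF that this density relies on can be sketched as follows (`make_pmf` is a hypothetical helper, not part of the package):

```python
from collections import Counter

# Sketch of the PMF idea: the mass of each discrete value is its relative
# frequency in the data.
def make_pmf(data):
    counts = Counter(data)
    n = len(data)
    def pmf(x):
        return counts.get(x, 0) / n
    return pmf

pmf = make_pmf([1, 1, 2, 3])
# pmf(1) == 0.5, pmf(4) == 0.0
```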
- class metrics_as_scores.distribution.distribution.Parametric(dist: rv_generic, dist_params: tuple, range: tuple[float, float], stat_tests: dict[str, float], use_stat_test: Literal['cramervonmises_jittered', 'cramervonmises_ordinary', 'cramervonmises_2samp_jittered', 'cramervonmises_2samp_ordinary', 'epps_singleton_2samp_jittered', 'epps_singleton_2samp_ordinary', 'ks_1samp_jittered', 'ks_1samp_ordinary', 'ks_2samp_jittered', 'ks_2samp_ordinary'] = 'ks_2samp_jittered', compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]
Bases:
Density
This density encapsulates a parameterized and previously fitted random variable. Random variables in scipy.stats come with PDF/PMF, CDF, PPF, etc., so we just use these and forward calls to them.
- __init__(dist: rv_generic, dist_params: tuple, range: tuple[float, float], stat_tests: dict[str, float], use_stat_test: Literal['cramervonmises_jittered', 'cramervonmises_ordinary', 'cramervonmises_2samp_jittered', 'cramervonmises_2samp_ordinary', 'epps_singleton_2samp_jittered', 'epps_singleton_2samp_ordinary', 'ks_1samp_jittered', 'ks_1samp_ordinary', 'ks_2samp_jittered', 'ks_2samp_ordinary'] = 'ks_2samp_jittered', compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs) None [source]
For the other parameters, please refer to Density.__init__().
- dist:
rv_generic
An instance of the random variable to use.
- dist_params:
tuple
A tuple of parameters for the random variable. The order of the parameters is important since it is not a dictionary.
- stat_tests:
dict[str, float]
A (flattened) dictionary of previously conducted statistical tests. This is used later to choose some best-fitting parametric density by a specific test.
- use_stat_test:
StatTest_Types
The name of the chosen statistical test used to determine the goodness of fit.
- compute_ranges:
bool
Whether or not to compute the practical domain of the data and the practical range of the PDF. Both of these use optimization to find the results.
- static unfitted(dist_transform: DistTransform) Parametric [source]
Used to return an explicit unfit instance of Parametric. This is used when not a single maximum likelihood fit was successful for any of the random variables. We still need an instance of this density in the web application to show an error (e.g., that it was not possible to fit any random variable to the selected quantity).
- property use_stat_test: Literal['cramervonmises_jittered', 'cramervonmises_ordinary', 'cramervonmises_2samp_jittered', 'cramervonmises_2samp_ordinary', 'epps_singleton_2samp_jittered', 'epps_singleton_2samp_ordinary', 'ks_1samp_jittered', 'ks_1samp_ordinary', 'ks_2samp_jittered', 'ks_2samp_ordinary']
Getter for the selected statistical test.
- property pval: float
Shortcut getter for the p-value of the selected statistical test.
- property stat: float
Shortcut getter for the test statistic of the selected statistical test.
- property is_fit: bool
Returns True if this instance is not an explicitly unfit instance.
- property practical_domain: tuple[float, float]
Overridden to return a practical domain of \([0,0]\) in case this instance is unfit.
- property practical_range_pdf: tuple[float, float]
Overridden to return a practical PDF range of \([0,0]\) in case this instance is unfit.
- property dist_name: str
Shortcut getter for this density’s random variable’s class name.
- pdf(x: NDArray[Shape["*"], Float]) NDArray[Shape["*"], Float] [source]
Overridden to call the encapsulated distribution’s PDF. If this density is unfit, always returns an array of zeros of same shape as the input.
- cdf(x: NDArray[Shape["*"], Float]) NDArray[Shape["*"], Float] [source]
Overridden to call the encapsulated distribution’s CDF. If this density is unfit, always returns an array of zeros of same shape as the input.
- ppf(x: NDArray[Shape["*"], Float]) NDArray[Shape["*"], Float] [source]
Overridden to call the encapsulated distribution’s PPF. If this density is unfit, always returns an array of zeros of same shape as the input.
- compute_practical_domain(cutoff: float = 0.9985) tuple[float, float] [source]
Overridden to exploit the availability of a PPF of a fitted random variable. It can be used to find the practical domain instantly, instead of having to solve an optimization problem.
- cutoff:
float
The percentage of values to include. The CDF is optimized to find some x for which it peaks at the cutoff. For the lower bound, we subtract from CDF the cutoff. Note that the default value for the cutoff was adjusted here to extend a little beyond what is good for other types of densities.
- Return type:
tuple[float, float]
- Returns:
The practical domain, cut off in both directions. If this random variable is unfit, falls back to Density’s compute_practical_domain().
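With a PPF at hand, the practical domain is just two quantile evaluations. In this sketch, the standard library's `NormalDist` stands in for a fitted scipy.stats random variable, and `practical_domain_via_ppf` is a hypothetical helper:

```python
from statistics import NormalDist

# Sketch: for a fitted parametric distribution, the practical domain falls
# out of two PPF (quantile) evaluations, no optimization required.
def practical_domain_via_ppf(ppf, cutoff=0.9985):
    return ppf(1.0 - cutoff), ppf(cutoff)

# NormalDist stands in for a fitted scipy.stats random variable; its
# inv_cdf is the PPF.
rv = NormalDist(mu=0.0, sigma=1.0)
a, b = practical_domain_via_ppf(rv.inv_cdf)  # roughly (-2.97, 2.97)
```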
- class metrics_as_scores.distribution.distribution.Parametric_discrete(dist: rv_generic, dist_params: tuple, range: tuple[float, float], stat_tests: dict[str, float], use_stat_test: Literal['cramervonmises_jittered', 'cramervonmises_ordinary', 'cramervonmises_2samp_jittered', 'cramervonmises_2samp_ordinary', 'epps_singleton_2samp_jittered', 'epps_singleton_2samp_ordinary', 'ks_1samp_jittered', 'ks_1samp_ordinary', 'ks_2samp_jittered', 'ks_2samp_ordinary'] = 'ks_2samp_jittered', compute_ranges: bool = False, ideal_value: Optional[float] = None, dist_transform: DistTransform = DistTransform.NONE, transform_value: Optional[float] = None, qtype: Optional[str] = None, context: Optional[str] = None, **kwargs)[source]
Bases:
Parametric
This type of density inherits from Parametric and is its counterpart for discrete (integral) data. It adds an explicit function for the probability mass and makes the inherited PDF return the PMF’s result.
- pmf(x: NDArray[Shape["*"], Float]) NDArray[Shape["*"], Float] [source]
Implemented to call the encapsulated distribution’s PMF. If this density is unfit, always returns an array of zeros of same shape as the input.
- pdf(x: NDArray[Shape["*"], Float]) NDArray[Shape["*"], Float] [source]
Overridden to return the result of pmf(). Note that in any case, a density’s function pdf() is called (i.e., callers never call the PMF directly). Therefore, it is easier to catch these calls and redirect them to the PMF.
- static unfitted(dist_transform: DistTransform) Parametric_discrete [source]
Used to return an explicit unfit instance of Parametric_discrete. This is used when not a single maximum likelihood fit was successful for any of the random variables. We still need an instance of this density in the web application to show an error (e.g., that it was not possible to fit any random variable to the selected quantity).
- class metrics_as_scores.distribution.distribution.Dataset(ds: LocalDataset, df: DataFrame)[source]
Bases:
object
This class encapsulates a local (self created) dataset and provides help with transforming it, as well as giving some convenience getters.
- __init__(ds: LocalDataset, df: DataFrame) None [source]
- property quantity_types: list[str]
Shortcut getter for the manifest’s quantity types.
- contexts(include_all_contexts: bool = False) Iterable[str] [source]
Returns the manifest’s defined contexts as a generator. Sometimes we need to ignore the context and aggregate a quantity type across all defined contexts. Then, a virtual context called __ALL__ is used.
- include_all_contexts:
bool
Whether to also yield the virtual __ALL__-context.
- property ideal_values: dict[str, Union[float, int, NoneType]]
Shortcut getter for the manifest’s ideal values.
- context_desc(context: str) Optional[str] [source]
Returns the description associated with a context (if any).
- property quantity_types_continuous: list[str]
Returns a list of quantity types that are continuous (real-valued).
- property quantity_types_discrete: list[str]
Returns a list of quantity types that are discrete (integer-valued).
- data(qtype: str, context: Union[str, None, Literal['__ALL__']] = None, unique_vals: bool = True, sub_sample: Optional[int] = None) NDArray[Shape["*"], Float] [source]
This method is used to select a subset of the data that is specific to at least a type of quantity and, optionally, to a context.
- qtype:
str
The name of the quantity type to get data for.
- context:
Union[str, None, Literal['__ALL__']]
You may specify a context to further filter the data by. Data is always specific to a quantity type, and sometimes to a context. If no context-based filtering is desired, pass None or __ALL__.
- unique_vals:
bool
If True, some small jitter will be added to the data in order to make it unique.
- sub_sample:
int
Optional unsigned integer giving the number of samples to take in case the dataset is very large. It is only applied if the number is smaller than the data’s size.
- num_observations() Iterable[tuple[str, str, int]] [source]
Returns the number of observations for each quantity type in each context.
- Return type:
Iterable[tuple[str, str, int]]
The first element is the context, the second the quantity type, and the third the number of observations.
- Returns:
Returns an iterable generator.
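The counting scheme can be sketched without the actual data frame. The rows, contexts, and quantity-type names below are purely illustrative:

```python
from collections import Counter

# Illustrative rows: (context, quantity type) per observation.
rows = [
    ("ctx_a", "LOC"), ("ctx_a", "LOC"), ("ctx_a", "Complexity"),
    ("ctx_b", "LOC"),
]

# Sketch of num_observations: count rows per (context, quantity type)
# pair and yield triples.
def num_observations(rows):
    for (context, qtype), n in sorted(Counter(rows).items()):
        yield context, qtype, n

obs = list(num_observations(rows))
# [('ctx_a', 'Complexity', 1), ('ctx_a', 'LOC', 2), ('ctx_b', 'LOC', 1)]
```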
- has_sufficient_observations(raise_if_not: bool = True) bool [source]
Helper method to check whether each quantity type in each context has at least two observations.
- raise_if_not:
bool
If set to True, will raise an exception instead of returning False in case there are insufficiently many observations. The exception is more informative, as it includes the context and quantity type.
- Return type:
bool
- Returns:
A boolean indicating whether this Dataset has sufficiently many observations for each and every quantity type.
- static transform(data: NDArray[Shape["*"], Float], dist_transform: DistTransform = DistTransform.NONE, continuous_value: bool = True) tuple[float, NDArray[Shape["*"], Float]] [source]
Transforms a distribution using an ideal value. The resulting data, therefore, is a distribution of distances from the designated ideal value.
Given a distribution \(X\) and an ideal value \(i\), the distribution of distances is defined as \(D=\left|X-i\right|\).
- data:
NDArray[Shape["*"], Float]
1-D array of float data, the data to be transformed. The data may also hold integers (or floats that are practically integers).
- dist_transform:
DistTransform
The transform to apply. If DistTransform.NONE, the data is returned as is, with None as the transform value. Any of the other transforms are determined from the data (see the notes below).
- continuous_value:
bool
Whether or not the to-be-determined ideal value should be continuous. For example, if using the expectation (mean) as transform, even for a discrete distribution, this is likely to be a float. Setting continuous_value to False will round the found mean to the nearest integer, such that the resulting distribution \(D\) is of integral nature, too.
- Return type:
tuple[float, NDArray[Shape["*"], Float]]
- Returns:
A tuple holding the applied transform value (if the chosen transform was not DistTransform.NONE) and the array of distances.
Notes
The expectation (mean), in the continuous case, is determined by estimating a Gaussian kernel using gaussian_kde and then integrating it over Density.practical_domain(). In the discrete case, we use the rounded mean of the data. Mode and median are computed similarly in the continuous and discrete cases, except that for the discrete mode we use scipy.stats.mode(). Supremum and infimum are simply computed (and rounded in the discrete case) from the data.
- analyze_groups(use: Literal['anova', 'kruskal'], qtypes: Iterable[str], contexts: Iterable[str], unique_vals: bool = True) DataFrame [source]
For each given type of quantity, this method performs an ANOVA (or a Kruskal–Wallis test) across all given contexts.
- use:
Literal['anova', 'kruskal']
Indicates which method for comparing groups to use. We can either conduct an ANOVA or a Kruskal-Wallis test.
- qtypes:
Iterable[str]
An iterable of quantity types to conduct the analysis for. For each given type, a separate analysis is performed and the result appended to the returned data frame.
- contexts:
Iterable[str]
An iterable of contexts across which each of the quantity types shall be analyzed.
- unique_vals:
bool
Passed to self.data(). If true, then small, random, and unique noise is added to the data before it is analyzed. This effectively deduplicates any samples in the data (if any).
- Return type:
pd.DataFrame
- Returns:
A data frame with the columns qtype (name of the quantity type), stat (test statistic), pval, and across_contexts, where the latter is a semicolon-separated list of contexts the quantity type was compared across.
- analyze_TukeyHSD(qtypes: Iterable[str]) DataFrame [source]
Calculate all pairwise comparisons for the given quantity types with Tukey’s Honest Significance Test (HSD) and return the confidence intervals. For each type of quantity, this method performs pairwise comparisons of all of its associated contexts. For example, given a quantity \(Q\) and its contexts \(C_1,C_2,C_3\), this method will examine the pairs \(\left[\{C_1,C_2\},\{C_1,C_3\},\{C_2,C_3\}\right]\). For a single type of quantity, this test is useful for understanding how differently the quantity manifests across contexts. For multiple quantities, it also allows understanding how contexts distinguish themselves from one another, holistically.
- qtypes:
Iterable[str]
An iterable of quantity types to conduct the analysis for. For each given type, a separate analysis is performed and the result appended to the returned data frame.
- Return type:
pd.DataFrame
- Returns:
A data frame with the columns group1, group2, meandiff, p-adj, lower, upper, and reject. For details, see statsmodels.stats.multicomp.pairwise_tukeyhsd().
- analyze_distr(qtypes: Iterable[str], use_ks_2samp: bool = True, ks2_max_samples=40000) DataFrame [source]
Performs the two-sample Kolmogorov–Smirnov test or Welch’s t-test for all unique pairs of the given quantity types.
- qtypes:
Iterable[str]
An iterable of quantity types to test in a pair-wise manner.
- use_ks_2samp:
bool
If True, use the two-sample Kolmogorov–Smirnov test; otherwise, Welch’s t-test.
- ks2_max_samples:
int
Unsigned integer used to limit the number of samples used in the KS two-sample test. For numbers larger than the default, it may not be possible to compute the test exactly.
- Return type:
pd.DataFrame
- Returns:
A data frame with columns qtype, stat, pval, group1, and group2.
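The pairwise scheme behind such an analysis can be sketched as follows. A trivial stand-in statistic replaces the real two-sample tests (the library uses scipy.stats for those); `pairwise_tests`, `mean_diff_test`, and the sample names are all illustrative:

```python
from itertools import combinations

# Sketch of the pairwise scheme: run a two-sample test for every unique
# pair of quantity types.
def pairwise_tests(samples, test):
    return [
        (a, b, *test(samples[a], samples[b]))
        for a, b in combinations(sorted(samples), 2)
    ]

def mean_diff_test(x, y):
    # Placeholder statistic: absolute difference of means, no p-value.
    stat = abs(sum(x) / len(x) - sum(y) / len(y))
    return (stat,)

samples = {"LOC": [1.0, 2.0, 3.0], "Complexity": [2.0, 2.0, 2.0]}
result = pairwise_tests(samples, mean_diff_test)
# [('Complexity', 'LOC', 0.0)] — both illustrative samples have mean 2.0
```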
metrics_as_scores.distribution.fitting module
This module is concerned with fitting distributions to data. It supports both discrete and continuous distributions. Fitting is done by a common helper that unifies the way either type of distribution is fitted.
- metrics_as_scores.distribution.fitting.Continuous_RVs: list[scipy.stats._distn_infrastructure.rv_continuous] = [alpha, anglit, arcsine, argus, beta, betaprime, bradford, burr, burr12, cauchy, chi, chi2, cosine, crystalball, dgamma, dweibull, erlang, expon, exponnorm, exponpow, exponweib, f, fatiguelife, fisk, foldcauchy, foldnorm, gamma, gausshyper, genexpon, genextreme, gengamma, genhalflogistic, genhyperbolic, geninvgauss, genlogistic, gennorm, genpareto, gibrat, gilbrat, gompertz, gumbel_l, gumbel_r, halfcauchy, halfgennorm, halflogistic, halfnorm, hypsecant, invgamma, invgauss, invweibull, johnsonsb, johnsonsu, kappa3, kappa4, ksone, kstwo, kstwobign, laplace, laplace_asymmetric, levy, levy_l, loggamma, logistic, loglaplace, lognorm, reciprocal, lomax, maxwell, mielke, moyal, nakagami, ncf, nct, ncx2, norm, norminvgauss, pareto, pearson3, powerlaw, powerlognorm, powernorm, rayleigh, rdist, recipinvgauss, reciprocal, rice, semicircular, skewcauchy, skew_norm, studentized_range, t, trapezoid, trapezoid, triang, truncexpon, truncnorm, truncpareto, truncweibull_min, tukeylambda, uniform, vonmises, vonmises, wald, weibull_max, weibull_min, wrapcauchy] (each entry is an instance of the corresponding scipy.stats._continuous_distns.<name>_gen class)
List of the continuous random variables supported by scipy.stats. Note that this list contains distribution instances, not types.
- metrics_as_scores.distribution.fitting.Discrete_RVs: list[scipy.stats._distn_infrastructure.rv_discrete] = [<scipy.stats._discrete_distns.bernoulli_gen object>, <scipy.stats._discrete_distns.betabinom_gen object>, <scipy.stats._discrete_distns.binom_gen object>, <scipy.stats._discrete_distns.boltzmann_gen object>, <scipy.stats._discrete_distns.dlaplace_gen object>, <scipy.stats._discrete_distns.geom_gen object>, <scipy.stats._discrete_distns.hypergeom_gen object>, <scipy.stats._discrete_distns.logser_gen object>, <scipy.stats._discrete_distns.nbinom_gen object>, <scipy.stats._discrete_distns.nchypergeom_fisher_gen object>, <scipy.stats._discrete_distns.nchypergeom_wallenius_gen object>, <scipy.stats._discrete_distns.nhypergeom_gen object>, <scipy.stats._discrete_distns.planck_gen object>, <scipy.stats._discrete_distns.poisson_gen object>, <scipy.stats._discrete_distns.randint_gen object>, <scipy.stats._discrete_distns.skellam_gen object>, <scipy.stats._discrete_distns.yulesimon_gen object>, <scipy.stats._discrete_distns.zipf_gen object>, <scipy.stats._discrete_distns.zipfian_gen object>]
List of the discrete random variables supported by scipy.stats. Note that this list contains distribution instances, not types.
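These lists can be reproduced directly from scipy.stats, which also illustrates the note above that they hold instances rather than types (a sketch; the variable names here are illustrative):

```python
from scipy import stats
from scipy.stats._distn_infrastructure import rv_continuous, rv_discrete

# Every public distribution in scipy.stats (e.g., stats.norm, stats.poisson)
# is a ready-made *instance* of an rv_continuous/rv_discrete subclass,
# not the class itself.
continuous_rvs = [v for v in vars(stats).values() if isinstance(v, rv_continuous)]
discrete_rvs = [v for v in vars(stats).values() if isinstance(v, rv_discrete)]

print(type(stats.norm).__name__)     # 'norm_gen'
print(type(stats.poisson).__name__)  # 'poisson_gen'
```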
- metrics_as_scores.distribution.fitting.Discrete_Problems: dict[str, type[metrics_as_scores.distribution.fitting_problems.MixedVariableDistributionFittingProblem]] = {'bernoulli_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_bernoulli_gen'>, 'betabinom_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_betabinom_gen'>, 'binom_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_binom_gen'>, 'boltzmann_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_boltzmann_gen'>, 'dlaplace_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_dlaplace_gen'>, 'geom_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_geom_gen'>, 'hypergeom_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_hypergeom_gen'>, 'logser_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_logser_gen'>, 'nbinom_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_nbinom_gen'>, 'nchypergeom_fisher_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_nchypergeom_fisher_gen'>, 'nchypergeom_wallenius_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_nchypergeom_wallenius_gen'>, 'nhypergeom_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_nhypergeom_gen'>, 'planck_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_planck_gen'>, 'poisson_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_poisson_gen'>, 'randint_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_randint_gen'>, 'skellam_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_skellam_gen'>, 'yulesimon_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_yulesimon_gen'>, 'zipf_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_zipf_gen'>, 'zipfian_gen': <class 'metrics_as_scores.distribution.fitting_problems.Fit_zipfian_gen'>}
Dictionary of fitting problems used by pymoo for fitting discrete distributions. Metrics As Scores supports fitting discrete random variables through pymoo only for those random variables that have a corresponding problem defined. However, many, if not most, are covered. In case a problem is missing, the ordinary Fitter can be used instead (it relies on differential evolution).
- class metrics_as_scores.distribution.fitting.DesignSpaceTerminationFixed(tol=0.005, **kwargs)[source]
Bases:
DesignSpaceTermination
- class metrics_as_scores.distribution.fitting.SingleObjectiveTermination(xtol=1e-08, cvtol=1e-08, ftol=1e-06, period=75, max_time: int = 600, **kwargs)[source]
Bases:
DefaultTermination
This class is used as the termination criterion for the
FitterPymoo
.
- class metrics_as_scores.distribution.fitting.Fitter(dist: type[Union[scipy.stats._distn_infrastructure.rv_continuous, scipy.stats._distn_infrastructure.rv_discrete]])[source]
Bases:
object
This class provides a generic approach to fitting any random variable to any kind of data. However, it is recommended to use the more stable
FitterPymoo
.- Practical_Ranges = {'bernoulli_gen': {'p': [0.0, 1.0]}, 'betabinom_gen': {'a': [5e-308, 1000.0], 'b': [5e-308, 1000.0], 'n': [0, 5000]}, 'binom_gen': {'n': [1, 25000], 'p': [0.0, 1.0]}, 'boltzmann_gen': {'N': [1, 25000], 'lambda': [0.0, 100000.0]}, 'dlaplace_gen': {'a': [5e-308, 10000.0]}, 'geom_gen': {'p': [0.0, 1.0]}, 'hypergeom_gen': {'M': [1, 25000], 'N': [0, 25000], 'n': [0, 25000]}, 'logser_gen': {'p': [0.0, 1.0]}, 'nbinom_gen': {'n': [0, 25000], 'p': [0.0, 1.0]}, 'nchypergeom_fisher_gen': {'M': [0, 25000], 'N': [0, 25000], 'n': [0, 25000], 'odds': [5e-308, 10000.0]}, 'nchypergeom_wallenius_gen': {'M': [0, 25000], 'N': [0, 25000], 'n': [0, 25000], 'odds': [5e-308, 10000.0]}, 'nhypergeom_gen': {'M': [0, 25000], 'n': [0, 25000], 'r': [0, 25000]}, 'planck_gen': {'lambda': [5e-308, 100.0]}, 'poisson_gen': {'mu': [0.0, 1000000.0]}, 'randint_gen': {'high': [-25000, 25000], 'low': [-25000, 25000]}, 'skellam_gen': {'mu1': [5e-308, 5000.0], 'mu2': [5e-308, 5000.0]}, 'yulesimon_gen': {'alpha': [5e-308, 20000.0]}, 'zipf_gen': {'a': [1.000000000001, 20000.0]}, 'zipfian_gen': {'a': [0.0, 20000.0], 'n': [0, 25000]}}
A dictionary of practical bounds for the parameters of discrete distributions. It is used by the
Fitter
when using differential evolution to optimize the fit of a distribution. Note that the FitterPymoo
does not use these. Instead, it relies on separate problems that are defined for each discrete random variable.
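As a sketch of how such bounds are used (not the package's actual code), fitting scipy's poisson_gen via differential evolution with the 'mu' bounds from above looks roughly like this; the lower bound is nudged to a small positive value so the log-PMF stays finite:

```python
import numpy as np
from scipy import stats
from scipy.optimize import differential_evolution

rng = np.random.default_rng(1)
data = rng.poisson(lam=3.0, size=500)

# Negative log-likelihood of a Poisson distribution for a candidate mu.
def neg_ll(theta):
    return -np.sum(stats.poisson.logpmf(data, theta[0]))

# 'poisson_gen': {'mu': [0.0, 1e6]} are the practical bounds from above.
result = differential_evolution(neg_ll, bounds=[(1e-9, 1_000_000.0)], seed=1)
mu_hat = result.x[0]  # the Poisson MLE is the sample mean, so mu_hat ≈ data.mean()
```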
- __init__(dist: type[Union[scipy.stats._distn_infrastructure.rv_continuous, scipy.stats._distn_infrastructure.rv_discrete]]) None [source]
- dist:
type[Union[rv_continuous, rv_discrete]]
Specify the class of the random variable you want to fit.
- dist:
- property is_discrete: bool
Shortcut getter to return whether the used random variable is discrete.
- property is_continuous: bool
Shortcut getter to return whether the used random variable is continuous.
- fit(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs) dict[str, Union[float, int]] [source]
Convenience method to fit the random variable. If it is continuous, calls
rv_continuous.fit()
, which uses maximum likelihood estimation. If it is discrete, uses differential evolution to find an estimate.- data:
NDArray[Shape["*"], Float]
The data to fit to.
- Returns:
A dictionary with named parameters and their values.
- data:
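For the continuous branch, the behavior can be sketched with scipy alone (a minimal example, not the package's code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# For a continuous RV, the fit defers to rv_continuous.fit, i.e., maximum
# likelihood estimation; it returns shapes (none for the normal), loc, scale.
loc, scale = stats.norm.fit(data)
params = {"loc": loc, "scale": scale}  # dict of named parameters, as Fitter.fit returns
```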
- class metrics_as_scores.distribution.fitting.FitterPymoo(dist: type[Union[scipy.stats._distn_infrastructure.rv_continuous, scipy.stats._distn_infrastructure.rv_discrete]])[source]
Bases:
Fitter
This class inherits from
Fitter
and is its modern successor: it uses pymoo to fit discrete random variables, together with a set of specially defined fitting problems.- __init__(dist: type[Union[scipy.stats._distn_infrastructure.rv_continuous, scipy.stats._distn_infrastructure.rv_discrete]]) None [source]
- dist:
type[Union[rv_continuous, rv_discrete]]
Specify the class of the random variable you want to fit.
- dist:
- fit(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], max_samples=10000, minimize_seeds=[1337, 3735928559, 45640321], verbose: bool = True, stop_after_first_res: bool = True) dict[str, Union[float, int]] [source]
Fits the random variable to the given data. For continuous data, calls
Fitter.fit()
. For discrete random variables, however, it solves a mixed-variable problem using a genetic algorithm. The documentation below applies to fitting a discrete random variable.- data:
NDArray[Shape["*"], Float]
The 1-D data to fit the random variable to.
- max_samples:
int
Used to deterministically sub-sample the data, should it be longer than this.
- minimize_seeds:
list[int]
A list of integer seeds. For each, the optimization is run once, and the result with the smallest objective value (i.e., the best fit) is returned.
- verbose:
bool
Passed to
minimize()
.- stop_after_first_res:
bool
Whether to stop after the first successful minimization. If True, no further fits are computed for the remaining seeds. This is the default, as the optimization usually succeeds, and subsequent successful runs are rarely, or only insignificantly, better.
- Raises:
Exception: If the optimization does not find a single solution.
- Return type:
dict[str, Union[float, int]]
- Returns:
A dictionary with parameter names and values of the best found solution.
- data:
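The deterministic sub-sampling implied by max_samples can be sketched as follows; the exact scheme used internally by FitterPymoo is an assumption here:

```python
import numpy as np

data = np.arange(50_000, dtype=float)
max_samples = 10_000

# A fixed seed makes repeated fits over the same data reproducible: the
# same subset is drawn every time the data exceeds max_samples.
rng = np.random.default_rng(1337)
sample = data if data.size <= max_samples else rng.choice(data, size=max_samples, replace=False)
```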
- class metrics_as_scores.distribution.fitting.TestJson[source]
Bases:
TypedDict
Used for serializing a
StatisticalTest
to JSON. This class represents the result of a single test.- pval: float
- stat: float
- class metrics_as_scores.distribution.fitting.StatisticalTestJson[source]
Bases:
TypedDict
Used for serializing a
StatisticalTest
to JSON. This class represents a set of tests and their results.- tests: dict[str, metrics_as_scores.distribution.fitting.TestJson]
- discrete_data1: bool
- discrete_data2: bool
- class metrics_as_scores.distribution.fitting.StatisticalTest(data1: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], cdf: ~typing.Callable[[~typing.Union[float, int]], float], ppf_or_data2: ~typing.Union[~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], ~typing.Callable[[~typing.Union[float, int]], float]], data2_num_samples: ~typing.Optional[int] = None, method='auto', stat_tests=[<function cramervonmises>, <function cramervonmises_2samp>, <function ks_1samp>, <function ks_2samp>, <function epps_singleton_2samp>], max_samples: int = 10000)[source]
Bases:
object
This class is used to conduct various statistical goodness-of-fit tests. Since not every test is always applicable (e.g., the Kolmogorov–Smirnov test is not applicable to discrete data), a variety of tests is conducted, and the most suitable test (and its results) is then typically selected at runtime.
- __init__(data1: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], cdf: ~typing.Callable[[~typing.Union[float, int]], float], ppf_or_data2: ~typing.Union[~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], ~typing.Callable[[~typing.Union[float, int]], float]], data2_num_samples: ~typing.Optional[int] = None, method='auto', stat_tests=[<function cramervonmises>, <function cramervonmises_2samp>, <function ks_1samp>, <function ks_2samp>, <function epps_singleton_2samp>], max_samples: int = 10000) None [source]
Initializes a new StatisticalTest. Works with real- and integer-valued data. For the latter, additional tests are performed with a deterministic jitter added, as these are often more representative.
- data1:
NDArray[Shape["*"], Float]
A 1-D array of the data, the first sample.
- cdf:
Callable[[Union[float, int]], float]
A CDF used to test whether data1 was drawn from it. Used in the one-sample tests.
- ppf_or_data2:
Union[NDArray[Shape["*"], Float], Callable[[Union[float, int]], float]]
Either a 1-D array holding the second sample, or a PPF that can be used to generate a second sample by inverse transform sampling.
- data2_num_samples:
int
The number of samples to draw for the second sample if ppf_or_data2 is a PPF.
- method:
str
Passed to the statistical tests that support a method-parameter.
- stat_tests:
[cramervonmises, cramervonmises_2samp, ks_1samp, ks_2samp, epps_singleton_2samp]
List of statistical tests to conduct.
- max_samples:
int
The maximum number of data points in either sample. Also limits data2_num_samples.
- data1:
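A minimal sketch of the kinds of tests conducted, using scipy directly (not the package's code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data1 = rng.normal(size=400)

# One-sample tests compare data1 against a hypothesized CDF ...
res_ks1 = stats.ks_1samp(data1, stats.norm.cdf)
res_cvm = stats.cramervonmises(data1, stats.norm.cdf)

# ... while two-sample variants compare it against a second sample, here
# drawn via the PPF by inverse transform sampling (data2_num_samples = 400).
data2 = stats.norm.ppf(rng.uniform(size=400))
res_ks2 = stats.ks_2samp(data1, data2)
```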
metrics_as_scores.distribution.fitting_problems module
This module contains pymoo
fitting problems that allow fitting
distributions to almost arbitrary discrete data. The discrete random
variables in scipy
do not have a fit()
method, as their fitting
often requires a global search. Also, many distributions require
discrete parameters, or a mixture of real and integer parameters.
The problems in this module provide a generalized way for pymoo to
find parameters for all of scipy
’s discrete random variables.
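The gap this module fills can be verified directly in scipy:

```python
from scipy import stats

# Continuous scipy distributions ship a maximum-likelihood fit() method ...
assert hasattr(stats.norm, "fit")
# ... but the discrete ones do not, which is what these pymoo problems work around.
assert not hasattr(stats.poisson, "fit")
```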
- class metrics_as_scores.distribution.fitting_problems.MixedVariableDistributionFittingProblem(dist: ~scipy.stats._distn_infrastructure.rv_discrete, data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], vars: dict[str, pymoo.core.variable.Variable], n_ieq_constr: int = 0, **kwargs)[source]
Bases:
ElementwiseProblem
This is the base class for fitting all of
scipy
’s discrete random variables. Therefore, it accepts a dictionary of parameters for each distribution to find optimal values for.- __init__(dist: ~scipy.stats._distn_infrastructure.rv_discrete, data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], vars: dict[str, pymoo.core.variable.Variable], n_ieq_constr: int = 0, **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- _evaluate(X, out, *args, **kwargs) dict [source]
This is an internal method that evaluates the discrete random variable’s negative log likelihood, given the currently set values for all of its variables (stored in
X
). This method is called bypymoo
, so be sure to check out their references, too. Note that theX
-dictionary is used to build \(\theta\), the vector of parameters for the random variable. The order of the parameters in that vector depends on the order ofself.vars
. This method usually does not need to be overridden, except when, e.g., (in-)equality constraints must be evaluated (that is, whenever something other than ‘F’ in the out
-dictionary must be accessed).- X:
dict[str, Any]
The (ordered) dictionary with the variables’ names and values.
- out:
dict[str, Any]
A dictionary used by
pymoo
to store results in; e.g., in ‘F’ it stores the result of the evaluation, and in ‘G’ it stores the inequality constraints’ values.
- Returns:
Returns the
out
-dictionary. However, the dictionary is accessed by reference, so this method does not have to return anything.- Return type:
dict[Any,Any]
- X:
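A sketch of what _evaluate computes for a single candidate X, using scipy's nbinom as an example (the variable names are illustrative, not the package's code):

```python
import numpy as np
from scipy import stats

data = np.array([1, 0, 2, 4, 3, 2, 1, 0, 2, 3])

# X holds the current candidate values for the problem's variables; theta is
# assembled in the order of self.vars.
X = {"n": 5, "p": 0.7}
theta = [X["n"], X["p"]]

out = {}
# pymoo minimizes 'F', so _evaluate stores the negative log-likelihood there.
out["F"] = -np.sum(stats.nbinom.logpmf(data, *theta))
```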
- class metrics_as_scores.distribution.fitting_problems.Fit_bernoulli_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Bernoulli distribution using a pymoo problem. It uses scipy's bernoulli_gen as base distribution.
Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
p: (float) \(\left[0,1\right]\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_betabinom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Beta-Binomial distribution using a pymoo problem. It uses scipy's betabinom_gen as base distribution.
Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
n: (int) \(\left(0,1e^{4}\right)\)
a: (float) \(\left(5e^{-308},1e^3\right)\)
b: (float) \(\left(5e^{-308},1e^3\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_binom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Binomial distribution using a pymoo problem. It uses scipy's binom_gen as base distribution.
Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
n: (int) \(\left(1,25e^{3}\right)\)
p: (float) \(\left(0,1\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_boltzmann_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Boltzmann distribution using a pymoo problem. It uses scipy's boltzmann_gen as base distribution.
Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
lambda [\(\lambda\)]: (float) \(\left(0,1e^{5}\right)\)
N: (int) \(\left(1,25e^3\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_dlaplace_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the discrete Laplacian distribution using a pymoo problem. It uses scipy's dlaplace_gen as base distribution.
Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
a: (float) \(\left(5e^{-308},1e^4\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_geom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Geometric distribution using a pymoo problem. It uses scipy's geom_gen as base distribution.
Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
p: (float) \(\left(0,1\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_hypergeom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Hypergeometric distribution using a pymoo problem. It uses scipy's hypergeom_gen as base distribution.
Notes
This problem does override _evaluate() and has four inequality constraints. These are:
\(n\geq0\) (or \(-n\leq0\))
\(N\geq0\) (or \(-N\leq0\))
\(n\leq M\) (or \(n-M\leq0\))
\(N\leq M\) (or \(N-M\leq0\))
Calls the super constructor with these variables (in this order):
M: (int) \(\left(1,25e^{3}\right)\)
n: (int) \(\left(0,25e^{3}\right)\)
N: (int) \(\left(0,25e^{3}\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- _evaluate(X, out, *args, **kwargs) dict [source]
Overridden to also evaluate the inequality constraints. For all other documentation, see
MixedVariableDistributionFittingProblem._evaluate()
.
- class metrics_as_scores.distribution.fitting_problems.Fit_logser_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Logarithmic Series distribution using a pymoo problem. It uses scipy's logser_gen as base distribution.
Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
p: (float) \(\left(0,1\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_nbinom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Negative Binomial distribution using a pymoo problem. It uses scipy's nbinom_gen as base distribution.
Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
n: (int) \(\left(0,25e^{3}\right)\)
p: (float) \(\left[0,1\right]\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_nchypergeom_fisher_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting Fisher's Non-central Hypergeometric distribution using a pymoo problem. It uses scipy's nchypergeom_fisher_gen as base distribution.
Notes
This problem does override _evaluate() and has four inequality constraints. These are:
\(N\leq M\) (or \(N-M\leq0\))
\(n\leq M\) (or \(n-M\leq0\))
\(\max{(\text{data})}\leq N\) (or \(\max{(\text{data})}-N\leq0\))
\(\max{(\text{data})}\leq n\) (or \(\max{(\text{data})}-n\leq0\))
Calls the super constructor with these variables (in this order; note that \(k=\) data.size):
M: (int) \(\left(1,5\times k\right)\)
n: (int) \(\left(1,5\times k\right)\)
N: (int) \(\left(1,5\times k\right)\)
odds: (float) \(\left(5e^{-308},1e^{4}\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by
pymoo.core.variable.Variable
(e.g.,Integer
,Real
, etc.).- Parameters:
dist (
rv_discrete
) – An instance of the concrete discrete random variable that should be fit to the data.vars (
dict[str, Variable]
) – An ordered dictionary of named variables to optimize. These must correspond one to one with the variable names of those defined for the random variable.data (
NDArray[Shape['*'], Float]
) – The data the distribution should be fit to.n_ieq_constr (
int
) – Number of inequality constraints. If there are any, then the problem also overridesProblem._evaluate()
and sets values for each constraint.
- _evaluate(X, out, *args, **kwargs) dict [source]
Overridden to evaluate the inequality constraints, too. For all other documentation, see MixedVariableDistributionFittingProblem._evaluate().
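The four inequality constraints above can also be checked outside the optimizer. The following sketch (plain numpy; the parameter values are hypothetical and not taken from the package) evaluates them in the standard \(g(x)\leq0\) form, where a candidate is feasible iff every value is non-positive:

```python
import numpy as np

def fisher_nc_constraints(M: int, n: int, N: int, data: np.ndarray) -> np.ndarray:
    """Evaluate the four inequality constraints g(x) <= 0 used when
    fitting Fisher's noncentral hypergeometric distribution."""
    d_max = data.max()
    return np.array([
        N - M,      # N <= M
        n - M,      # n <= M
        d_max - N,  # max(data) <= N
        d_max - n,  # max(data) <= n
    ])

# Hypothetical example: 80 draws from a population of size 100 with 60 successes.
data = np.array([3, 7, 12, 9])
g = fisher_nc_constraints(M=100, n=60, N=80, data=data)
feasible = bool((g <= 0).all())  # True: all constraint values are non-positive
```

A solver such as pymoo treats any candidate with a positive entry in g as infeasible, which is exactly what setting n_ieq_constr and filling out the constraint values accomplishes.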
- class metrics_as_scores.distribution.fitting_problems.Fit_nchypergeom_wallenius_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
Fit_nchypergeom_fisher_gen
This class allows fitting Wallenius' noncentral hypergeometric distribution using a pymoo problem. It uses scipy's nchypergeom_wallenius_gen as the base distribution.

Notes
This distribution has the same parameters and constraints as nchypergeom_fisher_gen, implemented by the problem Fit_nchypergeom_fisher_gen (from which this problem directly inherits).
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_nhypergeom_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the negative hypergeometric distribution using a pymoo problem. It uses scipy's nhypergeom_gen as the base distribution.

Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
M: (int) \(\left(0,25e^3\right)\)
n: (int) \(\left(0,25e^3\right)\)
r: (int) \(\left(0,25e^3\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_planck_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Planck distribution using a pymoo problem. It uses scipy's planck_gen as the base distribution.

Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with this variable:
p: (float) \(\left(5e^{-308},1e^2\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_poisson_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Poisson distribution using a pymoo problem. It uses scipy's poisson_gen as the base distribution.

Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with this variable:
mu [\(\mu\)]: (float) \(\left(0,1e^{6}\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
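Problems like this one search the variable bounds above for the best-fitting parameters. As a hedged illustration of the kind of objective such a search minimizes (not necessarily the package's actual fit criterion), the negative log-likelihood for the Poisson case can be written directly with scipy; its minimizer over mu is the sample mean:

```python
import numpy as np
from scipy.stats import poisson

def poisson_nll(mu: float, data: np.ndarray) -> float:
    """Negative log-likelihood of the data under Poisson(mu)."""
    return -poisson.logpmf(data, mu).sum()

data = np.array([2, 4, 3, 5, 1, 3])  # hypothetical count data
mu_hat = data.mean()                  # closed-form MLE for the Poisson rate

# The sample mean attains a smaller NLL than nearby candidates:
assert poisson_nll(mu_hat, data) <= poisson_nll(mu_hat + 0.5, data)
assert poisson_nll(mu_hat, data) <= poisson_nll(mu_hat - 0.5, data)
```

Having a closed-form optimum makes the Poisson case a convenient sanity check for whatever value the mixed-variable optimizer returns for mu.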
- class metrics_as_scores.distribution.fitting_problems.Fit_randint_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the discrete uniform distribution using a pymoo problem. It uses scipy's randint_gen as the base distribution.

Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
low: (int) \(\left(-25e^{3},25e^{3}\right)\)
high: (int) \(\left(-25e^{3},25e^{3}\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
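For the discrete uniform case the best fit can be read off the data directly, which again provides a sanity check for the optimizer's result. Note that scipy's randint treats high as exclusive, so randint(low, high) has support {low, …, high − 1} (a sketch with hypothetical data, not part of the package):

```python
import numpy as np
from scipy.stats import randint

data = np.array([4, 9, 6, 7, 4, 8])  # hypothetical integer observations

# Closed-form fit: the tightest support containing all observations.
low, high = int(data.min()), int(data.max()) + 1

rv = randint(low, high)
# Every observation lies inside the fitted support ...
assert np.all(rv.pmf(data) > 0)
# ... and values just outside it have zero probability.
assert rv.pmf(low - 1) == 0 and rv.pmf(high) == 0
```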
- class metrics_as_scores.distribution.fitting_problems.Fit_skellam_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Skellam distribution using a pymoo problem. It uses scipy's skellam_gen as the base distribution.

Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
mu1 [\(\mu_1\)]: (float) \(\left(5e^{-308},5e^{3}\right)\)
mu2 [\(\mu_2\)]: (float) \(\left(5e^{-308},5e^{3}\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_yulesimon_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Yule–Simon distribution using a pymoo problem. It uses scipy's yulesimon_gen as the base distribution.

Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with this variable:
alpha [\(\alpha\)]: (float) \(\left(5e^{-308},2e^{4}\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_zipf_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Zipf (Zeta) distribution using a pymoo problem. It uses scipy's zipf_gen as the base distribution.

Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with this variable:
a: (float) \(\left(1+1e^{-12},2e^{4}\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
- class metrics_as_scores.distribution.fitting_problems.Fit_zipfian_gen(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Bases:
MixedVariableDistributionFittingProblem
This class allows fitting the Zipfian distribution using a pymoo problem. It uses scipy's zipfian_gen as the base distribution.

Notes
Does not override MixedVariableDistributionFittingProblem._evaluate() and does not have any (in-)equality constraints. Calls the super constructor with these variables (in this order):
a: (float) \(\left(0,2e^{4}\right)\)
n: (int) \(\left(0,25e^{3}\right)\)
- __init__(data: ~nptyping.ndarray.NDArray[~nptyping.base_meta_classes.Shape[*], ~numpy.float64], **kwargs)[source]
Constructor for fitting any discrete random variable with one or more parameters that can be of any type supported by pymoo.core.variable.Variable (e.g., Integer, Real, etc.).
- Parameters:
dist (rv_discrete) – An instance of the concrete discrete random variable that should be fit to the data.
vars (dict[str, Variable]) – An ordered dictionary of named variables to optimize. These must correspond one-to-one with the variable names defined for the random variable.
data (NDArray[Shape['*'], Float]) – The data the distribution should be fit to.
n_ieq_constr (int) – Number of inequality constraints. If there are any, then the problem also overrides Problem._evaluate() and sets values for each constraint.
Module contents
This package holds the main functionality for representing data and features as densities (e.g., discrete/continuous, parametric/empirical). It also contains the fitting problems and the fitters themselves.