metrics_as_scores.cli package

Submodules

metrics_as_scores.cli.BundleOwn module

This module contains the workflow for bundling your own datasets.

class metrics_as_scores.cli.BundleOwn.BundleDatasetWorkflow[source]

Bases: Workflow

This workflow bundles a manually created dataset into a single Zip file that can be uploaded to, e.g., Zenodo, and registered with Metrics As Scores in order to make it available to others as a known dataset.

For your own dataset to be publishable, it needs to have all parametric fits, generated densities, references, about, etc. This workflow checks that all of these requirements are fulfilled.

For an example dataset, check out https://doi.org/10.5281/zenodo.7633950

__init__() None[source]
bundle() None[source]

Main entry point for this workflow.

metrics_as_scores.cli.Cli module

This is the main entry point for the command line interface (the text user interface, TUI) of Metrics As Scores. It provides access to a set of workflows for handling data and running the Web Application.

metrics_as_scores.cli.Cli.cli()[source]

Main routine for the command line interface. It runs the main menu in a never-terminating loop (except for when the user presses Ctrl+C).

metrics_as_scores.cli.CreateDataset module

This module contains the workflow for creating your own datasets.

class metrics_as_scores.cli.CreateDataset.CreateDatasetWorkflow(manifest: Optional[LocalDataset] = None, org_df: Optional[DataFrame] = None)[source]

Bases: Workflow

This workflow creates your own local dataset from a single data source that can be read by Pandas from a file or URL. The original dataset needs three columns: for a single sample, one column holds the numeric observation, one holds the ordinal type (the name of the feature), and one holds the group associated with it. A dataset can hold one or more features, but should hold at least two groups in order to compare distributions of a single sample type across groups.

This workflow creates the entire dataset: The manifest JSON, the parametric fits, the pre-generated distributions. When done, the dataset is installed, such that it can be discovered and used by the local Web-Application. If you wish to publish your dataset so that it can be used by others, start the Bundle-workflow from the previous menu afterwards.
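The expected long-format input can be sketched with a tiny Pandas frame (the column names here are purely illustrative; the workflow lets you pick which column plays which role):

```python
import pandas as pd

# A long-format data source: one row per observation. Which column holds the
# numeric value, the feature (ordinal type), and the group is selected
# interactively; these column names are only illustrative.
df = pd.DataFrame({
    "value":   [0.62, 0.71, 0.58, 0.90, 0.85, 0.88],
    "feature": ["complexity"] * 6,
    "group":   ["app", "app", "app", "lib", "lib", "lib"],
})

# At least two groups per feature are needed to compare distributions.
print(df.groupby("group")["value"].mean())
```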

__init__(manifest: Optional[LocalDataset] = None, org_df: Optional[DataFrame] = None) None[source]
property dataset_dir: Path
property fits_dir: Path
property densities_dir: Path
property tests_dir: Path
property web_dir: Path
property path_manifest: Path
property path_df_data: Path
property path_test_ANOVA: Path
property path_test_KruskalWallis: Path
property path_test_TukeyHSD: Path
property path_test_ks2samp: Path
create_own() None[source]

Main entry point for this workflow.

metrics_as_scores.cli.Download module

This module contains the workflow for downloading known datasets.

class metrics_as_scores.cli.Download.DownloadWorkflow[source]

Bases: Workflow

This workflow accesses a curated list of known datasets that can be used with Metrics As Scores. With this workflow, a known dataset can be downloaded and installed as a local dataset. Use the workflow for listing the known datasets and then enter the desired dataset's ID here.

Known datasets are loaded from: https://raw.githubusercontent.com/MrShoenel/metrics-as-scores/master/src/metrics_as_scores/datasets/known-datasets.json

__init__() None[source]
download() None[source]

Main entry point for this workflow.

metrics_as_scores.cli.FitParametric module

This module contains the workflow for fitting parametric distributions to features of own datasets.

class metrics_as_scores.cli.FitParametric.FitParametricWorkflow[source]

Bases: Workflow

This workflow fits distributions to an existing dataset. For each feature and each group, a large number of random variables is fit, and a number of statistical tests are carried out so that the best-fitting distribution can be selected and used. Regardless of whether a quantity is continuous or discrete, an attempt is made to fit many continuous random variables. If a quantity is discrete, however, an additional set of discrete random variables is fit as well. Especially the latter can be extraordinarily expensive.

Therefore, you may only select a subset of random variables that you want to attempt to fit. However, if you intend to share your dataset and make it available to others, then you should include and attempt to fit all distributions.

The following process, once begun, will save the result of fitting a single feature (from within a single group) as a separate file. If the file already exists, no new fit is attempted. This is so that this process can be interrupted and resumed.
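The kind of fit and goodness-of-fit test performed per candidate random variable can be sketched with scipy.stats (a simplified illustration, not the workflow's actual code):

```python
import numpy as np
from scipy import stats

# Illustrative sketch: fit one candidate random variable to one feature's
# data (for one group) and test the quality of the fit.
rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=500)

params = stats.norm.fit(data)                 # MLE fit of the candidate RV
ks = stats.kstest(data, "norm", args=params)  # goodness-of-fit test

print(params, ks.pvalue)
```

In the real workflow this is repeated across many candidate distributions, which is why the process is resumable per feature and group.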

__init__() None[source]
fit_parametric() None[source]

Main entry point for this workflow.

metrics_as_scores.cli.GenerateDensities module

This module contains the workflow for generating densities from your own datasets. These are required so that a dataset can be used in the web application.

class metrics_as_scores.cli.GenerateDensities.GenerateDensitiesWorkflow[source]

Bases: Workflow

This workflow generates density-related functions that are used by the Web Application. While those can be large, generating them on-the-fly is usually not possible in acceptable time. Using pre-generated functions is a trade-off between space and user experience, where we sacrifice the former as it is cheaper.

For each feature and each group, we pre-generate functions for the probability density (PDF), the cumulative distribution (CDF) and its complement (CCDF), as well as the quantile (or percent point) function (PPF). So for one feature and one group, we pre-generate one density that unites those four functions.

There are 5 primary classes of densities: Parametric, Parametric_discrete, Empirical, Empirical_discrete, and KDE_approx. Please refer to the documentation for details about these. Generating parametric densities uses the computed fits from another workflow and is cheap, fast, and does not consume much space. KDE_approx makes excessive use of oversampling (by design), which can result in large files. The empirical densities' size corresponds to the size of the dataset you are using (although there is an upper limit beyond which sampling is applied).
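The space-versus-speed trade-off can be illustrated with a minimal sketch: tabulate an empirical CDF once, then answer CDF/CCDF/PPF queries via cheap interpolation (an assumed simplification; the actual pre-generated densities are more elaborate):

```python
import numpy as np

# Sketch of pre-generation: compute the ECDF grid once, then serve
# CDF/CCDF/PPF queries by interpolation instead of recomputation.
rng = np.random.default_rng(42)
sample = np.sort(rng.exponential(scale=2.0, size=10_000))
probs = np.arange(1, sample.size + 1) / sample.size

cdf  = lambda x: np.interp(x, sample, probs)   # P(X <= x)
ccdf = lambda x: 1.0 - cdf(x)                  # complement of the CDF
ppf  = lambda q: np.interp(q, probs, sample)   # quantile function

print(round(float(cdf(ppf(0.5))), 3))  # -> 0.5
```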

__init__() None[source]
pre_generate() None[source]

Main entry point for this workflow.

metrics_as_scores.cli.KnownDatasets module

This module contains the workflow for listing known datasets that are available online and may be downloaded.

class metrics_as_scores.cli.KnownDatasets.KnownDatasetsWorkflow[source]

Bases: Workflow

This workflow accesses a curated online list of known datasets that were designed to work with Metrics As Scores. The list is available at:

https://raw.githubusercontent.com/MrShoenel/metrics-as-scores/master/src/metrics_as_scores/datasets/known-datasets.json

If you would like to have your own dataset added to this list, open an issue in the GitHub repository of Metrics As Scores (also, check out the contributing guidelines).

__init__() None[source]
show_datasets() None[source]

Main entry point for this workflow.

metrics_as_scores.cli.LocalDatasets module

This module contains the workflow for listing locally available datasets. This list includes downloaded and manually created datasets.

class metrics_as_scores.cli.LocalDatasets.LocalDatasetsWorkflow[source]

Bases: Workflow

This workflow lists all locally available datasets. This includes downloaded and installed datasets, as well as manually created datasets.

__init__() None[source]
show_datasets() None[source]

Main entry point for this workflow.

metrics_as_scores.cli.LocalWebserver module

This module contains the workflow for running the interactive web application locally.

class metrics_as_scores.cli.LocalWebserver.LocalWebserverWorkflow[source]

Bases: Workflow

This workflow allows you to locally run the interactive web application of Metrics As Scores, using one of the locally available datasets.

__init__() None[source]
start_server_process() None[source]
start_server_internally() None[source]

Start an embedded Bokeh web server with the Metrics As Scores application. This is an experimental feature. Its intended purpose is development and debugging, so use it at your own risk. The embedded application server may make Metrics As Scores and its text-based user interface unresponsive, in which case you may have to manually kill and restart the process.

start_server() None[source]

Main entry point for this workflow.

metrics_as_scores.cli.MainWorkflow module

This module contains the main workflow (the main menu) that grants access to all other workflows.

class metrics_as_scores.cli.MainWorkflow.MainWorkflow[source]

Bases: Workflow

The main workflow of the CLI is the main menu of the textual user interface. It provides access to all other workflows.

__init__() None[source]
print_welcome() None[source]
main_menu() Workflow[source]

Show the main menu of the CLI.

metrics_as_scores.cli.Workflow module

This module contains the base class for all workflows.

class metrics_as_scores.cli.Workflow.Workflow[source]

Bases: object

This is the base class for all workflows. It features a few common methods that we use in the derived workflows.

__init__() None[source]
ask(options: list[str], prompt: str = 'You now have the following options:', rtype: ~metrics_as_scores.cli.Workflow.T = <class 'int'>) T[source]

Common method to ask for a selection among a list of options (choices). Options are indexed starting from 0. If the chosen return type is int, the index is returned; otherwise, the option itself is returned as a string.

options: list[str]

A list of options to choose (select) from.

prompt: str

The prompt shown to the user.

Return type:

T: either int (the index) or the option itself (of any type)

Returns:

The index of the chosen option or the chosen option itself.
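The return-type semantics can be sketched non-interactively (the real method prompts the user; `choice` here merely stands in for the user's input):

```python
# Minimal, non-interactive sketch of ask()'s return semantics: with
# rtype=int the chosen index is returned, otherwise the option itself.
def ask_sketch(options: list[str], choice: int, rtype: type = int):
    return choice if rtype is int else options[choice]

print(ask_sketch(["Download", "Create"], 1))             # -> 1
print(ask_sketch(["Download", "Create"], 1, rtype=str))  # -> Create
```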

askt(options: list[tuple[str, T]], prompt: str = 'You now have the following options:') T[source]

Wrapper around ask() that can use any type associated with an option.

options: list[tuple[str, T]]

The options: for each, the text to show and the associated value.

prompt: str

The prompt shown to the user.

Return type:

T

Returns:

Returns the selected option’s associated value.
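The semantics of askt() can be sketched similarly (the labels and values below are hypothetical):

```python
# Sketch of askt()'s semantics: options pair a display label with a value
# of any type; the selected option's associated value is returned.
def askt_sketch(options, choice: int):
    label, value = options[choice]
    return value

wf = askt_sketch([("Download a dataset", "download"), ("Quit", None)], 0)
print(wf)  # -> download
```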

print_info(text_normal: str, text_vital: Optional[str] = None, end: Optional[str] = None, arrow: str = ' -> ') None[source]

Used to print an info that consists of a normal text (without extra styles) and a vital text that has some extra styling applied to emphasize it.

text_normal: str

The text that does not have extra styling

text_vital: str

The text with extra styling for emphasis.

end: str

The string to print at the end of the info.

arrow: str

The string to print at the beginning of the info.

metrics_as_scores.cli.helpers module

This module contains constants and helper functions that are commonly used across the CLI workflows.

metrics_as_scores.cli.helpers.KNOWN_DATASETS_FILE = 'https://raw.githubusercontent.com/MrShoenel/metrics-as-scores/master/src/metrics_as_scores/datasets/known-datasets.json'

This is the URL to the curated list of available datasets to be used with Metrics As Scores.

metrics_as_scores.cli.helpers.isint(s: str) bool[source]

Attempts to convert the string to an integer.

s: str

The string to check.

Return type:

bool

Returns:

True if the string can be converted to an int; False, otherwise.

metrics_as_scores.cli.helpers.isnumeric(s: Union[str, int, float]) bool[source]

Attempts to convert a string to a float to check whether it is numeric. This is not the same as str::isnumeric(), as this method essentially checks whether s contains something that looks like a number (int, float, scientific notation, etc.).

s: str

The string to check.

Return type:

bool

Returns:

True if the string is numeric; False, otherwise.
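Illustrative equivalents of these two helpers (the actual implementations may differ in details):

```python
# Hypothetical stand-ins for isint()/isnumeric(): attempt the conversion
# and report whether it succeeds.
def isint_sketch(s: str) -> bool:
    try:
        int(s)
        return True
    except ValueError:
        return False

def isnumeric_sketch(s) -> bool:
    try:
        float(s)  # accepts ints, floats, and scientific notation like "1e-3"
        return True
    except (TypeError, ValueError):
        return False

print(isint_sketch("42"), isint_sketch("4.2"))            # -> True False
print(isnumeric_sketch("1e-3"), isnumeric_sketch("abc"))  # -> True False
```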

metrics_as_scores.cli.helpers.get_local_datasets() Iterable[LocalDataset][source]

Opens the dataset directory and looks for locally available datasets. Locally available means datasets that were installed or created manually. A dataset is only considered to be locally available if it has a manifest.json.

Return type:

Iterable[LocalDataset]
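The discovery rule can be sketched as follows (the temporary directory and dataset ID are made up for illustration):

```python
import json
import tempfile
from pathlib import Path

# Sketch of the discovery rule: a sub-directory counts as a local dataset
# only if it contains a manifest.json.
root = Path(tempfile.mkdtemp())
(root / "my-dataset").mkdir()
(root / "my-dataset" / "manifest.json").write_text(json.dumps({"id": "my-dataset"}))
(root / "incomplete").mkdir()  # no manifest -> not a local dataset

found = [json.loads(p.read_text()) for p in root.glob("*/manifest.json")]
print([d["id"] for d in found])  # -> ['my-dataset']
```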

metrics_as_scores.cli.helpers.required_files_folders_local_dataset(local_ds_id: str) tuple[list[pathlib.Path], list[pathlib.Path]][source]

For a given LocalDataset, returns lists of directories and files that must be present in order for the local dataset to be valid. These lists of directories and files must be checked when, e.g., the dataset is bundled, or when the web application is instructed to use a local dataset.

local_ds_id: str

The ID of a local dataset to check files for.

Return type:

tuple[list[Path], list[Path]]

Returns:

A list of paths to required folders and a list of paths to required files.

class metrics_as_scores.cli.helpers.PathStatus(value)[source]

Bases: StrEnum

This is an enumeration of statuses for Path objects that point to directories or files.

OK = 'OK'

The file or directory exists and is of correct type.

DOESNT_EXIST = 'Does not exist'

The file or directory does not exist.

NOT_A_DIRECTORY = 'Not a directory'

The Path exists, but is not a directory.

NOT_A_FILE = 'Not a file'

The Path exists, but is not a file.

metrics_as_scores.cli.helpers.validate_local_dataset_files(dirs: list[pathlib.Path], files: list[pathlib.Path]) tuple[dict[pathlib.Path, metrics_as_scores.cli.helpers.PathStatus], dict[pathlib.Path, metrics_as_scores.cli.helpers.PathStatus]][source]

Takes two lists, one of paths of directories, and one of paths to files of a local dataset from required_files_folders_local_dataset(). Then for each item on each list, checks whether it exists and is of correct type, then associates a PathStatus with each.

dirs: list[Path]

A list of paths to directories needed in a local dataset.

files: list[Path]

A list of paths to files needed in a local dataset.

Return type:

tuple[dict[Path, PathStatus], dict[Path, PathStatus]]

Returns:

Transforms either list into a dictionary using the original path as key and a PathStatus as value. Then returns both dictionaries.
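The validation rule can be sketched like this (a simplified stand-in that mirrors the PathStatus values; not the actual implementation):

```python
import tempfile
from pathlib import Path

# Associate each required path with a status string (mirroring PathStatus).
def status_of(p: Path, expect_dir: bool) -> str:
    if not p.exists():
        return "Does not exist"
    if expect_dir and not p.is_dir():
        return "Not a directory"
    if not expect_dir and not p.is_file():
        return "Not a file"
    return "OK"

root = Path(tempfile.mkdtemp())
(root / "fits").mkdir()
(root / "manifest.json").touch()

dirs = {p: status_of(p, True) for p in [root / "fits", root / "densities"]}
files = {p: status_of(p, False) for p in [root / "manifest.json", root / "fits"]}
print(dirs[root / "densities"], files[root / "fits"])  # -> Does not exist Not a file
```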

metrics_as_scores.cli.helpers.get_known_datasets(use_local_file: bool = False) list[metrics_as_scores.distribution.distribution.KnownDataset][source]

Reads the file KNOWN_DATASETS_FILE to obtain a list of known datasets.

use_local_file: bool

If true, will attempt to read the known datasets from a local file, instead of the online file. This is only used during development.

Return type:

list[KnownDataset]

metrics_as_scores.cli.helpers.format_file_size(num_bytes: int, digits: int = 2) str[source]

Formats bytes into a string with a suffix for bytes, kilobytes, etc. The numeric part before the suffix is kept below 1,000 and may be rounded to two decimals. For example: 780 B, 1.22 KB, 43 GB, etc. Does NOT use binary (IEC) suffixes, as those would be nonsensical here (e.g., what is 1.27 GiB?).

num_bytes: int

Unsigned integer with amount of bytes.

digits: int

The number of digits for rounding. If set to 0, the rounded value is cast to integer.

Returns:

The size in bytes, formatted.
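A formatter along these lines might look as follows (an illustrative sketch; the real function's rounding and edge-case behavior may differ):

```python
# Sketch of a decimal-suffix (base-1000) file-size formatter.
def format_size_sketch(num_bytes: int, digits: int = 2) -> str:
    value = float(num_bytes)
    for suffix in ("B", "KB", "MB", "GB", "TB"):
        if value < 1000.0 or suffix == "TB":
            rounded = round(value, digits)
            # With digits=0, the rounded value is cast to an integer.
            return f"{int(rounded) if digits == 0 else rounded} {suffix}"
        value /= 1000.0

print(format_size_sketch(780, digits=0), format_size_sketch(1220))  # -> 780 B 1.22 KB
```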

Module contents

This package contains the text-based user interface of Metrics As Scores. It is implemented as a command line interface (CLI). The main menu consists of numerous workflows for the user to choose from. Please refer to each workflow individually to understand what it does.