pyproteonet.data.dataset_sample.DatasetSample
- class pyproteonet.data.dataset_sample.DatasetSample(dataset: Dataset, values: Dict[str, DataFrame], name: str)
Represents a dataset sample holding a set of values for every molecule. Can be thought of as a dictionary of pandas dataframes, with one dataframe for each molecule type.
- __init__(dataset: Dataset, values: Dict[str, DataFrame], name: str)
Create a dataset sample holding a set of values for every molecule.
- Parameters:
dataset (Dataset) – The dataset this sample belongs to.
values (Dict[str, pd.DataFrame]) – Values for every molecule in the dataset.
name (str) – Name of the sample.
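As a rough picture of the `values` argument, the sketch below builds a dictionary with one dataframe per molecule type using plain pandas. The molecule types, molecule ids, the `abundance` column layout and the use of NaN for missing entries are assumptions made for illustration only; a real sample also needs the `Dataset` it belongs to, which is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical molecule types and ids, chosen only for illustration.
values = {
    "protein": pd.DataFrame(
        {"abundance": [10.5, np.nan]},
        index=pd.Index(["P1", "P2"], name="id"),
    ),
    "peptide": pd.DataFrame(
        {"abundance": [3.2, 4.1, np.nan]},
        index=pd.Index(["pepA", "pepB", "pepC"], name="id"),
    ),
}

# One dataframe per molecule type, one row per molecule.
for molecule_type, frame in values.items():
    print(molecule_type, frame.shape)
```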
Methods
__init__(dataset, values, name) – Create a dataset sample holding a set of values for every molecule.
apply(fn, *args, **kwargs) – Applies a function to the dataset sample.
copy([columns, molecule_ids]) – Creates a copy of the dataset sample.
get_index_for(molecule_type) – Returns the index of molecule ids for the given molecule type.
get_node_values_for_graph(graph[, ...]) – Returns the values for the given graph.
missing_mask(molecule[, column]) – Returns a boolean mask indicating which values are missing for the given molecule and column.
missing_molecules(molecule[, column]) – Returns all molecules of the given molecule type that are missing for the given column.
non_missing_mask(molecule[, column]) – Returns a boolean mask indicating which values are non-missing for the given molecule and column.
non_missing_molecules(molecule[, column]) – Returns all molecules of the given molecule type that are not missing for the given column.
plot_hist([bins]) – Plots a histogram of the values for every molecule type.
Attributes
gene_mapping
missing_label_value
missing_value
molecule_set
molecules
- apply(fn: Callable, *args, **kwargs) object
Applies a function to the dataset sample. Only exists to match the interface of the Dataset class.
- Parameters:
fn (Callable) – the function to apply
- Returns:
the result of the function
- Return type:
object
- copy(columns: Iterable[str] | Dict[str, str | Iterable[str]] | None = None, molecule_ids: Dict[str, Index] = {}) DatasetSample
Creates a copy of the dataset sample.
- Parameters:
columns (Optional[ Union[Iterable[str], Dict[str, Union[str, Iterable[str]]]] ], optional) – Columns to copy. When given as a list of strings, the same columns are copied for every molecule type; when given as a dictionary, the columns can be specified per molecule type. Defaults to None.
molecule_ids (Dict[str, pd.Index], optional) – Dictionary specifying, for every molecule type, the molecule ids that will be copied. If a molecule type is not part of the dictionary, all molecule ids are copied for that molecule type. Defaults to {}.
- Returns:
A copy of the dataset sample.
- Return type:
DatasetSample
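The selection rules for `columns` and `molecule_ids` can be sketched on a plain dictionary of dataframes. `copy_values` below is a hypothetical stand-in written for this illustration, not the library's implementation. It assumes that `columns=None` keeps all columns and that a molecule type absent from a `columns` dictionary also keeps all of its columns; neither behaviour is confirmed by the documentation above.

```python
from typing import Dict, Iterable, Optional, Union

import pandas as pd

ColumnSpec = Union[Iterable[str], Dict[str, Union[str, Iterable[str]]]]


def copy_values(
    values: Dict[str, pd.DataFrame],
    columns: Optional[ColumnSpec] = None,
    molecule_ids: Optional[Dict[str, pd.Index]] = None,
) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper mimicking the documented selection rules."""
    molecule_ids = molecule_ids or {}
    result = {}
    for molecule_type, frame in values.items():
        if columns is None:
            # Assumption: None keeps every column.
            cols = list(frame.columns)
        elif isinstance(columns, dict):
            # Assumption: types missing from the dict keep every column.
            cols = columns.get(molecule_type, list(frame.columns))
            cols = [cols] if isinstance(cols, str) else list(cols)
        else:
            # The same columns for every molecule type.
            cols = list(columns)
        # Types missing from molecule_ids keep all of their ids.
        ids = molecule_ids.get(molecule_type, frame.index)
        result[molecule_type] = frame.loc[ids, cols].copy()
    return result


values = {
    "protein": pd.DataFrame(
        {"abundance": [1.0, 2.0], "score": [0.9, 0.8]}, index=["P1", "P2"]
    ),
    "peptide": pd.DataFrame(
        {"abundance": [3.0, 4.0, 5.0]}, index=["pepA", "pepB", "pepC"]
    ),
}

subset = copy_values(
    values,
    columns={"protein": "abundance"},
    molecule_ids={"peptide": pd.Index(["pepA", "pepC"])},
)
```

Here the protein frame keeps both rows but only the `abundance` column, while the peptide frame keeps all columns but only the two requested ids.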
- get_index_for(molecule_type: str) Index
Returns the index of molecule ids for the given molecule type.
- Parameters:
molecule_type (str) – The molecule type to get the index for
- Returns:
The index of molecule ids for the given molecule type
- Return type:
pd.Index
- get_node_values_for_graph(graph: MoleculeGraph, include_id_and_type: bool = True) DataFrame
Returns the values for the given graph.
- Parameters:
graph (MoleculeGraph) – the graph to get the values for
include_id_and_type (bool, optional) – Whether to include the molecule ids and molecule type into the result. Defaults to True.
- Returns:
the values for the given graph
- Return type:
pd.DataFrame
- missing_mask(molecule: str, column: str = 'abundance') ndarray
Returns a boolean mask indicating which values are missing for the given molecule and column.
- Parameters:
molecule (str) – the molecule type (e.g. ‘protein’ or ‘peptide’)
column (str, optional) – the value column. Defaults to “abundance”.
- Returns:
the boolean mask indicating which values are missing for the given molecule and column.
- Return type:
np.ndarray
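The idea of a boolean missing mask can be illustrated with plain pandas. This sketch assumes that missing values are encoded as NaN, which the documentation above does not state (the class exposes a `missing_value` attribute that may define a different marker); the protein ids are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical per-molecule values for the "protein" molecule type.
proteins = pd.DataFrame(
    {"abundance": [10.5, np.nan, 7.0]},
    index=pd.Index(["P1", "P2", "P3"], name="id"),
)

# Assumption: a value counts as missing when it is NaN.
missing = proteins["abundance"].isna().to_numpy()

# The non-missing mask is the element-wise complement.
non_missing = ~missing
```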
- missing_molecules(molecule: str, column: str = 'abundance') DataFrame
Returns all molecules of the given molecule type that are missing for the given column.
- Parameters:
molecule (str) – the molecule type (e.g. ‘protein’ or ‘peptide’)
column (str, optional) – the value column. Defaults to “abundance”.
- Returns:
the dataframe containing the missing molecules and their additional information for the given molecule and column.
- Return type:
pd.DataFrame
- non_missing_mask(molecule: str, column: str = 'abundance')
Returns a boolean mask indicating which values are non-missing for the given molecule and column.
- Parameters:
molecule (str) – the molecule type (e.g. ‘protein’ or ‘peptide’)
column (str, optional) – the value column. Defaults to “abundance”.
- Returns:
the boolean mask indicating which values are non-missing for the given molecule and column.
- Return type:
np.ndarray
- non_missing_molecules(molecule: str, column: str = 'abundance')
Returns all molecules of the given molecule type that are not missing for the given column.
- Parameters:
molecule (str) – the molecule type (e.g. ‘protein’ or ‘peptide’)
column (str, optional) – the value column. Defaults to “abundance”.
- Returns:
the dataframe containing the non-missing molecules and their additional information for the given molecule and column.
- Return type:
pd.DataFrame
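Selecting the missing and non-missing molecules amounts to filtering rows by such a mask while keeping the other columns. The sketch below shows this with plain pandas under the assumption that missing values are NaN; the `score` column and the peptide ids are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical peptide values with one extra information column.
peptides = pd.DataFrame(
    {"abundance": [3.2, np.nan, 4.1], "score": [0.7, 0.4, 0.9]},
    index=pd.Index(["pepA", "pepB", "pepC"], name="id"),
)

# Assumption: missing entries in the chosen column are NaN.
mask = peptides["abundance"].isna()

missing_rows = peptides[mask]    # molecules missing in "abundance"
present_rows = peptides[~mask]   # molecules with a value in "abundance"
```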
- plot_hist(bins: List[float] | str = 'auto')
Plots a histogram of the values for every molecule type.
- Parameters:
bins (Union[List[float], str], optional) – The bins for the histogram, given either as a list of bin edges or as the name of a binning rule (passed to seaborn.histplot). Defaults to “auto”.
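To get a feel for the two accepted forms of `bins`, the sketch below uses `numpy.histogram_bin_edges`, which is the function seaborn's documentation names as the consumer of its `bins` argument. The sample values are made up, and this only illustrates how bin edges are derived; it does not call `plot_hist` itself.

```python
import numpy as np

# Hypothetical abundance values.
values = np.array([1.0, 2.0, 2.5, 3.0, 10.0])

# A rule name lets NumPy choose the edges from the data.
auto_edges = np.histogram_bin_edges(values, bins="auto")

# A list of floats is used directly as the bin edges.
explicit_edges = np.histogram_bin_edges(values, bins=[0.0, 5.0, 10.0])
```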