pyproteonet.simulation.protein_peptide.simulate_protein_peptide_dataset

pyproteonet.simulation.protein_peptide.simulate_protein_peptide_dataset(molecule_set: MoleculeSet, mapping: str, samples: int | List[str] = 10, log_abundance_mu: float = 10, log_abundance_sigma: float = 2, log_protein_error_sigma: float = 0.3, log_peptide_error_sigma: float = 0, simulate_flyability: bool = True, flyability_alpha: float = 5, flyability_beta: float = 2.5, peptide_noise_mu: float = 0, peptide_noise_sigma: float = 100, peptide_poisson_error: bool = True, condition_samples: List[float | List[str]] = [], condition_affected: List[float | int | Iterable] = [], log2_condition_means: List[float] = [], log2_condition_stds: List[float] = [], protein_column: str = 'abundance_gt', peptide_column: str = 'abundance', protein_molecule: str = 'protein', peptide_molecule: str = 'peptide', calculate_peptide_gt: bool = True, peptide_gt_column: str | None = None, random_seed: Generator | int | None = None, print_parameters: bool = False) → Dataset

High-level wrapper that simulates a protein-peptide dataset by chaining multiple simulation steps.

Details on the individual steps can be found in the corresponding simulation functions.

Parameters:
  • molecule_set (MoleculeSet) – Underlying molecule set specifying the proteins and peptides as well as the mapping between them.

  • mapping (str) – MoleculeSet mapping that defines the protein to peptide relation.

  • samples (Union[int, List[str]], optional) – How many samples to generate. Defaults to 10.

  • log_abundance_mu (float, optional) – Mean of protein abundance in log space. Defaults to 10.

  • log_abundance_sigma (float, optional) – Standard deviation of protein abundance in log space. Defaults to 2.

  • log_protein_error_sigma (float, optional) – Standard deviation of the normally distributed, zero-centered protein error in log space. Defaults to 0.3.

  • log_peptide_error_sigma (float, optional) – Standard deviation of the normally distributed, zero-centered peptide error in log space. Defaults to 0.

  • simulate_flyability (bool, optional) – Whether every peptide should have a simulated flyability, defining which fraction of the peptide abundance is measured. Defaults to True.

  • flyability_alpha (float, optional) – Alpha parameter of beta distribution used to sample peptide flyability values. Defaults to 5.

  • flyability_beta (float, optional) – Beta parameter of beta distribution used to sample peptide flyability values. Defaults to 2.5.

  • peptide_noise_mu (float, optional) – Mean of the normally distributed positive noise term added to peptide abundances. To ensure the noise is always positive, the absolute value of the value sampled from the normal distribution is taken. Note: NOT in log space. Defaults to 0.

  • peptide_noise_sigma (float, optional) – Standard deviation of the normal distribution behind the positive peptide noise (see above). Note: NOT in log space. Defaults to 100.

  • peptide_poisson_error (bool, optional) – Whether to sample the final peptide abundance from a Poisson distribution centered at the computed abundance value to simulate counting effects when measuring peptides. Defaults to True.

  • condition_samples (List[Union[float, List[str]]], optional) – Condition groups. If given as a list of values in the [0,1] interval, every value describes the fraction of samples affected by the condition. If given as a list of lists of strings, every list of strings represents a condition group and the strings are the names of the samples affected by this condition group. Defaults to [].

  • condition_affected (List[Union[float, int, Iterable]], optional) – List of condition-affected proteins. If a list of floats/ints is given, every value is interpreted as the fraction/absolute number of condition-affected proteins, and those are sampled randomly. If a list of iterables is given (e.g. a list of pandas Series), every iterable is interpreted as the protein indices of the proteins affected in the corresponding condition group. Defaults to [].

  • log2_condition_means (List[float], optional) – List of mean values for the condition factor distributions for each condition group. Defaults to [].

  • log2_condition_stds (List[float], optional) – List of standard deviation values for the condition factor distributions for each condition group. Defaults to [].

  • protein_column (str, optional) – Column to write ground truth protein values to. Defaults to “abundance_gt”.

  • peptide_column (str, optional) – Column to write peptide values to. Defaults to “abundance”.

  • protein_molecule (str, optional) – Molecule name used for protein molecule type in the MoleculeSet. Defaults to “protein”.

  • peptide_molecule (str, optional) – Molecule name of the peptide molecule type in the MoleculeSet. Defaults to “peptide”.

  • random_seed (Optional[Union[Generator, int]], optional) – Random seed or random generator to use for random value generation. Defaults to None.
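
The distribution parameters above can be read as a small generative recipe. The following is only a rough sketch for a single protein with a single peptide, written with the Python standard library; it is not the library's implementation. The function name sketch_peptide_value, the use of the natural logarithm, and the order in which the steps are combined are all assumptions made for illustration. The Poisson step and condition effects are left out.

```python
import math
import random


def sketch_peptide_value(
    rng,
    log_abundance_mu=10.0,
    log_abundance_sigma=2.0,
    log_protein_error_sigma=0.3,
    flyability_alpha=5.0,
    flyability_beta=2.5,
    peptide_noise_mu=0.0,
    peptide_noise_sigma=100.0,
):
    """Hypothetical helper: draw one toy peptide value from one toy protein."""
    # Protein abundance is normally distributed in log space
    # (natural log is an assumption; the docs only say "log space").
    log_abundance = rng.gauss(log_abundance_mu, log_abundance_sigma)
    # Zero-centered, normally distributed protein error, also in log space.
    log_abundance += rng.gauss(0.0, log_protein_error_sigma)
    abundance = math.exp(log_abundance)

    # Flyability: the fraction of the abundance that is measured,
    # sampled from a beta distribution.
    flyability = rng.betavariate(flyability_alpha, flyability_beta)

    # Positive noise, NOT in log space: absolute value of a normal sample.
    noise = abs(rng.gauss(peptide_noise_mu, peptide_noise_sigma))

    # Assumed combination: scale by flyability, then add the noise.
    value = abundance * flyability + noise
    return value, flyability, noise


rng = random.Random(0)  # fixed seed so the sketch is reproducible
value, flyability, noise = sketch_peptide_value(rng)
```

With the defaults, the flyability always lies in [0, 1] and the noise is never negative, so the resulting value is at least as large as the noise term.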

Returns:

The simulated protein-peptide dataset.

Return type:

Dataset
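
A call might look like the sketch below. It uses only keyword arguments listed in the signature above; the concrete values, the EXAMPLE_KWARGS name, and the simulate_example helper are illustrative assumptions, and the molecule set and mapping name have to come from your own data. The import is done inside the helper so the snippet can be read and loaded without the package installed.

```python
# Illustrative settings: one condition group that affects half of the
# samples and a tenth of the proteins (values chosen arbitrarily).
EXAMPLE_KWARGS = {
    "samples": 6,
    "condition_samples": [0.5],      # fraction of samples in the group
    "condition_affected": [0.1],     # fraction of proteins affected
    "log2_condition_means": [1.0],   # one mean per condition group
    "log2_condition_stds": [0.2],    # one std per condition group
    "random_seed": 42,
}


def simulate_example(molecule_set, mapping):
    """Hypothetical wrapper: run the simulation with EXAMPLE_KWARGS."""
    # Imported lazily; requires pyproteonet to be installed when called.
    from pyproteonet.simulation.protein_peptide import (
        simulate_protein_peptide_dataset,
    )

    return simulate_protein_peptide_dataset(
        molecule_set=molecule_set,
        mapping=mapping,
        **EXAMPLE_KWARGS,
    )
```

The four condition lists are kept at the same length here, on the assumption that each position describes one condition group.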