pyproteonet.masking.missing_values.mask_non_missing

pyproteonet.masking.missing_values.mask_non_missing(dataset: Dataset, molecule: str, column: str, ids: Index | None = None, frac: float | None = None, random_seed: Generator | int | None = None)

Masks non-missing values in a dataset for the specified molecule and column.

Parameters:
  • dataset (Dataset) – The dataset to mask.

  • molecule (str) – The molecule to consider.

  • column (str) – The column to consider.

  • ids (Optional[pd.Index], optional) – If given, only molecules with these IDs are masked. Defaults to None.

  • frac (Optional[float], optional) – If given, only this fraction of all molecules valid for masking is masked. Defaults to None.

  • random_seed (Optional[Union[int, np.random.Generator]], optional) – An integer seed or a NumPy random generator, used to make the random selection reproducible. Defaults to None.

Returns:

The masked dataset.

Return type:

MaskedDataset
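
Example:

The following is a minimal, standalone sketch of the selection logic the parameters describe. It is not the pyproteonet implementation. It uses only pandas and NumPy, and the helper name select_ids_to_mask and the sample data are hypothetical. It assumes that "valid for masking" means non-missing (non-NaN) values, that ids is applied before frac, and that the fraction is rounded to a whole number of entries.

```python
import numpy as np
import pandas as pd


def select_ids_to_mask(values, ids=None, frac=None, random_seed=None):
    """Hypothetical helper: pick the index labels of non-missing values to mask."""
    # Only non-missing values are candidates for masking.
    candidates = values.dropna().index
    # Optionally restrict the candidates to the requested IDs.
    if ids is not None:
        candidates = candidates.intersection(ids)
    # Optionally keep only a random fraction of the remaining candidates.
    if frac is not None:
        # default_rng accepts None, an int seed, or an existing Generator.
        rng = np.random.default_rng(random_seed)
        n = int(round(frac * len(candidates)))
        chosen = rng.choice(len(candidates), size=n, replace=False)
        candidates = candidates[np.sort(chosen)]
    return candidates


# Illustrative values for one molecule type and one column; NaN marks missing.
values = pd.Series(
    [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
    index=list("abcdef"),
    name="abundance",
)

all_valid = select_ids_to_mask(values)
half = select_ids_to_mask(values, frac=0.5, random_seed=0)
restricted = select_ids_to_mask(values, ids=pd.Index(["a", "b", "c"]))
```

In this sketch, all_valid contains every label with a non-missing value, half contains a reproducible random half of them, and restricted drops "b" because its value is already missing.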