pyproteonet.imputation.dnn.gnn.heterogeneous.impute_heterogeneous_gnn

pyproteonet.imputation.dnn.gnn.heterogeneous.impute_heterogeneous_gnn(dataset: Dataset, molecule: str, column: str, mapping: str, partner_column: str, molecule_result_column: str | None = None, molecule_uncertainty_column: str | None = None, partner_result_column: str | None = None, partner_uncertainty_column: str | None = None, max_epochs: int = 5000, training_fraction: float = 0.25, train_sample_wise: bool = False, log_every_n_steps: int | None = None, early_stopping_patience: int = 7, logger: Logger | None = None, epoch_size: int = 30, missing_substitute_value: int = -2) → Series

Impute missing values using a heterogeneous graph neural network applied to the molecule graph built from two molecule types, such as proteins and their assigned peptides.
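To make the graph structure concrete, here is a minimal pure-Python sketch of how a protein-peptide mapping could be turned into a graph with two node types. The helper `build_hetero_edges`, the `(protein, peptide)` pair input and the `("protein", "maps_to", "peptide")` edge-type keys are illustrative assumptions, not pyproteonet's internal representation.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def build_hetero_edges(
    pairs: List[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, int], Dict[Tuple[str, str, str], List[Edge]]]:
    """Hypothetical helper: index each node type separately and collect edges
    in both directions so messages can flow protein -> peptide and back."""
    protein_index: Dict[str, int] = {}
    peptide_index: Dict[str, int] = {}
    forward: List[Edge] = []
    for protein, peptide in pairs:
        # Each node type gets its own 0-based index space; setdefault assigns
        # the next free index the first time a node is seen.
        p = protein_index.setdefault(protein, len(protein_index))
        q = peptide_index.setdefault(peptide, len(peptide_index))
        forward.append((p, q))
    edges = {
        ("protein", "maps_to", "peptide"): forward,
        ("peptide", "rev_maps_to", "protein"): [(q, p) for p, q in forward],
    }
    return protein_index, peptide_index, edges


pairs = [("P1", "pepA"), ("P1", "pepB"), ("P2", "pepB")]
proteins, peptides, edges = build_hetero_edges(pairs)
print(proteins)  # {'P1': 0, 'P2': 1}
print(peptides)  # {'pepA': 0, 'pepB': 1}
print(edges[("protein", "maps_to", "peptide")])  # [(0, 0), (0, 1), (1, 1)]
```

A shared peptide such as `pepB` simply receives edges to several proteins. Keeping a separate index space per node type, plus one edge list per edge type, is what distinguishes this heterogeneous layout from a single homogeneous node set.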

Parameters:
  • dataset (Dataset) – The dataset to impute.

  • molecule (str) – The main molecule type to impute (e.g. “protein”).

  • column (str) – The value column of the main molecule type to impute (e.g. “abundance”).

  • mapping (str) – The name of the mapping, connecting the main molecule type with a partner molecule type (e.g. “protein-peptide”).

  • partner_column (str) – The value column of the partner molecule type to impute.

  • molecule_result_column (Optional[str], optional) – If given, imputed values for the main molecule are stored under this name. Defaults to None.

  • molecule_uncertainty_column (Optional[str], optional) – If given, predicted uncertainty values for the main molecule are stored under this name. Defaults to None.

  • partner_result_column (Optional[str], optional) – If given, imputed values for the partner molecule are stored under this name. Defaults to None.

  • partner_uncertainty_column (Optional[str], optional) – If given, predicted uncertainty values for the partner molecule are stored under this name. Defaults to None.

  • max_epochs (int, optional) – Maximum number of training epochs. Defaults to 5000.

  • training_fraction (float, optional) – Mean fraction of molecules masked during training. The masking fraction for each epoch is drawn at random from the interval (0.5 * training_fraction, 1.5 * training_fraction). Defaults to 0.25.

  • train_sample_wise (bool, optional) – If True, each training step operates on a single sample; otherwise, it operates on the whole dataset. Defaults to False.

  • log_every_n_steps (Optional[int], optional) – How often to log during training. Defaults to None.

  • early_stopping_patience (int, optional) – Number of consecutive epochs without improvement in the training loss after which training is stopped. Defaults to 7.

  • logger (Optional[Logger], optional) – The Lightning logger used for logging. If not given, logs are printed to the console. Defaults to None.

  • epoch_size (int, optional) – Number of training runs on the dataset that make up an epoch. Defaults to 30.

  • missing_substitute_value (int, optional) – Value used to replace missing or masked values. Defaults to -2.
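The masking scheme described by training_fraction and missing_substitute_value can be sketched in pure Python as follows. This is a minimal illustration, not pyproteonet's implementation: the helper `mask_for_training`, the flat list of values, the uniform draw of the per-epoch fraction and the choice to mask only observed values are all assumptions.

```python
import random
from typing import List, Optional, Tuple


def mask_for_training(
    values: List[Optional[float]],
    training_fraction: float,
    missing_substitute_value: float,
    rng: random.Random,
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: mask a random share of the observed values for one epoch.

    Returns the network input (missing and masked entries replaced by
    missing_substitute_value) and the indices of the masked entries, whose
    original values would serve as training targets.
    """
    # Assumption: the per-epoch fraction is drawn uniformly from
    # [0.5 * training_fraction, 1.5 * training_fraction].
    fraction = rng.uniform(0.5 * training_fraction, 1.5 * training_fraction)
    # Assumption: only observed (non-missing) values are candidates for masking,
    # since missing ones have no ground truth to train against.
    observed = [i for i, v in enumerate(values) if v is not None]
    n_masked = min(len(observed), round(fraction * len(observed)))
    masked = sorted(rng.sample(observed, n_masked))
    masked_set = set(masked)
    # Missing and masked entries both receive the substitute value.
    inputs = [
        missing_substitute_value if v is None or i in masked_set else v
        for i, v in enumerate(values)
    ]
    return inputs, masked


values = [1.0, None, 3.0, 4.0, None, 6.0, 7.0, 8.0]
inputs, masked = mask_for_training(values, 0.25, -2.0, random.Random(0))
print(masked)
print(inputs)
```

In a training loop built on this sketch, a fresh mask would be drawn every epoch and the loss computed only at the `masked` positions against the original values, so the network never sees the targets it is asked to reconstruct.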

Returns:

The imputed values.

Return type:

pd.Series