pyproteonet.imputation.dnn.autoencoder.auto_encoder_impute

pyproteonet.imputation.dnn.autoencoder.auto_encoder_impute(dataset: Dataset, molecule: str, column: str, result_column: str | None = None, validation_fraction: float = 0.1, batch_size: int = 26, model_type: Literal['VAE', 'DAE'] = 'DAE', hidden_layer_dimensions: List[int] = [512], latent_dimension: int = 50, cuda: bool | None = None) DataFrame

Impute missing values using an autoencoder. The implementation is based on PIMMS (https://github.com/RasmussenLab/pimms).

Parameters:
  • dataset (Dataset) – Dataset to impute.

  • molecule (str) – Molecule type to impute (e.g. “protein” or “peptide”).

  • column (str) – Value column to impute.

  • result_column (Optional[str], optional) – Value column to store the results in. Defaults to None.

  • validation_fraction (float, optional) – Fraction of non-missing values used as validation set. Defaults to 0.1.

  • batch_size (int, optional) – Batch size for training and prediction. Defaults to 26.

  • model_type (Literal["VAE", "DAE"], optional) – “VAE” to use a variational autoencoder, “DAE” to use a denoising autoencoder. Defaults to “DAE”.

  • hidden_layer_dimensions (List[int], optional) – Sizes of the hidden neural network layers, one entry per layer. Defaults to [512].

  • latent_dimension (int, optional) – Size of the latent representation used for encoding/decoding each sample. Defaults to 50.

  • cuda (Optional[bool], optional) – Whether to run on the GPU (CUDA). If not given, CUDA is used when available. Defaults to None.
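
To illustrate what validation_fraction describes, the following is a minimal sketch of holding out a fraction of the non-missing values as a validation set. It is not the library's implementation: the helper names (is_missing, split_observed), the rounding rule, and the fixed seed are assumptions made for this example only.

```python
import math
import random


def is_missing(value):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def split_observed(values, validation_fraction=0.1, seed=0):
    """Split the indices of non-missing entries into train and validation sets."""
    observed = [i for i, v in enumerate(values) if not is_missing(v)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(observed)
    # Number of observed values to hold out (rounding rule is hypothetical).
    n_val = int(round(len(observed) * validation_fraction))
    return sorted(observed[n_val:]), sorted(observed[:n_val])


# Toy abundance values with three missing entries.
values = [20.1, None, 18.4, float("nan"), 21.7, 19.9,
          None, 22.3, 17.8, 20.5, 19.2, 21.0]
train_idx, val_idx = split_observed(values, validation_fraction=0.1)
print(len(train_idx), "training values,", len(val_idx), "validation values")
```

Missing entries are never placed in either set; only observed values can be held out, since the validation set needs a known value to compare against.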

Returns:

The imputed values.

Return type:

pd.Series
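
A call might look like the sketch below, which uses only the parameters listed in the signature above. The wrapper name impute_proteins and the column names "abundance" and "abundance_imputed" are hypothetical, and the dataset argument is assumed to be an already constructed Dataset.

```python
def impute_proteins(dataset):
    """Run autoencoder imputation on a prepared Dataset (hypothetical wrapper)."""
    # Imported lazily so this sketch can be loaded without pyproteonet installed.
    from pyproteonet.imputation.dnn.autoencoder import auto_encoder_impute

    return auto_encoder_impute(
        dataset=dataset,
        molecule="protein",                  # molecule type to impute
        column="abundance",                  # hypothetical value column name
        result_column="abundance_imputed",   # hypothetical result column name
        validation_fraction=0.1,
        model_type="VAE",                    # or "DAE" (the default)
        hidden_layer_dimensions=[512],
        latent_dimension=50,
        cuda=None,                           # let the function pick CUDA if available
    )
```

Leaving cuda as None defers the device choice to the function, as described in the parameter list.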