pyproteonet.imputation.high_level_api.impute_molecule

pyproteonet.imputation.high_level_api.impute_molecule(dataset: Dataset, molecule: str, column: str, methods: List[str] | None = None, result_columns: List[str] | None = None, mnar_percentile: float = 1, knn_k: int = 5, measure_runtime: bool = True)

Imputes missing values for a given molecule and column of a dataset using one or more imputation methods. The currently supported methods are:

Method String   Details
mindet          MinDet imputation (see min_det_impute())
minprob         MinProb imputation (see min_prob_impute())
mean            Mean imputation across samples (see across_sample_aggregate_impute() with method argument set to “mean”)
bpca            BPCA imputation (see pca_methods() with method argument set to “bpca”)
bpca_t          BPCA imputation on transposed data (see pca_methods() with method argument set to “bpca” and molecules_as_variables set to True)
missforest      MissForest imputation (see random_forest_impute())
missforest_t    MissForest imputation on transposed data (see miss_forest_impute() with molecule_as_variables argument set to True)
knn             KNN imputation (see knn_impute())
isvd            ISVD imputation (see iterative_svd_impute())
iterative       Iterative imputation from scikit-learn (see iterative_impute())
dae             Denoising Autoencoder imputation (see auto_encoder_impute() with model_type argument set to “DAE”)
vae             Variational Autoencoder imputation (see auto_encoder_impute() with model_type argument set to “VAE”)
cf              Collaborative Filtering imputation (see collaborative_filtering_impute())
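To make the simplest entry in the list concrete, the following is a minimal sketch of the idea behind the “mean” method: each missing value of a molecule is replaced by the mean of that molecule's observed values across samples. This is not the library's implementation; the helper name mean_impute_across_samples and the sample names are hypothetical, and plain Python dictionaries stand in for the Dataset.

```python
import math


def mean_impute_across_samples(values_by_sample):
    """Fill missing values (NaN) of one molecule with the mean of its observed values.

    values_by_sample maps a sample name to the value measured for a single
    molecule in that sample; math.nan marks a missing measurement.
    (Hypothetical helper for illustration, not part of pyproteonet.)
    """
    observed = [v for v in values_by_sample.values() if not math.isnan(v)]
    if not observed:
        # Nothing to average over, so return the values unchanged.
        return dict(values_by_sample)
    mean = sum(observed) / len(observed)
    return {s: (mean if math.isnan(v) else v) for s, v in values_by_sample.items()}


# Made-up example data: one molecule measured in three samples.
abundances = {"sample_a": 10.0, "sample_b": math.nan, "sample_c": 14.0}
imputed = mean_impute_across_samples(abundances)
```

Observed values are left untouched; only the NaN entry is filled.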

Parameters:
  • dataset (Dataset) – The dataset containing the values to be imputed.

  • molecule (str) – The molecule for which missing values will be imputed.

  • column (str) – The column of the molecule whose missing values will be imputed.

  • methods (Optional[List[str]], optional) – List of imputation methods to be used. Defaults to None.

  • result_columns (Optional[List[str]], optional) – List of column names in which to store the imputed values, one per entry in methods. Defaults to None.

  • mnar_percentile (float, optional) – Percentile value for missing not at random (MNAR) imputation methods. Defaults to 1.

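  • knn_k (int, optional) – Number of nearest neighbors to consider for k-nearest neighbors (KNN) imputation. Defaults to 5.

  • measure_runtime (bool, optional) – Whether to measure the runtime of the imputation methods. Defaults to True.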
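The role of mnar_percentile can be illustrated with a small sketch of low-percentile imputation in the spirit of MinDet: missing values are replaced by a low percentile of the observed values. This is an illustration only, not the library's code. It assumes the percentile is given on a 0 to 100 scale and uses linear interpolation; the helper names percentile and low_percentile_fill are hypothetical.

```python
import math


def percentile(sorted_values, q):
    """Return the q-th percentile (0-100) of sorted values via linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one observed value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def low_percentile_fill(values, mnar_percentile=1.0):
    """Replace NaN entries with a low percentile of the observed entries.

    Assumption for this sketch: mnar_percentile is on a 0-100 scale.
    """
    observed = sorted(v for v in values if not math.isnan(v))
    fill = percentile(observed, mnar_percentile)
    return [fill if math.isnan(v) else v for v in values]


# Made-up measurements with two missing entries.
values = [20.0, math.nan, 22.0, 18.0, math.nan]
filled = low_percentile_fill(values, mnar_percentile=1.0)
```

With a small percentile, the fill value sits just above the smallest observed value, which reflects the assumption that values missing not at random fall near the detection limit.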

Raises:

ValueError – If methods and result_columns do not have the same length.

Returns:

None
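A call might look like the sketch below. The import path and keyword names are taken from the signature above; the molecule name "protein", the column name "abundance", and the result column names are hypothetical placeholders, and the dataset argument is assumed to be an already constructed Dataset. The import is placed inside the function so that the snippet can be loaded without pyproteonet installed.

```python
# Method strings from the list of supported methods, paired one-to-one with
# the columns that should receive the imputed values.
METHODS = ["mindet", "knn", "missforest"]
# Hypothetical result column names chosen for this sketch.
RESULT_COLUMNS = [f"abundance_{m}" for m in METHODS]


def run_imputation(dataset):
    """Impute one column of `dataset` with several methods (illustrative sketch)."""
    # Lazy import: only needed when the function is actually called.
    from pyproteonet.imputation.high_level_api import impute_molecule

    # Mirror the documented ValueError condition before calling the library.
    if len(METHODS) != len(RESULT_COLUMNS):
        raise ValueError("methods and result_columns must have the same length")

    impute_molecule(
        dataset=dataset,
        molecule="protein",      # hypothetical molecule name
        column="abundance",      # hypothetical column with missing values
        methods=METHODS,
        result_columns=RESULT_COLUMNS,
        mnar_percentile=1,       # documented default
        knn_k=5,                 # documented default
    )
```

Because the function returns None, the imputed values are expected to be found in the columns named by result_columns rather than in a return value.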