pyproteonet.imputation.high_level_api.impute_molecule
- pyproteonet.imputation.high_level_api.impute_molecule(dataset: Dataset, molecule: str, column: str, methods: List[str] | None = None, result_columns: List[str] | None = None, mnar_percentile: float = 1, knn_k: int = 5, measure_runtime: bool = True)
Imputes missing values in a specific molecule and column of a dataset using various imputation methods. Currently supported methods are:
Method string – Details
mindet – MinDet imputation (see min_det_impute())
minprob – MinProb imputation (see min_prob_impute())
mean – Mean imputation across samples (see across_sample_aggregate_impute() with method argument set to “mean”)
bpca – BPCA imputation (see pca_methods() with method parameter set to “bpca”)
bpca_t – BPCA imputation on transposed data (see pca_methods() with method argument set to “bpca” and molecules_as_variables set to True)
missforest – MissForest imputation (see random_forest_impute())
missforest_t – MissForest imputation on transposed data (see miss_forest_impute() with molecule_as_variables argument set to True)
knn – KNN imputation (see knn_impute())
isvd – ISVD imputation (see iterative_svd_impute())
iterative – Iterative imputation from scikit-learn (see iterative_impute())
dae – Denoising Autoencoder imputation (see auto_encoder_impute() with model_type argument set to “DAE”)
vae – Variational Autoencoder imputation (see auto_encoder_impute() with model_type argument set to “VAE”)
cf – Collaborative Filtering imputation (see collaborative_filtering_impute())
- Parameters:
dataset (Dataset) – The dataset containing the values to be imputed.
molecule (str) – The molecule for which missing values will be imputed.
column (str) – The column in the dataset corresponding to the molecule.
methods (Optional[List[str]], optional) – List of imputation methods to be used. Defaults to None.
result_columns (Optional[List[str]], optional) – Column names in which to store the imputed values, one per entry in methods. Defaults to None.
mnar_percentile (float, optional) – Percentile value for missing not at random (MNAR) imputation methods. Defaults to 1.
knn_k (int, optional) – Number of nearest neighbors to consider for k-nearest neighbors (KNN) imputation. Defaults to 5.
measure_runtime (bool, optional) – Whether to measure the runtime of the imputation methods. Defaults to True.
- Raises:
ValueError – If the number of methods and result columns is not the same.
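The length requirement behind this ValueError can be illustrated with a small stand-alone sketch. The helper name pair_methods_with_columns is hypothetical and not part of pyproteonet. It only shows the one-to-one pairing between methods and result_columns that the check implies, not how the library performs the check internally.

```python
from typing import List, Tuple


def pair_methods_with_columns(
    methods: List[str], result_columns: List[str]
) -> List[Tuple[str, str]]:
    """Hypothetical helper (not a pyproteonet function).

    Pairs each method string with the column that should receive its
    imputed values, and rejects lists of different lengths.
    """
    if len(methods) != len(result_columns):
        raise ValueError(
            f"Got {len(methods)} methods but {len(result_columns)} result columns."
        )
    # zip keeps the order, so methods[i] writes to result_columns[i].
    return list(zip(methods, result_columns))


if __name__ == "__main__":
    pairs = pair_methods_with_columns(["mindet", "knn"], ["abundance_mindet", "abundance_knn"])
    for method, column in pairs:
        print(f"{method} -> {column}")
```

Under the same assumption, a call such as impute_molecule(dataset, molecule="protein", column="abundance", methods=["mindet", "knn"], result_columns=["abundance_mindet", "abundance_knn"]) passes two methods and two result columns. The molecule and column names here are placeholders.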
- Returns:
None
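To give a feel for what mnar_percentile controls, here is a minimal sketch of MinDet-style imputation in plain Python. It assumes the parameter is a percentile between 0 and 100 of the observed values and that missing values are replaced by that percentile. The function names percentile and min_det_sketch are illustrative only, and min_det_impute() in pyproteonet may differ in detail, for example in whether the percentile is taken per sample or over the whole dataset.

```python
import math
from typing import List, Optional


def percentile(values: List[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolated between order statistics."""
    if not values:
        raise ValueError("Cannot take a percentile of an empty list.")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def min_det_sketch(
    sample: List[Optional[float]], mnar_percentile: float = 1.0
) -> List[float]:
    """Illustrative only: fill missing entries (None) with a low percentile
    of the observed entries of the same list."""
    observed = [v for v in sample if v is not None]
    fill_value = percentile(observed, mnar_percentile)
    return [fill_value if v is None else v for v in sample]


if __name__ == "__main__":
    print(min_det_sketch([10.0, None, 20.0, 30.0, None], mnar_percentile=1.0))
```

A low percentile reflects the missing-not-at-random assumption that values are absent because they fell below the detection limit, so the fill value sits near the smallest observed measurements.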