pyproteonet.imputation.high_level_api.impute_molecule

pyproteonet.imputation.high_level_api.impute_molecule(dataset: Dataset, molecule: str, column: str, methods: List[str] | None = None, result_columns: List[str] | None = None, mnar_percentile: float = 1, knn_k: int = 5, measure_runtime: bool = True)

Imputes missing values in one column of one molecule of a dataset, using one or more imputation methods. The currently supported methods are:

Method String	Details
mindet	MinDet imputation (see `min_det_impute()`)
minprob	MinProb imputation (see `min_prob_impute()`)
mean	Mean imputation across samples (see `across_sample_aggregate_impute()` with method argument set to “mean”)
bpca	BPCA imputation (see `pca_methods()` with method argument set to “bpca”)
bpca_t	BPCA imputation on transposed data (see `pca_methods()` with method argument set to “bpca” and molecules_as_variables set to True)
missforest	MissForest imputation (see `random_forest_impute()`)
missforest_t	MissForest imputation on transposed data (see `miss_forest_impute()` with molecule_as_variables argument set to True)
knn	KNN imputation (see `knn_impute()`)
isvd	ISVD imputation (see `iterative_svd_impute()`)
iterative	Iterative imputation from scikit-learn (see `iterative_impute()`)
dae	Denoising Autoencoder imputation (see `auto_encoder_impute()` with model_type argument set to “DAE”)
vae	Variational Autoencoder imputation (see `auto_encoder_impute()` with model_type argument set to “VAE”)
cf	Collaborative Filtering imputation (see `collaborative_filtering_impute()`)
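
To make the simplest entry in the table concrete, here is a minimal sketch of mean imputation across samples. It is an illustration only, not the pyproteonet implementation: the helper `mean_impute_across_samples` and the dictionary layout are hypothetical, and it assumes that "across samples" means each molecule's missing values are filled with the mean of that molecule's observed values.

```python
import math
from statistics import fmean


def mean_impute_across_samples(values):
    """Fill NaNs in each molecule's row with the mean of its observed values.

    `values` maps a molecule id to its per-sample abundances
    (hypothetical layout, chosen for this illustration).
    """
    imputed = {}
    for molecule, row in values.items():
        observed = [v for v in row if not math.isnan(v)]
        # A molecule without any observed value cannot be filled and stays NaN.
        fill = fmean(observed) if observed else math.nan
        imputed[molecule] = [fill if math.isnan(v) else v for v in row]
    return imputed


abundances = {
    "pep1": [10.0, math.nan, 14.0],
    "pep2": [math.nan, math.nan, 7.0],
}
result = mean_impute_across_samples(abundances)
```

In this toy input, the gap in `pep1` is filled with the mean of its two observed values, and both gaps in `pep2` take its single observed value.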

Parameters:

dataset (Dataset) – The dataset containing the values to be imputed.
molecule (str) – The molecule for which missing values will be imputed.
column (str) – The value column of the molecule in which missing values are imputed.
methods (Optional[List[str]], optional) – List of method strings (see the table above) selecting the imputation methods to run. Defaults to None.
result_columns (Optional[List[str]], optional) – List of column names in which to store the imputed values, one per entry in methods. Defaults to None.
mnar_percentile (float, optional) – Percentile value for missing not at random (MNAR) imputation methods. Defaults to 1.
knn_k (int, optional) – Number of nearest neighbors to consider for k-nearest neighbors (KNN) imputation. Defaults to 5.
measure_runtime (bool, optional) – Whether to measure the runtime of the imputation methods. Defaults to True.

Raises:

ValueError – If methods and result_columns differ in length.
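
The length rule can be pictured with the sketch below. `check_result_columns` is a hypothetical helper written for this illustration and is not part of pyproteonet; it only mirrors the documented rule that methods and result columns must pair up one to one. The commented call at the end follows the signature above and assumes an existing Dataset named `dataset`; the molecule name "peptide" and the column names are placeholders.

```python
from typing import Dict, List


def check_result_columns(methods: List[str], result_columns: List[str]) -> Dict[str, str]:
    """Pair each method string with the column that receives its result."""
    if len(methods) != len(result_columns):
        raise ValueError(
            f"Got {len(methods)} methods but {len(result_columns)} result columns."
        )
    return dict(zip(methods, result_columns))


# Matching lengths: each method is paired with its own result column.
pairs = check_result_columns(
    ["mindet", "knn"], ["abundance_mindet", "abundance_knn"]
)

# Mismatched lengths: the check raises ValueError.
try:
    check_result_columns(["mindet", "knn"], ["abundance_mindet"])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# With pyproteonet installed and a Dataset at hand, the corresponding call
# would look like this (placeholder molecule and column names):
# impute_molecule(dataset, molecule="peptide", column="abundance",
#                 methods=["mindet", "knn"],
#                 result_columns=["abundance_mindet", "abundance_knn"])
```

Writing each method's output to its own column keeps the original column untouched, so the results of several methods can be compared side by side.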

Returns:

None