pyproteonet.imputation.r.miss_forest.miss_forest_impute

pyproteonet.imputation.r.miss_forest.miss_forest_impute(dataset: Dataset, molecule: str, column: str, result_column: str | None = None, molecules_as_variables: bool = True, ntree=100, **kwds)

Impute missing values using the MissForest method as implemented by the missForest R package, which uses a random forest to predict missing values.

Parameters:
  • dataset (Dataset) – Dataset to impute.

  • molecule (str) – Molecule type to impute (e.g. protein or peptide).

  • column (str) – Name of the value column to impute.

  • result_column (Optional[str], optional) – If given, name of the value column to store the imputed values in. Defaults to None.

  • molecules_as_variables (bool, optional) – Whether to transpose the input matrix before imputation (treating molecules instead of samples as variables for the random forest). Defaults to True.

  • ntree (int, optional) – Number of trees to use for the random forest. Defaults to 100.
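
The effect of molecules_as_variables can be illustrated with a small, library-independent sketch. It assumes the untransposed matrix has one row per molecule and one column per sample, which is what the parameter description implies but is not confirmed here. The names values and as_variable_matrix are hypothetical and not part of pyproteonet.

```python
import math

# Hypothetical toy abundance matrix (assumption: rows are molecules,
# columns are samples); math.nan marks a missing value.
values = [
    [10.0, 11.0],      # molecule P1 in samples s1, s2
    [math.nan, 9.5],   # molecule P2
    [7.2, math.nan],   # molecule P3
]


def as_variable_matrix(matrix, molecules_as_variables=True):
    """Return a matrix whose columns are the variables the forest would model."""
    if molecules_as_variables:
        # Transpose: each row is now a sample, each column a molecule.
        return [list(row) for row in zip(*matrix)]
    # No transpose: each row is a molecule, each column a sample.
    return [list(row) for row in matrix]


by_molecule = as_variable_matrix(values, molecules_as_variables=True)
by_sample = as_variable_matrix(values, molecules_as_variables=False)

print(len(by_molecule), "observations x", len(by_molecule[0]), "variables")
print(len(by_sample), "observations x", len(by_sample[0]), "variables")
```

With molecules as variables, each molecule's missing values are predicted from the other molecules across samples; without the transpose, each sample's missing values are predicted from the other samples across molecules.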

Returns:

The imputed values.

Return type:

pd.Series
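
A call might look like the following minimal sketch. The wrapper impute_protein_abundance and the column name "abundance" are hypothetical; only the import path and the keyword arguments come from the signature above. Because the method relies on the missForest R package, a working R setup is presumably required when the function is actually called.

```python
def impute_protein_abundance(dataset, column="abundance"):
    """Hypothetical helper: impute one protein value column with MissForest.

    `dataset` is expected to be a pyproteonet Dataset; "abundance" is an
    assumed column name used only for illustration.
    """
    # Imported lazily so this sketch can be loaded without pyproteonet installed.
    from pyproteonet.imputation.r.miss_forest import miss_forest_impute

    return miss_forest_impute(
        dataset=dataset,
        molecule="protein",
        column=column,
        result_column=f"{column}_imputed",  # store the imputed values in a new column
        molecules_as_variables=True,        # documented default
        ntree=100,                          # documented default
    )
```

Passing result_column leaves the original column untouched, so imputed and raw values can be compared afterwards.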