pyproteonet.metrics.differential_expression.evaluate_des

pyproteonet.metrics.differential_expression.evaluate_des(dataset: Dataset, molecule: str, columns: str | List[str], numerator_samples: List[str], denominator_samples: List[str], gt_fc: Series, min_fc: float = 1.5, max_pvalue: float = 0.05, is_log: bool = False, absolute_metrics: bool = False) → DataFrame
Compares the differentially expressed molecules found in a dataset against the known ground truth differential expression, as defined by a ground truth fold change.

Evaluation is done with respect to Precision, Recall, Specificity, Accuracy, FP Rate and F1 Score (or corresponding absolute metrics if absolute_metrics is True).
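The relative metrics listed above follow the usual confusion-matrix definitions. The sketch below illustrates those definitions on two boolean Series indexed by molecule; it is not the library's implementation, and the helper name classification_metrics and the example data are hypothetical.

```python
import pandas as pd


def classification_metrics(predicted: pd.Series, truth: pd.Series) -> dict:
    """Relative metrics from two boolean Series indexed by molecule.

    Hypothetical helper for illustration; not part of pyproteonet.
    """
    # Only compare molecules present in both Series.
    predicted, truth = predicted.align(truth, join="inner")

    tp = int((predicted & truth).sum())    # found and truly DE
    fp = int((predicted & ~truth).sum())   # found but not truly DE
    fn = int((~predicted & truth).sum())   # truly DE but missed
    tn = int((~predicted & ~truth).sum())  # correctly not reported

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "Precision": ratio(tp, tp + fp),
        "Recall": ratio(tp, tp + fn),
        "Specificity": ratio(tn, tn + fp),
        "Accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "FP Rate": ratio(fp, fp + tn),
        "F1 Score": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up example: five molecules, three reported as DE, three truly DE.
predicted = pd.Series([True, True, False, False, True], index=list("abcde"))
truth = pd.Series([True, False, False, True, True], index=list("abcde"))
metrics = classification_metrics(predicted, truth)
```

With absolute_metrics set to True, the function reports absolute numbers of correctly and incorrectly found DEs rather than ratios like these.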

Parameters:
  • dataset (Dataset) – The dataset to find differentially expressed molecules in.

  • molecule (str) – The molecule type to find differentially expressed molecules for.

  • columns (Union[str, List[str]]) – The value column(s) containing the abundance values of potentially differentially expressed molecules.

  • numerator_samples (List[str]) – List of sample names to use as the numerator when computing the fold change.

  • denominator_samples (List[str]) – List of sample names to use as the denominator when computing the fold change.

  • gt_fc (pd.Series) – Ground truth fold change for each molecule.

  • min_fc (float, optional) – Minimum fold change required for a molecule to be considered differentially expressed. Applies to both increases and decreases in abundance (e.g. a minimum fold change of 2 means that fold changes of both 2 and 0.5 are considered potentially differentially expressed). Defaults to 1.5.

  • max_pvalue (float, optional) – P-value used as the significance threshold. Defaults to 0.05.

  • is_log (bool, optional) – Whether the column values are logarithmized. Defaults to False.

  • absolute_metrics (bool, optional) – Whether to return absolute numbers of correctly/incorrectly found DEs or whether to use relative metrics (Precision, Recall …). Defaults to False.
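The symmetric behaviour of min_fc described above can be sketched as a threshold on the absolute log fold change. This is an illustration only, not the library's code: the helper name passes_fc_threshold is hypothetical, and the use of base-2 logarithms for already-logarithmized values is an assumption.

```python
import numpy as np
import pandas as pd


def passes_fc_threshold(fc: pd.Series, min_fc: float = 1.5, is_log: bool = False) -> pd.Series:
    """Flag molecules whose fold change is at least min_fc in either direction.

    Hypothetical helper for illustration; not part of pyproteonet.
    Assumes base-2 logarithms when is_log is True.
    """
    # Move everything to log space so that increases and decreases are symmetric.
    log_fc = fc if is_log else np.log2(fc)
    return log_fc.abs() >= np.log2(min_fc)


# Made-up fold changes: a threefold increase, a fourfold decrease, a small change.
fc = pd.Series({"up": 3.0, "down": 0.25, "flat": 1.2})
mask = passes_fc_threshold(fc, min_fc=2.0)
```

In this sketch both "up" and "down" pass a minimum fold change of 2, while "flat" does not. The function additionally applies the max_pvalue threshold, which the sketch leaves out.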

Returns:

The evaluation results according to the calculated metrics.

Return type:

pd.DataFrame