Precision-Recall Curve

Evaluate model performance on imbalanced datasets by visualizing precision vs recall trade-offs

Use me when your dataset has far more negatives than positives — and a good-looking ROC curve is hiding how badly your model actually performs on the rare class you care about. I'm the honest evaluation tool for fraud detection, rare disease screening, anomaly detection, and any other problem where positives are precious and the baseline is easy to beat by simply predicting "no" every time.

Overview

A Precision-Recall (PR) curve plots Precision (the fraction of positive predictions that are actually positive) on the Y axis against Recall (the fraction of actual positives that were correctly found) on the X axis, sweeping across every possible classification threshold. Each point on the curve answers: "If I set my threshold here, how confident can I be in each positive prediction, and what proportion of real positives am I catching?"

The Average Precision (AP) summarises the curve as the weighted mean of precisions at each threshold, equivalent to the area under the PR curve:

  • High AP (near 1.0) — model maintains high precision even at high recall; strong separator
  • Low AP (near class prevalence) — model barely outperforms a random guesser that always predicts positive

The baseline for a PR curve is a horizontal line at the class prevalence (e.g., y = 0.1 if 10% of samples are positive). Any useful classifier must sit consistently above this line.
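The curve, the AP summary, and the prevalence baseline can be computed with scikit-learn. This is a minimal sketch on a synthetic imbalanced dataset; the dataset, model, and split below are illustrative, not part of this plot node:

```python
# Sketch: PR curve, AP, and prevalence baseline on a synthetic
# imbalanced dataset (10% positives).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]      # positive-class probabilities

precision, recall, thresholds = precision_recall_curve(y_te, scores)
ap = average_precision_score(y_te, scores)
baseline = y_te.mean()                        # class prevalence

print(f"AP = {ap:.3f}, baseline = {baseline:.3f}")
```

Each `(recall[i], precision[i])` pair is one point on the curve; plotting recall on the X axis and precision on the Y axis, with a horizontal line at `baseline`, reproduces the plot described above.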

Requires a trained model. This plot belongs to the evaluation category and uses training data. You must have a trained model node upstream in your pipeline before this plot can be generated.

Best used for:

  • Evaluating classifiers on datasets with rare positive classes
  • Choosing a threshold that balances precision against recall for your specific cost structure
  • Comparing models in settings where false positives and false negatives have very different business costs
  • Diagnosing whether a high ROC AUC is masking poor positive-class performance
  • Communicating trade-offs between catching more cases vs. avoiding false alarms

Common Use Cases

Imbalanced Classification

  • Fraud detection — fraudulent transactions may be < 0.1% of all transactions; ROC AUC can be high even when the model misses most fraud. AP reveals the real story.
  • Medical screening for rare conditions — catching every positive case (high recall) may be mandatory, but generating too many false referrals (low precision) wastes clinical resources.
  • Anomaly detection in manufacturing — defects are rare; precision tells you how many alarms require actual investigation.
  • Content moderation — spam, abuse, or harmful content is a small fraction of all content; PR curves help tune the trade-off between over- and under-blocking.

Threshold Selection

Different applications have very different costs for false positives vs. false negatives. The PR curve lets you visualise every possible threshold and pick the operating point that fits your cost structure before deployment.

Model Comparison

Plot multiple models' PR curves on the same axes. The model with a curve consistently closer to the top-right corner — and a higher AP — better handles the positive class across all thresholds.

Settings

Show Average Precision

Optional — Display the AP (area under the PR curve) score in the legend.

Default: On

When enabled, the AP score is appended to the trace name in the legend (e.g., PR Curve (AP = 0.74)). AP is a threshold-independent, single-number summary of positive-class performance and is the standard reporting metric for PR curve quality.

Interpreting the Precision-Recall Curve

Reading the Curve

Top-right corner (ideal): A model that achieves both high precision and high recall simultaneously sits near the top-right corner. In practice, there is always a trade-off — as recall increases (lower threshold), precision typically drops.

Baseline (horizontal dashed line): The baseline represents a random classifier that scores every example with the same constant. Its precision equals the class prevalence (e.g., 0.2 if 20% of samples are positive). Any curve above this line beats random chance; the higher above it, the better.

Steep drop-off: A curve that holds high precision until moderate recall, then drops sharply, indicates the model is highly confident about its top predictions but struggles to find the remaining positive cases without collecting many false positives.

The Precision-Recall Trade-off

Changing the classification threshold moves you along the PR curve in opposite directions:

  Threshold direction                              Recall      Precision
  Lower threshold (predict positive more often)    Increases   Tends to decrease
  Higher threshold (predict positive less often)   Decreases   Tends to increase

This fundamental trade-off means you cannot simply maximise both — you must choose based on which type of error is more costly in your application.
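The trade-off is easy to see by scoring the same predictions at several thresholds. The labels and scores below are made up purely to illustrate the direction of movement:

```python
# Sweep the threshold downward: recall can only rise, precision
# typically falls. Labels and scores are synthetic.
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.9])

for t in (0.8, 0.5, 0.3):
    y_pred = (scores >= t).astype(int)
    p = precision_score(y_true, y_pred, zero_division=0)
    r = recall_score(y_true, y_pred)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```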

Choosing the Optimal Threshold

  1. Equal-cost criterion — pick the point closest to (Recall=1, Precision=1), the ideal corner.
  2. Precision-priority (few false alarms) — move to higher threshold; accept lower recall.
  3. Recall-priority (catch everything) — move to lower threshold; accept lower precision.
  4. F1-score maximum — find the point on the curve that maximises 2 × (P × R) / (P + R).
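Criterion 4 can be automated directly from the curve arrays. A sketch, using synthetic labels and scores (note that `precision_recall_curve` returns one more precision/recall value than thresholds, so the final point is dropped):

```python
# Pick the F1-maximising threshold from the PR curve (synthetic data).
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
# Drop the final (precision=1, recall=0) point, which has no threshold.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best = np.argmax(f1)
print(f"best threshold = {thresholds[best]:.2f}, F1 = {f1[best]:.2f}")
```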

Average Precision as a Single Number

AP is computed as the weighted mean of precisions at each recall level, weighted by the change in recall between consecutive thresholds. It is equivalent to the area under the PR curve and ranges from the class prevalence (worst) to 1.0 (best). AP is preferred over ROC AUC when reporting results on imbalanced benchmarks.
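The weighted-mean definition above can be checked against scikit-learn's `average_precision_score` directly; with recall decreasing along the returned arrays, the sum of precision times recall change reduces to one NumPy expression. Labels and scores here are synthetic:

```python
# AP as the recall-weighted mean of precisions, verified against
# scikit-learn's implementation (synthetic data).
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

y_true = np.array([0, 1, 0, 1, 1, 0, 0, 1])
scores = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

precision, recall, _ = precision_recall_curve(y_true, scores)
# AP = sum over thresholds of (R_n - R_{n-1}) * P_n
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
ap_sklearn = average_precision_score(y_true, scores)
print(f"manual AP = {ap_manual:.4f}, sklearn AP = {ap_sklearn:.4f}")
```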

PR Curve vs ROC Curve

  Situation                                             Prefer
  Balanced classes                                      ROC Curve
  Highly imbalanced classes (rare positives)            PR Curve
  You care about both classes equally                   ROC Curve
  You mostly care about the positive class              PR Curve
  Comparing across datasets with different prevalence   PR Curve

The key insight: on imbalanced data, a model can achieve high ROC AUC simply because it correctly classifies the abundant negative class. The PR curve ignores true negatives entirely, so it cannot be gamed this way.
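This gap is easy to reproduce with hand-built scores: 1000 negatives, 10 positives, and a scorer that ranks most negatives below most positives. ROC AUC looks healthy, while AP exposes how many false positives accompany each true positive. The score distributions below are synthetic and chosen only to illustrate the effect:

```python
# ROC AUC vs AP on heavily imbalanced synthetic scores.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

neg_scores = np.linspace(0.0, 0.89, 1000)   # 1000 negatives
pos_scores = np.linspace(0.60, 0.99, 10)    # only 10 positives

y_true = np.concatenate([np.zeros(1000), np.ones(10)])
scores = np.concatenate([neg_scores, pos_scores])

roc_auc = roc_auc_score(y_true, scores)
ap = average_precision_score(y_true, scores)
print(f"ROC AUC = {roc_auc:.2f}")   # looks strong
print(f"AP      = {ap:.2f}")        # reveals weak positive-class precision
```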

Tips for Effective Use

  1. Always show the baseline — the horizontal line at class prevalence is your anchor. A curve that barely rises above it signals a model that has learned very little about the positive class.

  2. Report AP, not just the curve — stakeholders need a single number for model comparison. AP is the standard choice for imbalanced evaluation benchmarks.

  3. Cross-check with the ROC curve — if ROC AUC is high but AP is low, your model is doing well on the majority class but poorly on the minority class you care about.

  4. Use the F1-score to pick a threshold — for balanced precision/recall importance, the F1-maximum point on the PR curve is a principled default threshold.

  5. Consider class-weighted AP for multi-class problems — macro-average AP weights all classes equally; micro-average AP weights by class frequency. Choose based on whether rare classes matter as much as common ones.

  6. Combine with the Confusion Matrix — once you select a threshold from the PR curve, validate TP/FP/FN counts in the Confusion Matrix to ensure the threshold performs as expected in absolute terms.
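Tip 6 in code form: once a threshold is chosen from the curve, binarise the scores and inspect the absolute counts. The labels, scores, and threshold below are illustrative placeholders:

```python
# Validate a chosen threshold with absolute TP/FP/FN/TN counts
# (synthetic labels/scores; threshold is illustrative).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.6, 0.7, 0.8, 0.2, 0.9])

threshold = 0.5                     # e.g. the F1-maximum from the PR curve
y_pred = (scores >= threshold).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```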

Related Plots

  • ROC Curve — evaluates classifier quality across thresholds; more optimistic on imbalanced data; use alongside PR for a complete picture
  • Confusion Matrix — shows the full error breakdown at a single chosen threshold
  • SHAP Feature Impact — explains which features drive the classifier's positive-class predictions
  • SHAP Dependence Plot — examines how individual feature values push predictions toward or away from the positive class
