Precision-Recall Curve
Evaluate model performance on imbalanced datasets by visualizing precision vs recall trade-offs
Use me when your dataset has far more negatives than positives — and a good-looking ROC curve is hiding how badly your model actually performs on the rare class you care about. I'm the honest evaluation tool for fraud detection, rare disease screening, anomaly detection, and any other problem where positives are precious and the baseline is easy to beat by simply predicting "no" every time.
Overview
A Precision-Recall (PR) curve plots Precision (the fraction of positive predictions that are actually positive) on the Y axis against Recall (the fraction of actual positives that were correctly found) on the X axis, sweeping across every possible classification threshold. Each point on the curve answers: "If I set my threshold here, how confident can I be in each positive prediction, and what proportion of real positives am I catching?"
The Average Precision (AP) summarises the curve as the weighted mean of precisions at each threshold, equivalent to the area under the PR curve:
- High AP (near 1.0) — model maintains high precision even at high recall; strong separator
- Low AP (near class prevalence) — model barely outperforms a random guesser that always predicts positive
The baseline for a PR curve is a horizontal line at the class prevalence (e.g., y = 0.1 if 10% of samples are positive). Any useful classifier must sit consistently above this line.
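As a concrete reference, the quantities above can be reproduced with scikit-learn. The dataset, model, and variable names below are illustrative placeholders rather than this tool's internal implementation; the sketch simply shows how the curve, the AP score, and the prevalence baseline are computed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.model_selection import train_test_split

# Toy imbalanced dataset: roughly 5% positives (illustrative only)
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Any probabilistic classifier works; logistic regression keeps the sketch small
y_score = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

# One (precision, recall) pair per candidate threshold
precision, recall, thresholds = precision_recall_curve(y_test, y_score)

# AP summarises the curve; the baseline is simply the positive-class prevalence
ap = average_precision_score(y_test, y_score)
baseline = y_test.mean()
print(f"AP = {ap:.3f}  (random baseline = {baseline:.3f})")
```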
Requires a trained model. This plot belongs to the evaluation category and uses training data. You must have a trained model node upstream in your pipeline before this plot can be generated.
Best used for:
- Evaluating classifiers on datasets with rare positive classes
- Choosing a threshold that balances precision against recall for your specific cost structure
- Comparing models in settings where false positives and false negatives have very different business costs
- Diagnosing whether a high ROC AUC is masking poor positive-class performance
- Communicating trade-offs between catching more cases vs. avoiding false alarms
Common Use Cases
Imbalanced Classification
- Fraud detection — fraudulent transactions may be < 0.1% of all transactions; ROC AUC can be high even when the model misses most fraud. AP reveals the real story.
- Medical screening for rare conditions — catching every positive case (high recall) may be mandatory, but generating too many false referrals (low precision) wastes clinical resources.
- Anomaly detection in manufacturing — defects are rare; precision tells you how many alarms require actual investigation.
- Content moderation — spam, abuse, or harmful content is a small fraction of all content; PR curves help tune the trade-off between over- and under-blocking.
Threshold Selection
Different applications have very different costs for false positives vs. false negatives. The PR curve lets you visualise every possible threshold and pick the operating point that fits your cost structure before deployment.
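One way to make this concrete is to score every candidate threshold against assumed per-error costs and keep the cheapest. The labels, scores, and the 10:1 cost ratio below are made-up values for the sketch, not recommendations.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Illustrative true labels and predicted positive-class probabilities
y_true  = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
y_score = np.array([.05, .10, .15, .20, .25, .30, .35, .40, .50, .60, .70, .75, .80, .90])

# Hypothetical cost structure: a missed positive (FN) costs 10x a false alarm (FP)
cost_fn, cost_fp = 10.0, 1.0

_, _, thresholds = precision_recall_curve(y_true, y_score)

# Evaluate the total cost at every candidate threshold and keep the cheapest
costs = []
for t in thresholds:
    pred = (y_score >= t).astype(int)
    fn = int(((y_true == 1) & (pred == 0)).sum())   # positives the model misses
    fp = int(((y_true == 0) & (pred == 1)).sum())   # false alarms it raises
    costs.append(cost_fn * fn + cost_fp * fp)

best = int(np.argmin(costs))
print(f"cost-optimal threshold = {thresholds[best]:.2f} (total cost = {costs[best]:.1f})")
```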
Model Comparison
Plot multiple models' PR curves on the same axes. The model with a curve consistently closer to the top-right corner — and a higher AP — better handles the positive class across all thresholds.
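A minimal matplotlib sketch of this comparison is shown below; the two model choices and the synthetic dataset are arbitrary stand-ins for whatever candidates you are evaluating.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.model_selection import train_test_split

# Illustrative imbalanced dataset and two candidate models
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    score = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    precision, recall, _ = precision_recall_curve(y_te, score)
    ap = average_precision_score(y_te, score)
    plt.plot(recall, precision, label=f"{name} (AP = {ap:.2f})")

# Baseline: horizontal line at the positive-class prevalence
plt.axhline(y_te.mean(), linestyle="--", color="grey", label="Random baseline")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```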
Settings
Show Average Precision
Optional — Display the AP (area under the PR curve) score in the legend.
Default: On
When enabled, the AP score is appended to the trace name in the legend (e.g., PR Curve (AP = 0.74)). AP is a threshold-independent, single-number summary of positive-class performance and is the standard reporting metric for PR curve quality.
Interpreting the Precision-Recall Curve
Reading the Curve
Top-right corner (ideal): A model that achieves both high precision and high recall simultaneously sits near the top-right corner. In practice, there is always a trade-off — as recall increases (lower threshold), precision typically drops.
Baseline (horizontal dashed line): The baseline represents a classifier with no ranking ability, such as one that scores examples at random. Its precision equals the class prevalence at every recall level (e.g., 0.2 if 20% of samples are positive). Any curve above this line beats random chance; the higher above it, the better.
Steep drop-off: A curve that holds high precision until moderate recall, then drops sharply, indicates the model is highly confident about its top predictions but struggles to find the remaining positive cases without collecting many false positives.
The Precision-Recall Trade-off
Changing the classification threshold moves you along the PR curve in opposite directions:
| Threshold direction | Recall | Precision |
|---|---|---|
| Lower threshold (predict positive more often) | Increases | Tends to decrease |
| Higher threshold (predict positive less often) | Decreases | Tends to increase |
This fundamental trade-off means you cannot simply maximise both — you must choose based on which type of error is more costly in your application.
Choosing the Optimal Threshold
- Equal-cost criterion — pick the point closest to (Recall=1, Precision=1), the ideal corner.
- Precision-priority (few false alarms) — move to higher threshold; accept lower recall.
- Recall-priority (catch everything) — move to lower threshold; accept lower precision.
- F1-score maximum — find the point on the curve that maximises 2 × (P × R) / (P + R), as in the sketch below.
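Assuming predicted probabilities are available, the F1-maximum point can be located directly from the arrays returned by scikit-learn's `precision_recall_curve`; the labels and scores below are illustrative.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Illustrative true labels and predicted positive-class probabilities
y_true  = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1])
y_score = np.array([.02, .10, .20, .25, .30, .45, .50, .55, .60, .65, .70, .75, .80, .90])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Drop the final (precision=1, recall=0) point, which has no associated threshold
p, r = precision[:-1], recall[:-1]

# F1 = 2PR / (P + R); the small epsilon guards against division by zero
f1 = 2 * p * r / (p + r + 1e-12)
best = np.argmax(f1)
print(f"F1-optimal threshold = {thresholds[best]:.2f} "
      f"(P = {p[best]:.2f}, R = {r[best]:.2f}, F1 = {f1[best]:.2f})")
```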
Average Precision as a Single Number
AP is computed as the weighted mean of precisions at each recall level, weighted by the change in recall between consecutive thresholds. It is equivalent to the area under the PR curve and ranges from the class prevalence (worst) to 1.0 (best). AP is preferred over ROC AUC when reporting results on imbalanced benchmarks.
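To make the weighting explicit, the sketch below recomputes AP by hand from the `precision_recall_curve` output and checks it against scikit-learn's `average_precision_score` (toy data, illustrative only).

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true  = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1])
y_score = np.array([.02, .10, .20, .25, .30, .45, .50, .55, .60, .65, .70, .75, .80, .90])

precision, recall, _ = precision_recall_curve(y_true, y_score)

# AP = sum over thresholds of (R_n - R_{n-1}) * P_n.
# recall is returned in decreasing order, so negate the differences.
ap_manual = -np.sum(np.diff(recall) * precision[:-1])

print(ap_manual, average_precision_score(y_true, y_score))  # the two values agree
```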
PR Curve vs ROC Curve
| Situation | Prefer |
|---|---|
| Balanced classes | ROC Curve |
| Highly imbalanced classes (rare positives) | PR Curve |
| You care about both classes equally | ROC Curve |
| You mostly care about the positive class | PR Curve |
| Comparing across datasets with different prevalence | PR Curve |
The key insight: on imbalanced data, a model can achieve high ROC AUC simply because it correctly classifies the abundant negative class. The PR curve ignores true negatives entirely, so it cannot be gamed this way.
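A quick synthetic experiment makes this concrete: on heavily imbalanced data, ROC AUC typically reads far more favourably than AP. The exact numbers depend on the random seed and class separability chosen below, both of which are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

# Heavily imbalanced synthetic data: roughly 1% positives
X, y = make_classification(n_samples=20000, weights=[0.99, 0.01],
                           class_sep=0.8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

score = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# ROC AUC tends to look flattering here; AP exposes the positive-class weakness
print(f"ROC AUC = {roc_auc_score(y_te, score):.3f}")
print(f"AP      = {average_precision_score(y_te, score):.3f} "
      f"(prevalence = {y_te.mean():.3f})")
```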
Tips for Effective Use
- Always show the baseline — the horizontal line at class prevalence is your anchor. A curve that barely rises above it signals a model that has learned very little about the positive class.
- Report AP, not just the curve — stakeholders need a single number for model comparison. AP is the standard choice for imbalanced evaluation benchmarks.
- Cross-check with the ROC curve — if ROC AUC is high but AP is low, your model is doing well on the majority class but poorly on the minority class you care about.
- Use the F1-score to pick a threshold — for balanced precision/recall importance, the F1-maximum point on the PR curve is a principled default threshold.
- Consider class-weighted AP for multi-class problems — macro-average AP weights all classes equally; micro-average AP weights by class frequency. Choose based on whether rare classes matter as much as common ones (see the sketch after this list).
- Combine with the Confusion Matrix — once you select a threshold from the PR curve, validate TP/FP/FN counts in the Confusion Matrix to ensure the threshold performs as expected in absolute terms.
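As referenced in the class-weighted AP tip above, here is a sketch of macro- vs micro-averaged AP on an illustrative three-class problem with one rare class; the dataset and model are placeholders for your own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Illustrative 3-class problem with one rare class (~5% of samples)
X, y = make_classification(n_samples=6000, n_classes=3, n_informative=6,
                           weights=[0.7, 0.25, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)
y_bin = label_binarize(y_te, classes=[0, 1, 2])   # one-vs-rest indicator matrix

# Macro treats every class equally; micro pools decisions, so common classes dominate
print("macro AP:", average_precision_score(y_bin, proba, average="macro"))
print("micro AP:", average_precision_score(y_bin, proba, average="micro"))
```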
Related Visualizations
- ROC Curve — evaluates classifier quality across thresholds; more optimistic on imbalanced data; use alongside PR for a complete picture
- Confusion Matrix — shows the full error breakdown at a single chosen threshold
- SHAP Feature Impact — explains which features drive the classifier's positive-class predictions
- SHAP Dependence Plot — examines how individual feature values push predictions toward or away from the positive class