ROC Curve
Visualize the trade-off between true positive rate and false positive rate across decision thresholds
Use me when you need to understand how well your classifier separates classes — at every possible decision threshold, not just the default one. I show you the full performance envelope of your model, let you compare multiple models on the same axes, and help you pick the operating point that balances the cost of false alarms against the cost of missed detections.
Overview
An ROC (Receiver Operating Characteristic) curve plots the True Positive Rate (TPR, also called sensitivity or recall) on the Y axis against the False Positive Rate (FPR, also called 1 − specificity) on the X axis, sweeping across every possible classification threshold from 0 to 1. Each point on the curve answers: "If I set my threshold here, how many real positives do I catch, and how many negatives do I accidentally flag?"
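To make those definitions concrete, here is a minimal sketch in plain NumPy (the labels and scores are made up for illustration) that computes TPR and FPR at a single threshold; sweeping the threshold traces out the full curve:

```python
import numpy as np

# Hypothetical ground-truth labels and predicted positive-class scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

threshold = 0.5
y_pred = (scores >= threshold).astype(int)

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))

tpr = tp / (tp + fn)  # sensitivity / recall: real positives caught
fpr = fp / (fp + tn)  # negatives accidentally flagged
print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```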
The Area Under the Curve (AUC) summarises the entire curve into one number:
- 0.5 — random classifier (the diagonal chance-level line)
- 0.7–0.8 — acceptable
- 0.8–0.9 — good
- > 0.9 — excellent
- 1.0 — perfect separation (suspicious in practice; often a sign of overfitting or data leakage)
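For reference, the full curve and its AUC can be computed with scikit-learn; a minimal sketch with the same kind of toy data:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # toy labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])  # toy P(class = 1)

# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(f"AUC = {roc_auc_score(y_true, y_score):.3f}")  # 0.5 ~ chance, 1.0 = perfect
```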
Requires a trained model. This plot belongs to the evaluation category and uses training data; a trained model node must sit upstream in your pipeline before this plot can be generated.
Best used for:
- Evaluating binary or multi-class classifiers independent of any fixed threshold
- Comparing multiple models side-by-side on the same axes
- Selecting the optimal decision threshold for your application
- Diagnosing whether a model is better than random chance
- Communicating model quality to stakeholders with a single AUC number
Common Use Cases
Binary Classification
- Fraud detection: tune the threshold to control false-alarm rate for investigators
- Medical screening: find the threshold that catches a target percentage of positive cases
- Spam filtering: balance precision against recall for end-user experience
- Credit scoring: compare classifier families (logistic regression, gradient boosting, neural networks) on the same data
Multi-Class Classification
AICU generates one ROC curve per class (one-vs-rest), plus optional micro-average and macro-average curves, letting you see per-class performance at a glance.
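As a rough sketch of what one-vs-rest computation looks like (toy labels and a hypothetical probs matrix of per-class probabilities; not AICU's internal code):

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, roc_auc_score

classes = [0, 1, 2]
y_true = np.array([0, 1, 2, 1, 0, 2])  # toy multi-class labels
probs = np.array([                      # toy (n_samples, n_classes) probabilities
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
    [0.1, 0.3, 0.6],
])

y_bin = label_binarize(y_true, classes=classes)  # one indicator column per class

for k in classes:
    # One-vs-rest: class k is "positive", every other class is "negative"
    fpr_k, tpr_k, _ = roc_curve(y_bin[:, k], probs[:, k])
    print(f"class {k}: AUC = {roc_auc_score(y_bin[:, k], probs[:, k]):.3f}")

# Micro-average pools all (label, score) pairs; macro-average is the
# unweighted mean over classes
print("micro AUC:", roc_auc_score(y_bin, probs, average="micro"))
print("macro AUC:", roc_auc_score(y_bin, probs, average="macro"))
```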
Model Selection & Comparison
Plot multiple models on the same axes. The model whose curve sits furthest toward the upper-left corner — and carries the highest AUC — is generally the stronger performer, independent of any threshold choice.
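A minimal matplotlib sketch of this kind of comparison, with hypothetical scores from two candidate models and the chance-level diagonal for reference:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical shared labels and scores from two candidate models
y_true   = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores_a = np.array([0.2, 0.3, 0.6, 0.8, 0.1, 0.9, 0.4, 0.7])
scores_b = np.array([0.3, 0.2, 0.5, 0.6, 0.4, 0.7, 0.5, 0.8])

for name, scores in [("Model A", scores_a), ("Model B", scores_b)]:
    fpr, tpr, _ = roc_curve(y_true, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y_true, scores):.2f})")

plt.plot([0, 1], [0, 1], "k--", label="Chance (AUC = 0.50)")  # diagonal baseline
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```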
Settings
Show AUC
Optional — Display the Area Under the Curve value in the legend.
Default: On
When enabled, the AUC score is appended to the trace name in the legend (e.g., ROC Curve (AUC = 0.89)). This gives a quick, threshold-independent summary of classifier quality alongside the visual curve.
Show Chance Level
Optional — Draw a diagonal dashed reference line from (0, 0) to (1, 1).
Default: On
The chance-level line represents a random classifier with AUC = 0.5. Any curve above this line performs better than chance; curves close to it indicate the model has learned very little. Keeping this visible provides an immediate visual anchor for interpreting curve quality.
Interpreting the ROC Curve
Reading the Curve
Upper-left corner (ideal): A curve that hugs the top-left corner achieves high TPR while keeping FPR near zero — i.e., the model catches most positives while raising few false alarms. The closer the curve bows to the corner, the better.
Diagonal line (random): A model that simply guesses based on class prevalence sits on the diagonal. This is your baseline — any useful classifier must consistently lie above it.
Below the diagonal: Rare in practice; it means the model is anti-correlated with the true labels. Flipping its predictions would outperform the original — usually a sign of a labelling or data-leakage bug.
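Because AUC is rank-based, negating the scores mirrors the curve across the diagonal, so a below-chance model can be "flipped". A small sketch of this sanity check (the helper name is ours, not a library function):

```python
from sklearn.metrics import roc_auc_score

def check_orientation(y_true, y_score):
    """Flag classifiers that score below chance (AUC < 0.5)."""
    auc = roc_auc_score(y_true, y_score)
    if auc < 0.5:
        # AUC is rank-based: negating the scores mirrors the curve across
        # the diagonal, so AUC(-scores) == 1 - AUC(scores)
        print(f"AUC = {auc:.3f} is below chance; flipped scores would give {1 - auc:.3f}.")
        print("Check label encoding and look for data leakage before trusting either number.")
    return auc
```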
Choosing the Optimal Threshold
The ROC curve contains every possible threshold operating point. To choose one:
- Equal-cost criterion — when false positives and false negatives cost roughly the same, pick the point on the curve closest to the perfect corner (FPR = 0, TPR = 1).
- Cost-sensitive criterion — if false negatives are more costly than false positives (e.g., disease screening), shift the operating point up and to the right along the curve (a lower threshold), accepting a higher FPR in exchange for a higher TPR.
- Youden's J statistic — maximise TPR − FPR; this is the point where the curve is farthest above the diagonal (see the sketch below).
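A minimal sketch of threshold selection via Youden's J, using scikit-learn's roc_curve output on toy data:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # toy labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])  # toy scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
best = np.argmax(tpr - fpr)  # Youden's J: point farthest above the diagonal
print(f"threshold = {thresholds[best]:.2f}: TPR = {tpr[best]:.2f}, FPR = {fpr[best]:.2f}")
```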
AUC as a Single Number
AUC has a probabilistic interpretation: it equals the probability that the model ranks a randomly chosen positive example higher than a randomly chosen negative one. An AUC of 0.89 means there is an 89% chance the model scores a true positive above a true negative — useful for comparing classifiers regardless of dataset imbalance.
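This interpretation is easy to verify empirically. A small sketch comparing a direct pairwise-ranking estimate against roc_auc_score on synthetic scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, 1000)   # synthetic scores for negatives
pos = rng.normal(1.2, 1.0, 1000)   # positives tend to score higher

y_true  = np.r_[np.zeros(1000), np.ones(1000)]
y_score = np.r_[neg, pos]

# Fraction of (positive, negative) pairs ranked correctly
pairwise = (pos[:, None] > neg[None, :]).mean()
print(f"pairwise estimate = {pairwise:.3f}")
print(f"roc_auc_score     = {roc_auc_score(y_true, y_score):.3f}")  # matches
```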
ROC vs Precision-Recall Curve
| Situation | Prefer |
|---|---|
| Balanced classes | ROC / AUC |
| Highly imbalanced classes (rare positives) | Precision-Recall Curve |
| You care equally about both classes | ROC / AUC |
| You mostly care about the positive class | Precision-Recall Curve |
Tips for Effective Use
- Always show the chance-level line — it provides instant visual context for whether the model is useful at all.
- Report AUC alongside the curve — stakeholders often need a single number; AUC is threshold-independent and widely understood.
- Compare models on the same axes — plot multiple ROC curves together. Curves that cross indicate one model is better at low FPR, another at high FPR; AUC alone won't reveal this.
- Check AUC on held-out data — a training-set AUC near 1.0 with a validation AUC near 0.5 is a strong overfitting signal (see the sketch after this list).
- For imbalanced data, cross-check with the PR curve — ROC curves can look optimistic when negatives vastly outnumber positives. The Precision-Recall Curve is more informative in that regime.
- Use with the Confusion Matrix — once you pick a threshold from the ROC curve, validate the resulting TP/FP/TN/FN counts in the Confusion Matrix.
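As a sketch of the held-out-data check from the tips above (scikit-learn toy data, with a random forest standing in for whatever model your pipeline trains):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
auc_tr  = roc_auc_score(y_tr,  model.predict_proba(X_tr)[:, 1])
auc_val = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# A large gap (e.g., near-1.0 train vs near-0.5 validation) signals overfitting
print(f"train AUC = {auc_tr:.3f}, validation AUC = {auc_val:.3f}")
```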
Related Visualizations
- Precision-Recall Curve — better suited to imbalanced datasets; plots precision vs recall instead of TPR vs FPR
- Confusion Matrix — shows the full error breakdown at a single chosen threshold
- SHAP Feature Impact — explains which features drive the classifier's decisions
- SHAP Dependence Plot — examines how individual feature values affect predictions