Actual vs Predicted
Visualize regression model performance by comparing predictions against ground truth
Use me when you want to see how close your model's guesses are to reality. I'll show you whether your regression model is nailing it (points hugging the diagonal), systematically overshooting or undershooting (point cloud shifted above or below the line), or just all over the place. One glance often tells you more about model quality than a single summary number can.
Overview
An actual vs predicted plot is a scatter plot where each point represents one sample from your dataset. The x-axis shows the true (actual) value and the y-axis shows the value your model predicted. A perfect model would have every point sitting exactly on the diagonal y = x line — the further points stray from it, the larger the prediction error.
Best used for:
- Assessing overall regression model performance at a glance
- Detecting systematic bias (all predictions consistently too high or too low)
- Identifying ranges where the model struggles or excels
- Spotting heteroscedasticity (error spread that grows with the target value)
- Comparing multiple model versions side by side
Common Use Cases
Regression Model Evaluation
- House price prediction — are expensive properties under-predicted?
- Demand forecasting — does the model lose accuracy in peak seasons?
- Medical measurements — checking predicted lab values against ground truth
- Energy consumption estimates — validating building energy models
Bias Detection
- Identifying whether a model consistently over-predicts (all points above the line)
- Detecting under-prediction in a specific range (cluster of points below the line)
- Spotting fan-shaped spread that signals heteroscedastic errors
Model Comparison
- Placing two models side by side to see which keeps points tighter around the diagonal
- Verifying that a retrained model does not regress on any sub-range
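You can also check for sub-range regressions numerically. Bin the samples by their actual value and compare each model's mean absolute error within each bin. This is a minimal sketch that assumes plain Python lists and bin edges you choose yourself. `mae_by_range` and `regressed_ranges` are hypothetical helper names, not part of this visualization.

```python
from statistics import mean

def mae_by_range(actual, predicted, edges):
    """Mean absolute error per actual-value bin [edges[i], edges[i+1]).

    Bins are half-open, so the last edge must be strictly greater than the
    largest actual value. Empty bins yield None.
    """
    result = []
    for lo, hi in zip(edges, edges[1:]):
        errors = [abs(p - a) for a, p in zip(actual, predicted) if lo <= a < hi]
        result.append(mean(errors) if errors else None)
    return result

def regressed_ranges(actual, old_pred, new_pred, edges, tolerance=0.0):
    """Return (lo, hi) bins where the new model's MAE exceeds the old one's by more than tolerance."""
    old_mae = mae_by_range(actual, old_pred, edges)
    new_mae = mae_by_range(actual, new_pred, edges)
    flagged = []
    for lo, hi, old, new in zip(edges, edges[1:], old_mae, new_mae):
        if old is not None and new is not None and new > old + tolerance:
            flagged.append((lo, hi))
    return flagged

actual = [10, 20, 30, 40]
old_model = [11, 19, 33, 38]
new_model = [10, 20, 29, 45]
print(regressed_ranges(actual, old_model, new_model, edges=[0, 25, 50]))  # [(25, 50)]
```

In this example, the retrained model improves the low range but is worse on the 25 to 50 bin. That is exactly the kind of regression an overall metric can hide.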
Options
Show Perfect Line
Default: ON — Draws the y = x diagonal reference line across the plot.
The perfect prediction line is the most important visual anchor. Keep it enabled so readers can instantly judge how far predictions deviate from ground truth. Disable it only when the actual and predicted values span very different ranges, so that stretching the axes to include the line would squeeze the point cloud into a small region.
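If you draw the reference line yourself, one approach is to span it over the combined range of actual and predicted values so that it passes every point. You can then reuse the same limits on both axes. This is a minimal sketch assuming plain Python lists. The 5% default padding and the name `perfect_line_endpoints` are illustrative choices, not this tool's settings.

```python
def perfect_line_endpoints(actual, predicted, pad_fraction=0.05):
    """Return ((x0, y0), (x1, y1)) for a y = x segment covering both value ranges.

    The same padded [lo, hi] interval can be reused as the limits of both axes,
    which, together with an equal aspect ratio, keeps the line at 45 degrees.
    """
    lo = min(min(actual), min(predicted))
    hi = max(max(actual), max(predicted))
    pad = (hi - lo) * pad_fraction  # small margin so edge points are not clipped
    return (lo - pad, lo - pad), (hi + pad, hi + pad)

start, end = perfect_line_endpoints([0, 4, 10], [2, 5, 8], pad_fraction=0.5)
print(start, end)  # (-5.0, -5.0) (15.0, 15.0)
```

The two endpoints can be split into x and y lists and handed to any plotting call that draws a line segment, for example matplotlib's `Axes.plot`.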
Interpreting the Plot
Reading the Diagonal
- Points on the line — prediction equals actual; zero error
- Points above the line — model over-predicted (predicted > actual)
- Points below the line — model under-predicted (predicted < actual)
- Tight cluster around the line — high R², good model fit
- Wide scatter — high variance, weak predictive power
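With actual values on the x-axis and predictions on the y-axis, a point's vertical distance from the diagonal is simply predicted − actual. Its sign tells you which side of the line the point falls on. Here is a minimal sketch that labels each sample accordingly. The `classify_points` name and the small tolerance for "on the line" are assumptions, since floating-point predictions rarely match exactly.

```python
def classify_points(actual, predicted, tol=1e-9):
    """Label each sample by its position relative to the y = x line (x = actual, y = predicted)."""
    labels = []
    for a, p in zip(actual, predicted):
        error = p - a  # vertical distance from the diagonal
        if abs(error) <= tol:
            labels.append("on")      # prediction equals actual (within tol)
        elif error > 0:
            labels.append("over")    # above the line: over-prediction
        else:
            labels.append("under")   # below the line: under-prediction
    return labels

print(classify_points([1, 2, 3], [1, 2.5, 2]))  # ['on', 'over', 'under']
```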
Detecting Systematic Bias
If the entire point cloud sits above the diagonal, the model consistently over-estimates. This is a systematic bias that cannot be fixed by adding more data — the model or its features need revisiting. A cloud shifted below the diagonal indicates consistent under-estimation.
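Two quick numbers can back up a visual impression of systematic bias: the mean signed error (predicted − actual), and the share of points lying above the diagonal. A mean error well away from zero, combined with a share near 1 (or near 0), matches a cloud shifted above (or below) the line. This is a minimal sketch. `bias_summary` is a hypothetical helper name, and deciding what counts as "systematic" is left to your judgment.

```python
def bias_summary(actual, predicted):
    """Return (mean signed error, share of points above the diagonal).

    Errors are predicted - actual, so a positive mean indicates over-prediction.
    """
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_error = sum(errors) / len(errors)
    share_above = sum(1 for e in errors if e > 0) / len(errors)
    return mean_error, share_above

print(bias_summary([10, 20, 30, 40], [12, 23, 31, 44]))  # (2.5, 1.0)
```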
R² and the Visual Spread
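R² (coefficient of determination) measures the fraction of variance in the actual values that the model explains. It is usually computed as R² = 1 − SS_res / SS_tot. Here SS_res is the sum of squared differences between actual and predicted values, and SS_tot is the sum of squared differences between the actual values and their mean. Visually, a higher R² means points hug the diagonal more tightly. An R² of 1.0 means every point sits exactly on the line. An R² of 0.0 means the model's squared error is no better than simply predicting the mean for every sample. R² can even be negative when the model does worse than that baseline.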
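The standard formula R² = 1 − SS_res / SS_tot is short enough to compute by hand. The sketch below checks the three landmark cases: a perfect model, a model that always predicts the mean, and a model worse than that baseline. `r_squared` is a hypothetical helper. Libraries such as scikit-learn provide the same quantity as `r2_score`.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if all actual values are identical.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # squared prediction errors
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # squared deviations from the mean
    return 1 - ss_res / ss_tot

actual = [1, 2, 3, 4]
print(r_squared(actual, actual))          # 1.0  (every point on the diagonal)
print(r_squared(actual, [2.5] * 4))       # 0.0  (always predicting the mean)
print(r_squared(actual, [4, 3, 2, 1]))    # -3.0 (worse than predicting the mean)
```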
Heteroscedasticity
If the scatter around the line widens as actual values increase (a fan shape), the model's errors are not uniform: their size scales with the target. This often points to proportional errors, and modelling a log-transformed target variable may improve the fit.
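Here is one rough numeric check for a fan shape. Sort the samples by actual value, split them into equal-size groups, and compare the standard deviation of the errors within each group. A steadily rising sequence matches the widening scatter. This is a simple binned heuristic rather than a formal statistical test. The name `error_spread_by_group` and the default of four groups are assumptions.

```python
from statistics import pstdev

def error_spread_by_group(actual, predicted, n_groups=4):
    """Std. dev. of (predicted - actual) within each of n_groups groups, ordered by actual value.

    Requires at least n_groups samples; leftover samples go into the last group.
    """
    pairs = sorted(zip(actual, predicted))  # order samples by actual value
    size = len(pairs) // n_groups
    spreads = []
    for g in range(n_groups):
        start = g * size
        end = len(pairs) if g == n_groups - 1 else start + size
        errors = [p - a for a, p in pairs[start:end]]
        spreads.append(pstdev(errors))
    return spreads

actual = [1, 2, 3, 4, 5, 6, 7, 8]
predicted = [2, 1, 4, 3, 9, 2, 11, 4]  # errors of +/-1 for low values, +/-4 for high values
print(error_spread_by_group(actual, predicted, n_groups=2))  # [1.0, 4.0]
```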
Tips for Effective Use
- Keep the axes equal — Use the same scale and range on both axes so the perfect prediction line sits at 45 degrees. Unequal axes distort the visual impression of accuracy.
- Look at both ends — Regression models often perform well on the middle range but poorly at extremes. Zoom into the high-value tail to check for systematic under-prediction.
- Pair with the residual plot — This plot shows the raw predictions; the residual plot (predicted − actual vs actual) amplifies small deviations and is a useful complement for spotting patterns.
- Color by a categorical variable — If predictions for one sub-group are consistently off, coloring by that group immediately surfaces the issue.
- Add a trend line — A LOWESS smooth through the points shows whether errors are random or follow a systematic curve. On this plot the ideal smooth follows the diagonal; on a residual plot, the equivalent ideal is a flat line at zero.
- Check for outliers — Points far from the diagonal are the largest errors. Investigate whether they represent genuine edge cases or data quality problems.
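Two of these tips translate directly into small computations. The first is the mean signed error per category, which is what coloring by a categorical variable reveals visually. The second is the list of samples farthest from the diagonal, which are the outlier candidates worth investigating. This is a minimal sketch over plain Python lists. `mean_error_by_group` and `largest_errors` are hypothetical helper names, and "farthest" is measured as the vertical distance |predicted − actual|.

```python
from collections import defaultdict

def mean_error_by_group(actual, predicted, groups):
    """Mean signed error (predicted - actual) for each category label."""
    buckets = defaultdict(list)
    for a, p, g in zip(actual, predicted, groups):
        buckets[g].append(p - a)
    return {g: sum(errs) / len(errs) for g, errs in buckets.items()}

def largest_errors(actual, predicted, k=3):
    """Indices of the k samples farthest (vertically) from the diagonal, largest first."""
    order = sorted(range(len(actual)), key=lambda i: abs(predicted[i] - actual[i]), reverse=True)
    return order[:k]

actual = [10, 20, 30, 40]
predicted = [11, 18, 30, 50]
print(mean_error_by_group(actual, predicted, ["a", "a", "b", "b"]))  # {'a': -0.5, 'b': 5.0}
print(largest_errors(actual, predicted, k=2))                        # [3, 1]
```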
Related Visualizations
- Scatter Plot — General-purpose scatter for exploring raw data relationships
- Correlation Plot — Understand feature-to-feature relationships before modelling
- Global Feature Importance — See which features drive predictions
- SHAP Dependence Plot — Understand how individual feature values affect model output