Actual vs Predicted

Visualize regression model performance by comparing predictions against ground truth

Use me when you want to see how close your model's guesses are to reality. I'll show you whether your regression model is nailing it (points hugging the diagonal), systematically overshooting or undershooting (band shifted above or below the line), or just all over the place. One glance often reveals patterns that a single summary metric hides.

Overview

An actual vs predicted plot is a scatter plot where each point represents one sample from your dataset. The x-axis shows the true (actual) value and the y-axis shows the value your model predicted. A perfect model would have every point sitting exactly on the diagonal y = x line — the further points stray from it, the larger the prediction error.

Best used for:

  • Assessing overall regression model performance at a glance
  • Detecting systematic bias (all predictions consistently too high or too low)
  • Identifying ranges where the model struggles or excels
  • Spotting heteroscedasticity (errors that grow with the target value)
  • Comparing multiple model versions side by side

Common Use Cases

Regression Model Evaluation

  • House price prediction — are expensive properties under-predicted?
  • Demand forecasting — does the model lose accuracy in peak seasons?
  • Medical measurements — checking predicted lab values against ground truth
  • Energy consumption estimates — validating building energy models

Bias Detection

  • Identifying whether a model consistently over-predicts (all points above the line)
  • Detecting under-prediction in a specific range (cluster of points below the line)
  • Spotting fan-shaped spread that signals heteroscedastic errors

Model Comparison

  • Placing two models side by side to see which keeps points tighter around the diagonal
  • Verifying that a retrained model does not regress on any sub-range

Options

Show Perfect Line

Default: ON — Draws the y = x diagonal reference line across the plot.

The perfect prediction line is the most important visual anchor. Keep it enabled so readers can instantly judge how far predictions deviate from ground truth. Disable it only when the axis scales are very different and the line would compress the point cloud.
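As a minimal sketch of what the reference line amounts to, the snippet below computes the two endpoints of a y = x segment that spans every point. The function name `perfect_line_endpoints` and the sample values are illustrative assumptions, not part of this tool.

```python
def perfect_line_endpoints(actual, predicted):
    """Return the endpoints of the y = x segment that covers all points.

    Hypothetical helper: both axes are assumed to share one numeric range.
    """
    lo = min(min(actual), min(predicted))
    hi = max(max(actual), max(predicted))
    # On the diagonal, x and y are equal at both ends.
    return (lo, lo), (hi, hi)


# Made-up sample data for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

start, end = perfect_line_endpoints(actual, predicted)
print(start, end)
```

The two endpoints can be handed to any plotting library to draw the diagonal. Taking the range over both lists keeps the line from stopping short of points whose predictions fall outside the range of actual values.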

Interpreting the Plot

Reading the Diagonal

  • Points on the line — prediction equals actual; zero error
  • Points above the line — model over-predicted (predicted > actual)
  • Points below the line — model under-predicted (predicted < actual)
  • Tight cluster around the line — high R², good model fit
  • Wide scatter — high error variance, weak predictive power

Detecting Systematic Bias

If the entire point cloud sits above the diagonal, the model consistently over-estimates. This is a systematic bias, and adding more data alone rarely fixes it; the model or its features usually need revisiting. A cloud shifted below the diagonal indicates consistent under-estimation.
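A shifted cloud can also be checked numerically. The sketch below assumes the error is defined as predicted minus actual, so a positive mean error corresponds to points above the diagonal. The helper name `bias_summary` and the data are made up for illustration.

```python
from statistics import mean


def bias_summary(actual, predicted):
    """Summarize systematic bias as mean signed error and share of points above y = x."""
    # Signed error: positive means the model over-predicted that sample.
    errors = [p - a for a, p in zip(actual, predicted)]
    above = sum(1 for e in errors if e > 0)
    return {
        "mean_error": mean(errors),
        "share_above": above / len(errors),
    }


# Made-up sample where every prediction is too high.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 215.0, 305.0, 420.0]

summary = bias_summary(actual, predicted)
print(summary)
```

A mean error far from zero together with a share close to 0 or 1 suggests a consistent shift rather than random scatter.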

R² and the Visual Spread

R² (coefficient of determination) measures the fraction of variance in the actual values that the model explains. Visually, a higher R² means points hug the diagonal tightly. An R² of 1.0 would put every point exactly on the diagonal; an R² of 0.0 means the scatter is as wide as if you had simply predicted the mean for every sample. R² can also be negative when the model does worse than that mean baseline.
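To make the link between R² and the spread concrete, here is a small sketch using the standard definition R² = 1 − SS_res / SS_tot. The function name `r_squared` and the sample values are assumptions for illustration; how this tool computes R² internally is not specified here.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Raises ZeroDivisionError if all actual values are identical.
    """
    mean_actual = mean(actual)
    # Squared distance of each point from the diagonal, measured along the y-axis.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared distance of each actual value from the mean baseline.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(actual, actual)                   # every point on the diagonal
baseline = r_squared(actual, [2.5, 2.5, 2.5, 2.5])    # always predict the mean
reversed_fit = r_squared(actual, [4.0, 3.0, 2.0, 1.0])  # worse than the mean

print(perfect, baseline, reversed_fit)
```

The three cases correspond to a perfect fit, the mean baseline, and a model that does worse than the baseline.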

Heteroscedasticity

If the scatter around the line grows as actual values increase (fan shape), the model errors are not uniform — they scale with the target. This often means a log transformation of the target variable would improve fit.
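One rough way to back up a suspected fan shape is to compare the average absolute error in the lower and upper halves of the actual values. This is a simple heuristic sketched under assumptions, not a formal statistical test. The name `spread_by_half` and the data are illustrative.

```python
from statistics import mean, median


def spread_by_half(actual, predicted):
    """Mean absolute error below/at and above the median actual value.

    Heuristic only; raises StatisticsError if either half is empty
    (for example when all actual values are equal).
    """
    cut = median(actual)
    low = [abs(p - a) for a, p in zip(actual, predicted) if a <= cut]
    high = [abs(p - a) for a, p in zip(actual, predicted) if a > cut]
    return mean(low), mean(high)


# Made-up sample where errors grow with the target value.
actual = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
predicted = [11.0, 19.0, 32.0, 36.0, 55.0, 54.0]

low_mae, high_mae = spread_by_half(actual, predicted)
print(low_mae, high_mae)
```

A clearly larger value in the upper half is consistent with errors that scale with the target. Re-running the comparison after fitting on a log-transformed target is one way to see whether the transformation evens out the spread.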

Tips for Effective Use

  1. Keep the axes equal — Use the same scale and range on both axes so the perfect prediction line sits at 45 degrees. Unequal axes distort the visual impression of accuracy.

  2. Look at both ends — Regression models often perform well on the middle range but poorly at extremes. Zoom into the high-value tail to check for systematic under-prediction.

  3. Pair with the residual plot — This plot shows the raw predictions; the residual plot (predicted − actual vs actual) amplifies small deviations, so the two complement each other when you are looking for patterns.

  4. Color by a categorical variable — If predictions for one sub-group are consistently off, coloring by that group immediately surfaces the issue.

  5. Add a trend line — A LOWESS smooth through the points shows whether errors are random or follow a systematic curve. On this plot the ideal smooth follows the diagonal; on a residual plot the ideal is a flat line through zero.

  6. Check for outliers — Points far from the diagonal are the largest errors. Investigate whether they represent genuine edge cases or data quality problems.
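Tips 4 and 6 can be prepared outside the plot as well. The sketch below lists the samples farthest from the diagonal and averages the signed error (predicted minus actual) per group. The helper names, the `groups` labels, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def largest_errors(actual, predicted, k=3):
    """Indices of the k samples farthest from the diagonal, largest error first."""
    order = sorted(
        range(len(actual)),
        key=lambda i: abs(predicted[i] - actual[i]),
        reverse=True,
    )
    return order[:k]


def mean_error_by_group(actual, predicted, groups):
    """Mean signed error (predicted - actual) for each group label."""
    buckets = defaultdict(list)
    for a, p, g in zip(actual, predicted, groups):
        buckets[g].append(p - a)
    return {g: mean(errs) for g, errs in buckets.items()}


# Made-up sample with a hypothetical categorical column.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 29.0, 30.0, 46.0]
groups = ["north", "south", "north", "south"]

worst = largest_errors(actual, predicted, k=2)
by_group = mean_error_by_group(actual, predicted, groups)
print(worst, by_group)
```

The indices in `worst` point at the rows worth inspecting by hand. A group whose mean error sits well away from zero is a good candidate for the color-by option.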

