Residual Plot

Analyze regression model errors by plotting residuals against predicted values

Use me when you want to verify that your regression model's errors are well-behaved — random, small, and free of hidden patterns. I turn abstract prediction errors into a visual diagnostic: if I look like a flat cloud of points around zero, your model is healthy; if I show curves, funnels, or drift, I'm pointing directly at the problem to fix.

Overview

A residual plot is a scatter plot where the x-axis shows the predicted values from a regression model and the y-axis shows the residuals, i.e. the signed difference between each actual value and its corresponding prediction (residual = actual − predicted). A horizontal reference line at residual = 0 marks the ideal: a perfect prediction has zero residual. The spatial distribution of points around this line shows how well the model's assumptions hold.
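
To make the definition concrete, here is a minimal Python sketch (standard library only) that fits a one-feature ordinary least squares line and returns the (predicted, residual) pairs a residual plot would draw. The helper names `fit_line` and `residual_points` and the toy data are illustrative assumptions, not part of this tool.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residual_points(xs, ys):
    """Return (predicted, residual) pairs: the x and y coordinates of a residual plot."""
    a, b = fit_line(xs, ys)
    points = []
    for x, y in zip(xs, ys):
        predicted = a + b * x
        points.append((predicted, y - predicted))  # residual = actual - predicted
    return points


# Toy data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
points = residual_points(xs, ys)
for predicted, residual in points:
    print(f"predicted={predicted:.2f}  residual={residual:+.2f}")
```

Because this OLS fit includes an intercept, its residuals sum to zero (up to rounding) on the data it was fitted to.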

Best used for:

  • Checking whether a linear regression model is appropriate for the data
  • Detecting heteroscedasticity (non-constant error variance)
  • Identifying systematic bias or nonlinear patterns the model has missed
  • Validating the normality of residuals before reporting confidence intervals
  • Spotting influential outliers that distort model fit
  • Comparing residual behavior before and after feature engineering

Common Use Cases

Regression Model Validation

  • Verifying linear regression assumptions before finalizing a model
  • Checking that a polynomial regression degree is sufficient (no curved residual pattern)
  • Validating that log-transforming the target variable has resolved heteroscedasticity
  • Confirming that adding interaction terms has eliminated a systematic residual pattern

Feature Engineering & Model Improvement

  • Identifying where in the prediction range the model consistently under- or over-predicts
  • Discovering that a missing feature causes residuals to vary systematically with the predicted value
  • Checking whether outlier removal or Winsorization has cleaned up influential points
  • Comparing residual spread before and after applying a Box-Cox transformation

Diagnostic Reporting

  • Including a residual plot in model documentation to demonstrate assumption compliance
  • Presenting error behavior to stakeholders alongside RMSE and R² metrics
  • Validating assumptions required for statistical inference (confidence intervals, p-values)
  • Satisfying audit or regulatory requirements for explainable model behavior

Settings

Show Zero Line

Optional — Draws a horizontal red reference line at residual = 0. Default: ON.

The zero line is the target: a perfect prediction sits exactly on it. Keeping this line visible makes it immediately obvious which predictions are under-estimates (points above the line, residual > 0, where the actual value exceeds the prediction) and which are over-estimates (points below the line, residual < 0). Only disable it if you are overlaying this plot on another chart where the line adds visual clutter.
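
A minimal Python sketch of the sign convention, assuming residual = actual − predicted as defined in the Overview. The function name `classify_prediction` and its optional `tol` parameter are hypothetical.

```python
def classify_prediction(actual, predicted, tol=0.0):
    """Label one observation by where it falls relative to the zero line.

    residual = actual - predicted, so:
      residual > 0  -> point above the line, the model under-predicted
      residual < 0  -> point below the line, the model over-predicted
    `tol` is an illustrative tolerance for treating tiny residuals as exact.
    """
    residual = actual - predicted
    if residual > tol:
        return "under-prediction"
    if residual < -tol:
        return "over-prediction"
    return "on the line"


print(classify_prediction(actual=10.0, predicted=8.0))  # under-prediction
print(classify_prediction(actual=5.0, predicted=7.0))   # over-prediction
```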

Advanced Options

Show Histogram

Optional — Adds a histogram of residuals alongside the scatter plot. Default: OFF.

When enabled, the residual distribution appears as a marginal histogram along the y-axis. This makes it easy to check whether residuals are approximately normally distributed, the assumption that exact OLS confidence intervals and p-values rely on (the coefficient estimates themselves do not require it). A roughly bell-shaped histogram centered at zero supports the normality assumption; a skewed or multi-modal histogram suggests the model is systematically missing something.
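
The histogram is a count of residuals per bin. The sketch below bins residuals into equal-width intervals and prints a text histogram; the function name, the default of 5 bins, and the sample residuals are assumptions for illustration and need not match the binning the tool itself uses.

```python
def residual_histogram(residuals, n_bins=5):
    """Count residuals in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / n_bins or 1.0  # fall back to width 1 if all residuals are equal
    counts = [0] * n_bins
    for r in residuals:
        # The maximum value would land one past the end; keep it in the last bin.
        index = min(int((r - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


residuals = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
edges, counts = residual_histogram(residuals)
for k, count in enumerate(counts):
    print(f"{edges[k]:+.1f} to {edges[k + 1]:+.1f}  {'#' * count}")
```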

Interpreting the Residual Plot

What a Good Residual Plot Looks Like

Points should be scattered randomly in a horizontal band around the zero line, with roughly equal density above and below across the full range of predicted values. There should be no obvious curves, fans, or trends. This "cloud of randomness" suggests that:

  • The model has captured the underlying relationship correctly
  • Errors are independent of the magnitude of the prediction
  • There is no visible sign of an important omitted variable

Common Patterns and What They Mean

U-shape or inverted U-shape (curved pattern): Residuals are systematically positive at the extremes and negative in the middle, or vice versa. This means the true relationship is nonlinear and your linear model is missing a quadratic or higher-order term. Fix: add polynomial features, apply a log/square-root transformation, or switch to a nonlinear model.
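
A small Python sketch of this pattern, using made-up data where y = x² exactly. Fitting a straight line in x leaves the U-shaped residuals described above; regressing on x² instead removes them. Using x² as the only feature is a simplification to keep the example dependency-free; a full polynomial fit would normally include both x and x². `fit_line` and `residuals` are hypothetical helpers.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(features, ys):
    """Residuals (actual - predicted) of a one-feature OLS fit."""
    a, b = fit_line(features, ys)
    return [y - (a + b * f) for f, y in zip(features, ys)]


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # the true relationship is quadratic

linear_res = residuals(xs, ys)                  # straight line in x
quad_res = residuals([x ** 2 for x in xs], ys)  # x squared as the feature

print("linear:   ", linear_res)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
print("quadratic:", quad_res)    # all 0.0
```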

Funnel or fan shape (heteroscedasticity): Residuals are small for low predicted values and large for high predicted values (or the reverse). Error variance is not constant — it grows or shrinks with the prediction. Fix: apply a log or square-root transformation to the target variable, use weighted least squares, or switch to a model that handles heteroscedasticity natively (e.g., Poisson regression for count data).
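
One rough way to put a number on a funnel is to sort points by predicted value and compare the residual spread in the upper half with the lower half, loosely similar in spirit to the Goldfeld-Quandt test but without its separate fits or F-test. The function `spread_ratio` and both made-up residual sets are illustrative assumptions.

```python
import math


def spread_ratio(points):
    """RMS of residuals in the upper half of predicted values divided by the lower half.

    points: list of (predicted, residual) pairs. With an odd count, the middle
    point is left out. A ratio well above or below 1 hints at a funnel shape;
    this is a rough heuristic, not a formal test.
    """
    ordered = sorted(points, key=lambda p: p[0])  # sort by predicted value
    half = len(ordered) // 2

    def rms(pairs):
        return math.sqrt(sum(r * r for _, r in pairs) / len(pairs))

    return rms(ordered[-half:]) / rms(ordered[:half])


# Made-up residuals whose size grows with the prediction (a funnel) ...
funnel = [(p, 0.1 * p * (1 if p % 2 else -1)) for p in range(1, 11)]
# ... and residuals with constant size (no funnel).
constant = [(p, 0.5 * (1 if p % 2 else -1)) for p in range(1, 11)]

print(f"funnel:   {spread_ratio(funnel):.2f}")
print(f"constant: {spread_ratio(constant):.2f}")
```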

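Diagonal trend (systematic bias): Residuals trend upward or downward as predicted values increase. The model consistently under-predicts at one end of the scale and over-predicts at the other. For an ordinary least squares fit with an intercept, residuals on the training data are uncorrelated with the fitted values by construction, so this pattern most often appears on held-out data or with regularized or non-OLS models. Fix: check for a missing feature, try a different functional form, or reconsider the target variable scale.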
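
A numeric companion to this pattern is the Pearson correlation between predicted values and residuals. In the sketch below, a hypothetical model that predicts 2x is scored on made-up data that actually follows 3x plus small alternating noise, so it under-predicts more and more as x grows and the correlation comes out strongly positive. The helper name `pearson` and the data are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(1, 9))
noise = [0.5 if i % 2 == 0 else -0.5 for i in range(len(xs))]
actual = [3 * x + e for x, e in zip(xs, noise)]  # what the data really does
predicted = [2 * x for x in xs]                  # what the hypothetical model predicts
resid = [a - p for a, p in zip(actual, predicted)]

print(f"corr(predicted, residual) = {pearson(predicted, resid):.3f}")
```

A value near 0 is what you want; a value near +1 or -1 corresponds to the diagonal band described above.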

Isolated outliers: A small number of points sit far from the zero line. These are high-residual observations that may be data entry errors, legitimate extreme cases, or influential points that are pulling the regression line away from the majority of the data. Fix: investigate each outlier individually — remove only confirmed errors, not legitimate observations.
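
A common first pass is to standardize residuals (divide by their standard deviation) and flag anything beyond a cutoff; 2 or 3 are typical rules of thumb, and 2 is assumed here. This is a simplified sketch: it uses plain standardized residuals rather than studentized residuals, and it says nothing about influence, which measures such as Cook's distance address. The function name and sample residuals are hypothetical.

```python
import math


def flag_outliers(residuals, threshold=2.0):
    """Return indices whose standardized residual exceeds `threshold` in absolute value."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))  # sample standard deviation
    return [i for i, r in enumerate(residuals) if abs(r - mean) / sd > threshold]


# Made-up residuals: nine small values and one far from the rest.
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 6.0, -0.2, 0.3, -0.5, 0.4]
print(flag_outliers(residuals))  # [5]
```

A single extreme point also inflates the standard deviation it is divided by, which can hide it in small samples; a robust scale estimate such as the median absolute deviation is less affected.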

Perfectly random cloud: No discernible pattern. This is what you want. If the histogram also supports normality, proceed with confidence intervals, hypothesis tests, and predictions.

Using the Histogram Mode

When the histogram is enabled, look for:

  • Bell shape centered at 0: residuals are approximately normal, so the normality assumption behind OLS confidence intervals and p-values is plausible.
  • Skewed distribution: errors are asymmetric, with larger misses in one direction than the other; consider a transformation of the target.
  • Heavy tails: large residuals occur more often than a normal distribution predicts; robust regression may be appropriate.
  • Two humps (bimodal): the data may contain two distinct subgroups the model is treating as one.
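
Skewness and excess kurtosis give rough numbers for the first three shapes in this list: skewness near 0 for a symmetric histogram, and positive excess kurtosis for tails heavier than a normal distribution's. The sketch uses simple population moment estimates; the function name and sample data are illustrative, and a bimodal shape is still best judged by eye.

```python
def shape_stats(residuals):
    """Return (skewness, excess kurtosis) using population moment estimates."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [0.0, 0.0, 0.0, 0.0, 10.0]
for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    skew, kurt = shape_stats(data)
    print(f"{name:13s} skew={skew:.2f}  excess kurtosis={kurt:.2f}")
```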

Tips for Effective Use

  1. Check the zero line first — before looking for patterns, confirm that the bulk of points is centered on zero, not shifted above or below it. A systematic offset means the model has a constant bias.

  2. Scan horizontally, not vertically — move your eye from left to right across the predicted-value axis. Ask: "Does the spread of residuals change as predicted values increase?" A changing spread is heteroscedasticity.

  3. Use the histogram to double-check normality — if you plan to report confidence intervals or p-values, the normality assumption matters. The histogram gives you a fast visual check.

  4. Look at outliers in context — a residual of 15 is only large if the typical residual is 2. Always read residual magnitude relative to the overall spread of the cloud.

  5. Compare before and after transformations — apply a log transform to the target, re-fit the model, and generate a new residual plot. If the funnel shape disappears, the transformation worked.

  6. Pair with R² and RMSE — a residual plot showing a curved pattern is a problem even if R² looks acceptable. Always use both numeric metrics and this visual diagnostic together.
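
Tip 5 can be checked numerically as well as visually. The sketch below generates made-up data with multiplicative noise, y = exp(0.5x ± 0.2), fits a straight line to y and to log(y), and compares residual spread in the upper and lower halves of the predicted range. The helpers `fit_line` and `fit_and_spread_ratio` and the data-generating formula are assumptions for illustration; on the raw scale the upper half spreads noticeably wider, while after the log transform the two halves match.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_and_spread_ratio(xs, ys):
    """Fit y on x, then compare residual RMS in the upper vs lower half of predictions."""
    a, b = fit_line(xs, ys)
    points = sorted((a + b * x, y - (a + b * x)) for x, y in zip(xs, ys))  # by predicted value
    half = len(points) // 2

    def rms(pairs):
        return math.sqrt(sum(r * r for _, r in pairs) / len(pairs))

    return rms(points[-half:]) / rms(points[:half])


xs = list(range(10))
noise = [0.2 if i % 2 == 0 else -0.2 for i in xs]        # alternating log-scale noise
ys = [math.exp(0.5 * x + e) for x, e in zip(xs, noise)]  # multiplicative errors

raw_ratio = fit_and_spread_ratio(xs, ys)
log_ratio = fit_and_spread_ratio(xs, [math.log(y) for y in ys])
print(f"raw target: {raw_ratio:.2f}   log target: {log_ratio:.2f}")
```

On the raw scale the residuals also curve, because a straight line cannot follow exponential growth; the log transform addresses both problems at once for this kind of data.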
