Documentation (English)

Learning Curve

Monitor model training progress with synchronized accuracy and loss subplots

Use me when you want to watch your model learn in real time and immediately spot whether it's improving, plateauing, or quietly memorizing the training set. I show both accuracy and loss across every epoch for train and validation splits side-by-side, so you can catch overfitting the moment it starts and pick the exact checkpoint worth saving.

Overview

A learning curve plots model performance metrics against training epochs, using two synchronized subplots: accuracy (top) and loss (bottom). Each subplot draws a train line and a validation line so you can compare how the model behaves on data it has seen versus data it has not. The gap between those two lines is the key signal — a widening gap signals overfitting, no improvement in either signals underfitting, and parallel lines trending in the right direction signals a good fit.
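The two-subplot layout can be sketched in a few lines of matplotlib. This is an illustrative reconstruction, not the widget's internal code; the `history` keys and metric values below are made up for the example.

```python
# Minimal sketch of the synchronized accuracy/loss layout, assuming
# matplotlib. The history dict keys are illustrative, not the widget's
# actual data format.
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt

history = {
    "train_acc":  [0.60, 0.72, 0.80, 0.85, 0.88],
    "val_acc":    [0.58, 0.68, 0.74, 0.76, 0.77],
    "train_loss": [1.10, 0.75, 0.52, 0.38, 0.29],
    "val_loss":   [1.15, 0.82, 0.63, 0.55, 0.53],
}
epochs = range(1, len(history["train_acc"]) + 1)

# Two subplots sharing the epoch axis: accuracy on top, loss below.
fig, (ax_acc, ax_loss) = plt.subplots(2, 1, sharex=True)
ax_acc.plot(epochs, history["train_acc"], "-", label="train")
ax_acc.plot(epochs, history["val_acc"], "--", label="val")
ax_acc.set_ylabel("accuracy")
ax_acc.legend()

ax_loss.plot(epochs, history["train_loss"], "-", label="train")
ax_loss.plot(epochs, history["val_loss"], "--", label="val")
ax_loss.set_ylabel("loss")
ax_loss.set_xlabel("epoch")
ax_loss.legend()
```

The `sharex=True` keeps the two subplots synchronized, so a vertical position on one panel corresponds to the same epoch on the other.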

Best used for:

  • Monitoring training runs to catch overfitting or underfitting early
  • Selecting the best model checkpoint based on validation performance
  • Comparing the effect of hyperparameter changes across runs
  • Deciding when to stop training (early stopping reference)
  • Communicating training health to stakeholders without sharing raw logs
  • Diagnosing whether more data or a different architecture is needed

Common Use Cases

Training Monitoring & Early Stopping

  • Watching loss decrease and accuracy increase epoch by epoch during live training
  • Identifying the epoch at which validation loss stops improving (early stopping point)
  • Setting checkpoint callbacks to save weights at the best validation epoch
  • Deciding whether to resume training or cut it short based on the trend
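The early-stopping logic above can be sketched as a small patience counter: stop once validation loss has gone a fixed number of epochs without improving. The function and variable names are illustrative, not part of the widget.

```python
# Hedged sketch of patience-based early stopping: stop when validation
# loss has not improved for `patience` consecutive epochs.
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    since_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return epoch
    return None  # never triggered: train to completion

# Validation loss improves until epoch 3, then plateaus:
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.68]
print(early_stop_epoch(losses, patience=3))  # → 6
```

Saving a checkpoint each time `best` improves gives you exactly the weights the best-epoch marker points at.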

Overfitting Diagnosis

  • Detecting the exact epoch where the train/val gap starts widening
  • Comparing regularization strategies (dropout, weight decay) by how much the gap narrows
  • Validating that data augmentation is reducing overfitting
  • Checking that a larger model is worth the overfitting risk
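"Gap starts widening" can be made precise with a crude onset detector: the first epoch after which the train/val loss gap grows for several consecutive epochs. This is a hypothetical helper for illustration; the widget's own detection (if any) may differ, and the `run` threshold is an assumption.

```python
# Illustrative overfitting-onset detector: first epoch where the
# train/val loss gap widens for `run` consecutive epochs.
def overfit_onset(train_loss, val_loss, run=3):
    # Round to suppress floating-point noise in near-equal gaps.
    gaps = [round(v - t, 6) for t, v in zip(train_loss, val_loss)]
    widening = 0
    for i in range(1, len(gaps)):
        if gaps[i] > gaps[i - 1]:
            widening += 1
            if widening >= run:
                return i - run + 1  # epoch where the widening started
        else:
            widening = 0
    return None

train = [1.0, 0.7, 0.5, 0.38, 0.28, 0.20, 0.14]
val   = [1.05, 0.75, 0.55, 0.50, 0.52, 0.57, 0.64]
print(overfit_onset(train, val))  # → 3
```

Note that the gap can start widening while validation loss is still falling; that early divergence is often the first warning sign.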

Underfitting & Capacity Analysis

  • Confirming that both train and val loss are still high after many epochs
  • Deciding whether to increase model capacity, train longer, or tune the learning rate
  • Identifying plateau behavior where loss stops decreasing regardless of epochs
  • Evaluating whether the learning rate is too low (very slow descent) or too high (noisy/diverging loss)

Hyperparameter & Architecture Comparison

  • Side-by-side comparison of runs with different learning rates or batch sizes
  • Evaluating the effect of adding or removing layers
  • Checking whether a learning rate schedule (e.g., cosine annealing) produces a smoother curve
  • Validating transfer learning fine-tuning: the curve should start high and converge quickly

Settings

Visualization Settings

EMA Smoothing

Optional — Exponential Moving Average smoothing factor applied to all curves. Range: 0 to 0.99, default: 0 (no smoothing).

A value close to 0 shows raw epoch-by-epoch values. Higher values (e.g., 0.9) smooth out noise and make the overall trend clearer, which is especially useful for noisy mini-batch metrics. When smoothing is active, enable Show Original Data to keep the raw lines visible as a faint background reference.
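The smoothing described above follows the familiar exponential-moving-average recurrence, `smoothed[i] = factor * smoothed[i-1] + (1 - factor) * raw[i]`; whether the widget uses exactly this recurrence is an assumption, but it matches the stated behavior (0 = raw values pass through).

```python
# EMA smoothing sketch: factor = 0 reproduces the raw series; higher
# factors weight the running average more heavily.
def ema_smooth(values, factor=0.0):
    smoothed, last = [], None
    for v in values:
        last = v if last is None else factor * last + (1 - factor) * v
        smoothed.append(last)
    return smoothed

noisy = [1.0, 0.4, 0.9, 0.3, 0.8, 0.2]
print(ema_smooth(noisy, factor=0.0))  # identical to the raw values
print(ema_smooth(noisy, factor=0.9))  # heavily smoothed trend
```

Because each smoothed point depends on all earlier points, high factors introduce lag: a real plateau or spike shows up a few epochs late, which is why keeping Show Original Data on is useful.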

Show Best Epoch Marker

Optional — Draws a vertical dashed line at the epoch with the best validation performance. Default: OFF.

Use this to immediately see which checkpoint is worth loading for inference or further fine-tuning. "Best" is determined by the lowest validation loss.
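The selection rule amounts to an argmin over the validation loss series (variable names here are illustrative):

```python
# "Best" epoch = index of the lowest validation loss.
val_loss = [0.90, 0.62, 0.48, 0.41, 0.44, 0.52]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
print(best_epoch)  # → 3 (0-indexed)
```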

Show Original Data

Optional — When smoothing is greater than 0, overlays the raw (unsmoothed) data alongside the EMA-smoothed line. Default: OFF.

Useful for verifying that the smoothed trend faithfully represents the underlying data and that no important spikes are hidden.

Axis & Scale

Loss Axis Scale

Optional — Controls the y-axis scale of the loss subplot. Options: Linear (default) / Logarithmic.

Switch to Logarithmic when loss values span several orders of magnitude — for example, when starting loss is ~10 and final loss is ~0.001. The log scale keeps both the early rapid descent and the late fine-grained improvement visible in the same plot.
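In matplotlib terms, the switch is a one-line scale change; this sketch assumes matplotlib and stands in for whatever the widget applies internally.

```python
# Log-scale loss axis: keeps a 10 → 0.001 descent readable on one plot.
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

loss = [10.0, 2.5, 0.6, 0.1, 0.02, 0.004, 0.001]
fig, ax = plt.subplots()
ax.plot(range(1, len(loss) + 1), loss)
ax.set_yscale("log")  # linear scale would flatten everything below ~0.1
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
```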

Interpreting the Learning Curve

Reading the Two Subplots

Accuracy subplot (top):

  • Both lines should trend upward over epochs.
  • Train accuracy (solid) typically rises faster than validation accuracy (dashed).
  • The gap at convergence shows how much the model has overfit: a small gap is healthy.

Loss subplot (bottom):

  • Both lines should trend downward over epochs.
  • Train loss (solid) should decrease steadily.
  • Validation loss (dashed) should follow the same downward trend; if it starts rising while train loss keeps falling, overfitting has begun.

Diagnosing Common Patterns

Good fit: Both train and val metrics converge to strong values with a small, stable gap. The loss curves are smooth and parallel.

Overfitting: Train loss continues to fall while val loss flattens or increases. Train accuracy is significantly higher than val accuracy. The gap widens over time. Remedies: add regularization, use dropout, augment data, or stop training earlier.

Underfitting: Both train and val loss are high and barely decreasing. Both accuracy values are low. The model lacks capacity or has not been trained long enough. Remedies: increase model size, train longer, reduce regularization, or tune the learning rate upward.

High variance (noisy curves): Loss bounces dramatically between epochs. Usually caused by a learning rate that is too high or a batch size that is too small. Use EMA Smoothing to see the trend through the noise, then adjust accordingly.

Diverging loss: Loss increases instead of decreasing. The learning rate is almost certainly too large, or gradients are exploding. Reduce the learning rate or add gradient clipping.
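Gradient clipping by global norm, the remedy mentioned above, rescales the whole gradient vector when its norm exceeds a threshold. Frameworks provide this built in (e.g. PyTorch's `torch.nn.utils.clip_grad_norm_`); here is a pure-Python sketch of the idea with illustrative names.

```python
# Global-norm gradient clipping: if the gradient's L2 norm exceeds
# max_norm, scale every component down so the norm equals max_norm.
import math

def clip_by_global_norm(grads, max_norm):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads  # within budget: leave untouched
    scale = max_norm / norm
    return [g * scale for g in grads]

grads = [3.0, 4.0]  # global norm = 5.0
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(clipped)  # scaled to norm 1.0, direction preserved
```

Clipping preserves the gradient's direction while bounding the step size, which is why it tames exploding gradients without distorting the descent direction.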

Tips for Effective Use

  1. Enable EMA Smoothing for noisy runs — a smoothing factor of 0.8–0.95 is usually enough to reveal the trend without obscuring real plateaus.

  2. Use the best epoch marker before evaluating — always load the checkpoint from the best validation epoch, not the final epoch, to avoid evaluating an overfit model.

  3. Compare loss and accuracy together — accuracy can appear flat while loss is still meaningfully decreasing; the loss subplot is often the more sensitive early-warning signal.

  4. Switch to log scale when loss starts below 0.1 — differences between 0.05 and 0.01 are invisible on a linear scale but clearly visible on a log scale.

  5. Overlay multiple runs — when comparing architectures or hyperparameters, use the same axis ranges so the visual comparison is fair.

  6. Watch the validation curve, not the training curve — a beautiful training curve means nothing if validation is lagging; always prioritize the dashed lines.

