Documentation (English)

Box Plot

Visualize distribution, quartiles, and outliers

Use me when you want a quick statistical snapshot in a compact box. I'll show you the median (the middle line), where the middle 50% of your data lives (the box), and which points are outliers (the dots floating outside). I'm perfect for comparing distributions across groups without drowning in details, such as test scores across classrooms or salaries across departments. Some say the Violin Plot suits every kind of data better, but let me tell you that I am a great plot. I would just recommend showing the individual data points as well, not just a box.

Overview

A box plot (also called a box-and-whisker plot) is a standardized way of displaying the distribution of data based on five key statistics: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. The "box" represents the interquartile range (IQR), which contains the middle 50% of the data, while the "whiskers" extend to the smallest and largest values that are not flagged as outliers.

Best used for:

  • Comparing distributions across groups or categories
  • Identifying outliers and extreme values
  • Understanding data spread and quartiles
  • Detecting skewness in distributions
  • Quality control and process variation analysis
  • Comparing multiple datasets side-by-side

Common Use Cases

Statistical Analysis

  • Distribution comparison across groups
  • Identifying data outliers
  • Quartile analysis
  • Variance and spread assessment
  • Skewness detection

Quality Control & Manufacturing

  • Process capability analysis
  • Batch comparison
  • Defect rate distribution
  • Measurement consistency
  • Specification limit checking

Business & Research

  • Salary distributions by department
  • Performance metrics by team
  • Survey response distributions
  • A/B test result comparison
  • Regional performance comparison

Understanding Box Plot Components


      ├── Maximum (or Q3 + 1.5×IQR)
    ┌─┴─┐
    │   │ ← Q3 (75th percentile)
    │ ─ │ ← Median (50th percentile)
    │   │ ← Q1 (25th percentile)
    └─┬─┘
      ├── Minimum (or Q1 - 1.5×IQR)
      ● ← Outliers

  • Box: Contains the middle 50% of the data. The interquartile range (IQR) is calculated as:

$$IQR = Q_3 - Q_1$$

Where $Q_3$ is the third quartile (75th percentile) and $Q_1$ is the first quartile (25th percentile).

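  • Median line: The middle value (50th percentile)
  • Whiskers: Extend to the most extreme data points within 1.5 × IQR of the box, or to the minimum/maximum when no points lie beyond that range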
  • Outliers: Points beyond the whiskers
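
To make the components above concrete, here is a minimal Python sketch of how these statistics could be computed from raw values. It is an illustration only, not the tool's implementation: the function name `box_stats`, the sample scores, and the choice of inclusive quartiles are assumptions.

```python
import statistics

def box_stats(values):
    """Return the statistics a box plot draws, using the 1.5 x IQR whisker rule."""
    data = sorted(values)
    # Assumption: inclusive quartiles; the tool lets you pick the method.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest data point within the fences
        "whisker_high": inside[-1],  # largest data point within the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

scores = [52, 55, 58, 60, 61, 63, 64, 66, 70, 95]  # made-up sample
stats = box_stats(scores)
print(stats)
```

For this sample the box spans 58.5 to 65.5, the whiskers stop at the data points 52 and 70, and 95 is reported as an outlier.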

Tips for Effective Box Plots

  1. Compare Distributions:

    • Use side-by-side boxes to compare groups
    • Look for differences in medians, spread, and outliers
    • Notches help identify significant median differences
  2. Identify Skewness:

    • Symmetric box and whiskers = symmetric distribution
    • Longer upper whisker = right-skewed
    • Longer lower whisker = left-skewed
  3. Handle Outliers:

    • Investigate outliers - are they errors or real extreme values?
    • Consider showing all points to see distribution detail
    • Use different colors for outliers to highlight them
  4. Choose the Right Settings:

    • Show all points for small datasets (< 50 points)
    • Use jitter to prevent point overlap
    • Enable the meanline when the mean differs noticeably from the median
  5. Scale Appropriately:

    • Use log scale for data spanning orders of magnitude
    • Ensure Y-axis range doesn't hide important features
    • Consider normalizing when comparing different units

Options

X-Axis

Optional - Select a categorical column to group boxes.

When specified, creates separate box plots for each category, allowing side-by-side comparison of distributions. Leave empty for a single box plot.

Y-Axis

Required - Select one or more numerical columns to analyze.

The values in these columns will be used to calculate the box plot statistics (quartiles, median, outliers). You can select multiple columns to compare their distributions.

Aggregation Column

Optional - Apply an aggregation before creating the box plot.

If you need to aggregate your data first (e.g., sum sales by region before plotting distribution), specify the column and aggregation function here.

Column

Select the column to aggregate.

Aggregation Function

Choose the aggregation method:

Options:

  • Sum - Total values
  • Count - Count occurrences
  • Mean - Average values
  • Median - Middle value
  • Min - Minimum value
  • Max - Maximum value
  • Std - Standard deviation
  • Var - Variance
  • First - First value
  • Last - Last value
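
The aggregation step described above can be pictured as a two-stage process: first collapse the raw rows with the chosen function, then plot the distribution of the aggregated values. The sketch below uses Sum and only illustrates the idea. The rows, the column names, and the per-store grouping key are hypothetical, and the tool's actual grouping logic is not documented here.

```python
from collections import defaultdict

# Hypothetical rows: (region, store, sale amount)
rows = [
    ("North", "N1", 120), ("North", "N1", 80), ("North", "N2", 150),
    ("South", "S1", 60), ("South", "S2", 90), ("South", "S2", 30),
]

# Step 1: aggregate (Sum) the sale amounts per store.
store_totals = defaultdict(float)
for region, store, amount in rows:
    store_totals[(region, store)] += amount

# Step 2: collect the aggregated values per region; each list feeds one box.
values_by_region = defaultdict(list)
for (region, _store), total in store_totals.items():
    values_by_region[region].append(total)

print(dict(values_by_region))
```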

Settings

Hide Empty Values

Optional - Exclude categories with no data.

Use Logarithmic Scale For X Axis

Optional - Apply log scale to X-axis.

Useful when X-axis values span multiple orders of magnitude.

Use Logarithmic Scale For Y Axis

Optional - Apply log scale to Y-axis.

Useful when data values span multiple orders of magnitude or are exponentially distributed.

Display Meanline

Optional - Show a line indicating the mean value.

Adds a line inside the box showing the mean (average), which complements the median line.

Linear

Optional - Use the linear method for quartile calculation.

Calculates quartiles by interpolating linearly between neighboring data points.

Exclusive

Optional - Use the exclusive method for quartile calculation.

Excludes the median from the calculation of quartiles (Type 6 R quantile).

Inclusive

Optional - Use the inclusive method for quartile calculation.

Includes the median in the calculation of quartiles (Type 7 R quantile, the default in many tools).
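
The choice of quartile method can change the box edges, especially for small samples. As a rough illustration, Python's standard library offers `statistics.quantiles` with `"exclusive"` and `"inclusive"` methods. Whether the tool's options of the same name produce identical numbers in every case is an assumption, and the sample data is made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Each call returns [Q1, median, Q3].
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)  # [2.5, 5.0, 7.5]
print("inclusive:", inclusive)  # [3.0, 5.0, 7.0]
```

The median is the same either way, but the exclusive method yields a wider box (IQR of 5.0 versus 4.0), which in turn moves the 1.5 × IQR outlier thresholds.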

Show Outliers

Optional - Display points that fall outside the whiskers.

Outliers are defined as points more than 1.5 × IQR below Q1 or above Q3.

All Points

Optional - Show all individual data points.

Overlays all data points on the box plot, useful for seeing the actual distribution and sample size.

Show Suspected Outliers

Optional - Highlight extreme outliers.

Shows suspected outliers, which are points below $4Q_1 - 3Q_3$ or above $4Q_3 - 3Q_1$ (that is, more than 3 × IQR beyond the quartiles). These are more extreme than regular outliers.
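
The two outlier thresholds can be compared side by side in a short sketch. The function name `classify_points`, the labels, and the sample values are hypothetical; only the threshold formulas come from the descriptions above.

```python
def classify_points(values, q1, q3):
    """Label each value using the 1.5 x IQR and 3 x IQR thresholds."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # 4*Q1 - 3*Q3 equals Q1 - 3*IQR, and 4*Q3 - 3*Q1 equals Q3 + 3*IQR.
    outer_low, outer_high = 4 * q1 - 3 * q3, 4 * q3 - 3 * q1
    labels = []
    for v in values:
        if v < outer_low or v > outer_high:
            labels.append((v, "suspected outlier"))
        elif v < inner_low or v > inner_high:
            labels.append((v, "outlier"))
        else:
            labels.append((v, "inside whiskers"))
    return labels

# Made-up values with assumed quartiles Q1 = 11 and Q3 = 15 (IQR = 4).
labels = classify_points([10, 12, 13, 15, 22, 40], q1=11.0, q3=15.0)
for value, label in labels:
    print(value, label)
```

With these quartiles the regular thresholds are 5 and 21 and the suspected-outlier thresholds are -1 and 27, so 22 is a regular outlier and 40 is a suspected outlier.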

Box Mean Display

Optional - How to display the mean in the box.

Options:

  • None - Don't show mean
  • Mean Only - Show mean as a line
  • Mean + Standard Deviation - Show mean with SD bars

Notched Box

Optional - Display notched box with confidence intervals.

Notches represent the confidence interval around the median. If notches of two boxes don't overlap, medians are likely significantly different.
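
The exact notch formula used by this tool is not stated here. A widely used convention, assumed in the sketch below, places the notch at median ± 1.57 × IQR / √n. The function names and sample groups are hypothetical.

```python
import math
import statistics

def notch_interval(values):
    """Approximate notch range: median +/- 1.57 * IQR / sqrt(n) (assumed convention)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(data))
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True when the notch ranges of two samples intersect."""
    (a_low, a_high), (b_low, b_high) = notch_interval(a), notch_interval(b)
    return a_low <= b_high and b_low <= a_high

group_a = [10, 11, 12, 13, 14, 15, 16, 17, 18]  # made-up samples
group_b = [20, 21, 22, 23, 24, 25, 26, 27, 28]

print(notch_interval(group_a))
print(notch_interval(group_b))
print(notches_overlap(group_a, group_b))
```

Here the two notch ranges do not intersect, which is the visual cue that the medians likely differ. Treat this as a quick heuristic rather than a replacement for a formal test.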

Jitter

Optional - Random horizontal offset for overlapping points.

Options:

  • None (0.0) - No jitter
  • Low (0.1) - Small offset
  • Medium (0.3) - Moderate offset (default)
  • High (0.5) - Large offset

Useful when showing all points to prevent overlap.
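
Conceptually, jitter shifts each point sideways by a small random amount so that identical values no longer sit on top of each other. The sketch below assumes the jitter value is the fraction of the box width across which points are spread. That scaling, the function name `jittered_x`, and the fixed seed are assumptions, not documented behavior.

```python
import random

def jittered_x(center, count, jitter, box_width=1.0, seed=0):
    """Return x positions spread randomly around the box center."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    half_span = jitter * box_width / 2
    return [center + rng.uniform(-half_span, half_span) for _ in range(count)]

xs = jittered_x(center=0.0, count=5, jitter=0.3)
print(xs)  # five positions between -0.15 and 0.15
```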

Point Position

Optional - Horizontal position of data points relative to box.

Options:

  • Far Left (-1.8)
  • Left (-1.0)
  • Center (0.0) - Inside the box
  • Right (1.0)
  • Far Right (1.8)

Whisker Width

Optional - Visual width of the whisker lines.

Options:

  • Thin (0.2)
  • Medium (0.5) - Default
  • Thick (0.8)
  • Full (1.0) - Full box width

Outlier Color

Optional - Color for outlier points.

Marker Size

Optional - Size of data point markers.

Options:

  • Small (2)
  • Medium (6) - Default
  • Large (10)

Line Width

Optional - Width of box and whisker lines.

Options:

  • Thin (1)
  • Medium (2) - Default
  • Thick (3)

Opacity

Optional - Transparency of the box fill.

Options:

  • 100% - Fully opaque (default)
  • 80%
  • 65%
  • 50% - Half transparent

Box Mode

Optional - How to display multiple boxes.

Options:

  • Group (Side-by-side) - Default, best for comparison
  • Overlay (On top) - Useful with transparency

Show Legend

Optional - Display legend for multiple series.

Troubleshooting

Issue: Can't see individual data points

  • Solution: Enable "All Points" setting to show individual observations. Use jitter to prevent overlap, especially with discrete data.

Issue: Outliers dominate the visualization

  • Solution: Use log scale if data spans orders of magnitude, filter extreme outliers, or adjust Y-axis limits to focus on the main distribution.

Issue: Boxes are too narrow or too wide

  • Solution: Box width depends on the number of categories and the plot width. Reduce the number of categories, adjust the plot dimensions, or use horizontal orientation.

Issue: Can't tell if median differences are significant

  • Solution: Enable the "Notched Box" option. If the notches of two boxes don't overlap, their medians are likely significantly different (at roughly the 95% confidence level).

Issue: Mean and median are very different

  • Solution: This indicates a skewed distribution. Consider showing both the meanline and the median, investigating outliers, or applying a log transformation.

Issue: Points overlap and are hard to distinguish

  • Solution: Increase jitter value (try 0.3-0.5), adjust point position (move points outside box), or reduce marker size.

Issue: Need to compare many categories

  • Solution: Sort boxes by median value for easier comparison, use horizontal orientation for long category names, or create separate plots (small multiples).

Issue: Whiskers extend to unexpected values

  • Solution: Check the whisker calculation method (1.5 × IQR is standard). Whiskers end at actual data points, not at the calculated limits, so each whisker stops at the most extreme observation that still lies within 1.5 × IQR of the box.

Issue: Distribution looks unusual or suspicious

  • Solution: Show all points to verify the shape, check for data entry errors, investigate outliers individually, and consider whether a transformation is needed.
