Documentation (English)

Histogram

Visualize the distribution of numerical data

Use me when you want to see the shape of your data - where the mountains and valleys are. I'll group your numbers into bins and show you the distribution. Are most values clustered in the middle? Skewed to one side? Multiple peaks? I'll reveal if your data is bell-shaped, wonky, or hiding surprises.

Overview

A histogram is a graphical representation of the distribution of numerical data. It groups values into bins (intervals) and displays the frequency or count of observations falling into each bin using bars.
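The binning step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the helper name bin_counts and the sample ages are made up, bins have equal width, and the last bin is closed on the right so that the maximum value is counted.

```python
def bin_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    if width == 0:
        # All values are identical: put everything into the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


ages = [21, 22, 25, 31, 34, 35, 38, 42, 47, 60]  # made-up sample data
print(bin_counts(ages, 4))  # [3, 4, 2, 1]
```

Each number in the result corresponds to the height of one bar.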

Best used for:

  • Understanding data distribution patterns (normal, skewed, bimodal)
  • Identifying the central tendency and spread of data
  • Detecting outliers and unusual patterns
  • Comparing distributions across different groups
  • Quality control and process monitoring

Common Use Cases

Statistics & Data Analysis

  • Age distribution in a population
  • Test score distributions
  • Income or salary ranges
  • Measurement error analysis

Quality Control & Manufacturing

  • Measurement variation analysis
  • Process capability studies
  • Defect distribution patterns
  • Tolerance compliance checking

Data Science & Machine Learning

  • Feature distribution analysis before modeling
  • Identifying need for data transformations
  • Detecting skewness and outliers
  • Understanding target variable distribution

Options

Target Columns

Required - Select one or more numerical columns to visualize.

You can add multiple columns using the "+" button to compare their distributions side by side on the same plot. Each column is shown in a different color.

Settings

Show Frequency

Optional - Display count or frequency on Y-axis.

  • On: Shows actual count of observations in each bin
  • Off: Shows probability density (normalized)

Show Legend

Optional - Display legend when multiple columns are shown.

Useful when comparing distributions of multiple variables.

Show Axis Labels

Optional - Display axis labels.

Annotate Bars

Optional - Show values on top of each bar.

Displays the count or frequency for each bin directly on the histogram.

Show KDE

Optional - Overlay a Kernel Density Estimate curve.

A KDE provides a smooth, continuous estimate of the probability density function.
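To make the idea concrete, here is a rough sketch of a Gaussian KDE: the estimate at a point x is the average of small bell curves centered on each observation. The helper name make_kde, the sample data, and the fixed bandwidth are assumptions for illustration; the kernel and bandwidth the tool actually uses are not stated here.

```python
import math


def make_kde(values, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, then scale so the area is 1.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


data = [1.0, 1.5, 2.0, 2.2, 3.0, 5.0]  # made-up sample data
kde = make_kde(data, bandwidth=0.5)
for x in (1.0, 2.0, 4.0):
    print(x, round(kde(x), 3))
```

A larger bandwidth gives a smoother curve; a smaller one follows the individual observations more closely.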

Number of Bins

Optional - Specify how many bins to use.

Enter a number to control the granularity of the histogram. More bins show more detail but may introduce noise; fewer bins show broader patterns.

If not specified, the number of bins is calculated automatically using Sturges' rule or the Freedman-Diaconis rule.
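For reference, both rules can be sketched with the standard textbook formulas. Which rule the tool applies in which situation is not stated here, so treat this as an approximation of the idea: the function names are hypothetical, and the quartiles come from Python's statistics.quantiles with its default method, which may differ slightly from other tools.

```python
import math
import statistics


def sturges_bins(values):
    """Sturges' rule: ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(len(values))) + 1


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) / len(values) ** (1 / 3)
    if width == 0:
        # Degenerate case (no spread in the middle half of the data).
        return 1
    return math.ceil((max(values) - min(values)) / width)


sample = list(range(100))  # made-up sample data
print(sturges_bins(sample))
print(freedman_diaconis_bins(sample))
```

Sturges' rule depends only on the number of observations, while Freedman-Diaconis also accounts for the spread of the data, which makes it less sensitive to outliers.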

Bin Size

Optional - Specify the width of each bin.

Alternative to "Number of Bins". Sets a fixed width for bins (e.g., bins of width 5 for ages: 0-5, 5-10, 10-15, etc.).
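As a sketch of how a fixed bin size translates into bin edges, the hypothetical helper below starts at the nearest multiple of the bin size at or below the minimum and steps upward until the maximum is covered. The starting point is an assumption; the tool may choose its first edge differently.

```python
import math


def edges_from_bin_size(values, bin_size):
    """Return bin edges of a fixed width that cover all values."""
    # Align the first edge to a multiple of bin_size (assumed convention).
    start = math.floor(min(values) / bin_size) * bin_size
    num_bins = max(1, math.ceil((max(values) - start) / bin_size))
    return [start + i * bin_size for i in range(num_bins + 1)]


ages = [3, 7, 8, 12, 14]  # made-up sample data
print(edges_from_bin_size(ages, 5))  # [0, 5, 10, 15]
```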

Cumulative

Optional - Show cumulative distribution.

Instead of showing frequency in each bin, shows cumulative frequency up to that bin.
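In other words, each bar shows the running total of all bins up to and including its own. A minimal sketch, with made-up per-bin counts and a hypothetical helper name:

```python
from itertools import accumulate


def cumulative_counts(counts):
    """Turn per-bin counts into running totals."""
    return list(accumulate(counts))


counts = [3, 4, 2, 1]  # made-up per-bin counts
print(cumulative_counts(counts))  # [3, 7, 9, 10]
```

The last cumulative bar always equals the total number of observations.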

Normalization

Optional - How to normalize the histogram.

Options:

  • None - Show raw counts
  • Probability - Normalize so bar heights sum to 1
  • Probability Density - Normalize so the total area of the bars equals 1
  • Percent - Normalize so bar heights sum to 100
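The difference between these options is easiest to see on a small example. The sketch below assumes equal-width bins; the function name normalize and the mode strings are made up for illustration and are not the tool's internal names.

```python
def normalize(counts, bin_width, mode="none"):
    """Rescale per-bin counts according to the chosen normalization mode."""
    total = sum(counts)
    if mode == "none":
        return list(counts)
    if mode == "probability":
        return [c / total for c in counts]  # heights sum to 1
    if mode == "percent":
        return [100 * c / total for c in counts]  # heights sum to 100
    if mode == "density":
        # Heights times bin width sum to 1 (total area is 1).
        return [c / (total * bin_width) for c in counts]
    raise ValueError(f"unknown mode: {mode}")


counts = [3, 4, 2, 1]  # made-up per-bin counts, bins of width 10
for mode in ("none", "probability", "percent", "density"):
    print(mode, normalize(counts, bin_width=10, mode=mode))
```

Probability density is the option to choose when a histogram is compared with a KDE curve or when bin widths differ between plots, because only the area, not the height, is comparable in those cases.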

Histogram Function

Optional - Statistical function to apply.

Options:

  • Count - Number of observations (default)
  • Sum - Sum of the values in each bin
  • Average - Mean of the values in each bin
  • Min - Minimum value in each bin
  • Max - Maximum value in each bin

Bar Mode

Optional - How to display multiple histograms.

Options:

  • Overlay - Overlay histograms with transparency
  • Group - Place bars side-by-side
  • Stack - Stack bars on top of each other

Opacity

Optional - Transparency of bars (0-1).

Lower values make bars more transparent, useful when overlaying multiple distributions.

Understanding Distributions

Normal Distribution (Bell Curve)

Symmetric distribution with most values near the mean.

Characteristics:

  • Symmetric around the mean
  • Mean ≈ Median ≈ Mode
  • About 68% of data within 1 standard deviation of the mean
  • About 95% within 2 standard deviations of the mean
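The two percentages above can be checked with Python's built-in statistics.NormalDist. The helper name share_within is made up for this example.

```python
from statistics import NormalDist


def share_within(k, dist=NormalDist()):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower


print(round(share_within(1), 4))  # 0.6827
print(round(share_within(2), 4))  # 0.9545
```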

Right-Skewed Distribution

Long tail on the right side.

Characteristics:

  • Mean > Median
  • Common in: income data, response times, sizes
  • May need log transformation for analysis
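A small sketch of these characteristics, using made-up response times that were deliberately chosen to be symmetric on a log scale. The skewness measure used here is Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation, picked for simplicity; it is an assumption of this example, not something the tool reports.

```python
import math
import statistics


def pearson_skew(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


# Made-up response times in ms, symmetric on a log scale by construction.
times = [10, 20, 20, 40, 40, 40, 80, 80, 160]
log_times = [math.log10(t) for t in times]

print(statistics.mean(times) > statistics.median(times))  # True
print(round(pearson_skew(times), 2))      # clearly positive: right-skewed
print(round(pearson_skew(log_times), 2))  # close to zero after the log transform
```

Real data rarely becomes perfectly symmetric after a log transformation, but a strong reduction in skew is a common outcome for income-like or duration-like values.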

Left-Skewed Distribution

Long tail on the left side.

Characteristics:

  • Mean < Median
  • Less common than right-skewed
  • Example: test scores (when most score high)

Bimodal Distribution

Two distinct peaks.

Characteristics:

  • Two modes (peaks)
  • Suggests two different groups or processes
  • Consider separating and analyzing groups individually

Uniform Distribution

Approximately equal frequency across bins.

Characteristics:

  • Flat appearance
  • All values equally likely
  • Example: output of a uniform random number generator

Tips for Effective Histograms

  1. Choose Appropriate Bins:

    • Too few bins hide important features
    • Too many bins create noise
    • Start with auto-calculated bins, then adjust
  2. Consider Bin Width:

    • Use meaningful intervals (e.g., $10,000 for income, 5 years for age)
    • Ensure bins don't hide important patterns
  3. Handle Outliers:

    • Outliers can compress the main distribution
    • Consider filtering extreme values or using log scale
    • Or show outliers separately
  4. Compare Distributions:

    • Use overlay mode with transparency
    • Or use small multiples (facets)
    • Normalize when counts differ greatly
  5. Add Context:

    • Show mean/median lines
    • Add KDE for smooth overview
    • Annotate important bins
  6. Check for Artifacts:

    • Gaps might indicate data collection issues
    • Spikes might indicate rounding or discrete values
    • Verify patterns make domain sense
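For the outlier tip above, one common way to separate extreme values before plotting is the 1.5 x IQR rule. This rule is brought in here only as an example and is not described as a feature of the tool; the helper name split_outliers and the sample data are hypothetical.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the k * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


measurements = [12, 14, 15, 15, 16, 17, 18, 19, 21, 250]  # made-up data
kept, outliers = split_outliers(measurements)
print(kept)
print(outliers)
```

Plotting kept in the histogram and listing outliers separately keeps the main distribution readable without silently discarding the extreme values.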

Troubleshooting

Issue: Distribution looks choppy or irregular

  • Solution: Reduce the number of bins (or increase the bin size), or use KDE for a smoother view

Issue: Can't see the pattern

  • Solution: Try log scale, adjust bin size, or filter outliers

Issue: Multiple distributions hard to compare

  • Solution: Use normalization (probability or percent) so heights are comparable

Issue: Bars are too thin or too wide

  • Solution: Adjust number of bins or specify custom bin size

Issue: Peak is cut off

  • Solution: Check Y-axis range in advanced settings

Issue: Data is discrete but the bins are continuous

  • Solution: Adjust bins to align with discrete values (e.g., integer ages)
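For integer data, aligning bins usually means one bin per integer, with edges halfway between neighboring values. A minimal sketch of that idea, with a hypothetical helper name and made-up ages:

```python
from collections import Counter


def integer_bins(values):
    """Return (edges, counts) with one bin centered on each integer value."""
    lo, hi = min(values), max(values)
    tally = Counter(values)
    # Edges sit at k - 0.5 so every integer k falls in the middle of its bin.
    edges = [k - 0.5 for k in range(lo, hi + 2)]
    counts = [tally.get(k, 0) for k in range(lo, hi + 1)]
    return edges, counts


ages = [18, 18, 19, 21, 21, 21]  # made-up integer data
edges, counts = integer_bins(ages)
print(edges)   # [17.5, 18.5, 19.5, 20.5, 21.5]
print(counts)  # [2, 1, 0, 3]
```

In the tool, a comparable result can be approached by setting Bin Size to 1, although where the first bin starts is not specified here.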
