
Summary

Get a complete data profile — distributions, counts, nulls, and statistics for every column at a glance

Use me when you first open a new dataset and want to know what you're dealing with. I'm your data profile card — one view that tells you the type, completeness, cardinality, and shape of every column in your dataset. Spot missing values, outliers, and skewed distributions before you write a single line of analysis.

Overview

A summary plot is a tabular data profile that displays key descriptive information for each column in your dataset. Each row in the table represents one column (feature) and shows its data type, non-null count, number of unique values, and null count. With Display Statistics switched on, numerical columns also show descriptive statistics such as mean, standard deviation, minimum, and maximum.

It does not require a trained model. Point it at any dataset to get an instant, structured overview.

Best used for:

  • Getting a first-look data profile when exploring a new dataset
  • Identifying columns with high null rates or no variance
  • Spotting numerical columns with suspicious ranges or extreme values
  • Checking cardinality of categorical columns before encoding
  • Communicating data quality issues to stakeholders
  • Validating that a data pipeline produced expected output

Common Use Cases

Initial Data Exploration

  • Profiling a freshly uploaded CSV or database export
  • Checking row counts and column coverage after an ETL job
  • Understanding the mix of numerical vs categorical features
  • Verifying that date or ID columns were parsed correctly

Data Quality Assessment

  • Finding columns with unacceptably high null rates
  • Detecting constant columns (Unique = 1) and near-constant numerical columns (Std close to 0)
  • Identifying numerical columns whose Min/Max suggest bad data (e.g. negative age)
  • Flagging categorical columns with unexpected cardinality (too many or too few unique values)

Pre-Modelling Checklist

  • Confirming the target column looks plausible before training
  • Checking that feature ranges are reasonable for the chosen algorithm
  • Spotting columns that need imputation, encoding, or scaling
  • Documenting the dataset state before feature engineering

Stakeholder Reporting

  • Sharing a one-page dataset overview with non-technical audiences
  • Documenting data completeness as part of a data quality report
  • Communicating dataset size and column coverage in a project handoff

Options

Target Column

Required - Select the column that is your primary variable of interest.

Choose any numerical or categorical column. The summary always profiles every column in the dataset, but the selected target column is highlighted and placed first in the output table, making it easy to evaluate the response variable before examining the predictors.

Tip: For classification tasks pick the class label; for regression pick the continuous outcome; for data exploration pick the column you care most about.
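As a rough illustration of how the profile rows and target-first ordering fit together, here is a minimal Python sketch. It assumes the dataset is a dict mapping column names to lists of values, with `None` marking a null, and it treats a column as numerical when every non-null value is an `int` or `float`. The names `column_type` and `profile_columns` are hypothetical; the tool's own type detection may differ.

```python
from numbers import Real


def column_type(values):
    """Numerical if every non-null value is a real number (bools excluded), else categorical."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, Real) and not isinstance(v, bool) for v in present):
        return "numerical"
    return "categorical"


def profile_columns(data, target):
    """Return one profile row per column, with the target column placed first."""
    if target not in data:
        raise KeyError(f"target column {target!r} not found")  # the target is required
    order = [target] + [name for name in data if name != target]
    rows = []
    for name in order:
        values = data[name]
        present = [v for v in values if v is not None]
        rows.append({
            "column": name,
            "type": column_type(values),
            "count": len(present),                # non-null values only
            "unique": len(set(present)),          # distinct non-null values
            "nulls": len(values) - len(present),  # missing values
        })
    return rows


data = {
    "id": [1, 2, 3, 4],
    "age": [34, None, 51, 29],
    "churned": ["no", "yes", "no", None],
}
# The first row printed is the target: churned, categorical, count 3, unique 2, nulls 1
for row in profile_columns(data, target="churned"):
    print(row)
```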

Settings

Display Statistics

Optional — Off by default.

When switched on, the table expands to show the full set of descriptive statistics for numerical columns:

Statistic   Description
Count       Number of non-null values
Mean        Arithmetic average
Std         Standard deviation
Min         Smallest observed value
25%         First quartile
50%         Median
75%         Third quartile
Max         Largest observed value

For categorical columns, the mean, std, and quartile cells are shown as placeholders because those statistics do not apply.
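The statistics block for a single numerical column can be sketched with Python's standard `statistics` module. This is an illustration under assumptions, not the tool's code: it uses the sample standard deviation and linear-interpolation quartiles (`method="inclusive"`), the same conventions pandas `DataFrame.describe()` uses by default, but the tool may compute them differently. `describe_numeric` is a hypothetical name, and the column needs at least two non-null values.

```python
import statistics


def describe_numeric(values):
    """Display Statistics block for one numerical column; nulls (None) are skipped."""
    present = sorted(v for v in values if v is not None)
    if len(present) < 2:
        raise ValueError("need at least two non-null values")
    # Linear interpolation between order statistics (pandas' default quantile method)
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(present),              # non-null values only
        "mean": statistics.fmean(present),
        "std": statistics.stdev(present),   # sample std (n - 1 denominator)
        "min": present[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": present[-1],
    }


# count 5, mean 4.0, quartiles 2.0 / 3.0 / 4.0, min 1, max 10
print(describe_numeric([4, None, 1, 3, 2, 10]))
```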

Leave this off for a compact overview when you only need counts, nulls, and unique values. Turn it on when you need to evaluate distributions, spot outliers, or assess scaling requirements before modelling.

Reading the Summary Table

Column Profiles at a Glance

Each row tells a story about one column:

Type: numerical columns support the full statistics block; categorical columns show count, unique, and nulls only.

Count — total non-null rows. If this is less than the dataset size, the difference is your null count.

Unique — number of distinct values.

  • For a numerical column: high unique count is normal; very low (1–3) may indicate a near-constant feature.
  • For a categorical column: this is the cardinality. Very high cardinality (thousands of unique strings) may cause problems with one-hot encoding.

Nulls — number of missing values. Even a small null count in the target column can invalidate a training run if not handled.
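If training drops rows with a null target (typical, though the exact behaviour depends on the training node), you can preview how the Nulls figure shrinks the training set with a small sketch. `drop_null_targets` is a hypothetical helper that assumes a dict of equal-length column lists with `None` as null.

```python
def drop_null_targets(data, target):
    """Return a copy of the dataset keeping only rows whose target value is not None."""
    keep = [i for i, value in enumerate(data[target]) if value is not None]
    return {name: [values[i] for i in keep] for name, values in data.items()}


data = {"x": [1, 2, 3, 4], "y": ["a", None, "b", None]}
clean = drop_null_targets(data, target="y")
print(len(data["y"]), "->", len(clean["y"]), "rows")  # 4 -> 2 rows
```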

Mean / Std / Min / Max — only shown when Display Statistics is on.

  • A large gap between Mean and Median (50%) indicates skew.
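  • Min or Max far beyond the quartiles signals outliers. A common rule of thumb (Tukey's fences) flags values more than 1.5 × IQR below the 25% value or above the 75% value.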
  • Std ≈ 0 means the column is nearly constant and likely useless as a feature.
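These reading rules can be turned into simple checks. The sketch below is a hedged illustration, not the tool's logic: it measures skew with Pearson's second skewness coefficient, 3 × (mean − median) / Std, and uses Tukey's fences (1.5 × IQR beyond the quartiles) for outliers. The thresholds `skew_limit=0.5` and `std_eps=1e-9` are arbitrary assumptions, and `interpret_stats` is a hypothetical name that expects a dict keyed like the statistics table.

```python
def interpret_stats(stats, skew_limit=0.5, std_eps=1e-9):
    """Return reading notes for one column's statistics block (thresholds are assumptions)."""
    if stats["std"] <= std_eps:
        return ["near-constant: Std is approximately 0"]
    notes = []
    # Pearson's second skewness coefficient: 3 * (mean - median) / std
    skew = 3 * (stats["mean"] - stats["50%"]) / stats["std"]
    if abs(skew) > skew_limit:
        notes.append(f"skewed (Pearson coefficient {skew:.2f})")
    # Tukey's fences: 1.5 x IQR below the first quartile or above the third
    iqr = stats["75%"] - stats["25%"]
    low, high = stats["25%"] - 1.5 * iqr, stats["75%"] + 1.5 * iqr
    if stats["min"] < low or stats["max"] > high:
        notes.append(f"possible outliers outside [{low}, {high}]")
    return notes


stats = {"count": 5, "mean": 4.0, "std": 12.5 ** 0.5, "min": 1,
         "25%": 2.0, "50%": 3.0, "75%": 4.0, "max": 10}
# ['skewed (Pearson coefficient 0.85)', 'possible outliers outside [-1.0, 7.0]']
print(interpret_stats(stats))
```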

Warning Signs to Look For

Pattern                                     What it means
Nulls > 20% of all rows (Count + Nulls)     High missingness: impute or drop
Unique = 1                                  Constant column: no predictive value
Unique = Count (integer or text column)     Possible ID column: exclude from modelling
Min or Max outside mean ± 3 × Std           Potential outlier or data error
Categorical column with very high Unique    High-cardinality feature: consider target encoding
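Here is a minimal sketch of the warning-sign table as code, assuming the same dict-of-lists layout with `None` for nulls. `warning_signs` is hypothetical. The 20% and mean ± 3 × Std rules come from the table; measuring missingness against all rows (Count + Nulls), treating only integer and text columns as ID candidates, using the sample Std, and the `high_cardinality=50` cut-off are illustrative assumptions.

```python
import statistics
from numbers import Integral, Real


def warning_signs(values, high_cardinality=50):
    """Flag one column against the warning-sign table; thresholds are illustrative assumptions."""
    present = [v for v in values if v is not None]
    count, nulls, unique = len(present), len(values) - len(present), len(set(present))
    numerical = bool(present) and all(
        isinstance(v, Real) and not isinstance(v, bool) for v in present)
    flags = []
    if values and nulls / len(values) > 0.20:      # missingness measured against all rows
        flags.append("high missingness")
    if unique == 1:
        flags.append("constant column")
    # All-distinct values suggest an identifier only for integer or text columns;
    # continuous floats are routinely all distinct.
    id_like = all(isinstance(v, (Integral, str)) and not isinstance(v, bool) for v in present)
    if count > 1 and unique == count and id_like:
        flags.append("possible ID column")
    if numerical and count > 1:
        mean, std = statistics.fmean(present), statistics.stdev(present)
        if std > 0 and (min(present) < mean - 3 * std or max(present) > mean + 3 * std):
            flags.append("potential outlier or data error")
    if not numerical and unique > high_cardinality:
        flags.append("high-cardinality categorical")
    return flags


print(warning_signs([101, 102, 103, None]))  # ['high missingness', 'possible ID column']
print(warning_signs([1.5, 2.25, 3.75]))      # []
```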

Tips for Effective Use

  1. Run it first, every time. Before any visualisation, model, or transformation, run Summary to understand what you have. It prevents surprises later.

  2. Use the Target Column selector deliberately. Placing your target column first makes it easy to check its distribution before diving into predictors.

  3. Enable Display Statistics for numerical columns. The compact view is fine for categorical scans, but you need Min/Max/Std to assess scaling needs and spot outliers.

  4. Pay attention to Nulls in the target column. Rows with null targets are typically dropped by training nodes. A high null count here will silently reduce your training set.

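  5. Cross-check Unique vs Count for suspicious columns. An integer or text column where Unique equals Count is very likely an identifier (row ID, email, UUID) and should be excluded from predictors. Continuous numerical columns often have all-distinct values, so this check does not apply to them.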

  6. Low Std is a red flag. A standard deviation near zero means nearly all values are identical. Such a column contributes little or nothing to a model, and standardizing it divides by that near-zero Std, which can cause numerical instability in scale-sensitive methods such as distance-based algorithms.

  7. Combine with a histogram or box plot. Summary gives you the numbers; a histogram or box plot gives you the shape. Use them together on columns flagged by Summary for deeper investigation.
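To see why a near-zero Std is risky, consider z-score standardization, (x − mean) / Std, a common preprocessing step for distance-based algorithms. In this hypothetical `standardize` sketch, a difference of 1e-12 (which may be nothing more than rounding noise) is stretched to roughly unit scale, and a truly constant column cannot be standardized at all.

```python
import statistics


def standardize(values):
    """Z-score a numerical column: (x - mean) / std, using the sample std."""
    mean, std = statistics.fmean(values), statistics.stdev(values)
    if std == 0:
        raise ValueError("constant column: Std is 0, so z-scores are undefined")
    return [(v - mean) / std for v in values]


# Three values that differ only by 1e-12 become roughly -0.58, -0.58 and 1.15
print(standardize([1.0, 1.0, 1.0 + 1e-12]))
```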

Use together with Summary:

  • Histogram — visualise the distribution of a numerical column flagged in Summary
  • Box Plot — compare distributions and outliers across groups
  • Correlation — explore pairwise relationships after confirming column quality

Instead of Summary, consider:

  • Table — when you need to inspect individual rows rather than column profiles
  • Heatmap — when you want a visual representation of null patterns across the full dataset
