Summary

Get a complete data profile — distributions, counts, nulls, and statistics for every column at a glance

Use me when you first open a new dataset and want to know what you're dealing with. I'm your data profile card — one view that tells you the type, completeness, cardinality, and shape of every column in your dataset. Spot missing values, outliers, and skewed distributions before you write a single line of analysis.

Overview

A summary plot is a tabular data profile that displays key descriptive information for each column in your dataset. Each row in the table represents one column (feature), showing its data type, non-null count, number of unique values, and null count. For numerical columns, it can also show basic statistics such as mean, standard deviation, minimum, and maximum when Display Statistics is switched on.

It does not require a trained model. Point it at any dataset to get an instant, structured overview.
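For readers who want to reproduce the compact view outside the tool, the following is a minimal sketch in pandas. It is not the tool's implementation; the function name `profile_columns` and the example data are hypothetical, and the output column names simply mirror the labels used on this page.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: type, non-null count, unique values, nulls."""
    return pd.DataFrame(
        {
            "Type": df.dtypes.astype(str),
            "Count": df.count(),        # non-null values per column
            "Unique": df.nunique(),     # distinct non-null values per column
            "Nulls": df.isna().sum(),   # missing values per column
        }
    )


# Hypothetical example data, not taken from any real dataset.
example = pd.DataFrame(
    {
        "age": [34, 51, None, 29],
        "city": ["Berlin", "Paris", "Berlin", None],
    }
)
profile = profile_columns(example)
print(profile)
```

Each row of `profile` corresponds to one column of `example`, which matches the one-row-per-column layout described above.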

Best used for:

Getting a first-look data profile when exploring a new dataset
Identifying columns with high null rates or no variance
Spotting numerical columns with suspicious ranges or extreme values
Checking cardinality of categorical columns before encoding
Communicating data quality issues to stakeholders
Validating that a data pipeline produced expected output

Common Use Cases

Initial Data Exploration

Profiling a freshly uploaded CSV or database export
Checking row counts and column coverage after an ETL job
Understanding the mix of numerical vs categorical features
Verifying that date or ID columns were parsed correctly

Data Quality Assessment

Finding columns with unacceptably high null rates
Detecting constant or near-constant columns (Unique = 1 or 2)
Identifying numerical columns whose Min/Max suggest bad data (e.g. negative age)
Flagging categorical columns with unexpected cardinality (too many or too few unique values)

Pre-Modelling Checklist

Confirming the target column looks plausible before training
Checking that feature ranges are reasonable for the chosen algorithm
Spotting columns that need imputation, encoding, or scaling
Documenting the dataset state before feature engineering

Stakeholder Reporting

Sharing a one-page dataset overview with non-technical audiences
Documenting data completeness as part of a data quality report
Communicating dataset size and column coverage in a project handoff

Options

Target Column

Required - Select the column that is your primary variable of interest.

Choose any numerical or categorical column. The summary always profiles every column in the dataset, but the selected target column is highlighted and placed first in the output table, making it easy to evaluate the response variable before examining the predictors.

Tip: For classification tasks pick the class label; for regression pick the continuous outcome; for data exploration pick the column you care most about.
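The effect of the target selection on column order can be sketched in pandas as follows. This only illustrates the "target first" ordering; the helper name `target_first` and the column names are hypothetical, and highlighting is not reproduced.

```python
import pandas as pd


def target_first(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return a new DataFrame with the target column moved to the front."""
    if target not in df.columns:
        raise KeyError(f"unknown target column: {target!r}")
    ordered = [target] + [c for c in df.columns if c != target]
    return df[ordered]


# Hypothetical example data.
example = pd.DataFrame(
    {"age": [34, 51], "income": [42000, 58000], "churned": [0, 1]}
)
reordered = target_first(example, "churned")
print(list(reordered.columns))
```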

Settings

Display Statistics

Optional - Off by default.

When switched on, the table expands to show the full set of descriptive statistics for numerical columns:

Statistic	Description
Count	Number of non-null values
Mean	Arithmetic average
Std	Standard deviation
Min	Smallest observed value
25%	First quartile
50%	Median
75%	Third quartile
Max	Largest observed value

For categorical columns, the quartile and mean/std cells are shown as — since they are not applicable.

Leave this off for a compact overview when you only need counts, nulls, and unique values. Turn it on when you need to evaluate distributions, spot outliers, or assess scaling requirements before modelling.
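The eight statistics in the table above are the same ones that pandas reports from `DataFrame.describe()`, so an equivalent view can be approximated as shown below. This is a sketch under assumptions: the example data is hypothetical, pandas computes the sample standard deviation (`ddof=1`) and linearly interpolated quartiles, and the tool may use different conventions.

```python
import pandas as pd

# Hypothetical example data: one numerical and one categorical column.
example = pd.DataFrame(
    {
        "price": [10.0, 12.0, 11.0, 13.0, 14.0],
        "sku": ["a", "b", "c", "d", "e"],
    }
)

# describe() keeps only numerical columns by default; transpose so that
# each row is a column of the dataset, as in the Summary table.
stats = example.describe().T

# Re-add the categorical columns; their statistics are left empty (NaN),
# which plays the role of the "not applicable" cells.
full = stats.reindex(example.columns)
print(full)
```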

Reading the Summary Table

Column Profiles at a Glance

Each row tells a story about one column:

Type — numerical columns support the full statistics block; categorical columns show count, unique, and nulls only.

Count — total non-null rows. If this is less than the dataset size, the difference is your null count.

Unique — number of distinct values.

For a numerical column: high unique count is normal; very low (1–3) may indicate a near-constant feature.
For a categorical column: this is the cardinality. Very high cardinality (thousands of unique strings) may cause problems with one-hot encoding.
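To see why cardinality matters for one-hot encoding, the sketch below estimates how many columns an encoding would produce from the Unique values alone and compares it with `pandas.get_dummies`. The helper name `one_hot_width` and the example data are hypothetical.

```python
import pandas as pd


def one_hot_width(df: pd.DataFrame, columns: list[str]) -> int:
    """Estimate the number of one-hot columns as the sum of unique counts."""
    return int(df[columns].nunique().sum())


# Hypothetical example data.
example = pd.DataFrame(
    {
        "colour": ["red", "blue", "red", "green"],
        "size": ["S", "M", "S", "S"],
    }
)
estimated = one_hot_width(example, ["colour", "size"])
actual = pd.get_dummies(example[["colour", "size"]]).shape[1]
print(estimated, actual)
```

A column with thousands of distinct strings would add thousands of columns in the same way, which is the problem the cardinality check is meant to catch early.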

Nulls — number of missing values. Even a small null count in the target column can invalidate a training run if not handled.

Mean / Std / Min / Max — only shown when Display Statistics is on.

A large gap between Mean and Median (50%) indicates skew.
Min or Max far from the IQR (25%–75%) signals outliers.
Std ≈ 0 means the column is nearly constant and likely useless as a feature.
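The three checks above can be sketched with the same statistics the table shows. The page does not define how far is "far from the IQR", so the code assumes the common 1.5 × IQR rule (Tukey fences); the function name `shape_checks`, the parameter `k`, and the example data are hypothetical.

```python
import pandas as pd


def shape_checks(values: pd.Series, k: float = 1.5) -> dict:
    """Compute simple skew, outlier, and spread indicators for one column."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    return {
        # A large positive or negative gap hints at skew.
        "mean_minus_median": float(values.mean() - median),
        # Assumption: "far from the IQR" means beyond k * IQR from the quartiles.
        "min_below_fence": bool(values.min() < q1 - k * iqr),
        "max_above_fence": bool(values.max() > q3 + k * iqr),
        # A value near zero means the column is nearly constant.
        "std": float(values.std()),
    }


# Hypothetical example: six similar values and one extreme value.
incomes = pd.Series([30, 32, 35, 31, 33, 34, 250])
result = shape_checks(incomes)
print(result)
```

In this example the single extreme value pulls the mean well above the median and lies beyond the upper fence, while the minimum stays inside the lower fence.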

Warning Signs to Look For

Pattern	What it means
Nulls > 20% of all rows (Count + Nulls)	High missingness: impute or drop
Unique = 1	Constant column: no predictive value
Unique = Count	Possible ID column: consider excluding from modelling
Min or Max far outside mean ± 3×std	Potential outlier or data error
Categorical Unique very high	High-cardinality feature: consider target encoding
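The patterns in this table can be checked programmatically. The sketch below is one possible reading of them, not the tool's logic: it measures the null rate against all rows, treats "far outside" as strictly beyond mean ± 3×std, and leaves out the high-cardinality row because the table gives no numeric threshold for it. The function name `warning_flags`, the parameter `null_rate_limit`, and the example data are hypothetical.

```python
import pandas as pd


def warning_flags(df: pd.DataFrame, null_rate_limit: float = 0.20) -> pd.DataFrame:
    """Return one row per column with a boolean flag per warning pattern."""
    count = df.count()        # non-null values per column
    unique = df.nunique()     # distinct non-null values per column
    nulls = df.isna().sum()   # missing values per column

    flags = pd.DataFrame(
        {
            # Assumption: the null rate is measured against all rows.
            "high_nulls": nulls / len(df) > null_rate_limit,
            "constant": unique == 1,
            "possible_id": (unique == count) & (count > 1),
        }
    )

    # The mean ± 3×std check only applies to numerical columns.
    numeric = df.select_dtypes("number")
    mean, std = numeric.mean(), numeric.std()
    extreme = (numeric.min() < mean - 3 * std) | (numeric.max() > mean + 3 * std)
    flags["extreme_value"] = extreme.reindex(flags.index, fill_value=False)
    return flags


# Hypothetical example data with one column per warning pattern.
example = pd.DataFrame(
    {
        "row_id": range(20),
        "country": ["DE"] * 20,
        "amount": [50.0] * 19 + [5000.0],
        "notes": [None] * 6 + ["ok", "late"] * 7,
    }
)
flags = warning_flags(example)
print(flags)
```

Note that the 3×std check needs a reasonably large column to fire at all: with only a handful of rows, a single extreme value inflates the standard deviation enough to stay inside the band.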

Tips for Effective Use

Run it first, every time. Before any visualisation, model, or transformation, run Summary to understand what you have. It prevents surprises later.
Use the Target Column selector deliberately. Placing your target column first makes it easy to check its distribution before diving into predictors.
Enable Display Statistics for numerical columns. The compact view is fine for categorical scans, but you need Min/Max/Std to assess scaling needs and spot outliers.
Pay attention to Nulls in the target column. Rows with null targets are typically dropped by training nodes. A high null count here will silently reduce your training set.
Cross-check Unique vs Count for suspicious columns. A column where Unique equals Count is often an identifier (row ID, email, UUID), and identifiers should be excluded from predictors. Continuous numerical measurements can also be all-distinct, so confirm against the column's type and meaning before dropping it.
Low Std is a red flag. A standard deviation near zero means nearly all values are identical. Such a column contributes little to a model and may cause numerical instability when features are standardised, because scaling divides by the standard deviation.
Combine with a histogram or box plot. Summary gives you the numbers; a histogram or box plot gives you the shape. Use them together on columns flagged by Summary for deeper investigation.
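The tip about nulls in the target column can be quantified before training. The sketch below assumes the training step behaves like `DataFrame.dropna` on the target column, which may not match every training node; the column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical example data with two missing target values.
example = pd.DataFrame(
    {
        "feature": [1, 2, 3, 4, 5],
        "target": [0, None, 1, None, 1],
    }
)

# Keep only rows whose target is present, as a training step might.
usable = example.dropna(subset=["target"])
dropped = len(example) - len(usable)
print(f"{dropped} of {len(example)} rows would be lost")
```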

Related Visualizations

Use together with Summary:

Histogram — visualise the distribution of a numerical column flagged in Summary
Box Plot — compare distributions and outliers across groups
Correlation — explore pairwise relationships after confirming column quality

Instead of Summary, consider:

Table — when you need to inspect individual rows rather than column profiles
Heatmap — when you want a visual representation of null patterns across the full dataset
