Correlation
Visualize correlations between multiple numerical variables
Use me when you want to see all your variables' relationships at once, in one beautiful grid. I'll show you which variables are best friends (high positive correlation), which are enemies (negative correlation), and which ignore each other (near zero). Essential for data exploration - I'll reveal the hidden connections in your dataset like a relationship therapist for numbers.
Overview
A correlation plot (correlation matrix) displays the correlation coefficients between multiple numerical variables in a heatmap format. Each cell shows how strongly two variables are related, with values ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation). Color intensity makes patterns immediately visible, helping identify relationships, redundancies, and potential predictors.
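Outside this tool, the same matrix can be computed directly with pandas. A minimal sketch using a small synthetic dataset (column names and data are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: three numerical variables
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),  # strongly related to x
    "z": rng.normal(size=200),                     # independent noise
})

corr = df.corr()  # Pearson by default; returns an N x N symmetric matrix
print(corr.round(2))
```

Each entry `corr.loc[a, b]` is the Pearson coefficient between columns `a` and `b`; the diagonal is always 1.0.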
Best used for:
- Exploring relationships between multiple variables
- Feature selection for machine learning
- Identifying multicollinearity in regression
- Understanding dataset structure
- Finding redundant or highly correlated features
- Detecting unexpected relationships in data
Common Use Cases
Data Science & Machine Learning
- Feature selection and engineering
- Multicollinearity detection
- Variable redundancy analysis
- Understanding feature relationships
- Identifying potential predictors
Statistical Analysis
- Exploratory data analysis (EDA)
- Understanding variable dependencies
- Hypothesis generation
- Data validation and quality checks
- Relationship strength assessment
Business Analytics
- Customer behavior patterns
- Product affinity analysis
- KPI relationship analysis
- Marketing channel effectiveness
- Sales driver identification
Options
Columns of Interest
Required - Select numerical columns to analyze.
Choose 2 or more numerical columns to calculate correlations between all pairs. The plot will show an N×N matrix where N is the number of selected columns.
Settings
Annotate Segments With Value
Optional - Display correlation values in each cell.
When enabled, shows the numerical correlation coefficient in each cell, making exact values easy to read.
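In plotting libraries such as seaborn, the equivalent is passing `annot=True` to `heatmap`. As a text-only sketch of the same idea, rounding the matrix to two decimals gives the readable per-cell values (synthetic data, illustrative names):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# Rounding to two decimals mirrors the in-cell value annotations
annotated = df.corr().round(2)
print(annotated)
```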
Understanding Correlation Values
Correlation Coefficient Range
- +1.0: Perfect positive correlation
- +0.7 to +1.0: Strong positive correlation
- +0.4 to +0.7: Moderate positive correlation
- +0.1 to +0.4: Weak positive correlation
- -0.1 to +0.1: Negligible (effectively no correlation)
- -0.4 to -0.1: Weak negative correlation
- -0.7 to -0.4: Moderate negative correlation
- -1.0 to -0.7: Strong negative correlation
- -1.0: Perfect negative correlation
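These bands can be encoded in a small helper. The thresholds below follow the table above; note they are a common convention, not a formal standard:

```python
def describe_correlation(r: float) -> str:
    """Map a correlation coefficient to a conventional strength label."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    sign = "positive" if r > 0 else "negative"
    a = abs(r)
    if a == 1.0:
        return f"perfect {sign}"
    if a >= 0.7:
        return f"strong {sign}"
    if a >= 0.4:
        return f"moderate {sign}"
    if a >= 0.1:
        return f"weak {sign}"
    return "negligible"

print(describe_correlation(0.85))   # strong positive
print(describe_correlation(-0.25))  # weak negative
```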
Positive Correlation
Variables move together in the same direction:
- When one increases, the other tends to increase
- Example: Height and weight
- Example: Study time and test scores
Negative Correlation
Variables move in opposite directions:
- When one increases, the other tends to decrease
- Example: Speed and travel time
- Example: Price and demand
No Correlation
Variables have no linear relationship:
- Changes in one don't predict changes in the other
- Value near 0
- May still have non-linear relationships
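The last point is worth demonstrating: a variable can depend entirely on another yet show near-zero Pearson correlation. A quick numpy check with a symmetric quadratic relationship:

```python
import numpy as np

# y is a deterministic function of x, but the relationship is
# symmetric rather than linear, so Pearson r is essentially zero.
x = np.linspace(-1, 1, 201)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))
```

Always check scatter plots before concluding two variables are unrelated.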
Interpreting the Correlation Matrix
Diagonal
- Always shows 1.0 (perfect self-correlation)
- Each variable is perfectly correlated with itself
- Usually displayed in a distinct color
Symmetry
- Matrix is symmetric across diagonal
- Correlation(A, B) = Correlation(B, A)
- Only need to examine one triangle
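Because the matrix is symmetric, many plots blank out the redundant half. A sketch of masking the upper triangle with numpy (synthetic data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
corr = df.corr()

# Correlation(A, B) == Correlation(B, A), so the upper triangle is
# redundant; keep only the lower triangle (diagonal included).
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
lower = corr.mask(mask)
print(lower.round(2))
```

The same boolean mask can be passed to seaborn's `heatmap(..., mask=mask)` to hide those cells in the plot.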
Color Intensity
- Darker/brighter colors indicate stronger correlations
- Near-zero correlations appear as neutral color
- Look for patterns and clusters
Tips for Effective Correlation Analysis
Variable Selection:
- Include relevant numerical variables
- Remove variables with no variance
- Standardizing beforehand is optional: Pearson correlation is scale-invariant, so differing units do not change the coefficients
- Limit to 15-20 variables for readability
Interpretation Guidelines:
- Correlation ≠ causation
- Only measures linear relationships
- Outliers can distort correlations
- Consider sample size and significance
Multicollinearity Detection:
- Look for high correlations (> 0.8 or < -0.8)
- In regression, drop one predictor from each highly correlated pair
- Or use dimensionality reduction (PCA)
- Keep variables with different information
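Flagging such pairs is easy to script. A sketch that scans the upper triangle for coefficients above a threshold (synthetic data with a deliberately near-duplicate column):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
a = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "a_copyish": a + rng.normal(scale=0.1, size=300),  # near-duplicate of a
    "b": rng.normal(size=300),
})

corr = df.corr()
threshold = 0.8
# Scan the upper triangle only, since the matrix is symmetric
pairs = [
    (corr.columns[i], corr.columns[j], corr.iloc[i, j])
    for i in range(len(corr.columns))
    for j in range(i + 1, len(corr.columns))
    if abs(corr.iloc[i, j]) > threshold
]
print(pairs)
```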
Feature Selection:
- Identify features highly correlated with target
- Remove redundant features (highly correlated with each other)
- Balance between information and collinearity
- Consider domain knowledge
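Ranking features by their absolute correlation with the target is a common first pass. A sketch with a synthetic target built from two of the features:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "noise": rng.normal(size=n),
    "target": 3 * x1 + x2 + rng.normal(size=n),
})

# Rank features by absolute correlation with the target
target_corr = df.corr()["target"].drop("target").abs().sort_values(ascending=False)
print(target_corr.round(2))
```

This is only a screening step: it misses non-linear effects and interactions, so combine it with domain knowledge.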
Data Quality Checks:
- Unexpected correlations may indicate data issues
- Very low correlations everywhere may suggest problems
- Check for data entry errors or wrong units
- Verify variable relationships make sense
Visualization Tips:
- Enable value annotations for precise reading
- Use color scale appropriate for audience
- Consider showing only lower/upper triangle
- Reorder variables to group related ones
Common Correlation Patterns
Strong Positive Clusters
Groups of variables all positively correlated - may indicate redundancy.
Mixed Relationships
Complex pattern of positive and negative correlations.
Block Diagonal Pattern
Distinct groups of correlated variables with weak between-group correlation.
Target Correlation
One row/column showing which features correlate with target variable.
Statistical Considerations
Sample Size
- Small samples (<30): Correlations unreliable
- Medium samples (30-100): Use with caution
- Large samples (>100): More reliable estimates
- Very large samples: Even tiny correlations may be "significant"
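The last point follows from the standard significance test for Pearson's r, which uses the t statistic t = r·√((n−2)/(1−r²)) under a bivariate-normality assumption. The same weak correlation can be non-significant or highly "significant" depending only on n:

```python
import numpy as np

def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for testing r != 0 (assumes bivariate normality)."""
    return r * np.sqrt((n - 2) / (1 - r ** 2))

# The same weak r = 0.1 at two very different sample sizes:
print(round(correlation_t_stat(0.1, 30), 2))      # well below the ~2 critical value
print(round(correlation_t_stat(0.1, 10_000), 2))  # far above it
```

With very large samples, focus on whether the effect size (the r value itself) is practically meaningful, not just statistically significant.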
Assumptions
- Linear relationship between variables
- Continuous or ordinal numerical data
- No extreme outliers
- Bivariate normal distribution (for significance tests)
Limitations
- Only detects linear relationships
- Outliers can heavily influence results
- Does not imply causation
- May miss non-linear relationships
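When a relationship is monotonic but non-linear, Spearman (rank-based) correlation is a useful complement; pandas supports it via the `method` argument. A sketch contrasting the two on an exponential relationship:

```python
import numpy as np
import pandas as pd

# Monotonic but strongly non-linear: Pearson understates the
# relationship, Spearman captures it fully (ranks agree exactly).
x = np.linspace(0.1, 10, 100)
y = np.exp(x)

s = pd.Series(x)
pearson = s.corr(pd.Series(y), method="pearson")
spearman = s.corr(pd.Series(y), method="spearman")
print(round(pearson, 3), round(spearman, 3))
```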
Related Analyses
After Correlation Analysis
- Scatter plots - Visualize specific pairwise relationships
- Partial correlation - Remove effect of confounding variables
- Regression analysis - Model relationships
- Principal Component Analysis - Reduce dimensions
- Cluster analysis - Group correlated variables
Example Scenarios
Machine Learning Feature Selection
Identify features correlated with target and remove redundant features.
Financial Data Analysis
Find relationships between economic indicators.
Healthcare Data
Understand relationships between patient measurements and outcomes.
Marketing Analytics
Identify which metrics move together for campaign optimization.
Troubleshooting
Issue: All correlations are very weak (near 0)
- Solution: Variables may truly be independent, or relationships may be non-linear. Check scatter plots. Verify data quality and that variables are numerical.
Issue: Correlation matrix is not symmetric
- Solution: This should not happen; the matrix is symmetric by definition. Check the data and report it as a bug.
Issue: Perfect correlations (1.0) off the diagonal
- Solution: Two variables are perfectly linearly related - one is redundant or they're measuring the same thing. Remove one.
Issue: Cannot see color differences
- Solution: Enable "Annotate Segments With Value" to see exact numbers. Consider different color scale.
Issue: Too many variables to read
- Solution: Reduce number of columns (limit to 10-15), or create multiple correlation matrices for subsets of variables.
Issue: Unexpected correlations appear
- Solution: May indicate data issues, confounding variables, or interesting relationships. Investigate with scatter plots and domain expertise.
Issue: Negative correlation where positive expected (or vice versa)
- Solution: Verify variable coding (e.g., satisfaction coded as 1=bad, 5=good). Check for data entry errors.
Issue: Very strong correlations everywhere
- Solution: Variables may be measuring similar constructs. Check if variables need to be transformed or if multicollinearity is present.