Correlation
Visualize correlations between multiple numerical variables
Use me when you want to see all your variables' relationships at once, in one beautiful grid. I'll show you which variables are best friends (high positive correlation), which are enemies (negative correlation), and which ignore each other (near zero). Essential for data exploration - I'll reveal the hidden connections in your dataset like a relationship therapist for numbers.
Overview
A correlation plot (correlation matrix) displays the correlation coefficients between multiple numerical variables in a heatmap format. Each cell shows how strongly two variables are related, with values ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation). Color intensity makes patterns immediately visible, helping identify relationships, redundancies, and potential predictors.
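Outside this tool, the same matrix can be computed directly with pandas. A minimal sketch using a small synthetic dataset (column names and data are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: three numerical variables
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),  # strongly related to x
    "z": rng.normal(size=200),                     # independent noise
})

corr = df.corr()  # Pearson by default; returns an N x N symmetric matrix
print(corr.round(2))
```

Each entry `corr.loc[a, b]` is the Pearson coefficient between columns `a` and `b`; the diagonal is always 1.0.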
Best used for:
- Exploring relationships between multiple variables
- Feature selection for machine learning
- Identifying multicollinearity in regression
- Understanding dataset structure
- Finding redundant or highly correlated features
- Detecting unexpected relationships in data
Common Use Cases
Data Science & Machine Learning
- Feature selection and engineering
- Multicollinearity detection
- Variable redundancy analysis
- Understanding feature relationships
- Identifying potential predictors
Statistical Analysis
- Exploratory data analysis (EDA)
- Understanding variable dependencies
- Hypothesis generation
- Data validation and quality checks
- Relationship strength assessment
Business Analytics
- Customer behavior patterns
- Product affinity analysis
- KPI relationship analysis
- Marketing channel effectiveness
- Sales driver identification
Options
Columns of Interest
Required - Select numerical columns to analyze.
Choose 2 or more numerical columns to calculate correlations between all pairs. The plot will show an N×N matrix where N is the number of selected columns.
Settings
Annotate Segments With Value
Optional - Display correlation values in each cell.
When enabled, shows the numerical correlation coefficient in each cell, making exact values easy to read.
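In plotting libraries such as seaborn, the equivalent is passing `annot=True` to `heatmap`. As a text-only sketch of the same idea, rounding the matrix to two decimals gives the readable per-cell values (synthetic data, illustrative names):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# Rounding to two decimals mirrors the in-cell value annotations
annotated = df.corr().round(2)
print(annotated)
```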
Understanding Correlation Values
Correlation Coefficient Range
- +1.0: Perfect positive correlation
- +0.7 to +1.0: Strong positive correlation
- +0.4 to +0.7: Moderate positive correlation
- +0.1 to +0.4: Weak positive correlation
- -0.1 to +0.1: Negligible (effectively no correlation)
- -0.4 to -0.1: Weak negative correlation
- -0.7 to -0.4: Moderate negative correlation
- -1.0 to -0.7: Strong negative correlation
- -1.0: Perfect negative correlation
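These bands can be encoded in a small helper. The thresholds below follow the table above; note they are a common convention, not a formal standard:

```python
def describe_correlation(r: float) -> str:
    """Map a correlation coefficient to a conventional strength label."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    sign = "positive" if r > 0 else "negative"
    a = abs(r)
    if a == 1.0:
        return f"perfect {sign}"
    if a >= 0.7:
        return f"strong {sign}"
    if a >= 0.4:
        return f"moderate {sign}"
    if a >= 0.1:
        return f"weak {sign}"
    return "negligible"

print(describe_correlation(0.85))   # strong positive
print(describe_correlation(-0.25))  # weak negative
```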
Positive Correlation
Variables move together in the same direction:
- When one increases, the other tends to increase
- Example: Height and weight
- Example: Study time and test scores
Negative Correlation
Variables move in opposite directions:
- When one increases, the other tends to decrease
- Example: Speed and travel time
- Example: Price and demand
No Correlation
Variables have no linear relationship:
- Changes in one don't predict changes in the other
- Value near 0
- May still have non-linear relationships
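The last point is worth demonstrating: a variable can depend entirely on another yet show near-zero Pearson correlation. A quick numpy check with a symmetric quadratic relationship:

```python
import numpy as np

# y is a deterministic function of x, but the relationship is
# symmetric rather than linear, so Pearson r is essentially zero.
x = np.linspace(-1, 1, 201)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))
```

Always check scatter plots before concluding two variables are unrelated.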
Interpreting the Correlation Matrix
Diagonal
- Always shows 1.0 (perfect self-correlation)
- Each variable is perfectly correlated with itself
- Usually displayed in a distinct color
Symmetry
- Matrix is symmetric across diagonal
- Correlation(A, B) = Correlation(B, A)
- Only need to examine one triangle
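Because the matrix is symmetric, many plots blank out the redundant half. A sketch of masking the upper triangle with numpy (synthetic data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
corr = df.corr()

# Correlation(A, B) == Correlation(B, A), so the upper triangle is
# redundant; keep only the lower triangle (diagonal included).
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
lower = corr.mask(mask)
print(lower.round(2))
```

The same boolean mask can be passed to seaborn's `heatmap(..., mask=mask)` to hide those cells in the plot.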
Color Intensity
- Darker/brighter colors indicate stronger correlations
- Near-zero correlations appear as neutral color
- Look for patterns and clusters
Tips for Effective Correlation Analysis
Variable Selection:
- Include relevant numerical variables
- Remove variables with no variance
- Standardizing beforehand is optional: Pearson correlation is scale-invariant, so differing units do not change the coefficients
- Limit to 15-20 variables for readability
Interpretation Guidelines:
- Correlation ≠ causation
- Only measures linear relationships
- Outliers can distort correlations
- Consider sample size and significance
Multicollinearity Detection:
- Look for high correlations (> 0.8 or < -0.8)
- In regression, drop one predictor from each highly correlated pair
- Or use dimensionality reduction (PCA)
- Keep variables with different information
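Flagging such pairs is easy to script. A sketch that scans the upper triangle for coefficients above a threshold (synthetic data with a deliberately near-duplicate column):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
a = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "a_copyish": a + rng.normal(scale=0.1, size=300),  # near-duplicate of a
    "b": rng.normal(size=300),
})

corr = df.corr()
threshold = 0.8
# Scan the upper triangle only, since the matrix is symmetric
pairs = [
    (corr.columns[i], corr.columns[j], corr.iloc[i, j])
    for i in range(len(corr.columns))
    for j in range(i + 1, len(corr.columns))
    if abs(corr.iloc[i, j]) > threshold
]
print(pairs)
```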
Feature Selection:
- Identify features highly correlated with target
- Remove redundant features (highly correlated with each other)
- Balance between information and collinearity
- Consider domain knowledge
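Ranking features by their absolute correlation with the target is a common first pass. A sketch with a synthetic target built from two of the features:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "noise": rng.normal(size=n),
    "target": 3 * x1 + x2 + rng.normal(size=n),
})

# Rank features by absolute correlation with the target
target_corr = df.corr()["target"].drop("target").abs().sort_values(ascending=False)
print(target_corr.round(2))
```

This is only a screening step: it misses non-linear effects and interactions, so combine it with domain knowledge.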
Data Quality Checks:
- Unexpected correlations may indicate data issues
- Very low correlations everywhere may suggest problems
- Check for data entry errors or wrong units
- Verify variable relationships make sense
Visualization Tips:
- Enable value annotations for precise reading
- Use color scale appropriate for audience
- Consider showing only lower/upper triangle
- Reorder variables to group related ones
Common Correlation Patterns
Strong Positive Clusters
Groups of variables all positively correlated - may indicate redundancy.
Mixed Relationships
Complex pattern of positive and negative correlations.
Block Diagonal Pattern
Distinct groups of correlated variables with weak between-group correlation.
Target Correlation
One row/column showing which features correlate with target variable.
Statistical Considerations
Sample Size
- Small samples (<30): Correlations unreliable
- Medium samples (30-100): Use with caution
- Large samples (>100): More reliable estimates
- Very large samples: Even tiny correlations may be "significant"
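The last point follows from the standard significance test for Pearson's r, which uses the t statistic t = r·√((n−2)/(1−r²)) under a bivariate-normality assumption. The same weak correlation can be non-significant or highly "significant" depending only on n:

```python
import numpy as np

def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for testing r != 0 (assumes bivariate normality)."""
    return r * np.sqrt((n - 2) / (1 - r ** 2))

# The same weak r = 0.1 at two very different sample sizes:
print(round(correlation_t_stat(0.1, 30), 2))      # well below the ~2 critical value
print(round(correlation_t_stat(0.1, 10_000), 2))  # far above it
```

With very large samples, focus on whether the effect size (the r value itself) is practically meaningful, not just statistically significant.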
Assumptions
- Linear relationship between variables
- Continuous or ordinal numerical data
- No extreme outliers
- Bivariate normal distribution (for significance tests)
Limitations
- Only detects linear relationships
- Outliers can heavily influence results
- Does not imply causation
- May miss non-linear relationships
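When a relationship is monotonic but non-linear, Spearman (rank-based) correlation is a useful complement; pandas supports it via the `method` argument. A sketch contrasting the two on an exponential relationship:

```python
import numpy as np
import pandas as pd

# Monotonic but strongly non-linear: Pearson understates the
# relationship, Spearman captures it fully (ranks agree exactly).
x = np.linspace(0.1, 10, 100)
y = np.exp(x)

s = pd.Series(x)
pearson = s.corr(pd.Series(y), method="pearson")
spearman = s.corr(pd.Series(y), method="spearman")
print(round(pearson, 3), round(spearman, 3))
```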
Related Analyses
After Correlation Analysis
- Scatter plots - Visualize specific pairwise relationships
- Partial correlation - Remove effect of confounding variables
- Regression analysis - Model relationships
- Principal Component Analysis - Reduce dimensions
- Cluster analysis - Group correlated variables
Example Scenarios
Machine Learning Feature Selection
Identify features correlated with target and remove redundant features.
Financial Data Analysis
Find relationships between economic indicators.
Healthcare Data
Understand relationships between patient measurements and outcomes.
Marketing Analytics
Identify which metrics move together for campaign optimization.
Troubleshooting
Issue: All correlations are very weak (near 0)
- Solution: Variables may truly be independent, or relationships may be non-linear. Check scatter plots. Verify data quality and that variables are numerical.
Issue: Correlation matrix is not symmetric
- Solution: This should not happen; the matrix is symmetric by definition. Check the data and report it as a bug.
Issue: Perfect correlations (1.0) off the diagonal
- Solution: Two variables are perfectly linearly related - one is redundant or they're measuring the same thing. Remove one.
Issue: Cannot see color differences
- Solution: Enable "Annotate Segments With Value" to see exact numbers. Consider different color scale.
Issue: Too many variables to read
- Solution: Reduce number of columns (limit to 10-15), or create multiple correlation matrices for subsets of variables.
Issue: Unexpected correlations appear
- Solution: May indicate data issues, confounding variables, or interesting relationships. Investigate with scatter plots and domain expertise.
Issue: Negative correlation where positive expected (or vice versa)
- Solution: Verify variable coding (e.g., satisfaction coded as 1=bad, 5=good). Check for data entry errors.
Issue: Very strong correlations everywhere
- Solution: Variables may be measuring similar constructs. Check if variables need to be transformed or if multicollinearity is present.