Correlations
Pairwise Pearson correlation analysis between numerical columns
Use me when you need to measure exactly how strongly your numerical variables are related — and in which direction. I'll compute pairwise Pearson correlation coefficients across every column pair you choose, then lay them out in a color-coded matrix so you can instantly spot which variables climb together, which pull against each other, and which are strangers.
Overview
The Correlations plot computes pairwise linear (Pearson) correlation coefficients between selected numerical columns and renders them as a symmetric heatmap. Each cell holds a value from -1 to +1: values near +1 mean the two variables rise and fall together; values near -1 mean they move in opposite directions; values near 0 indicate no linear relationship. A diverging blue-white-red colorscale maps those extremes to color so patterns stand out at a glance.
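Conceptually, the computation behind the plot is a pairwise Pearson correlation over the selected columns. A minimal sketch using pandas (the column names and data here are invented for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "a": x,
    "b": x + rng.normal(scale=0.1, size=200),   # rises with a -> r near +1
    "c": -x + rng.normal(scale=0.1, size=200),  # falls as a rises -> r near -1
    "d": rng.normal(size=200),                  # independent -> r near 0
})
corr = df.corr(method="pearson")  # symmetric N x N matrix, values in [-1, +1]
print(corr.round(2))
```

The resulting matrix is what the heatmap colors: `corr.loc["a", "b"]` lands near +1 (red), `corr.loc["a", "c"]` near -1 (blue), and `corr.loc["a", "d"]` near 0 (white).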
Best used for:
- Quickly scanning all pairwise relationships across many numerical columns at once
- Identifying strong predictors of a target variable
- Detecting multicollinearity before building regression or ML models
- Flagging redundant features that carry duplicate information
- Generating hypotheses for deeper pairwise investigation
- Validating that expected relationships (or the absence of them) hold in your data
Common Use Cases
Data Science & Machine Learning
- Feature selection: find columns most correlated with your target variable
- Multicollinearity detection before linear regression or logistic regression
- Pruning redundant features from high-dimensional datasets
- Understanding which inputs a model is likely to conflate
Statistical Analysis
- Exploratory data analysis (EDA) as a first pass over a new dataset
- Hypothesis generation — strong correlations raise questions worth testing
- Data quality checks — suspiciously perfect correlations may signal copy-paste errors
- Confirming that theoretically independent variables are in fact uncorrelated
Healthcare & Life Sciences
- Understanding relationships between lab values and clinical outcomes
- Identifying biomarkers that co-vary with disease progression
- Flagging redundant measurements in a clinical panel
Business Analytics
- Discovering which KPIs move together across time periods
- Identifying product or channel affinities from behavioral data
- Validating that marketing spend and revenue are positively linked
Options
Columns of Interest
Optional — Select which numerical columns to include in the analysis.
When left empty, all numerical columns in the dataset are used. Selecting a subset focuses the matrix on the variables you care about and improves readability. Choose two or more numerical columns; the plot will produce an N×N matrix where N is the number of selected columns.
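The column-selection behavior can be mimicked in pandas: leaving the option empty corresponds to taking every numerical column, while picking a subset just restricts the frame before correlating. A sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "age":   [34, 45, 29, 61, 52],
    "score": [88.0, 72.5, 95.1, 60.3, 70.0],
    "group": ["a", "b", "a", "b", "a"],   # non-numeric: never part of the matrix
})
# "Columns of Interest" left empty -> all numerical columns:
numeric = df.select_dtypes(include="number")
corr = numeric.corr()                     # N x N, here N = 2
# Explicitly selecting a subset gives the same shape of result:
subset = df[["age", "score"]].corr()
```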
Settings
Annotate Segments With Value
Optional — Off by default.
When switched on, the Pearson correlation coefficient is printed inside every cell of the matrix. This is especially useful when:
- Color differences between nearby values (e.g. 0.62 vs 0.71) are hard to distinguish visually
- You need exact numbers for a report or presentation
- You are working with a small matrix (≤ 8 columns) where text fits comfortably
For large matrices with many columns, leave this off to keep the chart readable and rely on the color gradient for a high-level overview.
Understanding Correlation Values
The Pearson Coefficient
The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two numerical variables. It always falls in the range -1 to +1:
| Range | Interpretation |
|---|---|
| +0.8 to +1.0 | Strong positive correlation |
| +0.5 to +0.8 | Moderate positive correlation |
| +0.2 to +0.5 | Weak positive correlation |
| -0.2 to +0.2 | Little or no linear correlation |
| -0.5 to -0.2 | Weak negative correlation |
| -0.8 to -0.5 | Moderate negative correlation |
| -1.0 to -0.8 | Strong negative correlation |
Positive Correlation (red cells)
Both variables tend to increase together.
- Example: Albumin and N_Days — patients with higher albumin levels tend to survive longer.
- On the chart: warm red color, value closer to +1.
Negative Correlation (blue cells)
One variable increases as the other decreases.
- Example: Bilirubin and N_Days — higher bilirubin is associated with shorter survival.
- On the chart: cool blue color, value closer to -1.
No Linear Correlation (white/near-white cells)
No consistent linear trend between the two variables.
- Example: Alk_Phos and Platelets in the sample above (r = -0.06).
- Note: a near-zero Pearson r does not rule out non-linear relationships.
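That caveat is worth seeing concretely: a perfectly deterministic but U-shaped relationship produces a Pearson r of essentially zero.

```python
import numpy as np

x = np.linspace(-1, 1, 201)
y = x ** 2                     # y is fully determined by x, but the trend is U-shaped
r = np.corrcoef(x, y)[0, 1]    # near-zero despite the perfect relationship
```

A scatter plot of `x` against `y` would reveal the pattern instantly, which is why a near-white cell should prompt a follow-up plot rather than a conclusion.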
Interpreting the Matrix
Diagonal
Every cell on the diagonal represents a variable's correlation with itself, which is always exactly 1.0. These cells serve as anchors — the diagonal of perfect self-correlation divides the matrix into two symmetric triangles.
Symmetry
The matrix is symmetric: the value at row A, column B equals the value at row B, column A. You only need to read one triangle; both carry the same information.
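Both properties (unit diagonal and symmetry) hold for any input and are quick to verify with pandas and NumPy on synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
m = df.corr().to_numpy()
# The diagonal is (numerically) 1.0 and the matrix equals its transpose.
```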
Color Scale
- Deep blue → strong negative correlation (approaching -1)
- White → no linear relationship (near 0)
- Deep red → strong positive correlation (approaching +1)
The colorscale is fixed from -1 to +1 so comparisons across different datasets remain consistent.
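If you were reproducing this rendering yourself, the key detail is pinning the color limits to [-1, +1] rather than letting them autoscale to the data. A sketch using matplotlib (assuming it is available; the plot itself may use a different rendering library):

```python
import matplotlib
matplotlib.use("Agg")            # render off-screen
import matplotlib.pyplot as plt
import numpy as np

corr = np.array([[ 1.0,  0.6, -0.3],
                 [ 0.6,  1.0,  0.1],
                 [-0.3,  0.1,  1.0]])
fig, ax = plt.subplots()
# Pin the scale to [-1, +1] so the same color always means the same r,
# regardless of the range actually present in this dataset:
im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax)
```

Without the fixed limits, a dataset whose correlations top out at 0.4 would show the same deep red as one topping out at 0.95, defeating cross-dataset comparison.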
Interpretation Tips
- Correlation is not causation. A strong r value means two variables co-vary linearly — it does not tell you which one drives the other, or whether a third variable drives both.
- Pearson r only captures linear relationships. A variable pair with a curved or U-shaped relationship may show r ≈ 0 even though a strong pattern exists. Use scatter plots to investigate further.
- Outliers can inflate or deflate r. A single extreme point can create or destroy an apparent correlation. Check distributions before drawing conclusions.
- Sample size matters. With fewer than 30 observations, individual r values are unreliable. With very large samples, even r = 0.05 may be statistically significant while being practically meaningless.
- Enable annotations for precision. When exact values matter — for feature selection thresholds, reports, or multicollinearity checks — turn on "Annotate Segments With Value" so you can read the numbers directly without guessing from color.
- Watch the off-diagonal extremes. Cells with |r| > 0.8 outside the diagonal often indicate redundant features. In regression contexts, consider dropping one of the pair or combining them with PCA.
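Those off-diagonal extremes can also be surfaced programmatically: scan one triangle of the matrix and keep pairs with |r| above your threshold. A sketch with invented columns, where two features measure the same quantity in different units:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
x = rng.normal(size=300)
df = pd.DataFrame({
    "height_cm": 170 + 10 * x,
    "height_in": (170 + 10 * x) / 2.54 + rng.normal(scale=0.5, size=300),
    "weight_kg": rng.normal(70, 10, size=300),
})
corr = df.corr()
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # one triangle, diagonal excluded
pairs = corr.where(upper).stack()                      # (col_a, col_b) -> r
redundant = pairs[pairs.abs() > 0.8]
print(redundant)
```

Here only the cm/inch pair crosses the 0.8 threshold, flagging one of the two height columns as a candidate to drop.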