Machine Learning Pills

Issue #96 - Correlation Heatmaps

David Andrés and Muhammad Anas
Jun 01, 2025
💊 Pill of the Week

Before training any ML model, one of your most important steps is understanding your dataset. Enter: the correlation heatmap — a visual map of relationships between variables.

It doesn’t just look cool — it helps you engineer better features, catch multi-collinearity, and make smarter preprocessing decisions.

Let’s go deeper.

TL;DR

A correlation heatmap shows how features relate using colors.

  • Helps you spot multi-collinearity, select relevant features, and reduce noise.

  • Built on Pearson’s correlation coefficient.

  • Use it before modeling to clean and optimize your dataset.

What’s a Heatmap?

A heatmap is a two-dimensional grid in which each cell is filled with a color whose intensity reflects the magnitude of a numerical value. Darker (or more saturated) shades represent stronger relationships or higher values, while lighter tones signal weaker relationships or lower values. Because the human eye is exceptionally good at comparing colors, heatmaps turn dense tables of numbers—such as correlation matrices, sales figures, or website click counts—into intuitive pictures that reveal patterns, clusters, and outliers at a glance.

Why use one?

  • Instant pattern recognition: Correlations that might take minutes to parse in a numeric table “pop” immediately as blocks of dark or light color.

  • Positive vs. negative ties: Diverging color palettes (e.g., blues for negative, reds for positive) make it easy to see which variables move together and which move in opposite directions.

  • Scalability: Whether your matrix is 5 × 5 or 500 × 500, the visual grammar stays the same—no extra cognitive load.

  • Versatile context: Beyond correlations, heatmaps are common for gene-expression levels, server-load monitoring, A/B-test click maps, and time-series activity calendars (think GitHub commit charts).

Reading one effectively

  1. Check the color scale first—know what counts as “high,” “low,” or “neutral.”

  2. Scan for uniform blocks of intense color; these often indicate clusters of variables that behave similarly.

  3. Look for isolated extremes (single dark or bright squares) that may signal anomalies worth investigating.

  4. Layer extra cues (row/column dendrograms, numeric labels, or tooltips) when you need precise values without sacrificing readability.

In short, a heatmap is a fast-track from raw numbers to visual insight—turning a boring table into a vivid map of relationships you can grasp in seconds.

What’s Pearson Correlation?

Pearson’s r (also called the Pearson product–moment correlation) is a statistic that quantifies the strength and direction of a linear relationship between two continuous variables.

\(r \;=\; \frac{\operatorname{Cov}(X,Y)}{\sigma_X \;\sigma_Y}\)

  • Cov(X, Y) is the covariance between X and Y.

  • σₓ and σᵧ are the standard deviations of X and Y.
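
To see the formula in action, here is a minimal sketch (the arrays x and y are made-up illustrative data) that computes r directly from the covariance and standard deviations and checks it against NumPy's built-in np.corrcoef:

import numpy as np

# Illustrative data: two continuous variables (hypothetical values)
x = np.array([150, 160, 170, 180, 190], dtype=float)
y = np.array([48, 55, 63, 72, 80], dtype=float)

# Pearson's r = Cov(X, Y) / (sigma_X * sigma_Y), using sample (ddof=1) estimates
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# NumPy's correlation matrix should give the same value
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)  # both should print the same number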

How to interpret r

  • +1.0 → perfect positive: as X rises, Y rises in exact lock-step.

  • +0.7 to +0.9 → strong positive: tight upward cluster in a scatterplot.

  • +0.3 to +0.6 → moderate positive: looser upward trend.

  • 0 → no linear relationship: random cloud.

  • -0.3 to -0.6 → moderate negative: looser downward trend.

  • -0.7 to -0.9 → strong negative: tight downward cluster.

  • -1.0 → perfect negative: X rises exactly as Y falls.

(Exact cut-offs vary by discipline; always pair the number with a scatterplot.)

Why analysts love Pearson’s r

  • Single-number summary: Compresses thousands of data-point pairs into one intuitive score.

  • Scale-invariant: Unaffected by the units of measurement—kilograms vs. pounds, dollars vs. euros.

  • Foundation for other tools: Underpins linear regression, factor analysis, and many machine-learning similarity metrics.

Assumptions & caveats

  1. Linearity: r only captures straight-line trends. Curved relationships can hide behind a small r (see the sketch after this list).

  2. Continuous variables: Ordinal or categorical data need other measures (Spearman, Cramér’s V, etc.).

  3. Outlier sensitivity: A single extreme point can inflate or deflate r dramatically—always plot your data.

  4. No causation guarantee: A high r signals association, not cause-and-effect. Hidden variables or reverse causality may lurk.

  5. Homoscedasticity: Variability of Y should be similar across the range of X; funnel shapes in a scatterplot violate this.
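
A quick, hypothetical sketch (with made-up data) of caveats 1 and 3: a perfect U-shaped relationship yields an r near zero, and a single extreme point can make two unrelated variables look strongly correlated.

import numpy as np

rng = np.random.default_rng(0)

# Caveat 1 (linearity): a perfect quadratic relationship, yet r is about 0
x = np.linspace(-3, 3, 200)
y = x ** 2
print(np.corrcoef(x, y)[0, 1])  # close to 0 despite an obvious pattern

# Caveat 3 (outliers): two unrelated variables...
a = rng.normal(size=50)
b = rng.normal(size=50)
print(np.corrcoef(a, b)[0, 1])  # near 0, as expected

# ...until one extreme observation is appended to both
a_out = np.append(a, 10.0)
b_out = np.append(b, 10.0)
print(np.corrcoef(a_out, b_out)[0, 1])  # jumps toward +1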

Best practices when using Pearson’s r

  1. Visual check first: Plot a scatterplot to verify linearity and spot outliers.

  2. Report confidence intervals or p-values, not just the point estimate, for inferential work.

  3. Complement with domain insight: A “moderate” correlation in psychology might be “weak” in physics—context matters.

  4. Compare to alternatives: If data are ranked, skewed, or contain ties, consider Spearman’s ρ or Kendall’s τ.
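
If your data are ranked, skewed, or monotonic but non-linear, you can compute all three measures side by side with scipy.stats (a small sketch on made-up data; the distributions are illustrative assumptions):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A monotonic but non-linear, right-skewed relationship
x = rng.exponential(scale=2.0, size=200)
y = np.log1p(x) + rng.normal(scale=0.05, size=200)

r_pearson, _ = stats.pearsonr(x, y)
r_spearman, _ = stats.spearmanr(x, y)
r_kendall, _ = stats.kendalltau(x, y)

# Spearman and Kendall capture the monotonic trend better than Pearson here
print(r_pearson, r_spearman, r_kendall)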

In short, Pearson correlation is a workhorse statistic for quickly judging how tightly two variables move together in a straight line. Use it wisely—paired with visuals, assumptions checks, and domain knowledge—to turn raw data pairs into actionable insight.

Covariance vs. Correlation

What each one measures

  • Covariance looks at the raw co-movement of two variables. If they tend to rise and fall together, the number is positive; if they move in opposite directions, it’s negative.

  • Correlation takes that same co-movement but divides it by the product of the variables’ standard deviations, turning the result into a unit-free score of “how straight-line strong, and in which direction.”

Scale and interpretability

  • Covariance can be any real number and carries the combined units of X and Y (e.g., “centimetres × kilograms”). A value of +200 or −500 is meaningless until you know the underlying scales.

  • Correlation is always between −1 and +1. A quick glance tells you:

    • +1 → perfect positive linear link

    • 0 → no linear link

    • −1 → perfect negative linear link

Why analysts usually prefer correlation

  • Unit independence: Because it is normalized, a correlation of +0.82 means the same thing whether you measured weight in pounds or kilos.

  • Instant comparability: You can line up dozens of features and instantly spot which pairs have the strongest relationships.

  • Visualization-friendly: Heatmaps and pair-plots map neatly onto the −1↔+1 range, giving intuitive colour scales without manual tweaking.

  • Model hygiene: High correlations expose multicollinearity, help you drop redundant predictors, and guide effective feature engineering.

Situations where covariance still matters

  • Portfolio-risk math: Modern Portfolio Theory and CAPM use the covariance matrix of asset returns to compute total portfolio variance.

  • Principal-Component Analysis (PCA): The classic PCA algorithm diagonalizes the covariance matrix; you only switch to the correlation matrix if you standardize variables first (see the sketch after this list).

  • Physical interpretation: In engineering or physics, keeping the original dimensional units (e.g., newton-metres) can be essential for understanding the magnitude of co-movement.
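
To make the PCA point concrete, here is a minimal sketch (with a hypothetical data matrix X): standardizing the columns first and then taking their covariance matrix reproduces the correlation matrix of the raw data, which is exactly the difference between covariance-based and correlation-based PCA.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: three features on very different scales
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])

# Covariance matrix of the raw data (keeps the original units and scales)
cov_raw = np.cov(X, rowvar=False)

# Standardize each column, then take the covariance: equals the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov_std = np.cov(Z, rowvar=False)
print(np.allclose(cov_std, np.corrcoef(X, rowvar=False)))  # True

# The principal-component variances (eigenvalues) differ between the two choices
print(np.linalg.eigvalsh(cov_raw)[::-1])
print(np.linalg.eigvalsh(cov_std)[::-1])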

Mini numeric example

Let’s imagine three height–weight observations:

  1. (150 cm, 48 kg)

  2. (160 cm, 55 kg)

  3. (170 cm, 63 kg)

From these pairs:

  • Covariance comes out to roughly +75 cm·kg with the sample (n − 1) formula—informative only if you know centimetre–kilogram products well.

  • Correlation is about +0.999, instantly telling you the two variables are almost perfectly, positively, and linearly linked.
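
You can verify both numbers with NumPy, using the sample (n − 1) formulas:

import numpy as np

height = np.array([150.0, 160.0, 170.0])  # cm
weight = np.array([48.0, 55.0, 63.0])     # kg

print(np.cov(height, weight, ddof=1)[0, 1])   # about +75 cm·kg
print(np.corrcoef(height, weight)[0, 1])      # about +0.999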

Bottom line

Think of covariance as the raw ingredient—it feeds directly into variance-based math and preserves physical units. Correlation is the refined product: bounded, unit-free, and purpose-built for quick comparison, intuitive visuals, and sound feature selection.


🚨Attention!

Packt is organizing the Machine Learning Summit 2025, a 3-day virtual event starting July 16. It’s all about turning ML theory into real-world impact—with hands-on workshops, expert talks, and live sessions. A must-attend for applied ML folks!

Use this code to get 40% off: SS40

Check it out!


Visualizing Correlation with a Heatmap

Understanding relationships between features in your dataset is crucial for effective data analysis and machine learning. One great way to explore these relationships is by visualizing the correlation matrix using a heatmap.

Here’s a quick and clean way to do this in Python using NumPy and mlxtend.plotting.heatmap.

🔍 Full Code:

import numpy as np
from mlxtend.plotting import heatmap
import matplotlib.pyplot as plt

# df is assumed to be a pandas DataFrame containing only numeric feature columns
# (e.g., a selection of columns from a housing dataset).

# Step 1: Compute the correlation matrix
# np.corrcoef expects one variable per row, so we transpose df.values
cm = np.corrcoef(df.values.T)

# Step 2: Create the heatmap, labelling rows and columns with the feature names
hm = heatmap(cm,
             row_names=df.columns,
             column_names=df.columns)

# Step 3: Display the heatmap nicely
plt.tight_layout()
plt.show()


What does the code do?

Get the Pearson correlation coefficients

  • df.values.T converts the DataFrame to a NumPy array and transposes it, because np.corrcoef expects each variable (feature) to be a row rather than a column.

  • np.corrcoef(...) then calculates the Pearson correlation coefficients between all feature pairs, resulting in a square correlation matrix.

  • Each element [i, j] in this matrix represents how strongly column i is correlated with column j. Values close to 1 or -1 indicate strong positive or negative correlation respectively.
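
If you just want the correlation between two specific features instead of scanning the whole matrix, you can index cm by the column positions (continuing from the snippet above; the column names GrLivArea and SalePrice are illustrative and should match whatever is in your own df):

# Continues from the snippet above: df and cm are assumed to exist
i = df.columns.get_loc("GrLivArea")   # hypothetical column name
j = df.columns.get_loc("SalePrice")   # hypothetical column name

print(f"corr(GrLivArea, SalePrice) = {cm[i, j]:.2f}")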

Plot the heatmap

  • The heatmap function from mlxtend.plotting is used to visualize the matrix.

  • Passing row_names and column_names labels both axes with the feature names, which makes the heatmap readable and directly maps the color-coded correlation values to your actual dataset features.

Interpretation Guide

What to do with this?

  • OverallQual and GrLivArea are strongly related to price → keep them.

  • OverallCond has a low or negative correlation → might be useless.

  • GrLivArea and TotalBsmtSF may be correlated with each other (redundant info).
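
A hedged sketch of how you might act on these observations with pandas (df is the same numeric DataFrame as above; the SalePrice column name and the 0.3 / 0.8 thresholds are illustrative assumptions, not fixed rules):

# df is assumed to be the numeric DataFrame used earlier, with the target included
corr = df.corr(numeric_only=True)

# 1. Keep features with at least a moderate correlation to the target
target_corr = corr["SalePrice"].drop("SalePrice").abs()
keep = target_corr[target_corr >= 0.3].index.tolist()

# 2. Flag feature pairs that are highly correlated with each other (possible redundancy)
redundant = [
    (a, b)
    for pos, a in enumerate(keep)
    for b in keep[pos + 1:]
    if abs(corr.loc[a, b]) >= 0.8
]

print("Candidate features:", keep)
print("Possibly redundant pairs:", redundant)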

Summary

Here’s why correlation heatmaps are a must before you model:

  • Spot multi-collinearity between features at a glance.

  • Select relevant features and reduce noise.

  • Make smarter preprocessing and feature-engineering decisions before training anything.


Introducing Project Gamma

Hey folks, it’s Muhammad Anas. I’m building Gamma, a real-time, multiplayer coding editor where you can tackle LeetCode-style problems together—live pair-programming, shared dashboards, streaks, and leaderboards, plus easy import from LeetCode/HackerRank.

We’re in beta—jump in, try the demo, and tell me what you think! Thanks!

Sign up!


⚡Power-Up Corner

Correlation Pitfalls Every ML Practitioner Should Know

Understanding how and why variables move together is critical—because the wrong assumption can torpedo an otherwise solid model. Below are three of the most common traps, plus quick fixes you can put into practice today.

1. Correlation ≠ Causation

Keep reading with a 7-day free trial

Subscribe to Machine Learning Pills to keep reading this post and get 7 days of free access to the full post archives.
