Cheatsheet: When two things move together: correlation

Correlation measures how tightly two variables move together. It never proves one causes the other. Both halves matter in AI, where models are correlation engines.

r ranges from -1 to +1.
sign = direction (+ rise together, - move oppositely)
magnitude = strength (near +/-1 = tight straight line, near 0 = no linear trend)
Examples:
  +0.95  strong positive
  -0.9   strong negative
  +0.5   moderate positive
  0.05   essentially no linear relationship
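Built from z-scores: the average of the products of each point's two
z-scores (exactly so with population standard deviations; with sample
ones, divide the sum by n - 1). Above-average-on-both or below-on-both
pushes r up; high-on-one, low-on-the-other pushes it down.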
A U-shaped relationship (high at both ends, low in the middle) is STRONG
but NONLINEAR, so r is near 0. Near-zero r = no LINEAR relationship,
not "no relationship." Always look at the scatterplot.
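
As a minimal sketch of the z-score construction above (the helper name `pearson_r` and the data are illustrative assumptions, not part of the original cheatsheet), the following computes r as the mean product of paired z-scores using population standard deviations, then applies it to a perfect straight line and to a symmetric U shape.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r as the mean product of paired z-scores (population std, ddof=0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()  # z-scores of x
    zy = (y - y.mean()) / y.std()  # z-scores of y
    return float(np.mean(zx * zy))

# Made-up data: eleven evenly spaced x values centred on zero.
x = np.arange(-5, 6)

r_line = pearson_r(x, 2 * x + 1)  # points on a rising straight line
r_u = pearson_r(x, x ** 2)        # U shape: y fully determined by x, but not linearly

print(f"line:    r = {r_line:.2f}")
print(f"U shape: r = {r_u:.2f}")
```

For the line, r comes out at 1 up to floating-point rounding. For the U shape it is effectively 0, even though y is completely determined by x. The helper should agree with `np.corrcoef(x, y)[0, 1]`, which is the more usual way to get r in practice.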

Four explanations for any correlation between X and Y

| Explanation | Example |
| --- | --- |
| X causes Y | Studying raises exam scores |
| Y causes X | The arrow runs the other way |
| A confounder causes both | Hot weather behind ice cream sales and drownings |
| Coincidence | Two unrelated series that happen to track over a span |

Observing the correlation alone cannot tell you which explanation applies. Establishing causation usually takes a controlled experiment.
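
The confounder case can be illustrated with a small simulation. This is a sketch under stated assumptions: every number below (temperatures, coefficients, noise levels, the 1-degree band) is invented for illustration. Temperature drives both series, and neither series has any effect on the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical daily data: temperature is the hidden common cause.
temperature = rng.normal(25, 5, n)
ice_cream = 10 * temperature + rng.normal(0, 20, n)   # depends only on temperature
drownings = 0.3 * temperature + rng.normal(0, 1, n)   # depends only on temperature

# Correlation across all days.
r_all = np.corrcoef(ice_cream, drownings)[0, 1]

# Correlation among days with roughly the same temperature.
similar_days = np.abs(temperature - 25) < 1
r_similar = np.corrcoef(ice_cream[similar_days], drownings[similar_days])[0, 1]

print(f"all days: r = {r_all:.2f}")
print(f"days near 25 degrees (n = {similar_days.sum()}): r = {r_similar:.2f}")
```

With this setup the overall correlation is strongly positive, while the correlation among days in a narrow temperature band should be much weaker. That is the pattern a confounder produces. Exact values depend on the seed and the assumed coefficients.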

| Use | What it means |
| --- | --- |
| Spotting redundant features | Two highly correlated inputs carry the same information; one may be dropped |
| Spotting spurious signals | A model chases correlation and can latch onto a confounder that fails in the world |
| Boundary | Correlation DESCRIBES the relationship; REGRESSION predicts from it (Classical ML track, not here) |
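
One way to spot redundant features is to scan the pairwise correlation matrix for pairs with |r| above a cutoff. The sketch below uses invented data; the column names and the 0.95 threshold are assumptions, and a sensible cutoff depends on the project.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical feature table: height stored twice in different units,
# plus one column unrelated to either.
height_cm = rng.normal(170, 10, n)
height_in = height_cm / 2.54 + rng.normal(0, 0.1, n)
unrelated = rng.normal(0, 1, n)

names = ["height_cm", "height_in", "unrelated"]
X = np.column_stack([height_cm, height_in, unrelated])

# rowvar=False treats each column as a variable, giving a 3x3 matrix of r values.
corr = np.corrcoef(X, rowvar=False)

threshold = 0.95  # assumed cutoff
redundant_pairs = [
    (names[i], names[j], round(float(corr[i, j]), 3))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
print(redundant_pairs)
```

Only the two height columns should be flagged here. Whether to actually drop one of a flagged pair is a judgment call that correlation alone does not settle.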

Common mistakes

  • Reading causation into a correlation (run through the four explanations first).
  • Treating r near 0 as “no relationship” (it means no linear one).
  • Forgetting that a single outlier can swing r (the scatterplot shows it).
  • Extrapolating a relationship far past the data range.
  • Confusing measuring a relationship (correlation) with predicting from it (regression).
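
The outlier point in the list above can be checked numerically. In this sketch the data are made up: ten points with almost no linear trend, then the same ten points plus one extreme observation.

```python
import numpy as np

# Made-up data: y bounces around 4.5 with no real trend in x.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([5, 3, 6, 4, 5, 4, 6, 3, 5, 4], dtype=float)

r_without = np.corrcoef(x, y)[0, 1]

# Add a single extreme point far from the rest.
x_out = np.append(x, 30.0)
y_out = np.append(y, 30.0)

r_with = np.corrcoef(x_out, y_out)[0, 1]

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

One added point is enough to move r from close to zero to strongly positive here, which is why the scatterplot is worth checking before trusting the number.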

Key terms

  • Scatterplot: one dot per observation, placed by its two values.
  • Correlation coefficient (r): a number in [-1, +1]; sign is direction, magnitude is strength of the linear relationship.
  • Confounder: a hidden variable causing both correlated variables.
  • Correlation vs causation: moving together is not the same as one driving the other.
  • Regression: fitting a line/curve to predict one variable from another (Classical ML track).