References: When two things move together: correlation

Source curriculum (structural mirror, cited as further study):
• Khan Academy, "Exploring bivariate numerical data" (Statistics & Probability)
Author: Sal Khan and the Khan Academy team
Unit page: https://www.khanacademy.org/math/statistics-probability/scatterplots-and-correlation
License: CC BY-NC-SA 4.0
Clawdemy's lessons are original prose that follows the pedagogical arc of this
unit. We do not embed, reproduce, or transcribe Khan's text or videos; we link
out to the relevant unit as recommended further study. The non-commercial
clause aligns with Clawdemy's free, zero-revenue posture. All rights to the
original materials remain with their authors.
Source-scope note: this lesson mirrors Khan's treatment of scatterplots and
the correlation coefficient and restates it in Clawdemy's voice with original
examples. It deliberately stops at DESCRIBING a relationship (correlation) and
does NOT teach least-squares regression or line-fitting as a predictive
algorithm; that material lives in the Classical Machine Learning track. Khan's
own unit also introduces least-squares regression lines; Clawdemy splits that
boundary on purpose to avoid duplicating the Classical ML track. The
correlation-is-not-causation discipline and the ML connections (redundant
features, spurious signals) are Clawdemy framing. Exact per-unit URLs are
verified at promotion.
  • Khan Academy: Exploring bivariate numerical data by Sal Khan and the Khan Academy team. The full unit this lesson mirrors, with videos and practice on scatterplots and the correlation coefficient, free and CC-licensed. (Its later sections introduce regression lines, which Clawdemy covers in the Classical Machine Learning track instead.)

A short, durable list. Both are free.

  • Khan Academy, “Study design” (within the course above). The home of the question this lesson raises but does not answer: how do you actually establish causation? Controlled experiments and the difference between observational and experimental data. The natural follow-up to “correlation is not causation.”
  • Khan Academy, “Summarizing quantitative data” (within the course above). Revisit the z-score and standardization material. The correlation coefficient is built directly from those standardized distances, so reviewing them makes the formula’s intuition click.
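
To make the z-score connection concrete, here is a minimal sketch, assuming the sample version of the correlation coefficient (dividing by n - 1). Each value is standardized, the paired z-scores are multiplied, and the products are averaged. The function names and the study-hours data are made up for illustration; they are not taken from the Khan Academy unit.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values: distance from the mean in sample standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Sample correlation: sum of paired z-score products, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data: hours studied and quiz score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
print(round(correlation(hours, scores), 3))
```

When both z-scores in a pair share a sign (both above or both below their means), the product is positive and pushes the result toward +1; opposite signs push it toward -1. A variable with no spread has a standard deviation of zero, so this sketch fails with a division error in that case. Python 3.10 and later also provide `statistics.correlation`, which should agree with this hand-rolled version.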

Where this sits inside this track and beyond.

  • The shape of data: distributions and histograms. The previous lesson. Shape was about one variable; correlation is the first look at two variables together, closing Phase 1 (Describing data).
  • Probability foundations. The next lesson and the start of Phase 2 (The laws of chance). The track shifts from describing data to reasoning about uncertainty.
  • Classical Machine Learning (separate track). Where prediction proper lives. Fitting a line or curve to predict one variable from another (regression) builds on the correlation idea but is a different job, so it is kept out of this track on purpose.