
References: The shape of data: distributions and histograms

Source curriculum (structural mirror, cited as further study):
• Khan Academy, "Displaying and comparing quantitative data" and "Modeling data distributions" (Statistics & Probability)
Author: Sal Khan and the Khan Academy team
Unit page: https://www.khanacademy.org/math/statistics-probability/displaying-describing-data
License: CC BY-NC-SA 4.0
Clawdemy's lessons are original prose that follows the pedagogical arc of these
units. We do not embed, reproduce, or transcribe Khan's text or videos; we link
out to the relevant units as recommended further study. The non-commercial
clause aligns with Clawdemy's free, zero-revenue posture. All rights to the
original materials remain with their authors.
Source-scope note: this lesson mirrors Khan's treatment of histograms and
distribution shape and restates it in Clawdemy's voice with original
hand-drawn examples. The machine-learning connections (skew transforms, hidden
subpopulations, class imbalance as a visible base-rate problem) are Clawdemy
framing. The bell shape is only introduced here; the normal distribution gets
its own Track 9 lesson. Exact per-unit URLs are verified at promotion.
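The "skew transforms" connection named above can be illustrated with a minimal Python sketch. The sample values are made up for illustration and were chosen as powers of 2 so the effect comes out exactly; `center_gap` is a hypothetical helper, not something from the Khan units. The sketch uses mean minus median as a rough skew signal (positive for a long right tail, near zero for a symmetric shape) and shows a log transform pulling that gap to zero.

```python
import math
import statistics

# Hypothetical right-skewed sample (made-up numbers): most values are
# small and a few are large, so a histogram would show a long right tail.
values = [1, 2, 2, 4, 4, 4, 8, 8, 16]

def center_gap(xs):
    """Mean minus median: positive hints at right skew, near 0 at symmetry."""
    return statistics.mean(xs) - statistics.median(xs)

# A log transform compresses the right tail. Because these values are
# powers of 2, their logs are evenly spaced and perfectly symmetric.
logged = [math.log(v) for v in values]

raw_gap = center_gap(values)   # 49/9 - 4, about 1.44
log_gap = center_gap(logged)   # about 0, up to floating-point rounding
print(f"raw gap: {raw_gap:.2f}, log gap: {log_gap:.2f}")
```

Real data rarely becomes perfectly symmetric after a log; the sketch only shows the direction of the effect, with the mean pulled back toward the median once the tail is compressed.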
  • Khan Academy, “Displaying and comparing quantitative data,” by Sal Khan and the Khan Academy team. The full unit this lesson mirrors, with videos and interactive practice on building and reading histograms, dot plots, and box plots; free and CC-licensed. The place to drill shape-reading until it is automatic.

A short, durable list. Both are free.

  • Khan Academy, “Modeling data distributions” (within the course above). Where distribution shape meets the standard deviation: density curves, the empirical rule, and the bell shape that this lesson only names. The natural bridge to Track 9’s normal-distribution lesson.
  • Khan Academy, “Summarizing quantitative data” (within the course above). The previous lesson’s source; revisit it to connect the numeric summaries (mean, median, standard deviation) to the shapes you now read in a histogram.

Where this lesson sits in the track.

  • Summarizing data: center and spread. The previous lesson. Center and spread are the numbers; this lesson is the picture they summarize, and skew is where the two meet.
  • The bell curve: the normal distribution. Later in the track (Phase 3). The bell shape introduced here becomes a precise, central tool, with the empirical rule and z-scores.
  • Why AI runs on statistics. Lesson 1. Class imbalance, visible in a histogram of labels, is the base-rate problem from the opener made concrete.
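A histogram of labels is just a count per class. The minimal Python sketch below uses a made-up 95/5 label split and a hypothetical `text_histogram` helper to print that histogram as text bars, then computes the accuracy of a "model" that always predicts the majority class. Treating that number as the base rate is an illustrative framing here, not a quotation of Lesson 1.

```python
from collections import Counter

# Hypothetical label column for a binary classifier (made-up data):
# 95 "legit" examples and 5 "fraud" examples.
labels = ["legit"] * 95 + ["fraud"] * 5

def text_histogram(items, width=40):
    """One line per category: name, a bar scaled to the largest count, and its share."""
    counts = Counter(items)
    total = sum(counts.values())
    biggest = max(counts.values())
    lines = []
    for label, count in counts.most_common():
        bar = "#" * round(width * count / biggest)
        lines.append(f"{label:>6} | {bar} {count / total:.0%}")
    return lines

for line in text_histogram(labels):
    print(line)

# Majority-class base rate: always predicting "legit" is already right
# this often, without learning anything about the inputs.
counts = Counter(labels)
majority_rate = counts.most_common(1)[0][1] / len(labels)
print(f"always-predict-majority accuracy: {majority_rate:.0%}")
```

The lopsided bars make the imbalance visible at a glance, which is why a high accuracy score on such data says little until it is compared against the majority-class rate.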