From sample to population: sampling and the central limit theorem

This is lesson 11 of Track 9 (Statistics & Probability for AI) and the opener of Phase 4 (From sample to truth), the inference phase that the whole track has been building toward. The opening lesson noted that AI learns from a sample, never the whole world; this phase makes that precise. You will learn why any number measured on a sample is an estimate with sample-to-sample wobble, how the standard error measures that wobble, and how the central limit theorem makes the estimate’s behavior predictable. The source curriculum is Khan Academy’s Statistics & Probability course, by Sal Khan and the Khan Academy team, freely available and cited as further study.

The lesson separates population from sample and parameter from statistic, then introduces the sampling distribution: because a statistic changes from sample to sample, it is itself a random variable. It gives the standard error of the sample mean, sigma over root n, and the square-root law behind diminishing returns on data. Finally, it states the central limit theorem, the reason sample means are approximately normal for large enough samples whatever the population’s shape, provided the population has finite variance. That pays off the “why is the normal everywhere” preview from the normal-distribution lesson and powers the confidence intervals and tests to come.
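The sampling-distribution idea can be checked with a small simulation. The sketch below is illustrative only and is not part of the lesson: the exponential population (mean 1, standard deviation 1, strongly right-skewed), the sample sizes, the trial count, and the helper names `sample_means` and `skewness` are all assumptions chosen for the demonstration.

```python
import math
import random
import statistics

def sample_means(n, trials, rng):
    """Draw `trials` samples of size n from a skewed population; return each sample's mean."""
    means = []
    for _ in range(trials):
        # Assumed population: exponential with rate 1 (mean 1, sd 1, right-skewed).
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(0)  # fixed seed so the run is repeatable
means_small = sample_means(n=2, trials=5000, rng=rng)
means_large = sample_means(n=50, trials=5000, rng=rng)

# The spread of the sample means should sit near sigma / sqrt(n).
spread_large = statistics.pstdev(means_large)
predicted_large = 1.0 / math.sqrt(50)

print("spread of means, n=50:", round(spread_large, 3))
print("sigma / sqrt(n), n=50:", round(predicted_large, 3))
print("skewness of means, n=2: ", round(skewness(means_small), 2))
print("skewness of means, n=50:", round(skewness(means_large), 2))
```

If the central limit theorem is doing its job, the means from samples of 50 should be far less skewed than the means from samples of 2, and their spread should land close to sigma over root n, even though the population itself is nowhere near bell-shaped.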

This is lesson 11 of 14 and the first lesson of Phase 4. It builds directly on the normal-distribution lesson (the central limit theorem is why the normal applies to estimates) and uses the random-variable idea from earlier in Phase 3. The next two lessons, Confidence intervals and Hypothesis testing, are both built on the standard error and the central limit theorem introduced here; the capstone then ties the whole track to machine learning.

Prerequisites: lesson 9, The bell curve: the normal distribution, since the central limit theorem is what makes the normal apply to sample means, and z-scores recur here. The random-variable lesson is helpful background. The arithmetic is light, mostly dividing by a square root.

The only computation is the standard error, sigma divided by the square root of n, worked on a few sample sizes to make the square-root law tangible. The central limit theorem is stated and explained in words and is not derived; the goal is to understand what it guarantees and why it matters, not to prove it.
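As a rough preview of that computation, here is a minimal sketch of the square-root law. The population standard deviation of 10 and the sample sizes 25, 100, and 400 are assumed values picked to keep the arithmetic clean, and `standard_error` is a hypothetical helper name, not something the lesson defines.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

sigma = 10.0  # assumed population standard deviation, for illustration only

for n in (25, 100, 400):
    print(f"n = {n:>3}  standard error = {standard_error(sigma, n)}")
# n =  25  standard error = 2.0
# n = 100  standard error = 1.0
# n = 400  standard error = 0.5
```

Each fourfold increase in sample size only halves the standard error, which is the diminishing return on data that the square-root law describes.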

  • Distinguish a population parameter from a sample statistic that estimates it
  • Explain sampling variability and what a sampling distribution is
  • Use the standard error (sigma over root n) to describe how a sample mean’s precision improves with sample size
  • State the central limit theorem and why it makes the sample mean approximately normal regardless of the population’s shape
  • Connect sampling to AI (a test-set metric as a sample estimate, why more data tightens an estimate)
  • Read time: about 12 minutes
  • Practice time: about 15 minutes (a self-check, a standard-error and square-root-law computation, a which-estimate-do-you-trust exercise, and flashcards)
  • Difficulty: standard (one formula and one big idea; the central limit theorem is conceptual, not computational)