Central limit theorem: cheatsheet

The one idea

A statistic measured on a sample estimates a population parameter, with sample-to-sample wobble. The central limit theorem says that, for large samples, the sampling distribution of the sample mean is approximately normal; that is what makes standard inference (confidence intervals, hypothesis tests) work.

Five words

Population = everything you care about.     Sample = the subset you measure.
Parameter  = true population number (unknown).   Statistic = sample estimate of it (known).
Inference = use the statistic to say something about the parameter.

Sampling distribution and standard error

A statistic is a RANDOM VARIABLE (different sample -> different value = sampling variability).
Sampling distribution = the distribution of the statistic over all possible samples.
  center = the true parameter (for an unbiased statistic, such as the sample mean)
  spread = the STANDARD ERROR
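A quick simulation makes the sampling distribution concrete. This is a sketch under stated assumptions: the population is a hypothetical set of 10,000 normal draws (mean 70, SD 20), samples are drawn with replacement so the draws are independent, and the seeds are arbitrary. It collects the means of many samples of size 100 and checks that their center sits near the parameter and their spread near sigma / sqrt(n).

```python
import random
import statistics

# Hypothetical population (mean 70, SD 20), built only for illustration.
pop_rng = random.Random(0)
population = [pop_rng.gauss(70, 20) for _ in range(10_000)]
mu = statistics.fmean(population)        # the parameter (known here only because we built it)
sigma = statistics.pstdev(population)    # population SD

def sampling_distribution(pop, n, reps, rng):
    """Means of `reps` independent samples of size n, drawn with replacement."""
    return [statistics.fmean(rng.choices(pop, k=n)) for _ in range(reps)]

rng = random.Random(1)
means = sampling_distribution(population, n=100, reps=5_000, rng=rng)

center = statistics.fmean(means)  # should sit close to mu
spread = statistics.stdev(means)  # should sit close to sigma / sqrt(100) = sigma / 10
print(f"mu = {mu:.2f}, center of sample means = {center:.2f}")
print(f"sigma / sqrt(n) = {sigma / 10:.2f}, SD of sample means = {spread:.2f}")
```

Sampling with replacement is a deliberate choice here: it makes the draws independent, so sigma / sqrt(n) applies exactly rather than needing a finite-population correction.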

Standard error of the mean = sigma / sqrt(n)
  sigma = 20, n = 100 -> SE = 20/10 = 2
  sigma = 20, n = 400 -> SE = 20/20 = 1   (4x data -> half the error)
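The formula and both worked numbers above, as a small helper. The function names are illustrative, not from any library. The second helper shows the usual plug-in when sigma is unknown: replace it with the sample standard deviation s.

```python
import math
import statistics

def standard_error_of_mean(sigma: float, n: int) -> float:
    """SE of the sample mean for n independent draws: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

def estimated_se(data: list[float]) -> float:
    """Plug-in SE when sigma is unknown: sample SD s divided by sqrt(n)."""
    return statistics.stdev(data) / math.sqrt(len(data))

print(standard_error_of_mean(20, 100))  # 2.0
print(standard_error_of_mean(20, 400))  # 1.0
```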

The square-root law

SE shrinks like 1/sqrt(n), not 1/n.  Halve the error  ->  QUADRUPLE the sample.
The first data points help a lot; the millionth point barely moves the estimate (diminishing returns).
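Inverting SE = sigma / sqrt(n) gives the sample size needed for a target error, n >= (sigma / SE)^2. The sketch below (hypothetical helper name, sigma = 20 reused from the example above) shows the quadrupling, then how much a single extra point cuts the SE early versus late.

```python
import math

def sample_size_for_se(sigma: float, target_se: float) -> int:
    """Smallest n with sigma / sqrt(n) <= target_se, i.e. n >= (sigma / target_se) ** 2."""
    # Subtracting a tiny tolerance stops float round-off (e.g. 900.0000000000002) from adding 1.
    return math.ceil((sigma / target_se) ** 2 - 1e-9)

print(sample_size_for_se(20, 2))    # 100
print(sample_size_for_se(20, 1))    # 400: half the error needs 4x the sample
print(sample_size_for_se(20, 0.5))  # 1600

# Diminishing returns: how much ONE extra point cuts the SE (sigma = 20), early vs late.
for n in (1, 100, 1_000_000):
    drop = 20 / math.sqrt(n) - 20 / math.sqrt(n + 1)
    print(f"n = {n:>9,}: one more point cuts SE by {drop:.2e}")
```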

The central limit theorem

For a large enough sample of independent draws from a population with finite variance, the sampling
distribution of the MEAN is approximately NORMAL, no matter the population's shape (even skewed/bimodal).
=> Why the bell is everywhere; why z-scores and 68-95-99.7 apply to estimates;
   the foundation for confidence intervals and hypothesis tests.
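One way to see the theorem at work is to start from a strongly right-skewed population and check the 68-95 rule on sample means. This sketch assumes an Exponential(rate = 1) population (mean 1, SD 1), samples of n = 50, and an arbitrary seed. The raw data fail the 68% check, while the means pass it.

```python
import math
import random
import statistics

rng = random.Random(42)
n, reps = 50, 20_000
mu, sigma = 1.0, 1.0              # Exponential(rate=1): mean 1, SD 1, strongly right-skewed
se = sigma / math.sqrt(n)

# Raw data: the fraction within 1 SD of the mean is 1 - e^-2 ~ 0.865, not the normal 0.683.
raw = [rng.expovariate(1.0) for _ in range(reps)]
raw_within_1sd = sum(abs(x - mu) <= sigma for x in raw) / reps

# Sample means: close to the normal 68-95 fractions despite the skewed population.
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
within_1se = sum(abs(m - mu) <= 1 * se for m in means) / reps
within_2se = sum(abs(m - mu) <= 2 * se for m in means) / reps

print(f"raw data within 1 SD:      {raw_within_1sd:.3f}")
print(f"sample means within 1 SE:  {within_1se:.3f}  (normal: 0.683)")
print(f"sample means within 2 SE:  {within_2se:.3f}  (normal: 0.954)")
```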

In machine learning

A test-set metric (accuracy, etc.) is a sample estimate with a standard error.
More test data -> smaller SE -> tighter estimate (diminishing returns).
Differences of sample metrics are also ~normal (for large test sets) -> lets you compare models / run A/B tests.
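Accuracy is the mean of 0/1 "correct" indicators, so the square-root law applies with sigma = sqrt(p(1 - p)). The helpers below are a hypothetical sketch using the normal approximation; the accuracies and test-set sizes are made-up illustration values. The comparison assumes the two models were scored on independent test sets. When both are scored on the same test set, the errors are correlated and a paired comparison (for example McNemar's test) is the better tool.

```python
import math

def accuracy_se(acc: float, n: int) -> float:
    """SE of a test-set accuracy: a mean of 0/1 outcomes, so SE = sqrt(p(1 - p) / n)."""
    return math.sqrt(acc * (1 - acc) / n)

def accuracy_ci95(acc: float, n: int) -> tuple[float, float]:
    """Approximate 95% interval (normal approximation; needs large n, acc not near 0 or 1)."""
    half = 1.96 * accuracy_se(acc, n)
    return acc - half, acc + half

def diff_z(acc_a: float, n_a: int, acc_b: float, n_b: int) -> float:
    """z-score for acc_a - acc_b, ASSUMING the two test sets are independent samples."""
    se_diff = math.sqrt(accuracy_se(acc_a, n_a) ** 2 + accuracy_se(acc_b, n_b) ** 2)
    return (acc_a - acc_b) / se_diff

print(accuracy_ci95(0.90, 1000))        # roughly (0.881, 0.919)
print(diff_z(0.92, 2000, 0.90, 2000))   # roughly 2.21, beyond the 1.96 cutoff
```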

Pitfalls to dodge

Confusing the data’s spread (sigma) with the estimate’s spread (SE = sigma/sqrt(n), smaller).
Thinking more data makes the DATA less spread out (it tightens the ESTIMATE).
Treating the statistic as the exact parameter (it is an estimate with error).
Forgetting the CLT is a large-sample result (with small n and a very skewed population, the sample mean may not be normal yet).
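The first two pitfalls in one run: as n grows, the sample SD (spread of the data) stays near sigma while the estimated SE (spread of the estimate) keeps shrinking. This sketch assumes normal data with mean 100 and SD 20 and an arbitrary seed.

```python
import math
import random
import statistics

rng = random.Random(7)
sigma = 20.0  # assumed population SD; data drawn from a normal with mean 100
results = {}
for n in (25, 400, 10_000):
    sample = [rng.gauss(100.0, sigma) for _ in range(n)]
    s = statistics.stdev(sample)   # spread of the DATA: stays near sigma at every n
    se_hat = s / math.sqrt(n)      # spread of the ESTIMATE: shrinks like 1/sqrt(n)
    results[n] = (s, se_hat)
    print(f"n = {n:>6}: sample SD = {s:6.2f}, estimated SE = {se_hat:5.2f}")
```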

Words to use precisely

Parameter: a true population value (usually unknown).
Statistic: a sample value estimating a parameter.
Sampling distribution: the distribution of a statistic across samples.
Standard error: the standard deviation of the sampling distribution; sigma/sqrt(n) for the mean.
Central limit theorem: sample means are approximately normal for large n, any population shape (given finite variance).