Cheatsheet: From sample to population: sampling and the central limit theorem
The one idea
A statistic measured on a sample estimates a population parameter, but its value wobbles from sample to sample. The central limit theorem says that, for large enough samples, the sample mean is approximately normal; that is what makes inference possible.
Four words
Population = everything you care about. Sample = the subset you measure.
Parameter = true population number (unknown). Statistic = sample estimate of it (known).
Inference = use the statistic to say something about the parameter.
Sampling distribution and standard error
A statistic is a RANDOM VARIABLE: a different sample gives a different value (sampling variability).
Sampling distribution = the distribution of the statistic over all possible samples of the same size n.
  center = the true parameter (when the statistic is unbiased, as the sample mean is)
  spread = the STANDARD ERROR
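A small simulation makes the sampling distribution concrete. This is a sketch built on assumptions of our own. The population is made up and deliberately right-skewed (10,000 exponential values with mean about 50), and samples are drawn with replacement. The code draws many samples of size n = 100 and keeps each sample's mean. It then compares the center and spread of those means with the population mean and with sigma / sqrt(n).

```python
import random
import statistics

def sample_means(population, n, reps, seed=0):
    """Draw `reps` samples of size n (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical, deliberately skewed population: 10,000 exponential values, mean about 50.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 50) for _ in range(10_000)]
mu = statistics.fmean(population)        # the parameter (known here only because we built it)
sigma = statistics.pstdev(population)    # population standard deviation

means = sample_means(population, n=100, reps=2_000)
print(f"population mean  {mu:6.2f} | mean of sample means {statistics.fmean(means):6.2f}")
print(f"sigma / sqrt(n)  {sigma / 100 ** 0.5:6.2f} | SD of sample means   {statistics.stdev(means):6.2f}")
```

The mean of the sample means lands near the parameter (center). Their standard deviation lands near sigma / sqrt(n), which is the standard error.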
Standard error of the mean = sigma / sqrt(n)
  sigma = 20, n = 100 -> SE = 20/10 = 2
  sigma = 20, n = 400 -> SE = 20/20 = 1   (4x the data -> half the error)
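The formula and the two worked lines translate directly into code. In practice sigma is rarely known, so the usual plug-in is the sample standard deviation s, giving an estimated SE of s / sqrt(n). Both helper names below are illustrative, not from any library, and the five data values are made up.

```python
import math
import statistics
from typing import Sequence

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean when the population sigma is known."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)

def estimated_standard_error(sample: Sequence[float]) -> float:
    """Plug-in version: the sample standard deviation s replaces the unknown sigma."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

print(standard_error(20, 100))  # 2.0
print(standard_error(20, 400))  # 1.0
print(f"{estimated_standard_error([12, 15, 9, 14, 10]):.2f}")  # 1.14
```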
The square-root law
SE shrinks with sqrt(n), not n. Halve the error -> QUADRUPLE the sample.
The first data points help a lot; the millionth point barely moves the estimate (diminishing returns).
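Inverting SE = sigma / sqrt(n) gives the sample size needed for a target error: n = (sigma / SE)^2. This sketch uses a hypothetical helper name and reuses sigma = 20. It shows that each halving of the target quadruples n. It then prints how little a single extra observation helps once n is large.

```python
import math

def n_for_target_se(sigma: float, target_se: float) -> int:
    """Smallest n with sigma / sqrt(n) <= target_se, i.e. n >= (sigma / target_se) ** 2."""
    return math.ceil((sigma / target_se) ** 2)

sigma = 20.0  # same sigma as the worked example
for target in (2.0, 1.0, 0.5, 0.25):
    print(target, n_for_target_se(sigma, target))
# 2.0 100
# 1.0 400
# 0.5 1600
# 0.25 6400

# Diminishing returns: how much one extra observation shrinks the SE at different n.
for n in (10, 1_000, 1_000_000):
    gain = sigma / math.sqrt(n) - sigma / math.sqrt(n + 1)
    print(f"n={n}: SE drops by {gain:.2e}")
```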
The central limit theorem
For a large enough sample, the sampling distribution of the MEAN is approximately NORMAL, no matter the population's shape (even skewed or bimodal), provided the population variance is finite.
=> Why the bell is everywhere; why z-scores and the 68-95-99.7 rule apply to estimates; the foundation for confidence intervals and hypothesis tests.
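Here is one way to check the CLT empirically, sketched with a population of our own choosing: a strongly skewed exponential distribution with mean 1 and sigma 1. The code standardizes each sample mean by sigma / sqrt(n) and counts how often it lands within 1 and 2 standard errors. If the sampling distribution is close to normal, those fractions should approach the 68% and 95% of the 68-95-99.7 rule as n grows. At n = 1 the "mean" is just one skewed draw, and the fractions are visibly off.

```python
import math
import random
import statistics

def coverage(n, reps=4_000, seed=1):
    """Fraction of sample means within 1 and 2 standard errors of the true mean.

    Population: exponential with rate 1, so mean = 1 and sigma = 1 (strongly right-skewed).
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    se = sigma / math.sqrt(n)
    within1 = within2 = 0
    for _ in range(reps):
        m = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        z = abs(m - mu) / se           # standardized distance of this sample mean
        within1 += z <= 1
        within2 += z <= 2
    return within1 / reps, within2 / reps

for n in (1, 30, 200):
    one, two = coverage(n)
    print(f"n={n:>3}: within 1 SE {one:.3f}, within 2 SE {two:.3f}  (normal: 0.683, 0.954)")
```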
In machine learning
- A test-set metric (accuracy, etc.) is a sample estimate with a standard error.
- More test data -> smaller SE -> tighter estimate (diminishing returns).
- Differences of sample metrics are ~normal -> lets you compare models / run A/B tests.
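Test accuracy is the mean of 0/1 "was this prediction correct?" indicators, so the standard-error formula applies with sigma = sqrt(p(1 - p)). The sketch below assumes the test examples are independent. It also assumes the two models are scored on independent test sets. When both models share one test set, their errors are correlated, and a paired comparison (for example McNemar's test) is the better tool. The helper names and all numbers are hypothetical.

```python
import math

def accuracy_se(acc, n):
    """Standard error of an accuracy measured on n independent test examples.

    Accuracy is a mean of 0/1 indicators, so sigma = sqrt(p * (1 - p)); the observed
    accuracy is plugged in for p, giving SE = sqrt(acc * (1 - acc) / n).
    """
    return math.sqrt(acc * (1 - acc) / n)

def compare(acc_a, n_a, acc_b, n_b):
    """z-score for acc_a - acc_b, ASSUMING the two test sets are independent."""
    se_diff = math.sqrt(accuracy_se(acc_a, n_a) ** 2 + accuracy_se(acc_b, n_b) ** 2)
    return (acc_a - acc_b) / se_diff

# Hypothetical numbers, for illustration only.
print(f"{accuracy_se(0.90, 1_000):.4f}")             # 0.0095
print(f"{accuracy_se(0.90, 10_000):.4f}")            # 0.0030
print(f"{compare(0.92, 2_000, 0.90, 2_000):.2f}")    # 2.21
```

Going from 1,000 to 10,000 test examples cuts the SE by a factor of sqrt(10), about 3.2, not 10. Under the normal approximation, a |z| above about 1.96 corresponds to a two-sided p-value below 0.05.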
Pitfalls to dodge
- Confusing the data’s spread (sigma) with the estimate’s spread (SE = sigma/sqrt(n), smaller).
- Thinking more data makes the DATA less spread out (it tightens the ESTIMATE).
- Treating the statistic as the exact parameter (it is an estimate with error).
- Forgetting the CLT is a large-sample result (with small n and a very skewed population, the sample mean may not be close to normal yet).
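A simulation shows the first two pitfalls side by side. As n grows, the sample standard deviation (spread of the data) stays near the population sigma. Meanwhile the estimated standard error (spread of the estimate) keeps shrinking. The normal population with mean 100 and sigma 20 is an arbitrary choice for illustration.

```python
import math
import random
import statistics

rng = random.Random(7)
results = {}
# Hypothetical population for illustration: normal with mean 100 and sigma 20.
for n in (25, 400, 10_000):
    sample = [rng.gauss(100, 20) for _ in range(n)]
    s = statistics.stdev(sample)   # spread of the DATA: stays near sigma = 20 at every n
    se = s / math.sqrt(n)          # spread of the ESTIMATE: shrinks like 1 / sqrt(n)
    results[n] = (s, se)
    print(f"n={n:>6}: sample SD {s:5.2f}, estimated SE {se:6.3f}")
```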
Words to use precisely
- Parameter: a true population value (usually unknown).
- Statistic: a sample value estimating a parameter.
- Sampling distribution: the distribution of a statistic across samples.
- Standard error: the standard deviation of the sampling distribution; sigma/sqrt(n) for the mean.
- Central limit theorem: sample means are approximately normal for large n, whatever the population's shape (given finite variance).