Cheatsheet: Statistics in machine learning
The one idea
Every tool in this track lands somewhere in the ML workflow, and the biggest payoff is that model evaluation is statistical inference. The through-line: statistics is not fooling yourself about uncertainty.
The track mapped to the ML workflow
| Stage | Statistical tool | Lesson |
|---|---|---|
| Understand the data | center/spread, shape/skew, correlation, class balance | 2, 3, 4 |
| Read model outputs | conditional probability P(label given inputs); correlation not causation | 6, 4 |
| Train the model | expected value (loss to minimize, reward to maximize); normal noise | 8, 9 |
| Evaluate the model | sampling, standard error, confidence interval, hypothesis test | 11, 12, 13 |
| Read the result honestly | base rates / Bayes, significance vs importance, no causation from correlation | 1, 7, 13, 4 |
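
The "Evaluate the model" row can be made concrete with a small sketch. It treats the test set as a sample, accuracy as a statistic with a standard error, and reports an interval rather than a bare number. The function name `accuracy_ci` and the counts (94 of 100, 940 of 1000) are illustrative assumptions, not figures from this track. The normal approximation is used here for simplicity; other intervals (for example Wilson) are often preferred on small test sets or when accuracy is near 0 or 1.

```python
import math


def accuracy_ci(correct, n, z=1.96):
    """Return (accuracy, lower, upper) for a roughly 95% interval.

    Uses the normal approximation to the binomial: the metric is a
    sample proportion with standard error sqrt(p * (1 - p) / n).
    """
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need 0 <= correct <= n and n > 0")
    acc = correct / n
    se = math.sqrt(acc * (1 - acc) / n)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return acc, max(0.0, acc - z * se), min(1.0, acc + z * se)


# Hypothetical test sets: same 94% accuracy, different sample sizes.
small = accuracy_ci(94, 100)
large = accuracy_ci(940, 1000)

for name, (acc, lo, hi) in [("n=100", small), ("n=1000", large)]:
    print(f"{name}: accuracy {acc:.3f}, interval [{lo:.3f}, {hi:.3f}]")
```

Under these assumed counts the same headline accuracy comes with a much wider interval on the smaller test set, which is why the test-set size belongs next to the metric.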
Evaluation is inference
Test set = a SAMPLE.
Metric = a STATISTIC estimating the true value (with a standard error).
Report a CONFIDENCE INTERVAL, not a bare number.
"Is B better than A?" = a HYPOTHESIS TEST (and an A/B test is the same machinery).
The train/test split is, at bottom, a sampling problem.
Four questions for any model claim
"94% accurate, significantly better than 92%":
1. On how big a test set? What's the confidence interval? (L11, L12)
2. Is the gap significant at that sample size? (L13)
3. Is 94% good given the base rate / class balance? (L1, L7)
4. Is the improvement meaningful (effect size), not just significant? (L13)
The boundary with the next track
THIS track (statistical thinking): data summaries, outputs as probabilities, expected-value objectives, EVALUATION AS INFERENCE (CI, hypothesis test, base rates).
CLASSICAL ML track (model-scoring toolkit): confusion matrix, precision/recall, ROC/AUC, bias-variance tradeoff. Builds on this track; taught there, not here.
Pitfalls to dodge
- Reading a metric as exact (it is an estimate with an interval).
- Confusing significant with meaningful (check effect size).
- Ignoring the base rate (high accuracy can be worthless on rare targets).
- Reading a model’s correlations as causes.
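
The "94% vs 92%" claim and the pitfalls above can be checked with a short sketch. It runs a two-proportion z-test at two assumed test-set sizes, reports the raw gap as a simple effect size, and compares accuracy with a majority-class baseline. All names (`two_proportion_z`, `majority_baseline`), the sample sizes, and the 93% negative class share are hypothetical. The test also assumes the two models were scored on independent test sets; when both are scored on the same examples, a paired test is the usual choice.

```python
import math
from collections import Counter


def two_proportion_z(correct_a, n_a, correct_b, n_b):
    """Return (gap, z, two-sided p-value) for accuracy B minus accuracy A."""
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail area of the standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


def majority_baseline(labels):
    """Accuracy of always predicting the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Same 92% vs 94% gap at two assumed test-set sizes.
gap_small, z_small, p_small = two_proportion_z(460, 500, 470, 500)
gap_large, z_large, p_large = two_proportion_z(4600, 5000, 4700, 5000)
print(f"n=500 each:  gap {gap_small:.3f}, p = {p_small:.3f}")
print(f"n=5000 each: gap {gap_large:.3f}, p = {p_large:.5f}")

# Assumed class balance: 93% negatives, 7% positives.
labels = [0] * 930 + [1] * 70
print(f"majority-class baseline accuracy: {majority_baseline(labels):.2f}")
```

With these assumed numbers the identical two-point gap is not significant at 500 examples per model and is significant at 5000, while the effect size stays the same. On the assumed 93/7 class split, a model that always predicts the majority class already scores 93%, so 94% is a small gain over doing nothing.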
The through-line
Statistics = the discipline of NOT FOOLING YOURSELF about uncertainty.
AI automates inference at scale -> this discipline is how you tell a system that works from one that only looks like it does.