Why AI runs on statistics: cheatsheet

The one idea

AI systems report degrees of belief (probabilities), not certainties, because they learn from a limited, noisy sample. Statistics is the discipline of reasoning about those degrees of belief without fooling yourself.

Probability vs statistics

	Probability	Statistics
Direction	Forward: model to data	Backward: data to model
You know	The rules of chance	The observed data
You want	What data to expect	What model produced it
Example	Fair coin -> chance of 10 heads	16 of 20 heads -> is it fair?
In AI	A model’s prediction + confidence	Evaluating / comparing models
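
The two directions in the Example row can be sketched in a few lines of Python. This is a minimal illustration, not a full hypothesis test. It assumes "10 heads" means 10 heads in 10 flips (the table does not say), and the helper name binom_pmf is hypothetical.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Forward (probability): the model is known (a fair coin); ask what data to expect.
# Assumption: "10 heads" is read as 10 heads in 10 flips.
p_ten_heads = binom_pmf(10, 10, 0.5)

# Backward (statistics): the data is known (16 of 20 heads); ask about the model.
# How often would a fair coin give 16 or more heads in 20 flips?
p_at_least_16 = sum(binom_pmf(k, 20, 0.5) for k in range(16, 21))

print(f"P(10 heads in 10 flips | fair coin)   = {p_ten_heads:.5f}")
print(f"P(16+ heads in 20 flips | fair coin) = {p_at_least_16:.5f}")
```

A fair coin gives 16 or more heads in 20 flips less than 1% of the time, which is the kind of evidence a statistical test weighs when asking whether the coin is fair.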

Why uncertainty is unavoidable

Reason	What it means
Learning from a sample	A model saw some data, not all; generalizing is always a bet
The world is noisy	Identical-looking cases have different outcomes; no model removes that

Where each track idea lives in AI

Phase	Idea	Shows up as
1 Describing data	Center, spread, shape, correlation	Reading data before modeling; spotting redundant or degenerate features
2 The laws of chance	Conditional probability, Bayes	Spam filters, fraud detection, medical triage; the base-rate trap
3 Random variables	Distributions, expected value	The normal distribution turning up everywhere; loss functions and rewards as expected values
4 From sample to truth	Sampling, CLT, intervals, tests	Why a test set predicts the future; A/B tests; is model B really better?
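
The Phase 4 question "is model B really better?" can be sketched as an interval on the difference in test accuracy. This is a rough sketch under stated assumptions: the counts are made up for illustration, the function name difference_interval is hypothetical, it uses the normal approximation, and it treats the two models as scored on independent test sets (if both are scored on the same test set, a paired comparison is the better fit).

```python
from math import sqrt

Z_95 = 1.96  # approximate normal quantile for a 95% interval

def difference_interval(correct_a: int, n_a: int, correct_b: int, n_b: int):
    """Approximate 95% interval for accuracy(B) - accuracy(A), independent test sets."""
    acc_a = correct_a / n_a
    acc_b = correct_b / n_b
    # Variance of a difference of independent proportions is the sum of variances.
    std_err = sqrt(acc_a * (1 - acc_a) / n_a + acc_b * (1 - acc_b) / n_b)
    diff = acc_b - acc_a
    return diff - Z_95 * std_err, diff + Z_95 * std_err

# Hypothetical results: A gets 850 of 1,000 right, B gets 870 of 1,000 right.
low, high = difference_interval(850, 1000, 870, 1000)
print(f"B - A accuracy difference: 95% interval [{low:+.3f}, {high:+.3f}]")
print("Interval includes 0:", low < 0 < high)
```

With these made-up counts the interval includes 0, so a 2-point lead on 1,000 test cases is not, on its own, strong evidence that B is the better model.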

Worked example: the base-rate trap

Disease: 1 in 100 people.  Test: 99% accurate both ways (99% sensitivity, 99% specificity).  Population: 10,000.
  Sick                = 100      Healthy             = 9,900
  True positives  99% of 100 = 99    False positives 1% of 9,900 = 99
  Total positives = 99 + 99 = 198
  P(sick | positive) = 99 / 198 = 50%   (NOT 99%)
The rarer the target, the more lopsided the result: false positives from the
large negative group swamp true positives from the small positive group. At
1 in 1,000 with the same test, P(sick | positive) falls to about 9%.
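
The same arithmetic can be written as Bayes' rule and rerun for other base rates. This is a minimal sketch that reads "99% accurate both ways" as 99% sensitivity and 99% specificity. The function name p_sick_given_positive is hypothetical, and the 1-in-10 and 1-in-1,000 rows are added only for comparison.

```python
def p_sick_given_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Bayes' rule: fraction of positive tests that come from truly sick people."""
    true_pos = prevalence * sensitivity                # sick and flagged
    false_pos = (1 - prevalence) * (1 - specificity)   # healthy but flagged
    return true_pos / (true_pos + false_pos)

# The worked example uses a prevalence of 0.01; the other two are for comparison.
results = {}
for prevalence in (0.1, 0.01, 0.001):
    results[prevalence] = p_sick_given_positive(prevalence, 0.99, 0.99)
    print(f"base rate {prevalence:>5}: P(sick | positive) = {results[prevalence]:.1%}")
```

The test never changes across the three rows; only the base rate does, and it alone moves the answer from roughly 92% down to roughly 9%.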

Pitfalls to dodge

Mistaking confidence for correctness (a confident wrong answer is still wrong).
Reading a single accuracy number without the base rate.
Hearing “correlation” and concluding “cause.”
Treating the training sample as the whole truth.
Thinking probability removes uncertainty (it measures uncertainty; it does not erase it).
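
The second pitfall can be made concrete with the numbers from the worked example (10,000 people, 100 sick). The "model" below is a hypothetical one that labels everyone healthy; it is a sketch to show why accuracy needs the base rate beside it, not a recommended baseline.

```python
population = 10_000
sick = 100
healthy = population - sick

# Hypothetical "model": predict healthy for every single person.
true_positives = 0        # it never flags anyone, so it catches no sick people
true_negatives = healthy  # every healthy person is labeled correctly

accuracy = (true_positives + true_negatives) / population
recall = true_positives / sick  # share of sick people the model finds

print(f"accuracy = {accuracy:.0%}, recall on sick people = {recall:.0%}")
```

The accuracy matches the 99% test in the worked example, yet this model detects nobody, which is why the number means little until it is read against how rare the target is.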

Words to use precisely

Probability: a degree of belief about an uncertain event; runs forward, model to data.
Statistics: inferring the model behind observed data; runs backward, data to model.
Base rate: how common the thing being detected actually is; decides what an accuracy number means.
Calibration: whether a reported confidence matches the real frequency of being right.
Sample: the limited data a model actually saw, as opposed to the whole population.
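
Calibration, as defined above, can be checked by grouping predictions by reported confidence and comparing each group's average confidence with how often it was actually right. This is a minimal sketch: the function name calibration_table is hypothetical, the eight predictions are invented for illustration, and a real check would need far more data per bin.

```python
from collections import defaultdict

def calibration_table(confidences, correct, n_bins=5):
    """Map bin index -> (mean reported confidence, observed accuracy, count)."""
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    table = {}
    for idx in sorted(bins):
        items = bins[idx]
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(ok for _, ok in items) / len(items)
        table[idx] = (mean_conf, accuracy, len(items))
    return table

# Invented data: reported confidence, and whether the answer turned out right.
confidences = [0.95, 0.92, 0.91, 0.97, 0.65, 0.62, 0.68, 0.61]
correct = [True, True, False, False, True, False, True, True]

table = calibration_table(confidences, correct)
for idx, (mean_conf, accuracy, count) in table.items():
    print(f"bin {idx}: said {mean_conf:.0%}, right {accuracy:.0%} of the time (n={count})")
```

In this invented data the most confident group is right only half the time, which is the gap between confidence and correctness that calibration measures.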