
Why AI runs on statistics

This is the opening lesson of Track 9 (Statistics & Probability for AI), and it sets up everything that follows. Rather than teach a technique, it gives you the map. It explains why every AI system reports probabilities instead of certainties and how the field divides into two opposite directions of reasoning. It shows where each idea in this track resurfaces inside a real machine-learning system, and it works the one example that shows why the material is worth learning. The source curriculum is Khan Academy’s Statistics & Probability course, by Sal Khan and the Khan Academy team. It is freely available and is cited in each lesson for further study.

The lesson opens by listening to how AI talks (a spam filter’s “98% spam,” a model’s “0.91 confidence”) and asks why none of it is a plain yes or no. It answers with the two reasons uncertainty never goes away: systems learn from a sample, and the world they model is noisy. It then separates probability (running a model of chance forward to expected data) from statistics (running observed data backward to the model behind it). Next it tours where each phase of the track lands in a real system. It closes on the base-rate example: a test that is 99% accurate for a 1-in-100 disease gives a positive result that is real only about half the time.
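The two directions can be illustrated with a toy model. This is a minimal sketch, not part of the lesson itself. It assumes a biased coin as the "model of chance," and the coin, its 0.7 heads probability, and the function names `simulate_flips` and `estimate_p_heads` are all hypothetical illustrations. The probability direction runs the known model forward to produce data. The statistics direction starts from that data and works backward to an estimate of the model's parameter.

```python
import random

def simulate_flips(p_heads: float, n: int, seed: int = 0) -> list[int]:
    """Probability direction (hypothetical helper): run a known model of chance
    forward to generate data. 1 = heads, 0 = tails."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [1 if rng.random() < p_heads else 0 for _ in range(n)]

def estimate_p_heads(flips: list[int]) -> float:
    """Statistics direction (hypothetical helper): run observed data backward
    to estimate the parameter of the model that produced it."""
    return sum(flips) / len(flips)

# Forward: the model is known (assumed p = 0.7), and the data is generated.
flips = simulate_flips(p_heads=0.7, n=10_000)

# Backward: only the data is seen, and the model's parameter is estimated.
p_hat = estimate_p_heads(flips)
print(f"assumed p = 0.7, estimated p = {p_hat:.3f}")
```

The estimate lands close to 0.7 but not exactly on it. That leftover gap is the "learning from a sample" uncertainty the lesson describes.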

This is lesson 1 of 14 and the entry point of the track. There is no previous lesson; this one earns its place by explaining why a literacy track puts statistics this early. It previews the whole arc: Phase 1 describes data, Phase 2 builds probability and Bayes, Phase 3 covers random variables and the key distributions, and Phase 4 reaches inference, the reasoning that decides whether an AI system actually works. The next lesson, Summarizing data: center and spread, begins the tour proper.

Prerequisites: none beyond comfort with basic arithmetic and a little algebra. This opening lesson needs no formulas at all; the one worked example is done by counting a population of 10,000 people. If you can follow “1% of 9,900 is 99,” you have everything you need. Curiosity about how AI systems actually reach their answers is the only real requirement.
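If you want to check the counting by machine, here is a short sketch of the base-rate example. It assumes that "99% accurate" means the test catches 99% of sick people and wrongly flags 1% of healthy people. The lesson's "1% of 9,900 is 99" matches that second rate. The function name `positive_breakdown` is a hypothetical illustration. The code counts people in a population of 10,000 and uses no probability formulas.

```python
def positive_breakdown(population: int, prevalence: float,
                       sensitivity: float, false_positive_rate: float) -> tuple[int, int]:
    """Hypothetical helper: count true and false positives in a population.
    Assumes 'accuracy' splits into a sensitivity and a false-positive rate."""
    sick = round(population * prevalence)                      # 1 in 100 have the disease
    healthy = population - sick                                # everyone else
    true_positives = round(sick * sensitivity)                 # sick people the test catches
    false_positives = round(healthy * false_positive_rate)     # healthy people wrongly flagged
    return true_positives, false_positives

tp, fp = positive_breakdown(population=10_000, prevalence=0.01,
                            sensitivity=0.99, false_positive_rate=0.01)
share_real = tp / (tp + fp)  # fraction of positive results that are real
print(f"true positives: {tp}, false positives: {fp}, share real: {share_real:.0%}")
```

The two kinds of positive come out equal (99 each). The small group of sick people and the large group of healthy people each contribute the same number, so a positive result is real only about half the time.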

Track 9 keeps the math at the intuition level throughout, and this orientation lesson has essentially none. The base-rate example is pure counting, no equations. Later lessons introduce notation gently and always anchor it to a worked example you can follow by hand. The goal is reasoning, not formula-pushing.

  • Explain why AI systems reason in probabilities rather than certainties
  • Distinguish probability (model forward to data) from statistics (data backward to a model) and name where AI uses each
  • Map the ideas in this track to where they show up inside machine learning
  • Work the base-rate example to show why a highly accurate test can still produce mostly wrong positives
  • State the through-line of statistical thinking: not fooling yourself about uncertainty
  • Read time: about 10 minutes
  • Practice time: about 12 minutes (a self-check, a probability-or-statistics sorting exercise, a base-rate counting exercise, and flashcards)
  • Difficulty: standard (a conceptual orientation lesson; no math beyond counting)