Normal distribution: brief

What you’ll learn

This is lesson 9 of Track 9 (Statistics & Probability for AI) and the second lesson of Phase 3 (Random variables and the distributions that matter). The histogram lesson named the bell shape and promised it a lesson of its own; this is that lesson. You will learn how a continuous distribution carries probability, what defines the normal distribution, the one rule that makes it usable at a glance, and the z-score that turns “how unusual is this value” into a comparable number. The source curriculum is Khan Academy’s Statistics & Probability course, by Sal Khan and the Khan Academy team, which is freely available and cited here for further study.

The lesson explains probability as area under a density curve, defines the normal by its mean and standard deviation, gives the 68-95-99.7 (empirical) rule with worked test-score numbers, formalizes the z-score (the standardization from Phase 1, now with a name and a use), and closes on the normal in AI: feature standardization, Gaussian noise and weight initialization, outlier detection, and the caution that not all data is normal.

Where this fits

This is lesson 9 of 14, the middle of Phase 3. It builds on the expected value and standard deviation from the previous lesson and on the bell shape named in the histogram lesson, and it makes the z-score from the center-and-spread lesson precise. It also sets up Phase 4: the reason the normal is everywhere (averages of many independent random quantities tend toward it) is the punchline of the central-limit-theorem lesson. The next lesson, on the binomial distribution, returns to discrete counts.

Before you start

Prerequisites: the previous lesson (Random variables and expected value) for the mean and standard deviation of a distribution. The center-and-spread lesson is helpful, since the z-score is the standardization introduced there. Comfort with subtraction and division is the only math required.

About the math

The arithmetic is light: computing a z-score is one subtraction and one division, and the empirical rule turns standard deviations into percentages you memorize (68, 95, 99.7). Every idea is anchored to worked numbers (test scores, heights). There is no integration or heavy formula; the density curve is described in words and a picture, and every probability estimate comes from the rule of thumb.
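The two pieces of arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the lesson’s exercises: the function names z_score and empirical_rule_band are hypothetical, and the test-score mean of 70 and standard deviation of 10 are made-up numbers.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean in standard deviations: one subtraction, one division."""
    return (x - mean) / sd


def empirical_rule_band(z):
    """Approximate share of a normal distribution within |z| standard deviations
    of the mean, using only the 68-95-99.7 rule (so only 1, 2 or 3 are covered)."""
    bands = {1: 0.68, 2: 0.95, 3: 0.997}
    k = abs(z)
    if k not in bands:
        raise ValueError("the rule of thumb only covers 1, 2 or 3 standard deviations")
    return bands[k]


# Hypothetical test scores: mean 70, standard deviation 10.
z = z_score(90, mean=70, sd=10)
print(z)                       # 2.0
print(empirical_rule_band(z))  # 0.95: about 95% of scores fall between 50 and 90
```

Because about 95% of scores lie within two standard deviations, a score of 90 sits in the top 2.5% or so; the rule gives that estimate without any integration.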

By the end, you’ll be able to

Explain how a continuous distribution assigns probability as area under a density curve
Describe the normal distribution in terms of its mean (center) and standard deviation (spread)
Apply the 68-95-99.7 rule to estimate how unusual a value is
Compute a z-score and use it to compare values across different normal distributions
Recognize where the normal distribution and z-scores appear in AI, and that not all data is normal
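Comparing values across different normal distributions, the fourth objective, comes down to comparing z-scores rather than raw values. The sketch below assumes two hypothetical exams on different scales; the means, standard deviations, and scores are invented for illustration only.

```python
def z_score(x, mean, sd):
    """Standardize x: its distance from the mean in units of standard deviation."""
    return (x - mean) / sd


# Hypothetical exams on different scales (made-up numbers).
math_z = z_score(82, mean=70, sd=8)      # (82 - 70) / 8 = 1.5
history_z = z_score(88, mean=80, sd=10)  # (88 - 80) / 10 = 0.8

# The raw 88 is higher, but the 82 is further above its own exam's mean.
better = "math" if math_z > history_z else "history"
print(math_z, history_z, better)  # 1.5 0.8 math
```

Standardizing puts both scores on the same unitless scale, which is also why feature standardization helps models compare inputs measured in different units.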

Time and difficulty

Read time: about 12 minutes
Practice time: about 15 minutes (a self-check, a z-score and empirical-rule exercise, a compare-across-distributions exercise using z-scores, and flashcards)
Difficulty: standard (light arithmetic; the empirical rule and z-score do most of the work)