Summary: The bell curve: the normal distribution

The normal distribution is the bell curve, and the 68-95-99.7 rule plus the z-score make it instantly usable. Named back in the histogram lesson, it describes heights, measurement errors, test scores, and the averages of almost anything. This lesson makes it precise: how a continuous distribution carries probability, what defines a normal, and how to judge how unusual any value is. This summary is the scan-in-five-minutes version of the full lesson.

  • Continuous means area. For a continuous distribution, probability is area under a density curve (total area 1); the probability of a range is its area. No single exact value has its own probability.
  • The normal, in two numbers. A symmetric bell pinned down by its mean (center) and standard deviation (width). Change the mean to slide it, the standard deviation to widen or narrow it; the shape is always the same.
  • The 68-95-99.7 rule. About 68% of values fall within 1 standard deviation of the mean, 95% within 2, 99.7% within 3. For scores with mean 500 and standard deviation 100: about 68% between 400 and 600, 95% between 300 and 700.
  • The z-score. z = (value - mean) / standard deviation: how many standard deviations a value sits from the mean. It is the standardization from Phase 1, and its power is comparability: a z of +2 means “near the top” on any scale. Combined with the empirical rule, z = +1 puts about 84% of the distribution below it: the 50% below the mean plus half of the central 68%.
  • Why it is everywhere (the next phase’s punchline). The averages and sums of many independent random things tend toward a normal, whatever they started as, which is why so many real quantities are bell-shaped.
  • In AI. The normal underlies feature standardization (z-scores), the default model of noise and weight initialization (Gaussian), and outlier detection (large z-scores). But not all data is normal; skewed data breaks the rule, so check the shape first.
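
The z-score and the 68-95-99.7 rule from the list above can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the helper names `z_score`, `fraction_within`, and `percentile_from_z` are illustrative, not part of the lesson.

```python
from statistics import NormalDist


def z_score(value: float, mean: float, sd: float) -> float:
    """How many standard deviations `value` sits from `mean`."""
    return (value - mean) / sd


def fraction_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of its mean."""
    standard = NormalDist()  # standard normal: mean 0, standard deviation 1
    return standard.cdf(k) - standard.cdf(-k)


def percentile_from_z(z: float) -> float:
    """Percent of a normal distribution that lies below a given z-score."""
    return 100 * NormalDist().cdf(z)


# The empirical rule: area within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(k):.4f}")

# The lesson's example: scores with mean 500 and standard deviation 100.
z = z_score(600, mean=500, sd=100)
print(f"z for a score of 600: {z:+.1f}")
print(f"percent of scores below 600: {percentile_from_z(z):.1f}")
```

The exact areas are slightly different from the rounded 68, 95, and 99.7, which is why the rule is stated with "about". Probability here is computed as a difference of two cumulative areas, which matches the point that a range has probability while a single exact value does not.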

You can now look at any roughly normal quantity and immediately place a value: a z-score tells you how many standard deviations from typical it sits, and the 68-95-99.7 rule turns that into a percentile in your head. That is the everyday move behind reading test percentiles, spotting outliers, and standardizing features for a model. Just as useful is the caution you carry with you: the normal is a powerful default, not a universal truth. Before you trust the bell, check the histogram, because applying the empirical rule to skewed data is one of the quieter ways to be confidently wrong.
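
To see how the empirical rule fails on skewed data, here is a small sketch using simulated samples only (no real dataset is implied). It assumes a seeded standard-library `random.Random` generator, uses an exponential distribution as a stand-in for "skewed data", and the helpers `fraction_within` and `sample_skewness` are hypothetical names chosen for illustration.

```python
import random
from statistics import fmean, pstdev


def fraction_within(data, k=1.0):
    """Share of observations within k standard deviations of the sample mean."""
    m = fmean(data)
    s = pstdev(data)
    inside = sum(1 for x in data if abs(x - m) <= k * s)
    return inside / len(data)


def sample_skewness(data):
    """Average cubed z-score: near 0 for symmetric data, positive for a long right tail."""
    m = fmean(data)
    s = pstdev(data)
    return fmean([((x - m) / s) ** 3 for x in data])


rng = random.Random(0)  # fixed seed so the simulation is repeatable
bell = [rng.gauss(0.0, 1.0) for _ in range(20_000)]      # simulated normal data
skewed = [rng.expovariate(1.0) for _ in range(20_000)]   # simulated right-skewed data

for name, data in (("normal", bell), ("skewed", skewed)):
    print(
        f"{name}: within 1 sd = {fraction_within(data):.3f}, "
        f"skewness = {sample_skewness(data):.2f}"
    )
```

For the exponential sample, roughly 86% of values land within one standard deviation of the mean rather than 68%, so a percentile read off the bell curve would be wrong. A skewness far from zero, like a lopsided histogram, is a signal to stop before applying the rule.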