Random variables and expected value

This is lesson 8 of Track 9 (Statistics & Probability for AI) and the opener of Phase 3 (Random variables and the distributions that matter). Phase 2 reasoned about whether events happen; this phase reasons about numbers that depend on chance. You will learn what a random variable is, how to summarize one with its expected value (the long-run average) and its variance (the spread), and why expected value is the single most important quantity in machine-learning training. The source curriculum is Khan Academy’s Statistics & Probability course, by Sal Khan and the Khan Academy team, freely available and cited as further study.

The lesson defines random variables (discrete and continuous) and their distributions, computes the expected value of a die and of a betting game, shows that an expected value need not be an achievable outcome (a die’s is 3.5), introduces variance and standard deviation as the spread of outcomes, and closes by connecting expected value to machine learning: a loss function is an expected error to minimize, a reward is an expected payoff to maximize, and reported performance is an expectation over the data.
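
As a worked instance of the die calculation mentioned above, here is a sketch that assumes a fair six-sided die. The symbols E[X] for the expected value and P(X = x) for the probability of each face are notation chosen here and may differ from the lesson's own:

```latex
\[
E[X] \;=\; \sum_{x=1}^{6} x \cdot P(X = x)
      \;=\; \frac{1 + 2 + 3 + 4 + 5 + 6}{6}
      \;=\; \frac{21}{6}
      \;=\; 3.5
\]
```

No face of the die shows 3.5, which is why the expected value is read as a long-run average rather than as an outcome you could actually roll.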

This is lesson 8 of 14 and the first lesson of Phase 3. It builds on Phase 2’s probability rules and carries the center-and-spread idea from Phase 1 into the world of distributions. The next lesson, “The bell curve: the normal distribution,” takes the expected value and standard deviation into the continuous setting; the binomial lesson after it returns to discrete counts. Expected value, introduced here, is also the engine behind the loss and reward language used across machine learning.

Prerequisites: Probability foundations (lesson 5), since a random variable’s distribution is built on probabilities that sum to 1. The earlier center-and-spread lesson is helpful, because expected value and variance are the distribution-level versions of the mean and variance you met there. Comfort with multiplying and adding fractions and decimals is the only math required.

The arithmetic is straightforward: for expected value, multiply each value by its probability and add the results; for variance, take the same kind of weighted sum over the squared distances from the expected value. Every formula is worked on a concrete example (a die, a coin bet, a payoff game), so the calculation and its meaning arrive together. No algebra is needed beyond arithmetic with fractions and decimals.
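
The two weighted sums described above can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's own code: the helper names `expected_value` and `variance` are hypothetical, the die is assumed to be fair, and the distribution is represented as a plain dictionary mapping each value to its probability.

```python
from fractions import Fraction


def expected_value(dist):
    """Probability-weighted average: multiply each value by its probability and add."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Probability-weighted average of squared distances from the expected value."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


# Assumed example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(die.values()) == 1  # a distribution's probabilities must sum to 1

mean = expected_value(die)   # 7/2, i.e. 3.5
var = variance(die)          # 35/12, about 2.92
std = float(var) ** 0.5      # standard deviation is the square root of the variance

print(mean, var, round(std, 3))
```

Exact fractions are used so the results (7/2 and 35/12) match a by-hand calculation without floating-point rounding; swapping in decimal probabilities works the same way.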

  • Define a random variable and its probability distribution, and distinguish discrete from continuous
  • Compute the expected value of a discrete random variable as the probability-weighted average
  • Explain that the expected value is a long-run average and need not be an achievable outcome
  • Compute the variance and standard deviation of a random variable and read them as the spread of outcomes
  • Recognize expected value as the core of machine-learning objectives (a loss to minimize, a reward to maximize)
  • Read time: about 12 minutes
  • Practice time: about 15 minutes (a self-check, a compute-the-expected-value exercise, a compare-two-bets exercise separating value from risk, and flashcards)
  • Difficulty: standard (weighted-average arithmetic; the ideas build directly on Phase 1’s center and spread)