Random variables and expected value
Phase 2 reasoned about events: this happens or it does not. But often the thing you actually care about is a number that depends on chance. How much does this bet pay? How many support tickets will the agent close today? How big is the error the model just made? Each of these is a number whose value is set by a random outcome, and that is exactly what a random variable is. Once you have random variables, you can ask the question that runs underneath almost all of machine learning: on average, what do we expect? That average is the expected value, and it is the most important single quantity in this phase.
What a random variable is
A random variable is a variable whose value is a numerical outcome of a random process. Roll a die and let X be the number that comes up: X is a random variable that can be 1, 2, 3, 4, 5, or 6. Flip a coin and let X be 1 for heads and 0 for tails: also a random variable. The random process produces an outcome; the random variable attaches a number to it.
Random variables come in two kinds:
- Discrete: the values are separate and countable, like a die roll, the number of heads in ten flips, or a count of tickets. You can list the possible values.
- Continuous: the values fall anywhere in a range, like a height, a wait time, or a temperature. You cannot list them; they form a continuum. Continuous random variables come into focus in the next lesson, on the normal distribution; this lesson works with discrete ones.
A random variable comes with a probability distribution: the list of its possible values, each paired with its probability. For a fair die, every value 1 through 6 has probability 1/6. As always, the probabilities across all possible values add up to 1.
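One way to make this concrete is to store a discrete distribution as a mapping from each value to its probability. The sketch below is illustrative only: the names `fair_die`, `coin`, and `is_valid_distribution` are assumptions introduced here, and exact fractions are used so the sum-to-1 check is not disturbed by floating-point rounding.

```python
from fractions import Fraction

# A discrete distribution as a mapping: value -> probability.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}  # 1 for heads, 0 for tails


def is_valid_distribution(dist):
    """Return True if every probability is non-negative and they sum to 1."""
    return all(p >= 0 for p in dist.values()) and sum(dist.values()) == 1


print(is_valid_distribution(fair_die))  # True
print(is_valid_distribution(coin))      # True
```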
Expected value: the long-run average
The expected value of a random variable, written E[X], is the average value it would produce if you repeated the random process many, many times. You compute it by multiplying each value by its probability and adding them up:
E[X] = sum over all values of (value x its probability)
For a fair die:
E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 21 / 6 = 3.5
The expected value of a die roll is 3.5. Notice something important: 3.5 is not a value the die can ever show. The expected value is the long-run average, not a prediction of any single roll and not necessarily an achievable outcome. Over thousands of rolls the average closes in on 3.5; on any one roll you get a whole number.
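The die calculation can be checked with a short sketch. The helper name `expected_value`, the seed, and the number of simulated rolls are assumptions chosen for illustration. The code computes the probability-weighted sum exactly, then compares it with the average of many simulated rolls.

```python
import random
from fractions import Fraction


def expected_value(dist):
    """Probability-weighted sum: each value times its probability, added up."""
    return sum(value * prob for value, prob in dist.items())


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
exact = expected_value(fair_die)  # Fraction(7, 2), i.e. 3.5

# Simulate many rolls; the seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)

print(float(exact))   # 3.5
print(sample_mean)    # close to 3.5, but rarely exactly 3.5
```

Every simulated roll is a whole number, yet the average lands near 3.5.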
Expected value shines for decisions under uncertainty. Consider a game: you win $10 with probability 0.2, and lose $3 with probability 0.8.
E[payoff] = (+10)(0.2) + (-3)(0.8) = 2 - 2.4 = -0.4
The expected payoff is -$0.40 per play. In any single game you might win or lose, but played repeatedly this is a money-loser, about 40 cents a game. Expected value is how you compare uncertain options on a common footing: the option with the better expected value is the better long-run bet.
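Comparing options by expected value might look like the sketch below. The `skip` option (not playing, for a certain payoff of $0) is an assumption added here to have something to compare against; it is not part of the game as described.

```python
from fractions import Fraction


def expected_value(dist):
    """Probability-weighted sum of the payoffs."""
    return sum(value * prob for value, prob in dist.items())


# The game from the text: win $10 with probability 0.2, lose $3 with probability 0.8.
game = {10: Fraction(1, 5), -3: Fraction(4, 5)}
# Hypothetical alternative for comparison: do not play, payoff is $0 for certain.
skip = {0: Fraction(1)}

options = {"play": game, "skip": skip}
best = max(options, key=lambda name: expected_value(options[name]))

print(float(expected_value(game)))  # -0.4
print(best)                         # skip
```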
Variance: how spread out the outcomes are
Expected value gives the center; as in the earlier center-and-spread lesson, you also want the spread. The variance of a random variable measures how far its outcomes typically land from the expected value, weighting each squared distance by its probability:
Var(X) = sum over all values of (value - E[X])^2 x its probability
and the standard deviation is its square root, back in the original units. Take a simple bet: win $1 on heads, lose $1 on tails, each with probability 0.5.
E[X] = (+1)(0.5) + (-1)(0.5) = 0
Var(X) = (1 - 0)^2 (0.5) + (-1 - 0)^2 (0.5) = 0.5 + 0.5 = 1
Standard deviation = square root of 1 = 1
So this bet has an expected value of 0 (fair) but a standard deviation of 1 (outcomes sit a dollar away from the average either way). Two random variables can share an expected value and differ wildly in variance, exactly the center-versus-spread point from Phase 1, now for distributions instead of datasets. Variance is the language of risk: same expected payoff, more variance, more uncertainty in any single outcome.
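A minimal sketch of the variance calculation follows. The $100 bet is a hypothetical second example, invented here to show two bets with the same expected value and very different spread; the function names are likewise assumptions.

```python
import math
from fractions import Fraction


def expected_value(dist):
    """Probability-weighted sum of the values."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Probability-weighted sum of squared distances from the expected value."""
    mean = expected_value(dist)
    return sum((value - mean) ** 2 * prob for value, prob in dist.items())


def std_dev(dist):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(dist))


small_bet = {1: Fraction(1, 2), -1: Fraction(1, 2)}      # the $1 bet from the text
large_bet = {100: Fraction(1, 2), -100: Fraction(1, 2)}  # hypothetical $100 bet

for name, bet in (("small", small_bet), ("large", large_bet)):
    print(name, expected_value(bet), variance(bet), std_dev(bet))
```

Both bets have an expected value of 0, but the standard deviation is 1 for the small bet and 100 for the large one.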
Why this matters when you use AI
Expected value is not a gambling sidebar; it is the backbone of how machine-learning systems are trained and how agents decide.
- A loss function is an expected value. Training a model means reducing its average error over the data, which is the expected value of a loss (a random variable, because which example the model sees is effectively random). “Minimize the loss” means “make the expected error small.” The whole optimization is chasing an expected value downward.
- A reward is an expected value. An agent that learns by trial and reward is trying to maximize its expected total reward. It chooses actions by their expected value: the action expected to pay off best over the long run, exactly the decision rule from the game example.
- Performance is an expected value. A model’s accuracy or error rate is an expected value over the distribution of data it will face. Reporting it as a single number is reporting an expectation, which is why the spread (variance) around it, and how it was estimated, matter just as much (the subject of Phase 4).
When you hear that a model “minimizes a loss” or an agent “maximizes reward,” you are hearing expected value at work. This lesson is the definition behind those phrases.
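To see the first point in miniature, the sketch below computes an average loss over a tiny dataset. The targets, predictions, and the choice of squared error are all invented for illustration; real training uses far more data and whatever loss suits the task.

```python
# Hypothetical targets and model predictions, made up purely for illustration.
targets = [3.0, 5.0, 2.0, 8.0]
predictions = [2.5, 5.0, 3.0, 7.0]


def squared_error(prediction, target):
    """Loss on a single example: the squared difference."""
    return (prediction - target) ** 2


# The loss on a randomly drawn example is a random variable.
losses = [squared_error(p, t) for p, t in zip(predictions, targets)]

# Each example is treated as equally likely, so the expected loss
# over this dataset is the plain mean of the per-example losses.
average_loss = sum(losses) / len(losses)

print(losses)        # [0.25, 0.0, 1.0, 1.0]
print(average_loss)  # 0.5625
```

Training would adjust the model so that this average goes down.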
Common pitfalls
- Treating the expected value as the most likely or a guaranteed value. It is the long-run average. A die’s expected value is 3.5, which never appears; an expected payoff says nothing certain about any single play.
- Ignoring variance. Two options with the same expected value can carry very different risk. Expected value alone does not tell you how bumpy the ride is.
- Expecting the average to show up quickly. Expected value is a long-run idea. A handful of trials can stray far from it; the average only settles down over many repetitions.
- Forgetting to weight by probability. Expected value is not the plain average of the possible values unless they are equally likely; each value must be multiplied by its own probability.
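The last two pitfalls can be seen with the game from earlier (win $10 with probability 0.2, lose $3 with probability 0.8). This is a sketch under assumptions: the helper `sample_mean`, the seed, and the sample sizes are arbitrary choices made here.

```python
import random
from fractions import Fraction

game = {10: Fraction(1, 5), -3: Fraction(4, 5)}

# Pitfall: the plain average of the possible values ignores how likely each is.
plain_average = Fraction(sum(game.keys()), len(game))        # (10 + -3) / 2 = 7/2
weighted_average = sum(v * p for v, p in game.items())       # -2/5

print(float(plain_average), float(weighted_average))  # 3.5 -0.4


def sample_mean(plays, rng):
    """Average payoff over a given number of simulated plays."""
    total = 0
    for _ in range(plays):
        total += 10 if rng.random() < 0.2 else -3
    return total / plays


# Pitfall: short runs can stray far from -0.4; long runs settle near it.
rng = random.Random(42)  # arbitrary seed, only for repeatability
for plays in (5, 50, 5_000, 200_000):
    print(plays, sample_mean(plays, rng))
```

The unweighted figure of 3.5 makes a losing game look profitable, and the small-sample averages can differ noticeably from the long-run value.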
What you should remember
- A random variable is a number whose value comes from a random process; discrete ones (counts, dice) have listable values, continuous ones (heights, times) fill a range and are the next lesson’s focus.
- The expected value E[X] is the probability-weighted average, the long-run mean: E[X] = sum of (value x probability). It need not be an achievable outcome (a die’s is 3.5).
- Use expected value to compare uncertain options: the game with an expected payoff of -$0.40 per play is a long-run loser even though you sometimes win.
- The variance (and its square root, the standard deviation) measures how spread out the outcomes are around the expected value; same expectation, more variance, more risk.
- In AI, expected value is the core objective: a loss function is an expected error to minimize, a reward is an expected payoff to maximize, and reported performance is an expectation over the data.