
Cheatsheet: Random variables and expected value

A random variable is a number whose value is determined by chance. Its expected value is the long-run, probability-weighted average, and it underlies most machine-learning objectives.

Random variable: a number set by a random outcome.
Discrete = listable values (die roll, counts). Continuous = a range (heights, times).
Probability distribution: each possible value paired with its probability (probabilities sum to 1).
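A minimal Python sketch of a discrete distribution follows. It assumes the distribution is stored as a dict that maps each value to its probability, a representation chosen here for illustration. `is_valid_distribution` is a hypothetical helper name. It checks the rule that probabilities lie in [0, 1] and sum to 1.

```python
from fractions import Fraction

# Illustrative representation: a discrete distribution as {value: probability}.
# Fractions keep the arithmetic exact (no floating-point rounding).
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid_distribution(dist):
    """True if every probability is in [0, 1] and they sum to exactly 1."""
    probs = dist.values()
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

print(is_valid_distribution(fair_die))                                # True
print(is_valid_distribution({1: Fraction(1, 2), 2: Fraction(1, 3)}))  # False: sums to 5/6
```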
E[X] = sum over all values of (value x its probability) = long-run average
Fair die: E[X] = (1+2+3+4+5+6)/6 = 21/6 = 3.5 (never an actual roll: it's an average)
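The sketch below computes E[X] directly from the definition: sum each value times its probability. It assumes the same illustrative {value: probability} dict representation. `expected_value` is a hypothetical helper name, not a library function.

```python
from fractions import Fraction

def expected_value(dist):
    """E[X] = sum of value * probability over a {value: probability} dict."""
    return sum(x * p for x, p in dist.items())

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(fair_die))  # 7/2, i.e. 3.5
```

The result 7/2 = 3.5 is not a face of the die, which matches the note that an expected value is an average rather than a possible roll.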
Decision: win $10 w.p. 0.2, lose $3 w.p. 0.8
E = (+$10)(0.2) + (-$3)(0.8) = $2 - $2.40 = -$0.40 per play (a long-run loser)
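Here is one way to check the bet's expected value in Python and to see the "long-run" reading by simulation. This is a sketch: the fixed seed, the number of plays `n`, and the helper name `expected_value` are illustrative choices. The simulated average will be close to -0.40 but not exactly equal to it.

```python
import random
from fractions import Fraction

# The bet: win $10 with probability 0.2, lose $3 with probability 0.8.
bet = {10: Fraction(1, 5), -3: Fraction(4, 5)}

def expected_value(dist):
    """Probability-weighted average of a {value: probability} dict."""
    return sum(x * p for x, p in dist.items())

print(float(expected_value(bet)))  # -0.4 dollars per play

# Simulate many plays; the average payout per play should settle near -0.40.
rng = random.Random(0)  # fixed seed so the run is reproducible
n = 200_000
total = sum(10 if rng.random() < 0.2 else -3 for _ in range(n))
print(total / n)  # close to -0.4; exact digits depend on the seed
```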

The expected value need not be an achievable outcome. Always weight each value by its probability; a plain average is correct only when all values are equally likely.
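The following sketch uses the $10 / -$3 bet to contrast a plain average with a probability-weighted one. The variable names are illustrative. Neither result is a payout you can actually receive.

```python
from fractions import Fraction

# Bet: +$10 with probability 0.2, -$3 with probability 0.8.
bet = {10: Fraction(1, 5), -3: Fraction(4, 5)}

plain_average = Fraction(sum(bet), len(bet))           # ignores the probabilities
weighted_average = sum(x * p for x, p in bet.items())  # the expected value

print(float(plain_average))     # 3.5 (wrong: treats both outcomes as equally likely)
print(float(weighted_average))  # -0.4
```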

Var(X) = sum over all values of (value - E[X])^2 x its probability
Std = square root of Var(X) (back in original units)
$1 coin bet (+$1 or -$1, each with probability 0.5):
E[X] = 0, Var = (1 - 0)^2(0.5) + (-1 - 0)^2(0.5) = 1, Std = sqrt(1) = 1
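Below is a minimal sketch of the variance and standard-deviation formulas, using the same illustrative {value: probability} dicts. `variance` and `std_dev` are hypothetical helper names. The fair die is included as a second check.

```python
import math
from fractions import Fraction

def expected_value(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (value - E[X])^2 * probability."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def std_dev(dist):
    """Standard deviation: square root of the variance, in the original units."""
    return math.sqrt(variance(dist))

coin_bet = {1: Fraction(1, 2), -1: Fraction(1, 2)}
print(expected_value(coin_bet), variance(coin_bet), std_dev(coin_bet))  # 0 1 1.0

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(variance(fair_die), round(std_dev(fair_die), 3))  # 35/12 1.708
```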

Same expected value + more variance = more risk in any single outcome.
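To make "same expected value, more risk" concrete, this sketch compares two hypothetical bets. Both have an expected value of 0, but one wins or loses $1 and the other wins or loses $100. The bet sizes are made up for illustration.

```python
import math
from fractions import Fraction

def expected_value(dist):
    return sum(x * p for x, p in dist.items())

def std_dev(dist):
    mu = expected_value(dist)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))

# Two hypothetical bets with the same expected value (0) but different spread.
small_bet = {1: Fraction(1, 2), -1: Fraction(1, 2)}      # win or lose $1
large_bet = {100: Fraction(1, 2), -100: Fraction(1, 2)}  # win or lose $100

for name, bet in [("small", small_bet), ("large", large_bet)]:
    print(name, expected_value(bet), std_dev(bet))
# small 0 1.0
# large 0 100.0
```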

Phrase | Means
"Minimize the loss" | Make the expected loss small (the loss on a randomly drawn example is a random variable)
"Maximize reward" | Maximize expected total reward; choose actions by their expected value
"95% accurate" | An expected value over the data distribution (the average of a 0/1 "correct" indicator)
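Two of the table's phrases can be sketched as averages over a dataset. In practice such an average serves as an estimate of the expected value over the data distribution. The labels, scores, 0.5 threshold and the choice of squared error below are all made-up, illustrative assumptions.

```python
# Made-up labels and model scores for 8 examples (illustration only).
labels = [1, 0, 1, 1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3]
predictions = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

# Accuracy: the average of a 0/1 "correct" indicator, an estimate of E[correct].
correct = [int(y == y_hat) for y, y_hat in zip(labels, predictions)]
accuracy = sum(correct) / len(correct)
print(accuracy)  # 0.75

# Loss: the per-example loss (here squared error) averaged over the data;
# training tries to make this average small.
mean_squared_error = sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)
print(round(mean_squared_error, 4))  # 0.125
```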
Common mistakes:
  • Treating the expected value as the most likely or guaranteed value (it is a long-run average).
  • Ignoring variance (same expected value can mean very different risk).
  • Expecting the average to appear in a few trials (it is a long-run idea).
  • Forgetting to weight by probability (not a plain average unless equally likely).
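A common mistake is expecting the average to show up after only a few trials. The simulation below illustrates why this fails: the average of a handful of fair-die rolls can land well away from 3.5, while a large number of rolls settles close to it. The seed and sample sizes are arbitrary choices, and the printed values depend on them.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

def average_of_rolls(n):
    """Average of n simulated fair-die rolls."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

for n in (5, 50, 5_000, 100_000):
    print(n, average_of_rolls(n))
# Small n can land well away from 3.5; large n settles close to it.
```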
Key terms:
  • Random variable: a number whose value is a random outcome.
  • Expected value E[X]: the probability-weighted average; the long-run mean.
  • Variance: the probability-weighted average of the squared distance from E[X].
  • Standard deviation: square root of variance; spread in original units.