Counts and trials: the binomial distribution

A huge number of real questions have the same shape: out of some fixed number of tries, how many succeed? How many of 10 incoming emails are spam? How many of 100 visitors sign up? How many of a model’s 5 predictions are correct? Each is a count of successes in a fixed number of yes-or-no trials, and the distribution that answers them all is the binomial distribution. It is the discrete workhorse, the counterpart to the previous lesson’s continuous bell, and it shows up anywhere you count “how many out of N.”

A situation is binomial when all four of these hold:

  • A fixed number of trials. You decide up front how many tries: 10 emails, 5 predictions, 100 visitors. This is the trial count.
  • Each trial has two outcomes. Success or failure, yes or no (spam or not, correct or wrong). A single such trial is called a Bernoulli trial.
  • A constant probability of success. Each trial has the same chance of success.
  • Independent trials. One trial’s outcome does not change the others (the independence idea from the probability lessons).

The binomial random variable is then X = the number of successes across all the trials. If any condition fails (the probability drifts, or the trials influence each other), the simple binomial does not apply. This is the same fine print that ran through the probability lessons.
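To see X as "a count of successes," the definition can be simulated directly. This is a minimal sketch, not a definitive implementation: the function name `simulate_successes`, the 10 trials, and the 0.3 success probability are illustrative assumptions, and the fixed seed is only there to make the run repeatable.

```python
import random

def simulate_successes(n: int, p: float, rng: random.Random) -> int:
    """Run n independent yes-or-no trials, each succeeding with probability p,
    and return how many succeeded (one draw of the binomial variable X)."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(0)  # fixed seed, an assumption for repeatability

# Hypothetical setting: 10 incoming emails, each spam with probability 0.3.
counts = [simulate_successes(10, 0.3, rng) for _ in range(5)]
print(counts)  # five separate draws of X, each between 0 and 10
```

Each call uses the same n and the same p for every trial, and no trial looks at another, which is the four conditions expressed in code.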

The probability of exactly a given number of successes

To find the chance of an exact number of successes in a fixed number of trials, you need two ingredients: how likely any one specific arrangement of those successes is, and how many such arrangements there are.

Any one specific sequence with that many successes and the rest failures has probability equal to the success probability multiplied by itself once per success, times the failure probability (one minus the success probability) multiplied by itself once per failure. But the successes can fall in many different positions among the trials, and each arrangement counts. The number of ways to choose which k of the n trials are the successes is the combinations count C(n, k) = n! / (k! x (n - k)!), which is always a whole number. Putting it together:

P(exactly k successes) = C(n, k) x p^k x (1 - p)^(n - k)
  C(n, k): how many ways k successes can be arranged among n trials
  p^k: the probability of the k successes
  (1 - p)^(n - k): the probability of the (n - k) failures
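The formula translates almost symbol for symbol into code. The sketch below is one possible way to write it; the function name `binomial_pmf` and the values n = 10, p = 0.3 are assumptions chosen for illustration. Python's standard `math.comb` supplies C(n, k).

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    if not 0 <= k <= n:
        return 0.0  # impossible counts get probability zero
    arrangements = comb(n, k)                  # C(n, k)
    one_arrangement = p**k * (1 - p)**(n - k)  # p^k x (1 - p)^(n - k)
    return arrangements * one_arrangement

# Sanity check: the probabilities of every possible count add up to 1.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
print(total)
```

Summing over every possible count is a cheap way to catch a mistyped exponent or a dropped combinations factor.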

Make it concrete with three coin flips and the question “exactly 2 heads” (so the number of trials is 3, the success probability is 0.5, and the number of successes is 2):

C(3, 2) = 3 (the 2 heads can land as HHT, HTH, or THH)
P(exactly 2 heads) = 3 x (0.5)^2 x (0.5)^1 = 3 x 0.125 = 0.375 = 3/8

Check it by listing all 8 equally likely outcomes (HHH, HHT, HTH, HTT, THH, THT, TTH, TTT): exactly three of them have two heads, so 3/8. The formula and the count agree.
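The same check can be done by brute force in code. This sketch enumerates the 8 outcomes with `itertools.product` and compares the count against the formula; the variable names are illustrative.

```python
from itertools import product
from math import comb

# All 2^3 equally likely sequences of three flips, e.g. ('H', 'H', 'T').
outcomes = list(product("HT", repeat=3))

# Keep only the sequences with exactly two heads.
two_heads = [o for o in outcomes if o.count("H") == 2]

by_counting = len(two_heads) / len(outcomes)   # favourable / total
by_formula = comb(3, 2) * 0.5**2 * 0.5**1      # C(3, 2) x p^2 x (1 - p)^1

print(two_heads)
print(by_counting, by_formula)
```

Enumeration only works when the outcomes are equally likely and few in number; the formula keeps working when neither is true.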

Now a less symmetric case that looks like AI. A model is 80% accurate (a success probability of 0.8). Over 5 independent predictions, what is the chance exactly 4 are correct?

C(5, 4) = 5 (the one wrong prediction could be any of the 5)
P(exactly 4 correct) = 5 x (0.8)^4 x (0.2)^1 = 5 x 0.4096 x 0.2 = 0.4096

About a 41% chance of exactly 4 right out of 5. Notice the formula handles a success probability of 0.8 just as easily as the coin’s 0.5.

[Figure: bar chart of the binomial PMF for n = 5 and p = 0.8. Horizontal axis: number of successes k, from 0 to 5. Vertical axis: probability. Bar heights for k = 0, 1, 2, 3, 4, 5 are 0.0003, 0.006, 0.051, 0.205, 0.410, 0.328. The tallest bar is at k = 4, where a dashed vertical line marks the mean np = 4.]
Five trials, each with 80 percent success. The probability of exactly k successes peaks at k = 4 (the mean is np = 4), with 5 successes the next most likely outcome. Probabilities of 0, 1, 2 are tiny because the per-trial success rate is so high. The shape is the binomial distribution.
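The six probabilities described in the caption can be recomputed in a few lines. This is a rough sketch that prints a text bar chart; the scaling of 50 characters per unit of probability is an arbitrary choice for display.

```python
from math import comb

n, p = 5, 0.8

# One probability per possible count k = 0..5.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    bar = "#" * round(prob * 50)  # arbitrary display scale
    print(f"k={k}  {prob:.4f}  {bar}")

# The count with the highest probability.
most_likely = max(range(n + 1), key=lambda k: pmf[k])
print("most likely count:", most_likely)
```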

The expected count: trials times the success probability

You do not need the full formula to answer “how many successes should I expect on average.” The expected value of a binomial is simply the number of trials times the probability of success:

E[X] = n x p

Five predictions at 80% accuracy: expect 5 times 0.8, which is 4 correct. A hundred visitors at a 3% sign-up rate: expect 100 times 0.03, which is 3 sign-ups. This is the expected value from the previous lesson, specialized to counting, and it matches intuition exactly: across all the trials, each succeeds a fraction of the time equal to the success probability. (For completeness, the variance is the number of trials times the success probability times one minus the success probability, largest when the success probability is near 0.5.)
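One way to convince yourself of both shortcuts is to compute the mean and variance the long way, as probability-weighted sums over every possible count, and compare. The sketch below does this for the 5-prediction example; the helper name `binomial_pmf` is an assumption.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.8

# Long way: weight each count by its probability.
mean_from_pmf = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_from_pmf = sum(
    (k - mean_from_pmf) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1)
)

# Shortcuts: n x p and n x p x (1 - p).
print(mean_from_pmf, n * p)
print(var_from_pmf, n * p * (1 - p))
```

Up to floating-point rounding, each pair of printed numbers should agree.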

A common trap: “exactly this many” and “at least this many” are different questions. The formula gives the chance of exactly a given number of successes. For “at least” that number, you add up the probabilities of that count, the next count, and so on, all the way up to every trial succeeding. When the target count is small, the complement trick from the probability lessons is shorter: take one minus the probability of falling below the target. “At least one success” is almost always easiest as one minus the probability of zero successes, which is one minus the failure probability raised to the number of trials. Reaching for the exact-count formula when the question says “at least” is a frequent and avoidable error.
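The difference between the two questions is easy to see numerically. This sketch reuses the 80%-accurate model with 5 predictions; the function names `binomial_pmf` and `prob_at_least` are assumptions.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes: sum the exact-count
    probabilities from k up to n."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

n, p = 5, 0.8

exactly_4 = binomial_pmf(4, n, p)
at_least_4 = prob_at_least(4, n, p)   # P(4) + P(5)

# "At least one" two ways: the full sum and the complement shortcut.
at_least_1_sum = prob_at_least(1, n, p)
at_least_1_complement = 1 - (1 - p) ** n

print(exactly_4, at_least_4)
print(at_least_1_sum, at_least_1_complement)
```

"Exactly 4" is about 0.41, while "at least 4" also includes the all-correct case and comes to about 0.74. The two routes to "at least one" give the same value, but the complement needs a single term instead of five.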

The binomial is the natural model for counting successes, which AI does constantly.

  • Accuracy is a binomial count. A model’s number of correct predictions on a test set of some fixed size is a binomial count: one trial per example, each correct with some probability equal to the model’s true accuracy. This is why an accuracy measured on a small test set is noisy, and it is the foundation for asking how confident you can be in a reported accuracy (the inference phase, next).
  • Conversions and rates. Sign-ups out of visitors, clicks out of impressions, defects out of items inspected, all are binomial counts, and the trials-times-success-probability expected value is the back-of-envelope estimate. A/B testing, in the next phase, compares two such binomial rates.
  • The bridge to the normal. When the trial count is large, the binomial’s bumpy bar chart smooths into the bell of the previous lesson. That is a first glimpse of the next phase’s central limit theorem and the reason the normal approximation is used for large counts.
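To put a number on "noisy on a small test set," the variance of the count can be turned into a typical spread in measured accuracy. This is a back-of-envelope sketch under stated assumptions: the true accuracy of 0.8 and the test-set sizes are hypothetical, the function name `accuracy_spread` is made up for illustration, and the spread used is the standard deviation (the square root of the variance), divided by the number of examples to convert from a count to a rate.

```python
from math import sqrt

def accuracy_spread(n: int, p: float) -> float:
    """Standard deviation of measured accuracy (correct / n) when the
    number correct is binomial with n trials and success probability p."""
    sd_of_count = sqrt(n * p * (1 - p))  # square root of the variance
    return sd_of_count / n               # rescale from count to rate

true_accuracy = 0.8  # hypothetical value
for n in (50, 500, 5000):
    spread = accuracy_spread(n, true_accuracy)
    print(f"n={n:5d}  typical spread in accuracy: +/- {spread:.3f}")
```

With 50 examples the measured accuracy typically wanders by several percentage points around the true value; the spread shrinks as the test set grows.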

When you read “the model got 47 of 50 right” or “3% of visitors converted,” you are reading a binomial count, and this lesson is how to reason about the chances behind it.

Common mistakes

  • Using the binomial when the conditions fail. If the success probability drifts between trials or the trials are not independent, the simple binomial does not apply. Check the four conditions first.
  • Confusing “exactly this many” with “at least this many.” The formula gives an exact count; “at least” requires a sum or a complement. Mixing them up is a common error.
  • Dropping the combinations factor. The probability of one specific arrangement is not the answer; you must multiply by the number of arrangements, or you will badly undercount.
  • Treating trials times the success probability as guaranteed. The expected count is a long-run average, not a promise: 5 predictions at 80% gives an expected 4 correct, but any single run of 5 might yield 3 or 5.

Key takeaways

  • The binomial distribution models the number of successes across a fixed number of independent yes-or-no trials with a constant probability of success; all four conditions (a fixed number of trials, two outcomes, a constant success probability, and independence) must hold.
  • The probability of exactly a given number of successes is the combinations count times the success probability raised to the number of successes times the failure probability raised to the number of failures: the chance of one arrangement times the number of arrangements. Three coins give a 2-heads probability of 3/8; an 80%-accurate model gives a 4-of-5-correct probability of about 0.41.
  • The expected number of successes has a shortcut: the number of trials times the probability of success (5 predictions at 80% gives 4 expected).
  • “Exactly this many” is not “at least this many”; the latter needs a sum or the complement (one minus the probability of none, for “at least one”).
  • In AI the binomial underlies accuracy as a count of correct predictions, conversion and click rates, and, for a large number of trials, the normal approximation that leads into the next phase.