Updating beliefs with evidence: Bayes' theorem

The previous lesson left a cliffhanger: the chance of A given B is not the chance of B given A, and we needed a way to convert one into the other. That way is Bayes’ theorem, and it is one of the most important ideas in this whole track. Bayes is the mathematics of changing your mind correctly: you start with a belief, evidence arrives, and Bayes tells you exactly how much to update. It is the formal engine under lesson 1’s base-rate example, under spam filters, and under any system that combines what it already knew with what it just observed.

The idea before the formula

Strip Bayes down and it is a sentence: a new belief equals your old belief, adjusted by how well the evidence fits. You begin with a prior (how likely the hypothesis was before the evidence), you observe something, and you end with a posterior (how likely the hypothesis is now). The adjustment depends on how strongly the evidence points at the hypothesis versus the alternatives.

The trap Bayes protects you from is forgetting the prior. A test that is excellent at detecting a rare disease still has to fight the rarity: if almost nobody has the disease, even a strong positive lands mostly on healthy people. Bayes is what keeps the base rate in the calculation instead of letting a scary-sounding result run away with your belief.

Build it from natural frequencies

The most intuitive way to do Bayes needs no formula, just counting a concrete population, which is exactly the two-way table from the previous lesson. Return to lesson 1’s example and make it concrete with 10,000 people:

Disease affects 1 in 100.  Test is 99% accurate both ways: it flags 99% of sick people and clears 99% of healthy people.

                  Test positive    Test negative    Total
  Has disease          99               1             100
  Healthy              99            9,801          9,900
  Total               198            9,802         10,000

(Of 100 sick people, 99% = 99 test positive. Of 9,900 healthy people, 1% = 99 test positive.)
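The table can also be generated rather than filled in by hand. Below is a minimal sketch in Python. The function name natural_frequency_table and its parameters are illustrative choices. It assumes, as the example does, that "99% accurate both ways" means 99% of sick people and 99% of healthy people get the correct result.

```python
def natural_frequency_table(population, base_rate, sensitivity, specificity):
    """Return expected counts for each cell of the disease-by-test table.

    Hypothetical helper for illustration: sensitivity is the share of sick
    people who test positive, specificity the share of healthy people who
    test negative.
    """
    sick = round(population * base_rate)
    healthy = population - sick
    true_pos = round(sick * sensitivity)             # sick people the test catches
    false_neg = sick - true_pos                      # sick people the test misses
    false_pos = round(healthy * (1 - specificity))   # healthy people flagged anyway
    true_neg = healthy - false_pos                   # healthy people correctly cleared
    return {
        "sick_pos": true_pos, "sick_neg": false_neg,
        "healthy_pos": false_pos, "healthy_neg": true_neg,
    }

table = natural_frequency_table(10_000, 0.01, 0.99, 0.99)
print(table)
print("total positives:", table["sick_pos"] + table["healthy_pos"])
```

Changing base_rate and rerunning shows how the healthy group's false positives grow relative to the true positives as the disease gets rarer.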

You tested positive, so restrict to the “test positive” column: 198 people, of whom 99 are actually sick.

P(disease | positive) = 99 / 198 = 0.50

Fifty percent, exactly the answer from lesson 1, now derived by Bayesian counting. The 99 true positives are matched one-for-one by 99 false positives, because the healthy group is so much larger that even its tiny 1% error rate produces as many positives as the sick group does. That is the base rate doing its work, and counting natural frequencies makes it obvious.
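Restricting to the positive column is a single division. This sketch uses a hypothetical function name and Python's standard fractions module so the counts stay exact. It reproduces the 50%. As an illustrative variation that is not part of the lesson, it also runs a rarer disease (1 in 1,000) with the same test:

```python
from fractions import Fraction

def posterior_by_counting(population, base_rate, sensitivity, specificity):
    """P(disease | positive), found by counting people in the 'test positive' column."""
    sick = population * base_rate
    healthy = population - sick
    true_pos = sick * sensitivity             # positives that come from the sick
    false_pos = healthy * (1 - specificity)   # positives that come from the healthy
    return true_pos / (true_pos + false_pos)

accuracy = Fraction(99, 100)
print(posterior_by_counting(10_000, Fraction(1, 100), accuracy, accuracy))   # 1/2
print(posterior_by_counting(10_000, Fraction(1, 1000), accuracy, accuracy))  # 11/122
```

At 1 in 1,000 the same 99% test gives only about a 9% chance (11/122) that a positive is real, because the base rate does even more of the work.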

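The same count can be drawn as a tree instead of a table. Pour 10,000 people through it: the first branch splits them by disease status (100 sick, 9,900 healthy), and the second splits each group by test result (99 of the sick and 99 of the healthy test positive). That gives 198 positive tests, half from each branch, so a positive test alone is only a 50-50 signal. The answer matches the table; the tree view emphasizes that each outcome's probability is the product of the branch probabilities along its path (for example, 0.01 x 0.99 for "sick and positive").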
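A tree is easy to sketch in code because each leaf is just a product along its path. In this minimal version the names are illustrative. tree_leaves multiplies the branch probabilities, and scaling each leaf by 10,000 recovers the counts from the table:

```python
from fractions import Fraction

def tree_leaves(prior, sensitivity, false_pos_rate):
    """Multiply probabilities along each branch of the disease -> test tree."""
    return {
        ("sick", "positive"): prior * sensitivity,
        ("sick", "negative"): prior * (1 - sensitivity),
        ("healthy", "positive"): (1 - prior) * false_pos_rate,
        ("healthy", "negative"): (1 - prior) * (1 - false_pos_rate),
    }

leaves = tree_leaves(Fraction(1, 100), Fraction(99, 100), Fraction(1, 100))
total_positive = sum(p for (status, result), p in leaves.items() if result == "positive")

for (status, result), prob in leaves.items():
    print(status, result, prob * 10_000)   # expected count out of 10,000 people

print(leaves[("sick", "positive")] / total_positive)  # 1/2
```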

The formula

The same logic written symbolically is Bayes’ theorem. With H for the hypothesis (has the disease) and E for the evidence (tested positive):

                P(E | H) x P(H)
  P(H | E) = ---------------------
                     P(E)

  P(H)      the prior        (the base rate: how likely H was beforehand)
  P(E | H)  the likelihood   (how well the evidence fits H: the test's hit rate)
  P(E)      the evidence     (total chance of seeing E, from all sources)
  P(H | E)  the posterior    (the updated belief you want)

The denominator P(E) is the total probability of the evidence, adding the true positives and the false positives:

  P(E) = P(E | H) x P(H)  +  P(E | not H) x P(not H)
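Written as code, the theorem and its denominator are two short functions. This is a sketch with hypothetical names (evidence_prob, bayes_posterior). It assumes a yes/no hypothesis, so "not H" is the only alternative:

```python
def evidence_prob(likelihood, prior, false_pos_rate):
    """P(E) = P(E | H) x P(H) + P(E | not H) x P(not H), for a yes/no hypothesis."""
    return likelihood * prior + false_pos_rate * (1 - prior)

def bayes_posterior(prior, likelihood, false_pos_rate):
    """P(H | E) = P(E | H) x P(H) / P(E)."""
    return likelihood * prior / evidence_prob(likelihood, prior, false_pos_rate)

print(evidence_prob(0.99, 0.01, 0.01))    # total chance of a positive test
print(bayes_posterior(0.01, 0.99, 0.01))  # 0.5
```

Keeping P(E) in its own function makes it harder to forget the false-positive term.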

Plug in the disease numbers: prior P(H) = 0.01, likelihood P(E | H) = 0.99, and false-positive rate P(E | not H) = 0.01, which applies to the healthy share P(not H) = 0.99.

  P(E) = (0.99 x 0.01) + (0.01 x 0.99) = 0.0099 + 0.0099 = 0.0198
  P(H | E) = (0.99 x 0.01) / 0.0198 = 0.0099 / 0.0198 = 0.50

Same 50%. The formula and the counting agree, because they are the same idea. Use whichever is clearer for the problem in front of you; natural frequencies are usually easier to reason about, and the formula is easier to compute with once the pieces are named.
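One way to see that they are the same idea is to count over several population sizes and compare each result with the formula. In this sketch the function names are illustrative and exact fractions are used. The population size cancels out of the counting ratio, so 10,000 was only a convenient choice:

```python
from fractions import Fraction

def by_counting(population, prior, hit_rate, false_pos_rate):
    """Posterior from natural frequencies: true positives / all positives."""
    sick = population * prior
    healthy = population - sick
    true_pos = sick * hit_rate
    false_pos = healthy * false_pos_rate
    return true_pos / (true_pos + false_pos)

def by_formula(prior, hit_rate, false_pos_rate):
    """Posterior from Bayes' theorem with P(E) from total probability."""
    p_e = hit_rate * prior + false_pos_rate * (1 - prior)
    return hit_rate * prior / p_e

hit, fpr = Fraction(99, 100), Fraction(1, 100)
for prior in (Fraction(1, 100), Fraction(1, 1000), Fraction(1, 2)):
    for population in (10_000, 1_000_000):
        # Exact equality: the two routes are algebraically identical.
        assert by_counting(population, prior, hit, fpr) == by_formula(prior, hit, fpr)
    print(prior, "->", by_formula(prior, hit, fpr))
```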

Updating again: today’s posterior is tomorrow’s prior

Bayes really earns its keep when evidence keeps arriving. The posterior after the first piece of evidence becomes the prior for the next. Suppose, worried by the first positive, you take a second test whose errors are independent of the first (given whether you actually have the disease), and it is also positive. Now your prior is no longer 0.01; it is the 0.50 you just computed.

  New prior P(H) = 0.50
  P(E) = (0.99 x 0.50) + (0.01 x 0.50) = 0.495 + 0.005 = 0.50
  P(H | E) = (0.99 x 0.50) / 0.50 = 0.495 / 0.50 = 0.99

Two independent positive tests take you from a 1% base rate to a 50% belief to a 99% belief. Each piece of evidence updates the last, and the same machine handles all of it. This is exactly how a system accumulates confidence as data comes in: start with what you knew, multiply in each new observation, renormalize.
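The loop "start with what you knew, multiply in each observation, renormalize" takes only a few lines. This sketch assumes every test has the lesson's 99% hit rate and 1% false-positive rate. It also assumes the results are independent given disease status. The name update is illustrative:

```python
from fractions import Fraction

def update(prior, likelihood, false_pos_rate):
    """One Bayes step for a positive result: returns the posterior."""
    p_e = likelihood * prior + false_pos_rate * (1 - prior)
    return likelihood * prior / p_e

belief = Fraction(1, 100)          # base rate before any test
history = [belief]
for _ in range(2):                 # two positive tests, assumed conditionally independent
    belief = update(belief, Fraction(99, 100), Fraction(1, 100))  # posterior becomes next prior
    history.append(belief)

print([float(b) for b in history])  # [0.01, 0.5, 0.99]
```

A third positive under the same assumptions would push the belief to 9801/9802, about 99.99%.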

Why this matters when you use AI

Bayes is not just a probability exercise; it is a way of building and reading AI systems.

Spam filters and “naive Bayes.” A classic spam filter estimates the probability an email is spam by combining the prior rate of spam with the likelihoods of each word given spam-or-not, using Bayes. It is called “naive” because it assumes the words are independent given the class, the independence idea from the previous two lessons, applied (and knowingly oversimplified) to make the math tractable.
Combining a base rate with a signal. Any detector’s output has to be combined with how common the thing is to get the real probability. A model that flags fraud with a high hit rate still produces mostly false alarms when fraud is rare, and Bayes is the calculation that tells you the actual chance a flag is real. Reading a model output without its prior is base-rate neglect.
Updating with new data. The “posterior becomes the next prior” pattern is the heart of Bayesian approaches to learning: beliefs are distributions that sharpen as evidence accumulates, rather than fixed guesses. Even outside formally Bayesian models, the mindset (hold a prior, update it in proportion to the evidence) is how a careful practitioner reasons about uncertain results.
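A toy sketch makes the naive Bayes idea concrete. Every number in it (the 40% spam prior and the per-word likelihoods) is made up for illustration and was not estimated from real email, and spam_probability is a hypothetical name. Real filters typically learn these likelihoods from labeled messages and smooth them; this sketch simply ignores words it has no numbers for:

```python
import math

# Hypothetical, hand-picked numbers for illustration only (not from data).
PRIOR_SPAM = 0.4
WORD_GIVEN_SPAM = {"free": 0.30, "winner": 0.20, "meeting": 0.02}
WORD_GIVEN_HAM = {"free": 0.03, "winner": 0.01, "meeting": 0.15}

def spam_probability(words):
    """Naive Bayes: multiply per-word likelihoods as if words were independent given the class."""
    # Sum logs instead of multiplying probabilities so long emails do not underflow.
    log_spam = math.log(PRIOR_SPAM)
    log_ham = math.log(1 - PRIOR_SPAM)
    for w in words:
        if w in WORD_GIVEN_SPAM:   # skip words the toy model has no numbers for
            log_spam += math.log(WORD_GIVEN_SPAM[w])
            log_ham += math.log(WORD_GIVEN_HAM[w])
    # Renormalize: P(spam | words) = spam score / (spam score + ham score).
    m = max(log_spam, log_ham)
    spam, ham = math.exp(log_spam - m), math.exp(log_ham - m)
    return spam / (spam + ham)

print(round(spam_probability(["free", "winner"]), 3))
print(round(spam_probability(["meeting"]), 3))
```

With no recognized words the function returns the prior, which is the "start with what you knew" step in action.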

The single most valuable habit Bayes gives you: when a result arrives, ask “what did I believe before, and how strong is this evidence really?” rather than letting the result alone set your belief.

Common pitfalls

Ignoring the prior (base-rate neglect). The headline error, and the reason a 99%-accurate test can be 50% right. The prior is part of the formula; skipping it overstates the posterior, often massively.
Confusing the likelihood with the posterior. P(evidence given hypothesis) is not P(hypothesis given evidence). Bayes exists precisely to convert the first into the second; treating them as equal is the previous lesson’s flipped-bar error.
Forgetting the false positives in P(E). The denominator must include evidence from the alternatives (the healthy people who test positive). Leaving them out inflates the posterior.
Expecting certainty from one piece of evidence. A single positive rarely settles the question when the base rate is low; it takes accumulating evidence (the second test) to reach high confidence.
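Three of these pitfalls are each a specific wrong version of the formula, and with the disease numbers the differences are easy to see. A sketch with illustrative variable names:

```python
prior, hit_rate, false_pos_rate = 0.01, 0.99, 0.01

# Correct: P(E) includes both true and false positives.
correct = hit_rate * prior / (hit_rate * prior + false_pos_rate * (1 - prior))

# Pitfall: dropping the false positives from P(E) makes the posterior 1.
missing_false_pos = hit_rate * prior / (hit_rate * prior)

# Pitfall: reading the likelihood P(E | H) as if it were P(H | E).
likelihood_as_posterior = hit_rate

# Pitfall: ignoring the prior, which amounts to silently assuming a 50-50 prior.
ignored_prior = hit_rate / (hit_rate + false_pos_rate)

for name, value in [
    ("correct", correct),
    ("missing false positives", missing_false_pos),
    ("likelihood as posterior", likelihood_as_posterior),
    ("ignored prior", ignored_prior),
]:
    print(f"{name:>24}: {value:.2f}")
```

Every mistaken version overstates the posterior, which matches the direction of the errors described above.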

What you should remember

Bayes’ theorem converts P(evidence given hypothesis) into P(hypothesis given evidence): posterior = (likelihood x prior) / evidence. It is how to update a belief correctly when new information arrives.
The four parts are the prior (the base rate beforehand), the likelihood (how well the evidence fits), the evidence (its total probability, including false positives), and the posterior (the updated belief).
Natural frequencies (counting a concrete population) give the same answer as the formula and are usually more intuitive; lesson 1’s 50% falls straight out of either.
Today’s posterior is tomorrow’s prior: evidence accumulates, and two independent positive tests can move a 1% base rate to 50% to 99%.
In AI, Bayes underlies spam filtering, the discipline of combining a detector’s output with the base rate, and the mindset of updating beliefs as data arrives; ignoring the prior is base-rate neglect.