Cheatsheet: When one event tells you about another: conditional probability and independence
The one idea
Conditional probability is the chance of A given that B happened. Its biggest trap: P(A given B) is not P(B given A). Different denominators, different answers.
The definition
P(A | B) = P(A and B) / P(B) (requires P(B) > 0)
"given B" = restrict to the world where B happened, then ask what fraction is also A.
The denominator P(B) is the new, smaller world.
Reading a two-way table
| | Test positive | Test negative | Total |
|---|---|---|---|
| Has condition | 80 | 20 | 100 |
| Healthy | 90 | 810 | 900 |
| Total | 170 | 830 | 1000 |
P(positive | condition) = 80/100 = 0.80 (restrict to the condition ROW)
P(condition | positive) = 80/170 ≈ 0.47 (restrict to the positive COLUMN)
Restrict to the given event’s row/column; divide the joint cell by that total.
The big warning: do not flip the bar
P(A | B) is NOT P(B | A).
"90% of sick people test positive" is NOT "90% of positives are sick."
Flipping the condition flips the denominator. (Base-rate neglect / prosecutor's fallacy.)
Bayes' theorem (next lesson) converts one direction into the other correctly.
Multiplication rule and independence
General (any events): P(A and B) = P(B) x P(A | B)
  Two aces, no replacement: 4/52 x 3/51 = 1/221.
Independence: A, B independent <=> P(A | B) = P(A)
  Then it collapses to the simple rule: P(A and B) = P(A) x P(B).
In machine learning
- Classifier: estimates P(label | inputs) (spam filter: P(spam | the words)).
- Language model: computes P(next word | previous words), then samples.
- Reading outputs: never read P(positive | sick) as P(sick | positive); the base rate separates them.
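To see how the base rate separates the two conditionals, here is a rough sketch that holds the test fixed and varies only how common the condition is. It uses nothing beyond the definition and the multiplication rule. The function name `p_sick_given_positive` is hypothetical, and the base rates 0.01 and 0.50 are illustrative assumptions; only 0.10 matches the two-way table.

```python
def p_sick_given_positive(base_rate, sensitivity, false_positive_rate):
    """P(sick | positive) = P(sick and positive) / P(positive)."""
    # Multiplication rule: P(sick and positive) = P(sick) x P(positive | sick)
    sick_and_positive = base_rate * sensitivity
    # Same rule for the healthy group: P(healthy) x P(positive | healthy)
    healthy_and_positive = (1 - base_rate) * false_positive_rate
    # P(positive) is the sum of the two ways to test positive.
    return sick_and_positive / (sick_and_positive + healthy_and_positive)


sensitivity = 80 / 100          # P(positive | sick), from the table
false_positive_rate = 90 / 900  # P(positive | healthy), from the table

# 0.10 is the table's base rate; 0.01 and 0.50 are hypothetical comparisons.
for base_rate in (0.01, 0.10, 0.50):
    value = p_sick_given_positive(base_rate, sensitivity, false_positive_rate)
    print(f"base rate {base_rate:.2f}: P(sick | positive) = {value:.3f}")
```

P(positive | sick) stays at 0.80 throughout, while P(sick | positive) climbs from under 0.1 to nearly 0.9 as the base rate rises.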
Pitfalls to dodge
- Swapping P(A | B) for P(B | A) (the error to fear most).
- Treating dependent events as independent (use the conditional factor).
- Reading a high conditional probability as causation (it only shows association).
- Losing track of which denominator (whole space vs the B-cases).
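The second pitfall can be checked with exact fractions on the two-aces example. This is only a sketch; the variable names are hypothetical.

```python
from fractions import Fraction

# Correct: the second factor is conditional on the first ace being gone.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_two_aces = p_first_ace * p_second_ace_given_first

# Pitfall: reusing 4/52 for the second draw treats the draws as independent,
# which is only right when the first card is put back.
p_two_aces_if_independent = p_first_ace * Fraction(4, 52)

print(p_two_aces, p_two_aces_if_independent)  # prints "1/221 1/169"
```

Dropping the conditional factor overstates the chance of two aces (1/169 instead of 1/221).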
Words to use precisely
- Conditional probability: P(A | B), the chance of A given B happened.
- General multiplication rule: P(A and B) = P(B) x P(A | B), for any events.
- Independent: P(A | B) = P(A); knowing B does not change A.
- Base-rate neglect: ignoring how common A is overall, P(A), when turning P(B | A) into P(A | B).
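As a quick check of the "Independent" definition against the two-way table, the sketch below compares P(condition | positive) with P(condition), and P(condition and positive) with the product of the marginals. Variable names are hypothetical; the counts are the table's.

```python
from fractions import Fraction

total = 1000
p_condition = Fraction(100, total)  # row total / grand total
p_positive = Fraction(170, total)   # column total / grand total
p_both = Fraction(80, total)        # joint cell / grand total

# Definition of conditional probability: P(A | B) = P(A and B) / P(B)
p_condition_given_positive = p_both / p_positive

# Independence would require P(condition | positive) == P(condition),
# or equivalently P(condition and positive) == P(condition) x P(positive).
same_conditional = p_condition_given_positive == p_condition
same_product = p_both == p_condition * p_positive

print(p_condition_given_positive, p_condition, same_conditional, same_product)
# prints "8/17 1/10 False False"
```

Both checks fail, so the test result and the condition are dependent: knowing the test is positive raises the chance of the condition from 1/10 to 8/17.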