Practice: When one event tells you about another: conditional probability and independence

Two skills: computing a conditional probability off a table (restrict, then divide), and the reflex that prevents the costliest error in the subject: never swap the chance of A given B for the chance of B given A. Keep a scratchpad.

Self-check

Six short questions. Answer each in your head before reading the answer.

1. What does P(A given B) mean, and what is its formula?

Show answer

It is the probability of A given that B has happened: P(A and B) divided by P(B). Learning B shrinks the world to just the cases where B is true, and you ask what fraction of those also have A. The denominator is the new, smaller world.

2. How do you read a conditional probability off a two-way table?

Show answer

Restrict to the row or column for the condition, then divide. For P(A given B), use the total of the B row (or column) as the denominator and the A-and-B cell as the numerator. The given event picks the row or column; you divide within it.

3. Why is P(A given B) generally not equal to P(B given A)?

Show answer

Because the two have different denominators: P(A given B) divides by the B-cases, P(B given A) divides by the A-cases. Flipping the bar flips the denominator, so the numbers usually differ. For example, a test can catch 80% of cases, yet a positive result may mean only a 47% chance of the condition. Swapping the two is base-rate neglect, also known as the prosecutor’s fallacy.

4. State the general multiplication rule, and how independence simplifies it.

Show answer

P(A and B) = P(B) times P(A given B), valid for any two events. If A and B are independent, P(A given B) = P(A), and the rule collapses to P(A) times P(B), the simple rule from the previous lesson. Independence is the special case where the conditional equals the unconditional.
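As a quick check of the rule, here is a minimal sketch using exact fractions on a standard example: drawing two aces from a 52-card deck. The variable names are illustrative only.

```python
from fractions import Fraction

# Drawing two cards WITHOUT replacement from a standard 52-card deck.
p_first_ace = Fraction(4, 52)               # P(B): first card is an ace
p_second_ace_given_first = Fraction(3, 51)  # P(A | B): 3 aces left among 51 cards

# General multiplication rule: P(A and B) = P(B) * P(A | B)
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)  # 1/221

# If the first card is put back, the draws are independent: the conditional
# equals the unconditional and the rule collapses to P(A) * P(B).
p_both_aces_with_replacement = Fraction(4, 52) * Fraction(4, 52)
print(p_both_aces_with_replacement)  # 1/169
```

The only difference between the two results is the second factor: 3/51 when the first draw changes the deck, 4/52 when it does not.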

5. How is independence defined in terms of conditional probability?

Show answer

A and B are independent when P(A given B) = P(A): knowing B happened does not change the probability of A. If the conditional differs from the unconditional, the events are dependent.
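One way to test the definition is to enumerate a small sample space and compare the conditional with the unconditional. The sketch below does this for two fair dice; the dice events and helper names (`prob`, `cond_prob`) are chosen here for illustration and are not part of the lesson.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event): fraction of all outcomes where the event is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(event | given): restrict to the 'given' outcomes, then divide."""
    restricted = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))

def first_is_3(o):
    return o[0] == 3

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_8(o):
    return o[0] + o[1] == 8

# Independent: the conditional equals the unconditional.
print(cond_prob(sum_is_7, first_is_3), prob(sum_is_7))  # 1/6 1/6

# Dependent: the conditional differs from the unconditional.
print(cond_prob(sum_is_8, first_is_3), prob(sum_is_8))  # 1/6 5/36
```

Learning that the first die shows 3 leaves the chance of a total of 7 unchanged, but it raises the chance of a total of 8 from 5/36 to 1/6.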

6. Where does conditional probability appear in a language model?

Show answer

A language model generates text by computing P(next word given all the words so far) and sampling from it. The “given the words so far” is exactly what makes the words dependent rather than independent.
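The idea can be sketched with a toy model. This is a heavy simplification and not how a real language model works internally: it conditions only on the single previous word (not all the words so far), and it estimates probabilities by counting word pairs in a tiny made-up corpus. All names and the corpus are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each previous word.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def next_word_distribution(prev):
    """Estimate P(next word | previous word) from the pair counts."""
    counts = next_counts[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def sample_next(prev, rng):
    """Sample one next word from the conditional distribution."""
    dist = next_word_distribution(prev)
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]

print(next_word_distribution("the"))
# {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}

rng = random.Random(0)  # seeded so repeated runs give the same sample
print(sample_next("the", rng))
```

The distribution changes with the conditioning word: after "the" the toy model favors "cat", while after "cat" it can only produce "sat" or "ate". That dependence on context is the point of the answer above.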

Try it yourself: read the two-way table

A spam filter is studied on 200 emails. Here is how spam status and the presence of the word “free” cross-classify:

                 contains "free"   no "free"    total
  spam                 40              10          50
  not spam             20             130         150
  total                60             140         200

Compute each, then check:

1. P(contains "free" | spam)
2. P(spam | contains "free")
3. P(spam)  (the unconditional base rate)
4. Are "spam" and "contains free" independent?

Show answer

1. Restrict to the spam row (total 50):
   P("free" | spam) = 40 / 50 = 0.80

2. Restrict to the "contains free" column (total 60):
   P(spam | "free") = 40 / 60 = 0.667 (about 0.67)

3. P(spam) = 50 / 200 = 0.25

4. Independent would mean P(spam | "free") = P(spam).
   But 0.67 is not 0.25, so they are NOT independent:
   seeing "free" raises the spam probability from 0.25 to 0.67.
   (That is exactly why the word is a useful spam signal.)

Notice items 1 and 2: P(“free” given spam) = 0.80 but P(spam given “free”) = 0.67. Same two events, different conditionals, because the denominators differ (50 spam emails vs 60 emails containing “free”). Never swap them.
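The same restrict-then-divide steps can be checked in a few lines. This is a minimal sketch that stores the four cells of the table above in a dictionary; the key labels and variable names are illustrative.

```python
# Counts from the two-way table above (200 emails).
counts = {
    ("spam", "free"): 40,
    ("spam", "no free"): 10,
    ("not spam", "free"): 20,
    ("not spam", "no free"): 130,
}
total = sum(counts.values())  # 200

# Row and column totals, plus the joint cell.
spam_total = sum(n for (status, _), n in counts.items() if status == "spam")  # 50
free_total = sum(n for (_, word), n in counts.items() if word == "free")      # 60
both = counts[("spam", "free")]                                               # 40

p_free_given_spam = both / spam_total  # restrict to the spam row
p_spam_given_free = both / free_total  # restrict to the "free" column
p_spam = spam_total / total            # unconditional base rate

print(round(p_free_given_spam, 3))  # 0.8
print(round(p_spam_given_free, 3))  # 0.667
print(p_spam)                       # 0.25

# Independence would require P(spam | "free") to equal P(spam).
print(abs(p_spam_given_free - p_spam) < 1e-9)  # False
```

The numerator is the same cell (40) in both conditionals; only the denominator changes.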

Try it yourself: spot the flipped conditional (and judge independence)

For A and B, name the error. For C and D, say whether the events are independent or dependent.

A. "90% of people with the flu have a fever, so if you have a fever there's
   a 90% chance you have the flu."
B. "Almost every fraudulent transaction trips at least one rule, so almost
   every transaction that trips a rule is fraud."
C. Drawing two cards from a deck one after another WITHOUT putting the
   first back.
D. Two unrelated users in different countries each loading the homepage.

Show answer

A: flipped conditional. It states P(fever given flu) = 0.90 and treats it as P(flu given fever). Those differ because many other illnesses also cause fever, so the pool of people with a fever is much larger than the pool with flu, and most people do not have flu to begin with. Base-rate neglect.
B: flipped conditional. P(trips a rule given fraud) being high is not P(fraud given trips a rule). Since legitimate transactions vastly outnumber fraud, many rule-trips are false alarms (the base-rate trap from lesson 1).
C: dependent. Removing the first card changes the deck for the second draw, so the second card’s probabilities depend on the first.
D: independent. Two unrelated users acting separately; one loading the page tells you nothing about the other.

The reflex: when a claim flips “X given Y” into “Y given X,” stop and ask about the base rate before believing it.
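To see how far a flipped conditional can drift, here is a sketch with made-up numbers for statement A. Only the 90% figure comes from the statement; the population size, the flu base rate (5%), and the fever rate among people without flu (10%) are assumptions chosen purely for illustration, not real data.

```python
# Hypothetical population; only the 0.90 figure comes from statement A.
population = 10_000
p_flu = 0.05                 # ASSUMED base rate of flu
p_fever_given_flu = 0.90     # from statement A
p_fever_given_no_flu = 0.10  # ASSUMED fever rate among people without flu

flu = population * p_flu                                      # about 500 people
fever_and_flu = flu * p_fever_given_flu                       # about 450 people
fever_and_no_flu = (population - flu) * p_fever_given_no_flu  # about 950 people

# Restrict to everyone with a fever, then divide.
p_flu_given_fever = fever_and_flu / (fever_and_flu + fever_and_no_flu)
print(round(p_flu_given_fever, 3))  # 0.321
```

Under these assumed rates, P(fever given flu) is 0.90 but P(flu given fever) is only about 0.32, because the fevers without flu outnumber the fevers with flu. Different assumed base rates would give a different gap, which is why the base rate is the first thing to ask about.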

Flashcards

Eight cards.

Q. What does P(A | B) mean, and what is its formula?

The probability of A given that B happened: P(A and B) / P(B). Learning B restricts the world to the B-cases, and you ask what fraction of those also have A.

Q. How do you read a conditional probability off a two-way table?

Restrict to the row or column of the given event, then divide the joint cell by that row/column total. The condition picks the denominator.

Q. Why is P(A | B) generally not P(B | A)?

Different denominators: P(A|B) divides by the B-cases, P(B|A) by the A-cases. Flipping the bar flips the denominator and changes the answer. Swapping them is base-rate neglect / the prosecutor’s fallacy.

Q. State the general multiplication rule for dependent events.

P(A and B) = P(B) x P(A | B). It works for any events; the conditional factor handles dependence. Two aces without replacement: 4/52 x 3/51 = 1/221.

Q. How is independence defined using conditional probability?

A and B are independent when P(A | B) = P(A): knowing B does not change A’s probability. Then the multiplication rule collapses to P(A) x P(B).

Q. What does a classifier compute, in terms of conditional probability?

The probability of a label given the inputs: a spam filter estimates P(spam | the words), a triage model estimates P(condition | the symptoms). Classification is conditional probability.

Q. How does a language model use conditional probability?

It computes P(next word | all previous words) and samples from it, repeatedly. The conditioning on previous words is what makes the words dependent.

Q. Does a high P(A | B) mean B causes A?

No. It means B is informative about A (they are associated). Conditioning captures association, not causation; it is the same caution as "correlation is not causation."