Summary: When one event tells you about another: conditional probability and independence

Conditional probability is the chance of one thing given that another happened, and it is the engine under classifiers, language models, and the Bayes lesson coming next. The previous lesson’s multiplication rule needed independence; this lesson handles the dependent events that matter most in AI, where knowing one thing changes the odds of another. Its single most important warning: the chance of A given B is not the chance of B given A. This summary is the scan-in-five-minutes version of the full lesson.

Core ideas

What “given” means. P(A | B) is the probability of A given that B happened, equal to P(A and B) / P(B). Learning B shrinks the world to the B-cases and asks what fraction of those are also A. The denominator is the new, smaller world.
Read it off a two-way table. Restrict to the row or column of the given event, then divide. A screening table with 100 cases among 1,000 people gave P(positive | condition) = 0.80 but P(condition | positive) = 0.47.
P(A | B) is not P(B | A). This is the costliest error in the subject (base-rate neglect, the prosecutor’s fallacy). “Most sick people test positive” is not “most positives are sick.” Flipping the bar changes the denominator, and with it the answer; the next lesson, on Bayes’ theorem, converts one direction into the other correctly.
The general multiplication rule. P(A and B) = P(B) x P(A | B), valid for any events. Two aces drawn without replacement: 4/52 x 3/51 = 1/221, because the second draw depends on the first.
Independence is a special case. A and B are independent exactly when P(A | B) = P(A), that is, when knowing B does not change the probability of A. Then the rule collapses to the simple P(A) x P(B) from the previous lesson.
It underlies most machine-learning prediction. A classifier estimates P(label | inputs); a language model computes P(next word | previous words) and samples from that distribution. Conditional probability is the form prediction usually takes.
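
The core ideas above can be checked with a few lines of Python. This is a minimal sketch, not the lesson's own worked table: the summary states only the totals (100 cases among 1,000 people) and the two results (0.80 and 0.47), so the four cell counts below are an assumption chosen to be consistent with those figures. The helper names `conditional` and `marginal` are likewise hypothetical.

```python
from fractions import Fraction

# ASSUMED cell counts: one two-way table consistent with the summary's
# figures (1,000 people, 100 with the condition, 0.80 and 0.47).
counts = {
    ("condition", "positive"): 80,
    ("condition", "negative"): 20,
    ("no condition", "positive"): 90,
    ("no condition", "negative"): 810,
}

def marginal(counts, event):
    """P(event): cells containing the event, divided by everyone."""
    total = sum(counts.values())
    hits = sum(n for cell, n in counts.items() if event in cell)
    return Fraction(hits, total)

def conditional(counts, target, given):
    """P(target | given): shrink the world to `given`, then divide."""
    world = sum(n for cell, n in counts.items() if given in cell)
    both = sum(n for cell, n in counts.items()
               if given in cell and target in cell)
    return Fraction(both, world)

# Same numerator (80), different denominators (100 vs 170).
p_pos_given_cond = conditional(counts, "positive", "condition")
p_cond_given_pos = conditional(counts, "condition", "positive")
print(float(p_pos_given_cond))            # 0.8
print(round(float(p_cond_given_pos), 2))  # 0.47

# Independence check: does knowing "positive" change P(condition)?
p_cond = marginal(counts, "condition")
independent = (p_cond_given_pos == p_cond)
print(independent)                        # False

# General multiplication rule: two aces drawn without replacement.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)
print(p_two_aces)                         # 1/221
```

Using `Fraction` keeps the arithmetic exact, so the independence test compares true ratios rather than rounded floats. In this assumed table P(condition) is 1/10 while P(condition | positive) is 8/17, so the two events are dependent, which is why the general rule rather than the simple product applies.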

What changes for you

You gain the most useful single tool in probability and the reflex that protects you from its most common misuse. The tool: when something is already known, you condition on it, narrowing the world and recomputing. The reflex: whenever you meet a conditional claim (“X% of A are B,” “the chance of this given that”), you check which way the bar points before acting, because the flipped version is a different and often wildly different number. In an AI setting that reflex is what stops you from reading a model’s P(positive given sick) as P(sick given positive), the exact confusion the next lesson, Bayes’ theorem, exists to resolve.