Cheatsheet: the four catastrophic risk categories
The four buckets with sub-mechanisms, analogies, and levers
| Bucket | Sub-mechanisms | Historical analogy | Intervention levers |
|---|---|---|---|
| Malicious use (Ch 1.2) | Bioweapons, disinformation, authoritarian control | (Implicit: any past tool weaponized at scale) | Access controls, content provenance, abuse detection, liability rules |
| AI race (Ch 1.3) | Corporate race, military AI race, natural selection on the AI population | Nuclear arms race, gain-of-function research | Deployment moratoria, compute caps, shared eval benchmarks, international coordination, liability rules |
| Organizational risks (Ch 1.4) | Diffused responsibility, absent safety culture, inadequate response infrastructure | Challenger (1986), Chernobyl (1986) | Safety culture practices, post-mortems, monitoring ownership, HRO-style discipline, third-party audit |
| Rogue AIs (Ch 1.5) | Specification gaming / control drift, instrumental power-seeking, goal drift via intrinsification | (Foreshadowed via Ch 5 complex systems, current deployed-model incidents) | Alignment research, interpretability, oversight, careful objective design, deployment monitoring |
The classify-and-defend three-step protocol
Given a real headline about an AI harm:
- Name the bucket. One of the four. If the headline straddles two, name both and pick the dominant one, with reasoning.
- Name the sub-mechanism. From the bucket’s column above. “AI race” is incomplete; “AI race, corporate sub-mechanism, evaluation-window compression under competitor pressure” is complete.
- Name a lever. From the bucket’s intervention column. “Regulation” is incomplete; “a regulatory minimum evaluation window decoupling first-to-market from first-to-pass-safety” is complete.
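The three steps can be sketched as a small consistency check. This is a minimal illustration, not part of the course material: the `BUCKETS` dictionary transcribes the table above, while the names `Classification` and `check` are hypothetical. It only verifies that the named sub-mechanism and lever come from the chosen bucket's columns; it cannot judge whether the answer is specific enough to count as "complete".

```python
from dataclasses import dataclass

# Sub-mechanisms and levers transcribed from the bucket table above.
BUCKETS = {
    "Malicious use": {
        "sub_mechanisms": {"bioweapons", "disinformation", "authoritarian control"},
        "levers": {"access controls", "content provenance", "abuse detection",
                   "liability rules"},
    },
    "AI race": {
        "sub_mechanisms": {"corporate race", "military AI race",
                           "natural selection on the AI population"},
        "levers": {"deployment moratoria", "compute caps", "shared eval benchmarks",
                   "international coordination", "liability rules"},
    },
    "Organizational risks": {
        "sub_mechanisms": {"diffused responsibility", "absent safety culture",
                           "inadequate response infrastructure"},
        "levers": {"safety culture practices", "post-mortems", "monitoring ownership",
                   "HRO-style discipline", "third-party audit"},
    },
    "Rogue AIs": {
        "sub_mechanisms": {"specification gaming / control drift",
                           "instrumental power-seeking",
                           "goal drift via intrinsification"},
        "levers": {"alignment research", "interpretability", "oversight",
                   "careful objective design", "deployment monitoring"},
    },
}


@dataclass
class Classification:
    """One answer to the protocol (hypothetical container, for illustration)."""
    bucket: str
    sub_mechanism: str
    lever: str


def check(c: Classification) -> list[str]:
    """Return a list of problems; an empty list means the steps are consistent."""
    entry = BUCKETS.get(c.bucket)
    if entry is None:
        return [f"unknown bucket: {c.bucket!r}"]
    problems = []
    if c.sub_mechanism not in entry["sub_mechanisms"]:
        problems.append(f"{c.sub_mechanism!r} is not a sub-mechanism of {c.bucket}")
    if c.lever not in entry["levers"]:
        problems.append(f"{c.lever!r} is not a lever listed for {c.bucket}")
    return problems


# A consistent answer: bucket, sub-mechanism, and lever all come from the same row.
consistent = Classification("AI race", "corporate race", "shared eval benchmarks")

# An inconsistent answer: the lever is borrowed from the rogue-AI row.
mismatched = Classification("Malicious use", "disinformation", "interpretability")
```

Calling `check(consistent)` returns an empty list, while `check(mismatched)` reports that the lever is not listed for the chosen bucket. A mismatch of this kind is a prompt to revisit either the bucket or the lever before defending the answer.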
Cross-bucket distinctness rule
A lever for one bucket usually does not help another:
- A liability rule for misuse does NOT slow an AI race.
- A safety-culture overhaul inside one organization does NOT change competitor pressure on others.
- An interpretability breakthrough does NOT stop a state actor with bad intent.
- A compute cap does NOT catch a deployed model drifting from its training distribution.
Categorical distinctness is the prerequisite for reasoning about how the buckets connect (Ch 1.6, previewed in L2’s close).
Quick disambiguation cheatsheet
| When the question is | The bucket is usually |
|---|---|
| “Who intended the harm?” → someone | Malicious use |
| “Why did the safety step get skipped?” → external timeline pressure | AI race |
| “Why was no one watching?” → internal organizational gap | Organizational risk |
| “Why didn’t patches fix it?” → the system itself is optimizing in an unintended direction | Rogue AI |
Use the disambiguation table as a starter; the classify-and-defend protocol is the rigorous form. The starter helps when a headline is ambiguous; the protocol is what makes the answer defensible.
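As a rough sketch of how the starter table can be used as a first-pass lookup, the snippet below encodes the four diagnostic questions and returns every bucket whose question applies. The names `STARTER_QUESTIONS` and `starter_buckets` are hypothetical, and the yes/no answers are assumed to come from a human reading the headline; nothing here automates that judgment.

```python
# Each row of the disambiguation table: (question, typical answer, bucket).
STARTER_QUESTIONS = [
    ("Who intended the harm?", "someone", "Malicious use"),
    ("Why did the safety step get skipped?", "external timeline pressure", "AI race"),
    ("Why was no one watching?", "internal organizational gap", "Organizational risk"),
    ("Why didn't patches fix it?",
     "the system itself is optimizing in an unintended direction", "Rogue AI"),
]


def starter_buckets(answers: dict[str, bool]) -> list[str]:
    """Return, in table order, each bucket whose question was marked as applying.

    `answers` maps a question string to True when the reader judges that the
    table's typical answer fits the headline. Unanswered questions count as False.
    """
    return [
        bucket
        for question, _typical_answer, bucket in STARTER_QUESTIONS
        if answers.get(question, False)
    ]


# A reader judges that two questions apply to the same headline.
candidates = starter_buckets({
    "Why did the safety step get skipped?": True,
    "Why was no one watching?": True,
})
```

Here `candidates` holds two buckets, which is exactly the ambiguous case: the starter narrows the field, and the first step of the classify-and-defend protocol then requires picking the dominant bucket with reasoning.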
What this lesson builds toward
- L3 (monitoring + robustness, Ch 3.2-3.3): the deployment-time failure modes the rogue-AI bucket points at, with the specific distinction between robustness failure (system breaks) and monitoring failure (operators do not notice).
- L4 (alignment, Ch 3.4): specification gaming, proxy gaming, and deceptive alignment: three failure modes that sit mostly inside the rogue-AI bucket, worked at full depth.
- L5 (safety engineering, Ch 4): the cross-disciplinary toolkit that addresses the organizational-risk bucket directly (nines of reliability, defense in depth, fault tree analysis, normal-accident theory).
- L6 (complex systems, Ch 5): why correct components compose into incorrect systems, the formal cousin of organizational risk.
- L9 (governance, Ch 8): the AI-race-bucket interventions at a national and international scale (compute governance, treaties, regulatory frameworks).