Four catastrophic AI risks: cheatsheet

The four buckets with sub-mechanisms, analogies, and levers

Bucket	Sub-mechanisms	Historical analogy	Intervention levers
Malicious use (Ch 1.2)	Bioweapons, disinformation, authoritarian control	(Implicit: any past tool weaponized at scale)	Access controls, content provenance, abuse detection, liability rules
AI race (Ch 1.3)	Corporate race, military AI race, natural selection on the AI population	Nuclear arms race, gain-of-function research	Deployment moratoria, compute caps, shared eval benchmarks, international coordination, liability rules
Organizational risks (Ch 1.4)	Diffused responsibility, absent safety culture, inadequate response infrastructure	Challenger (1986), Chernobyl (1986)	Safety culture practices, post-mortems, monitoring ownership, HRO-style discipline, third-party audit
Rogue AIs (Ch 1.5)	Specification gaming / control drift, instrumental power-seeking, goal drift via intrinsification	(Foreshadowed via Ch 5 complex systems, current deployed-model incidents)	Alignment research, interpretability, oversight, careful objective design, deployment monitoring

The classify-and-defend three-step protocol

Given a real headline about an AI harm:

Name the bucket. One of the four. If the headline sits across two, name the cross and pick the dominant one with reasoning.
Name the sub-mechanism. From the bucket’s column above. “AI race” is incomplete; “AI race, corporate sub-mechanism, evaluation-window compression under competitor pressure” is complete.
Name a lever. From the bucket’s intervention column. “Regulation” is incomplete; “a regulatory minimum evaluation window decoupling first-to-market from first-to-pass-safety” is complete.
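
The three steps can be sketched as a small completeness check. This is a minimal illustration, not part of the course material: the names `TAXONOMY`, `Classification`, and `missing_steps` are hypothetical, the entries are transcribed (lowercased) from the bucket table above, and "complete" is approximated as "names a column entry and adds a specific detail".

```python
from dataclasses import dataclass

# Sub-mechanisms and levers transcribed (lowercased) from the bucket table.
TAXONOMY = {
    "Malicious use": {
        "sub_mechanisms": {"bioweapons", "disinformation", "authoritarian control"},
        "levers": {"access controls", "content provenance", "abuse detection",
                   "liability rules"},
    },
    "AI race": {
        "sub_mechanisms": {"corporate race", "military ai race",
                           "natural selection on the ai population"},
        "levers": {"deployment moratoria", "compute caps", "shared eval benchmarks",
                   "international coordination", "liability rules"},
    },
    "Organizational risks": {
        "sub_mechanisms": {"diffused responsibility", "absent safety culture",
                           "inadequate response infrastructure"},
        "levers": {"safety culture practices", "post-mortems", "monitoring ownership",
                   "hro-style discipline", "third-party audit"},
    },
    "Rogue AIs": {
        "sub_mechanisms": {"specification gaming / control drift",
                           "instrumental power-seeking",
                           "goal drift via intrinsification"},
        "levers": {"alignment research", "interpretability", "oversight",
                   "careful objective design", "deployment monitoring"},
    },
}


@dataclass
class Classification:
    bucket: str
    sub_mechanism: str           # an entry from the bucket's sub-mechanism column
    lever: str                   # an entry from the bucket's lever column
    mechanism_detail: str = ""   # the specifics that make step 2 complete
    lever_detail: str = ""       # the specifics that make step 3 complete


def missing_steps(c: Classification) -> list:
    """Return the protocol steps this classification leaves incomplete."""
    entry = TAXONOMY.get(c.bucket)
    if entry is None:
        return ["bucket"]
    missing = []
    if (c.sub_mechanism.lower() not in entry["sub_mechanisms"]
            or not c.mechanism_detail.strip()):
        missing.append("sub-mechanism")
    if c.lever.lower() not in entry["levers"] or not c.lever_detail.strip():
        missing.append("lever")
    return missing


# "Regulation" alone is not a lever from the column and carries no detail.
vague = Classification(bucket="AI race", sub_mechanism="corporate race",
                       lever="regulation")
print(missing_steps(vague))
```

Which column entry a specific lever falls under is a judgment call that the code cannot make; the check only confirms that a column entry was named and a detail was supplied.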

Cross-bucket distinctness rule

A lever for one bucket usually does not address the risks in another:

A liability rule for misuse does NOT slow an AI race.
A safety-culture overhaul inside one organization does NOT change competitor pressure on others.
An interpretability breakthrough does NOT stop a state actor with bad intent.
A compute cap does NOT catch a deployed model drifting from its training distribution.

Categorical distinctness is the prerequisite for reasoning about how the buckets connect (Ch 1.6, previewed in L2’s close).
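
The distinctness rule can be read as a membership question: which buckets list a given lever? The sketch below assumes a hypothetical `LEVERS` mapping transcribed from the intervention column of the bucket table, and a hypothetical helper `buckets_for`. It also shows why the rule says "usually": the table lists liability rules under two buckets.

```python
# Intervention levers transcribed (lowercased) from the bucket table.
LEVERS = {
    "Malicious use": {"access controls", "content provenance", "abuse detection",
                      "liability rules"},
    "AI race": {"deployment moratoria", "compute caps", "shared eval benchmarks",
                "international coordination", "liability rules"},
    "Organizational risks": {"safety culture practices", "post-mortems",
                             "monitoring ownership", "hro-style discipline",
                             "third-party audit"},
    "Rogue AIs": {"alignment research", "interpretability", "oversight",
                  "careful objective design", "deployment monitoring"},
}


def buckets_for(lever: str) -> list:
    """Return every bucket whose lever column lists the given lever."""
    return [bucket for bucket, levers in LEVERS.items() if lever.lower() in levers]


print(buckets_for("compute caps"))      # listed under one bucket only
print(buckets_for("interpretability"))  # listed under one bucket only
print(buckets_for("liability rules"))   # listed under two buckets
```

A lever that appears in two columns is still scoped by its target: a liability rule written for misuse is a different instrument from one aimed at race dynamics, which is the point of the first example in the list above.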

Quick disambiguation cheatsheet

When the question is	The bucket is usually
“Who intended the harm?” → someone	Malicious use
“Why did the safety step get skipped?” → external timeline pressure	AI race
“Why was no one watching?” → internal organizational gap	Organizational risk
“Why didn’t patches fix it?” → the system itself is optimizing in an unintended direction	Rogue AI

Use the disambiguation table as a starter; the classify-and-defend protocol is the rigorous form. The starter helps when a headline is ambiguous; the protocol is what makes the answer defensible.
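
As a rough sketch, the starter table amounts to a lookup from the answer a headline suggests to a candidate bucket. The signal names and the function `candidate_buckets` below are hypothetical labels chosen for illustration. More than one candidate means the headline sits across buckets, and the protocol's first step (name the cross, pick the dominant one with reasoning) takes over.

```python
# One signal per row of the disambiguation table; keys are illustrative labels.
STARTER = {
    "someone_intended_harm": "Malicious use",
    "external_timeline_pressure": "AI race",
    "internal_organizational_gap": "Organizational risk",
    "system_optimizing_unintended": "Rogue AI",
}


def candidate_buckets(signals: set) -> list:
    """Map the signals seen in a headline to candidate buckets, in table order."""
    unknown = signals - set(STARTER)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return [bucket for signal, bucket in STARTER.items() if signal in signals]


# A headline suggesting both a rushed release and nobody monitoring afterwards
# yields two candidates; choosing the dominant one still needs human reasoning.
print(candidate_buckets({"external_timeline_pressure",
                         "internal_organizational_gap"}))
```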

What this lesson builds toward

L3 (monitoring + robustness, Ch 3.2-3.3): the deployment-time failure modes the rogue-AI bucket points at, with the specific distinction between robustness failure (system breaks) and monitoring failure (operators do not notice).
L4 (alignment, Ch 3.4): three failure modes that sit mostly inside the rogue-AI bucket (specification gaming, proxy gaming, deceptive alignment), worked at full depth.
L5 (safety engineering, Ch 4): the cross-disciplinary toolkit that addresses the organizational-risk bucket directly (nines of reliability, defense in depth, fault tree analysis, normal-accident theory).
L6 (complex systems, Ch 5): why correct components compose into incorrect systems, the formal cousin of organizational risk.
L9 (governance, Ch 8): the AI-race-bucket interventions at a national and international scale (compute governance, treaties, regulatory frameworks).