
Cheatsheet: the four catastrophic risk categories

The four buckets with sub-mechanisms, analogies, and levers

| Bucket | Sub-mechanisms | Historical analogy | Intervention levers |
|---|---|---|---|
| Malicious use (Ch 1.2) | Bioweapons, disinformation, authoritarian control | (Implicit: any past tool weaponized at scale) | Access controls, content provenance, abuse detection, liability rules |
| AI race (Ch 1.3) | Corporate race, military AI race, natural selection on the AI population | Nuclear arms race, gain-of-function research | Deployment moratoria, compute caps, shared eval benchmarks, international coordination, liability rules |
| Organizational risks (Ch 1.4) | Diffused responsibility, absent safety culture, inadequate response infrastructure | Challenger (1986), Chernobyl (1986) | Safety culture practices, post-mortems, monitoring ownership, HRO-style discipline, third-party audit |
| Rogue AIs (Ch 1.5) | Specification gaming / control drift, instrumental power-seeking, goal drift via intrinsification | (Foreshadowed via Ch 5 complex systems, current deployed-model incidents) | Alignment research, interpretability, oversight, careful objective design, deployment monitoring |

The classify-and-defend three-step protocol


Given a real headline about an AI harm:

  1. Name the bucket. One of the four. If the headline straddles two, name both and pick the dominant one, giving your reason.
  2. Name the sub-mechanism. From the bucket’s column above. “AI race” is incomplete; “AI race, corporate sub-mechanism, evaluation-window compression under competitor pressure” is complete.
  3. Name a lever. From the bucket’s intervention column. “Regulation” is incomplete; “a regulatory minimum evaluation window decoupling first-to-market from first-to-pass-safety” is complete.
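
The three steps can be sketched as a completeness check. This is a minimal Python sketch, not part of the course material: `TABLE`, `Answer`, and `gaps` are hypothetical names, the column entries are transcribed in lower case from the bucket table above, and filing the evaluation-window example under the "deployment moratoria" lever is our own assumption. A checker like this can only confirm that each step picks an entry from the right column and adds specifics; judging whether the reasoning holds up stays with the reader.

```python
from __future__ import annotations

from dataclasses import dataclass

# Sub-mechanism and lever columns, transcribed (lower-cased) from the bucket table.
TABLE = {
    "malicious use": (
        {"bioweapons", "disinformation", "authoritarian control"},
        {"access controls", "content provenance", "abuse detection", "liability rules"},
    ),
    "ai race": (
        {"corporate race", "military ai race", "natural selection on the ai population"},
        {"deployment moratoria", "compute caps", "shared eval benchmarks",
         "international coordination", "liability rules"},
    ),
    "organizational risks": (
        {"diffused responsibility", "absent safety culture", "inadequate response infrastructure"},
        {"safety culture practices", "post-mortems", "monitoring ownership",
         "hro-style discipline", "third-party audit"},
    ),
    "rogue ais": (
        {"specification gaming / control drift", "instrumental power-seeking",
         "goal drift via intrinsification"},
        {"alignment research", "interpretability", "oversight",
         "careful objective design", "deployment monitoring"},
    ),
}


@dataclass
class Answer:
    bucket: str            # step 1: one of the four buckets
    sub_mechanism: str     # step 2: an entry from that bucket's sub-mechanism column
    mechanism_detail: str  # step 2: the concrete story in this headline
    lever: str             # step 3: an entry from that bucket's lever column
    lever_detail: str      # step 3: the concrete form the lever takes


def gaps(a: Answer) -> list[str]:
    """List the missing or incomplete steps; an empty list means all three are filled in."""
    row = TABLE.get(a.bucket.lower())
    if row is None:
        return ["step 1: bucket is not one of the four"]
    subs, levers = row
    problems = []
    if a.sub_mechanism.lower() not in subs:
        problems.append("step 2: sub-mechanism not in this bucket's column")
    if not a.mechanism_detail.strip():
        problems.append("step 2: no concrete mechanism")
    if a.lever.lower() not in levers:
        problems.append("step 3: lever not in this bucket's column")
    if not a.lever_detail.strip():
        problems.append("step 3: no concrete lever")
    return problems


# "AI race" plus "regulation" is the incomplete answer from the steps above.
vague = Answer("AI race", "", "", "regulation", "")
complete = Answer(
    "AI race", "corporate race",
    "evaluation-window compression under competitor pressure",
    "deployment moratoria",  # assumption: the closest lever in the AI-race column
    "a regulatory minimum evaluation window decoupling first-to-market from first-to-pass-safety",
)
print(gaps(vague))     # four problems, one per missing piece
print(gaps(complete))  # []
```

Splitting each step into a column entry and a free-text detail mirrors the cheatsheet's point that a bare label ("AI race", "Regulation") is incomplete until it carries specifics.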

A lever for one bucket usually does not help another:

  • A liability rule for misuse does NOT slow an AI race.
  • A safety-culture overhaul inside one organization does NOT change competitor pressure on others.
  • An interpretability breakthrough does NOT stop a state actor with bad intent.
  • A compute cap does NOT catch a deployed model drifting from its training distribution.
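
One way to see how rarely levers cross buckets is to check which lever names appear in more than one column of the bucket table. This is a hypothetical Python sketch (`LEVERS`, `buckets_covered`, and `shared_levers` are illustrative names). It matches levers by name only, which is a crude assumption: the only name that appears in two columns is "liability rules", and the first bullet above already notes that a liability rule aimed at misuse does not slow an AI race, so a shared name is not a shared effect.

```python
from __future__ import annotations

# Lever columns from the bucket table, one set per bucket.
LEVERS = {
    "Malicious use": {"access controls", "content provenance", "abuse detection", "liability rules"},
    "AI race": {"deployment moratoria", "compute caps", "shared eval benchmarks",
                "international coordination", "liability rules"},
    "Organizational risks": {"safety culture practices", "post-mortems", "monitoring ownership",
                             "HRO-style discipline", "third-party audit"},
    "Rogue AIs": {"alignment research", "interpretability", "oversight",
                  "careful objective design", "deployment monitoring"},
}


def buckets_covered(lever: str) -> list[str]:
    """Buckets whose lever column lists this lever by name (table order)."""
    return [bucket for bucket, levers in LEVERS.items() if lever in levers]


def shared_levers() -> dict[str, list[str]]:
    """Levers named in more than one bucket's column."""
    all_levers = set().union(*LEVERS.values())
    return {lv: buckets_covered(lv) for lv in sorted(all_levers) if len(buckets_covered(lv)) > 1}


print(buckets_covered("compute caps"))      # ['AI race']
print(buckets_covered("interpretability"))  # ['Rogue AIs']
print(shared_levers())                      # {'liability rules': ['Malicious use', 'AI race']}
```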

Keeping the buckets categorically distinct is the prerequisite for reasoning about how they connect (Ch 1.6, previewed at the close of L2).

| When the question is | The bucket is usually |
|---|---|
| "Who intended the harm?" → someone | Malicious use |
| "Why did the safety step get skipped?" → external timeline pressure | AI race |
| "Why was no one watching?" → internal organizational gap | Organizational risk |
| "Why didn't patches fix it?" → the system itself is optimizing in an unintended direction | Rogue AI |
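
The disambiguation table can be read as four yes/no probes asked in order. This hypothetical Python sketch (`STARTER` and `starter_guess` are illustrative names; the yes/no rephrasings of the table's questions are our own) returns every bucket a "yes" points to rather than choosing one: when more than one bucket comes back, the headline is a cross, and picking the dominant bucket with a reason is step 1 of the protocol, not something the starter decides.

```python
from __future__ import annotations

# The disambiguation table's questions, rephrased as yes/no probes, in table order.
STARTER = [
    ("Did someone intend the harm?", "Malicious use"),
    ("Was a safety step skipped under external timeline pressure?", "AI race"),
    ("Did no one notice because of an internal organizational gap?", "Organizational risk"),
    ("Did patches fail because the system itself optimizes in an unintended direction?", "Rogue AI"),
]


def starter_guess(answers: list[bool]) -> list[str]:
    """Buckets flagged by 'yes' answers; more than one means a cross to resolve in step 1."""
    if len(answers) != len(STARTER):
        raise ValueError(f"expected {len(STARTER)} answers, got {len(answers)}")
    return [bucket for (_, bucket), yes in zip(STARTER, answers) if yes]


# A launch rushed by a competitor's release, with no one assigned to monitor it.
print(starter_guess([False, True, True, False]))  # ['AI race', 'Organizational risk']
```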

Treat the disambiguation table as a starter and the classify-and-defend protocol as the rigorous form: the starter gets you unstuck when a headline is ambiguous, and the protocol makes the answer defensible.

Later lessons take the buckets further:

  • L3 (monitoring + robustness, Ch 3.2-3.3): the deployment-time failure modes the rogue-AI bucket points at, including the distinction between robustness failure (the system breaks) and monitoring failure (operators do not notice).
  • L4 (alignment, Ch 3.4): three failure modes that sit mostly inside the rogue-AI bucket (specification gaming, proxy gaming, deceptive alignment), worked at full depth.
  • L5 (safety engineering, Ch 4): the cross-disciplinary toolkit that addresses the organizational-risk bucket directly (nines of reliability, defense in depth, fault tree analysis, normal-accident theory).
  • L6 (complex systems, Ch 5): why correct components compose into incorrect systems, the formal cousin of organizational risk.
  • L9 (governance, Ch 8): the AI-race-bucket interventions at a national and international scale (compute governance, treaties, regulatory frameworks).