The four catastrophic risk categories

L2 is the bucket-elaboration lesson. L1 named the four buckets in one sentence each and asked you to do a four-headline sort. This lesson moves the bar to defense: given a real headline, name the bucket, name the specific sub-mechanism inside the bucket that produced the harm, and name the kind of intervention that would change the dial.

The lesson takes the four buckets in the textbook’s order (Hendrycks Ch 1.2 through Ch 1.5). For each bucket, it names two or three of the sub-mechanisms Hendrycks describes, points to the historical analogy the chapter uses to anchor the bucket (Challenger and Chernobyl for organizational risks; the nuclear arms race and gain-of-function research for the AI race), and names one or two intervention levers that operate inside that bucket. The closing section makes the categorical-distinctness rule explicit: levers for one bucket usually do not help with another, which is what makes the typology operationally useful rather than rhetorical.

The lesson body quotes each chapter section verbatim, anchors every quote to the source under A1 discipline, and attributes it inline.

This is lesson 2 of 9 and the second lesson of Phase 1 (the risks landscape). The previous lesson, AI safety as a field, framed the discipline and named the four buckets without elaborating them. The next lesson, Monitoring and robustness (L3), enters Hendrycks’ Chapter 3 (Single Agent Safety) and works through the deployment-time failure modes that the rogue-AI and organizational-risk buckets point at. Phase 1 closes after L2; Phase 2 (safety and alignment) opens at L3.

Prerequisites: L1 (AI safety as a field). The four-bucket vocabulary should already feel available; this lesson assumes the L1 paragraph-write capability as the baseline.

The reason this lesson moves the bar from “name the bucket” to “name the bucket and defend it” is practical. A bucket label without a mechanism is a posture; the mechanism is what tells you which interventions are even plausible. “AI race” without “the competitor announced a release for next month and the evaluation window got compressed” tells you nothing about whether a compute cap, a liability rule, a shared eval, or a deployment moratorium is the right lever to reach for. The mechanism is what makes the bucket actionable. The lever is what makes the lesson useful to anyone with policy authority.
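One way to picture the difference between a label and a defense is as a record with three fields. The sketch below is illustrative only: the names `Bucket`, `Classification`, and `is_defended` are hypothetical and do not come from the textbook, the headline is invented, and the lever shown is picked to fill the field rather than offered as the right answer.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    # The four categories in textbook order (Hendrycks Ch 1.2 through Ch 1.5).
    MALICIOUS_USE = "malicious use"
    AI_RACE = "AI race"
    ORGANIZATIONAL_RISKS = "organizational risks"
    ROGUE_AI = "rogue AI"


@dataclass(frozen=True)
class Classification:
    # Hypothetical record type, introduced here for illustration.
    headline: str
    bucket: Bucket
    mechanism: str = ""  # the specific sub-mechanism that produced the harm
    lever: str = ""      # the kind of intervention that would change the dial

    def is_defended(self) -> bool:
        # A label alone is a posture; a defense also needs a mechanism and a lever.
        return bool(self.mechanism.strip()) and bool(self.lever.strip())


# Invented headline, used only to exercise the record.
label_only = Classification(
    headline="(hypothetical) Lab ships model early after rival announcement",
    bucket=Bucket.AI_RACE,
)
defended = Classification(
    headline=label_only.headline,
    bucket=Bucket.AI_RACE,
    mechanism="competitor announced a release; the evaluation window got compressed",
    lever="shared eval",  # illustrative choice, not a recommendation
)

print(label_only.is_defended())  # False
print(defended.is_defended())    # True
```

The check is deliberately crude: it only confirms that a mechanism and a lever were written down, not that they are the right ones. Judging that is the work of the practice exercises.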

  • Classify a real news headline about AI harm into one of Hendrycks’ four buckets and defend the choice with one specific sub-mechanism and one specific intervention lever
  • Name at least two sub-mechanisms inside each bucket
  • Recognize when a headline genuinely sits across two buckets and pick the dominant one with reasoning
  • Identify one outside-AI historical analogy Hendrycks uses to anchor each bucket (where named)
  • State the categorical-distinctness rule: interventions usually do not translate across buckets
  • Read time: about 14 minutes (the bucket-by-bucket walk is denser than L1; each bucket has two or three sub-mechanisms plus its intervention levers)
  • Practice time: about 18 minutes (six headlines to classify-and-defend, a sub-mechanism matching exercise, a defense-writing exercise, and 10 flashcards)
  • Difficulty: deep (Stage E specialized; L1 paragraph-write capability assumed)