References: the four catastrophic risk categories

Dan Hendrycks. Introduction to AI Safety, Ethics, and Society. Taylor & Francis, 2024. Center for AI Safety, free to read at aisafetybook.com. L2 draws from Chapter 1, sections 1.2 through 1.5, each devoted to one of the four risk buckets.

Chapter section   Topic                  URL
Ch 1.2            Malicious Use          aisafetybook.com/textbook/malicious-use
Ch 1.3            AI Race                aisafetybook.com/textbook/ai-race
Ch 1.4            Organizational Risks   aisafetybook.com/textbook/organizational-risks
Ch 1.5            Rogue AIs              aisafetybook.com/textbook/rogue-ai

Each quote in the lesson body carries its chapter-section anchor inline. For convenience, the anchored quotes are listed here:

  • §1.2 Malicious Use, on bioweapons: “expedite the discovery of new, more deadly chemical and biological weapons by generating novel toxic molecules or proteins” and “step-by-step instructions to potential bioterrorists.”
  • §1.2 Malicious Use, on disinformation: “generate personalized false narratives tailored to specific individuals” and “exploit people’s trust if they have access to extensive personal information.”
  • §1.3 AI Race, on corporate race: “cut corners on safety testing and training.”
  • §1.3 AI Race, on military automation: “don’t have to risk soldiers’ lives” (on reduced political friction for autonomous systems) and “automatic retaliation systems” (as a flagged subclass with escalation risk).
  • §1.3 AI Race, on AI population dynamics: “selfish AIs willing to break laws or deceive humans can outcompete more restrictive AIs.”
  • §1.4 Organizational Risks, core framing: “accidents are hard to avoid when dealing with complex systems such as AI. Without building a culture of safety, it is likely that there will be accidents in AI development and deployment.”
  • §1.4 Organizational Risks, on Challenger: “organizational negligence, not competition.”
  • §1.4 Organizational Risks, on Chernobyl: “poor safety protocols and an inadequately prepared crew.”
  • §1.5 Rogue AIs, core framing: “we already face issues in controlling the goals of current-day AI systems. If this is also true with future AI systems that are more powerful and more integrated with our economies and militaries, we could see dangerous rogue AI systems emerge.”
  • §1.5 Rogue AIs, on instrumental power-seeking: “view gaining more control over [their] surroundings as instrumentally helpful.”
  • §1.5 Rogue AIs, on goal drift via intrinsification: “intrinsically valu[ing] those conditions too and seek them out regardless of the original goals.”

A1 discipline: every quoted passage is verbatim from the cited section, with no paraphrasing inside quote marks, no extension, no collapse, no stitch. The square-bracketed insertions in the §1.5 quotes ([their] and valu[ing]) are grammatical adjustments, marked with brackets as standard editorial convention requires.

Same posture as L1: the CAIS textbook is © 2026 Center for AI Safety, published by Taylor & Francis, free to read online with no explicit Creative Commons or reuse license. This lesson is a structural mirror with verbatim quotes anchored to specific chapter sections within fair-use limits, link-out only, no embed, no derivative runs.

The following readings are not required for L2; they extend each bucket for a reader who wants to go deeper before L3.

  • Malicious use: the Center for AI Safety’s catastrophic risks explainer at safe.ai for the public-facing version of Hendrycks’ framing on bioweapons and disinformation; the Misuses of AI literature review accumulated by AI policy groups for case-study-level detail.
  • AI race: Allan Dafoe, “AI Governance: A Research Agenda” (Future of Humanity Institute, 2018) for the academic version of Hendrycks’ structural-pressure framing; Helen Toner’s writing on corporate-race dynamics for journalistic detail.
  • Organizational risks: Charles Perrow, Normal Accidents (1984) for the foundational complex-systems framing of why correct components compose into incorrect systems; Diane Vaughan, The Challenger Launch Decision (1996) for the canonical organizational-failure case study. Both belong to the cross-disciplinary literature that Hendrycks draws on in Chapters 4 and 5.
  • Rogue AIs: Stuart Russell, Human Compatible (2019) for an alternative framing of the alignment problem; the DeepMind Safety Research “specification gaming examples” repository for an updated curated list of incidents.

L3 enters Hendrycks’ Chapter 3 (Single Agent Safety) and works through monitoring and robustness as specific deployment-time failure modes. The rogue-AI sub-mechanisms named in L2 (specification gaming, control drift) are the technical handles L3 reaches for first; the organizational-risk pattern of “no one watching” becomes the monitoring side of the lesson directly.