References: AI safety as a field

Dan Hendrycks. Introduction to AI Safety, Ethics, and Society. Taylor & Francis, 2024. A Center for AI Safety (CAIS) textbook, free to read online at aisafetybook.com. ISBNs: 9781032869926 (print) and 9781032798028 (online). An audiobook is also available on Spotify.

L1 draws primarily from Chapter 1: Overview of Catastrophic AI Risks, in particular Section 1.1 (field framing and the four-bucket introduction) and Section 1.6 (discussion of connections between the four risk categories). The full chapter is at aisafetybook.com/textbook/overview-of-catastrophic-ai-risks.

The four-bucket typology (malicious use, AI race, organizational risks, rogue AIs) is the chapter’s core organizing structure; this lesson uses it directly with attribution. The cross-disciplinary scaffolding (safety engineering in Chapter 4, complex systems in Chapter 5, governance in Chapter 8) is the textbook’s structural argument that the field is multi-disciplinary; this lesson references it as foreshadowing for later lessons in the track.

When citing the textbook elsewhere, use the author-specified form:

Dan Hendrycks. Introduction to AI Safety, Ethics and Society. Taylor & Francis, 2024.

The CAIS textbook is © 2026 Center for AI Safety, published by Taylor & Francis (Routledge), and free to read online, but it carries no explicit Creative Commons or reuse license. This track is a structural mirror: the lesson arc follows the textbook’s chapter structure, and the prose is original and anchored to cited chapter content. It embeds no text from the source and quotes nothing beyond fair-use snippets. Every references.mdx routes readers to the canonical URL for the actual textbook content.

The readings below are not required for L1; they are useful entry points if the field framing lands and you want to read further before L2.

  • Center for AI Safety, “Statement on AI Risk” (2023). A one-sentence consensus statement signed by a wide range of AI researchers and public figures. safe.ai/statement-on-ai-risk. Useful as a reference point for what “the field” agreed on in a single sentence at one moment in time, and for noticing how thin a single-sentence statement is compared to a textbook.
  • DeepMind Safety Research, “Specification gaming examples.” A curated list of specification-gaming incidents in deployed and research systems. Useful for the vocabulary in §3 of this lesson; the L4 alignment lesson will return to this list.
  • Stuart Russell, Human Compatible (2019). A book-length argument for one specific approach to alignment (provably beneficial AI). Useful as a contrast with Hendrycks: a different framing of the same field, by another senior figure. Reading both makes the “discipline produces vocabulary that lets practitioners disagree” point concrete.

L2 takes the four buckets named in L1 and works each one in detail: what counts as malicious use, what counts as an AI race dynamic, what counts as an organizational risk, what counts as a rogue AI. L1’s capability (write the paragraph) is the prerequisite; L2 expects the four-bucket vocabulary to be available as a working tool, not as something the reader is still building.