AI safety as a field: what it studies and why it is a discipline, not a stance
What you’ll learn
This is the opening lesson of Track 23 (AI Safety and Alignment), and it does the field-framing work that the rest of the track assumes. The source curriculum is Dan Hendrycks’ Introduction to AI Safety, Ethics, and Society (Center for AI Safety, 2024), freely available at aisafetybook.com. The track mirrors the textbook’s structural arc across nine lessons in three phases (risks landscape, safety and alignment, ethics and governance); this lesson grounds the whole shape.
The lesson opens with what “AI safety” is not: it is not a slogan, not a side in an online fight, not a vibe. The corrective is to treat it as a field with a subject, a vocabulary, a method, and a connection to neighboring disciplines. The subject is what specifically can go wrong (and right) when AI systems are designed, deployed, and operated. Hendrycks sorts the failure modes into four categorically distinct buckets (malicious use, AI race, organizational risks, rogue AIs); the lesson previews each in one sentence and works through a headline-sorting example to make the typology concrete. The lesson then builds the case for “discipline, not stance” along three axes: the field borrows tools from safety engineering and complex-systems theory, it admits uncertainty about which risks are well-characterized today and which are projected, and it produces vocabulary precise enough to let practitioners disagree productively. The lesson closes on the descriptive-not-prescriptive editorial register that the rest of the track will carry.
Where this fits
This is lesson 1 of 9, and the entry point of Track 23. There is no previous lesson in this track; the next lesson, The four catastrophic risk categories, takes the buckets named here and works each in detail across the rest of Phase 1. The track sits at Stage E in the curriculum (specialized layer) and assumes prior comfort with the basics of how AI systems are built and trained.
Before you start
Prerequisites:
- Neural Network Intuition (T11) and Intro to Deep Learning (T12), or equivalent comfort with neural networks: how they are structured, how they learn from data, what a loss function does. The track is not re-teaching the engine; it is examining what fails when the engine is deployed at scale.
- Reinforcement Learning Foundations (T17) is recommended (not required) before L8 specifically, where Hendrycks’ game-theory chapter assumes some intuition about multi-agent dynamics.
If “deployment, distribution shift, reward function, objective function” all feel like working vocabulary, you are ready.
About the discipline
Track 23 is a Stage E specialized layer. The audience is more advanced than the platform’s primary AI-anxious non-technical reader. The expected reader is an AI engineer wanting a systematic vocabulary, a policy-curious technologist tired of Twitter-thread takes, or a senior practitioner wanting Hendrycks’ framing without reading the whole textbook. The lessons are conceptually dense (around 2000-3000 words each) and prioritize precision over breadth.
The descriptive register is a structural commitment, not a tone choice. Claims are attributed to specific sources (the chapter, the CAIS framing, the cited reference) rather than asserted in the editorial first person. This lets readers who disagree with Hendrycks’ framing engage with the same vocabulary; it keeps the discipline visible.
By the end, you’ll be able to
- State in one paragraph (6-8 sentences) what AI safety studies and why it is a discipline rather than a stance
- Name Hendrycks’ four catastrophic-risk categories and place each in one sentence
- Distinguish a discipline-shaped claim from a stance-shaped claim and rewrite one into the other
- Recognize the cross-disciplinary scaffolding (safety engineering, complex systems, governance) the field uses
- Explain why a descriptive vocabulary about risks is more useful than a position on whether AI is good or bad
Time and difficulty
- Read time: about 13 minutes (the lesson body is denser than the platform’s standard tracks; budget accordingly)
- Practice time: about 15 minutes (the paragraph-write exercise is the centerpiece; a four-bucket sort and a discipline-vs-stance rewrite round it out)
- Difficulty: deep (Stage E specialized; assumes T11 + T12 prerequisites)