Summary: AI safety as a field

Summary

AI safety, as Dan Hendrycks treats it in Introduction to AI Safety, Ethics, and Society (CAIS, 2024), is a field rather than a stance. A field has a subject, a vocabulary, a method, and a connection to neighboring fields; a stance has supporters. The distinction is practical: a question routed through “are you for or against AI” stops being a question about the world and starts being a question about a coalition.

The subject of the field is what can go wrong when AI systems are designed, deployed, and operated. Hendrycks sorts those failure modes into four buckets. Malicious use covers people intentionally weaponizing AI systems. AI race covers the structural competitive pressure on labs and countries to ship before evaluation is done, framed as a pressure on incentives rather than a failure of character. Organizational risks covers accidents inside the labs and companies that build AI, the usual mechanisms being complex systems and ambiguous responsibility. Rogue AIs covers systems pursuing objectives in ways their designers did not intend and can no longer correct. The buckets are kept separate on purpose: the interventions that help in one typically do not help in another.

The field is a discipline rather than a stance because it does three things stances do not. It borrows tools from neighboring disciplines: safety engineering (nines of reliability, defense in depth, fault-tree analysis), complex-systems theory (why correct components can compose into incorrect systems), governance frameworks, machine ethics, and game theory. It admits uncertainty by distinguishing risks that are well-characterized in deployed systems today from risks that are projected or contested in more capable future systems, without collapsing the two. And it produces vocabulary precise enough to let practitioners who disagree about policy still agree on what they are disagreeing about: robustness failure is not monitoring failure, specification gaming is not proxy gaming, and deceptive alignment is neither. These distinctions are the field’s portable assets across debates.

The editorial register is descriptive, not prescriptive. The textbook attributes claims to specific sources rather than asserting positions in the editorial first person. This is a structural commitment, not a hedge: it keeps the discipline visible, and it lets readers who disagree with Hendrycks’ framing do so using the same vocabulary.

The capability for this lesson is the paragraph-write: be able to state, in roughly 6 to 8 sentences, what AI safety studies and why it is a discipline rather than a stance, naming at least two of the four risk categories, at least one cross-disciplinary tool the field borrows, and the descriptive register. The Practice section has a model paragraph to compare against, but the work is to write your own first and only then read the model.
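The checklist in the paragraph-write can be turned into a rough self-check. The sketch below is an illustration only and is not part of the Hendrycks textbook or this track's tooling. The function names, the keyword lists, and the sentence-splitting rule are all assumptions made for the example. Keyword matching cannot judge whether a paragraph is actually written in a descriptive register, so a passing result means only that the required terms appear.

```python
import re

# Assumed keyword lists, taken from terms used in this summary.
# They are illustrative, not an official rubric.
RISK_CATEGORIES = ("malicious use", "ai race", "organizational risk", "rogue ai")
BORROWED_TOOLS = (
    "safety engineering", "defense in depth", "fault-tree analysis",
    "complex-systems theory", "governance framework",
    "machine ethics", "game theory",
)


def count_sentences(text):
    """Crude count: split on ., ! or ? followed by whitespace or end of text.

    Abbreviations such as "e.g." will inflate the count.
    """
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def check_paragraph(text):
    """Report which checklist items a draft paragraph appears to satisfy."""
    lowered = text.lower()
    risks = [r for r in RISK_CATEGORIES if r in lowered]
    tools = [t for t in BORROWED_TOOLS if t in lowered]
    n = count_sentences(text)
    return {
        "sentences": n,
        # The lesson says "roughly" 6 to 8; the strict bound is a simplification.
        "length_ok": 6 <= n <= 8,
        "risks_named": risks,
        "risks_ok": len(risks) >= 2,
        "tools_named": tools,
        "tools_ok": len(tools) >= 1,
        # Only checks that the word appears, not that the register is honored.
        "register_ok": "descriptive" in lowered,
    }
```

Calling `check_paragraph(draft)` on your own paragraph before reading the model gives a quick list of what is missing. It does not replace comparing your draft against the model paragraph in the Practice section.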

The rest of the track follows the same shape: nine lessons across three phases (risks landscape, safety and alignment, ethics and governance), each mapped to specific chapters of the Hendrycks textbook, each carrying the same descriptive register, each adding vocabulary that compounds across lessons. By Lesson 9 the reader should be able to read a real governance proposal and place it inside Hendrycks’ four-layer governance taxonomy without the taxonomy itself getting in the way.