References: collective action and multi-agent dynamics

Dan Hendrycks. Introduction to AI Safety, Ethics, and Society. Taylor & Francis, 2024. Center for AI Safety, free to read at aisafetybook.com. L8 draws from Chapter 7 (Collective Action Problems), primarily sections 7.2 (Game Theory) and 7.3 (Cooperation), with 7.4 (Conflict) and 7.5 (Evolutionary Pressures) informing the conflict-and-evolutionary-pressures section.

| Chapter section | Topic | URL |
| --- | --- | --- |
| Ch 7.2 | Game Theory | aisafetybook.com/textbook/game-theory |
| Ch 7.3 | Cooperation | aisafetybook.com/textbook/cooperation |
| Ch 7.4 | Conflict | aisafetybook.com/textbook/conflict |
| Ch 7.5 | Evolutionary Pressures | aisafetybook.com/textbook/evolutionary-pressures |

A1 discipline preserved: the quotes below are verbatim from the cited sections, with no paraphrasing inside quotation marks.

  • §7.2 Game Theory, on multi-agent dynamics: “dynamics that may arise when AI and human agents interact. These interactions create risks distinct from those generated by any individual AI agent acting in isolation.”
  • §7.2 Game Theory, on the Iterated Prisoner’s Dilemma: “extortion strategies are often successful in the Iterated Prisoner’s Dilemma.”
  • §7.2 Game Theory, on the automated economy: “autonomous economy where AIs make all important decisions.”
  • §7.3 Cooperation, opening framing: “Cooperation between AI stakeholders is important in order to mitigate risks from AI.”
  • §7.3 Cooperation, on the cooperation tension: “making AIs cooperative is not an unalloyed good.”

The three named collective-action failure modes (race to the bottom, free rider, escalation), the four cooperation mechanisms (reciprocity, reputation, group selection, institutions), and the Nash-Pareto framing are standard vocabulary in the game-theory and political-economy literature; the lesson uses them as the chapter does. The AI Leviathan framing is Hendrycks’ specific use.
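The Nash-Pareto framing can be made concrete with a one-shot Prisoner's Dilemma. The sketch below is illustrative and not taken from the chapter: the payoff numbers and the helper names `is_nash` and `is_pareto_optimal` are assumptions chosen for the example. It enumerates the four outcomes and checks which are Nash equilibria and which are Pareto-optimal.

```python
from itertools import product

# Illustrative Prisoner's Dilemma payoffs as (row player, column player).
# The specific numbers are an assumption; only their ordering matters.
C, D = "cooperate", "defect"
ACTIONS = (C, D)
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

def is_nash(profile):
    """True if neither player gains by unilaterally switching actions."""
    a, b = profile
    row_ok = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in ACTIONS)
    col_ok = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in ACTIONS)
    return row_ok and col_ok

def is_pareto_optimal(profile):
    """True if no other outcome is at least as good for both players and different."""
    u = PAYOFFS[profile]
    for other in PAYOFFS.values():
        if other != u and other[0] >= u[0] and other[1] >= u[1]:
            return False
    return True

profiles = list(product(ACTIONS, ACTIONS))
nash = [p for p in profiles if is_nash(p)]
pareto = [p for p in profiles if is_pareto_optimal(p)]

print("Nash equilibria:", nash)
print("Pareto-optimal outcomes:", pareto)
```

Under these assumed payoffs, mutual defection is the only Nash equilibrium and also the only outcome that is not Pareto-optimal. That gap between individually stable and collectively preferable outcomes is what the Nash-Pareto vocabulary names.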

Same licensing posture as L1 through L7: the CAIS textbook is © 2026 Center for AI Safety, published by Taylor & Francis, free to read online with no explicit Creative Commons or reuse license. This lesson is a structural mirror with verbatim quotes anchored to specific chapter sections within fair-use limits, link-out only, no embed, no derivative runs.

Further reading, not required for L8: the works below are the foundational texts for the topics Ch 7 brings into the AI safety discussion.

  • Robert Axelrod, The Evolution of Cooperation (Basic Books, 1984; revised edition 2006). The foundational text on the Iterated Prisoner’s Dilemma and the emergence of cooperative strategies through reciprocity. Pre-AI, but the framing transfers directly to multi-agent AI systems. This is the source of the Tit-for-Tat result.
  • Elinor Ostrom, Governing the Commons: The Evolution of Institutions for Collective Action (Cambridge University Press, 1990). The foundational text on institutional cooperation mechanisms. Ostrom shared the 2009 Nobel Memorial Prize in Economic Sciences for work showing that real-world communities can solve free-rider problems through institutional design that does not require central authority. The AI Leviathan framing in Ch 7.3 inherits from this lineage.
  • Garrett Hardin, “The Tragedy of the Commons” (Science 1968), at science.org/doi/10.1126/science.162.3859.1243. The classical statement of the free-rider failure mode. Contested in places (Ostrom is the canonical critique); still the standard reference for the problem statement.
  • Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel, “Cooperative AI: machines must learn to find common ground” (Nature 2021), at nature.com/articles/d41586-021-01170-0. The short commentary that frames the cooperative-AI research agenda. The reference list points at most of the active research in multi-agent cooperation for AI.
  • William H. Press and Freeman Dyson, “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent” (PNAS 2012), at pnas.org/doi/10.1073/pnas.1206569109. The extortion-strategies result Hendrycks references; rigorous and surprising. Worth reading for the technical contribution to the iterated-cooperation literature.
  • Joel Z. Leibo et al., “Multi-agent Reinforcement Learning in Sequential Social Dilemmas” (AAMAS 2017), at arxiv.org/abs/1702.03037. The entry point to the multi-agent reinforcement learning (MARL) literature applied to collective-action problems. Shows that the failure modes named in this lesson emerge in trained MARL agents in laboratory settings.
  • Thomas Schelling, The Strategy of Conflict (Harvard University Press, 1960). Foundational text on escalation dynamics and bargaining under threat. Pre-AI but the framings (commitment devices, focal points, mutual deterrence) carry directly into multi-agent AI strategic analysis.
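As a concrete anchor for the reciprocity mechanism in the Axelrod entry above, here is a minimal sketch of an iterated Prisoner's Dilemma in which Tit-for-Tat plays against itself and against an unconditional defector. It is not a reproduction of Axelrod's tournament: the payoff values, the 10-round horizon, and the function names (`tit_for_tat`, `always_defect`, `play`) are all assumptions made for illustration.

```python
# Payoffs as (player A, player B); the numbers are illustrative assumptions.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(own_history, opponent_history):
    """Cooperate on the first round, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(own_history, opponent_history):
    """Defect on every round regardless of history."""
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Return cumulative scores (A, B) after a fixed number of rounds."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both moves are chosen before either history is updated,
        # so the players act simultaneously within a round.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (30, 30)
print(play(tit_for_tat, always_defect))    # (9, 14)
print(play(always_defect, always_defect))  # (10, 10)
```

With these assumed payoffs, Tit-for-Tat loses its head-to-head match against the defector by a small margin, but two reciprocators earn far more together than two defectors do. That is the intuition behind reciprocity as a cooperation mechanism in repeated play.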

L9 enters Hendrycks’ Chapter 8 (Governance). It takes up the institutional-mechanism question that L8 opens (institutional cooperation mechanisms must themselves be designed, governed, and protected) and works it as the policy-layer instrument. Hendrycks’ four-layer governance taxonomy (corporate, national, international, compute) is the chapter’s organizing structure. L9 closes both Phase 3 and the track.