Cheatsheet: collective action and multi-agent dynamics
Key game-theory concepts
| Concept | One-line definition |
|---|---|
| Nash equilibrium | Configuration where no agent can improve its outcome by unilaterally changing strategy |
| Pareto-optimal | Configuration where no other reachable configuration makes at least one agent better off without making anyone worse off |
| Nash-Pareto divergence | The Nash equilibrium is Pareto-inefficient: individually rational play has converged on a worse outcome than one that was available |
| Prisoner’s dilemma | Two-player game where mutual cooperation is best for the group, but mutual defection is the Nash equilibrium |
| Iterated prisoner’s dilemma | The same agents play repeatedly with memory; cooperative strategies (e.g. Tit-for-Tat) can emerge, but extortion strategies (for example, zero-determinant strategies) can also succeed, per the recent literature |
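
The definitions above can be checked mechanically on a small game. The following is a minimal sketch, not a canonical implementation: the payoff numbers are illustrative assumptions (any values with temptation > reward > punishment > sucker's payoff would do), and the names `PAYOFFS`, `is_nash`, and `is_pareto_optimal` are hypothetical helpers introduced here.

```python
from itertools import product

# Assumed, illustrative payoffs (row player, column player).
# "C" = cooperate, "D" = defect; chosen so that 5 > 3 > 1 > 0.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def is_nash(profile):
    """True if neither player gains by unilaterally switching action."""
    a, b = profile
    u_a, u_b = PAYOFFS[profile]
    a_is_best = all(PAYOFFS[(alt, b)][0] <= u_a for alt in ACTIONS)
    b_is_best = all(PAYOFFS[(a, alt)][1] <= u_b for alt in ACTIONS)
    return a_is_best and b_is_best


def is_pareto_optimal(profile):
    """True if no other profile helps someone without hurting anyone."""
    u = PAYOFFS[profile]
    for other in PAYOFFS:
        v = PAYOFFS[other]
        nobody_worse = all(y >= x for x, y in zip(u, v))
        someone_better = any(y > x for x, y in zip(u, v))
        if nobody_worse and someone_better:
            return False
    return True


profiles = list(product(ACTIONS, repeat=2))
nash = [p for p in profiles if is_nash(p)]
pareto = [p for p in profiles if is_pareto_optimal(p)]
# Nash-Pareto divergence: equilibria that are not Pareto-optimal.
divergent = [p for p in nash if p not in pareto]

print("Nash equilibria:", nash)
print("Pareto-optimal profiles:", pareto)
print("Divergent equilibria:", divergent)
```

With these assumed payoffs, mutual defection is the only Nash equilibrium and the only profile that is not Pareto-optimal, which is the divergence the table describes.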
Three collective-action failure modes
| Failure mode | Strategic structure | Canonical example | AI-specific case |
|---|---|---|---|
| Race to the bottom | Unilateral safety/quality investment is costly; unsafe shipping is rewarded | Lowering occupational-safety standards to attract production | AI safety-investment race (L2 corporate race); aggressive-bidding race in ad bidding |
| Free rider | Public good requires individual investment; rational best response is to consume without contributing | Climate-change mitigation | Shared eval benchmarks, red-team corpora, incident-reporting databases |
| Escalation | A strategy becomes more attractive as others adopt similar ones; play converges on a universally escalated state | Nuclear arms race | Automated retaliation systems (L2 Ch 1.3); AI-capability arms race |
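
The free-rider row can be made concrete with a linear public-goods game, a standard textbook model. This is a sketch under stated assumptions: the endowment, the multiplier, the group size, and the names `payoff` and `best_response` are all illustrative choices made here, not values from the lesson.

```python
def payoff(own, others_total, n_agents, endowment=10.0, multiplier=2.0):
    """Payoff in a linear public-goods game (assumed parameters).

    Each agent keeps whatever it does not contribute. Pooled contributions
    are multiplied and split equally among all agents, contributors or not.
    """
    pool = (own + others_total) * multiplier
    return endowment - own + pool / n_agents


N_AGENTS = 4  # assumed group size; multiplier / N_AGENTS = 0.5 < 1


def best_response(others_total, options=(0.0, 5.0, 10.0)):
    """Contribution level that maximizes one agent's own payoff."""
    return max(options, key=lambda own: payoff(own, others_total, N_AGENTS))


all_free_ride = payoff(0.0, 0.0, N_AGENTS)      # nobody contributes
all_contribute = payoff(10.0, 30.0, N_AGENTS)   # everybody contributes fully
lone_free_rider = payoff(0.0, 30.0, N_AGENTS)   # one defector among contributors

print("best response when others give 0: ", best_response(0.0))
print("best response when others give 30:", best_response(30.0))
print("payoff if all free-ride: ", all_free_ride)
print("payoff if all contribute:", all_contribute)
print("payoff of a lone free rider:", lone_free_rider)
```

Because each contributed unit returns only `multiplier / N_AGENTS` to the contributor, contributing nothing is the best response whatever the others do, even though universal contribution pays every agent more than universal free-riding.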
Four cooperation mechanisms with AI-specific failure modes
| Mechanism | How it works | AI-specific limit |
|---|---|---|
| Reciprocity | Tit-for-Tat-style: cooperate based on partner’s history | Breaks under timescale asymmetry: human reciprocation cycles vs AI sub-second cycles |
| Reputation | Cooperate based on partner’s track record across many interactions | Breaks when actor space is too large or interactions too brief for reputation tracking |
| Group selection | Cooperative groups outcompete defecting groups | Produces AI-AI coalitions that may marginalize humans |
| Institutional mechanisms (AI Leviathan) | External enforcement of cooperative behavior through structural incentives | Requires the institutional structure itself to be designed, governed, and protected (becomes L9 governance) |
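
The reciprocity row can be illustrated with a short iterated prisoner's dilemma simulation. This is a minimal sketch: the payoff values, the round count, and the function names `tit_for_tat`, `always_defect`, and `play` are assumptions made for illustration, and the sketch does not model the timescale asymmetry named in the table.

```python
# Assumed, illustrative payoffs (player A, player B).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, partner_history):
    """Cooperate first, then copy the partner's previous move."""
    return partner_history[-1] if partner_history else "C"


def always_defect(own_history, partner_history):
    """Defect unconditionally."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play the repeated game and return the two cumulative scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        gain_a, gain_b = PAYOFFS[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


print("TFT vs TFT:        ", play(tit_for_tat, tit_for_tat))
print("TFT vs AlwaysDefect:", play(tit_for_tat, always_defect))
print("AllD vs AllD:       ", play(always_defect, always_defect))
```

Under these assumptions, two reciprocators sustain cooperation, while a reciprocator facing an unconditional defector is exploited only in the first round and then matches defection.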
The cooperation tension
“Making AIs cooperative is not an unalloyed good” (Hendrycks Ch 7.3).
Cooperation mechanisms designed to benefit humanity can inadvertently create AI-AI preference structures that marginalize humans. The same property that makes a cooperation mechanism work (high payoffs for in-group cooperation) produces problematic coalitions when the in-group is not the one designers intended.
Worked illustration (supply-chain agents): three individually-aligned agents using reciprocity converge on a cartel their three principals did not authorize. Each agent did exactly what its principal asked; the principals are now collectively worse off than under unconditioned competition. Per-agent alignment cannot prevent this; institutional enforcement at the coalition-detection level can.
The L8 capability (five-part protocol)
For a multi-agent AI deployment:
- Predict the dominant failure mode. Race to the bottom, free rider, or escalation. Justify the prediction by naming the strategic structure that produces it.
- Distinguish Nash vs Pareto-optimal. Identify the equilibrium and the alternative configuration that would be better for the group.
- Name cooperation mechanisms in play. Reciprocity, reputation, group selection, institutional. For each, name the AI-specific failure mode.
- Surface the cooperation tension. Identify two specific ways the mechanism could produce wrong-in-group coalitions.
- Connect to L2 + L7 + L9. Formal vocabulary for L2’s AI-race bucket; institutional mechanism as formal shape for L7’s moral parliament; governance design as L9’s lift.
The L2 / L7 / L8 thread (extended worked scenario points)
| Lens | Question | Output |
|---|---|---|
| L2 reading | Which catastrophic-risk bucket + sub-mechanism? | The headline goes in this bucket because… |
| L7 reading | Which ethical question is at stake? What value-loading approach helps? | The deployment optimizes preferences-at-X-layer while reducing wellbeing-at-Y-layer; a moral-parliament approach would have surfaced the gap |
| L8 reading | Which collective-action failure mode dominates? Which cooperation mechanism would address it? | The Nash-Pareto divergence is X; the cooperation mechanism with the right shape is Y at the institutional layer |
The lenses are complementary, not competing. The same incident has all three readings, and each surfaces a different intervention surface.
Cross-track and within-track pointers
- L2 (catastrophic risks, Ch 1): AI-race bucket and natural-selection sub-mechanism get formal treatment in L8. The L2 vocabulary is informal; L8 supplies the game-theoretic formalism.
- L4 (alignment, Ch 3.4): L4 treated alignment at the individual-system level. L8 names the population-level alignment problem: even individually aligned agents in a multi-agent environment can produce misaligned population outcomes.
- L7 (ethics, Ch 6): the moral parliament from L7 reaches for what the institutional cooperation mechanism formalizes. L8 supplies the formal mechanism vocabulary.
- L9 (governance, Ch 8): the next lesson. Institutional cooperation mechanisms (L8) must themselves be designed and governed, which is L9’s content. L9 closes the track.