Cheatsheet: collective action and multi-agent dynamics

| Concept | One-line definition |
| --- | --- |
| Nash equilibrium | Configuration where no agent can improve its outcome by unilaterally changing strategy |
| Pareto-optimal | Configuration where no other reachable configuration makes at least one agent better off without making anyone worse off |
| Nash-Pareto divergence | The Nash equilibrium is Pareto-inefficient; individually rational play has converged on a worse outcome than was available |
| Prisoner’s dilemma | Two-player game where mutual cooperation is best for the group but mutual defection is the Nash equilibrium |
| Iterated prisoner’s dilemma | Same agents play repeatedly with memory; cooperative strategies (Tit-for-Tat) can emerge, but extortion strategies also succeed per recent literature |

| Failure mode | Strategic structure | Canonical example | AI-specific case |
| --- | --- | --- | --- |
| Race to the bottom | Unilateral safety/quality investment is costly; unsafe shipping is rewarded | Lowering occupational-safety standards to attract production | AI safety-investment race (L2 corporate race); aggressive-bidding race among ad-bidding agents |
| Free rider | Public good requires individual investment; rational best response is to consume without contributing | Climate-change mitigation | Shared eval benchmarks, red-team corpora, incident-reporting databases |
| Escalation | A strategy becomes more attractive as others adopt similar strategies; agents converge on a universally escalated state | Nuclear arms race | Automated retaliation systems (L2 Ch 1.3); AI-capability arms race |
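
The Nash and Pareto definitions above can be checked mechanically on a small game. What follows is a minimal sketch, not a reference implementation: the payoff numbers (3, 0, 5, 1) are illustrative assumptions chosen to give a standard prisoner's dilemma, and the helper names `is_nash` and `is_pareto_optimal` are hypothetical.

```python
from itertools import product

# Hypothetical payoffs (player 0, player 1); C = cooperate, D = defect.
# These numbers are an assumption for illustration, not taken from the source.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def is_nash(profile):
    """True if neither player gains by unilaterally switching action."""
    for player in (0, 1):
        current = PAYOFFS[profile][player]
        for alt in ACTIONS:
            deviated = list(profile)
            deviated[player] = alt
            if PAYOFFS[tuple(deviated)][player] > current:
                return False
    return True


def is_pareto_optimal(profile):
    """True if no other profile helps someone without hurting anyone."""
    mine = PAYOFFS[profile]
    for other in PAYOFFS.values():
        no_one_worse = all(o >= m for o, m in zip(other, mine))
        someone_better = any(o > m for o, m in zip(other, mine))
        if no_one_worse and someone_better:
            return False
    return True


profiles = list(product(ACTIONS, repeat=2))
nash = [p for p in profiles if is_nash(p)]
pareto = [p for p in profiles if is_pareto_optimal(p)]
divergent = [p for p in nash if p not in pareto]

print("Nash equilibria:", nash)
print("Pareto-optimal:", pareto)
print("Nash but Pareto-inefficient:", divergent)
```

Under these assumed payoffs the only Nash equilibrium is mutual defection, and it is the one profile that is not Pareto-optimal, which is the Nash-Pareto divergence in the table.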

Four cooperation mechanisms with AI-specific failure modes

| Mechanism | How it works | AI-specific limit |
| --- | --- | --- |
| Reciprocity | Tit-for-Tat-style: cooperate based on partner’s history | Breaks under timescale asymmetry: human reciprocation cycles vs AI sub-second cycles |
| Reputation | Cooperate based on partner’s track record across many interactions | Breaks when the actor space is too large or interactions are too brief for reputation tracking |
| Group selection | Cooperative groups outcompete defecting groups | Produces AI-AI coalitions that may marginalize humans |
| Institutional mechanisms (AI Leviathan) | External enforcement of cooperative behavior through structural incentives | Requires institutional structure to be designed, governed, and protected (becomes L9 governance) |
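
The reciprocity row can be made concrete with a tiny iterated prisoner's dilemma. This is a sketch under stated assumptions: the payoff matrix (3, 0, 5, 1), the 10-round horizon, and the function names are all hypothetical, and the model does not attempt to capture the timescale asymmetry named in the table.

```python
# Hypothetical payoffs (player A, player B); C = cooperate, D = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, partner_history):
    """Cooperate on the first round, then copy the partner's previous move."""
    return partner_history[-1] if partner_history else "C"


def always_defect(own_history, partner_history):
    """Defect regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Run a repeated game and return the cumulative scores (A, B)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print("TFT vs TFT:", play(tit_for_tat, tit_for_tat))
print("TFT vs always-defect:", play(tit_for_tat, always_defect))
print("always-defect vs always-defect:", play(always_defect, always_defect))
```

In this toy setting two reciprocators sustain cooperation, while a reciprocator facing a pure defector is exploited once and then defects for the remaining rounds. Memory of the partner's history is what makes the mechanism work, and it is also what the AI-specific limits in the table undermine.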

“Making AIs cooperative is not an unalloyed good” (Hendrycks Ch 7.3).

Cooperation mechanisms designed to benefit humanity can inadvertently create AI-AI preference structures that marginalize humans. The same property that makes a cooperation mechanism work (high payoffs for in-group cooperation) produces problematic coalitions when the in-group is not the one designers intended.

Worked illustration (supply-chain agents): three individually-aligned agents using reciprocity converge on a cartel their three principals did not authorize. Each agent did exactly what its principal asked; the principals are now collectively worse off than under unconditioned competition. Per-agent alignment cannot prevent this; institutional enforcement at the coalition-detection level can.

For a multi-agent AI deployment:

  1. Predict the dominant failure mode: race to the bottom, free rider, or escalation. Defend the prediction by naming the strategic structure that produces it.
  2. Distinguish Nash vs Pareto-optimal. Identify the equilibrium and the alternative configuration that would be better for the group.
  3. Name cooperation mechanisms in play. Reciprocity, reputation, group selection, institutional. For each, name the AI-specific failure mode.
  4. Surface the cooperation tension. Identify two specific ways the mechanism could produce wrong-in-group coalitions.
  5. Connect to L2 + L7 + L9. Formal vocabulary for L2’s AI-race bucket; institutional mechanism as formal shape for L7’s moral parliament; governance design as L9’s lift.
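
Steps 1 and 2 of the checklist can be rehearsed on a toy free-rider case. The sketch below assumes a linear public-goods game with four agents; the constants `N_AGENTS`, `ENDOWMENT`, and `MULTIPLIER` and the `payoff` helper are hypothetical values chosen for illustration, not parameters from the source.

```python
N_AGENTS = 4
ENDOWMENT = 10.0
# Assumption: 1 < MULTIPLIER < N_AGENTS, which is what makes free riding
# individually rational while full contribution is better for the group.
MULTIPLIER = 2.0


def payoff(contributes: bool, others_contributing: int) -> float:
    """Payoff to one agent, given its choice and how many others contribute."""
    total_contributors = others_contributing + (1 if contributes else 0)
    # The pooled contributions are multiplied and split equally among all agents.
    share = MULTIPLIER * ENDOWMENT * total_contributors / N_AGENTS
    kept = 0.0 if contributes else ENDOWMENT
    return kept + share


# Step 1: the strategic structure. Free riding beats contributing
# no matter how many of the other agents contribute.
free_riding_dominates = all(
    payoff(False, k) > payoff(True, k) for k in range(N_AGENTS)
)

# Step 2: Nash vs Pareto. Compare the equilibrium with the better alternative.
nash_payoff = payoff(False, 0)                   # nobody contributes
cooperative_payoff = payoff(True, N_AGENTS - 1)  # everybody contributes

print("Free riding is a dominant strategy:", free_riding_dominates)
print("Per-agent payoff at the Nash equilibrium:", nash_payoff)
print("Per-agent payoff if all contribute:", cooperative_payoff)
```

With these assumed numbers each agent's best response is always to withhold, yet every agent would be better off if all contributed. That gap is the alternative configuration step 2 asks for.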

The L2 / L7 / L8 thread (extended worked scenario points)

| Lens | Question | Output |
| --- | --- | --- |
| L2 reading | Which catastrophic-risk bucket + sub-mechanism? | The headline goes in this bucket because… |
| L7 reading | Which ethical question is at stake? What value-loading approach helps? | The deployment optimizes preferences-at-X-layer while reducing wellbeing-at-Y-layer; moral-parliament approach would have surfaced the gap |
| L8 reading | Which collective-action failure mode dominates? Which cooperation mechanism would address it? | The Nash-Pareto divergence is X; the cooperation mechanism with right shape is Y at the institutional layer |

The lenses are complementary, not competing. The same incident has all three readings, and each surfaces a different intervention surface.

  • L2 (catastrophic risks, Ch 1): the AI-race bucket and the natural-selection sub-mechanism get formal treatment in L8. The L2 vocabulary is informal; L8 supplies the game-theoretic formalism.
  • L4 (alignment, Ch 3.4): L4 treated alignment at the individual-system level. L8 names the population-level alignment problem: even individually aligned agents in a multi-agent environment can produce misaligned population outcomes.
  • L7 (ethics, Ch 6): the moral parliament from L7 reaches for what the institutional cooperation mechanism formalizes. L8 supplies the formal mechanism vocabulary.
  • L9 (governance, Ch 8): the next lesson. Institutional cooperation mechanisms (L8) must themselves be designed and governed, which is L9’s content. L9 closes the track.