Cheatsheet: safety engineering for AI systems

| Reliability | Nines | Mean operations between failures |
|---|---|---|
| 90 percent | 1 | 10 |
| 99 percent | 2 | 100 |
| 99.9 percent | 3 | 1,000 |
| 99.99 percent | 4 | 10,000 |
| 99.999 percent | 5 | 100,000 |
| 99.9999 percent | 6 | 1,000,000 |

Formula: k = -log10(1 - p) where p is reliability probability and k is nines.

Key property: each additional nine multiplies the expected number of operations between failures by 10. The marginal cost of safety work grows with each nine, while the marginal payoff stays constant on a log scale: a tenfold increase in mean operations between failures per nine.
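
The formula and the table above can be checked with a few lines of Python. This is a minimal sketch; the function names `nines` and `mean_ops_between_failures` are illustrative, and the second function assumes each operation fails independently with probability 1 - p, so the expected number of operations per failure is 1 / (1 - p).

```python
import math


def nines(p: float) -> float:
    """Return k = -log10(1 - p) for a reliability p with 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log10(1.0 - p)


def mean_ops_between_failures(p: float) -> float:
    """Return 1 / (1 - p), assuming independent failures per operation."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    for p in (0.9, 0.99, 0.999, 0.9999):
        k = nines(p)
        ops = mean_ops_between_failures(p)
        print(f"p={p}: {k:.2f} nines, about {ops:,.0f} operations between failures")
```

Each step down the loop adds one nine and multiplies the operations figure by 10, which is the key property stated above.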

| Principle | Outside-AI example | AI translation |
|---|---|---|
| Redundancy | Many suspension-bridge cables; few failures do not cause collapse | Multiple independent monitoring systems; no single system being the only watcher for a failure class |
| Separation of duties | Cockpit crew protocols + backup entry codes | Train-deploy-audit teams are different teams |
| Least privilege | Locked cockpit doors | Agent tools and data access scoped to current task only |
| Fail-safe defaults | Electrical fuses melt on overcurrent | Default to refusal / human-handoff when monitoring goes dark or context becomes incoherent |
| Defense in depth | Boat travel: vessel check + swimming + lifejacket | Alignment + filtering + eval + deployment monitoring as different slices |
| Antifragility | Post-accident investigations reshape protocols | Near-miss culture; every caught failure feeds training, eval, or monitoring |
| Negative feedback | Safe following distance gives deceleration buffer | Automated rollback on metric-drift past defined thresholds |
| Transparency | Crew knowledge of cockpit-entry procedures | Published model cards: capability, failure modes, training data composition, known limits |
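
Two of the AI translations above, fail-safe defaults and negative feedback, are concrete enough to sketch as a decision rule. The following is a hypothetical illustration, not a reference design: the `Health` fields, the `Action` names, and the `drift_threshold` default of 0.05 are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    HUMAN_HANDOFF = "human_handoff"
    ROLLBACK = "rollback"


@dataclass
class Health:
    """Hypothetical health snapshot for a deployed agent."""
    monitoring_alive: bool   # is the monitoring layer still reporting?
    context_coherent: bool   # does the agent's context pass a sanity check?
    metric_drift: float      # absolute drift from baseline (assumed units)


def choose_action(health: Health, drift_threshold: float = 0.05) -> Action:
    # Fail-safe default: without positive evidence that monitoring works
    # and the context is coherent, stop and hand off to a human.
    if not health.monitoring_alive or not health.context_coherent:
        return Action.HUMAN_HANDOFF
    # Negative feedback: drift past the defined threshold triggers rollback.
    if health.metric_drift > drift_threshold:
        return Action.ROLLBACK
    return Action.PROCEED


if __name__ == "__main__":
    print(choose_action(Health(True, True, 0.01)))
    print(choose_action(Health(False, True, 0.01)))
    print(choose_action(Health(True, True, 0.20)))
```

The ordering is the design choice worth noting: the handoff check runs first, so a dark monitoring layer can never be mistaken for a healthy metric.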

Stacking N independent layers each with reliability p produces composed reliability 1 - (1-p)^N:

| Per-layer reliability | 1 layer | 2 layers | 3 layers | 4 layers |
|---|---|---|---|---|
| 90 percent | 90% (1 nine) | 99% (2 nines) | 99.9% (3 nines) | 99.99% (4 nines) |
| 95 percent | 95% (1.3 nines) | 99.75% (2.6 nines) | 99.9875% (3.9 nines) | 99.999375% (5.2 nines) |
| 99 percent | 99% (2 nines) | 99.99% (4 nines) | 99.9999% (6 nines) | 99.999999% (8 nines) |

Independence is non-negotiable; layers whose holes are correlated do not compose this way. Diversity (different teams, methods, signals) matters more than depth in any one defense.
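
A short sketch makes the independence caveat concrete. `composed_reliability` implements the rule 1 - (1-p)^N from above. `composed_reliability_common_mode` is an illustrative assumption, not a model from the chapter: it supposes a fraction `rho` of each layer's misses comes from one shared cause that defeats every layer at once, with the remaining misses independent.

```python
def composed_reliability(p: float, n: int) -> float:
    """Reliability of n independent layers, each with reliability p."""
    return 1.0 - (1.0 - p) ** n


def composed_reliability_common_mode(p: float, n: int, rho: float) -> float:
    """Same stack, but a fraction rho of each layer's misses is shared.

    Assumed toy model: the shared cause occurs with probability rho * (1 - p)
    and defeats all layers; otherwise layers miss independently, with the
    per-layer miss rate adjusted so each layer's overall reliability stays p.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    miss = 1.0 - p
    shared = rho * miss
    independent_miss = (miss - shared) / (1.0 - shared)
    return 1.0 - (shared + (1.0 - shared) * independent_miss ** n)


if __name__ == "__main__":
    p, n = 0.9, 3
    print(f"independent: {composed_reliability(p, n):.6f}")
    for rho in (0.1, 0.5, 1.0):
        r = composed_reliability_common_mode(p, n, rho)
        print(f"rho={rho}: {r:.6f}")
```

Under this toy model, fully shared holes (`rho = 1.0`) collapse the stack back to single-layer reliability, no matter how many layers are added.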

| Property | Thin-tailed | Long-tailed |
|---|---|---|
| Impact distribution | Many small events; total proportional to count | Rare catastrophic events dominate total |
| Largest plausible event size | Bounded | Roughly unbounded |
| Chapter example | Shark attacks | Wildfires |
| Right analysis tool | Mean expected loss | Tail-scenario planning, worst-plausible-outcome |
| AI-deployment shape | Spam filtering, routine boilerplate code suggestions | Autonomous-vehicle decisions, clinical diagnostics, content-recommendation at population scale |

The classification matters because the analysis tools differ. Long-tailed risks need explicit tail planning; thin-tailed risks can be analyzed by mean accuracy.
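
One way to see why the tools differ is to measure how much of the total impact comes from the single largest event. The sketch below is illustrative only: the exponential distribution stands in for a thin-tailed risk and a Pareto distribution with shape 1.1 for a long-tailed one. Both choices, the sample size, and the seed are assumptions, not figures from the chapter.

```python
import random


def largest_event_share(samples: list[float]) -> float:
    """Fraction of the total impact contributed by the single largest event."""
    return max(samples) / sum(samples)


def simulate(n: int = 10_000, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    thin = [rng.expovariate(1.0) for _ in range(n)]    # thin-tailed stand-in
    heavy = [rng.paretovariate(1.1) for _ in range(n)]  # long-tailed stand-in
    return largest_event_share(thin), largest_event_share(heavy)


if __name__ == "__main__":
    thin_share, heavy_share = simulate()
    print(f"thin-tailed: largest event is {thin_share:.4%} of the total")
    print(f"long-tailed: largest event is {heavy_share:.4%} of the total")
```

In the thin-tailed sample the largest event is a negligible share of the total, so the mean summarizes the risk well. In the long-tailed sample a single event can carry a visible share of the total, which is why the mean alone misleads.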

| Type | Foreseen in kind? | Example | Recommended response |
|---|---|---|---|
| Tail event | Yes, the event class is known | Wildfires, aviation accidents | Plan for the worst plausible instance, even when probability estimates are unreliable |
| Black swan | No, the event class is not catalogued | Novel AI failure mode not yet appearing in literature | Build systems whose response to surprise is structured (antifragility principle) |

The chapter does not recommend predicting black swans; predicting an uncatalogued event class is a contradiction. It recommends design-stage decisions (the antifragility principle) that make systems more resilient to surprise.

For a deployment decision you care about:

  1. Pick one tool from Ch 4. Defense in depth, nines, FMEA, separation of duties, least privilege, fail-safe defaults, antifragility, transparency, or any other Ch 4 instrument.
  2. Name the deployment decision. Specific: “whether to ship a coding-assistant agent with autonomous git-push capability,” not “whether to ship a coding assistant.”
  3. State the constraint. A ship/do-not-ship or build/do-not-build criterion the tool produces. “Do not ship without N additional layers, where each layer catches at least X percent of the failure mode.”
  4. Defend the constraint. Reference the tool’s logic. Defense in depth → Swiss-cheese composition. Nines → log-scale marginal-cost reasoning. Fail-safe defaults → cost of incoherent default-state.
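
The example constraint in step 3 ("N additional layers, where each layer catches at least X percent") can be turned into a number. The sketch below is a hedged illustration: `layers_needed` is a hypothetical helper, and it assumes the layers are independent, which is the same assumption the Swiss-cheese composition rule relies on.

```python
def layers_needed(catch_rate: float, max_residual: float) -> int:
    """Smallest N with (1 - catch_rate)^N <= max_residual.

    catch_rate:   fraction of the failure mode each layer catches (0 < x < 1)
    max_residual: tolerated probability that a failure slips past all layers
    Assumes independent layers; correlated layers need more than this.
    """
    if not 0.0 < catch_rate < 1.0:
        raise ValueError("catch_rate must be strictly between 0 and 1")
    if not 0.0 < max_residual < 1.0:
        raise ValueError("max_residual must be strictly between 0 and 1")
    miss = 1.0 - catch_rate
    residual = miss
    n = 1
    # Small relative tolerance so floating-point rounding does not add a layer.
    while residual > max_residual * (1.0 + 1e-9):
        residual *= miss
        n += 1
    return n


if __name__ == "__main__":
    # Target: four nines, i.e. at most 1 in 10,000 failures slips through.
    for catch in (0.90, 0.95, 0.99):
        print(f"catch rate {catch:.0%}: {layers_needed(catch, 1e-4)} layers")
```

A constraint derived this way is a lower bound: if the layers share holes, the real requirement is higher.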

| When the question is | The right Ch 4 tool is usually |
|---|---|
| “How safe is safe enough?” | Nines of reliability |
| “What layers should we stack?” | Defense in depth + Swiss-cheese |
| “What should the agent do when it does not know?” | Fail-safe defaults |
| “Who watches the watchers?” | Separation of duties |
| “What capabilities should the deployed system have access to?” | Least privilege |
| “How do we learn from incidents?” | Antifragility (with negative feedback and transparency) |
| “What is the worst plausible outcome?” | Tail events (long-tailed risk planning) |
| “How do we prepare for what we have not catalogued?” | Black-swan structured response (antifragility) |

  • L3 (monitoring + robustness): the slices L5 composes via Swiss-cheese. L3 said “imperfect layers compose”; L5 names the composition rule formally and gives the arithmetic.
  • L4 (alignment): the slice with the biggest holes; L5’s defense-in-depth depends on other slices doing more work because alignment’s tools are partial.
  • L6 (complex systems, Ch 5): the next chapter; explains why correct components compose into incorrect systems. L5 is the bridge from L4’s substrate-thinking to L6’s system-thinking.
  • L9 (governance): another slice at a different layer entirely (corporate / national / international / compute). L5’s principles apply at every layer; governance applies them at the policy layer.