Cheatsheet: safety engineering for AI systems
Nines of reliability (Ch 4.3)
| Reliability | Nines | Mean operations between failures |
|---|---|---|
| 90 percent | 1 | 10 |
| 99 percent | 2 | 100 |
| 99.9 percent | 3 | 1,000 |
| 99.99 percent | 4 | 10,000 |
| 99.999 percent | 5 | 100,000 |
| 99.9999 percent | 6 | 1,000,000 |
Formula: k = -log10(1 - p), where p is the reliability (the probability that an operation succeeds) and k is the number of nines.
Key property: each additional nine multiplies the expected number of operations between failures by 10. The cost of each further nine tends to grow, while the payoff per nine stays fixed at a tenfold gain in mean operations between failures.
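The formula and the table can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the mean-operations figure assumes independent operations with a constant failure probability of 1 - p, which is the reading the table suggests.

```python
import math


def nines(p: float) -> float:
    """Number of nines k = -log10(1 - p) for a reliability p with 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log10(1.0 - p)


def mean_ops_between_failures(p: float) -> float:
    """Expected operations per failure, 1 / (1 - p).

    Assumes each operation fails independently with probability 1 - p.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1.0 / (1.0 - p)


# Reproduce the first rows of the table (values are approximate floats).
for p in (0.9, 0.99, 0.999, 0.9999):
    print(f"p={p}: {nines(p):.2f} nines, "
          f"about {mean_ops_between_failures(p):,.0f} operations between failures")
```

Non-integer nines fall out of the same formula, so a 95 percent reliable component has roughly 1.3 nines.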
The eight safe-design principles (Ch 4.4)
| Principle | Outside-AI example | AI translation |
|---|---|---|
| Redundancy | Many suspension-bridge cables; few failures do not cause collapse | Multiple independent monitoring systems; no single system being the only watcher for a failure class |
| Separation of duties | Cockpit crew protocols + backup entry codes | Train-deploy-audit teams are different teams |
| Least privilege | Locked cockpit doors | Agent tools and data access scoped to current task only |
| Fail-safe defaults | Electrical fuses melt on overcurrent | Default to refusal / human-handoff when monitoring goes dark or context becomes incoherent |
| Defense in depth | Boat travel: vessel check + swimming + lifejacket | Alignment + filtering + eval + deployment monitoring as different slices |
| Antifragility | Post-accident investigations reshape protocols | Near-miss culture; every caught failure feeds training, eval, or monitoring |
| Negative feedback | Safe following distance gives deceleration buffer | Automated rollback on metric-drift past defined thresholds |
| Transparency | Crew knowledge of cockpit-entry procedures | Published model cards: capability, failure modes, training data composition, known limits |
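As one illustration of the table, the fail-safe-defaults row can be sketched in code. Everything here is hypothetical and not taken from the chapter: the `MonitorStatus` fields, the `Action` names, and the 5-second heartbeat threshold are assumptions chosen only to show the shape of the rule, which is that the system proceeds only when every check affirmatively passes and falls back to human handoff otherwise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    PROCEED = "proceed"
    HAND_OFF = "hand_off_to_human"  # the safe default


@dataclass(frozen=True)
class MonitorStatus:
    """Hypothetical snapshot of the monitoring signals for one request."""
    heartbeat_age_s: Optional[float]   # None: no heartbeat received at all
    context_coherent: Optional[bool]   # None: the coherence check did not run


def decide(status: MonitorStatus, max_heartbeat_age_s: float = 5.0) -> Action:
    """Return PROCEED only on positive evidence; unknown states fail safe."""
    # Monitoring has gone dark: heartbeat missing or stale.
    if status.heartbeat_age_s is None or status.heartbeat_age_s > max_heartbeat_age_s:
        return Action.HAND_OFF
    # Context is incoherent, or the check never produced a result.
    if status.context_coherent is not True:
        return Action.HAND_OFF
    return Action.PROCEED
```

The design choice worth noting is that a missing signal is treated the same as a failing one, so an outage in the monitor cannot silently widen what the agent is allowed to do.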
Swiss-cheese composition rule
Stacking N independent layers, each with reliability p, produces composed reliability 1 - (1-p)^N:
| Per-layer reliability | 1 layer | 2 layers | 3 layers | 4 layers |
|---|---|---|---|---|
| 90 percent | 90% (1 nine) | 99% (2 nines) | 99.9% (3 nines) | 99.99% (4 nines) |
| 95 percent | 95% (1.3 nines) | 99.75% (2.6 nines) | 99.9875% (3.9 nines) | 99.999375% (5.2 nines) |
| 99 percent | 99% (2 nines) | 99.99% (4 nines) | 99.9999% (6 nines) | 99.999999% (8 nines) |
Independence is non-negotiable; layers whose holes are correlated do not compose this way. Diversity (different teams, methods, signals) matters more than depth in any one defense.
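The composition rule, and the way correlation undermines it, can be sketched as follows. The independent case implements 1 - (1-p)^N directly. The correlated case is an assumption added for illustration and is not from the chapter: it uses a simple common-cause approximation (often called a beta-factor model in reliability engineering), in which a fraction `beta` of each layer's misses is shared by all layers.

```python
import math


def composed_reliability(p: float, n: int) -> float:
    """Reliability of n independent layers, each with reliability p."""
    return 1.0 - (1.0 - p) ** n


def composed_reliability_common_cause(p: float, n: int, beta: float) -> float:
    """Illustrative common-cause approximation (an assumption, not the chapter's rule).

    A fraction beta of each layer's miss probability q = 1 - p is shared by
    every layer; the remaining (1 - beta) * q is independent per layer.
    beta = 0 recovers the independent rule; beta = 1 gives a single layer.
    """
    q = 1.0 - p
    miss = beta * q + ((1.0 - beta) * q) ** n
    return 1.0 - miss


def nines(r: float) -> float:
    """Number of nines for a reliability r with 0 <= r < 1."""
    return -math.log10(1.0 - r)


# Three 99 percent layers: independent versus 10 percent shared holes.
independent = composed_reliability(0.99, 3)
correlated = composed_reliability_common_cause(0.99, 3, beta=0.1)
print(f"independent: {nines(independent):.1f} nines")
print(f"10% common cause: {nines(correlated):.1f} nines")
```

Under these assumptions, even a small shared fraction dominates the result: the shared term `beta * q` does not shrink as layers are added, which is why diversity across layers matters more than adding depth to one defense.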
Thin-tailed vs long-tailed
| Property | Thin-tailed | Long-tailed |
|---|---|---|
| Impact distribution | Many small events; total proportional to count | Rare catastrophic events dominate total |
| Largest plausible event size | Bounded | Roughly unbounded |
| Chapter example | Shark attacks | Wildfires |
| Right analysis tool | Mean expected loss | Tail-scenario planning, worst-plausible-outcome |
| AI-deployment shape | Spam filtering, routine boilerplate code suggestions | Autonomous-vehicle decisions, clinical diagnostics, content-recommendation at population scale |
The classification matters because the analysis tools differ. Long-tailed risks need explicit tail planning; thin-tailed risks can be analyzed by mean accuracy.
Black swans vs tail events
| Type | Foreseen in kind? | Example | Recommended response |
|---|---|---|---|
| Tail event | Yes, the event class is known | Wildfires, aviation accidents | Plan for the worst plausible instance, even when probability estimates are unreliable |
| Black swan | No, the event class is not catalogued | Novel AI failure mode not yet appearing in literature | Build systems whose response to surprise is structured (antifragility principle) |
The chapter does not recommend predicting black swans; predicting an uncatalogued event class is a contradiction. It recommends design-stage decisions (the antifragility principle) that make systems more resilient to surprise.
The L5 capability (four-part protocol)
For a deployment decision you care about:
- Pick one tool from Ch 4. Defense in depth, nines, FMEA, separation of duties, least privilege, fail-safe defaults, antifragility, transparency, or any other Ch 4 instrument.
- Name the deployment decision. Specific: “whether to ship a coding-assistant agent with autonomous git-push capability,” not “whether to ship a coding assistant.”
- State the constraint. A ship/do-not-ship or build/do-not-build criterion the tool produces. “Do not ship without N additional layers, where each layer catches at least X percent of the failure mode.”
- Defend the constraint. Reference the tool’s logic. Defense in depth → Swiss-cheese composition. Nines → log-scale marginal-cost reasoning. Fail-safe defaults → cost of incoherent default-state.
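A constraint of the form "do not ship without N layers, each catching at least X percent" can be derived from the Swiss-cheese rule. The sketch below finds the smallest N that reaches a target reliability. The function name, the `max_layers` cap, and the small relative tolerance (to absorb floating-point rounding) are illustrative assumptions, and the result holds only if the layers are independent.

```python
def layers_needed(per_layer_catch: float,
                  target_reliability: float,
                  max_layers: int = 64) -> int:
    """Smallest N with 1 - (1 - per_layer_catch)**N >= target_reliability.

    Assumes independent layers. Raises ValueError if the target is not
    reached within max_layers.
    """
    if not 0.0 < per_layer_catch < 1.0:
        raise ValueError("per_layer_catch must be strictly between 0 and 1")
    if not 0.0 <= target_reliability < 1.0:
        raise ValueError("target_reliability must satisfy 0 <= target < 1")

    per_layer_miss = 1.0 - per_layer_catch
    target_miss = 1.0 - target_reliability
    residual_miss = 1.0  # probability a failure slips past zero layers

    for n in range(max_layers + 1):
        # Relative tolerance guards against rounding at exact boundaries.
        if residual_miss <= target_miss * (1.0 + 1e-9):
            return n
        residual_miss *= per_layer_miss
    raise ValueError("target not reachable within max_layers")


# Example: layers that each catch 90 percent, target of four nines.
print(layers_needed(0.90, 0.9999))
```

The returned N is a lower bound for the constraint: any correlation between layers means more layers, or more diverse ones, are needed to defend the same target.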
Quick disambiguation cheatsheet
| When the question is | The right Ch 4 tool is usually |
|---|---|
| “How safe is safe enough?” | Nines of reliability |
| “What layers should we stack?” | Defense in depth + Swiss-cheese |
| “What should the agent do when it does not know?” | Fail-safe defaults |
| “Who watches the watchers?” | Separation of duties |
| “What capabilities should the deployed system have access to?” | Least privilege |
| “How do we learn from incidents?” | Antifragility (with negative feedback and transparency) |
| “What is the worst plausible outcome?” | Tail events (long-tailed risk planning) |
| “How do we prepare for what we have not catalogued?” | Black-swan structured response (antifragility) |
Cross-track and within-track pointers
- L3 (monitoring + robustness): the slices L5 composes via Swiss-cheese. L3 said “imperfect layers compose”; L5 names the composition rule formally and gives the arithmetic.
- L4 (alignment): the slice with the biggest holes; L5’s defense-in-depth depends on other slices doing more work because alignment’s tools are partial.
- L6 (complex systems, Ch 5): the next chapter; explains why correct components compose into incorrect systems. L5 is the bridge from L4’s substrate-thinking to L6’s system-thinking.
- L9 (governance): another slice at a different layer entirely (corporate / national / international / compute). L5’s principles apply at every layer; governance applies them at the policy layer.