Safety engineering: cheatsheet

Nines of reliability (Ch 4.3)

Reliability	Nines	Mean operations between failures
90 percent	1	10
99 percent	2	100
99.9 percent	3	1,000
99.99 percent	4	10,000
99.999 percent	5	100,000
99.9999 percent	6	1,000,000

Formula: k = -log10(1 - p) where p is reliability probability and k is nines.

Key property: each additional nine multiplies the expected number of operations between failures by 10. The marginal cost of safety work grows with each nine, while the marginal payoff stays a constant tenfold gain in mean operations between failures.
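A minimal Python sketch of the formula and the key property above. The function names are illustrative, not from the chapter, and the 1 / (1 - p) relation is read off the table (it assumes each operation fails independently with probability 1 - p).

```python
import math


def nines(p: float) -> float:
    """Number of nines k for reliability p, using k = -log10(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log10(1.0 - p)


def mean_ops_between_failures(p: float) -> float:
    """Expected operations per failure, 1 / (1 - p), assuming independent operations."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    for p in (0.9, 0.99, 0.999, 0.9999):
        print(f"p={p}: {nines(p):.1f} nines, "
              f"about {mean_ops_between_failures(p):,.0f} operations per failure")
```

Going from 0.99 to 0.999 adds one nine and multiplies the operations-per-failure figure by ten, which is the tenfold step shown in the table.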

The eight safe-design principles (Ch 4.4)

Principle	Outside-AI example	AI translation
Redundancy	Many suspension-bridge cables; few failures do not cause collapse	Multiple independent monitoring systems; no single system being the only watcher for a failure class
Separation of duties	Cockpit crew protocols + backup entry codes	Training, deployment, and audit are handled by different teams
Least privilege	Locked cockpit doors	Agent tools and data access scoped to current task only
Fail-safe defaults	Electrical fuses melt on overcurrent	Default to refusal / human-handoff when monitoring goes dark or context becomes incoherent
Defense in depth	Boat travel: vessel check + swimming + lifejacket	Alignment + filtering + eval + deployment monitoring as different slices
Antifragility	Post-accident investigations reshape protocols	Near-miss culture; every caught failure feeds training, eval, or monitoring
Negative feedback	Safe following distance gives deceleration buffer	Automated rollback on metric-drift past defined thresholds
Transparency	Crew knowledge of cockpit-entry procedures	Published model cards: capability, failure modes, training data composition, known limits
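The fail-safe defaults and negative feedback rows can be sketched as a small decision function. Everything here is hypothetical: the Action names, the choose_action signature, the drift metric, and the choice of which safe state goes with which condition are assumptions made for illustration, not part of the chapter.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"
    REFUSE = "refuse"
    HUMAN_HANDOFF = "human_handoff"
    ROLLBACK = "rollback"


def choose_action(monitor_alive: bool,
                  context_coherent: bool,
                  metric_drift: float,
                  drift_threshold: float) -> Action:
    """Return PROCEED only when every safety precondition holds."""
    # Fail-safe default: if monitoring has gone dark, nobody can see a
    # failure, so hand off to a human instead of acting (assumed mapping).
    if not monitor_alive:
        return Action.HUMAN_HANDOFF
    # Fail-safe default: an incoherent context means the agent cannot
    # judge its own action, so it refuses (assumed mapping).
    if not context_coherent:
        return Action.REFUSE
    # Negative feedback: drift past the defined threshold triggers rollback.
    if metric_drift > drift_threshold:
        return Action.ROLLBACK
    return Action.PROCEED


if __name__ == "__main__":
    print(choose_action(True, True, metric_drift=0.02, drift_threshold=0.05))
    print(choose_action(False, True, metric_drift=0.02, drift_threshold=0.05))
```

The ordering is the design choice worth noticing: the unsafe conditions are checked first, so proceeding is the exception that must be earned, not the default.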

Swiss-cheese composition rule

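Stacking N independent layers, each with reliability p (the probability that the layer catches a given failure), produces composed reliability 1 - (1-p)^N, because a failure gets through only when every layer misses it: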

Per-layer reliability	1 layer	2 layers	3 layers	4 layers
90 percent	90% (1 nine)	99% (2 nines)	99.9% (3 nines)	99.99% (4 nines)
95 percent	95% (1.3 nines)	99.75% (2.6 nines)	99.9875% (3.9 nines)	99.999375% (5.2 nines)
99 percent	99% (2 nines)	99.99% (4 nines)	99.9999% (6 nines)	99.999999% (8 nines)

Independence is non-negotiable; layers whose holes are correlated do not compose this way. Diversity (different teams, methods, signals) matters more than depth in any one defense.
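A short Python sketch of the composition rule, plus one way to see why correlation hurts. The common-cause model in the second function is an assumption introduced here for illustration (with probability rho all layers share the same hole and act as a single layer); it is not the chapter's model, and the function names are hypothetical.

```python
import math


def composed_reliability(p: float, n: int) -> float:
    """Reliability of n independent layers, each catching with probability p."""
    return 1.0 - (1.0 - p) ** n


def composed_reliability_common_cause(p: float, n: int, rho: float) -> float:
    """Assumed toy model: with probability rho the layers' holes line up
    (the stack behaves like one layer); otherwise they are independent."""
    miss = rho * (1.0 - p) + (1.0 - rho) * (1.0 - p) ** n
    return 1.0 - miss


def nines(p: float) -> float:
    return -math.log10(1.0 - p)


if __name__ == "__main__":
    for p in (0.90, 0.95, 0.99):
        row = [composed_reliability(p, n) for n in (1, 2, 3, 4)]
        print(p, [f"{r:.8f} ({nines(r):.1f} nines)" for r in row])
    correlated = composed_reliability_common_cause(0.90, 4, rho=0.1)
    print(f"4 layers at 90%, rho=0.1: {nines(correlated):.1f} nines")
```

Under this toy model, four 90 percent layers with only a 10 percent chance of shared holes land just under 2 nines instead of 4, which is why diversity across layers matters more than adding depth to one defense.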

Thin-tailed vs long-tailed

Property	Thin-tailed	Long-tailed
Impact distribution	Many small events; total proportional to count	Rare catastrophic events dominate total
Largest plausible event size	Bounded	Roughly unbounded
Chapter example	Shark attacks	Wildfires
Right analysis tool	Mean expected loss	Tail-scenario planning, worst-plausible-outcome
AI-deployment shape	Spam filtering, routine boilerplate code suggestions	Autonomous-vehicle decisions, clinical diagnostics, content-recommendation at population scale

The classification matters because the analysis tools differ. Long-tailed risks need explicit tail planning; thin-tailed risks can be analyzed by mean expected loss.
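The difference can be made concrete with a small simulation. This is an illustrative sketch only: the exponential and Pareto distributions, the shape parameter 1.1, the sample size, and the seed are all assumptions chosen here to stand in for thin-tailed and long-tailed losses, not data from the chapter.

```python
import random


def top_share(losses, top_fraction=0.01):
    """Fraction of the total loss contributed by the largest events."""
    ordered = sorted(losses, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# Thin-tailed stand-in: exponential losses, many small events.
thin = [rng.expovariate(1.0) for _ in range(n)]
# Long-tailed stand-in: Pareto losses with a heavy tail (shape 1.1).
long_tailed = [rng.paretovariate(1.1) for _ in range(n)]

if __name__ == "__main__":
    print(f"thin-tailed: top 1% of events carry {top_share(thin):.1%} of total loss")
    print(f"long-tailed: top 1% of events carry {top_share(long_tailed):.1%} of total loss")
```

In the thin-tailed sample the largest events carry a small slice of the total, so the mean summarizes the risk well; in the long-tailed sample a handful of events carry a large part of it, so the mean says little about what planning has to survive.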

Black swans vs tail events

Type	Foreseen in kind?	Example	Recommended response
Tail event	Yes, the event class is known	Wildfires, aviation accidents	Plan for the worst plausible instance, even when probability estimates are unreliable
Black swan	No, the event class is not catalogued	Novel AI failure mode not yet appearing in literature	Build systems whose response to surprise is structured (antifragility principle)

The chapter does not recommend predicting black swans; predicting an uncatalogued event class is a contradiction. It recommends design-stage decisions (the antifragility principle) that make systems more resilient to surprise.

The L5 capability (four-part protocol)

For a deployment decision you care about:

Pick one tool from Ch 4. Defense in depth, nines, FMEA, separation of duties, least privilege, fail-safe defaults, antifragility, transparency, or any other Ch 4 instrument.
Name the deployment decision. Specific: “whether to ship a coding-assistant agent with autonomous git-push capability,” not “whether to ship a coding assistant.”
State the constraint. A ship/do-not-ship or build/do-not-build criterion the tool produces. “Do not ship without N additional layers, where each layer catches at least X percent of the failure mode.”
Defend the constraint. Reference the tool’s logic. Defense in depth → Swiss-cheese composition. Nines → log-scale marginal-cost reasoning. Fail-safe defaults → cost of incoherent default-state.
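A constraint of the form "do not ship without N additional layers, where each layer catches at least X percent" can be derived from the Swiss-cheese rule. The sketch below is one hedged way to compute N; the function name, parameters, and the 50-layer cap are hypothetical, and it assumes the layers are fully independent.

```python
def layers_needed(per_layer_miss: float,
                  max_total_miss: float,
                  max_layers: int = 50) -> int:
    """Smallest n with per_layer_miss ** n <= max_total_miss.

    Assumes independent layers. Miss rates are passed directly
    (for example 0.1 for a layer that catches 90 percent).
    """
    if not 0.0 < per_layer_miss < 1.0:
        raise ValueError("per_layer_miss must be strictly between 0 and 1")
    if not 0.0 < max_total_miss < 1.0:
        raise ValueError("max_total_miss must be strictly between 0 and 1")
    miss = 1.0
    for n in range(1, max_layers + 1):
        miss *= per_layer_miss
        # Small relative tolerance so floating-point rounding does not
        # demand an extra layer when the target is met exactly.
        if miss <= max_total_miss * (1.0 + 1e-9):
            return n
    raise ValueError("target not reachable within max_layers")


if __name__ == "__main__":
    # Target: at most 1 uncaught failure in 10,000 (4 nines).
    for catch_percent, miss in ((90, 0.10), (95, 0.05), (99, 0.01)):
        print(f"{catch_percent}% per layer: {layers_needed(miss, 1e-4)} layers")
```

For a 4-nines target this gives 4 layers at 90 percent per layer, 4 at 95 percent, and 2 at 99 percent, which matches the composition table. The defense of the constraint is then the independence assumption itself: if the layers share holes, the computed N is too low.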

Quick disambiguation cheatsheet

When the question is	The right Ch 4 tool is usually
“How safe is safe enough?”	Nines of reliability
“What layers should we stack?”	Defense in depth + Swiss-cheese
“What should the agent do when it does not know?”	Fail-safe defaults
“Who watches the watchers?”	Separation of duties
“What capabilities should the deployed system have access to?”	Least privilege
“How do we learn from incidents?”	Antifragility (with negative feedback and transparency)
“What is the worst plausible outcome?”	Tail events (long-tailed risk planning)
“How do we prepare for what we have not catalogued?”	Black-swan structured response (antifragility)

Cross-track and within-track pointers

L3 (monitoring + robustness): the slices L5 composes via Swiss-cheese. L3 said “imperfect layers compose”; L5 names the composition rule formally and gives the arithmetic.
L4 (alignment): the slice with the biggest holes; L5’s defense-in-depth depends on other slices doing more work because alignment’s tools are partial.
L6 (complex systems, Ch 5): the next chapter; explains why correct components compose into incorrect systems. L5 is the bridge from L4’s substrate-thinking to L6’s system-thinking.
L9 (governance): another slice at a different layer entirely (corporate / national / international / compute). L5’s principles apply at every layer; governance applies them at the policy layer.