Practice: complex systems and emergent risk

Exercise 1: spot the four properties in a worked deployment

Read the deployment description below. Identify which of the four complex-systems properties (emergence, nonlinearity, feedback loops, tight coupling) shows up in the deployment, with one sentence of evidence per property. If a property does not show up, say so explicitly and explain why; the deployment may not have all four.

Deployment:

A multi-region delivery-logistics platform uses three coordinating AI models. Model A predicts demand by neighborhood-hour. Model B routes delivery vehicles in real time to minimize total fleet distance. Model C dispatches contractor drivers to specific orders based on driver location and predicted-completion time. The three models share a common state store updated every 15 seconds. Each model was trained against historical data from a previous deployment generation. Performance reviews and pay calculations for contractor drivers depend on a “completion score” computed by Model C using the same predictions Model C used for dispatch.

Answer key

Emergence: present. The system has a population-level property (overall delivery throughput; geographic balance of contractor work) that is not a property of any individual model. Optimizing each model individually for its specified objective does not guarantee the population-level property is desirable.
Nonlinearity: present, at least weakly. All three models read and write the same state every 15 seconds, so a small change in one model’s outputs can produce a disproportionate change in another’s inputs. For example, a modest shift in Model A’s neighborhood forecast can push Model B past a rerouting threshold and move many vehicles at once. Under demand-spike conditions, this nonlinearity can be substantial.
Feedback loops: present, and especially worth flagging. Contractor pay is computed from Model C’s own predictions, which closes two loops. In the behavioral loop, drivers learn what Model C rewards and adjust their workflow to game it. In the training loop, that adjusted behavior becomes the data for the next training generation, so Model C’s prediction errors, and the gaming they invite, are inherited by its successor. Each generation’s training distribution is shaped by what the previous generation rewarded.
Tight coupling: present. The 15-second shared-state update means the three models cannot operate in isolation for long; a failure or anomaly in one propagates within seconds.

With all four properties present, this deployment is a normal-accidents candidate in Perrow’s sense: no particular accident is guaranteed, but the system’s structure makes some accident of a certain class statistically inevitable over enough operating hours.
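The idea that an accident class becomes statistically inevitable over enough operating hours is ordinary compounding probability. Below is a minimal sketch. It assumes a constant, independent per-hour probability `p` for the accident class, which real systems rarely satisfy exactly, and the rate used is hypothetical.

```python
import math

def prob_at_least_one(p_per_hour: float, hours: float) -> float:
    """P(at least one accident in `hours`), assuming independent hours with a
    constant per-hour probability (an illustrative simplification)."""
    # 1 - (1 - p)^n, computed with log1p/expm1 for numerical stability at tiny p
    return -math.expm1(hours * math.log1p(-p_per_hour))

def hours_to_reach(p_per_hour: float, target: float) -> float:
    """Operating hours after which P(at least one accident) reaches `target`."""
    return math.log1p(-target) / math.log1p(-p_per_hour)

p = 1e-6  # hypothetical per-hour rate for one accident class
for hours in (1_000, 1_000_000, 10_000_000):
    print(f"{hours:>10,} h -> {prob_at_least_one(p, hours):.4f}")
print(f"hours to 50%: {hours_to_reach(p, 0.5):,.0f}")
```

At a one-in-a-million-per-hour rate, the chance of at least one occurrence over a million operating hours is about 63% (1 - e^-1). A large fleet accumulates operating hours far faster than any single unit does.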

Exercise 2: Perrow-flavored decomposition of three historical incidents

For each incident below, identify (a) the component-level analysis (what individual element failed or behaved poorly), (b) the system-level analysis (what interaction or structure produced the failure), and (c) whether the incident is a “normal accident” in Perrow’s sense. Answers below; do the exercise first.

The 2010 Flash Crash. (Briefly covered in the lesson body.)
The 1996 Ariane 5 rocket failure: a software exception in the inertial reference system, originally written for Ariane 4, caused the rocket to self-destruct 37 seconds into flight. The software was reused without revalidation against Ariane 5’s flight profile.
The 2003 Northeast blackout: a software bug in a regional alarm system, plus an unaddressed line sag in a single transmission line, plus the absence of operator coordination protocols across utility boundaries, cascaded into outages affecting 55 million people.

Answer key

Flash Crash. Component-level: each algorithm did exactly what its specification said. System-level: the interaction between the large mutual-fund algorithmic sell and the response of high-frequency trading algorithms drained liquidity in a feedback loop. Normal accident? Yes. The market structure (tight coupling between algorithmic actors, no circuit-breakers calibrated to algorithmic timescales) made this class of accident statistically inevitable over enough operating hours.
Ariane 5. Component-level: the inertial reference software converted a 64-bit floating-point value tied to horizontal velocity into a 16-bit signed integer. That value stayed in range on Ariane 4’s trajectory but overflowed under Ariane 5’s higher horizontal velocity, so there is a genuine component-level diagnosis. System-level: the certification process did not revalidate reused software against the new flight profile. The system of “engineering practice plus certification process” had a gap that the component-level diagnosis does not fully address. Normal accident? Partially. The proximate cause was a component bug; the structural cause was a certification process that did not catch reused-component validation gaps. The system-level diagnosis is necessary to prevent recurrence, because the component-level diagnosis alone would have fixed the specific overflow without closing the recertification gap.
Northeast blackout. Component-level: the alarm system bug, the line sag. Both are component-level diagnoses. System-level: the interaction between the alarm bug (operators did not see the early indicator) and the absence of cross-utility coordination protocols meant local failures propagated across regional boundaries before any single utility could respond. Normal accident? Strongly yes. Perrow specifically discusses electric power grids as a paradigm case; the tight coupling and interactive complexity are constitutive of grid operation.
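The Ariane 5 component-level failure can be reproduced in miniature. Narrowing a floating-point value to a 16-bit signed integer works only while the value stays in range. The sketch below is a hypothetical Python stand-in. The flight software was not written in Python, and the peak values are illustrative, not flight data. It shows why the bug stayed invisible on the old flight profile.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Narrow a float to a 16-bit signed integer, raising if out of range.

    Stands in for the unprotected conversion in the reused software, where an
    out-of-range value raised an exception that was not handled.
    """
    as_int = int(value)  # truncates toward zero
    if not INT16_MIN <= as_int <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in int16")
    return as_int

# Hypothetical peak values of the converted variable on each flight profile
old_profile_peak = 20_000.0  # within the range the code was validated for
new_profile_peak = 64_000.0  # a faster trajectory pushes it out of range

print(to_int16_checked(old_profile_peak))
try:
    to_int16_checked(new_profile_peak)
except OverflowError as exc:
    print("conversion failed:", exc)
```

The range check, or a saturating conversion, is the component-level fix. The system-level fix is a process that re-derives input ranges like `new_profile_peak` whenever a component is reused on a new profile.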

The point of the exercise is to see that the component-level diagnosis and the system-level diagnosis are both true and address different recurrence-prevention problems. The component-level fix addresses the specific failure mode; the system-level fix addresses the class of failure modes that share the same structural property.

Exercise 3: propose two complex-systems-aware design changes

Return to the delivery-logistics deployment from Exercise 1. Propose two design changes that would reduce complex-systems-flavored risk. Constraints: the changes should NOT be of the form “improve Model A/B/C” or “add more training data” or “fix a specific bug.” They should target the system structure: tight coupling, feedback loops, emergence, or the layered-defenses-not-independent problem.

Write each proposal as: (a) the change, (b) which complex-systems property it targets, (c) the failure class it addresses, (d) the cost or downside of the change.

Below is a worked answer for one possible change; do the exercise yourself for a second one.

Example proposal (loosening the contractor-pay feedback loop)

Change: Compute contractor performance scores using a measurement function that Model C does not use for dispatch. Specifically, score performance on next-day customer ratings (which reflect realized outcomes rather than anything Model C predicted) instead of on Model C’s predicted-completion time.
Property targeted: feedback loops. The current design has Model C’s predictions feeding back into both training data and contractor behavior; the proposed design breaks that loop by separating the measurement function used for performance from the prediction function used for dispatch.
Failure class addressed: the goal-drift / Goodhart-flavored failure where contractors learn to game the prediction function and where the next training generation inherits predictions shaped by gameable behavior.
Downside: the performance signal is now delayed by one day, which means contractor feedback (corrections to underperforming patterns) lags. The signal is also noisier because customer ratings include factors outside contractor control. Both costs are real; the system-level safety case treats them as worth paying.

Your second proposal should target a different property (tight coupling, emergence, or the layered-defenses-not-independent problem) and a different failure class.

Flashcards

Q. What are the four properties of complex systems named in the lesson?

Emergence (system has properties no component does), nonlinearity and sensitivity to initial conditions (small input changes produce large output changes in analytically intractable ways), feedback loops (outputs feed back as inputs, sometimes stabilizing and sometimes amplifying), and tight coupling (state of one part constrains others within timescales too short for human intervention).
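Sensitivity to initial conditions can be seen in the logistic map, a standard textbook nonlinear system that is not taken from the lesson. The sketch iterates x -> r * x * (1 - x) with r = 4 from two starting points that differ by 1e-10. It then reports the first step at which the trajectories are more than 0.1 apart.

```python
def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 60) -> list[float]:
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)  # perturbation far below any realistic measurement precision

# First step at which the two trajectories differ by more than 0.1
first_divergence = next(
    (n for n, (xa, xb) in enumerate(zip(a, b)) if abs(xa - xb) > 0.1), None
)
print("initial gap:", abs(a[0] - b[0]))
print("first step with gap > 0.1:", first_divergence)
```

The rule is fully deterministic and only one line long. Even so, a difference in the tenth decimal place grows into a macroscopic one within a few dozen steps.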

Q. What is a normal accident in Perrow's sense?

An accident that cannot be traced to a single operator error or component failure. Instead, ordinary failures interact in unanticipated ways because the system has a structure (specifically: tight coupling plus interactive complexity) that makes the accident class statistically inevitable. The framing is from Charles Perrow’s Normal Accidents (1984). No amount of component-level engineering can drive the rate of normal accidents to zero; the engineering-vs-system distinction is what the framing makes visible.

Q. What is emergence, and why is it not a bug?

The system has properties that are not properties of any of its components. A neural network represents concepts; no individual neuron does. A market discovers prices; no individual trader does. Emergence is why systems are useful (you cannot get price discovery or concept representation from any single component) and also why they are hard to reason about (component-level analysis cannot predict emergent properties).

Q. Why is tight coupling especially dangerous in safety-critical systems?

Because a tightly coupled system propagates a local failure across the system before operators can isolate it. A loosely coupled system lets local failures stay local; operators have time to respond, route around, or shut down. Perrow’s framework pairs tight coupling with interactive complexity (interactions between components are non-obvious) and argues that the combination is what produces normal accidents.
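A deliberately crude sketch with hypothetical numbers makes the timing argument concrete. It models a failure that spreads one hop along a chain of components every `hop_seconds`, and an operator who isolates it after `response_seconds`. The ratio of those two timescales, not the quality of any single component, sets how far the failure gets.

```python
def components_affected(hop_seconds: float, response_seconds: float,
                        n_components: int) -> int:
    """Components hit before an operator isolates a failure that starts at
    component 0 and spreads one hop down a chain every `hop_seconds`.

    Toy model: deterministic propagation and a fixed operator response time,
    both illustrative assumptions.
    """
    hops_before_isolation = int(response_seconds // hop_seconds)
    return min(n_components, 1 + hops_before_isolation)

response = 600.0  # hypothetical: operators isolate a fault within 10 minutes
for label, hop in [("loosely coupled", 3600.0), ("tightly coupled", 15.0)]:
    hit = components_affected(hop, response, n_components=100)
    print(f"{label:>16}: {hit} of 100 components affected")
```

With these numbers, the loosely coupled chain keeps the failure at its origin. The tightly coupled one (15-second hops, echoing the shared-state interval from Exercise 1) loses 41 components before anyone acts.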

Q. Why are AI deployments specifically tightly coupled to their environments?

Because deployed models affect the data their successors will be trained on, the operator practices around them, the user expectations they shape, and the regulatory frameworks that will govern the next generation. Many of these feedbacks operate at timescales shorter than human deliberation. A content-recommendation system can shift user preferences within months; the shifted preferences become training data; the feedback loop is closed before policy can catch up.
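The closed loop in the recommendation example can be sketched as a toy simulation in which every element is hypothetical. One user has preferences over three content categories. A recommender is "retrained" each generation on that user's click shares, assumed proportional to preference. The user's preferences drift toward whatever is shown.

```python
def simulate_recommender_loop(prefs: list[float], generations: int,
                              drift: float = 0.2) -> list[float]:
    """Toy closed loop between a recommender and one user.

    Each generation the recommender shows the category with the highest
    estimated preference. Its estimate is the user's current click share,
    assumed equal to `prefs`, so it is retrained on data the previous
    generation shaped. The user's preferences then move a fraction `drift`
    toward the shown category.
    """
    prefs = list(prefs)
    for _ in range(generations):
        shown = max(range(len(prefs)), key=lambda i: prefs[i])
        target = [1.0 if i == shown else 0.0 for i in range(len(prefs))]
        prefs = [(1 - drift) * p + drift * t for p, t in zip(prefs, target)]
    return prefs

start = [0.36, 0.33, 0.31]  # a nearly even starting mix
end = simulate_recommender_loop(start, generations=10)
print([round(p, 3) for p in end])
```

A 36/33/31 split ends with over 90% on one category after ten generations. No component misbehaves: the recommender fits its data, and the user responds to what is shown.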

Q. What is model monoculture, and why is it a complex-systems concern?

When many deployed systems share the same underlying base model (because economies of scale concentrate model production into a few labs), correlated failure modes that are invisible at the individual-model level become visible at the population level. A weakness in a widely-licensed base model is a weakness in every product built on it; an adversarial input that defeats the model defeats every downstream system simultaneously. The risk lives at a layer no individual product team can address.
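The monoculture concern is about correlation, and a small calculation shows its size. Assume, hypothetically, that a given adversarial input defeats any one base model with probability `q` and that `k` products are deployed.

```python
def p_all_products_defeated(q: float, k: int, shared_base: bool) -> float:
    """Probability that one adversarial input defeats all k products.

    shared_base=True: every product wraps the same base model, so one success
    defeats all of them at once (perfect correlation).
    shared_base=False: each product uses an independently built model and
    failures are assumed independent (an idealization).
    """
    return q if shared_base else q ** k

q, k = 0.05, 10  # hypothetical attack success rate and number of products
print("independent models:", p_all_products_defeated(q, k, shared_base=False))
print("shared base model: ", p_all_products_defeated(q, k, shared_base=True))
```

Each individual product has the same 5% exposure in both cases, which is why no single product team sees the risk. Only the population-level number changes.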

Q. Why does L5's Swiss-cheese composition rule break down in real-world deployments?

Three reasons. Shared blind spots: training and deployment may use the same eval framework descended from the same team’s assumptions. Correlated failure modes: multiple monitoring systems may read the same logs, so a log-pipeline failure takes them all down simultaneously. Adversarial pressure that breaks independence: an attacker often defeats successive layers with the same technique. The operational fix is to make layers more genuinely independent (different teams, methods, signals, timescales), not to add more layers.
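The independence assumption behind the Swiss-cheese rule can be checked with arithmetic. Below is a minimal sketch with hypothetical numbers. Three monitoring layers each miss a given failure 10% of the time, and a shared log pipeline that is down with probability `common_mode` blinds every layer at once.

```python
import math

def p_all_layers_miss(miss_rates: list[float], common_mode: float = 0.0) -> float:
    """Probability that every layer misses a failure.

    Layers are independent except for one shared dependency (e.g. a log
    pipeline) that fails with probability `common_mode` and, when it does,
    makes every layer miss.
    """
    independent_miss = math.prod(miss_rates)
    return common_mode + (1.0 - common_mode) * independent_miss

layers = [0.1, 0.1, 0.1]
print("fully independent:      ", p_all_layers_miss(layers))
print("1% shared pipeline risk:", p_all_layers_miss(layers, common_mode=0.01))
print("plus a fourth layer:    ", p_all_layers_miss(layers + [0.1], common_mode=0.01))
```

The shared dependency dominates. A 1% common-mode failure raises the combined miss rate to about 1.1%, roughly eleven times the independent 0.1%, and a fourth layer barely moves it (about 1.01%). Removing the shared dependency helps; adding layers does not.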

Q. What was the system-level diagnosis of the 2010 Flash Crash?

No individual algorithm was malfunctioning. A large mutual-fund algorithmic sell order interacted with high-frequency trading algorithms whose response was correct under their individual specifications but collectively produced a feedback loop that drained liquidity. The component-level analysis (each algorithm did what it was designed to do) found nothing wrong; the system-level analysis (the interaction) found the failure. The market structure was the failure mode.
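A toy model shows how individually reasonable rules can compose into an amplifying loop. It is not a reconstruction of the 2010 event, and every rule and parameter is hypothetical. A seller sells a fixed quantity each step. The price impact of each sale is inversely proportional to available liquidity. Liquidity providers then withdraw an amount proportional to the last price move.

```python
def simulate_sell_program(steps: int, sell_per_step: float,
                          withdrawal_gain: float,
                          liquidity: float = 100.0) -> list[float]:
    """Per-step price impact of a fixed sell program in a toy market.

    Hypothetical rules: each step's impact is sell_per_step / liquidity, and
    liquidity providers then withdraw withdrawal_gain * impact units of
    liquidity. The run stops early if liquidity is exhausted.
    """
    impacts = []
    for _ in range(steps):
        if liquidity <= 0:
            break  # no liquidity left: the toy market cannot absorb more selling
        impact = sell_per_step / liquidity
        impacts.append(impact)
        liquidity -= withdrawal_gain * impact
    return impacts

calm = simulate_sell_program(20, sell_per_step=10.0, withdrawal_gain=0.0)
loop = simulate_sell_program(20, sell_per_step=10.0, withdrawal_gain=50.0)
print(f"no feedback: {len(calm)} steps, total impact {sum(calm):.2f}")
print(f"feedback:    {len(loop)} steps, total impact {sum(loop):.2f}, "
      f"last step {loop[-1] / loop[0]:.0f}x the first")
```

Without the withdrawal rule, every step moves the price by the same small amount. With it, each move triggers a withdrawal that makes the next move larger, and liquidity runs out before the sell program finishes.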

Q. What are emergent capabilities in AI, and why are they a complex-systems phenomenon?

Large neural networks exhibit capabilities at certain scales that they do not exhibit at smaller scales, with discontinuous thresholds that smooth scaling laws cannot predict. From the complex-systems perspective, this is what you would expect from a system whose internal dynamics are nonlinear and whose component count varies across orders of magnitude. Predicting at what scale a capability will emerge by extrapolating from smaller-scale measurements is methodologically suspect for the same reason predicting hurricane formation from a thermometer reading is.
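A hypothetical capability curve shows why small-scale extrapolation is suspect. Assume task accuracy follows a steep logistic function of log10(parameter count) centred at 10^10 parameters. This is an illustrative assumption, not a measured scaling law. A straight line fitted to the small-model measurements, where the curve is nearly flat, predicts almost nothing at large scale.

```python
import math

def hypothetical_accuracy(log10_params: float, midpoint: float = 10.0,
                          steepness: float = 4.0) -> float:
    """Assumed logistic capability curve in log10(parameter count)."""
    return 1.0 / (1.0 + math.exp(-steepness * (log10_params - midpoint)))

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

small_scales = [7.0, 7.5, 8.0, 8.5]  # log10 params: 10^7 to ~3*10^8
slope, intercept = fit_line(small_scales,
                            [hypothetical_accuracy(x) for x in small_scales])

target = 11.0  # 10^11 parameters
print(f"extrapolated: {slope * target + intercept:.3f}")
print(f"actual:       {hypothetical_accuracy(target):.3f}")
```

The fit is faithful to the data it sees. The failure comes from assuming the regime that produced the data continues, which is exactly what a nonlinear system does not guarantee.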

Q. What is the L6 capability in four parts?

(1) Name four complex-systems properties and identify each in a deployed AI scenario. (2) Distinguish a normal accident (system-structure-produced) from a preventable engineering failure. (3) Recognize when the L5 Swiss-cheese rule breaks because layers are not independent, and name what would restore independence. (4) Take a deployed AI system and propose two design changes that reduce complex-systems risk without addressing any component-level bug.