Practice: beneficial AI and machine ethics

Exercise 1: design a moral parliament for a deployment

You are designing the value-loading approach for an AI-driven content-moderation system deployed across a large social platform with users in many countries and many cultural contexts. You decide to use a moral-parliament approach. Sketch the parliament: which perspectives or stakeholder constituencies get seats, how votes are weighted, and what decision rule the parliament uses to reach a moderation policy. Write your design as a short proposal (one paragraph plus a list of seats), then identify three places where your design itself encodes value judgments the moral-parliament framing was supposed to defer.

The point of the exercise is to see first-hand that the moral-parliament approach does not avoid value judgments; it relocates them to the parliament-design layer, where they can be made transparently. The value of the approach lies in that relocation, not in any avoidance of value judgments.

Example design (one possible sketch)

The parliament has nine seats, listed below. Votes are equal-weight (one seat, one vote). The decision rule is a supermajority (six of nine) for any policy that materially restricts speech, and a simple majority for policies that clarify without restricting.

Seats 1 to 3: cultural-context seats, one for each of the platform’s three largest user demographics by jurisdiction.
Seats 4 to 6: named ethical frameworks (utilitarian, deontological, virtue ethics).
Seat 7: the platform’s stated mission and its founders’ values.
Seat 8: children and other future users not currently in any demographic.
Seat 9: creators and content producers whose livelihood depends on the platform.
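The decision rule in this example design can be sketched in a few lines of Python. The seat labels and the vote pattern below are illustrative assumptions, not part of the exercise; only the thresholds (six of nine for restrictive policies, simple majority otherwise) come from the design.

```python
# Seat labels paraphrase the example design; the names are illustrative.
SEATS = [
    "culture_1", "culture_2", "culture_3",
    "utilitarian", "deontological", "virtue_ethics",
    "platform_mission", "children_future_users", "creators",
]


def passes(votes: dict, restricts_speech: bool) -> bool:
    """Apply the example decision rule with one equal-weight vote per seat."""
    if set(votes) != set(SEATS):
        raise ValueError("every seat must cast exactly one vote")
    yes = sum(votes.values())
    # Restrictive policies need six of nine; others need a simple majority.
    threshold = 6 if restricts_speech else len(SEATS) // 2 + 1
    return yes >= threshold


# Hypothetical vote: the first five seats vote yes, the remaining four vote no.
votes = {seat: i < 5 for i, seat in enumerate(SEATS)}
restrictive_outcome = passes(votes, restricts_speech=True)
clarifying_outcome = passes(votes, restricts_speech=False)
print(restrictive_outcome, clarifying_outcome)  # False True
```

The same five-to-four split passes a clarifying policy and blocks a restrictive one, which is the liberty-favoring asymmetry the supermajority rule builds in.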

Three places this design encodes value judgments:

The choice to allocate cultural-context seats by demographic and jurisdiction encodes a position on whether moral views are bound to cultural identity or to individual reflection.

The choice to include children and future users as a seat encodes a position on intergenerational ethics (whether future people have standing in current decisions).

The choice of supermajority for restrictive policies encodes a position about the priority of liberty over collective benefit.

Your design will be different; the exercise is to identify the value judgments in your own design. If your design feels neutral to you, look harder; the relocations are there.

Exercise 2: SWF choice on a worked loan-approval scenario

A loan-approval AI is deployed in a regional bank. The model produces the following outcomes across 10,000 applications:

Group	Applications	True positives (approved + creditworthy)	False positives (approved + not creditworthy)	False negatives (rejected + creditworthy)	True negatives (rejected + not creditworthy)
Majority group	8,000	4,800	320	320	2,560
Group A	2,000	940	100	460	500

The model achieves 92 percent accuracy on the majority group ((4,800 + 2,560)/8,000) and 72 percent accuracy on Group A ((940 + 500)/2,000). The false-rejection rate (creditworthy applicants rejected) for Group A is 460/1,400 ≈ 33 percent, compared to 320/5,120 ≈ 6 percent for the majority group.
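The per-group rates can be recomputed directly from the table counts. A minimal Python sketch follows; the dictionary layout and function names are illustrative choices, not part of the exercise.

```python
# Confusion-matrix counts copied from the table above.
COUNTS = {
    "Majority group": {"tp": 4800, "fp": 320, "fn": 320, "tn": 2560},
    "Group A": {"tp": 940, "fp": 100, "fn": 460, "tn": 500},
}


def accuracy(c):
    """Share of all applications that were classified correctly."""
    return (c["tp"] + c["tn"]) / (c["tp"] + c["fp"] + c["fn"] + c["tn"])


def false_rejection_rate(c):
    """Share of creditworthy applicants who were rejected."""
    return c["fn"] / (c["tp"] + c["fn"])


rates = {g: (accuracy(c), false_rejection_rate(c)) for g, c in COUNTS.items()}
for group, (acc, frr) in rates.items():
    print(f"{group}: accuracy={acc:.1%}, false-rejection rate={frr:.1%}")
```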

For each of the three SWF choices below, decide whether the deployment ships and write one sentence of reasoning.

Utilitarian SWF: total expected revenue from true-positive approvals minus losses from false positives.
Prioritarian SWF: weighted sum where Group A welfare is weighted three times the majority-group welfare (reflecting the prioritarian preference for the worse-off).
Equalized-odds fairness constraint added to either SWF: require false-negative and false-positive rates to be approximately equal across groups, even at cost to aggregate revenue.

Answer key

Utilitarian SWF: ships. With 5,740 true-positive approvals against 420 false positives, aggregate revenue is positive unless the loss per false positive exceeds roughly 13.7 times the revenue per true positive. The utilitarian SWF treats each dollar of harm and each dollar of benefit symmetrically, so the imbalance in error rates does not change the verdict: the SWF does not penalize distribution.
Prioritarian SWF: does not ship as-is. The prioritarian weighting amplifies the cost of Group A’s 460 false rejections; under most plausible weight choices, the amplified cost exceeds the aggregate-revenue benefit. The deployment requires either model improvements (reducing the Group A error rate) or a different deployment configuration (for example, a higher bar for rejecting Group A applicants, in recognition of the model’s reduced reliability on that subgroup).
Equalized-odds constraint: does not ship as-is. Equalized odds requires the 33 percent vs 6 percent false-negative-rate gap to close (the false-positive rates also differ: 100/600 ≈ 17 percent for Group A vs 320/2,880 ≈ 11 percent for the majority group). The current model cannot satisfy this without either accuracy degradation on the majority group or accuracy improvement on Group A. The deployment requires either a different model or group-specific decision thresholds.
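The equalized-odds check in the last answer can be sketched as follows. The 0.05 tolerance is a hypothetical stand-in for "approximately equal"; the exercise does not specify one, and a real deployment would have to choose and defend its own.

```python
# Confusion-matrix counts from the Exercise 2 table.
COUNTS = {
    "majority": {"tp": 4800, "fp": 320, "fn": 320, "tn": 2560},
    "group_a": {"tp": 940, "fp": 100, "fn": 460, "tn": 500},
}

TOLERANCE = 0.05  # hypothetical threshold for "approximately equal"


def error_rates(c):
    """Return (false-negative rate, false-positive rate) for one group."""
    fnr = c["fn"] / (c["tp"] + c["fn"])  # creditworthy applicants rejected
    fpr = c["fp"] / (c["fp"] + c["tn"])  # non-creditworthy applicants approved
    return fnr, fpr


maj_fnr, maj_fpr = error_rates(COUNTS["majority"])
a_fnr, a_fpr = error_rates(COUNTS["group_a"])

fnr_gap = abs(a_fnr - maj_fnr)
fpr_gap = abs(a_fpr - maj_fpr)
satisfies_equalized_odds = fnr_gap <= TOLERANCE and fpr_gap <= TOLERANCE

print(f"FNR gap: {fnr_gap:.3f}, FPR gap: {fpr_gap:.3f}")
print("Equalized odds satisfied:", satisfies_equalized_odds)
```

Under this assumed tolerance both gaps are too large, with the false-negative gap by far the larger of the two.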

The exercise teaches the operational move: SWFs and fairness criteria are not data-derived; they are designer-chosen, and the choice changes the verdict. The L7 capability is to be able to defend the choice you made.
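To see how the SWF choice can change the verdict on identical counts, the sketch below attaches hypothetical per-application values to the table. None of the three values (revenue per true positive, loss per false positive, harm per false rejection) comes from the exercise; they are picked only to show that the sign can flip, and other values would give other verdicts.

```python
# Confusion-matrix counts from the Exercise 2 table.
COUNTS = {
    "majority": {"tp": 4800, "fp": 320, "fn": 320, "tn": 2560},
    "group_a": {"tp": 940, "fp": 100, "fn": 460, "tn": 500},
}

# Hypothetical values in arbitrary units (assumptions, not exercise data).
REVENUE_PER_TP = 1.0
LOSS_PER_FP = 5.0
HARM_PER_FN = 3.0


def group_welfare(c, harm_per_fn):
    """Revenue minus default losses minus harm from false rejections."""
    return REVENUE_PER_TP * c["tp"] - LOSS_PER_FP * c["fp"] - harm_per_fn * c["fn"]


def swf(weights, harm_per_fn):
    """Weighted sum of group welfare; equal weights give a utilitarian sum."""
    return sum(weights[g] * group_welfare(c, harm_per_fn) for g, c in COUNTS.items())


equal = {"majority": 1, "group_a": 1}
prioritarian_weights = {"majority": 1, "group_a": 3}

revenue_only = swf(equal, harm_per_fn=0.0)            # SWF as stated in the exercise
unweighted = swf(equal, harm_per_fn=HARM_PER_FN)      # counts rejection harm equally
prioritarian = swf(prioritarian_weights, harm_per_fn=HARM_PER_FN)

print(revenue_only, unweighted, prioritarian)  # 3640.0 1300.0 -580.0
```

With these assumed values the equal-weight sums stay positive while the three-to-one weighting turns negative, so the same data ships under one SWF and not under the other.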

Exercise 3: critique a cost-benefit analysis

You receive the following cost-benefit summary for a proposed AI-driven traffic-management deployment in a midsize city:

“The deployment is projected to reduce total commute times by an average of 8 percent across the affected population, resulting in estimated economic value of $42M annually from time savings. Implementation cost is $3M annually. Expected externality from increased traffic in neighborhoods adjacent to optimized routes is estimated at $1.2M annually in property-value reduction. Net positive expected value: $37.8M annually. Recommendation: deploy.”

Apply the chapter’s two named blind spots of cost-benefit analysis (financial-proxy assumption, distributional-impact neglect). For each, write one sentence of critique. Then propose one specific question whose answer would change the recommendation.

Example critique

Financial-proxy blind spot: the analysis treats commute-time savings as a commodity of uniform value, but the 8 percent average masks variance. For a household commuting via two buses with a tight transfer, the deployment can push the trip across a connection threshold and produce a large step change in commute experience, while a household commuting by car may see only a small, smooth improvement. The financial proxy (time × hourly wage) prices a minute the same way in both cases; the actual wellbeing effect is not the same.
Distributional-impact blind spot: the $1.2M property-value externality in “adjacent neighborhoods” is computed as an aggregate, but the burden falls on residents of those specific neighborhoods, not on the population that captures the commute-time benefit. Under a utilitarian SWF the total nets out the same regardless of who bears the cost; under a prioritarian SWF (or a fairness constraint requiring the externality not to fall disproportionately on already-burdened neighborhoods), it would not.
One question that would change the recommendation: “What is the demographic composition of the neighborhoods absorbing the externality, and how does it compare to the demographic composition of the commuter population capturing the benefit?” If the answer is that the burden falls on already-burdened communities, the prioritarian or fairness recommendations differ from the utilitarian one.
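The dollar-level arithmetic behind the summary, and its sensitivity to how heavily the externality is weighted, can be sketched as below. The weighting function is an illustrative assumption; the summary itself applies no weights.

```python
# Annual figures quoted in the cost-benefit summary, in dollars.
TIME_SAVINGS = 42_000_000
IMPLEMENTATION_COST = 3_000_000
EXTERNALITY = 1_200_000

# Unweighted net value, as in the summary.
net_value = TIME_SAVINGS - IMPLEMENTATION_COST - EXTERNALITY


def weighted_net(externality_weight):
    """Net value when each externality dollar counts `externality_weight` times."""
    return TIME_SAVINGS - IMPLEMENTATION_COST - externality_weight * EXTERNALITY


# Weight at which the weighted net value reaches zero.
break_even_weight = (TIME_SAVINGS - IMPLEMENTATION_COST) / EXTERNALITY

print(net_value)          # 37800000
print(break_even_weight)  # 32.5
```

This covers only the aggregate dollar figures. It says nothing about how many households share the $1.2M loss or how that loss compares to their means, which is the information the question above asks for.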

Flashcards

Q. How does Hendrycks Ch 6.9 define moral uncertainty?

Not knowing which moral beliefs are correct. The condition arises because different ethical frameworks and different stakeholders endorse different values, and the disagreements survive sustained reflection. It is not a measurement problem one more experiment will resolve.

Q. Why does the chapter argue AI systems should represent moral uncertainty?

“AI systems should represent moral uncertainty to avoid acting on overconfidence, which could lead to outcomes that humans consider morally reprehensible” (Hendrycks Ch 6.9). A system acting with high confidence on a single ethical framework, plus high capability, can produce outcomes proponents of other frameworks judge as serious harm. Representing uncertainty constrains the confidence.

Q. What is the My Favorite Theory strategy for moral uncertainty, and what is its problem?

Pick the ethical framework you have the highest credence in and act consistently with it; treat the others as mistaken. The advantage is decisiveness: the AI has a tractable specification. The cost is the failure mode the framing was trying to prevent: high confidence + high capability + outcomes other frameworks reject. The strategy fails the chapter’s stated bar.

Q. What is maximizing expected choiceworthiness as a strategy for moral uncertainty?

Treat ethical frameworks like uncertain hypotheses about the world; weight them by credence; compute for each action its expected moral value across frameworks; pick the action with the highest expectation. Advantage: it mirrors expected-value maximization in standard decision theory under uncertainty. Cost: it requires a common unit for comparing value across frameworks (utilitarian utils vs deontological weights), and such cross-framework comparisons may not be coherent.
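A minimal sketch of the computation, contrasted with My Favorite Theory from the previous card. The credences, the two actions, and the scores are all hypothetical, and placing the scores on one shared scale is exactly the contested assumption named in the cost above.

```python
# Hypothetical credences over frameworks (must sum to 1).
CREDENCES = {"utilitarian": 0.5, "deontological": 0.3, "virtue": 0.2}

# Hypothetical choiceworthiness of each action under each framework,
# on an assumed common scale.
SCORES = {
    "action_a": {"utilitarian": 10.0, "deontological": -20.0, "virtue": -5.0},
    "action_b": {"utilitarian": 6.0, "deontological": 4.0, "virtue": 3.0},
}


def expected_choiceworthiness(action):
    """Credence-weighted average of an action's score across frameworks."""
    return sum(CREDENCES[t] * SCORES[action][t] for t in CREDENCES)


def maximize_expected_choiceworthiness():
    """Pick the action with the highest credence-weighted score."""
    return max(SCORES, key=expected_choiceworthiness)


def my_favorite_theory():
    """Pick the action that the single highest-credence framework ranks best."""
    favorite = max(CREDENCES, key=CREDENCES.get)
    return max(SCORES, key=lambda a: SCORES[a][favorite])


mec_choice = maximize_expected_choiceworthiness()
mft_choice = my_favorite_theory()
print(mec_choice, mft_choice)  # action_b action_a
```

In this toy case the highest-credence framework favors an action the other two frameworks score as strongly negative, so the two strategies disagree.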

Q. What is the moral parliament approach, and what is its tradeoff?

Simulate representatives of different moral perspectives and stakeholder viewpoints; have them deliberate and reach compromises; act on the compromise. Advantage: handles stakeholder heterogeneity, adaptable to evolving values, prefers compromise to extremes. Cost: the parliament has to be designed, and the design choices (which perspectives get seats, how votes are weighted, what compromise mechanism is used) are themselves load-bearing ethical decisions.

Q. What is a social welfare function, in the chapter's framing?

A mathematical function that aggregates individual welfare into one society-wide measure. The chapter framing: “Social welfare functions aggregate individual wellbeing into overall societal wellbeing” (Hendrycks Ch 6.8). They are needed once an AI is acting on behalf of a population, because individual welfares have to be combined into a collective measure.

Q. What is the difference between a utilitarian and a prioritarian SWF?

A utilitarian SWF sums individual wellbeing directly: the collective welfare is the sum (or mean) of individual welfares. A prioritarian SWF weights worse-off individuals more heavily. The utilitarian SWF treats all individuals symmetrically; the prioritarian SWF embeds a preference against very bad outcomes for any individual. Same data, different SWFs, different deployment verdicts.
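A small sketch of the contrast. The square-root transform is one common way to formalize "weights worse-off individuals more heavily" and is an assumption here, not the chapter's stated definition; the welfare numbers are hypothetical.

```python
import math


def utilitarian(welfares):
    """Plain sum of individual welfare."""
    return sum(welfares)


def prioritarian(welfares):
    """Sum of a concave transform, so gains to the worse-off count for more."""
    return sum(math.sqrt(w) for w in welfares)


# Two hypothetical two-person outcomes.
unequal = [0, 16]
more_equal = [4, 9]

print(utilitarian(unequal), utilitarian(more_equal))    # 16 13
print(prioritarian(unequal), prioritarian(more_equal))  # 4.0 5.0
```

The utilitarian sum ranks the unequal outcome higher; the concave transform reverses the ranking.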

Q. Why is cost-benefit analysis an incomplete approximation of social welfare, per Hendrycks?

Two named blind spots. Financial-proxy assumption: cost-benefit analysis translates wellbeing into dollar values; a thousand dollars of harm to a low-income person is not equivalent to a thousand dollars of harm to a high-income person, but a dollar-denominated tally treats the two symmetrically. Distributional-impact neglect: cost-benefit analysis aggregates costs and benefits across the population without weighting by who bears them, so a deployment whose benefits go to one group and whose costs fall on another can show a positive net while being rejected under a prioritarian SWF.
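The financial-proxy point can be illustrated with a toy utility model. Logarithmic utility of income is a textbook simplification assumed here for illustration, not something the chapter specifies, and the two income levels are hypothetical.

```python
import math


def log_utility_loss(income, loss):
    """Drop in log-utility when `loss` dollars are taken from `income`."""
    return math.log(income) - math.log(income - loss)


# The same $1,000 loss at two hypothetical annual incomes.
low_income_loss = log_utility_loss(20_000, 1_000)
high_income_loss = log_utility_loss(200_000, 1_000)
ratio = low_income_loss / high_income_loss

print(f"low income:  {low_income_loss:.4f}")
print(f"high income: {high_income_loss:.4f}")
print(f"ratio:       {ratio:.1f}")
```

Under this assumed model the same dollar loss costs the lower-income person roughly ten times as much utility, a difference a dollar tally records as zero.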

Q. Why are fairness criteria not jointly satisfiable in general?

The formal-fairness literature has shown that demographic parity (approval rates, and hence rejection rates, equal across groups), equalized odds (true-positive and false-positive rates equal across groups), and calibration (predicted probability matches realized rate within each group) cannot all be satisfied simultaneously except in degenerate cases (perfect prediction, or equal base rates across groups). So picking which fairness criterion to enforce is another value-loading decision the chapter argues should be made transparently.
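The conflict can be seen numerically with Bayes' rule. In the sketch below, the true-positive and false-positive rates are held equal across two groups (equalized odds) while the base rates differ. All numbers are hypothetical, and equality of positive predictive value is used as a simplified binary stand-in for calibration.

```python
def ppv(base_rate, tpr, fpr):
    """Share of approvals that are correct, by Bayes' rule."""
    true_pos = tpr * base_rate
    false_pos = fpr * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


def selection_rate(base_rate, tpr, fpr):
    """Share of the group that is approved."""
    return tpr * base_rate + fpr * (1 - base_rate)


# Hypothetical rates, identical across groups: equalized odds holds.
TPR, FPR = 0.8, 0.1

# Hypothetical base rates (share of each group that is creditworthy).
BASE_RATES = {"group_x": 0.3, "group_y": 0.6}

ppv_by_group = {g: ppv(p, TPR, FPR) for g, p in BASE_RATES.items()}
selection_by_group = {g: selection_rate(p, TPR, FPR) for g, p in BASE_RATES.items()}

for g in BASE_RATES:
    print(f"{g}: PPV={ppv_by_group[g]:.3f}, approval rate={selection_by_group[g]:.3f}")
```

With unequal base rates, enforcing equalized odds leaves both the approval rates (demographic parity) and the positive predictive values unequal; setting the two base rates equal makes all three line up.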

Q. How does L7 connect to L4 (outer alignment)?

L4 said outer alignment is hard because the loss function does not capture the designer’s intent. L7 names a deeper reason: there is no single goal to capture. Multiple stakeholders, multiple ethical frameworks; any specification implicitly chooses between them. The honest version of value-loading is to make the choice transparent and contestable, not to hide it inside a loss function. Moral uncertainty is the L4 outer-alignment problem at the layer of values themselves.