Beneficial AI and machine ethics, in brief

What you’ll learn

Phase 3 of Track 23 opens at L7 by changing the question. Phase 2 worked through what fails when AI systems are deployed: the failure surface (L3), the alignment substrate (L4), the engineering toolkit (L5), and the complex-systems constraints (L6). Phase 3 asks what we are trying to do, and for whom. Once you have a robust, monitored, aligned system, you still have to specify what it should be aligned with, and the answer is genuinely contested. The chapter that handles this is Hendrycks’ Chapter 6.

The first move in Ch 6 is to admit that the question does not have a single answer. There is no consensus ethical framework that a designer can simply hand to an AI system. The chapter’s name for this fact is moral uncertainty, and the lesson treats it as the substrate for everything that follows. Three strategies for acting under moral uncertainty (My Favorite Theory, maximizing expected choiceworthiness, moral parliament) are named, each with its own tradeoff. The chapter does not declare one strategy correct; it notes that moral parliament has gathered the most attention in recent AI-ethics literature because it scales to stakeholder heterogeneity.
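To make the three strategies concrete, here is a minimal Python sketch. Everything in it is a hypothetical assumption: the theory names, the credences, the choiceworthiness scores, and the options. Maximizing expected choiceworthiness is shown as a credence-weighted sum, which presupposes that scores from different theories sit on a shared scale (itself a contested assumption). The moral parliament is reduced to a credence-weighted vote for each theory’s top option; the proposal as usually described involves delegates bargaining, which this sketch does not model.

```python
from dataclasses import dataclass

# Hypothetical options an AI system might choose between.
OPTIONS = ["deploy_now", "deploy_with_review", "do_not_deploy"]


@dataclass
class Theory:
    name: str
    credence: float                     # hypothetical degree of belief in this theory
    choiceworthiness: dict[str, float]  # option -> score; assumes a shared scale across theories


# Hypothetical credences (summing to 1) and scores, chosen only for illustration.
THEORIES = [
    Theory("theory_A", 0.45, {"deploy_now": 10, "deploy_with_review": 8, "do_not_deploy": 0}),
    Theory("theory_B", 0.35, {"deploy_now": -20, "deploy_with_review": 5, "do_not_deploy": 6}),
    Theory("theory_C", 0.20, {"deploy_now": 2, "deploy_with_review": 4, "do_not_deploy": 9}),
]


def top_option(theory):
    """The option a single theory ranks highest."""
    return max(OPTIONS, key=lambda o: theory.choiceworthiness[o])


def my_favorite_theory(theories):
    """Act on the single most credible theory and ignore all others."""
    return top_option(max(theories, key=lambda t: t.credence))


def expected_choiceworthiness(theories):
    """Pick the option with the highest credence-weighted score across theories."""
    def ec(option):
        return sum(t.credence * t.choiceworthiness[option] for t in theories)
    return max(OPTIONS, key=ec)


def moral_parliament(theories):
    """Simplified parliament: each theory casts votes equal to its credence for
    its own top option, and the option with the most votes wins.
    Bargaining between delegates is NOT modeled."""
    votes = {o: 0.0 for o in OPTIONS}
    for t in theories:
        votes[top_option(t)] += t.credence
    return max(OPTIONS, key=lambda o: votes[o])


for strategy in (my_favorite_theory, expected_choiceworthiness, moral_parliament):
    print(strategy.__name__, "->", strategy(THEORIES))
# my_favorite_theory -> deploy_now
# expected_choiceworthiness -> deploy_with_review
# moral_parliament -> do_not_deploy
```

On these made-up inputs the three strategies return three different options, which is one way to see their tradeoffs in this toy setup: My Favorite Theory discards the 55% of credence held by the other theories, expected choiceworthiness lets theory_B’s large negative score on deploy_now dominate and depends on the shared-scale assumption, and the simplified parliament ignores how strongly each theory prefers its top option.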

Social welfare functions (SWFs, Ch 6.8) are the second layer. Once an AI system acts on behalf of a population, individual welfares have to be aggregated into a collective measure. Utilitarian SWFs sum individual welfares directly; prioritarian SWFs give extra weight to gains for worse-off individuals. The lesson uses a worked loan-approval scenario to show how the choice of SWF can flip the ship/do-not-ship verdict on the same data. The chapter’s section on fairness (Ch 6.3) extends the picture with three named criteria (demographic parity, equalized odds, calibration) and the formal result that these criteria cannot, in general, all be satisfied at once; the known exceptions are special cases such as equal base rates across groups or a perfect predictor. Picking which to enforce is itself a value-loading decision.
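The way an SWF choice can flip a deployment verdict can be sketched in a few lines of Python. This is not the lesson’s own loan-approval scenario; the five welfare values per policy are hypothetical, and the square root is just one illustrative concave transform for the prioritarian SWF (any strictly increasing, concave function gives more weight to the worse-off). The “ship only on a strict improvement” rule is likewise an assumption of the sketch.

```python
import math

# Hypothetical welfare levels (arbitrary units) for five loan applicants,
# under the current policy and under a candidate new approval model.
STATUS_QUO = [4, 4, 16, 16, 25]
NEW_MODEL = [1, 1, 16, 16, 36]


def utilitarian(welfares):
    """Utilitarian SWF: the plain sum of individual welfares."""
    return sum(welfares)


def prioritarian(welfares, transform=math.sqrt):
    """Prioritarian SWF: sum of a concave transform of each welfare.
    Because the transform is concave, a unit of welfare counts for more
    when it goes to someone who is worse off. sqrt is an illustrative choice."""
    return sum(transform(w) for w in welfares)


def verdict(swf):
    """'ship' only if the new model scores strictly higher under the given SWF."""
    return "ship" if swf(NEW_MODEL) > swf(STATUS_QUO) else "do not ship"


print("utilitarian:", utilitarian(STATUS_QUO), "->", utilitarian(NEW_MODEL), verdict(utilitarian))
print("prioritarian:", prioritarian(STATUS_QUO), "->", prioritarian(NEW_MODEL), verdict(prioritarian))
# utilitarian: 65 -> 70 ship
# prioritarian: 17.0 -> 16.0 do not ship
```

The new model raises total welfare from 65 to 70, so the utilitarian SWF says ship. But the gain goes to the best-off applicant while the two worst-off applicants drop from 4 to 1, so the prioritarian SWF says do not ship. The data are identical in both cases; only the aggregation rule differs.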

The wider Ch 6 catalog (law, the economic engine, wellbeing, preferences, happiness) is summarized rather than worked through. The lesson notes specifically that wellbeing, preferences, and happiness are not interchangeable, and that L4 proxy-gaming failures often exploit the gap between them. The closing section connects back to L4. There, outer alignment was hard because the loss function does not capture the goal; L7 names a deeper reason, which is that there is no single goal to capture.

Where this fits

This is lesson 7 of 9, the first lesson of Phase 3 (ethics and governance). The previous lesson, Complex systems and emergent risk (L6), closed Phase 2 by interrogating the L5 Swiss-cheese independence assumption. The next lesson, Collective action and multi-agent dynamics (L8, Ch 7), takes the multi-stakeholder framing that L7 introduces and works through it formally (game theory, cooperation, conflict). Phase 3 closes at L9 (governance, Ch 8).

Before you start

Prerequisites: L6 (Complex systems). The L4 vocabulary (outer alignment, specification gaming, proxy gaming) is heavily used in L7’s connection back to the substrate question.

About the descriptive-not-prescriptive register in L7

L7 is one of the lessons where the Phase 0 §6 descriptive-not-prescriptive discipline matters most. The lesson body presents Hendrycks’ framing of moral uncertainty, the three strategies, and the SWF families as the chapter develops them; it does not advocate for any specific framework, strategy, or SWF as the correct choice. Claims are attributed to the chapter (“the chapter argues”, “Hendrycks names”, “the literature has shown”) rather than asserted in the editorial first person. The lesson’s job is to give the reader the vocabulary to make the value-loading choice deliberately, not to make the choice for them.

By the end, you’ll be able to

Explain moral uncertainty in two to three sentences without leaning on technical jargon
Name the three strategies for acting under moral uncertainty and the tradeoff each entails
Distinguish utilitarian from prioritarian SWFs and identify a deployment decision the choice changes
Recognize cost-benefit analysis as an incomplete approximation and name two specific blind spots
Connect to L4 outer alignment: the value-loading question Ch 6 is asking is the substrate question Ch 3.4 left open

Time and difficulty

Read time: about 13 minutes (the conceptual density is higher than in L6 because the philosophical vocabulary is new; the loan-approval worked example anchors it)
Practice time: about 14 minutes (one moral-parliament design exercise, one SWF-decision exercise on a worked loan-approval scenario, one cost-benefit critique exercise, ten flashcards)
Difficulty: deep (Stage E specialized; L4 + L6 vocabulary heavily assumed; some moral-philosophy concepts will be new even to readers comfortable with the technical material)