
References: beneficial AI and machine ethics

Dan Hendrycks. Introduction to AI Safety, Ethics, and Society. Taylor & Francis, 2024. Center for AI Safety, free to read at aisafetybook.com. L7 draws from Chapter 6 (Beneficial AI and Machine Ethics), primarily sections 6.8 (Social Welfare Functions) and 6.9 (Moral Uncertainty) for the load-bearing arguments, with the broader Ch 6 catalog (sections 6.2 Law, 6.3 Fairness, 6.4 Economic Engine, 6.5 Wellbeing, 6.6 Preferences, 6.7 Happiness) informing the lesson’s discussion of value-type distinctions.

| Chapter section | Topic | URL |
|---|---|---|
| Ch 6.3 | Fairness | aisafetybook.com/textbook/fairness |
| Ch 6.5 | Wellbeing | aisafetybook.com/textbook/wellbeing |
| Ch 6.8 | Social Welfare Functions | aisafetybook.com/textbook/social-welfare-functions |
| Ch 6.9 | Moral Uncertainty | aisafetybook.com/textbook/moral-uncertainty |

A1 discipline preserved: the quotes below are verbatim from the cited sections, with no paraphrasing inside quote marks.

  • §6.9 Moral Uncertainty, core framing: “AI systems should represent moral uncertainty to avoid acting on overconfidence, which could lead to outcomes that humans consider morally reprehensible.”
  • §6.8 Social Welfare Functions, core framing: “Social welfare functions aggregate individual wellbeing into overall societal wellbeing.”
  • §6.8 Social Welfare Functions, on cost-benefit analysis: “relies on financial proxies for wellbeing and neglects distributional impacts.”
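To make the §6.8 framing concrete, here is a minimal sketch of two ways to aggregate individual wellbeing into a single societal score. The function names, the square-root transform, and the two toy populations are illustrative assumptions, not taken from the chapter; any strictly concave transform would serve for the prioritarian case.

```python
import math

def utilitarian(wellbeing):
    """Utilitarian SWF: the plain sum of individual wellbeing levels."""
    return sum(wellbeing)

def prioritarian(wellbeing):
    """Prioritarian SWF (illustrative): sum of a concave transform, so gains
    to the worse-off count for more. Square root is an assumed choice."""
    return sum(math.sqrt(w) for w in wellbeing)

# Two hypothetical four-person populations (wellbeing in arbitrary units).
equal = [4, 4, 4, 4]      # total 16, evenly spread
unequal = [1, 1, 1, 16]   # total 19, concentrated in one person

for name, swf in [("utilitarian", utilitarian), ("prioritarian", prioritarian)]:
    better = "unequal" if swf(unequal) > swf(equal) else "equal"
    print(f"{name}: equal={swf(equal)}, unequal={swf(unequal)}, prefers {better}")
```

Under these assumptions the utilitarian sum ranks the unequal population higher because its total is larger, while the prioritarian sum ranks the equal population higher. The two functions see the same individual data and still disagree, which is why the choice of social welfare function carries weight.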

The three strategies for moral uncertainty (My Favorite Theory, maximizing expected choiceworthiness, moral parliament) and the named SWF families (utilitarian, prioritarian) are paraphrased from the chapter’s structure; the not-jointly-satisfiable fairness result is from the formal-fairness literature that Ch 6.3 draws on rather than from the chapter directly.
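The difference between the first two strategies can be sketched in a few lines. Everything in the example is a hypothetical assumption made for illustration: the theory names, the credences, the choiceworthiness scores, and above all the premise that scores are comparable across theories, which maximizing expected choiceworthiness needs and which is itself contested. The moral parliament is not modeled here because it requires a bargaining or voting procedure.

```python
# Hypothetical credences over three moral theories (must sum to 1).
credences = {"theory_a": 0.4, "theory_b": 0.3, "theory_c": 0.3}

# Hypothetical choiceworthiness of each act under each theory,
# assumed to be on a common scale across theories.
choiceworthiness = {
    "theory_a": {"act_x": 10, "act_y": 8},
    "theory_b": {"act_x": 0, "act_y": 9},
    "theory_c": {"act_x": 2, "act_y": 9},
}

def my_favorite_theory(credences, cw):
    """Act as the single most credible theory recommends; ignore the rest."""
    favorite = max(credences, key=credences.get)
    return max(cw[favorite], key=cw[favorite].get)

def expected_choiceworthiness(credences, cw, act):
    """Credence-weighted average of an act's choiceworthiness."""
    return sum(p * cw[theory][act] for theory, p in credences.items())

def maximize_expected_choiceworthiness(credences, cw):
    """Pick the act with the highest credence-weighted choiceworthiness."""
    acts = next(iter(cw.values())).keys()
    return max(acts, key=lambda a: expected_choiceworthiness(credences, cw, a))

print("My Favorite Theory picks:", my_favorite_theory(credences, choiceworthiness))
print("Expected choiceworthiness picks:",
      maximize_expected_choiceworthiness(credences, choiceworthiness))
```

With these numbers, My Favorite Theory follows theory_a and picks act_x, while maximizing expected choiceworthiness picks act_y, because the two less credible theories together strongly disfavor act_x. The gap illustrates the overconfidence risk: acting on one theory alone discards 60% of the agent's credence.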

Same posture as L1 through L6: the CAIS textbook is © 2026 Center for AI Safety, published by Taylor & Francis, free to read online with no explicit Creative Commons or reuse license. This lesson is a structural mirror with verbatim quotes anchored to specific chapter sections within fair-use limits, link-out only, no embed, no derivative runs.

Not required for L7; these are the foundational works for the topics Ch 6 brings into the AI safety discussion.

  • Moral uncertainty: William MacAskill, Krister Bykvist, and Toby Ord, Moral Uncertainty (Oxford University Press, 2020). The book-length treatment that the moral-parliament approach traces back to. The expected-choiceworthiness strategy gets its rigorous form there. Available widely.
  • Moral parliament specifically: Toby Newberry and Toby Ord, “The Parliamentary Approach to Moral Uncertainty” (Global Priorities Institute working paper, 2021), available at globalprioritiesinstitute.org. The most accessible recent treatment.
  • Social choice theory: Kenneth Arrow, Social Choice and Individual Values (Wiley, 1951; 3rd edition Yale University Press, 2012). The foundational text on aggregating individual preferences into collective decisions. Arrow’s impossibility theorem (with three or more options, no rule for aggregating individual rankings into a collective ranking satisfies unrestricted domain, the Pareto principle, independence of irrelevant alternatives, and non-dictatorship all at once) is the formal background for why SWF choice matters.
  • Prioritarianism: Derek Parfit, “Equality and Priority” (Ratio 1997), at academic.oup.com/ratio. The canonical philosophical argument for prioritarianism vs egalitarianism.
  • Fairness criteria in ML: Solon Barocas, Moritz Hardt, and Arvind Narayanan, Fairness and Machine Learning: Limitations and Opportunities (MIT Press, 2023). The textbook reference; the not-jointly-satisfiable result (demographic parity, equalized odds, and calibration cannot in general all hold at once, and are jointly satisfiable only in special cases such as equal base rates across groups) is worked formally there. Free draft at fairmlbook.org. An early impossibility result, stated for calibration and error-rate balance, is from Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan, “Inherent Trade-Offs in the Fair Determination of Risk Scores” (ITCS 2017), at arxiv.org/abs/1609.05807.
  • Wellbeing-preferences-happiness distinctions: Daniel Kahneman, Ed Diener, and Norbert Schwarz (editors), Well-Being: The Foundations of Hedonic Psychology (Russell Sage Foundation, 1999), is the academic anchor. For a more accessible treatment, Martha Nussbaum’s capabilities approach in Creating Capabilities (Harvard University Press, 2011) gives a wellbeing framework alternative to hedonic or preference-satisfaction accounts.
  • Cost-benefit analysis and its limits: Frank Ackerman and Lisa Heinzerling, Priceless: On Knowing the Price of Everything and the Value of Nothing (The New Press, 2004), is the classic public-policy critique of cost-benefit analysis that informs the chapter’s “neglects distributional impacts” framing.
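The not-jointly-satisfiable fairness result cited in the fairness entry above can be seen with simple arithmetic. The sketch below is an illustration under stated assumptions, not the formal proof in the cited works: the group names, base rates, and error rates are hypothetical, and positive predictive value (PPV) stands in as a binary-decision analogue of calibration.

```python
def positive_rate(base_rate, tpr, fpr):
    """Share of a group that receives a positive prediction."""
    return base_rate * tpr + (1 - base_rate) * fpr

def ppv(base_rate, tpr, fpr):
    """Positive predictive value: P(actually positive | predicted positive)."""
    return base_rate * tpr / positive_rate(base_rate, tpr, fpr)

# Hypothetical classifier with identical error rates in both groups,
# so equalized odds holds by construction.
TPR, FPR = 0.8, 0.2

# Hypothetical groups that differ only in base rate.
base_rates = {"group_a": 0.5, "group_b": 0.2}

for group, p in base_rates.items():
    print(f"{group}: positive rate={positive_rate(p, TPR, FPR):.2f}, "
          f"PPV={ppv(p, TPR, FPR):.2f}")
```

Under these assumed numbers the two groups get different positive-prediction rates, so demographic parity fails, and different PPVs, so a positive prediction means something different in each group. Equalizing error rates alone does not deliver the other two criteria once base rates differ.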

L8 enters Hendrycks Chapter 7 (Collective Action Problems), taking the multi-stakeholder framing that L7 introduced and working through it at the formal level. Game theory provides the analytical tools (Nash equilibria, prisoner’s dilemma, public goods); cooperation, conflict, and evolutionary pressures are the substantive topics. The moral-parliament approach from L7 becomes the coordination instrument in the multi-actor setting L8 names. The natural-selection-on-AIs sub-mechanism from L2 (race bucket) returns at L8 in formal dress.