References: The human-centered view

Source material

This lesson follows Stanford CS231n’s treatment of human-centered AI (Lecture 18, the closing lecture). This is also the closing lesson of Clawdemy’s Track 16.

Course: Stanford CS231n, “Deep Learning for Computer Vision”
Instructors: Fei-Fei Li, Ehsan Adeli, and Justin Johnson (Stanford University)
Course site: cs231n.stanford.edu
This lesson maps to: Lecture 18 (Human-Centered AI).

Attribution (Clawdemy-authored): Stanford CS231n: Deep Learning for Computer Vision, Fei-Fei Li, Ehsan Adeli, and Justin Johnson, Stanford University (cs231n.stanford.edu). CS231n does not publish a required citation string; this is the attribution Clawdemy uses.

A note on access and license

The current term’s lecture recordings are posted on Canvas for enrolled Stanford students. Recordings from previous years are publicly available on YouTube under YouTube’s standard license; Clawdemy links out rather than embedding or rehosting. The course notes (cs231n.github.io) and site are Stanford’s. No Creative Commons license is published for the lectures, so we treat them as link-only references.

Primary papers (cited by name and venue)

Adversarial examples

Adversarial examples (original paper). Szegedy et al., “Intriguing properties of neural networks” (ICLR 2014). First demonstration of adversarial examples in deep networks.
Fast Gradient Sign Method (FGSM). Goodfellow, Shlens, Szegedy, “Explaining and Harnessing Adversarial Examples” (ICLR 2015). A simple, influential single-step attack; the paper also proposed the linearity hypothesis as an explanation for adversarial examples.
Projected Gradient Descent (PGD). Madry et al., “Towards Deep Learning Models Resistant to Adversarial Attacks” (ICLR 2018). The projected-gradient-descent attack, and adversarial training against it as a defense.
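
As a rough illustration of the FGSM entry above, the sketch below applies the attack's update rule, x_adv = x + eps * sign(gradient of the loss with respect to x), to a tiny logistic-regression model in plain Python. The weights, the input, and the step size eps are made-up values chosen for illustration; they do not come from the paper or the lecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One FGSM step: shift every input coordinate by eps in the direction
    that increases the cross-entropy loss for the true label y (0 or 1)."""
    p = predict(w, b, x)
    # For logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]

# Hypothetical model and input (illustrative values only).
w = [2.0, -3.0, 1.0]
b = 0.0
x = [0.5, -0.2, 0.1]
y = 1

x_adv = fgsm(w, b, x, y, eps=0.3)
p_clean = predict(w, b, x)
p_adv = predict(w, b, x_adv)
print(f"clean p(class 1) = {p_clean:.3f}, adversarial p(class 1) = {p_adv:.3f}")
```

With these values, a perturbation of at most 0.3 per coordinate moves the prediction across the 0.5 decision boundary. PGD can be read as repeating a smaller step of this kind and projecting back into the allowed perturbation set after each step.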

Bias and fairness in computer vision

Gender Shades. Buolamwini, Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” (FAT* 2018). The disaggregated audit of commercial gender classifiers; foundational measurement-of-bias work.
FairFace. Karkkainen, Joo, “FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation” (WACV 2021). Balanced-by-construction face dataset.
Datasheets for Datasets. Gebru, Morgenstern, Vecchione, Vaughan, Wallach, Daumé III, Crawford, “Datasheets for Datasets” (CACM 2021 / arXiv 2018). Standardized dataset-documentation practice.
Inclusive Images. Shankar et al., “No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World” (NeurIPS 2017 Workshop on Machine Learning for the Developing World). Geographic-bias measurement.

Out-of-distribution detection

OOD detection baseline. Hendrycks, Gimpel, “A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks” (ICLR 2017). Established the maximum-softmax-probability baseline.
Deep anomaly detection. Several later lines of research build on the maximum-softmax-probability baseline above.
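
The maximum-softmax-probability baseline named above can be sketched in a few lines: score each input by the largest softmax probability and treat a low score as a sign that the input may be misclassified or out of distribution. The logits and the 0.7 threshold below are hypothetical, and the fixed threshold is a simplification assumed here for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    """Maximum softmax probability: higher means more confident."""
    return max(softmax(logits))

def flag_uncertain(logits, threshold=0.7):
    """True when the score falls below the (hypothetical) threshold."""
    return msp_score(logits) < threshold

# Made-up logits: one peaked prediction, one nearly flat prediction.
confident_logits = [6.0, 1.0, 0.5]
diffuse_logits = [1.2, 1.0, 0.9]

for name, logits in [("confident", confident_logits), ("diffuse", diffuse_logits)]:
    print(name, round(msp_score(logits), 3), flag_uncertain(logits))
```

In a real deployment the threshold would be chosen on held-out data rather than fixed in advance.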

Calibration

Calibration of modern networks. Guo, Pleiss, Sun, Weinberger, “On Calibration of Modern Neural Networks” (ICML 2017). Showed modern deep networks are systematically miscalibrated; introduced temperature scaling.
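
A minimal sketch of temperature scaling, under stated assumptions: the logits are divided by a single scalar T before the softmax, and T is picked on held-out validation data to minimize negative log-likelihood. The grid search below stands in for a proper optimizer, and the validation logits and labels are invented to mimic an overconfident model (very large margins, one of four predictions wrong).

```python
import math

def softmax(logits, T=1.0):
    """Softmax of logits divided by temperature T (T > 0)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, T):
    """Mean negative log-likelihood of the true labels at temperature T."""
    losses = [-math.log(softmax(row, T)[y]) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes validation NLL.
    Grid search is an assumption made here to keep the sketch short."""
    if grid is None:
        grid = [0.5 + 0.1 * i for i in range(96)]  # 0.5, 0.6, ..., 10.0
    return min(grid, key=lambda T: nll(logit_rows, labels, T))

# Hypothetical validation set: the last example is confidently wrong.
val_logits = [[8.0, 0.0], [7.0, 0.0], [0.0, 9.0], [6.0, 0.0]]
val_labels = [0, 0, 1, 1]

T_fit = fit_temperature(val_logits, val_labels)
nll_before = nll(val_logits, val_labels, 1.0)
nll_after = nll(val_logits, val_labels, T_fit)
print(f"T = {T_fit:.1f}, NLL before = {nll_before:.3f}, NLL after = {nll_after:.3f}")
```

Because every logit in a row is divided by the same positive T, the predicted class never changes; only the confidence attached to it does.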

Shortcut learning

Shortcut Learning survey. Geirhos et al., “Shortcut Learning in Deep Neural Networks” (Nature Machine Intelligence 2020). The wolf-vs-husky lineage and broader pattern of spurious-feature reliance.

Distribution shift

WILDS benchmark. Koh et al., “WILDS: A Benchmark of in-the-Wild Distribution Shifts” (ICML 2021). Standardized benchmark for distribution-shift evaluation.

Further study

Model Cards. Mitchell et al., “Model Cards for Model Reporting” (FAT* 2019). Standardized model-documentation practice paralleling datasheets-for-datasets.
The Trustworthy ML literature. A broader academic field has emerged around the topics in this lesson; the references above are entry points.

Sister-track connections

This lesson’s framings appear in their own forms across Clawdemy’s other tracks:

Track 6 (Privacy and Local-First AI). Covers privacy as a related but distinct concern: engineering controls for vendor data flows and threat models for AI systems. Frozen as of 2026-05-18 at 3 published lessons; the structural framings transfer.
Track 18 (planned, Reinforcement Learning). RL deployment has its own trust-and-safety considerations (exploration risks, reward hacking, distribution shift in environments).
Track 24 (planned, Image Generation and Multimodal). Production text-to-image systems have their own deployment considerations (provenance, watermarking, content moderation as engineering pipelines).

How we use this source

Clawdemy follows the ordering of CS231n’s Lecture 18 (failure modes, bias, fairness and accountability, trustworthy deployment) and applies a deliberate technical-not-policy discipline, per the Track 16 Phase 0 guardrail. Failure modes are presented as an engineering catalog with named responses. Bias is presented as a property of training data, architecture, and evaluation together, with measurement (disaggregated reporting) as the first step and mitigation as a three-category set of engineering choices. The 2018 Gender Shades audit is cited as the canonical example of disaggregated measurement revealing bias.

Two pieces are Clawdemy-authored. The aggregate-vs-disaggregated arithmetic example in practice (1,000 images, 4 sub-groups, an aggregate of 92 percent hiding a 26-point gap on group C) was written to make the measurement point concrete. The deployment-plan exercise is based on the standard practitioner consensus on production monitoring and human-in-the-loop design.

We do not engage substantively with policy, regulatory, or ethical-theory debates; those are deferred to the appropriate forums. We do not reproduce CS231n’s slides, figures, problem sets, or lecture text. Full attribution policy: see Doc/attribution-policy.md.
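
The aggregate-vs-disaggregated point can be checked with a few lines of arithmetic. The sketch below is an assumption-laden reconstruction: the group sizes and per-group correct counts are hypothetical, chosen only so that they reproduce the two figures named above (92 percent overall across 1,000 images, a 26-point gap on group C). It also assumes the metric is accuracy and that the gap is measured against the best-performing group; the lesson's own numbers may be laid out differently.

```python
# Hypothetical per-group counts: (number of images, number classified correctly).
groups = {
    "A": (400, 384),
    "B": (300, 285),
    "C": (100, 70),
    "D": (200, 181),
}

total = sum(n for n, _ in groups.values())
correct = sum(c for _, c in groups.values())
aggregate = correct / total

# Disaggregated reporting: one accuracy figure per group.
per_group = {name: c / n for name, (n, c) in groups.items()}

# Gap to the best-performing group, in percentage points.
best = max(per_group.values())
gaps_in_points = {name: round((best - acc) * 100, 1) for name, acc in per_group.items()}

print(f"aggregate accuracy: {aggregate:.1%} over {total} images")
for name in sorted(per_group):
    print(f"group {name}: {per_group[name]:.1%} (gap {gaps_in_points[name]} points)")
```

Group C contributes only 100 of the 1,000 images in this made-up split, which is why its 70 percent accuracy barely moves the headline figure.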