Human-centered computer vision, in brief

What you’ll learn

This is the closing lesson of Track 16 (lesson 16 of 16). It builds one capability: reading any deployment claim about a vision system and asking the right engineering questions. What was the training-data distribution? Which sub-groups were measured? How well calibrated is the confidence? What is the monitoring story? What is the failure-mode plan? The source curriculum is Stanford CS231n (cs231n.stanford.edu); this lesson maps to Lecture 18 (Human-Centered AI).

A scope note up front. Vision systems raise real questions involving law, regulation, ethics, and policy. Those questions matter, but they are not what this lesson is about. This lesson treats the engineering side: how failures arise mechanically, how bias emerges from training-data composition, architecture, and evaluation procedure together, how to measure those properties, and what design and process choices reduce them. Policy debates about what to permit, regulate, or restrict belong in their own forums with the right stakeholders (legal, ethics, regulatory). The engineering view does not replace those debates; it gives them sharper inputs.

The lesson opens with the failure-mode engineering catalog (distribution shift, adversarial examples, out-of-distribution inputs, shortcut learning, calibration / overconfidence). It then treats bias as a training-data engineering property, using the 2018 Gender Shades audit as the canonical disaggregated-measurement example and covering three categories of mitigation. Next it names the trustworthiness gap between benchmark accuracy and real-world reliability, along with the engineering layers that close it. It ends with a track-closing summary and cross-track routing.

Where this fits

This is lesson 16 of 16, the close of Track 16. It depends on lesson 8 (detection / segmentation / visualization, which set up the wolf-vs-husky-snow shortcut-learning example) and lesson 14 (vision and language, which introduced “bias is a property of training data” as the framing this lesson generalizes). It is the deployment-grade-thinking bridge that ties the entire track’s mechanics work to real-world engineering. After this lesson, Track 16 is fully drafted (16 lessons, with the 6-artifact contract applied throughout).

Before you start

Prerequisites: lessons 8 (detection / segmentation / visualization, including the wolf-vs-husky-snow shortcut-learning case) and 14 (vision and language, including the CLIP-bias-as-training-data-property framing). Lessons 3 (loss + optimization) and 4 (neural networks + backprop) are background: this lesson does not introduce new math but draws on the training-loop intuition built earlier.

About the math

Light. The body has no equations; it works in concepts and engineering decisions. Practice includes one aggregate-vs-disaggregated arithmetic exercise (4 sub-groups, 250 images each, computing per-group accuracy from raw correct-counts and observing how a 92 percent aggregate hides a 26-point sub-group gap). Addition, division, and a percentage calculation; no calculus.
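
The arithmetic can be sketched in a few lines of Python. The correct-counts below are hypothetical, chosen only so the totals reproduce the figures above (a 92 percent aggregate and a 26-point gap); the group names and the accuracy_pct helper are likewise illustrative assumptions, not taken from the practice set.

```python
# Hypothetical per-group results: 4 sub-groups, 250 images each.
# These correct-counts are illustrative assumptions, not the lesson's exercise data.
correct = {"group_a": 250, "group_b": 245, "group_c": 240, "group_d": 185}
images_per_group = 250


def accuracy_pct(num_correct: int, num_total: int) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * num_correct / num_total, 1)


# Disaggregated view: one accuracy figure per sub-group.
per_group = {g: accuracy_pct(c, images_per_group) for g, c in correct.items()}

# Aggregate view: pool every image into a single accuracy figure.
aggregate = accuracy_pct(sum(correct.values()), images_per_group * len(correct))

# Gap between the best and worst sub-group, in percentage points.
gap = round(max(per_group.values()) - min(per_group.values()), 1)

print(f"aggregate: {aggregate}%")
for group, acc in per_group.items():
    print(f"{group}: {acc}%")
print(f"best-to-worst gap: {gap} points")
```

Every group carries equal weight here because each has 250 images. With unequal group sizes the aggregate is pulled toward the largest group, so a small group's shortfall can be hidden even more thoroughly.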

By the end, you’ll be able to

Distinguish engineering scope from policy scope and articulate why both matter
Name the five failure modes and their engineering responses
Explain bias as a training-data engineering property and the three categories of mitigation
Compute aggregate vs disaggregated accuracy and see what aggregate hides
Articulate the trustworthiness gap and the engineering layers that close it

Time and difficulty

Read time: about 14 minutes
Practice time: about 15 minutes (a failure-mode diagnosis across 4 scenarios, a disaggregated-evaluation arithmetic exercise, a deployment-plan sketch incorporating monitoring and failure-mode planning, plus flashcards)
Difficulty: standard (the concepts are operational rather than mathematical; the lift is reasoning through deployment engineering as a multi-layer system rather than treating “the model” as the whole story)