| In scope | Out of scope |
|---|---|
| Engineering side: failure modes, bias as training-data property, measurement and mitigation as design choices | Policy debates: what to permit, regulate, restrict; ethical-theory disputes |
| Treats bias and trustworthiness as measurable design problems | Belongs in different forums with legal / ethics / regulatory stakeholders |
| Sharpens policy debates with measurable inputs | Does not replace policy decisions |

| Failure mode | Mechanism | Engineering response |
|---|---|---|
| Distribution shift | Deployment data differs from training data | Broader curation; domain adaptation; domain randomization; monitoring |
| Adversarial examples | Tiny crafted perturbations flip predictions | Adversarial training; certified robustness; input validation |
| Out-of-distribution (OOD) inputs | Input lies outside the training distribution; the model is confidently wrong | Calibrated confidence; OOD detection heads; ensemble disagreement |
| Shortcut learning | Model latches onto a spurious correlation (e.g. wolf vs. husky decided by snow in the background) | Dataset curation; augmentation; eval splits that test shortcut reliance |
| Calibration / overconfidence | Confidence scores misaligned with actual accuracy | Temperature scaling; isotonic regression; deep ensembles |
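
The calibration row can be made concrete with expected calibration error (ECE), which measures how far stated confidence sits from observed accuracy. The following is a minimal sketch of one common equal-width-bin formulation, not a reference implementation; the function name, the bin count, and the toy data are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(hit)))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Hypothetical toy data: the model claims 0.9 but is right half the time.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
# Same outcomes with confidence that matches the observed accuracy.
calibrated = expected_calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(f"overconfident ECE={overconfident:.2f}, calibrated ECE={calibrated:.2f}")
```

A lower ECE after temperature scaling or isotonic regression is one way to check that the chosen response actually improved calibration on held-out data.
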

| Origin | Mechanism |
|---|---|
| Training data composition | Model fits data’s statistical structure; uneven data → uneven model |
| Web-scraped datasets | Reflect demographics / geographies / contexts of the web, which are not uniform |
| Different data, different bias profiles | Bias follows the data, so it is an engineering property that changes with the data, not a permanent one |
| Gender Shades (2018) | Disaggregated gender-classification accuracy by skin tone + gender; error rates several times higher for darker-skinned women than lighter-skinned men; traced to training-set skew |

| Practice | What it does |
|---|---|
| Per-group accuracy reporting | Reveals sub-group disparities that the aggregate hides |
| Multi-attribute breakdown | Breaks results down by demographic, geographic, and contextual sub-population at once |
| Audits (Gender Shades style) | Pre-deployment systematic measurement |
| Datasheets for datasets (Gebru et al. 2018) | Standardize documenting composition + intended use |

Aggregate: 920/1,000 = 92.0%. Per-group:

| Group | Accuracy |
|---|---|
| A | 98.8% |
| B | 98.0% |
| C | 72.0% |
| D | 99.2% |

The aggregate hides a gap of at least 26 points between group C and every other group; group C is the engineering concern.
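
The disaggregation above can be sketched in a few lines. The group sizes are an assumption: the text gives only the percentages and the 920/1,000 total, and four equal groups of 250 happen to reproduce both. The function and variable names are illustrative.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, is_correct). Returns (aggregate, per_group)."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for group, is_correct in records:
        counts[group] += 1
        hits[group] += int(is_correct)
    per_group = {g: hits[g] / counts[g] for g in counts}
    aggregate = sum(hits.values()) / sum(counts.values())
    return aggregate, per_group


# ASSUMPTION: 250 examples per group (not stated in the text, but consistent
# with the reported percentages and the 920/1,000 aggregate).
GROUP_SIZE = 250
correct_counts = {"A": 247, "B": 245, "C": 180, "D": 248}
records = [
    (group, i < n_correct)  # first n_correct examples of each group are hits
    for group, n_correct in correct_counts.items()
    for i in range(GROUP_SIZE)
]

aggregate, per_group = accuracy_by_group(records)
worst_group = min(per_group, key=per_group.get)
print(f"aggregate: {aggregate:.1%}")
for group in sorted(per_group):
    print(f"group {group}: {per_group[group]:.1%}")
print(f"worst group: {worst_group}")
```

Reporting the worst group next to the aggregate keeps the disparity visible even when the headline number looks healthy.
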

| Category | Examples |
|---|---|
| Data-side | Balanced curation (Inclusive Images, FairFace); targeted collection for underrepresented sub-groups; datasheets for datasets |
| Model-side | Adversarial debiasing; loss reweighting by sub-group; multi-task training with fairness-aware auxiliary objectives |
| Evaluation-side | Disaggregated reporting; stress-test sets probing weak sub-groups; pre-deployment audits |
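
As an illustration of the model-side row, loss reweighting by sub-group can be sketched with inverse-frequency weights. This is one simple weighting scheme among several, shown under the assumption that each example carries a group label; the function names and toy losses are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Weight each group by n / (k * n_g) so every group contributes equally."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}


def reweighted_mean_loss(losses, groups, weights):
    """Weighted average of per-example losses using per-group weights."""
    weighted_sum = sum(weights[g] * loss for loss, g in zip(losses, groups))
    weight_total = sum(weights[g] for g in groups)
    return weighted_sum / weight_total


# Hypothetical batch: 8 majority-group examples with low loss,
# 2 minority-group examples with high loss.
groups = ["majority"] * 8 + ["minority"] * 2
losses = [0.1] * 8 + [1.0] * 2

weights = inverse_frequency_weights(groups)
plain = sum(losses) / len(losses)
reweighted = reweighted_mean_loss(losses, groups, weights)
print(f"plain mean loss={plain:.2f}, reweighted mean loss={reweighted:.2f}")
```

With these weights the batch loss equals the mean of the per-group mean losses, so the minority group's errors are no longer diluted by the majority group's size.
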

| Gap | Detail |
|---|---|
| Benchmark vs reality | Test set ≠ production; drift, edge cases, sub-group disparities surface in deployment |
| Closed by | Monitoring; calibration; human-in-the-loop; explicit failure-mode plans |
| What to monitor | Per-group accuracy over time; OOD-input rate; confidence distribution; downstream metrics |
| Calibration matters | Lets the system know when to defer (escalate to a human or refuse to act) |
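
One of the signals listed under "What to monitor" can be tracked without production labels: the rate of flagged inputs (low confidence or OOD) over a rolling window. The sketch below is a minimal illustration; the class name, window size, baseline rate, and margin are all hypothetical choices that a real deployment would tune.

```python
from collections import deque


class RollingRateMonitor:
    """Tracks the fraction of flagged events over the last `window` inputs."""

    def __init__(self, window, baseline_rate, margin):
        self.events = deque(maxlen=window)  # oldest events drop off automatically
        self.baseline_rate = baseline_rate
        self.margin = margin

    def update(self, flagged):
        self.events.append(bool(flagged))

    def rate(self):
        return sum(self.events) / len(self.events) if self.events else 0.0

    def drifted(self):
        # Only alert once the window is full, to avoid noisy early readings.
        window_full = len(self.events) == self.events.maxlen
        return window_full and self.rate() > self.baseline_rate + self.margin


# Hypothetical settings: 5% of inputs were flagged during validation.
monitor = RollingRateMonitor(window=100, baseline_rate=0.05, margin=0.05)

for i in range(100):  # healthy traffic: 1 in 20 inputs flagged
    monitor.update(i % 20 == 0)
healthy_alert = monitor.drifted()

for i in range(100):  # shifted traffic: roughly 1 in 3 inputs flagged
    monitor.update(i % 3 == 0)
shifted_alert = monitor.drifted()

print(f"healthy alert: {healthy_alert}, shifted alert: {shifted_alert}")
```

The same pattern applies to per-group accuracy over time once delayed ground-truth labels arrive, with one monitor per group.
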

| Layer | Purpose |
|---|---|
| Trained model | The base classifier / detector / generator |
| Data pipeline | Production input flow; pre-processing |
| Evaluation suite | Pre-deployment + ongoing |
| Monitoring dashboard | Drift detection in production |
| Calibration | Make confidence usable for deferral |
| Human-review queue | Route uncertain predictions to people |
| Failure-mode plan | Rollback procedure; graceful degradation; escalation chain |
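
The calibration and human-review layers meet at a deferral rule: act automatically when confidence is high enough, otherwise queue the prediction for a person. The sketch below assumes confidence scores are already calibrated; the 0.8 threshold, the `Decision` type, and the route names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str
    confidence: float
    route: str  # "auto" or "human_review"


def route_prediction(label, confidence, threshold=0.8):
    """Send confident predictions through; defer the rest to human review."""
    route = "auto" if confidence >= threshold else "human_review"
    return Decision(label, confidence, route)


def automation_rate(decisions):
    """Fraction of predictions handled without human review."""
    if not decisions:
        return 0.0
    return sum(d.route == "auto" for d in decisions) / len(decisions)


# Hypothetical model outputs: (predicted label, calibrated confidence).
predictions = [("cat", 0.97), ("dog", 0.55), ("cat", 0.81), ("dog", 0.79)]
decisions = [route_prediction(label, conf) for label, conf in predictions]

for d in decisions:
    print(f"{d.label} ({d.confidence:.2f}) -> {d.route}")
print(f"automation rate: {automation_rate(decisions):.0%}")
```

Raising the threshold trades automation rate for fewer confident errors, which is only a meaningful trade if the confidence scores are calibrated in the first place.
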

| Pitfall | Reality |
|---|---|
| Bias is one-and-done | Bias is a property of data + model + eval; it shifts when any of them shifts; it is multi-dimensional; continuous measurement is the right posture |
| High test accuracy = guarantee | Distribution shift, shortcut learning, miscalibration, and tail events live in the gap between test set and deployment |
| “Trustworthy AI” is one problem | Several problems with different mitigations: distribution shift, robustness, fairness, OOD, calibration, interpretability |
| Engineering = policy | Engineering treats measurable design problems; policy decides acceptability. Both are needed; this lesson covers only the engineering view |

| Phase | Lessons | Theme |
|---|---|---|
| 1: Foundations | L1-L4 | General-purpose image classifier |
| 2: How machines see | L5-L9 | Vision-specific architecture (conv, CNNs, sequence, detection, video) |
| 3: Generating and grounding vision | L10-L15 | Harder tasks (self-supervised, GAN/VAE, diffusion, 3D, vision+language, world modeling) |
| Close | L16 | Engineering-side human-centered view |

| To go deeper | Track |
|---|---|
| Foundations review | T11 (Neural Network Intuition) |
| Attention + transformers (in depth) | T5 (AI Foundations) |
| Transformer mechanics | T14 (planned, Practical Transformers) |
| Model-based RL | T18 (planned, Reinforcement Learning) |
| ELBO + score-based derivations | T19 (planned, Generative Modeling) |
| Production text-to-image + multimodal + video | T24 (planned, Image Generation and Multimodal) |

Vision systems work mechanically and fail mechanically. The engineering view treats failure modes, bias, and trustworthiness as measurable design problems with engineering responses (data curation, calibration, monitoring, sub-group reporting, human-in-the-loop review, failure-mode plans), and leaves policy debates to the appropriate stakeholders.