References: Loss and optimization

Source material

This lesson follows Stanford CS231n’s treatment of loss functions and optimization, covered across the linear-classification and optimization notes.

Course: Stanford CS231n, “Deep Learning for Computer Vision”
Instructors: Fei-Fei Li, Ehsan Adeli, and Justin Johnson (Stanford University)
Course site: cs231n.stanford.edu
Course notes (loss functions): cs231n.github.io/linear-classify (SVM and softmax / cross-entropy losses, regularization).
Course notes (optimization): cs231n.github.io/optimization-1 (the random search → random local search → gradient descent ladder; loss landscape; numerical vs analytic gradient; learning rate effects; mini-batch / SGD).
This lesson maps to: Lecture 3 (Regularization and Optimization).

Attribution (Clawdemy-authored): Stanford CS231n: Deep Learning for Computer Vision, Fei-Fei Li, Ehsan Adeli, and Justin Johnson, Stanford University (cs231n.stanford.edu). CS231n does not publish a required citation string; this is the attribution Clawdemy uses.

A note on access and license

The current term’s lecture recordings are posted on Canvas for enrolled Stanford students. Recordings from previous years are publicly available on YouTube under YouTube’s standard license; Clawdemy links out rather than embedding or rehosting. The course notes (cs231n.github.io) and the course site belong to Stanford. No Creative Commons license is published for the lectures, so we treat them as link-only references.

Further study

CS231n linear-classification notes. cs231n.github.io/linear-classify gives a longer, illustrated treatment of both losses, including geometric pictures of SVM hinges.
CS231n optimization notes. cs231n.github.io/optimization-1 carries the random / local / gradient descent ladder and the CIFAR-10 demo numbers cited in this lesson (the random-search accuracy ladder and the loss-versus-step-size comparison).
Neural Network Intuition (Track 11, Clawdemy). Lessons 5-7 (cost, the cost landscape, gradient descent step by step) cover the same gradient descent picture in a generic neural-network setting. Readers coming to T16 from T11 will recognize the same loop, now applied to a vision classifier.

How we use this source

Clawdemy follows CS231n’s pedagogical ordering (define the loss, then optimize it), covers the same two losses (SVM and softmax / cross-entropy), walks the same random / local / gradient ladder, and cites the same CIFAR-10 demo numbers (the 15.5% / 21.4% accuracy ladder; the 2.20 → 1.65 vs. > 2500 loss-versus-step-size comparison). The worked-by-hand numerical examples (the [0.8, 0.3, -0.4] SVM and softmax computations, the [1.5, -0.7, -0.8] zero-SVM-loss case in practice, and the W and gradient values for the one-step exercise) are Clawdemy-authored against the CS231n framing. We do not reproduce CS231n’s slides, figures, problem sets, or lecture text. Full attribution policy: see Doc/attribution-policy.md.