Linear classifiers, in brief

What you’ll learn

This is lesson 2 of Phase 1 (Foundations for vision). Lesson 1 made the case that vision must be data-driven; this lesson builds the simplest concrete machine that carries that approach out. The one capability it builds: you will be able to compute a linear-classifier prediction by hand and explain exactly what each piece of s = W · x + b is doing. That equation is the seed every later vision model grows from, and the final layer of nearly every modern vision model still has this same form. The source curriculum is Stanford CS231n, cs231n.stanford.edu.

The lesson defines the score function and grounds it in CIFAR-10’s shapes (an image as 3072 pixel numbers, ten classes, W as a 10-by-3072 matrix). It walks through one small prediction step by step, shows that each row of W is a learned template for one class (visualizing CIFAR-10’s actual learned templates produces the famously ghostly two-headed horse), and explains the geometric hyperplane view. It ends on the structural limit, one template per class, that motivates everything that follows.
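The shapes named above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the lesson itself: the helper name matmul_shape is hypothetical, and the 32-by-32-by-3 breakdown of the 3072 pixel numbers is the standard CIFAR-10 image size rather than something stated on this page.

```python
# CIFAR-10 image: 32 x 32 pixels, 3 color channels, flattened to one column.
D = 32 * 32 * 3      # 3072 pixel numbers per image
K = 10               # number of classes

def matmul_shape(left, right):
    """Shape of (left times right); raises if the inner dimensions disagree.

    Hypothetical helper for illustration: shapes are (rows, cols) tuples.
    """
    (rows, inner_left), (inner_right, cols) = left, right
    if inner_left != inner_right:
        raise ValueError(f"inner dimensions differ: {inner_left} vs {inner_right}")
    return (rows, cols)

W_shape = (K, D)     # one row (template) per class
x_shape = (D, 1)     # the flattened image as a column
b_shape = (K, 1)     # one bias per class

s_shape = matmul_shape(W_shape, x_shape)   # K-by-D times D-by-1 gives K-by-1
assert s_shape == b_shape                  # so adding b is well-defined

num_weights = K * D            # entries in W
num_params = num_weights + K   # plus one bias per class
print(s_shape, num_weights, num_params)    # (10, 1) 30720 30730
```

Swapping the two arguments to matmul_shape raises an error, which is the same mistake the shape check is meant to catch on paper.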

Where this fits

This is lesson 2 of 16, and the second lesson of Phase 1. It depends directly on lesson 1’s “data-driven approach” framing: the linear classifier is the simplest learner that approach produces. The next lesson, How a classifier learns: loss and optimization, defines exactly what “predictions match labels” means as a single number (the loss) and shows how to nudge W and b to make it smaller. Phase 1 closes with neural networks and backpropagation, after which Phase 2 introduces the convolutional networks that finally break the multi-modal limit named here.

Before you start

Prerequisites: lesson 1 of this track (Why seeing is hard for machines), which sets up the data-driven approach this lesson realizes. Neural Network Intuition (Track 11) is helpful soft background: the per-neuron w · x + b from its lesson 3 is the same computation, done here once per class.

About the math

Light, but more arithmetic than lesson 1. The only operations are multiplying pairs of numbers and adding them up (a dot product), plus checking matrix shapes (K-by-D times D-by-1 = K-by-1). The body works one tiny prediction by hand, the practice section walks you through another with different numbers, and a parameter-counting exercise multiplies pixel-count times class-count. Nothing beyond arithmetic and shape-bookkeeping is required.
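To give a sense of the arithmetic involved, here is a rough sketch of one tiny prediction: 3 classes and a 4-pixel "image". Every number in it is invented for illustration (these are not the lesson's worked values or learned weights), and the names dot, scores, and predicted are assumptions of this sketch.

```python
def dot(row, x):
    """Multiply matching pairs of numbers and add them up."""
    return sum(w * xi for w, xi in zip(row, x))

# Made-up values: K = 3 classes, D = 4 pixels.
W = [
    [ 1, 0, -1,  2],   # template for class 0
    [ 0, 2,  1, -1],   # template for class 1
    [-1, 1,  0,  1],   # template for class 2
]
x = [2, 1, 0, 3]       # the flattened image
b = [1, -2, 0]         # one bias per class

# s = W . x + b, computed one class (one row of W) at a time.
scores = [dot(row, x) + bias for row, bias in zip(W, b)]

# class 0: (1*2 + 0*1 + -1*0 + 2*3) + 1    =  8 + 1 =  9
# class 1: (0*2 + 2*1 + 1*0 + -1*3) + (-2) = -1 - 2 = -3
# class 2: (-1*2 + 1*1 + 0*0 + 1*3) + 0    =  2 + 0 =  2
predicted = scores.index(max(scores))      # class with the highest score

print(scores, predicted)                   # [9, -3, 2] 0
```

Each score uses only its own row of W and its own bias, which is why the rows can be read as separate per-class templates.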

By the end, you’ll be able to

Write the score function s = W · x + b and name what each symbol is, with CIFAR-10 shapes
Compute a small linear-classifier prediction by hand
Explain why each row of W is a learned per-class template
Describe the geometric (hyperplane) view and the bias’s role in it
Identify the one-template-per-class limit and explain why it motivates the lessons ahead

Time and difficulty

Read time: about 13 minutes
Practice time: about 15 minutes (a fresh worked dot-product prediction, parameter-counting arithmetic at three scales, a multi-modal reasoning question, plus flashcards)
Difficulty: standard (the math is multiplication and addition; the conceptual jump is the template interpretation and seeing the limit clearly)