Cheatsheet: Linear classifiers

| Symbol | Meaning | Shape (CIFAR-10) |
| --- | --- | --- |
| x | image flattened to a column of pixel values | [3072 by 1] |
| W | weight matrix (one row per class) | [10 by 3072] |
| b | bias vector (one number per class) | [10 by 1] |
| s | output scores, one per class | [10 by 1] |

s = W · x + b. Prediction = argmax(s).
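A minimal NumPy sketch of the scoring rule above, using the CIFAR-10 shapes from the table. The weights and the image are random stand-ins (an assumption for illustration, not trained values), so the predicted index carries no meaning beyond showing the mechanics.

```python
import numpy as np

K, D = 10, 3072  # CIFAR-10: 10 classes, 32 * 32 * 3 pixel values

# Hypothetical random data standing in for trained weights and a real image.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(K, D))  # weight matrix, one row per class
b = np.zeros((K, 1))                     # bias vector, one number per class
x = rng.random((D, 1))                   # flattened image as a column

s = W @ x + b                    # scores, shape [10 by 1]
prediction = int(np.argmax(s))   # index of the class with the highest score
```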

| Thing | What it is |
| --- | --- |
| One row of W | Learned template for one class; visualize by reshaping back to image dims |
| One column of W | All K classes’ weights for ONE pixel |
| The dot product (row of W) · x | Score for that class: how well image matches template |
| The bias b | Per-class default offset (lean), independent of input |
| Prediction | The class with the highest score |
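The row and column readings of W can be sketched as follows. This assumes the image was flattened in row-major (height, width, channel) order, which the cheatsheet does not specify, and it uses random numbers in place of trained weights.

```python
import numpy as np

K, height, width, channels = 10, 32, 32, 3
D = height * width * channels  # 3072

# Hypothetical random weights standing in for a trained W.
rng = np.random.default_rng(0)
W = rng.normal(size=(K, D))

# One row of W: the template for class 3, reshaped back to image dims.
# Assumes (height, width, channel) row-major flattening.
template = W[3].reshape(height, width, channels)

# One column of W: all K classes' weights for a single pixel value.
pixel_weights = W[:, 0]

# Rescale the template to [0, 1] so it can be displayed as an image.
lo, hi = template.min(), template.max()
template_img = (template - lo) / (hi - lo)
```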

| Before | After |
| --- | --- |
| s = W · x + b, W is [K by D], x is [D by 1], b is [K by 1] | s = W · x, W is [K by D+1] (b appended as the last column), x is [D+1 by 1] (last entry = 1) |

Same scores; one matrix instead of two.
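A small check that folding the bias into W leaves the scores unchanged. The sizes K = 3 and D = 4 and the random values are hypothetical, chosen only to keep the example small.

```python
import numpy as np

K, D = 3, 4  # small hypothetical sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(K, D))
b = rng.normal(size=(K, 1))
x = rng.normal(size=(D, 1))

W_aug = np.hstack([W, b])                # [K by D+1]: b becomes the last column
x_aug = np.vstack([x, np.ones((1, 1))])  # [D+1 by 1]: last entry = 1

s_before = W @ x + b      # two-parameter form
s_after = W_aug @ x_aug   # single-matrix form
same = np.allclose(s_before, s_after)
```

The appended 1 in x multiplies the appended column of W, so each score picks up exactly its own bias entry.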

| View | What it says |
| --- | --- |
| Template view | Each row of W is a learned template; the score is a dot product (template matching) |
| Geometric view | Each row of W plus its bias entry defines one hyperplane in pixel space; the sign of that class's score tells which side the image lands on |

| Setup | Total learned numbers |
| --- | --- |
| 8 × 8 grayscale, 3 classes | 192 + 3 = 195 |
| 32 × 32 × 3 (CIFAR-10), 10 classes | 30,720 + 10 = 30,730 |
| 224 × 224 × 3, 1000 classes (ImageNet-scale) | 150,528,000 + 1000 = 150,529,000 |
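The totals follow from K × D weights plus K biases, where D is the number of pixel values. The helper below is a hypothetical name used only for illustration; it reproduces the three rows.

```python
def linear_param_count(height, width, channels, num_classes):
    """Count learned numbers in a linear classifier: K*D weights plus K biases."""
    d = height * width * channels  # length of the flattened image
    weights = num_classes * d      # one row of D weights per class
    biases = num_classes           # one bias per class
    return weights + biases


small = linear_param_count(8, 8, 1, 3)              # 192 + 3 = 195
cifar = linear_param_count(32, 32, 3, 10)           # 30,720 + 10 = 30,730
imagenet = linear_param_count(224, 224, 3, 1000)    # 150,528,000 + 1000 = 150,529,000
```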

| Pitfall | Reality |
| --- | --- |
| Templates are photographs of class members | They are learned compromises that maximize training-set score (e.g. a ghostly two-headed horse) |
| Confusing rows and columns of W | Row = one class; column = all classes’ weights for one pixel |
| Scores are probabilities | They are unbounded real numbers; only the ranking matters. Softmax converts them to probabilities later |
| Linear is enough | ~40 percent accuracy on CIFAR-10. One template per class is the structural ceiling |
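To illustrate the scores-versus-probabilities pitfall, here is a sketch of the softmax step that turns raw scores into probabilities. The score values are made up, and subtracting the maximum before exponentiating is a common numerical-stability choice rather than something the cheatsheet prescribes.

```python
import numpy as np

def softmax(scores):
    """Map unbounded real scores to positive values that sum to 1."""
    shifted = scores - np.max(scores)  # does not change the result; avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

s = np.array([3.2, -1.5, 0.7])  # hypothetical raw scores: unbounded, can be negative
p = softmax(s)                  # probabilities over the same three classes
```

Softmax preserves the ranking, so the predicted class is the same whether argmax is taken over s or over p.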

Multiply, add, pick the largest: a linear classifier is template-matching with learned templates, and it is the last layer of nearly every modern vision model.