| Symbol | Meaning | Shape (CIFAR-10) |
|---|---|---|
| x | image flattened to a column of pixel values | [3072 by 1] |
| W | weight matrix (one row per class) | [10 by 3072] |
| b | bias vector (one number per class) | [10 by 1] |
| s | output scores, one per class | [10 by 1] |

s = W · x + b. Prediction = argmax(s).
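
As a minimal sketch of the score function above, the snippet below computes s = W · x + b and takes the argmax. The function names `scores` and `predict` and the toy 3-class, 4-pixel numbers are assumptions made up for illustration. Plain lists are used so each multiply and add is visible; an array library would normally do this as a single matrix product.

```python
def scores(W, x, b):
    """Return s = W . x + b, with W given as a list of K rows of length D."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, x, b):
    """Return the index of the highest-scoring class (argmax of s)."""
    s = scores(W, x, b)
    return max(range(len(s)), key=lambda k: s[k])


# Toy example (made-up numbers): K = 3 classes, D = 4 pixels.
W = [
    [1.0, 0.0, -1.0, 0.5],
    [0.0, 2.0, 0.0, -1.0],
    [-1.0, 1.0, 1.0, 0.0],
]
b = [0.5, -1.0, 0.0]
x = [2.0, 1.0, 0.0, 4.0]

s = scores(W, x, b)        # [4.5, -3.0, -1.0]
label = predict(W, x, b)   # 0, the class with the largest score
```

For CIFAR-10 the same code would apply with 10 rows of 3072 weights each.
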

| Thing | What it is |
|---|---|
| One row of W | Learned template for one class; visualize by reshaping back to image dims |
| One column of W | All K classes’ weights for ONE pixel |
| The dot product (row of W) · x | Score for that class: how well image matches template |
| The bias b | Per-class default offset (a built-in lean toward or away from that class), independent of the input |
| Prediction | The class with the highest score |
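
To make the roles in this table concrete, here is a small sketch on a made-up 2 × 2 grayscale image with 2 classes. The helper name `dot` and all numbers are illustrative assumptions. It pulls out one row (a template), reshapes it back to image dimensions, pulls out one column, and shows that the bias alone sets the scores for an all-zero image.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * c for a, c in zip(u, v))


# Toy setup (made-up numbers): K = 2 classes, D = 4 pixels (a 2 x 2 image).
height, width = 2, 2
W = [
    [0.5, -1.0, 2.0, 0.0],   # row 0: template for class 0
    [1.5, 0.0, -0.5, 1.0],   # row 1: template for class 1
]
b = [0.25, -0.75]
x = [1.0, 2.0, 3.0, 4.0]

# One row of W is the template for a single class.
template = W[1]

# Reshape the template back to image dimensions to visualize it.
template_as_image = [template[r * width:(r + 1) * width] for r in range(height)]

# One column of W holds every class's weight for a single pixel (here pixel 2).
weights_for_pixel_2 = [row[2] for row in W]

# Score for class 1 = (row 1 of W) . x + b[1].
score_class_1 = dot(template, x) + b[1]

# For an all-zero image the dot products vanish, so the scores equal the biases.
zero_image = [0.0] * (height * width)
scores_for_zero_image = [dot(row, zero_image) + bk for row, bk in zip(W, b)]
```

Here `score_class_1` comes out to 3.25 and `scores_for_zero_image` equals `b`, which is the sense in which the bias is independent of the input.
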

| Before | After |
|---|---|
| s = W · x + b, W is [K by D], x is [D by 1], b is [K by 1] | s = W · x, W is [K by D+1] (last column = b), x is [D+1 by 1] (last entry = 1) |

Same scores; the bias is folded into W, so there is one matrix to learn instead of a matrix and a vector.
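
The equivalence can be checked numerically. The sketch below uses made-up numbers and a hypothetical helper `matvec`; it appends b as an extra column of W and a constant 1 to x, then compares the two score vectors.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


# Toy setup (made-up numbers): K = 2 classes, D = 3 pixels.
W = [
    [1.0, -2.0, 0.5],
    [0.0, 1.0, 3.0],
]
b = [0.5, -1.5]
x = [2.0, 1.0, 4.0]

# Before: separate weights and bias.
s_before = [wx + bk for wx, bk in zip(matvec(W, x), b)]

# After: b becomes the last column of W, and x gains a trailing 1.
W_aug = [row + [bk] for row, bk in zip(W, b)]   # [K by D+1]
x_aug = x + [1.0]                               # [D+1 by 1]
s_after = matvec(W_aug, x_aug)
```

Both `s_before` and `s_after` evaluate to `[2.5, 11.5]`: the trailing 1 in x multiplies the appended bias column, which reproduces the "+ b" term.
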

| View | What it says |
|---|---|
| Template view | Each row of W is a learned template; score is dot product (template matching) |
| Geometric view | Each row of W plus its bias defines one hyperplane in pixel space; the sign of the score says which side the image lands on |
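
A minimal sketch of the geometric view, assuming a 2-pixel "image" so the picture stays easy to draw: one row w of W with its bias b defines the hyperplane w · x + b = 0, and dividing the score by the length of w gives the signed distance from the image to that hyperplane. The function name `signed_distance` and the numbers are illustrative assumptions.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return score / math.sqrt(sum(wi * wi for wi in w))


# Toy 2-pixel example (made-up numbers).
w = [3.0, 4.0]   # one row of W; its length is 5
b = -5.0         # that class's bias

positive_side = signed_distance(w, b, [3.0, 4.0])    # 4.0
negative_side = signed_distance(w, b, [0.0, 0.0])    # -1.0
on_hyperplane = signed_distance(w, b, [3.0, -1.0])   # 0.0
```

The sign tells which side of the hyperplane the image falls on, and the magnitude tells how far from it the image lies.
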

| Setup | Total learned numbers (weights + biases) |
|---|---|
| 8 × 8 grayscale, 3 classes | 192 + 3 = 195 |
| 32 × 32 × 3 (CIFAR-10), 10 classes | 30,720 + 10 = 30,730 |
| 224 × 224 × 3, 1000 classes (ImageNet-scale) | 150,528,000 + 1000 = 150,529,000 |
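
The counts in this table follow from the shapes of W and b. A short sketch that reproduces them, where the function name `count_parameters` is an assumption for illustration:

```python
def count_parameters(height, width, channels, num_classes):
    """Return (weights, biases, total) for a linear classifier on flattened images."""
    pixels = height * width * channels   # D: length of the flattened image
    weights = num_classes * pixels       # W is [K by D]
    biases = num_classes                 # b is [K by 1]
    return weights, biases, weights + biases


small = count_parameters(8, 8, 1, 3)                    # (192, 3, 195)
cifar10 = count_parameters(32, 32, 3, 10)               # (30720, 10, 30730)
imagenet_scale = count_parameters(224, 224, 3, 1000)    # (150528000, 1000, 150529000)
```

The weight count grows with the product of image size and class count, while the bias count grows only with the number of classes.
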

| Pitfall | Reality |
|---|---|
| Templates are photographs of class members | They are learned compromises that maximize training-set score (e.g. the ghostly two-headed horse) |
| Confusing rows and columns of W | Row = one class; column = all classes’ weights for one pixel |
| Scores are probabilities | They are unbounded real numbers; only their ranking matters. Softmax converts them to probabilities later |
| Linear is enough | About 40 percent accuracy on CIFAR-10. One template per class is the structural ceiling |
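
To illustrate the point that scores are not probabilities, here is a minimal softmax sketch on made-up scores. The names `softmax`, `raw_scores`, and `probs` are assumptions for illustration. It shows that the conversion yields values that sum to 1 while leaving the ranking unchanged.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)   # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [4.5, -3.0, -1.0]   # unbounded and possibly negative (made-up numbers)
probs = softmax(raw_scores)

# The ranking is preserved: the top score is still the top probability.
best_by_score = max(range(len(raw_scores)), key=lambda k: raw_scores[k])
best_by_prob = max(range(len(probs)), key=lambda k: probs[k])
```

Because the exponential is increasing, `best_by_score` and `best_by_prob` always agree, so argmax over raw scores is enough for prediction.
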

Multiply, add, pick the largest: a linear classifier is template-matching with learned templates, and it is the last layer of nearly every modern vision model.