Cheatsheet: What learning really means

cost C(w, b) = one number for how wrong the network is right now
learning = adjust the weights and biases to make C as small as possible

No understanding installed. Just a wrongness number going down.

  1. Write the desired output as one-hot: 1 in the correct slot, 0 elsewhere. (For a “3”: [0,0,0,1,0,0,0,0,0,0].)
  2. For each of the 10 outputs, take (network value minus desired value).
  3. Square each difference.
  4. Sum the 10 squares. That is the cost for this image.
  5. Average over the whole training set for the total cost.

(Sum of squared differences is the choice the 3B1B series uses; other cost functions exist.)
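The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the series' own code: the helper names `one_hot`, `image_cost`, and `total_cost` are made up here, and the two-image "training set" simply reuses the two example outputs from this cheatsheet, both labeled "3".

```python
def one_hot(label, size=10):
    """Step 1: desired output, 1.0 in the correct slot and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(size)]


def image_cost(output, label):
    """Steps 2-4: per-slot difference, squared, then summed."""
    desired = one_hot(label, len(output))
    return sum((o - d) ** 2 for o, d in zip(output, desired))


def total_cost(outputs, labels):
    """Step 5: average the per-image costs over the whole set."""
    costs = [image_cost(o, y) for o, y in zip(outputs, labels)]
    return sum(costs) / len(costs)


# Illustrative two-image set; the correct label for both is 3.
outputs = [
    [.02, .01, .05, .92, .03, .04, .01, .02, .01, .02],  # confident, correct
    [.1] * 10,                                            # total shrug
]
labels = [3, 3]

print(one_hot(3))
print([round(image_cost(o, y), 4) for o, y in zip(outputs, labels)])
print(round(total_cost(outputs, labels), 4))
```

A real training set has thousands of images, but the averaging step is the same.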

Network output | Reading | Cost
[.02,.01,.05,.92,.03,.04,.01,.02,.01,.02] | confident, correct | 0.0129 (low)
[.1,.1,.1,.1,.1,.1,.1,.1,.1,.1] | total shrug | 0.90 (high)

The math for the high one: 9·(0.1)² + (0.1 - 1)² = 0.09 + 0.81 = 0.90. The single big miss (0.1 where 1 was wanted) contributes 0.81 of that 0.90, because squaring makes big misses dominate.
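To see why squaring makes the big miss dominate, it helps to compare against plain (unsquared) differences for the same "total shrug" output. A small sketch; the `contributions` helper and the `power` parameter are illustrative names, not anything from the series.

```python
desired = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # one-hot for a "3"
shrug = [0.1] * 10                          # the "total shrug" output


def contributions(output, desired, power):
    """Per-slot penalty |output - desired| ** power."""
    return [abs(o - d) ** power for o, d in zip(output, desired)]


for power in (1, 2):  # 1 = plain differences, 2 = squared differences
    parts = contributions(shrug, desired, power)
    total = sum(parts)
    share = parts[3] / total  # slot 3 is where 1 was wanted
    print(f"power {power}: total = {total:.2f}, "
          f"big miss = {parts[3]:.2f}, share = {share:.0%}")
```

With plain differences the big miss is half of the total (0.9 of 1.8). With squares it is 90% of the total (0.81 of 0.90).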

network: f(x; w, b) | input = an image | output = a guess
cost: C(w, b) | input = a whole network | output = a wrongness score

For a fixed training set, only w and b are free to move. C maps the ~13,000 parameters to one number. Learning = find the (w, b) that minimizes C.
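The difference between f and C can be made concrete with a toy stand-in. Everything here is an assumption for illustration: a one-layer sigmoid "network" with 2 inputs and 2 outputs (not the digit network), a made-up two-example training set, and a hypothetical `make_cost` helper that freezes the training set so that only w and b remain as inputs.

```python
import math


def f(x, w, b):
    """Toy network: one layer, sigmoid(w . x + b) for each output slot."""
    return [
        1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + bi)))
        for row, bi in zip(w, b)
    ]


def make_cost(training_set):
    """Freeze the training set; return C(w, b), a function of parameters only."""
    def C(w, b):
        total = 0.0
        for x, desired in training_set:
            out = f(x, w, b)
            total += sum((o - d) ** 2 for o, d in zip(out, desired))
        return total / len(training_set)
    return C


# Made-up data: (input, desired one-hot output) pairs.
training_set = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
]
C = make_cost(training_set)

w_a, b_a = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]    # every output is 0.5
w_b, b_b = [[4.0, -4.0], [-4.0, 4.0]], [0.0, 0.0]  # leans toward the right slot

print(f([1.0, 0.0], w_a, b_a))  # f: one input in, a list of guesses out
print(C(w_a, b_a))              # C: parameters in, one score out
print(C(w_b, b_b))              # different parameters, different score
```

The training set never appears in the call to `C`; changing w and b is the only way to move the score.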

  • ~13,000 dials, not 2.
  • C is bumpy and complicated, not a tidy bowl.
  • Brute force is impossible (the combinations are beyond astronomical).
  • Need a method that finds “downhill” from wherever you stand. That is L6 (the landscape) and L7 (gradient descent).

  • “Learning means the network understands.” No. Knobs turn; a number drops.
  • “Cost is the output.” No. Output is 10 numbers per image; cost is one score over the whole set.
  • “Low training cost means good, period.” No. It means good on what it was scored against; unseen images are a separate question.
  • “Cost is a function of the image.” No. For a fixed training set, cost is a function of the weights and biases.
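To put rough numbers on the "~13,000 dials" and "brute force is impossible" points above, here is a back-of-the-envelope sketch. It assumes the 784-16-16-10 layer layout used in the 3B1B series (this cheatsheet does not state the layout) and a hypothetical grid of only 10 candidate values per dial.

```python
import math

# Assumed layer sizes: 784 inputs, two hidden layers of 16, 10 outputs.
layers = [784, 16, 16, 10]

# One weight per connection between adjacent layers, one bias per non-input neuron.
weights = sum(a * b for a, b in zip(layers, layers[1:]))
biases = sum(layers[1:])
params = weights + biases
print(weights, biases, params)

# Hypothetical brute force: try just 10 candidate values for every parameter.
settings_per_dial = 10
exponent = params * math.log10(settings_per_dial)  # combinations = 10 ** exponent
print(f"about 10^{exponent:.0f} combinations to evaluate")
```

Even this coarse grid gives a count with thousands of digits, which is why the search needs a sense of "downhill" instead of enumeration.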

Learning is a search for the knob settings that make one wrongness number as small as it can go.