Neural networks recap: cheatsheet

The whole track in one chain

Lesson	The piece it added
1	A network is a function: 784 numbers in, 10 out. Stop writing rules, show examples.
2	Built from layers of neurons; a neuron holds one number between 0 and 1 (its activation).
3	Each neuron: weighted sum of inputs, plus a bias, through a squish.
4	The whole network is one function with ~13,000 knobs (weights + biases).
5	“Learning” = make the cost (a wrongness score) small.
6	Picture the cost as a landscape; the negative gradient points downhill.
7	Gradient descent: step downhill, repeat.
8	Backprop: desires propagate backward; one sweep gives the whole gradient.
9	That sweep is the chain rule, run backward through the layers.
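
Lessons 3 and 4 can be made concrete with a short sketch. This is an illustration, not the track's own code: the sigmoid as the "squish" and the layer sizes 784, 16, 16, 10 are assumptions (the recap only says 784 in, 10 out, and roughly 13,000 knobs), and the names neuron and count_knobs are hypothetical.

```python
import math


def sigmoid(z):
    """The 'squish' (assumed here to be a sigmoid): maps any number into 0..1."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs, plus a bias, through the squish."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def count_knobs(layer_sizes):
    """Total weights + biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


# Assumption: two hidden layers of 16 neurons between the 784 inputs and 10 outputs.
ASSUMED_LAYERS = [784, 16, 16, 10]
knobs = count_knobs(ASSUMED_LAYERS)

# A single neuron with three made-up inputs; the result is always between 0 and 1.
activation = neuron([0.0, 1.0, 0.5], [0.2, -0.4, 0.6], 0.1)
```

Under those assumed layer sizes the count comes to 13,002, which is in line with the "~13,000 knobs" figure.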

The training loop

forward pass → cost → backward pass (backprop) → update (gradient descent) → repeat

One pass over the whole training set = one epoch. Training runs many epochs.
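
The loop above can be sketched on the smallest possible "network": a single sigmoid neuron. Everything specific here is an assumption made for illustration: the toy data (logical OR), the squared-error cost, the learning rate, and the function name train. It updates after every example, and one pass over all four examples counts as one epoch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy training set: two inputs, desired output is logical OR.
DATA = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 1.0),
]


def train(epochs=2000, learning_rate=0.5):
    weights, bias = [0.0, 0.0], 0.0
    history = []  # average cost per epoch
    for _ in range(epochs):
        epoch_cost = 0.0
        for inputs, desired in DATA:
            # forward pass
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            a = sigmoid(z)
            # cost (assumed: squared error)
            epoch_cost += (a - desired) ** 2
            # backward pass: chain rule, dC/dz = dC/da * da/dz
            dz = 2.0 * (a - desired) * a * (1.0 - a)
            # update: step each knob a little downhill
            weights = [w - learning_rate * dz * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * dz
        history.append(epoch_cost / len(DATA))
    return weights, bias, history


weights, bias, history = train()


def predict(inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
```

Comparing history[0] with history[-1] shows the average cost falling over the epochs, which is all "learning" means in this picture.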

One step, start to finish

messy "3" image (784 numbers)
  → forward pass → output [0.1, 0.05, 0.0, 0.2, 0.5, ...]  (tallest is "4": wrong)
  → cost ≈ 0.90  (high; desired was 1 at the "3" slot)
  → backprop → every knob's downhill nudge
  → update → all ~13,000 knobs step a hair downhill
  → network is slightly less wrong on this image
  → repeat across thousands of images, many epochs → it learns
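
To see where a cost near 0.90 could come from, here is a minimal check, assuming the cost is the sum of squared differences between output and desired output (the recap only calls it a wrongness score). Only the five outputs shown above are used; the elided ones are left out rather than guessed, so this is a partial sum.

```python
# The five outputs shown in the example (slots "0" through "4").
shown_outputs = [0.1, 0.05, 0.0, 0.2, 0.5]

# Desired output for a "3": 1 at the "3" slot, 0 elsewhere.
desired = [0.0, 0.0, 0.0, 1.0, 0.0]

# The network's guess is the slot with the tallest output.
predicted_digit = max(range(len(shown_outputs)), key=lambda i: shown_outputs[i])

# Assumed cost: sum of squared differences over the slots shown.
partial_cost = sum((a - y) ** 2 for a, y in zip(shown_outputs, desired))
```

Here predicted_digit is 4 and partial_cost is about 0.9025. Most of it comes from the "3" slot sitting at 0.2 instead of 1 and the "4" slot sitting at 0.5 instead of 0.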

The one picture to keep

A row of dials, and a landscape behind them. Where the dials sit = where you stand; your height = how wrong you are. Training = feel downhill, turn every dial a hair that way, repeat until you settle in a low valley. Forward pass reads your height; backprop feels the slope; gradient descent takes the step.
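
The picture can be turned into a few lines of code with just two dials. The landscape below is a made-up bowl, and the slope is felt by nudging each dial slightly (finite differences) rather than by backprop, which gets the same slope exactly and in one sweep. The names height, feel_slope, and walk_downhill are hypothetical.

```python
def height(dials):
    """Hypothetical landscape: a bowl whose lowest point is at dials (3, -1)."""
    x, y = dials
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


def feel_slope(f, dials, eps=1e-6):
    """Estimate the gradient by nudging each dial a tiny bit up and down."""
    grad = []
    for i in range(len(dials)):
        up = list(dials)
        up[i] += eps
        down = list(dials)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def walk_downhill(f, dials, step_size=0.1, steps=100):
    """Gradient descent: turn every dial a hair against the slope, repeat."""
    dials = list(dials)
    for _ in range(steps):
        grad = feel_slope(f, dials)
        dials = [d - step_size * g for d, g in zip(dials, grad)]
    return dials


start = [0.0, 0.0]
final = walk_downhill(height, start)
```

Starting from (0, 0), the dials settle very close to (3, -1), the bottom of the bowl, and height(final) ends up far below height(start).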

What was deferred (and to where)

Topic	Where
Convolutional nets, transformers	Transformers: Track 5 (AI Foundations); convolutional nets: a future CV (computer vision) track
Smarter optimizers (momentum, Adam)	further study
Regularization, dropout, batch norm	further study
Fine-tuning, transfer learning	further study
Building it in real code	Track 13

Where to go next

Build it yourself → Track 13 (Build Neural Networks from Scratch). This track’s gradient descent and backprop, written as Python you run.
Understand modern LLMs → Track 5 (AI Foundations). It covers transformers; a transformer is a neural network, so this foundation carries straight over.
Use AI to build things → Track 20 (AI Agents and Tool Use). Agents built on top of trained networks.

The one-line version

A neural network is a row of dials and a landscape behind them; training is a patient walk downhill. You can now picture it, reason about it, and know where to look next.