Cheatsheet: Seeing it whole, and where next
The whole track in one chain
| Lesson | The piece it added |
|---|---|
| 1 | A network is a function: 784 numbers in, 10 out. Instead of writing rules, show it examples. |
| 2 | Built from layers of neurons; a neuron holds one number between 0 and 1 (its activation). |
| 3 | Each neuron: weighted sum of its inputs, plus a bias, passed through a squishing function (e.g. sigmoid). |
| 4 | The whole network is one function with ~13,000 knobs (weights + biases). |
| 5 | “Learning” = make the cost (a wrongness score) small. |
| 6 | Picture the cost as a landscape; the negative gradient points downhill. |
| 7 | Gradient descent: step downhill, repeat. |
| 8 | Backprop: desires propagate backward; one sweep gives the whole gradient. |
| 9 | That sweep is the chain rule, run backward through the layers. |
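Lessons 2 to 4 can be made concrete with a small forward pass and a knob count. This is a minimal sketch, assuming layer sizes 784 → 16 → 16 → 10 (two hidden layers of 16 neurons; the lessons above only fix 784 in and 10 out) and the sigmoid as the squish. The names `sigmoid`, `make_network`, `forward`, and `count_params` are hypothetical, chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # The "squish": maps any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def make_network(sizes, seed=0):
    # One weight per connection and one bias per non-input neuron, random to start.
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers

def forward(layers, activations):
    # Each neuron: weighted sum of the previous layer's activations, plus its bias,
    # through the squish. The result feeds the next layer.
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

def count_params(sizes):
    # Knobs = weights (n_in * n_out per pair of layers) + biases (one per non-input neuron).
    n_weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    n_biases = sum(sizes[1:])
    return n_weights + n_biases

SIZES = [784, 16, 16, 10]  # assumed architecture: 784 in, two hidden layers of 16, 10 out
net = make_network(SIZES)
pixel_rng = random.Random(1)
image = [pixel_rng.random() for _ in range(784)]  # random stand-in for a real digit image
output = forward(net, image)

print(len(output))          # 10
print(count_params(SIZES))  # 13002
```

Under this assumed architecture the count is 12,960 weights plus 42 biases, which is where "~13,000 knobs" comes from; other hidden-layer sizes change the total, but the 784 inputs and 10 outputs stay fixed.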
The training loop
forward pass → cost → backward pass (backprop) → update (gradient descent) → repeat

One pass over the whole training set = one epoch. Training runs many epochs.
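The loop can be traced on a deliberately tiny model so that each stage is visible. This is a minimal sketch, assuming a single sigmoid neuron with two inputs learning logical OR, a squared-error cost, a learning rate of 1.0, and an update after every example (one common variant; averaging over a batch before updating is another). None of these choices come from the lessons, and `DATA`, `predict`, and `train` are hypothetical names. With only one neuron, the backward pass is a single application of the chain rule; in a layered network, backprop repeats that step layer by layer.

```python
import math

def sigmoid(z):
    # The squish: any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Assumed toy training set: two input bits -> desired output (logical OR).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def predict(knobs, x1, x2):
    w1, w2, b = knobs
    return sigmoid(w1 * x1 + w2 * x2 + b)

def train(epochs=1000, lr=1.0):
    w1, w2, b = 0.0, 0.0, 0.0  # the knobs, starting flat
    history = []               # total cost seen during each epoch
    for _ in range(epochs):    # one epoch = one pass over DATA
        epoch_cost = 0.0
        for (x1, x2), y in DATA:
            # Forward pass.
            a = sigmoid(w1 * x1 + w2 * x2 + b)
            # Cost for this example: squared error.
            epoch_cost += (a - y) ** 2
            # Backward pass via the chain rule: dC/dz = dC/da * da/dz.
            dC_dz = 2.0 * (a - y) * a * (1.0 - a)
            # dz/dw1 = x1, dz/dw2 = x2, dz/db = 1, so each knob's gradient follows.
            # Update: nudge every knob a small step downhill.
            w1 -= lr * dC_dz * x1
            w2 -= lr * dC_dz * x2
            b -= lr * dC_dz
        history.append(epoch_cost)
    return (w1, w2, b), history

knobs, history = train()
print(history[-1] < history[0])  # True: the cost went down
for (x1, x2), y in DATA:
    print((x1, x2), "->", round(predict(knobs, x1, x2), 3), "desired", y)
```

The inner loop body is one full turn of forward → cost → backward → update; the outer loop counts epochs.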
One step, start to finish
messy "3" image (784 numbers) → forward pass → output [0.1, 0.05, 0.0, 0.2, 0.5, ...] (tallest is "4": wrong) → cost ≈ 0.90 (high; desired was 1 at the "3" slot) → backprop → every knob's downhill nudge → update → all ~13,000 knobs step a hair downhill → network is slightly less wrong on this image → repeat across thousands of images, many epochs → it learns
The one picture to keep
A row of dials, and a landscape behind them. Where the dials sit = where you stand; your height = how wrong you are. Training = feel which way is downhill, turn every dial a hair that way, and repeat until you settle in a low valley. The forward pass (plus the cost) reads your height; backprop feels the slope; gradient descent takes the step.
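Reading your height means computing the cost. As a worked check on the single step from "One step, start to finish", this sketch assumes the cost is the sum of squared differences between output and desired output (a common choice; the lessons only call it a wrongness score). It uses only the five output slots the walk-through shows and leaves the elided ones out. The variable names are hypothetical.

```python
# The five output slots shown in the walk-through (digits 0-4); the rest are elided.
shown_output = [0.1, 0.05, 0.0, 0.2, 0.5]
label = 3  # the image is a "3"

# Desired output: 1 at the correct digit's slot, 0 elsewhere.
desired = [1.0 if digit == label else 0.0 for digit in range(len(shown_output))]

# Squared-error contribution of the shown slots only.
partial_cost = sum((a - y) ** 2 for a, y in zip(shown_output, desired))

# The network's guess is the slot with the tallest activation.
guess = max(range(len(shown_output)), key=lambda i: shown_output[i])

print(round(partial_cost, 4))  # 0.9025
print(guess)                   # 4
```

Most of the height comes from the "3" slot, (0.2 − 1)² = 0.64, and the wrongly confident "4" slot, 0.5² = 0.25. Since the visible slots already give about 0.90, the elided slots must contribute very little for the total to be ≈ 0.90.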
What was deferred (and to where)
| Topic | Where |
|---|---|
| Convolutional nets, transformers | Transformers: Track 5 (AI Foundations); convolutional nets: a future computer-vision (CV) track |
| Smarter optimizers (momentum, Adam) | further study |
| Regularization, dropout, batch norm | further study |
| Fine-tuning, transfer learning | further study |
| Building it in real code | Track 13 |
Where to go next
- Build it yourself → Track 13 (Build Neural Networks from Scratch). This track’s gradient descent and backprop, written as Python you run.
- Understand modern LLMs → Track 5 (AI Foundations). It covers transformers; a transformer is a neural network, so this foundation carries straight over.
- Use AI to build things → Track 20 (AI Agents and Tool Use). Agents built on top of trained networks.
The one-line version
A neural network is a row of dials and a landscape behind them; training is a patient walk downhill. You can now picture it, reason about it, and know where to look next.