Cheatsheet: The whole network as one function
The one idea that matters
The whole network = one function: 784 numbers in → 10 numbers out

Everything inside (layers, neurons, weighted sums, squishes) is its inner workings.

The forward pass (worked, tiny network)
Shape: 3 inputs → 2 hidden → 2 outputs, using ReLU. Input x = [1.0, 0.5, 0.0].
Hidden:
- h1: w [0.5, -0.4, 0.2], b 0.1 → sum 0.4 → ReLU 0.4
- h2: w [-0.3, 0.8, 0.5], b -0.2 → sum -0.1 → ReLU 0.0 (clamped)
- hidden activations = [0.4, 0.0]
Output (reads [0.4, 0.0]):
- o1: w [0.6, 0.9], b 0.0 → sum 0.24 → ReLU 0.24
- o2: w [-0.5, 0.3], b 0.05 → sum -0.15 → ReLU 0.0
- output = [0.24, 0.0] → first class wins

Forward pass = lesson-3 neuron formula, applied layer by layer, each layer feeding the next.
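The worked numbers above can be checked with a few lines of plain Python. This is a minimal sketch, not a reference implementation: the helper names `relu` and `layer` are made up for illustration, and the weights, biases, and input are copied from the worked example.

```python
def relu(z):
    """Clamp negative sums to zero; pass positive sums through."""
    return max(0.0, z)

def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs, adds its bias, applies ReLU."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [1.0, 0.5, 0.0]

# One row of weights per neuron, values taken from the worked example.
hidden_w = [[0.5, -0.4, 0.2], [-0.3, 0.8, 0.5]]
hidden_b = [0.1, -0.2]
output_w = [[0.6, 0.9], [-0.5, 0.3]]
output_b = [0.0, 0.05]

hidden = layer(x, hidden_w, hidden_b)       # the hidden layer reads the input
output = layer(hidden, output_w, output_b)  # the output layer reads the hidden activations

print([round(v, 2) for v in hidden])  # [0.4, 0.0]
print([round(v, 2) for v in output])  # [0.24, 0.0]
print(output.index(max(output)))      # 0, i.e. the first class wins
```

Values are rounded before printing only to hide floating-point noise; the arithmetic is the same multiply, add, clamp sequence as in the hand calculation.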
The function framing
f(x; w, b), where:
- x: the input (changes every time you use it)
- w: the weights (fixed; part of the network)
- b: the biases (fixed; part of the network)

What changes what:
- Change x → different image → different output.
- Change w or b → different network → different output.
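To make the split between input and parameters concrete, here is a small sketch in Python. It assumes a single ReLU neuron as a stand-in for a whole network; the function `f`, the second input, and the second parameter set (`w2`, `b2`) are hypothetical values chosen only for illustration.

```python
from functools import partial

def f(x, w, b):
    """A single ReLU neuron standing in for a whole network (illustrative only)."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)

# Parameters: fixed numbers that define which network this is.
w, b = [0.5, -0.4, 0.2], 0.1

# Freezing w and b leaves a function of the input alone.
network = partial(f, w=w, b=b)

out_a = network([1.0, 0.5, 0.0])  # one input
out_b = network([0.0, 0.0, 1.0])  # same network, different input

# Hypothetical second parameter set: same formula, different network.
w2, b2 = [1.0, 1.0, 1.0], 0.0
out_c = f([1.0, 0.5, 0.0], w2, b2)  # same input as out_a, different network

print(round(out_a, 2), round(out_b, 2), round(out_c, 2))  # 0.4 0.3 1.5
```

Using `partial` is one way to express "the parameters are part of the network": once they are fixed, the only thing left to vary is x.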
Same skeleton, different weights
| Weights | Behavior |
|---|---|
| All zeros | Every activation 0; same dead output for any input |
| Random | Noise-like outputs with no meaningful relation to what the input shows |
| Well-tuned | The same skeleton reliably reads digits |
The architecture is a skeleton. The parameters make it a specific, behaving function. Picking a working network is a search through the space of all possible parameter settings.
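The table can be mimicked on the tiny 3 → 2 → 2 ReLU shape from the worked example. This is a rough sketch under stated assumptions: `forward` and `make_params` are hypothetical helpers, the random values are drawn from an arbitrary range with a fixed seed, and the hand-picked parameters from the worked example stand in for a tuned set (they were not trained on anything).

```python
import random

def forward(x, params):
    """Run the fixed skeleton: for each layer, weighted sums plus biases, then ReLU."""
    for weights, biases in params:
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

def make_params(fill):
    """Build parameters for the 3 -> 2 -> 2 shape; fill() supplies each number."""
    shapes = [(2, 3), (2, 2)]  # (neurons, inputs per neuron) for each layer
    return [([[fill() for _ in range(n_in)] for _ in range(n_out)],
             [fill() for _ in range(n_out)])
            for n_out, n_in in shapes]

zeros = make_params(lambda: 0.0)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
noisy = make_params(lambda: rng.uniform(-1.0, 1.0))

# Values from the worked example, used here as a stand-in for tuned parameters.
hand_picked = [([[0.5, -0.4, 0.2], [-0.3, 0.8, 0.5]], [0.1, -0.2]),
               ([[0.6, 0.9], [-0.5, 0.3]], [0.0, 0.05])]

for x in ([1.0, 0.5, 0.0], [0.0, 1.0, 1.0]):
    print("zeros:", forward(x, zeros))
    print("random:", forward(x, noisy))
    print("hand-picked:", forward(x, hand_picked))
```

The same `forward` code runs all three; only the numbers passed in as `params` differ. With all-zero parameters the output is [0.0, 0.0] for every input.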
Where the “intelligence” lives
In the specific values of the weights and biases. Not in the structure (same for all three rows above), not in the formula (never changes). About 13,000 numbers for the digit network; billions for a modern model.
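For a sense of where a figure like "about 13,000" can come from, the sketch below counts parameters for a fully connected network. The layer sizes 784 → 16 → 16 → 10 are an assumption: the text gives only the 784 inputs and 10 outputs, so the two hidden layers of 16 neurons are a guess that happens to land near that total. `count_parameters` is a hypothetical helper.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network with these layer sizes."""
    # Every neuron has one weight per input coming from the previous layer.
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # Every neuron after the input layer has one bias.
    biases = sum(layer_sizes[1:])
    return weights + biases

# Assumed hidden sizes (16 and 16); only 784 and 10 come from the text.
print(count_parameters([784, 16, 16, 10]))  # 13002

# The tiny worked network: 3 inputs, 2 hidden, 2 outputs.
print(count_parameters([3, 2, 2]))  # 14
```

Under that assumption the count is 12,960 weights plus 42 biases.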
Pitfalls to dodge
- “The network decides or understands.” It evaluates a function. No comprehension step.
- “Input and parameters are the same kind of number.” No. Input varies per use; weights and biases are fixed and define the network.
- “The architecture is what makes it smart.” No. The smarts are the parameter values.
- “There must be more than arithmetic.” No. Multiply, add, squash, repeated. Scale is the only thing that grows.
The one-line version
A neural network is a function whose behavior is written entirely in its numbers. Set them well and it works; set them badly and it does not.