Summary: The whole network as one function

The first three lessons named a goal (a function from 784 numbers to 10) and then built the parts: layers of neurons, and the multiply-add-squash each neuron runs. This lesson steps back to look at the whole machine, which turns out to be exactly the function we promised. Running it is just the per-neuron formula applied layer by layer. The payoff is a single reframe that underlies most of modern AI: a network is a function, and everything it can do lives in the specific values of its weights and biases, nothing more. This is the scan-it-in-five-minutes version.

Core ideas

The whole network is one function. It takes 784 numbers in and gives 10 out. The layers, neurons, weighted sums, and squashes are its inner workings; from the outside it is as ordinary as any function. This is literal, not a metaphor.
Running it is the forward pass. Apply lesson 3’s neuron formula layer by layer, with each layer’s activations feeding the next, until the output layer holds the answer. Worked tiny example (3 inputs, 2 hidden neurons, 2 outputs, ReLU): input [1.0, 0.5, 0.0] produces hidden activations [0.4, 0.0] and outputs [0.24, 0.0], so the first class, which has the larger output, wins.
The parameters are the function. Written f(x; w, b): the input x changes every time you use the network; the weights and biases w, b are fixed and define which network you have. Change x and you asked about a different image; change w or b and you are holding a different network.
Same skeleton, wildly different behavior. Set all weights to 0 and the network gives the same dead output for any input. Set them randomly and it outputs noise. Set them well and the identical 784-16-16-10 skeleton reads digits. The architecture is a skeleton; the parameters decide how it behaves.
Building a network is a search. Every setting of the roughly 13,000 weights and biases (exactly 13,002 for 784-16-16-10) is one specific function. Almost all of them are useless; a few work. The job is to find a good point in that vast parameter space.
The “intelligence” is in the numbers. Not in the structure (same for the useless and the useful network) and not in the formula (never changes), but entirely in the parameter values: about 13,000 for the digit net, billions for a modern model. There is no understanding step anywhere; the network evaluates a function the way a calculator does.
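The forward pass and the f(x; w, b) framing fit in a few lines of plain Python. What follows is a minimal sketch that rests on some assumptions. The lesson does not list the weights behind its 3-2-2 example, so `tiny_params` holds hypothetical values chosen only to reproduce its numbers. ReLU is applied at every layer, including the output layer, which is also an assumption. The names `layer_forward`, `forward`, `count_params`, and `zero_params` are illustrative and do not come from any library.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]      # one row of incoming weights per neuron
Layer = Tuple[Matrix, Vector]   # (weights, biases) for one layer


def relu(z: float) -> float:
    """The squash step assumed here: ReLU clips negative values to zero."""
    return max(0.0, z)


def layer_forward(a: Vector, weights: Matrix, biases: Vector) -> Vector:
    """One layer: each neuron multiplies, adds its bias, then squashes."""
    return [relu(sum(w * x for w, x in zip(row, a)) + b)
            for row, b in zip(weights, biases)]


def forward(x: Vector, params: List[Layer]) -> Vector:
    """f(x; w, b): x varies per call, params define which network this is."""
    a = x
    for weights, biases in params:
        a = layer_forward(a, weights, biases)  # activations feed the next layer
    return a


def count_params(sizes: List[int]) -> int:
    """Weights (n_in * n_out) plus biases (n_out) for each adjacent layer pair."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def zero_params(sizes: List[int]) -> List[Layer]:
    """Hypothetical helper: the same skeleton with every weight and bias at 0."""
    return [([[0.0] * n_in for _ in range(n_out)], [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


# Hypothetical weights for the 3-2-2 example (not the lesson's own values).
tiny_params: List[Layer] = [
    ([[0.2, 0.4, 0.0],       # hidden neuron 1
      [-0.5, 0.2, 0.3]],     # hidden neuron 2
     [0.0, 0.1]),            # hidden biases
    ([[0.6, 0.5],            # output neuron 1
      [-0.5, 0.3]],          # output neuron 2
     [0.0, 0.1]),            # output biases
]

x = [1.0, 0.5, 0.0]
hidden = layer_forward(x, *tiny_params[0])
output = forward(x, tiny_params)
winner = max(range(len(output)), key=lambda i: output[i])  # index of largest output
print(hidden, output, winner)
```

Running the script gives `hidden` as [0.4, 0.0] and `output` as roughly [0.24, 0.0], so index 0 wins. `count_params([784, 16, 16, 10])` returns 13,002, which is the "roughly 13,000" figure. Calling `forward` with `zero_params([784, 16, 16, 10])` returns ten zeros for any 784-number input, which is the dead network. `forward` itself never changes between these cases. Only the parameter list you pass in changes, and that is the whole difference between one network and another.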

What changes for you

Carry one mental model out of this chapter and let it be this: an AI model is a function, and its behavior is fixed by its parameters. That cuts through a surprising amount of confusion. It explains why a model returns the same answer to the same input when its settings are held steady, why “fine-tuning” means adjusting parameters rather than teaching in any human sense, why two models can share an architecture and behave nothing alike, and why a model has no thoughts about your question: it is evaluating arithmetic, just at enormous scale. That sets up the one question this whole chapter circled: nobody types 13,000 numbers by hand, let alone billions, so how are the right values found? The search has a name, and Phase 2 begins it. It is called learning.