The whole network as one function: brief

What you’ll learn

The first three lessons named a goal (a function from 784 numbers to 10) and then got busy with the parts: layers of neurons, and the multiply-add-squash that each neuron runs. This lesson steps back so you can see the whole machine at once, and the whole machine turns out to be exactly the function lesson 1 promised.

You will see that the entire network is, literally, one function: 784 numbers in, 10 numbers out, with every layer and neuron as its inner workings. You will run a complete forward pass by hand on a tiny network, watching the per-neuron formula apply layer by layer. You will meet the f(x; w, b) framing, which cleanly separates the input x (which changes every time you use the network) from the weights and biases w, b (which are fixed and define which network you have). And you will see that the same 784-16-16-10 skeleton behaves completely differently (dead, noisy, or digit-reading) depending only on its parameter values. The lesson’s payoff is a single durable idea: a network is a function, and all its capability lives in those numbers. That sets up the question the rest of the track answers: how the right numbers get found.
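
For readers who like to see the idea in code (entirely optional), here is a minimal sketch of the f(x; w, b) framing. It assumes the squash is the sigmoid function, and the tiny 2-2-1 network, its parameter values, and the names `sigmoid` and `forward` are hypothetical choices made for illustration, not something the lesson prescribes.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1). Assumed squashing function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, weights, biases):
    """Run the whole network as one function f(x; w, b).

    x       -- list of input numbers (changes on every use)
    weights -- one matrix per layer; each row holds one neuron's weights
    biases  -- one list per layer; one bias per neuron
    """
    activations = x
    for layer_w, layer_b in zip(weights, biases):
        # Each neuron: weighted sum of the previous layer, add bias, squash.
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(layer_w, layer_b)
        ]
    return activations


# Hypothetical tiny network: 2 inputs -> 2 hidden neurons -> 1 output.
weights = [
    [[1.0, -1.0], [0.5, 0.5]],  # hidden layer: two neurons, two weights each
    [[1.0, -1.0]],              # output layer: one neuron, two weights
]
biases = [[0.0, 0.0], [0.0]]

output = forward([1.0, 1.0], weights, biases)
print(output)
```

Note that `x` is the only argument that changes from call to call; `weights` and `biases` stay fixed and determine which function you are calling.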

Where this fits

This is lesson 4: the last of the three structure lessons (lessons 2 through 4; lesson 1 set up the problem) and the close of the Phase 1 arc. Lesson 1 named the function, lesson 2 arranged neurons into layers, lesson 3 gave the per-neuron computation, and this lesson assembles all of it into the whole function and reframes the goal as a search through parameter space. That reframe is the bridge into Phase 2, which opens with how you measure a network’s wrongness and then how you step its parameters toward better behavior. After this lesson, the structural picture is complete; everything ahead is about learning.

Before you start

Prerequisite (within this track): lesson 3, “Weights, biases, and the squish,” since the forward pass here is just that single-neuron computation applied repeatedly. If you can run one neuron (weighted sum, add bias, squash), you can run the whole network. No new math is introduced; a calculator is optional for the practice, and no coding or installation is needed.

By the end, you’ll be able to

Explain that the whole network is a single function from 784 numbers to 10, with the layers as its inner workings
Run a complete forward pass by hand, applying the neuron formula layer by layer
Use the f(x; w, b) framing to distinguish the per-use input from the fixed weights and biases that define the network
Explain why the same architecture behaves completely differently under different parameter values, and why building a network is a search through parameter space
Locate a network’s behavior in its specific parameter values rather than in its structure or formula
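
The last two objectives can be previewed with an optional sketch: one 784-16-16-10 skeleton, two parameter settings, two very different behaviors. This is an illustration under stated assumptions, not the lesson’s own method: the squash is assumed to be a sigmoid, the input is random stand-in data rather than a real image, "dead" is modeled here as all-zero parameters, and the helper names (`make_params`, `count_params`) are hypothetical.

```python
import math
import random

LAYER_SIZES = [784, 16, 16, 10]  # the skeleton; it stays the same throughout


def sigmoid(z):
    """Assumed squashing function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, weights, biases):
    """f(x; w, b): apply weighted sum, add bias, squash, layer by layer."""
    a = x
    for layer_w, layer_b in zip(weights, biases):
        a = [
            sigmoid(sum(w * v for w, v in zip(row, a)) + b)
            for row, b in zip(layer_w, layer_b)
        ]
    return a


def make_params(sizes, draw):
    """Build weights and biases for the given layer sizes; draw() supplies each value."""
    weights = [
        [[draw() for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]
    biases = [[draw() for _ in range(n_out)] for n_out in sizes[1:]]
    return weights, biases


def count_params(weights, biases):
    """Total number of individual weights and biases."""
    n_weights = sum(len(row) for layer in weights for row in layer)
    n_biases = sum(len(layer) for layer in biases)
    return n_weights + n_biases


rng = random.Random(0)                   # fixed seed so reruns match
x = [rng.random() for _ in range(784)]   # stand-in input, not a real digit

# Same architecture, two different points in parameter space.
dead_w, dead_b = make_params(LAYER_SIZES, lambda: 0.0)
noisy_w, noisy_b = make_params(LAYER_SIZES, lambda: rng.uniform(-1.0, 1.0))

dead_out = forward(x, dead_w, dead_b)     # every output is sigmoid(0) = 0.5
noisy_out = forward(x, noisy_w, noisy_b)  # ten values with no meaning
n_params = count_params(dead_w, dead_b)   # 13002 numbers to choose

print(n_params)
print(dead_out)
print(noisy_out)
```

The code for `forward` is identical in both calls; only the numbers passed in as `weights` and `biases` differ. A digit-reading network would be a third setting of those same 13,002 numbers, which this sketch does not attempt to find.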

Time and difficulty

Read time: about 10 minutes
Practice time: about 14 minutes (running a full forward pass by hand, an input-versus-parameters drill, and flashcards)
Difficulty: standard