Practice: The whole network as one function

Self-check

Six short questions. Answer each one in your head (or on paper) before opening the collapsible. Trying to retrieve the answer is where the learning sticks; rereading feels productive but does much less.

1. In what sense is the whole neural network “one function”?

Show answer

Literally. A function takes inputs and reliably produces outputs, and the network takes 784 numbers in and gives 10 numbers out. All the layers, neurons, weighted sums, and squishes are the inner workings of that single function. It looks complicated only because it has many internal steps; from the outside it is just numbers in, numbers out.

2. What is the “forward pass”?

Show answer

The process of feeding an input through the network to get an output. Mechanically, it is lesson 3’s neuron formula applied layer by layer, with each layer’s activations becoming the next layer’s inputs, until the output layer’s activations are the answer.

3. In f(x; w, b), what is the difference between x and w, b, and why does the semicolon matter?

Show answer

x is the input (the 784 pixel values) and it changes every time you use the network. w and b are the weights and biases: fixed numbers baked into the function, the same for every image. The semicolon separates the thing that varies per use (x) from the thing that defines which network this is (w, b). Mixing these two up is the most common confusion in the chapter.
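The split between x and (w, b) can be sketched in code. This is a minimal illustration, assuming a single ReLU neuron as a stand-in for the whole network; the names `neuron` and `f` and the parameter values are hypothetical, chosen only for this example.

```python
from functools import partial

def relu(z):
    return max(0.0, z)

def neuron(x, w, b):
    """One neuron: weighted sum of the inputs, plus bias, through ReLU."""
    return relu(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Fixing w and b picks out one specific function of x alone.
# This plays the role of everything to the right of the semicolon.
f = partial(neuron, w=[0.5, 0.5], b=0.0)

# x varies per use; w and b stay the same across calls.
print(f([1.0, 2.0]))  # 1.5
print(f([3.0, 1.0]))  # 2.0
```

Calling `partial` again with different values for `w` or `b` would give a different function, even though the code of `neuron` is unchanged.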

4. Same 784-16-16-10 skeleton, three weight settings: all zeros, random, well-tuned. What does each do?

Show answer

All zeros: every weighted sum is 0 for any input, so every neuron outputs the same constant (0 with ReLU) and the network gives the same dead output for a 3, a 7, or a cat. Random: it produces outputs, but they are noise unrelated to the image. Well-tuned: the very same skeleton reliably reads handwritten digits. The skeleton never changed; only the parameters did. Behavior lives entirely in the parameter values.
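A minimal sketch of the zeros-versus-random contrast, assuming ReLU neurons and a hypothetical 4-3-2 skeleton in place of 784-16-16-10. The helper names `forward` and `make_params` and the two sample inputs are invented for illustration. A well-tuned setting is left out because it has to come from training.

```python
import random

def relu(z):
    return max(0.0, z)

def forward(x, layers):
    """layers is a list of (weights, biases); weights holds one row per neuron."""
    a = x
    for weights, biases in layers:
        a = [relu(sum(w * ai for w, ai in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a

def make_params(sizes, init):
    """Fill a skeleton such as [4, 3, 2] with numbers supplied by init()."""
    return [([[init() for _ in range(n_in)] for _ in range(n_out)],
             [init() for _ in range(n_out)])
            for n_in, n_out in zip(sizes, sizes[1:])]

sizes = [4, 3, 2]                       # hypothetical small skeleton
zeros = make_params(sizes, lambda: 0.0)
rng = random.Random(0)                  # seeded so reruns give the same numbers
noise = make_params(sizes, lambda: rng.uniform(-1.0, 1.0))

img_a = [0.0, 0.2, 0.9, 1.0]            # made-up "images"
img_b = [1.0, 0.7, 0.1, 0.0]

print(forward(img_a, zeros), forward(img_b, zeros))  # [0.0, 0.0] [0.0, 0.0]
print(forward(img_a, noise), forward(img_b, noise))  # numbers with no meaning
```

Both parameter sets run through the same `forward` code; only the numbers inside them differ.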

5. The lesson reframes building a network as a “search.” A search for what?

Show answer

For a good point in parameter space. Every possible setting of the roughly 13,000 weights and biases is one specific function. Most settings (all zeros, random ones) are useless; somewhere in that vast space is a setting that makes the function read digits well. Building a working network means finding such a setting, which is what learning will do.
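The "roughly 13,000" figure can be checked with a short count, assuming every neuron connects to every neuron in the previous layer and has one bias. The function name `count_parameters` is made up for this sketch.

```python
def count_parameters(sizes):
    """Weights plus biases for a fully connected skeleton."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out   # one weight per connection into the layer
        total += n_out          # one bias per neuron in the layer
    return total

print(count_parameters([784, 16, 16, 10]))  # 13002
```

Each of those 13,002 numbers is one coordinate of the parameter space being searched.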

6. Someone asks why a model gives the exact same answer to the exact same input every time. How does the function view explain it?

Show answer

Because evaluating a function with the same input and the same fixed parameters always gives the same output, exactly like a calculator. With the weights and biases held steady, the function has not changed, so the output cannot change. (This also explains why “fine-tuning” means adjusting parameters, and why two models with the same architecture can behave nothing alike: different parameters, different function.)

Try it yourself, part 1: run a full forward pass

Pen and paper, about 8 minutes. A complete network, input to output, using ReLU as the activation: ReLU(z) = max(0, z), where z is a neuron's weighted sum plus bias.

Setup. A tiny network: 2 inputs, one hidden layer of 2 neurons, 1 output neuron. The input is x = [1.0, 2.0]. The parameters:

h1: weights [0.5, 0.5],  bias 0
h2: weights [1.0, -1.0], bias 0.5
o1: weights [0.4, 0.6],  bias 0.2   (reads the two hidden activations)

Steps. Compute both hidden activations (weighted sum, add bias, ReLU), then feed those into the output neuron and compute its activation.

Show answer

Hidden layer, reading the input [1.0, 2.0]:

h1: (0.5·1.0) + (0.5·2.0) + 0   = 0.5 + 1.0 = 1.5   → ReLU(1.5) = 1.5
h2: (1.0·1.0) + (-1.0·2.0) + 0.5 = 1.0 - 2.0 + 0.5 = -0.5 → ReLU(-0.5) = 0.0
hidden activations = [1.5, 0.0]   (h2 went dark, clamped by ReLU)

Output layer, reading [1.5, 0.0]:

o1: (0.4·1.5) + (0.6·0.0) + 0.2 = 0.6 + 0 + 0.2 = 0.8 → ReLU(0.8) = 0.8

The network’s output is 0.8. You just ran a complete forward pass: nothing but the lesson-3 multiply-add-squash, applied layer by layer. A real 784-16-16-10 network is the identical procedure with more steps.
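To check the hand arithmetic, the same forward pass can be sketched in a few lines of Python. The helper name `layer` is an illustrative choice, and the final value is rounded because floating-point arithmetic does not land exactly on 0.8.

```python
def relu(z):
    return max(0.0, z)

def layer(inputs, weights, biases):
    """Apply one layer: each row of weights belongs to one neuron."""
    return [relu(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [1.0, 2.0]

# Hidden layer: h1 and h2 read the input.
hidden = layer(x, [[0.5, 0.5], [1.0, -1.0]], [0.0, 0.5])

# Output layer: o1 reads the two hidden activations.
output = layer(hidden, [[0.4, 0.6]], [0.2])

print(hidden)               # [1.5, 0.0]
print(round(output[0], 6))  # 0.8
```

A larger network would call `layer` more times with bigger weight lists; nothing else would change.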

Try it yourself, part 2: input or parameter?

About 4 minutes. For each scenario, say whether what changed is the input (x) or the parameters (w, b), and predict the effect on the output.

1. You feed the trained digit network a photo of a 7 instead of a 3.
2. You set every weight and bias in the network to 0.
3. You fine-tune the model on a new batch of labeled images.
4. You show the network the exact same image twice, with all settings held fixed.

Show answer

1. Input changed. Different picture, so different 784 numbers go in, so the output changes. The network is the same; you just asked it about a different image.
2. Parameters changed (drastically). Every weighted sum becomes 0 for any input, so every neuron outputs the same constant (0 with ReLU) and the output is the same dead result regardless of the image. You are now holding a different (and useless) network.
3. Parameters changed. Fine-tuning adjusts the weights and biases. The architecture is untouched, but you now have a different function that should behave differently (hopefully better) on new inputs.
4. Nothing changed. Same input, same fixed parameters, so the function returns the identical output both times. That determinism is exactly what the function view predicts.
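The four cases can be tried on a toy network. This is a sketch under stated assumptions: it reuses the tiny 2-2-1 ReLU network from part 1 in place of the digit network, and `nudged` is a hand-edited bias standing in for fine-tuning, not real training.

```python
def relu(z):
    return max(0.0, z)

def f(x, params):
    """Forward pass; params is a list of (weights, biases), one pair per layer."""
    a = x
    for weights, biases in params:
        a = [relu(sum(w * ai for w, ai in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a

params = [([[0.5, 0.5], [1.0, -1.0]], [0.0, 0.5]),
          ([[0.4, 0.6]], [0.2])]
zeroed = [([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
          ([[0.0, 0.0]], [0.0])]
# Hypothetical stand-in for fine-tuning: the output bias moved from 0.2 to 0.3.
nudged = [params[0], ([[0.4, 0.6]], [0.3])]

x1, x2 = [1.0, 2.0], [2.0, 1.0]

print(f(x1, params) != f(x2, params))  # new input, same parameters: True
print(f(x1, zeroed) == f(x2, zeroed))  # all-zero parameters, any input: True
print(f(x1, params) != f(x1, nudged))  # same input, adjusted parameters: True
print(f(x1, params) == f(x1, params))  # same input, same parameters: True
```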

Flashcards

Nine cards. Click any card to reveal the answer. Use the Print flashcards button to lay out the full set as one card per page, ready to print or save as a PDF for offline review.

Q. In what sense is a neural network one function?

Literally: it takes 784 numbers in and gives 10 numbers out. All the layers, neurons, weighted sums, and squishes are its inner workings. Complicated inside, but from outside just numbers in, numbers out.

Q. What is the forward pass?

Feeding an input through the network to get an output: lesson 3’s neuron formula applied layer by layer, each layer’s activations becoming the next layer’s inputs, until the output layer holds the answer.

Q. In f(x; w, b), what does each part mean?

x is the input (784 pixel values), which changes every time you use the network. w and b are the weights and biases: fixed numbers that define which network this is. The semicolon separates per-use input from the network-defining parameters.

Q. What happens if you change x versus change w or b?

Change x (the input): different image, different output, same network. Change w or b (the parameters): you are now holding a different network, so the output changes even for the same image.

Q. Same skeleton, all-zero weights versus well-tuned weights?

All zeros: every weighted sum is 0, so every neuron outputs the same constant (0 with ReLU) and the output is the same dead result for any input. Well-tuned: the identical skeleton reliably reads digits. The architecture never changed; behavior lives entirely in the parameter values.

Q. Why is building a working network a “search”?

Every setting of the ~13,000 weights and biases is one specific function. Almost all settings are useless; a few make the function read digits. Building a network means searching that vast parameter space for a good setting.

Q. Where does a model’s “intelligence” actually live?

In the specific values of its weights and biases. Not in the structure (a skeleton, same for useless and useful networks) and not in the formula (never changes). About 13,000 numbers for the digit net; billions for a modern model.

Q. Why does a model give the same answer to the same input?

Because evaluating a function with the same input and fixed parameters always gives the same output, like a calculator. Nothing changed, so the output cannot change. There are no thoughts, just arithmetic.

Q. Is there understanding inside a neural network?

No. It evaluates a function: many layers of weighted sums and squishes, parameterized by many numbers. Set the numbers well and the answers are useful. There is no awareness or comprehension step anywhere.