Neurons as numbers, layers as structure

Last lesson we named what we were after: a function that takes 784 numbers in (the brightness of every pixel in a 28 by 28 image) and gives 10 numbers out (one score per possible digit). We left that function as a sealed box and promised to open it later. This is later.

So let us open it. The good news is that what is inside is far less mysterious than the word “neural network” makes it sound. There are no tiny brains in there, no electricity, no thinking. There are layers of numbers, and the numbers flow from one layer to the next. That is genuinely most of the picture. Let us build it one piece at a time.

A neuron is just a number

Start with the smallest part. In a neural network, a neuron is not a cell, not a switch, not a little circuit. A neuron is a container that holds a single number between 0 and 1. That is the entire definition.

That number has a name: the neuron’s activation. When a neuron’s activation is near 1, people say it is “lit up” or “firing.” When it is near 0, the neuron is quiet. In between is in between. A neuron holding 0.7 is mostly lit; one holding 0.05 is nearly dark.

That is worth pausing on, because the word “neuron” carries so much baggage. Forget the biology. For the rest of this track, every time you read “neuron,” picture a little box with one number in it from 0 to 1. The whole network is just a lot of these boxes, arranged in a particular way, with numbers flowing between them.

The input layer: one neuron per pixel

Now we arrange the boxes. The first group, called the input layer, is where the image comes in.

Our image is 28 pixels wide and 28 pixels tall, which is 28 times 28, or 784 pixels in total. So we build an input layer with exactly 784 neurons, one for each pixel. Each neuron’s activation is set to the brightness of its pixel: 0 for a fully black pixel, 1 for a fully white one, and a value in between for gray.

That is the whole input layer. The thing your eye reads instantly as “a 3” enters the network as 784 numbers sitting in 784 boxes. Nothing more.

Let us make it concrete with one pixel. Suppose we look at the pixel in row 10, column 14 of the image, and it is a light gray with brightness 0.7. If we count rows, columns, and neurons starting from 0 and number the neurons row by row, that pixel lands in neuron number 10 times 28 plus 14, which is neuron number 294. So neuron 294 in the input layer holds the activation 0.7. Do that for all 784 pixels and the image is fully loaded into the network.
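
The row-by-row numbering can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lesson: the function name neuron_index and the toy all-black image are made up for the example, and rows, columns, and neurons are all counted from 0, which is the convention that gives 294.

```python
WIDTH = 28   # pixels per row
HEIGHT = 28  # rows in the image

def neuron_index(row, col, width=WIDTH):
    """Position of pixel (row, col) when pixels are numbered row by row from 0."""
    return row * width + col

# Toy image (an assumption for this sketch): all black (0.0)
# except one pixel at row 10, column 14.
image = [[0.0] * WIDTH for _ in range(HEIGHT)]
image[10][14] = 0.7

# Loading the image: lay the rows end to end to get the input activations.
input_layer = [brightness for row in image for brightness in row]

print(len(input_layer))      # 784
print(neuron_index(10, 14))  # 294
print(input_layer[294])      # 0.7
```

Counting from 1 instead would shift the result, so the exact neuron number depends on the convention. The point is only that every pixel gets exactly one box.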

The output layer: one neuron per possible answer

Jump to the far end. The last group, the output layer, is where the answer comes out.

There are exactly ten things the network can answer, the digits 0 through 9, so the output layer has ten neurons, one per digit. After the network has done its work, each output neuron holds an activation that we read as a confidence score. The neuron with the highest activation is the network’s guess.

Say that after processing an image, the ten output neurons hold these activations, in order from 0 to 9:

0: 0.02   1: 0.01   2: 0.05   3: 0.92
4: 0.03   5: 0.04   6: 0.01   7: 0.02
8: 0.01   9: 0.02

Scan for the largest. The neuron for the digit 3 holds 0.92, far above all the others. So the network’s answer is “3,” and the high value tells us it is confident. If two neurons were close, say 0.45 and 0.43, that would be the network hesitating between two digits. Picture the ten activations as bars in a chart: reading the output is just finding the tallest bar.
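
Reading the output can be sketched the same way. The activations below are the ten example values from above. The helper names read_guess and margin are invented for this sketch, and treating a small gap between the top two scores as "hesitation" is an informal reading, not a standard measure.

```python
# Output activations from the example, indexed by digit 0 through 9.
outputs = [0.02, 0.01, 0.05, 0.92, 0.03, 0.04, 0.01, 0.02, 0.01, 0.02]

def read_guess(activations):
    """Return (digit, activation) for the most active output neuron."""
    digit = max(range(len(activations)), key=lambda d: activations[d])
    return digit, activations[digit]

def margin(activations):
    """Gap between the largest and second-largest activation.

    A small gap means the network is hesitating between two digits.
    """
    top, runner_up = sorted(activations, reverse=True)[:2]
    return top - runner_up

digit, score = read_guess(outputs)
print(digit, score)  # 3 0.92
```

For these values the gap between the winner and the runner-up is about 0.87, a clear decision. With top scores of 0.45 and 0.43 it would be about 0.02.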

Hidden layers: the part in the middle

So far we have the image coming in (784 neurons) and the answer coming out (10 neurons). What connects them? Everything in between, called the hidden layers.

“Hidden” only means “not the input and not the output.” These are the in-between boxes that do the actual work of turning raw pixel brightness into a digit guess. In the classic example this track follows, there are two hidden layers, each with 16 neurons. So the full structure looks like this:

Input layer: 784 neurons (one per pixel)
Hidden layer 1: 16 neurons
Hidden layer 2: 16 neurons
Output layer: 10 neurons

Add those up and the example network has 784 plus 16 plus 16 plus 10, which is 826 neurons in total. A real, working digit recognizer, and it fits in a number you can say out loud.
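
The arithmetic is easy to check in Python. The dictionary name layer_sizes is just a label chosen for this sketch.

```python
# Layer sizes of the example network, in order from input to output.
layer_sizes = {
    "input": 28 * 28,  # one neuron per pixel
    "hidden 1": 16,
    "hidden 2": 16,
    "output": 10,      # one neuron per digit
}

total_neurons = sum(layer_sizes.values())
print(total_neurons)  # 826
```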

You might reasonably ask: why two hidden layers, and why 16 each? The honest answer is that these are design choices, not laws. Different networks use different numbers, and picking them is part of the craft of building one. Two layers of 16 is simply a clean, small choice that is big enough to learn the patterns and small enough to picture. Do not read deep meaning into the exact figures; read them as “enough room to work.”

Feedforward: the numbers flow one way

There is one more structural fact, and it is the reason this kind of network is called feedforward. The numbers move in a single direction: from the input layer, into the first hidden layer, into the second, and out through the output layer. Forward, always forward. No loops, no going back, no neuron in an earlier layer listening to a later one.

Each layer takes the activations of the layer before it and produces the activations of the layer after it. The image enters as 784 numbers, gets transformed into 16 numbers, then another 16, then finally 10. That one-directional flow is the simplest neural network architecture there is, and it is the one we are building our intuition on.
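
The one-way flow can be pictured as a chain of steps, each reading one layer and filling the next. The sketch below only tracks how many numbers go in and come out of each step. It deliberately says nothing about how the new numbers are computed, and the variable names are made up for illustration.

```python
# Layer sizes in order: input, hidden 1, hidden 2, output.
layer_sizes = [784, 16, 16, 10]

# Pair each layer with the one right after it. Every step points
# forward; no pair ever leads from a later layer to an earlier one.
steps = list(zip(layer_sizes, layer_sizes[1:]))

for n_in, n_out in steps:
    print(f"{n_in} activations in -> {n_out} activations out")
```

Four layers give three steps: 784 to 16, 16 to 16, and 16 to 10.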

The hope for what the middle does

Here is the appealing story for why hidden layers might help, and it is worth telling clearly as long as we are honest that it is a hope, not a guarantee.

You might imagine that the first hidden layer learns to notice small pieces of a digit, like a short edge or a little curve. The second hidden layer might then notice larger patterns, like a full loop or a long stroke, by combining those small pieces. And the output layer might assemble those larger patterns into whole-digit guesses: a loop on top of a loop leans toward 8, an open curve over a flat base leans toward 2.

It is a lovely, tidy picture, and it is the right framing to hold for now. But hold it loosely. Whether a real trained network actually organizes itself this neatly is a genuine question, and the honest answer, which a later lesson gets into, is that the patterns a network really learns tend to be messier and less human-readable than this clean edges-to-loops-to-digits story suggests. For this lesson, the hope is the framing; just keep a mental asterisk on it.

Why this matters when you use AI

When you hear that a model has “billions of neurons” or read a headline about an AI’s “brain,” it is easy to picture something alive and thinking. This lesson is the antidote. A neuron is a number between 0 and 1. A network is layers of those numbers with values flowing forward through them. “Billions of neurons” means billions of little numbers, nothing spookier.

That reframing is genuinely useful when you use AI tools. It explains why these systems have no awareness of what they are doing, why their “confidence” is literally just how large the winning output number is, and why the same architecture can read digits, faces, or audio without caring which: it is always numbers in, numbers out, the same flow. Once the word “neuron” stops sounding like biology and starts sounding like “a number in a box,” a lot of AI hype quietly deflates into something you can reason about.

Common pitfalls

Thinking a neuron is like a brain cell. The name is borrowed, the resemblance is not. A neuron here is a container for one number between 0 and 1. No biology required, and the analogy mostly gets in the way.

Thinking activations are on-or-off. An activation is any value from 0 to 1, not just 0 or 1. A neuron at 0.6 is partly lit. The in-between values are where most of the information lives.

Reading meaning into “2 hidden layers of 16.” Those numbers are a design choice for one teaching example, not a rule. Real networks vary enormously. Treat them as “enough room,” not as a magic recipe.

Taking the edges-to-loops story as fact. It is the hope for what hidden layers do, and a useful first picture, but a trained network does not reliably organize itself that cleanly. Hold the story as motivation, not as a description of what is provably happening inside.

What you should remember

A neuron is a container holding one number between 0 and 1, called its activation. That is the whole definition. Forget the biology.
The input layer has one neuron per pixel (784 for a 28 by 28 image), each holding that pixel’s brightness; the output layer has one neuron per answer (10 for the digits), and the tallest activation is the guess.
Hidden layers sit in between and do the work of turning pixels into a guess; the example network is 784, 16, 16, 10, which is 826 neurons in all.
Feedforward means the numbers flow one direction only, input to output, each layer feeding the next, no loops.

A neural network is just layers of numbers, and the only thing that ever moves through it is numbers.

Next: the cheatsheet puts the structure on one page, and lesson 3 answers the question this one leaves open. What actually makes one neuron light up more than another? That is weights, biases, and the squish.