Lesson: Neurons as numbers, layers as structure

Last lesson we named what we were after: a function that takes 784 numbers in (the brightness of every pixel in a 28 by 28 image) and gives 10 numbers out (one score per possible digit). We left that function as a sealed box and promised to open it later. This is later.

So let us open it. The good news is that what is inside is far less mysterious than the word “neural network” makes it sound. There are no tiny brains in there, no electricity, no thinking. There are layers of numbers, and the numbers flow from one layer to the next. That is genuinely most of the picture. Let us build it one piece at a time.

Start with the smallest part. In a neural network, a neuron is not a cell, not a switch, not a little circuit. A neuron is a container that holds a single number between 0 and 1. That is the entire definition.

That number has a name: the neuron’s activation. When a neuron’s activation is near 1, people say it is “lit up” or “firing.” When it is near 0, the neuron is quiet. In between is in between. A neuron holding 0.7 is mostly lit; one holding 0.05 is nearly dark.

Figure: A neuron is a container holding one number between 0 and 1. A single circle filled about 70 percent of the way up, like a tank, holds the value 0.7 and is labeled "activation." Near 1 the neuron is lit up, or "firing"; near 0 it is quiet; 0.7 is mostly lit.
A neuron is not a cell, a switch, or a tiny brain. It is a container that holds a single number between 0 and 1, called its activation. Every time you read "neuron" for the rest of this track, picture a little box with one number in it. A network is just a lot of these boxes with numbers flowing between them.

That is worth pausing on, because the word “neuron” carries so much baggage. Forget the biology. For the rest of this track, every time you read “neuron,” picture a little box with one number in it from 0 to 1. The whole network is just a lot of these boxes, arranged in a particular way, with numbers flowing between them.

Now we arrange the boxes. The first group, called the input layer, is where the image comes in.

Our image is 28 pixels wide and 28 pixels tall, which is 28 times 28, or 784 pixels in total. So we build an input layer with exactly 784 neurons, one for each pixel. Each neuron’s activation is set to the brightness of its pixel: 0 for a fully black pixel, 1 for a fully white one, and a value in between for gray.

Figure: A 28 by 28 image becomes an input layer of 784 neurons. On the left, the 28 by 28 grayscale image of a 3. An arrow labeled "784 brightness values" points to a tall column of neurons, one per pixel: empty outlines for dark pixels near 0, shaded circles for brighter ones. One neuron is highlighted and labeled "neuron 294 = 0.7," the example pixel at row 10, column 14.
The thing your eye reads instantly as "a 3" enters the network as 784 numbers sitting in 784 boxes, nothing more. Each input neuron holds one pixel's brightness, 0 for black up to 1 for white. The example pixel at row 10, column 14 lands in neuron 10 times 28 plus 14, neuron 294, holding 0.7.

That is the whole input layer. The thing your eye reads instantly as “a 3” enters the network as 784 numbers sitting in 784 boxes. Nothing more.

Let us make it concrete with one pixel. Suppose we look at the pixel in row 10, column 14 of the image, counting rows and columns from 0, and it is a fairly light gray with brightness 0.7. If we number the neurons row by row, also starting from 0, then ten full rows of 28 pixels come before this pixel's row and 14 pixels come before it within the row. So the pixel lands in neuron number 10 times 28 plus 14, which is neuron number 294, and neuron 294 in the input layer holds the activation 0.7. Do that for all 784 pixels and the image is fully loaded into the network.
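The row-by-row numbering can be sketched in a few lines of Python. This is only an illustration, not code from the lesson: the names `WIDTH`, `HEIGHT`, `neuron_index`, and `load_image` are made up here, the all-black test image is hypothetical, and the sketch assumes rows, columns, and neurons are all counted from 0, which is the convention under which row 10, column 14 lands on 294.

```python
WIDTH = 28   # pixels per row
HEIGHT = 28  # rows in the image

def neuron_index(row, col, width=WIDTH):
    """Position of a pixel in the input layer, numbering row by row from 0."""
    return row * width + col

def load_image(image):
    """Flatten a grid of brightness values (0 to 1) into one list of activations."""
    return [brightness for row in image for brightness in row]

# Hypothetical image: all black, except the example pixel at row 10, column 14.
image = [[0.0] * WIDTH for _ in range(HEIGHT)]
image[10][14] = 0.7

input_layer = load_image(image)
print(len(input_layer))      # 784
print(neuron_index(10, 14))  # 294
print(input_layer[294])      # 0.7
```

Flattening row by row is just one way to order the pixels; any fixed order would do, as long as the same pixel always goes to the same neuron.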

The output layer: one neuron per possible answer

Jump to the far end. The last group, the output layer, is where the answer comes out.

There are exactly ten things the network can answer, the digits 0 through 9, so the output layer has ten neurons, one per digit. After the network has done its work, each output neuron holds an activation that we read as a confidence score. The neuron with the highest activation is the network’s guess.

Figure: The output layer reads as a bar chart, with the digit 3 the tallest. Ten bars, one per output neuron, labeled 0 through 9 along the digit axis; bar height is the activation, read as confidence from 0 to 1. The bar for the digit 3 towers at 0.92; every other bar is tiny, between 0.01 and 0.05. The tallest bar wins: the guess is 3.
After the network does its work, each of the ten output neurons holds a confidence score. Here the 3 neuron holds 0.92, far above the rest, so the answer is "3" and the high value says the network is confident. If two bars were close, say 0.45 and 0.43, that would be the network hesitating between two digits.

Say that after processing an image, the ten output neurons hold these activations, in order from 0 to 9:

0: 0.02 1: 0.01 2: 0.05 3: 0.92
4: 0.03 5: 0.04 6: 0.01 7: 0.02
8: 0.01 9: 0.02

Scan for the largest. The neuron for the digit 3 holds 0.92, far above all the others. So the network’s answer is “3,” and the high value tells us it is confident. If two neurons were close, say 0.45 and 0.43, that would be the network hesitating between two digits. Reading the output is just finding the tallest bar.
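Finding the tallest bar is short enough to write out. The sketch below uses the ten example activations from this section; the function names `read_guess` and `margin` are invented for illustration, and using the gap between the top two bars as a rough sign of hesitation is an assumption of this sketch, not something the lesson defines.

```python
# Output activations from the example above, indexed by digit 0 through 9.
output_layer = [0.02, 0.01, 0.05, 0.92, 0.03, 0.04, 0.01, 0.02, 0.01, 0.02]

def read_guess(activations):
    """Return (digit, activation) for the neuron with the highest activation."""
    digit = max(range(len(activations)), key=lambda d: activations[d])
    return digit, activations[digit]

def margin(activations):
    """Gap between the tallest and second-tallest bars; a small gap means hesitation."""
    top, runner_up = sorted(activations, reverse=True)[:2]
    return top - runner_up

digit, confidence = read_guess(output_layer)
print(digit, confidence)  # 3 0.92
```

For the example, the gap between the top two bars is large (0.92 against 0.05). For the hesitating case of 0.45 against 0.43, `margin` would come out at roughly 0.02.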

So far we have the image coming in (784 neurons) and the answer coming out (10 neurons). What connects them? Everything in between, called the hidden layers.

“Hidden” only means “not the input and not the output.” These are the in-between boxes that do the actual work of turning raw pixel brightness into a digit guess. In the classic example this track follows, there are two hidden layers, each with 16 neurons. So the full structure looks like this:

  • Input layer: 784 neurons (one per pixel)
  • Hidden layer 1: 16 neurons
  • Hidden layer 2: 16 neurons
  • Output layer: 10 neurons

Figure: The full feedforward network: 784, 16, 16, 10. Four layers drawn left to right: the input layer of 784 neurons as a compressed vertical stack, two hidden layers of 16 neurons each, and the output layer of 10 neurons labeled 0 through 9. Faint lines connect every layer to the next. An arrow along the bottom reads "feedforward: numbers flow one direction, input to output." 826 neurons total.
The whole example network: 784 input neurons (one per pixel), two hidden layers of 16, and 10 output neurons, 826 in all. Each layer feeds the next, always forward, no loops and no going back. A real, working digit recognizer that fits in a number you can say out loud.

Add those up and the example network has 784 plus 16 plus 16 plus 10, which is 826 neurons in total. A real, working digit recognizer, and it fits in a number you can say out loud.
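The same bookkeeping as a tiny Python sketch. The variable names are made up for illustration; the sizes are the ones listed above.

```python
# Layer sizes of the example network, from input to output.
layer_sizes = [784, 16, 16, 10]

total_neurons = sum(layer_sizes)
print(total_neurons)  # 826

# Hidden layers are everything that is neither the first nor the last layer.
hidden_sizes = layer_sizes[1:-1]
print(hidden_sizes)  # [16, 16]
```

Writing the structure as a plain list of sizes also makes the "design choice" point concrete: changing the network's shape is just a matter of changing the numbers in the middle of the list.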

You might reasonably ask: why two hidden layers, and why 16 each? The honest answer is that these are design choices, not laws. Different networks use different numbers, and picking them is part of the craft of building one. Two layers of 16 is simply a clean, small choice that is big enough to learn the patterns and small enough to picture. Do not read deep meaning into the exact figures; read them as “enough room to work.”

There is one more structural fact, and it is the reason this kind of network is called feedforward. The numbers move in a single direction: from the input layer, into the first hidden layer, into the second, and out through the output layer. Forward, always forward. No loops, no going back, no neuron in an earlier layer listening to a later one.

Each layer takes the activations of the layer before it and produces the activations of the layer after it. The image enters as 784 numbers, gets transformed into 16 numbers, then another 16, then finally 10. That one-directional flow is the simplest neural network architecture there is, and it is the one we are building our intuition on.
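The one-directional flow can be sketched in Python, with a big caveat: this lesson has not yet said how a layer actually computes its activations from the previous layer. The `placeholder_step` function below is a stand-in invented for this sketch (random weighted sums pushed through a sigmoid, one common way to squeeze a number into the range 0 to 1), so the output values are meaningless. The only things worth reading from it are the shapes, 784 to 16 to 16 to 10, and the fact that each layer sees only the layer before it.

```python
import math
import random

layer_sizes = [784, 16, 16, 10]

def squish(x):
    """Sigmoid, assumed here as a way to squeeze any number into 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def placeholder_step(prev, size, rng):
    """Stand-in for one layer: random weighted sums of prev, then squished.
    This is NOT a trained network, only a placeholder to show the flow."""
    return [
        squish(sum(rng.uniform(-1, 1) * a for a in prev))
        for _ in range(size)
    ]

def feedforward(input_activations, sizes, seed=0):
    """Pass activations forward, layer by layer, recording each layer's size."""
    rng = random.Random(seed)
    activations = input_activations
    shapes = [len(activations)]
    for size in sizes[1:]:
        # Each new layer is computed only from the layer just before it.
        activations = placeholder_step(activations, size, rng)
        shapes.append(len(activations))
    return activations, shapes

# Hypothetical input: all black except neuron 294, which holds 0.7.
image_as_input = [0.0] * 784
image_as_input[294] = 0.7

output, shapes = feedforward(image_as_input, layer_sizes)
print(shapes)  # [784, 16, 16, 10]
```

Note that the loop never looks back: once a layer's activations are computed, the earlier layers are not consulted again, which is what "no loops, no going back" means in practice.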

Here is the appealing story for why hidden layers might help, and it is worth telling clearly as long as we are honest that it is a hope, not a guarantee.

You might imagine that the first hidden layer learns to notice small pieces of a digit, like a short edge or a little curve. The second hidden layer might then notice larger patterns, like a full loop or a long stroke, by combining those small pieces. And the output layer might assemble those larger patterns into whole-digit guesses: a loop on top of a loop leans toward 8, an open curve over a flat base leans toward 3.

Figure: The hopeful story: edges, then loops and strokes, then a digit. A three-step build-up shown left to right under the header "The hope: what the hidden layers might do*". Layer 1: small edges. Layer 2: loops and strokes. Output: a digit. Footnote: * Hold this loosely. A trained network tends to organize itself messier than this tidy story (a later lesson shows how).
An appealing picture: the first hidden layer notices small edges, the second combines them into loops and strokes, the output assembles those into a whole-digit guess. It is the right framing to hold for now, but hold it loosely. Whether a real network organizes itself this neatly is a genuine question, and the honest answer is that it is usually messier.

It is a lovely, tidy picture, and it is the right framing to hold for now. But hold it loosely. Whether a real trained network actually organizes itself this neatly is a genuine question, and the honest answer, which a later lesson gets into, is that the patterns a network really learns tend to be messier and less human-readable than this clean edges-to-loops-to-digits story suggests. For this lesson, the hope is the framing; just keep a mental asterisk on it.

When you hear that a model has “billions of neurons” or read a headline about an AI’s “brain,” it is easy to picture something alive and thinking. This lesson is the antidote. A neuron is a number between 0 and 1. A network is layers of those numbers with values flowing forward through them. “Billions of neurons” means billions of little numbers, nothing spookier.

That reframing is genuinely useful when you use AI tools. It explains why these systems have no awareness of what they are doing, why their “confidence” is literally just how tall the winning output number came out, and why the same architecture can read digits, faces, or audio without caring which: it is always numbers in, numbers out, the same flow. Once the word “neuron” stops sounding like biology and starts sounding like “a number in a box,” a lot of AI hype quietly deflates into something you can reason about.

Thinking a neuron is like a brain cell. The name is borrowed, the resemblance is not. A neuron here is a container for one number between 0 and 1. No biology required, and the analogy mostly gets in the way.

Thinking activations are on-or-off. An activation is any value from 0 to 1, not just 0 or 1. A neuron at 0.6 is partly lit. The in-between values are where most of the information lives.

Reading meaning into “2 hidden layers of 16.” Those numbers are a design choice for one teaching example, not a rule. Real networks vary enormously. Treat them as “enough room,” not as a magic recipe.

Taking the edges-to-loops story as fact. It is the hope for what hidden layers do, and a useful first picture, but a trained network does not reliably organize itself that cleanly. Hold the story as motivation, not as a description of what is provably happening inside.

  • A neuron is a container holding one number between 0 and 1, called its activation. That is the whole definition. Forget the biology.
  • The input layer has one neuron per pixel (784 for a 28 by 28 image), each holding that pixel’s brightness; the output layer has one neuron per answer (10 for the digits), and the neuron with the highest activation is the guess.
  • Hidden layers sit in between and do the work of turning pixels into a guess; the example network is 784, 16, 16, 10, which is 826 neurons in all.
  • Feedforward means the numbers flow one direction only, input to output, each layer feeding the next, no loops.

A neural network is just layers of numbers, and the only thing that ever moves through it is numbers.

Next: the cheatsheet puts the structure on one page, and lesson 3 answers the question this one leaves open. What actually makes one neuron light up more than another? That is weights, biases, and the squish.