Weights, biases, and the squish

Lesson 2 left a question hanging: a hidden neuron “gets its activation from the layer before it,” but how, exactly? This lesson answers it with the single small computation that every neuron in the network, hidden and output alike, runs. Learn it once and you understand all of them.

You will see the computation in three steps: a weighted sum (multiply each incoming activation by its connection’s weight and add the products), plus a bias (a number that shifts how eager the neuron is to activate), with the total passed through an activation function (the “squish”) that keeps the result in a usable range. You will meet the two common squishes, sigmoid and ReLU, run one neuron by hand with both, and then count the knobs: the small 784-16-16-10 digit network already needs 13,002 weights and biases (12,960 weights plus 42 biases), and modern networks have billions of them. The closing idea is the one that matters most: a network’s behavior lives entirely in those parameter values, not in its structure and not in the unchanging formula.
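The track itself needs no coding, but the three steps can be sketched in a few lines of Python as an optional illustration. The names `sigmoid`, `relu`, and `neuron` and all the numbers are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import math

def sigmoid(z: float) -> float:
    """Squish any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Keep positive values as they are; replace negatives with 0."""
    return max(0.0, z)

def neuron(activations, weights, bias, squish):
    """One neuron: weighted sum of incoming activations, plus bias, then squish."""
    weighted_sum = sum(a * w for a, w in zip(activations, weights))
    return squish(weighted_sum + bias)

# Hypothetical example values, for illustration only.
activations = [0.5, 1.0, 0.0]  # activations arriving from the previous layer
weights = [2.0, -1.0, 3.0]     # one weight per incoming connection
bias = -1.0                    # a negative bias makes this neuron less eager

print(neuron(activations, weights, bias, sigmoid))  # sigmoid(-1) ≈ 0.2689
print(neuron(activations, weights, bias, relu))     # relu(-1) = 0.0
```

Here the weighted sum is 0.5×2 + 1×(−1) + 0×3 = 0, and the bias pushes the total to −1. Sigmoid squishes −1 to about 0.27, while ReLU clips it to 0. Unlike sigmoid, ReLU does not cap large values at 1; it only replaces negatives with 0.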

This is lesson 3 of the track, the second of the three structure lessons. Lesson 2 arranged neurons into layers; this lesson gives the rule that connects one layer to the next. Lesson 4 then zooms back out, stacking this single-neuron formula across the whole network to reveal the network as one big function from 784 inputs to 10 outputs, with all ~13,000 parameters as its adjustable knobs. Once the structure is fully clear, Phase 2 turns to the real question this lesson raises: where do the right parameter values come from?
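The figure of roughly 13,000 knobs can be checked with a short sketch. It assumes the layers are fully connected (every neuron links to every neuron in the next layer) and that each hidden and output neuron has exactly one bias; `count_parameters` is a hypothetical helper name.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    Each pair of adjacent layers contributes size_in * size_out weights
    (one per connection) and size_out biases (one per receiving neuron).
    Input neurons have no bias because they do not compute anything.
    """
    weights = 0
    biases = 0
    for size_in, size_out in zip(layer_sizes, layer_sizes[1:]):
        weights += size_in * size_out
        biases += size_out
    return weights, biases

weights, biases = count_parameters([784, 16, 16, 10])
print(weights, biases, weights + biases)  # 12960 42 13002
```

The weights come from 784×16 + 16×16 + 16×10 = 12,544 + 256 + 160 = 12,960, and the biases from 16 + 16 + 10 = 42.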

Prerequisite (within this track): lesson 2, “Neurons as numbers, layers as structure,” so that “a neuron holds an activation between 0 and 1” and the input/hidden/output layer picture are already in place. Comfort with multiplying and adding a handful of numbers is all the math you need; a calculator helps for the sigmoid step in the practice, but no coding or installation is required.

  • Describe the three-step neuron computation: a weighted sum of inputs, plus a bias, passed through an activation function
  • Explain what weights do (scale each input) and what a bias does (shift the neuron’s eagerness to activate)
  • Compare sigmoid and ReLU, and explain that both are fixed functions that only keep activations in a usable range
  • Compute a single neuron’s activation by hand with both sigmoid and ReLU
  • Count the parameters (weights and biases) of a small network and explain that a network’s behavior lives in those values, not the structure or the formula
  • Read time: about 11 minutes
  • Practice time: about 14 minutes (running a neuron by hand both ways, a count-the-parameters drill, and flashcards)
  • Difficulty: standard
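The idea that behavior lives in the parameter values, not in the structure or the formula, can be illustrated with a minimal sketch: one hypothetical layer of two sigmoid neurons fed by two inputs, run twice with the same formula and the same structure but two different hypothetical parameter sets. The helper names and numbers are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Squish any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(activations, weights, biases):
    """Apply the neuron rule to every neuron in one layer.

    weights[j] holds neuron j's incoming weights; biases[j] is its bias.
    """
    return [
        sigmoid(sum(a * w for a, w in zip(activations, row)) + b)
        for row, b in zip(weights, biases)
    ]

inputs = [1.0, 0.0]  # hypothetical input activations

# Same structure (2 inputs -> 2 neurons), same formula, different parameter values.
params_a = ([[4.0, 0.0], [-4.0, 0.0]], [0.0, 0.0])
params_b = ([[-4.0, 0.0], [4.0, 0.0]], [0.0, 0.0])

print(layer(inputs, *params_a))  # first neuron ≈ 0.98, second ≈ 0.02
print(layer(inputs, *params_b))  # first neuron ≈ 0.02, second ≈ 0.98
```

Nothing changed between the two runs except the parameter values, yet the outputs swapped.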