Cheatsheet: Weights, biases, and the squish
The one formula every neuron runs
activation = squish( weighted sum of inputs + bias )
           = squish( w1·a1 + w2·a2 + ... + wn·an + bias )

Same computation in every hidden and output neuron. The network’s power is in how many there are and the exact numbers in each.
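As a minimal sketch, the formula can be written as one small Python function. The names `neuron` and `squish` are illustrative, and the sigmoid is used as the squashing function only as an assumption; any activation could be passed in instead.

```python
import math

def squish(x):
    """Sigmoid, assumed here as one possible squashing function."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation=squish):
    """Return activation(w1*a1 + w2*a2 + ... + wn*an + bias)."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    weighted_sum = sum(w * a for w, a in zip(weights, inputs))
    return activation(weighted_sum + bias)

# Two inputs whose weighted contributions cancel out, no bias:
# the sum is 0, and the sigmoid of 0 is exactly 0.5.
print(neuron([1.0, 2.0], [0.5, -0.25], bias=0.0))
```

Every hidden and output neuron would call the same function; only `weights` and `bias` differ from one neuron to the next.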
The three parts
| Part | What it is | What it does |
|---|---|---|
| Weight | A number on each connection | Scales one input: + boosts, - dampens, ~0 ignores |
| Bias | A number on the neuron itself | Shifts default eagerness: - cautious, + eager |
| Activation function | A fixed squashing function | Maps the unbounded result into a usable range |
The weights coming into a neuron act like a template; the weighted sum scores how well the input matches it.
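The template idea can be illustrated with a toy example. The four-input template and both inputs below are made up for illustration: positive weights mark where the neuron wants activity, negative weights mark where it does not.

```python
def weighted_sum(inputs, weights):
    """Score how well the inputs line up with the weight template."""
    return sum(w * a for w, a in zip(weights, inputs))

# Hypothetical template: wants the first two inputs on, the last two off.
template = [1.0, 1.0, -1.0, -1.0]

matching = [1.0, 1.0, 0.0, 0.0]   # active exactly where weights are positive
opposite = [0.0, 0.0, 1.0, 1.0]   # active exactly where weights are negative

score_match = weighted_sum(matching, template)
score_opposite = weighted_sum(opposite, template)

# A negative bias makes the neuron more cautious: every score drops
# by the same amount before the squish is applied.
cautious_bias = -1.5

print(score_match, score_opposite)
print(score_match + cautious_bias, score_opposite + cautious_bias)
```

The matching input scores high and the opposite input scores low; the bias then shifts both scores without changing which one wins.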
The two common activation functions
| Function | Formula | Shape | Note |
|---|---|---|---|
| Sigmoid | 1 / (1 + e^(-x)) | Smooth S-curve, maps to (0, 1) | Traditional default |
| ReLU | max(0, x) | 0 for negative, x for positive | Common modern default; fast, trains well |
Which one to use is a design choice. Both just keep activations in a usable range; neither is where learning happens.
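Both functions in the table are short enough to write out directly. The sketch below uses only the standard library and prints each function at a few sample points chosen for illustration.

```python
import math

def sigmoid(x):
    """Smooth S-curve: 1 / (1 + e^(-x)), always strictly between 0 and 1."""
    # Note: math.exp overflows for very large negative x; fine for a sketch.
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """0 for negative inputs, x itself for positive inputs."""
    return max(0.0, x)

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  relu={relu(x):.1f}")
```

Neither function has any adjustable numbers inside it, which is why swapping one for the other changes the network's design but not what it has learned.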
Worked neuron (3 inputs)
Inputs 0.5, 0.8, 0.2; weights 0.3, -0.2, 0.5; bias 0.1.
weighted sum + bias = (0.5·0.3) + (0.8·-0.2) + (0.2·0.5) + 0.1 = 0.15 - 0.16 + 0.10 + 0.10 = 0.19
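The arithmetic above can be checked term by term with a few lines of Python. This is only a sketch using the inputs, weights, and bias stated for this neuron; the variable names are illustrative.

```python
import math

inputs = [0.5, 0.8, 0.2]
weights = [0.3, -0.2, 0.5]
bias = 0.1

# One product per connection, then add the bias.
products = [a * w for a, w in zip(inputs, weights)]
z = sum(products) + bias

# Apply each of the two activation functions to the same sum.
sigmoid_out = 1.0 / (1.0 + math.exp(-z))
relu_out = max(0.0, z)

print([round(p, 2) for p in products], round(z, 2))
print(round(sigmoid_out, 3), round(relu_out, 2))
```

Rounding is applied only for display, since floating-point products such as 0.8 × -0.2 are not stored exactly.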
sigmoid(0.19) ≈ 0.547

ReLU(0.19) = 0.19

Counting the knobs (the 784-16-16-10 network)
| Connection | Weights + biases | Parameters |
|---|---|---|
| Input → hidden 1 | 784·16 + 16 | 12,560 |
| Hidden 1 → hidden 2 | 16·16 + 16 | 272 |
| Hidden 2 → output | 16·10 + 10 | 170 |
| Total | | 13,002 |
All weights and biases together are the parameters. Small network: ~13K. Modern networks: billions.
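The count in the table follows one rule per pair of adjacent layers: one weight per connection plus one bias per receiving neuron. A minimal sketch, assuming fully connected layers and using the hypothetical helper name `layer_parameters`:

```python
def layer_parameters(layer_sizes):
    """Return the parameter count for each pair of adjacent layers.

    Assumes every neuron connects to every neuron in the next layer.
    """
    counts = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out   # one weight per connection
        biases = n_out           # one bias per receiving neuron
        counts.append(weights + biases)
    return counts

per_layer = layer_parameters([784, 16, 16, 10])
print(per_layer, sum(per_layer))
```

The input layer contributes no biases of its own, because its values are given rather than computed.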
Pitfalls to dodge
- “Each neuron is complicated.” No. Multiply, add, add bias, squash. Always the same.
- “Weight equals bias.” No. Weight scales one input (on a connection); bias shifts the whole neuron.
- “The smarts are in the activation function.” No. Sigmoid and ReLU never change while learning. The smarts are in the weights and biases.
- “Billions of parameters means something exotic.” No. Billions of ordinary numbers in the same simple formula. Bigger, not different.
The one-line version
Each neuron is almost embarrassingly simple; the power is in how many there are and the exact numbers tuned into every one.