Skip to content

Cheatsheet: Gradient descent, step by step

new value = old value − learning_rate × (that knob's slope)
w_new = w_old − learning_rate × ∇C(w_old)

Applied to all ~13,000 knobs at once, every step. Subtract, because downhill. Repeat until the cost flattens.

  1. At the current position, compute the gradient (slope in every direction).
  2. Step: update every knob by the rule above.
  3. New position → new slopes → go back to step 1.
  4. Stop when cost stops dropping much, or after a set number of steps.

The gradient is local (only the slope right where you stand), so each step needs a fresh one. That is why training is many small steps, not one jump.

Worked run: C(w) = w², start w = 5, learning rate 0.1

Section titled “Worked run: C(w) = w², start w = 5, learning rate 0.1”
StepwC
0525
14.016.0
23.210.24
32.566.55
42.0484.19
71.0491.10

Cost slides toward 0. Steps shrink automatically as the slope 2w shrinks near the bottom.

RateFirst step from w=5Result
0.1 (good)5 → 4.0converges smoothly
2.0 (too big)5 → −15 (C 225), then 45 (C 2025)diverges, cost explodes
0.001 (too small)5 → 4.99 (C 24.9)crawls; thousands of steps needed

Big enough to progress, small enough not to overshoot. Modern methods adapt it automatically (and per knob).

  • Sign of the component → direction (turn this knob up or down).
  • Size of the component → amount (turn it a lot or a little).

It assumes you can get ∇C. Actually computing the slope of a 13,000-knob network efficiently is a separate problem: backpropagation (lesson 8).

Practical note: real training uses stochastic gradient descent, estimating the gradient from a small random batch each step. Cheaper; same conceptual loop.

  • “One step finishes it.” No. Many small steps, fresh slope each time.
  • “Bigger learning rate is always faster.” No. Too big overshoots and diverges.
  • “The gradient knows where the bottom is.” No. It only knows the local slope.
  • “Gradient descent computes the gradient.” No. It uses it. Computing it is L8.

Read the slope, step against it, repeat. The hard part was never the walking; it is figuring out the slope.