
Cheatsheet: How models actually learn: gradient descent

| Term | Meaning |
| --- | --- |
| Loss | total error a given set of parameters produces |
| Landscape | parameters are horizontal directions, loss is the height |
| Goal | find the lowest point (smallest loss) |
| What you sense | only the local slope under your feet |

| Piece | What it is | Effect |
| --- | --- | --- |
| Gradient | slope of the loss; which way and how steeply error changes | points uphill; step against it |
| Learning rate | size of each step | too big overshoots; too small crawls |
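
The effect of the learning rate can be seen in a single update. The sketch below assumes a hypothetical one-parameter loss, `(w - 5)^2`, and takes one step from `w = 1` with three illustrative learning rates; the function names and the rates 0.01 and 1.1 are assumptions chosen for the demonstration.

```python
def loss(w):
    # Assumed example loss: a bowl whose lowest point is at w = 5.
    return (w - 5) ** 2


def gradient(w):
    # Derivative of (w - 5)^2 with respect to w; it points uphill.
    return 2 * (w - 5)


def step(w, learning_rate):
    # Move against the gradient, scaled by the learning rate.
    return w - learning_rate * gradient(w)


w = 1.0
for lr in (0.01, 0.25, 1.1):
    new_w = step(w, lr)
    print(f"lr={lr}: w {w} -> {new_w:.2f}, loss {loss(w):.2f} -> {loss(new_w):.2f}")
```

With the smallest rate the loss barely moves, with 0.25 it drops sharply, and with 1.1 the step jumps past the minimum and the loss ends up higher than where it started.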

| Step | Action |
| --- | --- |
| 1 | Guess the parameters (often random) |
| 2 | Compute the loss (how wrong now?) |
| 3 | Compute the gradient (which way is downhill?) |
| 4 | Step: new = old - (learning_rate * gradient) |
| 5 | Repeat until steps stop lowering the loss |
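
The five steps above can be sketched as a short loop. This is a minimal illustration, not a production optimizer: it assumes a single parameter, a hypothetical bowl-shaped loss `(w - 5)^2`, and a stopping rule (`tolerance`, `max_steps`) invented for the example.

```python
import random


def loss(w):
    # Assumed example loss with its minimum at w = 5.
    return (w - 5) ** 2


def gradient(w):
    # Derivative of the assumed loss.
    return 2 * (w - 5)


def gradient_descent(learning_rate=0.25, tolerance=1e-9, max_steps=1000, seed=0):
    rng = random.Random(seed)
    w = rng.uniform(-10, 10)               # 1. guess the parameter at random
    current = loss(w)                      # 2. compute the loss
    for _ in range(max_steps):
        g = gradient(w)                    # 3. compute the gradient
        new_w = w - learning_rate * g      # 4. step against the gradient
        new_loss = loss(new_w)
        if current - new_loss < tolerance:
            break                          # 5. stop once steps no longer lower the loss
        w, current = new_w, new_loss
    return w, current


w, final_loss = gradient_descent()
print(f"w = {w:.4f}, loss = {final_loss:.2e}")
```

The loop stops when an update improves the loss by less than `tolerance`; real training code usually relies on other stopping rules, such as a fixed number of passes over the data.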

Worked trace (bowl with minimum at w = 5, learning rate 0.25, gradient = 2*(w-5))

| Step | w | gradient | loss = (w-5)^2 |
| --- | --- | --- | --- |
| start | 1 | -8 | 16 |
| 1 | 3 | -4 | 4 |
| 2 | 4 | -2 | 1 |
| | toward 5 | toward 0 | toward 0 |
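
The arithmetic behind each row of the trace can be written out as follows. The symbols are assumptions introduced for illustration: \(\eta\) for the learning rate and \(w_k\) for the value of w after step k, with \(w_0 = 1\) as the start.

```latex
\begin{aligned}
w_{k+1} &= w_k - \eta \cdot 2\,(w_k - 5), \qquad \eta = 0.25 \\
w_1 &= 1 - 0.25 \cdot (-8) = 3 \\
w_2 &= 3 - 0.25 \cdot (-4) = 4 \\
w_3 &= 4 - 0.25 \cdot (-2) = 4.5 \\
w_{k+1} - 5 &= (1 - 2\eta)\,(w_k - 5) = 0.5\,(w_k - 5)
\end{aligned}
```

The last line shows why the trace settles: with this learning rate, each step halves the distance to 5, so the loss falls to a quarter of its previous value (16, 4, 1, 0.25, and so on).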

| Symptom / idea | Meaning |
| --- | --- |
| Loss climbs or swings wildly | learning rate too large; reduce it |
| Loss falls painfully slowly | learning rate too small; increase it |
| Settles above the true best | likely stuck in a local minimum, a dip that is not the lowest point overall |
| Stochastic gradient descent | estimate the slope from a random sample at each step; this is how large models are trained |
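
To make the stochastic variant concrete, here is a minimal sketch that estimates the slope from a small random sample at each step. Everything in it is an assumption for illustration: the task (finding the value `w` that minimizes the mean squared distance to a list of numbers), the synthetic data, and the names `sgd_mean`, `batch_size`, and `steps`.

```python
import random


def sgd_mean(data, learning_rate=0.05, batch_size=8, steps=500, seed=0):
    rng = random.Random(seed)
    w = 0.0  # initial guess
    for _ in range(steps):
        # Draw a small random sample instead of using every data point.
        batch = rng.sample(data, batch_size)
        # Gradient of the mean of (w - x)^2, computed over the sample only.
        grad = sum(2 * (w - x) for x in batch) / batch_size
        w -= learning_rate * grad
    return w


# Synthetic data centred near 5 (an assumption for the demo).
data_rng = random.Random(1)
data = [data_rng.gauss(5.0, 1.0) for _ in range(1000)]

w = sgd_mean(data)
full_mean = sum(data) / len(data)
print(f"SGD estimate: {w:.3f}, exact minimizer (data mean): {full_mean:.3f}")
```

Because each step sees only a sample, the estimate jitters around the exact minimizer rather than landing on it; the trade-off is that every step costs a fraction of a full pass over the data.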