Cheatsheet: The cost landscape
The one idea that matters
- cost landscape = picture C(w, b) as terrain
- each knob setting = a point; its cost = the height there
- goal = get downhill, toward a low valley
The landscape, dimension by dimension
| Knobs | Picture | Can you draw it? |
|---|---|---|
| 1 | A curve (height vs the one knob) | Yes |
| 2 | A 3D surface of hills and valleys | Yes |
| ~13,000 (real network) | A landscape in 13,000+ dimensions | No, but the math is identical |
Build intuition in 2D; trust it carries to high dimensions unchanged.
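A minimal Python sketch of "the math is identical". It assumes a hypothetical bowl-shaped cost C(w) = w₁² + w₂² + … + wₙ² (the 2D bowl from the worked steps, extended to n knobs) and a hypothetical step size of 0.1; neither comes from a real network. The functions never mention how many knobs there are, so the same code runs at 1, 2, or 13,000 knobs.

```python
import random


def cost(knobs):
    """Hypothetical bowl-shaped cost: the sum of every knob squared."""
    return sum(k * k for k in knobs)


def gradient(knobs):
    """Exact gradient of the bowl: knob i's own slope is 2 * knob_i."""
    return [2 * k for k in knobs]


def downhill_step(knobs, step_size=0.1):
    """Move every knob a little against its own slope (the negative gradient)."""
    return [k - step_size * g for k, g in zip(knobs, gradient(knobs))]


random.seed(0)
for n in (1, 2, 13_000):
    # A random starting point on an n-dimensional landscape.
    knobs = [random.uniform(-5, 5) for _ in range(n)]
    print(f"{n:>6} knobs: cost {cost(knobs):.2f} -> {cost(downhill_step(knobs)):.2f}")
```

For this particular bowl, one step turns each knob k into k − 0.1·2k = 0.8k, so the cost falls to 0.64 of its previous value whatever n is. A real network's cost is not a bowl, but the step rule is written the same way.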
Slopes and the gradient
- At any point, every direction has a slope: if you move that way, does the cost rise or fall? (The derivative idea; Track 8 has the precise version.)
- Gradient ∇C = the direction of steepest uphill. A vector, assembled from every knob’s individual slope.
- Negative gradient -∇C = the direction of steepest downhill. Step this way to lower cost fastest.
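To see the gradient as "one slope per knob, stacked into a vector", here is a minimal Python sketch. It estimates each knob's slope with a central finite difference (nudge one knob up and down by a tiny hypothetical `h`, measure how the cost changes), a numerical approximation chosen for illustration rather than the exact derivative. The example cost is the hypothetical bowl C(w₁, w₂) = w₁² + w₂².

```python
def numerical_gradient(cost, knobs, h=1e-6):
    """Estimate the gradient: nudge one knob at a time and measure its slope."""
    grad = []
    for i in range(len(knobs)):
        up = list(knobs)
        down = list(knobs)
        up[i] += h
        down[i] -= h
        # Central difference: rise over run for knob i alone, others held fixed.
        grad.append((cost(up) - cost(down)) / (2 * h))
    return grad


def bowl(knobs):
    """Hypothetical example cost: C(w1, w2) = w1^2 + w2^2."""
    return sum(k * k for k in knobs)


grad = numerical_gradient(bowl, [3.0, 4.0])
downhill = [-g for g in grad]  # negative gradient: steepest downhill
print("gradient (steepest uphill):", [round(g, 4) for g in grad])
print("negative gradient (steepest downhill):", [round(g, 4) for g in downhill])
```

At (3, 4) the estimate should come out very close to [6, 8], and flipping the signs gives the downhill direction [-6, -8]. With a single knob the returned list has one entry, which is the 1D case where the gradient really is just one number.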
Worked downhill steps
1D: C(w) = w² (a parabola, bottom at 0). At w = 3: slope 2w = 6 (uphill to the right). Step the other way, in the -w direction, toward 0: a small step to, say, w = 2.4 takes the cost from 9 to 5.76. Cost dropped.
2D: C(w₁, w₂) = w₁² + w₂² (a bowl, bottom at the origin). At (3, 4): gradient [6, 8], negative gradient [-6, -8] (points back toward the origin). A small step that way, say to (2.4, 3.2), takes the cost from 25 to 16. Same recipe at any number of knobs.
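The two worked steps can be replayed in a few lines of Python. This is a sketch: the step size of 0.1 is a hypothetical choice for illustration, and each gradient is written out by hand from the formulas above.

```python
def step(point, grad, step_size=0.1):
    """One downhill step: move against the gradient, scaled by step_size."""
    return [p - step_size * g for p, g in zip(point, grad)]


def cost_1d(w):
    """C(w) = w^2, with the single knob stored in a one-element list."""
    return w[0] ** 2


def cost_2d(p):
    """C(w1, w2) = w1^2 + w2^2."""
    return p[0] ** 2 + p[1] ** 2


# 1D: slope is 2w.
w = [3.0]
w_next = step(w, [2 * w[0]])
print(f"1D: w {w[0]:.2f} -> {w_next[0]:.2f}, cost {cost_1d(w):.2f} -> {cost_1d(w_next):.2f}")

# 2D: gradient is [2*w1, 2*w2].
p = [3.0, 4.0]
p_next = step(p, [2 * p[0], 2 * p[1]])
print(f"2D: {p} -> {[round(x, 2) for x in p_next]}, "
      f"cost {cost_2d(p):.2f} -> {cost_2d(p_next):.2f}")
```

With that step size, the 1D step moves w from 3 to 2.4 (cost 9 → 5.76) and the 2D step moves (3, 4) to (2.4, 3.2) (cost 25 → 16).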
Local vs global minima
| Term | Meaning |
|---|---|
| Local minimum | A valley bottom: every nearby step goes uphill |
| Global minimum | The deepest valley anywhere on the landscape |
Downhill walking only guarantees reaching a local minimum, not the global one. A trained network is usually a good solution, not a provably best one.
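A minimal Python sketch of why the starting point matters. It assumes a hypothetical one-knob cost C(w) = w⁴ − 3w² + w, which has two valleys of different depths, and a hypothetical step size of 0.01. Walking downhill from w = 2 and from w = −2 lands in different valleys.

```python
def cost(w):
    """Hypothetical 1-knob cost with two valleys of different depths."""
    return w**4 - 3 * w**2 + w


def slope(w):
    """Derivative of the cost: 4w^3 - 6w + 1."""
    return 4 * w**3 - 6 * w + 1


def walk_downhill(w, step_size=0.01, steps=1000):
    """Repeatedly step against the slope; settles in whichever valley is nearby."""
    for _ in range(steps):
        w -= step_size * slope(w)
    return w


for start in (2.0, -2.0):
    end = walk_downhill(start)
    print(f"start {start:+.1f} -> ends at w = {end:+.3f}, cost {cost(end):+.3f}")
```

Started at +2, the walk settles near w ≈ 1.13 (cost ≈ −1.07), a local minimum; started at −2, it settles near w ≈ −1.30 (cost ≈ −3.51), the global minimum of this function. Each walk only ever went downhill, and neither could see the other valley.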
Pitfalls to dodge
- “The landscape is stored in the network.” No. It is a way of picturing the cost function; height = the cost at that knob setting.
- “I should visualize 13,000 dimensions.” No. Use 2D; the math carries over untouched.
- “The gradient is one number.” Only in 1D. With many knobs it is a direction (vector) pointing steepest-uphill.
- “Downhill finds the best answer.” It finds a local minimum, maybe not the deepest valley.
The one-line version
You are standing on a vast hilly terrain, and the negative gradient is the compass that always points downhill.