Cheatsheet: The cost landscape

cost landscape = picture C(w, b) as terrain
each knob setting = a point; its cost = the height there
goal = get downhill, toward a low valley
Knobs | Picture | Can you draw it?
1 | A curve (height vs the one knob) | Yes
2 | A 3D surface of hills and valleys | Yes
~13,000 (real network) | A landscape in 13,000+ dimensions | No, but the math is identical

Build intuition in 2D; trust it carries to high dimensions unchanged.

  • At any point, every direction has a slope: move that way, does cost rise or fall? (The derivative idea; Track 8 has the precise version.)
  • Gradient ∇C = the direction of steepest uphill. A vector, assembled from every knob’s individual slope.
  • Negative gradient -∇C = the direction of steepest downhill. Step this way to lower cost fastest.
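A minimal sketch of how a gradient is assembled from each knob's individual slope. It assumes a hypothetical bowl-shaped cost and estimates each slope by nudging one knob at a time; the names `cost`, `numerical_gradient`, and the nudge size `h` are illustrative, not from the source.

```python
def cost(knobs):
    # Hypothetical cost for illustration: a bowl, the sum of squared knobs.
    return sum(k * k for k in knobs)


def numerical_gradient(cost_fn, knobs, h=1e-6):
    """Estimate the slope along each knob by nudging it up and down by h."""
    grad = []
    for i in range(len(knobs)):
        up = list(knobs)
        down = list(knobs)
        up[i] += h
        down[i] -= h
        # Rise over run for this one knob, all others held fixed.
        grad.append((cost_fn(up) - cost_fn(down)) / (2 * h))
    return grad


point = [3.0, 4.0]
grad = numerical_gradient(cost, point)   # steepest uphill, close to [6, 8]
downhill = [-g for g in grad]            # steepest downhill, close to [-6, -8]
print(grad, downhill)
```

The same function works unchanged for any number of knobs, since it only loops over the list it is given.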

1D: C(w) = w² (a parabola, bottom at 0). At w = 3: slope 2w = 6 (uphill to the right). Take a small step in the -w direction (left) → w moves toward 0 → cost falls from 9 toward 0. Cost dropped.
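The 1D example can be sketched as a short loop. The step size of 0.1 and the count of 20 steps are assumptions chosen for illustration; the source does not specify either.

```python
def cost(w):
    return w ** 2


def slope(w):
    # Derivative of w^2.
    return 2 * w


w = 3.0
step_size = 0.1          # assumed value, not from the source
history = [cost(w)]      # starts at 9.0

for _ in range(20):
    w = w - step_size * slope(w)   # step against the slope, i.e. downhill
    history.append(cost(w))

print(w, history[-1])    # w has moved close to 0, cost close to 0
```

Each step shrinks w by the same factor here, so the cost keeps falling but never jumps to exactly 0 in a single step.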

2D: C(w1, w2) = w1² + w2² (a bowl, bottom at origin). At (3, 4): gradient [6, 8], negative gradient [-6, -8] (points back to the origin). Take a small step that way → cost drops below 25. Same recipe at any number of knobs.
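A sketch of one downhill step on the bowl, written so the same code handles any number of knobs. The step size 0.1 and the five-knob starting point are assumptions for illustration.

```python
def cost(w):
    # Bowl: sum of squared knobs, lowest at the origin.
    return sum(x * x for x in w)


def gradient(w):
    # Each knob's slope is 2 * knob; together they form the gradient vector.
    return [2 * x for x in w]


def step_downhill(w, step_size):
    # Move every knob a little in the negative-gradient direction.
    return [x - step_size * g for x, g in zip(w, gradient(w))]


w = [3.0, 4.0]
print(gradient(w))                 # [6.0, 8.0]
w_next = step_downhill(w, 0.1)     # moves toward the origin
print(cost(w), cost(w_next))       # cost falls from 25.0

# Same recipe with five knobs (hypothetical starting point).
w5 = [1.0, -2.0, 3.0, -4.0, 5.0]
w5_next = step_downhill(w5, 0.1)
print(cost(w5), cost(w5_next))
```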

Term | Meaning
Local minimum | A valley bottom: every nearby step goes uphill
Global minimum | The deepest valley anywhere on the landscape

Downhill walking can at best guarantee a local minimum, not the global one: once every nearby step goes uphill, it stops, even if a deeper valley exists elsewhere. A trained network is usually a good solution, not a provably best one.
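To see the local-versus-global difference concretely, here is a sketch using a hypothetical one-knob cost with two valleys, C(w) = w⁴ - 3w² + w. This function, the step size, and the starting points are all assumptions chosen for illustration; none come from the source.

```python
def cost(w):
    # Hypothetical cost with two valleys: a deeper one on the left,
    # a shallower one on the right.
    return w ** 4 - 3 * w ** 2 + w


def slope(w):
    # Derivative of the cost above.
    return 4 * w ** 3 - 6 * w + 1


def walk_downhill(w, step_size=0.01, steps=1000):
    for _ in range(steps):
        w = w - step_size * slope(w)
    return w


right = walk_downhill(2.0)    # settles in the shallower valley, near w = 1.13
left = walk_downhill(-2.0)    # settles in the deeper valley, near w = -1.30

print(right, cost(right))
print(left, cost(left))
```

Both walks end where the slope is essentially zero, but only the one that started on the left reaches the deeper valley. Where you start decides which valley you land in.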

  • “The landscape is stored in the network.” No. It is a way of picturing the cost function; height = the cost at that knob setting.
  • “I should visualize 13,000 dimensions.” No. Use 2D; the math carries over untouched.
  • “The gradient is one number.” Only in 1D. With many knobs it is a direction (vector) pointing steepest-uphill.
  • “Downhill finds the best answer.” It finds a local minimum, maybe not the deepest valley.

You are standing on a vast hilly terrain, and the negative gradient is the compass that always points downhill.