References: Gradient descent, step by step

Source material

Source curriculum (structural mirror, cited as further study):
• 3Blue1Brown, Neural Networks, Chapter 2: "Gradient descent, how neural networks learn"
  Creator: Grant Sanderson (text adaptation by Josh Pullen)
  Lesson page: https://www.3blue1brown.com/lessons/gradient-descent
  Series index: https://www.3blue1brown.com/?topic=neural-networks
  License: copyright Grant Sanderson; videos published on his site and YouTube
This lesson mirrors the gradient-descent-algorithm portion of Chapter 2.
Clawdemy's lessons are original prose that follows the pedagogical arc of this
series. We do not reproduce or transcribe the videos; we cite them as the
recommended companion. All rights to the original videos remain with the creator.

Watch this next

Gradient descent, how neural networks learn (3Blue1Brown) by Grant Sanderson. The chapter this lesson mirrors. The latter half animates the ball actually rolling down the cost surface step by step, and shows the network’s digit accuracy climbing as the cost falls. Watching the descent happen alongside the worked numbers here makes the loop click.

Going deeper

A short, durable list. Each link is a specific next step, not a generic pile.

TensorFlow Playground. This is the lesson to finally play with the learning-rate control (top of the page). Set it tiny and watch training crawl; set it huge and watch the loss bounce or blow up; find the sweet spot in between. You are reproducing this lesson’s three worked runs with your own hands.
Neural Networks and Deep Learning, Chapter 1 (gradient descent + SGD) by Michael Nielsen. Works the update rule formally and explains stochastic gradient descent (the small-random-batch shortcut) in careful detail. The natural deeper read on exactly this lesson’s algorithm.
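
If you want to see the three learning-rate behaviours in numbers before opening the Playground, here is a minimal sketch. It assumes a toy one-parameter cost C(w) = w**2, a starting point of w = 4.0, and three illustrative rates (0.01, 0.4, 1.1). The function name descend and all of these values are made up for illustration; they are not the figures from this lesson's worked runs, and a real network's cost surface is far less tidy.

```python
def descend(learning_rate, start=4.0, steps=20):
    """Run plain gradient descent on the toy cost C(w) = w**2 (an assumed example)."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w                 # dC/dw for C(w) = w**2
        w = w - learning_rate * gradient   # step against the gradient
    return w


# Illustrative rates only: one too small, one reasonable, one too large.
for rate in (0.01, 0.4, 1.1):
    final_w = descend(rate)
    print(f"learning rate {rate}: w = {final_w:.4f}, cost = {final_w ** 2:.4f}")
```

On this toy cost each step multiplies w by (1 - 2 * learning_rate). The tiny rate shrinks w only slightly per step, so after 20 steps it is still far from the minimum at 0. The middle rate shrinks it quickly. The large rate makes that factor bigger than 1 in size, so every step overshoots further and the cost grows instead of falling.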

Adjacent topics

Where this lesson sits inside this track.

The cost landscape (lesson 6). The previous lesson built the terrain and the compass this one walks across. If “step against the gradient” feels abstract, lesson 6 is where the downhill direction comes from.
What backpropagation is really doing (lesson 8). This lesson assumed the gradient was available. Lesson 8 cracks open how it is actually computed for a network with thousands of parameters, efficiently, in one backward sweep. It is the missing piece that makes the whole training loop runnable.