References: How models actually learn: gradient descent
Source material
Source material (conceptual spine):
• StatQuest with Josh Starmer: "Gradient Descent, Step by Step"
  Creator: Josh Starmer
  YouTube: https://www.youtube.com/watch?v=sDv4f4s2SB8
  Channel / site: https://statquest.org/
  License: as published on StatQuest's public YouTube channel (link-out only)
Related StatQuest video:
• "Stochastic Gradient Descent"
  YouTube: https://www.youtube.com/watch?v=vMh0zPT0tLI
Clawdemy provides original notes, summaries, and quizzes derived from this material for educational purposes. All rights to the original videos remain with the creator.
What this lesson draws from each source
- StatQuest’s “Gradient Descent, Step by Step” anchors the procedure: the loss surface, stepping against the slope, and the role of the step size. StatQuest works the gradient with calculus on a sum-of-squared-residuals example; this lesson deliberately keeps the no-calculus intuition (the foggy hillside) and traces a single-parameter bowl by hand, so the mechanism is clear before any derivatives. If you want the calculus-level derivation, the StatQuest video is the place to go.
- The “loss as a landscape” framing and the worked bowl trace are Clawdemy’s own simplifications, built to make the downhill loop concrete without notation.
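
The downhill loop described above can be sketched in a few lines. This is a minimal illustration, not the lesson's or StatQuest's own code: the bowl `(w - 3)^2`, the starting point, the learning rate, and the function names are all assumptions chosen for the example. To stay close to the no-calculus intuition, the slope is estimated by probing a tiny distance to either side rather than by taking a derivative.

```python
def loss(w):
    # Hypothetical single-parameter bowl; its lowest point is at w = 3.
    return (w - 3.0) ** 2


def estimate_slope(f, w, eps=1e-6):
    # Probe slightly to the left and right and compare the heights.
    # This stands in for the derivative, so no calculus is needed.
    return (f(w + eps) - f(w - eps)) / (2 * eps)


def descend(f, w, learning_rate=0.1, steps=50):
    # Repeatedly step against the slope; the learning rate sets the step size.
    trace = [w]
    for _ in range(steps):
        w = w - learning_rate * estimate_slope(f, w)
        trace.append(w)
    return trace


trace = descend(loss, w=0.0)
final_w = trace[-1]

# Show the first few positions and where the walk ends up.
print(trace[:4])
print(final_w, loss(final_w))
```

Raising `learning_rate` makes each step longer and lowering it makes the walk slower, which is the step-size trade-off the lesson describes.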
Going deeper
- StatQuest with Josh Starmer. The gradient descent and stochastic gradient descent videos pair directly with this lesson. StatQuest also covers the chain rule and backpropagation, the method that computes the gradients gradient descent needs across the layers of a neural network.
- 3Blue1Brown: Gradient descent, how neural networks learn by Grant Sanderson. A visual, geometry-first walk through gradient descent in the context of a neural network learning to recognize digits. The single best companion video if you want to see the landscape and the steps.
Adjacent topics
- Logistic regression (the next lesson). The first place we use gradient descent for real: there is no neat closed-form formula for the best-fitting logistic regression, so it is fit step by step with gradient descent.
- Backpropagation. The method that computes the gradient efficiently across the many layers of a neural network. It is a large part of why gradient descent scales to billions of parameters. A natural next step once this lesson is solid (and the subject of other Clawdemy tracks).
- Learning-rate schedules. In practice the learning rate is often changed during training (large at first, smaller later). A refinement of the single fixed rate used here.
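
As a rough illustration of the "large at first, smaller later" idea, the sketch below shrinks the learning rate a little on every step. Everything here is an assumption made for the example: the bowl `(w - 3)^2`, its slope `2 * (w - 3)` written out directly, the inverse-time decay formula, and the specific numbers. Real training setups use many different schedules.

```python
def loss_slope(w):
    # Slope of the hypothetical bowl (w - 3)^2, whose lowest point is at w = 3.
    return 2.0 * (w - 3.0)


def decayed_rate(initial_rate, step, decay=0.1):
    # Inverse-time decay: one simple schedule among many possible choices.
    return initial_rate / (1.0 + decay * step)


def descend_with_schedule(w, initial_rate=0.4, steps=30):
    # Same downhill loop as with a fixed rate, but the rate shrinks each step.
    rates = []
    for step in range(steps):
        rate = decayed_rate(initial_rate, step)
        w = w - rate * loss_slope(w)
        rates.append(rate)
    return w, rates


final_w, rates = descend_with_schedule(w=0.0)

# The first rate is the largest and the last is the smallest.
print(rates[0], rates[-1])
print(final_w)
```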
Community discussion
None selected for this lesson. Gradient descent is thoroughly covered by the StatQuest and 3Blue1Brown resources above. If a canonical discussion surfaces, it will be added at the next review.