
References: Backpropagation and the chain rule

Source curriculum (structural mirror, cited as further study):
• 3Blue1Brown, Neural Networks, Chapter 5: "Backpropagation calculus"
Creator: Grant Sanderson (text adaptation by Josh Pullen)
Lesson page: https://www.3blue1brown.com/lessons/backpropagation-calculus
Series index: https://www.3blue1brown.com/?topic=neural-networks
License: copyright Grant Sanderson; videos published on his site and YouTube
This lesson mirrors the backpropagation-calculus chapter (the companion to the
intuition chapter mirrored in the previous lesson). Note: live 3B1B Chapter 3
("Analyzing our neural network", at /lessons/neural-network-analysis) sits
between the gradient-descent chapter (Ch2, mirrored in T11 lesson 7) and the
backpropagation-intuition chapter (Ch4, mirrored in lesson 8). T11 deliberately
does not mirror Ch3 as a standalone lesson. Its central insight, that a trained
network's hidden layers do not cleanly detect "edges then loops" as the hopeful
story suggests, is folded into lesson 2 as the "hold the edges-to-loops story
loosely" caveat. Live Ch3-5 numbering verified 2026-05-25.
Clawdemy's lessons are original prose that follows the pedagogical arc of this
series. We do not reproduce or transcribe the videos; we cite them as the
recommended companion. All rights to the original videos remain with the creator.
  • Backpropagation calculus (3Blue1Brown) by Grant Sanderson. The chapter this lesson mirrors. It works the chain rule through a small chain of neurons with the activation functions included, and lays out the notation cleanly. If the worked chain in this lesson clicked, the video takes the same example one level further, with every factor, including the squish (the activation function), in place.
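For reference, here is a sketch of the last link of that chain in the video's notation, assuming one neuron per layer, a squared-error cost C_0 for a single training example, and a squish function σ applied to the weighted sum:

```latex
C_0 = \left(a^{(L)} - y\right)^2, \qquad
z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)}, \qquad
a^{(L)} = \sigma\!\left(z^{(L)}\right)

\frac{\partial C_0}{\partial w^{(L)}}
  = \frac{\partial z^{(L)}}{\partial w^{(L)}}
    \frac{\partial a^{(L)}}{\partial z^{(L)}}
    \frac{\partial C_0}{\partial a^{(L)}}
  = a^{(L-1)} \, \sigma'\!\left(z^{(L)}\right) \, 2\left(a^{(L)} - y\right)

\frac{\partial C_0}{\partial b^{(L)}} = \sigma'\!\left(z^{(L)}\right) \, 2\left(a^{(L)} - y\right),
\qquad
\frac{\partial C_0}{\partial a^{(L-1)}} = w^{(L)} \, \sigma'\!\left(z^{(L)}\right) \, 2\left(a^{(L)} - y\right)
```

The last expression is the sensitivity handed one layer back, where the same three-factor pattern repeats for the earlier weight and bias.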

A short, durable list. Each link is a specific next step, not a generic pile.

  • The chain rule itself: Clawdemy Track 8 (Visual Math: Calculus) and 3Blue1Brown’s Essence of Calculus by Grant Sanderson. This lesson used the chain rule as a given. If “rates multiply along a chain” felt like a leap, this is where the chain rule gets built from the ground up, with the geometric intuition behind it.

  • Neural Networks and Deep Learning, Chapter 2 (the backprop equations) by Michael Nielsen. Derives the four backpropagation equations in full, including the activation-function factor we only mentioned here. The natural deeper read for anyone who wants the complete, general form rather than the one-neuron-per-layer chain.
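A quick way to convince yourself that rates really do multiply along a chain, activation factor included, is to compare the chain-rule product against a finite-difference estimate of the same slope. The sketch below assumes a single sigmoid output neuron with squared-error cost; the function names and the sample values (w = 0.8, b = -0.2, a_prev = 0.5, y = 1.0) are illustrative choices, not taken from either source.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic squish function, assumed here as the activation."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z: float) -> float:
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def cost(w: float, b: float, a_prev: float, y: float) -> float:
    """Squared-error cost C0 = (sigmoid(w * a_prev + b) - y) ** 2."""
    return (sigmoid(w * a_prev + b) - y) ** 2


def dcost_dw_chain(w: float, b: float, a_prev: float, y: float) -> float:
    """Slope of C0 with respect to w, built as a product of three local rates."""
    z = w * a_prev + b
    a = sigmoid(z)
    dz_dw = a_prev              # how z changes with w
    da_dz = sigmoid_prime(z)    # how the squish changes with z
    dc_da = 2.0 * (a - y)       # how the cost changes with a
    return dz_dw * da_dz * dc_da


def dcost_dw_numeric(w: float, b: float, a_prev: float, y: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of the same slope."""
    return (cost(w + h, b, a_prev, y) - cost(w - h, b, a_prev, y)) / (2.0 * h)


analytic = dcost_dw_chain(0.8, -0.2, 0.5, 1.0)
numeric = dcost_dw_numeric(0.8, -0.2, 0.5, 1.0)
print(f"chain rule: {analytic:.10f}  finite difference: {numeric:.10f}")
```

The two numbers should agree to many decimal places; dropping the `da_dz` factor makes them disagree, which is a concrete way to see why the activation term matters.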

Where this leads inside this track.

  • What backpropagation is really doing (lesson 8). The previous lesson told the story (desires propagating backward); this one supplied the arithmetic (the chain rule). Read them as one pair: intuition, then the math that makes it precise.

  • Gradient descent (lesson 7). The chain rule produces each knob’s slope, which is one component of the gradient. Lesson 7 uses those slopes to take a step downhill. Together, lessons 5 through 9 are the complete training loop.

  • Seeing it whole (lesson 10). The final lesson assembles every piece, from a messy handwritten 3 to a trained network, and points you toward building one yourself.
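To see how the chain rule's slopes feed gradient descent, here is a minimal sketch of that loop for a single sigmoid neuron with two knobs, a weight and a bias. The learning rate, step count, sample values, and function names are arbitrary assumptions for illustration; a real network computes the same kind of slope for every knob and averages over many training examples.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic squish function, assumed here as the activation."""
    return 1.0 / (1.0 + math.exp(-z))


def cost(w: float, b: float, a_prev: float, y: float) -> float:
    """Squared-error cost for one training example."""
    return (sigmoid(w * a_prev + b) - y) ** 2


def gradient(w: float, b: float, a_prev: float, y: float) -> tuple[float, float]:
    """Chain-rule slopes (dC/dw, dC/db) for the single neuron."""
    z = w * a_prev + b
    a = sigmoid(z)
    shared = a * (1.0 - a) * 2.0 * (a - y)  # (da/dz) * (dC/da), used by both knobs
    dc_dw = a_prev * shared                 # dz/dw = a_prev
    dc_db = shared                          # dz/db = 1
    return dc_dw, dc_db


def train(w: float, b: float, a_prev: float, y: float,
          learning_rate: float = 1.0, steps: int = 100) -> tuple[float, float, list[float]]:
    """Repeatedly step each knob against its slope, recording the cost."""
    history = [cost(w, b, a_prev, y)]
    for _ in range(steps):
        dc_dw, dc_db = gradient(w, b, a_prev, y)
        w -= learning_rate * dc_dw
        b -= learning_rate * dc_db
        history.append(cost(w, b, a_prev, y))
    return w, b, history


w, b, history = train(0.8, -0.2, a_prev=0.5, y=1.0)
print(f"cost before: {history[0]:.4f}  cost after: {history[-1]:.4f}")
```

Backpropagation's only job in this loop is the `gradient` function; everything else is the gradient-descent step from lesson 7.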