
The chain rule, visually

The product rule handled functions multiplied together; this lesson handles the other common way functions combine: nesting one inside another, as in sin(x²) or (3x + 1)². The single capability it builds is to apply the chain rule and explain it as multiplying rates through a composition. It matters beyond calculus class, because it is arguably the most-used rule in machine learning: backpropagation is built on it.

The chain rule is d/dx(f(g(x))) = f'(g(x)) · g'(x): the outer derivative, evaluated at the inner function, times the inner derivative. You will read a composition as a two-stage pipeline (x -> g -> u -> f) in which each stage scales small nudges by its own rate, so the rates multiply. You will drill against the classic error: the outer derivative is f'(g(x)), not f'(x) (for sin(x²) it is cos(x²), not cos(x)). You will work several examples: (3x+1)², cross-checked by expanding; sin(x²); (sin x)³, where the power is now the outer function, to contrast which function is nested inside which; the double nest sin(cos x); and a preview, e^(2x) -> 2e^(2x). Finally, you will see that backpropagation is precisely this rule applied through a network's layers. That is why long chains can make gradients vanish or explode: the gradient is a product of many stage rates, so factors consistently below 1 shrink it toward zero and factors consistently above 1 blow it up.
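The worked examples above can also be cross-checked numerically. The sketch below assumes Python (the lesson itself is pen and paper). It compares each hand-derived chain-rule derivative against a central-difference estimate (f(x + h) - f(x - h)) / 2h at an arbitrary point x = 0.7, checks (3x+1)² against its expanded form 9x² + 6x + 1, and shows that the classic error, cos(x) · 2x for sin(x²), disagrees with the numeric slope. The helper names (numeric_derivative, check_all, expanded_derivative, wrong_sin_x2) and the step size h = 1e-6 are hypothetical choices for illustration, not part of the lesson.

```python
import math

# Hypothetical helpers for illustration; the lesson itself uses pen and paper.


def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x): nudge x both ways and compare outputs."""
    return (f(x + h) - f(x - h)) / (2 * h)


# (label, function, derivative worked by hand with the chain rule)
examples = [
    # outer u^2, inner 3x+1: 2(3x+1) * 3
    ("(3x+1)^2", lambda x: (3 * x + 1) ** 2, lambda x: 2 * (3 * x + 1) * 3),
    # outer sin u, inner x^2: cos(x^2) * 2x
    ("sin(x^2)", lambda x: math.sin(x ** 2), lambda x: math.cos(x ** 2) * 2 * x),
    # outer u^3, inner sin x: 3(sin x)^2 * cos x
    ("(sin x)^3", lambda x: math.sin(x) ** 3, lambda x: 3 * math.sin(x) ** 2 * math.cos(x)),
    # outer sin u, inner cos x: cos(cos x) * (-sin x)
    ("sin(cos x)", lambda x: math.sin(math.cos(x)), lambda x: math.cos(math.cos(x)) * -math.sin(x)),
    # outer e^u, inner 2x: e^(2x) * 2
    ("e^(2x)", lambda x: math.exp(2 * x), lambda x: 2 * math.exp(2 * x)),
]


def check_all(x):
    """Map each label to (chain-rule value, numeric estimate) at x."""
    return {label: (df(x), numeric_derivative(f, x)) for label, f, df in examples}


def expanded_derivative(x):
    """(3x+1)^2 = 9x^2 + 6x + 1, so the derivative is 18x + 6 with no chain rule."""
    return 18 * x + 6


def wrong_sin_x2(x):
    """The classic error: outer derivative evaluated at x (cos x) instead of at x^2."""
    return math.cos(x) * 2 * x


if __name__ == "__main__":
    x = 0.7  # arbitrary test point
    for label, (exact, approx) in check_all(x).items():
        print(f"{label:11s} chain rule {exact:+.6f}   numeric {approx:+.6f}")
    print("(3x+1)^2 expanded:", expanded_derivative(x))
    print("wrong sin(x^2):", wrong_sin_x2(x))
```

The numeric estimate knows nothing about the chain rule, so agreement with it is an independent check; the wrong version misses by a visible margin at x = 0.7 because cos(0.7) and cos(0.49) differ.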

This is lesson 6 of Phase 2 (The differentiation toolkit). This lesson and the previous one split a single 3B1B chapter: lesson 5 took the product rule (functions multiplied), and this one takes the chain rule (functions nested). Together they cover the two ways functions most commonly combine, and their examples lean on the power rule (lesson 3) and trig derivatives (lesson 4). This lesson is also paired with Track 11's backpropagation lesson: this is the math, that is the application to neural networks. Phase 2 continues with e (lesson 7), implicit differentiation (lesson 8), and limits (lesson 9).

Prerequisite (within this track): lesson 5, The product rule, visually, for the nudge-and-look method and the habit of identifying how functions combine. The worked examples use the power rule (lesson 3) and trig derivatives (lesson 4), so keep those handy. Comfort reading a nested expression like sin(x²) as “an inner function inside an outer one” is the key skill; no coding, nothing installed. The practice is pen and paper.

  • Apply the chain rule d/dx(f(g(x))) = f'(g(x)) · g'(x) to nested functions
  • Explain why the rates of each stage multiply through a composition (read as a pipeline)
  • Avoid the classic error by evaluating the outer derivative at the inner function, not at x
  • Distinguish compositions (chain rule) from products (product rule), and connect the chain rule to backpropagation
  • Read time: about 11 minutes
  • Practice time: about 13 minutes (applying the chain rule, a numeric stage-rate check, a chain-versus-product drill, and flashcards)
  • Difficulty: standard
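The numeric stage-rate check listed in the practice can be sketched the same way. Assuming Python and the pipeline x -> g -> u -> f with g(x) = x² and f(u) = sin u, the sketch nudges each stage separately to measure du/dx and dy/du, then confirms that their product matches the end-to-end slope. The second half is a toy stand-in for backpropagation, not a real network: n nested sigmoid stages, where the chain-rule derivative is the product of the per-stage rates sigmoid'(u) = s(1 - s), each at most 0.25, so the product shrinks toward zero as n grows. All function names here are hypothetical.

```python
import math

# Hypothetical sketch; the sigmoid chain is a toy stand-in for network layers.


def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x): nudge x both ways and compare outputs."""
    return (f(x + h) - f(x - h)) / (2 * h)


def stage_rates(x):
    """Measure each stage of the pipeline x -> g -> u -> f -> y for y = sin(x^2)."""
    def g(t):
        return t ** 2             # inner stage

    def f(t):
        return math.sin(t)        # outer stage

    def composed(t):
        return f(g(t))            # the whole pipeline

    u = g(x)                                  # intermediate value the outer stage sees
    rate_g = numeric_derivative(g, x)         # du/dx, nudging x
    rate_f = numeric_derivative(f, u)         # dy/du, nudging u at u = g(x), not at x
    total = numeric_derivative(composed, x)   # dy/dx, end to end
    return rate_g, rate_f, total


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def deep_chain(x, n):
    """Forward pass through n nested sigmoid stages.

    Returns the final output and each stage's local rate sigmoid'(u) = s * (1 - s),
    evaluated at that stage's own input u, as the chain rule requires.
    """
    rates = []
    u = x
    for _ in range(n):
        s = sigmoid(u)
        rates.append(s * (1.0 - s))
        u = s
    return u, rates


def chain_rule_derivative(x, n):
    """Backprop-style derivative of the n-stage chain: multiply the stage rates."""
    _, rates = deep_chain(x, n)
    return math.prod(rates)


if __name__ == "__main__":
    rate_g, rate_f, total = stage_rates(0.7)
    print("du/dx * dy/du =", rate_g * rate_f, "  end-to-end dy/dx =", total)
    for n in (1, 5, 10, 20):
        print(n, "stages -> derivative", chain_rule_derivative(0.5, n))
```

If each stage instead scaled its input by a factor above 1 (for example a linear stage u -> 1.5u), the same product would grow like 1.5^n, which is the exploding case.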