The chain rule, visually
What you’ll learn
The product rule handled functions multiplied together; this lesson handles the other way functions combine, nested one inside another, like sin(x²) or (3x + 1)². The single capability it builds: apply the chain rule and explain it as multiplying rates through a composition. This one is worth caring about beyond calculus class, because it is the single most-used rule in machine learning, the engine of backpropagation.
The chain rule is d/dx(f(g(x))) = f'(g(x)) · g'(x): the outer derivative, evaluated at the inner function, times the inner derivative. You will read a composition as a two-step pipeline (x -> g -> u -> f) whose stage rates compound, which is why the rates multiply. You will drill the classic error: the outer derivative is f'(g(x)), not f'(x), so for sin(x²) it is cos(x²), not cos(x). You will work several examples: (3x+1)² cross-checked by expanding, sin(x²), (sin x)³ to contrast nesting, the double nest sin(cos x), and a preview, that e^(2x) differentiates to 2e^(2x). Finally, you will see that backpropagation is precisely this rule applied through a network’s layers, which is why long chains can make gradients vanish or explode.
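The lesson itself needs no code, but if you want an optional sanity check, the examples named above can be tested numerically. The sketch below is one possible way to do it, not part of the lesson's practice: it compares a central-difference estimate of each derivative against the chain-rule answer worked by hand. The helper name `numeric_derivative`, the test point `x0`, and the step size `h` are all assumptions chosen for illustration.

```python
import math

def numeric_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x) (h is an assumed step size)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Each entry: (label, the composition, its chain-rule derivative worked by hand).
examples = [
    ("(3x+1)^2",   lambda x: (3 * x + 1) ** 2,      lambda x: 2 * (3 * x + 1) * 3),
    ("sin(x^2)",   lambda x: math.sin(x ** 2),      lambda x: math.cos(x ** 2) * 2 * x),
    ("(sin x)^3",  lambda x: math.sin(x) ** 3,      lambda x: 3 * math.sin(x) ** 2 * math.cos(x)),
    ("sin(cos x)", lambda x: math.sin(math.cos(x)), lambda x: math.cos(math.cos(x)) * -math.sin(x)),
    ("e^(2x)",     lambda x: math.exp(2 * x),       lambda x: 2 * math.exp(2 * x)),
]

x0 = 1.3  # arbitrary test point, chosen only for illustration

for label, func, chain_rule in examples:
    numeric = numeric_derivative(func, x0)
    by_hand = chain_rule(x0)
    print(f"{label:12s} numeric={numeric:.6f}  chain rule={by_hand:.6f}")

# The classic error for sin(x^2): outer derivative evaluated at x, not at x^2.
wrong = math.cos(x0) * 2 * x0
print(f"classic error for sin(x^2): {wrong:.6f}")
```

In each row the two columns should agree to several decimal places, while the "classic error" value for sin(x²) should be visibly different from the numeric estimate.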
Where this fits
This is lesson 6 of Phase 2 (The differentiation toolkit). It and the previous lesson split one 3B1B chapter: lesson 5 took the product rule (functions multiplied), this one takes the chain rule (functions nested). Together they cover the two ways functions most commonly combine, and their examples lean on the power rule (lesson 3) and trig derivatives (lesson 4). This lesson also pairs with Track 11’s backpropagation lesson: this is the math, that is the application to neural networks. Phase 2 continues with e (lesson 7), implicit differentiation (lesson 8), and limits (lesson 9).
Before you start
Prerequisite (within this track): lesson 5, The product rule, visually, for the nudge-and-look method and the habit of identifying how functions combine. The worked examples use the power rule (lesson 3) and trig derivatives (lesson 4), so keep those handy. Comfort reading a nested expression like sin(x²) as “an inner function inside an outer one” is the key skill; no coding, nothing installed. The practice is pen and paper.
By the end, you’ll be able to
- Apply the chain rule d/dx(f(g(x))) = f'(g(x)) · g'(x) to nested functions
- Explain why the rates of each stage multiply through a composition (read as a pipeline)
- Avoid the classic error by evaluating the outer derivative at the inner function, not at x
- Distinguish compositions (chain rule) from products (product rule), and connect the chain rule to backpropagation
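For readers who like to see the pipeline reading in numbers, here is an optional, minimal sketch, assuming the composition sin(x²) split into an inner stage `g` and an outer stage `f`. It estimates each stage's local rate separately, multiplies them, and compares the product with the rate of the whole composition. The helper `local_rate`, the point `x0`, and the per-stage rates 0.5 and 1.5 in the long-chain part are illustrative assumptions, not values from the lesson.

```python
import math

def local_rate(func, at, h=1e-6):
    """Central-difference estimate of output change per unit input change at `at`."""
    return (func(at + h) - func(at - h)) / (2 * h)

def g(x):
    """Inner stage of the pipeline: x -> u."""
    return x ** 2

def f(u):
    """Outer stage of the pipeline: u -> final output."""
    return math.sin(u)

x0 = 0.8     # arbitrary input, chosen only for illustration
u0 = g(x0)   # the value the outer stage actually receives

rate_g = local_rate(g, x0)                       # inner stage rate, measured at x0
rate_f = local_rate(f, u0)                       # outer stage rate, measured at u0 = g(x0), not at x0
rate_whole = local_rate(lambda x: f(g(x)), x0)   # rate of the full composition

print("product of stage rates:", rate_g * rate_f)
print("rate of whole pipeline:", rate_whole)

# Long chains: multiplying many stage rates below 1 shrinks the result toward 0,
# and many rates above 1 blows it up (assumed rates of 0.5 and 1.5 per stage).
stages = 20
shrinking = 0.5 ** stages
growing = 1.5 ** stages
print("20 stages at rate 0.5:", shrinking)
print("20 stages at rate 1.5:", growing)
```

The first two printed values should match closely, which is the chain rule stated as "stage rates multiply". The last two show, in miniature, why a long chain of multiplied rates can make a gradient vanish or explode.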
Time and difficulty
- Read time: about 11 minutes
- Practice time: about 13 minutes (applying the chain rule, a numeric stage-rate check, a chain-versus-product drill, and flashcards)
- Difficulty: standard