Practice: The chain rule

Six short questions. Answer each one in your head (or on paper) before reading the answer. Trying to retrieve the answer is where the learning sticks; rereading feels productive but does much less.

1. State the chain rule, and read it in words.

Answer:

d/dx(f(g(x))) = f'(g(x)) · g'(x): the outer function’s derivative (evaluated at the inner function) times the inner function’s derivative. Two pieces, multiplied.
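To make the two pieces concrete, here is a minimal Python sketch that builds f'(g(x)) · g'(x) from the outer derivative, the inner function, and the inner derivative, then cross-checks the result with a central-difference estimate. The helper names `chain_rule` and `numeric_derivative`, the example (x² + 1)³, the test point x = 0.5, and the step size are illustrative choices, not part of the lesson.

```python
import math

def chain_rule(outer_prime, inner, inner_prime, x):
    """Derivative of outer(inner(x)): outer' evaluated at the inner value, times inner'(x)."""
    u = inner(x)                          # the intermediate value u = g(x)
    return outer_prime(u) * inner_prime(x)

def numeric_derivative(h, x, eps=1e-6):
    """Central-difference estimate of h'(x), used only as a cross-check."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

# Example: d/dx (x**2 + 1)**3, with outer u**3 (derivative 3u**2) and inner x**2 + 1 (derivative 2x)
x = 0.5
exact = chain_rule(lambda u: 3 * u ** 2, lambda t: t ** 2 + 1, lambda t: 2 * t, x)
approx = numeric_derivative(lambda t: (t ** 2 + 1) ** 3, x)
print(f"chain rule: {exact:.8f}   finite difference: {approx:.8f}")
```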

2. Why do the rates multiply through a composition?

Answer:

A composition is a pipeline: x -> g -> u = g(x) -> f -> f(u). A nudge in x drives u at rate g'(x), and that motion in u drives the output at rate f'(u). The effects compound: if u moves 3× as fast as x and f amplifies changes in u by 2×, the output moves 2·3 = 6 times as fast as x. Each transformation contributes its own multiplying factor.
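A small numeric illustration of the pipeline, assuming two hypothetical linear stages chosen so the rates are exactly 3 and 2: push a nudge dx through both stages and measure each stage's rate separately.

```python
def g(x):
    return 3 * x + 1          # inner stage: u changes 3 times as fast as x

def f(u):
    return 2 * u              # outer stage: output changes 2 times as fast as u

x, dx = 0.5, 1e-3
du = g(x + dx) - g(x)         # how far the nudge moved the intermediate value
dy = f(g(x + dx)) - f(g(x))   # how far it moved the final output

inner_rate = du / dx          # about 3
outer_rate = dy / du          # about 2
total_rate = dy / dx          # about 6, the product of the two stage rates
print(inner_rate, outer_rate, total_rate)
```

Linear stages keep the rates constant; for curved functions the same ratios hold only in the limit of a small nudge, which is what the derivative captures.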

3. What is the “evaluated at the inner function” step, and why does it trip people up?

Answer:

The outer derivative is f'(g(x)), not f'(x): the outer function acts on the intermediate value u = g(x), so its rate must be measured there. For sin(x²), the outer derivative is cos(x²), not cos(x), because sine is acting on x². Forgetting to evaluate the outer derivative at the inner function is the number-one chain-rule error.
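A quick numeric check of the gotcha for sin(x²), as a Python sketch: the correctly evaluated outer derivative agrees with a finite-difference estimate, while the common mistake does not. The test point x = 1.3 and the step size are arbitrary assumptions.

```python
import math

def numeric_derivative(h, x, eps=1e-6):
    """Central-difference estimate of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x = 1.3
correct = math.cos(x ** 2) * 2 * x   # outer derivative evaluated at the inner value x**2
wrong = math.cos(x) * 2 * x          # the gotcha: outer derivative evaluated at x
reference = numeric_derivative(lambda t: math.sin(t ** 2), x)

print(f"correct: {correct:.6f}  wrong: {wrong:.6f}  finite difference: {reference:.6f}")
```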

4. For sin(x²) versus (sin x)³, which function is outer and which is inner?

Answer:

sin(x²): outer is sine, inner is x² (so the derivative is cos(x²)·2x). (sin x)³: outer is the cube u³, inner is sin x (so the derivative is 3 sin²x · cos x). Same two ingredients, opposite nesting, different answers. Identifying which is outer and which is inner is the first move every time.
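The contrast can be checked numerically. This Python sketch codes both hand-derived derivatives and compares each with a central-difference estimate at an arbitrary point, x = 0.7; the function names are hypothetical.

```python
import math

def numeric_derivative(h, x, eps=1e-6):
    """Central-difference estimate of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

def d_sin_of_square(x):
    # sin(x**2): outer sin, inner x**2
    return math.cos(x ** 2) * 2 * x

def d_sin_cubed(x):
    # (sin x)**3: outer u**3, inner sin x
    return 3 * math.sin(x) ** 2 * math.cos(x)

x = 0.7
print(d_sin_of_square(x), numeric_derivative(lambda t: math.sin(t ** 2), x))
print(d_sin_cubed(x), numeric_derivative(lambda t: math.sin(t) ** 3, x))
```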

5. How do you differentiate a deeply nested function like sin(cos(x²))?

Answer:

Apply the chain rule once per layer, peeling from the outside in: differentiate the outermost function (evaluated at everything inside it), multiply by the derivative of what was inside, and repeat until you reach the bare x. Each layer contributes exactly one factor to the product; do not stop early. For sin(cos(x²)) that gives three factors: cos(cos(x²)) · (-sin(x²)) · 2x.
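The peeling procedure amounts to a running product, one factor per layer. In this Python sketch (the layer list and helper names are illustrative), the loop walks from the innermost layer outward, because that is the order in which each layer's input value becomes available; since multiplication commutes, the product equals the outside-in peel.

```python
import math

# sin(cos(x**2)) as a list of layers, innermost first: (function, its derivative)
layers = [
    (lambda t: t ** 2, lambda t: 2 * t),       # x -> x**2
    (math.cos, lambda t: -math.sin(t)),        # u -> cos(u)
    (math.sin, math.cos),                      # v -> sin(v)
]

def nested_derivative(layers, x):
    """Return (value, derivative) of the composed layers at x, one chain-rule factor per layer."""
    value, rate = x, 1.0
    for fn, dfn in layers:
        rate *= dfn(value)   # this layer's derivative, evaluated at this layer's own input
        value = fn(value)    # pass the value on to the next (outer) layer
    return value, rate

def numeric_derivative(h, x, eps=1e-6):
    """Central-difference estimate of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x = 0.9
value, rate = nested_derivative(layers, x)
closed_form = math.cos(math.cos(x ** 2)) * (-math.sin(x ** 2)) * 2 * x
reference = numeric_derivative(lambda t: math.sin(math.cos(t ** 2)), x)
print(rate, closed_form, reference)
```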

6. How do you tell when to use the chain rule versus the product rule?

Answer:

Check the structure. Functions multiplied (f · g) use the product rule and give a sum of two terms. Functions nested (f(g(x)), one inside another) use the chain rule and give a product of rates. Decide which structure you actually have before reaching for a rule.

Try it yourself, part 1: apply the chain rule

Pen and paper, about 6 minutes. Identify outer and inner, then multiply the outer derivative (evaluated at the inner) by the inner derivative.

(a) (2x + 3)⁴

(b) cos(x³)

(c) Numeric check: for (2x + 3)² at x = 1, find the outer stage-rate and inner stage-rate separately, multiply, and cross-check by expanding.

Answers:

(a) Outer u⁴ (derivative 4u³), inner 2x + 3 (derivative 2):

4(2x+3)³ · 2 = 8(2x+3)³

(b) Outer cos u (derivative -sin u), inner x³ (derivative 3x²):

-sin(x³) · 3x² = -3x²·sin(x³)

The outer derivative is -sin(x³), evaluated at the inner x³, not -sin(x).

(c) At x = 1: the inner 2x + 3 equals 5 and changes at rate g'(x) = 2. The outer function u² has derivative 2u, which at u = 5 is 10. Multiply the stage-rates: 10 · 2 = 20. Cross-check: (2x+3)² = 4x² + 12x + 9, whose derivative 8x + 12 is 20 at x = 1. The two factors 10 and 2 are exactly the outer and inner stage-rates, and their product is the whole composition’s rate.
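To check all three answers without trusting the algebra, a minimal Python sketch compares the hand-derived derivatives for (a) and (b) with central-difference estimates and recomputes the stage-rates for (c). The test points 0.4 and 0.8 and the helper names are arbitrary assumptions.

```python
import math

def numeric_derivative(h, x, eps=1e-6):
    """Central-difference estimate of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

# (a) d/dx (2x + 3)**4 = 8(2x + 3)**3
def da(x):
    return 8 * (2 * x + 3) ** 3

# (b) d/dx cos(x**3) = -3x**2 sin(x**3)
def db(x):
    return -3 * x ** 2 * math.sin(x ** 3)

print(da(0.4), numeric_derivative(lambda t: (2 * t + 3) ** 4, 0.4))
print(db(0.8), numeric_derivative(lambda t: math.cos(t ** 3), 0.8))

# (c) stage-rates for (2x + 3)**2 at x = 1
x = 1
u = 2 * x + 3            # inner value
inner_rate = 2           # g'(x) for g(x) = 2x + 3
outer_rate = 2 * u       # f'(u) = 2u for f(u) = u**2, evaluated at u
chain = outer_rate * inner_rate
expanded = 8 * x + 12    # derivative of the expanded form 4x**2 + 12x + 9
print(u, outer_rate, inner_rate, chain, expanded)
```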

Try it yourself, part 2: which rule, and what is outer/inner?

About 3 minutes. For each expression, say whether it is a product (use the product rule) or a composition (use the chain rule). For the compositions, name the outer and inner functions.

  1. x² · sin(x)
  2. sin(x²)
  3. (cos x)⁵
  4. (x + 1)(x - 1)
Answers:
  1. Product (f · g with f = x², g = sin x): product rule. Derivative 2x·sin x + x²·cos x.
  2. Composition: outer = sine, inner = x². Chain rule. Derivative cos(x²)·2x.
  3. Composition: outer = u⁵, inner = cos x. Chain rule. Derivative 5(cos x)⁴·(-sin x) = -5 cos⁴x sin x.
  4. Product (f · g with f = x + 1, g = x - 1): product rule. Derivative 1·(x-1) + (x+1)·1 = 2x. (Cross-check: (x+1)(x-1) = x² - 1, derivative 2x.)

The discriminator is structure: multiplied side by side is a product; one tucked inside another is a composition.
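A Python sketch that pairs each expression with the derivative produced by the matching rule and checks it numerically at an arbitrary point, x = 0.6. A derivative from the wrong rule (for example, treating sin(x²) as a product) would fail this check; the dictionary layout is just an illustrative choice.

```python
import math

def numeric_derivative(h, x, eps=1e-6):
    """Central-difference estimate of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

# name -> (rule, function, hand-derived derivative)
cases = {
    "x^2 * sin(x)": ("product", lambda x: x ** 2 * math.sin(x),
                     lambda x: 2 * x * math.sin(x) + x ** 2 * math.cos(x)),
    "sin(x^2)":     ("chain",   lambda x: math.sin(x ** 2),
                     lambda x: math.cos(x ** 2) * 2 * x),
    "(cos x)^5":    ("chain",   lambda x: math.cos(x) ** 5,
                     lambda x: -5 * math.cos(x) ** 4 * math.sin(x)),
    "(x+1)(x-1)":   ("product", lambda x: (x + 1) * (x - 1),
                     lambda x: 1 * (x - 1) + (x + 1) * 1),
}

x = 0.6
for name, (rule, fn, dfn) in cases.items():
    print(f"{name:14} {rule:8} by hand {dfn(x):+.6f}  numeric {numeric_derivative(fn, x):+.6f}")
```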

Nine flashcards for review. Try to recall each answer before reading it.

Q. What is the chain rule?
A.

d/dx(f(g(x))) = f'(g(x)) · g'(x): the outer function’s derivative (evaluated at the inner function) times the inner function’s derivative. Two pieces, multiplied.

Q. Why do rates multiply through a composition?
A.

A composition is a pipeline x -> g -> u -> f. A nudge in x drives u at rate g'(x), which drives the output at rate f'(u). The effects compound, so the total rate is the product f'(g(x)) · g'(x). Each layer adds a multiplying factor.

Q. What is the 'evaluated at' gotcha?
A.

The outer derivative is f'(g(x)), not f'(x). For sin(x²) it is cos(x²), not cos(x), because sine acts on x². The outer function operates downstream on u = g(x), so its rate is read there. This is the number-one chain-rule error.

Q. Differentiate sin(x²).
A.

Outer sine (derivative cosine), inner x² (derivative 2x): cos(x²) · 2x = 2x·cos(x²). Note cos(x²), evaluated at the inner function, not cos(x).

Q. Differentiate (sin x)³, and contrast with sin(x²).
A.

Outer u³ (derivative 3u²), inner sin x (derivative cos x): 3(sin x)²·cos x = 3 sin²x cos x. Opposite nesting from sin(x²) (where sine is outer), so a different answer. Identify outer vs inner first.

Q. How do you handle a deeply nested function?
A.

Apply the chain rule once per layer, peeling outside in: differentiate the outermost function (at everything inside), multiply by the derivative of what was inside, repeat to the bare x. One factor per layer; do not stop early.

Q. Chain rule versus product rule: how to tell?
A.

Functions multiplied (f · g) use the product rule and give a sum of two terms. Functions nested (f(g(x))) use the chain rule and give a product of rates. Check the structure before reaching for a rule.

Q. Differentiate e^(2x) with the chain rule.
A.

Accepting that e^x is its own derivative (next lesson): outer e^u (derivative e^u), inner 2x (derivative 2), so e^(2x)·2 = 2e^(2x). The same pattern gives d/dx e^(kx) = k·e^(kx) for any constant k: the chain rule pulls the rate in the exponent out as a factor.
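A minimal numeric check in Python, taking `math.exp` as e^x: the chain-rule answer 2e^(2x) matches a finite-difference estimate, while forgetting the inner factor 2 does not. The point x = 0.3 is arbitrary.

```python
import math

def numeric_derivative(h, x, eps=1e-6):
    """Central-difference estimate of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x = 0.3
chain = math.exp(2 * x) * 2     # outer e^u evaluated at u = 2x, times inner rate 2
forgot_inner = math.exp(2 * x)  # common slip: dropping the inner factor
reference = numeric_derivative(lambda t: math.exp(2 * t), x)
print(chain, forgot_inner, reference)
```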

Q. Why is the chain rule the most-used calculus rule in ML?
A.

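A neural network is a deep composition of layers, and the derivative of the loss with respect to a buried parameter is the chain rule applied layer by layer, which is exactly backpropagation. Frameworks automate it as reverse-mode automatic differentiation, computing these chain-rule products for every parameter on every training step. Vanishing and exploding gradients are the same mechanism at scale: multiply many chain-rule factors that are mostly smaller than 1 in magnitude and the product shrinks toward zero; mostly larger than 1 and it blows up.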
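To see backpropagation as plain chain rule, here is a hand-written Python sketch for a hypothetical two-stage model, loss = (w2 · tanh(w1 · x) - y)². The backward pass multiplies one local derivative per stage, from the loss back to each weight, and the result is checked against finite differences. This is an illustration only, not how any framework implements automatic differentiation; the weights, input, and target are arbitrary numbers. The last line illustrates the vanishing-gradient point under the simplifying assumption that each of 20 layers contributes a factor of 0.25, the largest value the logistic sigmoid's derivative can take.

```python
import math

def forward(w1, w2, x, y):
    """Tiny two-stage model: prediction = w2 * tanh(w1 * x); loss = squared error."""
    a = w1 * x
    h = math.tanh(a)
    p = w2 * h
    loss = (p - y) ** 2
    return a, h, p, loss

def backward(w1, w2, x, y):
    """Chain rule applied stage by stage, from the loss back to each weight."""
    a, h, p, loss = forward(w1, w2, x, y)
    dL_dp = 2 * (p - y)            # d loss / d prediction
    dL_dw2 = dL_dp * h             # p = w2 * h, so dp/dw2 = h
    dL_dh = dL_dp * w2             # dp/dh = w2
    dL_da = dL_dh * (1 - h ** 2)   # d tanh(a)/da = 1 - tanh(a)**2
    dL_dw1 = dL_da * x             # a = w1 * x, so da/dw1 = x
    return dL_dw1, dL_dw2

w1, w2, x, y = 0.8, -1.5, 2.0, 0.3
g1, g2 = backward(w1, w2, x, y)

# Finite-difference cross-check on the loss (index 3 of forward's return value)
eps = 1e-6
num_g1 = (forward(w1 + eps, w2, x, y)[3] - forward(w1 - eps, w2, x, y)[3]) / (2 * eps)
num_g2 = (forward(w1, w2 + eps, x, y)[3] - forward(w1, w2 - eps, x, y)[3]) / (2 * eps)
print(g1, num_g1, g2, num_g2)

# Many chain-rule factors below 1 multiply toward zero: 20 factors of 0.25
shrink = 0.25 ** 20
print(shrink)
```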