Practice: The product rule

Six short questions. Answer each one in your head (or on paper) before opening the collapsible. Trying to retrieve the answer is where the learning sticks; rereading feels productive but does much less.

1. State the product rule, and name the tempting wrong guess.

Show answer

d/dx(f · g) = f' · g + f · g': two terms, a sum. The tempting wrong guess is f' · g' (the product of the derivatives), which almost everyone reaches for first and which is wrong.
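If you want to see the difference rather than take it on trust, a quick numerical check helps. The sketch below assumes f = sin and g = exp at x = 0.7 (illustrative choices, not taken from the questions) and compares a finite-difference slope of f · g with both the product rule and the wrong guess.

```python
import math

def numeric_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Illustrative factors (an assumption for this sketch).
f, f_prime = math.sin, math.cos
g, g_prime = math.exp, math.exp

x = 0.7
measured = numeric_derivative(lambda t: f(t) * g(t), x)
product_rule = f_prime(x) * g(x) + f(x) * g_prime(x)   # f'g + fg'
wrong_guess = f_prime(x) * g_prime(x)                  # f'g'

print(f"measured slope : {measured:.6f}")
print(f"f'g + fg'      : {product_rule:.6f}")
print(f"f'g' (wrong)   : {wrong_guess:.6f}")
```

The measured slope should line up with f'g + fg' to many decimal places, while the f'g' value lands somewhere else entirely.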

2. In the rectangle picture, what are the three pieces of added area when you nudge x?

Show answer

With f as width and g as height, nudging x by dx adds: a top strip (f · g' · dx), a side strip (f' · g · dx), and a small corner block (f' · g' · dx²) where the two strips meet. The two strips are first-order in dx and survive; the corner is second-order and vanishes in the limit.
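The three pieces can be checked by direct arithmetic. This sketch assumes f = x² and g = x³ + 1 at x = 2 with dx = 0.1 (hypothetical choices), and uses the actual changes in width and height rather than the first-order estimates f' · dx and g' · dx, so the pieces add up to the new area exactly.

```python
def f(x):  # width (illustrative choice)
    return x ** 2

def g(x):  # height (illustrative choice)
    return x ** 3 + 1

x, dx = 2.0, 0.1
df = f(x + dx) - f(x)   # actual change in width
dg = g(x + dx) - g(x)   # actual change in height

top_strip = f(x) * dg    # full width times added height
side_strip = df * g(x)   # added width times full height
corner = df * dg         # added width times added height

added_area = f(x + dx) * g(x + dx) - f(x) * g(x)
pieces = top_strip + side_strip + corner

print(f"added area        : {added_area:.6f}")
print(f"top + side + corner: {pieces:.6f}")
print(f"corner alone      : {corner:.6f}")
```

The two totals agree up to rounding, and the corner is the smallest of the three pieces.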

3. Why does the product rule have exactly two terms?

Show answer

Because the rectangle has two independent ways to grow: the width can move while the height holds still (the side strip, f' · g), and the height can move while the width holds still (the top strip, f · g'). Two ways to grow, two terms. Each term lets one factor change while the other rides along.

4. Why is f' · g' wrong, in terms of the picture?

Show answer

f' · g' is the corner block, the area gained by changing both the width and the height at once. It is the product of two tiny quantities (f'·dx and g'·dx), so it is second-order in dx and vanishes in the limit. The wrong guess captures exactly the one piece that does not survive, and misses the two strips that do.

5. In each term, how are the factors paired?

Show answer

Each term pairs one factor’s derivative with the other factor left undifferentiated: f' · g and f · g', never f' · f or g' · g. The side strip is g (full height) times the width’s growth; the top strip is f (full width) times the height’s growth.

6. How does the rule extend to three factors?

Show answer

d/dx(f · g · h) = f' · g · h + f · g' · h + f · g · h': one term per factor, each differentiating a single factor while the other two ride along undifferentiated. The same “one factor changes at a time” logic gives one term per factor for any number of factors.
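The "one term per factor" pattern is easy to write as a loop. In this sketch, `product_derivative` is a hypothetical helper name, and the three factors x², sin x, and eˣ at x = 1.3 are illustrative choices; the result is compared with a finite-difference slope.

```python
import math

def product_derivative(values, derivatives):
    """Sum one term per factor: that factor's derivative times all other factors."""
    total = 0.0
    for i, d in enumerate(derivatives):
        term = d
        for j, v in enumerate(values):
            if j != i:
                term *= v   # the other factors ride along undifferentiated
        total += term
    return total

x = 1.3
values = [x ** 2, math.sin(x), math.exp(x)]        # f, g, h at x
derivatives = [2 * x, math.cos(x), math.exp(x)]    # f', g', h' at x
rule = product_derivative(values, derivatives)

def p(t):
    return t ** 2 * math.sin(t) * math.exp(t)

h = 1e-6
measured = (p(x + h) - p(x - h)) / (2 * h)   # central difference
print(f"rule: {rule:.6f}  measured: {measured:.6f}")
```

The same function works for two factors or for ten, because it only counts the ways the product can change one factor at a time.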

Try it yourself, part 1: apply the product rule

Pen and paper, about 6 minutes. Differentiate each product with f' · g + f · g'.

(a) x⁴ · x² (then cross-check against the power rule by first simplifying the product)

(b) x² · cos(x) (recall d/dx(cos x) = -sin x)

Show answer

(a) f = x⁴ (f' = 4x³), g = x² (g' = 2x):

f'·g + f·g' = 4x³·x² + x⁴·2x = 4x⁵ + 2x⁵ = 6x⁵

Cross-check: x⁴ · x² = x⁶, whose derivative by the power rule is 6x⁵. The two agree.

(b) f = x² (f' = 2x), g = cos x (g' = -sin x):

f'·g + f·g' = 2x·cos x + x²·(-sin x) = 2x cos x - x² sin x

Two terms, one from each factor taking its turn. The wrong f'·g' guess would have given just 2x·(-sin x) = -2x sin x, which matches neither term of the correct answer.
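Once you have worked both parts by hand, a numerical spot check is a cheap way to confirm them. This sketch assumes an arbitrary test point x = 1.5 and compares each hand-derived formula with a finite-difference slope.

```python
import math

def numeric_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5  # arbitrary test point (an assumption for this sketch)

# (a) x^4 * x^2 should differentiate to 6x^5
a_measured = numeric_derivative(lambda t: t ** 4 * t ** 2, x)
a_formula = 6 * x ** 5

# (b) x^2 * cos(x) should differentiate to 2x cos x - x^2 sin x
b_measured = numeric_derivative(lambda t: t ** 2 * math.cos(t), x)
b_formula = 2 * x * math.cos(x) - x ** 2 * math.sin(x)
b_wrong = -2 * x * math.sin(x)   # the f'g' guess

print(f"(a) measured {a_measured:.5f}  formula {a_formula:.5f}")
print(f"(b) measured {b_measured:.5f}  formula {b_formula:.5f}  wrong guess {b_wrong:.5f}")
```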

Try it yourself, part 2: watch the corner vanish

About 4 minutes, arithmetic only. Take f = x² and g = x at the point x = 3, nudged by dx = 0.01. So f = 9, f' = 6, g = 3, g' = 1.

Steps. (1) Compute the top strip (f·g'·dx), side strip (f'·g·dx), and corner (f'·g'·dx²). (2) Add the two strips and divide the sum by dx to get the rate. (3) Compare to f'·g + f·g', and to the power-rule derivative of the simplified product.

Show answer

top strip: f·g'·dx = 9·1·0.01 = 0.09 (first-order, survives)
side strip: f'·g·dx = 6·3·0.01 = 0.18 (first-order, survives)
corner: f'·g'·dx² = 6·1·0.0001 = 0.0006 (second-order, tiny)
strips / dx = (0.09 + 0.18) / 0.01 = 27

That matches f'·g + f·g' = 6·3 + 9·1 = 18 + 9 = 27. And the simplified product is x²·x = x³, whose power-rule derivative is 3x² = 3·9 = 27. All three agree. The corner (0.0006) is already 450 times smaller than the two strips combined (0.27). Because the corner scales with dx² while the strips scale with dx, shrinking dx further widens that gap: it is exactly the f'·g' piece the limit discards.
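To watch the corner fade, repeat the same arithmetic for several values of dx. This sketch reuses the numbers from the exercise (f = 9, f' = 6, g = 3, g' = 1); the list of dx values is an arbitrary choice.

```python
f, f_prime = 9.0, 6.0   # f = x**2 at x = 3
g, g_prime = 3.0, 1.0   # g = x at x = 3

rows = []
for dx in (0.1, 0.01, 0.001, 0.0001):
    strips = f * g_prime * dx + f_prime * g * dx   # first-order pieces
    corner = f_prime * g_prime * dx ** 2           # second-order piece
    rows.append((dx, strips / dx, corner / dx))
    print(f"dx={dx}: strips/dx={strips / dx:.4f}  corner/dx={corner / dx:.4f}")
```

The strips contribute 27 to the rate at every step size, while the corner's contribution is 6·dx, which shrinks toward zero along with dx.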

Nine cards. Click any card to reveal the answer. Use the Print flashcards button to lay out the full set as one card per page, ready to print or save as a PDF for offline review.

Q. What is the product rule?
A. d/dx(f · g) = f' · g + f · g': two terms, a sum. Not the tempting one-term guess f' · g'. A product of two functions has a derivative that is a sum, not a product.

Q. What is the rectangle picture of the product rule?
A. f · g is the area of a rectangle (width f, height g). Nudging x adds a top strip (f·g'·dx) and a side strip (f'·g·dx), plus a corner block (f'·g'·dx²). Dividing by dx and taking the limit, the strips give f·g' + f'·g; the corner vanishes.

Q. Why does the product rule have two terms?
A. The rectangle grows two independent ways: width moves while height holds (f'·g), and height moves while width holds (f·g'). Two ways to grow, two terms, each letting one factor change while the other rides along.

Q. Why is f'·g' the wrong answer?
A. It is the corner block, the area from changing both width and height at once. Being the product of two tiny quantities (f'·dx, g'·dx), it is second-order in dx and vanishes in the limit. The wrong guess is exactly the piece that does not survive.

Q. How are the factors paired in each term?
A. Each term pairs one factor’s derivative with the other factor undifferentiated: f'·g and f·g', never f'·f or g'·g. Side strip = full height g times width’s growth; top strip = full width f times height’s growth.

Q. Differentiate x² · x³ with the product rule, and cross-check.
A. (2x)(x³) + (x²)(3x²) = 2x⁴ + 3x⁴ = 5x⁴. Cross-check: x²·x³ = x⁵, power-rule derivative 5x⁴. They agree.

Q. Differentiate x · sin(x).
A. f = x (f' = 1), g = sin x (g' = cos x): 1·sin x + x·cos x = sin x + x cos x. Two terms; the wrong f'·g' guess would give only cos x.

Q. How does the product rule extend to three factors?
A. d/dx(f·g·h) = f'·g·h + f·g'·h + f·g·h': one term per factor, each differentiating a single factor while the others ride along. The rule counts the ways the product can change one factor at a time.

Q. Where does the product rule show up in machine learning?
A. Everywhere a product appears: networks multiply weights by activations, and attention multiplies attention weights by value vectors, both of which depend on the parameters. Backpropagation applies the product rule at each such product, one gradient term per factor, billions of times per run.
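As a toy illustration of that last card, here is a minimal sketch of one product inside a network. It assumes a hypothetical unit z = θ · tanh(θ · x) in which a single parameter θ appears in both factors, so the gradient needs one term per factor. The derivative of tanh and the inner factor x come from the chain rule, which this section does not cover.

```python
import math

def z(theta, x):
    # Toy unit: the same parameter feeds both factors (hypothetical setup).
    return theta * math.tanh(theta * x)

theta, x = 0.8, 1.2
a = math.tanh(theta * x)   # the "activation" factor

# Product rule: one gradient term per factor.
term_from_weight = 1.0 * a                        # differentiate theta, keep the activation
term_from_activation = theta * (1 - a ** 2) * x   # keep theta, differentiate the activation
gradient = term_from_weight + term_from_activation

# Finite-difference check of the hand-derived gradient.
h = 1e-6
measured = (z(theta + h, x) - z(theta - h, x)) / (2 * h)
print(f"product-rule gradient: {gradient:.6f}  measured: {measured:.6f}")
```

Automatic differentiation libraries do this bookkeeping for you, but the two-term structure is the same one the rectangle picture shows.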