The product rule, via changing rectangles

So far the toolkit handles powers of the input (the power rule) and the trig functions (last lesson). But functions rarely arrive alone; they get combined. The most basic combination is multiplication: what is the derivative of f of the input times g of the input?

The tempting guess, the one almost everyone makes first, is that the derivative of a product is the product of the derivatives: f-prime times g-prime. It is wrong, and seeing exactly why it is wrong is the whole lesson. The correct answer, the product rule, has two terms:

d/dx( f(x) · g(x) ) = f'(x) · g(x) + f(x) · g'(x)

One picture explains where both terms come from, why there are two of them, and why the one-term guess fails. The picture is a rectangle.

A product is the area of a rectangle

Let f of the input be the width of a rectangle and g of the input its height. Then the product f of the input times g of the input is the rectangle’s area. As the input changes, both the width and the height change, so the rectangle morphs, and the rate at which its area changes is the derivative we are after.

Two values multiplied is a rectangle: width f(x), height g(x), area f(x) times g(x). When f or g changes, that rectangle reshapes; tracking what gets added gives the product rule, the topic of the next picture.

Nudge x and watch the rectangle grow

Increase the input by a tiny step. Two independent things happen:

The width f grows by f-prime times the step (the rate of change of f times the nudge). The right edge slides outward.
The height g grows by g-prime times the step. The top edge slides up.

The new area added is an L-shaped border around the original rectangle, and it splits into three pieces:

A strip across the top: width f, height g-prime times the step. Its area is f times g-prime times the step.
A strip up the side: width f-prime times the step, height g. Its area is f-prime times g times the step.
A small corner block where the extensions of the two strips meet, just past the original rectangle's top-right corner: width f-prime times the step, height g-prime times the step. Its area is f-prime times g-prime times the step squared.

Grow the rectangle: width gains f'·dx, height gains g'·dx. Three new pieces appear: a top strip of area f·(g'·dx), a right strip of area (f'·dx)·g, and a small corner of area (f'·g')·dx². The corner shrinks faster than the strips when dx shrinks, so the rate of growth is the sum of the strip areas alone: f·g' + f'·g.

Add the pieces, divide, and let dx vanish

The total area gained is the sum of the three pieces:

added area = f · g' · dx  +  f' · g · dx  +  f' · g' · dx^2

The rate of change is this divided by the step:

f · g'  +  f' · g  +  f' · g' · dx

Now let the step shrink to zero. The first two terms have no step left in them and survive. The third term, f-prime times g-prime times the step, still carries a factor of the step, so it vanishes (it came from the corner block, which is second-order in the step, just like the corner square in the power-rule lesson). What remains is:

d/dx( f · g ) = f' · g  +  f · g'

The product rule, read straight off the rectangle.
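The three-piece split can be checked exactly in a case where nothing is approximated. The sketch below assumes a straight-line width and height (the particular functions 3x + 1 and 2x + 5 are made up for illustration), so the strips and the corner account for every bit of added area, and the leftover in the rate is exactly the corner term f' · g' · dx.

```python
from fractions import Fraction

# Hypothetical straight-line width and height, chosen so the three
# pieces account for the added area exactly (no higher-order leftovers).
def f(x):
    return 3 * x + 1      # width, with f' = 3

def g(x):
    return 2 * x + 5      # height, with g' = 2

F_PRIME, G_PRIME = 3, 2

def pieces(x, dx):
    """Return (top strip, side strip, corner) for a nudge dx at x."""
    top = f(x) * G_PRIME * dx
    side = F_PRIME * g(x) * dx
    corner = F_PRIME * G_PRIME * dx ** 2
    return top, side, corner

x = Fraction(1)
product_rule = F_PRIME * g(x) + f(x) * G_PRIME   # f'*g + f*g'

rows = []
for dx in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    added = f(x + dx) * g(x + dx) - f(x) * g(x)  # true change in area
    total = sum(pieces(x, dx))                   # strips plus corner
    rate = added / dx
    rows.append((dx, added, total, rate))
    print(f"dx={float(dx):<6} rate={float(rate):.4f} "
          f"leftover={float(rate - product_rule):.4f}")

print("f'*g + f*g' =", product_rule)
```

Exact fractions are used instead of floats so the equality between the added area and the three pieces is not blurred by rounding. For curved f and g the strips are only approximations, which is why the general argument needs the limit.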

Why two terms, and why not f' · g'

Here is the payoff, and the reason this lesson exists. The rectangle has two independent ways to grow: the width can move while the height holds still (that is the side strip, f-prime times g), and the height can move while the width holds still (that is the top strip, f times g-prime). Two ways to grow, two terms. The product rule is the sum of “f changes while g rides along” and “g changes while f rides along.”

What about the tempting wrong answer, f-prime times g-prime? That is the corner block, the area you gain by changing both the width and the height at once. And the corner block is the one piece that vanishes in the limit, because it is the product of two tiny quantities (f-prime times the step, and g-prime times the step), making it second-order in the step. So f-prime times g-prime is not just a different answer; it is precisely the piece the limit throws away. The gut-instinct guess captures the one contribution that does not survive, and misses the two that do.

Put numbers on it to feel the corner shrink. Take f equal to the input and g equal to the input cubed at the point where the input is 2, so f is 2, f-prime is 1, g is 8, g-prime is 12. Nudge by a step of 0.01. The top strip adds f times g-prime times the step: 2 times 12 times 0.01, which is 0.24. The side strip adds f-prime times g times the step: 1 times 8 times 0.01, which is 0.08. The corner adds f-prime times g-prime times the step squared: 1 times 12 times 0.0001, which is 0.0012. The corner is already two hundred times smaller than the top strip and about sixty-seven times smaller than the side strip, and halving the step would shrink it fourfold while only halving the strips. Divide the surviving strips by the step: 0.24 plus 0.08, all over 0.01, is 32. That is exactly f-prime times g plus f times g-prime: 1 times 8 plus 2 times 12, which is 32. It is also what the power rule gives directly, since the input times the input cubed is the input to the fourth, whose derivative at the point where the input is 2 is 4 times 2 cubed, again 32.
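The same arithmetic can be repeated for smaller and smaller steps. This is a minimal sketch using the numbers from the paragraph above; the helper name strips_and_corner and the list of steps are illustrative choices.

```python
# Values from the worked numbers: f = x and g = x**3 at x = 2.
f_val, f_prime = 2.0, 1.0
g_val, g_prime = 8.0, 12.0

def strips_and_corner(dx):
    """Areas of the top strip, side strip, and corner for a nudge dx."""
    top = f_val * g_prime * dx             # f * g' * dx
    side = f_prime * g_val * dx            # f' * g * dx
    corner = f_prime * g_prime * dx ** 2   # f' * g' * dx^2
    return top, side, corner

steps = (0.01, 0.005, 0.0025)
table = [strips_and_corner(dx) for dx in steps]

for dx, (top, side, corner) in zip(steps, table):
    rate = (top + side) / dx   # strips only: the part that survives the limit
    print(f"dx={dx:<7} top={top:.6f} side={side:.6f} "
          f"corner={corner:.8f} rate={rate:.2f}")
```

Each halving of the step halves both strips but quarters the corner, so the corner's share of the added area keeps falling while the strips-only rate stays put.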

Worked examples

A cross-check against the power rule. Take f equal to the input squared and g equal to the input cubed, so f-prime is 2 times the input and g-prime is 3 times the input squared. The product rule gives:

d/dx(x^2 · x^3) = (2x)(x^3) + (x^2)(3x^2) = 2x^4 + 3x^4 = 5x^4

Now check it directly: the input squared times the input cubed is the input to the fifth, whose derivative by the power rule is 5 times the input to the fourth. The two methods agree, which is exactly the kind of consistency that should make you trust the rule.

An algebraic times a trig function. Take f equal to the input and g equal to sine of the input, so f-prime is 1 and g-prime is cosine of the input (from last lesson). The product rule gives:

d/dx(x · sin(x)) = (1)(sin(x)) + (x)(cos(x)) = sin(x) + x·cos(x)

Two terms, one from each function taking its turn to change. The wrong f-prime times g-prime guess cannot produce this: it would have given 1 times cosine of the input, just cosine of the input, with no sine term at all.

Two trig functions. Take f equal to sine of the input and g equal to cosine of the input, so f-prime is cosine of the input and g-prime is negative sine of the input. The product rule gives:

d/dx(sin(x) · cos(x)) = (cos(x))(cos(x)) + (sin(x))(-sin(x)) = cos^2(x) - sin^2(x)

This uses both trig derivatives from last lesson inside one product. (As a bonus consistency check, cosine squared of the input minus sine squared of the input is the double-angle identity for cosine of twice the input, and since sine of the input times cosine of the input equals one-half times sine of twice the input, its derivative should be cosine of twice the input, which matches.)
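All three worked examples can be spot-checked numerically. The sketch below compares each product-rule answer with a central-difference estimate of the slope; the test point 1.3 and the step size are arbitrary choices, and a small disagreement is expected from floating-point rounding.

```python
import math

def central_difference(func, x, h=1e-6):
    """Numerical estimate of func'(x); h is an illustrative step size."""
    return (func(x + h) - func(x - h)) / (2 * h)

# name -> (the product itself, the derivative the product rule gave)
examples = {
    "x^2 * x^3": (lambda x: x**2 * x**3,
                  lambda x: 5 * x**4),
    "x * sin(x)": (lambda x: x * math.sin(x),
                   lambda x: math.sin(x) + x * math.cos(x)),
    "sin(x) * cos(x)": (lambda x: math.sin(x) * math.cos(x),
                        lambda x: math.cos(x)**2 - math.sin(x)**2),
}

x0 = 1.3  # arbitrary test point
errors = {}
for name, (product, claimed) in examples.items():
    errors[name] = abs(central_difference(product, x0) - claimed(x0))
    print(f"{name:16} numeric and rule differ by {errors[name]:.2e}")
```

A numerical check like this does not prove the rule, but a wrong formula (such as f' · g') would show a difference far larger than rounding error.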

Three factors, same idea

The “one term per factor, each taking its turn to change” pattern generalizes immediately. For a product of three functions, group them as the quantity f times g, all times h, and apply the rule twice, and you get:

d/dx( f · g · h ) = f' · g · h  +  f · g' · h  +  f · g · h'

One term per factor, each differentiating a single function while the other two ride along undifferentiated. In the picture, three factors are the sides of a box, and its volume grows by three slabs, one on each face that moves. The same logic gives one term per factor for any number of factors: a four-dimensional box would grow by four slabs. The rule just counts the ways the product can change one factor at a time.
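The "one term per factor" pattern translates almost word for word into a loop. This is a rough sketch, assuming the factors and their derivatives are supplied by hand as two matching lists; the function name product_rule_sum and the three sample factors are illustrative.

```python
import math

def product_rule_sum(factors, derivatives, x):
    """Sum of terms in which exactly one factor is differentiated."""
    total = 0.0
    for i, d in enumerate(derivatives):
        term = d(x)                    # factor i takes its turn to change
        for j, f in enumerate(factors):
            if j != i:
                term *= f(x)           # the others ride along
        total += term
    return total

# Sample factors x^2, sin(x), cos(x) and their hand-written derivatives.
factors = [lambda x: x**2, math.sin, math.cos]
derivatives = [lambda x: 2 * x, math.cos, lambda x: -math.sin(x)]

def product(x):
    return math.prod(f(x) for f in factors)

x0, h = 0.7, 1e-6   # arbitrary test point and step
numeric = (product(x0 + h) - product(x0 - h)) / (2 * h)
by_rule = product_rule_sum(factors, derivatives, x0)
print(f"rule: {by_rule:.8f}  numeric: {numeric:.8f}")
```

Nothing in the loop depends on there being three factors, so the same function handles four or more by lengthening the two lists.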

Why this matters when you use AI

The product rule runs quietly under every gradient computation that involves a product, which is almost all of them. Neural networks are built from weights multiplied by activations, and attention mechanisms multiply attention weights by values, where both factors depend on the model’s parameters. When backpropagation computes a gradient through any such product, it applies the product rule: the gradient picks up one term for each factor’s contribution, exactly the “f changes, g rides along, and vice versa” structure from the rectangle. It is not an exotic tool; it is one of the handful of rules automatic differentiation applies billions of times per training run, mostly out of sight.
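To make "applies the product rule" concrete, here is a toy sketch of automatic differentiation. It is an assumption-laden illustration, not how any particular framework is implemented: it uses the simpler forward-mode style (backpropagation is reverse-mode), and the class name Dual and the one-weight "neuron" are invented for the demo. The point is only that the multiplication step is the product rule, written once and reused everywhere.

```python
class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # The product rule: f' * g + f * g'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

# Hypothetical setup: a weight w times an activation a that itself
# depends on w (a = 3*w + 1 is made up for the demo).
w = Dual(2.0, 1.0)   # seed: the derivative of w with respect to w is 1
a = 3 * w + 1
out = w * a          # both factors depend on w, so both terms contribute

print("value:", out.value, "derivative:", out.deriv)
```

By hand, the output is 3w² + w, whose derivative 6w + 1 can be compared with what the program prints at w = 2.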

Common pitfalls

Guessing f-prime times g-prime. This is the single most common product-rule mistake, and the rectangle shows why it is wrong: f-prime times g-prime is the corner block, the only piece that vanishes in the limit. The real derivative is the two surviving strips, f-prime times g plus f times g-prime.

Losing a term. The product rule always has two terms (for a product of two functions). If you wrote only one, you forgot that the rectangle grows in two independent directions. Each factor gets a turn to change while the other holds still.

Mismatching the pairings. Each term pairs one factor’s derivative with the other factor undifferentiated: f-prime times g, and f times g-prime, never f-prime times f, or g-prime times g. The strip across the top is f (full width) times the height’s growth; the strip up the side is g (full height) times the width’s growth.

Forgetting the corner is second-order. The corner block, f-prime times g-prime times the step squared, is dropped not by hand-waving but because it shrinks faster than the step, the same reason the power-rule lesson dropped its corner square. Two strips are first-order and survive; the corner is second-order and dies.

What you should remember

The product rule says the derivative of f times g equals f-prime times g plus f times g-prime, two terms, not the tempting one-term guess f-prime times g-prime. A product of two functions has a derivative that is a sum, not a product.
The picture is a rectangle of area f times g that grows in two directions. Nudging the input adds a top strip (f times g-prime) and a side strip (f-prime times g); their sum is the rule. The f-prime times g-prime corner is the piece that vanishes in the limit, which is exactly the wrong guess.
Each term lets one factor change while the other rides along. f changes while g holds (f-prime times g), then g changes while f holds (f times g-prime). Two independent ways to grow, two terms. The rule checks out against the power rule: the input squared times the input cubed is the input to the fifth, and both methods give 5 times the input to the fourth.

When two functions multiply, their combined rate of change is the sum of each one’s contribution: f moves while g waits, then g moves while f waits. Add them, and never multiply the two derivatives. The next lesson takes on the other way functions combine, nesting one inside another, with the chain rule.