Summary: The product rule
When two functions are multiplied, the natural guess for the derivative, multiply the derivatives, is wrong. The right answer, the product rule, has two terms, and one picture, a rectangle whose width and height both grow, shows exactly where both terms come from and why the tempting one-term guess fails. It is the same nudge-and-look reasoning as the power and trig lessons, applied to a rectangle’s area. This is the scan-it-in-five-minutes version.
Core ideas
Section titled “Core ideas”- The product rule:
d/dx(f · g) = f' · g + f · g'. Two terms, a sum, not the temptingf' · g'. - The picture: a growing rectangle. Let
fbe the width andgthe height, sof · gis the area. Nudgexbydxand the area gains a top strip (f · g' · dx), a side strip (f' · g · dx), and a tiny corner block (f' · g' · dx²). Divide bydxand letdx -> 0: the two strips survive, the corner vanishes, leavingf' · g + f · g'. - Why two terms. The rectangle grows two independent ways: width moves while height holds (
f' · g), and height moves while width holds (f · g'). Two ways to grow, two terms, each letting one factor change while the other rides along. - Why
f' · g'is wrong. It is the corner block, the area from changing both dimensions at once. Being the product of two tiny quantities, it is second-order indxand dies in the limit. The wrong guess is precisely the piece that does not survive. - Worked and cross-checked.
x² · x³gives(2x)(x³) + (x²)(3x²) = 5x⁴, which matches the power rule onx⁵.x · sin xgivessin x + x cos x;sin x · cos xgivescos²x - sin²x(which iscos 2x). A numeric nudge (f = x²,g = xatx = 3) shows the strips summing to the rate 27 while the corner stays a tiny0.0006. - More factors, same idea.
d/dx(f · g · h) = f' · g · h + f · g' · h + f · g · h': one term per factor, each taking its turn to change.
What changes for you
Section titled “What changes for you”You stop reaching for f' · g' and carry a picture that makes the right answer obvious: a product is a rectangle’s area, and it grows in two independent directions, so its rate of change is a sum of two contributions, never a product. The corner block is a vivid reminder of which piece the limit throws away, the exact piece the wrong guess keeps. This rule runs quietly under nearly every gradient in machine learning, because networks multiply weights by activations everywhere and attention multiplies learned weights by parameter-dependent values; backpropagation applies the product rule at each such product, one gradient term per factor, billions of times per training run. The next lesson takes on the other way functions combine, nesting one inside another, with the chain rule.