Cheatsheet: The product rule
The rule
Section titled “The rule”d/dx( f · g ) = f' · g + f · g'Two terms. Not f' · g' (that is the most common mistake).
Three factors: d/dx(f·g·h) = f'·g·h + f·g'·h + f·g·h' (one term per factor).
The picture: a growing rectangle
Section titled “The picture: a growing rectangle”f · g is the area of a rectangle, width f, height g. Nudge x by dx:
| Piece | Size | Survives? |
|---|---|---|
| Top strip | f · g' · dx | Yes (first-order) |
| Side strip | f' · g · dx | Yes (first-order) |
| Corner block | f' · g' · dx^2 | No (second-order, vanishes) |
Added area / dx -> f · g' + f' · g. The corner block is exactly the wrong f' · g' guess, and it is the piece that dies in the limit.
Why two terms
Section titled “Why two terms”The rectangle grows in two independent directions: width moves while height holds (f' · g), height moves while width holds (f · g'). Two ways to grow, two terms. Changing both at once is the corner, which vanishes.
Worked examples
Section titled “Worked examples”| Product | f, f’ / g, g’ | Product rule | Result |
|---|---|---|---|
x^2 · x^3 | 2x / 3x^2 | 2x·x^3 + x^2·3x^2 | 5x^4 (= power rule on x^5) |
x · sin x | 1 / cos x | 1·sin x + x·cos x | sin x + x cos x |
sin x · cos x | cos x / -sin x | cos x·cos x + sin x·(-sin x) | cos^2 x - sin^2 x (= cos 2x) |
The first confirms the rule against the power rule; the last two use the trig derivatives from the previous lesson.
Why it matters for AI
Section titled “Why it matters for AI”Networks multiply weights by activations everywhere; attention multiplies learned weights by parameter-dependent values. Backpropagation applies the product rule at every such product, one gradient term per factor. It is one of the handful of rules autodiff applies constantly, out of sight.
Pitfalls to dodge
Section titled “Pitfalls to dodge”- Guessing
f' · g'. That is the vanishing corner, not the derivative. Usef' · g + f · g'. - Losing a term. Always two terms; the rectangle grows two ways.
- Mismatching pairings. Each term pairs one factor’s derivative with the other factor undifferentiated.
- Forgetting the corner is second-order. It dies because it shrinks faster than
dx.
The one-line version
Section titled “The one-line version”The derivative of a product is f'·g + f·g', the two strips a rectangle gains when its width and height each grow in turn, never the product of the derivatives.