Cheatsheet: The product rule

d/dx( f · g ) = f' · g + f · g'

Two terms. Not f' · g' (that is the most common mistake).

Three factors: d/dx(f·g·h) = f'·g·h + f·g'·h + f·g·h' (one term per factor).
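As a quick sanity check of the three-factor formula, the sketch below picks f = x, g = x^2, h = x^3 (an illustrative choice, not taken from the lesson), so the product is x^6 and the power rule gives 6x^5. The function names are hypothetical.

```python
import math

def three_factor_rule(x):
    """f'·g·h + f·g'·h + f·g·h' for the assumed factors f = x, g = x^2, h = x^3."""
    f, df = x, 1.0
    g, dg = x**2, 2 * x
    h, dh = x**3, 3 * x**2
    # One term per factor: differentiate one factor, leave the other two alone.
    return df * g * h + f * dg * h + f * g * dh

def power_rule(x):
    """Derivative of x^6 taken directly."""
    return 6 * x**5

for x in (0.5, 1.0, 2.0, -1.5):
    print(x, three_factor_rule(x), power_rule(x))
```

The three terms come out as x^5, 2x^5 and 3x^5, which sum to 6x^5.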

Picture f · g as the area of a rectangle of width f and height g. Nudge x by dx and the rectangle gains three pieces:

| Piece | Size | Survives? |
|---|---|---|
| Top strip | f · g' · dx | Yes (first-order) |
| Side strip | f' · g · dx | Yes (first-order) |
| Corner block | f' · g' · dx^2 | No (second-order, vanishes) |

Divide the added area by dx: the two strips give f · g' + f' · g, while the corner gives f' · g' · dx, which goes to 0 as dx -> 0. The corner block is exactly the wrong f' · g' guess, and it is the piece that dies in the limit.

The rectangle grows in two independent directions: width moves while height holds (f' · g), height moves while width holds (f · g'). Two ways to grow, two terms. Changing both at once is the corner, which vanishes.
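The rectangle picture can be checked numerically. The sketch below assumes f(x) = x^2 as the width and g(x) = x^3 as the height (an illustrative choice) and splits the added area into the side strip, the top strip and the corner, using the actual increments of f and g rather than their first-order approximations. The helper name `pieces` is hypothetical.

```python
def f(x):
    return x**2  # width (assumed example)

def g(x):
    return x**3  # height (assumed example)

def pieces(x, dx):
    """Split the area gained when x moves to x + dx into side, top and corner."""
    df = f(x + dx) - f(x)  # change in width
    dg = g(x + dx) - g(x)  # change in height
    side = df * g(x)       # width moves, height holds
    top = f(x) * dg        # height moves, width holds
    corner = df * dg       # both move at once
    return side, top, corner

x = 1.5
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    side, top, corner = pieces(x, dx)
    print(f"dx={dx:.0e}  strips/dx={(side + top) / dx:.5f}  corner/dx={corner / dx:.5f}")
```

As dx shrinks, the strips divided by dx should settle near 5x^4 at x = 1.5, while the corner divided by dx keeps shrinking in proportion to dx.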

| Product | f' / g' | Product rule | Result |
|---|---|---|---|
| x^2 · x^3 | 2x / 3x^2 | 2x·x^3 + x^2·3x^2 | 5x^4 (= power rule on x^5) |
| x · sin x | 1 / cos x | 1·sin x + x·cos x | sin x + x cos x |
| sin x · cos x | cos x / -sin x | cos x·cos x + sin x·(-sin x) | cos^2 x - sin^2 x (= cos 2x) |

The first confirms the rule against the power rule; the last two use the trig derivatives from the previous lesson.
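The table's results can be spot-checked against a numerical derivative. This is a minimal sketch using a central finite difference. The step size, the sample points, and the names `central_diff`, `ROWS` and `max_gap` are all assumptions made for illustration.

```python
import math

def central_diff(fn, x, h=1e-6):
    """Symmetric finite-difference estimate of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

# (label, the product itself, the result claimed in the table)
ROWS = [
    ("x^2 * x^3", lambda x: x**2 * x**3, lambda x: 5 * x**4),
    ("x * sin x", lambda x: x * math.sin(x), lambda x: math.sin(x) + x * math.cos(x)),
    ("sin x * cos x", lambda x: math.sin(x) * math.cos(x), lambda x: math.cos(2 * x)),
]

def max_gap(product, claimed, points=(0.3, 1.1, 2.4)):
    """Largest difference between the numerical slope and the claimed derivative."""
    return max(abs(central_diff(product, x) - claimed(x)) for x in points)

for label, product, claimed in ROWS:
    print(f"{label:15s} max gap = {max_gap(product, claimed):.2e}")
```

Each gap should be tiny compared with the derivative values themselves, limited only by floating-point error in the finite difference.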

Networks multiply weights by activations everywhere; attention multiplies attention weights by value vectors, and both factors depend on the parameters. Backpropagation applies the product rule at every such product, one gradient term per factor. It is one of the handful of rules autodiff applies constantly, out of sight.
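To make "one gradient term per factor" concrete, here is a minimal sketch of a scalar reverse-mode autodiff node that supports only multiplication. The `Value` class and its methods are hypothetical, written for this illustration, and are not the API of any real library.

```python
class Value:
    """Scalar node in a tiny reverse-mode autodiff graph (illustrative only)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Product rule: each factor receives the other factor times the upstream gradient.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order nodes so each one is processed after everything that depends on it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

w = Value(3.0)   # a weight
x = Value(2.0)   # an input
h = w * x        # activation that depends on the weight
y = w * h        # the weight multiplies a weight-dependent activation
y.backward()
print(w.grad, x.grad)
```

Here y = w^2 · x, so w collects two contributions (one from each product it takes part in) that sum to 2·w·x, while x collects w^2.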

  • Guessing f' · g'. That is the vanishing corner, not the derivative. Use f' · g + f · g'.
  • Losing a term. Always two terms; the rectangle grows two ways.
  • Mismatching pairings. Each term pairs one factor’s derivative with the other factor undifferentiated.
  • Forgetting the corner is second-order. It dies because it shrinks faster than dx.
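The first pitfall is easy to see with plain arithmetic. The sketch below reuses x^2 · x^3, whose true derivative is 5x^4, and compares it with the f' · g' guess. The function names are hypothetical.

```python
import math

def product_rule(x):
    """f'·g + f·g' for f = x^2, g = x^3."""
    return 2 * x * x**3 + x**2 * 3 * x**2

def wrong_guess(x):
    """f'·g' for the same factors: the tempting but incorrect formula."""
    return 2 * x * 3 * x**2

def power_rule(x):
    """Derivative of x^5 taken directly."""
    return 5 * x**4

for x in (0.5, 1.0, 2.0):
    print(x, product_rule(x), wrong_guess(x), power_rule(x))
```

The product rule matches the power rule at every point, while the guess 6x^3 does not. The two expressions happen to agree at isolated points (x = 0 and x = 1.2), which is why checking a single value is not enough to validate a formula.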

The derivative of a product is f'·g + f·g', the two strips a rectangle gains when its width and height each grow in turn, never the product of the derivatives.