Summary: The product rule

When two functions are multiplied, the natural guess for the derivative, multiply the derivatives, is wrong. The right answer, the product rule, has two terms, and one picture, a rectangle whose width and height both grow, shows exactly where both terms come from and why the tempting one-term guess fails. It is the same nudge-and-look reasoning as the power and trig lessons, applied to a rectangle’s area. This is the scan-it-in-five-minutes version.

Core ideas

The product rule: d/dx(f · g) = f' · g + f · g'. Two terms, a sum, not the tempting f' · g'.
The picture: a growing rectangle. Let f be the width and g the height, so f · g is the area. Nudge x by dx and the area gains a top strip (f · g' · dx), a side strip (f' · g · dx), and a tiny corner block (f' · g' · dx²). Divide by dx and let dx -> 0: the two strips survive, the corner vanishes, leaving f' · g + f · g'.
Why two terms. The rectangle grows two independent ways: width moves while height holds (f' · g), and height moves while width holds (f · g'). Two ways to grow, two terms, each letting one factor change while the other rides along.
Why f' · g' is wrong. It is the corner block, the area from changing both dimensions at once. Being the product of two tiny quantities, it is second-order in dx and dies in the limit. The wrong guess is precisely the piece that does not survive.
Worked and cross-checked. x² · x³ gives (2x)(x³) + (x²)(3x²) = 5x⁴, which matches the power rule on x⁵. x · sin x gives sin x + x cos x; sin x · cos x gives cos²x - sin²x (which is cos 2x). A numeric nudge (f = x², g = x at x = 3) shows the strips summing to the rate 27 while the corner stays a tiny 0.0006.
More factors, same idea. d/dx(f · g · h) = f' · g · h + f · g' · h + f · g · h': one term per factor, each taking its turn to change.

What changes for you

You stop reaching for f' · g' and carry a picture that makes the right answer obvious: a product is a rectangle’s area, and it grows in two independent directions, so its rate of change is a sum of two contributions, never a product. The corner block is a vivid reminder of which piece the limit throws away, the exact piece the wrong guess keeps. This rule runs quietly under nearly every gradient in machine learning, because networks multiply weights by activations everywhere and attention multiplies learned weights by parameter-dependent values; backpropagation applies the product rule at each such product, one gradient term per factor, billions of times per training run. The next lesson takes on the other way functions combine, nesting one inside another, with the chain rule.