Cheatsheet: The power rule from geometry
The power rule
d/dt( t^n ) = n · t^(n-1)
The n counts the faces that grow when you nudge one dimension; the t^(n-1) is the size of each face.
Where it comes from (geometry)
| Function | Picture | Growth when side -> t + dt | Derivative |
|---|---|---|---|
| t^2 | square, side t | 2 strips (t·dt each) + corner (dt^2) | 2t |
| t^3 | cube, side t | 3 slabs (t^2·dt each) + smaller terms | 3t^2 |
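The corner and edge terms carry extra factors of dt (dt^2 for the square; t·dt^2 and dt^3 for the cube). Divide the growth by dt and they still contain a dt, so they vanish as dt -> 0; the strips and slabs survive.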
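To watch the leftover pieces fade, the growth of a square and a cube can be split into its parts and each part divided by dt. This is a minimal sketch: the helper names `square_growth` and `cube_growth` and the sample side length t = 2 are assumptions made for illustration.

```python
def square_growth(t, dt):
    """Split the area gained when a square's side grows from t to t + dt."""
    strips = 2 * t * dt   # two thin rectangles, each t by dt
    corner = dt ** 2      # the small dt-by-dt square
    return strips, corner


def cube_growth(t, dt):
    """Split the volume gained when a cube's side grows from t to t + dt."""
    slabs = 3 * t**2 * dt                 # three thin slabs, each t by t by dt
    leftovers = 3 * t * dt**2 + dt**3     # three edge sticks plus one corner cube
    return slabs, leftovers


t = 2.0  # illustrative side length
for dt in (0.1, 0.01, 0.001):
    strips, corner = square_growth(t, dt)
    slabs, leftovers = cube_growth(t, dt)
    # Dividing by dt turns "growth" into "growth rate".
    print(dt, strips / dt, corner / dt, slabs / dt, leftovers / dt)
```

As dt shrinks, the strips and slabs columns stay near 2t and 3t^2, while the corner and leftovers columns shrink toward 0.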
Works for any power
1/t = t^(-1) -> d/dt = -1·t^(-2) = -1/t^2
√t = t^(1/2) -> d/dt = (1/2)·t^(-1/2) = 1/(2√t)
Negative and fractional exponents follow the same rule.
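A quick numerical spot check can make this easier to trust. The sketch below compares the power rule against a central-difference slope estimate for a few exponents; the function names, the step size h, and the test point t = 4 are illustrative choices, not part of the cheatsheet.

```python
def power_rule(n, t):
    """Derivative of t**n predicted by the power rule."""
    return n * t ** (n - 1)


def central_difference(f, t, h=1e-6):
    """Numerical slope estimate: (f(t + h) - f(t - h)) / (2h)."""
    return (f(t + h) - f(t - h)) / (2 * h)


t = 4.0  # illustrative test point (positive, so the square root is defined)
for n in (2, 3, -1, 0.5):
    predicted = power_rule(n, t)
    measured = central_difference(lambda x: x ** n, t)
    print(f"n={n}: rule={predicted:.6f}, numeric={measured:.6f}")
```

The two columns should agree to several decimal places for whole, negative, and fractional n alike.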
Two linearity rules
| Rule | Statement | Why |
|---|---|---|
| Constant multiple | d/dt(c·f) = c·d/dt(f) | Stretching a graph vertically by c scales every slope by c. |
| Sum | d/dt(f+g) = d/dt(f)+d/dt(g) | Rates of a sum are the sum of the rates. |
A constant has derivative 0 (it never changes).
Worked polynomial
d/dt( 3t^4 + 2t^2 - 7 )
= 3·4t^3 + 2·2t + 0
= 12t^3 + 4t
Three mechanical lines, no binomial expansion.
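The same term-by-term procedure is easy to mechanize. The sketch below assumes a polynomial is stored as a dictionary mapping each exponent to its coefficient; that representation and the name `differentiate` are illustrative choices.

```python
def differentiate(poly):
    """Differentiate a polynomial given as {exponent: coefficient}."""
    derivative = {}
    for exponent, coefficient in poly.items():
        if exponent == 0:
            continue  # a constant term has derivative 0
        # Constant-multiple rule plus power rule, applied to one term.
        derivative[exponent - 1] = coefficient * exponent
    # Sum rule: the per-term results are simply collected.
    return derivative


poly = {4: 3, 2: 2, 0: -7}    # 3t^4 + 2t^2 - 7
print(differentiate(poly))    # {3: 12, 1: 4}, i.e. 12t^3 + 4t
```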
Why it matters for AI
These rules are what automatic differentiation applies to compute gradients, chaining the derivative of each elementary operation through the network. The power rule is everywhere because squaring is: mean squared error has terms (prediction - target)^2, whose derivative with respect to the prediction is 2·(prediction - target), so the gradient is proportional to the error.
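To make the chaining concrete, here is a toy forward-mode sketch in which each number carries its derivative alongside its value. The `Dual` class and `squared_error` function are hypothetical and written only for this illustration; real frameworks typically use reverse mode and far more machinery.

```python
class Dual:
    """A value paired with its derivative (toy forward-mode autodiff)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __sub__(self, other):
        # Subtracting a plain constant: the constant's derivative is 0.
        return Dual(self.value - other, self.deriv)

    def __pow__(self, n):
        # Power rule, multiplied by the derivative of the inner quantity.
        return Dual(self.value ** n, n * self.value ** (n - 1) * self.deriv)


def squared_error(prediction, target):
    return (prediction - target) ** 2


# deriv=1.0 marks the prediction as the variable we differentiate against.
prediction = Dual(3.0, 1.0)
loss = squared_error(prediction, target=5.0)
print(loss.value, loss.deriv)   # 4.0 -4.0
```

The derivative -4.0 equals 2·(3 - 5), matching 2·(prediction - target).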
Pitfalls to dodge
- Memorizing without the picture. n = faces that grow, t^(n-1) = each face’s size; rederive from the square/cube.
- Keeping the corner term. The dt^2 / dt^3 pieces vanish; the strips/slabs survive.
- Forgetting a constant’s derivative is 0. Shifting a graph up/down changes no slopes.
- Thinking it is whole-powers only. Holds for negative and fractional n too.
The one-line version
The power rule d/dt(t^n) = n·t^(n-1) is just “how many faces grow, times each face’s size” when you nudge a cube’s side, and the constant-multiple and sum rules let it differentiate any polynomial on sight.