Cheatsheet: The power rule from geometry

d/dt( t^n ) = n · t^(n-1)

The n counts the faces that grow when you nudge the side length t; the t^(n-1) is the size of each face.

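| Function | Picture        | Growth when side -> t + dt               | Derivative |
|----------|----------------|------------------------------------------|------------|
| t^2      | square, side t | 2 strips (t·dt each) + corner (dt^2)     | 2t         |
| t^3      | cube, side t   | 3 slabs (t^2·dt each) + smaller terms    | 3t^2       |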

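Divide the growth by dt: each strip or slab leaves a term with no dt in it (t or t^2), while the corner and edge terms (dt^2, dt^3) still carry at least one factor of dt. As dt -> 0 those leftover terms vanish, so they drop; the strips and slabs survive.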
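A small numerical sketch of the square case may make this concrete. The helper name `square_growth` and the sample values of t and dt are illustrative choices, not part of the cheatsheet; the code only splits (t + dt)^2 - t^2 into its strip and corner pieces and divides each by dt.

```python
def square_growth(t, dt):
    """Split the growth of a t-by-t square into its strips and its corner."""
    strips = 2 * t * dt   # two strips, each t by dt
    corner = dt * dt      # one dt-by-dt corner
    return strips, corner

t = 3.0  # illustrative side length
for dt in (0.1, 0.01, 0.001):
    strips, corner = square_growth(t, dt)
    total = (t + dt) ** 2 - t ** 2
    # Dividing by dt turns growth into an average rate of change.
    print(dt, total / dt, strips / dt, corner / dt)
```

In each row the strip column stays at 2t while the corner column equals dt itself, so it shrinks toward 0 along with the nudge.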

d/dt( 1/t ) = d/dt( t^(-1) ) = -1·t^(-2) = -1/t^2
d/dt( √t ) = d/dt( t^(1/2) ) = (1/2)·t^(-1/2) = 1/(2√t)

Negative and fractional exponents follow the same rule.
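One way to convince yourself is to compare the rule against a finite-difference slope. This is a rough check under assumed values (t = 4, step h = 1e-6), and the function names `power_rule` and `central_difference` are made up for the illustration.

```python
def power_rule(n, t):
    """Derivative of t**n according to the power rule."""
    return n * t ** (n - 1)

def central_difference(f, t, h=1e-6):
    """Numerical slope of f at t from a small symmetric nudge."""
    return (f(t + h) - f(t - h)) / (2 * h)

t = 4.0  # illustrative point; must be positive for the square root
for n in (-1, 0.5):
    numeric = central_difference(lambda x: x ** n, t)
    # The two values should agree to several decimal places.
    print(n, power_rule(n, t), numeric)
```

The numerical slope is only an approximation, so expect agreement to several digits rather than exact equality.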

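| Rule              | Statement                       | Why                                                           |
|-------------------|---------------------------------|---------------------------------------------------------------|
| Constant multiple | d/dt(c·f) = c·d/dt(f)           | Stretching a graph vertically by c scales every slope by c.   |
| Sum               | d/dt(f+g) = d/dt(f) + d/dt(g)   | Rates of a sum are the sum of the rates.                      |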

A constant has derivative 0 (it never changes).

d/dt( 3t^4 + 2t^2 - 7 )
= 3·4t^3 + 2·2t + 0
= 12t^3 + 4t

Three mechanical lines, no binomial expansion.
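The same three rules can be written as a few lines of code. The sketch below assumes a polynomial is stored as a dictionary mapping exponent to coefficient; that representation and the name `differentiate` are choices made for this example only.

```python
def differentiate(poly):
    """Differentiate a polynomial given as {exponent: coefficient}."""
    result = {}
    for exponent, coeff in poly.items():
        if exponent == 0:
            continue  # a constant term has derivative 0
        # Constant-multiple rule and power rule applied to one term;
        # the sum rule lets us handle the terms one at a time.
        new_exp = exponent - 1
        result[new_exp] = result.get(new_exp, 0) + coeff * exponent
    return result

p = {4: 3, 2: 2, 0: -7}   # 3t^4 + 2t^2 - 7
print(differentiate(p))
```

For the polynomial above the result holds coefficient 12 at exponent 3 and coefficient 4 at exponent 1, which is 12t^3 + 4t.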

These rules are what automatic differentiation applies to compute gradients, chaining the derivative of each elementary operation through the network. The power rule is everywhere because squaring is: mean squared error has terms (prediction - target)^2, whose derivative with respect to the prediction is 2·(prediction - target), so the gradient is proportional to the error.
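As a hedged illustration of the squared-error case, the sketch below treats a single prediction as the thing being adjusted and takes a few gradient-descent steps. The starting values, the learning rate `lr`, and the function names are assumptions for the example; a real framework would compute the gradient automatically rather than by a hand-written function.

```python
def squared_error(prediction, target):
    return (prediction - target) ** 2

def squared_error_grad(prediction, target):
    # Power rule on u^2 with u = prediction - target;
    # the derivative of u with respect to prediction is 1.
    return 2 * (prediction - target)

prediction, target, lr = 5.0, 2.0, 0.1  # illustrative values
for step in range(3):
    grad = squared_error_grad(prediction, target)
    prediction -= lr * grad  # step against the gradient
    print(step, prediction, squared_error(prediction, target))
```

Because the gradient is proportional to the error, the steps are large while the prediction is far from the target and shrink as it gets closer.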

  • Memorizing without the picture. n = faces that grow, t^(n-1) = each face’s size; rederive from the square/cube.
  • Keeping the corner term. The dt^2/dt^3 pieces vanish; the strips/slabs survive.
  • Forgetting a constant’s derivative is 0. Shifting a graph up/down changes no slopes.
  • Thinking it is whole-powers only. Holds for negative and fractional n too.

The power rule d/dt(t^n) = n·t^(n-1) is just “how many faces grow, times each face’s size” when you nudge a cube’s side, and the constant-multiple and sum rules let it differentiate any polynomial on sight.