Practice: The power rule from geometry
Self-check
Six short questions. Answer each one in your head (or on paper) before looking at the answer beneath it. The effort of retrieval is what makes the learning stick; rereading feels productive but does much less.
1. State the power rule, and say what the n and the t^(n-1) mean geometrically.
d/dt(t^n) = n · t^(n-1). Picture t^n as the volume of an n-dimensional cube of side t. Grow the side from t to t + dt and the cube gains a thin slab on n of its faces (one per dimension). Each slab is one face, of size t^(n-1), times the thickness dt; the leftover edge and corner pieces carry dt² or higher powers and vanish in the limit. The n counts the faces that grow; the t^(n-1) is the size of each face.
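To check the slab picture without any rounding, here is a small Python sketch (the helper name `slab_split` is hypothetical). It expands (t + dt)^n - t^n with the binomial theorem using exact fractions, separates the first-order slab term n · t^(n-1) · dt from the rest, and prints how the leftover, divided by dt, behaves as dt shrinks.

```python
from fractions import Fraction
from math import comb

def slab_split(t, dt, n):
    """Split the volume an n-cube of side t gains when its side grows to t + dt.

    Returns (slabs, leftover): the n face slabs, which are first order in dt,
    and the edge/corner pieces, which carry dt**2 or higher powers.
    """
    added = (t + dt) ** n - t ** n        # exact added volume
    slabs = n * t ** (n - 1) * dt         # n faces, each of size t^(n-1), times thickness dt
    # Remaining binomial terms: C(n, k) * t^(n-k) * dt^k for k >= 2
    leftover = sum(comb(n, k) * t ** (n - k) * dt ** k for k in range(2, n + 1))
    assert added == slabs + leftover      # exact with Fractions, no rounding
    return slabs, leftover

t = Fraction(2)
for n in (2, 3, 4):
    for dt in (Fraction(1, 10), Fraction(1, 1000)):
        slabs, leftover = slab_split(t, dt, n)
        rate = (slabs + leftover) / dt    # average rate of volume growth
        print(f"n={n} dt={dt}: rate={float(rate):.6f}  "
              f"n*t^(n-1)={n * t ** (n - 1)}  leftover/dt={float(leftover / dt):.6f}")
```

The leftover per unit dt falls roughly in proportion to dt, so only the slab term survives in the limit and the rate approaches n · t^(n-1).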
2. Using the growing-square picture, why is the derivative of t² equal to 2t?
t² is the area of a square of side t. Nudge the side to t + dt and you add a thin strip along the top (t · dt) and an identical strip along the right (t · dt), plus a tiny corner square (dt²). Added area = 2t·dt + dt²; divide by dt to get 2t + dt; let dt -> 0 and the corner vanishes, leaving 2t. The 2 is the two strips, the t is each strip’s length.
3. Why is it safe to drop the corner term (dt²) but not the strips?
The two strips add exactly 2t · dt, which is proportional to dt (first order). The corner adds dt², the product of two shrinking quantities, which vanishes much faster: at t = 1 and dt = 0.001, each strip is 0.001 while the corner is 0.000001, a thousand times smaller (in general the corner is dt/t times the size of one strip). After dividing by dt, the corner contributes only dt to the rate, so in the limit its share is exactly zero. Dropping higher-power dt terms is the precise statement that they vanish faster than the term you keep.
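The same comparison in a few lines of Python, assuming a unit square (t = 1) as in the example; `strip_and_corner` is a hypothetical helper. The ratio corner/strip equals dt/t, so it drops by a factor of ten each time dt does.

```python
def strip_and_corner(t, dt):
    """Return (one strip's area, corner area) added when a side-t square grows by dt."""
    strip = t * dt      # each of the two strips: first order in dt
    corner = dt * dt    # the corner square: second order in dt
    return strip, corner

t = 1.0
for dt in (0.1, 0.01, 0.001, 0.0001):
    strip, corner = strip_and_corner(t, dt)
    # corner / strip = dt / t, so the corner loses ground as dt shrinks
    print(f"dt={dt:<7} strip={strip:.7f} corner={corner:.9f} ratio={corner / strip:.4g}")
```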
4. State the two linearity rules and why each is intuitive.
Constant-multiple: d/dt(c·f) = c · d/dt(f) (stretching a graph vertically by c scales every slope by c). Sum: d/dt(f + g) = d/dt(f) + d/dt(g) (the rate of a total is the sum of the rates). Together with the power rule, they differentiate any polynomial term by term.
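The two linearity rules are what let a program differentiate a polynomial straight from its coefficients. A minimal sketch, assuming the polynomial is stored as a list `coeffs` where `coeffs[k]` is the coefficient of t^k (a representation chosen for illustration; `differentiate` is a hypothetical name):

```python
def differentiate(coeffs):
    """Differentiate c0 + c1*t + c2*t^2 + ..., given as [c0, c1, c2, ...].

    Sum rule: treat each term on its own. Constant-multiple rule: keep c_k.
    Power rule: t^k -> k * t^(k-1). The constant c0 has k = 0, so it drops out.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]

# 3t^4 + 2t^2 - 7 is stored as [-7, 0, 2, 0, 3]
print(differentiate([-7, 0, 2, 0, 3]))  # [0, 4, 0, 12], i.e. 4t + 12t³
```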
5. Does the power rule work for negative and fractional powers? Give the derivatives of 1/t and √t.
Yes, one rule for every power. 1/t = t^(-1) gives -1 · t^(-2) = -1/t² (negative, since 1/t falls as t grows). √t = t^(1/2) gives (1/2) · t^(-1/2) = 1/(2√t). Rewrite the function as a power, then apply n · t^(n-1).
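A quick numerical check of both results, as a sketch: it compares a symmetric difference quotient (f(t + h) - f(t - h)) / 2h, a standard approximation chosen here for its accuracy, against the power-rule formulas at an arbitrary point t = 4. The helper name `central_diff` and the step h = 1e-5 are illustrative choices.

```python
import math

def central_diff(f, t, h=1e-5):
    """Approximate f'(t) with the symmetric difference quotient."""
    return (f(t + h) - f(t - h)) / (2 * h)

t = 4.0
checks = [
    ("1/t", lambda x: 1 / x, -1 / t ** 2),                 # t^(-1) -> -1 * t^(-2)
    ("sqrt(t)", math.sqrt, 1 / (2 * math.sqrt(t))),        # t^(1/2) -> (1/2) * t^(-1/2)
]
for name, f, rule in checks:
    print(f"{name:8} numeric={central_diff(f, t):.8f}  power rule={rule:.8f}")
```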
6. What is the derivative of a constant, and why?
Zero. A constant never changes, so its rate of change is zero; shifting a graph up or down changes none of its slopes. In a polynomial, the constant term simply drops out when you differentiate (e.g. the -7 in 3t⁴ + 2t² - 7).
Try it yourself, part 1: differentiate on sight
Pen and paper, about 6 minutes. Use the power rule plus the constant-multiple and sum rules, term by term, no binomial expansion.
(a) 5t³ - 2t² + 4t - 9
(b) 2√t + 3/t² (hint: rewrite as 2t^(1/2) + 3t^(-2) first)
(a) Term by term:
d/dt(5t³) = 5 · 3t² = 15t²
d/dt(-2t²) = -2 · 2t = -4t
d/dt(4t) = 4 · 1 = 4
d/dt(-9) = 0
sum: 15t² - 4t + 4
(b) Rewrite, then apply n · t^(n-1):
d/dt(2t^(1/2)) = 2 · (1/2)·t^(-1/2) = t^(-1/2) = 1/√t
d/dt(3t^(-2)) = 3 · (-2)·t^(-3) = -6t^(-3) = -6/t³
sum: 1/√t - 6/t³
The fractional power gives a fractional power back; the negative power gives a negative derivative (since 3/t² falls as t grows). One rule, applied term by term.
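The term-by-term procedure can be sketched in Python by storing each term c · t^n as a pair (c, n), with exact fractions for fractional exponents. This representation and the helper names `d_terms` and `evaluate` are assumptions for illustration; the numerical comparison at t = 2 is only a sanity check on (b).

```python
from fractions import Fraction

def d_terms(terms):
    """Differentiate a sum of c * t**n, given as [(c, n), ...], via n * t**(n-1).

    Constant terms (n == 0) differentiate to zero, so they are dropped.
    """
    return [(c * n, n - 1) for c, n in terms if n != 0]

def evaluate(terms, t):
    """Evaluate a sum of c * t**n at a positive float t."""
    return sum(float(c) * t ** float(n) for c, n in terms)

# (a) 5t^3 - 2t^2 + 4t - 9
f_a = [(5, 3), (-2, 2), (4, 1), (-9, 0)]
print(d_terms(f_a))  # [(15, 2), (-4, 1), (4, 0)], i.e. 15t² - 4t + 4

# (b) 2*sqrt(t) + 3/t^2, rewritten as 2 t^(1/2) + 3 t^(-2)
f_b = [(2, Fraction(1, 2)), (3, -2)]
df_b = d_terms(f_b)  # represents 1 * t^(-1/2) - 6 * t^(-3)

# Sanity check: compare with a symmetric difference quotient at t = 2
t, h = 2.0, 1e-5
numeric = (evaluate(f_b, t + h) - evaluate(f_b, t - h)) / (2 * h)
print(numeric, evaluate(df_b, t))
```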
Try it yourself, part 2: watch the square grow
About 4 minutes, arithmetic only. Confirm the growing-square reasoning numerically for t² at t = 5, nudging by dt = 0.01.
Steps. (1) Compute the added area (5.01)² - 5² directly. (2) Identify how much of that is the two strips (2 · 5 · 0.01) versus the corner (0.01²). (3) Divide the added area by dt = 0.01 to get the average rate, and compare it to the power-rule answer 2t = 10.
added area = (5.01)² - 5² = 25.1001 - 25 = 0.1001
two strips: 2 · 5 · 0.01 = 0.10 (first order in dt; the bulk)
corner: 0.01² = 0.0001 (tiny; vanishes fastest)
average rate = 0.1001 / 0.01 = 10.01
The power rule says d/dt(t²) = 2t = 2·5 = 10. The numeric rate 10.01 is just above it, and the excess (0.01) is exactly the corner’s contribution (dt). Shrink dt further and the rate closes in on 10: the strips give the 2t, and the corner is the vanishing leftover. You just saw “2 strips + corner” in actual numbers.
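The same bookkeeping with exact fractions, so no floating-point rounding blurs the split. A minimal sketch; `square_growth` is a hypothetical helper, and the extra dt values show the rate closing in on 10.

```python
from fractions import Fraction

def square_growth(t, dt):
    """Return (added area, two strips, corner, average rate) for a side-t square grown by dt."""
    added = (t + dt) ** 2 - t ** 2   # total new area
    strips = 2 * t * dt              # two strips of length t and width dt
    corner = dt ** 2                 # the small corner square
    rate = added / dt                # equals 2t + dt exactly
    return added, strips, corner, rate

t = Fraction(5)
for dt in (Fraction(1, 100), Fraction(1, 1000), Fraction(1, 10000)):
    added, strips, corner, rate = square_growth(t, dt)
    assert added == strips + corner  # the split is exact
    print(f"dt={dt}: added={float(added)} strips={float(strips)} "
          f"corner={float(corner)} rate={float(rate)}")
```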
Flashcards
Nine cards.
Q. What is the power rule?
d/dt(t^n) = n · t^(n-1). The n counts the faces of an n-dimensional cube that grow when you lengthen its side; the t^(n-1) is the size of each face. One of the most-used facts in differentiation.
Q. Why is the derivative of t² equal to 2t (growing square)?
Nudging a side-t square to t + dt adds two strips (t·dt each) plus a corner (dt²). Added area = 2t·dt + dt²; over dt that is 2t + dt; as dt -> 0 the corner vanishes, leaving 2t. Two strips, each of length t.
Q. Why is the derivative of t³ equal to 3t² (growing cube)?
Nudging a side-t cube adds a thin slab (t²·dt) on each of three faces, plus edge/corner terms in dt² and dt³ that vanish faster. Added volume ≈ 3t²·dt; over dt and as dt -> 0, that is 3t². Three faces, each of area t².
Q. Why drop the corner term but keep the strips?
The strips are first order in dt (~2t·dt) and survive after dividing by dt; the corner is higher order (dt²) and vanishes faster as dt -> 0. Dropping higher-power dt terms is the precise statement that they vanish faster than the term you keep.
Q. What are the constant-multiple and sum rules?
Constant-multiple: d/dt(c·f) = c·d/dt(f) (scaling a graph vertically by c scales every slope by c). Sum: d/dt(f+g) = d/dt(f) + d/dt(g) (the rate of a sum is the sum of the rates).
Q. Does the power rule work for negative and fractional powers?
Yes. 1/t = t^(-1) gives -1·t^(-2) = -1/t²; √t = t^(1/2) gives (1/2)·t^(-1/2) = 1/(2√t). Rewrite the function as a power, then apply n·t^(n-1).
Q. What is the derivative of a constant, and why?
Zero. A constant never changes, so it has no rate of change; shifting a graph up or down changes none of its slopes. In a polynomial the constant term drops out.
Q. Differentiate 3t⁴ + 2t² - 7.
Term by term: 3·4t³ = 12t³, 2·2t = 4t, and the -7 gives 0. Sum: 12t³ + 4t. Three mechanical lines, no binomial expansion.
Q. Why is the power rule everywhere in machine learning?
Because squared terms are common: mean squared error is built from terms (prediction - target)², and by the power rule each term’s derivative with respect to the prediction is 2·(prediction - target). So the gradient signal is proportional to the error, which is why squared-error training makes bigger corrections where the model is more wrong.
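A minimal sketch of that idea, assuming a one-parameter model prediction = w · x, made-up data on the line y = 2x, and plain gradient descent with a hand-picked learning rate (all chosen for illustration, not taken from any library). The power rule supplies 2·(prediction - target); the chain rule adds the factor x because the prediction depends on w through w · x.

```python
def mse_and_grad(w, xs, ys):
    """Mean squared error of the model y_hat = w * x, and its derivative with respect to w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    # Power rule: d/d(pred) of (pred - y)^2 is 2 * (pred - y).
    # Chain rule: d(pred)/dw = x, so each term contributes 2 * (w*x - y) * x.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # hypothetical data lying on y = 2x
w, lr = 0.0, 0.05      # starting weight and learning rate, picked by hand
for _ in range(50):
    loss, grad = mse_and_grad(w, xs, ys)
    w -= lr * grad     # larger error -> larger gradient -> larger correction
print(round(w, 4))
```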