Cheatsheet: The derivative as a rate
The paradox and the fix
- Paradox: a derivative is the “rate of change at an instant,” but nothing changes in zero time.
- Fix: the derivative is the value the average rate approaches as the measuring interval shrinks toward zero. Not “rate at an instant”; “the limit of the rate as the span vanishes.”
Rise over run, with the run shrinking
average rate over [t, t+dt] = (change in quantity) / dt
derivative = limit of that ratio as dt -> 0
Worked: free-fall velocity, s(t) = 16t^2
avg velocity = (16(t+dt)^2 - 16t^2) / dt
             = (32t·dt + 16·dt^2) / dt
             = 32t + 16·dt
as dt -> 0: instantaneous velocity = 32t
At t = 2: 32·2 = 64 ft/s.
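A minimal numerical sketch of the same calculation: it evaluates the average rate for smaller and smaller runs and lets you watch it settle toward 32t. The function names and the particular dt values are illustrative choices, not part of the lesson.

```python
def s(t):
    """Position in feet after t seconds of free fall: s(t) = 16 t^2."""
    return 16 * t**2


def average_velocity(t, dt):
    """Average rate over [t, t + dt]: change in position divided by dt."""
    return (s(t + dt) - s(t)) / dt


# Shrink the run at t = 2; the average rate is 64 + 16*dt, so it approaches 64.
for dt in (1.0, 0.1, 0.01, 0.001):
    print(f"dt = {dt:<6} average velocity = {average_velocity(2.0, dt):.4f} ft/s")
```

No finite dt ever gives exactly 64; the derivative is the value the printed numbers are heading toward.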
Worked: derivative of s(t) = t^3 from scratch
(t+dt)^3 - t^3 = 3t^2·dt + 3t·dt^2 + dt^3
divide by dt: 3t^2 + 3t·dt + dt^2
as dt -> 0: s'(t) = 3t^2
Shrinking dt does not complicate the calculation; it cleans it up (the terms that still contain dt vanish). Cubic to quadratic. (The power rule, named next lesson.)
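The expansion can be checked exactly, with no rounding error, by doing the arithmetic in rationals. This sketch assumes Python's standard `fractions` module; the helper names are hypothetical.

```python
from fractions import Fraction


def difference_quotient(t, dt):
    """((t + dt)^3 - t^3) / dt, computed exactly with rationals."""
    return ((t + dt) ** 3 - t**3) / dt


def simplified(t, dt):
    """The same ratio after expanding and cancelling dt by hand."""
    return 3 * t**2 + 3 * t * dt + dt**2


t = Fraction(2)
for dt in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    exact = difference_quotient(t, dt)
    assert exact == simplified(t, dt)  # the hand algebra matches exactly
    print(f"dt = {dt}: ratio = {exact}, gap from 3t^2 = {exact - 3 * t**2}")
```

The gap from 3t^2 is exactly 3t·dt + dt^2, which is the part that vanishes as dt -> 0.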
Geometric: secant to tangent
Average rate over [t, t+dt] = slope of the secant line through two points. As dt -> 0 the two points merge and the secant rotates into the tangent line. The derivative is the slope of the tangent at a point.
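To see the rotation in numbers, the sketch below builds the secant line through two points on the curve t^3 and prints its equation as the points close in. The choice of curve and of the point t0 = 1 is only for illustration; there the tangent line is y = 3t - 2.

```python
def f(t):
    """Example curve, reusing t^3 from the worked derivative."""
    return t**3


def secant_line(t0, dt):
    """Slope and intercept of the line through (t0, f(t0)) and (t0 + dt, f(t0 + dt))."""
    slope = (f(t0 + dt) - f(t0)) / dt
    intercept = f(t0) - slope * t0
    return slope, intercept


# As dt shrinks, both slope and intercept drift toward the tangent's 3 and -2.
for dt in (1.0, 0.1, 0.001):
    slope, intercept = secant_line(1.0, dt)
    print(f"dt = {dt:<6} secant: y = {slope:.4f} t + ({intercept:.4f})")
```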
What dy/dx means
Shorthand for the limit, not a fraction of infinitesimals. “The rate at which y changes with x,” computed as rise-over-run with the run shrinking to zero. The derivative is itself a function: position -> velocity at every instant.
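The "derivative is itself a function" point can be sketched as a function that takes a function and returns another. This is an approximation under a stated assumption: the default run of 1e-6 is an arbitrary small value standing in for the limit, and `derivative` is a hypothetical helper, not a library call.

```python
def derivative(f, dt=1e-6):
    """Return a new function approximating f' using a small but nonzero run dt."""
    def f_prime(t):
        return (f(t + dt) - f(t)) / dt
    return f_prime


def position(t):
    return 16 * t**2  # feet after t seconds, from the free-fall example


velocity = derivative(position)  # a function of t, not a single number

for t in (0.0, 1.0, 2.0, 3.0):
    print(f"t = {t}: velocity ~ {velocity(t):.3f} ft/s (exact 32t = {32 * t})")
```

Calling `velocity(2.0)` is what turns the function into a number.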
Why it matters for AI
Training follows the derivative of the loss downhill: “if I nudge this parameter a little, how much does the loss change” is rise-over-run as the run shrinks. The gradient is a vector of such derivatives (one per parameter), recomputed each step; automatic differentiation gets the exact limit by applying derivative rules through the network.
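A toy sketch of "nudge each parameter and step downhill". Everything here is made up for illustration: the two-parameter loss, the nudge size h, and the learning rate. It estimates the gradient by finite differences to make the rise-over-run visible; real training uses automatic differentiation instead, as noted above.

```python
def loss(params):
    """Toy loss with its minimum at w = 3, b = -1 (a made-up example)."""
    w, b = params
    return (w - 3.0) ** 2 + (b + 1.0) ** 2


def numerical_gradient(f, params, h=1e-6):
    """One rise-over-run estimate per parameter, nudging each by h in turn."""
    base = f(params)
    grad = []
    for i in range(len(params)):
        nudged = list(params)
        nudged[i] += h
        grad.append((f(nudged) - base) / h)
    return grad


params = [0.0, 0.0]
learning_rate = 0.1
for step in range(100):
    grad = numerical_gradient(loss, params)  # recomputed each step
    params = [p - learning_rate * g for p, g in zip(params, grad)]

print(params)  # should land close to [3.0, -1.0]
```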
Pitfalls to dodge
- Reading dy/dx as a fraction of infinitesimals. It is limit notation.
- “Rate at an instant.” It is the rate the average approaches as the span vanishes.
- Forgetting it is a function. s'(t) gives a slope at every point; “at t=2” gives a number.
- Plugging dt = 0 directly. That is 0/0. Simplify first, then take dt -> 0.
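The last pitfall is easy to reproduce. In this sketch (function names are illustrative), the raw ratio for s(t) = 16t^2 fails at dt = 0, while the form simplified by hand evaluates cleanly to the limit.

```python
def average_velocity(t, dt):
    """Unsimplified ratio for s(t) = 16 t^2; undefined at dt = 0."""
    return (16 * (t + dt) ** 2 - 16 * t**2) / dt


def simplified_velocity(t, dt):
    """Same ratio after cancelling dt by hand: 32 t + 16 dt."""
    return 32 * t + 16 * dt


try:
    average_velocity(2.0, 0.0)
except ZeroDivisionError:
    print("dt = 0 in the raw ratio is 0/0: division by zero")

print(simplified_velocity(2.0, 0.0))  # the limit value, 32 * 2
```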
The one-line version
A derivative is the limit of rise over run as the run shrinks to zero, which is the slope of the tangent line, and dy/dx is shorthand for that limit, not a fraction.