Skip to content

Summary: Higher-order derivatives

A derivative is itself a function, so you can differentiate it again. The second derivative f'' measures how the slope is changing, which turns out to mean acceleration in physics and curvature on a graph. It is the tool that tells a maximum from a minimum, and it underwrites the second-order methods that improve on plain gradient descent. This is the scan-it-in-five-minutes version.

  • Higher derivatives are derivatives of derivatives. f -> f' -> f'' -> f''' -> ...; in Leibniz, d²f/dx², d³f/dx³, and so on. Differentiating x⁴ repeatedly gives 4x³, 12x², 24x, 24, 0: every polynomial eventually flattens to zero, which the Taylor series in the next lesson exploits.
  • In physics, the second derivative of position is acceleration. s(t) -> s'(t) = velocity -> s''(t) = acceleration. Newton’s F = ma is mass times this second derivative; classical mechanics is written in second derivatives. (The third is jerk.) For free-fall s(t) = -16t² + 64t + 32, velocity is -32t + 64, acceleration is the constant -32 (gravity), and at the top of the arc (t = 2, height 96 ft) the velocity is zero but the acceleration is still -32.
  • On a graph, the second derivative is curvature. f'' > 0: graph cups upward (smiling). f'' < 0: cups downward (frowning). f'' = 0 with a sign change: inflection point. The picture: positive curvature is a cup that holds water; negative curvature spills it.
  • The second-derivative test classifies critical points. At f'(x) = 0: f'' > 0 is a local minimum (cup holds water), f'' < 0 is a local maximum, f'' = 0 is inconclusive. For f(x) = x³ - 3x: critical points at x = ±1; f'' = 6x gives a min at (1, -2) and a max at (-1, 2), with inflection at the origin.
  • Two signature cases. sin'' = -sin (the oscillation equation f'' = -f that governs springs, pendulums, sound, AC, light); and every derivative of e^x, to any order, is e^x (the relentless self-reproduction that makes its Taylor series clean).

The second derivative stops being a curiosity (“the derivative of the derivative”) and becomes the precise object behind two intuitive ideas: acceleration in time, curvature in space. With it you can map the full shape of a curve from two derivatives, sort the hills from the valleys without plotting, and recognize that the oscillation equation f'' = -f is why sine governs everything that swings or waves. In machine learning, the second derivative is the engine of second-order optimization: Newton’s method uses the Hessian (the matrix of all second partial derivatives) to take better-informed steps than plain gradient descent, Adam keeps a running estimate of how the gradient is changing as an informal curvature signal, K-FAC approximates the Hessian for efficient training, and the analysis of a model’s loss landscape, saddle points, flat basins, sharp ridges, is entirely a story about second derivatives. The track’s final lesson takes higher derivatives to their natural limit: using the whole tower of them at a single point to rebuild a function as a polynomial, the Taylor series.