Higher-order derivatives

Several lessons ago, a quiet point slipped by: the derivative of a function is itself a function. Differentiating the position function 16 times time squared did not give a number; it gave 32 times time, a new function with a value at every point. And if the first derivative is a function, you can differentiate it, getting the second derivative. Differentiate again for the third derivative, and so on. This lesson is about what those higher derivatives mean, because the second one in particular has two vivid interpretations and one very practical job.

Notation

After the first derivative, written with one prime, the higher ones stack up as two primes for the second, three primes for the third, and then, since the primes get unwieldy, a parenthesized four for the fourth, a parenthesized five for the fifth, and onward. The Leibniz notation writes the second derivative as d-squared-f over d-x-squared, the third with a three in place of the two, and so on. The two notations mean the same thing and you will see both, often in the same text. Read the second derivative as “the derivative of the derivative of the function.”
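Written out in symbols, the two notations described above look roughly like this. This is only a typeset restatement of the paragraph, with f and x as the generic function and input:

```latex
\[
\begin{aligned}
\text{Prime notation:}   &\quad f'(x),\; f''(x),\; f'''(x),\; f^{(4)}(x),\; f^{(5)}(x),\; \dots \\
\text{Leibniz notation:} &\quad \frac{df}{dx},\; \frac{d^{2}f}{dx^{2}},\; \frac{d^{3}f}{dx^{3}},\; \frac{d^{4}f}{dx^{4}},\; \dots \\
\text{Same object:}      &\quad f''(x) \;=\; \frac{d}{dx}\!\left(\frac{df}{dx}\right) \;=\; \frac{d^{2}f}{dx^{2}}
\end{aligned}
\]
```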

To see the tower in action, differentiate the function x to the fourth power repeatedly with the power rule:

f(x)    = x^4
f'(x)   = 4x^3
f''(x)  = 12x^2
f'''(x) = 24x
f^(4)(x) = 24
f^(5)(x) = 0

Each derivative drops the exponent by one, so after four derivatives the degree-4 polynomial has flattened to a constant; differentiate once more and it is zero. Every polynomial eventually differentiates to zero this way: one of degree n vanishes after n plus 1 derivatives. That fact, that a polynomial is fully described by a finite tower of derivatives, is exactly what the Taylor series in the next lesson will exploit.
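The tower can also be built mechanically. The following is a minimal sketch, assuming a polynomial is stored as a list of coefficients from the constant term upward; the helper names differentiate and derivative_tower are made up for this illustration and are not from any library.

```python
def differentiate(coeffs):
    """Power rule on a coefficient list [c0, c1, c2, ...] meaning
    c0 + c1*x + c2*x^2 + ...; returns the derivative's coefficients.
    An empty list stands for the zero polynomial."""
    # The term c_k * x^k becomes k * c_k * x^(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]


def derivative_tower(coeffs):
    """Differentiate repeatedly until the polynomial has flattened to zero."""
    tower = [coeffs]
    while tower[-1]:  # stop once we reach the empty (zero) polynomial
        tower.append(differentiate(tower[-1]))
    return tower


x_fourth = [0, 0, 0, 0, 1]  # x^4
for order, poly in enumerate(derivative_tower(x_fourth)):
    print(order, poly)
```

Reading each printed list from the constant term upward recovers the table above: [0, 0, 0, 4] is 4x^3, [24] is the constant 24, and the final empty list is zero.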

What the second derivative means in physics

The cleanest interpretation comes from motion. If the position function gives position over time, then its first derivative is velocity (the rate position changes), and its second derivative is acceleration (the rate velocity changes). This is not a metaphor: Newton’s law that force equals mass times acceleration is the same as saying force equals mass times the second derivative of position. Much of classical mechanics is written in second derivatives.

There is even a name for the third derivative of position, the rate at which acceleration changes: jerk. It is what you feel when a car’s braking suddenly eases or tightens, and motion-planning systems for robots and elevators limit it to keep rides smooth. Past that, higher derivatives of position rarely get used, but the pattern is clear: each derivative is the rate of change of the one before.

Worked example. Throw a ball upward with height given by negative 16 times time squared, plus 64 times time, plus 32 (height in feet, time in seconds). Differentiate once for velocity, which gives negative 32 times time plus 64. Differentiate again for acceleration, which gives negative 32, a constant. That constant is gravity, always pulling down at the same rate regardless of where the ball is. At time 2 seconds, the velocity works out to negative 32 times 2 plus 64, which equals 0: the ball is momentarily still, at the top of its arc, at a height of negative 64 plus 128 plus 32, which is 96 feet. The acceleration there is still negative 32, because gravity never lets up even at the peak. Three functions, three layers of the motion: where the ball is, how fast it is moving, and how its velocity is changing.
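The arithmetic of the worked example can be checked with a few lines of Python. This is a small sketch, with the three functions typed in by hand from the derivatives above; the names height, velocity, and acceleration are chosen here for readability.

```python
def height(t):
    """Height in feet after t seconds."""
    return -16 * t**2 + 64 * t + 32


def velocity(t):
    """First derivative of height, in feet per second."""
    return -32 * t + 64


def acceleration(t):
    """Second derivative of height: a constant, in feet per second squared."""
    return -32


# Tabulate the three layers of the motion at a few times.
for t in (0, 1, 2, 3, 4):
    print(t, height(t), velocity(t), acceleration(t))
```

At t = 2 the row reads height 96, velocity 0, acceleration negative 32: the ball is at its peak, momentarily still, with gravity still acting.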

What the second derivative means on a graph: concavity

The first derivative is the slope of the function. The second derivative is the rate the slope is changing, which shows up visually as curvature, or concavity.

When the second derivative is positive: the slope is increasing, so the graph curves upward, like a cup or a smiling parabola.
When the second derivative is negative: the slope is decreasing, so the graph curves downward, like a frown.
When the second derivative is zero and changes sign there: the curvature flips from one kind to the other. That point is an inflection point, where the graph switches between cupping up and cupping down.

A concrete case: for y = x³ the second derivative is 6x, which is negative for x < 0, positive for x > 0, and exactly zero at the origin. The curve is concave down to the left (bending like the top of a hill) and concave up to the right (bending like the bottom of a valley), and the origin, where the sign changes, is the inflection point.
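When no formula for the second derivative is at hand, concavity can be estimated numerically. The sketch below uses the standard central-difference approximation; the step size h, the tolerance tol, and the helper names are assumptions made for this illustration, not tuned values.

```python
def second_derivative(f, x, h=1e-3):
    """Central-difference estimate of f''(x):
    (f(x + h) - 2 f(x) + f(x - h)) / h^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


def concavity(f, x, tol=1e-6):
    """Label the curve's shape at x from the sign of the estimated f''(x)."""
    d2 = second_derivative(f, x)
    if d2 > tol:
        return "concave up"
    if d2 < -tol:
        return "concave down"
    return "second derivative near zero"


def cube(x):
    return x**3


for x in (-1.0, 0.0, 1.0):
    print(x, concavity(cube, x))
```

For the cube this reports concave down at negative 1, concave up at 1, and a near-zero second derivative at the origin. A near-zero value alone does not prove an inflection point; it is the sign change on either side that does.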

The practical job: telling a max from a min

This is where the second derivative earns its keep. A point where the first derivative is zero is a critical point, where the graph is momentarily flat, which could be the top of a hill, the bottom of a valley, or a flat spot on a slope. The second derivative classifies it through the second-derivative test:

If the second derivative is positive at the critical point, the graph is cupping upward there, so it is a local minimum (the bottom of a valley).
If the second derivative is negative, the graph is cupping downward, so it is a local maximum (the top of a hill).
If the second derivative is zero, the test is inconclusive and you need to look closer.

A first, simple test. Take the parabola that is x squared. Its derivative is 2 times the input, zero only when the input is 0, so there is one critical point. The second derivative is 2, a positive constant, so the second derivative at zero is 2, which is greater than 0: the critical point is a minimum, the bottom of the bowl at the origin, the point 0 comma 0. And since the second derivative is positive everywhere, the parabola cups upward at every point, which matches what you already know about x squared. The test confirms the obvious, which is how you trust it on the cases that are not obvious.

A fuller example. Map the shape of the function x cubed minus 3 times the input. The first derivative is 3 times x squared, minus 3. Setting it to zero, 3 times x squared equals 3, so an input of 1 and an input of negative 1 are the critical points. The second derivative is 6 times the input. Test each:

At an input of 1: the second derivative is 6, which is greater than 0, so this is a local minimum, with value 1 minus 3, which equals negative 2.
At an input of negative 1: the second derivative is negative 6, which is less than 0, so this is a local maximum, with value negative 1 plus 3, which equals 2.

And the second derivative, 6 times the input, is zero when the input is 0, where it changes sign (negative to the left, positive to the right), so an input of 0 is an inflection point. From two derivatives, you have the full shape: a hill at the point negative 1 comma 2, a valley at the point 1 comma negative 2, and a curvature flip at the origin, without plotting a single extra point.
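The fuller example can be replayed in code. This is a minimal sketch of the second-derivative test, assuming the derivatives are entered by hand; the function names f, f1, f2, and classify are chosen for this illustration.

```python
def f(x):
    return x**3 - 3 * x


def f1(x):
    """First derivative of f."""
    return 3 * x**2 - 3


def f2(x):
    """Second derivative of f."""
    return 6 * x


def classify(second_derivative_value):
    """Second-derivative test at a critical point."""
    if second_derivative_value > 0:
        return "local minimum"
    if second_derivative_value < 0:
        return "local maximum"
    return "inconclusive"


for c in (-1, 1):
    assert f1(c) == 0  # confirm c really is a critical point
    print(c, f(c), classify(f2(c)))
```

The same classify helper returns "inconclusive" when handed a second derivative of exactly zero, which is the case where the test says nothing and you have to look closer.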

Two functions worth differentiating twice

Sine. Differentiate the sine function twice: its first derivative is cosine, and its second derivative is negative sine. So the second derivative of sine is the negative of the original function, and the same holds for cosine, whose second derivative is negative cosine. This is the oscillation equation, that the second derivative equals the negative of the function, the relationship the trig-derivatives lesson first hinted at. It describes anything whose acceleration points back toward the center in proportion to displacement, which is the idealized model behind springs, pendulums swinging through small angles, sound, light, and alternating current. The second derivative is where “why sine governs simple oscillation” becomes a precise statement.
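In symbols, the chain of reasoning looks roughly like this. The spring constant k, the mass m, and the amplitude and phase A and φ are labels assumed here for an idealized spring; they do not appear elsewhere in this lesson.

```latex
\[
\begin{aligned}
\frac{d}{dx}\sin x &= \cos x, &\qquad \frac{d^{2}}{dx^{2}}\sin x &= -\sin x, \\
\frac{d}{dx}\cos x &= -\sin x, &\qquad \frac{d^{2}}{dx^{2}}\cos x &= -\cos x.
\end{aligned}
\]
\[
\text{Oscillation equation: } f''(x) = -f(x).
\]
\[
\text{Idealized spring: } m\,x''(t) = -k\,x(t)
\;\Longrightarrow\;
x''(t) = -\frac{k}{m}\,x(t),
\qquad
x(t) = A\sin(\omega t + \varphi), \quad \omega = \sqrt{k/m}.
\]
```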

The exponential. Differentiate Euler’s number raised to the input and you get the same thing back; differentiate again and you get it again. Every derivative of Euler’s number raised to the input, to any order, is itself. This relentless self-reproduction is the deepest fact about the exponential, and it is exactly what will make its Taylor series, in the next lesson, come out so clean.

Why this matters when you use AI

The second derivative is the engine of second-order optimization. Where gradient descent uses only the first derivative (the slope) to decide which way to step, second-order methods like Newton’s method also use curvature, the second derivative, to take better-informed steps and converge much faster on well-behaved problems. In many variables this curvature information is packaged in the Hessian, the matrix of all second partial derivatives.
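A one-variable comparison makes the difference concrete. The sketch below minimizes x to the fourth minus 3x, a test function chosen for this illustration, once with plain gradient descent and once with Newton's method. The learning rate, starting point, and tolerance are arbitrary assumptions, and the helper names are hypothetical.

```python
def grad(x):
    """First derivative of the test function x^4 - 3x."""
    return 4 * x**3 - 3


def curvature(x):
    """Second derivative of the test function."""
    return 12 * x**2


def run(step, x, tol=1e-8, max_steps=10_000):
    """Apply `step` repeatedly until the slope is nearly zero; count the steps."""
    steps = 0
    while abs(grad(x)) > tol and steps < max_steps:
        x = step(x)
        steps += 1
    return x, steps


def gradient_descent_step(x, learning_rate=0.05):
    # Uses only the slope: move downhill by a fixed fraction of it.
    return x - learning_rate * grad(x)


def newton_step(x):
    # Also uses curvature: divide the slope by the second derivative.
    return x - grad(x) / curvature(x)


x_gd, n_gd = run(gradient_descent_step, 1.0)
x_newton, n_newton = run(newton_step, 1.0)
print("gradient descent:", x_gd, "in", n_gd, "steps")
print("Newton's method: ", x_newton, "in", n_newton, "steps")
```

Both runs should land near the cube root of three quarters, where the slope is zero, with Newton's method needing noticeably fewer steps on this smooth, convex example. On less friendly functions, for instance where the second derivative is zero or negative near the starting point, the Newton step can misbehave, which is part of why the method is described as fast on well-behaved problems.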

In deep learning the full Hessian is usually too large to compute directly, but second-order information leaks in everywhere. The widely used Adam optimizer keeps running averages of the gradient and of its square, and scales each parameter’s step by the latter, which acts as a rough, informal stand-in for curvature. Methods like K-FAC approximate a curvature matrix (the Fisher information matrix, a close relative of the Hessian) to train networks more efficiently. And the analysis of a model’s loss landscape, whether training is stuck at a saddle point, sitting in a wide flat basin, or balanced on a sharp ridge, is entirely a story about second derivatives, the curvature of the loss in different directions. Practitioners rarely compute these by hand, but the geometry of training is second-derivative geometry.
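To see what "curvature in different directions" means, here is a small sketch on a toy surface, x squared minus y squared, which stands in for a loss landscape with a saddle at the origin. The surface, the step size h, and the helper name curvature_along are all assumptions made for this illustration.

```python
def curvature_along(f, point, direction, h=1e-3):
    """Second derivative of f along `direction` at `point`,
    estimated by central differences."""
    x, y = point
    dx, dy = direction
    forward = f(x + h * dx, y + h * dy)
    backward = f(x - h * dx, y - h * dy)
    return (forward - 2 * f(x, y) + backward) / h**2


def saddle(x, y):
    """Toy surface: cups upward along x, downward along y."""
    return x**2 - y**2


origin = (0.0, 0.0)
print("along x:", curvature_along(saddle, origin, (1.0, 0.0)))  # close to 2
print("along y:", curvature_along(saddle, origin, (0.0, 1.0)))  # close to -2
```

Positive curvature in one direction and negative curvature in another at a point where the slope is zero is the signature of a saddle: a minimum along one axis and a maximum along the other.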

Common pitfalls

Forgetting that the second derivative is the rate of change of the slope, not the slope. The first derivative is the slope; the second derivative is how the slope itself is changing. A function can have a large positive slope (first derivative greater than zero) while that slope is shrinking (second derivative less than zero); the graph rises but levels off.

Misreading the second-derivative test. A second derivative greater than zero at a critical point means minimum (cupping up holds water), and a second derivative less than zero means maximum. People often flip these; anchor on the cup shape, not on memorized signs.

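Treating a second derivative of zero as automatically an inflection point. It is only an inflection point if the curvature actually changes sign there. A second derivative of zero is necessary but not sufficient; check that the sign flips. For example, x to the fourth power has a second derivative of zero at 0, yet it cups upward on both sides, so 0 is a minimum, not an inflection point.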

Stopping the physical interpretation at velocity. Position differentiates to velocity, velocity to acceleration, acceleration to jerk. The second derivative of position is acceleration, not velocity; keep the layers straight.

What you should remember

A derivative is a function, so you can differentiate it again: the second derivative, the third derivative, and onward (or d-squared-f over d-x-squared in Leibniz notation). In physics, position differentiates to velocity to acceleration (the second derivative of position is acceleration, the heart of force equals mass times acceleration).
The second derivative is curvature. A positive second derivative means the graph cups upward (slope increasing), a negative second derivative means it cups downward, and a sign change is an inflection point. This powers the second-derivative test: at a critical point (where the first derivative is zero), a positive second derivative is a minimum and a negative second derivative is a maximum.
Two signature cases: the second derivative of sine is negative sine (the oscillation equation, where the second derivative equals the negative of the function, governing everything that swings or waves), and every derivative of Euler’s number raised to the input is itself. In machine learning, second-derivative (curvature) information drives Newton’s method, the Hessian, adaptive optimizers, and loss-landscape analysis.

The derivative of a derivative measures how change itself is changing, which is acceleration in time and curvature in space, and which sorts hills from valleys. The final lesson takes this idea to its limit: using a function’s whole tower of higher derivatives at a single point to rebuild the entire function as a polynomial, the Taylor series.