Lesson: Integration and the fundamental theorem

The very first lesson of this track found the area of a circle by slicing it into thin rings and adding them up. That was integration, performed informally before we had the word. Calculus has two halves, and the last eight lessons built the first half thoroughly: differentiation, the study of rates. This lesson formalizes the second half, accumulation, and then states the theorem that binds the two halves into one subject.

The headline result, the fundamental theorem of calculus, is one of the most consequential facts in mathematics, and it says something almost too convenient to believe: to add up a quantity over a range, you find a function whose rate of change is that quantity, and subtract its values at the endpoints. Accumulation is undone by differentiation. Let us build to it.

The general problem: given a function f of the input, what is the total accumulation of that function over an interval from a lower limit to an upper limit? Geometrically, this is the area under the curve of the function between the lower and upper limits. The circle lesson was one instance, with the function being two pi r and the area being pi R squared; now we want the area under any curve.

The trouble is the same one the circle posed: the region has a curved top, so there is no simple area formula. The fix is the same too. Slice and add.

Chop the interval from the lower limit to the upper limit into many thin vertical strips, each of a small width: the length of the interval divided by the number of strips. Approximate each strip by a rectangle whose height is the function’s value somewhere in that strip. The strip’s area is about that height times the small width, and the total area is approximately the sum of all of them:

area ≈ f(x_1)·Δx + f(x_2)·Δx + ... + f(x_n)·Δx

This is a Riemann sum. As you use more and thinner strips (let the number of strips grow and the width shrink toward zero), the staircase of rectangles hugs the curve ever more closely, and the sum approaches the true area. That limit, exactly the kind of limit made precise in the previous lesson, is the definite integral:

∫_a^b f(x) dx = lim_{n→∞} [ f(x_1)·Δx + f(x_2)·Δx + ... + f(x_n)·Δx ]

The notation tells the story once you read it. The integral sign is an elongated S, for “sum.” The differential, the small d followed by the input, is the width of a strip, shrunk toward zero. The whole expression, the integral of the function from the lower limit to the upper limit, literally means “sum up the function’s value times a tiny width, across the input from the lower limit to the upper limit, in the limit as the width vanishes.” It is the slice-and-add of the first lesson, written down carefully.

Watch the limit converge with actual rectangles, for the area under the curve y equals the input squared, from zero to one (whose true value we will confirm is one third). With n rectangles of width 1/n and right-edge heights, the sum works out to the tidy formula (n + 1)(2n + 1)/(6n²), and plugging in larger and larger n gives:

n = 4 -> 0.469
n = 10 -> 0.385
n = 100 -> 0.338

The Riemann sums march steadily down toward one third, which is zero point three three three and so on, getting closer as the rectangles thin. The integral is the value they approach, exactly the “approaches a limit” idea from the previous lesson, now applied to a sum of areas instead of a single ratio.
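The table above can be reproduced with a few lines of code. What follows is a minimal sketch, not part of the original lesson: the function names `right_riemann_sum` and `square` are chosen here for illustration, and the rectangles use right-edge heights as in the text.

```python
def right_riemann_sum(f, a, b, n):
    """Add up n rectangles on [a, b], each as tall as f at its right edge."""
    dx = (b - a) / n  # width of every strip
    return sum(f(a + i * dx) * dx for i in range(1, n + 1))


def square(x):
    return x * x


# More and thinner strips: the sums shrink toward one third.
for n in (4, 10, 100, 10_000):
    print(n, round(right_riemann_sum(square, 0.0, 1.0, n), 3))
```

The first three printed values should match the table, and the last shows the sum settling near one third as the strips thin further.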

[Figure: two side-by-side panels plotting y = x² on the interval from 0 to 1. Left panel, "coarse (few strips)": five wide rectangles under the curve, each as tall as the curve at the middle of its strip, leaving visible gaps and overshoots. Right panel, "fine (many strips, toward the integral)": twenty-four thin rectangles whose tops nearly trace the curve.]
Approximate the area under y = x² by standing thin rectangles on the interval and adding them up. With a few coarse strips (left) the staircase clearly misses the smooth curve. With many fine strips (right) the rectangle tops hug the curve and the gaps all but vanish. As the strips get thinner and more numerous, the sum converges to the true area under the curve, the definite integral of x² from 0 to 1.

Computing that limit directly, summing infinitely many shrinking rectangles, would be miserable for most functions. The fundamental theorem makes it unnecessary. It says: if a function F is an antiderivative of the original function f, meaning a function whose derivative is f, then the definite integral of f is just the change in F across the interval:

∫_a^b f(x) dx = F(b) - F(a)

To find the area under the function, you do not sum rectangles. You find a function whose rate of change is the original function, evaluate it at the two endpoints, and subtract. Accumulation is computed by running differentiation backward.

This is the central fact of calculus because it welds its two halves together. The first lesson glimpsed it on the circle: the accumulated area, pi R squared, had derivative two pi R, the very circumference being accumulated. The fundamental theorem says that is no coincidence: the rate of change of an accumulation is the thing being accumulated, and conversely, to accumulate, you reverse the rate. Differentiation and integration are inverse operations, and this theorem is the precise statement of it.
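The claim that the rate of change of an accumulation is the thing being accumulated can be checked numerically on the circle. The sketch below is an illustration under assumptions, not the lesson's own code: it accumulates ring circumferences with a midpoint rectangle sum (the helper names are hypothetical), then estimates the rate of change of that accumulated area with a small symmetric difference.

```python
import math


def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f on [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def circumference(r):
    return 2 * math.pi * r


def accumulated_area(r):
    """Area built up by adding rings of circumference 2*pi*s for s from 0 to r."""
    return midpoint_integral(circumference, 0.0, r)


def rate_of_change(g, x, h=1e-4):
    """Symmetric-difference estimate of the derivative of g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)


radius = 2.0  # arbitrary radius chosen for illustration
print(accumulated_area(radius), math.pi * radius**2)
print(rate_of_change(accumulated_area, radius), circumference(radius))
```

Each printed pair should agree closely: the accumulated area matches pi R squared, and its rate of change matches the circumference that was being accumulated.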

Antiderivatives are your derivative rules, run backward

Since integration is “find a function whose derivative is the original function,” every derivative rule from this track becomes an integration rule in reverse.

The power rule, the derivative of the input raised to an exponent equals the exponent times the input raised to one less than that exponent, reverses into:

∫ x^n dx = x^(n+1)/(n+1) + C (for n ≠ -1)

Raise the exponent by one and divide by the new exponent; check it by differentiating back. The one exception is when the exponent is negative one, where this would divide by zero; there the antiderivative of one over the input is the natural log of the absolute value of the input, plus a constant, exactly because the implicit-differentiation lesson found that the derivative of the natural log of the input is one over the input. Likewise, Euler’s number raised to the input is its own antiderivative (from the Euler’s number lesson), and sine and cosine swap with a sign (from the trig lesson). Your differentiation knowledge is your integration knowledge, read in the other direction.
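"Check it by differentiating back" can be done by machine as well as by hand. The following sketch is an assumption-laden illustration: it pairs each antiderivative named above with the function it should differentiate to, and compares a symmetric-difference estimate of the derivative against that function at a few sample points. The names `central_difference`, `rules`, and `max_error` are invented here, and the sample points include negative inputs to exercise the absolute value in the logarithm.

```python
import math


def central_difference(F, x, h=1e-5):
    """Symmetric-difference estimate of the derivative of F at x."""
    return (F(x + h) - F(x - h)) / (2 * h)


# label: (candidate antiderivative F, function f it should differentiate to)
rules = {
    "x^3 -> x^4/4": (lambda x: x**4 / 4, lambda x: x**3),
    "1/x -> ln|x|": (lambda x: math.log(abs(x)), lambda x: 1 / x),
    "e^x -> e^x": (math.exp, math.exp),
    "sin x -> -cos x": (lambda x: -math.cos(x), math.sin),
    "cos x -> sin x": (math.sin, math.cos),
}

sample_points = (-2.0, -0.5, 0.5, 1.0, 2.0)  # avoids x = 0, where 1/x is undefined


def max_error(F, f, points=sample_points):
    """Largest gap between the estimated derivative of F and f over the points."""
    return max(abs(central_difference(F, x) - f(x)) for x in points)


for label, (F, f) in rules.items():
    print(f"{label}: max error {max_error(F, f):.2e}")
```

Small errors on every line are numerical evidence, not proof, that each antiderivative differentiates back to its integrand.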

A polynomial. Compute the integral of the input squared, from zero to one. The antiderivative of the input squared is the input cubed over three (raise the exponent, divide by three). By the fundamental theorem:

∫_0^1 x^2 dx = [ x^3/3 ]_0^1 = 1^3/3 - 0^3/3 = 1/3

The area under the parabola, y equals the input squared, from zero to one is exactly one third. No rectangles summed; just an antiderivative evaluated at the endpoints.

An exponential. Compute the integral of Euler’s number raised to the input, from zero to one. Since Euler’s number raised to the input is its own antiderivative:

∫_0^1 e^x dx = [ e^x ]_0^1 = e^1 - e^0 = e - 1 ≈ 1.718

The one that needs a logarithm. Compute the integral of one over the input, from one to two. This is the exponent-equals-negative-one case, so the antiderivative is not a power but the natural log of the absolute value of the input:

∫_1^2 (1/x) dx = [ ln|x| ]_1^2 = ln(2) - ln(1) = ln(2) - 0 ≈ 0.693

The area under one over the input, from one to two, is exactly the natural log of two, which is also where the natural logarithm gets its values: the natural log of a number is the area under one over the input out to that number. The result that the derivative of the natural log of the input is one over the input, from the implicit-differentiation lesson, is what makes this work.

A trig function. Compute the integral of sine of the input, from zero to pi. The antiderivative of sine is negative cosine (because the derivative of negative cosine is sine):

∫_0^π sin(x) dx = [ -cos(x) ]_0^π = -cos(π) - (-cos(0)) = -(-1) + 1 = 2

The area under one full hump of the sine curve is exactly two, a surprisingly clean number for such a curvy region.

Closing the circle. Return to the first lesson. Its circle-area derivation, slicing the disk into rings of circumference two pi r and a tiny thickness in the radius, was secretly the integral of two pi r with respect to the radius, from zero to the full radius. Now compute it formally. The antiderivative of two pi r is pi r squared (the constant two pi times r squared over two), so:

∫_0^R 2πr dr = [ πr^2 ]_0^R = πR^2 - 0 = πR^2

The track opened with this result derived by hand-waving; the fundamental theorem now delivers it in one line. The opening question has its formal answer.
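The five worked examples can each be cross-checked against a brute-force rectangle sum. This is a sketch under stated assumptions: the midpoint rule and the helper names are choices made here, and the radius `R = 3.0` is an arbitrary value picked for illustration.

```python
import math


def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f on [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


R = 3.0  # arbitrary radius chosen for illustration

# (label, integrand f, antiderivative F, lower limit, upper limit)
examples = [
    ("x^2 on [0, 1]", lambda x: x * x, lambda x: x**3 / 3, 0.0, 1.0),
    ("e^x on [0, 1]", math.exp, math.exp, 0.0, 1.0),
    ("1/x on [1, 2]", lambda x: 1 / x, lambda x: math.log(abs(x)), 1.0, 2.0),
    ("sin x on [0, pi]", math.sin, lambda x: -math.cos(x), 0.0, math.pi),
    ("2*pi*r on [0, R]", lambda r: 2 * math.pi * r, lambda r: math.pi * r * r, 0.0, R),
]


def compare(f, F, a, b):
    """Return (F(b) - F(a), midpoint-rule estimate) for one example."""
    return F(b) - F(a), midpoint_integral(f, a, b)


for label, f, F, a, b in examples:
    exact, approx = compare(f, F, a, b)
    print(f"{label}: F(b) - F(a) = {exact:.6f}, rectangles = {approx:.6f}")
```

The two columns should agree to several decimal places, with the antiderivative column costing two function evaluations and the rectangle column costing ten thousand.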

Indefinite versus definite, and the constant

Two related objects share the integral sign. The definite integral carries a lower limit and an upper limit and produces a single number, the accumulated total. The indefinite integral, the integral of the function with no limits attached, produces a function, the general antiderivative, written as the antiderivative plus a constant.

That plus-a-constant is there because adding any constant to an antiderivative leaves its derivative unchanged: if the derivative of the antiderivative is the original function, then the derivative of the antiderivative plus a constant is the original function too, since the derivative of a constant is zero. So the function has not one antiderivative but a whole family, all differing by a constant. For a definite integral the constant cancels: the antiderivative at the upper limit plus a constant, minus the antiderivative at the lower limit plus the same constant, leaves just the difference of antiderivative values. That is why we ignore the constant when computing areas. It matters when the antiderivative itself is the answer.
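The cancellation is easy to see in a small experiment. In this sketch (the function names and the particular constants are illustrative choices, not from the lesson), several members of the antiderivative family of the input squared are evaluated at the endpoints zero and one, and each gives the same difference.

```python
import math


def antiderivative(x, c):
    """One member of the family x^3/3 + c, all antiderivatives of x^2."""
    return x**3 / 3 + c


def definite_integral(a, b, c):
    """F(b) - F(a) using the family member with constant c."""
    return antiderivative(b, c) - antiderivative(a, c)


# Different constants, same definite integral (up to floating-point rounding).
for c in (0.0, 7.0, -2.5):
    print(c, definite_integral(0.0, 1.0, c))
```

Every line should print a value equal to one third up to rounding, whichever constant is chosen.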

Integration is the mathematics of continuous probability, and continuous probability is everywhere in machine learning.

A probability density must integrate to one over all outcomes: the integral of the density across every outcome equals one. The probability that a quantity lands between two values is the area under the density there, the integral of the density from the lower value to the upper value. An expected value, the mean of a continuous distribution, is the integral of the input times the density. Entropy and KL divergence, which appear in the loss functions of generative models and variational methods, are integrals over distributions (the negative integral of the density times the log of the density, and its relative version). And continuous-time models, such as neural differential equations and the diffusion models behind modern image generators, compute their outputs by integrating a differential equation numerically, and some backpropagate through that computation with adjoint methods, which integrate a related equation backward in time.

In practice these integrals are usually computed numerically (sampling-based Monte Carlo, or quadrature) rather than by finding antiderivatives, because real densities rarely have clean ones. But the theory rests entirely on the fundamental theorem: it is what connects the rates a model learns (gradients, score functions) to the accumulations it cares about (probabilities, expected losses).
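To make those integrals concrete, here is a minimal sketch using an exponential density, chosen only because its exact answers are known: with rate 2 the mean is one half and the entropy is one minus the natural log of two. The rate, the cutoff `UPPER` standing in for infinity, the sample count, and all helper names are assumptions for illustration. Quadrature is done with a midpoint sum, and the mean is also estimated by Monte Carlo sampling with the standard library's `random.expovariate`.

```python
import math
import random

RATE = 2.0  # hypothetical rate parameter of an exponential density
UPPER = 40.0  # stand-in for infinity; the tail beyond it is negligible


def density(x):
    """Exponential density p(x) = RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x)


def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f on [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Quadrature: each quantity is an integral involving the density.
total = midpoint_integral(density, 0.0, UPPER)  # should be close to 1
prob = midpoint_integral(density, 0.5, 1.0)  # P(0.5 <= X <= 1)
mean = midpoint_integral(lambda x: x * density(x), 0.0, UPPER)
entropy = -midpoint_integral(
    lambda x: density(x) * math.log(density(x)), 0.0, UPPER
)

# Monte Carlo: estimate the same mean by averaging random samples.
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.expovariate(RATE) for _ in range(100_000)]
mc_mean = sum(samples) / len(samples)

print(total, prob, mean, entropy, mc_mean)
```

The quadrature values converge quickly because the integrand is smooth and one-dimensional; the Monte Carlo estimate is noisier, but sampling is the approach that keeps working when the input has many dimensions.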

Forgetting the fundamental theorem and trying to sum rectangles. You almost never compute an integral as a literal Riemann sum by hand. Find an antiderivative and use the antiderivative at the upper limit minus the antiderivative at the lower limit. The Riemann sum is the definition; the fundamental theorem is the tool.

Dropping the plus-a-constant on an indefinite integral. The indefinite integral of a function is a family of functions differing by a constant, so the answer is the antiderivative plus a constant. Omitting the constant claims there is only one antiderivative, which is false.

Mixing up the exponent-equals-negative-one case. The power rule for antiderivatives, raising the exponent by one and dividing by the new exponent, divides by zero when the exponent is negative one. The antiderivative of one over the input is not a power; it is the natural log of the absolute value of the input, plus a constant.

Confusing the definite and indefinite integral. With limits you get a number; without limits you get a function. They share notation but are different objects, and the constant lives only on the indefinite one.

  • The definite integral of a function from the lower limit to the upper limit is the area under that function across that interval, defined as the limit of Riemann sums (thin rectangles, each a height times a tiny width, as the width shrinks to zero). It is the precise version of the slice-and-add that opened this track.
  • The fundamental theorem of calculus says the definite integral equals the antiderivative at the upper limit minus the antiderivative at the lower limit, where the antiderivative is any function whose derivative is the original function. To accumulate a quantity, find a function whose rate of change is that quantity and subtract its endpoint values. Differentiation and integration are inverse operations, which is why every derivative rule reverses into an integration rule.
  • Antiderivatives come from running your rules backward: the integral of the input raised to an exponent is the input raised to one more than that exponent, divided by the new exponent, plus a constant (except when the exponent is negative one, which gives the natural log of the absolute value of the input, plus a constant); the integral of Euler’s number raised to the input is itself plus a constant; the integral of sine is negative cosine plus a constant. This is the engine behind continuous probability, expected values, and the loss functions of modern generative models.

The track opened by finding a circle’s area through slicing and adding; it now has the formal machinery behind that method, and the theorem that makes accumulation the inverse of rate. The next lesson explains why the fundamental theorem is true, unpacking geometrically how an area can equal a difference of antiderivative values.