Summary: The essence of calculus

You know the area of a circle is πR², but few people can say why. Rebuilding the formula from scratch turns out to contain the whole of calculus in miniature: the trick of slicing a hard problem into easy pieces and adding them up, the two big ideas (rates and accumulation), and the surprising fact that those two ideas are inverses of each other. This lesson is the orientation for the whole track; everything after it is machinery for those two ideas. This is the scan-it-in-five-minutes version.

Core ideas

The method: slice, then add. A curved boundary is hard to attack head-on, so break it into many small pieces that are each easy, then sum them as the pieces shrink toward zero. This is the move behind almost all of calculus.
The circle-area derivation. Carve the disk into thin concentric rings. One ring at radius r, thickness dr, unrolls into a thin rectangle of length 2πr and width dr, so its area is ≈ 2πr · dr. Summing the rings from r = 0 to r = R gives the area under the line y = 2πr, a triangle of base R and height 2πR, so area = (1/2) · R · 2πR = πR². The formula you memorized is the area under a straight line. (At R = 3: base 3, height 6π, area 9π = π · 3².)
The two pillars. Differentiation asks how fast a function changes at each instant (the rate); integration asks how much it accumulates over a range (the total). The circle’s area πR² is the accumulated (integrated) circumference 2πr.
They are inverses (the Fundamental Theorem). Accumulate the circumference 2πr to get the area πR², then take the area’s rate of change, A'(R) = 2πR, and you get the circumference back. Integration builds up, differentiation breaks down, and they undo each other. You watched this happen on a circle before any term was defined carefully.
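dr is an ordinary small number. Not a mystical infinitesimal: a real, small width that we let shrink toward zero, with the approximation sharpening as it does. A ring's exact area is π(r + dr)² − πr² = 2πr · dr + π · dr², so the rectangle estimate is off by only π · dr² per ring. There are about R/dr rings, so the total error is about πR · dr, which vanishes as dr shrinks. In the limit the slice-and-add answer is exact, not merely close.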
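
The ring sum and the inverse relationship described above can be checked numerically. The following is a minimal sketch, not part of the original lesson: the function names ring_sum_area and area_rate are hypothetical, and it assumes each ring is measured at its inner radius. It adds up the unrolled rings for ever thinner dr, then estimates the area's rate of change with a small finite step.

```python
import math

def ring_sum_area(R, n):
    """Approximate the disk area by summing n thin rings.

    Each ring is unrolled into a rectangle of length 2*pi*r and width dr,
    with r taken at the ring's inner edge (an assumption of this sketch).
    """
    dr = R / n
    total = 0.0
    for i in range(n):
        r = i * dr                      # inner radius of ring i
        total += 2 * math.pi * r * dr   # area of the unrolled rectangle
    return total

def area_rate(R, h=1e-6):
    """Estimate A'(R) for A(R) = pi * R**2 using a small forward step h."""
    return (math.pi * (R + h) ** 2 - math.pi * R ** 2) / h

R = 3.0
exact = math.pi * R ** 2

# The gap between the ring sum and pi * R**2 shrinks as the rings get thinner.
for n in (10, 100, 1000, 10000):
    approx = ring_sum_area(R, n)
    print(f"n={n:>6}  ring sum={approx:.6f}  error={exact - approx:.6f}")

# The rate of change of the area is close to the circumference 2 * pi * R.
print(f"A'(R) estimate={area_rate(R):.6f}  circumference={2 * math.pi * R:.6f}")
```

Each tenfold increase in the number of rings should cut the error roughly tenfold, which is the "ordinary small number" view of dr in action: nothing infinitesimal is needed, only a width that keeps shrinking.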

What changes for you

Calculus stops being a bag of rules to memorize and becomes two questions you can always ask of any function: how fast is it changing, and how much does it accumulate? That reframing is load-bearing for AI. A model trains by computing the rate of change of its loss with respect to each parameter and nudging each parameter downhill, which is the derivative idea. Gradient descent is just “follow the slope,” repeated millions of times, and backpropagation is an organized way to compute those rates through many layers. Continuous probability, which is how models handle uncertainty, measures probability as the area under a density curve, which is an integral. The d and the integral sign that look like decoration in a paper are describing exactly the two ideas you just rebuilt on a circle. The next lesson zooms in on the rate side and makes “how fast, right now” precise.
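
To make “follow the slope” concrete, here is a minimal sketch of gradient descent on a toy problem. Everything in it is an illustrative assumption rather than how any real model is trained: the one-parameter loss (w − 2)², the names loss, slope, and gradient_descent, the learning rate, and the step count. The slope is estimated with a small finite difference instead of backpropagation.

```python
def loss(w):
    """Toy loss with a single parameter; its minimum is at w = 2 (assumed for illustration)."""
    return (w - 2.0) ** 2

def slope(w, h=1e-6):
    """Estimate the rate of change of the loss at w with a central difference."""
    return (loss(w + h) - loss(w - h)) / (2 * h)

def gradient_descent(w0, learning_rate=0.1, steps=100):
    """Repeatedly nudge w a small step against the slope, i.e. downhill."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * slope(w)
    return w

w_final = gradient_descent(w0=10.0)
print(f"final w = {w_final:.6f}, final loss = {loss(w_final):.3e}")
```

Starting from w = 10, the parameter should settle very close to 2, where the slope is zero and there is no downhill direction left. Real training does the same thing with millions of parameters at once.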