Summary: Taylor series

This track opened by deriving a circle’s area from rings; thirteen lessons later, its final result is also its most powerful. The Taylor series rebuilds any well-behaved function near a point as a plain polynomial, using nothing but the function’s derivatives at that point. It is the capstone because it needs everything you have learned, and it is the bridge to the math that machine learning actually runs on. This is the scan-it-in-five-minutes version.

  • The Taylor expansion. Near a point a: f(x) ≈ f(a) + f'(a)·(x-a) + (f''(a)/2!)·(x-a)² + (f'''(a)/3!)·(x-a)³ + .... The k-th term is the k-th derivative at a, divided by k!, times (x-a)^k.
  • Why factorials. They make the matching property exact: the polynomial and f share value, slope, concavity, and every higher derivative at a. Differentiating (x-a)^k exactly k times produces k!; dividing by k! cancels that buildup so the k-th derivative of the k-th term at a is exactly f^(k)(a).
  • The geometric build. Keep one term and you have the tangent line; add the next and the line bends into a parabola matching concavity; each further term wraps the polynomial one more order tighter around f near a.
  • Three canonical series at 0. e^x = 1 + x + x²/2! + x³/3! + ... (every derivative of e^x equals 1 at 0; test at x=1: 1 + 1 + 0.5 + 0.167 + 0.042 + 0.008 ≈ 2.717 ≈ e). sin x = x - x³/3! + x⁵/5! - ... (odd powers, alternating; partial sums at x=1 march 1.000, 0.833, 0.842, 0.84147 ≈ sin 1). cos x = 1 - x²/2! + x⁴/4! - ... (even powers, alternating).
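  • Earlier lessons made rigorous. The small-angle sin(x) ≈ x is the first-order Taylor approximation at 0 (adding the next term gives the sharper x - x³/6). L’Hôpital follows the same way: for a 0/0 limit at a, replacing f and g with their first-order Taylor approximations leaves the ratio f'(a)/g'(a), provided g'(a) ≠ 0. A polynomial equals its own (finite, terminating) Taylor series, because its tower of derivatives runs out.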
  • Newton’s method is Taylor at work. Replace f with its first-order Taylor approximation (the tangent line) and jump to where it crosses zero: x - f(x)/f'(x). For √2, take f(x) = x² - 2 and start from x_0 = 1.5: 1.5 → 1.416667 → 1.414216 in two steps (√2 ≈ 1.414214). Near a simple root, each step roughly doubles the correct digits.
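
The arithmetic in the bullets above can be checked with a short script. This is a minimal sketch, not part of the lessons: the function names exp_partial_sums, sin_partial_sums, and newton_sqrt2 are made up for illustration, and the only thing assumed is the series and the Newton update exactly as written in the list.

```python
import math


def exp_partial_sums(x, n_terms):
    """Running partial sums of 1 + x + x^2/2! + ... (Taylor series of e^x at 0)."""
    sums, total, term = [], 0.0, 1.0
    for k in range(n_terms):
        total += term
        sums.append(total)
        term *= x / (k + 1)  # x^k/k!  ->  x^(k+1)/(k+1)!
    return sums


def sin_partial_sums(x, n_terms):
    """Running partial sums of x - x^3/3! + x^5/5! - ... (Taylor series of sin at 0)."""
    sums, total, term = [], 0.0, x
    for k in range(n_terms):
        total += term
        sums.append(total)
        # Next odd-power term: flip the sign, raise the power by two,
        # and extend the factorial by the next two integers.
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return sums


def newton_sqrt2(x0, n_steps):
    """Newton iterates for f(x) = x^2 - 2, using x <- x - f(x)/f'(x)."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - (x * x - 2.0) / (2.0 * x))
    return xs


if __name__ == "__main__":
    print("e^x at 1:  ", [round(s, 5) for s in exp_partial_sums(1.0, 6)])
    print("sin x at 1:", [round(s, 5) for s in sin_partial_sums(1.0, 4)])
    print("Newton:    ", [round(x, 6) for x in newton_sqrt2(1.5, 2)])
    print("targets:   ", math.e, math.sin(1.0), math.sqrt(2.0))
```

Each list can be compared entry by entry with the partial sums and Newton iterates quoted in the bullets; the last line prints the values they are converging to.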

The arc that began with a circle now closes with a single polynomial standing in for any well-behaved function near a point, built from nothing but the rates you know how to find. That single idea is the calculus engine of machine learning: gradient descent is a first-order Taylor step on the loss (approximate the loss by its tangent plane, then step in the direction that linear model says is downhill); Newton’s method for optimization is a second-order Taylor step (add the Hessian term, build a parabola, jump to its bottom), which is the root-finding version above applied to the gradient; the neural tangent kernel comes from a first-order Taylor expansion of a network in its parameters at initialization (and underwrites infinite-width training analysis). Even at the silicon level, when a processor or math library evaluates sin, exp, or log, it commonly evaluates a Taylor-style polynomial approximation on a reduced input range, because addition and multiplication are what the hardware does directly. When a paper writes a gradient step, a second-order method, or a tangent-kernel argument, it is speaking Taylor. With Track 8 complete, those moves read as ideas you understand rather than opaque symbols, which was the point of the track from the first lesson.
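
To make the first-order versus second-order distinction concrete, here is a small sketch that runs both kinds of step on a one-variable loss. Everything specific in it is an assumption chosen for illustration: the loss L(w) = e^w - 2w (minimum at w = ln 2), the learning rate 0.1, the starting point 0, and the function names. In one dimension the Hessian is just the second derivative.

```python
import math


def grad(w):
    """First derivative of the illustrative loss L(w) = e^w - 2w."""
    return math.exp(w) - 2.0


def hess(w):
    """Second derivative of the same loss (the 1-D 'Hessian')."""
    return math.exp(w)


def gradient_descent(w0, lr, n_steps):
    """First-order Taylor step: move against the slope of the linear model."""
    w = w0
    for _ in range(n_steps):
        w = w - lr * grad(w)
    return w


def newton_minimize(w0, n_steps):
    """Second-order Taylor step: jump to the bottom of the local parabola."""
    w = w0
    for _ in range(n_steps):
        w = w - grad(w) / hess(w)
    return w


if __name__ == "__main__":
    target = math.log(2.0)  # exact minimizer of e^w - 2w
    w_gd = gradient_descent(0.0, lr=0.1, n_steps=6)
    w_newton = newton_minimize(0.0, n_steps=6)
    print("gradient descent error:", abs(w_gd - target))
    print("Newton error:          ", abs(w_newton - target))
```

On this particular loss the Newton iterates close in on ln 2 far faster than the fixed-step gradient iterates, at the cost of needing the second derivative at every step. That trade-off is specific to this toy setup and is not a general ranking of the two methods.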