Cheatsheet: Taylor series
The Taylor expansion
f(x) ≈ f(a) + f'(a)(x-a) + (f''(a)/2!)(x-a)^2 + (f'''(a)/3!)(x-a)^3 + ...

Each term uses one more derivative of f, evaluated at a. The polynomial approximates f near a.
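The expansion above can be sketched in a few lines of Python. The function name `taylor_polynomial` and its argument layout are illustrative choices, not something the cheatsheet prescribes; it assumes you already know the derivative values f(a), f'(a), f''(a), ... at the center.

```python
from math import factorial, exp

def taylor_polynomial(derivs_at_a, a, x):
    """Evaluate sum over k of derivs_at_a[k] / k! * (x - a)**k.

    Assumption: derivs_at_a[k] holds f^(k)(a), with derivs_at_a[0] = f(a).
    """
    return sum(d / factorial(k) * (x - a) ** k
               for k, d in enumerate(derivs_at_a))

# Example: f = exp at a = 0, where every derivative equals 1.
approx = taylor_polynomial([1.0] * 6, a=0.0, x=0.5)
print(approx, exp(0.5))  # the two values should agree to several digits
```

Passing more derivative values raises the order of the polynomial; the center a and the evaluation point x stay separate arguments so the coefficients are always taken at a.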
Why the factorials
Differentiating (x-a)^k exactly k times produces the constant k!. Dividing by k! cancels it, so the k-th derivative of the k-th term at a equals f^(k)(a). This is the matching property: the polynomial and f share value, slope, concavity, and every higher derivative at a.
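A worked instance of that cancellation for k = 3 may help; the choice of k = 3 is only for illustration.

```latex
\begin{align*}
\frac{d}{dx}(x-a)^3 &= 3(x-a)^2 \\
\frac{d^2}{dx^2}(x-a)^3 &= 3\cdot 2\,(x-a) \\
\frac{d^3}{dx^3}(x-a)^3 &= 3\cdot 2\cdot 1 = 3! \\
\frac{d^3}{dx^3}\left[\frac{f'''(a)}{3!}(x-a)^3\right] &= \frac{f'''(a)}{3!}\cdot 3! = f'''(a)
\end{align*}
```

Terms of degree below 3 vanish after three derivatives, and terms of degree above 3 keep a factor of (x-a) that is zero at x = a, so only the cubic term contributes to the third derivative at a.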
Geometric build (term by term)
| Terms kept | Shape |
|---|---|
| 1st order | tangent line |
| 2nd order | parabola matching concavity |
| higher | wraps tighter around f near a |
The approximation is local: accurate near a, possibly divergent far from it.
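To see the local behavior numerically, here is a small sketch using the series ln(1 + x) = x - x^2/2 + x^3/3 - ... about a = 0. That series is not listed in this cheatsheet; it is chosen here because its partial sums converge only for -1 < x ≤ 1, so it shows both outcomes. The helper name `log1p_partial_sum` is hypothetical.

```python
from math import log

def log1p_partial_sum(x, order):
    """Taylor polynomial of ln(1 + x) about a = 0, up to the given order."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, order + 1))

for x in (0.1, 2.0):
    # Absolute error of the order-1, 2, 5 and 10 polynomials at this x.
    errors = [abs(log1p_partial_sum(x, n) - log(1 + x)) for n in (1, 2, 5, 10)]
    print(x, errors)
```

Near the center (x = 0.1) each added term shrinks the error; far from it (x = 2.0) adding terms makes the error grow.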
Canonical series at a = 0
e^x = 1 + x + x^2/2! + x^3/3! + ... (all derivatives at 0 equal 1)
sin x = x - x^3/3! + x^5/5! - ... (odd powers, alternating signs)
cos x = 1 - x^2/2! + x^4/4! - ... (even powers, alternating signs)

Check: e^1 ≈ 1 + 1 + 0.5 + 0.167 + 0.042 + 0.008 = 2.717 (true value 2.71828). sin(π/2) ≈ 1.5708 - 0.6460 + 0.0797 - 0.0047 = 0.9998 (true value 1).
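The two checks can be reproduced with direct partial sums. This is a minimal sketch; the function names are illustrative, and `n_terms` counts the nonzero terms kept.

```python
from math import factorial, pi

def exp_series(x, n_terms):
    # 1 + x + x^2/2! + ...
    return sum(x ** k / factorial(k) for k in range(n_terms))

def sin_series(x, n_terms):
    # x - x^3/3! + x^5/5! - ...
    return sum((-1) ** k * x ** (2 * k + 1) / factorial(2 * k + 1)
               for k in range(n_terms))

def cos_series(x, n_terms):
    # 1 - x^2/2! + x^4/4! - ...
    return sum((-1) ** k * x ** (2 * k) / factorial(2 * k)
               for k in range(n_terms))

print(round(exp_series(1.0, 6), 3))     # six terms of e^1
print(round(sin_series(pi / 2, 4), 4))  # four terms of sin(pi/2)
```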
Earlier lessons, now rigorous
- Small-angle sin(x) ≈ x = the first-order Taylor term; x - x^3/6 is sharper (sin(0.5): 0.4792 vs. true 0.4794).
- L'Hôpital's rule = the first-order Taylor ratio f'(a)/g'(a), when f(a) = g(a) = 0.
- Higher-order derivatives = the ingredients; a polynomial equals its own (terminating) Taylor series.
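The first two items can be checked numerically. In this sketch the ratio sin(x) / (e^x - 1) is an example picked for illustration (both numerator and denominator vanish at 0); it does not come from the earlier lessons.

```python
from math import sin, exp

x = 0.5
first_order = x               # small-angle approximation sin(x) ≈ x
third_order = x - x ** 3 / 6  # adds the cubic Taylor term
print(first_order, third_order, sin(x))

# L'Hopital-style check on sin(x) / (exp(x) - 1) as x -> 0:
# the first-order Taylor ratio is cos(0) / exp(0) = 1.
for h in (1e-1, 1e-2, 1e-3):
    print(h, sin(h) / (exp(h) - 1))
```

The printed ratios move toward 1 as h shrinks, matching f'(0)/g'(0).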
Why it matters for AI (the big one)
- Gradient descent = first-order Taylor: approximate the loss by its tangent plane, step downhill.
- Newton’s method = second-order Taylor: add the Hessian term to get a parabola, then jump to its bottom. (Zero-finding variant: set the first-order Taylor expansion to 0 -> x - f(x)/f'(x).)
- Neural tangent kernel: first-order Taylor of a network in its parameters at init.
- Hardware and math libraries compute sin, exp, log with Taylor-style polynomial approximations.
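The first two bullets can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: the quadratic `loss`, the learning rate 0.1, and the function names. In one dimension the Hessian is just the second derivative.

```python
def loss(w):
    return (w - 3.0) ** 2 + 1.0  # hypothetical 1-D loss, minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)

def hess(w):
    return 2.0

def gradient_step(w, lr):
    # First-order model: loss(w + d) ≈ loss(w) + grad(w) * d.
    # Move a small distance against the slope.
    return w - lr * grad(w)

def newton_step(w):
    # Second-order model: loss(w) + grad(w) * d + 0.5 * hess(w) * d**2,
    # which is minimized at d = -grad(w) / hess(w).
    return w - grad(w) / hess(w)

def newton_root_step(f, df, x):
    # Zero-finding: set f(x) + df(x) * d = 0 and solve for d.
    return x - f(x) / df(x)

w = 0.0
for _ in range(50):
    w = gradient_step(w, lr=0.1)
print(w, newton_step(0.0))  # many small steps vs. one jump

x = 1.0
for _ in range(6):
    x = newton_root_step(lambda t: t * t - 2.0, lambda t: 2.0 * t, x)
print(x)  # approaches sqrt(2)
```

Because this toy loss is exactly quadratic, the second-order model is exact and a single Newton step lands on the minimum; on a general loss it would only be an approximation near the current point.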
Pitfalls to dodge
- Evaluating derivatives at x instead of a. Every coefficient is f^(k)(a), measured at the center.
- Dropping the factorials. The k-th coefficient is f^(k)(a)/k!.
- Expecting accuracy far from a. Taylor is local.
- Confusing a truncation with the function. A few terms approximate f; the full series equals f (where it converges).
The one-line version
A Taylor series rebuilds a function near a point from its derivatives there, which is why gradient descent (first-order) and Newton’s method (second-order) are Taylor approximations of a loss.