Cheatsheet: Taylor series

f(x) ≈ f(a) + f'(a)(x-a) + (f''(a)/2!)(x-a)^2 + (f'''(a)/3!)(x-a)^3 + ...

Each term uses one more derivative of f, evaluated at a. The partial sums approximate f near a.
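The formula can be sketched directly in code. This is a minimal illustration, not a library routine: `taylor_poly` and its argument names are hypothetical, and the caller is assumed to supply the derivative values f(a), f'(a), f''(a), ... as a list.

```python
from math import factorial

def taylor_poly(derivs_at_a, a, x):
    """Sum of f^(k)(a) / k! * (x - a)^k over the supplied derivatives."""
    return sum(d / factorial(k) * (x - a) ** k
               for k, d in enumerate(derivs_at_a))

# e^x centered at a = 0: every derivative there equals 1.
# Six derivative values give the terms up to x^5/5!.
approx_e = taylor_poly([1.0] * 6, a=0.0, x=1.0)
print(approx_e)
```

Passing more derivative values adds more terms; passing one value gives the constant f(a).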

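Differentiating (x-a)^k exactly k times produces k!. Dividing by k! cancels it, so the k-th derivative of the k-th term equals f^(k)(a). At x = a every other term contributes nothing: lower-order terms have been differentiated to zero, and higher-order terms still carry a factor of (x-a). This is the matching property: the polynomial and f share value, slope, concavity, and every higher derivative up to the order kept, at a.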
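The matching property can be checked numerically. In this sketch a polynomial in powers of (x-a) is stored as a coefficient list, so its value at x = a is simply the constant term. The derivative values in `derivs` are made-up numbers chosen for illustration, and `differentiate` is a hypothetical helper.

```python
from math import factorial

def differentiate(coeffs):
    """Derivative of sum_k coeffs[k] * (x - a)^k, as a new coefficient list."""
    return [k * c for k, c in enumerate(coeffs)][1:]

# Hypothetical values of f(a), f'(a), f''(a), f'''(a).
derivs = [2.0, -1.0, 3.0, 0.5]

# Taylor coefficients: divide each derivative by k!.
coeffs = [d / factorial(k) for k, d in enumerate(derivs)]

recovered = []
p = coeffs
for _ in derivs:
    recovered.append(p[0])   # value at x = a is the constant term
    p = differentiate(p)     # move on to the next derivative

print(recovered)
```

Each entry of `recovered` agrees with the corresponding entry of `derivs` (up to floating-point rounding), which is the k! cancellation at work.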

Terms kept    Shape
1st order     tangent line
2nd order     parabola matching concavity
higher        wraps tighter around f near a
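A quick way to see the table in numbers is to measure the error at a fixed nearby point as the order increases. The choice of e^x, the center a = 0, and the test point x = 0.5 are assumptions made for this sketch.

```python
from math import exp, factorial

def exp_taylor(x, order):
    """Taylor polynomial of e^x around 0, keeping terms up to x^order."""
    return sum(x ** k / factorial(k) for k in range(order + 1))

x = 0.5
errors = {order: abs(exp(x) - exp_taylor(x, order)) for order in (1, 2, 4)}
for order, err in errors.items():
    print(f"order {order}: error {err:.6f}")
```

The error drops from roughly 0.15 (tangent line) to roughly 0.02 (parabola) to below 0.001 (fourth order).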

Local: accurate near a; the error typically grows with distance from a, and the series may diverge far away.
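To illustrate locality, here is a sketch using ln(1+x) = x - x^2/2 + x^3/3 - ..., centered at 0. This series is not listed in the cheatsheet and is chosen only because it converges for |x| < 1 and diverges at x = 2; `log1p_partial` is a hypothetical helper name.

```python
from math import log

def log1p_partial(x, n_terms):
    """x - x^2/2 + x^3/3 - ... truncated after n_terms terms (center a = 0)."""
    return sum((-1) ** (n + 1) * x ** n / n for n in range(1, n_terms + 1))

near = abs(log1p_partial(0.5, 20) - log(1.5))   # inside the region of convergence
far = abs(log1p_partial(2.0, 20) - log(3.0))    # outside it

print(f"error at x = 0.5: {near:.2e}")
print(f"error at x = 2.0: {far:.2e}")
```

With the same 20 terms, the error near the center is tiny while the error at x = 2 is enormous, and adding more terms there makes it worse.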

Centered at a = 0:
e^x = 1 + x + x^2/2! + x^3/3! + ... (every derivative equals 1 at 0)
sin x = x - x^3/3! + x^5/5! - ... (odd powers, alternating)
cos x = 1 - x^2/2! + x^4/4! - ... (even powers, alternating)

Check: e^1 ≈ 1+1+.5+.167+.042+.008 ≈ 2.717 (true 2.71828). sin(π/2) ≈ 1.571-0.646+0.080-0.005 ≈ 0.9998 (true 1).
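The check above can be reproduced with a few lines of code. The function names are hypothetical, and each takes the number of terms to keep.

```python
from math import factorial, pi

def exp_series(x, n_terms):
    """1 + x + x^2/2! + ... with n_terms terms."""
    return sum(x ** k / factorial(k) for k in range(n_terms))

def sin_series(x, n_terms):
    """x - x^3/3! + x^5/5! - ... with n_terms terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / factorial(2 * k + 1)
               for k in range(n_terms))

def cos_series(x, n_terms):
    """1 - x^2/2! + x^4/4! - ... with n_terms terms."""
    return sum((-1) ** k * x ** (2 * k) / factorial(2 * k)
               for k in range(n_terms))

e_check = exp_series(1.0, 6)        # six terms, as in the check
sin_check = sin_series(pi / 2, 4)   # four terms, as in the check
print(round(e_check, 3), round(sin_check, 4))
```

Increasing `n_terms` moves both values toward 2.71828... and 1 respectively.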

Connections:

  • Small-angle sin(x) ≈ x = the first-order Taylor term; x - x^3/6 is sharper (sin(0.5): 0.4792 vs true 0.4794).
  • L’Hôpital’s rule = the first-order Taylor ratio: when f(a) = g(a) = 0 and g'(a) ≠ 0, f(x)/g(x) -> f'(a)/g'(a) as x -> a.
  • Higher-order derivatives = the ingredients; a polynomial equals its own (terminating) Taylor series.
  • Gradient descent = first-order Taylor: approximate the loss by its tangent plane, step downhill.
  • Newton’s method = second-order Taylor: add the Hessian term to get a parabola, then jump to its minimum. (Zero-finding variant: set the first-order Taylor approximation to 0, giving x_new = x - f(x)/f'(x).)
  • Neural tangent kernel: first-order Taylor of a network in its parameters at init.
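  • Hardware and math libraries compute sin, exp, log with Taylor-style polynomials, usually after reducing the argument to a small range and often with coefficients tuned beyond plain Taylor.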

Common mistakes:

  • Taking derivatives at x, not a. Every coefficient is f^(k)(a), measured at the center.
  • Dropping the factorials. The k-th term is (f^(k)(a)/k!)(x-a)^k.
  • Expecting accuracy far from a. Taylor is local.
  • Confusing a truncation with the function. A few terms only approximate; the full series equals f only where it converges to f.
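Returning to the zero-finding form of Newton’s method mentioned in the list above: setting the first-order Taylor approximation f(x) + f'(x)(x_new - x) to zero gives the update x_new = x - f(x)/f'(x). A minimal sketch follows; `newton_root`, the fixed step count, and the example f(x) = x^2 - 2 are assumptions made for illustration.

```python
def newton_root(f, df, x0, steps=6):
    """Repeatedly jump to the zero of the tangent line at the current x."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

# Example: the positive zero of f(x) = x^2 - 2 is sqrt(2).
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

The sketch has no safeguard for df(x) = 0 or for a poor starting point; a practical version would add a tolerance test and a fallback.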

A Taylor series rebuilds a function near a point from its derivatives there, which is why gradient descent (first-order) and Newton’s method (second-order) are Taylor approximations of a loss.
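The contrast between the two methods can be sketched on a one-dimensional loss. The loss L(w) = (w - 3)^2 + 1, the learning rate 0.1, and all function names here are hypothetical choices for illustration. Because this loss is exactly quadratic, its second-order Taylor approximation is the loss itself, so one Newton step lands on the minimum.

```python
def grad(w):
    """L'(w) for the hypothetical loss L(w) = (w - 3)^2 + 1."""
    return 2.0 * (w - 3.0)

def hess(w):
    """L''(w), constant for a quadratic loss."""
    return 2.0

def gd_step(w, lr=0.1):
    """First-order Taylor: follow the tangent downhill by a small step."""
    return w - lr * grad(w)

def newton_step(w):
    """Second-order Taylor: jump to the bottom of the local parabola."""
    return w - grad(w) / hess(w)

w_gd = 0.0
for _ in range(50):
    w_gd = gd_step(w_gd)

w_newton = newton_step(0.0)
print(w_gd, w_newton)
```

Gradient descent creeps toward w = 3 over many steps, while the Newton step reaches it at once. On a non-quadratic loss Newton’s method would also need repeated steps.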