Practice: Taylor series

Self-check

Six short questions. Answer each one in your head (or on paper) before opening the collapsible. Trying to retrieve the answer is where the learning sticks; rereading feels productive but does much less.

1. State the Taylor expansion of f near a point a.

Show answer

f(x) ≈ f(a) + f'(a)·(x-a) + (f''(a)/2!)·(x-a)² + (f'''(a)/3!)·(x-a)³ + .... Each term uses one more derivative of f evaluated at the center a and attaches it (divided by k!) to a power of (x-a). The first term pins the value, the second the slope, the third the concavity, and each further term matches one more order of behavior at a.

2. Why are the 1/k! factorials there?

Show answer

To make the matching property exact. Differentiating (x-a)^k exactly k times produces k! = k·(k-1)···2·1. Dividing the k-th term by k! cancels that buildup, so that the k-th derivative of the k-th term, evaluated at a, comes out to precisely f^(k)(a). That is what guarantees the polynomial and f share value, slope, concavity, and so on, all the way up, at a.
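The cancellation can be checked mechanically. The sketch below is a minimal illustration, not part of the lesson's own material: it stores a polynomial in powers of u = x - a as a plain coefficient list, and the helper names and the sample derivative values are made up for the example.

```python
from fractions import Fraction
from math import factorial

def differentiate(coeffs):
    """Differentiate c0 + c1*u + c2*u^2 + ... (u = x - a), given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def kth_derivative_at_center(coeffs, k):
    """Differentiate k times, then evaluate at u = 0 (that is, at x = a)."""
    for _ in range(k):
        coeffs = differentiate(coeffs)
    return coeffs[0] if coeffs else 0

# Differentiating u^4 four times leaves the constant 4!.
monomial = [0, 0, 0, 0, 1]
print(kth_derivative_at_center(monomial, 4), factorial(4))

# Hypothetical derivative values f(a), f'(a), f''(a), f'''(a), f''''(a).
target_derivs = [2, -3, 5, 7, 11]

# Taylor coefficients: divide the k-th derivative value by k!.
taylor_coeffs = [Fraction(d, factorial(k)) for k, d in enumerate(target_derivs)]

# The k-th derivative of the polynomial at a recovers each target value exactly.
recovered = [kth_derivative_at_center(taylor_coeffs, k)
             for k in range(len(target_derivs))]
print(recovered == target_derivs)
```

Exact fractions are used so that the comparison is not blurred by floating-point rounding; dropping the division by k! makes the recovered values come out too large by exactly that factor.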

3. Why are the Taylor series of e^x, sin x, and cos x so clean?

Show answer

Because their derivatives at 0 are trivial. Every derivative of e^x is e^x, and e^0 = 1, so every derivative at 0 is 1 and the k-th coefficient is just 1/k!: e^x = 1 + x + x²/2! + x³/3! + .... The derivatives of sin at 0 cycle through 0, 1, 0, -1 (cos runs the same cycle one step ahead: 1, 0, -1, 0), killing the even powers for sin and the odd powers for cos and alternating the signs of the rest: sin x = x - x³/3! + x⁵/5! - ... and cos x = 1 - x²/2! + x⁴/4! - .... The earlier lessons paid off.

4. How is the small-angle approximation sin(x) ≈ x a Taylor result?

Show answer

It is the first-order Taylor term of sine at 0: sin(x) ≈ x. The next term reveals how good the approximation is and how to improve it: sin(x) ≈ x - x³/6. The intuition introduced in the trig lesson (“the slope of sine at 0 is 1”) was really the first slice of an infinite series.
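To see how much the cubic term buys, here is a small numerical sketch. The sample angles (in radians) and the function name are arbitrary choices for illustration.

```python
import math

def small_angle_report(x):
    """Return (first-order estimate, third-order estimate, true sine) at x radians."""
    first_order = x
    third_order = x - x**3 / 6
    return first_order, third_order, math.sin(x)

for x in (0.1, 0.5, 1.0):
    first, third, true = small_angle_report(x)
    print(f"x={x}: x alone -> error {abs(first - true):.1e}, "
          f"x - x^3/6 -> error {abs(third - true):.1e}")
```

The error of plain x is bounded by the size of the first omitted term, x³/6, which is why the approximation is excellent for small angles and degrades quickly as x grows toward 1.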

5. How is L’Hôpital’s rule a Taylor result?

Show answer

L’Hôpital is the first-order Taylor ratio. Near a point a, the numerator is approximately f(a) + f'(a)(x-a) and the denominator is approximately g(a) + g'(a)(x-a). When both vanish at a, the constants drop out, the shared (x-a) cancels, and the ratio tends to f'(a)/g'(a), provided g'(a) ≠ 0. L’Hôpital is Taylor wearing a different hat.
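A quick numerical check of that claim, using a 0/0 example chosen for illustration (it does not come from the lesson): f(x) = e^(2x) - 1 and g(x) = sin(3x) at a = 0, where f'(0) = 2 and g'(0) = 3.

```python
import math

def ratio(x):
    """f(x)/g(x) with f(x) = e^(2x) - 1 and g(x) = sin(3x); both vanish at x = 0."""
    return math.expm1(2 * x) / math.sin(3 * x)

# First-order Taylor prediction for the limit: f'(0)/g'(0) = 2/3.
predicted_limit = 2 / 3

for x in (0.1, 0.01, 0.001):
    print(f"x={x}: ratio={ratio(x):.6f}  predicted={predicted_limit:.6f}")
```

As x shrinks, the computed ratio settles toward 2/3; the remaining gap comes from the higher-order Taylor terms that the first-order ratio ignores.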

6. Why is Taylor “the” calculus idea machine learning leans on most?

Show answer

Because the field’s core training algorithms are Taylor approximations of the loss. Gradient descent approximates the loss by its tangent plane (first-order Taylor) and takes a step downhill on that linear model. Newton’s method keeps the second-derivative term too (second-order Taylor), building a parabola and jumping to its bottom. The neural tangent kernel comes from a first-order Taylor expansion of a network in its parameters around initialization. And numerical libraries and hardware evaluate sin, exp, and log with truncated Taylor-style polynomials.
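The difference between the first-order and second-order steps can be sketched on a toy problem. Everything here is an assumption made for illustration: a one-parameter quadratic loss with its minimum at θ = 3, a learning rate of 0.1, and hand-written derivative functions.

```python
def loss(theta):
    """Hypothetical one-parameter loss with its minimum at theta = 3."""
    return (theta - 3.0) ** 2 + 1.0

def grad(theta):
    """First derivative of the loss."""
    return 2.0 * (theta - 3.0)

def hess(theta):
    """Second derivative of the loss (constant for a quadratic)."""
    return 2.0

def gradient_descent_step(theta, learning_rate):
    """Step downhill on the first-order model loss(theta) + grad * step."""
    return theta - learning_rate * grad(theta)

def newton_step(theta):
    """Jump to the minimum of the second-order model (a parabola)."""
    return theta - grad(theta) / hess(theta)

theta0 = 0.0
print("gradient descent:", gradient_descent_step(theta0, learning_rate=0.1))
print("Newton:          ", newton_step(theta0))
```

Because this toy loss is itself a parabola, the second-order model is exact and the Newton step lands on the minimum in one move, while the gradient step only moves part of the way. On a real loss neither model is exact, so both methods iterate.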

Try it yourself, part 1: build cosine, then watch it converge

Pen and paper (a calculator helps), about 7 minutes. Build cosine’s Taylor series at a = 0 and use it to approximate cos(1) (where the true value is 0.54030).

Steps. (1) Compute the derivatives of cos x at 0 through sixth order (they repeat with period four, so the first four give you the pattern) and write the first four nonzero terms of the series. (2) Evaluate the partial sums at x = 1, keeping 1, 2, 3, and 4 terms in turn, and compare each to the true cos(1) ≈ 0.54030.

Show answer

Derivatives at 0: cos(0) = 1, -sin(0) = 0, -cos(0) = -1, sin(0) = 0, then cos(0) = 1 again. The derivative values are 1, 0, -1, 0, 1, 0, -1, ..., and dividing the k-th one by k! gives the coefficients, so only even powers survive, with alternating signs:

cos x = 1 - x²/2! + x⁴/4! - x⁶/6! + ...

Convergence at x = 1:

1 term  (1)                       ->  1.00000
2 terms (1 - 1/2)                  ->  0.50000
3 terms (1 - 1/2 + 1/24)           ->  0.54167
4 terms (1 - 1/2 + 1/24 - 1/720)   ->  0.54028

The partial sums march 1.00000 -> 0.50000 -> 0.54167 -> 0.54028, closing in on cos(1) ≈ 0.54030: after just four terms the error is about 0.00002, so the two agree to four decimals when rounded. Each new term lets the polynomial wrap one more order tighter around the cosine curve near 0.
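The table above can be reproduced with a few lines of code. This is a minimal sketch; the function name is made up for the example, and `math.cos` serves as the reference value.

```python
import math

def cos_partial_sum(x, n_terms):
    """Sum the first n_terms nonzero terms of cosine's Taylor series at 0."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(n_terms))

for n in range(1, 5):
    approx = cos_partial_sum(1.0, n)
    error = abs(approx - math.cos(1.0))
    print(f"{n} term(s): {approx:.5f}  (error {error:.1e})")
```

Because the series alternates with shrinking terms at x = 1, each error is smaller than the first term left out, which is why the partial sums overshoot and undershoot the true value in turn.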

Try it yourself, part 2: Newton’s method (Taylor for zeros)

About 5 minutes. Newton’s method replaces a function with its first-order Taylor approximation (the tangent line) and jumps to where that line crosses zero, with update x_(n+1) = x_n - f(x_n)/f'(x_n). Use it to approximate √5, the positive zero of f(x) = x² - 5 (so f'(x) = 2x), starting from x_0 = 2. Run three steps and compare to √5 ≈ 2.236068.

Show answer

x_0 = 2
f(2)  = 4 - 5 = -1,    f'(2)  = 4
x_1 = 2 - (-1)/4 = 2 + 0.25 = 2.25

f(2.25)   = 5.0625 - 5 = 0.0625,   f'(2.25)   = 4.5
x_2 = 2.25 - 0.0625/4.5 ≈ 2.25 - 0.01389 = 2.23611

f(2.23611)  ≈ 0.000192,             f'(2.23611) ≈ 4.47222
x_3 = 2.23611 - 0.000192/4.47222 ≈ 2.23611 - 0.000043 = 2.236068

After three steps, x_3 = 2.236068, matching √5 = 2.236068 to six decimals. The convergence is dramatic because the tangent line is an excellent local stand-in for f: the error shrinks from about 1.4e-2 at x_1 to 4.3e-5 at x_2 to 4e-10 at x_3, so each step roughly doubles the number of correct digits (the six decimals shown actually undersell x_3). One of the most widely used root-finders in scientific computing is, underneath, this lesson’s first-order Taylor approximation.
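The same three steps can be run in code to watch the error collapse. This is a minimal sketch with f and f' hard-coded for f(x) = x² - 5; the function name is invented for the example.

```python
import math

def newton_sqrt5(x0, steps):
    """Run Newton's method on f(x) = x^2 - 5 and return every iterate."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        f = x * x - 5.0        # f(x)
        f_prime = 2.0 * x      # f'(x), the slope of the tangent line
        xs.append(x - f / f_prime)
    return xs

iterates = newton_sqrt5(2.0, 3)
for n, x in enumerate(iterates):
    print(f"x_{n} = {x:.9f}   error = {abs(x - math.sqrt(5)):.1e}")
```

After three steps the error is below 1e-9, noticeably better than the six decimals carried in the hand computation.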

Flashcards

Ten cards spanning the synthesis. Click any card to reveal the answer. Use the Print flashcards button to lay out the full set as one card per page, ready to print or save as a PDF for offline review.

Q. State the Taylor expansion of f near a point a.

f(x) ≈ f(a) + f'(a)(x-a) + (f''(a)/2!)(x-a)² + (f'''(a)/3!)(x-a)³ + .... Each term uses one more derivative of f at a, divided by k!, attached to a power of (x-a).

Q. Why are the 1/k! factorials there?

To make the matching property exact. Differentiating (x-a)^k exactly k times produces k!; dividing by k! cancels it, so the k-th derivative of the k-th term at a is exactly f^(k)(a). The polynomial and f then share value, slope, concavity, … at a.

Q. What is the Taylor series of e^x at 0, and why is it so clean?

e^x = 1 + x + x²/2! + x³/3! + .... Every derivative of e^x is e^x, and e^0 = 1, so every derivative at 0 is 1 and the k-th coefficient is 1/k!. Differentiating the series term by term reproduces it, the polynomial signature of the self-derivative property.

Q. What are the Taylor series of sin and cos at 0?

sin x = x - x³/3! + x⁵/5! - x⁷/7! + ... (odd powers, alternating signs). cos x = 1 - x²/2! + x⁴/4! - x⁶/6! + ... (even powers, alternating signs). The derivatives at 0 cycle through 0, 1, 0, -1 for sin and 1, 0, -1, 0 for cos, which zeroes one parity and alternates the signs of the other.

Q. How is the small-angle sin(x) ≈ x a Taylor result?

It is the first-order Taylor term of sine at 0. The next term sharpens it: sin(x) ≈ x - x³/6. At x = 0.5: plain x gives 0.5, x - x³/6 gives 0.4792, true sin(0.5) = 0.4794. The “approximation” was the first slice of a series all along.

Q. How is L’Hôpital’s rule a Taylor result?

It is the first-order Taylor ratio. Near a 0/0 point, f(x) ≈ f(a) + f'(a)(x-a) and g(x) ≈ g(a) + g'(a)(x-a); the constants vanish, the (x-a) cancels, leaving f'(a)/g'(a) (provided g'(a) ≠ 0). L’Hôpital is Taylor’s first-order ratio in disguise.

Q. What is the matching property?

The Taylor polynomial and the function share value, slope, second derivative, third derivative, and so on, all the way up, at the expansion point a. The factorials make this exact; without them the matching fails.

Q. What is Newton’s method, in Taylor terms?

It replaces f with its first-order Taylor approximation (the tangent line) and jumps to where that line crosses zero. Update: x_(n+1) = x_n - f(x_n)/f'(x_n). For √5 from x_0 = 2: 2 → 2.25 → 2.23611 → 2.236068, matching to six decimals in three steps.

Q. Why is gradient descent a first-order Taylor step?

Because each step approximates the loss near the current parameters by its tangent plane, loss(θ + step) ≈ loss(θ) + gradient·step, and moves in the steepest-downhill direction of that linear model. Newton’s method keeps the second-order term too (the Hessian, a parabola).

Q. Where else does Taylor show up in ML and computation?

The neural tangent kernel comes from a first-order Taylor expansion of a network in its parameters around initialization (making infinite-width training tractable). And numerical libraries and hardware evaluate sin, exp, and log with truncated Taylor-style polynomials, because additions and multiplications are what the silicon does directly.