Lesson: Limits done carefully
The limit has been hiding under every derivative in this track. Back in the second lesson, the derivative was defined as the value the rise-over-run ratio (the change in the function over a small step) approaches as that small step shrinks to zero. We leaned on the word “approaches” without examining it. This lesson examines it, because the idea is more subtle than it looks, and because the awkward zero-over-zero forms the rate definition keeps producing need a tool to handle them.
Three threads here, and they serve one purpose: to pin down what a limit is and to compute the ones that resist a simple plug-in.
Most limits are just “plug in”
The limit of a function asks: what value does the output head toward as the input heads toward the target point? For the well-behaved (continuous) functions you meet most of the time, the answer is simply the value there. To find the limit of the input squared as the input approaches 2, plug in 2 for the input and get 4. Done. For continuous functions, the limit equals the value, and there is nothing more to say.
lim (x->2) x^2 = 2^2 = 4
The reason limits are worth a whole lesson is the cases where plugging in fails. The derivative is the headline example: the change in the function over a small step becomes zero-over-zero if you set the small step to zero directly, which is meaningless. Zero-over-zero is an indeterminate form: it could approach any value, or none, depending on how the top and bottom shrink. Computing such limits is the real content.
What “approaches” precisely means
To handle the tricky cases (and to be sure the easy ones are really right), mathematicians made “approaches” precise. The definition, called the epsilon-delta definition, says:
The limit of the function as the input approaches the target point equals L means: for every target precision epsilon greater than zero you demand on the output, there is an input window delta greater than zero around the target point such that whenever the input is within delta of the target point (but not equal to it), the output is within epsilon of L.
That is the formal statement, named here for completeness, but the takeaway is the plain-English version, not the Greek letters:
You can force the output as close to L as you want, just by making the input close enough to the target point.
If someone demands the output be within one-thousandth of L, you can find a small enough window around the target point that guarantees it. If they demand one-millionth, you can find an even smaller window. No matter how tight the target, a window exists. That “for any demanded precision, a window exists” is what it means for the limit to be L.
Epsilon-delta in action
See it work on the limit of the input squared as the input approaches 2, which equals 4. Someone demands the output be within epsilon of 4; you must find a window of inputs around 2 that delivers it. The gap in the output is:
| x^2 - 4 | = | x - 2 | · | x + 2 |
When the input is near 2, the factor (the distance from the input to negative 2) is near 4, so the output gap is roughly 4 times the distance from the input to 2. To make that smaller than epsilon, keep the distance from the input to 2 smaller than about epsilon over 4. So choosing the window delta to be about epsilon over 4 works: any input within epsilon over 4 of 2 lands within (about) epsilon of 4.
To make it airtight rather than approximate, control the wandering factor (the distance from the input to negative 2) first. If you agree to stay within 1 of 2 (so the input lies between 1 and 3), then that factor never exceeds 5. Now the output gap equals the distance from the input to 2 times that factor, which stays below 5 times the distance from the input to 2, so to force the output gap below epsilon it is enough to keep the distance from the input to 2 below epsilon over 5. Choosing delta to be the smaller of 1 and epsilon over 5 then guarantees the output gap stays below epsilon. For a demand of epsilon equals 0.1, that is delta equals 0.02; for epsilon equals 0.001, delta equals 0.0002; tighten the target and the window shrinks to match. The structure is the whole point: hand me any precision epsilon, and I hand you back a window delta that delivers it. The definition is mechanical once you see it as a challenge-and-response.
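The challenge-and-response can be spot-checked numerically. The sketch below is only an illustration, not a proof: the helper names `delta_for` and `worst_gap` are made up for this example, and a finite grid of sample inputs stands in for "every input in the window".

```python
def delta_for(epsilon: float) -> float:
    """Window from the text: the smaller of 1 and epsilon over 5."""
    return min(1.0, epsilon / 5.0)


def worst_gap(epsilon: float, samples: int = 10_000) -> float:
    """Largest |x^2 - 4| seen on a grid of inputs strictly inside the window."""
    delta = delta_for(epsilon)
    worst = 0.0
    for i in range(1, samples):
        offset = delta * i / samples  # 0 < offset < delta, so x is never exactly 2
        for x in (2.0 - offset, 2.0 + offset):
            worst = max(worst, abs(x * x - 4.0))
    return worst


# Hand in a precision, get back a window, and check that the window delivers.
for eps in (0.1, 0.001, 1e-6):
    print(eps, delta_for(eps), worst_gap(eps) < eps)
```

Each printed row shows the demanded precision, the window handed back, and whether every sampled input in that window kept the output gap below the demand.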
Why bother with this machinery instead of just trusting intuition? Because “gets close to” hides real subtleties: functions that jump, functions that oscillate infinitely fast near a point, functions defined one way at the target point and another way around it. The epsilon-delta definition handles all of them with one uniform standard, and it is the bedrock the rest of calculus is proved on.
When there is no limit at all
The precise definition earns its keep by ruling cases out, not just in. Consider the sine of one over the input, as the input approaches zero. As the input shrinks, one over the input races off toward infinity, so the sine swings between negative 1 and 1 faster and faster, infinitely many full oscillations in any window around zero. There is no single value it settles toward. By the epsilon-delta standard this is decisive: pick epsilon equals 0.5, and no matter how small a window delta you choose around zero, the function still takes values near positive 1 and near negative 1 inside it, so it cannot stay within 0.5 of any candidate value L. The limit does not exist, and the definition says so cleanly, where a vague “gets close to” would leave you guessing. Distinguishing “approaches a value” from “has no limit” is exactly the kind of judgment the formal definition makes reliable.
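To make the "no window works" claim concrete, here is a small sketch that, for any window delta, produces two inputs inside it where the sine of one over the input is (up to rounding) 1 and negative 1. The function name `witnesses` is hypothetical, chosen for this illustration; the inputs come from the standard facts that sine equals 1 at pi over 2 plus whole turns and negative 1 at three pi over 2 plus whole turns.

```python
import math


def witnesses(delta: float) -> tuple:
    """Return two inputs in (0, delta): sin(1/x) is 1 at the first, -1 at the second."""
    # Choose a whole number k with 2*pi*k > 1/delta, so both inputs fall below delta.
    k = math.floor(1.0 / (2.0 * math.pi * delta)) + 1
    x_high = 1.0 / (math.pi / 2.0 + 2.0 * math.pi * k)       # 1/x_high is a peak of sine
    x_low = 1.0 / (3.0 * math.pi / 2.0 + 2.0 * math.pi * k)  # 1/x_low is a trough of sine
    return x_high, x_low


for delta in (0.1, 1e-3, 1e-6):
    x_high, x_low = witnesses(delta)
    print(delta, math.sin(1.0 / x_high), math.sin(1.0 / x_low))
```

Since the two outputs sit 2 apart, no candidate L can be within 0.5 of both, however small the window.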
L’Hôpital’s rule: rescuing 0/0
Now the practical tool. Consider:
lim (x->0) sin(x) / x
Plug in zero for the input and you get sine of zero over zero, which is zero-over-zero, indeterminate. But the limit clearly exists: graph sine of the input over the input and it heads smoothly toward 1 as the input approaches zero. How do you compute it?
L’Hôpital’s rule: if the limit of the numerator over the denominator has the indeterminate form zero-over-zero or infinity-over-infinity, then it equals the limit of the derivatives of the top and bottom, taken separately, provided that second limit exists:
lim (x->a) f(x)/g(x) = lim (x->a) f'(x)/g'(x)
Differentiate the numerator and the denominator each on its own (this is not the quotient rule, the two are differentiated separately), then try the limit again. For sine of the input over the input:
lim (x->0) sin(x)/x = lim (x->0) cos(x)/1 = cos(0)/1 = 1
The derivative of sine is cosine (from the trig lesson), the derivative of the input is 1, and now plugging in zero for the input is fine. The limit is 1, which is exactly the small-angle approximation that sine of the input is approximately the input from that earlier lesson, now proved rather than asserted.
A second example, with Euler’s number. The limit, as the input approaches zero, of Euler’s number raised to the input minus 1, all over the input, is also zero-over-zero (since Euler’s number raised to the power zero, minus 1, is zero). Apply L’Hôpital, using that Euler’s number raised to the input is its own derivative:
lim (x->0) (e^x - 1)/x = lim (x->0) e^x/1 = e^0/1 = 1
Applying it twice. Some limits stay indeterminate after one pass. Take the limit, as the input approaches zero, of 1 minus cosine of the input, all over the input squared. Plugging in gives zero-over-zero. One round of L’Hôpital:
lim (x->0) (1 - cos(x))/x^2 = lim (x->0) sin(x)/(2x)
That is still zero-over-zero. Apply L’Hôpital again:
lim (x->0) sin(x)/(2x) = lim (x->0) cos(x)/2 = cos(0)/2 = 1/2
The answer is one-half. When one application does not resolve the form, you apply the rule again, as long as the indeterminate form persists.
The infinity-over-infinity case. L’Hôpital handles infinity-over-infinity as well as zero-over-zero. Take the limit, as the input approaches infinity, of the natural logarithm of the input over the input. Both the top and the bottom run off to infinity, so it is infinity-over-infinity, indeterminate. Apply the rule, using the natural-log derivative we found in the implicit-differentiation lesson (the derivative of the natural log is one over the input):
lim (x->∞) (ln x)/x = lim (x->∞) (1/x)/1 = lim (x->∞) 1/x = 0
The limit is zero, which says something concrete: the input grows faster than its natural logarithm, so their ratio collapses to nothing in the long run. This is the formal version of “logarithms grow slowly,” a fact that matters whenever you compare how fast two quantities scale.
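The four limits worked above can be sanity-checked by evaluating each ratio at inputs heading toward the target and watching the gap to the claimed value shrink. This is a numerical sketch, not a proof; the helper `gaps` and the sample inputs are choices made for this illustration.

```python
import math


def gaps(f, inputs, limit):
    """Distance between f(x) and the claimed limit at each sample input."""
    return [abs(f(x) - limit) for x in inputs]


near_zero = [1e-1, 1e-2, 1e-3]  # inputs heading toward 0
far_out = [1e2, 1e4, 1e6]       # inputs heading toward infinity

# (label, ratio, sample inputs, value L'Hôpital's rule gave in the text)
cases = [
    ("sin(x)/x", lambda x: math.sin(x) / x, near_zero, 1.0),
    ("(e^x - 1)/x", lambda x: (math.exp(x) - 1.0) / x, near_zero, 1.0),
    ("(1 - cos(x))/x^2", lambda x: (1.0 - math.cos(x)) / (x * x), near_zero, 0.5),
    ("ln(x)/x", lambda x: math.log(x) / x, far_out, 0.0),
]

for label, f, inputs, limit in cases:
    print(label, gaps(f, inputs, limit))
```

Pushing the inputs much closer to zero than shown would eventually run into floating-point cancellation in the numerators, which is a limitation of the check and not of the limits themselves.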
Why L’Hôpital works
The rule is not magic; it is reading the leading behavior. Near the target point, a function is approximately its value plus its slope times the displacement from the point (this is the first-order Taylor approximation, the seed of the final lesson). The same holds for the denominator. When both the numerator and the denominator are zero at the point (the zero-over-zero case), the constant terms vanish, and:
f(x)/g(x) ≈ f'(a)·(x - a) / ( g'(a)·(x - a) ) = f'(a)/g'(a)
The shared displacement factor cancels, leaving the ratio of the derivatives (provided the denominator’s slope at the point is not zero). L’Hôpital simply replaces each function by the rate that dominates its behavior near the point, which is exactly what determines where the ratio is heading.
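The cancellation is easy to see on a pair of functions with different slopes. The pair below, sine of twice the input over Euler’s number raised to three times the input minus 1, is an example chosen for this sketch and does not appear elsewhere in the lesson. Both are zero at the point 0, with slopes 2 and 3 there, so the first-order picture predicts the ratio heads toward two-thirds.

```python
import math

a = 0.0  # target point


def f(x: float) -> float:
    return math.sin(2.0 * x)        # f(a) = 0, slope f'(a) = 2


def g(x: float) -> float:
    return math.exp(3.0 * x) - 1.0  # g(a) = 0, slope g'(a) = 3


slope_ratio = 2.0 / 3.0  # f'(a) / g'(a)

for x in (0.1, 0.01, 0.001):
    true_ratio = f(x) / g(x)
    # Replace each function by slope times displacement; the (x - a) factors cancel.
    linear_ratio = (2.0 * (x - a)) / (3.0 * (x - a))
    print(x, true_ratio, linear_ratio)
```

The true ratio drifts toward the constant value of the linearized ratio as the input closes in on the point, which is the leading-behavior reading of the rule.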
Why this matters when you use AI
Limits are rarely something a machine-learning practitioner computes by hand, but they are the foundation the field’s guarantees rest on. Convergence analysis, the proofs that gradient descent actually settles toward a minimum rather than wandering forever, is stated and proved in the language of limits. Continuous-time models such as neural differential equations are the limit of a discrete update as the step size shrinks to zero, the same small-step-shrinks-to-zero move that defined the derivative. And universal approximation theorems, the results that say a neural network can approximate any reasonable function, are limit-based existence statements. The intuition you build here, that “approaches” means “can be forced arbitrarily close,” is the same intuition underneath every claim that an algorithm converges or that an architecture can represent what you need.
Common pitfalls
Using L’Hôpital when the form is not indeterminate. The rule applies only to zero-over-zero or infinity-over-infinity. If plugging in gives a determinate value like three over five, or a form like two over zero that is undefined but not indeterminate, L’Hôpital does not apply, and using it generally gives a wrong answer. Always check the form first.
Using the quotient rule instead. L’Hôpital differentiates the numerator and denominator separately, giving the derivative of the numerator over the derivative of the denominator, not the quotient rule’s combination. They are different operations; do not confuse them.
Stopping when the form is still zero-over-zero. If one application leaves another indeterminate form, apply the rule again (and again), as in the 1 minus cosine example over the input squared. Stop only when plugging in gives a determinate value.
Setting the variable equal to the limit point too early. In a limit you simplify first and substitute last. Setting the small step to zero in the rise-over-run ratio before simplifying gives zero-over-zero; the whole technique is about handling the approach without ever landing exactly on the point.
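The first pitfall is worth seeing with numbers. The ratio below, the input squared plus 3 over the input plus 5, is an example picked for this sketch: plugging in zero gives three over five, a determinate value, so the rule does not apply. Differentiating top and bottom anyway gives twice the input over 1, which heads toward zero, a different and wrong answer.

```python
def ratio(x: float) -> float:
    """The actual function: (x^2 + 3) / (x + 5)."""
    return (x * x + 3.0) / (x + 5.0)


def misapplied(x: float) -> float:
    """Derivative of the top over derivative of the bottom: 2x / 1."""
    return (2.0 * x) / 1.0


for x in (0.1, 0.001, 1e-6):
    # The first column heads toward 0.6; the second heads toward 0.
    print(x, ratio(x), misapplied(x))
```

Checking the form first (here, simply plugging in) would have settled the limit at three over five with no rule needed.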
What you should remember
- A limit is the value the output approaches as the input approaches the target point, made precise by epsilon-delta: for any output precision epsilon you demand, there is an input window delta that delivers it. The plain version: you can force the output as close to L as you like by making the input close enough to the target point. For continuous functions the limit is just the value (plug in).
- Indeterminate forms (zero-over-zero, infinity-over-infinity) are the interesting case, and L’Hôpital’s rule handles them: replace the numerator over the denominator with the derivative of the numerator over the derivative of the denominator (each differentiated separately), then try again, repeating if the form persists. So the limit of sine of the input over the input as the input approaches zero is 1, and the limit of Euler’s number raised to the input minus 1, all over the input, as the input approaches zero, is also 1.
- L’Hôpital works because it keeps only the leading behavior: near the point, each function is approximately its slope times the displacement, the shared factor cancels, and the ratio of the derivatives remains. This first-order picture is the seed of Taylor series, and the limit concept itself underwrites every convergence and approximation guarantee in machine learning.
Every derivative was a limit, and now the limit itself is on solid ground: “approaches” means “can be forced arbitrarily close,” and the stubborn zero-over-zero forms yield to L’Hôpital by swapping functions for their rates. With differentiation fully built, the next lesson turns to the other half of calculus, integration, and the theorem that ties the two together.