Lesson: Implicit differentiation and related rates
Every derivative in this track so far started from y written as a clean function of the input. But most relationships in the wild are not like that. The equation of a circle, x squared plus y squared equals 25, ties x and y together without making either a clean function of the other; to solve for y you would have to split into an upper half and a lower half and drag in square roots. Yet the circle has a definite tangent line at every point on it. How do you find its slope without first untangling the equation?
The answer is implicit differentiation, and the good news is that it introduces no new machinery. It is the chain rule from two lessons ago, applied to a relationship rather than a function.
The move: treat y as a function of x, then differentiate everything
Here is the whole idea. Even though you do not have a formula for y in terms of x, the relationship still makes y depend on x: pick an x on the curve and the equation pins down the matching y. So treat y as an unknown function of x and differentiate both sides of the equation with respect to the input.
The only subtlety is what happens to terms containing y. Since y is secretly a function of x, any expression in y is a composition, and differentiating it invokes the chain rule. The key instance:
d/dx( y^2 ) = 2y · dy/dx
This is exactly the chain rule: y squared is really y-of-x, all squared, an outer “square” wrapped around the inner function y-of-x, so its derivative is two times y-of-x times the inner derivative, dy/dx (the derivative of the output with respect to the input). That factor dy/dx is the thing we are solving for; it rides along on every y term.
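The chain-rule instance above can be sanity-checked numerically. The sketch below is illustrative only: it picks y = sin x as a stand-in for the unknown function (an arbitrary choice, not something from this lesson) and compares a finite-difference derivative of y squared with 2y · dy/dx. The helper name central_diff is likewise made up for the example.

```python
import math

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical stand-in for the unknown function y(x); any smooth function works.
def y(x):
    return math.sin(x)

def y_squared(x):
    return y(x) ** 2

x0 = 0.7
numeric = central_diff(y_squared, x0)          # d/dx (y^2), measured directly
chain_rule = 2 * y(x0) * central_diff(y, x0)   # 2y * dy/dx
print(numeric, chain_rule)  # the two values should agree closely
```

Swapping in a different smooth y(x) should leave the agreement intact, which is the point: the factor dy/dx appears no matter what y happens to be.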
Worked example: the circle
Differentiate both sides of x squared plus y squared equals 25 with respect to the input:
d/dx(x^2) + d/dx(y^2) = d/dx(25)
2x + 2y · dy/dx = 0
The x squared term differentiates normally to two-x; the y squared term picks up the chain-rule derivative of the output with respect to the input; the constant 25 differentiates to 0. Now solve for that derivative:
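dy/dx = -x / y
A clean formula for the slope at every point on the circle where y is not 0 (at the two points where y is 0 the tangent is vertical and the slope is undefined), found without ever solving for y.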
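As a rough check, the formula can be compared against a direct numerical derivative on the upper half of the circle, where y equals the square root of 25 minus x squared. This is a sketch under that assumption; the function names are invented for the example.

```python
import math

def upper_half(x):
    """Upper semicircle of x^2 + y^2 = 25, solved explicitly for y."""
    return math.sqrt(25 - x * x)

def implicit_slope(x, y):
    """Slope from implicit differentiation: dy/dx = -x / y."""
    return -x / y

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

sample_xs = (-4.0, -1.0, 0.0, 2.0, 3.0)
for x0 in sample_xs:
    y0 = upper_half(x0)
    # Direct derivative of the explicit half vs. the implicit formula.
    print(x0, central_diff(upper_half, x0), implicit_slope(x0, y0))
```

The lower half would need its own explicit formula for the direct derivative, while implicit_slope works on both halves unchanged.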
A geometric check
Test it at the point (3, 4), which is on the circle (9 plus 16 equals 25). The formula gives a slope of negative three-quarters there. Is that right? The radius from the origin out to (3, 4) has slope four-thirds. A circle’s tangent line is always perpendicular to its radius, and the slope perpendicular to four-thirds is its negative reciprocal, negative three-quarters. The implicit derivative matches the geometry exactly: dy/dx equals negative x over y, the tangent slope.
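The perpendicularity argument can be replayed in exact arithmetic. The sketch below uses Python's fractions module so that the product of the two slopes comes out as exactly negative one; the function names are illustrative.

```python
from fractions import Fraction

def tangent_slope(x, y):
    """Slope from the implicit derivative dy/dx = -x / y."""
    return -x / y

def radius_slope(x, y):
    """Slope of the segment from the origin to (x, y)."""
    return y / x

x0, y0 = Fraction(3), Fraction(4)
assert x0**2 + y0**2 == 25  # the point really is on the circle

m_tangent = tangent_slope(x0, y0)
m_radius = radius_slope(x0, y0)
print(m_tangent)             # -3/4
print(m_radius)              # 4/3
print(m_tangent * m_radius)  # -1, the signature of perpendicular lines
```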
Why this is just the chain rule
It is worth saying plainly: implicit differentiation is not a separate technique to memorize. Every y term you differentiate is a composition (some function of y, where y is itself a function of x), so the chain rule applies and deposits a derivative of the output with respect to the input. Differentiating x squared plus y squared equals 25 term by term, the x squared is an ordinary derivative and the y squared is a chain-rule derivative, and collecting the derivative-of-the-output terms and solving is just algebra. If you can do the chain rule, you can do this.
Worked example: cross-checking against a known derivative
Take the hyperbola x times y equals 1. Differentiate both sides; the left side is a product, so the product rule applies:
d/dx(x · y) = (1)·y + x·(dy/dx) = 0
so y + x·dy/dx = 0, giving dy/dx = -y / x. Now, this curve we can solve: y equals one over x, so substituting gives dy/dx = -(1/x) / x, which is negative one over x squared. That is exactly the derivative of one over x from the power-rule lesson (the negative-exponent case, where the derivative of x to the negative one is negative x to the negative two). The implicit method and the direct method agree, which is the kind of consistency that should make you trust implicit differentiation on curves you cannot solve.
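The agreement can also be seen numerically. This sketch evaluates the implicit result (negative y over x, with y equal to one over x), the power-rule result (negative one over x squared), and a finite-difference estimate at a few sample points; the names are illustrative.

```python
def hyperbola(x):
    """Explicit form of x * y = 1."""
    return 1.0 / x

def implicit_slope(x, y):
    """From y + x * dy/dx = 0."""
    return -y / x

def power_rule_slope(x):
    """Derivative of x**-1 by the power rule."""
    return -1.0 / x**2

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

sample_xs = (0.5, 1.0, 2.0, -3.0)
for x0 in sample_xs:
    y0 = hyperbola(x0)
    print(x0, implicit_slope(x0, y0), power_rule_slope(x0),
          central_diff(hyperbola, x0))
```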
Worked example: a relation you cannot untangle
Now one where solving for y is not worth the trouble: x squared plus x-times-y plus y squared equals 7. (The equation is quadratic in y, so the quadratic formula would produce two branches full of square roots, but nothing below needs them.) Differentiate term by term. The x squared gives two-x. The x-times-y is a product, giving y plus x times the derivative of the output with respect to the input. The y squared gives two-y times that same derivative. The constant gives 0:
2x + y + x·dy/dx + 2y·dy/dx = 0
Collect the dy/dx terms: (x + 2y)·dy/dx = -(2x + y), so
dy/dx = -(2x + y) / (x + 2y)
There is no clean way to write y as a function of x for this curve (a tilted ellipse), yet implicit differentiation hands you the slope at every point directly. This is the real payoff: the method does not care whether the relation can be untangled.
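One way to test this formula without an explicit expression for y is to solve the relation numerically. The sketch below starts from the point (1, 2), which lies on the curve since 1 plus 2 plus 4 equals 7, uses Newton's method to find the matching y at two nearby x values, and compares the resulting finite-difference slope with the formula. The helper names and the choice of point are assumptions made for the illustration.

```python
def F(x, y):
    """The relation written as F(x, y) = 0."""
    return x * x + x * y + y * y - 7

def solve_y(x, y_guess, steps=50):
    """Newton's method in y for fixed x; dF/dy = x + 2y."""
    y = y_guess
    for _ in range(steps):
        y -= F(x, y) / (x + 2 * y)
    return y

def implicit_slope(x, y):
    """Slope from implicit differentiation."""
    return -(2 * x + y) / (x + 2 * y)

x0, y0 = 1.0, 2.0   # on the curve: 1 + 2 + 4 = 7
h = 1e-6
# Slope measured by nudging x and re-solving the relation for y each time.
numeric = (solve_y(x0 + h, y0) - solve_y(x0 - h, y0)) / (2 * h)
print(numeric, implicit_slope(x0, y0))  # both should be close to -0.8
```

The numerical route needs a fresh root-solve for every slope, while the implicit formula gives it from the coordinates alone.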
Worked example: deriving a brand-new derivative
Implicit differentiation can also hand you derivatives you did not have before. Consider the relation Euler’s number raised to the y equals x. (This is just the natural logarithm in disguise: it says y equals the natural log of x, since the logarithm is what undoes the exponential.) Differentiate both sides with respect to the input. The left side needs the chain rule on Euler’s number raised to the y, using the self-derivative property of Euler’s number from the e lesson:
d/dx( e^y ) = e^y · dy/dx and d/dx( x ) = 1
so e^y · dy/dx = 1, giving dy/dx = 1 / e^y. But e^y equals x by the original relation, so:
dy/dx = 1/x
That is the derivative of the natural log of x, and we just produced it without any special logarithm rule, purely by differentiating Euler’s number raised to the y equals x implicitly and substituting back. The natural logarithm’s derivative, one over x, falls out of Euler’s number’s self-derivative property plus this one technique.
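A quick numerical check of this result, offered as a sketch: compare a finite-difference derivative of math.log with one over x, and with the intermediate form one over e to the y before the substitution. The sample points are arbitrary.

```python
import math

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

sample_xs = (0.5, 1.0, 2.0, 10.0)
for x0 in sample_xs:
    y0 = math.log(x0)                 # the y that satisfies e^y = x0
    before_subst = 1 / math.exp(y0)   # dy/dx = 1 / e^y
    after_subst = 1 / x0              # dy/dx = 1 / x
    print(x0, central_diff(math.log, x0), before_subst, after_subst)
```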
Related rates: when both variables move in time
Implicit differentiation has a twin that shows up constantly in physics and engineering. Suppose x and y both change over time t, while staying tied by a constraint. Then their rates of change are linked, and differentiating the constraint with respect to time reveals how.
The classic example: a 10-foot ladder leans against a wall, its base x feet from the wall and its top y feet up, so x squared plus y squared equals 100 always holds. Differentiate with respect to time:
2x · dx/dt + 2y · dy/dt = 0
If the base slides outward at a horizontal rate of dx/dt = 1 foot per second, then solving for the vertical rate gives dy/dt = -(x / y) · dx/dt, which is negative x over y feet per second: the top slides down (negative), and the speed depends on where the ladder currently is. Put numbers on it: when the base is x equals 6 feet out, the top is at y equals 8 feet (since 6 squared plus 8 squared equals 100), and the top’s vertical rate is negative 6 over 8, which is negative 0.75 feet per second. Later, when the base has slid to x equals 8, the top is at y equals 6 and its vertical rate is negative 8 over 6, roughly negative 1.33 feet per second, noticeably faster. Near vertical (small x, large y) the top barely moves; near horizontal (large x, small y) it plummets. The two rates are bound together by the constraint, and implicit differentiation is what extracts the link.
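The ladder numbers can be reproduced with a few lines of code. This is a minimal sketch assuming the same 10-foot ladder and a base speed of 1 foot per second; the function and constant names are made up for the example.

```python
import math

LADDER_FT = 10.0

def top_height(x):
    """Height of the ladder's top when the base is x feet from the wall."""
    return math.sqrt(LADDER_FT**2 - x**2)

def top_rate(x, dx_dt):
    """dy/dt from 2x * dx/dt + 2y * dy/dt = 0, solved for dy/dt."""
    y = top_height(x)
    return -(x / y) * dx_dt

for x in (1.0, 6.0, 8.0, 9.9):
    # Base moving outward at 1 ft/s; negative rate means the top is descending.
    print(f"x = {x:4.1f} ft  y = {top_height(x):5.2f} ft  "
          f"dy/dt = {top_rate(x, 1.0):6.2f} ft/s")
```

The extra sample points at 1.0 and 9.9 feet are there to show the two extremes: a nearly vertical ladder whose top hardly moves, and a nearly flat one whose top drops much faster than the base slides.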
Why this matters when you use AI
Implicit differentiation is not in the inner loop of a typical training run the way the chain rule is, but it is the foundation of any method that works with constraints or fixed points rather than explicit formulas.
Constrained optimization (Lagrange multipliers and the KKT conditions behind constrained training problems) computes how an optimum shifts along a constraint surface, which is implicit differentiation of the constraint. Deep equilibrium models define a layer as the solution to a fixed-point equation, where a function of z and the input returns z itself, rather than an explicit formula, and computing gradients through such a layer uses implicit differentiation directly, which is what lets these models train without storing every intermediate step. Score-based diffusion models lean on related machinery when differentiating through the differential equations they solve. The common thread: whenever a relationship is defined implicitly rather than solved explicitly, this is the tool that still gets you a derivative.
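To make the fixed-point case concrete, here is a toy scalar sketch. Everything in it is an assumption chosen for illustration: the layer function tanh(W·z + x), the weight W = 0.5, and the plain iteration used as the solver; it is not how any particular library or model implements this. Differentiating the fixed-point equation z = f(z, x) with respect to x gives dz/dx = (∂f/∂x) / (1 − ∂f/∂z), which is evaluated at the solution alone and then compared with a finite-difference estimate.

```python
import math

W = 0.5  # hypothetical weight, small enough that the iteration contracts

def f(z, x):
    """Toy layer function; the 'layer output' is the z with z = f(z, x)."""
    return math.tanh(W * z + x)

def fixed_point(x, iters=200):
    """Find z = f(z, x) by simple repeated application."""
    z = 0.0
    for _ in range(iters):
        z = f(z, x)
    return z

def implicit_grad(x):
    """dz/dx from differentiating z = f(z, x) implicitly."""
    z = fixed_point(x)
    s = 1.0 - z * z      # tanh'(u) = 1 - tanh(u)^2, and tanh(u) = z here
    df_dz = W * s
    df_dx = s
    return df_dx / (1.0 - df_dz)

x0 = 0.3
h = 1e-6
numeric = (fixed_point(x0 + h) - fixed_point(x0 - h)) / (2 * h)
print(numeric, implicit_grad(x0))  # the two estimates should agree closely
```

Note that implicit_grad only uses the final z, not the 200 intermediate iterates, which mirrors the memory saving described above.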
Common pitfalls
Forgetting the derivative factor on y terms. Differentiating y squared gives two-y times the derivative of the output with respect to the input, not just two-y. That derivative is the chain-rule factor that appears because y is a function of x. Dropping it is the number-one implicit-differentiation error.
Not applying the product rule to mixed terms. A term like x-times-y needs the product rule: its derivative is y plus x times the derivative of the output with respect to the input. Treating it as a single variable’s derivative loses a term.
Trying to solve for y first. The whole point is that you often cannot, and you do not need to. Differentiate the relation as it stands, then solve the resulting equation for the derivative of the output with respect to the input algebraically.
Confusing the slope-derivative with the time-derivative. Implicit differentiation with respect to the input gives a slope (the derivative of the output with respect to the input); related rates differentiate with respect to time and give linked speeds (the horizontal rate and the vertical rate over time). Same technique, different variable to differentiate against; be clear which one the problem wants.
What you should remember
- Implicit differentiation finds the derivative of the output with respect to the input for a relation you cannot solve for y. Treat y as a function of x, differentiate both sides, attach that derivative to every y term (that is the chain rule), then solve algebraically. For the circle x squared plus y squared equals 25 this gives a derivative of negative x over y, the tangent slope, which checks out as perpendicular to the radius.
- It is the chain rule, not a new rule. Each y term is a composition (a function of y-of-x), so differentiating it produces the derivative-of-the-output factor. The hyperbola x times y equals 1 cross-checks against the direct derivative, negative one over x squared, confirming the method.
- Related rates is the time-based twin. When x and y both change with time under a constraint, differentiating the constraint with respect to time links their rates: the sliding ladder’s relation, two-x times the horizontal rate plus two-y times the vertical rate equals 0, ties how fast the top falls to how fast the base slides.
When you cannot solve a relationship for one variable, you do not have to: differentiate it as it stands, let the chain rule attach the derivative of the output with respect to the input to every y, and solve. The next lesson goes back to the foundation underneath all of this, the limit, and handles the awkward zero-over-zero forms that the rate definition can produce, with L’Hôpital’s rule.