Summary: Implicit differentiation
Every derivative so far assumed you could write y as a clean function of x. But many real relations between two quantities cannot be untangled that way: the circle x² + y² = 25 ties x and y together without making either a single-valued function of the other. Implicit differentiation finds the slope anyway, and it introduces no new machinery: it is the chain rule applied to a relationship. This is the scan-it-in-five-minutes version.
Core ideas
- The move. Treat y as an unknown function of x, differentiate both sides of the relation with respect to x, attach a dy/dx to every y term (that is the chain rule), then solve algebraically for dy/dx. You never solve the relation for y.
- The key instance. d/dx(y²) = 2y · dy/dx, because y² is the composition (y(x))²: outer derivative 2y times inner derivative dy/dx.
- The circle, worked. x² + y² = 25 differentiates to 2x + 2y·dy/dx = 0, so dy/dx = -x/y. At (3, 4) that is -3/4, which is perpendicular to the radius (slope 4/3), exactly the tangent slope, found without solving for y.
- It is just the chain rule. Every y term is a composition, so differentiating it deposits a dy/dx; the x terms differentiate normally. Mixed terms like xy use the product rule (y + x·dy/dx). The hyperbola xy = 1 cross-checks: implicit gives -y/x = -1/x², matching the power rule on 1/x.
- It can produce new derivatives. From e^y = x (i.e. y = ln x), differentiating gives e^y·dy/dx = 1, so dy/dx = 1/e^y = 1/x. The derivative of ln(x) is 1/x, with no special log rule.
- Related rates is the time twin. When x and y both change with time under a constraint, differentiate the constraint with respect to t to link the rates. The sliding ladder x² + y² = 100 gives 2x·dx/dt + 2y·dy/dt = 0, so the top falls faster (-0.75 then -1.33 ft/s) as the base slides out.
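The results in the list above can be sanity-checked numerically. The following is a minimal sketch, not part of the lesson itself: the helper names (central_diff, circle_slope, and so on) are made up for illustration, and the ladder inputs (base at 6 ft and then 8 ft, sliding out at 1 ft/s) are an assumption chosen because they reproduce the -0.75 and -1.33 ft/s figures; the summary does not state them. Each implicit formula is compared with a finite-difference slope of the explicit curve, which is available in these simple cases.

```python
import math


def central_diff(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


def circle_slope(x, y):
    """Circle x^2 + y^2 = 25: 2x + 2y*dy/dx = 0, so dy/dx = -x/y."""
    return -x / y


def hyperbola_slope(x, y):
    """Hyperbola x*y = 1: y + x*dy/dx = 0, so dy/dx = -y/x."""
    return -y / x


def log_slope(y):
    """Relation e^y = x: e^y * dy/dx = 1, so dy/dx = 1/e^y."""
    return 1.0 / math.exp(y)


def ladder_top_rate(x, dx_dt, length=10.0):
    """Ladder x^2 + y^2 = length^2: dy/dt = -(x/y) * dx/dt."""
    y = math.sqrt(length**2 - x**2)
    return -(x / y) * dx_dt


# Circle at (3, 4), compared with the explicit upper branch y = sqrt(25 - x^2).
circle_implicit = circle_slope(3.0, 4.0)
circle_numeric = central_diff(lambda x: math.sqrt(25.0 - x * x), 3.0)

# Hyperbola at x = 2 (so y = 1/2), compared with y = 1/x.
hyperbola_implicit = hyperbola_slope(2.0, 0.5)
hyperbola_numeric = central_diff(lambda x: 1.0 / x, 2.0)

# Logarithm at x = 2 (so y = ln 2), compared with math.log.
log_implicit = log_slope(math.log(2.0))
log_numeric = central_diff(math.log, 2.0)

# Ladder: assumed base positions 6 ft and 8 ft, assumed dx/dt = 1 ft/s.
ladder_rates = [ladder_top_rate(x, 1.0) for x in (6.0, 8.0)]

print("circle   ", circle_implicit, circle_numeric)
print("hyperbola", hyperbola_implicit, hyperbola_numeric)
print("log      ", log_implicit, log_numeric)
print("ladder   ", ladder_rates)
```

The explicit branches are used only as a check. The point of the technique is that circle_slope and the other implicit formulas never needed them.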
What changes for you
When a relationship cannot be solved for one variable, you no longer have to: differentiate it as it stands, let the chain rule attach a dy/dx to every y, and solve. That single technique covers tangent slopes on curves, derivatives you did not have a rule for (like ln x), and time-linked rates in physics and engineering. In machine learning it is the foundation of any method that works with constraints or fixed points rather than explicit formulas: constrained optimization (Lagrange multipliers and the KKT conditions), deep equilibrium models (a layer defined as the solution to a fixed-point equation, differentiated implicitly so it trains without storing every step), and the differential-equation machinery inside score-based diffusion models. The next lesson goes back to the foundation underneath all of this, the limit, and handles the awkward 0/0 forms with L’Hôpital’s rule.
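To make the fixed-point idea concrete, here is a toy scalar sketch. It is an illustration under stated assumptions, not how any particular deep equilibrium library is implemented: the map f(z, theta) = tanh(theta·z + b), the values theta = 0.5 and b = 1, and all function names are hypothetical choices made for this example. Differentiating the relation z = f(z, theta) with respect to theta gives dz/dtheta = (∂f/∂z)·dz/dtheta + ∂f/∂theta, so dz/dtheta = (∂f/∂theta) / (1 - ∂f/∂z), which needs only the converged z and none of the intermediate iterates.

```python
import math


def f(z, theta, b=1.0):
    """Toy 'layer' (an assumption for illustration): z -> tanh(theta*z + b)."""
    return math.tanh(theta * z + b)


def solve_fixed_point(theta, b=1.0, iters=200):
    """Find z with z = f(z, theta) by plain iteration.

    For |theta| < 1 the map is a contraction, so the iteration converges.
    """
    z = 0.0
    for _ in range(iters):
        z = f(z, theta, b)
    return z


def implicit_grad(theta, b=1.0):
    """dz/dtheta at the fixed point via implicit differentiation."""
    z = solve_fixed_point(theta, b)
    sech2 = 1.0 - math.tanh(theta * z + b) ** 2  # derivative of tanh
    df_dz = theta * sech2      # partial of f with respect to z
    df_dtheta = z * sech2      # partial of f with respect to theta
    return df_dtheta / (1.0 - df_dz)


theta = 0.5
z_star = solve_fixed_point(theta)
grad_implicit = implicit_grad(theta)

# Cross-check: finite difference through the whole solver.
h = 1e-6
grad_numeric = (solve_fixed_point(theta + h) - solve_fixed_point(theta - h)) / (2 * h)

print("fixed point  ", z_star)
print("implicit grad", grad_implicit)
print("numeric grad ", grad_numeric)
```

The finite-difference check reruns the solver twice, while the implicit formula uses only the converged value, which is the sense in which nothing from the iteration has to be stored.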