Summary: Implicit differentiation

Every derivative so far assumed you could write y as a clean function of x. But most real relations between two quantities cannot be untangled that way: the circle x² + y² = 25 ties x and y together without making either a clean function of the other. Implicit differentiation finds the slope anyway, and it introduces no new machinery: it is the chain rule applied to a relationship. This is the scan-it-in-five-minutes version.

Core ideas

The move. Treat y as an unknown function of x, differentiate both sides of the relation with respect to x, attach a dy/dx to every y term (that is the chain rule), then solve algebraically for dy/dx. You never solve the relation for y.
The key instance. d/dx(y²) = 2y · dy/dx, because y² is the composition (y(x))²: outer derivative 2y times inner derivative dy/dx.
The circle, worked. x² + y² = 25 differentiates to 2x + 2y·dy/dx = 0, so dy/dx = -x/y (wherever y ≠ 0). At (3, 4) that is -3/4, the negative reciprocal of the radius's slope 4/3, so the line is perpendicular to the radius, as a tangent to a circle must be. The slope was found without solving for y.
It is just the chain rule. Every y term is a composition, so differentiating it deposits a dy/dx; the x terms differentiate normally. Mixed terms like xy use the product rule: d/dx(xy) = y + x·dy/dx. The hyperbola xy = 1 cross-checks: differentiating gives y + x·dy/dx = 0, so dy/dx = -y/x, and substituting y = 1/x gives -1/x², matching the power rule on 1/x.
It can produce new derivatives. From e^y = x (i.e. y = ln x), differentiating gives e^y·dy/dx = 1, so dy/dx = 1/e^y = 1/x. The derivative of ln(x) is 1/x, with no special log rule.
Related rates is the time twin. When x and y both change with time under a constraint, differentiate the constraint with respect to t to link the rates. The sliding ladder x² + y² = 100 (a 10 ft ladder) gives 2x·dx/dt + 2y·dy/dt = 0, so dy/dt = -(x/y)·dx/dt. As the base slides out, x/y grows, so the top falls faster (-0.75 then -1.33 ft/s).
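
The formulas in the core ideas above can be sanity-checked numerically. What follows is a minimal sketch, not part of the lesson: it compares each implicit result with a finite-difference slope on a branch that happens to be solvable for y. All function names are hypothetical. The ladder inputs (base sliding out at 1 ft/s, sampled at x = 6 ft and x = 8 ft) are assumptions chosen because they reproduce the quoted -0.75 and -1.33 ft/s; the summary itself does not state them.

```python
import math


def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def circle_slope(x, y):
    """Implicit result for x^2 + y^2 = 25: dy/dx = -x/y."""
    return -x / y


def upper_circle(x):
    """Upper half of the circle solved for y, used only as a cross-check."""
    return math.sqrt(25 - x * x)


def hyperbola_slope(x, y):
    """Implicit result for xy = 1: dy/dx = -y/x."""
    return -y / x


def log_slope(y):
    """Implicit result for e^y = x: dy/dx = 1/e^y."""
    return 1 / math.exp(y)


def ladder_top_rate(x, dx_dt, length=10.0):
    """Related rates for x^2 + y^2 = length^2: dy/dt = -(x/y) * dx/dt."""
    y = math.sqrt(length**2 - x**2)
    return -(x / y) * dx_dt


# Circle at (3, 4): implicit slope vs. slope of the explicit upper branch.
print(circle_slope(3, 4), numeric_derivative(upper_circle, 3))

# Hyperbola at x = 2, y = 1/2: implicit slope vs. slope of 1/x.
print(hyperbola_slope(2, 0.5), numeric_derivative(lambda x: 1 / x, 2))

# ln x at x = 2 (so y = ln 2): implicit slope vs. slope of math.log.
print(log_slope(math.log(2)), numeric_derivative(math.log, 2))

# Ladder: ASSUMED dx/dt = 1 ft/s, base at 6 ft and then 8 ft.
print(ladder_top_rate(6, 1.0), ladder_top_rate(8, 1.0))
```

Each printed pair should agree to several decimal places. The explicit branches appear only as a check; the implicit formulas never needed them.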

What changes for you

When a relationship cannot be solved for one variable, you no longer have to: differentiate it as it stands, let the chain rule attach a dy/dx to every y, and solve. That single technique covers tangent slopes on curves, derivatives you did not have a rule for (like ln x), and time-linked rates in physics and engineering. In machine learning it underlies methods that work with constraints or fixed points rather than explicit formulas: constrained optimization (Lagrange multipliers and the KKT conditions), deep equilibrium models (a layer defined as the solution to a fixed-point equation, differentiated implicitly so it trains without storing every step), and the differential-equation machinery inside score-based diffusion models. The next lesson goes back to the foundation underneath all of this, the limit, and handles the awkward 0/0 forms with L’Hôpital’s rule.