Cheatsheet: Implicit differentiation
The move
For a relation that ties x and y together (but is not solved for y):
- Treat y as a function of x.
- Differentiate both sides with respect to x.
- Attach a dy/dx to every y term (chain rule): d/dx(y^2) = 2y · dy/dx.
- Solve the resulting equation for dy/dx algebraically.
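
For a relation written as F(x, y) = 0, these steps always land on the same result: differentiating gives F_x + F_y · dy/dx = 0, so dy/dx = -F_x / F_y wherever F_y is nonzero. The sketch below checks that numerically. The helper name `implicit_slope`, the step size `h`, and the use of central differences for the partial derivatives are illustrative assumptions, not part of the cheatsheet.

```python
def implicit_slope(F, x, y, h=1e-6):
    """Estimate dy/dx at (x, y) on the curve F(x, y) = 0.

    Differentiating F(x, y(x)) = 0 gives F_x + F_y * dy/dx = 0,
    so dy/dx = -F_x / F_y (valid only where F_y != 0).
    """
    # Central differences approximate the two partial derivatives.
    F_x = (F(x + h, y) - F(x - h, y)) / (2 * h)
    F_y = (F(x, y + h) - F(x, y - h)) / (2 * h)
    if F_y == 0:
        raise ZeroDivisionError("F_y is zero: the tangent is vertical here")
    return -F_x / F_y


# Hypothetical usage: the circle x^2 + y^2 = 25 at the point (3, 4).
circle = lambda x, y: x**2 + y**2 - 25
slope = implicit_slope(circle, 3.0, 4.0)
print(slope)  # close to -x/y = -0.75
```

The function never solves for y; it only needs a point that already lies on the curve.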
Why it is just the chain rule
y^2 is really (y(x))^2, a composition, so its derivative is 2·y(x)·y'(x) = 2y·dy/dx. Every y term is a composition; the dy/dx is the chain-rule factor. No new technique.
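
A quick numerical sanity check of that claim, as a sketch: pick any concrete y(x) and compare a finite-difference derivative of (y(x))^2 with the chain-rule value 2y·dy/dx. The choice y = sin x, the test point, and the helper `numeric_derivative` are illustrative only.

```python
import math

def y(x):
    # Illustrative choice of y(x); any differentiable function would do.
    return math.sin(x)

def dy_dx(x):
    # Known derivative of the illustrative y(x).
    return math.cos(x)

def numeric_derivative(g, x, h=1e-6):
    # Central-difference approximation of g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

x0 = 0.7
lhs = numeric_derivative(lambda x: y(x) ** 2, x0)  # d/dx of (y(x))^2
rhs = 2 * y(x0) * dy_dx(x0)                        # chain rule: 2y * dy/dx
naive = 2 * y(x0)                                  # result if dy/dx is dropped

print(abs(lhs - rhs) < 1e-6)    # the chain-rule value agrees
print(abs(lhs - naive) < 1e-6)  # the naive value does not
```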
Worked examples
| Relation | Differentiate | Solve | dy/dx |
|---|---|---|---|
| x^2 + y^2 = 25 | 2x + 2y·y' = 0 | 2y·y' = -2x | -x/y |
| xy = 1 | y + x·y' = 0 (product rule) | x·y' = -y, then sub y = 1/x | -y/x = -1/x^2 |
| x^2 + xy + y^2 = 7 | 2x + y + x·y' + 2y·y' = 0 | (x + 2y)·y' = -(2x + y) | -(2x+y)/(x+2y) |
Circle check at (3,4): dy/dx = -3/4, perpendicular to the radius (slope 4/3). The hyperbola’s -1/x^2 matches the power rule on 1/x.
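
These three relations happen to be solvable for y, which allows an independent check: differentiate one explicit branch numerically and compare it with the implicit formula. The sketch below does that. The branch formulas (upper semicircle, y = 1/x, and the upper root of the quadratic in y) and the test points are choices made here for illustration.

```python
import math

def numeric_derivative(g, x, h=1e-6):
    # Central-difference approximation of g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

# Each entry: (label, explicit branch y(x), implicit formula dy/dx(x, y), test x).
cases = [
    ("x^2 + y^2 = 25",
     lambda x: math.sqrt(25 - x**2),                 # upper semicircle
     lambda x, y: -x / y,
     3.0),
    ("xy = 1",
     lambda x: 1 / x,
     lambda x, y: -y / x,
     2.0),
    ("x^2 + xy + y^2 = 7",
     lambda x: (-x + math.sqrt(28 - 3 * x**2)) / 2,  # upper root of the quadratic in y
     lambda x, y: -(2 * x + y) / (x + 2 * y),
     1.0),
]

results = []
for label, branch, formula, x0 in cases:
    y0 = branch(x0)                           # point on the curve
    implicit = formula(x0, y0)                # slope from implicit differentiation
    numeric = numeric_derivative(branch, x0)  # slope from the explicit branch
    results.append((label, implicit, numeric))
    print(f"{label}: implicit {implicit:.6f}, numeric {numeric:.6f}")
```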
Related rates (the time twin)
When x and y both change with time t under a constraint, differentiate the constraint w.r.t. t:

Ladder x^2 + y^2 = 100: 2x·dx/dt + 2y·dy/dt = 0

Base slides out at dx/dt = 1 -> top falls at dy/dt = -(x/y)·dx/dt = -x/y

The two rates are linked by the constraint.
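
A small sketch of the ladder computation, with a finite-difference check. The snapshot (base 6 from the wall, so the top is at height 8) and the function names are hypothetical, chosen only to give concrete numbers.

```python
import math

LADDER = 10.0  # ladder length, so x^2 + y^2 = 100

def top_height(x):
    # Height of the top when the base is x from the wall.
    return math.sqrt(LADDER**2 - x**2)

def top_rate(x, dx_dt):
    # From 2x*dx/dt + 2y*dy/dt = 0: dy/dt = -(x / y) * dx/dt.
    return -(x / top_height(x)) * dx_dt

# Hypothetical snapshot: base 6 from the wall, sliding out at dx/dt = 1.
x0, dx_dt = 6.0, 1.0
rate = top_rate(x0, dx_dt)

# Independent check: move the base for a tiny time step and measure the drop.
dt = 1e-6
numeric = (top_height(x0 + dx_dt * dt) - top_height(x0 - dx_dt * dt)) / (2 * dt)

print(rate, numeric)  # both close to -x/y = -6/8 = -0.75
```

The negative sign says the top moves down while the base moves out.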
Why it matters for AI
Implicit differentiation underlies methods built on constraints or fixed points:
- Constrained optimization (Lagrange / KKT): how an optimum shifts along a constraint surface.
- Deep equilibrium models: a layer defined as a fixed point
f(z,x)=z; gradients use implicit differentiation. - Score-based diffusion: differentiating through the ODEs solved at training time.
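
For the fixed-point case the same move applies: differentiate z = f(z, x) with respect to x, treating z as a function of x, to get dz/dx = f_z·dz/dx + f_x, hence dz/dx = f_x / (1 - f_z). Below is a scalar sketch of that idea. The map f(z, x) = tanh(w·z + x), the weight w = 0.5, and the plain fixed-point iteration are all illustrative assumptions and do not reflect how any particular deep equilibrium library is implemented.

```python
import math

W = 0.5  # illustrative weight; |W| < 1 keeps the iteration a contraction

def f(z, x):
    # Hypothetical scalar "layer" whose fixed point defines the output.
    return math.tanh(W * z + x)

def solve_fixed_point(x, iters=200):
    # Plain fixed-point iteration z <- f(z, x), starting from z = 0.
    z = 0.0
    for _ in range(iters):
        z = f(z, x)
    return z

def implicit_grad(x):
    # Differentiate z = f(z, x) w.r.t. x with z = z(x):
    #   dz/dx = f_z * dz/dx + f_x   =>   dz/dx = f_x / (1 - f_z)
    z = solve_fixed_point(x)
    sech2 = 1 - math.tanh(W * z + x) ** 2  # derivative of tanh at the fixed point
    f_z = W * sech2
    f_x = sech2
    return f_x / (1 - f_z)

x0 = 0.3
h = 1e-6
# Check against differentiating "through the solver" by finite differences.
numeric = (solve_fixed_point(x0 + h) - solve_fixed_point(x0 - h)) / (2 * h)
grad = implicit_grad(x0)
print(grad, numeric)
```

The implicit gradient needs only the converged z, not the history of iterations that produced it.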
Pitfalls to dodge
- Forgetting the dy/dx on y terms. d/dx(y^2) = 2y·dy/dx, not 2y.
- Skipping the product rule on mixed terms. d/dx(xy) = y + x·dy/dx.
- Trying to solve for y first. Often impossible and unnecessary; differentiate as-is.
- Confusing dy/dx with dy/dt. Slope (vs x) versus linked rates (vs time t).
The one-line version
To differentiate a relation you cannot solve for y, treat y as a function of x and differentiate everything, letting the chain rule attach a dy/dx to each y term, then solve for dy/dx.