Trig derivatives from geometry

The power rule from last lesson handled the input squared, the input cubed, one over the input, the square root of the input, anything that is a power of the input. But sine of the input and cosine of the input are not powers of anything, so the power rule says nothing about them. They need their own derivation, and like everything in this track, it comes from a picture rather than a formula to memorize. The picture is a single point moving around the unit circle, and from it both trig derivatives drop out at once.

Here is what we are after, the two facts most people memorize and few can explain:

d/dx( sin(x) ) = cos(x)
d/dx( cos(x) ) = -sin(x)

By the end you will have read both straight off the geometry, and you will see why the minus sign lands on cosine and not on sine.

Sine and cosine are just coordinates

Start with what sine and cosine actually are. Put a point on the unit circle (radius 1, centered at the origin) at angle x, measured in radians counterclockwise from the positive horizontal axis. That point sits at coordinates:

( cos(x), sin(x) )

That is the definition. The cosine of the input is the horizontal coordinate of the point; the sine is the vertical coordinate. As the input grows, the point travels counterclockwise around the circle, and cosine of the input and sine of the input are nothing more than its shadow on the horizontal and vertical axes. Differentiating sine and cosine means asking how fast those two coordinates change as the point moves.

The unit circle is where cos and sin live. Walk an angle x counterclockwise from the positive horizontal axis; you land at the point (cos x, sin x). The two coordinates are exactly the horizontal and vertical legs of the radius triangle: read cos x off the horizontal axis, sin x off the vertical axis. That single picture organizes every trig identity.

The point moves at unit speed

Now nudge the angle x by a tiny step. How far does the point travel along the circle? This is exactly why angles in calculus are measured in radians: a radian is defined so that the arc length along a unit circle equals the angle. So increasing the angle by that tiny step moves the point an equal arc length around the circle. The point covers the same distance as the change in angle: it moves at unit speed.

That single fact, unit speed, is the whole engine of the derivation, and it is the deep reason calculus insists on radians. We will come back to what breaks if you use degrees.
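The unit-speed claim can be checked numerically. The sketch below is only an illustration: the helper names `point` and `speed` and the step size `h` are made up for this example. It measures how far the point moves for a tiny change in angle and divides by that change.

```python
import math

def point(x):
    """Position on the unit circle at angle x (radians)."""
    return (math.cos(x), math.sin(x))

def speed(x, h=1e-6):
    """Distance travelled per unit of angle, estimated over a tiny step h."""
    (x0, y0), (x1, y1) = point(x), point(x + h)
    return math.hypot(x1 - x0, y1 - y0) / h

for angle in (0.0, 1.0, 2.5):
    # Each estimate should come out very close to 1, wherever the point is.
    print(angle, speed(angle))
```

The straight-line distance between the two nearby points stands in for the arc length here, which is a fair substitute only because the step is tiny.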

The velocity points perpendicular to the position

A point moving around a circle is always heading tangent to the circle, at a right angle to the line from the center out to the point (the radius). Think of a ball on a string swung in a circle: at every instant it is moving sideways relative to the string, never along it. So the point’s velocity vector is perpendicular to its position vector.

Because the point moves at unit speed, that velocity vector has length 1, and it points 90 degrees counterclockwise from the position (counterclockwise because the point is traveling counterclockwise). So to get the velocity, take the position vector and rotate it a quarter turn counterclockwise.

Rotating any vector with components a and b by 90 degrees counterclockwise turns it into a new vector whose first component is negative b and whose second component is a (a fact from the linear algebra track, the rotation that sends i-hat to j-hat). Apply that to the position whose components are cosine of the input and sine of the input:

velocity = rotate ( cos(x), sin(x) ) by 90° CCW = ( -sin(x), cos(x) )

As x increases, the point (cos x, sin x) moves along the circle. Its velocity vector points along the tangent at right angles to the radius, has length 1 (one unit of angle moves the point one unit of arc), and points counterclockwise. The two components of that velocity, read off the axes, are -sin x and cos x: exactly the derivatives of cos x and sin x.
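One way to test the quarter-turn argument is to compare it with a velocity measured directly from two nearby positions. This is a minimal sketch: the function names, the sample angle 0.7, and the step size are arbitrary choices for illustration.

```python
import math

def position(x):
    """Point on the unit circle at angle x (radians)."""
    return (math.cos(x), math.sin(x))

def rotate_quarter_turn_ccw(v):
    """Rotate a 2D vector (a, b) by 90 degrees counterclockwise: (-b, a)."""
    a, b = v
    return (-b, a)

def numerical_velocity(x, h=1e-6):
    """Rate of change of position, estimated from points just before and after x."""
    (x0, y0), (x1, y1) = position(x - h), position(x + h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

x = 0.7
geometric = rotate_quarter_turn_ccw(position(x))  # the picture's prediction
numeric = numerical_velocity(x)                   # measured from nearby points
print("rotated position:  ", geometric)
print("measured velocity: ", numeric)
```

The two printed vectors should agree to many decimal places, and the rotated vector is perpendicular to the position by construction.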

Read off both derivatives

The velocity vector is, by definition, the rate of change of the position. So its components are the rates of change of the two coordinates:

The horizontal coordinate is cosine of the input, and the horizontal component of velocity is negative sine of the input. So the derivative of cosine is negative sine.
The vertical coordinate is sine of the input, and the vertical component of velocity is cosine of the input. So the derivative of sine is cosine.

Both derivatives, from one picture. The minus sign lands on cosine because as the point climbs counterclockwise from angle 0 through the first quarter of the circle, its horizontal coordinate is shrinking (the point is moving leftward), so the rate of change of cosine is negative there. Sine’s coordinate, meanwhile, is growing, so its rate is positive. The geometry decides the sign; you do not have to remember it.

Sanity checks on the graph

Test the formulas at angles you can picture, against the shape of the curves.

At an input of 0: the sine of 0 is 0, and the formula says the slope of sine there is the cosine of 0, which is 1. The sine graph crosses the origin climbing with slope 1, which is exactly what it does. Meanwhile the cosine of 0 is 1, at its peak, and the formula says its slope is negative the sine of 0, which is 0, flat at the top, also correct.
At an input of pi over 2: the sine of pi over 2 is 1, at its peak, and its slope should be the cosine of pi over 2, which is 0, flat, correct. And the cosine of pi over 2 is 0, crossing zero on the way down, with slope negative the sine of pi over 2, which is negative 1: the curve is descending exactly as steeply as sine was climbing at 0, also correct.

Four checks, four matches. The formulas are not arbitrary; they describe the slopes you can already see in the curves.
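The same four checks can be run against measured slopes instead of the eye. In this sketch the helper `slope` is a hypothetical name for a simple symmetric difference estimate, and the step size is an arbitrary small number.

```python
import math

def slope(f, x, h=1e-6):
    """Estimate the slope of f at x from two points straddling x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (name, function, input, slope predicted by the derivative formula)
checks = [
    ("sin", math.sin, 0.0, 1.0),           # cos(0) = 1
    ("cos", math.cos, 0.0, 0.0),           # -sin(0) = 0
    ("sin", math.sin, math.pi / 2, 0.0),   # cos(pi/2) = 0
    ("cos", math.cos, math.pi / 2, -1.0),  # -sin(pi/2) = -1
]

for name, f, x, expected in checks:
    measured = slope(f, x)
    print(f"{name} at {x:.4f}: measured {measured:+.6f}, formula {expected:+.1f}")
```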

Two payoffs from the derivative

The small-angle approximation. Near an input of 0, what does sine of the input look like? It passes through the sine of 0, which is 0, with slope the cosine of 0, which is 1, so close to zero the sine curve is nearly the straight line equal to the input itself. Hence the famous approximation that sine of the input is roughly the input for small inputs, used constantly in physics and signal processing. It is just the derivative at zero, telling you the curve’s initial direction. (This is the first sliver of a bigger idea, approximating a function by reading off its derivatives, that the final lesson develops into Taylor series.)
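To get a feel for how good the approximation is, the sketch below compares the input with its sine at a few small angles. The function name `small_angle_error` and the sample inputs are chosen only for illustration.

```python
import math

def small_angle_error(x):
    """Relative error of using x in place of sin(x), for nonzero x in radians."""
    return abs(x - math.sin(x)) / abs(math.sin(x))

for x in (0.5, 0.1, 0.01):
    print(f"x = {x}: sin(x) = {math.sin(x):.8f}, relative error = {small_angle_error(x):.2e}")
```

The error falls quickly as the input shrinks: at 0.1 radians (a little under 6 degrees) it is already below a fifth of a percent.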

Why sine is everywhere in physics. Differentiate sine of the input twice: the first derivative is cosine of the input, and the derivative of that is negative sine of the input. So sine satisfies

f''(x) = -f(x)

a function whose second derivative is its own negative. Cosine satisfies the very same equation (its second derivative is the derivative of negative sine, which is negative cosine), which is why the two always travel together as the sine-and-cosine pair. That equation describes anything whose acceleration pulls it back toward the center in proportion to how far it has strayed: a mass on a spring, a pendulum (for small swings), a vibrating string, alternating current, a sound wave, a light wave. This is why sine and cosine show up across all of physics. They are the natural shape of oscillation, and that fact is encoded in their derivatives.
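The link between this equation and sine can be watched in a toy simulation. The sketch below is an illustration under stated assumptions, not a careful physics solver: it steps a "mass on a spring" forward in small time steps using only the rule that acceleration equals the negative of position, starting at position 0 with velocity 1. The function name, the step size `dt`, and the stepping scheme are all choices made for this example.

```python
import math

def simulate_spring(t_end, dt=1e-4):
    """Step f'' = -f forward from f(0) = 0, f'(0) = 1 and return f(t_end)."""
    f, v = 0.0, 1.0
    steps = round(t_end / dt)
    for _ in range(steps):
        v -= f * dt  # acceleration is -f, so the velocity changes by -f * dt
        f += v * dt  # then the position moves with the updated velocity
    return f

for t in (1.0, 2.0, 3.0):
    print(f"t = {t}: simulated {simulate_spring(t):.5f}, sin(t) = {math.sin(t):.5f}")
```

Nothing in the loop mentions sine, yet the simulated position tracks it closely. Starting instead at position 1 with velocity 0 would trace out cosine.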

Why radians, really

Everything above rested on the point moving at unit speed, which came from radians making arc length equal to angle. If you measured the angle in degrees instead, a full trip around the circle would be 360 units of angle for 2 pi units of arc length, so the point would move at a speed of 2 pi over 360, which is pi over 180, not 1. Every trig derivative would then carry an ugly factor of pi over 180: the derivative of sine would come out as pi over 180 times cosine of the input. Radians exist precisely to make that factor equal 1, so that calculus on trig functions stays clean. That is the real reason the convention is not just tradition.
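The stray factor is easy to see numerically. In this sketch, `sin_degrees` is a hypothetical helper that takes its input in degrees, and `slope` is a simple symmetric difference estimate; the sample angle of 30 degrees is arbitrary.

```python
import math

def sin_degrees(d):
    """Sine of an angle given in degrees."""
    return math.sin(math.radians(d))

def slope(f, x, h=1e-4):
    """Estimate the slope of f at x from two points straddling x."""
    return (f(x + h) - f(x - h)) / (2 * h)

d = 30.0
measured = slope(sin_degrees, d)
clean_guess = math.cos(math.radians(d))  # what "sine goes to cosine" would predict
with_factor = (math.pi / 180) * clean_guess

print("measured slope:        ", measured)
print("cos alone:             ", clean_guess)
print("pi/180 times cos:      ", with_factor)
```

The measured slope matches the version carrying the pi over 180 factor, not the bare cosine.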

Why this matters when you use AI

Trig functions are not in the inner loop of every model the way the power rule and chain rule are, but where they appear, these derivatives appear with them.

The clearest case is positional encoding in transformers. The original transformer architecture (Vaswani et al., 2017) tags each token with its position using sine and cosine waves of different frequencies, so the model can tell word order. Those are exactly the sine and cosine from this lesson. In that original design the waves are fixed rather than learned, so training does not need to differentiate through them; but in any variant where a position or a frequency is itself something the model adjusts, differentiating through the waves uses exactly these derivatives. Trig derivatives also turn up in 3D rotation (rotation matrices are built from sine and cosine, so differentiable rendering and pose-estimation networks need their derivatives) and in signal processing, where Fourier analysis decomposes a signal into sine and cosine components. Wherever oscillation or rotation enters a model, the point-on-a-circle picture is quietly underneath.
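As a rough illustration, here is a plain-Python sketch of a sinusoidal positional encoding using the sine and cosine formula from Vaswani et al. (2017), together with its rate of change with respect to position. Several things here are assumptions for the example: the function names, the tiny `d_model` of 8, and above all the idea of treating position as a continuous input that can be nudged (the original design uses whole-number positions). The extra `freq` factor in the derivative comes from the chain rule, which the next lesson covers.

```python
import math

def positional_encoding(pos, d_model=8):
    """Sine/cosine pairs at geometrically spaced frequencies for one position."""
    enc = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        enc.append(math.sin(pos * freq))
        enc.append(math.cos(pos * freq))
    return enc

def encoding_derivative(pos, d_model=8):
    """Rate of change of each encoding entry as pos is nudged."""
    out = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        out.append(freq * math.cos(pos * freq))   # sine goes to cosine
        out.append(-freq * math.sin(pos * freq))  # cosine goes to negative sine
    return out

pos, h = 3.0, 1e-6
before, after = positional_encoding(pos - h), positional_encoding(pos + h)
measured = [(a - b) / (2 * h) for a, b in zip(after, before)]
for m, d in zip(measured, encoding_derivative(pos)):
    print(f"measured {m:+.6f}   formula {d:+.6f}")
```

Each sine entry changes at a rate given by its cosine partner, and each cosine entry at a rate given by the negative of its sine partner, scaled by the frequency.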

Common pitfalls

Memorizing the pair without the picture. “sine goes to cosine, cosine goes to negative sine” is easy to scramble (which one gets the minus?). The picture fixes it: the minus is on cosine because the horizontal coordinate shrinks as the point climbs counterclockwise away from angle 0. Hold the moving point in mind and you can never get the sign backward.

Forgetting the radian assumption. The clean derivative where sine differentiates to cosine only holds when the input is in radians. In degrees the derivatives pick up a factor of pi over 180. If a calculation gives a stray pi over 180, check whether something slipped into degrees.

Thinking trig derivatives are a separate topic. They are the same nudge-and-look idea as the power rule, applied to a point on a circle instead of a growing square. The method is identical; only the shape changed.

Reading the velocity as the position. The point’s position has components cosine of the input and sine of the input; its velocity has components negative sine of the input and cosine of the input. The derivative is the velocity, not the position. Mixing them up swaps and mis-signs the derivatives.

What you should remember

Sine and cosine are the vertical and horizontal coordinates of a point on the unit circle, and their derivatives are the components of that point’s velocity as it moves around the circle at unit speed.
The velocity is the position rotated a quarter turn counterclockwise: the position with components cosine of the input and sine of the input becomes a velocity with components negative sine of the input and cosine of the input. Reading off the components gives the derivative of sine as cosine and the derivative of cosine as negative sine, with the minus on cosine because its coordinate shrinks as the point climbs.
The picture pays off twice: near zero, sine of the input is roughly the input itself (the derivative at zero is 1), and differentiating sine twice gives negative sine, the equation where a function’s second derivative is its own negative that makes sine and cosine the universal shape of oscillation in physics. Radians are what keep all of this free of stray factors.

Sine and cosine were not a new memorization burden; they were one more thing to read off a picture, this time a point circling at unit speed. The next lesson returns to combining functions, with the product rule and the chain rule: what to do when functions are multiplied together or nested one inside another.