Cheatsheet: Why e is special
The defining property
d/dx( e^x ) = e^x

e is the unique base for which the exponential is its own derivative. e ≈ 2.71828 is a consequence of this, not the definition.
The derivative of any exponential
d/dx( a^x ) = M(a) · a^x
M(a) = ln(a), a constant set by the base

The derivative is always the function itself times a base-dependent multiplier:
| base a | multiplier M(a) |
|---|---|
| 2 | ≈ 0.693 (< 1) |
| e | exactly 1 |
| 3 | ≈ 1.099 (> 1) |
The multiplier crosses 1 between bases 2 and 3; that crossing point is e.
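The crossing point can be located numerically. This is a minimal sketch, not a canonical method: the helper names `multiplier` and `find_self_derivative_base`, the step size `h`, and the bisection tolerance are all illustrative assumptions. It estimates M(a) as the slope of a^x at x = 0 and bisects between 2 and 3 for the base where that slope equals 1.

```python
import math

def multiplier(a: float, h: float = 1e-6) -> float:
    """Estimate M(a), the slope of a^x at x = 0, with a central difference."""
    return (a ** h - a ** (-h)) / (2 * h)

def find_self_derivative_base(lo: float = 2.0, hi: float = 3.0, tol: float = 1e-9) -> float:
    """Bisect on [lo, hi] for the base whose multiplier equals 1."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if multiplier(mid) < 1.0:
            lo = mid   # multiplier still below 1: the crossing lies to the right
        else:
            hi = mid   # multiplier at or above 1: the crossing lies to the left
    return (lo + hi) / 2

# Compare the numerical multiplier with ln(a) for the three bases in the table.
for a in (2.0, math.e, 3.0):
    print(f"a = {a:.5f}  M(a) ~ {multiplier(a):.4f}  ln(a) = {math.log(a):.4f}")

print(f"multiplier reaches 1 near a = {find_self_derivative_base():.5f}")
```

Bisection works here because the multiplier increases steadily with the base, so there is exactly one crossing between 2 and 3.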
See it in numbers
Slope of e^x (via the difference quotient (e^(x+h) - e^x)/h, small h):
- at x = 0: slope ≈ 1, and e^0 = 1 (match)
- at x = 1: slope ≈ 2.718, and e^1 ≈ 2.718 (match)
Base 2 at x = 0: slope ≈ 0.693, but 2^0 = 1 (no match). Only e is self-derivative.
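The same comparison can be reproduced in a few lines. A rough sketch, assuming a forward difference with h = 1e-6 is accurate enough for three decimals; the helper name `slope` is hypothetical.

```python
import math

def slope(base: float, x: float, h: float = 1e-6) -> float:
    """Forward difference quotient (base^(x+h) - base^x) / h."""
    return (base ** (x + h) - base ** x) / h

# For each (base, x) pair, compare the slope with the function value.
for base, x in [(math.e, 0.0), (math.e, 1.0), (2.0, 0.0)]:
    s = slope(base, x)
    value = base ** x
    match = math.isclose(s, value, rel_tol=1e-4)
    print(f"base = {base:.3f}, x = {x}: slope = {s:.3f}, value = {value:.3f}, match = {match}")
```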
With the chain rule
d/dx( e^(kx) ) = k · e^(kx)    (e.g. d/dx e^(3x) = 3 e^(3x))
d/dx( e^(x^2) ) = 2x · e^(x^2)

e^(kx) solves f'(x) = k · f(x) (“rate proportional to current value”); every solution is a constant multiple C · e^(kx).
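Both chain-rule results can be sanity-checked against a numerical derivative. This is an illustrative sketch only: the helper `derivative`, the sample point x = 0.5, and k = 3 are arbitrary choices.

```python
import math

def derivative(f, x: float, h: float = 1e-6) -> float:
    """Central difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

k, x = 3.0, 0.5

# d/dx e^(kx) = k * e^(kx)
numeric_linear = derivative(lambda t: math.exp(k * t), x)
exact_linear = k * math.exp(k * x)

# d/dx e^(x^2) = 2x * e^(x^2)
numeric_square = derivative(lambda t: math.exp(t * t), x)
exact_square = 2 * x * math.exp(x * x)

print(f"e^(kx):  numeric = {numeric_linear:.6f}, formula = {exact_linear:.6f}")
print(f"e^(x^2): numeric = {numeric_square:.6f}, formula = {exact_square:.6f}")
```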
Why e is everywhere
Any process where rate is proportional to current value follows e^(kt):
compound interest, population growth, radioactive decay (k < 0), capacitor charge/discharge.
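One way to see the link is to step f' = k · f forward in small increments: each step multiplies f by (1 + k·dt), which is compounding, and the result approaches e^(kt) as the steps shrink. A minimal sketch, assuming Euler's method as the stepping scheme; `euler_growth` and the 5% rate are hypothetical illustration values.

```python
import math

def euler_growth(k: float, t: float, steps: int, f0: float = 1.0) -> float:
    """Step f' = k * f forward with Euler's method.

    Each step multiplies f by (1 + k * dt), i.e. one compounding period.
    """
    dt = t / steps
    f = f0
    for _ in range(steps):
        f += k * f * dt
    return f

k, t = 0.05, 10.0  # hypothetical: 5% rate over 10 time units
for steps in (10, 1_000, 100_000):
    print(f"{steps:>7} steps: {euler_growth(k, t, steps):.6f}  vs  e^(kt) = {math.exp(k * t):.6f}")
```

A negative k gives decay instead of growth with the same code, matching the radioactive-decay case.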
Why it matters for AI
- Softmax: e^(x_i) / sum(e^(x_j)), the output of essentially every classifier (incl. next-token prediction).
- Sigmoid: 1/(1 + e^(-x)); its derivative σ(x)(1-σ(x)) falls out of the chain rule on e^(-x).
- Continuous-time models (neural ODEs, diffusion) solve differential equations; wherever these take the form f' = (something)·f, the solutions are exponentials.
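The softmax and sigmoid formulas above can be written out directly. This is a plain-Python sketch rather than any framework's implementation; subtracting `max(xs)` before exponentiating is a standard stability trick added here as an assumption, and it does not change the result.

```python
import math
from typing import List

def softmax(xs: List[float]) -> List[float]:
    """e^(x_i) / sum_j e^(x_j), shifted by max(xs) to avoid overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x: float) -> float:
    """1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

probs = softmax([1.0, 2.0, 3.0])
print("softmax:", [round(p, 4) for p in probs], "sum =", round(sum(probs), 6))
print("sigmoid(0) =", sigmoid(0.0), " sigmoid'(0) =", sigmoid_grad(0.0))
```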
Pitfalls to dodge
- e is its digits. No, it is defined by the self-derivative property; the decimal is a consequence.
- Treating e^x like a power x^n. Variable in the exponent (self-derivative), not the base (power rule). Different rules.
- Forgetting the k on e^(kx). Derivative is k·e^(kx), not e^(kx).
- Assuming every exponential self-derives. Only base e; base 2 carries a factor ln 2 ≈ 0.693.
The one-line version
e is the base that makes the exponential its own derivative, which is why e^(kt) is the natural shape of anything whose rate is proportional to its current value, from compound interest to softmax.