Cheatsheet: Why e is special

d/dx( e^x ) = e^x

e is the unique base for which the exponential is its own derivative. e ≈ 2.71828 is a consequence of this, not the definition.

d/dx( a^x ) = M(a) · a^x, where M(a) = ln(a), a constant set by the base

The derivative is always the function itself times a base-dependent multiplier:

base a    multiplier M(a)
2         ≈ 0.693 (< 1)
e         exactly 1
3         ≈ 1.099 (> 1)

The multiplier crosses 1 between bases 2 and 3; that crossing point is e.
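
The crossing point can be located numerically. The following is a minimal sketch, assuming M(a) is estimated by (a^h - 1)/h with a small fixed h; the helper name `multiplier`, the step size, and the bisection loop are illustrative choices, not part of the cheatsheet.

```python
import math

def multiplier(a, h=1e-6):
    """Estimate M(a) = lim_{h->0} (a^h - 1)/h using a small finite h (assumed step)."""
    return (a**h - 1.0) / h

# Compare the estimate with ln(a) for the three bases in the table.
for a in (2.0, math.e, 3.0):
    print(f"a = {a:.5f}   M(a) ≈ {multiplier(a):.4f}   ln(a) = {math.log(a):.4f}")

# Bisection on [2, 3]: M(2) < 1 < M(3), so the base with M(a) = 1 lies in between.
lo, hi = 2.0, 3.0
for _ in range(40):
    mid = (lo + hi) / 2
    if multiplier(mid) < 1.0:
        lo = mid   # multiplier still below 1: the crossing is to the right
    else:
        hi = mid   # multiplier at or above 1: the crossing is to the left
crossing = (lo + hi) / 2
print(f"M(a) = 1 at a ≈ {crossing:.5f}   (math.e = {math.e:.5f})")
```

Because h is finite, the located base agrees with e only to a few decimal places; shrinking h tightens the match until floating-point rounding takes over.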

Slope of e^x (via (e^h - 1)/h, small h):

  • at x = 0: slope ≈ 1, and e^0 = 1 (match)
  • at x = 1: slope ≈ 2.718, and e^1 ≈ 2.718 (match)

Base 2 at x = 0: slope ≈ 0.693, but 2^0 = 1 (no match). Only e is self-derivative.
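
The slope checks above can be reproduced with a difference quotient. This is a rough sketch, assuming a forward difference with h = 1e-6; the names `slope` and `pow2` are made up for the example.

```python
import math

def slope(f, x, h=1e-6):
    """Forward-difference estimate of f'(x): (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

def pow2(x):
    """The base-2 exponential, 2^x."""
    return 2.0**x

# For each function and point, print the estimated slope next to the function value.
for name, f in (("e^x", math.exp), ("2^x", pow2)):
    for x in (0.0, 1.0):
        print(f"{name} at x = {x}: slope ≈ {slope(f, x):.3f}, value = {f(x):.3f}")
```

For e^x the two printed columns should agree at every x; for 2^x the slope should come out as roughly 0.693 times the value.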

d/dx( e^(kx) ) = k · e^(kx) (e.g. d/dx e^(3x) = 3 e^(3x))
d/dx( e^(x^2) ) = 2x · e^(x^2)
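
Both chain-rule results can be sanity-checked against a numerical derivative. A small sketch, assuming a central difference with h = 1e-6 and k = 3 as in the example above; all function names here are illustrative.

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

k = 3.0  # matches the d/dx e^(3x) example

def f_linear(x):
    """e^(kx)."""
    return math.exp(k * x)

def df_linear(x):
    """Chain-rule derivative of e^(kx): k * e^(kx)."""
    return k * math.exp(k * x)

def f_square(x):
    """e^(x^2)."""
    return math.exp(x**2)

def df_square(x):
    """Chain-rule derivative of e^(x^2): 2x * e^(x^2)."""
    return 2.0 * x * math.exp(x**2)

# Print the chain-rule formula next to the numerical estimate at a few points.
for x in (0.0, 0.5, 1.0):
    print(f"x = {x}: e^(kx): formula {df_linear(x):.5f}, numeric {numeric_derivative(f_linear, x):.5f}")
    print(f"x = {x}: e^(x^2): formula {df_square(x):.5f}, numeric {numeric_derivative(f_square, x):.5f}")
```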

f(x) = C · e^(kx) is the general solution to f'(x) = k · f(x) (“rate proportional to current value”); with f(0) = 1 it is exactly e^(kx).

Any process where rate is proportional to current value follows e^(kt): continuously compounded interest, unconstrained population growth, radioactive decay (k < 0), capacitor charge/discharge (the gap to the final voltage decays as e^(-t/RC)).
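
To see "rate proportional to current value" turn into e^(kt), one can step the rule forward in small time increments. The sketch below assumes simple Euler steps; the function name `euler_growth` and the sample values of k and t are hypothetical, chosen only for illustration.

```python
import math

def euler_growth(k, t, steps, f0=1.0):
    """Advance f' = k * f from 0 to t using `steps` equal Euler steps."""
    dt = t / steps
    f = f0
    for _ in range(steps):
        f += k * f * dt  # change over dt = rate * dt, with rate = k * f
    return f

k, t = 0.05, 10.0  # e.g. a 5% growth rate over 10 time units (illustrative values)
for steps in (10, 100, 1000, 100000):
    approx = euler_growth(k, t, steps)
    print(f"{steps:>6} steps: {approx:.6f}   e^(kt) = {math.exp(k * t):.6f}")
```

Each Euler step multiplies f by (1 + k·t/n), so n steps give (1 + k·t/n)^n, the familiar compound-interest expression, which approaches e^(kt) as n grows. A negative k gives the decay case in the same way.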

  • Softmax: e^(x_i) / sum(e^(x_j)), the output of essentially every classifier (incl. next-token prediction).
  • Sigmoid: 1/(1 + e^(-x)); its derivative σ(x)(1-σ(x)) falls out of the chain rule on e^(-x).
  • Continuous-time models (neural ODEs, diffusion) are built on differential equations; wherever the dynamics reduce to the linear form f' = k·f, the solutions are exponentials.

Common misconceptions:

  • “e is its digits.” No: e is defined by the self-derivative property; the decimal expansion is a consequence.
  • Treating e^x like a power x^n. In e^x the variable is in the exponent (self-derivative); in x^n it is in the base (power rule). Different rules.
  • Forgetting the k on e^(kx). The derivative is k·e^(kx), not e^(kx).
  • Assuming every exponential self-derives. Only base e does; base 2 carries a factor ln 2 ≈ 0.693.
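
The softmax and sigmoid formulas listed above can be written out directly. This is a minimal pure-Python sketch, not any particular library's implementation; subtracting the maximum inside `softmax` is a common numerical-stability step added here as an assumption (it leaves the result unchanged), and the function names are illustrative.

```python
import math

def softmax(xs):
    """e^(x_i) / sum_j e^(x_j), computed after shifting by max(xs)."""
    m = max(xs)  # shift keeps exp() from overflowing; the ratio is unaffected
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """1 / (1 + e^(-x)); fine for moderate x, exp() overflows for very negative x."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid via the identity sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

probs = softmax([1.0, 2.0, 3.0])
print("softmax:", probs, "sum =", sum(probs))

# Check the derivative identity against a central difference at one point.
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
print("sigmoid'(0.7): identity", sigmoid_grad(x), "numeric", numeric)
```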

e is the base that makes the exponential its own derivative, which is why e^(kt) is the natural shape of anything whose rate is proportional to its current value, from compound interest to softmax.