Summary: Why e is special
Everyone knows e ≈ 2.71828, but almost nobody knows why that particular number earns its own letter. The answer is not its digits but a behavior: e is the one base for which the exponential is its own derivative, d/dx(e^x) = e^x. That single property is why e shows up everywhere, from compound interest and radioactive decay to softmax and the sigmoid. This is the scan-it-in-five-minutes version.
Core ideas
- e is defined by behavior, not digits. It is the unique base where d/dx(e^x) = e^x (rate equals value at every point). The decimal is a consequence of that, not the definition.
- The derivative of any exponential is M(a)·a^x. Factoring a^(x+h) = a^x·a^h pulls a^x out of the limit, leaving a base-dependent constant M(a) = ln a. The shape is always a copy of the function; the base only sets the multiplier out front.
- The multiplier crosses 1 between bases 2 and 3. M(2) ≈ 0.693 (below 1) and M(3) ≈ 1.099 (above 1), so it passes through exactly 1 at some base in between. That base is e ≈ 2.71828, and there d/dx(e^x) = 1·e^x = e^x.
- In numbers. The slope of e^x at x = 0 is about 1 (and e^0 = 1); at x = 1 it is about 2.718 (and e^1 ≈ 2.718). Base 2 fails this: its slope at 0 is ≈ 0.693, not 1.
- With the chain rule, d/dx(e^(kx)) = k·e^(kx), so e^(kx) solves f'(x) = k·f(x), “rate proportional to current value,” the equation behind compound interest, population growth, radioactive decay (k < 0), and circuits. e is the fingerprint of self-proportional change.
- A machine-learning gift. The sigmoid σ(x) = 1/(1 + e^(-x)) has derivative σ(x)·(1 - σ(x)), which falls straight out of the chain rule on e^(-x), so the derivative is almost free once σ(x) is computed.
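The claims in the list above can be checked numerically. What follows is a minimal sketch, not part of the original lesson: the helper names (multiplier, slope, find_base, sigmoid) and the step size h = 1e-6 are illustrative assumptions, and central differences only approximate the true derivatives.

```python
import math

def multiplier(a, h=1e-6):
    """Estimate M(a), the slope of a^x at x = 0, with a central difference."""
    return (a**h - a**(-h)) / (2 * h)

def slope(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def find_base(lo=2.0, hi=3.0, tol=1e-9):
    """Bisect between bases 2 and 3 for the base whose multiplier is 1."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if multiplier(mid) < 1:
            lo = mid   # multiplier still below 1: the base is too small
        else:
            hi = mid   # multiplier at or above 1: the base is too large
    return (lo + hi) / 2

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# M(a) should track ln(a): below 1 for base 2, above 1 for base 3.
for a in (2.0, 3.0, math.e):
    print(f"M({a:.5f}) ~ {multiplier(a):.6f}   ln(a) = {math.log(a):.6f}")

# The base where the multiplier equals 1 should land near e.
print("base with M = 1:", find_base(), "  math.e:", math.e)

# The slope of e^x at x = 1 should match the value e^1.
print("slope of e^x at 1:", slope(math.exp, 1.0), "  e^1:", math.exp(1.0))

# Sigmoid: numerical derivative versus sigma(x) * (1 - sigma(x)).
s = sigmoid(0.5)
print("sigmoid'(0.5):", slope(sigmoid, 0.5), "  s*(1-s):", s * (1 - s))
```

Each printed pair should agree to several decimal places. The bisection relies only on the multiplier being below 1 at base 2 and above 1 at base 3, which is the same crossing argument the list makes.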
What changes for you
e stops being a magic decimal to memorize and becomes the answer to a precise question: which base makes the exponential its own derivative? Once you see that, the constant’s ubiquity makes sense, because “rate proportional to current value” is one of the most common patterns in nature, and e^(kt) is its natural shape. The same property threads through machine learning: softmax (e^(x_i) normalized) is the output layer of nearly every neural classifier, including next-token prediction in language models; the sigmoid activation is built from e^(-x) and has a clean, cheap derivative; and continuous-time models like neural ODEs and diffusion are built on differential equations, where exponentials arise naturally. Almost anywhere a model expresses a probability or a smooth proportional change, e is underneath. The next lesson turns to implicit differentiation, applying these rules to relationships not neatly solved for one variable.