Summary: Score-based diffusion via SDEs, the unifying view

The previous three lessons derived the same trained network and the same noise-prediction loss from three different starting points. This lesson made that equivalence formal using continuous-time stochastic differential equations (SDEs).

What this lesson did

Wrote down the forward SDE of which the DDPM Markov chain is a discretization: the variance-preserving (VP) form for DDPM and the variance-exploding (VE) form for the NCSN framework from L11.
Wrote down the reverse SDE that runs from noise back to the data distribution; its drift involves the score function of the noised distribution at each time t. Anderson's reverse-time SDE result is what guarantees that this reverse process exists and gives its drift.
Connected the noise predictor and the score function by a known, time-dependent scalar: the score equals the negative noise prediction divided by the cumulative noise standard deviation at that time. L11 and L12 produce the same network with the same loss, viewed through two equivalent mathematical lenses.
Derived the probability flow ODE (a deterministic ODE whose marginals match the reverse SDE), which gives both a deterministic sampler (the L13 DDIM mechanism formally) and a tractable likelihood evaluation (the answer to L9’s “indirect” entry for diffusion).
Placed L11, L12, L13, and L14 on one map: one continuous-time framework, three discretizations, three samplers, one trained network.
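
For reference, the objects named in the list above can be written out as follows. This is a sketch in the notation common in the score-SDE literature, which may differ from the notation used in the lesson itself. The symbols here are assumptions introduced for illustration: f and g are the forward drift and diffusion coefficient, w and \bar{w} are forward and reverse Brownian motions, \beta(t) and \sigma_{\mathrm{VE}}(t) are the VP and VE noise schedules, p_t is the noised marginal at time t, \epsilon_\theta is the trained noise predictor, \sigma_t is the cumulative noise standard deviation, and \tilde{f} is the right-hand side of the probability flow ODE.

```latex
% Requires amsmath.
\begin{align*}
\text{General forward SDE:}\quad
  & dx = f(x,t)\,dt + g(t)\,dw \\
\text{VP form (DDPM):}\quad
  & dx = -\tfrac{1}{2}\beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw \\
\text{VE form (NCSN):}\quad
  & dx = \sqrt{\frac{d\,[\sigma_{\mathrm{VE}}^2(t)]}{dt}}\,dw \\
\text{Reverse SDE (Anderson):}\quad
  & dx = \bigl[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\bigr]\,dt + g(t)\,d\bar{w} \\
\text{Score from noise prediction:}\quad
  & \nabla_x \log p_t(x) \approx -\frac{\epsilon_\theta(x,t)}{\sigma_t} \\
\text{Probability flow ODE:}\quad
  & \frac{dx}{dt} = \tilde{f}(x,t)
    = f(x,t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x) \\
\text{Likelihood along the ODE:}\quad
  & \log p_0(x_0) = \log p_T(x_T)
    + \int_0^T \nabla_x \cdot \tilde{f}(x_t,t)\,dt
\end{align*}
```

The reverse SDE and the probability flow ODE differ only in the factor of one half on the score term and the absence of the noise term; both use the same score, and therefore the same trained network.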

What to remember in three lines

The forward process is an SDE. The DDPM Markov chain is the discretization of the variance-preserving form; the L11 NCSN framework is the variance-exploding form. The reverse process is also an SDE, whose drift involves the score function of the noised distribution.
The noise predictor and the score function are the same vector up to a known, time-dependent scalar. The score equals the negative noise prediction divided by the cumulative noise standard deviation. L11 and L12 train the same network with the same loss; the framing differs.
The probability flow ODE is the deterministic sampler with tractable likelihood. Integrating the ODE backward in time produces a deterministic sample (L13 DDIM is approximately this). Integrating it forward while accumulating the divergence of its drift, which is the continuous-time form of tracking the log-determinant of the Jacobian, gives a tractable log-likelihood evaluation, the answer to L9’s “indirect” entry for diffusion.
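
The three lines above can be checked numerically on a toy problem. The sketch below is not the lesson's implementation. It assumes 1-D Gaussian data and a VP SDE with constant beta, so the ideal noise predictor has a closed form and stands in for a trained network. All names and constants (`M`, `S`, `BETA`, `T_MIN`, `optimal_eps`, and so on) are hypothetical choices made for this illustration, and plain Euler steps are used for simplicity rather than accuracy.

```python
import numpy as np

# Toy setup (illustrative assumptions): data ~ N(M, S^2), VP SDE with
# constant beta. Every marginal p_t is then Gaussian, so the "trained
# network" can be replaced by an exact closed-form noise predictor.
M, S = 2.0, 0.5          # data mean and standard deviation
BETA, T = 10.0, 1.0      # constant noise rate and final time
T_MIN = 1e-4             # stay just above t = 0, where sigma_t -> 0


def alpha_sigma(t):
    """Signal scale a_t and cumulative noise std sigma_t of the VP SDE."""
    a = np.exp(-0.5 * BETA * t)
    return a, np.sqrt(1.0 - a**2)


def optimal_eps(x, t):
    """Stand-in for eps_theta: the exact E[eps | x_t = x] for Gaussian data."""
    a, sigma = alpha_sigma(t)
    var_t = (a * S) ** 2 + sigma**2      # variance of the marginal p_t
    return sigma * (x - a * M) / var_t


def score(x, t):
    """Score of p_t recovered from the noise prediction: -eps / sigma_t."""
    _, sigma = alpha_sigma(t)
    return -optimal_eps(x, t) / sigma


def pf_ode_drift(x, t):
    """Probability flow drift f - 0.5*g^2*score, with f = -0.5*beta*x, g^2 = beta."""
    return -0.5 * BETA * x - 0.5 * BETA * score(x, t)


def sample(n, steps=2000, seed=0):
    """Euler-integrate the ODE backward from t = T to t = T_MIN, starting at N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    ts = np.linspace(T, T_MIN, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * pf_ode_drift(x, t_cur)
    return x


def log_likelihood(x0, steps=2000):
    """Euler-integrate forward in time, accumulating the divergence of the drift."""
    x = np.asarray(x0, dtype=float)
    div_integral = np.zeros_like(x)
    ts = np.linspace(T_MIN, T, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t_cur
        a, sigma = alpha_sigma(t_cur)
        var_t = (a * S) ** 2 + sigma**2
        # In 1-D the divergence is d(drift)/dx, available in closed form here.
        div_integral = div_integral + dt * (-0.5 * BETA + 0.5 * BETA / var_t)
        x = x + dt * pf_ode_drift(x, t_cur)
    log_prior = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)  # log N(x_T; 0, 1)
    return log_prior + div_integral
```

Under these assumptions, `sample(20000)` should return values whose mean and standard deviation are close to `M` and `S`, and `log_likelihood(2.5)` should land near the exact Gaussian log-density at that point. The remaining gaps come from the Euler discretization, the `T_MIN` cutoff, and using N(0, 1) in place of the true marginal at time `T`. With a real model, the closed-form `optimal_eps` would be replaced by the trained network, and in high dimensions the divergence would typically be estimated rather than computed exactly.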

Where this is going

The capstone (lesson 15) returns to L1’s four-paradigm map with all the math filled in. Autoregressive language models, GAN-based face generators, latent-diffusion image generators, and modern hybrid systems each get placed explicitly. The map you opened the track with becomes the map you close it with, this time with every paradigm’s training objective, sampling procedure, and trade-off characterized in full.