Diffusion models I, the forward and reverse processes

This is lesson 12 of Track 19 (Generative Models and Diffusion), the third lesson of Phase 3 (energy-score-diffusion), and the first of three on the diffusion paradigm. By the end you will be able to write the forward Markov chain that progressively noises data, derive and use the closed-form shortcut that samples any timestep in one operation, write the reverse process parameterized as a noise predictor, write the simplified DDPM training loss and recognize it as the L11 denoising-score-matching loss at the timestep’s noise level, and walk through the training and sampling loops step by step. You will also see the §6 in-body checkpoint pattern applied with the five-layer framing the track has been building and with the six diffusion-specific categories (use-case, provenance, sector-specific including medical imaging, training-data IP, likeness/consent, and prompt-injection risks). The source curricula are Stanford CS236 Lecture 16 and Berkeley CS294-158 Lecture 6.

Within the 15-lesson track, the previous lesson (L11) derived the denoising-score-matching loss from a score-based perspective; this lesson derives the same loss from the DDPM Markov-chain perspective. The next lesson, Diffusion models II, covers the practical speed-ups (DDIM, classifier-free guidance) that made diffusion deployable at scale. Lesson 14 closes Phase 3 by returning to the score-based view and making the equivalence explicit via the SDE (stochastic differential equation) perspective. Lesson 15 is the synthesis capstone, which returns to the L1 four-paradigm map with the diffusion paradigm fully unpacked.

Prerequisites: the previous lesson, Score matching and score-based generation, for the noise-prediction equivalence and the multi-noise-level extension that diffusion formalizes; and lesson 5 (Latent variables and the ELBO) for the ELBO derivation framework that diffusion’s training objective rests on. Math background: comfort with multivariate Gaussians, expectations, the chain rule of conditional probability, and one parameterization-equivalence step (rewriting the reverse mean as a function of a noise predictor).
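As a preview of the parameterization-equivalence step mentioned above, the following sketch shows the substitution in one pass. It assumes the notation of Ho et al. (2020), which this page has not yet introduced: α_t = 1 − β_t and ᾱ_t as the running product of the α values. It is a summary of the algebra under those assumptions, not a full derivation.

```latex
% Assumed notation (Ho et al. 2020): \alpha_t = 1 - \beta_t, \quad \bar\alpha_t = \prod_{s=1}^{t} \alpha_s
\begin{align}
% Closed-form forward sample, solved for x_0
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon
\;\Longrightarrow\;
x_0 = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\varepsilon}{\sqrt{\bar\alpha_t}} \\
% Mean of the true posterior q(x_{t-1} \mid x_t, x_0)
\tilde\mu_t(x_t, x_0) &= \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0
 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t \\
% Substitute x_0 and replace \varepsilon by the learned predictor
\mu_\theta(x_t, t) &= \frac{1}{\sqrt{\alpha_t}}
 \left( x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\varepsilon_\theta(x_t, t) \right)
\end{align}
```

The coefficient on x_t collapses to 1/sqrt(α_t) because β_t + α_t(1 − ᾱ_{t-1}) = 1 − ᾱ_t.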

This lesson is denser than L10 and L11 because it derives the full DDPM framework: forward chain, closed-form shortcut (a Gaussian-product algebraic step), reverse parameterization, ELBO simplification to the noise-prediction MSE, and the training and sampling loops. The simplified loss is the punchline; getting there requires several algebraic steps that the lesson walks through but does not fully derive (the full derivation is in the Ho et al. 2020 paper listed in the References). Worked numerical examples cover the closed-form forward sampling at two β-schedules and a single DDPM training step end-to-end.
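The closed-form shortcut described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lesson's worked example: the linear schedule endpoints (1e-4 to 0.02) follow Ho et al. (2020), and the function names `linear_betas`, `alpha_bars`, `q_sample`, and `q_chain` are hypothetical helpers introduced here. The step-by-step chain is included only so the two routes can be compared.

```python
import numpy as np

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (assumed endpoints, as in Ho et al. 2020)."""
    return np.linspace(beta_start, beta_end, T)

def alpha_bars(betas):
    """abar_t = prod_{s<=t} (1 - beta_s); index 0 holds t = 1."""
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, abar, rng):
    """Closed-form forward sample x_t for a 1-indexed timestep t."""
    eps = rng.standard_normal(x0.shape)
    a = abar[t - 1]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

def q_chain(x0, t, betas, rng):
    """Same distribution, reached by applying q(x_s | x_{s-1}) t times."""
    x = x0.copy()
    for s in range(t):
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - betas[s]) * x + np.sqrt(betas[s]) * noise
    return x

# Small demo: compare the one-shot sample with the t-step chain.
rng = np.random.default_rng(0)
betas = linear_betas(50)
abar = alpha_bars(betas)
x0 = np.full((5000, 1), 2.0)          # toy data: every point equals 2.0
x_closed, _ = q_sample(x0, 50, abar, rng)
x_chain = q_chain(x0, 50, betas, rng)
print("closed-form mean/var:", x_closed.mean(), x_closed.var())
print("chain       mean/var:", x_chain.mean(), x_chain.var())
```

The two printed rows should agree up to sampling noise, which is the reason training can draw a random timestep and jump straight to x_t instead of simulating the chain.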

  • Write the forward Markov-chain step q(x_t | x_{t-1}) with the β-schedule and explain why the forward process is fixed (no learnable parameters)
  • Derive the closed-form forward shortcut x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε and explain why it is the computational hinge of diffusion training
  • Write the reverse process parameterization p_θ(x_{t-1} | x_t) with the DDPM noise-predictor reparameterization μ_θ(x_t, t) in terms of ε_θ
  • Write the simplified DDPM training loss L_simple = E[||ε − ε_θ(x_t, t)||²], with the expectation over x_0, ε, and t, and recognize it as denoising score matching at noise level sqrt(1 − ᾱ_t)
  • Walk through the training and sampling loops in pseudocode, and apply the §6 in-body checkpoint pattern with the six diffusion-specific categories and five-layer scope-test framing
  • Read time: about 16 minutes
  • Practice time: about 18 minutes (a six-question self-check, closed-form forward sampling computations across two β-schedules and three timesteps each, a hand-walked DDPM training step end to end, and flashcards)
  • Difficulty: standard, on the harder end of Phase 3 (the densest derivation in the track so far; the simplification from the full ELBO to the noise-prediction MSE is a chain of algebraic steps that the lesson walks through but does not fully derive, referring to the DDPM paper for the missing algebra)
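The training and sampling loops named in the objectives can be sketched as follows. This is a hedged NumPy illustration under several assumptions: the linear schedule, the sampling variance σ_t² = β_t (one of the choices in Ho et al. 2020), and all function names are introduced here for illustration. There is no neural network or gradient update; `training_loss` stops at the loss that an autodiff framework would differentiate. In place of a trained ε_θ, `make_oracle` builds the exact noise predictor for a toy dataset whose every point equals a fixed vector `m`, which makes the loops checkable by hand.

```python
import numpy as np

def make_schedule(T=200, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule; returns betas, alphas, and cumulative abar."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def training_loss(x0_batch, eps_model, abar, rng):
    """One training step up to, but not including, the gradient update."""
    n = x0_batch.shape[0]
    t = rng.integers(1, len(abar) + 1, size=n)       # t ~ Uniform{1, ..., T}
    eps = rng.standard_normal(x0_batch.shape)        # target noise
    a = abar[t - 1][:, None]
    x_t = np.sqrt(a) * x0_batch + np.sqrt(1.0 - a) * eps   # closed-form forward
    return np.mean(np.sum((eps - eps_model(x_t, t)) ** 2, axis=1))

def sample(eps_model, shape, betas, alphas, abar, rng):
    """Ancestral sampling from x_T ~ N(0, I) down to x_0."""
    x = rng.standard_normal(shape)
    for t in range(len(betas), 0, -1):
        i = t - 1
        eps_hat = eps_model(x, np.full(shape[0], t))
        mean = (x - betas[i] / np.sqrt(1.0 - abar[i]) * eps_hat) / np.sqrt(alphas[i])
        # No noise is added on the final step (t = 1).
        z = rng.standard_normal(shape) if t > 1 else np.zeros(shape)
        x = mean + np.sqrt(betas[i]) * z             # assumed sigma_t^2 = beta_t
    return x

def make_oracle(m, abar):
    """Exact noise predictor when every data point equals m (toy stand-in)."""
    def eps_model(x_t, t):
        a = abar[t - 1][:, None]
        return (x_t - np.sqrt(a) * m) / np.sqrt(1.0 - a)
    return eps_model

# Usage on the toy dataset.
rng = np.random.default_rng(0)
betas, alphas, abar = make_schedule()
m = np.array([1.5, -0.5])
data = np.tile(m, (256, 1))
oracle = make_oracle(m, abar)
print("loss with oracle predictor:", training_loss(data, oracle, abar, rng))
print("first generated sample:", sample(oracle, (4, 2), betas, alphas, abar, rng)[0])
```

With the oracle predictor the loss is zero up to floating-point error and every generated sample lands on `m`, since the final reverse step inverts the first forward step exactly. A learned ε_θ would only approximate this behavior.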