
Diffusion models II, training and sampling

This is the production-grade follow-up to lesson 12. There, you built the DDPM training loop (predict noise from a noised input) and the DDPM sampling loop (run the reverse Markov chain a thousand times). The thousand-step count was the practical blocker that kept diffusion in the research-demo bucket through 2020. This lesson covers the two moves (DDIM and classifier-free guidance) that turned diffusion into the dominant image-generation paradigm from 2021 onward. By the end you will be able to write the DDIM update in two lines, explain why DDIM can take steps twenty times larger than DDPM's without sacrificing quality, write the classifier-free guidance noise-prediction blend, and read the latency-quality Pareto frontier that determines sampling-step counts across production diffusion systems. The source curricula are Berkeley CS294-158 Sp24 (Pieter Abbeel et al.), the primary anchor for the sampler-design material, and Stanford CS236 (Stefano Ermon) for the broader diffusion framing.

This is lesson 13 of 15 and the second of three diffusion lessons. Lesson 12 built DDPM in its original form; this lesson covers what made it production-grade; lesson 14 returns to the L11 score-based view and shows the formal equivalence between DDPM, DDIM, and the score-based framing through the continuous-time stochastic differential equation perspective. The capstone at lesson 15 returns to L1’s four-paradigm map with all the math filled in.

Prerequisites: lesson 12 (DDPM forward and reverse processes), and ideally lessons 10 and 11 (energy-based models and score matching), which set up the score-based view that lesson 14 will pick up. The math here is mostly algebraic manipulation of the closed-form forward shortcut from lesson 12 (no new derivations of the same depth as L11’s denoising-score-matching trick); the conceptual move is to re-use the trained noise predictor in a different inference loop. This lesson is more architecturally focused than the previous ones in Phase 3.

This lesson is lighter on derivation than L11 or L12. The DDIM update is a re-parameterization of the L12 closed-form shortcut, written as a sampler instead of as a training target. The classifier-free guidance formula is a one-line weighted blend. The depth in this lesson is in understanding why these moves work (the deterministic, non-Markovian structure for DDIM; a single network trained in both conditional and unconditional modes for classifier-free guidance) rather than in deriving them from scratch. Worked numerical examples cover a three-step DDIM trajectory and a classifier-free guidance blend at three guidance scales.
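As a preview of the two-line DDIM update described above, here is a minimal scalar sketch in Python. It assumes the standard DDPM notation, which this section does not spell out: `abar_t` stands for the cumulative signal fraction in the forward shortcut x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, and the update is the deterministic (zero added noise) variant. The function name, the schedule values, and the "oracle" noise value that stands in for the trained noise predictor are all hypothetical choices for illustration.

```python
import math

def ddim_step(x_t, eps_hat, abar_t, abar_prev):
    """One deterministic DDIM update on a scalar sample (illustrative sketch)."""
    # Line 1: predict the clean sample by inverting the forward shortcut.
    x0_hat = (x_t - math.sqrt(1.0 - abar_t) * eps_hat) / math.sqrt(abar_t)
    # Line 2: re-noise the prediction to the target noise level,
    # reusing the same predicted noise (no fresh randomness is drawn).
    return math.sqrt(abar_prev) * x0_hat + math.sqrt(1.0 - abar_prev) * eps_hat

# Toy three-step trajectory. The "oracle" below returns the true noise,
# standing in for a trained network (an assumption made for this sketch).
x0_true, eps_true = 2.0, -0.5
abar = [0.1, 0.5, 0.9, 1.0]  # hypothetical schedule, noisiest level first

# Start from the noised sample given by the forward shortcut.
x = math.sqrt(abar[0]) * x0_true + math.sqrt(1.0 - abar[0]) * eps_true
for a_t, a_prev in zip(abar[:-1], abar[1:]):
    x = ddim_step(x, eps_true, a_t, a_prev)
    print(f"abar {a_t} -> {a_prev}: x = {x:.6f}")
```

With a perfect noise prediction, each step lands exactly on the forward shortcut at the target noise level, so the last step (target `abar_prev = 1.0`) recovers `x0_true` up to floating-point error. Nothing in the update requires the two noise levels to be adjacent, which is the property that lets the sampler skip across the schedule in large jumps.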

  • Write the DDIM update step in two lines (predict-the-clean-sample, re-noise-to-target) and explain why it is deterministic and non-Markovian
  • Explain why DDIM can take large steps where DDPM cannot (deterministic non-Markovian structure vs stochastic Markov chain)
  • Write the classifier-free guidance interpolation as a weighted blend of conditional and unconditional noise predictions, and explain the guidance-scale trade-off
  • Describe the latency-quality Pareto frontier and place a thousand-step DDPM, fifty-step DDIM, and ten-step distilled sampler on it
  • Apply the §6 in-body checkpoint pattern (carried from L12) with the six diffusion-specific categories and five-layer scope-test framing
  • Read time: about 14 minutes
  • Practice time: about 16 minutes (a six-question self-check, a hand-walked DDIM three-step trajectory, a classifier-free guidance blend at three guidance scales, and flashcards)
  • Difficulty: standard (production-grade diffusion sampling in a Stage D math-heavy track; less derivation-heavy than L10 / L11 / L12, more architecturally focused)
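The classifier-free guidance blend named in the objectives above can be sketched in a few lines of Python. This sketch assumes one common convention, in which a guidance scale `w` of 0 returns the unconditional prediction, 1 returns the conditional prediction, and values above 1 extrapolate past the conditional prediction; another widely used convention writes the same blend with the scale shifted by one. The function name, the toy noise predictions, and the three scales are hypothetical.

```python
def cfg_blend(eps_uncond, eps_cond, w):
    """Blend unconditional and conditional noise predictions elementwise.

    Convention assumed here: w = 0 is unconditional, w = 1 is conditional,
    and w > 1 pushes further along the (conditional - unconditional) direction.
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy predictions for a two-component sample (hypothetical values).
eps_u = [0.0, 1.0]
eps_c = [1.0, 3.0]

for w in (0.0, 1.0, 2.0):
    print(w, cfg_blend(eps_u, eps_c, w))
```

At `w = 2.0` the blend returns `[2.0, 5.0]`, which lies outside the range spanned by the two predictions. That extrapolation is the source of the guidance-scale trade-off: larger scales follow the conditioning signal more strongly while moving further from anything the network predicted directly.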