References: Diffusion models I, the forward and reverse processes

Source curricula (multi-source structural mirror; cited as further study):
PRIMARY (this lesson follows its framing most directly)
• Stanford CS236, "Deep Generative Models", Lecture 16: Score Based Diffusion Models
Instructor: Stefano Ermon
Course URL: https://deepgenerativemodels.github.io/
Syllabus: https://deepgenerativemodels.github.io/syllabus.html
License: standard course-page link-out; cited as further study
SECONDARY (CS294-158 has a dedicated diffusion lecture)
• Berkeley CS294-158, "Deep Unsupervised Learning" (Spring 2024), Lecture 6: Diffusion Models
Instructors: Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
Course URL: https://sites.google.com/view/berkeley-cs294-158-sp24/
License: standard course-page link-out; cited as further study
Clawdemy's lessons are original prose that follows the pedagogical arc of these
two courses, anchored on CS236's lecture order with CS294-158 framing pulled in
where its slide deck and recording are stronger. We do not reproduce or
transcribe the lectures; we cite them as the recommended companions. All rights
to the original course materials remain with the respective instructors and
institutions.

A short, durable list. Each link is a specific next step, not a generic pile.

Where this sits in the track.

  • Score matching and score-based generation (previous lesson, L11). L11 derived the denoising-score-matching objective; this lesson derives the same loss from the Markov-chain perspective. L11 treats the noise level as a continuous parameter σ; this lesson treats it as a discrete timestep t whose cumulative noise standard deviation is sqrt(1 − ᾱ_t). Identifying each timestep with a noise level (noise level ↔ timestep) is what makes the two derivations equivalent.

  • Diffusion models II, training and sampling (next lesson, L13). L13 covers the practical optimizations that took diffusion from theoretically clean to production-grade: DDIM (cutting sampling from T = 1000 steps to roughly 50 deterministic steps, with no change to the training-time T), classifier-free guidance (text conditioning with adjustable strength), and the diffusion-specific aspects of inference cost. L13 will also cover the §6 watch with the same five-layer pattern applied at the conditioning level.

  • Score-based diffusion via SDEs, the unifying view (L14). L14 returns to the score-matching view from L11 and makes its equivalence with the DDPM Markov-chain view explicit through the stochastic differential equation (SDE) perspective. In that view, DDPM and discrete-noise-level score-based models are both discretizations of one continuous-time SDE framework.

  • Latent variables and the ELBO (lesson 5). Diffusion’s training objective is derived as an ELBO over the chain of latents x_1, ..., x_T, generalizing the single-latent VAE ELBO of lesson 5. The L5 machinery is the workhorse here; the diffusion-specific simplification is the noise-prediction reparameterization that collapses the chain of KL terms into a single MSE loss.
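
The noise level ↔ timestep identification mentioned in the first item can be made concrete with a small sketch. It assumes a linear β schedule from 1e-4 to 0.02 over T = 1000 steps, a common choice that this page does not itself specify; the function names `linear_betas` and `cumulative_noise_std` are illustrative, not taken from either course.

```python
import math


def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (assumed linear schedule)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cumulative_noise_std(betas):
    """Return sqrt(1 - alpha_bar_t) for t = 1..T.

    alpha_bar_t is the running product of (1 - beta_s) for s <= t.
    """
    stds = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        stds.append(math.sqrt(1.0 - alpha_bar))
    return stds


betas = linear_betas()
noise_std = cumulative_noise_std(betas)

# Each discrete timestep t corresponds to exactly one cumulative noise level.
for t in (1, 250, 500, 1000):
    print(f"t={t:4d}  sqrt(1 - alpha_bar_t) = {noise_std[t - 1]:.4f}")
```

Because every β is positive, the cumulative noise level grows strictly with t, so a timestep and a noise level carry the same information under this schedule. A different schedule would change the numbers but not that one-to-one correspondence.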