References: Score-based diffusion via SDEs
Primary source
- Stanford CS236, Deep Generative Models, Lectures 13 and 16 (Stefano Ermon). The primary anchor for the score-based view and the SDE unification. Course page: deepgenerativemodels.github.io. Lecture 13 introduces score matching; Lecture 16 develops the score-based generative modeling framework and the SDE perspective.
- Berkeley CS294-158 Sp24, Deep Unsupervised Learning, Lecture 6 (Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu). Secondary framing covering the same diffusion-paradigm material with a different emphasis. Course page: sites.google.com/view/berkeley-cs294-158-sp24/.
Foundational papers (the math this lesson is built on)
- “Reverse-Time Diffusion Equation Models” (Anderson, 1982). The reverse-time SDE result that makes score-based sampling possible. The paper predates diffusion-model usage by four decades; the result became operationally important for generative modeling only in the 2020s.
- “Score-Based Generative Modeling through Stochastic Differential Equations” (Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole, 2021). The synthesis paper that unifies denoising score matching (NCSN) and DDPM under the SDE framework and introduces the probability flow ODE that DDIM approximately discretizes. Section 3 introduces the SDE formulation, including the reverse-time SDE and score estimation; Section 4 covers sampling, including predictor-corrector samplers and the probability flow ODE (Section 4.3); Section 5 covers controllable generation. This is the lesson’s central reference.
- “Generative Modeling by Estimating Gradients of the Data Distribution” (Song, Ermon, 2019). The NCSN paper that introduced multi-noise-level score matching as a practical generative-modeling framework. The continuous-time limit of NCSN is the variance-exploding SDE.
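For orientation, the three equations these papers revolve around can be written as follows. This is a notational sketch following the conventions of Song et al. 2021; the symbols (drift f, diffusion coefficient g, noise schedule σ(t), marginal density p_t, forward and reverse Wiener processes w and w̄) are assumptions of that notation rather than something fixed by this page.

```latex
% Forward (noising) SDE
\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}

% Reverse-time SDE (Anderson, 1982): the only unknown is the score \nabla_x \log p_t(x)
\mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}

% Variance-exploding SDE (continuous-time limit of NCSN): zero drift, growing noise
\mathrm{d}\mathbf{x} = \sqrt{\frac{\mathrm{d}\,[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}\mathbf{w}
```

A trained score network stands in for the score term in the reverse-time SDE, which is what turns Anderson's result into a sampler.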
Diffusion-paradigm references (Phase 3 connection)
- “Denoising Diffusion Probabilistic Models” (Ho, Jain, Abbeel, 2020). The DDPM paper from L12. The variance-preserving SDE is the continuous-time limit of the DDPM Markov chain.
- “Denoising Diffusion Implicit Models” (Song, Meng, Ermon, 2020). The DDIM paper from L13. The lesson 14 framing shows DDIM is approximately a discretization of the probability flow ODE for the variance-preserving SDE.
- “Maximum Likelihood Training of Score-Based Diffusion Models” (Song, Durkan, Murray, Ermon, 2021). Companion to the SDE synthesis paper, developing the maximum-likelihood interpretation and the probability-flow-ODE-based likelihood evaluation in detail.
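The continuous-time-limit claim for DDPM can be checked numerically. The following is a minimal sketch, not code from any of the papers above. It assumes the linear β schedule of Ho et al. 2020 with 1000 steps, identifies β(t)·Δt with the discrete β_i, and compares one DDPM forward step with one Euler-Maruyama step of the variance-preserving SDE dx = -½β(t)x dt + √β(t) dw. The function name `ddpm_vs_vp_sde_coeffs` is hypothetical.

```python
import numpy as np

def ddpm_vs_vp_sde_coeffs(num_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Compare one DDPM forward step with one Euler-Maruyama step of the VP SDE.

    DDPM:            x_i   = sqrt(1 - beta_i) * x_{i-1} + sqrt(beta_i) * z
    Euler-Maruyama:  x_new = (1 - 0.5 * beta(t) * dt) * x + sqrt(beta(t) * dt) * z
    """
    betas = np.linspace(beta_min, beta_max, num_steps)  # discrete schedule (assumed linear)
    dt = 1.0 / num_steps                                # step size on t in [0, 1]
    beta_t = betas / dt                                 # continuous beta(t), so beta(t) * dt == beta_i

    ddpm_mean, ddpm_noise = np.sqrt(1.0 - betas), np.sqrt(betas)
    em_mean, em_noise = 1.0 - 0.5 * beta_t * dt, np.sqrt(beta_t * dt)
    return ddpm_mean, em_mean, ddpm_noise, em_noise

ddpm_mean, em_mean, ddpm_noise, em_noise = ddpm_vs_vp_sde_coeffs()
max_mean_gap = np.max(np.abs(ddpm_mean - em_mean))
print(f"largest gap between mean coefficients: {max_mean_gap:.2e}")
```

The noise coefficients agree by construction, and the mean coefficients differ only by the second-order Taylor term of √(1 - β_i), roughly β_i²/8, which shrinks as the step size goes to zero.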
Further reading (continuous-time samplers and theoretical extensions)
- “DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps” (Lu, Zhou, Bao, Chen, Li, Zhu, 2022). Higher-order ODE solver for the probability flow ODE. Reaches near-converged sample quality in roughly ten to fifteen steps, with better quality than DDIM at the same step count.
- “Elucidating the Design Space of Diffusion-Based Generative Models” (Karras, Aittala, Aila, Laine, 2022). A clean reformulation of the SDE-and-ODE framework with practical design guidance. Often cited as the modern reference for designing diffusion training and sampling pipelines.
- “Consistency Models” (Song, Dhariwal, Chen, Sutskever, 2023). A different framework for few-step sampling that trains a network to map every point on a probability-flow-ODE trajectory to the same endpoint. Single-step sampling becomes possible, at some cost in sample quality.
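Why solver order matters for the probability flow ODE can be seen on a toy problem. The sketch below is illustrative only: it is not DPM-Solver, it uses the second-order Heun scheme (the solver family used in Karras et al. 2022) as a stand-in for "higher-order", and it replaces the trained score network with the analytic score of a 1-D Gaussian so that the exact answer is known. The constants `S0` and `SIGMA_MAX` and all function names are hypothetical.

```python
import numpy as np

S0 = 1.0          # std of the toy 1-D Gaussian "data" distribution (assumption)
SIGMA_MAX = 10.0  # largest noise level on the grid (assumption)

def score(x, sigma):
    # Analytic score of N(0, S0^2 + sigma^2); stands in for a trained network.
    return -x / (S0**2 + sigma**2)

def drift(x, sigma):
    # Probability flow ODE in the noise-level parameterisation:
    # dx/dsigma = -sigma * score(x, sigma)
    return -sigma * score(x, sigma)

def solve(x, sigmas, method="euler"):
    # Integrate from sigmas[0] down to sigmas[-1] with Euler or Heun steps.
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        h = s_next - s_cur
        d_cur = drift(x, s_cur)
        x_euler = x + h * d_cur
        if method == "heun":
            # Second-order correction: average the slopes at both ends of the step.
            x = x + 0.5 * h * (d_cur + drift(x_euler, s_next))
        else:
            x = x_euler
    return x

def exact_endpoint(x_start):
    # Closed-form solution of the toy ODE from sigma = SIGMA_MAX to sigma = 0.
    return x_start * np.sqrt(S0**2 / (S0**2 + SIGMA_MAX**2))

x_T = np.array([-12.0, 3.0, 10.05])           # arbitrary starting points at sigma = SIGMA_MAX
sigmas = np.linspace(SIGMA_MAX, 0.0, 11)      # 10 solver steps
err_euler = np.max(np.abs(solve(x_T, sigmas, "euler") - exact_endpoint(x_T)))
err_heun = np.max(np.abs(solve(x_T, sigmas, "heun") - exact_endpoint(x_T)))
print(f"Euler error: {err_euler:.4f}, Heun error: {err_heun:.4f}")
```

Note that Heun evaluates the score twice per step, so a fair comparison against a first-order solver should count function evaluations as well as steps.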
Likelihood-evaluation references (the L9 cross-paradigm bridge)
- The likelihood-evaluation procedure derived in this lesson is what allows diffusion models to appear in the same likelihood tables as autoregressive and flow models. See Song et al. 2021 (maximum likelihood training) above for the detailed cost analysis.
- “Variational Diffusion Models” (Kingma, Salimans, Poole, Ho, 2021). An ELBO-style maximum-likelihood framework for diffusion that ties the L12 ELBO derivation to the lesson 14 probability-flow-ODE evaluation. Useful if you want the full theory connecting all three Phase 3 framings.
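The likelihood computation these references rely on can be summarized in two equations. This is a sketch in the notation of Song et al. 2021, where f and g are the drift and diffusion coefficient of the forward SDE, s_θ is the learned score, and p_T is the known prior; these symbols are assumptions of that notation.

```latex
% Probability flow ODE with the learned score s_\theta in place of \nabla_x \log p_t(x)
\frac{\mathrm{d}\mathbf{x}}{\mathrm{d}t}
  = \tilde{\mathbf{f}}_\theta(\mathbf{x}, t)
  := \mathbf{f}(\mathbf{x}, t) - \tfrac{1}{2}\, g(t)^2\, \mathbf{s}_\theta(\mathbf{x}, t)

% Instantaneous change of variables along the ODE trajectory from t = 0 to t = T
\log p_0(\mathbf{x}(0))
  = \log p_T(\mathbf{x}(T)) + \int_0^T \nabla \cdot \tilde{\mathbf{f}}_\theta(\mathbf{x}(t), t)\,\mathrm{d}t

% The divergence is typically estimated with the Skilling-Hutchinson trace estimator
\nabla \cdot \tilde{\mathbf{f}}_\theta(\mathbf{x}, t)
  = \mathbb{E}_{\boldsymbol{\epsilon}}\!\left[\boldsymbol{\epsilon}^{\top}\, \nabla_{\mathbf{x}} \tilde{\mathbf{f}}_\theta(\mathbf{x}, t)\, \boldsymbol{\epsilon}\right],
  \qquad \mathbb{E}[\boldsymbol{\epsilon}] = \mathbf{0},\ \operatorname{Cov}(\boldsymbol{\epsilon}) = \mathbf{I}
```

The cost comes from solving the ODE and evaluating the divergence estimate at every solver step, which is why the likelihood is tractable but not free.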
Evaluation toolkit (continued from L9 and carries to L15)
- The L9 cross-paradigm fingerprint table classified diffusion as “indirect” for likelihood evaluation. This lesson is where that classification gets refined: the probability flow ODE gives a tractable (though non-free) log-likelihood, putting diffusion on the same likelihood-comparison footing as autoregressive and flow models.
- FID across step counts and CLIP scores carry forward from L9 and L13 as the standard sample-quality and conditioning-fidelity metrics.
Tools and implementations
- The score-based SDE framework is implemented in the official Song et al. 2021 codebase at github.com/yang-song/score_sde.
- A standalone reference implementation of the probability flow ODE is included in the Diffusers library at github.com/huggingface/diffusers.
- The Karras et al. 2022 reformulation is also available at github.com/NVlabs/edm as the EDM reference implementation.
Source material
Source curriculum (structural mirror, cited as further study):
• Stanford CS236: Deep Generative Models (Stefano Ermon). Course page: https://deepgenerativemodels.github.io/
Clawdemy's lessons are original prose that follows the pedagogical arc of this source. We do not reproduce or transcribe it; we cite it as a recommended companion. All rights to the original material remain with its authors.