References: Diffusion models II, training and sampling
Primary source
- Berkeley CS294-158 Sp24, Deep Unsupervised Learning, Lecture 6: Diffusion Models (Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu). The primary anchor for this lesson’s sampler-design material (DDIM, classifier-free guidance, the latency-quality trade-off). Course page: sites.google.com/view/berkeley-cs294-158-sp24/.
- Stanford CS236, Deep Generative Models, Lecture 16 (and the diffusion sections of the syllabus) (Stefano Ermon). Secondary framing for the diffusion paradigm overall, with the score-based view that lesson 14 will return to. Course page: deepgenerativemodels.github.io.
Foundational papers (the math this lesson is built on)
- “Denoising Diffusion Implicit Models” (Song, Meng, Ermon, 2020). The DDIM paper. Section 3 introduces the non-Markovian reformulation; section 4 derives the deterministic sampling update. The key insight, that the same trained noise predictor can be reused in a different sampling procedure, is the practical move that made diffusion production-grade.
- “Classifier-Free Diffusion Guidance” (Ho, Salimans, NeurIPS 2021 workshop; arXiv 2022). The classifier-free guidance paper. First presented at the NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications; the arXiv preprint was posted in July 2022 (hence the arXiv ID 2207.xxxxx). Two pages of math, and one of the highest-impact tricks in modern diffusion. The training-time joint conditional/unconditional setup and the inference-time blend are both in section 3.
- “Denoising Diffusion Probabilistic Models” (Ho, Jain, Abbeel, 2020). The DDPM paper from L12, included here because L13 builds directly on its noise-prediction framework. The trained network in this lesson is the same network DDPM trained.
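The two sampler-side ideas in these papers, the classifier-free guidance blend and the deterministic DDIM update, fit in a few lines. The following is a minimal NumPy sketch under stated assumptions, not code from either paper: the function names `cfg_noise` and `ddim_step` are hypothetical, `alpha_bar` follows DDPM's notation for the cumulative signal coefficient, and the guidance convention assumed is the common one where a scale of 1 returns the conditional prediction. No trained network is involved; the toy check feeds in the true noise to confirm the algebra.

```python
import numpy as np


def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance blend of two noise predictions.

    Convention assumed here: scale 0 -> unconditional, 1 -> conditional,
    >1 -> extrapolate past the conditional prediction.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to an earlier step."""
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the earlier noise level, reusing the same eps.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps


# Toy check with a known clean sample and known noise (no trained network here).
rng = np.random.default_rng(0)
x0 = rng.normal(size=4)
true_eps = rng.normal(size=4)
alpha_bar_t, alpha_bar_prev = 0.3, 0.7
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * true_eps

# Pretend both network branches returned the true noise, so the blend is exact.
eps = cfg_noise(true_eps, true_eps, guidance_scale=7.5)
x_prev = ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev)
expected = np.sqrt(alpha_bar_prev) * x0 + np.sqrt(1.0 - alpha_bar_prev) * true_eps
print(np.allclose(x_prev, expected))
```

Because `ddim_step` only needs the two noise levels, the earlier step does not have to be `t - 1`; skipping over a subsequence of steps is what lets the same trained noise predictor sample in far fewer network calls.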
Production-grade systems built on the L13 stack
- “High-Resolution Image Synthesis with Latent Diffusion Models” (Rombach et al., 2022). Stable Diffusion. Uses DDIM-family sampling and classifier-free guidance at production scale. Section 3 explains the latent-diffusion architecture and its conditioning mechanisms; section 4 reports experiments that use the sampling and guidance machinery from this lesson.
- “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models” (Nichol et al., 2022). An early text-to-image diffusion system from OpenAI. Compares classifier-free guidance to CLIP guidance head-to-head and finds classifier-free guidance wins on both quality and prompt fidelity.
Further reading (faster samplers, distillation)
- “DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps” (Lu et al., 2022). High-order ODE solver for the diffusion sampling chain (the paper gives first-, second-, and third-order variants). Reaches near-asymptote quality at ten to fifteen sampling steps, with better sample quality than DDIM at the same step count.
- “Consistency Models” (Song et al., 2023). A different framework for few-step sampling. Trains the network to be self-consistent across noise levels, so a single forward pass can produce a sample (with quality trade-offs).
- “LCM-LoRA: A Universal Stable-Diffusion Acceleration Module” (Luo et al., 2023). Distillation applied to Stable Diffusion. Brings inference to four to eight steps with minimal quality loss.
Evaluation toolkit (carried from L9)
- FID (Fréchet Inception Distance) is the standard image-quality metric for diffusion systems. Read across step counts to characterize the latency-quality Pareto frontier.
- CLIP score measures text-image alignment for text-conditioned diffusion. Useful alongside FID for prompt-fidelity evaluation.
- Sample-quality vs step-count Pareto frontier is the right reporting frame for any new sampler claim. A single number (FID at one step count) is not the right comparison; the full curve is.
- Memorization probes detect when a diffusion model reproduces training-image-like content (a relevant question for the §6 watch territory’s IP-and-licensing forum).
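For readers who want to see what FID actually computes, here is a minimal sketch of the Fréchet distance between two Gaussians fitted to feature sets. It assumes the features are already extracted; the random arrays below are stand-ins, not Inception activations, and the helper names `feature_stats` and `frechet_distance` are hypothetical. Production FID implementations add numerical safeguards that this sketch omits.

```python
import numpy as np


def feature_stats(features):
    """Mean vector and covariance matrix of a (num_samples, dim) feature array."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    mean_term = float(np.sum((mu1 - mu2) ** 2))
    # Tr((sigma1 sigma2)^(1/2)) equals the sum of square roots of the eigenvalues
    # of sigma1 @ sigma2, which are real and non-negative for PSD inputs.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    sqrt_trace = float(np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None))))
    return mean_term + float(np.trace(sigma1) + np.trace(sigma2)) - 2.0 * sqrt_trace


# Synthetic stand-in features: two 8-dimensional clouds with shifted means.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 8))
fake_feats = rng.normal(loc=0.5, size=(500, 8))
mu_r, sigma_r = feature_stats(real_feats)
mu_f, sigma_f = feature_stats(fake_feats)
print(frechet_distance(mu_r, sigma_r, mu_f, sigma_f))
```

Evaluating this distance once per sampler step count, with the same reference statistics each time, gives the points of the quality-versus-steps curve described above.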
Tools and implementations
- The original DDIM implementation is at github.com/ermongroup/ddim.
- Classifier-free guidance is implemented across nearly every modern diffusion library; the Hugging Face Diffusers library at github.com/huggingface/diffusers is the most widely used.
- Stable Diffusion’s open implementation at github.com/CompVis/stable-diffusion is the canonical reference for a production-scale DDIM + classifier-free guidance system.
Source material
Source curriculum (structural mirror, cited as further study):
- UC Berkeley CS294-158: Deep Unsupervised Learning (Spring 2024). Course page: https://sites.google.com/view/berkeley-cs294-158-sp24/. Lecture videos: YouTube (link-out only).
Clawdemy's lessons are original prose that follows the pedagogical arc of this source. We do not reproduce or transcribe it; we cite it as a recommended companion. All rights to the original material remain with its authors.