References: Diffusion models II, training and sampling

Primary source

Berkeley CS294-158 Sp24, Deep Unsupervised Learning, Lecture 6: Diffusion Models (Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu). The primary anchor for this lesson’s sampler-design material (DDIM, classifier-free guidance, the latency-quality trade-off). Course page: sites.google.com/view/berkeley-cs294-158-sp24/.
Stanford CS236, Deep Generative Models, Lecture 16 (and the diffusion sections of the syllabus) (Stefano Ermon). Secondary framing for the diffusion paradigm overall, with the score-based view that lesson 14 will return to. Course page: deepgenerativemodels.github.io.

Foundational papers (the math this lesson is built on)

“Denoising Diffusion Implicit Models” (Song, Meng, Ermon, 2020). The DDIM paper. Section 3 introduces the non-Markovian reformulation of the forward process; section 4 derives the generalized sampling update, including the deterministic DDIM case and accelerated sampling over a subsequence of timesteps. The key insight (the same trained noise predictor can be reused in a different sampling procedure, with no retraining) is what made few-step diffusion sampling practical for production use.
“Classifier-Free Diffusion Guidance” (Ho, Salimans, NeurIPS 2021 workshop; arXiv 2022). The classifier-free guidance paper. It was first presented at the NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications; the arXiv preprint was posted in July 2022 (hence the arXiv ID 2207.xxxxx). Two pages of math, and one of the highest-impact tricks in modern diffusion. The training-time joint conditional/unconditional setup and the inference-time blend are both in section 3.
“Denoising Diffusion Probabilistic Models” (Ho, Jain, Abbeel, 2020). The DDPM paper from L12, included here because L13 builds directly on its noise-prediction framework. The trained network in this lesson is the same network DDPM trained.
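
The two inference-time ideas in the DDIM and classifier-free guidance papers each fit in a few lines. The following is a minimal NumPy sketch, not either paper's reference implementation. It assumes the noise predictions are already available as arrays (the network itself is out of scope), and it uses the DDPM convention in which abar_t is the cumulative product of the per-step alphas (the DDIM paper writes this quantity as alpha_t). The function names cfg_eps and ddim_step are illustrative.

```python
import numpy as np

def cfg_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance blend in the Ho & Salimans parameterization.

    Returns (1 + w) * eps_cond - w * eps_uncond.
    w = 0 gives the plain conditional prediction; larger w pushes
    the prediction further away from the unconditional one.
    """
    return (1.0 + w) * eps_cond - w * eps_uncond

def ddim_step(x_t, eps, abar_t, abar_prev):
    """One deterministic DDIM update (the eta = 0 case).

    x_t       : current noisy sample
    eps       : predicted noise at the current timestep
    abar_t    : cumulative alpha product at the current timestep
    abar_prev : cumulative alpha product at the next, less noisy timestep
    """
    # Estimate the clean sample implied by the noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    # Re-noise that estimate to the target noise level with the same eps.
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps
```

In a sampler the two compose: the guided prediction from cfg_eps is passed as eps to ddim_step. Many libraries expose a guidance scale s and compute eps_uncond + s * (eps_cond - eps_uncond) instead; that is the same blend with s = 1 + w.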

Production-grade systems built on the L13 stack

“High-Resolution Image Synthesis with Latent Diffusion Models” (Rombach et al., 2022). Stable Diffusion. Uses DDIM-family sampling and classifier-free guidance at production scale. Section 3 explains the latent-diffusion architecture and its conditioning mechanisms; section 4 reports experiments that exercise the sampling and guidance machinery this lesson covers.
“GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models” (Nichol et al., 2022). An early text-to-image diffusion system from OpenAI. Compares classifier-free guidance to CLIP guidance head-to-head and finds that human evaluators prefer classifier-free guidance for both photorealism and caption similarity.

Evaluation toolkit (carried from L9)

FID (Fréchet Inception Distance) is the standard image-quality metric for diffusion systems; lower is better. Report it at several step counts, not just one, to characterize the latency-quality Pareto frontier.
CLIP score measures text-image alignment for text-conditioned diffusion. Useful alongside FID for prompt-fidelity evaluation.
Sample-quality vs step-count Pareto frontier is the right reporting frame for any new sampler claim. A single number (FID at one step count) is not the right comparison; the full curve is.
Memorization probes detect when a diffusion model reproduces training-image-like content (a relevant question for the §6 watch territory’s IP-and-licensing forum).
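
As a rough illustration of the first and third items above, the sketch below computes the Fréchet distance between two sets of feature vectors and filters a list of (step count, FID) measurements down to its Pareto frontier. It is a minimal NumPy version under stated assumptions, not a replacement for an established FID implementation: the inputs are assumed to be precomputed feature arrays of shape (n_samples, n_features) with at least two features (real FID takes these from an Inception network, which is out of scope here), and the names frechet_distance and pareto_frontier are illustrative.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    d^2 = |mu_a - mu_b|^2 + tr(C_a) + tr(C_b) - 2 * tr((C_a C_b)^(1/2))
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # The trace of the matrix square root equals the sum of the square
    # roots of the eigenvalues of C_a @ C_b; tiny negative or imaginary
    # parts from round-off are discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

def pareto_frontier(points):
    """Keep the (steps, fid) pairs that no cheaper-or-equal setting beats.

    Lower is better on both axes. A point survives only if its FID is
    strictly lower than every point with fewer or equal steps.
    """
    frontier = []
    best_fid = float("inf")
    for steps, fid in sorted(points):
        if fid < best_fid:
            frontier.append((steps, fid))
            best_fid = fid
    return frontier
```

Reporting the output of pareto_frontier for each sampler, instead of one FID at one step count, is what makes two samplers comparable across the whole latency range.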

Tools and implementations

The original DDIM implementation is at github.com/ermongroup/ddim.
Classifier-free guidance is implemented across nearly every modern diffusion library; the Hugging Face Diffusers library at github.com/huggingface/diffusers is the most widely used.
Stable Diffusion’s open implementation at github.com/CompVis/stable-diffusion is the canonical reference for a production-scale DDIM + classifier-free guidance system.

Source material

Source curriculum (structural mirror, cited as further study):
• UC Berkeley CS294-158: Deep Unsupervised Learning (Spring 2024)
  Course page: https://sites.google.com/view/berkeley-cs294-158-sp24/
  Lecture videos: YouTube (link-out only)
Clawdemy's lessons are original prose that follows the pedagogical arc of this
source. We do not reproduce or transcribe it; we cite it as a recommended
companion. All rights to the original material remain with its authors.
