Generating by denoising: diffusion
What you’ll learn
This is lesson 7 of Track 12 (Introduction to Deep Learning) and the second of Phase 2’s two generative lessons. The previous lesson covered the two classic generative designs, the VAE and the GAN. This one covers the idea behind many of today’s most striking image generators, and it sounds almost too strange to work: start from a screen of pure random static and remove the noise a little at a time until a clear image emerges.
The lesson builds the intuition for diffusion models: why deliberately wrecking an image teaches a network to create, how repeated denoising from pure static summons a brand-new image, and how a text prompt steers each step toward the picture you asked for.
Where this fits
This closes Phase 2’s generative pair. The previous lesson, Teaching machines to imagine, gave you the VAE and the GAN; this lesson adds diffusion and contrasts all three by how they generate (one shot versus gradual). It is also the last “what networks can do” lesson before the track turns, in Phase 3, to a different kind of learning (reinforcement learning) and then to the field’s limitations.
Before you start
Prerequisites: the previous lesson, Teaching machines to imagine, since this lesson contrasts diffusion against the VAE and the GAN. You need to be comfortable with the idea of a generative model producing new examples; the diffusion mechanism itself is built from scratch here. The text-to-image section mentions representing text as vectors (covered in depth in the transformers track), but the load-bearing point, that the prompt steers the denoising, is explained inline.
By the end, you’ll be able to
- Explain the forward noising process and why it needs no learning
- Explain why training a network to remove one step of noise gives it perfect targets to learn from
- Describe how repeated denoising from pure static produces a brand-new image
- Explain why diffusion generates gradually, and how that trades speed for quality and variety
- Explain how a text prompt steers the denoising toward a matching picture
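As a small preview of the first two objectives, here is a minimal sketch of one forward noising step in plain Python. It assumes a common variance-preserving blend of image and Gaussian static; the lesson itself does not commit to this formula, and the names `add_noise` and `keep` are hypothetical, chosen only for illustration.

```python
import math
import random


def add_noise(pixels, keep, rng):
    """Blend an image with Gaussian static (assumed variance-preserving mix).

    keep is the share of the original signal retained:
    1.0 leaves the image untouched, 0.0 leaves pure static.
    Returns the noisy pixels and the exact noise that was mixed in.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    signal_scale = math.sqrt(keep)
    noise_scale = math.sqrt(1.0 - keep)
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise


rng = random.Random(0)
image = [0.9, -0.4, 0.1, 0.7]  # toy "image": four pixel values in [-1, 1]

# Forward noising is just arithmetic: nothing here is learned.
noisy, noise = add_noise(image, keep=0.5, rng=rng)

# Because we generated the noise ourselves, we know the exact answer a
# denoiser should give. Subtracting that known noise recovers the image.
scale = math.sqrt(0.5)
recovered = [(x - scale * n) / scale for x, n in zip(noisy, noise)]
```

The function contains no trainable parts, which is the sense in which the forward process needs no learning. The `noise` it returns is the kind of exact target a denoising network can be trained against.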
Time and difficulty
- Read time: about 8 minutes
- Practice time: about 15 minutes (an ordering-and-prediction exercise and flashcards)
- Difficulty: intro