Teaching machines to imagine

This is lesson 6 of Track 12 (Introduction to Deep Learning) and the first of Phase 2’s two generative lessons. Every network in the track so far has been a judge: you hand it an image or a sentence and it hands back a verdict. This lesson turns that arrow around. Generative models learn what the data itself looks like, so they can produce brand-new examples rather than label existing ones.

The lesson draws the line between discriminative models (which judge) and generative models (which create), then builds the intuition for the two classic generative designs: the variational autoencoder (the VAE), which learns a smooth, organized latent space you can sample from, and the generative adversarial network (the GAN), which learns through a contest between a generator and a discriminator.

The previous lesson, From edges to objects, closed the discriminative-vision arc (networks that judge images). This lesson turns to networks that produce images, so the two read as a natural before-and-after. The next lesson, Generating by denoising: diffusion, reaches the same goal by a different route, covering the approach behind many of today’s most striking image generators.

Prerequisites: the earlier Track 12 lessons, especially the vision lessons that built up the idea of a network as a classifier. You need to be comfortable with the notion of a neural network that takes an input and produces an output; everything specific to generation is explained inline. No math beyond what those lessons used is needed, and no prior exposure to VAEs or GANs is assumed.

  • Distinguish discriminative models (which judge) from generative models (which create) by the job each one does
  • Explain the intuition behind a variational autoencoder as a compact, organized latent space you can sample from
  • Explain the intuition behind a generative adversarial network as a generator-versus-discriminator contest
  • Recognize which kind of model an everyday AI system is, and what failure mode to expect from each
  • Read time: about 7 minutes
  • Practice time: about 15 minutes (a sorting exercise and flashcards)
  • Difficulty: intro