
Lesson: Teaching machines to imagine

Stop and notice something about every network in this track so far. The digit recognizer, the sequence models, the convolutional vision networks: all of them are judges. You hand them something (an image, a sentence) and they hand back a verdict: this is a 3, this is spam, this is a cat. They take the world in and label it.

This lesson turns that arrow around. What if, instead of labeling a cat photo, a network could draw you a cat that never existed, or, instead of judging a sentence, write a new one? That is the leap from discriminative models, which judge, to generative models, which create. It is the same neural-network engine you know, pointed in the opposite direction, and it is the foundation of the AI systems that produce images, audio, and text. Getting this distinction clear is the heart of this lesson, so let us start there.

The cleanest way to see the difference is to picture a pile of photos, some cats, some dogs.

A discriminative model learns the boundary between the groups. Its only job is to answer “cat or dog?” for a new photo, so it learns whatever line best separates the two piles. It never needs to know what a cat actually looks like in full; it just needs to know what tells a cat apart from a dog. Every classifier you have met is discriminative: given an input, produce a label.

A generative model learns something harder and richer: what the data itself looks like. It studies the pile of cat photos until it has captured the patterns that make a cat a cat (the shapes, the textures, the typical arrangements) well enough to produce a brand-new cat photo that was never in the pile. It is not drawing a boundary between groups; it is learning the shape of a group so thoroughly that it can generate new members of it.

That is the spine of this lesson. Discriminative: learn the line between. Generative: learn the thing itself, then make more of it. The rest is just two clever ways to pull off the generative trick.
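To make the split concrete, here is a minimal NumPy sketch on made-up one-dimensional data, where each "photo" is reduced to a single number. In probability terms, the discriminative half learns p(label | input), here with logistic regression, while the generative half models p(input) for the cat pile, here assumed to be a single bell curve (a deliberately crude stand-in for a real generative network). The data, function names, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 1-D "photos": cats cluster near 2.0, dogs near -2.0.
cats = rng.normal(2.0, 0.5, size=200)
dogs = rng.normal(-2.0, 0.5, size=200)


def sigmoid(t):
    # tanh form of the logistic function; avoids overflow for large |t|
    return 0.5 * (1.0 + np.tanh(0.5 * t))


# --- Discriminative: learn only the boundary, p(cat | x) ---
x = np.concatenate([cats, dogs])
y = np.concatenate([np.ones_like(cats), np.zeros_like(dogs)])  # 1 = cat, 0 = dog
w, c = 0.0, 0.0
for _ in range(500):
    p = sigmoid(w * x + c)
    w -= 0.1 * np.mean((p - y) * x)  # gradient of mean cross-entropy w.r.t. w
    c -= 0.1 * np.mean(p - y)        # ... and w.r.t. c


def classify(value):
    """Answer 'cat or dog?' for one input; this model cannot draw anything."""
    return "cat" if sigmoid(w * value + c) > 0.5 else "dog"


# --- Generative: learn what cats look like, p(x), then make more ---
mu, sigma = cats.mean(), cats.std()


def sample_cats(n):
    """Produce n brand-new 'cat photos' that were never in the pile."""
    return rng.normal(mu, sigma, size=n)
```

`classify` can only answer "cat or dog?"; it has no way to produce a cat. `sample_cats` can produce values that were never in the pile, but only because it modeled the pile itself.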

The VAE: learn a compact space, then sample from it


The first design starts from a simple idea called an autoencoder. Picture a network shaped like an hourglass: it takes an image, squeezes it down through a narrow middle into a short list of numbers, then expands that list back out into an image. It is trained to make the output match the input, so it must learn to pack the essence of an image into that narrow middle, called the latent code, and unpack it again. The narrow middle forces it to keep only what matters.
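The hourglass can be sketched with the simplest possible autoencoder: a linear encoder matrix that squeezes 8 numbers into a 2-number latent code, and a linear decoder matrix that expands the code back out, both trained by plain gradient descent to make the output match the input. This is an illustrative assumption, not how practical autoencoders are built (those use nonlinear layers and a deep-learning framework); the toy data are made up to lie near a 2-D plane, so a 2-number code is enough to hold their essence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 500 points in 8 dimensions that really live near a 2-D plane.
basis = rng.normal(size=(2, 8))
data = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 8))

latent_size = 2                                  # width of the narrow middle
W_enc = 0.1 * rng.normal(size=(8, latent_size))  # encoder: 8 numbers -> 2
W_dec = 0.1 * rng.normal(size=(latent_size, 8))  # decoder: 2 numbers -> 8


def reconstruction_error():
    """Mean squared difference between each input and its rebuilt copy."""
    return np.mean((data @ W_enc @ W_dec - data) ** 2)


before = reconstruction_error()
learning_rate = 0.01
for _ in range(2000):
    code = data @ W_enc        # squeeze each input into its latent code
    rebuilt = code @ W_dec     # unpack the code back into an input-sized vector
    err = rebuilt - data       # training target: output should equal input
    # Gradients of (half the squared error per example, averaged over examples)
    grad_dec = code.T @ err / len(data)
    grad_enc = data.T @ (err @ W_dec.T) / len(data)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc
after = reconstruction_error()
```

After training, `after` is far smaller than `before`: the 2-number code is enough to rebuild each 8-number input. Nothing here generates new data yet; it only rebuilds what it was given.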

That is useful for compression, but it does not yet generate anything new; it only rebuilds what it was given. The variational autoencoder, or VAE, adds one crucial twist: it trains that narrow middle to be smooth and well-organized, a tidy “space” of possibilities where every nearby point decodes into a plausible image. Once the space is organized that way, generating becomes easy. You pick a brand-new point in the latent space, one the network never saw, hand it to the unpacking half, and out comes a new, plausible image.
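The "smooth and well-organized" part comes from two changes to the plain autoencoder, sketched below in NumPy. First, the encoder outputs a mean `mu` and a log-variance `log_var` for each input instead of a single point, and the code is sampled from that small cloud as `mu + sigma * eps` (the reparameterization trick). Second, the loss adds a KL-divergence penalty that pulls every cloud toward a standard normal distribution, which is what keeps the space organized. Only these pieces are shown: the encoder and decoder networks are omitted, the squared-error reconstruction term is an assumption (other choices are common), and `decoder` in `generate` is a hypothetical stand-in for a trained decoder.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps so gradients can flow through mu and log_var."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


def vae_loss(x, x_rebuilt, mu, log_var):
    """Reconstruction error plus the KL 'tidiness' penalty, averaged over the batch."""
    reconstruction = np.sum((x - x_rebuilt) ** 2, axis=-1)
    return np.mean(reconstruction + kl_to_standard_normal(mu, log_var))


def generate(decoder, n, latent_size, rng):
    """Generation: pick brand-new latent points from N(0, I) and decode them."""
    z = rng.standard_normal((n, latent_size))
    return decoder(z)
```

Because training keeps every encoded cloud close to N(0, I), a fresh draw from N(0, I) in `generate` lands in territory the decoder has learned to turn into something plausible.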

The intuition worth keeping: a VAE learns a compressed map of “all the faces” (or digits, or whatever it trained on), arranged so that the map has no holes. Picking any spot on the map and decoding it gives you a face. You can even slide smoothly from one point to another and watch one face morph into a different one, because the space in between is filled with plausible faces too.
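The face-morphing trick amounts to walking a straight line between two latent codes and decoding each stop along the way. A minimal sketch, assuming a trained decoder exists; `decoder` below is a hypothetical placeholder for any function that maps a batch of latent codes to a batch of images, and straight-line interpolation is the simplest choice rather than the only one.

```python
import numpy as np


def interpolate(z_start, z_end, steps):
    """Evenly spaced latent codes on the straight line from z_start to z_end."""
    t = np.linspace(0.0, 1.0, steps)[:, None]  # mixing weights 0 .. 1, one per row
    return (1.0 - t) * z_start + t * z_end


def morph(decoder, z_start, z_end, steps=8):
    """Decode every point on the path; with a trained VAE decoder each row is one frame."""
    return decoder(interpolate(z_start, z_end, steps))
```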

The GAN: learn through a contest

The second design is more theatrical, and it is one of the most elegant ideas in the field. A generative adversarial network, or GAN, sets two networks against each other in a contest.

One network, the generator, tries to produce fake images good enough to pass as real. The other, the discriminator (a plain classifier, the kind you already know), tries to tell the generator’s fakes from genuine images. Think of a counterfeiter and a detective. The counterfeiter makes fake bills; the detective tries to spot them. Each time the detective catches a fake, the counterfeiter learns and improves; each time a fake slips through, the detective sharpens up. Round after round, both get better.

The clever part is that this contest is the training. The generator never sees a real image directly; it learns entirely from whether it managed to fool the discriminator. As the two escalate, the generator is driven to produce images so convincing that even a well-trained detector cannot reliably call them fake, and at that point you have a network that generates realistic new examples on demand.
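Here is the contest reduced to its smallest working form, as a NumPy sketch under heavy simplifying assumptions: the "images" are single numbers, the real data are made-up draws from a normal distribution centered at 4, the generator is just `a * z + b` applied to random noise `z`, and the discriminator is a one-feature logistic classifier. The generator uses the common "non-saturating" loss, minimizing -log D(fake). Notice that `generator_step` never receives the real data; its only training signal is the discriminator's verdict on its fakes.

```python
import numpy as np


def sigmoid(t):
    return 0.5 * (1.0 + np.tanh(0.5 * t))  # overflow-free logistic function


def discriminator_step(params_d, real, fake, lr):
    """Detective's turn: one gradient-ascent step on mean log D(real) + mean log(1 - D(fake))."""
    w, c = params_d
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    grad_w = np.mean((1.0 - d_real) * real) - np.mean(d_fake * fake)
    grad_c = np.mean(1.0 - d_real) - np.mean(d_fake)
    return (w + lr * grad_w, c + lr * grad_c)


def generator_step(params_g, params_d, z, lr):
    """Counterfeiter's turn: one gradient-descent step on mean -log D(G(z)).

    Only the noise z and the discriminator's parameters come in; no real data.
    """
    a, b = params_g
    w, c = params_d
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    d_loss_d_fake = -(1.0 - d_fake) * w  # derivative of -log D(x) w.r.t. x
    grad_a = np.mean(d_loss_d_fake * z)  # chain rule through fake = a * z + b
    grad_b = np.mean(d_loss_d_fake)
    return (a - lr * grad_a, b - lr * grad_b)


def train_gan(steps=2000, lr=0.05, batch=64, seed=0):
    rng = np.random.default_rng(seed)
    params_g = (1.0, 0.0)  # generator starts out producing N(0, 1) fakes
    params_d = (0.0, 0.0)  # discriminator starts undecided: D(x) = 0.5 everywhere
    for _ in range(steps):
        real = rng.normal(4.0, 1.0, size=batch)  # made-up "real data": N(4, 1)
        z = rng.normal(size=batch)
        fake = params_g[0] * z + params_g[1]
        params_d = discriminator_step(params_d, real, fake, lr)  # detective learns
        params_g = generator_step(params_g, params_d, z, lr)     # counterfeiter adapts
    return params_g, params_d
```

Models this small only illustrate the mechanics. Even here the back-and-forth can oscillate instead of settling, a small-scale taste of why GAN training is hard to balance.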

The VAE and the GAN reach the same place, a network that can produce new examples, by different roads. The VAE learns an organized space and samples from it, tending toward plausible but sometimes slightly blurry results. The GAN learns through a contest and tends toward sharp, convincing results, though its training is famously finicky to balance. They are the two classic answers to the generative problem, and knowing them is enough to understand the idea. A newer approach, which now powers many of the most impressive image generators, works by a completely different trick: removing noise step by step. It gets its own lesson next.

Generative models are the engine behind the recent wave of AI that makes things: systems that produce images from a description, synthesize voices, or write text. The discriminative-versus-generative split clears up a lot. A spam filter or a medical-image classifier is discriminative; it judges. An image generator or a writing assistant is generative; it produces. They are built on the same neural-network foundation but solve fundamentally different problems, and they fail in different ways: a misjudged label versus a confidently produced fabrication. Recognizing which kind of system you are dealing with tells you what to expect from it and how to check its work.

Common misconceptions

Thinking “generative” means the model “understands” or “imagines” like a person. “Imagine” is a friendly shorthand. A generative model has learned the statistical shape of its training data and samples from it. The results can be striking, but it is producing patterns like the ones it was trained on, not conjuring them from understanding.

Judging the model type by how impressive its output looks. The giveaway is the job, not the polish. Discriminative models output a judgment about an input; generative models output a new example. A network that says “cat” is discriminative; one that draws a cat is generative.

Thinking a VAE and a GAN are the same thing. They share a goal (generate new data) but not a method. A VAE samples from a learned, organized space; a GAN learns through a generator-versus-discriminator contest. Different mechanisms, different characteristic results.

Thinking the GAN’s generator studies real images directly. It does not. It only ever learns from the discriminator’s verdicts on its fakes. The realism comes entirely from being pushed to fool a detector that keeps getting better.

Key takeaways

  • Discriminative models judge; generative models create. A discriminative model learns the boundary between groups (input → label); a generative model learns what the data itself looks like and produces new examples.
  • A VAE learns a compact, organized latent space. Compress data into a short code, train the space to be smooth, then sample a new point and decode it into a new, plausible example.
  • A GAN learns through a contest. A generator tries to fool a discriminator (counterfeiter versus detective); the escalating arms race drives the generator toward convincing fakes.
  • Same engine, opposite direction. Generative models are the familiar neural network pointed at producing data rather than labeling it, and they power the AI that makes images, audio, and text.

Every network before this one was a judge. A generative model is a maker: it learns the shape of its world well enough to produce new pieces of it. That single turn, from labeling to creating, is what put “generative AI” on the map.

Next: the VAE and the GAN were the classic ways to generate, but the systems behind today’s most striking image generators usually use a different idea entirely. They start from pure noise and remove it a little at a time until an image emerges. The next lesson is about generating by denoising, the diffusion model.