VAE training in practice, the reparameterization trick

This is lesson 6 of Track 19 (Generative Models and Diffusion), and it takes the abstract ELBO from the previous lesson to a concrete trainable architecture: the variational autoencoder. By the end you will be able to describe the VAE’s three pieces (fixed standard Gaussian prior, Gaussian encoder, decoder), state and apply the reparameterization trick that makes the ELBO trainable by SGD, use the closed-form Gaussian KL for the regularizer term, and walk a single VAE training step end to end on a tiny worked example. You will also see where VAEs sit in the modern generative landscape, especially as the perceptual-compression front-end in latent diffusion systems like Stable Diffusion. The source curricula are Stanford CS236 (Lecture 6) and Berkeley CS294-158 (Lecture 4).

Within the 15-lesson track, this lesson is the second step of Phase 2 (latent-variable and adversarial paradigms). It builds directly on L5’s ELBO derivation by adding the architectural and training machinery that turns an abstract bound into a real, trainable model. The next lesson, “GANs, the minimax game”, opens the adversarial paradigm and drops the likelihood objective entirely; the contrast with VAEs (which keep a principled likelihood objective but optimize only a lower bound on it) sharpens what each paradigm gives up. The reparameterization trick built here recurs in Phase 3, especially in the diffusion training derivation (lesson 14).

Prerequisites: the previous lesson, “Latent variables and the ELBO”, for the bound itself. The math background reuses everything from Phase 1: comfort with expectations, KL divergence, and the chain rule for gradients. One new technical idea (the reparameterization trick) is introduced from scratch with a clean derivation. The closed-form Gaussian KL is given as a formula; deriving it from the definition is a useful exercise but not required for the lesson.

This lesson uses two new tools: the reparameterization trick (one move, applied throughout) and the closed-form Gaussian KL formula (a known result, applied to compute the regularizer cheaply). The arithmetic stays small: four KL computations on different (μ, σ) settings, and one end-to-end VAE training-step walkthrough on a 1D example with a single noise sample. Compared with the derivation-heavy L5, this lesson is more concrete: less algebraic derivation, more architectural and computational walkthrough.
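As a preview of how small that arithmetic is, the closed-form KL between a one-dimensional Gaussian N(μ, σ²) and the standard normal prior can be sketched in a few lines of plain Python. The function name and the four (μ, σ) settings below are illustrative choices made here, not the lesson's own practice problems.

```python
import math

def gaussian_kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a single latent dimension."""
    var = sigma ** 2
    return 0.5 * (var + mu ** 2 - 1.0 - math.log(var))

# Illustrative settings (assumed for this sketch, not taken from the lesson).
settings = [(0.0, 1.0), (1.0, 1.0), (0.0, 2.0), (2.0, 0.5)]

for mu, sigma in settings:
    kl = gaussian_kl_to_standard_normal(mu, sigma)
    print(f"mu={mu}, sigma={sigma}: KL = {kl:.4f}")
```

The first setting matches the prior exactly, so its KL is zero; moving μ away from 0 or σ away from 1 in either direction makes the regularizer positive.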

  • Describe the VAE architecture (Gaussian encoder + decoder + fixed standard Gaussian prior) and explain why log σ² is predicted rather than σ² directly
  • State the reparameterization trick (z = μ + σ·ε with ε ~ N(0, I)) and explain why it makes the ELBO differentiable in the encoder parameters
  • Apply the closed-form Gaussian KL formula KL(N(μ, σ²) ‖ N(0, 1)) = 0.5·(σ² + μ² − 1 − log σ²), per latent dimension, to compute the regularizer
  • Write the full per-example VAE training loss (Monte Carlo reconstruction + closed-form KL) and walk through one training step end to end
  • Place the VAE in the modern generative landscape, including its role as the compression front-end in latent diffusion systems like Stable Diffusion
  • Read time: about 14 minutes
  • Practice time: about 16 minutes (a six-question self-check, four closed-form Gaussian KL computations, a six-step VAE training-step walkthrough, and flashcards)
  • Difficulty: standard (a Phase 2 lesson; the reparameterization trick is one new technical idea, the KL formula is a known closed form, and the worked example is a single deterministic walk)
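
The objectives above can be previewed with a minimal sketch of one training step on a toy 1D VAE. It is not the lesson's six-step walkthrough: the scalar linear encoder and decoder, the weight names (w_mu, w_logvar, w_dec), the unit-variance Gaussian reconstruction term, and all numeric values are assumptions made for illustration. The point it tries to show is that, once ε is fixed, z = μ + σ·ε is an ordinary differentiable function of the encoder parameters, so the chain rule gives a gradient that a finite-difference check can confirm.

```python
import math

def vae_step_1d(x, eps, w_mu, w_logvar, w_dec):
    """Forward pass, per-example loss, and d(loss)/d(w_mu) for a toy 1D VAE.

    All parameter names are hypothetical; this is a sketch, not a reference model.
    """
    # Encoder: predict the mean and the log-variance of q(z | x).
    mu = w_mu * x
    log_var = w_logvar * x
    sigma = math.exp(0.5 * log_var)  # positive for any real log_var

    # Reparameterization: with eps fixed, z is deterministic in (mu, sigma).
    z = mu + sigma * eps

    # Decoder and single-sample reconstruction term (assumed unit-variance
    # Gaussian likelihood, additive constant dropped).
    x_hat = w_dec * z
    recon = 0.5 * (x - x_hat) ** 2

    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ).
    kl = 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - log_var)
    loss = recon + kl

    # Chain rule through z: (d recon / d z) * (d z / d mu) + d kl / d mu,
    # then times d mu / d w_mu = x.
    grad_w_mu = ((x_hat - x) * w_dec + mu) * x
    return loss, grad_w_mu

# Illustrative data point, noise sample, and parameters.
x, eps = 1.0, 0.5
w_mu, w_logvar, w_dec = 0.5, 0.0, 2.0

loss, grad = vae_step_1d(x, eps, w_mu, w_logvar, w_dec)

# Finite-difference check of the gradient, holding the same eps fixed.
h = 1e-5
loss_plus, _ = vae_step_1d(x, eps, w_mu + h, w_logvar, w_dec)
loss_minus, _ = vae_step_1d(x, eps, w_mu - h, w_logvar, w_dec)
grad_fd = (loss_plus - loss_minus) / (2 * h)

# One plain SGD update on w_mu, then re-evaluate the loss with the same eps.
lr = 0.1
w_mu_new = w_mu - lr * grad
loss_after, _ = vae_step_1d(x, eps, w_mu_new, w_logvar, w_dec)

print(f"loss={loss:.4f}  grad={grad:.4f}  finite-diff={grad_fd:.4f}")
print(f"updated w_mu={w_mu_new:.4f}  loss after step={loss_after:.4f}")
```

Predicting log σ² and exponentiating, as the sketch does, keeps σ positive without constraining the encoder output. In a real model the gradients would come from automatic differentiation over all parameters rather than a hand-written chain rule for one weight.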