Normalizing flows, change of variables for distributions

This is lesson 4 of Track 19 (Generative Models and Diffusion), and the close of Phase 1. By the end you will be able to apply the multidimensional change-of-variables formula to an invertible transformation, read the Jacobian determinant as a density rescaling factor (the same Track-4 determinant that scales volumes, now scaling probability density), and write the NLL training objective for a normalizing flow in terms of one log-density term and one log-Jacobian-determinant term. You will see why flows combine exact log p_model(x), one-pass sampling, and a flexible model in a single paradigm, and what they pay for that combination: every layer must be invertible with a tractable Jacobian determinant. The source curricula are Stanford CS236 (Lecture 7, Normalizing Flows) and Berkeley CS294-158 (Lecture 3, Flow Models).
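As a preview, the two statements named above can be written out as follows. This is a sketch in assumed notation, not necessarily the notation the lesson body uses: z is a sample from a base density p_Z, f_θ is the invertible transformation with x = f_θ(z), and the Jacobian is taken of the inverse map.

```latex
% Assumed notation: z ~ p_Z (base density), x = f_\theta(z), f_\theta invertible.

% Change of variables: base density at the pre-image, rescaled by |det J|.
p_{\text{model}}(x)
  = p_Z\!\left(f_\theta^{-1}(x)\right)
    \left|\det \frac{\partial f_\theta^{-1}(x)}{\partial x}\right|

% NLL for one data point: one log-density term, one log-Jacobian-determinant term.
-\log p_{\text{model}}(x)
  = -\log p_Z\!\left(f_\theta^{-1}(x)\right)
    - \log \left|\det \frac{\partial f_\theta^{-1}(x)}{\partial x}\right|
```

Where the inverse map compresses volume around x, the determinant is below 1 and the density is scaled down; where it expands volume, the density is scaled up.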

Within the 15-lesson track, this lesson closes Phase 1 (generative foundations). It applies the forward-KL (equivalently, NLL) training objective from L3 to a new parameterization, an invertible network instead of a chain-rule factorization, giving the second likelihood-based paradigm (autoregressive was the first). Phase 2 opens with the next lesson, Latent variables and the ELBO, which brings VAEs as the third likelihood-based paradigm. The encoder-decoder structure of a VAE looks superficially like a flow but relaxes invertibility, and it pays for that flexibility with a likelihood that becomes a bound (the ELBO) rather than an exact computation. The boundary checkpoint at this lesson is the first synthesis point of the track: two likelihood-based paradigms (autoregressive and flow) already trained on the same KL objective, with the third (VAE) about to arrive on the same objective in bound form.

Prerequisites: the previous lesson, Maximum likelihood and the KL view, for the training objective. Math background from two earlier tracks pays off here: Track 4 (especially the determinant lesson) for the determinant as volume scaling, and Track 8 (derivatives, the chain rule, multivariable derivatives) for the Jacobian. This lesson is the cleanest place where those earlier tracks feed directly into the AI material. If either feels rusty, a quick revisit of T4 L6 (The determinant) and the T8 multivariable-derivatives lessons is worth a few minutes before reading.

The lesson uses one formula (the multidimensional change-of-variables formula), one composition fact (log-determinants add when you stack invertible layers), and the NLL from L3. The arithmetic stays light: a 1D worked example and a 2D worked example, each with a constant Jacobian determinant so you can compute it by hand. The practice extends these to a 1D affine map and a 2D linear map, with integrals confirming total probability is conserved. The architectural discussion (coupling layers, triangular Jacobians) is prose with reference to the canonical papers in References.
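The 1D case and the composition fact can be checked numerically with a short sketch. It assumes a standard normal base density and affine layers x = a*z + b. The function names and the layer values are illustrative choices, not taken from the lesson's worked examples.

```python
import math


def std_normal_logpdf(z):
    """Log-density of the assumed base distribution, N(0, 1)."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def affine_flow_logpdf(x, layers):
    """Log-density of x under a stack of 1D affine layers z -> a*z + b.

    `layers` is a list of (a, b) pairs applied in order when sampling.
    To score x we invert the layers in reverse order and add up the
    log |det| of each inverse step, which is -log|a| in 1D.
    """
    log_det_inverse = 0.0
    for a, b in reversed(layers):
        x = (x - b) / a
        log_det_inverse -= math.log(abs(a))
    return std_normal_logpdf(x) + log_det_inverse


def total_probability(layers, lo=-30.0, hi=30.0, n=20_000):
    """Trapezoid-rule integral of the transformed density over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(affine_flow_logpdf(lo + i * h, layers))
    return total * h


# Illustrative values: one layer stretches by 2, the next shrinks by 1/2 and flips.
one_layer = [(2.0, 1.0)]
two_layers = [(2.0, 1.0), (-0.5, 3.0)]

print("mass, one layer :", total_probability(one_layer))
print("mass, two layers:", total_probability(two_layers))
```

Both integrals should come out numerically indistinguishable from 1: stretching by |a| spreads the same probability over a wider interval, and dividing the density by |a| compensates exactly. The running sum in `log_det_inverse` is the composition fact in miniature, since each layer contributes its own log-determinant term.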

  • Write the multidimensional change-of-variables formula and read the Jacobian determinant as a density rescaling factor (= Track 4’s determinant)
  • Compute the transformed density of a 1D and 2D worked example and verify that total probability is conserved
  • Apply the change-of-variables formula to write the NLL training objective of a normalizing flow and recognize it as the same forward-KL minimization from L3
  • Explain the two architectural constraints (invertibility, tractable Jacobian) and name the standard solutions (coupling layers, autoregressive layers)
  • Compare normalizing flows to autoregressive models and VAEs on exact likelihood, sampling cost, and architectural flexibility
  • Read time: about 12 minutes
  • Practice time: about 16 minutes (a six-question self-check, a 1D change-of-variables exercise, a 2D change-of-variables exercise in which the Jacobian determinant is the determinant of the map's matrix, and flashcards)
  • Difficulty: standard (a Phase 1 lesson; the math is one formula and one composition fact, with the Track-4 determinant doing the heavy lifting)