Normalizing flows: brief

What you’ll learn

This is lesson 4 of Track 19 (Generative Models and Diffusion), and the close of Phase 1. By the end you will be able to apply the multidimensional change-of-variables formula to an invertible transformation, read the Jacobian determinant as a density rescaling factor (the same Track-4 determinant that scales volumes, now scaling probability density), and write the NLL training objective for a normalizing flow as the sum of two terms: a base log-density and a log-Jacobian-determinant. You will see why flows uniquely combine exact log p_model(x), one-pass sampling, and a flexible model, and what that combination costs: every layer must be invertible and have a tractable Jacobian determinant. The source curricula are Stanford CS236 (Lecture 7, Normalizing Flows) and Berkeley CS294-158 (Lecture 3, Flow Models).
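
For orientation, the formula and the objective named above can be written out as follows. The notation is an assumption of this sketch rather than the lesson's own: f_θ is the invertible map from data x to latent z, p_Z is the base density, and x_1, ..., x_N are the training points.

```latex
% Change of variables for an invertible map z = f_\theta(x)
p_{\text{model}}(x) = p_Z\bigl(f_\theta(x)\bigr)\,
  \left|\det \frac{\partial f_\theta(x)}{\partial x}\right|

% NLL objective: one base log-density term plus one log-Jacobian-determinant term
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}
  \left[ \log p_Z\bigl(f_\theta(x_i)\bigr)
       + \log\left|\det \frac{\partial f_\theta(x_i)}{\partial x_i}\right| \right]
```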

Where this fits

This is lesson 4 of 15 and closes Phase 1 (generative foundations). It applies the forward-KL = NLL training objective from L3 to a new parameterization (an invertible network instead of a chain-rule factorization), giving the second likelihood-based paradigm (autoregressive was the first). Phase 2 opens with the next lesson, Latent variables and the ELBO, which brings VAEs as the third likelihood-based paradigm: the encoder-decoder structure looks superficially like a flow but relaxes invertibility, paying for that flexibility with a likelihood that becomes a bound (the ELBO) rather than an exact computation. The boundary checkpoint at this lesson is the first synthesis point of the track: two likelihood-based paradigms already trained on the same KL objective (autoregressive and flow), with the third (VAE) about to arrive on the same objective in bound form.

Before you start

Prerequisites: the previous lesson, Maximum likelihood and the KL view, for the training objective. Two earlier tracks supply the math background: Track 4 (especially the determinant lesson) for the determinant as volume scaling, and Track 8 (derivatives, the chain rule, multivariable derivatives) for the Jacobian. This lesson is where that earlier material is applied most directly to AI; if either feels rusty, a quick revisit of T4 L6 (The determinant) and the T8 multivariable-derivatives material is worth a few minutes before reading.

About the math

The lesson uses one formula (the multidimensional change-of-variables formula), one composition fact (log-determinants add when you stack invertible layers), and the NLL from L3. The arithmetic stays light: a 1D worked example and a 2D worked example, each with a constant Jacobian determinant so you can compute it by hand. The practice extends these to a 1D affine map and a 2D linear map, with integrals confirming total probability is conserved. The architectural discussion (coupling layers, triangular Jacobians) is prose with reference to the canonical papers in References.
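
The two checks described above can be sketched numerically. This is a minimal illustration, not the lesson's own worked example: the affine map x = a*z + b, the standard-normal base density, and the matrices A1 and A2 are all hypothetical choices made here so the numbers are easy to verify by hand.

```python
import numpy as np

def standard_normal_pdf(z):
    """Base density p_Z: a standard normal (an assumption of this sketch)."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

# --- 1D affine map x = a*z + b, with hypothetical values for a and b ---
a, b = 2.0, 1.0

def transformed_pdf(x):
    """Density of x via change of variables: p_Z(z) * |dz/dx| with z = (x - b)/a."""
    z = (x - b) / a
    return standard_normal_pdf(z) / abs(a)

# Riemann-sum check that the transformed density still integrates to 1.
xs = np.linspace(-30.0, 30.0, 200_001)
dx = xs[1] - xs[0]
total_probability = np.sum(transformed_pdf(xs)) * dx

# --- Composition fact: log|det| adds when invertible linear layers are stacked ---
A1 = np.array([[2.0, 0.0], [1.0, 1.0]])   # det = 2
A2 = np.array([[1.0, 3.0], [0.0, 0.5]])   # det = 0.5
_, logdet1 = np.linalg.slogdet(A1)
_, logdet2 = np.linalg.slogdet(A2)
_, logdet_stacked = np.linalg.slogdet(A2 @ A1)  # apply A1 first, then A2

print("total probability:", total_probability)
print("sum of layer log-dets:", logdet1 + logdet2)
print("log-det of stacked map:", logdet_stacked)
```

Because both maps are linear, each Jacobian determinant is a constant, which is what keeps the hand computation short.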

By the end, you’ll be able to

Write the multidimensional change-of-variables formula and read the Jacobian determinant as a density rescaling factor (= Track 4’s determinant)
Compute the transformed density of a 1D and 2D worked example and verify that total probability is conserved
Apply the change-of-variables formula to write the NLL training objective of a normalizing flow and recognize it as the same forward-KL minimization from L3
Explain the two architectural constraints (invertibility, tractable Jacobian) and name the standard solutions (coupling layers, autoregressive layers)
Compare normalizing flows to autoregressive models and VAEs on exact likelihood, sampling cost, and architectural flexibility
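
As a concrete picture of the two architectural constraints, here is a minimal sketch of one common form of coupling layer (an affine coupling on a 2D input). Everything in it is illustrative: `scale_and_shift` is a hypothetical stand-in for a learned network, and the function names are not taken from the lesson or from any library.

```python
import numpy as np

def scale_and_shift(x1):
    """Hypothetical stand-in for a learned network; it does not need to be invertible."""
    log_s = np.tanh(x1)
    t = 0.5 * x1
    return log_s, t

def coupling_forward(x):
    """Keep x1 unchanged; scale and shift x2 using quantities computed from x1."""
    x1, x2 = x
    log_s, t = scale_and_shift(x1)
    z = np.array([x1, x2 * np.exp(log_s) + t])
    log_det = log_s  # triangular Jacobian: log|det| is the sum of the log-scales
    return z, log_det

def coupling_inverse(z):
    """Exact inverse: z1 equals x1, so the same scale and shift can be recomputed."""
    z1, z2 = z
    log_s, t = scale_and_shift(z1)
    return np.array([z1, (z2 - t) * np.exp(-log_s)])

x = np.array([0.3, -1.2])
z, log_det = coupling_forward(x)
x_rec = coupling_inverse(z)

# Central-difference Jacobian, to check the triangular structure numerically.
eps = 1e-6
J = np.zeros((2, 2))
for j in range(2):
    d = np.zeros(2)
    d[j] = eps
    J[:, j] = (coupling_forward(x + d)[0] - coupling_forward(x - d)[0]) / (2 * eps)
_, numeric_log_det = np.linalg.slogdet(J)

print("reconstruction error:", np.max(np.abs(x_rec - x)))
print("analytic log|det J|:", log_det)
print("numeric  log|det J|:", numeric_log_det)
```

The design point is that the inner function can be arbitrarily complicated: invertibility and the cheap determinant come from the layer's structure, not from the function itself.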

Time and difficulty

Read time: about 12 minutes
Practice time: about 16 minutes (a six-question self-check, a 1D change-of-variables exercise, a 2D change-of-variables exercise in which the Jacobian determinant is simply the determinant of the map's matrix, and flashcards)
Difficulty: standard (a Phase 1 lesson; the math is one formula and one composition fact, with the Track-4 determinant doing the heavy lifting)