References: Normalizing flows, change of variables for distributions

Source material

Source curricula (this lesson mirrors the structure of more than one course; each is cited as further study):

PRIMARY (this lesson follows its framing most directly)
• Stanford CS236, "Deep Generative Models", Lecture 7: Normalizing Flows
  Instructor: Stefano Ermon
  Course URL: https://deepgenerativemodels.github.io/
  Syllabus: https://deepgenerativemodels.github.io/syllabus.html
  License: standard course-page link-out; cited as further study

SECONDARY (also contributed to this lesson's framing)
• Berkeley CS294-158, "Deep Unsupervised Learning" (Spring 2024), Lecture 3: Flow Models
  Instructors: Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
  Course URL: https://sites.google.com/view/berkeley-cs294-158-sp24/
  License: standard course-page link-out; cited as further study

Clawdemy's lessons are original prose that follows the pedagogical arc of these
two courses, anchored on CS236's lecture order with CS294-158 framing pulled in
where its slide deck and recording are stronger. We do not reproduce or
transcribe the lectures; we cite them as the recommended companions. All rights
to the original course materials remain with the respective instructors and
institutions.

Watch this next

Stanford CS236 (Stefano Ermon), course homepage. Lecture 7 (Normalizing Flows) is the primary anchor. The course notes at deepgenerativemodels.github.io/notes include a written treatment of the change-of-variables derivation with more careful Jacobian bookkeeping than the slides; useful if a step in this lesson felt too fast.
Berkeley CS294-158 Sp24 (Pieter Abbeel et al.), course homepage. Lecture 3 (Flow Models) is the secondary anchor. Its slide deck is especially clear on coupling layers (the RealNVP block) and on the trade-off between architectural expressiveness and Jacobian tractability.

Going deeper

A short, durable list. Each entry is a specific next step, not a generic pile.

“NICE: Non-linear Independent Components Estimation” (Dinh, Krueger, Yoshua Bengio, 2014). The paper that introduced the coupling-layer architecture that later flows build on (NICE itself uses additive coupling). The construction is so clean it is worth reading even if you skip the experiments; equations 2-6 are the canonical statement of the coupling-layer flow.
“Density estimation using Real NVP” (Dinh, Sohl-Dickstein, Samy Bengio, 2017). The follow-up that generalized NICE into the architecture most modern flows still use. It introduces affine coupling layers, in which two networks s and t produce a scale and a translation; the paper’s experiments on natural images are the canonical demonstration that flows are viable for density estimation.
“Masked Autoregressive Flow for Density Estimation” (Papamakarios, Pavlakou, Murray, 2017). MAF, the autoregressive variant of flows. Read after RealNVP for the comparison: MAF evaluates densities in one parallel pass but samples slowly, one dimension at a time; RealNVP is fast at both. The trade-off comparison in this paper is the cleanest summary of which flow you want for which purpose.
“Glow: Generative Flow with Invertible 1x1 Convolutions” (Kingma, Dhariwal, 2018). The flow architecture that produced the most striking image-generation results in the flow-only paradigm. The invertible 1x1 convolution trick (treating channel-mixing as an invertible linear map with a tractable LU-decomposed determinant) is a standard tool now.
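
To make the RealNVP entry concrete, here is a minimal NumPy sketch of a single affine coupling layer. It is an illustration under stated assumptions, not the paper's implementation: the helper names (make_toy_nets, coupling_forward, coupling_inverse, log_prob) are hypothetical, the s and t "networks" are fixed random linear maps rather than trained deep networks, and the base distribution is assumed to be a standard Gaussian. The point is only to show why the layer is cheap to invert and why its log-determinant is a plain sum.

```python
import numpy as np

def make_toy_nets(d_in, d_out, seed=0):
    """Hypothetical stand-ins for the s (scale) and t (translate) networks."""
    rng = np.random.default_rng(seed)
    W_s = rng.normal(scale=0.5, size=(d_in, d_out))
    W_t = rng.normal(scale=0.5, size=(d_in, d_out))
    s = lambda h: np.tanh(h @ W_s)   # bounded log-scale keeps exp() stable
    t = lambda h: h @ W_t
    return s, t

def coupling_forward(x, s, t):
    """Map x -> z. The first half passes through; the second half is rescaled and shifted."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    log_scale = s(x1)
    z2 = x2 * np.exp(log_scale) + t(x1)
    # The Jacobian is triangular, so log|det| is the sum of the log-scales.
    log_det = log_scale.sum(axis=-1)
    return np.concatenate([x1, z2], axis=-1), log_det

def coupling_inverse(z, s, t):
    """Map z -> x. s and t are only evaluated, never inverted."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    x2 = (z2 - t(z1)) * np.exp(-s(z1))
    return np.concatenate([z1, x2], axis=-1)

def log_prob(x, s, t):
    """Exact log-density of x under a standard-Gaussian base (an assumption here)."""
    z, log_det = coupling_forward(x, s, t)
    dim = z.shape[-1]
    log_pz = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * dim * np.log(2 * np.pi)
    return log_pz + log_det

# Usage: a round trip through the layer recovers the input.
s, t = make_toy_nets(2, 2)
x = np.random.default_rng(1).normal(size=(5, 4))
z, log_det = coupling_forward(x, s, t)
x_back = coupling_inverse(z, s, t)
```

Both directions cost one evaluation of s and t, which is the "fast at both" property noted in the MAF entry. A real flow stacks many such layers and alternates which half passes through unchanged, since a single layer leaves half the coordinates untouched.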

Adjacent topics

Where this sits in the track.

Maximum likelihood and the KL view (previous lesson). Flows train on the same objective derived in L3: minimizing the forward KL divergence, which is equivalent to minimizing the empirical negative log-likelihood (NLL). The difference is the parameterization: flows give exact log p_model(x) directly through the change-of-variables formula, whereas autoregressive models get it via the chain-rule factorization. Same objective, different math.
Latent variables and the ELBO (next lesson, opening Phase 2). Phase 2 opens with VAEs, where the architectural picture is encoder + decoder, similar to flows in spirit (encode x to a latent, decode back). The difference: VAEs do not require invertibility, so they can be much more flexible architecturally, but they pay by losing exact likelihood (the ELBO is only a lower bound on the log-likelihood). Reading flows and then VAEs side by side makes this trade-off concrete.
Track 4 (Visual Math: Linear Algebra) and Track 8 (Visual Math: Calculus). The Jacobian determinant in this lesson is exactly the determinant from the T4 lesson, reinterpreted from an area-scaling factor to a probability-density scaling factor. The 1D change-of-variables formula is the T8 chain-rule derivative, reused as a density-rescaling factor. T19 L4 is where the two earlier tracks pay off most directly in the AI direction.
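
For quick reference, the formulas these three notes keep pointing back to can be written out as follows. The notation is an assumption of this sketch, not something fixed by the text above: f is the invertible map from data x to latent z, p_Z is the base density, and J_f(x) is the Jacobian of f at x.

```latex
% 1D change of variables: the derivative rescales the density (the T8 connection).
\[
  p_{\text{model}}(x) = p_Z\bigl(f(x)\bigr)\,\left|\frac{df}{dx}(x)\right|
\]

% n-dimensional case: the Jacobian determinant plays the role of the derivative (the T4 connection).
\[
  p_{\text{model}}(x) = p_Z\bigl(f(x)\bigr)\,\bigl|\det J_f(x)\bigr|,
  \qquad
  J_f(x) = \frac{\partial f}{\partial x}(x)
\]

% Log form, which is the quantity maximized in training (equivalently, NLL is minimized).
\[
  \log p_{\text{model}}(x) = \log p_Z\bigl(f(x)\bigr) + \log\bigl|\det J_f(x)\bigr|
\]

% For comparison, the autoregressive chain-rule factorization of the same quantity.
\[
  \log p_{\text{model}}(x) = \sum_{i=1}^{n} \log p_{\text{model}}\bigl(x_i \mid x_1, \dots, x_{i-1}\bigr)
\]
```

The last two lines are the "same objective, different math" contrast: both give an exact log-likelihood, one through a determinant and the other through a sum of conditionals.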