References: Normalizing flows, change of variables for distributions
Source material
Source curricula (multi-source structural mirror; cited as further study):
PRIMARY (this lesson follows its framing most directly)
• Stanford CS236, "Deep Generative Models", Lecture 7: Normalizing Flows
  Instructor: Stefano Ermon
  Course URL: https://deepgenerativemodels.github.io/
  Syllabus: https://deepgenerativemodels.github.io/syllabus.html
  License: standard course-page link-out; cited as further study
SECONDARY (also contributed to this lesson's framing)
• Berkeley CS294-158, "Deep Unsupervised Learning" (Spring 2024), Lecture 3: Flow Models
  Instructors: Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
  Course URL: https://sites.google.com/view/berkeley-cs294-158-sp24/
  License: standard course-page link-out; cited as further study
Clawdemy's lessons are original prose that follows the pedagogical arc of these two courses, anchored on CS236's lecture order, with CS294-158 framing pulled in where its slide deck and recording are stronger. We do not reproduce or transcribe the lectures; we cite them as the recommended companions. All rights to the original course materials remain with the respective instructors and institutions.
Watch this next
- Stanford CS236 (Stefano Ermon), course homepage. Lecture 7 (Normalizing Flows) is the primary anchor. The course notes at deepgenerativemodels.github.io/notes include a written treatment of the change-of-variables derivation, with more careful Jacobian bookkeeping than the slides; useful if a step in this lesson felt too fast.
- Berkeley CS294-158 Sp24 (Pieter Abbeel et al.), course homepage. Lecture 3 (Flow Models) is the secondary anchor. Its slide deck is especially clear on coupling layers (the RealNVP block) and on the trade-off between architectural expressiveness and Jacobian tractability.
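For readers who want the statement in hand before opening those notes, here is the standard change-of-variables identity in one common convention. It is a sketch in our own notation (an invertible, differentiable `f` maps a latent `z` with base density `p_Z` to data `x`), which may not match the notes' symbols exactly.

```latex
% Invertible, differentiable f: z \mapsto x, with z \sim p_Z
p_X(x) = p_Z\!\left(f^{-1}(x)\right)\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|

% Equivalent log form, using \det(J^{-1}) = 1/\det J, with z = f^{-1}(x)
\log p_X(x) = \log p_Z(z) - \log\left|\det \frac{\partial f(z)}{\partial z}\right|

% 1D special case: the determinant reduces to a derivative
p_X(x) = p_Z\!\left(f^{-1}(x)\right)\left|\frac{d f^{-1}(x)}{dx}\right|

% Composition f = f_K \circ \dots \circ f_1, with z_0 = z,\; z_k = f_k(z_{k-1}),\; z_K = x
\log p_X(x) = \log p_Z(z_0) - \sum_{k=1}^{K} \log\left|\det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}}\right|
```

The last line is why flows stack layers whose Jacobian determinants are cheap: the total log-determinant is just a sum of per-layer terms.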
Going deeper
A short, durable list. Each link is a specific next step, not a generic pile.
- “NICE: Non-linear Independent Components Estimation” (Dinh, Krueger, Yoshua Bengio, 2014). The paper that introduced the coupling-layer architecture that flows use. The construction is so clean that it is worth reading even if you skip the experiments; equations 2-6 are the canonical statement of the coupling-layer flow.
- “Density estimation using Real NVP” (Dinh, Sohl-Dickstein, Samy Bengio, 2017). The follow-up that generalized NICE into the architecture most modern flows still use: affine coupling layers with `s` (scale) and `t` (translate) networks. The paper's experiments on natural images are the canonical demonstration that flows are viable for density estimation.
- “Masked Autoregressive Flow for Density Estimation” (Papamakarios, Pavlakou, Murray, 2017). MAF, the autoregressive variant of flows. Read it after RealNVP for the comparison: MAF has fast density evaluation but slow sampling, while RealNVP is fast at both. The trade-off matrix in this paper is the cleanest summary of which flow you want for which purpose.
- “Glow: Generative Flow with Invertible 1x1 Convolutions” (Kingma, Dhariwal, 2018). The flow architecture that produced the most striking image-generation results in the flow-only paradigm. The invertible 1x1 convolution trick (treating channel mixing as an invertible linear map with a tractable, LU-decomposed determinant) is now a standard tool.
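A minimal NumPy sketch of the two mechanisms these papers are cited for, under stated assumptions: an affine coupling layer in the RealNVP style, and a Glow-style LU-parameterized 1x1 convolution weight. The `s_net`/`t_net` functions are hypothetical fixed random maps standing in for trained networks, and all dimensions are arbitrary; the point is only to check invertibility and that each cheap log-determinant matches a brute-force one.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # data dimension; assumed even so it splits into two halves
H = 8  # hidden width of the hypothetical s/t networks

# Hypothetical stand-ins for trained scale (s) and translation (t) networks:
# fixed random maps of the untouched half x1. They are never inverted.
W1 = rng.normal(size=(D // 2, H))
W_s = 0.1 * rng.normal(size=(H, D // 2))
W_t = rng.normal(size=(H, D // 2))

def s_net(x1):
    return np.tanh(x1 @ W1) @ W_s

def t_net(x1):
    return np.tanh(x1 @ W1) @ W_t

def coupling_forward(x):
    """Affine coupling: copy x1, scale and shift x2 conditioned on x1."""
    x1, x2 = x[:, : D // 2], x[:, D // 2 :]
    s = s_net(x1)
    y2 = x2 * np.exp(s) + t_net(x1)
    # The Jacobian is block lower-triangular with diag(exp(s)) in the corner,
    # so log|det J| is simply the sum of s over the transformed half.
    log_det = s.sum(axis=1)
    return np.concatenate([x1, y2], axis=1), log_det

def coupling_inverse(y):
    """Exact inverse: y1 == x1, so s and t can be recomputed from y1."""
    y1, y2 = y[:, : D // 2], y[:, D // 2 :]
    x2 = (y2 - t_net(y1)) * np.exp(-s_net(y1))
    return np.concatenate([y1, x2], axis=1)

def brute_force_log_det(x_row, eps=1e-6):
    """log|det J| from a central-difference Jacobian at one point (check only)."""
    J = np.zeros((D, D))
    for j in range(D):
        dx = np.zeros(D)
        dx[j] = eps
        y_plus, _ = coupling_forward((x_row + dx)[None, :])
        y_minus, _ = coupling_forward((x_row - dx)[None, :])
        J[:, j] = (y_plus[0] - y_minus[0]) / (2 * eps)
    return np.linalg.slogdet(J)[1]

# Glow-style 1x1 convolution weight for C channels, stored as
# W = P @ L @ (U + diag(s)): P a fixed permutation, L unit lower-triangular,
# U strictly upper-triangular. det(P) = +/-1 and det(L) = 1, so
# log|det W| = sum(log|s|), an O(C) computation instead of O(C^3).
C = 3
P = np.eye(C)[rng.permutation(C)]
L = np.tril(rng.normal(size=(C, C)), k=-1) + np.eye(C)
U = np.triu(rng.normal(size=(C, C)), k=1)
s_diag = rng.normal(size=C)
W = P @ L @ (U + np.diag(s_diag))
cheap_log_det = np.sum(np.log(np.abs(s_diag)))

x = rng.normal(size=(5, D))
y, log_det = coupling_forward(x)
print(np.allclose(coupling_inverse(y), x))                            # invertibility
print(np.isclose(log_det[0], brute_force_log_det(x[0]), atol=1e-6))   # coupling log-det
print(np.isclose(cheap_log_det, np.linalg.slogdet(W)[1]))             # LU log-det
```

Because `s` and `t` only ever see the untouched half, the inverse never has to invert them, which is why they can be arbitrary networks. In an image model the same C×C matrix is applied at every pixel, so that layer's log-determinant is the per-pixel value multiplied by height × width.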
Adjacent topics
Where this sits in the track.
- Maximum likelihood and the KL view (previous lesson). Flows train on the same objective derived in L3: minimizing forward KL, which equals minimizing the empirical NLL up to a constant. The difference is the parameterization: flows give exact `log p_model(x)` directly through the change-of-variables formula, whereas autoregressive models get it through the chain-rule factorization. Same objective, different math.
- Latent variables and the ELBO (next lesson, opening Phase 2). Phase 2 opens with VAEs, whose encoder + decoder picture is similar in spirit to flows: encode `x` to a latent, decode back. The difference: VAEs do not require invertibility, so they can be much more flexible architecturally, but they pay by losing exact likelihood (the ELBO is a lower bound). Reading flows and then VAEs side by side makes this trade-off concrete.
- Track 4 (Visual Math: Linear Algebra) and Track 8 (Visual Math: Calculus). The Jacobian determinant in this lesson is exactly the determinant from the T4 lesson, lifted from area scaling to probability-density scaling. The 1D change-of-variables formula is the T8 chain-rule derivative, lifted to a density rescaling. T19 L4 is where the two earlier tracks pay off most directly in the AI direction.
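To make "same objective, different math" concrete, here is a minimal sketch, assuming a single hypothetical affine flow `x = A z + b` with a standard-normal base. The change-of-variables log-density (invert, score under the base, subtract `log|det A|`) should agree exactly with the closed-form log-density of `N(b, A A^T)`, the distribution this flow induces. `A`, `b`, and the function names are illustrative choices, not taken from the lesson.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2

# Hypothetical one-layer affine flow x = f(z) = A z + b, base z ~ N(0, I).
A = np.array([[2.0, 0.5],
              [0.0, 0.7]])
b = np.array([1.0, -1.0])

def log_std_normal(z):
    """Log density of the standard-normal base distribution, row-wise."""
    return -0.5 * np.sum(z**2, axis=-1) - 0.5 * z.shape[-1] * np.log(2 * np.pi)

def flow_log_prob(x):
    """Exact log p(x): invert the flow, score under the base, subtract log|det J|."""
    z = np.linalg.solve(A, (x - b).T).T      # z = f^{-1}(x), one row per sample
    _, log_abs_det = np.linalg.slogdet(A)    # the Jacobian of an affine map is A
    return log_std_normal(z) - log_abs_det

def gaussian_log_prob(x, mean, cov):
    """Closed-form log density of N(mean, cov), used only as a cross-check."""
    diff = x - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=-1)
    _, log_det_cov = np.linalg.slogdet(cov)
    return -0.5 * (maha + log_det_cov + mean.shape[-1] * np.log(2 * np.pi))

x = rng.normal(size=(4, d))
print(flow_log_prob(x))
print(gaussian_log_prob(x, b, A @ A.T))  # A z + b with z ~ N(0, I) is N(b, A A^T)
```

Training a flow means maximizing the mean of `flow_log_prob` over data, which is the empirical-NLL objective from the previous lesson; a deep flow replaces the single constant `log|det A|` with a sum of per-layer log-determinants.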