References: Normalizing flows, change of variables for distributions

Source curricula (multi-source structural mirror; cited as further study):
PRIMARY (this lesson follows its framing most directly)
• Stanford CS236, "Deep Generative Models", Lecture 7: Normalizing Flows
Instructor: Stefano Ermon
Course URL: https://deepgenerativemodels.github.io/
Syllabus: https://deepgenerativemodels.github.io/syllabus.html
License: standard course-page link-out; cited as further study
SECONDARY (also contributed to this lesson's framing)
• Berkeley CS294-158, "Deep Unsupervised Learning" (Spring 2024), Lecture 3: Flow Models
Instructors: Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
Course URL: https://sites.google.com/view/berkeley-cs294-158-sp24/
License: standard course-page link-out; cited as further study
Clawdemy's lessons are original prose that follows the pedagogical arc of these
two courses, anchored on CS236's lecture order with CS294-158 framing pulled in
where its slide deck and recording are stronger. We do not reproduce or
transcribe the lectures; we cite them as the recommended companions. All rights
to the original course materials remain with the respective instructors and
institutions.

A short, durable list. Each link is a specific next step, not a generic pile.

Where this sits in the track.

  • Maximum likelihood and the KL view (previous lesson). Flows train on the same objective derived in L3: minimizing forward KL, which is equivalent to minimizing empirical NLL. The difference is the parameterization: flows give exact log p_model(x) directly through the change-of-variables formula, whereas autoregressive models get it via the chain-rule factorization. Same objective, different math.

  • Latent variables and the ELBO (next lesson, opening Phase 2). Phase 2 opens with VAEs, where the architectural picture is encoder + decoder, similar to flows in spirit (encode x to a latent, decode back). The difference: VAEs do not require invertibility, so they can be much more flexible architecturally, but they pay by losing exact likelihood (the ELBO is a lower bound). Reading flows and then VAEs side by side makes this trade-off concrete.

  • Track 4 (Visual Math: Linear Algebra) and Track 8 (Visual Math: Calculus). The Jacobian determinant in this lesson is exactly the determinant from the T4 determinant lesson, lifted from area scaling to probability-density scaling. The 1D change-of-variables formula is the T8 chain-rule derivative, reused as a density rescaling factor. T19 L4 is where the two earlier tracks pay off most directly in the AI direction.
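
To make the connections above concrete, here is a minimal sketch in plain Python. It is not taken from either course. The function names and the toy data are hypothetical, and the "flows" are just a 1D affine map and a 2D linear map applied to a standard normal base. It shows the change-of-variables formula giving an exact log p_model(x), the determinant acting as the density rescaling factor, and the empirical NLL computed from that exact likelihood.

```python
import math

def standard_normal_logpdf(z):
    # Log density of the base distribution N(0, 1).
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def affine_flow_logpdf(x, mu, sigma):
    # 1D flow x = mu + sigma * z (illustrative choice, sigma != 0).
    # Change of variables: log p(x) = log p_z(f^{-1}(x)) + log |d f^{-1} / dx|.
    z = (x - mu) / sigma
    log_abs_derivative = -math.log(abs(sigma))
    return standard_normal_logpdf(z) + log_abs_derivative

def linear_flow_2d_logpdf(x, A):
    # 2D flow x = A z with an invertible 2x2 matrix A and z ~ N(0, I).
    # |det A| is the area-scaling factor; density scales by 1 / |det A|.
    (a, b), (c, d) = A
    det = a * d - b * c
    z0 = (d * x[0] - b * x[1]) / det   # first row of A^{-1} x
    z1 = (-c * x[0] + a * x[1]) / det  # second row of A^{-1} x
    return (standard_normal_logpdf(z0) + standard_normal_logpdf(z1)
            - math.log(abs(det)))

def empirical_nll(data, mu, sigma):
    # Average negative log-likelihood: the training objective for the 1D flow.
    return -sum(affine_flow_logpdf(x, mu, sigma) for x in data) / len(data)

toy_data = [0.5, 1.7, 2.1, 3.0]  # made-up samples, for illustration only
nll = empirical_nll(toy_data, mu=2.0, sigma=1.5)
print(f"empirical NLL on toy data: {nll:.4f}")
```

For the affine case the result coincides with the closed-form Gaussian log density with mean mu and standard deviation |sigma|, which is a handy sanity check. Real flows replace the affine map with a stack of learned invertible layers, but the bookkeeping (invert, evaluate the base density, add the log absolute Jacobian determinant) stays the same.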