References: What a generative model is, and the four-paradigm map

Source material

Source curricula (multi-source structural mirror; cited as further study):

PRIMARY (this lesson follows its framing most directly)
• Stanford CS236, "Deep Generative Models", Lecture 1: Introduction
  Instructor: Stefano Ermon
  Course URL: https://deepgenerativemodels.github.io/
  Syllabus: https://deepgenerativemodels.github.io/syllabus.html
  License: standard course-page link-out; cited as further study

SECONDARY (also contributed to this lesson's framing)
• Berkeley CS294-158, "Deep Unsupervised Learning" (Spring 2024), Lecture 1: Introduction
  Instructors: Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
  Course URL: https://sites.google.com/view/berkeley-cs294-158-sp24/
  License: standard course-page link-out; cited as further study

Clawdemy's lessons are original prose that follows the pedagogical arc of these
two courses, anchored on CS236's lecture order with CS294-158 framing pulled in
where its slide deck and recording are stronger. We do not reproduce or
transcribe the lectures; we cite them as the recommended companions. All rights
to the original course materials remain with the respective instructors and
institutions.

Watch this next

Stanford CS236 (Stefano Ermon), course homepage. The primary source for this track. The course homepage links the syllabus, lecture videos, and the course notes; Lecture 1 (Introduction) covers the four-paradigm framing this lesson mirrors. The course notes at deepgenerativemodels.github.io/notes are a written companion that is especially useful when you want a paragraph-level treatment of a concept that flew past in the lecture.
Berkeley CS294-158 Sp24 (Pieter Abbeel et al.), course homepage. The secondary source. Lecture 1’s intro slide deck and recording place the same paradigms from a complementary angle. The rest of the lecture list (autoregressive models in Lecture 2, flows in Lecture 3, latent variable models / VAEs in Lecture 4, GANs in Lecture 5, diffusion in Lecture 6) is one of the cleanest one-lecture-per-paradigm sequences available.

Going deeper

A short, durable list. Each link is a specific next step, not a generic pile.

“What are Diffusion Models?” by Lilian Weng (OpenAI). A long, careful blog post that walks through the full math of diffusion models, from the noising process through the reverse-time SDE. It is best read after lesson 12 of this track, but a quick skim now is a useful preview of the diffusion paradigm.
Canonical paper per paradigm. Each paradigm has a foundational paper that most later work in it cites: PixelRNN by van den Oord et al. 2016 (autoregressive), Kingma and Welling 2013 (VAE), Goodfellow et al. 2014 (GAN), and Ho, Jain, and Abbeel 2020 (DDPM, modern diffusion). The CS236 syllabus and Lecture 1 slide deck list each with its arXiv link. Reading the abstracts is a fast way to see how each paradigm was originally framed before the textbooks tidied it up.

Adjacent topics

Where this leads inside this track.

Autoregressive models, factoring by the chain rule (lesson 2). This lesson named “predict the next piece, one at a time” as paradigm 1 and stopped there. Lesson 2 opens it up: the chain rule of probability, the next-token prediction objective, and the architecture moves that make modern language models tractable at long context.
The four-paradigm landscape, and where modern systems sit (lesson 15). This lesson’s map is the spine of the whole track. Lesson 15 returns to it at the end, with the full math of every paradigm filled in. It then places Stable Diffusion and other modern image generators, as well as autoregressive LLMs, explicitly on the map.