References: The four-paradigm landscape and where modern systems sit
Primary source
- Stanford CS236, Deep Generative Models, capstone synthesis material (Stefano Ermon). The course’s broader framing across lectures, used here as the synthesis reference. Course page: deepgenerativemodels.github.io.
- Berkeley CS294-158 Sp24, Deep Unsupervised Learning, capstone material (Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu). Secondary framing for the same synthesis material with emphasis on the deep-unsupervised-learning side. Course page: sites.google.com/view/berkeley-cs294-158-sp24/.
Foundational papers per paradigm (the math each paradigm rests on)
Autoregressive
- “Attention Is All You Need” (Vaswani et al., 2017). The transformer architecture, the standard parameterization for modern autoregressive language models. Masked self-attention enforces causality: each position attends only to itself and earlier positions.
- “Pixel Recurrent Neural Networks” (van den Oord et al., 2016). PixelRNN, which showed that autoregressive models work on images when the pixels are ordered as a sequence.
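The causality claim in the transformer entry can be illustrated with a minimal NumPy sketch. It is a single-head toy, not the paper's full architecture; the function name `causal_self_attention`, the shapes, and the random projection matrices are assumptions made for illustration.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention with a causal mask.

    x: (T, d) token vectors; w_q, w_k, w_v: (d, d) projection matrices.
    Returns (output, weights) where weights[i, j] == 0 for all j > i.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])               # (T, T) scaled dot products
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)            # block attention to later positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 5, 4
x = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)

# Perturb only the last token: outputs at earlier positions must not move.
x_perturbed = x.copy()
x_perturbed[-1] += 10.0
out_perturbed, _ = causal_self_attention(x_perturbed, w_q, w_k, w_v)
```

Because every weight above the diagonal is zero, the output at position i depends only on tokens 0 through i, which is what lets the model be trained on all positions of a sequence in parallel while still factorizing the likelihood left to right.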
Latent-variable (VAE-family)
- “Auto-Encoding Variational Bayes” (Kingma, Welling, 2013). The original VAE paper. The ELBO derivation and the reparameterization trick are both in section 2.
- “Stochastic Backpropagation and Approximate Inference in Deep Generative Models” (Rezende, Mohamed, Wierstra, 2014). Independent contemporaneous formulation of the same VAE framework.
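For orientation, the two results these papers are cited for can be written compactly as follows. The notation (encoder q_φ, decoder p_θ, a Gaussian encoder with mean μ_φ and standard deviation σ_φ) is an assumption chosen to match common usage, not a transcription of either paper.

```latex
\begin{aligned}
\log p_\theta(x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]
     + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right) \\
  &\ge \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
     - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)
  \;=\; \mathcal{L}(\theta, \phi; x), \\
z &= \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).
\end{aligned}
```

The inequality holds because the dropped KL term is non-negative. The last line is the reparameterization: the randomness is moved into ε, so gradients with respect to φ can pass through the sample z.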
Adversarial (GAN-family)
- “Generative Adversarial Networks” (Goodfellow et al., 2014). The original GAN paper. Minimax objective in section 3; the optimal-discriminator and Jensen-Shannon-divergence analysis in section 4.
- “Wasserstein GAN” (Arjovsky, Chintala, Bottou, 2017). The Wasserstein-distance formulation with weight clipping.
- “Improved Training of Wasserstein GANs” (Gulrajani et al., 2017). The gradient penalty variant (WGAN-GP) that replaced weight clipping as the standard training method.
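The optimal-discriminator analysis cited for the original GAN paper can be checked numerically on a toy example. The sketch below assumes two hypothetical discrete distributions over four outcomes and verifies that the value function at D*(x) = p_data(x) / (p_data(x) + p_g(x)) equals -log 4 + 2·JSD(p_data, p_g). The helper names are illustrative.

```python
import numpy as np

def gan_value(d, p_data, p_g):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] on a finite support."""
    return float(np.sum(p_data * np.log(d)) + np.sum(p_g * np.log(1.0 - d)))

def kl(p, q):
    """KL divergence between two strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def jsd(p, q):
    """Jensen-Shannon divergence (natural log)."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical toy distributions over four outcomes.
p_data = np.array([0.1, 0.2, 0.3, 0.4])
p_g = np.array([0.4, 0.3, 0.2, 0.1])

d_star = p_data / (p_data + p_g)          # optimal discriminator for a fixed generator
v_star = gan_value(d_star, p_data, p_g)
closed_form = -np.log(4.0) + 2.0 * jsd(p_data, p_g)

# Any other discriminator should score no higher than d_star.
rng = np.random.default_rng(0)
v_random = [gan_value(rng.uniform(0.05, 0.95, size=4), p_data, p_g) for _ in range(100)]
```

When the generator matches the data distribution the JSD term vanishes and the value is exactly -log 4, the global minimum of the generator's objective.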
Score-based / diffusion
- “Generative Modeling by Estimating Gradients of the Data Distribution” (Song, Ermon, 2019). The NCSN paper, the original multi-noise-level score-matching framework.
- “Denoising Diffusion Probabilistic Models” (Ho, Jain, Abbeel, 2020). The DDPM paper, the noise-prediction MSE training framework.
- “Score-Based Generative Modeling through Stochastic Differential Equations” (Song et al., 2021). The SDE-unification paper from lesson 14.
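The noise-prediction MSE objective named in the DDPM entry can be sketched in a few lines of NumPy. This is a toy under stated assumptions: the linear schedule from 1e-4 to 0.02 over 1000 steps follows the DDPM paper's default, while `zero_model` and `oracle_model` are hypothetical stand-ins for a trained network.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed default)
alpha_bars = np.cumprod(1.0 - betas)      # abar_t = product of (1 - beta_s) for s <= t

def q_sample(x0, t, eps):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = alpha_bars[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

def noise_prediction_loss(eps_model, x0, t, eps):
    """Simplified objective for one (x0, t, eps) draw: mean squared error on the noise."""
    x_t = q_sample(x0, t, eps)
    return float(np.mean((eps - eps_model(x_t, t)) ** 2))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8,))     # hypothetical data point
eps = rng.normal(size=(8,))    # the noise the model is asked to recover
t = 500                        # zero-based index into the schedule

def zero_model(x_t, t):
    """Placeholder network that always predicts zero noise."""
    return np.zeros_like(x_t)

def oracle_model(x_t, t):
    """Inverts the forward process using the true x0 (not available in practice)."""
    abar = alpha_bars[t]
    return (x_t - np.sqrt(abar) * x0) / np.sqrt(1.0 - abar)

loss_zero = noise_prediction_loss(zero_model, x0, t, eps)
loss_oracle = noise_prediction_loss(oracle_model, x0, t, eps)
```

In training, x0 is drawn from the data, t uniformly from the schedule, and eps from a standard normal; the network replaces the placeholder models and the loss is averaged over many such draws.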
Canonical modern systems referenced in the capstone
Autoregressive at scale
- Each major autoregressive language model release in the last few years (the GPT family, the Llama family, the Mistral family, the Claude family, the Gemini family) is a paradigm-1 system at scale. Read each release’s technical report for the specific architectural and training choices; the paradigm placement does not change.
Latent diffusion (the dominant text-to-image paradigm)
- “High-Resolution Image Synthesis with Latent Diffusion Models” (Rombach et al., 2022). Stable Diffusion. The canonical latent-diffusion-hybrid system; reads as paradigm 2 plus paradigm 4 in the capstone vocabulary.
- “Hierarchical Text-Conditional Image Generation with CLIP Latents” (Ramesh et al., 2022). DALL-E 2, also called unCLIP. A prior maps the text caption to a CLIP image embedding, then a pixel-space diffusion decoder with classifier-free guidance maps that embedding to pixels. A two-stage diffusion system rather than a single-stage pixel-space one.
GAN-based face generation
- “A Style-Based Generator Architecture for Generative Adversarial Networks” (Karras, Laine, Aila, 2019). StyleGAN. The canonical modern face-generation system; reads as paradigm 3 with a stable-training variant (non-saturating logistic loss with R1 regularization), not WGAN-GP directly.
- “Analyzing and Improving the Image Quality of StyleGAN” (Karras et al., 2020). StyleGAN2. Further improvements to the face-generation pipeline.
Diffusion at scale
- “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding” (Saharia et al., 2022). Imagen. Text-to-image diffusion with a frozen large language model providing the text encoding.
- “Video Diffusion Models” (Ho et al., 2022). The diffusion-paradigm extension to video; reads as paradigm 4 extended.
Distilled few-step samplers
- “Consistency Models” (Song et al., 2023). A framework for one-step or few-step sampling, trained either by distilling a pretrained diffusion model or as a standalone model.
- “LCM-LoRA: A Universal Stable-Diffusion Acceleration Module” (Luo et al., 2023). Distillation applied to Stable Diffusion.
Cross-paradigm comparison surveys
- “Diffusion Models: A Comprehensive Survey of Methods and Applications” (Yang et al., 2023). A broad survey covering the diffusion paradigm’s variants. Useful as a map of the paradigm-4 literature.
- “Generative Adversarial Networks: An Overview” (Creswell et al., 2018). An older but still useful GAN survey for the paradigm-3 landscape.
What this track did NOT cover (pointers to other expertise)
- Systems engineering of training large generative models. Distributed training frameworks, hardware-aware optimization, training-data pipelines, debugging at scale. These topics belong in an MLOps or systems-engineering track and require expertise this track does not develop.
- Policy, governance, and societal questions around generative AI. The §6 watch-territory framing in lessons 7, 12, 13, and 14 named six categories of policy questions (use-case appropriateness, provenance and watermarking, sector-specific deployment, training-data IP and licensing, likeness and consent, prompt-injection content risks). Each requires expertise in its own forum.
- Frontier research directions. New objectives, new architectures, new sampling procedures appear continuously. The four-paradigm framework gives you the language to read them; the frontier itself moves faster than any single course can keep up with.
Closing pointer
The map you opened the track with is the map you close the track with. The math underneath has been filled in. The references above are the literature this map reads.
Source material
Source curriculum (structural mirror, cited as further study):
• Stanford CS236: Deep Generative Models (Stefano Ermon). Course page: https://deepgenerativemodels.github.io/
Clawdemy's lessons are original prose that follows the pedagogical arc of this source. We do not reproduce or transcribe it; we cite it as a recommended companion. All rights to the original material remain with its authors.