References: GAN training in practice, Wasserstein loss and gradient penalty

Source curricula (multi-source structural mirror; cited as further study):
PRIMARY (this lesson follows its framing most directly)
• Stanford CS236, "Deep Generative Models", Lecture 10: Generative Adversarial Networks (continued)
Instructor: Stefano Ermon
Course URL: https://deepgenerativemodels.github.io/
Syllabus: https://deepgenerativemodels.github.io/syllabus.html
License: standard course-page link-out; cited as further study
SECONDARY (parallel framing where applicable; CS294-158's GAN lecture covers
WGAN-family briefly within the broader implicit-models lecture)
• Berkeley CS294-158, "Deep Unsupervised Learning" (Spring 2024), Lecture 5: Generative Adversarial Networks / Implicit Models
Instructors: Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
Course URL: https://sites.google.com/view/berkeley-cs294-158-sp24/
License: standard course-page link-out; cited as further study
Clawdemy's lessons are original prose that follows the pedagogical arc of these
two courses, anchored on CS236's lecture order with CS294-158 framing pulled in
where its slide deck and recording are stronger. We do not reproduce or
transcribe the lectures; we cite them as the recommended companions. All rights
to the original course materials remain with the respective instructors and
institutions.

A short, durable list. Each link is a specific next step, not a generic pile.

Where this sits in the track.

  • GANs, the minimax game (previous lesson, L7). That lesson introduced the minimax framework with the original JS-divergence objective and named three paradigm-level pathologies: vanishing gradients, mode collapse, and the lack of a stopping criterion. This lesson keeps the framework but changes the divergence, which addresses vanishing gradients directly, mode collapse meaningfully, and the missing stopping criterion only partially. The improvement is incremental but practically large.

  • Evaluating generative models (next lesson, L9). Phase 2 closes with how to evaluate generative models when likelihood is bounded (VAEs), unavailable (GANs), or only one of many possible quality measures. FID, Inception Score (IS), and Precision/Recall for distributions are the standard tools. This lesson flagged FID and IS as the relevant evaluation methods for WGAN-GP; L9 develops them in full.

  • The four-paradigm map (lesson 1). This lesson sharpens the L1 placement of GANs by showing that the “implicit / no-likelihood” branch is itself parameterizable by which divergence you choose. JS (original GAN) and Wasserstein (WGAN-GP) are two answers to the divergence-choice question within the GAN family. Spectral GANs and other variants are further answers. The cross-paradigm map can be refined as “which divergence does this GAN use?” once you are inside the paradigm.

  • Lesson 14 (score-based diffusion via SDEs). The fact that “different divergence choices give different paradigms” appears again in Phase 3: diffusion can be viewed through the score-matching lens (an objective related to but distinct from forward KL) and through the SDE lens (a continuous-time view). Recognizing divergence-choice as a paradigm-design parameter, which this lesson sets up explicitly, makes the diffusion derivations easier to read.
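
As a small companion to the first bullet's claim that changing the divergence addresses vanishing gradients, here is a minimal sketch using only the Python standard library. It compares the JS divergence and the Wasserstein-1 distance between two point masses on a 1-D grid, the standard toy case of a "real" and a "generated" distribution with disjoint support. The helper names (js_divergence, wasserstein_1, point_mass) and the grid size are illustrative assumptions, not taken from either course.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to KL(a || b).
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1(p, q, spacing=1.0):
    """Wasserstein-1 distance on an evenly spaced 1-D grid.

    In one dimension this equals the integral of |CDF_p - CDF_q|.
    """
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap) * spacing
    return total

def point_mass(index, size):
    """All probability mass on a single grid cell (hypothetical helper)."""
    return [1.0 if i == index else 0.0 for i in range(size)]

size = 11
real = point_mass(0, size)
for shift in (1, 5, 10):
    fake = point_mass(shift, size)
    print(shift, round(js_divergence(real, fake), 4), wasserstein_1(real, fake))
```

In this toy setting the JS divergence stays at log 2 for every nonzero shift, so it gives the generator no signal about how far away the real data is, while the Wasserstein-1 distance grows linearly with the shift. This is only the 1-D, exact-distance picture; WGAN-GP approximates the distance with a learned critic and a gradient penalty.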