References: GAN training in practice, Wasserstein loss and gradient penalty
Source material
Source curricula (multi-source structural mirror; cited as further study):
PRIMARY (this lesson follows its framing most directly)
• Stanford CS236, "Deep Generative Models", Lecture 10: Generative Adversarial Networks (continued)
  Instructor: Stefano Ermon
  Course URL: https://deepgenerativemodels.github.io/
  Syllabus: https://deepgenerativemodels.github.io/syllabus.html
  License: standard course-page link-out; cited as further study
SECONDARY (parallel framing where applicable; CS294-158's GAN lecture covers the WGAN family briefly within the broader implicit-models lecture)
• Berkeley CS294-158, "Deep Unsupervised Learning" (Spring 2024), Lecture 5: Generative Adversarial Networks / Implicit Models
  Instructors: Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
  Course URL: https://sites.google.com/view/berkeley-cs294-158-sp24/
  License: standard course-page link-out; cited as further study
Clawdemy's lessons are original prose that follows the pedagogical arc of these two courses, anchored on CS236's lecture order with CS294-158 framing pulled in where its slide deck and recording are stronger. We do not reproduce or transcribe the lectures; we cite them as the recommended companions. All rights to the original course materials remain with the respective instructors and institutions.
Watch this next
-
Stanford CS236 (Stefano Ermon), course homepage. Lecture 10 (GANs, continued) is the primary anchor; it covers the Wasserstein objective, the Kantorovich-Rubinstein duality, and the gradient-penalty derivation. The course notes at deepgenerativemodels.github.io/notes include the duality proof and the gradient-penalty motivation in more detail than the slides.
-
Berkeley CS294-158 Sp24 (Pieter Abbeel et al.), course homepage. Lecture 5 (GANs / Implicit Models) covers the WGAN family alongside the original GAN in one lecture; the comparison framing is the secondary contribution.
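For orientation while watching, the two objects these lectures build toward can be written out as follows. This is a sketch in the notation of the WGAN and WGAN-GP papers, which may differ from the symbols used on the slides: p_r is the data distribution, p_g the generator distribution, f (or D) the critic, and λ the penalty weight.

```latex
% Kantorovich-Rubinstein dual form of the Wasserstein-1 distance:
% the supremum runs over all 1-Lipschitz functions f.
W(p_r, p_g) = \sup_{\|f\|_L \le 1}
  \mathbb{E}_{x \sim p_r}[f(x)] - \mathbb{E}_{x \sim p_g}[f(x)]

% WGAN-GP critic loss: the Lipschitz constraint is replaced by a soft
% penalty on the critic's gradient norm at interpolated points \hat{x}.
L = \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})]
  - \mathbb{E}_{x \sim p_r}[D(x)]
  + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}
      \big[ (\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2 \big],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x},
\quad \epsilon \sim U[0, 1]
```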
Going deeper
A short, durable list. Each link is a specific next step, not a generic pile.
-
“Wasserstein GAN” (Arjovsky, Chintala, Bottou, 2017). The original WGAN paper. Section 2 covers the Earth Mover’s distance and its advantages over JS; Section 3 covers the duality form and introduces the weight-clipping enforcement (which the next paper improves); Section 4 reports the empirical results. The introduction’s “geometric distance vs disjoint-support saturation” argument is the cleanest motivation you will find for swapping JS out.
-
“Improved Training of Wasserstein GANs” (Gulrajani, Ahmed, Arjovsky, Dumoulin, Courville, 2017). The WGAN-GP paper. Replaces weight clipping with the gradient penalty; this is the recipe most production-grade WGAN implementations actually use. Section 4 derives the gradient-penalty form and explains why interpolated samples are the right evaluation points.
-
“Spectral Normalization for Generative Adversarial Networks” (Miyato, Kataoka, Koyama, Yoshida, 2018). An alternative Lipschitz-enforcement method (constrain the spectral norm of each weight matrix). Different mechanism from the gradient penalty, similar effect. Worth knowing about because some modern GAN-family architectures use spectral normalization in place of (or alongside) the gradient penalty.
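The two Lipschitz-enforcement mechanisms named in this list can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the papers' implementations: the toy critic f(x) = sum_i w_i * tanh(x_i) and the helper names (`critic_grad`, `gradient_penalty`, `spectral_norm`, `spectrally_normalize`) are hypothetical, and the critic's gradient is written by hand where a real implementation would use automatic differentiation on a neural network.

```python
import math
import random


def critic_grad(x, w):
    """Input-gradient of a toy critic f(x) = sum_i w_i * tanh(x_i).

    The toy critic is a hypothetical stand-in; with a real network this
    gradient would come from automatic differentiation.
    """
    return [wi * (1.0 - math.tanh(xi) ** 2) for wi, xi in zip(w, x)]


def gradient_penalty(real_batch, fake_batch, w, lam=10.0, rng=random):
    """Return lam * mean((||grad f(x_hat)||_2 - 1)^2) over interpolates x_hat."""
    total = 0.0
    for x_real, x_fake in zip(real_batch, fake_batch):
        eps = rng.random()  # uniform mixing coefficient in [0, 1)
        # Evaluate the critic's gradient between a real and a generated sample.
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(x_real, x_fake)]
        grad_norm = math.sqrt(sum(g * g for g in critic_grad(x_hat, w)))
        total += (grad_norm - 1.0) ** 2
    return lam * total / len(real_batch)


def spectral_norm(matrix, iters=50):
    """Estimate the largest singular value of `matrix` by power iteration.

    Assumes a nonzero matrix and a start vector that is not orthogonal to
    the top right-singular vector.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        # u = W v, normalized
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(a * a for a in u))
        u = [a / u_norm for a in u]
        # v = W^T u, normalized
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v_norm = math.sqrt(sum(a * a for a in v))
        v = [a / v_norm for a in v]
    w_v = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(a * a for a in w_v))


def spectrally_normalize(matrix):
    """Rescale a weight matrix so its spectral norm is approximately 1."""
    sigma = spectral_norm(matrix)
    return [[a / sigma for a in row] for row in matrix]
```

The contrast in mechanism is visible in the signatures: `gradient_penalty` needs data (real and generated samples) and adds a term to the critic loss, while `spectrally_normalize` touches only a weight matrix and needs no samples at all. The default `lam=10.0` matches the penalty weight used in the WGAN-GP paper.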
Adjacent topics
Where this sits in the track.
-
GANs, the minimax game (previous lesson). L7 introduced the minimax framework with the original JS-divergence objective and named the paradigm-level pathologies (vanishing gradients, mode collapse, no stopping criterion). This lesson keeps the framework but changes the divergence, which addresses the first pathology (vanishing gradients) directly, the second (mode collapse) meaningfully, and the third (no stopping criterion) partially. The improvement is incremental but practically large.
-
Evaluating generative models (next lesson, L9). Phase 2 closes with how to evaluate generative models when likelihood is available only as a lower bound (VAEs), unavailable (GANs), or only one of many possible quality measures. FID, Inception Score, and Precision/Recall for distributions are the standard tools. This lesson flagged FID/IS as the relevant evaluation methods for the WGAN-GP scope; L9 builds them out.
-
The four-paradigm map (lesson 1). This lesson sharpens the L1 placement of GANs by showing that the “implicit / no-likelihood” branch is itself parameterizable by which divergence you choose. JS (original GAN) and Wasserstein (WGAN-GP) are two answers to the divergence-choice question within the GAN family. Spectral GANs and other variants are further answers. The cross-paradigm map can be refined as “which divergence does this GAN use?” once you are inside the paradigm.
-
Lesson 14 (score-based diffusion via SDEs). The fact that “different divergence choices give different paradigms” appears again in Phase 3: diffusion can be viewed through the score-matching lens (an objective related to but distinct from forward KL) and through the SDE lens (a continuous-time view). Recognizing divergence-choice as a paradigm-design parameter, which this lesson sets up explicitly, makes the diffusion derivations easier to read.