References: GANs, the minimax game

Source material

Source curricula (this lesson mirrors the structure of more than one course; each is cited as further study):

PRIMARY (this lesson follows its framing most directly)
• Stanford CS236, "Deep Generative Models", Lecture 9: Generative Adversarial Networks
  Instructor: Stefano Ermon
  Course URL: https://deepgenerativemodels.github.io/
  Syllabus: https://deepgenerativemodels.github.io/syllabus.html
  License: standard course-page link-out; cited as further study

SECONDARY (also contributed to this lesson's framing)
• Berkeley CS294-158, "Deep Unsupervised Learning" (Spring 2024), Lecture 5: Generative Adversarial Networks / Implicit Models
  Instructors: Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
  Course URL: https://sites.google.com/view/berkeley-cs294-158-sp24/
  License: standard course-page link-out; cited as further study

Clawdemy's lessons are original prose that follows the pedagogical arc of these
two courses, anchored on CS236's lecture order with CS294-158 framing pulled in
where its slide deck and recording are stronger. We do not reproduce or
transcribe the lectures; we cite them as the recommended companions. All rights
to the original course materials remain with the respective instructors and
institutions.

Watch this next

Stanford CS236 (Stefano Ermon), course homepage. Lecture 9 (GANs) is the primary anchor; the optimal-discriminator derivation and the JS-divergence reduction follow the original Goodfellow et al. proof closely. The course notes at deepgenerativemodels.github.io/notes include the algebra in full.
Berkeley CS294-158 Sp24 (Pieter Abbeel et al.), course homepage. Lecture 5 (GANs / Implicit Models) covers the same minimax framework with additional examples on training pathologies and a clearer presentation of the non-saturating loss.

Going deeper

A short, durable list. Each link is a specific next step, not a generic pile.

“Generative Adversarial Networks” (Goodfellow et al., 2014). The original GAN paper. Section 3 sets up the minimax formulation; Section 4 has the optimal-discriminator derivation and the JS-divergence reduction this lesson walks through. Among the most-cited papers in modern ML; the introduction and Section 4 are worth reading even if you only skim the rest.
“A Style-Based Generator Architecture for Generative Adversarial Networks” (Karras et al., 2019; StyleGAN). The paper that set the standard for high-resolution face generation. Useful if you want to see what GANs were capable of in their dominant era and which architectural choices (adaptive instance normalization, style-based generator) drove that quality.
“NIPS 2016 Tutorial: Generative Adversarial Networks” (Goodfellow, 2017). Goodfellow’s own tutorial, covering theoretical foundations and practical training tricks in one place. Especially useful for the section on mode collapse and the standard mitigations that predate Wasserstein GAN.

Adjacent topics

Where this sits in the track.

VAE training in practice (previous lesson). VAEs and GANs sit at opposite ends of a trade-off: VAEs keep a likelihood objective (the ELBO bound) but produce blurrier samples; GANs drop likelihood entirely for sharper samples and pay with training instability. Reading them back to back makes the trade-off concrete.
GAN training in practice, Wasserstein loss and gradient penalty (next lesson). Phase 2 continues with the WGAN family, which changes the divergence the game minimizes (Wasserstein distance instead of JS) and adds gradient-penalty regularization. The training instability from this lesson is partly addressed there; mode collapse is reduced but not eliminated.
Evaluating generative models (lesson 9). Because GANs do not give a likelihood, evaluating them requires sample-based metrics. FID, Inception Score, and Precision/Recall for distributions are the standard tools, all introduced (with their limits) in lesson 9.
Maximum likelihood and the KL view (lesson 3). The “GANs do not train on forward KL” claim from the L3 cheatsheet is explained by the JS-divergence reduction this lesson derives. The L3 cross-paradigm table (forward KL vs JS vs Wasserstein vs score matching) is the higher-level map; this lesson fills in one of its cells.
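
Quick numerical check

For readers who want to sanity-check the JS-divergence reduction before opening the paper, here is a minimal Python sketch. It assumes two small discrete distributions with strictly positive entries; the toy values and the helper names (kl, jsd, value_at_optimal_discriminator) are illustrative and do not come from either course. It checks that the value of the game at the optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)) equals -log 4 + 2 JSD(p_data || p_g).

```python
import math


def kl(p, q):
    """KL(p || q) in nats for discrete distributions given as lists.

    Terms with p_i == 0 contribute zero by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def value_at_optimal_discriminator(p_data, p_g):
    """GAN value E_data[log D*] + E_g[log(1 - D*)] with D* = p_data / (p_data + p_g).

    Assumes every entry of both distributions is strictly positive,
    so both logarithms are finite.
    """
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        d_star = pd / (pd + pg)
        total += pd * math.log(d_star) + pg * math.log(1.0 - d_star)
    return total


# Toy distributions over three outcomes (illustrative values only).
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.2, 0.6]

v = value_at_optimal_discriminator(p_data, p_g)
identity_rhs = -math.log(4.0) + 2.0 * jsd(p_data, p_g)

print("V(G, D*)          =", v)
print("-log 4 + 2 * JSD  =", identity_rhs)
print("gap               =", abs(v - identity_rhs))
```

The gap should sit at floating-point level. Setting p_g equal to p_data drives the JS divergence to zero, so the value falls to its minimum of -log 4, which is the global optimum the paper's Section 4 identifies.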