References: GANs and VAEs

This lesson follows Stanford CS231n’s treatment of the first generation of deep generative image models (VAEs and GANs), covered in Lecture 13.

  • Course: Stanford CS231n, “Deep Learning for Computer Vision”
  • Instructors: Fei-Fei Li, Ehsan Adeli, and Justin Johnson (Stanford University)
  • Course site: cs231n.stanford.edu
  • This lesson maps to: Lecture 13 (Generative Models 1: VAEs and GANs).

Attribution (Clawdemy-authored): Stanford CS231n: Deep Learning for Computer Vision, Fei-Fei Li, Ehsan Adeli, and Justin Johnson, Stanford University (cs231n.stanford.edu). CS231n does not publish a required citation string; this is the attribution Clawdemy uses.

The current term’s lecture recordings are posted on Canvas for enrolled Stanford students. Recordings from previous years are publicly available on YouTube under YouTube’s standard license; Clawdemy links out rather than embedding or rehosting. The course notes (cs231n.github.io) and site are Stanford’s. No Creative Commons license is published for the lectures, so we treat them as link-only references.

  • Variational Autoencoders. Kingma, Welling, “Auto-Encoding Variational Bayes” (ICLR 2014). The original VAE paper with the reparameterization trick and the ELBO derivation.
  • Concurrent work. Rezende, Mohamed, Wierstra, “Stochastic Backpropagation and Approximate Inference in Deep Generative Models” (ICML 2014). Independent contemporaneous derivation.
  • VQ-VAE. van den Oord, Vinyals, Kavukcuoglu, “Neural Discrete Representation Learning” (NeurIPS 2017). Discrete-latent variant used in many subsequent text-to-image systems as the first-stage encoder.
  • GAN (original). Goodfellow et al., “Generative Adversarial Networks” (NeurIPS 2014). The min-max adversarial framework that started the field.
  • DCGAN. Radford, Metz, Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks” (ICLR 2016). The convolutional architecture guidelines that made early GAN training noticeably more stable.
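  • WGAN. Arjovsky, Chintala, Bottou, “Wasserstein GAN” (ICML 2017) and Gulrajani et al., “Improved Training of Wasserstein GANs” (NeurIPS 2017). Replaces the original GAN objective with a Wasserstein-distance critic; the follow-up paper swaps weight clipping for a gradient penalty. One of the major stability improvements.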
  • StyleGAN (v1/v2/v3). Karras, Laine, Aila, “A Style-Based Generator Architecture for Generative Adversarial Networks” (CVPR 2019; CVPR 2020 for v2; NeurIPS 2021 for v3). Face-generation quality breakthrough; structured latent space.
  • BigGAN. Brock, Donahue, Simonyan, “Large Scale GAN Training for High Fidelity Natural Image Synthesis” (ICLR 2019). Demonstrated GANs scale to class-conditional generation at high resolution on ImageNet.
  • Pix2Pix. Isola, Zhu, Zhou, Efros, “Image-to-Image Translation with Conditional Adversarial Networks” (CVPR 2017).
  • CycleGAN. Zhu, Park, Isola, Efros, “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks” (ICCV 2017).
  • SRGAN. Ledig et al., “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network” (CVPR 2017).

Further study (deeper mechanics in sister tracks)

  • T19 (planned, generative modeling). Will cover the ELBO derivation in detail, including the variational inference foundation and the math behind why the reparameterization trick gives an unbiased gradient estimator. The right destination if you want to fully understand VAEs.
  • T24 (planned, image generation). Will cover GAN training dynamics, the min-max convergence story, mode-collapse analysis, and the major stability tricks. The right destination if you want to actually train a GAN in production.
  • CS231n’s generative-models material lives in the Lecture 13 slides; there is no standalone cs231n.github.io notes page for this topic.

Clawdemy follows CS231n’s Lec 13 ordering (discriminative-vs-generative framing, then VAE, then GAN, then the comparison) and stays at vision-applied-intuition level per the Track 16 Phase 0 arc (deep derivations deferred to T19 and T24 as named above). The reparameterization-trick worked examples (body: μ = [0.5, -0.2], σ = [0.1, 0.3], ε = [0.5, -1.0] → z = [0.55, -0.5]; practice: 3-dim case → z = [0.6, 0.65, -0.2]) are Clawdemy-authored against the standard formula z = μ + σ · ε, applied elementwise. The VAE-vs-GAN trade-off table summarizes well-known practitioner consensus. We do not reproduce CS231n’s slides, figures, problem sets, or lecture text. Full attribution policy: see Doc/attribution-policy.md.
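The body worked example can be checked with a few lines of Python. This is a minimal sketch, not code from CS231n or the lesson: the function name `reparameterize` is a hypothetical helper, plain lists stand in for tensors, and ε is fixed to the example’s values rather than sampled. The practice case is not reproduced because its μ, σ, and ε are not listed on this page.

```python
import random


def reparameterize(mu, sigma, eps=None):
    """Return z = mu + sigma * eps, computed elementwise.

    If eps is omitted, it is drawn from a standard normal, which is
    the usual setup during VAE training. (Hypothetical helper name.)
    """
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in mu]
    if not (len(mu) == len(sigma) == len(eps)):
        raise ValueError("mu, sigma, and eps must have the same length")
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]


# Body example from this page, with eps fixed instead of sampled.
z = reparameterize([0.5, -0.2], [0.1, 0.3], [0.5, -1.0])
print([round(v, 2) for v in z])  # [0.55, -0.5]
```

Passing ε explicitly keeps the check deterministic; leaving it out samples fresh noise on each call, so z varies while μ and σ stay fixed.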