
References: CNN architectures and training at scale

This lesson follows Stanford CS231n’s case-studies treatment of CNN architectures (Lec 6) and folds in the training-at-scale material (Lec 11) per the Track 16 Phase 0 arc.

  • Course: Stanford CS231n, “Deep Learning for Computer Vision”
  • Instructors: Fei-Fei Li, Ehsan Adeli, and Justin Johnson (Stanford University)
  • Course site: cs231n.stanford.edu
  • Course notes (case studies of CNN architectures): cs231n.github.io/convolutional-networks (the “Case studies” section surveys LeNet, AlexNet, ZF Net, VGGNet, GoogLeNet, and ResNet with their canonical structural ideas and parameter counts).
  • This lesson maps to: Lecture 6 (CNN Architectures) + Lecture 11 (Large Scale Distributed Training), the latter folded as the training-at-scale subsection per the Phase 0 arc.

Attribution (Clawdemy-authored): Stanford CS231n: Deep Learning for Computer Vision, Fei-Fei Li, Ehsan Adeli, and Justin Johnson, Stanford University (cs231n.stanford.edu). CS231n does not publish a required citation string; this is the attribution Clawdemy uses.

The current term’s lecture recordings are posted on Canvas for enrolled Stanford students. Recordings from previous years are publicly available on YouTube under YouTube’s standard license; Clawdemy links out rather than embedding or rehosting. The course notes (cs231n.github.io) and site are Stanford’s. No Creative Commons license is published for the lectures, so we treat them as link-only references.

Primary architecture papers (cited only by name and venue)

  • AlexNet. Krizhevsky, Sutskever, Hinton, “ImageNet Classification with Deep Convolutional Neural Networks” (NeurIPS 2012, then called NIPS). The inflection point: its ILSVRC 2012 win established deep CNNs as the dominant approach to image classification.
  • VGG. Simonyan, Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition” (ICLR 2015 / arXiv 2014). The “stack many 3x3 convs” paper.
  • GoogLeNet (Inception). Szegedy et al., “Going Deeper with Convolutions” (CVPR 2015 / arXiv 2014). The Inception module paper.
  • ResNet. He, Zhang, Ren, Sun, “Deep Residual Learning for Image Recognition” (CVPR 2016 / arXiv 2015). The residual block paper. Winner of the CVPR 2016 Best Paper Award and one of the most-cited papers in machine learning.
  • CS231n case studies (full). cs231n.github.io/convolutional-networks goes into more architectural variants (DenseNet’s dense connections, EfficientNet’s scaling, MobileNet’s depthwise-separable convs) than this lesson surveys.
  • Introduction to Deep Learning (Track 12, Clawdemy). Lessons 4 and 5 cover the survey-level intuition for convolution and the edges-to-objects hierarchy; T16 readers who want gentler historical context will find it there.
  • Distributed training depth. PyTorch’s DistributedDataParallel, NCCL’s AllReduce implementation, and Horovod’s open-source AllReduce wrapper are the standard production references for the data-parallel pattern named here.
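
The data-parallel pattern named in the last item can be illustrated without any GPU library. The following is a minimal sketch in plain Python, not PyTorch, NCCL, or Horovod code: the helper names (`grad_mse`, `all_reduce_mean`), the toy 1-D model, and the worker count are all assumptions made for illustration. Each simulated worker computes a gradient on its own equal-sized shard of the batch, and a stand-in for AllReduce averages those gradients so every worker holds the same value.

```python
# Toy simulation of data-parallel training. No real GPUs or communication
# library are involved; all names here are hypothetical.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error w.r.t. w for the 1-D model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def all_reduce_mean(values):
    """Stand-in for AllReduce: every worker receives the mean of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

# A global batch of 8 examples drawn from y = 2x, and a deliberately wrong weight.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5
num_workers = 4
shard = len(xs) // num_workers  # equal-sized shards, 2 examples per worker

# Each worker computes a gradient on its own shard only.
local_grads = [
    grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
    for i in range(num_workers)
]

# After the averaging step, all workers hold an identical gradient.
synced_grads = all_reduce_mean(local_grads)

# Reference: the gradient a single device would compute on the full batch.
full_batch_grad = grad_mse(w, xs, ys)

print(local_grads)
print(synced_grads[0], full_batch_grad)
```

With equal-sized shards the averaged gradient matches the full-batch gradient, which is why data parallelism behaves like training with one larger batch. Production systems overlap this communication with the backward pass and use ring or tree AllReduce algorithms, which this sketch does not attempt to model.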

Clawdemy follows CS231n’s case-studies ordering (AlexNet, VGG, GoogLeNet, ResNet, in chronological and inheritance order) and quotes the numbers the case-studies page gives verbatim: AlexNet’s “16% [top-5 error] compared to runner-up with 26% error” and ~60M parameters; VGG’s “16 CONV/FC layers,” “only 3x3 convolutions and 2x2 pooling,” and ~140M parameters; GoogLeNet’s “4M, compared to AlexNet with 60M” parameters and “Average Pooling instead of Fully Connected layers at the top”; and ResNet’s “special skip connections and a heavy use of batch normalization.”

The residual block formula (y = F(x) + x), the 152-layer figure, and the training-at-scale subsection (data vs model parallelism, mixed precision, LR warmup, the linear scaling rule, the AlexNet-vs-foundation-scale anchor) are Clawdemy-authored against the canonical sources. We do not reproduce CS231n’s slides, figures, or problem sets. Full attribution policy: see Doc/attribution-policy.md.
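
The ~140M figure quoted for VGG can be checked with a few lines of arithmetic. The sketch below assumes the standard 16-layer configuration from the VGG paper (13 conv layers with 3x3 kernels and biases, five 2x2 pools, three fully connected layers, a 224x224 RGB input, 1000 classes); the names `CFG` and `vgg16_param_counts` are illustrative, not taken from CS231n or any library.

```python
# Parameter count for a VGG-16-style network, computed from its layer layout.
# Integers are conv output channels (3x3 kernels); "M" marks a 2x2 max pool.
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

def vgg16_param_counts(in_channels=3, image_size=224, num_classes=1000):
    """Return (conv_params, fc_params), counting weights and biases."""
    conv_params = 0
    channels, size = in_channels, image_size
    for item in CFG:
        if item == "M":
            size //= 2  # each 2x2 pool halves the spatial resolution
        else:
            conv_params += 3 * 3 * channels * item + item  # weights + biases
            channels = item

    fc_params = 0
    features = channels * size * size  # flattened input to the first FC layer
    for out_features in (4096, 4096, num_classes):
        fc_params += features * out_features + out_features
        features = out_features
    return conv_params, fc_params

conv_params, fc_params = vgg16_param_counts()
total_params = conv_params + fc_params
print(conv_params, fc_params, total_params)
# prints: 14714688 123642856 138357544
```

Roughly 138M parameters in total, of which about 124M sit in the fully connected layers. That imbalance is the context for GoogLeNet's choice, quoted above, to use average pooling instead of fully connected layers at the top.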