References: Parallelism

Source curriculum (structural mirror, cited as further study):
• Stanford CS336, "Language Modeling from Scratch", Lectures 7-8:
Parallelism
Instructors: Tatsunori Hashimoto and Percy Liang (Stanford)
Course page: https://cs336.stanford.edu/
Lecture videos: YouTube playlist
https://www.youtube.com/playlist?list=PLoROMvodv4rMqXOcazWaTUHhq-yembLCV
License: no explicit license is published on the course site; lecture
videos are on YouTube under standard terms; slides are public on GitHub
without a stated license.
Required attribution: "Based on the structure of Stanford CS336,
'Language Modeling from Scratch,' by Tatsunori Hashimoto and Percy Liang
(cs336.stanford.edu). This is an independent structural mirror in
original prose; it reproduces no course materials, and Stanford does
not endorse it."
This lesson mirrors the structure of Lectures 7 and 8 (parallelism). The two
lectures are collapsed into one lesson because they cover a single topic,
split across the different parallelism schemes. Clawdemy's lessons are original prose that follows the pedagogical
arc of the course. Because the source publishes no explicit license, we
cite it as a recommended companion and reproduce none of its materials. All
rights to the original course materials remain with their creators.

A short, durable list. Each link is a specific next step, not a generic pile.

Where this connects inside the track.

  • Counting the cost (lesson 2). The 16N memory rule is what triggers parallelism in the first place. This lesson is how that accounting becomes an actionable cluster configuration.

  • How models run on hardware (lesson 5). The gap between interconnect speeds within a node and across nodes is the physical reason tensor parallelism (TP) stays within a node while pipeline parallelism (PP) crosses nodes.

  • Inference (lesson 8). Closes Phase 2 by serving the trained model fast; parallelism returns in a different shape (splitting the serving load), with the KV cache from lesson 4 as the central concern.
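
As a reminder of how the 16N rule from the first connection above turns into a device count, here is a minimal sketch. It assumes the rule means roughly 16 bytes of training state per parameter (2 for bf16 weights, 2 for bf16 gradients, 4 for an fp32 master copy, and 4 + 4 for the two Adam moments), and it ignores activations entirely. The function names and the 80 GB device size are hypothetical choices for illustration, not something taken from the course.

```python
# Assumption: "16N" = 16 bytes of training state per parameter under
# mixed-precision Adam (2 weights + 2 grads + 4 master copy + 4 + 4 moments).
BYTES_PER_PARAM = 16


def training_state_bytes(num_params: int) -> int:
    """Bytes of weights, gradients, and optimizer state (no activations)."""
    return BYTES_PER_PARAM * num_params


def min_devices(num_params: int, device_mem_bytes: int) -> int:
    """Lower bound on devices needed just to hold the training state."""
    total = training_state_bytes(num_params)
    return -(-total // device_mem_bytes)  # integer ceiling division


if __name__ == "__main__":
    n = 7_000_000_000          # hypothetical 7B-parameter model
    device = 80 * 10**9        # hypothetical 80 GB accelerator
    print(training_state_bytes(n) / 1e9, "GB of training state")
    print(min_devices(n, device), "devices at minimum")
```

Under these assumptions, a 7B-parameter model carries about 112 GB of training state, which already exceeds one 80 GB device before any activations are counted. The result is only a floor: it says when sharding becomes unavoidable, not which mix of schemes to use.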