References: Scaling laws

Source curriculum (structural mirror, cited as further study):
• Stanford CS336, "Language Modeling from Scratch", Lectures 9 and 11:
Scaling laws
Instructors: Tatsunori Hashimoto and Percy Liang (Stanford)
Course page: https://cs336.stanford.edu/
Lecture videos: YouTube playlist
https://www.youtube.com/playlist?list=PLoROMvodv4rMqXOcazWaTUHhq-yembLCV
License: no explicit license is published on the course site; lecture
videos are on YouTube under standard terms; slides are public on GitHub
without a stated license.
Required attribution: "Based on the structure of Stanford CS336,
'Language Modeling from Scratch,' by Tatsunori Hashimoto and Percy Liang
(cs336.stanford.edu). This is an independent structural mirror in
original prose; it reproduces no course materials, and Stanford does
not endorse it."
This lesson collapses the two scaling-laws lectures (9 and 11) per the Phase 0
mirror. Clawdemy's lessons are original prose that follows the pedagogical arc
of the course. Because the source publishes no explicit license, we cite it
as a recommended companion and reproduce none of its materials.

A short, durable list. Each link is a specific next step, not a generic pile.

Where this connects inside the track.

  • Counting the cost (lesson 2). The 6ND compute estimate (training FLOPs ≈ 6 × N parameters × D training tokens) is the input to the Chinchilla budget calculation. Scaling laws make the accounting actionable.

  • The Transformer architecture (lesson 3). Design changes (a different attention variant, optimizer, or normalization) are judged at scale by whether they improve the scaling exponent, not by a win at a single small model size.

  • Evaluation (lesson 10). Scaling laws predict cross-entropy loss; that lesson takes a critical look at how well the loss actually tracks performance on downstream tasks.
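
To make the first connection concrete, here is a minimal sketch of how the 6ND estimate feeds a Chinchilla-style budget split. It assumes the commonly quoted rule of thumb of about 20 training tokens per parameter; that ratio, the function names, and the example budget are illustrative assumptions, not values taken from this lesson or the course.

```python
import math

# ASSUMPTION: ~20 tokens per parameter is the commonly quoted Chinchilla
# rule of thumb; treat it as a tunable input, not a fixed constant.
TOKENS_PER_PARAM = 20.0


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the C ~ 6 * N * D estimate."""
    return 6.0 * n_params * n_tokens


def compute_optimal_split(flops_budget: float,
                          tokens_per_param: float = TOKENS_PER_PARAM):
    """Split a FLOP budget into (N, D) under the constraint D = r * N.

    Substituting D = r * N into C = 6 * N * D gives C = 6 * r * N**2,
    so N = sqrt(C / (6 * r)) and D = r * N.
    """
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    budget = 1e21  # hypothetical FLOP budget, chosen only for illustration
    n, d = compute_optimal_split(budget)
    print(f"N ~ {n:.3e} parameters, D ~ {d:.3e} tokens")
    print(f"check: 6ND = {training_flops(n, d):.3e} FLOPs")
```

Under these assumptions a 1e21 FLOP budget works out to roughly 2.9 billion parameters trained on roughly 58 billion tokens. Changing `tokens_per_param` shows how sensitive the split is to the assumed ratio.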