References: What "from scratch" means, and the tokenizer
Source material
Source curriculum (structural mirror, cited as further study):

- Stanford CS336, "Language Modeling from Scratch", Lecture 1: Overview, tokenization
  Instructors: Tatsunori Hashimoto and Percy Liang (Stanford)
  Course page: https://cs336.stanford.edu/
  Lecture videos: YouTube playlist https://www.youtube.com/playlist?list=PLoROMvodv4rMqXOcazWaTUHhq-yembLCV
  Assignment 1 (Basics): https://github.com/stanford-cs336/assignment1-basics
  License: no explicit license is published on the course site; lecture videos are on YouTube under standard terms; slides and assignment code are public on GitHub without a stated license.
  Required attribution: "Based on the structure of Stanford CS336, 'Language Modeling from Scratch,' by Tatsunori Hashimoto and Percy Liang (cs336.stanford.edu). This is an independent structural mirror in original prose; it reproduces no course materials, and Stanford does not endorse it."

This lesson mirrors the structure of Lecture 1 (the from-scratch overview and tokenization). Clawdemy's lessons are original prose that follows the pedagogical arc of the course. Because the source publishes no explicit license, we take the conservative posture: we cite the course as a recommended companion and reproduce none of its materials (no slides, code, or assignment text). All rights to the original course materials remain with their creators.

Watch this next
- Stanford CS336, Lecture 1: Overview and tokenization by Tatsunori Hashimoto and Percy Liang. The lecture this lesson mirrors. It motivates the whole from-scratch, efficiency-first approach and walks through tokenization in depth. Pair it with this lesson for the full version of the road map.
Going deeper
A short, durable list. Each link is a specific next step, not a generic pile.
- CS336 Assignment 1: Basics. The hands-on counterpart: implement a byte-level BPE tokenizer, a Transformer, the loss and optimizer, and a training loop from scratch. The place to actually build what this lesson describes.
- “Neural Machine Translation of Rare Words with Subword Units” by Sennrich, Haddow, and Birch (2016). The paper that introduced BPE to NLP. Short and readable; the original source for the merge-the-most-frequent-pair idea.
- The Hugging Face tokenizers course chapter. The companion that builds intuition for tokenizers from the using side, including the normalization and pre-tokenization steps that surround the BPE core covered here.
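Before opening those links, it can help to see the merge-the-most-frequent-pair idea in its smallest form. The following is a minimal sketch, not the assignment's or the paper's implementation: it works directly on UTF-8 bytes, skips normalization and pre-tokenization entirely, and breaks ties by picking the numerically smallest pair, which is an assumption made here only to keep the output deterministic. The function names (`most_frequent_pair`, `merge`, `train_bpe`, `decode`) are hypothetical.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most frequent adjacent pair, or None if there are no pairs."""
    counts = Counter(zip(ids, ids[1:]))
    if not counts:
        return None
    # Assumed tie-break: highest count first, then the smallest pair.
    return min(counts, key=lambda pair: (-counts[pair], pair))


def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`, left to right."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the raw UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # start from the 256 byte values
    merges = {}  # (left_id, right_id) -> new_id, in the order learned
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # new tokens are numbered after the byte range
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return ids, merges


def decode(ids, merges):
    """Expand token ids back to text by replaying the merges in order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


ids, merges = train_bpe("aaabdaaabac", 3)
print(ids)                  # [258, 100, 258, 97, 99]
print(decode(ids, merges))  # aaabdaaabac
```

Three merges shrink the 11-byte input to 5 tokens. Each merge adds one vocabulary entry and shortens the sequence, which is the vocabulary-size trade-off in miniature.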
Adjacent topics
Where this connects inside the track and the wider curriculum.
- Counting the cost: FLOPs, memory, and arithmetic intensity (lesson 2). The next lesson introduces the efficiency accounting this lesson named as the track’s through-line, and it lets you quantify the vocabulary-size trade-off concretely.
- Track 14, Tokenizers up close. The practical-track companion: it uses and trains a fast (byte-level BPE) tokenizer through the Hugging Face library. Same algorithm, approached from the using side rather than the building side.
- Track 13 (Build Neural Networks from Scratch). The other from-scratch track: it builds the conceptual engine (autograd, a small GPT). This track builds the full production pipeline; the two are complementary.