References: What "from scratch" means, and the tokenizer

Source curriculum (structural mirror, cited as further study):
• Stanford CS336, "Language Modeling from Scratch", Lecture 1: Overview, tokenization
Instructors: Tatsunori Hashimoto and Percy Liang (Stanford)
Course page: https://cs336.stanford.edu/
Lecture videos: YouTube playlist
https://www.youtube.com/playlist?list=PLoROMvodv4rMqXOcazWaTUHhq-yembLCV
Assignment 1 (Basics): https://github.com/stanford-cs336/assignment1-basics
License: no explicit license is published on the course site; lecture
videos are on YouTube under standard terms; slides and assignment code
are public on GitHub without a stated license.
Required attribution: "Based on the structure of Stanford CS336,
'Language Modeling from Scratch,' by Tatsunori Hashimoto and Percy Liang
(cs336.stanford.edu). This is an independent structural mirror in
original prose; it reproduces no course materials, and Stanford does
not endorse it."
This lesson mirrors the structure of Lecture 1 (the from-scratch overview and
tokenization). Clawdemy's lessons are original prose that follows the
pedagogical arc of the course. Because the source publishes no explicit
license, we take the conservative posture: we cite the course as a
recommended companion and reproduce none of its materials (no slides, code,
or assignment text). All rights to the original course materials remain with
their creators.

A short, durable list. Each link is a specific next step, not a generic pile.

  • Stanford CS336, Lecture 1: Overview and tokenization, by Tatsunori Hashimoto and Percy Liang. The lecture this lesson mirrors. It motivates the whole from-scratch, efficiency-first approach and walks through tokenization in depth. Pair it with this lesson for the full version of the road map.

Where this connects inside the track and the wider curriculum.

  • Counting the cost: FLOPs, memory, and arithmetic intensity (lesson 2). The next lesson introduces the efficiency accounting this lesson named as the track’s through-line, and it lets you quantify the vocabulary-size trade-off concretely.

  • Track 14, Tokenizers up close. The practical-track companion: it trains and uses a fast (byte-level BPE) tokenizer through the Hugging Face library. Same algorithm, approached from the user's side rather than the builder's.

  • Track 13, Build Neural Networks from Scratch. The other from-scratch track: it builds the conceptual engine (autograd, a small GPT). This track builds the full production pipeline; the two are complementary.