References: makemore, the bigram model
Source material
Source curriculum (structural mirror, cited as further study):
• Andrej Karpathy, "Neural Networks: Zero to Hero", Lecture 2: "The spelled-out intro to language modeling: building makemore"
  Creator: Andrej Karpathy
  Video: https://www.youtube.com/watch?v=PaCmpygFfXo
  Code repo (makemore): https://github.com/karpathy/makemore (MIT License)
  Series repo: https://github.com/karpathy/nn-zero-to-hero (MIT License)
  Series page: https://karpathy.ai/zero-to-hero.html
  License: makemore and the series code are MIT-licensed; the video is YouTube standard.

This lesson covers Lecture 2, where Karpathy builds the bigram model both by counting and as a one-layer neural network, and shows the two agree. Clawdemy's lessons are original prose following the pedagogical arc of this series; we do not reproduce or transcribe the video or code. The worked count/NLL example here is ours, built to be checkable by hand. All rights to the original video and code remain with the creator.

Watch this next
- The spelled-out intro to language modeling: building makemore, by Andrej Karpathy. The lecture this lesson mirrors. Karpathy builds the count table, generates the first (bad) names, defines the negative log likelihood loss, then rebuilds the same model as a one-layer network and trains it until the two agree. Watching the generated names go from gibberish to roughly name-like, and seeing the trained network's probabilities match the counts, is the clearest way to make the two-routes-one-answer idea concrete.
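As a reminder of the counting route described above, here is a minimal sketch in plain Python. It is our own toy, not the lecture's code: the three-word dataset and the variable names (`words`, `counts`, `probs`, `nll`) are assumptions chosen so the numbers can be checked by hand, and `.` is assumed as the start/end marker.

```python
import math
from collections import Counter

# Hypothetical toy dataset, small enough to check by hand (not names.txt).
words = ["ab", "ab", "ba"]

# Count every adjacent pair, with "." marking the start and end of a word.
counts = Counter()
for w in words:
    chars = ["."] + list(w) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[(a, b)] += 1

# Normalize each row of the count table: P(next | current) = count / row total.
row_totals = Counter()
for (a, _), n in counts.items():
    row_totals[a] += n
probs = {pair: n / row_totals[pair[0]] for pair, n in counts.items()}

# Average negative log likelihood of the data under the count-based model.
total_bigrams = sum(counts.values())
nll = -sum(n * math.log(probs[pair]) for pair, n in counts.items()) / total_bigrams

print(probs[(".", "a")])  # 2 of the 3 words start with "a"
print(round(nll, 4))
```

By hand: there are 9 bigrams, six with probability 2/3 and three with probability 1/3, so the average NLL is -(6 ln(2/3) + 3 ln(1/3)) / 9, about 0.6365.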
Going deeper
- makemore on GitHub (MIT License). The full project, which grows over the next several lectures from this bigram model up to a transformer. The names.txt dataset and the bigram code are the parts to read after this lesson.
- Neural Networks: Zero to Hero (full series) and its code repo by Andrej Karpathy. The series this track follows. The next lecture extends makemore from one character of context to several, through a multilayer perceptron with learned character embeddings.
Adjacent topics
Where this sits in the curriculum.
- The previous lessons (the autograd engine, building and training a net). The neural-network version of the bigram model is trained with exactly the engine and the gradient-descent loop from Phase 1: one-hot input, a single linear layer, a loss, backward(), and a downhill step. If the training half felt fast, those two lessons are the grounding.
- How AI reads tokens (AI Foundations track). This lesson works at the character level for clarity; real language models work with tokens (chunks of text). The AI Foundations treatment of tokenization is the bridge from "one character at a time" to "one token at a time," and the final lesson of this track builds a tokenizer from scratch.
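The training recipe named in the first item above (one-hot input, a single linear layer, a loss, a gradient, and a downhill step) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lecture's or the Phase 1 engine's code: the toy dataset, the learning rate, and the step count are hypothetical, and the gradient of softmax plus negative log likelihood is written out by hand where the lesson would call backward().

```python
import math

# Hypothetical toy dataset (not names.txt); "." marks word start and end.
words = ["ab", "ab", "ba"]
vocab = [".", "a", "b"]
idx = {ch: i for i, ch in enumerate(vocab)}

# Training pairs: (index of current character, index of next character).
pairs = []
for w in words:
    chars = ["."] + list(w) + ["."]
    pairs += [(idx[a], idx[b]) for a, b in zip(chars, chars[1:])]

V = len(vocab)
# Single linear layer. A one-hot input times W just selects row W[i],
# so the code indexes the row directly instead of multiplying.
W = [[0.0] * V for _ in range(V)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def mean_nll(weights):
    return -sum(math.log(softmax(weights[i])[j]) for i, j in pairs) / len(pairs)

lr = 1.0  # assumed learning rate for this toy problem
initial_loss = mean_nll(W)
for step in range(2000):
    # Hand-written gradient: d(loss)/d(logit k) = p[k] - 1 if k is the target, else p[k].
    grad = [[0.0] * V for _ in range(V)]
    for i, j in pairs:
        p = softmax(W[i])
        for k in range(V):
            grad[i][k] += (p[k] - (1.0 if k == j else 0.0)) / len(pairs)
    # Downhill step.
    for i in range(V):
        for k in range(V):
            W[i][k] -= lr * grad[i][k]

final_loss = mean_nll(W)
learned = softmax(W[idx["."]])  # learned P(next | ".")
print(initial_loss, final_loss)
print(learned[idx["a"]])  # counting gives 2/3 for this entry
```

On this toy data the loss starts at ln 3 (a uniform guess) and moves toward the count-based value of about 0.6365. It only approaches that value from above, because pairs that never occur in the data need a probability of exactly zero, which a softmax can only reach in the limit.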