
References: Building an autograd engine: micrograd

Source curriculum (structural mirror, cited as further study):
• Andrej Karpathy, "Neural Networks: Zero to Hero", Lecture 1:
"The spelled-out intro to neural networks and backpropagation: building micrograd"
Creator: Andrej Karpathy
Video: https://www.youtube.com/watch?v=VMj-3S1tku0
Code repo: https://github.com/karpathy/micrograd (MIT License)
Series page: https://karpathy.ai/zero-to-hero.html
License: micrograd code is MIT-licensed; the video lecture is under the standard YouTube license.
Clawdemy's lessons are original prose that follows the pedagogical arc of this
series. We do not reproduce or transcribe the videos or the code; we cite them
as the recommended companion. The worked expression in this lesson follows the
example Karpathy uses in the lecture. All rights to the original video and code
remain with the creator.
  • The spelled-out intro to neural networks and backpropagation: building micrograd, by Andrej Karpathy. The lecture this lesson mirrors. Karpathy builds the whole engine live in a Jupyter notebook, typing and explaining every line, then trains a small neural net with it. It is long (about 2.5 hours) but worth it. This lesson covers the autograd-engine half (roughly the first part of the lecture); watching Karpathy type the backward pass and seeing the gradients populate is the clearest way to make it concrete.
  • micrograd on GitHub (MIT License). The complete engine, around 150 lines, plus a small demo. Reading the backward() method after this lesson is the fastest way to confirm that the procedure really is just “local derivative times incoming gradient, walked in reverse.” The code is short enough to read in one sitting.

  • Neural Networks: Zero to Hero (full series) by Andrej Karpathy. The series this track follows, from micrograd through building a GPT. The next lecture moves from the autograd engine to language modeling.

Where this sits in the curriculum.

  • The chain rule (calculus track). Backpropagation is the chain rule applied node by node: “rates multiply through a composition” is exactly what the backward pass does when it multiplies each incoming gradient by a local derivative. If the backward walk felt rushed, rereading the chain-rule lesson will ground it.

  • What a neural network is (neural-network-intuition track). That track showed the shape of a network and that it learns by adjusting weights. This lesson is the missing mechanism: how the gradients that drive that adjustment are actually computed. The next lesson assembles the engine into a network and trains it.
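
The "local derivative times incoming gradient, walked in reverse" idea mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not micrograd's actual code: the `Node` class, its `parents` and `local_grads` fields, and the standalone `backward` function are hypothetical names chosen for this sketch, and only addition and multiplication are supported. The input values are illustrative.

```python
class Node:
    """A scalar in a computation graph (hypothetical; not micrograd's own class)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                # value computed in the forward pass
        self.grad = 0.0                 # d(output)/d(this node), filled in by backward()
        self.parents = parents          # nodes this one was computed from
        self.local_grads = local_grads  # d(this node)/d(parent), one per parent

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Node(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Node(self.data * other.data, (self, other), (other.data, self.data))


def backward(output):
    """Fill in .grad for every node that feeds into `output`."""
    order, seen = [], set()

    def visit(node):
        # Depth-first topological sort: parents are appended before children.
        if id(node) not in seen:
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0  # d(output)/d(output) = 1
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            # Chain rule: local derivative times incoming gradient.
            # += accumulates contributions when a node is used more than once.
            parent.grad += local * node.grad


a, b, c = Node(2.0), Node(-3.0), Node(10.0)
d = a * b + c
backward(d)
print(d.data)                  # 4.0
print(a.grad, b.grad, c.grad)  # -3.0 2.0 1.0
```

Storing the local derivatives at construction time keeps the backward walk to a single loop. Other designs attach a small backward closure to each node instead; the arithmetic performed is the same.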