
References: Building and training a net: micrograd

Source curriculum (structural mirror, cited as further study):
• Andrej Karpathy, "Neural Networks: Zero to Hero", Lecture 1 (second half):
"The spelled-out intro to neural networks and backpropagation: building micrograd"
Creator: Andrej Karpathy
Video: https://www.youtube.com/watch?v=VMj-3S1tku0
Code repo (micrograd): https://github.com/karpathy/micrograd (MIT License)
Series repo: https://github.com/karpathy/nn-zero-to-hero (MIT License)
Series page: https://karpathy.ai/zero-to-hero.html
License: micrograd and the series code are MIT-licensed; the video is under the standard YouTube license.

This lesson covers the second half of Lecture 1, where Karpathy assembles the
engine into neurons, layers, and an MLP and trains it with gradient descent.
Clawdemy's lessons are original prose following the pedagogical arc of this
series; we do not reproduce or transcribe the video or code. The single-weight
worked example here is ours, built to be checkable by hand. All rights to the
original video and code remain with the creator.
  • The spelled-out intro to neural networks and backpropagation: building micrograd, by Andrej Karpathy. This is the same lecture the previous lesson mirrored, now its second half: Karpathy wraps the engine in Neuron, Layer, and MLP classes, builds a tiny dataset, and runs the training loop live. Watch for the moment he forgets to zero the gradients, training misbehaves, and he fixes it; seeing the bug happen on screen is the best way to remember why step 2 of the loop (zero grad) matters.
  • micrograd on GitHub (MIT License). The full engine plus a demo.ipynb that trains a small MLP on a toy classification dataset and plots the decision boundary. After this lesson, read the training-loop cell (forward, zero grad, backward, update) to confirm that the loop really is just those four steps.

  • Neural Networks: Zero to Hero (full series) and its code repo by Andrej Karpathy. The series this track follows. The next lecture leaves micrograd behind and starts building makemore, a character-level language model.
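The four-step loop Karpathy runs, and the zero-grad bug he hits, can be sketched without micrograd at all. This is a minimal sketch under stated assumptions, not micrograd's code: `Param` is a hypothetical stand-in for a micrograd `Value`, the gradient of a one-weight squared-error loss is derived by hand instead of by `loss.backward()`, and the dataset, learning rate, and step count are made up for illustration. The one behavior it copies on purpose is that backward adds into `.grad` rather than overwriting it.

```python
class Param:
    """Hypothetical stand-in for a micrograd Value: a number plus a .grad slot."""
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

# Made-up dataset generated by y = 2x, so the best weight is w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def forward(w):
    # Sum of squared errors of the one-weight model y_hat = w * x.
    return sum((w.data * x - y) ** 2 for x, y in zip(xs, ys))

def backward(w):
    # Hand-derived dL/dw = sum of 2 * (w*x - y) * x. Like micrograd, this
    # ADDS into .grad instead of overwriting it.
    w.grad += sum(2 * (w.data * x - y) * x for x, y in zip(xs, ys))

def train(steps=50, lr=0.01, zero_grad=True):
    w = Param(0.0)
    losses = []
    for _ in range(steps):
        loss = forward(w)        # 1. forward
        if zero_grad:
            w.grad = 0.0         # 2. zero grad
        backward(w)              # 3. backward
        w.data -= lr * w.grad    # 4. update: step against the gradient
        losses.append(loss)
    return w, losses

w_ok, losses_ok = train(zero_grad=True)
w_bug, losses_bug = train(zero_grad=False)
print(w_ok.data, losses_ok[-1])
print(max(losses_bug[-10:]))
```

With zeroing, `w` settles near 2 and the loss shrinks toward 0. With `zero_grad=False`, every update uses the sum of all gradients computed so far, so in this sketch the weight overshoots and keeps swinging around 2 instead of settling, and the loss never converges.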

Where this sits in the curriculum.

  • The previous lesson (the autograd engine). This lesson is built entirely on the engine from lesson 1: neurons, layers, and the loss are all expressions the engine differentiates, and loss.backward() is the same backward pass you walked by hand there. If the gradient flow feels rushed, rereading the autograd lesson will ground it.

  • Gradient descent and minima (calculus track). Gradient descent is the optimization method built on the fact that a function decreases fastest in the direction opposite its gradient: repeatedly take a small step that way. The calculus track’s treatment of derivatives, slopes, and minima is the formal backing for “step downhill,” the rule this lesson applies to every parameter at once.
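Applying "step downhill" to every parameter at once amounts to a single update rule, p ← p - lr · ∂L/∂p, used for each parameter with its own partial derivative. The sketch below is illustrative only: it uses a hypothetical two-parameter loss L(a, b) = (a - 3)^2 + (b + 1)^2, chosen so that its minimum at (3, -1) and its gradient can be checked by hand. The learning rate and step count are assumptions, not values from the lecture.

```python
def loss(params):
    # Hypothetical loss, minimized at a = 3, b = -1.
    a, b = params
    return (a - 3) ** 2 + (b + 1) ** 2

def grad(params):
    # Partial derivatives worked out by hand: dL/da and dL/db.
    a, b = params
    return [2 * (a - 3), 2 * (b + 1)]

def gradient_descent(params, lr=0.1, steps=100):
    params = list(params)
    for _ in range(steps):
        g = grad(params)
        # Same rule for every parameter: move against its own partial derivative.
        params = [p - lr * gp for p, gp in zip(params, g)]
    return params

start = [0.0, 0.0]
final = gradient_descent(start)
print(loss(start), loss(final))
```

With lr = 0.1, each step multiplies both distances to the minimum by 1 - 2 · 0.1 = 0.8. The first step from (0, 0) therefore lands at (0.6, -0.2), and the loss drops from 10 to 6.4. In a network the rule is the same; what changes is the number of parameters and the way their gradients are computed (by backpropagation).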