References: Building and training a net: micrograd
Source material
Source curriculum (structural mirror, cited as further study):
• Andrej Karpathy, "Neural Networks: Zero to Hero", Lecture 1 (second half): "The spelled-out intro to neural networks and backpropagation: building micrograd"
  Creator: Andrej Karpathy
  Video: https://www.youtube.com/watch?v=VMj-3S1tku0
  Code repo (micrograd): https://github.com/karpathy/micrograd (MIT License)
  Series repo: https://github.com/karpathy/nn-zero-to-hero (MIT License)
  Series page: https://karpathy.ai/zero-to-hero.html
  License: micrograd and the series code are MIT-licensed; the video is YouTube standard.
This lesson covers the second half of Lecture 1, where Karpathy assembles the engine into neurons, layers, and an MLP and trains it with gradient descent. Clawdemy's lessons are original prose following the pedagogical arc of this series; we do not reproduce or transcribe the video or code. The single-weight worked example here is ours, built to be checkable by hand. All rights to the original video and code remain with the creator.
Watch this next
- The spelled-out intro to neural networks and backpropagation: building micrograd, by Andrej Karpathy. The same lecture the last lesson mirrored, now the second half: Karpathy wraps the engine in Neuron, Layer, and MLP classes, builds a tiny dataset, and runs the training loop live. The moment to watch for is when he forgets to zero the gradients and the training misbehaves, then fixes it; seeing the bug happen on screen is the best way to remember why step 2 of the loop, zeroing the gradients, matters.
Going deeper
- micrograd on GitHub (MIT License). The full engine plus a demo.ipynb notebook that trains a small MLP on a toy classification dataset and plots the decision boundary. After this lesson, reading the training-loop cell (forward, zero grad, backward, update) confirms that the loop really is just those four steps.
- Neural Networks: Zero to Hero (full series) and its code repo, by Andrej Karpathy. The series this track follows. The next lecture leaves micrograd behind and starts building makemore, a character-level language model.
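The four steps named above (forward, zero grad, backward, update) can be sketched without the engine at all. The version below is not micrograd's code: Param, train, and the single training example are hypothetical names and values chosen for illustration, and the gradient is written out by hand instead of coming from a backward pass. It assumes a one-weight model pred = w * x with a squared-error loss, and it accumulates the gradient with += so that skipping step 2 has a visible effect.

```python
class Param:
    """Hypothetical stand-in for one engine value: a number plus its gradient."""

    def __init__(self, data):
        self.data = data
        self.grad = 0.0


def train(steps, lr, zero_grad=True):
    """Fit pred = w * x to one example and return the final weight and the losses."""
    w = Param(0.0)
    x, y = 2.0, 6.0  # assumed training example; the ideal weight is 3.0
    losses = []
    for _ in range(steps):
        # 1. forward: compute the prediction and the loss
        pred = w.data * x
        loss = (pred - y) ** 2
        losses.append(loss)
        # 2. zero grad: discard the gradient left over from the previous step
        if zero_grad:
            w.grad = 0.0
        # 3. backward: dloss/dw = 2 * (pred - y) * x, accumulated with +=
        w.grad += 2.0 * (pred - y) * x
        # 4. update: step the weight against its gradient
        w.data -= lr * w.grad
    return w.data, losses


# Same settings twice: once with step 2, once without it.
w_good, good_losses = train(steps=20, lr=0.05)
w_bad, bad_losses = train(steps=20, lr=0.05, zero_grad=False)

print("with zero grad:   ", w_good)
print("without zero grad:", w_bad)
```

With step 2 in place the loss falls on every iteration and the weight settles near 3.0. Without it, stale gradients pile up, the weight overshoots, and the loss rises again after a few steps, which is the same kind of misbehavior the lecture shows on screen.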
Adjacent topics
Where this sits in the curriculum.
- The previous lesson (the autograd engine). This lesson is built entirely on the engine from lesson 1: neurons, layers, and the loss are all expressions the engine differentiates, and loss.backward() is the same backward pass you walked by hand there. If the gradient flow feels fast, a reread of the autograd lesson grounds it.
- Gradient descent and minima (calculus track). Gradient descent rests on the fact that a function decreases fastest in the direction opposite its gradient. The calculus track’s treatment of derivatives, slopes, and minima is the formal backing for “step downhill,” the rule this lesson applies to every parameter at once.
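As a small illustration of "step downhill" applied to every parameter at once, here is a sketch on a two-parameter function. The function f, its hand-derived gradient grad_f, the starting point, and the learning rate are all assumptions made up for this example; none of them come from the lecture or from micrograd.

```python
def f(a, b):
    # Hypothetical bowl-shaped function with its minimum at a = 1, b = -2.
    return (a - 1.0) ** 2 + (b + 2.0) ** 2


def grad_f(a, b):
    # Partial derivatives worked out by hand: df/da and df/db.
    return 2.0 * (a - 1.0), 2.0 * (b + 2.0)


a, b = 4.0, 3.0  # assumed starting point
lr = 0.1         # assumed learning rate
history = [f(a, b)]

for _ in range(50):
    da, db = grad_f(a, b)
    # Move each parameter opposite its own partial derivative, in the same step.
    a -= lr * da
    b -= lr * db
    history.append(f(a, b))

print("a:", a, "b:", b, "f:", history[-1])
```

Each pass moves both parameters together, and the recorded values of f shrink at every step as the point slides toward the minimum. A learning rate that is too large would break that guarantee, so the value used here is only one that happens to work for this function.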