References: Building and training a net: micrograd

Source material

Source curriculum (structural mirror, cited as further study):
• Andrej Karpathy, "Neural Networks: Zero to Hero", Lecture 1 (second half):
  "The spelled-out intro to neural networks and backpropagation: building micrograd"
  Creator: Andrej Karpathy
  Video: https://www.youtube.com/watch?v=VMj-3S1tku0
  Code repo (micrograd): https://github.com/karpathy/micrograd (MIT License)
  Series repo: https://github.com/karpathy/nn-zero-to-hero (MIT License)
  Series page: https://karpathy.ai/zero-to-hero.html
  License: micrograd and the series code are MIT-licensed; the video is under the standard YouTube license.

This lesson covers the second half of Lecture 1, where Karpathy assembles the
engine into neurons, layers, and an MLP and trains it with gradient descent.
Clawdemy's lessons are original prose following the pedagogical arc of this
series; we do not reproduce or transcribe the video or code. The single-weight
worked example here is ours, built to be checkable by hand. All rights to the
original video and code remain with the creator.

Watch this next

The spelled-out intro to neural networks and backpropagation: building micrograd, by Andrej Karpathy. This is the same lecture the last lesson mirrored, now the second half: Karpathy wraps the engine in Neuron, Layer, and MLP classes, builds a tiny dataset, and runs the training loop live. The moment to watch for is when he forgets to zero the gradients, the training misbehaves, and he fixes it. Seeing the bug happen on screen is the best way to remember why step 2 of the loop (zeroing the gradients) matters.
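To make the zero-grad bug concrete, here is a minimal sketch in plain Python. It is not Karpathy's code and does not use micrograd; the single-weight model, the data point, and the names `loss_grad` and `train` are all assumptions chosen for illustration. The only behaviour it borrows is that a micrograd-style backward pass adds into the stored gradient with `+=`, so a gradient that is never reset keeps growing from step to step.

```python
# Hypothetical single-weight model: prediction = w * x,
# loss = (w * x - target) ** 2. Not micrograd itself.

def loss_grad(w, x, target):
    """Hand-derived d(loss)/dw for loss = (w * x - target) ** 2."""
    return 2.0 * (w * x - target) * x

def train(zero_grad, steps=20, lr=0.1):
    """Run gradient descent and return the weight after every step."""
    w, grad = 0.0, 0.0
    x, target = 1.0, 2.0  # illustrative data point; the ideal weight is 2.0
    history = []
    for _ in range(steps):
        if zero_grad:
            grad = 0.0                    # reset before the backward pass
        grad += loss_grad(w, x, target)   # backward: accumulate with +=
        w -= lr * grad                    # update
        history.append(w)
    return history

good = train(zero_grad=True)    # gradient reset every step
buggy = train(zero_grad=False)  # stale gradients pile up

print("with zero grad, final w:", good[-1])
print("without zero grad, largest w reached:", max(buggy))
```

With the reset in place the weight creeps up toward 2.0 and stays below it. Without the reset, the stale gradients act like an uncontrolled momentum term, and the weight swings well past the target instead of settling.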

Going deeper

micrograd on GitHub (MIT License). The full engine plus a demo.ipynb that trains a small MLP on a toy classification dataset and plots the decision boundary. After this lesson, reading the training-loop cell (forward, zero grad, backward, update) confirms that the loop really is just those four steps.
Neural Networks: Zero to Hero (full series) and its code repo by Andrej Karpathy. The series this track follows. The next lecture leaves micrograd behind and starts building makemore, a character-level language model.
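The four-step loop mentioned above (forward, zero grad, backward, update) can be sketched without any library. This is an illustrative stand-in, not the contents of demo.ipynb: the `Param` class, the toy dataset, the learning rate, and the hand-derived gradients are all assumptions. In micrograd the backward step would be a single `loss.backward()` call rather than the explicit formulas used here.

```python
class Param:
    """Hypothetical stand-in for a micrograd-style value: holds .data and .grad."""
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

# Made-up toy dataset: points on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = Param(0.0), Param(0.0)
params = [w, b]
lr = 0.05  # illustrative learning rate

for step in range(1000):
    # 1. forward: predictions and mean squared error
    preds = [w.data * x + b.data for x in xs]
    loss = sum((pred - y) ** 2 for pred, y in zip(preds, ys)) / len(xs)

    # 2. zero grad: clear gradients left over from the previous step
    for p in params:
        p.grad = 0.0

    # 3. backward: hand-derived gradients of the loss, accumulated with +=
    for x, pred, y in zip(xs, preds, ys):
        d = 2.0 * (pred - y) / len(xs)
        w.grad += d * x
        b.grad += d

    # 4. update: nudge every parameter against its gradient
    for p in params:
        p.data -= lr * p.grad

print("w:", w.data, "b:", b.data, "loss:", loss)
```

On this dataset the parameters should approach w = 2 and b = 1, the line the points were drawn from.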

Adjacent topics

Where this sits in the curriculum.

The previous lesson (the autograd engine). This lesson is built entirely on the engine from lesson 1: neurons, layers, and the loss are all expressions the engine differentiates, and loss.backward() is the same backward pass you walked by hand there. If the gradient flow feels fast, a reread of the autograd lesson grounds it.
Gradient descent and minima (calculus track). Gradient descent is the optimization idea that a function decreases fastest in the direction opposite its gradient. The calculus track’s treatment of derivatives, slopes, and minima is the formal backing for “step downhill,” the rule this lesson applies to every parameter at once.
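The claim that a function decreases fastest opposite its gradient can be checked numerically. The sketch below is an assumption-laden illustration, not part of the lecture: the function `f`, the starting point, and the step length `eps` are arbitrary choices. It takes one small step of equal length in each of 360 directions and compares the change in `f` with a step along the negative gradient. The claim is a small-step (first-order) statement, which is why `eps` is kept tiny.

```python
import math

def f(a, b):
    """Arbitrary bowl-shaped example function."""
    return a ** 2 + 3.0 * b ** 2

def grad_f(a, b):
    """Hand-derived gradient of f."""
    return (2.0 * a, 6.0 * b)

a0, b0 = 1.0, 1.0   # illustrative starting point
eps = 1e-3          # small step length

# Unit vector pointing opposite the gradient.
ga, gb = grad_f(a0, b0)
norm = math.hypot(ga, gb)
downhill = (-ga / norm, -gb / norm)

def change(direction):
    """Change in f after a step of length eps along a unit direction."""
    da, db = direction
    return f(a0 + eps * da, b0 + eps * db) - f(a0, b0)

def unit(deg):
    """Unit vector at the given angle in degrees."""
    return (math.cos(math.radians(deg)), math.sin(math.radians(deg)))

# Try one unit direction per degree and keep the one that lowers f the most.
best_deg = min(range(360), key=lambda deg: change(unit(deg)))
best_change = change(unit(best_deg))
downhill_change = change(downhill)
downhill_deg = math.degrees(math.atan2(downhill[1], downhill[0])) % 360

print("best sampled direction (degrees):", best_deg)
print("negative-gradient direction (degrees):", downhill_deg)
print("change along best sample:", best_change)
print("change along negative gradient:", downhill_change)
```

The best sampled direction should land within a degree of the negative-gradient direction, and the negative-gradient step should lower `f` at least as much as any sampled one.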