References: a WaveNet-style hierarchical model

Source material

Source curriculum (structural mirror, cited as further study):
• Andrej Karpathy, "Neural Networks: Zero to Hero", Lecture 6:
  "Building makemore Part 5: Building a WaveNet"
  Creator: Andrej Karpathy
  Video: https://www.youtube.com/watch?v=t3YJ5hKiMQ0
  Code repo (makemore): https://github.com/karpathy/makemore (MIT License)
  Series repo: https://github.com/karpathy/nn-zero-to-hero (MIT License)
  Series page: https://karpathy.ai/zero-to-hero.html
  License: makemore and the series code are MIT-licensed; the video is under the Standard YouTube License.

This lesson covers Lecture 6, where Karpathy restructures the flat MLP into a
hierarchical, WaveNet-style model and reorganizes the code into reusable layer
modules. Clawdemy's lessons are original prose following the pedagogical arc of
this series; we do not reproduce or transcribe the video or code. The
receptive-field table and the brianna staging example here are ours. All rights
to the original video and code remain with the creator.

Watch this next

Building makemore Part 5: Building a WaveNet, by Andrej Karpathy. The lecture this lesson mirrors. Karpathy reshapes the flat model into the tree, wrestles with getting the tensor shapes right at each level (a practical, instructive struggle), and rebuilds the network out of clean, reusable layer modules. Watching the receptive field grow level by level, and the code turn into a tidy stack of layers, makes both the architectural and the software lessons concrete.

Going deeper

WaveNet: A Generative Model for Raw Audio (van den Oord et al., 2016) (arXiv). The original DeepMind paper. It stacks dilated causal convolutions into the hierarchy this lesson is based on, and it reported synthetic speech that human listeners rated as more natural than the best text-to-speech systems of the time. Worth a skim to see the idea in its first, audio-focused form.
makemore on GitHub (MIT License) and the Zero to Hero series. The WaveNet model is the last makemore stage; the next lecture leaves makemore behind and builds a GPT.

Adjacent topics

Where this sits in the curriculum.

The MLP language model (lesson 4). This lesson directly restructures that flat model. The embeddings, the tanh hidden layer, and the softmax output all carry over; what changes is how the context characters are combined: gradually, up a tree, instead of all at once. If the “flat fusion” critique felt fast, that lesson is the grounding.
The transformer (next phase, and the AI Foundations track). The “stack identical refining layers” structure here closely resembles the overall shape of a transformer, which the next phase builds from scratch. The AI Foundations track describes transformers from the user’s side; this track is about to build one. WaveNet’s fixed tree gives way there to attention, a more flexible way for each position to choose what to combine.
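
The contrast drawn above between flat fusion and combining characters gradually up a tree can be seen in a toy sketch. This is an illustration only: it uses plain strings in place of learned embedding vectors, and the function names (fuse_pairs, tree_levels) and the 8-character context are hypothetical, not taken from the lecture or the makemore code. A flat model would see all 8 characters in one step; here each level merges adjacent pairs, so the receptive field of a position doubles per level.

```python
def fuse_pairs(groups):
    """Merge each adjacent pair of groups into one (one level of the tree)."""
    if len(groups) % 2 != 0:
        raise ValueError("need an even number of groups to pair up")
    return [groups[i] + groups[i + 1] for i in range(0, len(groups), 2)]


def tree_levels(context):
    """Return the groups at every level, from single characters to the root."""
    levels = [list(context)]
    while len(levels[-1]) > 1:
        levels.append(fuse_pairs(levels[-1]))
    return levels


context = "abcdefgh"  # hypothetical 8-character context (assumption)
levels = tree_levels(context)

for depth, groups in enumerate(levels):
    # The length of a group is the receptive field of one position at this level.
    print(f"level {depth}: receptive field {len(groups[0])}, groups {groups}")
```

In the real model each merge is a learned layer acting on concatenated vectors rather than string concatenation, but the grouping pattern (8 positions, then 4, then 2, then 1) is the same idea.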