Attention alternatives and MoE: cheatsheet

Standard attention’s two cost problems

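Quadratic in sequence length: every token attends to every other token, so the attention-score matrix, and with it attention compute (and memory, when the scores are materialized), grows with the square of the length.
KV cache dominates inference: the cached keys and values of all prior tokens grow linearly with length x layers x KV heads (and batch size); at long context the cache can exceed the model weights, and reading it at every decoding step makes generation memory-bandwidth-bound.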
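A back-of-envelope sketch of both costs, assuming a standard decoder that caches one key and one value vector per layer, KV head, and token in 16-bit precision. The config used (32 layers, 32 KV heads, head dimension 128) is a hypothetical 7B-class setting chosen for illustration, not a figure from the lecture.

```python
def attention_score_entries(seq_len: int) -> int:
    # Full (non-windowed) attention forms a seq_len x seq_len score matrix per head.
    return seq_len * seq_len


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    # Factor 2: one key vector and one value vector per (layer, KV head, token).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical 7B-class config (assumed): 32 layers, 32 KV heads of dimension 128, fp16.
for seq_len in (4_096, 32_768):
    cache = kv_cache_bytes(32, 32, 128, seq_len)
    print(f"{seq_len:>6} tokens: {attention_score_entries(seq_len):,} scores/head, "
          f"KV cache {cache / 2**30:.1f} GiB per sequence")
```

Doubling the context doubles the cache but quadruples the score matrix. At 32K tokens this hypothetical cache (16 GiB per sequence) is already larger than the roughly 14 GB of fp16 weights a 7B-parameter model needs.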

Attention alternatives

Variant	What it does	Effect
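Multi-Query (MQA)	All query heads share one key/value set	KV cache shrinks by a factor of the head count; some quality loss
Grouped-Query (GQA)	Query heads are split into a few groups; each group shares one key/value set	KV cache shrinks by the group size (severalfold); near-zero quality loss; modern default
Sliding-window	Each token attends only to a fixed window of recent tokens	Cost linear in length (n x w for window w, instead of n^2); KV cache capped at the window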

(Sub-quadratic alternatives such as linear attention and state-space models exist but remain largely research; GQA and windowing are the practical levers.)
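A minimal NumPy sketch of the two practical levers, assuming a single sequence whose queries, keys, and values are already projected into per-head arrays. The function name and shapes are illustrative, not from the lecture: `n_kv_heads == n_q_heads` is ordinary multi-head attention, `n_kv_heads == 1` is MQA, anything in between is GQA, and `window` turns on sliding-window masking.

```python
import numpy as np


def attention(q, k, v, window=None):
    """Causal grouped-query attention with an optional sliding window.

    q: (n_q_heads, T, d); k, v: (n_kv_heads, T, d); n_q_heads % n_kv_heads == 0.
    Returns the attended values (n_q_heads, T, d) and the (T, T) boolean mask used.
    """
    n_q, T, d = q.shape
    n_kv = k.shape[0]
    assert n_q % n_kv == 0, "query heads must split evenly into KV groups"
    group = n_q // n_kv
    # Each KV head serves `group` consecutive query heads; only n_kv heads need caching.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (n_q, T, T)
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    allowed = j <= i                                  # causal: no attending to future tokens
    if window is not None:
        allowed &= (i - j) < window                   # keep only the last `window` tokens
    scores = np.where(allowed, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)      # stable softmax over allowed keys
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, allowed


rng = np.random.default_rng(0)
T, d = 8, 16
q = rng.normal(size=(8, T, d))   # 8 query heads
k = rng.normal(size=(2, T, d))   # GQA: 2 KV heads, so groups of 4 query heads
v = rng.normal(size=(2, T, d))
out, mask = attention(q, k, v, window=3)
print(out.shape, int(mask.sum()), "of", T * (T + 1) // 2, "causal pairs kept")
```

The cache only has to hold the 2 un-repeated KV heads, which is where GQA's 4x saving over 8-head attention comes from here. The explicit T x T mask is for readability; a real sliding-window kernel would skip masked blocks instead of computing and discarding them.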

Mixture of experts (MoE)

Dense FFN:  every token -> the one FFN        (total params = active params)
MoE FFN:    router picks top-k of many experts per token
            -> total params (capacity, MEMORY) decoupled from
               active params (per-token COMPUTE, the 6ND driver)

	Total params	Active params
Determines	Capacity + memory (all experts stored)	Per-token compute (only the k selected experts run)

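Costs: every expert must be stored even when idle, so MoE saves compute at the price of memory; the router also needs load balancing so tokens do not pile onto a few experts.
“47B total, 13B active” = MoE: roughly the per-token compute of a dense 13B model, with the memory footprint (and stored capacity) of a dense 47B model.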
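A toy NumPy sketch of a top-k MoE layer that makes the total-versus-active split concrete. Assumptions: each expert is a two-matrix ReLU FFN, the router is a single linear layer, and gate weights are a softmax over only the k chosen experts (one common choice; routers differ). The class name and sizes are hypothetical.

```python
import numpy as np


class ToyMoE:
    def __init__(self, d_model, d_ff, n_experts, top_k, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        self.w_router = rng.normal(0, 0.02, (d_model, n_experts))
        self.w_in = rng.normal(0, 0.02, (n_experts, d_model, d_ff))
        self.w_out = rng.normal(0, 0.02, (n_experts, d_ff, d_model))

    def expert_params(self, n_experts_counted):
        # Each expert holds one d_model x d_ff and one d_ff x d_model matrix.
        _, d_model, d_ff = self.w_in.shape
        return n_experts_counted * 2 * d_model * d_ff

    def __call__(self, x):
        """x: (T, d_model) -> (output (T, d_model), chosen expert ids (T, top_k))."""
        logits = x @ self.w_router                              # (T, n_experts)
        chosen = np.argsort(logits, axis=-1)[:, -self.top_k:]   # top-k expert ids per token
        top_logits = np.take_along_axis(logits, chosen, axis=-1)
        gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
        gates /= gates.sum(axis=-1, keepdims=True)              # softmax over the k chosen
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for slot, e in enumerate(chosen[t]):                # only k experts run per token
                hidden = np.maximum(x[t] @ self.w_in[e], 0.0)
                out[t] += gates[t, slot] * (hidden @ self.w_out[e])
        return out, chosen


moe = ToyMoE(d_model=16, d_ff=64, n_experts=8, top_k=2)
x = np.random.default_rng(1).normal(size=(32, 16))
y, chosen = moe(x)
total = moe.expert_params(8)    # stored in memory: every expert
active = moe.expert_params(2)   # computed per token: only the top-k
print(y.shape, total, active)
print("tokens per expert:", np.bincount(chosen.ravel(), minlength=8))
```

Memory scales with `total` and per-token FLOPs with `active`, a 4x gap for 8 experts with top-2. The `bincount` line is exactly the quantity load balancing tries to keep even.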

Resource-allocation view (lesson 2 terms)

Variation	Resource targeted
MQA / GQA	Memory + memory bandwidth (KV cache)
Sliding-window	Compute (quadratic -> linear in length)
MoE	Separates memory (total params) from compute (active params)

None of these changes the lesson-3 skeleton; each changes which resource you spend.

Words to use precisely

KV cache: stored keys/values of prior tokens, reused during generation; the main inference memory cost.
MQA / GQA: sharing key/value sets across all heads (MQA) or within each group of heads (GQA).
MoE: many expert FFNs plus a router that sends each token to its top-k experts.
Total vs active parameters: capacity/memory vs per-token compute.
Load balancing: keeping the router’s token assignment even across experts.
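One widely used way to implement load balancing is the auxiliary loss from the Switch Transformer, sketched below for top-1 routing. It is a named swap-in for illustration; the lecture's exact formulation may differ. With E experts, f_i is the fraction of tokens whose top expert is i and P_i is the mean router probability for expert i; the loss E x sum_i f_i x P_i equals 1 under perfectly even routing and reaches E when every token goes to one expert with full confidence.

```python
import numpy as np


def switch_load_balance_loss(router_probs):
    """router_probs: (T, E) router softmax outputs, one row per token.

    Switch-Transformer-style auxiliary loss for top-1 routing.
    """
    T, E = router_probs.shape
    top1 = router_probs.argmax(axis=-1)
    f = np.bincount(top1, minlength=E) / T   # fraction of tokens dispatched to each expert
    P = router_probs.mean(axis=0)            # mean router probability per expert
    return E * float(np.sum(f * P))


balanced = np.eye(4)[np.arange(8) % 4]       # 8 tokens spread round-robin over 4 experts
collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (8, 1))  # every token sent to expert 0
print(switch_load_balance_loss(balanced), switch_load_balance_loss(collapsed))
```

In training this term would be added to the main loss with a small coefficient (a tuning choice, assumed here). Because f comes from a hard argmax it carries no gradient; the pressure toward balance flows through P.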

Source

Stanford CS336, Lecture 4 (Attention alternatives and mixture of experts), by Hashimoto and Liang. cs336.stanford.edu. Independent structural mirror in original prose; see references.