BERT pretraining and fine-tuning, in brief

What you’ll learn

This is lesson 9 of Phase 2, How models think: the transformer architecture, in Track 5 (AI Foundations). The previous lesson covered BERT’s architecture (encoder-only, bidirectional, structural tokens, three additive embeddings). This lesson covers what BERT was trained to do. Bidirectional self-attention lets every token see every other token, including the one it would be asked to predict, so next-token prediction becomes trivial and teaches the model nothing. BERT therefore pretrains with two other objectives, masked language modeling (MLM) and next sentence prediction (NSP), and then adapts the pretrained encoder to a specific labeled task in a second stage, fine-tuning. The lesson walks through both objectives in detail (including why MLM uses an 80/10/10 mix), the two-stage workflow, and the two common fine-tuning patterns.
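As a rough illustration of the 80/10/10 mix, here is a minimal Python sketch of MLM input corruption. It assumes the 15% selection rate reported in the original BERT paper; the function name `mask_tokens`, the toy vocabulary, and the example sentence are hypothetical and not taken from the lesson. Of the selected positions, about 80% become `[MASK]`, about 10% become a random vocabulary token, and about 10% are left unchanged, while the label at every selected position is the original token.

```python
import random

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]"}  # structural tokens are never selected


def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Return (corrupted_inputs, labels) for one MLM training example.

    labels[i] is the original token at selected positions and None elsewhere,
    so the loss is computed only where a prediction is required.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= select_prob:
            continue  # position not selected: input unchanged, no label
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


# Toy usage with a hypothetical vocabulary and sentence.
toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]
toy_tokens = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", "[SEP]"]
corrupted, targets = mask_tokens(toy_tokens, toy_vocab, rng=random.Random(7))
print(corrupted)
print(targets)
```

Because some selected positions keep their original token or receive a random one, the model cannot assume that only `[MASK]` positions need a prediction, which matters at fine-tuning time when `[MASK]` never appears in the input.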

Where this fits

BERT is one topic split across two consecutive lessons, and this is the second. The previous lesson (BERT, part one: the bidirectional encoder and its structural tokens) covered the architecture. The next lesson, BERT derivatives: DistilBERT and RoBERTa, closes Phase 2 by showing how two follow-up papers compressed BERT (DistilBERT) and improved its training recipe (RoBERTa).

Before you start

Prerequisites: the BERT architecture lesson is required. We assume you understand what bidirectional self-attention means, what the structural tokens (CLS, SEP) do, and how the three additive embeddings shape the input.

By the end, you’ll be able to

Explain why bidirectionality forced BERT to use different pretraining objectives than next-token prediction
Walk through MLM with its 80/10/10 masking mix and explain why the mix is not arbitrary
Walk through NSP and the role of the CLS-head classifier on top of the bidirectional encoder
Describe the two-stage pretrain-then-fine-tune workflow and pick the right fine-tuning head (CLS for whole-input classification, per-token for span detection and named-entity recognition)
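The difference between the two fine-tuning heads named in the last objective can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the encoder output is faked as a small list of vectors with the CLS position first, and the names `linear`, `cls_head`, and `token_head`, along with the toy weights, are hypothetical. A real fine-tune would train the head's weights together with the pretrained encoder.

```python
def linear(x, weight, bias):
    """Apply y = W x + b to a single vector x (lists of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def cls_head(hidden_states, weight, bias):
    """Whole-input classification: use only the first ([CLS]) position."""
    return linear(hidden_states[0], weight, bias)


def token_head(hidden_states, weight, bias):
    """Per-token classification (e.g. NER tags): one logit vector per position."""
    return [linear(h, weight, bias) for h in hidden_states]


# Fake encoder output: 4 positions, hidden size 3, position 0 stands for [CLS].
hidden = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # 2 output classes, hidden size 3
b = [0.0, 0.5]

print(cls_head(hidden, W, b))         # one logit vector for the whole input
print(len(token_head(hidden, W, b)))  # one logit vector per token position
```

The two heads share the same encoder output and differ only in which positions feed the classifier: one vector for the whole input, or every position separately.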

Time and difficulty

Read time: about 13 minutes
Practice time: about 12 minutes (a fine-tuning pattern-matching exercise across five task scenarios, plus a walked-through training-loop trace of MLM and a sentiment fine-tune on the same input)
Difficulty: standard