Cheatsheet: What deep learning adds
The one idea that matters
deep learning = neural networks with MANY layers + the tricks that make depth trainable
this track = a tour of what depth enables across different problem shapes
You already know the engine (a network is a function tuned by gradient descent). This is about the vehicles built around it.
Why it waited decades
Neural network ideas existed in the 1980s but underperformed until three problems were solved at once:
| Blocker | Why it stalled deep nets |
|---|---|
| Training signal faded through many layers | Early layers barely learned |
| Too little labeled data | Hungry deep nets memorized instead of generalizing |
| Too little compute | Training large nets was impractically slow |
The “AI winters” were mostly periods when neural nets were the wrong tool for the available data and hardware.
2012: the trio arrives together
AlexNet won the ImageNet contest by a wide margin. Three things combined:
- Better algorithms (ReLU, dropout) kept signal flowing through depth.
- More data (~1.2 million labeled ImageNet images).
- More compute (GPUs run a network's many multiply-adds in parallel).
Scale since: AlexNet had ~60M parameters across 8 layers; today’s largest models reach hundreds of billions of parameters. Same idea, with more depth, data, and compute.
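The "signal flowing through depth" point can be made concrete with a little arithmetic. The sketch below is a deliberate simplification: it multiplies only the activation slopes layer by layer and ignores the weight matrices entirely. The function names and the fixed pre-activation value `z=0.5` are assumptions chosen for illustration, not anything taken from AlexNet.

```python
import math

def sigmoid_grad(z):
    """Slope of the sigmoid; it never exceeds 0.25 (reached at z == 0)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Slope of ReLU: 1 for active units, 0 for inactive ones."""
    return 1.0 if z > 0 else 0.0

def signal_after(depth, grad_fn, z=0.5):
    """Product of per-layer activation slopes (weights ignored: a simplification)."""
    signal = 1.0
    for _ in range(depth):
        signal *= grad_fn(z)
    return signal

# Compare how much gradient survives a stack of sigmoid vs. ReLU layers.
for depth in (1, 4, 8):
    print(depth, signal_after(depth, sigmoid_grad), signal_after(depth, relu_grad))
```

Because each sigmoid layer scales the signal by at most 0.25, eight layers shrink it by a factor of at least 4^8 = 65,536, while an active ReLU path passes it through unchanged. Real networks also multiply by weights, and inactive ReLU units pass no gradient at all, so treat this as intuition rather than a full account.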
Why depth helps (the XOR proof)
A checkerboard (XOR) layout cannot be split by one straight line. A network with no hidden layer can only draw that line, so it fails. One hidden layer can bend and combine lines to separate it. Each layer composes a new transformation; depth builds complex patterns from simple ones.
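A minimal sketch of the XOR argument, using hand-picked weights and a hard threshold instead of training. The weights, the helper names, and the search grid are all assumptions made for illustration. The search over single lines is a spot check on a grid, not a proof, though the underlying fact (XOR is not linearly separable) holds for every choice of weights.

```python
from itertools import product

# The four XOR points and their labels.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(z):
    """Hard threshold activation."""
    return 1 if z > 0 else 0

def single_line(x1, x2, w1, w2, b):
    """No hidden layer: one weighted sum, i.e. one straight line."""
    return step(w1 * x1 + w2 * x2 + b)

def hidden_layer_net(x1, x2):
    """One hidden layer with two units, weights chosen by hand."""
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # "OR but not AND" is XOR

def solves_xor(predict):
    return all(predict(x1, x2) == y for (x1, x2), y in XOR.items())

# Try every (w1, w2, b) on a coarse grid from -4 to 4 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
lines_that_work = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if solves_xor(lambda x1, x2: single_line(x1, x2, w1, w2, b))
]

print(len(lines_that_work))          # 0: no single line on the grid separates XOR
print(solves_xor(hidden_layer_net))  # True
```

The two hidden units each draw one line; the output unit combines them. That composition is the whole trick, and deeper networks repeat it layer after layer.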
The track map
| Problem shape | The idea | Phase |
|---|---|---|
| Sequences (text, audio, time) | Carry information across steps | 1 |
| Images | Look at local patches of pixels | 2 |
| Generation | Produce new examples, not just classify | 2 |
| Decisions (act for reward) | Reinforcement learning | 3 |
Same engine; only the arrangement changes.
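To make "only the arrangement changes" concrete, here is a rough sketch of the first two rows of the table. Both functions use nothing but weighted sums; they differ in how the sums are wired. The function names, the fixed weights, and the tiny kernel are hypothetical choices for illustration, and real sequence and image models learn these values and add nonlinearities.

```python
def carry_across_steps(sequence, w_in=0.5, w_state=0.5):
    """Sequence arrangement: a running state is fed back in at every step."""
    state = 0.0
    for x in sequence:
        state = w_in * x + w_state * state
    return state

def local_patches(row, kernel=(-1.0, 1.0)):
    """Image arrangement (one row of pixels): the same small set of
    weights is applied to every neighboring patch."""
    k = len(kernel)
    return [
        sum(kernel[j] * row[i + j] for j in range(k))
        for i in range(len(row) - k + 1)
    ]

# Same values, different order: the carried state remembers when the 1 arrived.
print(carry_across_steps([1, 0, 0]))  # 0.125
print(carry_across_steps([0, 0, 1]))  # 0.5

# The patch weights respond where neighboring pixels differ (an edge).
print(local_patches([0, 0, 1, 1, 0]))  # [0.0, 1.0, 0.0, -1.0]
```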
What deep learning is NOT
- Not artificial general intelligence.
- Not “a brain in a computer” (the word “neuron” is borrowed).
- It is powerful pattern-matching from examples: strong on perception and generation, bounded by data hunger, brittleness, and no guarantees (Phase 3 names these).
Pitfalls to dodge
- “Deep learning is separate from neural networks.” No. It is neural nets with many layers.
- “2012 was a new idea.” No. The ideas were decades old; what arrived in 2012 was the data, the compute, and the training tricks.
- “More layers is always better.” No. Depth fits some problems and brings its own training costs.
- “Capable means understanding.” No. Fluent output is matched patterns, not comprehension.
The one-line version
Deep learning is the neural network you already understand, made deep, and finally given the data and compute to show what depth was always capable of.