What deep learning adds: cheatsheet

The one idea that matters

deep learning = neural networks with MANY layers + the tricks that make depth trainable
this track    = a tour of what depth enables across different problem shapes

You already know the engine (a network is a function tuned by gradient descent). This is about the vehicles built around it.

Why it waited decades

The core neural network ideas, including backpropagation, were in place by the 1980s, but they underperformed until three problems were solved together:

Blocker	Why it stalled deep nets
Training signal faded through many layers	Early layers barely learned
Too little labeled data	Hungry deep nets memorized instead of generalizing
Too little compute	Training large nets was impractically slow

During the “AI winters”, neural nets were largely the wrong tool for the data and hardware then available.

2012: the trio arrives together

AlexNet won the ImageNet contest by a wide margin. Three things combined:

Better algorithms (ReLU, dropout) kept signal flowing through depth.
More data (~1.2 million labeled ImageNet images).
More compute (GPUs run the many matrix operations of training in parallel).
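
A rough sketch of the fading training signal and why ReLU helps. Assumptions not in the text above: the older activation is taken to be a sigmoid, every layer sees the same input value, and all weights are 1, so only the activation's slope matters. The helper names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Slope of the sigmoid; it peaks at 0.25 when z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Slope of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if z > 0 else 0.0

def chained_gradient(local_grad, z, depth):
    # Chain rule through `depth` layers: multiply the local slopes together.
    # Toy assumption: same z at every layer, all weights equal to 1.
    g = 1.0
    for _ in range(depth):
        g *= local_grad(z)
    return g

for depth in (1, 5, 20):
    print(depth,
          chained_gradient(sigmoid_grad, 0.0, depth),
          chained_gradient(relu_grad, 1.0, depth))
```

Even at the sigmoid's steepest point the signal shrinks by a factor of 4 per layer, so after 20 layers almost nothing reaches the early layers. On active units ReLU passes the signal through unchanged. Real networks also multiply by weights, so this shows the tendency, not exact numbers.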

Scale since: AlexNet ~60M parameters / 8 layers → today’s largest models reach hundreds of billions of parameters. Same idea, more depth + data + compute.
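
Where a figure like ~60M comes from: a fully connected layer has one weight per input-output pair plus one bias per output. The sketch below assumes the commonly cited sizes of AlexNet's three fully connected layers (9216 → 4096 → 4096 → 1000). These sizes are not given above, so treat them as an assumption.

```python
def dense_params(n_in, n_out):
    # One weight per (input, output) pair, plus one bias per output.
    return n_in * n_out + n_out

# Assumed layer sizes (commonly cited for AlexNet's last three layers).
fc_layers = [(9216, 4096), (4096, 4096), (4096, 1000)]

fc_total = sum(dense_params(n_in, n_out) for n_in, n_out in fc_layers)
print(fc_total)  # 58631144
```

Under that assumption the three fully connected layers alone hold about 58.6M parameters, which is most of the ~60M total. The convolutional layers make up the remainder.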

Why depth helps (the XOR example)

A checkerboard (XOR) layout cannot be split by one straight line. A network with no hidden layer can only draw a single straight line, so it fails. With one hidden layer, each hidden unit draws its own line and the output combines them to separate the two classes. Each layer composes a new transformation; depth builds complex patterns from simple ones.
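
The XOR case can be checked by hand. In this sketch the weights are set manually rather than trained, a simple step function stands in for the activation, and all names are illustrative. The grid search over single-line models is a demonstration, not a proof.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(z):
    return 1 if z > 0 else 0

def no_hidden(x1, x2, w1, w2, b):
    # No hidden layer: a single straight line w1*x1 + w2*x2 + b = 0.
    return step(w1 * x1 + w2 * x2 + b)

def one_hidden(x1, x2):
    # Two hidden units, each drawing its own line.
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output combines them: "OR but not AND" is XOR.
    return step(h_or - h_and - 0.5)

def linear_solves_xor(w1, w2, b):
    return all(no_hidden(x1, x2, w1, w2, b) == y
               for (x1, x2), y in XOR.items())

# Try every weight/bias combination on a coarse grid from -4.0 to 4.0.
grid = [i / 2 for i in range(-8, 9)]
linear_hits = sum(linear_solves_xor(w1, w2, b)
                  for w1, w2, b in product(grid, repeat=3))

print(linear_hits)  # 0
print(all(one_hidden(x1, x2) == y for (x1, x2), y in XOR.items()))  # True
```

No single line on the grid gets all four points right, while the hand-built hidden layer does.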

The track map

Problem shape	The idea	Phase
Sequences (text, audio, time)	Carry information across steps	1
Images	Look at local patches of pixels	2
Generation	Produce new examples, not just classify	2
Decisions (act for reward)	Reinforcement learning	3

Same engine; only the arrangement changes.
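
A toy sketch of two of the arrangements in the map. It uses fixed numbers and no learning. A real network would learn the decay and kernel values, and the function names here are hypothetical.

```python
def running_state(sequence, decay=0.5):
    # Sequence idea: one state value is carried from step to step,
    # so earlier inputs still influence later outputs.
    state = 0.0
    states = []
    for x in sequence:
        state = decay * state + x
        states.append(state)
    return states

def local_patches(signal, kernel):
    # Image idea, shown in 1-D: the same small kernel is applied
    # to each local window of neighbouring values.
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]

print(running_state([1, 0, 0, 1]))              # [1.0, 0.5, 0.25, 1.125]
print(local_patches([0, 0, 1, 1, 0], [-1, 1]))  # [0, 1, 0, -1]
```

The first function remembers the early 1 across later steps. The second responds only where neighbouring values change, like an edge detector.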

What deep learning is NOT

Not artificial general intelligence.
Not “a brain in a computer” (the word “neuron” is borrowed).
It is powerful pattern-matching from examples: strong on perception and generation, bounded by data hunger, brittleness, and a lack of guarantees (Phase 3 names these).

Pitfalls to dodge

“Deep learning is separate from neural networks.” No. It is neural nets with many layers.
“2012 was a new idea.” No. The ideas were decades old; data + compute + tricks arrived.
“More layers is always better.” No. Depth fits some problems and brings its own training costs.
“Capable means understanding.” No. Fluent output is matched patterns, not comprehension.

The one-line version

Deep learning is the neural network you already understand, made deep, and finally given the data and compute to show what depth was always capable of.