Cheatsheet: What deep learning adds

deep learning = neural networks with MANY layers + the tricks that make depth trainable
this track = a tour of what depth enables across different problem shapes

You already know the engine (a network is a function tuned by gradient descent). This is about the vehicles built around it.

The core neural network ideas existed by the 1980s but underperformed until three problems were solved at once:

| Blocker | Why it stalled deep nets |
| --- | --- |
| Training signal faded through many layers | Early layers barely learned |
| Too little labeled data | Hungry deep nets memorized instead of generalizing |
| Too little compute | Training large nets was impractically slow |

The “AI winters” were mostly periods when neural nets were the wrong tool for the available data and hardware.

In 2012, AlexNet won the ImageNet contest by a wide margin. Three things combined:

  • Better algorithms (ReLU, dropout) kept signal flowing through depth.
  • More data (~1.2 million labeled ImageNet images).
  • More compute (GPUs run a network's many arithmetic operations in parallel).
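To make the "fading training signal" blocker concrete, here is a toy sketch of why ReLU helps. It assumes every layer has a weight of 1 and sees the same input value, which real networks do not, so treat the numbers as an illustration only. The function names are made up for this example.

```python
import math

def sigmoid_slope(z):
    # Slope of the sigmoid at z; it never exceeds 0.25 (reached at z = 0).
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_slope(z):
    # Slope of ReLU: 1 for an active unit, 0 for an inactive one.
    return 1.0 if z > 0 else 0.0

def surviving_signal(slope, depth):
    # Toy assumption: the training signal is multiplied by the same slope
    # at every layer on its way back to the first layer.
    return slope ** depth

DEPTH = 20
sigmoid_signal = surviving_signal(sigmoid_slope(0.0), DEPTH)  # best case for sigmoid
relu_signal = surviving_signal(relu_slope(1.0), DEPTH)        # an active ReLU unit

print(f"sigmoid, {DEPTH} layers: {sigmoid_signal:.2e}")
print(f"relu,    {DEPTH} layers: {relu_signal:.2e}")
```

Even in the sigmoid's best case the signal shrinks by a factor of four per layer, so after 20 layers almost nothing reaches the early layers. An active ReLU unit passes the signal through unchanged.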

Scale since: AlexNet had ~60M parameters across 8 layers → today’s largest models reach hundreds of billions of parameters. Same idea, more depth + data + compute.

A checkerboard (XOR) layout cannot be split by one straight line. A network with no hidden layer can only draw that line, so it fails. One hidden layer can bend and combine lines to separate it. Each layer composes a new transformation; depth builds complex patterns from simple ones.
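The XOR argument can be checked by hand. The sketch below searches a small grid of weights for a single straight line that classifies all four points and finds none, then shows a hand-picked one-hidden-layer network that gets all four right. The weights are chosen for illustration, not learned, and the grid search is a demonstration rather than a proof.

```python
from itertools import product

# The four XOR points and their labels.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def relu(z):
    return max(0.0, z)

def linear_classifier(x1, x2, w1, w2, b):
    # No hidden layer: one straight line, output 1 on one side and 0 on the other.
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def hidden_layer_net(x1, x2):
    # One hidden layer with two ReLU units (hand-picked weights).
    h1 = relu(x1 + x2)        # rises as soon as either input is on
    h2 = relu(x1 + x2 - 1)    # rises only when both inputs are on
    return h1 - 2 * h2        # the second unit cancels the "both on" case

# Try every (w1, w2, b) on a grid from -4.0 to 4.0 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
linear_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(linear_classifier(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items())
]

print("lines that solve XOR:", len(linear_solutions))
for (x1, x2), y in XOR.items():
    print((x1, x2), "target", y, "network", hidden_layer_net(x1, x2))
```

The hidden layer bends the input space: `h1` and `h2` are each simple, but their combination separates a layout that no single line can.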

| Problem shape | The idea | Phase |
| --- | --- | --- |
| Sequences (text, audio, time) | Carry information across steps | 1 |
| Images | Look at local patches of pixels | 2 |
| Generation | Produce new examples, not just classify | 2 |
| Decisions (act for reward) | Reinforcement learning | 3 |

Same engine; only the arrangement changes.
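A minimal sketch of "same engine, different arrangement", covering the first two rows of the table. Both functions reuse one weighted sum; only the wiring differs. The names, weights, and the missing activation function are simplifications made for this example, not a real recurrent or convolutional layer.

```python
def weighted_sum(inputs, weights):
    # The shared engine: multiply inputs by weights and add them up.
    return sum(x * w for x, w in zip(inputs, weights))

def run_sequence(steps, w_input=0.5, w_state=0.5):
    # Sequence arrangement: feed the previous result back in at every step,
    # so earlier inputs still influence the final state.
    state = 0.0
    for x in steps:
        state = weighted_sum([x, state], [w_input, w_state])
    return state

def scan_patches(image, kernel):
    # Image arrangement: slide one small set of weights over local patches.
    k = len(kernel)
    flat_kernel = [w for row in kernel for w in row]
    out = []
    for r in range(len(image) - k + 1):
        out_row = []
        for c in range(len(image[0]) - k + 1):
            patch = [image[r + i][c + j] for i in range(k) for j in range(k)]
            out_row.append(weighted_sum(patch, flat_kernel))
        out.append(out_row)
    return out

early = run_sequence([1.0, 0.0, 0.0])   # the input arrives first, then fades
late = run_sequence([0.0, 0.0, 1.0])    # the input arrives last
print("early:", early, "late:", late)

image = [[0, 0, 1],
         [0, 0, 1],
         [0, 0, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]                 # responds to a dark-to-bright vertical edge
edges = scan_patches(image, edge_kernel)
print("edges:", edges)
```

The sequence version remembers the early input in a weakened form, and the patch version responds only where the edge sits. Neither needed a new kind of arithmetic.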

  • Not artificial general intelligence.
  • Not “a brain in a computer” (the word “neuron” is borrowed).
  • It is powerful pattern-matching from examples: strong on perception and generation, bounded by data hunger, brittleness, and a lack of guarantees (Phase 3 names these).
  • “Deep learning is separate from neural networks.” No. It is neural nets with many layers.
  • “2012 was a new idea.” No. The ideas were decades old; data + compute + tricks arrived.
  • “More layers is always better.” No. Depth fits some problems and brings its own training costs.
  • “Capable means understanding.” No. Fluent output is matched patterns, not comprehension.

Deep learning is the neural network you already understand, made deep, and finally given the data and compute to show what depth was always capable of.