From edges to objects
What you’ll learn
This lesson closes Track 12’s vision pair by answering the question the previous lesson set up: one filter finds an edge, but how does a network climb from edges all the way to “this is a cat”? The answer is depth: stacking convolutions into a hierarchy. The source curriculum is MIT 6.S191, Lecture 3, by Alexander and Ava Amini, freely available at introtodeeplearning.com.
You will see how each layer runs filters over the previous layer’s feature maps, so edges combine into corners and textures, then parts, then whole objects. You will also understand the receptive field (why stacked small filters end up covering larger regions of the image as depth grows), meet pooling (the zoom-out step, which has no learned weights of its own), see how a fully-connected classifier turns the final features into an answer, and learn what convolutional networks are actually used for.
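The pipeline just described can be sketched in plain Python. This is a minimal illustration built on hypothetical choices, not the lecture's code: a 16x16 grayscale input, two convolution layers of 4 and then 8 filters (3x3, no padding, stride 1, ReLU), 2x2 max pooling after each, and a fully-connected layer with softmax over three made-up classes. The weights are random rather than trained, so the printed probabilities mean nothing. What the sketch shows is the flow of the computation, including how each second-layer filter reads all four first-layer maps at once.

```python
import math
import random

random.seed(0)  # repeatable run; weights are random, NOT trained

def conv2d(image, kernel):
    """Valid (no padding), stride-1 2-D convolution, written as cross-correlation
    the way most deep-learning libraries implement it."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def conv_layer(maps, filters):
    """Each filter holds one kernel per input map. Its responses are summed across
    maps and passed through ReLU, giving one new feature map per filter."""
    out = []
    for kernels in filters:
        per_map = [conv2d(m, k) for m, k in zip(maps, kernels)]
        summed = [[sum(vals) for vals in zip(*rows)] for rows in zip(*per_map)]
        out.append([[max(0.0, v) for v in row] for row in summed])
    return out

def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window (no weights)."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def random_filters(n_filters, n_inputs, k=3):
    """Hypothetical helper: n_filters filters, each with one random k x k kernel per input map."""
    return [[[[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
             for _ in range(n_inputs)]
            for _ in range(n_filters)]

def dense_softmax(features, weights, biases):
    """Fully-connected layer: one weighted sum per class, then softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical input: a 16x16 grayscale "image" of random pixel values.
image = [[random.random() for _ in range(16)] for _ in range(16)]

maps1 = [max_pool(m) for m in conv_layer([image], random_filters(4, 1))]  # 4 maps: 14x14 -> 7x7
maps2 = [max_pool(m) for m in conv_layer(maps1, random_filters(8, 4))]    # 8 maps: 5x5 -> 2x2

features = [v for m in maps2 for row in m for v in row]  # flatten: 8 * 2 * 2 = 32 numbers
weights = [[random.uniform(-1, 1) for _ in features] for _ in range(3)]
probs = dense_softmax(features, weights, [0.0, 0.0, 0.0])

print(len(maps1), len(maps1[0]), len(maps2), len(maps2[0]), len(features))  # 4 7 8 2 32
print([round(p, 3) for p in probs])
```

In a trained network the filter and classifier weights are learned from labeled images. The pooling step still has none.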
Where this fits
This is lesson 5 of 10, closing Phase 2’s vision pair. It builds directly on the previous lesson’s single convolution, so that lesson is the prerequisite. The next lesson turns the arrow around from recognition to generation, opening the generative half of the phase.
Before you start
Prerequisites: lesson 4 of this track (the single convolution, feature maps, weight-sharing), which this lesson stacks into a hierarchy. The neural-network basics from the previous track are assumed, especially the fully-connected layer, which reappears here as the classifier on top.
About the math
Light and concrete. The only arithmetic is a max-pooling exercise (take the largest value in each small region), plus reasoning about the convolution hierarchy. No calculus or formulas; the practice section has you pool a small feature map by hand and trace a recognition hierarchy.
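The by-hand max-pooling step can be checked with a few lines of Python. This is a minimal sketch assuming the common setup of 2x2 windows with stride 2; the 4x4 feature map is made up for illustration and is not the one from the practice section. Each window is replaced by its largest value, so a 4x4 map shrinks to 2x2.

```python
def max_pool(fmap, size=2):
    """2-D max pooling: split fmap into non-overlapping size x size windows
    and keep the largest value in each. There are no weights to learn."""
    rows, cols = len(fmap), len(fmap[0])
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, cols - size + 1, size)]
            for i in range(0, rows - size + 1, size)]

# Made-up 4x4 feature map; its four 2x2 windows are
# {1,3,4,6}, {2,0,1,1}, {0,2,1,1}, {5,7,3,2}.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 2],
]
print(max_pool(feature_map))  # [[6, 2], [2, 7]]

# Small-shift tolerance: a strong response at (0, 0) or at (1, 1) falls in
# the same 2x2 window, so the pooled output is identical.
a = [[9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(max_pool(a) == max_pool(b))  # True
```

The tolerance is only partial: a shift that moves the peak into a neighboring window does change the pooled output.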
By the end, you’ll be able to
- Explain how stacking convolutions builds a hierarchy from edges to parts to whole objects
- Explain the receptive field (why stacked small filters end up covering larger regions of the image as depth grows)
- Describe what pooling does (shrinks feature maps and tolerates small shifts) and note that it has no learned weights
- Describe how a fully-connected classifier turns the final features into an answer, and name what CNNs are used for
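The receptive-field objective can be made concrete with a small calculation. This sketch uses the standard recurrence for ordinary (undilated) convolution and pooling layers: each layer with kernel size k adds (k - 1) times the current spacing between neighboring units, measured in input pixels, and a stride multiplies that spacing. The layer stack below (3x3 convolutions with stride 1, 2x2 pooling with stride 2) is a hypothetical example, not a specific network from the lecture.

```python
def receptive_fields(layers):
    """Return (name, r) after each layer, where r is the receptive field
    (in input pixels, along one side). layers: list of (name, kernel, stride)."""
    r, jump = 1, 1  # an input pixel sees itself; neighboring units are 1 pixel apart
    out = []
    for name, k, s in layers:
        r += (k - 1) * jump  # k neighboring units, each `jump` input pixels apart
        jump *= s            # a stride spreads the next layer's units further apart
        out.append((name, r))
    return out

# Hypothetical stack of small filters with two pooling steps.
layers = [
    ("conv 3x3", 3, 1),
    ("conv 3x3", 3, 1),
    ("pool 2x2", 2, 2),
    ("conv 3x3", 3, 1),
    ("conv 3x3", 3, 1),
    ("pool 2x2", 2, 2),
    ("conv 3x3", 3, 1),
]
for name, r in receptive_fields(layers):
    print(f"{name}: {r}x{r}")
```

With these settings the receptive field grows to 3, 5, 6, 10, 14, 16 and finally 24 pixels on a side, even though no single filter is larger than 3x3. Each pooling step doubles how much the following 3x3 filters add.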
Time and difficulty
- Read time: about 9 minutes
- Practice time: about 10 minutes (a by-hand max-pooling exercise and a hierarchy trace, plus flashcards)
- Difficulty: standard (one small by-hand calculation; otherwise conceptual)