References: From edges to objects

Source material

Source curriculum (structural mirror, cited as further study):
• MIT 6.S191, "Introduction to Deep Learning", Lecture 3: "Deep Computer Vision"
  Instructors: Alexander Amini and Ava Amini (MIT)
  Course page: https://introtodeeplearning.com
  Code and labs: https://github.com/aamini/introtodeeplearning
  License: MIT (slides, code, and labs); videos are under the Standard YouTube License
  Required attribution: "© Alexander Amini and Ava Amini, MIT 6.S191:
    Introduction to Deep Learning, IntroToDeepLearning.com"
This lesson mirrors the convolutional-network-architecture portion of Lecture 3
(the single-convolution idea is in lesson 4). Clawdemy's lessons are original
prose that follows the pedagogical arc of this course. We do not reproduce or
transcribe the lectures; we cite them as the recommended companion. Course
materials are used under their MIT license with the attribution above; all
rights to the original videos remain with the creators.

Watch this next

MIT 6.S191, Lecture 3: "Deep Computer Vision" by Alexander Amini and Ava Amini. This is the lecture this lesson mirrors. Its architecture portion walks through stacking convolutions and pooling into a full network and shows real recognition results. Pair it with this lesson for the moving version of the edges-to-objects climb.

Going deeper

A short, durable list. Each link is a specific next step, not a generic pile.

• Feature Visualization by Chris Olah, Alexander Mordvintsev, and Ludwig Schubert (Distill). The single best thing to look at after this lesson. It shows actual images of what real network filters respond to, layer by layer, from simple edges up to complex parts, which is exactly this lesson’s hierarchy made real. It is also the honest counterweight: you can see for yourself that deeper features are richer but messier than tidy “eye detector” labels.
• Stanford CS231n notes: Convolutional Neural Networks. The canonical course notes, included here for the full architecture: how convolutional, pooling, and fully-connected layers stack into a complete network, with the dimensions worked out precisely.
• The MIT 6.S191 software labs. The computer-vision lab lets you build and train a full convolutional network and watch its accuracy climb. The labs are MIT-licensed and form the hands-on end of this two-lesson vision pair.

Adjacent topics

Where this connects inside the track.

• How machines see: convolution (lesson 4). The previous lesson built the single convolution that this one stacks. If “filter” or “feature map” feels shaky, that is where both terms are introduced.
• Teaching machines to imagine (lesson 6). Every network so far has classified: taken something in and labeled it. The next lesson turns the direction around and asks whether a network can generate, producing a new image rather than judging an existing one. That is the start of the generative phase.