Convolution: cheatsheet

The one idea that matters

convolution = slide a small filter (grid of weights) across the image
              at each position: local weighted sum = "is my pattern here?"
the same filter is reused everywhere (weight-sharing)

Why fully-connected layers are wrong for images

Problem	Why
Too many parameters	784 → 784 fully connected = ~614,000 weights, on a tiny image; real photos explode this
Ignores locality	Treats neighboring pixels as unrelated, but shapes ARE local arrangements
No translation invariance	A pattern learned in one spot must be relearned everywhere else

The convolution operation

A filter (kernel), e.g. 3x3, slides over the image. At each spot: multiply each pixel by the matching weight, sum to one number = the response there. High response = the filter’s pattern is present.

Worked: an edge detector

Filter (vertical edge, bright-right):

-1  0  +1
-1  0  +1
-1  0  +1

Patch	Each row	Total	Reading
edge: `0 0 1 / 0 0 1 / 0 0 1`	0·-1 + 0·0 + 1·1 = 1	3	edge detected
flat: `1 1 1 / 1 1 1 / 1 1 1`	1·-1 + 1·0 + 1·1 = 0	0	no edge

Lights up on its pattern, quiet otherwise.

The same filter weights are reused at every position (like recurrence reusing weights at every step). Two payoffs:

Far fewer parameters: one 3x3 filter = 9 numbers, regardless of image size (vs ~614,000).
Translation invariance: the pattern is found wherever it appears, learned once.

Feature maps and many filters

Sliding one filter produces a feature map: a grid marking where its pattern was found.
A layer uses many filters, each making its own map; the output is the stack of maps.
Filters are learned, not hand-set, by the same gradient descent + backprop from the neural-network track. (The hand-picked edge filter was just for illustration.)
Stacking layers (next lesson) builds edges → parts → objects.

Pitfalls to dodge

“Convolution sees the whole image at once.” No. It looks at one small patch at a time and slides.
“Each position has its own weights.” No. One filter’s weights are shared across all positions.
“A filter is part of the picture.” No. It is a small grid of learned weights, a pattern-detector.
“One filter is enough.” No. Many filters per layer, many layers stacked.

Words to use precisely

Filter / kernel: a small grid of weights that detects one local pattern.
Convolution: sliding a filter across the image, computing a local weighted sum at each position.
Feature map: the grid of responses produced by sliding one filter.
Weight-sharing: reusing one filter’s weights at every position (fewer params + translation invariance).

The one-line version

A convolution slides a small, shared pattern-detector across an image, so it can find a feature anywhere with almost no extra cost.