| Concept | One line |
|---|---|
| Image (to a computer) | A grid of numbers: for a typical 8-bit color image, 3 values (R, G, B) per pixel, each 0 to 255 |
| Semantic gap | The distance from raw pixel numbers to the meaning a human reads instantly |
| Core task | Image classification: numbers in, a label out |
| What fails | Hand-written rules (every rule meets an image that breaks it) |
| What works | Data-driven approach: learn patterns from labeled examples |
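
A minimal sketch of the first rows above, in plain Python: a tiny color image as a grid of (R, G, B) triples, plus a hand-written rule that breaks on the very next image. The 2 x 2 images, the `mostly_red` rule, and the "apple" label are all invented for illustration.

```python
# A 2 x 2 color image: rows of pixels, each pixel an (R, G, B) triple in 0..255.
# Both images are made up for illustration.
red_apple = [
    [(200, 30, 40), (190, 25, 35)],
    [(210, 40, 50), (180, 20, 30)],
]
green_apple = [
    [(60, 180, 50), (70, 190, 60)],
    [(65, 170, 55), (75, 185, 45)],
]

def mostly_red(image):
    """Hand-written rule: call it an apple if red dominates in most pixels."""
    pixels = [px for row in image for px in row]
    red_pixels = sum(1 for r, g, b in pixels if r > g and r > b)
    return "apple" if red_pixels > len(pixels) / 2 else "not apple"

rule_on_red = mostly_red(red_apple)      # rule agrees with the true label
rule_on_green = mostly_red(green_apple)  # same true label, but the rule breaks
print(rule_on_red, "|", rule_on_green)
```

Both images carry the label "apple", yet the rule only gets the red one right. Patching it for green apples would just move the failure to the next image.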

| Challenge | What changes the pixels |
|---|---|
| Viewpoint | The same object from two camera angles can share almost no pixel values at matching positions |
| Scale | Object near vs far: different count and position of pixels |
| Deformation | Non-rigid objects take endless shapes |
| Occlusion | Most of the object may be hidden |
| Illumination | Lighting swings every value up or down |
| Background clutter | Object blends into a busy scene |
| Intra-class variation | One category (e.g. “cat”) spans many looks |

Constant through all of them: the label. Variable through all of them: the numbers.
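
To make the illumination row concrete, here is a small sketch, assuming a made-up 3 x 3 grayscale patch and a hypothetical `brighten` helper. Every pixel value changes; the label attached to the image does not.

```python
# A made-up 3 x 3 grayscale patch (values 0..255) with its human-assigned label.
patch = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
label = "cat"

def brighten(image, amount):
    """Hypothetical helper: add `amount` to every pixel, clipped to 0..255."""
    return [[max(0, min(255, v + amount)) for v in row] for row in image]

brighter = brighten(patch, 100)

# Count positions where the two images hold different numbers.
changed = sum(
    1
    for row_a, row_b in zip(patch, brighter)
    for a, b in zip(row_a, row_b)
    if a != b
)
total = sum(len(row) for row in patch)
print(f"{changed} of {total} pixel values changed; label is still {label!r}")
```

A classifier that compares raw numbers position by position would treat `patch` and `brighter` as different images, which is why the other challenges in the table are hard for the same reason.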

| Step | Action |
|---|---|
| 1. Collect | Gather a large dataset of labeled images |
| 2. Train | Adjust the model until predictions match the labels |
| 3. Predict and evaluate | Run the model on held-out images it never saw in training; measure accuracy on those |

Standard that matters: accuracy on images the model never saw (generalization), not accuracy on the training set (which can be memorization).
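
The three steps, and the gap between memorization and generalization, can be sketched on toy data. Everything here is an assumption for illustration: the 4-value "images", the dark/bright labels, and the two stand-in models (`memorizer`, `thresholder`) are not real image classifiers.

```python
# 1. Collect: a tiny made-up dataset. Each "image" is 4 grayscale values.
train = [
    ((10, 20, 30, 40), "dark"),
    ((50, 40, 60, 30), "dark"),
    ((200, 210, 220, 230), "bright"),
    ((180, 190, 170, 200), "bright"),
]
test = [  # held-out images that appear nowhere in train
    ((25, 35, 15, 45), "dark"),
    ((60, 70, 50, 40), "dark"),
    ((190, 200, 210, 220), "bright"),
    ((240, 230, 250, 220), "bright"),
]

def mean(values):
    return sum(values) / len(values)

# 2. Train, model A: memorize exact training images, fixed guess otherwise.
lookup = dict(train)
def memorizer(image):
    return lookup.get(image, "dark")

# 2. Train, model B: learn one number, a brightness threshold
# halfway between the mean brightness of each class.
dark_mean = mean([mean(img) for img, lab in train if lab == "dark"])
bright_mean = mean([mean(img) for img, lab in train if lab == "bright"])
threshold = (dark_mean + bright_mean) / 2
def thresholder(image):
    return "bright" if mean(image) > threshold else "dark"

# 3. Predict and evaluate: fraction of correct labels on a dataset.
def accuracy(model, data):
    return sum(model(img) == lab for img, lab in data) / len(data)

for name, model in [("memorizer", memorizer), ("thresholder", thresholder)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

Both models score 1.0 on the training set. Only the held-out score separates them: 0.5 for the memorizer, 1.0 for the threshold. The data is deliberately easy; real images are not.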

| Image | Number of values |
|---|---|
| 28 x 28 grayscale (a digit) | 784 |
| 224 x 224 color | 150,528 |
| 1000 x 1000 color | 3,000,000 |

Values = height x width x channels. Channels: grayscale = 1, color = 3.
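
The arithmetic behind the table is one multiplication per image. A short sketch, where `value_count` is a hypothetical helper name:

```python
def value_count(height, width, channels):
    """Number of raw values in an image: one per pixel per channel."""
    return height * width * channels

GRAYSCALE, COLOR = 1, 3

print(value_count(28, 28, GRAYSCALE))    # 784
print(value_count(224, 224, COLOR))      # 150528
print(value_count(1000, 1000, COLOR))    # 3000000
```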

| Pitfall | Reality |
|---|---|
| The camera is the hard part | Capturing pixels is solved; the semantic gap is the hard part |
| More rules would work | Images vary without bound; a finite rule list never closes the gap |
| The model “sees” like you | It starts with numbers and computes a pattern match, not understanding |
| High accuracy = understanding | It found patterns that separate labels; that is useful, not comprehension |

A machine starts with a grid of numbers, not a cat, and earns its way to the label by learning from examples rather than being told a rule.