Lesson: The handwritten-digit problem
Look at a few handwritten threes. One is round and careful. One is sharp and slanted. One was dashed off so fast the top loop barely closes. Your eyes read all of them as “3” before you have even finished noticing they are different shapes. You did not think about it. You did not run down a checklist. The answer just arrived.
Now try to write down what you did. Not the answer, the method. What exactly makes a shape a 3 and not an 8, or a 5, or a hurried 2? The moment you try to put it into words, the easy thing turns surprisingly hard. That gap, between how effortlessly you recognize a digit and how impossible it is to explain the recognition, is the whole reason neural networks exist. This lesson is about why that gap is the right place to start.
Why a computer finds this so hard
To you, a handwritten digit is a shape. To a computer, it is not a shape at all. It is a grid of pixels, and each pixel is just a number saying how bright that little square is, from 0 for a fully black square up to 1 for a fully white one. A common setup uses a 28 by 28 grid, which is 784 little brightness numbers and nothing more. There is no “curve” and no “loop” in there anywhere. There is only a long list of numbers.
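If it helps to see that list, here is a minimal sketch in plain Python. The stroke position is made up purely for illustration; the only things taken from the text are the 28 by 28 grid and the 0-to-1 brightness scale.

```python
# A hypothetical 28 by 28 image: every pixel starts fully black (0.0).
SIZE = 28
image = [[0.0 for _ in range(SIZE)] for _ in range(SIZE)]

# Paint a short vertical stroke in white (1.0), just to have some "ink".
# The position (rows 6 to 21, column 14) is an arbitrary choice.
for row in range(6, 22):
    image[row][14] = 1.0

# Flatten the grid row by row into the single list the computer works with.
pixels = [value for row in image for value in row]

print(len(pixels))  # 784
print(sum(pixels))  # 16.0
```

Nothing in `pixels` says "vertical stroke". It is 784 numbers, 16 of which happen to be 1.0.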
Here is what makes that brutal. Your 3 and my 3 land on completely different pixels. Shift the digit a few squares to the right and the ink lands on a different set of squares, so the list can look nothing like it did before, even though the digit is obviously the same. Make it bigger, thinner, more slanted, and the list changes again. The thing that is “the same” to your eye is, at the level of raw numbers, wildly different every time.
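A rough sketch of the shifting problem, assuming a plain filled bar as a stand-in for a pen stroke (the helper names `blank`, `draw_bar`, and `flatten` are invented for this example). The same bar is drawn twice, three squares apart, and the code counts how many ink pixels the two lists have in common.

```python
SIZE = 28  # assumed grid size, matching the 28 by 28 setup above

def blank():
    """Return an all-black image as a grid of 0.0 values."""
    return [[0.0] * SIZE for _ in range(SIZE)]

def draw_bar(top, left, height, width):
    """Hypothetical stand-in for a pen stroke: a filled white rectangle."""
    image = blank()
    for row in range(top, top + height):
        for col in range(left, left + width):
            image[row][col] = 1.0
    return image

def flatten(image):
    """Turn the grid into one long list, row by row."""
    return [value for row in image for value in row]

original = flatten(draw_bar(top=6, left=10, height=16, width=3))
shifted = flatten(draw_bar(top=6, left=13, height=16, width=3))

ink_pixels = sum(1 for value in original if value == 1.0)
shared_ink = sum(1 for a, b in zip(original, shifted) if a == 1.0 and b == 1.0)

print(ink_pixels)  # 48
print(shared_ink)  # 0
```

Same shape, same size, moved three squares, and the two lists do not share a single lit pixel.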
The natural instinct, especially if you write code, is to reach for rules. So let us try. Here is one honest attempt at a rule for the digit 3:
A 3 has two rounded bumps on its right side, stacked one above the other, and an open left side.
For a tidy, upright, textbook 3, that works. Now meet three real handwritten threes.
The slanted one has its bumps off to the side, not cleanly stacked. The fast one has a lower “bump” that is really just a straight flick. The careful one has a top that is more flat than round. Your rule, which felt reasonable thirty seconds ago, already misses three out of three real examples.
You could patch it. Add a clause for slant. Add a clause for flat tops. Add a clause for flicks. But every patch invites a new handwritten 3 you did not plan for, and the rules pile up without ever quite covering reality. You would be writing rules forever and still run into a 3 that breaks them. This is not a failure of effort. It is a sign you are using the wrong tool.
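Here is one way that rule might look as code, on toy 7 by 5 grids where `#` is ink. Everything here is an assumption made for illustration: the grid size, the three drawings, and the particular way `looks_like_three` turns "two stacked bumps on the right, open left side" into pixel checks.

```python
def ink(image, row, col):
    """True if the given square has ink in it."""
    return image[row][col] == "#"

def looks_like_three(image):
    """Hand-written rule: two stacked bumps on the right, open left side."""
    last = len(image[0]) - 1
    upper_bump = ink(image, 1, last) or ink(image, 2, last)
    lower_bump = ink(image, 4, last) or ink(image, 5, last)
    open_left = not any(ink(image, row, 0) for row in range(len(image)))
    return upper_bump and lower_bump and open_left

tidy = [
    ".###.",
    "....#",
    "....#",
    ".###.",
    "....#",
    "....#",
    ".###.",
]
slanted = [
    "..###",
    "....#",
    "...#.",
    ".##..",
    "...#.",
    "..#..",
    "###..",
]
hurried = [
    ".###.",
    "....#",
    "....#",
    ".###.",
    "..#..",
    ".#...",
    "#....",
]

for name, image in [("tidy", tidy), ("slanted", slanted), ("hurried", hurried)]:
    print(name, looks_like_three(image))
# tidy True
# slanted False
# hurried False
```

The textbook 3 passes. The slanted one and the hurried one, which your eye still reads as 3 without hesitation, both fail, and each would need its own patch.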
Why this is the right problem to learn from
If handwritten digits are so awkward, why is this the problem that nearly every introduction to neural networks opens with? Because it sits in an unusually useful sweet spot.
- The input is small and fixed. Every image is 784 numbers. Not a paragraph, not a video, just a tidy, predictable list. That keeps the problem small enough to reason about.
- The output is small too. There are only ten possible answers, the digits 0 through 9. The computer is not writing an essay; it is picking one of ten boxes.
- It is genuinely hard, but clearly solvable. Rule-writing falls apart, yet a six-year-old reads these digits without breaking stride. When something is effortless for a human but resists every obvious rule, that is a strong hint that a smarter approach exists and is worth finding.
- The approach travels. Reading a digit, recognizing a face, spotting a tumor on a scan, sorting a photo by what is in it: under the hood, these are the same shape of problem. Numbers in, a label out. Crack handwritten digits and you have a template that scales to all of them.
So the digit problem is not the point. It is the smallest honest example of a much larger pattern, which is exactly what you want when you are learning the idea for the first time.
The shift: stop writing rules, start showing examples
Here is the move that changes everything, and it is more of an attitude than a technique.
Instead of trying to tell the computer what a 3 is, you show it. You gather thousands of images that people have already labeled, this one is a 3, this one is a 7, this one is a 0, and you hand the computer the examples instead of the rules. Then you let it find the pattern on its own. You stop being the author of the answer and become the curator of the examples.
| Rule-based programming | Learning from examples |
|---|---|
| A human writes the logic for every case | A human provides labeled examples |
| Breaks on the first case nobody anticipated | Tends to improve as it sees more examples |
| You describe the answer | You demonstrate the answer |
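To make the right-hand column concrete, here is a deliberately tiny sketch. It is not a neural network and not the method this track builds toward. It is a nearest-neighbour lookup, swapped in only because it fits in a few lines: the program labels a new image by finding the most similar labeled example. The 3 by 3 toy images and the names `examples`, `distance`, and `classify` are all invented for illustration.

```python
# Hypothetical labeled examples: (pixels, label) pairs on a toy 3 by 3 grid.
# 1 means ink, 0 means blank. Real data would be thousands of 784-number images.
examples = [
    ([0, 1, 0,
      0, 1, 0,
      0, 1, 0], 1),
    ([0, 0, 1,
      0, 0, 1,
      0, 0, 1], 1),
    ([1, 1, 1,
      1, 0, 1,
      1, 1, 1], 0),
    ([0, 1, 1,
      1, 0, 1,
      1, 1, 1], 0),
]

def distance(a, b):
    """Sum of squared pixel differences between two images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(pixels):
    """Return the label of the most similar stored example."""
    _, label = min(examples, key=lambda pair: distance(pair[0], pixels))
    return label

# A new, slightly messy "1" that matches no stored example exactly.
new_image = [0, 1, 0,
             0, 1, 0,
             0, 1, 1]

print(classify(new_image))  # 1
```

Notice what is missing: there is no line anywhere that says what a 1 or a 0 looks like. The only knowledge in the program is the labeled examples themselves.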
It helps to name what we are actually after. We want a function: something that takes those 784 brightness numbers in and gives back ten numbers out, one score per possible digit, with the highest score being the answer. The twist is that we are not going to write that function by hand. We are going to let the computer build it from the labeled examples.
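As a sketch of that shape, and nothing more: the code below pins down the 784-in, 10-out contract and the "highest score wins" step. The scoring function `placeholder_scores` is a hypothetical stand-in that returns made-up numbers and ignores the image entirely. The real function is what the examples are supposed to build.

```python
from typing import Callable, List

Image = List[float]   # 784 brightness values
Scores = List[float]  # 10 scores, one per digit 0 through 9

def predict_digit(score_fn: Callable[[Image], Scores], pixels: Image) -> int:
    """Run the scoring function and return the digit with the highest score."""
    if len(pixels) != 784:
        raise ValueError("expected 784 brightness values")
    scores = score_fn(pixels)
    if len(scores) != 10:
        raise ValueError("expected 10 scores")
    return max(range(10), key=lambda digit: scores[digit])

def placeholder_scores(pixels: Image) -> Scores:
    """Hypothetical stand-in: made-up scores that ignore the image entirely."""
    return [0.1, 0.0, 0.2, 0.9, 0.0, 0.3, 0.0, 0.1, 0.2, 0.0]

blank_image = [0.0] * 784
print(predict_digit(placeholder_scores, blank_image))  # 3
```

The interesting part is everything `placeholder_scores` leaves out. Replacing those ten invented numbers with scores that actually depend on the pixels is the job the labeled examples take over.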
What is inside that function, how it is structured, and how the computer actually shapes it from examples, is the work of the next several lessons. For now, hold on to just the reframe: we moved from “describe the answer” to “show examples and learn the answer.” That single shift is the door into everything else in this track.
Why this matters when you use AI
Almost every AI tool you have touched, the chat assistants, the photo search, the voice transcription, the spam filter quietly working in the background, is built on this same idea. Not one of them is a giant pile of rules a person sat down and wrote. They are all, underneath, systems that were shown enormous numbers of examples and learned the patterns themselves.
That one fact explains a lot of what feels strange about modern AI. It is uncannily good at fuzzy, human things, like telling a cat from a dog or catching the tone of a sentence, precisely because those are things we could never have written clean rules for anyway. And it can be oddly brittle at the edges, confidently wrong on an example unlike anything it was shown, because it only ever knew the examples, never a rule. Once you see that these systems learned from examples rather than followed instructions, their strengths and their blind spots stop being mysterious and start making sense.
Common pitfalls
The mistakes here are not technical, because there is no technique yet. They are misconceptions about the framing itself.
Thinking modern AI is a huge list of human-written rules. It is the opposite. The whole point of the shift is that nobody wrote the rules for recognizing a 3. The system found the pattern from examples.
Thinking the hard part is the seeing. The seeing is the easy part; your eyes do it instantly. The hard part is the specifying, putting into exact words what makes a 3 a 3. That is what defeats the rule-writer.
Thinking “just write more rules” would eventually work. It feels like you are one clause away from a complete rule, always. You are not. Real handwriting has endless variation, and a finite list of rules will never close the gap.
Underestimating what a pile of labeled examples can do. It is tempting to assume examples alone could not possibly be enough and that real intelligence must need hand-coded knowledge. The surprising lesson of the field is how far examples alone can take you.
What you should remember
- Recognizing a handwritten digit is effortless for you and brutally hard to write as rules. That gap between doing and explaining is the reason neural networks exist.
- To a computer, an image is just a list of brightness numbers (often 784 of them for a 28 by 28 image), with no shapes or curves anywhere inside, and the same digit lands on wildly different numbers each time.
- Handwritten digits are the classic first problem because the input and output are small, the task is genuinely hard but clearly solvable, and the approach scales to faces, scans, and far beyond.
- The paradigm shift is the whole point: stop writing rules, start showing labeled examples, and let the system find the pattern. We are after a function from 784 numbers to 10, built from examples rather than written by hand.
Modern AI exists because we stopped writing rules and started showing examples.
Next: the cheatsheet puts this opener on one page, and the references link Grant Sanderson’s video if you want to watch the idea unfold. Then lesson 2 cracks open that function from 784 numbers to 10 and shows what is actually inside it.