Practice: The handwritten-digit problem
Self-check
Six short questions. Answer each one in your head (or on paper) before opening the collapsible. Trying to retrieve the answer is where the learning sticks; rereading feels productive but does much less.
1. To a computer, what is a handwritten digit, before any recognition happens?
Show answer
A grid of pixels, where each pixel is just a number for how bright that little square is (0 for a fully black square, 1 for a fully white one). A common setup is 28 by 28, which is 784 brightness numbers and nothing else. There are no curves, loops, or shapes in there. Only the list of numbers.
2. Why does the same digit produce such different numbers each time it is written?
Show answer
Because the numbers are tied to exact pixel positions. Shift the digit a few squares, make it bigger, thinner, or more slanted, and the bright values land in different positions in the list, even though your eye sees the same digit. The thing that is “the same” to you is, at the level of raw numbers, different every time.
3. The lesson says the hard part of digit recognition is not the seeing. So what is it?
Show answer
The specifying. Your eyes recognize the digit instantly; that part is easy. The hard part is putting into exact words what makes a 3 a 3 and not an 8 or a hurried 2. The moment you try to write that down as rules, the easy thing turns brutally hard.
4. Why are handwritten digits an unusually good first problem to learn from? (The lesson gives four reasons; name any two.)
Show answer
Any two of: the input is small and fixed (784 numbers), the output is small (only ten possible answers, 0 through 9), it is genuinely hard yet clearly solvable (rule-writing fails but a six-year-old reads digits effortlessly), and the approach travels (faces, medical scans, photo sorting are the same shape of problem: numbers in, a label out).
5. State the paradigm shift at the heart of this lesson in one sentence.
Show answer
Stop trying to write the rules for the answer, and instead show the system many labeled examples and let it find the pattern itself. You move from being the author of the answer to being the curator of the examples.
6. Someone says: “Modern AI is impressive because engineers wrote millions of detailed rules to cover every case.” What is wrong with that?
Show answer
It is backwards. The whole point of the shift is that nobody wrote the rules. These systems were shown enormous numbers of labeled examples and learned the patterns themselves. That is also why they are uncannily good at fuzzy human tasks (which we could never have written clean rules for) and oddly brittle at the edges (because they only ever knew examples, never a rule).
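Answers 1 and 2 can be checked by hand in a few lines of Python. This is only a sketch under stated assumptions: the "image" is a made-up vertical stroke rather than real handwriting, and the helper names (`draw_vertical_stroke`, `flatten`, `count_differences`) are hypothetical. It shows that a 28 by 28 grid really is a flat list of 784 numbers, and that moving the same stroke three squares to the right leaves none of the bright values in their old positions.

```python
SIZE = 28  # 28 x 28 grid, as in answer 1

def draw_vertical_stroke(column, top=6, bottom=22):
    """Return a grid (0.0 = black, 1.0 = white) with a one-pixel-wide stroke."""
    grid = [[0.0] * SIZE for _ in range(SIZE)]
    for row in range(top, bottom):
        grid[row][column] = 1.0
    return grid

def flatten(grid):
    """Turn the 28 x 28 grid into the flat list of brightness numbers."""
    return [value for row in grid for value in row]

def count_differences(a, b):
    """Count the positions where two flat lists disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)

original = flatten(draw_vertical_stroke(column=13))
shifted = flatten(draw_vertical_stroke(column=16))  # same stroke, three squares right

# Positions that are bright in both versions of the "same" stroke.
shared_bright = sum(1 for x, y in zip(original, shifted) if x == 1.0 and y == 1.0)

print(len(original))                         # 784
print(count_differences(original, shifted))  # 32
print(shared_bright)                         # 0
```

The stroke has 16 bright pixels. After the shift, all 16 old positions went dark and 16 new ones lit up, so the two lists share no bright position at all, even though a person would call both images the same stroke.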
Try it yourself, part 1: write a rule and watch it break
This is the lesson’s central experience, done with your own hands. About 10 minutes, pen and paper. No tools, no computer, no cost.
Setup. You are going to write a rule that distinguishes a handwritten 1 from a handwritten 7. Just those two. It sounds almost too easy.
Step 1. Before reading further, write down the most complete rule you can for telling a 1 from a 7. Be specific enough that someone could follow it without seeing your examples.
Step 2. Now test your rule against these five real-world variations. For each one, decide honestly whether your rule gets it right:
- A 7 written with a short horizontal crossbar through the stem (common in Europe).
- A 1 written as a single vertical stroke, no serif, no base.
- A 1 written with a long upward flag at the top, so it has a diagonal stroke much like a 7’s top.
- A 7 written fast, with the top stroke barely longer than the down-stroke.
- A 1 with a full serif at the top and a base at the bottom, so it has three strokes.
Step 3. Count how many of the five your original rule handles cleanly. Then try to patch it so it covers all five.
What you should notice
Almost everyone’s first rule (something like “a 7 has a horizontal top stroke, a 1 is just a vertical line”) misses at least two of the five. The flagged 1 and the serifed 1 are the usual troublemakers: the flagged 1 has a diagonal that looks like a 7’s top, and the serifed 1 has a horizontal-ish stroke too.
When you patch the rule, watch what happens: each patch (“ignore short flags,” “a crossbar counts only if it crosses the stem”) invites a new variation you did not plan for. You can feel the rules starting to pile up without ever quite closing the gap. That feeling, one patch away from done and never actually done, is exactly the wall the lesson is about. It is not a failure of your effort. It is a sign that rule-writing is the wrong tool for this kind of problem.
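If you would like to see the same wall in code after doing the paper exercise, here is one possible sketch. Everything in it is an assumption made for illustration: the five variations are described by hand-assigned, made-up measurements (`top` is the length of the top stroke relative to the down-stroke), and `rule_v1`, `rule_v2`, and `rule_v3` are hypothetical rules of the kind people tend to write, not rules taken from the lesson.

```python
# Hypothetical hand-assigned descriptions of the five variations from Step 2.
# "top" = length of the top stroke relative to the down-stroke (0 = none).
# All values are invented for illustration.
SAMPLES = [
    {"name": "crossbarred 7", "top": 0.5, "crossbar": True,  "label": 7},
    {"name": "plain 1",       "top": 0.0, "crossbar": False, "label": 1},
    {"name": "flagged 1",     "top": 0.5, "crossbar": False, "label": 1},
    {"name": "fast 7",        "top": 1.1, "crossbar": False, "label": 7},
    {"name": "serifed 1",     "top": 0.2, "crossbar": False, "label": 1},
]

def rule_v1(sample):
    """First attempt: any top stroke at all means 7."""
    return 7 if sample["top"] > 0 else 1

def rule_v2(sample):
    """Patch: ignore short flags and serifs."""
    return 7 if sample["top"] > 0.3 else 1

def rule_v3(sample):
    """Patch again: a long top stroke only counts when there is a crossbar."""
    return 7 if sample["top"] > 0.3 and sample["crossbar"] else 1

def misses(rule):
    """Names of the samples the rule gets wrong."""
    return [s["name"] for s in SAMPLES if rule(s) != s["label"]]

for rule in (rule_v1, rule_v2, rule_v3):
    print(rule.__name__, "misses:", misses(rule))
# rule_v1 misses: ['flagged 1', 'serifed 1']
# rule_v2 misses: ['flagged 1']
# rule_v3 misses: ['fast 7']
```

Each patch repairs one miss, and the third one breaks a case that used to work. A cleverer patch could pass these five made-up samples, but a sixth variation would be waiting for it.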
Try it yourself, part 2: sort the tasks
Here are six tasks. Sort each into one of two buckets: a crisp rule would work well, or you would be better off showing examples. About 5 minutes.
- Compute the sales tax on a purchase, given the price and the tax rate.
- Decide whether a photo contains a dog.
- Put a list of customer names into alphabetical order.
- Tell whether a short product review is sarcastic.
- Convert a temperature from Celsius to Fahrenheit.
- Recognize a song from someone humming a few seconds of it.
Show answer
A crisp rule works well: 1 (sales tax is one multiplication), 3 (alphabetizing is a fully specified ordering), 5 (Celsius to Fahrenheit is one formula). These have answers you can write down exactly, the same way every time.
Better to show examples: 2 (dog-or-not), 4 (sarcasm), 6 (song-from-a-hum). These are fuzzy, high-variation, human-pattern tasks. Nobody can write a clean rule for what makes a photo “contain a dog” or a sentence “sarcastic,” for exactly the reason the digit lesson showed: the variation is endless and the pattern lives in the examples, not in any rule you could state.
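To make the contrast concrete, the three crisp-rule tasks (1, 3, and 5) each fit in a line or two of Python. This is a sketch with hypothetical function names, and it makes two assumptions the exercise does not specify: tax is rounded half-up to the cent, and alphabetical order ignores letter case.

```python
from decimal import Decimal, ROUND_HALF_UP

def sales_tax(price, rate):
    """Task 1: one multiplication. Rounding half-up to the cent is an assumption."""
    return (price * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def alphabetize(names):
    """Task 3: a fully specified ordering. Ignoring case is an assumption."""
    return sorted(names, key=str.casefold)

def celsius_to_fahrenheit(celsius):
    """Task 5: one formula, F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

print(sales_tax(Decimal("19.99"), Decimal("0.08")))  # 1.60
print(alphabetize(["carol", "Alice", "bob"]))        # ['Alice', 'bob', 'carol']
print(celsius_to_fahrenheit(100))                    # 212.0
```

There is no comparable short function for tasks 2, 4, and 6, which is exactly what puts them in the other bucket.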
The skill this drill builds is the discernment itself. Once you can feel which bucket a problem belongs in, you understand the most important thing this lesson teaches: which problems neural networks are for.
Flashcards
Ten cards. Click any card to reveal the answer. Use the Print flashcards button to lay out the full set as one card per page, ready to print or save as a PDF for offline review.
Q. To a computer, what is a handwritten digit image?
A grid of pixels, each one a number for how bright that square is. A 28 by 28 image is 784 brightness numbers, with no shapes or curves anywhere inside.
Q. Why does the same digit produce different numbers each time?
The numbers are tied to exact pixel positions. Shifting, resizing, or slanting the digit moves the bright values to different positions in the list, even though your eye sees the same digit.
Q. In digit recognition, what is the easy part and what is the hard part?
The seeing is easy; your eyes recognize the digit instantly. The hard part is the specifying: putting into exact words what makes a shape that digit and not another.
Q. Why does rule-writing fail for handwritten digits?
Any rule fits tidy digits and misses real ones. Each patch creates a new edge case. Real handwriting has endless variation, so a finite list of rules never closes the gap.
Q. What is the paradigm shift at the heart of this lesson?
Stop writing rules, start showing labeled examples. You move from being the author of the answer to being the curator of the examples, and let the system find the pattern.
Q. Name two reasons handwritten digits are a good first problem.
Any two: the input is small and fixed (784 numbers), the output is small (ten answers), it is hard but clearly solvable, and the approach generalizes to faces, scans, and photo sorting.
Q. What function are we ultimately after in this problem?
A function that takes 784 brightness numbers in and gives back 10 scores out, one per digit, with the highest score being the answer, built from examples rather than written by hand.
Q. What does a “label” mean in learning from examples?
The known correct answer attached to a training example, such as “this image is a 3.” Labeled examples are what the system learns the pattern from.
Q. Why is example-trained AI strong at fuzzy tasks but brittle at the edges?
It is strong on fuzzy human tasks because those resist clean rules anyway and live in the examples. It is brittle on inputs unlike anything it was shown, because it only ever knew examples, never a stated rule.
Q. What kinds of problems are NOT a good fit for learning from examples?
Tasks with a crisp, fully specifiable rule, such as computing sales tax, alphabetizing names, or converting units. A formula does the job; examples are overkill.
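The card about the function we are after (784 brightness numbers in, 10 scores out, highest score wins, built from examples) can be sketched in plain Python. Two loud assumptions: the training data here is synthetic (each fake "digit" simply lights up its own band of pixels, so it is not real handwriting), and the learning method is a nearest-average comparison, a deliberately simple stand-in and not a neural network. All names are hypothetical. What it shares with the real thing is the shape: nobody writes a rule about what a 3 looks like, and the behavior comes entirely from labeled examples.

```python
import random

NUM_PIXELS = 784  # 28 x 28 brightness values
NUM_DIGITS = 10   # possible answers, 0 through 9

def make_fake_example(digit, rng):
    """Synthetic stand-in for a labeled image (NOT real handwriting).
    Each fake digit lights up its own band of 70 pixels over dim noise."""
    pixels = [rng.uniform(0.0, 0.2) for _ in range(NUM_PIXELS)]
    start = digit * 70
    for i in range(start, start + 70):
        pixels[i] = rng.uniform(0.8, 1.0)
    return pixels, digit  # (784 numbers, label)

def train(examples):
    """Average the examples for each label. No rule about shapes is written.
    Assumes every label 0..9 appears at least once."""
    sums = [[0.0] * NUM_PIXELS for _ in range(NUM_DIGITS)]
    counts = [0] * NUM_DIGITS
    for pixels, label in examples:
        counts[label] += 1
        for i, value in enumerate(pixels):
            sums[label][i] += value
    return [[total / counts[d] for total in sums[d]] for d in range(NUM_DIGITS)]

def scores(model, pixels):
    """784 numbers in, 10 scores out. A higher score means a closer match
    (negative squared distance to that label's average image)."""
    return [-sum((p - m) ** 2 for p, m in zip(pixels, mean)) for mean in model]

def predict(model, pixels):
    """The answer is the label with the highest score."""
    s = scores(model, pixels)
    return s.index(max(s))

rng = random.Random(0)
training = [make_fake_example(d, rng) for d in range(NUM_DIGITS) for _ in range(20)]
model = train(training)

new_pixels, true_label = make_fake_example(3, rng)
print(len(scores(model, new_pixels)))  # 10
print(predict(model, new_pixels), true_label)
```

Hand this model an input unlike any of its examples and it will still return ten scores and pick the largest, with nothing underneath to tell it the input is unfamiliar. That is the brittleness the ninth card describes.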