Practice: Where deep learning breaks

Self-check

Six short questions. Try to answer each one in your head (or on paper) before reading its answer. Active retrieval is where the learning sticks; rereading feels productive but does much less.

1. Why are these four limitations not “temporary bugs someone forgot to fix”?

Answer

Each one falls directly out of how these systems work: they learn patterns from examples, and nothing more. Data hunger, brittleness, data-slant, and opacity are consequences of that approach. Research can lessen them over time, but they cannot simply be patched away.

2. Why does deep learning typically need so much data, and what is the contrast with a child?

Answer

Learning patterns purely from examples is the only thing it does, so it needs many examples to generalize. A child can learn what a giraffe is from one picture book; a network may need thousands of giraffe images, and with only a few hundred it tends to memorize them rather than learn the general idea. This makes deep learning a poor fit where data is scarce or expensive to collect.

3. What is an adversarial example, and what does it reveal about how the network “sees”?

Answer

An adversarial example is an input deliberately altered, by an amount too small for a human to notice, so that the model's answer flips. Take an image correctly classified as a panda, add carefully chosen noise (invisible to a human), and the network confidently calls it a gibbon. This reveals that the network was keying on fragile statistical patterns, not on the concept of a panda. The same fragility shows up on out-of-distribution inputs (anything unlike the training data), where the model usually still gives a confident answer rather than signaling that it is unsure.
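For readers who want to see the mechanism, here is a toy sketch, not the real panda classifier. It assumes a hypothetical linear model that scores 10,000 input features (think pixels) with random weights, and applies the fast gradient sign method (FGSM): because the gradient of a linear score with respect to the input is just the weight vector, nudging every feature by at most `eps` against the sign of its weight lowers the score as fast as possible. Thousands of tiny changes add up to a large swing. The labels "panda" and "gibbon" are only names for the two sides of the decision.

```python
import random

random.seed(0)
N = 10_000  # number of input features ("pixels") in this hypothetical model

# Hypothetical learned weights of a linear scorer: score = sum(w_i * x_i).
w = [random.gauss(0, 1) for _ in range(N)]
x = [random.gauss(0, 1) for _ in range(N)]

def score(v):
    return sum(wi * vi for wi, vi in zip(w, v))

def label(v):
    return "panda" if score(v) > 0 else "gibbon"

# Shift x along w so the model scores it as clearly "panda" (score = 5.0).
shift = (5.0 - score(x)) / score(w)  # score(w) is the sum of w_i squared
x = [xi + shift * wi for xi, wi in zip(x, w)]

# FGSM step: move each feature by eps against the sign of its weight.
# No single feature changes by more than eps.
eps = 0.01
x_adv = [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

print(label(x), round(score(x), 2))
print(label(x_adv), round(score(x_adv), 2))
print("largest per-feature change:", max(abs(a - b) for a, b in zip(x_adv, x)))
```

The clean input scores 5.0 and is labeled "panda"; the perturbed one lands far on the other side of zero, even though no feature moved by more than 0.01. Real networks are not linear, but this is one common intuition for why high-dimensional inputs leave so much room for invisible attacks.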

4. Why does a model “inherit the slant of its data”? Frame it as a technical property.

Answer

A model learns the patterns present in its training data, including the imbalances. If the data over-represents some cases and under-represents others, the model is simply more accurate on what it saw more of. It is a mirror of its data: a skewed sample produces uneven performance, working well on common cases and stumbling on rare ones. This is a property of learning from examples, not of anyone's intent.
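One practical consequence is that a single overall accuracy figure can hide the unevenness. The sketch below uses invented evaluation results (hypothetical numbers, not from any real model) for a "common" group that dominated training and a "rare" group that did not, and breaks accuracy down per group.

```python
# Hypothetical evaluation results: (group, was_prediction_correct).
# "common" dominated the training data; "rare" was under-represented.
results = (
    [("common", True)] * 900 + [("common", False)] * 30
    + [("rare", True)] * 45 + [("rare", False)] * 25
)

def accuracy(rows):
    # True counts as 1 and False as 0, so the sum is the number correct.
    return sum(ok for _, ok in rows) / len(rows)

overall = accuracy(results)
by_group = {
    g: accuracy([r for r in results if r[0] == g])
    for g in ("common", "rare")
}

print(f"overall: {overall:.1%}")
for g, acc in by_group.items():
    print(f"{g}: {acc:.1%}")
```

With these made-up numbers, overall accuracy is 94.5%, which looks fine, while the rare group sits at about 64%. Because the rare cases are only 7% of the test set, they barely move the headline figure.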

5. Why is a model’s confidence unreliable, and what is hallucination?

Answer

A model’s behavior lives in millions of opaque weights. It produces an output but cannot certify that the output is correct, and it can be confidently wrong (as in the panda-gibbon flip). Confidence and correctness are different things. Hallucination is the generative version of the same problem: a text generator produces fluent, confident, fabricated output (such as a citation to a paper that does not exist) because it generates what looks like its training data, not output checked against the truth.
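Here is a minimal sketch of why a confidence number says nothing about correctness, assuming a classifier that reports confidence as a softmax over its raw scores (logits), as many classifiers do. The logits below are hypothetical. The calculation only normalizes the scores; nothing in it consults the true answer, so a wrong prediction gets a near-certain confidence just as easily as a right one.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize to sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["panda", "gibbon", "cat"]
# Hypothetical logits for an image that is actually a panda.
logits = [1.0, 7.0, 0.5]

probs = softmax(logits)
best = max(range(len(classes)), key=lambda i: probs[i])
print(classes[best], f"{probs[best]:.3f}")
```

This prints `gibbon 0.996`: a confidently wrong answer, produced by exactly the same arithmetic that would have produced a confidently right one.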

6. What is the practical rule for using these tools with clear eyes?

Answer

Be skeptical of confident output, especially on novel or rare inputs; verify anything a generative model states as fact; and never read a model’s confidence as a guarantee. Used with clear eyes the tools are powerful; used as oracles they will eventually burn you.

Try it yourself: diagnose the failure, then reason about a fix

No math here. About 15 minutes of reasoning and writing.

Side effects: none. This is a thinking-and-writing exercise. No tools, no API calls, no costs.

Part A: name the limitation each failure illustrates.

For each situation, name which of the four limitations is at work (data hunger, brittleness, data-slant, or no-guarantees/opacity), and give a one-line reason.

1. An assistant cites a research paper, with authors and a year, that turns out not to exist.
2. An image classifier is rock-solid on ordinary photos but gives a confident, absurd label on a picture taken at a strange angle it never saw in training.
3. A speech recognizer trained mostly on one accent transcribes that accent well and noticeably worse on accents it rarely heard.
4. A model that worked great in a demo with thousands of examples does poorly on your niche task where you could only gather two hundred.

Answers

1. No guarantees / opacity (hallucination). It produced fluent, confident, fabricated output rather than output checked against the truth.
2. Brittleness. It gave a confident answer on an out-of-distribution input unlike its training data.
3. Data-slant. It is more accurate on what it saw more of; an uneven training sample gives uneven performance.
4. Data hunger. Too few examples to generalize, so it memorizes rather than learns the general idea.

Part B: “more data fixes bias.” True or false, and why?

Write two or three sentences. Be precise about what kind of data would actually help.

Model answer

False as stated. More data helps only if it is more representative data. A bigger but still-skewed sample reflects the same slant, so the model stays more accurate on the cases it saw more of. The issue is what the data contains, not merely how much of it there is.
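The arithmetic behind "more data is not the same as better data" fits in a few lines. The sketch below assumes a hypothetical training set of 900 common-case and 100 rare-case examples and uses the rare group's share of the data as a rough proxy for how much the model gets to learn about it. Scaling up the same skewed collection leaves the share unchanged; deliberately adding rare cases changes it.

```python
def rare_share(common, rare):
    """Fraction of the training set that comes from the under-represented group."""
    return rare / (common + rare)

# Hypothetical starting sample: 900 common-case examples, 100 rare-case ones.
common, rare = 900, 100
print(f"original:        {rare_share(common, rare):.0%} rare")

# "More data" collected the same skewed way: 10x bigger, same 90/10 split.
print(f"10x, same skew:  {rare_share(common * 10, rare * 10):.0%} rare")

# More representative data: deliberately collect 800 extra rare cases.
print(f"+800 rare cases: {rare_share(common, rare + 800):.0%} rare")
```

The first two lines both print 10%; only the targeted collection moves the rare group to 50%. Share is a crude proxy, and better representation does not by itself guarantee equal accuracy, but a bigger copy of the same skew cannot fix it.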

Flashcards

Ten cards, each a question followed by its answer.

Q. What are the four limitations of deep learning named in this lesson?

Data hunger, brittleness, data-slant (bias), and the lack of guarantees (with opacity). All four follow from learning patterns from examples.

Q. Why are these limitations not just temporary bugs?

They follow from the approach itself: learning patterns purely from examples. Research can lessen them over time, but they are deep properties, not bugs awaiting a quick fix.

Q. Why is deep learning hungry for data?

Learning patterns from examples is all it does, so it needs many examples to generalize. With too few, it memorizes them instead of learning the general idea.

Q. What is an adversarial example?

An input changed by a tiny, carefully chosen amount of noise (invisible to a human) that flips the network’s confident answer, revealing it keyed on fragile statistical patterns, not the concept.

Q. What is an out-of-distribution input, and how does a model handle it?

An input unlike the training data (an odd angle, an unseen case). The model usually gives a confident answer anyway rather than signaling uncertainty.

Q. Why does a model inherit the slant of its data? (Technical framing.)

It learns the patterns present in its training data, including the imbalances, so it is more accurate on what it saw more of. A skewed sample yields uneven performance. This is a property of learning from examples, not of intent.

Q. Does more data fix data-slant?

Only if it is more representative data. A bigger but still-skewed sample reflects the same slant. The issue is what the data contains, not just how much.

Q. Why can't a model guarantee or fully explain its output?

Its behavior lives in millions of opaque weights tuned by gradient descent, not in readable rules. It cannot certify correctness, its confidence is unreliable, and it is hard to say why it produced a given answer.

Q. What is hallucination, and why does it happen?

A generative model producing fluent, confident, fabricated output. It happens because the model produces what looks like its training data, not output checked against truth. Convincing and correct are different.

Q. What is the practical rule for using these tools?

Be skeptical of confident output (especially on novel or rare inputs), verify anything stated as fact, and never treat confidence as a guarantee. Powerful with clear eyes; dangerous as an oracle.