Summary: How few-shot examples teach in context

The model is frozen at inference, but the prompt is not. You cannot retrain a model by typing at it. You can shape its immediate behavior by putting task descriptions and worked examples in the input. The model reads them, picks up on the pattern, and continues the pattern when it gets to your real query. The term for this is in-context learning, and the word “learning” is overloaded here: the model’s weights have not changed.

Three vocabulary terms. Zero-shot is no examples (just ask). One-shot is one demonstration. Few-shot is several demonstrations. The conceptually important shift is from zero examples to at least one.
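
To make the three terms concrete, here is a minimal sketch of a prompt builder. The function name build_prompt, the Input:/Output: layout, and the sentiment demonstrations are all assumptions made up for illustration; the lesson does not prescribe any of them. The only thing that differs between zero-shot, one-shot, and few-shot is how many demonstrations are passed in.

```python
from typing import Sequence, Tuple


def build_prompt(instruction: str,
                 examples: Sequence[Tuple[str, str]],
                 query: str) -> str:
    """Assemble instruction, demonstrations, and the real query into one prompt.

    The Input:/Output: layout is an illustrative choice, not a required format.
    """
    parts = [instruction]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The query reuses the demonstration layout and leaves the output blank,
    # so the natural continuation of the pattern is the answer.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical demonstrations, invented for this sketch.
DEMOS = [
    ("The battery died in an hour.", "negative"),
    ("Setup took thirty seconds.", "positive"),
    ("It arrived on Tuesday.", "neutral"),
]
INSTRUCTION = "Classify the sentiment of the input as positive, negative, or neutral."
QUERY = "The screen cracked on day two."

zero_shot = build_prompt(INSTRUCTION, [], QUERY)        # no demonstrations
one_shot = build_prompt(INSTRUCTION, DEMOS[:1], QUERY)  # one demonstration
few_shot = build_prompt(INSTRUCTION, DEMOS, QUERY)      # several demonstrations
```

Printing few_shot shows the instruction, three worked pairs, and then the query with an empty Output: slot. Nothing here calls a model; the sketch only shows what changes in the input.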

Why few-shot works. Pretrained LLMs internalize a huge variety of patterns during training. Examples in the prompt help the model select which of those patterns you want to invoke, and in what format. The examples do not teach the model new facts; they cue existing capabilities.

When examples help and when they don’t. Few-shot is reliable when your examples convey a stable format or disambiguate categories that zero-shot handles inconsistently. Few-shot cannot create knowledge the model does not already have, cannot rescue tasks far outside the model’s training distribution, and is sometimes outperformed by an explicit instruction on complex reasoning tasks.

This summary is the scan-it-in-five-minutes version. The full lesson covers the practical recipe for writing few-shot prompts, the format-versus-rule heuristic for choosing between examples and instructions, and the common pitfalls.

Core ideas

In-context learning is using the prompt to shape immediate behavior. Weights do not change. The effect lasts only for this inference call.
Zero-shot, one-shot, few-shot. Zero-shot is no examples. One-shot is a single demonstration; few-shot is several.
Why it works. Pretrained models have absorbed many task patterns. Examples cue which pattern to use and what format the output should take.
Three to five examples is the sweet spot for most tasks. One is unstable; ten is rarely better than five and may steer the model toward a narrow pattern that it copies too literally.
Examples should be diverse, representative, and consistent in format. Cover the range of expected outputs. Vary irrelevant dimensions to avoid the model latching onto an accidental pattern.
Format-versus-rule heuristic. If examples are conveying a format, few-shot is the right tool. If they are conveying a rule the model has to infer, an explicit instruction may serve better.
Modern reasoning models change the calculus. Detailed natural-language instructions can sometimes outperform pure few-shot on hard reasoning tasks. The literature is still developing.
Limits of few-shot. Cannot create knowledge the model does not have. Cannot rescue tasks far outside training distribution. Cannot replace product-design choices about what the model should be doing.
Pitfall: thinking the model “learned.” It did not. Same model, same weights, different context.
Pitfall: assuming more examples are always better. They are not. Past three to five, returns diminish; past ten, you may confuse the model.
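
The guidance above on example count, coverage, and consistent format can be turned into a quick pre-flight check. What follows is a rough sketch, not a standard tool: the function name check_examples, the warning wording, and the sample data are hypothetical, and the three-to-five range is simply the rule of thumb from this summary.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def check_examples(examples: Sequence[Tuple[str, str]],
                   expected_labels: Sequence[str]) -> List[str]:
    """Return human-readable warnings about a few-shot example set."""
    warnings = []

    # Count: the summary's rule of thumb is three to five demonstrations.
    n = len(examples)
    if n < 3:
        warnings.append(f"only {n} example(s); three to five is the usual range")
    elif n > 5:
        warnings.append(f"{n} examples; returns usually diminish past five")

    # Coverage: every expected output should appear at least once.
    seen = Counter(label for _, label in examples)
    missing = [label for label in expected_labels if label not in seen]
    if missing:
        warnings.append("no example for: " + ", ".join(missing))

    # Consistency: an output outside the expected set often signals a
    # formatting slip such as different casing.
    unexpected = sorted(set(seen) - set(expected_labels))
    if unexpected:
        warnings.append("unexpected outputs: " + ", ".join(unexpected))

    return warnings


LABELS = ["positive", "negative", "neutral"]

# Hypothetical example sets, invented for this sketch.
good_set = [
    ("Setup took thirty seconds.", "positive"),
    ("The battery died in an hour.", "negative"),
    ("It arrived on Tuesday.", "neutral"),
]
bad_set = [
    ("Love it.", "positive"),
    ("Great value.", "Positive"),  # inconsistent casing
]

good_warnings = check_examples(good_set, LABELS)
bad_warnings = check_examples(bad_set, LABELS)
```

A check like this catches mechanical problems only. Whether the examples are representative of real inputs, or vary the irrelevant dimensions enough, still needs a human read.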

What changes for you

After this lesson, when an AI app “just understands” what you want on the first try, you have language for what happened: in-context learning, cued by your phrasing or by an example you included. You also have a recipe for when zero-shot becomes unreliable: add three to five clean, representative examples, and the model’s behavior usually stabilizes. And when few-shot is not enough, you know to reach for an explicit instruction or for chain-of-thought (next lesson) rather than throwing more examples at the problem.
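
One possible way to notice that zero-shot has become unreliable is to measure how often outputs land in the format you expect. The sketch below assumes a hypothetical metric called format_conformance and uses a stub with canned responses in place of a real model call, so the numbers say nothing about any actual model. The lesson itself does not define how unreliability should be measured.

```python
from typing import Callable, Sequence


def format_conformance(model: Callable[[str], str],
                       prompts: Sequence[str],
                       allowed: Sequence[str]) -> float:
    """Fraction of prompts whose output is exactly one of the allowed strings."""
    if not prompts:
        raise ValueError("need at least one prompt")
    allowed_set = set(allowed)
    hits = sum(1 for prompt in prompts if model(prompt).strip() in allowed_set)
    return hits / len(prompts)


# Canned responses standing in for a real model; entirely made up.
CANNED = {
    "review A": "positive",
    "review B": "The sentiment is negative.",  # right idea, wrong format
    "review C": "neutral",
    "review D": "Negative!",                   # wrong casing and punctuation
}


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an API call; looks up a canned response."""
    return CANNED[prompt]


score = format_conformance(stub_model, list(CANNED),
                           ["positive", "negative", "neutral"])
```

With a real model behind the callable, you could run the same check on zero-shot prompts and again on few-shot prompts, and compare the two scores on your own inputs.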

The model is frozen. The prompt is not.
Examples in the prompt cue patterns the model already knows.
Zero-shot when the task is clear, few-shot when zero-shot is unreliable, instructions when the rule is hard to infer.