The model is frozen at inference.
Examples in the prompt do not teach it.
They cue patterns it already learned during training.
| Term | What’s in the prompt before your real query |
| --- | --- |
| Zero-shot | Nothing. Just the task description and the query. |
| One-shot | One worked example showing input-output. |
| Few-shot | Multiple worked examples (typically 3 to 5). |
The conceptually important shift is from zero (no examples) to nonzero (some examples). The line between one-shot and few-shot is mostly convention.
Pretrained LLMs absorbed many task patterns during training.
Few-shot examples help the model SELECT which pattern to invoke
and in what format. The model is not learning new facts; it is
being cued into existing capabilities.
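The three regimes differ only in how many demonstrations sit between the task description and the real query. A minimal sketch of that idea, assuming a plain "Input:/Output:" layout; the function names, the layout, and the sentiment task are all hypothetical and not part of the lesson:

```python
from typing import Sequence, Tuple


def shot_regime(examples: Sequence[Tuple[str, str]]) -> str:
    """Name the prompting regime implied by the number of demonstrations."""
    if len(examples) == 0:
        return "zero-shot"
    if len(examples) == 1:
        return "one-shot"
    return "few-shot"


def build_prompt(task: str, examples: Sequence[Tuple[str, str]], query: str) -> str:
    """Assemble the task description, optional demonstrations, then the real query."""
    blocks = [task.strip()]
    for example_input, example_output in examples:
        # Every demonstration uses the same layout so the model can mimic it.
        blocks.append(f"Input: {example_input}\nOutput: {example_output}")
    # The real query goes last, with the output slot left open for the model.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical sentiment task, used only for illustration.
    demos = [("I loved it.", "positive"), ("Total waste of money.", "negative")]
    print(shot_regime(demos))
    print(build_prompt("Label the sentiment of each review.", demos, "It was fine, I guess."))
```

Nothing here touches the model: the same weights receive a longer or shorter context, which is the whole difference between the regimes.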
| Failure mode | Why |
| --- | --- |
| Task requires unknown facts | Examples cannot manufacture knowledge that’s not in the weights |
| Examples share an accidental feature (same start word, same topic) | Model locks onto the accidental pattern, not the intended one |
| Task is far outside training distribution | A few demonstrations cannot bridge a gap of millions of missing pretraining samples |
1. Use 3 to 5 examples by default.
2. Make examples diverse (cover the range of outputs).
3. Make examples representative (real-query length and style).
4. Format consistently (the model will mimic the format).
5. Vary irrelevant dimensions (avoid accidental patterns).
6. Place the real query last (recent context weighs more).
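Some of the guidelines above can be checked mechanically before a prompt is sent. The sketch below covers three of them (example count, label coverage, and one kind of accidental pattern: a shared first word). The function `lint_examples`, its thresholds, and the sentiment labels are hypothetical; this is a rough aid, not a complete validator:

```python
from typing import List, Sequence, Tuple


def lint_examples(
    examples: Sequence[Tuple[str, str]],
    labels: Sequence[str],
    min_count: int = 3,
    max_count: int = 5,
) -> List[str]:
    """Return human-readable warnings for a set of (input, label) examples."""
    warnings: List[str] = []

    # Guideline 1: use 3 to 5 examples by default.
    if not (min_count <= len(examples) <= max_count):
        warnings.append(
            f"expected {min_count} to {max_count} examples, got {len(examples)}"
        )

    # Guideline 2: cover the range of outputs.
    demonstrated = {label for _, label in examples}
    missing = [label for label in labels if label not in demonstrated]
    if missing:
        warnings.append("labels never demonstrated: " + ", ".join(missing))

    # Guideline 5: flag one accidental pattern, a shared first word.
    first_words = {text.split()[0].lower() for text, _ in examples if text.strip()}
    if len(examples) > 1 and len(first_words) == 1:
        warnings.append(f"every example starts with '{next(iter(first_words))}'")

    return warnings


if __name__ == "__main__":
    # Hypothetical label set and examples, for illustration only.
    labels = ["positive", "negative", "neutral"]
    examples = [
        ("The food was great.", "positive"),
        ("The service was slow.", "negative"),
    ]
    for warning in lint_examples(examples, labels):
        print(warning)
```

Representativeness and topic diversity are harder to test automatically and still need a human read of the examples.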
| If examples are conveying… | Reach for… |
| --- | --- |
| A format (output shape, label set, phrasing) | Few-shot. Cheap and reliable. |
| A rule the model has to infer | An explicit instruction. |
| Multi-step reasoning | An instruction plus chain-of-thought (next lesson). |
| Both rule + format | Hybrid: instruction first, then 1 to 2 illustrative examples. |
Reach for few-shot when:
Format consistency is the point (JSON shape, one-word label, specific structure).
The categories or labels are unfamiliar enough that an instruction would have to define them anyway.
The model is older or smaller and has weaker reasoning capability.
Reach for an explicit instruction when:
The rule is concise and explicit (a definition, a step-by-step procedure).
The model is reasoning-capable and can follow written rules.
You want the model to generalize to inputs that look different from the examples.
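The choice between few-shot, instruction, and hybrid can be written down as a small decision function. This is only a sketch of the guidance above: the function name and boolean flags are hypothetical, and checking multi-step reasoning first is an assumption about precedence that the lesson does not state.

```python
def recommend_technique(
    needs_format: bool, needs_rule: bool, multi_step: bool = False
) -> str:
    """Map what the prompt must convey to a prompting technique."""
    if multi_step:
        # Assumed to take precedence over the other two flags.
        return "instruction plus chain-of-thought"
    if needs_format and needs_rule:
        return "hybrid: instruction first, then 1 to 2 illustrative examples"
    if needs_rule:
        return "explicit instruction"
    if needs_format:
        return "few-shot"
    # Nothing special to convey: the task description alone may be enough.
    return "zero-shot"


if __name__ == "__main__":
    print(recommend_technique(needs_format=True, needs_rule=False))
    print(recommend_technique(needs_format=True, needs_rule=True))
```

Real tasks rarely reduce to three booleans, so treat the result as a starting point and confirm it by trying the prompt.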
Tag the email as bug-report, feature-request, account-issue,
Email: "When I click submit, the page hangs."
Email: "Could you add CSV export?"
Email: "I cannot update my billing email."
Email: "Do you offer a free trial?"
Email: <YOUR REAL QUERY HERE>
Each category appears once. Format is identical. Real query goes last.
| Pitfall | Reality |
| --- | --- |
| “Few-shot teaches the model.” | No. Same model, same weights, different context. |
| “More examples are always better.” | No. Past 5, diminishing returns. Past 10, can confuse or overfit. |
| “If zero-shot is unreliable, dump 20 examples.” | The first 3 do most of the work. After that, focus on diversity, not count. |
| “If few-shot doesn’t work, the model can’t do the task.” | Sometimes the issue is that the rule is hard to infer; an instruction may rescue it. |
In-context learning (ICL): using the prompt to shape immediate model behavior. No weights change. Effect is local to one inference call.
Zero-shot: prompting with no demonstrations. Just the task description and the query.
One-shot: prompting with one input-output demonstration before the query.
Few-shot: prompting with multiple demonstrations before the query (typically 3 to 5).
Hybrid prompt: an explicit instruction plus a small number of illustrative examples. Often beats pure-instruction or pure-example versions on hard tasks.
Plan-and-Solve Prompting: a 2023 technique showing instructions can outperform pure few-shot on multi-step reasoning. Cited in references.
The model is frozen. The prompt is not.
Examples in the prompt cue patterns the model already knows.
Zero-shot when the task is clear, few-shot when zero-shot is unreliable, instructions when the rule is hard to infer.