Cheatsheet: Prompt engineering, "Learn to Spell"
The toolkit (in rough order of leverage)
Section titled “The toolkit (in rough order of leverage)”| Technique | When to use |
|---|---|
| Clarity + specificity | Always. State audience, format, tone, constraints, edge cases. Length is fine; precision matters. |
| Format constraints | Whenever the output will be parsed. Use JSON / structured-output mode if the provider offers it. |
| Few-shot examples (2-5) | Format-sensitive or pattern-based tasks. Largest single lift for shape-sensitive outputs. |
| Chain-of-thought | Multi-step reasoning. Combine with a <thinking> block to hide reasoning from users. |
| System prompt as spec | Always for assistants. Persona + behavior + format + constraints + refusals. |
| Persona / tone | User-facing assistants. Sets style consistently. |
| Delimiters | Long inputs. Use triple backticks, XML tags, or unambiguous headers. |
| Critical instructions near the end | Long prompts. End placement followed more reliably. |
| Negative constraints | Sparingly. Long lists of negatives can backfire; reach for them when a specific failure recurs in testing. |
Prompt fix vs code fix vs capability ceiling
Section titled “Prompt fix vs code fix vs capability ceiling”Sample failures, then triage: Wrong INPUT (wrong context/fields/state) -> CODE FIX Wrong OUTPUT given correct input (misunderstanding, wrong format, missing constraint) -> PROMPT FIX (largest, cheapest) Persistent failure after tight prompt + correct input -> retrieval / tools / fine-tune / different modelThe discipline (turns it into engineering)
Section titled “The discipline (turns it into engineering)”- Version your prompts. Source control + a
prompt_versionconstant; treat changes as code changes with review. - Test on 20-50 real held-out examples. Score with regex, structured checks, model-as-judge, or human review. Change something; re-run; compare numbers.
A spreadsheet + a Python script + 20 real examples beats an elaborate eval pipeline you do not use.
Few-shot template
Section titled “Few-shot template”You are a [role / persona]. [Behavior spec; format constraints.]
Here are some examples:
Input: [example 1 input]Output: [example 1 output]
Input: [example 2 input]Output: [example 2 output]
Now do this:Input: [actual input]Output:Where prompts run out
Section titled “Where prompts run out”| Hitting limit on… | Reach for |
|---|---|
| Knowledge the model doesn’t have | Retrieval (lesson 4) |
| External systems the model can’t call | Tool use (lesson 4) |
| Persistent recurring failure cheap to train in | Fine-tuning (lesson 9) |
Reach for these after prompt iteration, not instead of it; the prompt is still the spec.
Respects the three productive limits (lesson 2)
Section titled “Respects the three productive limits (lesson 2)”- Context-efficient: concise prompts free budget for retrieved context / few-shot / history.
- Cost-efficient: shorter outputs save the more expensive per-token side; shorter inputs compound favorably at scale.
- Latency-efficient: shorter outputs are faster end-to-end (total ≈ TTFT + output/tps).
Words to use precisely
Section titled “Words to use precisely”- System prompt: persistent spec (persona, behavior, format, refusals).
- Few-shot: in-prompt input/output examples that teach format and pattern.
- Chain-of-thought: instruction to reason before answering; combine with hidden
<thinking>block. - Structured-output / JSON mode: provider feature constraining generation to a schema.
- Prompt version: the integer/string identifying the deployed prompt; tracked in source control.
Source
Section titled “Source”- Full Stack Deep Learning, LLM Bootcamp (Spring 2023): Learn to Spell: Prompt Engineering.
fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.