Skip to content

Cheatsheet: Prompt engineering, "Learn to Spell"

TechniqueWhen to use
Clarity + specificityAlways. State audience, format, tone, constraints, edge cases. Length is fine; precision matters.
Format constraintsWhenever the output will be parsed. Use JSON / structured-output mode if the provider offers it.
Few-shot examples (2-5)Format-sensitive or pattern-based tasks. Largest single lift for shape-sensitive outputs.
Chain-of-thoughtMulti-step reasoning. Combine with a <thinking> block to hide reasoning from users.
System prompt as specAlways for assistants. Persona + behavior + format + constraints + refusals.
Persona / toneUser-facing assistants. Sets style consistently.
DelimitersLong inputs. Use triple backticks, XML tags, or unambiguous headers.
Critical instructions near the endLong prompts. End placement followed more reliably.
Negative constraintsSparingly. Long lists of negatives can backfire; reach for them when a specific failure recurs in testing.

Prompt fix vs code fix vs capability ceiling

Section titled “Prompt fix vs code fix vs capability ceiling”
Sample failures, then triage:
Wrong INPUT (wrong context/fields/state) -> CODE FIX
Wrong OUTPUT given correct input (misunderstanding,
wrong format, missing constraint) -> PROMPT FIX (largest, cheapest)
Persistent failure after tight prompt
+ correct input -> retrieval / tools / fine-tune / different model

The discipline (turns it into engineering)

Section titled “The discipline (turns it into engineering)”
  1. Version your prompts. Source control + a prompt_version constant; treat changes as code changes with review.
  2. Test on 20-50 real held-out examples. Score with regex, structured checks, model-as-judge, or human review. Change something; re-run; compare numbers.

A spreadsheet + a Python script + 20 real examples beats an elaborate eval pipeline you do not use.

You are a [role / persona]. [Behavior spec; format constraints.]
Here are some examples:
Input: [example 1 input]
Output: [example 1 output]
Input: [example 2 input]
Output: [example 2 output]
Now do this:
Input: [actual input]
Output:
Hitting limit on…Reach for
Knowledge the model doesn’t haveRetrieval (lesson 4)
External systems the model can’t callTool use (lesson 4)
Persistent recurring failure cheap to train inFine-tuning (lesson 9)

Reach for these after prompt iteration, not instead of it; the prompt is still the spec.

Respects the three productive limits (lesson 2)

Section titled “Respects the three productive limits (lesson 2)”
  • Context-efficient: concise prompts free budget for retrieved context / few-shot / history.
  • Cost-efficient: shorter outputs save the more expensive per-token side; shorter inputs compound favorably at scale.
  • Latency-efficient: shorter outputs are faster end-to-end (total ≈ TTFT + output/tps).
  • System prompt: persistent spec (persona, behavior, format, refusals).
  • Few-shot: in-prompt input/output examples that teach format and pattern.
  • Chain-of-thought: instruction to reason before answering; combine with hidden <thinking> block.
  • Structured-output / JSON mode: provider feature constraining generation to a schema.
  • Prompt version: the integer/string identifying the deployed prompt; tracked in source control.
  • Full Stack Deep Learning, LLM Bootcamp (Spring 2023): Learn to Spell: Prompt Engineering. fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.