Prompt engineering: cheatsheet

The toolkit (in rough order of leverage)

Technique	When to use
Clarity + specificity	Always. State audience, format, tone, constraints, edge cases. Length is fine; precision matters.
Format constraints	Whenever the output will be parsed. Use JSON / structured-output mode if the provider offers it.
Few-shot examples (2-5)	Format-sensitive or pattern-based tasks. Largest single lift for shape-sensitive outputs.
Chain-of-thought	Multi-step reasoning. Combine with a `<thinking>` block to hide reasoning from users.
System prompt as spec	Always for assistants. Persona + behavior + format + constraints + refusals.
Persona / tone	User-facing assistants. Sets style consistently.
Delimiters	Long inputs. Use triple backticks, XML tags, or unambiguous headers.
Critical instructions near the end	Long prompts. End placement followed more reliably.
Negative constraints	Sparingly. Long lists of negatives can backfire; reach for them when a specific failure recurs in testing.

Prompt fix vs code fix vs capability ceiling

Sample failures, then triage:
  Wrong INPUT (wrong context/fields/state)              -> CODE FIX
  Wrong OUTPUT given correct input (misunderstanding,
    wrong format, missing constraint)                   -> PROMPT FIX  (largest, cheapest)
  Persistent failure after tight prompt
    + correct input                                     -> retrieval / tools / fine-tune / different model

The discipline (turns it into engineering)

Version your prompts. Source control + a prompt_version constant; treat changes as code changes with review.
Test on 20-50 real held-out examples. Score with regex, structured checks, model-as-judge, or human review. Change something; re-run; compare numbers.

A spreadsheet + a Python script + 20 real examples beats an elaborate eval pipeline you do not use.

Few-shot template

You are a [role / persona]. [Behavior spec; format constraints.]

Here are some examples:

Input: [example 1 input]
Output: [example 1 output]

Input: [example 2 input]
Output: [example 2 output]

Now do this:
Input: [actual input]
Output:

Where prompts run out

Hitting limit on…	Reach for
Knowledge the model doesn’t have	Retrieval (lesson 4)
External systems the model can’t call	Tool use (lesson 4)
Persistent recurring failure cheap to train in	Fine-tuning (lesson 9)

Reach for these after prompt iteration, not instead of it; the prompt is still the spec.

Respects the three productive limits (lesson 2)

Context-efficient: concise prompts free budget for retrieved context / few-shot / history.
Cost-efficient: shorter outputs save the more expensive per-token side; shorter inputs compound favorably at scale.
Latency-efficient: shorter outputs are faster end-to-end (total ≈ TTFT + output/tps).

Words to use precisely

System prompt: persistent spec (persona, behavior, format, refusals).
Few-shot: in-prompt input/output examples that teach format and pattern.
Chain-of-thought: instruction to reason before answering; combine with hidden <thinking> block.
Structured-output / JSON mode: provider feature constraining generation to a schema.
Prompt version: the integer/string identifying the deployed prompt; tracked in source control.

Source

Full Stack Deep Learning, LLM Bootcamp (Spring 2023): Learn to Spell: Prompt Engineering. fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.