Summary: Prompt engineering, "Learn to Spell"
Prompt engineering is the single highest-leverage application skill. Small word changes have outsized effects, and the prompt is the spec for what the assistant is, not an incantation. The toolkit: clarity and specificity (the biggest single lift on most prompts), format constraints (JSON / structured-output modes when offered), few-shot examples (2-5; largest lift for shape-sensitive tasks), chain-of-thought for multi-step reasoning, the system prompt as the spec, persona/tone for user-facing assistants, negative constraints used sparingly, delimiters to separate instructions from input, and placing critical instructions near the end of long prompts. A prompt fix beats a code fix when the model misunderstands what was wanted (correct input, wrong output). The discipline that turns prompting into engineering: version your prompts (source control + an explicit version constant) and test on 20-50 real held-out examples when changing them. Prompts run out at missing knowledge (retrieval, lesson 4), missing external systems (tool use, lesson 4), or persistent cheap-to-train-in failures (fine-tuning, lesson 9), but reach for these after prompt iteration, not instead of it.
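Several of the toolkit items above can be combined in one prompt: a specific instruction, a few examples that fix the output shape, delimiters around the untrusted input, and the critical constraint repeated at the end. The sketch below is one possible way to assemble such a prompt. The XML-style tags, the ticket-classification task, the `Example` class, and the `build_prompt` helper are all illustrative assumptions, not something the lesson prescribes.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot example: an input and the output we want for it."""
    text: str
    label: str


def build_prompt(instructions: str, examples: list[Example],
                 user_input: str, final_reminder: str) -> str:
    """Assemble instructions, few-shot examples, delimited input, and a closing reminder."""
    sections = [instructions.strip()]
    # Few-shot examples show the model the exact output shape we expect.
    for ex in examples:
        sections.append(
            "<example>\n"
            f"<input>{ex.text}</input>\n"
            f"<output>{ex.label}</output>\n"
            "</example>"
        )
    # Delimiters keep the user's text separate from our instructions.
    sections.append(f"<ticket>\n{user_input.strip()}\n</ticket>")
    # The critical constraint goes last, near the end of the prompt.
    sections.append(final_reminder.strip())
    return "\n\n".join(sections)


# Hypothetical task and examples, made up for illustration.
EXAMPLES = [
    Example("I was charged twice this month.", "billing"),
    Example("The app crashes when I open settings.", "bug"),
    Example("Can you add dark mode?", "feature_request"),
]

prompt = build_prompt(
    instructions=(
        "Classify the support ticket inside the <ticket> tags as exactly one of: "
        "billing, bug, feature_request."
    ),
    examples=EXAMPLES,
    user_input="My invoice shows the wrong amount.",
    final_reminder="Reply with the label only, in lowercase, with no explanation.",
)
print(prompt)
```

Keeping the assembly in one function means a change to the delimiters or the example count is a single, reviewable edit rather than a scattered string tweak.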
Core ideas
- The prompt is the spec. Treat it as precise specification work, not casting incantations. Small changes have outsized effects.
- The toolkit: clarity (most prompts want more precision, not less), format constraints (JSON/structured-output when offered), few-shot (2-5; biggest single lift for shape-sensitive tasks), chain-of-thought for multi-step reasoning, system prompt as the persistent spec, persona/tone for user-facing assistants, negative constraints used sparingly (long lists of negatives can backfire), delimiters between instructions and input, critical instructions near the end of long prompts.
- Prompt fix vs code fix vs capability ceiling. Triage by sampling failures: wrong input -> code fix; wrong output given correct input -> prompt fix (largest, cheapest category); persistent ceiling -> retrieval, tool use, fine-tuning, or different model.
- Discipline: version prompts in source control with an explicit prompt_version; test on 20-50 real held-out examples whenever you change them, scoring with regex, structured checks, model-as-judge, or human review. Vibes-driven tweaking is not engineering.
- Where prompts run out: missing knowledge (lesson 4 retrieval), missing external systems (lesson 4 tool use), persistent cheap-to-train-in failures (lesson 9 fine-tuning). Reach for these after prompt iteration.
- A good prompt respects all three productive limits from lesson 2: it is context-efficient, cost-efficient (concise in and out), and latency-efficient (shorter outputs finish streaming sooner).
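The triage rule in the list above (wrong input, wrong output given correct input, persistent ceiling) can be applied mechanically once a sample of failures has been labelled by hand. The following is a minimal sketch under stated assumptions: the field names `input_ok` and `persists_after_prompt_iteration`, the `triage` function, and the sample records are hypothetical, and the labels themselves still come from a human reading each failure.

```python
from collections import Counter

# Hypothetical hand-labelled sample of failures. In practice these labels
# come from reading real logs, not from code.
SAMPLED_FAILURES = [
    {"id": 1, "input_ok": False, "persists_after_prompt_iteration": False},
    {"id": 2, "input_ok": True, "persists_after_prompt_iteration": False},
    {"id": 3, "input_ok": True, "persists_after_prompt_iteration": False},
    {"id": 4, "input_ok": True, "persists_after_prompt_iteration": True},
]


def triage(failure: dict) -> str:
    """Map one labelled failure to the kind of fix it calls for."""
    if not failure["input_ok"]:
        # The model never saw the right input: fix the code that builds it.
        return "code fix"
    if failure["persists_after_prompt_iteration"]:
        # Correct input, and prompt changes did not help: look beyond the prompt
        # (retrieval, tool use, fine-tuning, or a different model).
        return "capability ceiling"
    # Correct input, wrong output: the prompt is the place to fix it.
    return "prompt fix"


tally = Counter(triage(f) for f in SAMPLED_FAILURES)
for category, count in tally.most_common():
    print(f"{category}: {count}")
```

A tally like this shows where the next hour of work should go before anyone reaches for a heavier tool.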
What changes for you
Prompt engineering is unglamorous and essential, and it is the lever most production-quality LLM applications you admire have pulled hardest. A team shipping new prompt versions weekly with tests outperforms a team with a “better” architecture and no prompt discipline. The discipline is also small: a spreadsheet, a Python script, and 20 real examples are enough to start; you do not need elaborate infrastructure to begin treating prompts as engineering rather than vibes. Phase 1 closes here, with the minimum app (L1), the working-picture foundations (L2), and the prompt-engineering toolkit (L3) together being the smallest complete loop a builder can iterate on. Phase 2 opens with augmented language models (retrieval and tool use), the first time the prompt gets context fetched from outside the model.
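The "Python script and 20 real examples" starting point might look roughly like the sketch below: a prompt with an explicit version constant, a handful of held-out examples, and a structured check on the output. Everything named here is an assumption for illustration. `call_model` is an offline stand-in for a real model call, the order-ID task and the three examples are made up, and a real run would use 20-50 examples taken from actual traffic.

```python
import json
import re
from string import Template

PROMPT_VERSION = "v3"  # hypothetical version label; bump on every prompt change
PROMPT_TEMPLATE = Template(
    "Extract the order ID from the message below.\n"
    'Reply with JSON only, in the form {"order_id": "<id>"}.\n\n'
    "<message>\n$message\n</message>"
)

# Made-up held-out examples: (input message, expected order ID).
HELD_OUT = [
    ("My order A123 arrived damaged.", "A123"),
    ("Where is Z900? It has been two weeks.", "Z900"),
    ("order b204 never arrived", "B204"),
]


def call_model(prompt: str) -> str:
    """Offline stand-in for a real model call (an assumption, not a real API)."""
    body = prompt.split("<message>")[1]
    match = re.search(r"\b[A-Z]\d{3}\b", body)
    return json.dumps({"order_id": match.group(0) if match else None})


def passes(output: str, expected: str) -> bool:
    """Structured check: output must be a JSON object with the expected order_id."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("order_id") == expected


def run_eval() -> dict:
    """Run every held-out example and report results tagged with the prompt version."""
    failures = []
    for message, expected in HELD_OUT:
        output = call_model(PROMPT_TEMPLATE.substitute(message=message))
        if not passes(output, expected):
            failures.append(message)
    return {
        "prompt_version": PROMPT_VERSION,
        "passed": len(HELD_OUT) - len(failures),
        "total": len(HELD_OUT),
        "failures": failures,
    }


report = run_eval()
print(report)
```

Recording the version alongside the pass count is what lets two prompt revisions be compared on the same examples; the failing messages are the input to the next round of triage.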
The prompt is the spec for what the assistant is, and the “Learn to Spell” joke is only half a joke: small word changes do move the model, and the highest-leverage discipline in application work is writing those words deliberately, versioning them, and testing the changes. Everything else in the track stacks on this skill.