How prompting works: cheatsheet

The one idea that matters

A prompt is conditioning data for the next-token loop.
The art is choosing the conditioning that makes the desired
response a likely continuation.

This lesson covers prompt mechanics and system prompts. Few-shot prompting is the next lesson; chain-of-thought is the one after that.

Three patterns dominate (named here, taught next)

Pattern	What you write	Where it’s covered in depth
Zero-shot	Just the instruction	This lesson, briefly. The simplest baseline.
Few-shot	A handful of input-output examples, then the new input	Next lesson (`in-context-learning-and-few-shot`)
Chain-of-thought (CoT)	“Let’s think step by step” or input-reasoning-output examples	Lesson after that (`chain-of-thought-prompting`)

Almost every prompt you write is one of these or a combination. This lesson stays inside what they all sit on top of: the input-token mechanic.

What instruction-tuning actually does

What changed in the weights	What did NOT change
The model is biased to produce response-shaped continuations when input is instruction-shaped	The next-token loop. The model is still picking tokens conditioned on the prefix.
Helpful, harmless, honest patterns from RLHF show up as default tendencies	The fact that “instruction” is a property of input tokens, not of a separate execution mode

The full mechanics are in Phase 4. For prompting, the working assumption is that the model in front of you wants to follow instructions that look like instructions.

System and role prompts

Use them when…	Skip them when…
The conversation has a consistent role or constraint that should bias every turn	The role is decoration (“act as a senior X”) and the model has no actual access to that expertise
You want a stable style or refusal posture across many turns	You expect the user to override the role on a per-turn basis (the model treats system as guidance, not as law)

What makes a “system” prompt different	What it shares with a user message
API contract: separate field, processed before user turns	Both are input tokens at the start of the next-token loop
Training contract: the model has been post-trained to weight system instructions over user instructions when they conflict	Both rely on the model’s instruction-following bias; neither one is a hard execution boundary

Prompt injection: the structural vulnerability

Trained behavior: follow instruction-shaped input.
Application input: system prompt + user-supplied or retrieved data.
Failure mode:    that data contains instruction-shaped tokens.
Result:          the model has no robust way to tell whose instruction
                 is authoritative; it may follow the injected one.

Variant	Who is the attacker?	Where it shows up
Direct injection	The user typing the prompt	Anywhere a user message is concatenated into a prompt
Indirect injection	A third party who controls retrieved content (webpage, PDF, email, support ticket)	RAG, email summarizers, scraping pipelines, browsing tools. The user is the victim, not the attacker.

Mitigation	What it does	What it does not do
Instruction-hierarchy training	Teaches the model to weight system instructions over user instructions	Eliminate the gap; raises the bar, does not turn it into a wall
Channel separation (system vs user roles in the API)	Reduces the surface area where injected instructions get authority	Prevent injection in user-supplied content that is concatenated into prompts
Output filtering / sandboxing	Limits the damage an injected instruction can do to downstream systems	Stop the model from following the injection in the first place

Jailbreak vs injection	Threat model
Jailbreak	Attacker is also the user; bypasses the model’s refusal training on the attacker’s own prompt
Injection	Attacker is not the user; hides instructions in user-supplied or retrieved content that the operator concatenates into a prompt

Design rule: treat any user-supplied or web-fetched content inside a prompt as untrusted. Do not give a model on top of that input access to anything you would not let the untrusted input control directly.

Pitfalls to dodge

Pitfall	Reality
Prompting is magic words	No. Every prompt is just input tokens conditioning the next-token loop. Structural choice swamps wording choice.
Cargo-culting role prompts	”Act as a senior X” often makes no measurable difference; the model performs the persona at the cost of doing the task.
Treating the system prompt as enforcement	It is guidance the model is biased toward following, not a sandbox.
Conflating prompting with fine-tuning	Different tools, different cost profiles. Prompting is per-call and free of training cost; fine-tuning persists across calls without paying token cost.

Translating prompt-engineering language

Phrase you will see	What it actually leverages
”You are a [role]…”	System or role prompt (style + persona bias)
“Ignore previous instructions”	Prompt-injection attempt (in user-supplied content); a legitimate operator does not need to write this
”Let’s think step by step”	Zero-shot chain-of-thought (covered two lessons from now)
“Here are some examples…”	Few-shot (covered next lesson)
“In-context learning”	The mechanism behind few-shot prompting (covered next lesson)

Glossary

Prompt: the input tokens you control. Becomes the prefix that conditions the model’s next-token loop.
System prompt: a separate, conceptually higher-trust input that sets standing instructions for the conversation. Mechanically just more tokens at the start; “system” comes from an API contract plus a learned training-time bias.
Role prompt: an instruction that asks the model to adopt a persona (“you are a…”). May appear in the system or user channel.
Instruction-tuning: post-training (covered in Phase 4) that biases the model to produce response-shaped continuations when input is instruction-shaped.
Prompt injection: instruction-shaped content hidden in user-supplied or web-fetched data that the model may follow as if it were operator instructions.
Direct injection: the user is the attacker.
Indirect injection: the attacker is a third party whose content the application retrieves (RAG, email, scraping). The user is the victim.
Jailbreak: a different threat model where the attacker is the user and is trying to get the model to bypass its refusal training on the attacker’s own prompt.
Instruction-hierarchy training: post-training technique that teaches the model to weight some instruction sources (typically the system channel) over others. Reduces but does not eliminate prompt-injection vulnerability.

A prompt is just input tokens.
The model follows instructions because it was trained to.
That is also why it follows the wrong ones when they are hidden in its input.