Skip to content

Cheatsheet: How prompting works: mechanics, system prompts, and prompt injection

A prompt is conditioning data for the next-token loop.
The art is choosing the conditioning that makes the desired
response a likely continuation.

This lesson covers prompt mechanics and system prompts. Few-shot prompting is the next lesson; chain-of-thought is the one after that.

Three patterns dominate (named here, taught next)

Section titled “Three patterns dominate (named here, taught next)”
PatternWhat you writeWhere it’s covered in depth
Zero-shotJust the instructionThis lesson, briefly. The simplest baseline.
Few-shotA handful of input-output examples, then the new inputNext lesson (in-context-learning-and-few-shot)
Chain-of-thought (CoT)“Let’s think step by step” or input-reasoning-output examplesLesson after that (chain-of-thought-prompting)

Almost every prompt you write is one of these or a combination. This lesson stays inside what they all sit on top of: the input-token mechanic.

What changed in the weightsWhat did NOT change
The model is biased to produce response-shaped continuations when input is instruction-shapedThe next-token loop. The model is still picking tokens conditioned on the prefix.
Helpful, harmless, honest patterns from RLHF show up as default tendenciesThe fact that “instruction” is a property of input tokens, not of a separate execution mode

The full mechanics are in Phase 4. For prompting, the working assumption is that the model in front of you wants to follow instructions that look like instructions.

Use them when…Skip them when…
The conversation has a consistent role or constraint that should bias every turnThe role is decoration (“act as a senior X”) and the model has no actual access to that expertise
You want a stable style or refusal posture across many turnsYou expect the user to override the role on a per-turn basis (the model treats system as guidance, not as law)
What makes a “system” prompt differentWhat it shares with a user message
API contract: separate field, processed before user turnsBoth are input tokens at the start of the next-token loop
Training contract: the model has been post-trained to weight system instructions over user instructions when they conflictBoth rely on the model’s instruction-following bias; neither one is a hard execution boundary

Prompt injection: the structural vulnerability

Section titled “Prompt injection: the structural vulnerability”
Trained behavior: follow instruction-shaped input.
Application input: system prompt + user-supplied or retrieved data.
Failure mode: that data contains instruction-shaped tokens.
Result: the model has no robust way to tell whose instruction
is authoritative; it may follow the injected one.
VariantWho is the attacker?Where it shows up
Direct injectionThe user typing the promptAnywhere a user message is concatenated into a prompt
Indirect injectionA third party who controls retrieved content (webpage, PDF, email, support ticket)RAG, email summarizers, scraping pipelines, browsing tools. The user is the victim, not the attacker.
MitigationWhat it doesWhat it does not do
Instruction-hierarchy trainingTeaches the model to weight system instructions over user instructionsEliminate the gap; raises the bar, does not turn it into a wall
Channel separation (system vs user roles in the API)Reduces the surface area where injected instructions get authorityPrevent injection in user-supplied content that is concatenated into prompts
Output filtering / sandboxingLimits the damage an injected instruction can do to downstream systemsStop the model from following the injection in the first place
Jailbreak vs injectionThreat model
JailbreakAttacker is also the user; bypasses the model’s refusal training on the attacker’s own prompt
InjectionAttacker is not the user; hides instructions in user-supplied or retrieved content that the operator concatenates into a prompt

Design rule: treat any user-supplied or web-fetched content inside a prompt as untrusted. Do not give a model on top of that input access to anything you would not let the untrusted input control directly.

PitfallReality
Prompting is magic wordsNo. Every prompt is just input tokens conditioning the next-token loop. Structural choice swamps wording choice.
Cargo-culting role prompts”Act as a senior X” often makes no measurable difference; the model performs the persona at the cost of doing the task.
Treating the system prompt as enforcementIt is guidance the model is biased toward following, not a sandbox.
Conflating prompting with fine-tuningDifferent tools, different cost profiles. Prompting is per-call and free of training cost; fine-tuning persists across calls without paying token cost.
Phrase you will seeWhat it actually leverages
”You are a [role]…”System or role prompt (style + persona bias)
“Ignore previous instructions”Prompt-injection attempt (in user-supplied content); a legitimate operator does not need to write this
”Let’s think step by step”Zero-shot chain-of-thought (covered two lessons from now)
“Here are some examples…”Few-shot (covered next lesson)
“In-context learning”The mechanism behind few-shot prompting (covered next lesson)
  • Prompt: the input tokens you control. Becomes the prefix that conditions the model’s next-token loop.
  • System prompt: a separate, conceptually higher-trust input that sets standing instructions for the conversation. Mechanically just more tokens at the start; “system” comes from an API contract plus a learned training-time bias.
  • Role prompt: an instruction that asks the model to adopt a persona (“you are a…”). May appear in the system or user channel.
  • Instruction-tuning: post-training (covered in Phase 4) that biases the model to produce response-shaped continuations when input is instruction-shaped.
  • Prompt injection: instruction-shaped content hidden in user-supplied or web-fetched data that the model may follow as if it were operator instructions.
  • Direct injection: the user is the attacker.
  • Indirect injection: the attacker is a third party whose content the application retrieves (RAG, email, scraping). The user is the victim.
  • Jailbreak: a different threat model where the attacker is the user and is trying to get the model to bypass its refusal training on the attacker’s own prompt.
  • Instruction-hierarchy training: post-training technique that teaches the model to weight some instruction sources (typically the system channel) over others. Reduces but does not eliminate prompt-injection vulnerability.

A prompt is just input tokens.
The model follows instructions because it was trained to.
That is also why it follows the wrong ones when they are hidden in its input.