Skip to content

Summary: How prompting works: mechanics, system prompts, and prompt injection

This lesson covers prompt mechanics and system prompts. Few-shot prompting is the next lesson; chain-of-thought is the one after that.

A prompt is the conditioning data for the model’s next-token loop. The post-training (covered in Phase 4 on tuning) is what makes the model want to follow instructions. Prompting is the lever you actually pull. The art is choosing the conditioning that makes the desired response a likely continuation of the input you supplied.

  • A prompt is just input tokens. The model is still doing next-token prediction; the prompt is the prefix that conditions the loop. Instruction-tuning biases the model to produce responses to instruction-shaped input rather than continuations of similar text, but the underlying machinery is unchanged.
  • Three patterns dominate practical use. Zero-shot (just ask), few-shot (show examples), chain-of-thought (think step by step). This lesson names them. The next two lessons cover few-shot and chain-of-thought in depth.
  • System prompts are standing instruction. A separate, conceptually higher-trust input that sets role, style, and constraints across the whole conversation. Mechanically, just more tokens at the start. What makes them “system” is an API contract plus a training-time bias toward weighting system instructions over user instructions.
  • System prompts are guidance, not law. The model is biased to follow them. It will deviate when user input pushes hard enough or when the application puts large amounts of attacker-controlled text into the conversation later.
  • Prompt injection is structural. Instruction-tuned models cannot fully distinguish operator instructions from instructions hidden in user-supplied data. At the token level, both are just text that conditions the next-token loop.
  • Direct vs indirect injection. Direct: the user is the attacker and types the injection. Indirect: the attacker hides instruction-shaped text inside content the application later retrieves on a benign user’s behalf (webpage, PDF, email, support ticket). Indirect is the more dangerous variant in real systems and the one Phase 6’s RAG lesson covers in depth.
  • Jailbreaks vs injection. Jailbreaks bypass refusal training on the attacker’s own prompt. Injection makes the model follow someone else’s instructions hidden in its input. Same family of failures, different threat models.
  • Mitigations reduce the gap; they do not close it. Instruction-hierarchy training, channel separation between system and user prompts, output filtering, and sandboxing the surface area an injected instruction can reach all help. None of them turn the bar into a wall, because the underlying mechanism (instruction-following over input tokens) is what makes the model useful.
  • Pitfall: chasing magic words. Every prompt is just input tokens. The structural choice (which pattern, which conditioning) swamps the wording choice. Stop hunting incantations.
  • Pitfall: cargo-culting role prompts. “You are a senior engineer with 20 years of experience” often makes no measurable difference and sometimes hurts. Use a role prompt when the role actually changes what good output looks like; skip it when it is decoration.
  • Pitfall: treating the system prompt as a sandbox. It is a high-priority hint, not a security boundary. Anything the user (or retrieved content) can put in front of the model is reachable.
  • Pitfall: prompting versus fine-tuning. Prompting is per-call and free of training cost. Fine-tuning persists across calls without paying token cost. Different tools, different cost profiles.

Before this lesson, the difference between a prompt that works and one that does not was likely a matter of intuition or trial-and-error. After it, you have a frame: a prompt is input tokens conditioning a next-token loop, the system prompt is a high-priority hint, and the boundary between operator instructions and untrusted content is a training-time bias rather than a hard wall. When a chat assistant produces something useless, the most productive first move is to ask whether the prompt actually specified the task in enough detail; vague conditioning, vague output.

When you build any application that puts a model on top of user-supplied or web-fetched text, you can recognize the prompt-injection surface area and design around it rather than designing as if the system prompt is a guarantee.

A prompt is just input tokens.
The model follows instructions because it was trained to.
That is also why it follows the wrong ones when they are hidden in its input.