References: How prompting works: zero-shot, few-shot, and chain-of-thought

Source material

• Stanford CME 295: Transformers & Large Language Models, Autumn 2025
  Instructors: Afshine Amidi & Shervine Amidi, Stanford University
  Course site: https://cme295.stanford.edu/
  Cheatsheet: https://cme295.stanford.edu/cheatsheet/
  Source lecture (Lecture 3, Large Language Models): https://www.youtube.com/watch?v=Q5baLehv5So
  License (lecture videos): as published on Stanford's public YouTube channel
  License (Amidi cheatsheets): MIT
This lesson adapts the prompting, in-context-learning, and chain-of-thought
sections of Stanford CME 295 Lecture 3. The post-training context that makes
prompting work (SFT and RLHF) is covered in our Lecture 5 lesson. Clawdemy
provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original lectures remain with
Stanford and the instructors.

Going deeper

A short list, chosen for durability. Each link is for a specific next step, not a generic “learn more.”

“Language Models are Few-Shot Learners”, Brown et al., 2020. The GPT-3 paper. The foundational empirical demonstration that sufficiently large language models can perform new tasks from a handful of in-context examples without any weight updates. The “few-shot” framing in this lesson comes directly from here. The paper is long; sections 1, 2, and 3 are the conceptual core.
“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, Wei et al., 2022. The CoT paper. Shows that few-shot examples that include intermediate reasoning steps (not just answers) substantially improve model accuracy on multi-step reasoning tasks. The first crisp result that “more tokens before the answer” was a real lever, not just a trick.
“Large Language Models are Zero-Shot Reasoners”, Kojima et al., 2022. The “let’s think step by step” paper. Demonstrates that much of the CoT effect appears with no examples at all if you simply instruct the model to think step by step. Read alongside the Wei paper; the contrast between few-shot and zero-shot CoT is what produced the modern prompting playbook.
“A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks”, Vatsal & Dubey, 2024. A comprehensive survey of prompting techniques organized by NLP task type. Useful when you want to see the broader landscape (prompt chaining, self-consistency, ReAct, tree-of-thought) and where each technique fits. Treat the specific empirical numbers as a snapshot; the taxonomy is what holds up.
“Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection”, Greshake et al., 2023. The paper that named indirect prompt injection (instructions hidden in third-party content the model retrieves rather than in the user’s own input) and demonstrated it against deployed systems. Required reading if you build anything where a model touches retrieved or user-supplied text.
Simon Willison’s prompt injection writeups. The most readable running coverage of prompt injection in the wild on the public web. Updated frequently with new attack patterns and (rarer) effective mitigations. Good companion to the academic literature, which moves slowly relative to the attack surface.
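
As a companion to the Brown, Wei, and Kojima entries above, the four prompt styles they contrast can be laid out side by side. This is a minimal sketch that only builds prompt strings; it does not call any model. The function names and the Q:/A: layout are illustrative assumptions, not a format required by any of the papers. The “The answer is ...” suffix and the “Let's think step by step.” trigger follow the spirit of the Wei and Kojima papers respectively.

```python
from typing import List, Tuple

# Hypothetical helper names. The Q:/A: layout is one common convention,
# chosen here for illustration only.

def zero_shot(question: str) -> str:
    """Just the question: no examples, no reasoning cue."""
    return f"Q: {question}\nA:"

def few_shot(examples: List[Tuple[str, str]], question: str) -> str:
    """(question, answer) pairs placed in context before the real question."""
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in examples)
    return shots + zero_shot(question)

def few_shot_cot(examples: List[Tuple[str, str, str]], question: str) -> str:
    """(question, reasoning, answer) triples: each example shows its work."""
    shots = "".join(
        f"Q: {q}\nA: {reasoning} The answer is {a}.\n\n"
        for q, reasoning, a in examples
    )
    return shots + zero_shot(question)

def zero_shot_cot(question: str) -> str:
    """No examples; a trigger phrase asks the model to reason first."""
    return zero_shot(question) + " Let's think step by step."

# Made-up demonstration data, not taken from any paper.
demo = [("What is 2 + 3?", "2 plus 3 makes 5.", "5")]
print(few_shot_cot(demo, "What is 4 + 9?"))
print()
print(zero_shot_cot("What is 4 + 9?"))
```

The only difference between `few_shot` and `few_shot_cot` is whether the in-context examples include intermediate reasoning, which is the variable the Wei et al. paper isolates.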

Adjacent topics

Topics that build on or sit beside this one.

In-context learning, the research question. Why in-context learning works is still partly open. There are several competing hypotheses (the model is performing implicit gradient descent in the forward pass; the model is pattern-matching against pretraining-distribution analogues; some combination). Search terms: “induction heads,” “implicit Bayesian inference for in-context learning,” “transformers learn in-context by gradient descent.”
Self-consistency and tree/graph-of-thought. Extensions of chain-of-thought that sample multiple reasoning chains and aggregate, or branch reasoning into a tree. Self-consistency (Wang et al., 2022) is the simplest and often the highest-leverage upgrade to plain CoT. Tree-of-thought (Yao et al., 2023) and its descendants are more elaborate; pay attention to whether the cited gains hold at scale.
Instruction-hierarchy training. Recent post-training work that teaches models to weight instructions from different sources (typically system over user) more reliably. OpenAI’s instruction hierarchy paper (Wallace et al., 2024) is the public starting point. Context for the prompt-injection mitigation discussion in this lesson.
Retrieval-augmented generation (RAG). Sits at the intersection of prompting and external knowledge. The model is given retrieved context as part of the prompt. It is also one of the largest indirect-prompt-injection surfaces in real systems. Planned as a future lesson in this track.
Where to go next. Within Lecture 3, the nearest published lesson is this one’s predecessor (text generation). The natural Lecture 7 follow-on is RAG (retrieval-augmented generation), which puts prompting on top of a retrieval pipeline and is where indirect prompt injection most often shows up. Check the tracks index for the latest published lessons. If you want to broaden out instead of going deeper, Track 1 covers the human side of working with AI.

Original sources

The primary papers for the patterns covered, in chronological order.

“Language Models are Few-Shot Learners”, Brown et al., 2020. Few-shot / in-context learning at scale.
“Finetuned Language Models Are Zero-Shot Learners”, Wei et al., 2021. The instruction-tuning side: trains the model so that it follows zero-shot instructions reliably across many tasks. Methodological precursor to the SFT step covered in our Lecture 5 lesson, and the reason zero-shot prompting works as well as it does on a modern model.
“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, Wei et al., 2022. The CoT method.
“Self-Consistency Improves Chain of Thought Reasoning in Language Models”, Wang et al., 2022. Sampling multiple CoT chains and taking the majority answer; one of the cleanest single-knob improvements over plain CoT.
“Large Language Models are Zero-Shot Reasoners”, Kojima et al., 2022. Zero-shot CoT.
“Not what you’ve signed up for”, Greshake et al., 2023. Indirect prompt injection.
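
The self-consistency idea listed above (sample several chains, take the majority answer) is small enough to sketch. The example below is an illustration under stated assumptions, not the authors' implementation: the chains are hard-coded, made-up strings standing in for sampled model outputs, the helper names are hypothetical, and the answer is assumed to appear as a number after the phrase “answer is”.

```python
import re
from collections import Counter
from typing import List, Optional

# Assumption: each chain ends with a phrase like "The answer is 11."
ANSWER_PATTERN = re.compile(r"answer is\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)

def extract_answer(chain: str) -> Optional[str]:
    """Return the last numeric answer stated in a reasoning chain, if any."""
    matches = ANSWER_PATTERN.findall(chain)
    return matches[-1] if matches else None

def majority_answer(chains: List[str]) -> Optional[str]:
    """Vote across chains; chains with no parseable answer are ignored."""
    answers = [a for a in map(extract_answer, chains) if a is not None]
    if not answers:
        return None
    # most_common(1) returns the single most frequent answer and its count.
    return Counter(answers).most_common(1)[0][0]

# Hypothetical sampled chains for "Roger has 5 balls and gets 6 more."
sampled_chains = [
    "He starts with 5 and gets 6 more. 5 + 6 = 11. The answer is 11.",
    "5 times 6 is 30. The answer is 30.",
    "Adding 6 to 5 gives 11. The answer is 11.",
]
print(majority_answer(sampled_chains))
```

In a real pipeline the chains would come from sampling the same CoT prompt several times at a nonzero temperature; the voting step stays the same.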

Community discussion

None selected for this lesson. The public discussion of prompting has consolidated around the academic literature above, vendor-published guides (which rotate as models change), and a small number of practitioner blogs. Simon Willison’s writeups (linked in Going deeper) are the most stable practitioner voice on the security side. If a durable practitioner thread on the prompting side surfaces, it will be added at the next quarterly review.