Cheatsheet: Agents that retrieve their own information: agentic RAG
The one idea
Classical RAG is a fixed pipeline (retrieve, read, answer, every time). Agentic RAG makes retrieval a tool the agent decides whether and when to call. That single change turns a rigid pipeline into a reasoning loop.
RAG in one breath
Given a body of text the model was not trained on, retrieve the passages most relevant to a question, paste them into the model’s context, and answer from them. (How retrieval matches meaning, via embeddings, is a separate subject; treat it as a black box: query in, relevant chunks out.)
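A minimal sketch of that flow, assuming a made-up three-passage corpus and a word-overlap scorer standing in for the embedding-based black box. `CORPUS`, `retrieve`, and `build_prompt` are hypothetical names, and the passages are invented for illustration.

```python
import re

# Invented passages for illustration; a real corpus would be far larger.
CORPUS = [
    "Refund window: purchases can be refunded within 30 days.",
    "Shipping: standard delivery takes 5 business days.",
    "Support: help desk open on weekdays.",
]

def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """The black box: query in, the k passages sharing the most words out."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, k: int = 1) -> str:
    """Paste the retrieved passages into the context the model will answer from."""
    context = "\n".join(f"- {p}" for p in retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Calling `build_prompt("What is the refund window?")` yields a prompt whose context is the refund passage; the model call itself is left out because it depends on the provider.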
Classical vs agentic
| | Classical RAG | Agentic RAG |
|---|---|---|
| Retrieval | Always, once, before answering | A tool the agent calls on judgment |
| Decides when to retrieve | No (fixed) | Yes |
| Can retrieve multiple times | No | Yes (refine and repeat) |
| Can judge if results suffice | No | Yes (re-search if weak) |
| Strength | Predictable | Adaptable |
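
The contrast in the table can be sketched in a few lines. Everything here is an assumption for illustration: `toy_plan` is a hand-written rule standing in for the model's judgment, and `retrieve` is any callable that takes a query and returns text.

```python
import re
from typing import Callable

Retriever = Callable[[str], str]
Planner = Callable[[str], list[str]]

def classical_rag(question: str, retrieve: Retriever) -> list[str]:
    """Fixed pipeline: retrieve exactly once, whatever the question is."""
    return [retrieve(question)]

def agentic_rag(question: str, retrieve: Retriever, plan: Planner) -> list[str]:
    """Agentic form: `plan` decides which searches to run, possibly none."""
    return [retrieve(query) for query in plan(question)]

def toy_plan(question: str) -> list[str]:
    """Hypothetical stand-in for the agent's decision; a real agent would reason here."""
    if "%" in question:
        return []  # arithmetic: answer directly, no retrieval
    years = re.findall(r"\b(?:19|20)\d{2}\b", question)
    if len(years) > 1:
        # One search per year mentioned, so each side of the comparison is covered.
        return [f"{question} (year {y} only)" for y in years]
    return [question]  # ordinary lookup: a single retrieval
```

The returned lists hold the retrieved context, so their length shows how many searches each approach ran for a given question.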
The three-question contrast

    Q: "What is 15% of 240?"
      classical -> retrieves uselessly, answers
      agentic   -> no retrieval, "36"
    Q: "What's our refund window?"
      classical -> retrieve once, answer
      agentic   -> retrieve once, answer
    Q: "Compare 2023 and 2024 refund policies."
      classical -> retrieve once (half the picture)
      agentic   -> retrieve both years, compare

Self-correcting retrieval
After a search returns, the agent asks “is this enough?” If not, it refines the query and searches again. This is the same self-correction the agent applies to tool failures in Lesson 2.

    retrieve("overseas returns") -> weak, generic
    retrieve("international return policy eligibility") -> the actual clause -> answer

What it costs
Agency is not free. The agent can skip a retrieval it needed, retrieve when it should have just answered, or loop on a bad query. The fixed pipeline is predictable because it never decides. Use the pipeline when one path serves; use agentic RAG when questions vary.
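One way to keep the self-correcting loop from spinning on a bad query is to cap the number of attempts. A sketch, assuming hypothetical callables: `retrieve` runs a search, `is_sufficient` stands in for the agent's "is this enough?" judgment, and `refine` stands in for its query rewrite. None of these names come from a real library.

```python
from typing import Callable, Optional

def search_until_sufficient(
    query: str,
    retrieve: Callable[[str], list[str]],
    is_sufficient: Callable[[list[str]], bool],
    refine: Callable[[str], str],
    max_attempts: int = 3,
) -> tuple[Optional[list[str]], list[str]]:
    """Retrieve, judge, refine, repeat; stop after max_attempts searches.

    Returns (passages, queries_tried). passages is None when every
    attempt was judged insufficient.
    """
    tried: list[str] = []
    for _ in range(max_attempts):
        tried.append(query)
        passages = retrieve(query)
        if is_sufficient(passages):
            return passages, tried
        query = refine(query)  # weak result: rewrite the query and try again
    return None, tried  # budget spent: the caller decides how to respond
```

The cap trades some adaptability for predictability: a `None` result forces an explicit fallback, such as answering with a stated caveat, instead of an unbounded search.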
It reuses earlier pieces
- The loop (L1): retrieval is a perceive-decide-act move.
- The tool call (L2): retrieval is one tool call.
- The tool definition (L4): describe the retrieve tool well or it fires at wrong times.
- Memory (L5): retrieval is often how persistent memory gets pulled into a run.
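
The tool-definition point can be made concrete with a sketch of what a retrieve tool's description might look like. The layout below (`name`, `description`, JSON-Schema-style `parameters`) follows a common convention, but the exact keys differ between providers, and the wording and the "policy documents" scope are assumptions for illustration.

```python
# Hypothetical tool definition; adapt the key names to the provider's format.
RETRIEVE_TOOL = {
    "name": "retrieve",
    "description": (
        "Search the company policy documents and return the most relevant passages. "
        "Use for questions whose answer lives in those documents. "
        "Do not use for arithmetic or general knowledge. "
        "If the results are weak, call again with a more specific query."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A focused search phrase, not the user's full message.",
            },
        },
        "required": ["query"],
    },
}
```

The description states when to call the tool and when not to, which is the part that keeps it from firing at the wrong times.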
Pitfalls to dodge
- Thinking “RAG” is one fixed thing (it covers both the pipeline and the agentic form).
- Using agentic RAG when a fixed pipeline would do (one retrieval per question = pipeline).
- Forgetting the agent can retrieve badly (wrong time, looping).
- Neglecting the retrieve tool’s description (L4 applies in full).
- Treating retrieval quality (embeddings, chunking) as the agent’s job; it is a separate problem.
Words to use precisely
- RAG: retrieval-augmented generation; pulling external text into the model’s context to answer.
- Classical/static RAG: the fixed retrieve-read-answer pipeline.
- Agentic RAG: retrieval as a tool the agent dynamically decides to call, repeat, and judge.