Cheatsheet: Agents that retrieve their own information: agentic RAG

Classical RAG is a fixed pipeline (retrieve, read, answer, every time). Agentic RAG makes retrieval a tool the agent decides whether and when to call. That single change turns a rigid pipeline into a reasoning loop.

The RAG task: given a body of text the model was not trained on, retrieve the passages most relevant to a question, paste them into the model’s context, and answer from them. (How retrieval matches meaning, via embeddings, is a separate subject; treat it as a black box: query in, relevant chunks out.)
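A minimal sketch of the fixed pipeline, assuming a toy word-overlap retriever in place of the embedding black box. The corpus text, `retrieve`, `build_prompt`, and the `llm` callable are all hypothetical stand-ins, not part of any real library.

```python
import re
from typing import Callable

# Hypothetical toy corpus; a real system would index actual documents.
CORPUS = [
    "The refund window is 30 days from purchase.",
    "International orders ship within 5 business days.",
    "Support is available on weekdays.",
]

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words, punctuation dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Stand-in for the black box: rank chunks by words shared with the query."""
    query_words = tokens(query)
    ranked = sorted(corpus, key=lambda c: len(query_words & tokens(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Paste the retrieved chunks into the model's context."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

def classical_rag(question: str, corpus: list[str], llm: Callable[[str], str]) -> str:
    chunks = retrieve(question, corpus)      # retrieve: always, exactly once
    prompt = build_prompt(question, chunks)  # read: chunks go into the context
    return llm(prompt)                       # answer
```

Note that `classical_rag` never inspects the question before retrieving, so an arithmetic question gets a retrieved chunk too.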

                             | Classical RAG                  | Agentic RAG
Retrieval                    | Always, once, before answering | A tool the agent calls on judgment
Decides when to retrieve     | No (fixed)                     | Yes
Can retrieve multiple times  | No                             | Yes (refine and repeat)
Can judge if results suffice | No                             | Yes (re-search if weak)
Strength                     | Predictable                    | Adaptable

Q: "What is 15% of 240?"
  classical -> retrieves uselessly, answers
  agentic   -> no retrieval, "36"
Q: "What's our refund window?"
  classical -> retrieve once, answer
  agentic   -> retrieve once, answer
Q: "Compare 2023 and 2024 refund policies."
  classical -> retrieve once (half the picture)
  agentic   -> retrieve both years, compare
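The three questions above can be traced through a small loop in which retrieval is something the agent may or may not call. This is only a sketch: `toy_policy` is a hand-written stand-in for the model's judgment, and `toy_retrieve` and `toy_answer` are hypothetical placeholders for a real retriever and a real model call.

```python
from typing import Callable, Optional

def run_agent(
    question: str,
    next_query: Callable[[str, list[str]], Optional[str]],
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
    max_steps: int = 4,
) -> tuple[str, list[str]]:
    """Loop: ask the policy for a query; None means 'answer now'."""
    context: list[str] = []
    queries: list[str] = []
    for _ in range(max_steps):
        query = next_query(question, context)
        if query is None:
            break
        queries.append(query)
        context.extend(retrieve(query))
    return answer(question, context), queries

def toy_policy(question: str, context: list[str]) -> Optional[str]:
    """Hypothetical stand-in for the model deciding whether to retrieve."""
    q = question.lower()
    if "%" in q:                      # arithmetic: no documents needed
        return None
    years = [y for y in ("2023", "2024") if y in q]
    if years:                         # one retrieval per year not yet covered
        pending = [y for y in years if not any(y in c for c in context)]
        return f"{pending[0]} refund policy" if pending else None
    return None if context else question  # otherwise retrieve once

def toy_retrieve(query: str) -> list[str]:
    return [f"[passage about {query}]"]

def toy_answer(question: str, context: list[str]) -> str:
    return f"answered using {len(context)} passage(s)"

for q in ("What is 15% of 240?",
          "What's our refund window?",
          "Compare 2023 and 2024 refund policies."):
    _, made = run_agent(q, toy_policy, toy_retrieve, toy_answer)
    print(q, "->", len(made), "retrieval(s)")
```

The `max_steps` cap is a design choice so the loop terminates even if the policy keeps asking for more.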

After a search returns, the agent judges “is this enough?” If not, it refines the query and searches again. This is the same self-correction it applies to tool failures in Lesson 2.

retrieve("overseas returns") -> weak, generic
retrieve("international return policy eligibility") -> the actual clause -> answer
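The refine-and-retry trace above might look like this in code. It is a sketch under assumptions: `looks_specific` and `toy_refine` stand in for judgments a real agent would make with the model, and `TOY_INDEX` holds placeholder strings rather than real policy text.

```python
from typing import Callable

# Placeholder index mirroring the trace above; not real policy text.
TOY_INDEX = {
    "overseas returns": ["<weak, generic passage>"],
    "international return policy eligibility": ["<the actual clause>"],
}

def toy_retrieve(query: str) -> list[str]:
    return TOY_INDEX.get(query, [])

def looks_specific(chunks: list[str]) -> bool:
    """Hypothetical sufficiency check: at least one non-generic chunk."""
    return any("generic" not in c for c in chunks)

def toy_refine(query: str, chunks: list[str]) -> str:
    """Hypothetical rewrite step; a real agent would ask the model."""
    return "international return policy eligibility"

def search_until_sufficient(
    query: str,
    retrieve: Callable[[str], list[str]],
    is_sufficient: Callable[[list[str]], bool],
    refine: Callable[[str, list[str]], str],
    max_attempts: int = 3,
) -> tuple[list[str], list[str]]:
    """Return (chunks, queries tried); chunks is empty if nothing sufficed."""
    tried: list[str] = []
    for _ in range(max_attempts):
        tried.append(query)
        chunks = retrieve(query)
        if is_sufficient(chunks):
            return chunks, tried
        query = refine(query, chunks)  # judge said "not enough": rephrase
    return [], tried
```

Returning the list of queries tried makes it easy to see afterwards how many refinements a question needed.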

Agency is not free. The agent can skip a retrieval it needed, retrieve when it should have just answered, or loop on a bad query. The fixed pipeline is predictable because it never decides. Use the pipeline when every question is served by the same single retrieval; use agentic RAG when questions vary in whether, and how often, they need retrieval.
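Of the three failure modes, looping on a bad query is the easiest to bound mechanically. One possible guard, sketched here with a hypothetical `GuardedRetriever` wrapper, refuses repeated queries and caps the number of calls; it does nothing about skipped or unnecessary retrievals, which remain a matter of the agent's judgment.

```python
from typing import Callable

class GuardedRetriever:
    """Wraps a retrieve function with a duplicate check and a call budget."""

    def __init__(self, retrieve: Callable[[str], list[str]], max_calls: int = 3):
        self._retrieve = retrieve
        self._max_calls = max_calls
        self.seen: set[str] = set()
        self.calls = 0

    def __call__(self, query: str) -> list[str]:
        normalized = " ".join(query.lower().split())
        if normalized in self.seen:
            # Tell the agent why nothing came back instead of searching again.
            return ["(duplicate query: rephrase or answer with what you have)"]
        if self.calls >= self._max_calls:
            return ["(retrieval budget spent: answer with what you have)"]
        self.seen.add(normalized)
        self.calls += 1
        return self._retrieve(query)
```

Returning a short message as the tool result, rather than raising, is an assumption about how the agent loop is built: it lets the agent read the refusal and change course.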

How it connects to earlier lessons:
  • The loop (L1): retrieval is a perceive-decide-act move.
  • The tool call (L2): retrieval is one tool call.
  • The tool definition (L4): describe the retrieve tool well or it fires at wrong times.
  • Memory (L5): retrieval is often how persistent memory gets pulled into a run.

Common mistakes:
  • Thinking “RAG” is one fixed thing (it covers both the pipeline and the agentic form).
  • Using agentic RAG when a fixed pipeline would do (one retrieval per question = pipeline).
  • Forgetting the agent can retrieve badly (wrong time, looping).
  • Neglecting the retrieve tool’s description (L4 applies in full).
  • Treating retrieval quality (embeddings, chunking) as the agent’s job; it is a separate problem.

Glossary:
  • RAG: retrieval-augmented generation; pulling external text into the model’s context to answer.
  • Classical/static RAG: the fixed retrieve-read-answer pipeline.
  • Agentic RAG: retrieval as a tool the agent dynamically decides to call, repeat, and judge.
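The advice to describe the retrieve tool well can be made concrete with a tool definition. The shape below loosely follows the JSON-Schema style that many tool-calling APIs use; the exact field names (`input_schema` here) vary by provider and are an assumption, as are the description wording and the `validate_call` helper.

```python
RETRIEVE_TOOL = {
    "name": "retrieve",
    "description": (
        "Search the indexed documents and return the most relevant passages. "
        "Call this when the answer depends on facts in those documents. "
        "Do not call it for arithmetic or questions you can answer directly. "
        "If the results look generic, call it again with more specific wording."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A specific search phrase, not the user's full message.",
            }
        },
        "required": ["query"],
    },
}

def validate_call(tool: dict, arguments: dict) -> list[str]:
    """Hypothetical check of a tool call's arguments against the schema."""
    schema = tool["input_schema"]
    errors = [
        f"missing required argument: {name}"
        for name in schema["required"]
        if name not in arguments
    ]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unexpected argument: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"argument {name} must be a string")
    return errors
```

The description states when to call the tool, when not to, and when to call it again, which are the three decisions the agent has to make.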