Summary: Agents that retrieve their own information: agentic RAG

Classical RAG is a fixed pipeline; agentic RAG makes retrieval a tool the agent decides to call. That single change is the whole lesson. The ordinary form retrieves, reads, and answers the same way every time. The agentic form hands the agent a retrieval tool and lets it decide whether to retrieve, when, how many times, and whether what came back is good enough. This summary is the scan-in-five-minutes version of the full lesson.

  • RAG in one breath: given a body of text the model was not trained on (your documents, a manual, a knowledge base), retrieve the passages most relevant to a question, paste them into the model’s context, and answer from them. Retrieval is what lets the model speak about information it never memorized.
  • How retrieval finds relevant text is a black box here. Matching meaning with embeddings, chunking the documents, and building the index make up a subject of their own. For this lesson, retrieval is a box with one job: query in, relevant chunks out. What matters is when and how the agent uses it.
  • Classical RAG is a fixed pipeline: retrieve, then read, then answer, in that order, every time. Retrieval always happens, always once, always before the model speaks. It cannot adapt because there is no decision in it.
  • That rigidity breaks in three ways. For a question the model can answer alone, it retrieves anyway and stuffs the context with noise. For a question that needs two lookups, it retrieves once and answers from half the information. For a question whose first search comes back weak, it has no way to notice and try again.
  • The agentic turn: give the agent a retrieval tool and let it decide. This is the same move as tool use generally (Lesson 2): retrieval becomes one tool call inside the perceive-decide-act loop, on the agent’s judgment, rather than a step wired in ahead of time.
  • That unlocks three behaviors a pipeline cannot: skip retrieval when it is not needed, retrieve more than once (refining the query each time) for multi-part questions, and judge whether a result is good enough and re-search when it is weak.
  • Self-correcting retrieval is the most useful gain. After a search returns, the agent asks “is this enough to answer well?” If the passages are off-topic or thin, it sharpens the query and searches again before answering. A classical pipeline commits to whatever the first search returned.
  • Agentic RAG is built from earlier pieces, not new machinery: the loop (Lesson 1), the tool call (Lesson 2), the tool definition (Lesson 4: describe the retrieve tool well or it fires at the wrong times), and memory (Lesson 5: retrieval is often how persistent memory gets pulled into a run).
  • Adaptability costs predictability. An agent that decides when to retrieve can decide wrong: skip a needed retrieval, over-fetch, or loop on a bad query. The fixed pipeline is predictable because it never decides. Use the pipeline when one path serves; use agentic RAG when the questions vary enough to need judgment.
  • Retrieval quality is a separate problem. How well the black box finds relevant text (embeddings, chunking, the index) is not the agent’s job. A great agent over a bad index still answers from bad passages.
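
The contrast between the fixed pipeline and the agentic loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The corpus, the word-overlap `retrieve` function, and the three decision functions (`needs_docs`, `has_reset_steps`, `sharpen`) are all assumptions made up for the example; in a real agent those three decisions would come from the model's own judgment, not from hard-coded rules. The `max_searches` cap is one simple guard against looping on a bad query.

```python
import re
from typing import Callable, List, Tuple

# Toy stand-in for the retrieval "black box": query in, relevant chunks out.
CORPUS = [
    "The X200 router supports firmware updates over USB.",
    "To reset the X200 router, hold the reset button for ten seconds.",
    "The warranty for the X200 router lasts two years.",
]

def tokens(text: str) -> set:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> List[str]:
    """Return up to k chunks ranked by word overlap with the query (toy scoring)."""
    q = tokens(query)
    scored = [(len(q & tokens(chunk)), chunk) for chunk in CORPUS]
    ranked = sorted((sc for sc in scored if sc[0] > 0),
                    key=lambda sc: sc[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def classical_rag(question: str, answer: Callable) -> str:
    """Fixed pipeline: retrieve exactly once, then answer. No decisions."""
    return answer(question, retrieve(question))

def agentic_rag(question: str, needs_retrieval: Callable, good_enough: Callable,
                refine: Callable, answer: Callable,
                max_searches: int = 3) -> Tuple[str, int]:
    """Agentic loop: decide whether to search, judge results, re-search if weak."""
    if not needs_retrieval(question):           # decision 1: skip retrieval
        return answer(question, []), 0
    query, gathered, searches = question, [], 0
    while searches < max_searches:              # cap guards against endless loops
        for chunk in retrieve(query):
            if chunk not in gathered:
                gathered.append(chunk)
        searches += 1
        if good_enough(question, gathered):     # decision 2: is this enough?
            break
        query = refine(question, query, gathered)  # decision 3: sharpen the query
    return answer(question, gathered), searches

# Hypothetical stand-ins for the model's judgment, hard-coded for the demo.
def answer(question: str, chunks: List[str]) -> str:
    return " ".join(chunks) if chunks else "(answered without retrieval)"

needs_docs = lambda q: "x200" in q.lower()
has_reset_steps = lambda q, chunks: any("reset" in c.lower() for c in chunks)
sharpen = lambda q, query, chunks: "reset X200 router"

question = "How do I restore factory settings on my X200?"
pipeline_answer = classical_rag(question, answer)
agent_answer, searches = agentic_rag(question, needs_docs, has_reset_steps,
                                     sharpen, answer)
_, skipped = agentic_rag("What is 2 + 2?", needs_docs, has_reset_steps,
                         sharpen, answer)
```

In this toy run the pipeline commits to its first, off-target passage, while the agent notices the weak result, sharpens the query, and searches a second time. For the arithmetic question the agent performs no search at all. The same structure also shows the cost: every one of the three decision functions is a place where the agent can decide wrong.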

Before this lesson, “RAG” was one word for one thing. Now you can hear the difference between a fixed retrieve-then-answer pipeline and an agent that owns its retrieval decisions, and you can tell which one a given job needs. The test is variable depth: when you cannot say ahead of time how many retrievals a question takes, let the agent decide; when every question needs exactly one, the pipeline is simpler and more dependable. And you can see that agentic RAG is not a separate system bolted on. It is the loop, the tool call, the tool definition, and memory, all working together on the specific job of pulling in the right information.
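
Of those pieces, the tool definition is the easiest to show concretely. The sketch below is a hypothetical definition for a retrieve tool, written as a plain Python dictionary. The field names follow the JSON-Schema-like shape that many tool-calling interfaces use, but they are an assumption for illustration and not any particular vendor's format. The description is where the agent learns when to call the tool and when not to, which is why a vague one makes retrieval fire at the wrong times. The `validate_call` helper is likewise an invented name.

```python
# Hypothetical tool definition; field names are illustrative, not a vendor API.
RETRIEVE_TOOL = {
    "name": "retrieve",
    "description": (
        "Search the product documentation and return the most relevant passages. "
        "Call this when the question depends on facts in the documentation. "
        "Do not call it for general knowledge or arithmetic. "
        "If the passages returned are off-topic, call again with a sharper query."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A focused search query, not the user's full message.",
            },
        },
        "required": ["query"],
    },
}

def validate_call(tool: dict, arguments: dict) -> list:
    """Return a list of problems with a proposed tool call (empty if it is valid)."""
    schema = tool["parameters"]
    problems = []
    for name in schema["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"argument {name} must be a string")
    return problems
```

Checking a proposed call before running it, for example `validate_call(RETRIEVE_TOOL, {"query": "reset X200 router"})`, is one inexpensive way to catch malformed tool calls before they reach the retriever.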