Agentic RAG: cheatsheet

The one idea

Classical RAG is a fixed pipeline (retrieve, read, answer, every time). Agentic RAG makes retrieval a tool the agent decides whether and when to call. That single change turns a rigid pipeline into a reasoning loop.

RAG in one breath

Given a body of text the model was not trained on, retrieve the passages most relevant to a question, paste them into the model’s context, and answer from them. (How retrieval matches meaning, via embeddings, is a separate subject; treat it as a black box: query in, relevant chunks out.)
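A minimal sketch of the fixed pipeline, to make "query in, relevant chunks out" concrete. Everything here is an assumption for illustration: the three-line corpus is made up, `retrieve` ranks by shared words as a stand-in for embedding search, and `llm` is any function that takes a prompt string and returns text.

```python
import re
from typing import Callable

# Made-up toy corpus; a real system would index far more text.
DOCS = [
    "The refund window is 30 days from purchase.",
    "Shipping takes 3 to 5 business days.",
    "Support is available on weekdays.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Stand-in for the black box: rank documents by words shared with the query."""
    q = tokenize(query)
    ranked = sorted(DOCS, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def classical_rag(question: str, llm: Callable[[str], str]) -> str:
    """Retrieve, paste into the context, answer. Same three steps every time."""
    passages = retrieve(question)
    context = "\n".join(f"- {p}" for p in passages)
    prompt = f"Answer from this context only:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```

Passing `llm=lambda prompt: prompt` echoes the assembled prompt, which is a quick way to inspect what the model would see.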

Classical vs agentic

	Classical RAG	Agentic RAG
Retrieval	Always, once, before answering	A tool the agent calls on judgment
Decides when to retrieve	No (fixed)	Yes
Can retrieve multiple times	No	Yes (refine and repeat)
Can judge if results suffice	No	Yes (re-search if weak)
Strength	Predictable	Adaptable

The three-question contrast

Q: "What is 15% of 240?"
  classical -> retrieves uselessly, answers      agentic -> no retrieval, "36"
Q: "What's our refund window?"
  classical -> retrieve once, answer             agentic -> retrieve once, answer
Q: "Compare 2023 and 2024 refund policies."
  classical -> retrieve once (half the picture)  agentic -> retrieve both years, compare
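
The contrast above comes down to how many retrievals each approach issues. The sketch below is a rule-based stand-in for the agent's judgment, which in practice is the model's own decision; the function names and the regex rules are hypothetical and tuned only to these three questions.

```python
import re

def classical_queries(question: str) -> list[str]:
    """Fixed pipeline: always exactly one retrieval, using the question itself."""
    return [question]

def agentic_queries(question: str) -> list[str]:
    """Toy stand-in for the agent's decision: zero, one, or several retrievals."""
    # Pure arithmetic needs no documents, so no retrieval is issued.
    if re.fullmatch(r"what is \d+(\.\d+)?% of \d+(\.\d+)?\?", question.lower()):
        return []
    # A question naming several years gets one retrieval per year.
    years = re.findall(r"\b20\d{2}\b", question)
    if len(years) >= 2:
        return [f"{year}: {question}" for year in years]
    # Otherwise a single retrieval is enough.
    return [question]

QUESTIONS = [
    "What is 15% of 240?",
    "What's our refund window?",
    "Compare 2023 and 2024 refund policies.",
]

# Number of retrievals each approach would issue, question by question.
classical_counts = [len(classical_queries(q)) for q in QUESTIONS]
agentic_counts = [len(agentic_queries(q)) for q in QUESTIONS]
```

With these rules the classical counts are 1, 1, 1 and the agentic counts are 0, 1, 2, matching the three rows above.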

Self-correcting retrieval

After a search returns, the agent judges “is this enough?” If not, it refines the query and searches again. This is the same self-correction the agent applies to tool failures in Lesson 2.

retrieve("overseas returns")   -> weak, generic
retrieve("international return policy eligibility") -> the actual clause -> answer
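
The trace above can be sketched as a small loop. This is an illustration under stated assumptions, not a reference implementation: `is_sufficient` and `refine` would normally be the model's own judgment, and the toy index, its placeholder passages, and the `max_attempts` cap are hypothetical.

```python
from typing import Callable

def retrieve_until_sufficient(
    query: str,
    retrieve: Callable[[str], list[str]],
    is_sufficient: Callable[[list[str]], bool],
    refine: Callable[[str, list[str]], str],
    max_attempts: int = 3,
) -> tuple[list[str], list[str]]:
    """Search, judge the results, refine and retry. Returns (passages, queries tried)."""
    tried: list[str] = []
    for _ in range(max_attempts):
        tried.append(query)
        passages = retrieve(query)
        if is_sufficient(passages):
            return passages, tried
        query = refine(query, passages)  # weak results: rewrite the query
    return [], tried  # gave up within the attempt cap

# Placeholder passages only; no real policy text is implied.
TOY_INDEX = {
    "overseas returns": ["Generic text about returns (placeholder)."],
    "international return policy eligibility": ["CLAUSE: placeholder for the actual clause."],
}

def toy_retrieve(query: str) -> list[str]:
    return TOY_INDEX.get(query, [])

def toy_is_sufficient(passages: list[str]) -> bool:
    return any(p.startswith("CLAUSE:") for p in passages)

def toy_refine(query: str, passages: list[str]) -> str:
    return "international return policy eligibility"

passages, tried = retrieve_until_sufficient(
    "overseas returns", toy_retrieve, toy_is_sufficient, toy_refine
)
```

Here `tried` holds both queries in order and `passages` holds the placeholder clause, mirroring the two-step trace.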

What it costs

Agency is not free. The agent can skip a retrieval it needed, retrieve when it should have just answered, or loop on a bad query. The fixed pipeline is predictable because it never decides. Use the pipeline when every question is served by the same single retrieval; use agentic RAG when questions differ in whether, and how often, they need retrieval.
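
One possible way to bound the looping cost is to wrap the retrieve tool in a guard. The sketch below is an assumption, not something the lessons prescribe: the class name, the call cap, and the choice to raise an error (rather than return a message the agent can react to) are all illustrative.

```python
from typing import Callable

class GuardedRetriever:
    """Wraps a retrieve function with a call cap and a repeated-query check."""

    def __init__(self, retrieve: Callable[[str], list[str]], max_calls: int = 3):
        self._retrieve = retrieve
        self._max_calls = max_calls
        self._seen: set[str] = set()
        self.calls = 0

    def __call__(self, query: str) -> list[str]:
        # Normalize case and whitespace so trivial rewordings count as repeats.
        key = " ".join(query.lower().split())
        if key in self._seen:
            raise RuntimeError(f"repeated query: {query!r}")
        if self.calls >= self._max_calls:
            raise RuntimeError("retrieval budget exhausted")
        self._seen.add(key)
        self.calls += 1
        return self._retrieve(query)
```

The guard does nothing about the other two costs (skipping a needed retrieval, retrieving needlessly); those depend on the agent's judgment and the tool description.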

It reuses earlier pieces

The loop (L1): retrieval is a perceive-decide-act move.
The tool call (L2): retrieval is one tool call.
The tool definition (L4): describe the retrieve tool well or it fires at wrong times.
Memory (L5): retrieval is often how persistent memory gets pulled into a run.
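
As an illustration of the tool-definition point, here is one way a retrieve tool might be described. Many model APIs accept a name, a description, and a JSON Schema for the parameters, but exact field names vary by provider, so treat this layout and the "company policy documents" corpus as assumptions.

```python
import json

# Too vague: gives the agent no signal about when to call the tool or when not to.
VAGUE_DESCRIPTION = "Searches documents."

# Hypothetical tool definition; field names differ between providers.
RETRIEVE_TOOL = {
    "name": "retrieve",
    "description": (
        "Search the company policy documents and return the most relevant passages. "
        "Use for questions about company-specific facts such as policies or terms. "
        "Do not use for arithmetic, general knowledge, or small talk. "
        "If results look weak, call again with a more specific query."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A focused search phrase, not the full user message.",
            },
        },
        "required": ["query"],
    },
}

# The definition is plain data, so it serializes directly for an API request.
tool_json = json.dumps(RETRIEVE_TOOL, indent=2)
```

The description states when to call, when not to call, and what to do on weak results, which are the three decisions agentic RAG hands to the agent.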

Pitfalls to dodge

Thinking “RAG” is one fixed thing (it covers both the pipeline and the agentic form).
Using agentic RAG when a fixed pipeline would do (one retrieval per question = pipeline).
Forgetting the agent can retrieve badly (skipping a needed search, searching needlessly, looping on a bad query).
Neglecting the retrieve tool’s description (L4 applies in full).
Treating retrieval quality (embeddings, chunking) as the agent’s job; it is a separate problem.

Words to use precisely

RAG: retrieval-augmented generation; pulling external text into the model’s context to answer.
Classical/static RAG: the fixed retrieve-read-answer pipeline.
Agentic RAG: retrieval as a tool the agent decides when to call, may call repeatedly, and whose results it judges.