Practice: Agents that retrieve their own information: agentic RAG

Self-check

Seven short questions. Answer each in your head before opening the collapsible. Active retrieval is where the learning sticks.

1. In one breath, what does RAG do?

Show answer

It retrieves the passages most relevant to a question from a body of text the model was not trained on, pastes those passages into the model’s context, and lets the model answer from them. Retrieval is what lets the model speak about information it never memorized.

2. What makes classical RAG a “fixed pipeline”?

Show answer

It runs the same three steps in the same order every time: retrieve, then read, then answer. Retrieval always happens, always once, always before the model speaks. There is no decision in it, so it cannot adapt.
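To make the fixed order concrete, here is a minimal sketch of a retrieve-read-answer pipeline. Everything in it is an assumption for illustration: the three-line corpus is made up, `retrieve` is a toy word-overlap ranker rather than a real index, and `generate` is a stand-in for a model call that simply echoes the retrieved passage.

```python
# Toy corpus standing in for "text the model was not trained on" (made-up data).
DOCS = [
    "Refunds are issued within 14 days of purchase.",
    "Support is open 9am to 5pm on weekdays.",
    "Shipping takes 3 to 5 business days.",
]

def _words(text: str) -> set[str]:
    """Lowercase word set with basic punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Hypothetical retriever: rank passages by words shared with the query."""
    q = _words(query)
    ranked = sorted(DOCS, key=lambda d: len(q & _words(d)), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stand-in for a model call: echoes the first passage in the prompt."""
    return prompt.split("\n")[1]

def classical_rag(question: str) -> str:
    passages = retrieve(question)                                    # 1. retrieve: always, once
    prompt = "Context:\n" + "\n".join(passages) + f"\nQuestion: {question}"  # 2. read
    return generate(prompt)                                          # 3. answer

print(classical_rag("When is support open?"))  # Support is open 9am to 5pm on weekdays.
print(classical_rag("Hello!"))                 # Refunds are issued within 14 days of purchase.
```

The second call is the point: a greeting needs no retrieval, but the pipeline retrieves anyway and answers from whatever came back, because nothing in it can choose otherwise.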

3. What is the single change that turns classical RAG into agentic RAG?

Show answer

Retrieval stops being a fixed first step and becomes a tool the agent decides whether and when to call inside its perceive-decide-act loop. The agent owns the retrieval decisions instead of following a predetermined path.
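A rough sketch of that change, under stated assumptions: `decide` is a hard-coded stand-in for the model's decision (a real agent would ask the LLM), `retrieve` is a hypothetical tool over a one-entry made-up knowledge base, and `run_agent` is an illustrative name, not an API from the lesson.

```python
# Made-up knowledge base: topic -> passage.
KB = {"refund": "Refunds are issued within 14 days of purchase."}

def retrieve(query: str) -> str:
    """Hypothetical retrieve tool: first passage whose topic appears in the query."""
    for topic, passage in KB.items():
        if topic in query.lower():
            return passage
    return ""

def decide(question: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the model's decision; returns (action, argument)."""
    if observations:                      # evidence already gathered: answer from it
        return ("answer", observations[-1])
    if "refund" in question.lower():      # needs company-specific facts: call the tool
        return ("retrieve", question)
    return ("answer", "Hello! How can I help?")  # small talk: skip retrieval

def run_agent(question: str, max_steps: int = 3) -> tuple[str, int]:
    """Perceive-decide-act loop; returns the answer and the number of retrievals."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = decide(question, observations)   # decide
        if action == "answer":
            return arg, len(observations)
        observations.append(retrieve(arg))             # act, then perceive the result
    return "I could not find an answer.", len(observations)

print(run_agent("hi there"))                    # ('Hello! How can I help?', 0)
print(run_agent("What is the refund window?"))  # ('Refunds are issued within 14 days of purchase.', 1)
```

Retrieval appears only as one branch of `decide`, so the same loop answers the greeting with zero tool calls and the refund question with one.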

4. Name the three behaviors agentic RAG unlocks that a fixed pipeline cannot do.

Show answer

Skipping retrieval when the question does not need it, retrieving more than once (refining the query each time) for multi-part questions, and judging whether a result is good enough and re-searching when it is weak.

5. The lesson says agentic RAG is “still just a tool.” What earlier pieces does it reuse?

Show answer

The loop (Lesson 1): retrieval is a perceive-decide-act move. The tool call (Lesson 2): retrieval is one tool call. The tool definition (Lesson 4): describe the retrieve tool well or it fires at the wrong times. Memory (Lesson 5): retrieval is often how persistent memory gets pulled into a run. Agentic RAG is those parts pointed at fetching information, not new machinery.
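As an illustration of the tool-definition point, here is one way a retrieve tool might be described. The field names (`name`, `description`, `input_schema`) are an assumption loosely modeled on JSON-Schema-style tool definitions; the exact shape differs between LLM providers, and `missing_args` is a hypothetical helper.

```python
# Hypothetical tool definition; field names vary between LLM providers.
RETRIEVE_TOOL = {
    "name": "retrieve",
    "description": (
        "Search the company's policy documents and return the most relevant passages. "
        "Use it when the question depends on company-specific facts. "
        "Do not use it for greetings or general knowledge."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "A focused search query."},
        },
        "required": ["query"],
    },
}

def missing_args(tool: dict, args: dict) -> list[str]:
    """Return the required arguments that a proposed tool call left out."""
    return [name for name in tool["input_schema"]["required"] if name not in args]

print(missing_args(RETRIEVE_TOOL, {"query": "2024 return policy"}))  # []
print(missing_args(RETRIEVE_TOOL, {}))                               # ['query']
```

The description carries the "when to fire" guidance: it says both when to call the tool and when not to, which is the part the agent reads when deciding.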

6. What does agentic RAG cost, compared to the fixed pipeline?

Show answer

Predictability. An agent that decides when to retrieve can decide wrong: skip a retrieval it needed, retrieve when it should have just answered, or loop on a query that was never going to work. The fixed pipeline is predictable precisely because it never decides. Agentic RAG trades that predictability for adaptability.
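One common way to contain the "loop on a query that was never going to work" failure is a retrieval budget. The sketch below is an assumption, not something the lesson prescribes: `retrieve` simulates an index with nothing relevant in it, `rewrite` is a stand-in for the agent refining its query, and `bounded_search` is an illustrative name.

```python
def retrieve(query: str) -> list[str]:
    """Hypothetical retrieve tool over an index that holds nothing relevant."""
    return []

def rewrite(query: str, attempt: int) -> str:
    """Stand-in for query refinement; it only ever produces two phrasings."""
    return f"{query} (rephrased {attempt % 2})"

def bounded_search(query: str, max_retrievals: int = 3) -> tuple[list[str], int]:
    """Retry retrieval, stopping on a hit, a repeated query, or an exhausted budget."""
    seen: set[str] = set()
    calls = 0
    current = query
    while calls < max_retrievals and current not in seen:
        seen.add(current)
        calls += 1
        passages = retrieve(current)
        if passages:
            return passages, calls
        current = rewrite(query, calls)
    return [], calls  # the caller should fall back, e.g. admit it found nothing

print(bounded_search("warranty terms", max_retrievals=2))   # ([], 2)
print(bounded_search("warranty terms", max_retrievals=10))  # ([], 3)
```

The second call stops at three even with a budget of ten, because the fourth query would repeat one already tried. The guard buys back some predictability without removing the agent's freedom to re-search.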

7. Someone says, “Our agentic RAG keeps answering from irrelevant passages, so the agent’s reasoning must be broken.” What is a likelier cause?

Show answer

The retrieval black box. How well retrieval finds relevant text (the embeddings, the chunking, the index) is a separate problem from the agent’s decisions. A great agent over a bad index still answers from bad passages. Retrieval quality is not the agent’s job; fix the index before blaming the loop.
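Because retrieval quality is a separate problem, it can be measured separately, with no agent in the loop. A minimal sketch, assuming a made-up three-passage corpus, a toy word-overlap retriever, and a small hand-labeled set of queries: it reports the share of queries whose expected passage appears in the top k results.

```python
# Made-up corpus: passage id -> text.
DOCS = {
    "d1": "Refunds are issued within 14 days of purchase.",
    "d2": "Support is open 9am to 5pm on weekdays.",
    "d3": "Shipping takes 3 to 5 business days.",
}

# Hand-labeled pairs: (query, id of the passage that should come back).
LABELED = [
    ("refunds purchase days", "d1"),
    ("support open weekdays", "d2"),
    ("how long does delivery take", "d3"),  # shares no words with d3
]

def _words(text: str) -> set[str]:
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query: str, k: int) -> list[str]:
    """Toy retriever: passage ids ranked by words shared with the query."""
    q = _words(query)
    ranked = sorted(DOCS, key=lambda doc_id: len(q & _words(DOCS[doc_id])), reverse=True)
    return ranked[:k]

def hit_rate(k: int) -> float:
    """Fraction of labeled queries whose expected passage is in the top k."""
    hits = sum(1 for query, expected in LABELED if expected in retrieve(query, k))
    return hits / len(LABELED)

print(f"{hit_rate(1):.2f}")  # 0.67
```

The miss on the delivery question comes entirely from the retriever (no shared vocabulary with the shipping passage). No amount of agent reasoning fixes that; a better index does.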

Try it yourself: fixed pipeline or agentic RAG?

No tooling, no cost; this is design judgment. For each job, decide whether a fixed retrieve-read-answer pipeline is enough or whether retrieval should be a decision the agent owns, and say why in one line. Then check.

A. An FAQ bot that answers each question from exactly one help-center article.
B. A research assistant that gathers facts from several sources, notices what
   is still missing, and goes back for more.
C. A bot that greets users and answers "what are your hours?" from a single
   stored line.
D. A support agent that may check one policy document or cross-reference three,
   depending on the question asked.

Show answer

A: fixed pipeline. Every question needs exactly one retrieval. The pipeline is simpler and more predictable; agency would add cost for nothing.
B: agentic RAG. The work is inherently several retrievals deep, and how deep is not known in advance. The agent should decide when it has enough and when to go back.
C: fixed pipeline (or no retrieval at all). A single stored line, one fixed lookup. The simplest path serves.
D: agentic RAG. The number of lookups varies with the question, so a fixed pipeline either over-fetches or comes up short. Let the agent decide how many to do.

The test is variable depth: when you cannot say ahead of time how many retrievals a question needs, let the agent decide. When every question needs exactly one (or none), the fixed pipeline wins.

Try it yourself: design the retrieval decisions

For the question below, write out the retrieval decisions an agentic-RAG agent should make, the way the lesson did for the refund-policy comparison. Assume the agent has a retrieve(query) tool over the company’s policy documents.

Question: “Did our return policy change between 2023 and 2024, and if so, how?”

Show a sample trace

DECIDE:   Two years to compare, so one retrieval is not enough.
ACT:      retrieve("2023 return policy") -> 2023 clause
ACT:      retrieve("2024 return policy") -> 2024 clause
JUDGE:    Are both specific enough to compare? If one came back vague,
          re-search with a sharper query before answering.
DECIDE:   Both clauses are usable; compare them.
ACT:      answer by naming what changed (or that nothing did)

What matters: you saw that the question needed two targeted retrievals, not one; you left room to judge whether each result was good enough and re-search if weak; and the agent, not a fixed pipeline, decided how many lookups the question demanded. A classical pipeline would have retrieved once and answered from half the picture.
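The same decisions can be sketched in code. This is an illustration under loud assumptions: the index contents and day counts are invented for the example, `retrieve` is a dictionary lookup standing in for the real tool, `specific_enough` is a crude stand-in for the agent's judgment, and the "sharper query" is a fixed suffix rather than something a model wrote.

```python
import re

# Made-up index keyed by query text; the clauses and numbers are invented.
INDEX = {
    "2023 return policy": "Returns accepted.",  # vague: no window stated
    "2023 return policy window in days": "2023: returns accepted within 30 days.",
    "2024 return policy": "2024: returns accepted within 14 days.",
}

def retrieve(query: str) -> str:
    """Hypothetical retrieve tool: exact-match lookup in the toy index."""
    return INDEX.get(query, "")

def specific_enough(clause: str) -> bool:
    """Stand-in for the JUDGE step: does the clause state a window in days?"""
    return re.search(r"\d+ days", clause) is not None

def compare_return_policy(year_a: int, year_b: int) -> tuple[str, list[str]]:
    """Return the answer plus the list of queries the agent issued."""
    queries: list[str] = []
    days: list[int] = []
    for year in (year_a, year_b):
        query = f"{year} return policy"
        queries.append(query)
        clause = retrieve(query)                      # ACT: one targeted retrieval
        if not specific_enough(clause):               # JUDGE: weak result?
            query = f"{query} window in days"         # re-search with a sharper query
            queries.append(query)
            clause = retrieve(query)
        match = re.search(r"(\d+) days", clause)
        if match is None:
            return f"No specific clause found for {year}.", queries
        days.append(int(match.group(1)))
    if days[0] == days[1]:
        return f"No change: {days[0]} days in both years.", queries
    return f"Changed: {days[0]} days in {year_a}, {days[1]} days in {year_b}.", queries

answer, queries = compare_return_policy(2023, 2024)
print(answer)        # Changed: 30 days in 2023, 14 days in 2024.
print(len(queries))  # 3
```

Three retrievals ran, not two: the first 2023 result was too vague to compare, so the agent searched again before answering. A fixed pipeline has no place to put that third call.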

Flashcards

Ten cards.

Q. What is RAG, in one breath?

Retrieval-augmented generation: retrieve the passages most relevant to a question from text the model was not trained on, paste them into the model’s context, and answer from them.

Q. What is classical RAG?

The fixed retrieve-read-answer pipeline. Retrieval always happens, always once, always before the model answers. There is no decision in it.

Q. What is agentic RAG?

Retrieval as a tool the agent decides whether and when to call inside its loop. The agent chooses whether to retrieve, when, how many times, and whether the result is good enough.

Q. What single change separates classical RAG from agentic RAG?

Retrieval goes from a fixed first step to a decision the agent owns. Everything else follows from that one change.

Q. Name the three behaviors agentic RAG unlocks.

Skip retrieval when it is not needed, retrieve multiple times for multi-part questions, and judge whether a result is good enough and re-search when it is weak.

Q. What does “self-correcting retrieval” mean?

After a search returns, the agent asks ‘is this enough to answer well?’ If the passages are off-topic or thin, it refines the query and searches again before answering, instead of committing to a weak first result.

Q. What does agentic RAG cost compared to a fixed pipeline?

Predictability. An agent that decides when to retrieve can decide wrong (skip, over-fetch, or loop). The fixed pipeline never decides, so it never decides wrong.

Q. When should you use a fixed pipeline instead of agentic RAG?

When every question needs exactly one retrieval. The pipeline is simpler and more predictable. Agency earns its keep only when the number of retrievals varies with the question.

Q. Why is agentic RAG described as “still just a tool”?

It is built entirely from earlier pieces: the loop (Lesson 1), the tool call (Lesson 2), the tool definition (Lesson 4), and memory (Lesson 5). It is those parts pointed at fetching information, not new machinery.

Q. Whose job is retrieval quality (the embeddings, the chunking)?

Not the agent’s. How well the retrieval black box finds relevant text is a separate problem. A great agent over a bad index still answers from bad passages.