Agents that self-check: metacognition

This is lesson 9 of Track 20 (AI Agents and Tool Use) and the final lesson of Phase 2, “The design patterns that make agents work.” The previous lesson raised reliability by adding more agents and paying the coordination cost. This lesson raises it with a cheaper move: have one agent check its own work before it commits.

That is metacognition, a long word for thinking about your own thinking. For an agent it means inserting a step where, instead of acting on its first answer, it pauses, asks whether the answer or plan is actually right, and revises if it is not. You will learn four things: why a reflection step works (the same reason editing improves writing); the forms it takes (critiquing an answer, reviewing a plan before executing, verifying a result against criteria); how it is the proactive, general form of the reactive self-correction you saw with tool failures, weak retrievals, and replanning; and why it is the cheap reliability move compared to adding a reviewer agent. The lesson stays honest about the limits: a second look is not a guarantee, it has diminishing returns, and it works best when paired with a real external signal.
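For readers who find code helpful (it is optional for this lesson), the draft, critique, revise step described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `reflect_and_revise`, `toy_draft`, `toy_critique`, and `toy_revise` are hypothetical names, and the toy functions stand in for what would be model calls in a real agent. The cap on rounds reflects the diminishing returns of repeated self-review, and the toy critique recomputes the sum to mimic an external signal.

```python
from typing import Callable, Optional, Tuple


def reflect_and_revise(
    task: str,
    draft: Callable[[str], str],
    critique: Callable[[str, str], Optional[str]],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 2,
) -> Tuple[str, int]:
    """Draft an answer, then critique and revise it at most max_rounds times.

    Returns the final answer and the number of revisions that were made.
    """
    answer = draft(task)
    revisions = 0
    for _ in range(max_rounds):
        problem = critique(task, answer)  # None means "no problem found"
        if problem is None:
            break  # the check passed, so commit to this answer
        answer = revise(task, answer, problem)
        revisions += 1
    return answer, revisions


# Hypothetical stand-ins for model calls, kept deterministic for illustration.
def toy_draft(task: str) -> str:
    return "The total is 40"  # a deliberately wrong first answer


def toy_critique(task: str, answer: str) -> Optional[str]:
    expected = 19 + 23  # recomputing acts as a simple external signal
    if str(expected) in answer:
        return None
    return f"the total should be {expected}"


def toy_revise(task: str, answer: str, problem: str) -> str:
    return "The total is 42"


final_answer, revision_count = reflect_and_revise(
    "Add 19 and 23", toy_draft, toy_critique, toy_revise
)
print(final_answer, revision_count)
```

Note that if the critique never returns `None`, the loop still stops after `max_rounds`, so the function may return an answer that did not pass the check. A real agent would need to decide what to do in that case, for example escalating to a human or a separate reviewer.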

The track structurally mirrors Microsoft’s “AI Agents for Beginners” (MIT-licensed), with the Berkeley CS294 LLM Agents course as a depth reference. Full attribution is in this lesson’s references.

This lesson answers the reliability question the previous one opened. Multi-agent systems buy reliability through specialization but pay a coordination cost; reflection buys it through a single extra reasoning pass with no new coordination, so the honest guidance is to try reflection before adding an agent. It also gathers threads from across the track: the tool-failure recovery, weak-retrieval re-search, and replanning seen earlier were all special cases of self-correction, and reflection is their general form. With Phase 2 complete, the next phase changes the question from how to make an agent capable to how to make one you can trust and ship, beginning with the lesson on trustworthy agents.

Prerequisites: the earlier lessons in the track, especially Many agents working together (the immediately prior lesson; reflection is weighed directly against its add-an-agent reliability move) and Planning (reviewing a plan before executing is one form of reflection, and replanning was a reactive case of self-correction). You do not need to code. If you understand an agent as a model in a loop, reflection is just one more step in that loop.

  • Explain how a reflection step raises an agent’s reliability, using the editing analogy
  • Recognize the forms reflection takes (critique an answer, review a plan, verify a result against criteria)
  • Connect reflection to the reactive self-correction seen in earlier lessons as its proactive, general form
  • Decide between a reflection step and a second reviewer agent for a given reliability problem
  • State the honest limits of self-review (blind-spot errors, diminishing returns, the value of an external signal)
  • Read time: about 10 minutes
  • Practice time: about 15 minutes (a self-check, two applied design exercises, and flashcards)
  • Difficulty: standard