Agents that self-check: metacognition

The last lesson made agents more reliable by adding more of them: split the work across specialists and let each do its piece well. That works, but it is expensive, and the bill is coordination. Before you reach for a second agent, there is a cheaper move that often buys more reliability per unit of effort: have one agent check its own work before it commits.

That is metacognition, which is just a long word for thinking about your own thinking. For an agent it means inserting a step where, instead of acting on its first answer, it pauses and asks: is this actually right? Did I miss something? Is my approach sound? Then it revises if the check turns up a problem. This lesson is about that reflection step, why it raises reliability, and how it compares to the heavier machinery of the last lesson.

By the end you will be able to explain how a self-check step makes an agent more reliable, and you will see that several patterns from earlier in the track were special cases of this one idea.

A first draft is rarely the best draft

The reason reflection works is the same reason editing works for human writing. A first attempt is usually good but flawed: the broad shape is right, but there is a missed case, a wrong assumption, a step taken too fast. If you commit the first attempt, the flaw ships. If you read it back with a critical eye before committing, you catch a good fraction of those flaws.

Language models behave similarly. Asked for an answer in one shot, a model produces a plausible first draft. Asked to review that draft (“check this for errors, consider what you might have missed”), it frequently finds real problems and fixes them. The model that generates and the model that reviews are the same model; the difference is the stance. Generating reaches for a plausible answer. Reviewing looks for what is wrong with the answer already on the page. Those are different jobs, and doing them in sequence beats doing only the first.
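The generate-then-review sequence can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `Model`, `answer_with_reflection`, and `stub_model` are hypothetical names, the prompts are placeholders, and the stub stands in for a real model call so the sketch runs on its own.

```python
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns text.
Model = Callable[[str], str]


def answer_with_reflection(task: str, model: Model) -> str:
    """Draft, review the draft critically, and revise only if the review finds a problem."""
    # Stance 1: generate a plausible first answer.
    draft = model(f"Task: {task}\nAnswer the task.")

    # Stance 2: same model, but now looking for what is wrong with the draft.
    review = model(
        f"Task: {task}\nDraft answer: {draft}\n"
        "Check this for errors and consider what might have been missed. "
        "Reply 'OK' if nothing needs to change; otherwise list the problems."
    )
    if review.strip().upper() == "OK":
        return draft

    # The review found something, so revise before committing.
    return model(
        f"Task: {task}\nDraft answer: {draft}\nProblems found: {review}\n"
        "Write a revised answer that fixes these problems."
    )


def stub_model(prompt: str) -> str:
    """Stand-in for a real model (an assumption made so the sketch is runnable)."""
    if "Problems found" in prompt:
        return "revised answer"
    if "Draft answer" in prompt:
        return "The draft ignores the layover length."
    return "first draft"


print(answer_with_reflection("cheapest flight NYC to Tokyo", stub_model))
```

The one model is called in two roles. Only the prompt changes between drafting and reviewing, which is the point of the paragraph above.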

Worked example: an agent catching its own mistake

Illustrative fares; the numbers are made up to show the reflection step in action.

TASK: "What is the cheapest way to fly NYC to Tokyo next month?"

WITHOUT reflection:
  AGENT -> searches, finds a $612 fare, answers "$612 via one stop."

WITH a reflection step:
  AGENT -> drafts the same "$612" answer
  REFLECT: "I optimized for price only. The user said cheapest, but I
            should confirm this is a real bookable fare and note the long
            layover, which they may care about."
  AGENT -> re-checks, finds the $612 fare has a 14-hour layover, surfaces
           a $680 option with a 2-hour layover, presents both with the
           tradeoff.

The first answer was not wrong, exactly. It was thin. The reflection step caught that thinness before the user saw it. That is the everyday value of a self-check: it turns a defensible-but-shallow first pass into a better-considered final one.
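To make the example concrete, here is a small sketch of the same reflection written as ordinary code, using the two illustrative fares from above. The `Fare` class, the function names, and the 4-hour layover threshold are assumptions chosen for the illustration; a real agent would do this check with a model and a live fare search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fare:
    price_usd: int
    layover_hours: int


def first_pass(fares):
    """The unreflective answer: optimize for price only."""
    return min(fares, key=lambda f: f.price_usd)


def reflect(choice, fares, max_layover_hours=4):
    """Re-check the draft choice; if its layover is long, surface a shorter alternative too.

    The 4-hour default is an arbitrary threshold picked for this sketch.
    """
    options = [choice]
    if choice.layover_hours > max_layover_hours:
        shorter = [f for f in fares if f.layover_hours <= max_layover_hours]
        if shorter:
            options.append(min(shorter, key=lambda f: f.price_usd))
    return options


# The made-up fares from the worked example.
fares = [Fare(price_usd=612, layover_hours=14), Fare(price_usd=680, layover_hours=2)]

draft = first_pass(fares)      # the $612 fare
final = reflect(draft, fares)  # the $612 fare plus the $680 alternative

for fare in final:
    print(f"${fare.price_usd} with a {fare.layover_hours}-hour layover")
```

The draft is kept, not thrown away. Reflection adds the tradeoff the first pass left out and lets the user decide.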

What reflection looks like in practice

Reflection is not one fixed move; it shows up at a few different points in an agent’s work.

Critiquing an answer. After drafting a response, the agent reviews it for errors and gaps before sending it. This is the form in the example above.
Reviewing a plan before executing. The agent produces a plan (Lesson 7), then checks it for missing steps or bad ordering before acting on any of it, which is far cheaper than discovering the flaw five steps in.
Verifying a result against criteria. After producing something checkable (a calculation, a piece of code, a structured output), the agent tests it against what a correct result should satisfy, and fixes it if it falls short.

What unites them is the stance: a deliberate pause to look for what is wrong, inserted before the agent commits. Where you put that pause depends on where the costly mistakes happen.
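The plan-review form is the easiest to show without a model at all. The sketch below assumes a hypothetical plan format in which each step names the steps it depends on; `Step` and `review_plan` are invented for this illustration and do not come from any framework.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    needs: set = field(default_factory=set)  # names of steps that must run first


def review_plan(plan):
    """Return a list of problems found by reading the plan before executing any of it."""
    problems = []
    done = set()
    known = {step.name for step in plan}
    for step in plan:
        for dep in sorted(step.needs - done):
            if dep in known:
                # The dependency exists but is scheduled too late: bad ordering.
                problems.append(f"'{step.name}' runs before '{dep}'")
            else:
                # Nothing in the plan provides it: a missing step.
                problems.append(f"'{step.name}' needs missing step '{dep}'")
        done.add(step.name)
    return problems


plan = [
    Step("book_flight", {"compare_fares"}),
    Step("search_fares"),
    Step("compare_fares", {"search_fares"}),
]

for problem in review_plan(plan):
    print(problem)
```

An empty list means the review found nothing to fix. Any entry is a reason to reorder or extend the plan before acting, while the fix is still cheap.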

You have seen this before

Reflection is not a brand-new idea in this track. It is the general form of a move that has shown up in pieces already.

In Lesson 2 (tool use), when a tool call failed, the agent read the error and tried again. That was self-correction triggered by a failure.
In Lesson 6 (agentic RAG), when a retrieval came back weak, the agent judged it insufficient and searched again with a better query. Self-correction triggered by a poor result.
In Lesson 7 (planning), when a step’s outcome contradicted the plan, the agent replanned. Self-correction triggered by a surprise.

Each of those was the agent noticing something had gone wrong and adjusting. Metacognition names the deliberate, general version: a check-my-work step the agent runs on purpose, even when nothing has visibly failed. The earlier cases were reactions to external signals (an error, a weak result, a contradiction). Reflection is proactive: the agent interrogates its own output before any signal says it must.

Why this is the cheap reliability move

Here is the comparison the last lesson sets up. To make an agent more reliable, you can add a second agent (a reviewer, a checker) and pay the coordination cost: another agent to build, a handoff to manage, more latency. Or you can have the one agent reflect, which costs a single extra reasoning pass and no new coordination at all.

For a large fraction of reliability problems, reflection is the better trade. It is cheaper, it has no seams to lose information across, and it is trivial to add: one more step in the loop. The honest guidance from the last lesson applies in reverse here. Before you reach for another agent to improve reliability, try having the one agent check itself. Add the second agent when reflection is genuinely not enough, not as the first move.

The honest limit: a second look is not a guarantee

Reflection helps, but it is not magic, and the source material on agent metacognition tends not to say so, which is worth correcting. A model that is confidently wrong on the first pass can be just as confidently wrong on the review; asked to check its own work, it may simply rubber-stamp the same mistake. Reflection reliably catches the errors a fresh, critical reading would catch: the thin answers, the missed cases, the skipped checks. It does not reliably catch errors the model cannot see in its own output, just as you cannot proofread a fact you do not know is wrong.

Two practical consequences follow. First, reflection has diminishing returns: one good review pass catches most of what review will catch; a third and fourth pass mostly burn tokens and time. Second, reflection pairs well with external checks (running the code, verifying against a source, getting a result back from a tool) because those bring in information the model did not already have. The strongest self-check combines the model’s own review with a real signal from the world.
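Both consequences fit in one small loop: cap the number of review rounds, and drive each round with an external signal, not the model's opinion of itself. The sketch below is an illustration under stated assumptions. `refine_with_signal`, `check`, and `stub_revise` are hypothetical names, the cap of two rounds is arbitrary, and the stub stands in for a model that is handed the error message. The check uses `exec` on a trusted string for brevity; real agent-written code should run in a sandbox.

```python
from typing import Callable, Optional, Tuple


def refine_with_signal(
    draft: str,
    check: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 2,
) -> Tuple[str, bool]:
    """Revise a draft using an external check, stopping after a small fixed number of rounds."""
    candidate = draft
    for _ in range(max_rounds):
        error = check(candidate)  # the real signal from outside the model
        if error is None:
            return candidate, True
        candidate = revise(candidate, error)
    # Out of rounds: report honestly whether the last candidate passes.
    return candidate, check(candidate) is None


def check(source: str) -> Optional[str]:
    """External check: actually run the code and test one known case."""
    namespace = {}
    try:
        exec(source, namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"


def stub_revise(source: str, error: str) -> str:
    """Stand-in for a model revising its code after seeing the error (an assumption)."""
    return source.replace("a - b", "a + b")


buggy = "def add(a, b):\n    return a - b\n"
fixed, passed = refine_with_signal(buggy, check, stub_revise)
print(passed)
print(fixed)
```

The returned flag matters as much as the text. When the rounds run out and the check still fails, the caller learns that the result is unverified and is not handed a confident-looking answer.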

Common pitfalls

Trusting reflection to catch everything. A self-check catches the errors a critical re-read would catch, not the ones the model is blind to. It lowers the error rate; it does not zero it.
Reflecting endlessly. One good review pass does most of the work. Stacking many reflection rounds mostly adds cost and latency for shrinking returns.
Reaching for a second agent first. Reflection is the cheaper reliability move. Try it before paying the coordination cost of adding a reviewer agent.
Reflecting without a real signal when one is available. If you can run the code, query the source, or check the tool result, that external signal beats pure self-review. Combine them.
Confusing fluent self-justification with checking. A model can produce a confident-sounding rationale for a wrong answer. Useful reflection looks for what is wrong, not for reasons the answer is fine.

What you should remember

Metacognition is an agent thinking about its own thinking: a deliberate reflection step where it reviews its answer or plan before committing, and revises if the review finds a problem.
It works for the same reason editing works: a first draft is usually good but flawed, and a critical re-read catches a real fraction of the flaws before they ship.
You have seen special cases already: tool-failure recovery (Lesson 2), re-searching after a weak retrieval (Lesson 6), and replanning (Lesson 7) were all reactive self-correction. Reflection is the proactive, general form.
It is the cheap reliability move. A reflection step costs one extra reasoning pass; a reviewer agent costs coordination. Try reflection before adding an agent.
A second look is not a guarantee. Reflection catches what a critical re-read catches, not what the model is blind to. It has diminishing returns, and it pairs best with a real external signal (run the code, check the source).

This lesson closes Phase 2, the design patterns that make agents work: tools, frameworks, memory, retrieval, planning, multiple agents, and now self-checking. The next phase changes the question. So far we have asked how to make an agent capable. The remaining lessons ask how to make an agent you can trust and ship: how it fails, how it is attacked, and how it goes to production. That is a shift from building agents that work to building agents that are safe to put in front of real users, and the next lesson, on trustworthy agents, begins it.