Skip to content

Summary: Agents that self-check: metacognition

Metacognition is an agent thinking about its own thinking: a reflection step where it checks its own work before committing. The previous lesson raised reliability by adding more agents and paying the coordination cost. This lesson raises it with a cheaper move: one agent reviews its own answer or plan, asks whether it is actually right, and revises if not. That single extra reasoning pass buys a lot of reliability per unit of effort. This summary is the scan-in-five-minutes version of the full lesson.

  • Reflection works for the same reason editing works. A first draft is usually good but flawed (a missed case, a wrong assumption, a step taken too fast). Reading it back critically before committing catches a real fraction of those flaws. The model that generates and the model that reviews are the same model; the difference is the stance. Generating reaches for a plausible answer; reviewing looks for what is wrong with the answer already on the page.
  • Reflection takes a few forms, all sharing one stance (a deliberate pause to look for what is wrong, before committing): critiquing an answer before sending it, reviewing a plan before executing it, and verifying a checkable result (code, a calculation, structured output) against the criteria a correct result must satisfy. Put the pause where the costly mistakes happen.
  • You have seen special cases already. Tool-failure recovery (L2), re-searching after a weak retrieval (L6), and replanning when a step contradicts the plan (L7) were all self-correction triggered by an external signal. Reflection is the proactive, general form: a check-my-work step the agent runs on purpose, even when nothing has visibly failed.
  • It is the cheap reliability move. To make an agent more reliable you can add a reviewer agent and pay coordination cost (another agent, a handoff, more latency), or you can have the one agent reflect, which costs a single extra reasoning pass and no new coordination. For a large fraction of reliability problems, reflection is the better trade. Try it before reaching for a second agent.
  • A second look is not a guarantee. A model confidently wrong on the first pass can be just as confidently wrong on review, rubber-stamping the same mistake. Reflection catches what a critical re-read catches, not what the model is blind to; you cannot proofread a fact you do not know is wrong.
  • Two practical consequences. Reflection has diminishing returns (one good pass does most of the work; more passes mostly burn tokens), and it pairs best with a real external signal (run the code, check the source, read the tool result) that brings in information the model did not already have.

Before this lesson, “the agent double-checks itself” sounded like a vague reassurance. Now you can see the mechanism and its price: a reflection step is one extra reasoning pass that catches the errors a critical re-read would catch, no more and no less. When you want an agent to be more reliable, you have a clear first move (have it check its own work) and a clear test for when that is not enough (does the error need information or independence the agent does not have?). And you can spot the honest limit the marketing skips: a self-check lowers the error rate, but it does not zero it, and it is strongest when paired with a real signal from the world.