Agents that self-check: cheatsheet

The one idea

Metacognition is an agent thinking about its own thinking: a deliberate reflection step where it reviews its answer or plan before committing, and revises if the review finds a problem.

Why it works

A first draft is usually good but flawed (a missed case, a wrong assumption, a rushed step). A critical re-read catches a meaningful share of those flaws before they ship. Generating reaches for a plausible answer; reviewing looks for what is wrong with the answer already on the page. Same model, different stance, run in sequence.
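A minimal sketch of "same model, different stance, run in sequence". The `model` callable (prompt in, text out), the prompt wording, and the `OK` convention are assumptions made for illustration, not a specific library's API.

```python
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns text.
Model = Callable[[str], str]


def answer_with_reflection(model: Model, task: str) -> str:
    """Generate a draft, review it critically, and revise only if needed."""
    # Stance 1: generate. Reach for a plausible answer.
    draft = model(f"Task: {task}\nGive your best answer.")

    # Stance 2: review. Look for what is wrong with the answer on the page.
    critique = model(
        f"Task: {task}\nDraft answer: {draft}\n"
        "List concrete problems with this draft (missed cases, wrong "
        "assumptions, rushed steps). Reply with exactly OK if you find none."
    )
    if critique.strip() == "OK":
        return draft  # the review found nothing, so commit the draft

    # Revise once, using the critique as input.
    return model(
        f"Task: {task}\nDraft answer: {draft}\nProblems found: {critique}\n"
        "Write a revised answer that fixes these problems."
    )
```

The review prompt asks for problems rather than for confirmation, so the second call is pushed toward checking instead of justifying the draft.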

Worked example

Illustrative numbers, used to show the reflection step in action.

TASK: cheapest flight NYC -> Tokyo next month.
WITHOUT reflection: "$612 via one stop." (thin)
WITH reflection: "I optimized for price only; confirm bookable, note layover."
  -> re-checks: $612 has a 14h layover; surfaces $680 with 2h; presents both.

The first answer was not wrong, just thin. The self-check caught that before the user saw it.
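The same example as a small sketch, using the two illustrative fares above. The `Flight` type, the function names, and the 4-hour layover threshold are assumptions made for illustration; a real agent would do the review step with a model call rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    price_usd: int
    layover_hours: int


def first_pass(flights: list[Flight]) -> Flight:
    """The draft answer: optimize for price only."""
    return min(flights, key=lambda f: f.price_usd)


def reflect(choice: Flight, flights: list[Flight],
            max_layover_hours: int = 4) -> list[Flight]:
    """The review step: ask what the price-only answer ignored."""
    if choice.layover_hours <= max_layover_hours:
        return [choice]  # nothing to flag, so keep the draft
    shorter = [f for f in flights if f.layover_hours <= max_layover_hours]
    if not shorter:
        return [choice]  # no better alternative to surface
    # Present both: the cheapest overall and the cheapest with a short layover.
    return [choice, min(shorter, key=lambda f: f.price_usd)]


flights = [Flight(612, 14), Flight(680, 2)]  # illustrative numbers
for option in reflect(first_pass(flights), flights):
    print(f"${option.price_usd}, {option.layover_hours}h layover")
# prints "$612, 14h layover" then "$680, 2h layover"
```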

You have seen this before

Reflection generalizes the self-correction that appeared piecemeal in earlier lessons:

Lesson	Trigger	Self-correction
L2 (tool use)	Tool call failed	read error, retry
L6 (agentic RAG)	Weak retrieval	judge insufficient, re-search
L7 (planning)	Step contradicts plan	replan
L9 (this lesson)	(none needed)	proactive check-my-work step

L2/L6/L7 reacted to an external signal. Reflection is proactive: the agent interrogates its own output before any signal says it must.

The cheap reliability move

Add a reviewer agent	Add a reflection step
Another agent + a handoff + latency (L8 coordination cost)	One extra reasoning pass, no new coordination

Try reflection before adding a second agent. Add the agent only when reflection is genuinely not enough.

The honest limit

A second look is not a guarantee.

A model confidently wrong on the first pass can rubber-stamp the same error on review.
Reflection catches what a critical re-read catches (thin answers, missed cases), not what the model is blind to.
Diminishing returns: one good pass does most of the work; more passes mostly burn tokens.
Reflection pairs best with a real external signal (run the code, check the source, read the tool result), which brings in information the model did not already have.
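A sketch of what "run the code" can look like as an external signal. The `external_check` helper and the `draft_abs` candidate are hypothetical names for illustration; the point is that the failure messages come from execution, not from the model re-reading its own draft.

```python
from typing import Callable, Iterable


def external_check(candidate: Callable[[int], int],
                   cases: Iterable[tuple[int, int]]) -> list[str]:
    """Run the candidate on known (input, expected) pairs; return failures."""
    failures = []
    for arg, expected in cases:
        try:
            got = candidate(arg)
        except Exception as exc:  # a crash is also a signal
            failures.append(f"f({arg}) raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"f({arg}) returned {got}, expected {expected}")
    return failures


# A confidently wrong draft of "absolute value" that a re-read might
# rubber-stamp, because it looks fine for positive inputs.
def draft_abs(x: int) -> int:
    return x


print(external_check(draft_abs, [(3, 3), (-3, 3)]))
# prints ['f(-3) returned -3, expected 3']
```

An empty list means the check found nothing. A non-empty list is new information that can be passed into the revision step.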

Pitfalls to dodge

Trusting reflection to catch everything (it lowers the error rate; it does not bring it to zero).
Reflecting endlessly (one good pass; the rest is cost).
Reaching for a second agent first (reflection is cheaper).
Skipping a real external signal when one is available.
Confusing fluent self-justification with checking (look for what is wrong, not reasons it is fine).
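One way to guard against endless reflection is a hard budget with an early stop. A minimal sketch, assuming hypothetical `review` and `revise` callables: `review` returns a problem description or `None`, and `revise` returns a new answer.

```python
from typing import Callable, Optional

# Hypothetical interfaces for illustration.
Review = Callable[[str], Optional[str]]  # problem description, or None
Revise = Callable[[str, str], str]       # (answer, problem) -> new answer


def reflect_bounded(draft: str, review: Review, revise: Revise,
                    max_passes: int = 1) -> tuple[str, int]:
    """Run at most max_passes review passes; stop early on a clean review."""
    answer = draft
    passes = 0
    for _ in range(max_passes):
        problem = review(answer)
        passes += 1
        if problem is None:
            break  # clean review: further passes would only add cost
        answer = revise(answer, problem)
    return answer, passes
```

The default of one pass reflects the "one good pass" rule of thumb. Raising `max_passes` is a deliberate cost decision, not a default.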

Words to use precisely

Metacognition / reflection: an agent reviewing its own output or plan before committing.
Self-correction: adjusting after noticing something is wrong (reactive in L2/L6/L7, proactive in reflection).
External signal: information from outside the model (a run result, a source) that a self-check can verify against.