
How agent loops work

This lesson closes Phase 6, How models reason and act, in Track 5 (AI Foundations). The previous lesson covered function calling: a single round trip in which the LLM emits a structured call, code runs it, and the LLM formats the result. This lesson covers what changes when one round trip is not enough. An agent is a tool-using LLM that loops: it observes the current state, plans the next step, takes an action (often a tool call), sees what happened, and repeats until the goal is met. This combination of iteration and reasoning is what people mean when they call something an “AI agent.” We cover the observe-plan-act pattern (and its naming variants from the ReAct paper); a worked teddy-bear-temperature example traced through the loop; multi-agent systems and the agent-to-agent (A2A) protocol that Google released in 2025; the cumulative-error multiplier that limits how far long-horizon agents can go; and the safety threads woven through every agentic system (data exfiltration, prompt injection, and tool misuse), along with the training-stage and inference-stage remediations that defend against them. Course materials are at cme295.stanford.edu.

The previous lessons in this phase covered reasoning models (long internal reasoning chains as part of the policy), RAG (fetching unstructured text), and function calling (one structured tool call). This lesson combines them: an agent uses tools (function calls), often runs reasoning-model-style chains between calls, and may fetch documents (RAG) along the way. The loop is what makes it agentic. After this lesson, Phase 6 is complete. Phase 7 covers how the field evaluates all of this, where the frontier is heading, and a safety recap that closes out the track.
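As a rough preview, the observe-plan-act loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the lesson's worked example: the two tools (`find_teddy_bear`, `get_room_temperature`), their return values, and the scripted `plan` function standing in for the LLM are all hypothetical, chosen only to make the control flow visible.

```python
from typing import Any, Callable

# Hypothetical tools; the names and return values are invented for illustration.
def find_teddy_bear() -> str:
    return "living room"

def get_room_temperature(room: str) -> float:
    return 21.5

TOOLS: dict[str, Callable[..., Any]] = {
    "find_teddy_bear": find_teddy_bear,
    "get_room_temperature": get_room_temperature,
}

def plan(goal: str, history: list[dict]) -> dict:
    """Scripted stand-in for the LLM: choose the next action from what was observed so far."""
    if len(history) == 0:
        return {"type": "tool", "name": "find_teddy_bear", "args": {}}
    if len(history) == 1:
        room = history[0]["result"]
        return {"type": "tool", "name": "get_room_temperature", "args": {"room": room}}
    room, temp = history[0]["result"], history[1]["result"]
    return {"type": "finish", "answer": f"The teddy bear is in the {room}, which is at {temp} degrees C."}

def run_agent(goal: str, max_steps: int = 5) -> tuple[str, list[dict]]:
    history: list[dict] = []  # everything the agent has observed so far
    for _ in range(max_steps):
        action = plan(goal, history)  # plan: decide the next step
        if action["type"] == "finish":
            return action["answer"], history
        result = TOOLS[action["name"]](**action["args"])  # act: run the chosen tool
        history.append({"name": action["name"], "args": action["args"], "result": result})  # observe
    raise RuntimeError("step budget exhausted before the goal was met")

answer, trace = run_agent("What temperature is the teddy bear's room?")
print(answer)
```

The `max_steps` budget is a design choice for this sketch: a loop that is allowed to repeat "until the goal is met" needs some bound so that a confused planner cannot run forever. A single function call, by contrast, would stop after the first tool result.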

Prerequisites: the function-calling lesson is required. We assume you understand the three-stage mechanism for a single tool call (LLM picks function, code runs it, LLM formats response). The reasoning models lesson is useful since modern agents often run reasoning-style chains between tool calls.

  • Define an agent (a tool-using LLM that loops) and distinguish it from single-call tool use
  • Walk through the observe-plan-act loop and recognize variant naming (ReAct’s thought-action-observation, etc.)
  • Apply the cumulative-error multiplier to estimate the reliability of a multi-step agent
  • Identify the major safety threads (data exfiltration, prompt injection, tool misuse) and the two classes of remediation (training-stage and inference-stage)
  • Recognize when a system you encounter is or isn’t an agent in the strict sense
  • Read time: about 13 minutes
  • Practice time: about 12 minutes (a self-check on the agent definition and the observe-plan-act loop, a hands-on exercise tracing a longer agent flow with multiple tool calls, and flashcards)
  • Difficulty: standard
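
As a quick preview of the cumulative-error multiplier named in the objectives above, the sketch below assumes the simplest possible model: every step succeeds independently with the same probability, so an n-step run succeeds with that probability raised to the n-th power. The independence assumption and the function name `chain_success_probability` are illustrative choices here, not definitions taken from the lesson.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Probability that all `steps` steps succeed, assuming independent, identical steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps

# Show how quickly end-to-end reliability decays as the horizon grows.
for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} steps at 95% per step: {chain_success_probability(0.95, n):.3f}")
```

Under these assumptions, a step that is right 95% of the time gives an end-to-end success rate of roughly 60% after 10 steps and roughly 36% after 20, which is why long-horizon agents are hard to make reliable.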