Summary: From single call to agent loop

Phase 3 opens here. Lessons 1-7 covered single-call mechanics across L4 custom tools + L5 Anthropic-provided tools + L6 MCP + L7 cost and staleness. This lesson is the transition to multi-turn loops where the model decides the next step. Workflow vs agent (Anthropic verbatim, Building Effective AI Agents, Erik S. and Barry Zhang, 2024-12-19): a workflow is systems where LLMs and tools are orchestrated through predefined code paths; an agent is systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. The difference is who decides the next step: in a workflow your code decides, in an agent the model decides. Standing call: find the simplest solution possible, and only increasing complexity when needed. The augmented LLM building block underneath both: LLM + retrieval + tools + memory; maximize the single call before reaching for a loop. The canonical loop is small: a while bounded by max_iterations that calls messages.create, appends the assistant turn to messages, then dispatches on stop_reason. The full stop_reason vocabulary: end_turn (return), tool_use (execute + append tool_result + iterate), pause_turn (L5; re-call with assistant turn unchanged), max_tokens, stop_sequence, model_context_window_exceeded (L7), “compaction” (L7 with pause_after_compaction: true), refusal (model declined on safety; stop_details.category on the response carries the category; surface, do not blind-retry). Steering with tool_choice: auto (default; model decides), any (must call a tool), tool with a specific name (must call the named tool), none (must not call). The forcing modes cost slightly more in auto-injected system-prompt tokens. Four disciplines around any agent loop: hard max-iterations cap; tool inventory is the surface area (sandbox L5 computer-use; denylist destructive L6 MCP operations; auth + rate limits at the execute_tool boundary); the L7 cost-and-staleness levers stay engaged (cache the prefix; compact at 150K with cached system end; tool result clearing for tool-heavy loops); explicit dispatch on every stop_reason (no silent fall-through). Framework call: Start by using LLM APIs directly: many patterns can be implemented in a few lines of code. The 30-line loop is the proof. L9 (next) catalogs the canonical patterns on this substrate; L10 / L11 add Agent Skills and Subagents; L12 ships it.

Core ideas

Workflow vs agent (verbatim). Workflow = predefined code paths; agent = dynamically direct their own processes. Who decides the next step is the difference. Start with workflow; graduate to agent only when the task’s shape demands it.
The augmented LLM = model + retrieval + tools + memory. Maximize what a single call can do (tools tight, retrieval targeted, memory shaped) before adding looping.
The canonical loop in 30 lines. Append the assistant turn after each response. On tool_use, dispatch each tool_use block to a local handler, package results as tool_result entries on a follow-up user turn, iterate. On pause_turn, re-call with the assistant turn unchanged. On other stop reasons, surface to the caller. Cap at a max_iterations bound.
The full stop_reason vocabulary. end_turn, tool_use, pause_turn (L5), max_tokens, stop_sequence, model_context_window_exceeded (L7), “compaction” (L7, opt-in), refusal (safety decline; stop_details.category on the response). Dispatch every value explicitly; silent fall-through is the production-failure path.
tool_choice steering. auto (default; model decides), any (must call a tool), tool with a specific name (must call this tool), none (must not call). System-prompt nudges are the soft lever; tool_choice is the hard guarantee. any and tool cost slightly more in auto-injected system-prompt tokens than auto and none.
Four loop disciplines. Hard max_iterations cap. Tool inventory is the surface area (sandbox dangerous tools; denylist destructive ones; auth at the execute boundary). L7 levers stay engaged (cache the prefix; compaction at 150K with cached system end; tool result clearing for tool-heavy loops). Explicit dispatch on every stop_reason.
Direct API first. Per the source post: Start by using LLM APIs directly: many patterns can be implemented in a few lines of code. The 30-line loop is the proof; reach for a framework only when patterns repeat AND you understand the underlying code.
Where this fits. Phase 3 opener. L9 catalogs the canonical patterns the loop takes (the “six effective-agent patterns”: 5 workflow + the open-ended agent itself). L10 adds Agent Skills + Claude Code. L11 adds Subagents + Claude Managed Agents. L12 ships the result to production.

What changes for you

Before this lesson, every Claude call you wrote was one round-trip: one request, one response, even if that response contained server-tool or MCP results inline. After this lesson, the canonical agent loop is a 30-line while-with-dispatch that turns the same Messages API into a multi-turn planner where the model decides the next step. The single highest-leverage move this week: before you reach for an agent framework, write the loop directly against the SDK. The cost is 30 lines and one execute_tool dispatcher; the payoff is complete visibility into every decision the model is making and every dispatch your code is making. The second-highest: dispatch on EVERY stop_reason, not just end_turn and tool_use; the failure mode of “the agent stopped and I do not know why” comes from silent fall-through on max_tokens / pause_turn / model_context_window_exceeded. The third-highest: the L7 cost-and-staleness levers are not optional in a loop; cache the prefix, compact at 150K with a cached system end so the system survives, and add tool result clearing for tool-heavy loops. With these three habits in place, the rest of Phase 3 (the canonical patterns in L9, Agent Skills in L10, Subagents in L11, shipping in L12) sits cleanly on top of the same loop.