
Cheatsheet: From single call to agent loop

Workflow (Anthropic, verbatim): systems where LLMs and tools are orchestrated through predefined code paths. Agent (verbatim): systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. The difference is who decides the next step: your code or the model. Default: find the simplest solution possible, and increase complexity only when needed. Use a workflow for predictability and consistency on well-defined tasks; use an agent when flexibility and model-driven decisions are needed at scale.

Building block: the augmented LLM (LLM + retrieval + tools + memory). Make the single call as strong as possible (tight tools, targeted retrieval, shaped memory) BEFORE reaching for a loop.

```python
def run_agent(client, system, tools, user_message, max_iterations=20):
    # execute_tool(name, input) is assumed to be supplied by the caller.
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-opus-4-7", max_tokens=4096,
            system=system, tools=tools, messages=messages,
        )
        # Append the assistant turn so the model sees its own prior decisions.
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "end_turn":
            return response
        if response.stop_reason == "tool_use":
            # Answer every tool_use block with a tool_result on a single user turn.
            results = [
                {"type": "tool_result", "tool_use_id": b.id,
                 "content": execute_tool(b.name, b.input)}
                for b in response.content if b.type == "tool_use"
            ]
            messages.append({"role": "user", "content": results})
            continue
        if response.stop_reason == "pause_turn":
            continue  # server-tool mid-loop yield; re-call
        if response.stop_reason == "refusal":
            # Model declined; stop_details.category carries the reason.
            # Surface it; don't blind-retry.
            return response
        # max_tokens / stop_sequence / model_context_window_exceeded / "compaction" / etc.
        return response
    raise RuntimeError("agent exceeded max_iterations")
```

Three things this does that a single call does not: appends the assistant turn (model sees its own prior decisions); dispatches tool_use blocks; bounds the iterations.
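The first two behaviors imply an invariant on the message history: every tool_use block in an assistant turn is answered by a tool_result with the same id in the user turn that follows. A minimal sketch of a check for that invariant, assuming content blocks have been converted to plain dicts (the SDK returns objects) and using a hypothetical helper name:

```python
def unanswered_tool_uses(messages):
    """Return ids of tool_use blocks with no tool_result in the next user turn."""
    missing = []
    for i, msg in enumerate(messages):
        # Only assistant turns with block-list content can carry tool_use blocks.
        if msg["role"] != "assistant" or isinstance(msg["content"], str):
            continue
        wanted = [b["id"] for b in msg["content"] if b.get("type") == "tool_use"]
        if not wanted:
            continue
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        answered = set()
        if nxt and nxt["role"] == "user" and not isinstance(nxt["content"], str):
            answered = {b["tool_use_id"] for b in nxt["content"]
                        if b.get("type") == "tool_result"}
        missing.extend(t for t in wanted if t not in answered)
    return missing


# Illustrative history after one tool round (ids and tool name are made up).
history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "add",
         "input": {"a": 2, "b": 2}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "4"},
    ]},
    {"role": "assistant", "content": [{"type": "text", "text": "2 + 2 = 4"}]},
]
```

Running the check on the history before each re-call is one cheap way to catch a loop that dropped a result; an empty list means every tool call was answered.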

| stop_reason | Where introduced | Loop action |
|---|---|---|
| end_turn | L1 | Return to caller |
| tool_use | L4 | Execute each tool_use block; append tool_result entries on a user turn; iterate |
| pause_turn | L5 | Server-tool mid-multi-iteration; re-call with assistant turn unchanged |
| max_tokens | L1 | Output cap hit; raise / summarize / surface partial |
| stop_sequence | L2 | Configured sequence triggered; often treat as end_turn with known reason |
| model_context_window_exceeded | L7 | Window cap; compact (if opted in) or fail clearly |
| "compaction" | L7 (pause_after_compaction: true) | Summary written; preserve last N turns; re-call |
| refusal | L8 (safety decline) | stop_details.category carries the category; surface to caller; do NOT blind-retry the same prompt |

Discipline: dispatch every value explicitly. Silent fall-through = “the agent stopped and I do not know why.”
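One way to enforce that discipline is a lookup table that raises on anything it does not recognize, so a new stop reason fails loudly instead of falling through. This is a sketch only: the Action names and the mapping choices (for example, treating stop_sequence as a normal return) are assumptions for illustration, not part of any SDK.

```python
from enum import Enum


class Action(Enum):
    RETURN = "return"        # hand the response back to the caller
    RUN_TOOLS = "run_tools"  # execute tool_use blocks, then iterate
    RECALL = "recall"        # call the API again with the history as-is
    SURFACE = "surface"      # stop and report the reason to the caller


# Every stop reason the loop expects, mapped to exactly one action.
DISPATCH = {
    "end_turn": Action.RETURN,
    "tool_use": Action.RUN_TOOLS,
    "pause_turn": Action.RECALL,
    "compaction": Action.RECALL,
    "stop_sequence": Action.RETURN,
    "max_tokens": Action.SURFACE,
    "model_context_window_exceeded": Action.SURFACE,
    "refusal": Action.SURFACE,
}


def next_action(stop_reason):
    """Look up the loop action; raise instead of silently falling through."""
    try:
        return DISPATCH[stop_reason]
    except KeyError:
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}") from None
```

Keeping the mapping in data rather than in a chain of if statements also makes it easy to review against the table in one glance.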

| Mode | Meaning | Use for |
|---|---|---|
| {"type": "auto"} (default) | Model decides per turn | Agents (the point is the model deciding) |
| {"type": "any"} | Must call a tool, model picks | Workflow steps where a call is required |
| {"type": "tool", "name": "X"} | Must call named tool | Deterministic workflow steps |
| {"type": "none"} | Must not call any tool | Forced final-answer turns |

Cost note (auto-injected tool-use system-prompt tokens):

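| Model | auto / none (tokens) | any / tool (tokens) |
|---|---|---|
| Opus 4.8 | 290 | 410 |
| Opus 4.7 | 675 | 804 |
| Opus 4.6 / Sonnet 4.6 | 497 | 589 |
| Opus 4.5 / Sonnet 4.5 / Haiku 4.5 | 496 | 588 |
| Opus 4.1 | 313 | 315 |
| Haiku 3.5 (Vertex/Bedrock only) | 264 | 355 |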
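Because every iteration of the loop is a separate request that carries tools, this overhead multiplies by the iteration count. A rough arithmetic sketch using the Opus 4.7 row and the 20-iteration cap from the loop above; the function name is hypothetical, and the calculation counts raw tokens only (it ignores prompt caching and every other input token):

```python
def tool_prompt_overhead(tokens_per_request, iterations):
    """Total auto-injected tool-use prompt tokens across a loop run."""
    return tokens_per_request * iterations


# Opus 4.7 row of the table above.
OPUS_4_7 = {"auto_or_none": 675, "any_or_tool": 804}

auto_total = tool_prompt_overhead(OPUS_4_7["auto_or_none"], 20)   # 13500
forced_total = tool_prompt_overhead(OPUS_4_7["any_or_tool"], 20)  # 16080
extra_for_forcing = forced_total - auto_total                     # 2580
```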

Soft lever: system-prompt nudges (“use the tools to investigate before responding” increases tool use; “always call a tool first” is stronger). Hard guarantee: tool_choice.
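To make the hard guarantee concrete, the sketch below picks a tool_choice value per step. The step labels ("agent", "require_any", "require_named", "final_answer") and the function name are hypothetical; only the returned dict shapes come from the modes table above.

```python
def tool_choice_for(step_kind, tool_name=None):
    """Map a (hypothetical) step label to a tool_choice dict."""
    if step_kind == "agent":
        return {"type": "auto"}   # model decides per turn
    if step_kind == "require_any":
        return {"type": "any"}    # must call some tool, model picks which
    if step_kind == "require_named":
        if not tool_name:
            raise ValueError("require_named needs a tool_name")
        return {"type": "tool", "name": tool_name}  # must call this tool
    if step_kind == "final_answer":
        return {"type": "none"}   # must answer without calling a tool
    raise ValueError(f"unknown step kind: {step_kind!r}")
```

The result would be passed as the tool_choice argument of the request, for example tool_choice=tool_choice_for("require_named", "lookup_order") for a deterministic workflow step, where lookup_order is a made-up tool name.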

| Discipline | Why |
|---|---|
| Hard max_iterations cap | A runaway plan without a cap is a runaway bill |
| Tool inventory is the surface area | Every tool the loop has access to is a thing the model can decide to call. Sandbox L5 computer-use; denylist destructive L6 MCP; auth + rate limits at execute_tool boundary |
| L7 levers stay engaged | Cache the prefix; compact at 150K with cached system end (so system survives); tool result clearing for tool-heavy loops |
| Explicit stop_reason dispatch | Silent fall-through is the production-failure path |
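The execute_tool boundary is a natural place to enforce the tool-inventory discipline. The following is a minimal sketch under stated assumptions: GuardedExecutor, ToolDenied, and the handler registry are all hypothetical names, and the limit shown is a simple per-run call budget standing in for a real rate limiter. Auth checks would sit at the same point.

```python
class ToolDenied(Exception):
    """Raised when a tool call is blocked at the execute_tool boundary."""


class GuardedExecutor:
    def __init__(self, handlers, denylist=(), max_calls=50):
        self.handlers = dict(handlers)        # tool name -> Python callable
        self.denylist = frozenset(denylist)   # names never allowed to run
        self.max_calls = max_calls            # per-run call budget
        self.calls = 0

    def __call__(self, name, tool_input):
        if name in self.denylist:
            raise ToolDenied(f"tool {name!r} is denylisted")
        if name not in self.handlers:
            raise ToolDenied(f"unknown tool {name!r}")
        if self.calls >= self.max_calls:
            raise ToolDenied("per-run tool call budget exhausted")
        self.calls += 1  # only calls that actually run count against the budget
        return self.handlers[name](**tool_input)


# Illustrative registry: one harmless tool and one destructive one.
execute_tool = GuardedExecutor(
    {"add": lambda a, b: a + b, "delete_all": lambda: None},
    denylist={"delete_all"},
    max_calls=2,
)
```

An instance has the same (name, input) call shape as the execute_tool used in the loop above, so it can be dropped in without changing the loop itself.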

Verbatim: Start by using LLM APIs directly: many patterns can be implemented in a few lines of code | If frameworks are used, ensure you understand the underlying code.

Posture: the 30-line loop is the production starting point. Reach for a framework only when patterns repeat AND you understand the underlying code.

| Failure | Recognize by | Fix |
|---|---|---|
| Loop forgets to append assistant turn | Model loses its own prior decisions; loops in circles | Append {"role": "assistant", "content": response.content} after each messages.create |
| Silent fall-through on a stop reason | Agent quietly returns partial / wrong answer | Explicit dispatch on every stop_reason (see table above) |
| No max_iterations cap | Runaway loop; surprise bill | Hard cap (20 is a typical default); raise on exceed |
| Server tool stalls and loop spins | pause_turn observed but treated as an error | Re-call with assistant turn unchanged on pause_turn |
| Tool block dispatch crashes loop | Exception inside execute_tool terminates iteration | Wrap execute_tool in try/except; return error as tool_result with is_error: true |
| Loop prefix not cached | Per-iteration input bill scales linearly with turns | Add cache_control on system + tool stack per L7 |
| Loop runs long, hits context limit | stop_reason: model_context_window_exceeded | Opt in to compaction (L7) with 150K trigger and cached system end |
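Two of the fixes in the table lend themselves to short helpers: turning a tool exception into an is_error tool_result, and marking the system prompt and tool list for caching. The sketch below is illustrative; both function names are hypothetical, results are stringified for simplicity, and the cache markers assume the ephemeral cache_control type placed on the last system block and the last tool definition.

```python
def tool_result_for(block_id, name, tool_input, execute_tool):
    """Run one tool call; report failures to the model instead of crashing."""
    try:
        content = execute_tool(name, tool_input)
        return {"type": "tool_result", "tool_use_id": block_id,
                "content": str(content)}
    except Exception as exc:
        # The model sees the error text and can retry or change approach.
        return {"type": "tool_result", "tool_use_id": block_id,
                "content": f"{type(exc).__name__}: {exc}", "is_error": True}


def with_cached_prefix(system_text, tools):
    """Return (system, tools) with cache breakpoints on the stable prefix."""
    system = [{"type": "text", "text": system_text,
               "cache_control": {"type": "ephemeral"}}]
    tools = [dict(t) for t in tools]  # shallow copies; caller's list is untouched
    if tools:
        tools[-1]["cache_control"] = {"type": "ephemeral"}
    return system, tools
```

In the loop, tool_result_for would replace the inline dict in the results list, and the pair returned by with_cached_prefix would be passed as the system and tools arguments on every iteration.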

What this lesson does NOT cover (and where to find it)

| Topic | Lands at |
|---|---|
| The six canonical workflow + agent patterns | Lesson 9 |
| Agent Skills + Claude Code | Lesson 10 |
| Subagents + Claude Managed Agents | Lesson 11 |
| Production observability for the loop (cost tracking, latency) | Lesson 12 |