Cheatsheet: Agents
The agent loop
The lesson-4 tool-use loop with the model deciding when to stop.
```python
def run_agent(task):
    history = [task]
    while True:
        step = model.predict(history)            # what should we do next?
        if step.is_final_answer():
            return step.answer
        result = execute_tool(step.tool_call)    # do it
        history.append(step)
        history.append(result)
```

Model called once per step. Model decides “are we done yet.” That’s the whole topic.
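A runnable version of the loop, as a minimal sketch. `Step`, `ScriptedModel`, and the `TOOLS` registry are hypothetical stand-ins, not any vendor's API; a real model client would replace `ScriptedModel.predict`. The `max_steps` cap is an addition, since a bare `while True` never exits if the model never produces a final answer.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Step:
    """One model decision: a tool call or a final answer (hypothetical shape)."""
    tool: Optional[str] = None
    args: Optional[dict] = None
    answer: Optional[str] = None

    def is_final_answer(self) -> bool:
        return self.answer is not None


# Hypothetical tool registry: tool name -> plain Python callable.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
}


class ScriptedModel:
    """Stand-in for a real model: replays a fixed list of steps."""

    def __init__(self, script: list[Step]):
        self._script = iter(script)

    def predict(self, history: list) -> Step:
        return next(self._script)


def run_agent(task: str, model: ScriptedModel, max_steps: int = 8) -> str:
    history: list = [task]
    for _ in range(max_steps):
        step = model.predict(history)            # what should we do next?
        if step.is_final_answer():
            return step.answer
        result = TOOLS[step.tool](**step.args)   # do it
        history.append(step)
        history.append(result)
    raise RuntimeError(f"no final answer after {max_steps} steps")


model = ScriptedModel([
    Step(tool="add", args={"a": 2, "b": 3}),
    Step(answer="2 + 3 = 5"),
])
final = run_agent("What is 2 + 3?", model)
print(final)
```

The scripted model makes the control flow testable without a network call: one tool step, then a final answer.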
Three foundational patterns
| Pattern | When to reach for it |
|---|---|
| Function-calling agent | Default in 2026. Structured tool call or final answer; hosted API enforces JSON schema. Reach for first. |
| ReAct | Free-text Thought/Action/Observation. Predecessor; messier parser; still in literature. |
| Plan-and-execute | Plan up front; execute step-by-step. For actions with real-world cost where you want to verify intent first. |
Variants (not foundational): memory-augmented (persist facts across runs), multi-agent (specialized sub-agents pass control). Add only when forced.
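To make the "messier parser" point concrete, a sketch that parses the same decision in both styles. Both sample outputs, the JSON field names, and the regex are invented for illustration; real function-calling payloads and ReAct prompt formats vary by vendor and prompt.

```python
import json
import re

# Hypothetical model outputs for the same decision, in both styles.
structured_output = '{"tool": "search", "arguments": {"query": "agent loops"}}'
react_output = (
    "Thought: I should look this up.\n"
    "Action: search\n"
    "Action Input: agent loops\n"
)


def parse_structured(text: str) -> tuple[str, dict]:
    """Function-calling style: the payload is already JSON."""
    call = json.loads(text)
    return call["tool"], call["arguments"]


def parse_react(text: str) -> tuple[str, str]:
    """ReAct style: recover the action from free text with a regex.

    Raises ValueError as soon as the model drifts from the expected format.
    """
    match = re.search(r"Action:\s*(.+)\nAction Input:\s*(.+)", text)
    if match is None:
        raise ValueError("could not find Action / Action Input in model output")
    return match.group(1).strip(), match.group(2).strip()


print(parse_structured(structured_output))
print(parse_react(react_output))
```

The structured path yields typed arguments directly. The ReAct path yields a raw string that still needs argument parsing, and it fails on any wording drift.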
Three tests for “should this be an agent?”
All three must be yes:
1. Variable shape: steps depend on earlier results and cannot be written in advance.
2. Real bounded tools: 3 to 10 tools with clear contracts, not vague unbounded ones ("browse the web freely").
3. Acceptable cost/latency: agents multiply lesson 2's productive limits, so the user must be willing to wait or the work must happen out of the user's view.

If any test fails, a single call, RAG (L4), or a hand-coded pipeline is almost always better.
Most common production mistake: using an agent where a single call would do.
Five engineering failure modes
| Failure | Mitigation |
|---|---|
| Loops (same tool, same args, repeated) | Hard iteration cap + no-identical-call guard; surface to model |
| Wrong paths (early bad decision; doubles down) | Re-planning checkpoints; trajectory-level eval set |
| Compound cost (history grows; ~steps² not steps) | Summarize older history; cap context; smaller models for inner steps |
| Harder evaluation (behavior is a tree) | Trajectory-level eval (success rate, avg steps, cost over time) |
| Brittle tool boundaries (sensitive to name/schema) | Version tool defs like prompts (L3); A/B test tool descriptions |
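The first mitigation in the table (hard iteration cap plus a no-identical-call guard) could look like the sketch below. `LoopGuard` and its message text are hypothetical. The guard only compares against the immediately preceding call, one possible reading of "same tool, same args, repeated".

```python
import json
from typing import Optional


class LoopGuard:
    """Hard step cap, plus rejection of a call identical to the previous one."""

    def __init__(self, max_steps: int = 8):
        self.max_steps = max_steps
        self.steps = 0
        self._last_call = None

    def check(self, tool: str, args: dict) -> Optional[str]:
        """Return None if the call may run, else a message to surface to the model."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError(f"iteration cap of {self.max_steps} steps exceeded")
        # Canonical form, so key order cannot hide a repeated call.
        call = (tool, json.dumps(args, sort_keys=True))
        if call == self._last_call:
            return f"Repeated call to {tool} with identical arguments; try something else."
        self._last_call = call
        return None


guard = LoopGuard(max_steps=3)
first = guard.check("search", {"q": "x", "n": 1})
second = guard.check("search", {"n": 1, "q": "x"})   # same call, different key order
print(first)
print(second)
```

In the agent loop, a non-`None` return would be appended to the history in place of a tool result, so the model sees why the call was refused. Exceeding the cap raises, which keeps the failure visible instead of silent.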
Cost shape
Naive estimate: steps × single_call_cost

Actual cost: closer to (steps)² because context grows each step

A 6-step agent whose context grows from 1K to 8K tokens costs closer to 4-5x the naive “6 × single-call” number. Lesson 2’s three productive limits (context, cost, latency) all multiply.
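The 6-step figure can be checked with a few lines of arithmetic. This sketch assumes the context grows linearly from 1K to 8K tokens, counts input tokens only, and uses a flat price per token; none of those assumptions come from the lesson.

```python
def context_sizes(steps: int, start_tokens: int, end_tokens: int) -> list[float]:
    """Context size at each step, assuming linear growth from start to end."""
    if steps == 1:
        return [float(start_tokens)]
    growth = (end_tokens - start_tokens) / (steps - 1)
    return [start_tokens + i * growth for i in range(steps)]


steps = 6
sizes = context_sizes(steps, start_tokens=1_000, end_tokens=8_000)

naive_tokens = steps * 1_000     # every step priced like the first call
actual_tokens = sum(sizes)       # each step re-reads the whole history so far
ratio = actual_tokens / naive_tokens

print(f"naive:  {naive_tokens:,} input tokens")
print(f"actual: {actual_tokens:,.0f} input tokens")
print(f"ratio:  {ratio:.1f}x")
```

Under these assumptions the agent reads 27,000 input tokens against a naive 6,000, a ratio of 4.5x, which sits inside the 4-5x range quoted above.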
Build practices (lesson 7 LLMOps scaled)
- Function-calling + 3-5 well-defined tools. No multi-agent, no memory, no plan-and-execute until forced.
- Cap iterations and identical calls. `max_steps` 6-12; no same-tool-same-args twice in a row. Error visibly.
- Log every step. Lesson 7 schema + `tool_called`, `tool_arguments`, `tool_result_summary`, `step_number`, `total_steps`.
- Evaluate at trajectory level. Held-out set with expected final answer AND expected tool-call patterns.
- Observability before scale. Dashboard of live trajectories (which tools fire, where loops happen, where dead ends accumulate).
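A sketch of the per-step log record from the list above, emitted as JSON lines. The five agent fields are the ones named in the list; `run_id` is a hypothetical stand-in for the lesson 7 schema, which is not reproduced here. Writing `total_steps` once the run has finished, and truncating results to 80 characters, are assumptions too.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StepRecord:
    step_number: int
    tool_called: str
    tool_arguments: dict
    tool_result_summary: str


def summarize(result: object, limit: int = 80) -> str:
    """Truncate a tool result so log lines stay small (assumed policy)."""
    text = str(result)
    return text if len(text) <= limit else text[:limit] + "..."


def to_json_lines(run_id: str, records: list[StepRecord]) -> list[str]:
    """Serialize one finished run; total_steps is only known once the run ends."""
    total = len(records)
    return [
        json.dumps({"run_id": run_id, "total_steps": total, **asdict(r)}, sort_keys=True)
        for r in records
    ]


records = [
    StepRecord(1, "search", {"query": "refund policy"}, summarize("3 documents found")),
    StepRecord(2, "fetch", {"doc_id": 17}, summarize("x" * 200)),
]
lines = to_json_lines("run-001", records)
for line in lines:
    print(line)
```

One JSON object per step keeps the log greppable and lets a dashboard rebuild a trajectory by grouping on `run_id` and ordering by `step_number`.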
Tooling
| Tool | What it is |
|---|---|
| Anthropic / OpenAI / Google APIs | Hosted function-calling endpoints; structured tool contracts enforced server-side |
| Agent frameworks | LangChain, LangGraph, LlamaIndex, others; the agent loop + retries + observability glue |
| Trajectory observability | Vendor tools (LangSmith, Arize, Helicone) or homegrown structured logging |
Frameworks evolve fast; the patterns in this lesson age slower than any specific library.
When to consider what
| Symptom / task | Reach for |
|---|---|
| Single text in, single response out | Single LLM call |
| Needs current data / domain knowledge | RAG (L4) |
| Multi-step but steps are fixed and known | Hand-coded pipeline (with LLM calls inside) |
| Multi-step with variable shape, bounded tools, acceptable latency | Agent (this lesson) |
| “Open-ended, unbounded tools, browse anything” | Tighten scope first; vague tools produce vague failures |
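The table reads as a short decision procedure. The sketch below encodes it with a hypothetical `Task` record whose boolean fields are illustrative names, not part of the lesson. Real triage is a judgment call, not a lookup.

```python
from dataclasses import dataclass


@dataclass
class Task:
    multi_step: bool
    needs_external_knowledge: bool = False
    steps_known_in_advance: bool = True
    tools_bounded: bool = True
    latency_acceptable: bool = True


def reach_for(task: Task) -> str:
    """Walk the table top to bottom and return the first matching row."""
    if not task.multi_step:
        return "RAG" if task.needs_external_knowledge else "single LLM call"
    if task.steps_known_in_advance:
        return "hand-coded pipeline"
    if not task.tools_bounded:
        return "tighten scope first"
    if not task.latency_acceptable:
        return "not an agent: single call, RAG, or pipeline"
    return "agent"


print(reach_for(Task(multi_step=False)))
print(reach_for(Task(multi_step=True, steps_known_in_advance=False)))
```

The agent branch is reached only after every cheaper option has been ruled out, which matches the warning that using an agent where a single call would do is the most common production mistake.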
What this lesson does NOT cover
- Agent autonomy
- Agent safety
- Agent alignment debates / contested safety claims
- What agents should be allowed to do (policy)
- Sector-specific compliance for agent deployment
These topics are real and important, but each requires its own framing, in its own forum, with the right stakeholders. This lesson covers the engineering discipline: when, how, cost, and what goes wrong.
Words to use precisely
- Agent: the tool-use loop with the model deciding when to stop.
- Step: one (model decision + tool execution + observation) iteration of the loop.
- Trajectory: the full sequence of steps a single agent run produces; the unit of evaluation.
- Function calling: the structured-tool API contract (JSON schema enforced) under which the model emits tool calls.
- ReAct: the free-text “Thought / Action / Observation” pattern; predecessor to structured function calling.
- Plan-and-execute: two-phase pattern where the model produces a plan before executing.
- max_steps: hard cap on loop iterations; essential safety control.
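Because the trajectory is the unit of evaluation, a trajectory-level eval reduces to a few aggregates over a held-out set. The sketch below assumes exact-match scoring for both the final answer and the tool-call sequence, and the `Trajectory` and `EvalCase` shapes are hypothetical. Real evals often need looser matching.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    tools_called: list[str]     # tool names in call order, one per step
    final_answer: str
    cost_usd: float


@dataclass
class EvalCase:
    expected_answer: str
    expected_tools: list[str]   # expected tool-call pattern, in order


def evaluate(cases: list[EvalCase], runs: list[Trajectory]) -> dict:
    """Score runs against a held-out set: answers, tool patterns, steps, cost."""
    n = len(runs)
    answers_ok = sum(r.final_answer == c.expected_answer for c, r in zip(cases, runs))
    tools_ok = sum(r.tools_called == c.expected_tools for c, r in zip(cases, runs))
    return {
        "success_rate": answers_ok / n,
        "tool_pattern_match_rate": tools_ok / n,
        "avg_steps": sum(len(r.tools_called) for r in runs) / n,
        "avg_cost_usd": sum(r.cost_usd for r in runs) / n,
    }


cases = [
    EvalCase("42", ["search", "calc"]),
    EvalCase("Paris", ["search"]),
]
runs = [
    Trajectory(["search", "calc"], "42", 0.5),
    Trajectory(["search", "search", "search"], "Paris", 1.5),
]
report = evaluate(cases, runs)
print(report)
```

The second run gets the right answer by a wasteful path: it counts toward the success rate but lowers the tool-pattern match rate and raises the average steps, which is the signal a final-answer-only eval would miss.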
Source
- Full Stack Deep Learning, LLM Bootcamp (Spring 2023): Agents (guest: Harrison Chase, LangChain). fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.