
Cheatsheet: Agents

The lesson-4 tool-use loop with the model deciding when to stop.

def run_agent(task):
    history = [task]
    while True:
        step = model.predict(history)  # what should we do next?
        if step.is_final_answer():
            return step.answer
        result = execute_tool(step.tool_call)  # do it
        history.append(step)
        history.append(result)

Model called once per step. Model decides “are we done yet.” That’s the whole topic.
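
The loop above is pseudocode: `model` and `execute_tool` are left undefined. A minimal runnable sketch follows, assuming a scripted stand-in for the model and a one-entry tool registry. The names `Step`, `TOOLS`, and `scripted_model` are hypothetical and exist only for illustration; the sketch also swaps `while True` for a hard step cap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One model decision: either a tool call or a final answer."""
    tool: Optional[str] = None
    args: Optional[dict] = None
    answer: Optional[str] = None

    def is_final_answer(self) -> bool:
        return self.answer is not None


# Hypothetical tool registry: tool name -> plain Python function.
TOOLS = {"add": lambda a, b: a + b}


def scripted_model(history: list) -> Step:
    """Stand-in for a real model call: call `add` once, then answer."""
    results = [h for h in history if isinstance(h, dict) and "result" in h]
    if not results:
        return Step(tool="add", args={"a": 2, "b": 3})
    return Step(answer=f"The sum is {results[-1]['result']}")


def run_agent(task: str, predict=scripted_model, max_steps: int = 8) -> str:
    history: list = [task]
    for _ in range(max_steps):        # hard cap instead of `while True`
        step = predict(history)       # model called once per step
        if step.is_final_answer():    # the model decides when to stop
            return step.answer
        result = TOOLS[step.tool](**step.args)
        history.append(step)
        history.append({"result": result})
    raise RuntimeError(f"no final answer after {max_steps} steps")


print(run_agent("What is 2 + 3?"))
```

Swapping `scripted_model` for a real model client changes only the `predict` argument; the loop itself stays the same.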

| Pattern | When to reach for it |
| --- | --- |
| Function-calling agent | Default in 2026. Structured tool call or final answer; hosted API enforces JSON schema. Reach for first. |
| ReAct | Free-text Thought/Action/Observation. Predecessor; messier parser; still in literature. |
| Plan-and-execute | Plan up front; execute step-by-step. For actions with real-world cost where you want to verify intent first. |

Variants (not foundational): memory-augmented (persist facts across runs), multi-agent (specialized sub-agents pass control). Add only when forced.
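
To make the "structured tool call" contract concrete, here is a rough local sketch of a tool definition and a check on a model-emitted call. Hosted APIs enforce this server-side; the field names below (`name`, `parameters`, `arguments`), the `get_weather` tool, and the `validate_call` helper are illustrative assumptions, not any vendor's actual schema, and the validator handles only a small subset of JSON Schema.

```python
import json

# Hypothetical tool definition; field names are illustrative only.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Only the JSON Schema types this sketch understands.
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate_call(tool: dict, raw_json: str) -> dict:
    """Parse a model-emitted tool call and check it against the tool's schema."""
    call = json.loads(raw_json)
    if call.get("name") != tool["name"]:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    schema = tool["parameters"]
    missing = [key for key in schema["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key!r}")
        if not isinstance(value, JSON_TYPES[spec["type"]]):
            raise ValueError(f"argument {key!r} should be of type {spec['type']}")
    return args


print(validate_call(GET_WEATHER, '{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

A ReAct-style agent would instead have to recover the tool name and arguments from free text, which is where the "messier parser" cost comes from.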

Three tests for “should this be an agent?”


All three must be yes:

1. Variable shape: steps depend on earlier results and cannot be written in advance.
2. Real, bounded tools: 3 to 10 tools with clear contracts, not a vague, unbounded capability ("browse the web freely").
3. Acceptable cost and latency: agents multiply Lesson 2's productive limits, so either the user is willing to wait or the work is hidden from them.

If any test fails: a single call / RAG (L4) / hand-coded pipeline is almost always better.

Most common production mistake: using an agent where a single call would do.
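
The three tests can be written down as a pre-build checklist. This is only a sketch: the `TaskProfile` fields and the `failed_agent_tests` helper are hypothetical names, and the 3-to-10 tool range is the rule of thumb from the list above rather than a hard limit.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    steps_depend_on_results: bool     # test 1: variable shape
    tool_count: int                   # test 2: bounded tool set
    tools_have_clear_contracts: bool  # test 2: clear contracts
    cost_latency_acceptable: bool     # test 3: user waits, or work is hidden


def failed_agent_tests(profile: TaskProfile) -> list:
    """Return the names of the tests that fail; an empty list means 'agent is reasonable'."""
    failed = []
    if not profile.steps_depend_on_results:
        failed.append("variable shape")
    if not (3 <= profile.tool_count <= 10 and profile.tools_have_clear_contracts):
        failed.append("real bounded tools")
    if not profile.cost_latency_acceptable:
        failed.append("acceptable cost/latency")
    return failed


# A fixed three-stage summarization job: the steps are known in advance.
fixed_job = TaskProfile(False, 3, True, True)
print(failed_agent_tests(fixed_job))
```

Any non-empty result points back to a single call, RAG, or a hand-coded pipeline.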

| Failure | Mitigation |
| --- | --- |
| Loops (same tool, same args, repeated) | Hard iteration cap + no-identical-call guard; surface to model |
| Wrong paths (early bad decision; doubles down) | Re-planning checkpoints; trajectory-level eval set |
| Compound cost (history grows; ~steps² not steps) | Summarize older history; cap context; smaller models for inner steps |
| Harder evaluation (behavior is a tree) | Trajectory-level eval (success rate, avg steps, cost over time) |
| Brittle tool boundaries (sensitive to name/schema) | Version tool defs like prompts (L3); A/B test tool descriptions |

Naive estimate: steps × single_call_cost
Actual cost: closer to (steps)² because context grows each step

A 6-step agent processing 1K → 8K tokens of context is closer to 4-5x the naive “6 × single-call” number. Lesson 2’s three productive limits (context, cost, latency) all multiply.
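
The arithmetic behind that figure can be checked with a short sketch. It assumes context grows linearly from the first step to the last and that cost is proportional to input tokens; output tokens and any prompt caching are ignored. The function names are illustrative.

```python
def naive_tokens(steps: int, single_call_tokens: float) -> float:
    """Naive estimate: every step costs the same as the first call."""
    return steps * single_call_tokens


def actual_tokens(steps: int, start_tokens: float, end_tokens: float) -> float:
    """Total input tokens when context grows linearly from start to end."""
    if steps == 1:
        return start_tokens
    growth = (end_tokens - start_tokens) / (steps - 1)
    return sum(start_tokens + i * growth for i in range(steps))


naive = naive_tokens(6, 1_000)            # 6 calls at 1K tokens each
actual = actual_tokens(6, 1_000, 8_000)   # 1K, 2.4K, 3.8K, 5.2K, 6.6K, 8K
print(naive, actual, actual / naive)      # ratio is 4.5 under these assumptions
```

Under these assumptions the 6-step run reads 27K input tokens against a naive 6K, which is where the 4-5x range comes from.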

  1. Function-calling + 3-5 well-defined tools. No multi-agent, no memory, no plan-and-execute until forced.
  2. Cap iterations and identical calls. max_steps 6-12; no same-tool-same-args twice in a row. Error visibly.
  3. Log every step. Lesson 7 schema + tool_called, tool_arguments, tool_result_summary, step_number, total_steps.
  4. Evaluate at trajectory level. Held-out set with expected final answer AND expected tool-call patterns.
  5. Observability before scale. Dashboard of live trajectories (which tools fire, where loops happen, where dead ends accumulate).
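
Items 2 and 3 of the list above can be sketched together: a step cap, a guard against the same tool being called with the same arguments twice in a row, and one log record per step. The `predict` and `execute_tool` callables and the dict-shaped steps are assumptions for illustration; the log carries only the agent-specific fields named in item 3, not the Lesson 7 schema.

```python
import json


class AgentLoopError(RuntimeError):
    """Raised when the loop is stopped by a guard instead of a final answer."""


def run_guarded(predict, execute_tool, task, max_steps=8):
    """predict(history) returns {"answer": ...} or {"tool": name, "args": {...}}."""
    history, log, last_call = [task], [], None
    for step_number in range(1, max_steps + 1):
        step = predict(history)
        if "answer" in step:
            for record in log:
                record["total_steps"] = len(log)
            return step["answer"], log
        # Canonical form of the call so that equal args compare equal.
        call = (step["tool"], json.dumps(step["args"], sort_keys=True))
        if call == last_call:
            raise AgentLoopError(f"identical call repeated at step {step_number}: {call}")
        last_call = call
        result = execute_tool(step["tool"], step["args"])
        log.append({
            "step_number": step_number,
            "tool_called": step["tool"],
            "tool_arguments": step["args"],
            "tool_result_summary": str(result)[:200],
        })
        history += [step, result]
    raise AgentLoopError(f"hit max_steps={max_steps} without a final answer")
```

Raising an exception is one way to "error visibly"; an alternative design is to append a message about the repeated call to `history` so the model can see it and change course.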

| Tool | What it is |
| --- | --- |
| Anthropic / OpenAI / Google APIs | Hosted function-calling endpoints; structured tool contracts enforced server-side |
| Agent frameworks | LangChain, LangGraph, LlamaIndex, others; the agent loop + retries + observability glue |
| Trajectory observability | Vendor tools (LangSmith, Arize, Helicone) or homegrown structured logging |

Frameworks evolve fast; the patterns in this lesson age slower than any specific library.
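
For the homegrown option, trajectory-level evaluation needs little more than logged runs and a held-out set of expectations. The sketch below assumes each run is recorded as a dict with `answer`, `tools` (the ordered tool names), and `cost_usd`; these field names and the exact-match scoring are illustrative assumptions, not a standard format.

```python
def trajectory_metrics(runs: list, expected: list) -> dict:
    """Score logged runs against a held-out set, pairing items by position.

    runs:     [{"answer": str, "tools": [str, ...], "cost_usd": float}, ...]
    expected: [{"answer": str, "tools": [str, ...]}, ...]
    """
    if not runs or len(runs) != len(expected):
        raise ValueError("runs and expected must be non-empty and the same length")
    n = len(runs)
    answer_hits = sum(r["answer"] == e["answer"] for r, e in zip(runs, expected))
    pattern_hits = sum(r["tools"] == e["tools"] for r, e in zip(runs, expected))
    return {
        "success_rate": answer_hits / n,        # expected final answer
        "tool_pattern_rate": pattern_hits / n,  # expected tool-call pattern
        "avg_steps": sum(len(r["tools"]) for r in runs) / n,
        "avg_cost_usd": sum(r["cost_usd"] for r in runs) / n,
    }


runs = [
    {"answer": "5", "tools": ["add"], "cost_usd": 0.01},
    {"answer": "7", "tools": ["add", "add"], "cost_usd": 0.03},
]
expected = [
    {"answer": "5", "tools": ["add"]},
    {"answer": "8", "tools": ["add"]},
]
print(trajectory_metrics(runs, expected))
```

Exact matching on the tool sequence is the strictest choice; looser checks, such as requiring only that certain tools appear, may fit tasks with several valid paths.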

| Symptom / task | Reach for |
| --- | --- |
| Single text in, single response out | Single LLM call |
| Needs current data / domain knowledge | RAG (L4) |
| Multi-step but steps are fixed and known | Hand-coded pipeline (with LLM calls inside) |
| Multi-step with variable shape, bounded tools, acceptable latency | Agent (this lesson) |
| “Open-ended, unbounded tools, browse anything” | Tighten scope first; vague tools produce vague failures |

  • Agent autonomy
  • Agent safety
  • Agent alignment debates / contested safety claims
  • What agents should be allowed to do (policy)
  • Sector-specific compliance for agent deployment

These topics are real and important, but they need their own framing, in their own forum, with the right stakeholders. This lesson covers only the engineering discipline: when to use an agent, how, at what cost, and what goes wrong.

  • Agent: the tool-use loop with the model deciding when to stop.
  • Step: one (model decision + tool execution + observation) iteration of the loop.
  • Trajectory: the full sequence of steps a single agent run produces; the unit of evaluation.
  • Function calling: the structured tool API contract, in which the model emits tool calls that conform to an enforced JSON schema.
  • ReAct: the free-text “Thought / Action / Observation” pattern; predecessor to structured function calling.
  • Plan-and-execute: two-phase pattern where the model produces a plan before executing.
  • max_steps: hard cap on loop iterations; essential safety control.

  • Full Stack Deep Learning, LLM Bootcamp (Spring 2023): Agents (guest: Harrison Chase, LangChain). fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.