Summary: Agents
An LLM agent is the lesson-4 tool-use loop with the model deciding when to stop. The model is called once per step (not once total); it decides what to do next, including whether to emit a final answer; the loop continues until it does. Every pattern, every failure mode, every operational discipline falls out of that one shift. Three foundational patterns: function-calling agents (the reliable default in 2026; reach for first), ReAct (free-text predecessor; still in literature), plan-and-execute (when you want to inspect intent before actions with real-world cost). Reach for an agent only when variable shape + real bounded tools + acceptable cost/latency are all true; otherwise a single call, RAG, or hand-coded pipeline is better. The most common production mistake is using an agent where a single call would do. Five engineering failure modes are the patterns you debug: loops, wrong paths, compound cost (history grows; ~steps² not steps × cost), harder evaluation (trajectory-level), brittle tool boundaries. Build practices: function-calling first; hard iteration + identical-call caps; trajectory-level logs; trajectory-level evaluation; observability before scale. Lesson 7’s LLMOps discipline scales here with the test set strictly more expensive. Taught technical-primer throughout: WHAT, WHEN, WHAT-GOES-WRONG, HOW; agent-autonomy and contested-alignment debates explicitly out of scope.
Core ideas
Section titled “Core ideas”- An agent is the L4 tool-use loop with the model deciding when to stop. Same four steps; model decides “are we done yet.” That tiny shift generates the entire topic.
- Three patterns. Function-calling (default in 2026; structured tools enforced by hosted API). ReAct (free-text predecessor). Plan-and-execute (verify intent before action).
- Three tests for agent-or-not (all yes): variable shape + real bounded tools (3-10 with clear contracts) + acceptable cost and latency. Otherwise a simpler choice wins.
- Most common mistake: using an agent where a single call would do. Build the simpler version first.
- Five engineering failure modes. Loops; wrong paths; compound cost (steps² not steps); harder evaluation; brittle tool boundaries. Each has a specific mitigation.
- Cost shape. Agents do not cost (steps) × (single-call); closer to (steps)² because context grows each step. L2’s three productive limits all multiply.
- Evaluation lives at trajectory level. Held-out set needs expected final answer AND expected tool-call patterns. Strictly more expensive than L7’s flat-call eval.
- Two essential caps.
max_steps(6 to 12); no-identical-call-twice guard. Both error visibly. - Build discipline. Function-calling first; iteration caps; trajectory logs; trajectory eval; observability before scale. L7 discipline scales.
- Out of scope. Agent autonomy, agent safety, agent alignment debates, what agents should be allowed to do, sector-specific compliance. Different forum, different stakeholders. Same discipline as T14 L12, T15 L14, and earlier in this track (L6, L7, L9).
What changes for you
Section titled “What changes for you”Most apps that should be agents are not, and most apps that try to be agents should not be. The first half of that sentence is the smaller problem; the second half is what kills budgets. Knowing the three tests (variable shape, bounded tools, acceptable cost) lets you push back on “let’s make it an agent” proposals at design time, before the team has committed to building something a single call could do. When you do build an agent, the discipline is the same operational machinery you already have from lesson 7, scaled up: log everything (now at trajectory level, not per-call), evaluate everything (now with trajectory-level expectations), monitor everything (now with the loop and dead-end patterns visible in the dashboard), and cap the things that can run away (iteration count, identical-call guards). The model’s role becomes “planner inside a loop with a hard ceiling,” and the engineering’s role is to make the ceiling and the visibility and the evaluation real. The lesson 11 capstone closes Phase 3 and the track with the industry-perspective view of where production LLM applications are heading.
Agents are a specific tool in the production-LLM toolkit, the right answer to “tasks with variable shape that need multiple tool calls a user is willing to wait for”; the wrong answer to almost everything else. Know the three tests, know the five failure modes, build the simpler version first, scale the LLMOps discipline you already have.