Many agents working together: multi-agent systems

Last lesson ended on a plan whose steps looked like separate jobs: find flights, book lodging, build an itinerary. That raises a natural question. Should one agent do all of those, or should each job go to a different agent built specifically for it, a flights agent, a lodging agent, an itinerary agent, working together?

That is the idea behind multi-agent systems: instead of one agent handling everything, several specialized agents divide the work and coordinate. It is one of the most-hyped patterns in the field right now, which is exactly why this lesson spends as much time on when not to use it as on when to. The honest framing is not “multi-agent systems are the advanced way to build agents.” It is “splitting work across agents helps for some tasks and hurts for others, and you need to be able to tell which is which.”

By the end you will be able to judge when multiple specialized agents genuinely beat one well-designed generalist, and you will be able to name the cost that the marketing usually leaves out.

What a multi-agent system is

A multi-agent system is several agents, each with its own role, its own instructions, and often its own focused set of tools, working on parts of a larger task. Usually one of two shapes holds them together: a coordinator agent that breaks up the work and delegates pieces to specialists, or a pipeline in which each agent does its part and hands off to the next.

Each agent is still exactly what the earlier lessons described: a model in a loop with tools. Nothing about the individual agent changes. What is new is that there is more than one of them, and they have to communicate.
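
The two shapes can be sketched in a few lines. This is a minimal illustration, not a framework: each "agent" is a plain function from text to text standing in for a model in a loop with tools, and every name here (flights_agent, run_coordinator, run_pipeline, and so on) is hypothetical.

```python
from typing import Callable

# Assumption: an agent is modeled as a function from text to text.
# A real agent would call a model and tools inside this function.
Agent = Callable[[str], str]


def flights_agent(task: str) -> str:
    return f"flights found for: {task}"


def lodging_agent(task: str) -> str:
    return f"lodging booked for: {task}"


def itinerary_agent(task: str) -> str:
    return f"itinerary built from: {task}"


def run_coordinator(task: str, specialists: dict[str, Agent]) -> dict[str, str]:
    """Coordinator shape: one place delegates to each specialist and collects the results."""
    return {name: agent(task) for name, agent in specialists.items()}


def run_pipeline(task: str, stages: list[Agent]) -> str:
    """Pipeline shape: each stage receives the previous stage's output."""
    result = task
    for stage in stages:
        result = stage(result)
    return result
```

In the coordinator shape every specialist sees the original task; in the pipeline shape each agent sees only what the previous one handed over, which is where context can get lost.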

Why you might want several

The appeal is real, and it comes in three forms.

Specialization. A focused agent with a tight tool set and clear, narrow instructions does its piece more reliably than a generalist juggling everything. This connects directly to the tool-definition lesson: an agent with five carefully described tools picks among them more reliably than one agent with fifty. Splitting the work can be a way of keeping each agent’s job small enough to do well.
Parallelism. Sub-tasks that do not depend on each other can run at the same time across different agents, which a single agent working one step at a time cannot do.
Modularity. Separate agents are easier to build, test, and replace independently. A bug in the itinerary agent does not require reasoning about the flights agent.
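
The parallelism point can be made concrete with a small sketch. It assumes each specialist is an async function whose waiting time is simulated with asyncio.sleep; fake_agent, JOBS, and the 0.2-second delays are invented for illustration and do not represent real model latency.

```python
import asyncio
import time


async def fake_agent(name: str, seconds: float) -> str:
    # Stand-in for a specialist agent; the sleep simulates waiting on a model or tool.
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_sequential(jobs):
    # One agent working one step at a time: total time is the sum of the waits.
    return [await fake_agent(name, secs) for name, secs in jobs]


async def run_parallel(jobs):
    # Independent sub-tasks on separate agents: total time is roughly the longest wait.
    return await asyncio.gather(*(fake_agent(name, secs) for name, secs in jobs))


def timed(coro):
    # Runs a coroutine to completion and returns (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


JOBS = [("flights", 0.2), ("lodging", 0.2)]
```

Calling timed(run_sequential(JOBS)) and timed(run_parallel(JOBS)) returns the same results, but the parallel run finishes in about half the time. The gain only exists because the two jobs do not depend on each other.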

These are the advantages most write-ups lead with, and they are genuine. They are also only half the picture.

The cost the brochures leave out: coordination

Here is the part that gets quietly dropped from the multi-agent sales pitch. Every agent you add buys its specialization with coordination cost, and that cost is not small.

Communication overhead. Agents have to pass information to each other: the goal, the constraints, what has been done, what each one found. Every handoff is a chance to lose or distort context. Two agents that each understand the task perfectly can still fail at the seam between them.
More failure points. One agent has one place to go wrong. Five agents have five, plus the connections between them. The system is at best as reliable as its shakiest handoff, and usually less, because the chance of failure at each agent and each handoff compounds.
Who-decides-what ambiguity. With several agents, something has to decide which agent handles what, what happens when two disagree, and when the whole thing is done. That coordination logic is itself a hard problem, and it is easy to underestimate.
Harder to debug, slower to run. When a multi-agent system gives a bad answer, you have to trace which agent, which handoff, which message. And every round of inter-agent communication adds latency.
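
A rough piece of arithmetic shows how failure points compound. The sketch below assumes every agent and every handoff succeeds independently with the same probability; both the independence and the 0.95 figure are illustrative assumptions, not measurements.

```python
def chain_reliability(step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming the steps fail independently."""
    return step_success ** steps


# One generalist agent: a single place to go wrong.
single = chain_reliability(0.95, 1)

# Three agents plus two handoffs between them: five places to go wrong.
team = chain_reliability(0.95, 5)

print(f"single agent: {single:.2f}")  # single agent: 0.95
print(f"team of three: {team:.2f}")   # team of three: 0.77
```

Under these assumptions the team has to be noticeably better at each step than the generalist just to break even on overall reliability.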

The blunt consequence: many tasks are better served by one well-designed generalist agent. A single agent with a good set of tools and clear instructions has no seams to lose information across and one place to debug. Multi-agent is not the graduation you reach when your agent grows up. It is a specific tradeoff you take on purpose.

The real question: specialists or one generalist

So the question is never “are multi-agent systems better than single agents.” It is “does this task split into distinct enough jobs that specialized agents beat one well-described generalist, and is that worth the coordination cost?”

Multiple agents tend to win when the task genuinely decomposes into different specialties (jobs that need different tools, different knowledge, different instructions), when sub-tasks can run in parallel, or when one agent’s combined tool set would be so large it confuses itself. That last case is the tool-definition lesson at the system level: if a single agent has so many tools it picks the wrong one, splitting it into specialists with smaller toolboxes can be the fix.

One generalist tends to win when the task is mostly sequential, when the pieces share so much context that handing it between agents loses more than specialization gains, or simply when the task is small enough that any coordination at all is overhead.

Worked example: the same task, one agent versus a team

TASK: "Produce a short research brief on a topic."

ONE GENERALIST AGENT:
  one loop: search -> read -> draft -> done.
  Simple, one place to debug. Fine for a brief that one agent can hold.

A TEAM (coordinator + researcher + writer):
  coordinator -> assigns "gather sources" to researcher
  researcher  -> searches, returns findings to coordinator
  coordinator -> passes findings to writer
  writer      -> drafts the brief, returns it
  Buys: a researcher tuned for search, a writer tuned for prose, and room to
        run several topics in parallel.
  Costs: three agents to build, two handoffs where findings can be garbled,
         a coordinator deciding who does what, and more latency per round.

For a single short brief, the generalist is the better engineering choice; the team’s coordination cost is not repaid. For a newsroom producing fifty briefs a day across specialties, the team’s specialization and parallelism start to pay. The task, not the prestige of the pattern, decides.

A practical path: start with one, split when you hit a wall

The cleanest way to get the tradeoff right is to not decide it up front. Build one generalist agent first. Run it. Then split into multiple agents only when you hit a concrete wall that splitting actually solves: the agent’s toolbox has grown so large it picks the wrong tool, or you need sub-tasks to run in parallel and one agent cannot run them that way, or two parts of the job genuinely need different knowledge and instructions that fight each other inside one agent. Each of those is a specific problem with a specific reason multi-agent fixes it. Splitting because the architecture diagram looks more impressive is not on the list. Let the wall you actually hit, not the pattern’s reputation, be the thing that pushes you to more than one agent.
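
The three walls above can be written down as a checklist. This is only a sketch of the reasoning, with hypothetical names (Observations, reasons_to_split, should_split); in practice the flags would be set from what you saw while running the single agent, not computed automatically.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """What running the single generalist actually showed (all fields hypothetical)."""
    picks_wrong_tool: bool = False
    needs_parallel_subtasks: bool = False
    instructions_conflict: bool = False


def reasons_to_split(obs: Observations) -> list[str]:
    """Return the concrete walls that were hit; an empty list means stay with one agent."""
    reasons = []
    if obs.picks_wrong_tool:
        reasons.append("toolbox too large: give specialists smaller toolboxes")
    if obs.needs_parallel_subtasks:
        reasons.append("independent sub-tasks: run them on separate agents")
    if obs.instructions_conflict:
        reasons.append("conflicting instructions: separate them by role")
    return reasons


def should_split(obs: Observations) -> bool:
    return bool(reasons_to_split(obs))
```

Note what is absent: there is no field for "the diagram would look more impressive," so with no wall observed, should_split(Observations()) is False.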

Zooming out: compound systems

Multi-agent design is one instance of a broader idea. In his Berkeley lecture on compound AI systems, Omar Khattab frames modern AI applications as systems built from multiple components (models, retrievers, tools, and agents) composed together, rather than a single model doing everything. Seen that way, a multi-agent system is one shape of compound system, and the same engineering question applies to all of them: each component you add buys some capability and costs some coordination, and good design is spending that tradeoff deliberately.

Common pitfalls

Reaching for multi-agent by default. It is the hyped pattern, so it gets used where one generalist would be simpler and more reliable. Start with one agent; split only when the task demands it.
Ignoring coordination cost. Specialization is the visible benefit; communication overhead, extra failure points, and latency are the hidden bill. Count both.
Underestimating the coordination logic. Deciding who does what, handling disagreement, and knowing when the task is done is itself a hard problem, not a free wrapper around the agents.
Confusing more agents with more capability. A five-agent system is not smarter than a one-agent system; it is more specialized and more complex. Those are different things.
Losing context at the seams. The most common multi-agent failure is not a bad agent but a bad handoff, where one agent fails to pass a constraint the next one needed.
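
One way to guard the seams is to make the handoff an explicit structure rather than free text, so constraints travel with the findings. The sketch below is one possible design under assumed names (Handoff, researcher, writer); the rule that the writer refuses a handoff without constraints is a choice made for this example, not a general requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """Everything the next agent needs, passed as one immutable object."""
    goal: str
    constraints: tuple[str, ...]
    findings: tuple[str, ...] = ()


def researcher(handoff: Handoff) -> Handoff:
    # Adds findings and carries the goal and constraints forward unchanged.
    found = (f"source about {handoff.goal}",)
    return Handoff(handoff.goal, handoff.constraints, handoff.findings + found)


def writer(handoff: Handoff) -> str:
    # Assumption for this sketch: every brief has at least one constraint,
    # so an empty tuple signals a broken handoff rather than a free hand.
    if not handoff.constraints:
        raise ValueError("handoff arrived without constraints; refusing to guess")
    rules = "; ".join(handoff.constraints)
    return f"Brief on {handoff.goal} ({rules}) using {len(handoff.findings)} source(s)"
```

For example, writer(researcher(Handoff("solar storage", ("max 300 words",)))) returns a brief that still carries the 300-word limit, while a handoff with an empty constraints tuple fails loudly instead of silently dropping the rule.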

What you should remember

A multi-agent system is several specialized agents that coordinate, usually via a delegating coordinator or a handoff pipeline. Each agent is still just a model in a loop with tools.
The advantages are specialization, parallelism, and modularity. A focused agent with a small toolbox does its piece more reliably than an overloaded generalist.
The cost is coordination: communication overhead, more failure points, who-decides-what complexity, harder debugging, more latency. The sales pitch often omits this; do not.
The real question is fit, not ranking: do the task’s jobs split cleanly enough that specialists beat one generalist, and is that worth the coordination cost? Many tasks are better as one well-designed generalist agent.
Multi-agent is one shape of compound system. Every component you add buys capability and costs coordination; good design spends that tradeoff on purpose.

So far, reliability has come from structure: better tools, plans, the right number of agents. The next lesson turns the agent’s attention on itself. Metacognition means an agent checking its own work: a reflection step where it asks “is this actually right?” before committing. It is one of the cheapest ways to make a single agent more reliable without adding a second one.