Skip to content

Lesson: Choosing an agent framework

By now you know more than the marketing implies. Lesson 1 showed that an agent is a model in a perceive-decide-act loop. Lesson 2 showed exactly how the tool-call exchange works, four steps of plain back-and-forth between the model and the surrounding code. Nothing in either lesson required a special library. You could write that loop yourself in an afternoon.

That changes the question most people ask. The internet wants you to ask “which agent framework should I use?” The honest first question is one step earlier: do you want a framework at all? Sometimes the answer is no, and reaching for a heavy framework to do what fifty lines of your own code would do is a real and common mistake. This lesson gives you a way to make that call deliberately, and then, if the answer is yes, a way to match a framework to your task instead of to whichever name you have heard most often.

Start by being concrete about what a framework does, because “framework” is a vague word.

Everything in Lessons 1 and 2 is plumbing a framework can provide for you: building the tool menu (the schema), formatting the model’s tool-call requests, parsing them back out, running the tools, feeding results back, and looping until done. That is the boilerplate. A framework writes it once, well, so you do not write it badly many times.

Most frameworks go further and provide opinionated building blocks on top of the loop: ready-made memory, planning routines, multi-agent coordination, connectors to data sources and external services, and observability so you can see what the agent did and why. Those extras are the real reason to adopt a framework, because they are the parts that are genuinely tedious and error-prone to build yourself.

With that in view, the tradeoff is clear and you can make it honestly.

Hand-roll the loop when you want full control over every step, you want minimal dependencies, and the loop is small. If your agent calls two or three tools in a simple loop, your own code is easy to write, easy to read, and easy to debug. You own every line, nothing changes under you, and there is no framework behavior to fight when you need something slightly unusual.

Adopt a framework when you want the boilerplate gone, you want opinionated patterns for the hard parts (multi-agent, memory, planning), and you value ecosystem integrations. If you are wiring an agent into a dozen data sources, coordinating several agents, or you simply do not want to maintain the plumbing, a framework earns its keep. You trade some control for a large head start.

The trap to avoid is reaching for a framework reflexively. A single-tool agent does not need a multi-agent orchestration library any more than a one-page script needs a web framework. Match the weight of the tool to the weight of the task.

Here is the same simple agent, framework-agnostic, sketched both ways.

HAND-ROLLED (you write all of this):
define the tool schema by hand
send model the goal + schema
loop:
read model output
if it is a tool call: parse it, run the tool, feed result back
else: return the answer
WITH A FRAMEWORK (you write only this):
@tool
def get_weather(city, day): ... # framework reads the schema from here
agent = Agent(tools=[get_weather])
agent.run("weather in Seattle tomorrow") # framework owns the loop

The framework did not do anything you could not do. It removed the plumbing so you could describe the tool and the goal and let the loop be someone else’s maintained code. For a small agent that saving is modest. For a large one it is the difference between shipping and babysitting boilerplate.

If you decide to adopt a framework, do not pick by popularity. Pick by the shape of your task. The ecosystem sorts roughly into a few categories, and a framework is a strong fit when its category matches what you are building. The four-category map below draws on the Berkeley CS294 framework lecture (Chi Wang of AutoGen and Jerry Liu of LlamaIndex). Microsoft’s “Explore AI Agent Frameworks” contributes a separate piece: the fit-question framing we use to choose among them.

  • Orchestration and multi-agent. Built for coordinating one or more agents, passing work between them, and managing conversations among several roles. Microsoft Agent Framework (MAF) sits here as the current Microsoft choice; AutoGen, MAF’s predecessor, is in maintenance mode per Microsoft’s own README and is community-managed going forward, so new projects should start on MAF. A fit when your problem is naturally several cooperating agents rather than one.
  • Retrieval-first. Built around bringing your own data to the agent, with retrieval treated as a first-class primitive rather than a bolted-on tool. LlamaIndex sits here. A fit when the agent’s main job is to answer from a body of documents or a knowledge base.
  • Graph and state-machine. Built to express an agent’s control flow as an explicit graph of steps and transitions, including loops and branches you define. LangGraph sits here. A fit when you need precise, inspectable control over what happens in what order, beyond a simple loop.
  • Agent as a managed service. A different shape entirely: not a code library you run, but a hosted service that runs and scales the agent for you, often with built-in tools and managed conversation state. Microsoft Foundry Agent Service (formerly Azure AI Agent Service) is one example; AWS Bedrock Agents and Google Vertex AI Agent Builder occupy the same category. A fit when you want deployment and scaling handled rather than running the loop on your own infrastructure.

Each category has costs. Orchestration adds multi-agent machinery you waste when your task is a single agent with a few tools. Retrieval-first carries indexing and document-handling weight you do not need if you are not answering from a document base. Graph and state-machine frameworks make you author every transition, which is over-specification for a fluid task. A managed service trades the operations burden for vendor lock-in, recurring runtime fees, and less control over what runs underneath. Match each fit to a real cost you are willing to pay.

Notice that each description says what the framework is built for, not which framework is best. There is no best. Microsoft Agent Framework is built for multi-agent orchestration; LlamaIndex is built around retrieval; LangGraph is built for explicit control flow. Those are different jobs. Asking which is best is like asking whether a hammer is better than a screwdriver.

Three tasks, three fits.

TASK: "Answer support questions from our 500-page product manual."
-> retrieval is the core job
-> retrieval-first category (e.g. LlamaIndex), or hand-roll with one
retrieval tool if the rest is simple.
TASK: "A planner agent hands subtasks to a researcher and a writer agent."
-> several cooperating agents
-> orchestration / multi-agent category (e.g. Microsoft Agent Framework).
TASK: "One agent that checks the weather and books a calendar slot."
-> two tools, one simple loop
-> hand-roll. A framework would be more weight than the task needs.

The third case is the one people get wrong most often. When the task is small, the framework is the overhead, not the help.

How to actually choose: a few honest questions

Section titled “How to actually choose: a few honest questions”

Once you have decided a framework is worth it, a short set of questions usually settles which one, without any leaderboard.

  • What shape is the task? One agent with a few tools, several cooperating agents, a retrieval-heavy job, or something needing precise step-by-step control? The answer points straight at a category above.
  • How much control do you need? If you need to inspect and shape every transition, a graph or state-machine framework fits. If you are happy to let the framework own the flow, an orchestration framework is less work.
  • Where does your data and infrastructure already live? A framework that connects cleanly to the systems you already run saves more time than any feature comparison. This is where a managed service can win on operations alone.
  • Who maintains it after you ship? A framework your team already knows beats a marginally better-fitting one nobody can debug at 2am. Familiarity is a real input, not a cop-out.

None of these is a ranking question. Each one narrows the field by fit, which is the only kind of answer that survives contact with a real project.

Notice each fit question maps back to the parts you already know. Shape asks which parts of Lesson 1’s perceive-decide-act loop (model, system prompt, tools, loop) the framework most usefully provides. Control asks who owns Lesson 2’s four-step exchange, you or the framework. Data and infrastructure asks which tool integrations come pre-wired. Maintainership is the people question. Frameworks differ less in what they do (every category implements the same loop) than in how they split ownership of those four parts and the four exchange steps.

A caveat worth carrying: the loop is stable, the libraries are not

Section titled “A caveat worth carrying: the loop is stable, the libraries are not”

One honest note before you choose. The agent ideas you learned in Lessons 1 and 2, the loop and the tool-call exchange, are stable. They will still describe agents years from now. The frameworks built on top of them move fast: APIs change, names rise and fade, and today’s standard can be next year’s migration. This is an argument for understanding the loop well no matter what you choose, so that whichever framework you adopt, you can see what it is doing underneath and you are never fully captive to it. A framework you understand is a tool. A framework you treat as magic is a liability.

  • Reaching for a framework reflexively. The default assumption that “real” agents need a framework is wrong. Small agents are often cleaner hand-rolled. Decide, do not default.
  • Picking by popularity instead of fit. The most-starred framework is not the right one unless its category matches your task. Match the shape, not the hype.
  • Asking which framework is “best.” There is no best, only best-for-a-job. Replace “which is best” with “which is built for what I am doing.”
  • Treating the framework as magic. If you do not know what the framework does underneath, you cannot debug it when it fails or returns an unexpected result. The loop from Lessons 1 and 2 is what it is doing; keep that picture in mind.
  • Ignoring churn. Frameworks change fast. Build on a solid understanding of the loop so a breaking change is an inconvenience, not a rewrite of your mental model.
  • The first decision is hand-roll versus framework, not which framework. Lessons 1 and 2 showed the loop is buildable. A framework is worth it when the boilerplate, the hard patterns (multi-agent, memory, planning), and the integrations outweigh the control and simplicity of writing it yourself.
  • A framework gives you the loop plumbing plus opinionated building blocks. The plumbing is the schema, the tool-call parsing, and the loop; the building blocks (memory, planning, multi-agent, connectors, observability) are usually the real reason to adopt one.
  • Choose by category, not popularity. Orchestration/multi-agent, retrieval-first, graph/state-machine, and managed-service are different shapes. Match the framework’s shape to your task’s shape.
  • There is no best framework, only best-for-a-job. Frame every comparison as “built for X” versus “built for Y,” never as a ranking.
  • The loop is stable; the libraries are not. Understand the underlying loop so you can use any framework without being captive to it.

The next lesson goes deep on the single most important building block underneath all of these frameworks: the tool-use design pattern. We will move past how one tool call works and into how to define a tool well, so the model reliably knows when and how to use it.