Summary: The tool-use design pattern in depth

The model picks which tool to call, and with what arguments, from the tool’s definition and nothing else. It cannot see the code. So when an agent calls the wrong tool, skips one it should have used, or passes garbage arguments, the cause is usually not a weak model; it is a badly described tool. This lesson covers how to write a tool definition the model can use reliably, and it is the most directly useful lesson in the early track if you are actually building an agent.

Core ideas

A tool definition has four parts, all text the model reads: the name (a clear handle, the first and shortest description), the description (what the tool does AND when to use it), the parameters (each with a name, type, and its own short description), and the expected output (what comes back). Treat the whole thing as the model’s only window onto the tool; a dirty window means the model uses the tool wrong.
Good descriptions say what the tool is for and when to use it. “Searches.” tells the model nothing; “Search the company knowledge base of support articles and policies; use for company-specific questions” tells it exactly when this tool fits.
Parameters need their own descriptions. The model fills arguments from them. A bare “date” forces the model to guess the format; “date (YYYY-MM-DD, resolve relative dates first)” settles a whole class of wrong-argument bugs in the definition, before they can surface at runtime.
Negative guidance is high-leverage. A short “do not use this for X” line closes the near-miss cases a positive description leaves open. Models tend to over-reach into neighboring uses, and one sentence is often enough to stop that: an internal-docs tool marked “do not use for general world knowledge” keeps the agent from searching the docs for the capital of France.
When two tools overlap, each description must draw the boundary. “Get the weather” and “Get the forecast” are near-synonyms, so the model has no basis for choosing between them. Rewrite each to say what makes it different (current conditions now versus predicted future), with a “not for the other case” clause.
Make the output legible too. The model reads the result to choose its next step, so labeled, self-describing output ({ high_f: 58, condition: "rain" }) beats an opaque blob ({ t: 58, c: 3 }). Output design is part of tool design.
Descriptions carry more weight as the toolbox grows. With three tools, rough descriptions get by; with thirty, overlapping territory is everywhere and description quality is what keeps selection correct. An unreliable many-tool agent is usually a description problem, not a model problem.
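
As a rough sketch of the ideas above, here are two overlapping tools written out in Python. The dict layout (name, description, JSON-Schema-style parameters, a free-text returns note) is an assumption for illustration, not any particular vendor's format, and get_current_weather, get_forecast, CONDITION_LABELS, and to_legible_output are hypothetical names.

```python
# Hypothetical tool definitions. The field names below are illustrative,
# not a specific vendor's schema.
GET_CURRENT_WEATHER = {
    "name": "get_current_weather",
    "description": (
        "Get the weather conditions right now for one city. "
        "Use for questions about the present moment. "
        "Do not use for tomorrow or any future date; use get_forecast instead."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Seattle'."},
        },
        "required": ["city"],
    },
    "returns": "Object with temp_f (number) and condition (string).",
}

GET_FORECAST = {
    "name": "get_forecast",
    "description": (
        "Get the predicted weather for one city on a future date. "
        "Use for questions about tomorrow or later. "
        "Do not use for current conditions; use get_current_weather instead."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Seattle'."},
            "date": {
                "type": "string",
                "description": (
                    "Forecast date as YYYY-MM-DD. Resolve relative dates "
                    "such as 'tomorrow' to a calendar date first."
                ),
            },
        },
        "required": ["city", "date"],
    },
    "returns": "Object with high_f (number) and condition (string).",
}

# Hypothetical mapping from an upstream numeric code to a readable label.
CONDITION_LABELS = {1: "clear", 2: "cloudy", 3: "rain"}


def to_legible_output(raw: dict) -> dict:
    """Relabel an opaque upstream result such as {"t": 58, "c": 3}."""
    return {
        "high_f": raw["t"],
        "condition": CONDITION_LABELS.get(raw["c"], "unknown"),
    }


print(to_legible_output({"t": 58, "c": 3}))
# prints {'high_f': 58, 'condition': 'rain'}
```

Each description states what the tool is for, when to use it, and which neighboring case belongs to the other tool. The date format lives in the parameter description, and the result is relabeled before the model reads it.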

What changes for you

The next time an agent calls the wrong tool, your instinct will change. Instead of reaching for a bigger model or blaming the loop, you will read the tool definitions the way the model reads them, as the only information it had, and you will usually find the bug there: a vague description, an undescribed parameter, two tools with no boundary between them. Tool definitions stop being boilerplate you fill in once and become the place where an agent’s reliability is mostly won or lost.
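
Part of that reading can be automated. The sketch below is a minimal, assumed lint pass, not an established tool: lint_tool, the eight-word threshold, and the dict layout (name, description, JSON-Schema-style parameters) are all hypothetical choices made for illustration.

```python
def lint_tool(tool: dict, min_words: int = 8) -> list[str]:
    """Return human-readable problems found in one tool definition."""
    name = tool.get("name", "<unnamed>")
    problems = []

    # A very short description rarely says both what the tool does
    # and when to use it. The threshold is an arbitrary assumption.
    description = tool.get("description", "")
    word_count = len(description.split())
    if word_count < min_words:
        problems.append(f"{name}: description has only {word_count} word(s)")

    # Every parameter should carry its own description.
    properties = tool.get("parameters", {}).get("properties", {})
    for param_name, spec in properties.items():
        if not spec.get("description", "").strip():
            problems.append(f"{name}: parameter '{param_name}' has no description")

    return problems


# A deliberately weak definition to run the checks against.
vague_search = {
    "name": "search",
    "description": "Searches.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "date": {"type": "string", "description": "YYYY-MM-DD"},
        },
    },
}

for problem in lint_tool(vague_search):
    print(problem)
# prints:
# search: description has only 1 word(s)
# search: parameter 'query' has no description
```

A check like this catches only the mechanical cases. Whether two tools have a clear boundary between them is a judgment call that still needs a human reading the descriptions side by side.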