Practice: The tool-use design pattern in depth

Self-check

Seven short questions. Answer each before opening the collapsible.

1. What does the model use to decide which tool to call and with what arguments?

Show answer

The tool’s definition and nothing else: the name, the description, and the parameter list you wrote. The model cannot see the code or run experiments, so a tool the model misuses is almost always a tool you described badly.

2. Name the four parts of a tool definition and the job each does for the model.

Show answer

Name (the handle the model calls by; the first and shortest description), description (what the tool does AND when to use it), parameters (each input with a name, a type, and its own short description), and expected output (what comes back, so the model can plan the next step). All four are text the model reads, and together they are the model’s only window onto the tool.

3. Why do parameters need their own descriptions?

Show answer

Because the model fills arguments from the parameter descriptions the same way it picks tools from tool descriptions. A bare date parameter leaves the model guessing the format (“tomorrow”? “2026-05-21”?); a described date (“YYYY-MM-DD; resolve relative dates before calling”) moves a whole class of wrong-argument bugs from runtime into the definition.
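As a rough illustration of the difference, here is a minimal sketch that writes the same parameter both ways and flags the bare one. The tool name `get_forecast`, the JSON Schema-style layout, and the helper `undescribed_parameters` are assumptions made for this example, not part of any particular framework.

```python
# Hypothetical tool definitions in a JSON Schema-style layout.
# The tool name and field layout are illustrative assumptions.
BARE = {
    "name": "get_forecast",
    "parameters": {
        "type": "object",
        "properties": {"date": {"type": "string"}},  # format left to a guess
        "required": ["date"],
    },
}

DESCRIBED = {
    "name": "get_forecast",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {
                "type": "string",
                "description": "YYYY-MM-DD; resolve relative dates before calling",
            }
        },
        "required": ["date"],
    },
}


def undescribed_parameters(tool: dict) -> list[str]:
    """Return the names of parameters that carry no description text."""
    props = tool["parameters"]["properties"]
    return [
        name
        for name, spec in props.items()
        if not spec.get("description", "").strip()
    ]


print(undescribed_parameters(BARE))       # ['date']
print(undescribed_parameters(DESCRIBED))  # []
```

A check like this could run whenever tool definitions are loaded, so a parameter without a description is caught before the model ever sees it.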

4. What is negative guidance, and why is it high-leverage?

Show answer

A “do not use this for X” clause in the description. Models are eager to use the tools they are given and will reach for one in cases just outside its intended use. A short “do not use when…” line closes those near-miss cases that a purely positive description leaves open.

5. Two tools are described “Get the weather” and “Get the forecast.” What goes wrong, and how do you fix it?

Show answer

Weather and forecast are near-synonyms, so the descriptions draw no boundary and the model has no basis for choosing between them; it effectively guesses. Fix it by making each description say what makes it different: current conditions now (“use for ‘what is it like now’”) versus predicted future conditions (“use for ‘will it rain tomorrow’, not for right now”). Each description must mark its boundary with the other.

6. Why does the expected-output shape matter, not just the input description?

Show answer

Because the model reads the result to choose its next step. An opaque blob like { "t": 58, "c": 3 } makes the model guess what c: 3 means; a labeled { "high_f": 58, "condition": "rain" } lets it act confidently. Legible, self-describing output is part of a good tool definition.
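One way to picture this is a thin wrapper that relabels the opaque result before it goes back to the model. The sketch below is only illustrative: the code table `CONDITION_CODES` and the function `label_forecast` are hypothetical, and the mapping of code 3 to "rain" is assumed from the two example payloads above.

```python
import json

# Hypothetical code table for the opaque format. The model never sees this
# mapping, which is exactly why the raw payload forces it to guess.
CONDITION_CODES = {1: "clear", 2: "cloudy", 3: "rain"}


def label_forecast(raw: dict) -> dict:
    """Turn an opaque result like {"t": 58, "c": 3} into a self-describing one."""
    return {"high_f": raw["t"], "condition": CONDITION_CODES[raw["c"]]}


opaque = {"t": 58, "c": 3}
print(json.dumps(label_forecast(opaque)))  # {"high_f": 58, "condition": "rain"}
```

The unit lives in the key name (`high_f`) and the condition is a word rather than a code, so the result needs no outside documentation to interpret.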

7. An agent with thirty tools is unreliable. Where do you look first?

Show answer

At the tool descriptions, not the model or the loop. With many tools, several will have overlapping territory, and the quality of every description is what keeps the model picking correctly. Unreliable selection is usually a description problem.
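A first pass over a large tool set can be partly mechanical. The sketch below is a crude heuristic, not an established technique: the function `lint_tool`, its phrase checks, and the eight-word threshold are all assumptions chosen for illustration, and passing the check does not make a description good.

```python
def lint_tool(tool: dict, min_words: int = 8) -> list[str]:
    """Return heuristic warnings for one tool definition.

    Expects a dict with a 'description' key. The phrase checks are rough
    string matches and will miss guidance worded differently.
    """
    warnings = []
    desc = tool.get("description", "")
    lowered = desc.lower()
    # Remove negative clauses first so "do not use for" is not counted
    # as positive "use for" guidance.
    positive = lowered.replace("do not use", "")

    if len(desc.split()) < min_words:
        warnings.append("description is very short")
    if "use when" not in positive and "use for" not in positive:
        warnings.append("no 'use when' guidance")
    if "do not use" not in lowered and "not for" not in lowered:
        warnings.append("no negative guidance")
    return warnings


vague = {"name": "get_weather", "description": "Get the weather"}
print(lint_tool(vague))
# ['description is very short', "no 'use when' guidance", 'no negative guidance']
```

Run over all thirty definitions, a check like this only narrows down which descriptions to reread first; deciding whether two tools overlap still takes a human reading them side by side.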

Try it yourself: fix a bad tool definition

Here is a poorly defined tool. Rewrite it well. The tool looks up a customer’s order status given an order ID.

name: lookup
description: "Looks things up."
parameters: q

Show a strong rewrite

name: get_order_status
description: "Look up the current status of a customer's order by its
  order ID (e.g. processing, shipped, delivered). Use when the user asks
  where their order is or whether it has shipped. Do not use for product
  questions or for orders identified only by a customer name."
parameters:
  order_id (string): "The order's ID, like 'ORD-48213'. Ask the user for
    it if it is not already in the conversation."

What changed: a name that says what it does, a description with what + when + when-not, and a parameter with a format and a rule (ask for it if missing). Same tool, but now the model knows when to reach for it and how to fill it.
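The rewrite covers name, description, and parameters; the fourth part, expected output, can be handled the same way. Below is a minimal sketch of the definition plus a stub handler that returns labeled results. The `returns` text, the `ORD-` plus digits pattern (inferred from the single example ID), the in-memory `_ORDERS` table, and the error field names are all assumptions for illustration.

```python
import re

GET_ORDER_STATUS = {
    "name": "get_order_status",
    "description": (
        "Look up the current status of a customer's order by its order ID "
        "(e.g. processing, shipped, delivered). Use when the user asks where "
        "their order is or whether it has shipped. Do not use for product "
        "questions or for orders identified only by a customer name."
    ),
    "parameters": {
        "order_id": {
            "type": "string",
            "description": (
                "The order's ID, like 'ORD-48213'. Ask the user for it if it "
                "is not already in the conversation."
            ),
        }
    },
    # Assumed addition: the rewrite above does not spell out the output.
    "returns": "Object with 'order_id' and 'status', or an 'error' field.",
}

# Hypothetical in-memory data standing in for a real order system.
_ORDERS = {"ORD-48213": "shipped"}


def get_order_status(order_id: str) -> dict:
    """Return a labeled result the model can act on, including readable errors."""
    # Assumed ID format, inferred from the example 'ORD-48213'.
    if not re.fullmatch(r"ORD-\d+", order_id):
        return {"error": "invalid_order_id", "hint": "Expected a value like 'ORD-48213'."}
    status = _ORDERS.get(order_id)
    if status is None:
        return {"error": "order_not_found", "order_id": order_id}
    return {"order_id": order_id, "status": status}


print(get_order_status("ORD-48213"))  # {'order_id': 'ORD-48213', 'status': 'shipped'}
```

Returning a named error instead of raising gives the model something it can read and respond to, such as asking the user to recheck the ID.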

Try it yourself: disambiguate two overlapping tools

These two tools will be confused, because their descriptions do not draw a boundary. Rewrite the descriptions so the model can always tell them apart.

find_user(name)      description: "Find a user."
search_users(query)  description: "Search users."

Show a strong rewrite

find_user(name)
  description: "Look up exactly one user by their full, exact name and
    return that user's record. Use when you already know the precise name.
    Do not use for partial names or to browse multiple matches."
search_users(query)
  description: "Find all users matching a partial name, email fragment, or
    keyword, and return a list of matches. Use when the name is incomplete
    or you expect more than one result. Do not use when you already know
    the exact full name (use find_user)."

The boundary is now explicit on both sides: exact-one-match versus partial-many-matches, each with a “do not use when…” pointing at the other. When two tools could match the same request, every description has to do double duty: say what it is for, and mark where it ends and the neighbor begins.
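The boundary in the descriptions is easiest to keep honest when the implementations differ in the same way. The sketch below assumes a small in-memory `USERS` list with made-up records; the matching rules (exact string equality versus case-insensitive substring) are one plausible reading of the descriptions, not the only one.

```python
from typing import Optional

# Hypothetical sample data for illustration only.
USERS = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Ada Yonath", "email": "yonath@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]


def find_user(name: str) -> Optional[dict]:
    """Exact full-name lookup: one record, or None when nothing matches exactly."""
    for user in USERS:
        if user["name"] == name:
            return user
    return None


def search_users(query: str) -> list[dict]:
    """Case-insensitive partial match on name or email: a list, possibly empty."""
    q = query.lower()
    return [
        u for u in USERS
        if q in u["name"].lower() or q in u["email"].lower()
    ]


print(find_user("Ada"))            # None: a partial name is out of scope here
print(len(search_users("ada")))    # 2
```

The return types reinforce the split: one tool yields a single record or nothing, the other always yields a list, so the model can also tell them apart by what comes back.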

Flashcards

Ten cards.

Q. What does the model use to pick a tool and fill its arguments?

The tool’s definition and nothing else (name, description, parameter list). It cannot see the code. A misused tool is almost always a badly described tool.

Q. What are the four parts of a tool definition?

Name, description (what + when to use), parameters (each with name, type, and its own description), and expected output. All four are text the model reads.

Q. Why must parameters have their own descriptions?

The model fills arguments from them. A bare parameter leaves the format to a guess; a described one (‘YYYY-MM-DD; resolve relative dates first’) prevents a class of wrong-argument bugs.

Q. What is negative guidance in a tool description?

An explicit ‘do not use this for X’ clause. It closes the near-miss cases a purely positive description leaves open, since models over-reach to neighboring uses.

Q. How do you stop the model confusing two overlapping tools?

Make each description mark its boundary with the other: say what each is for AND where it ends, often with a ‘not for Y; use the other tool’ clause.

Q. Why does the tool's output shape matter?

The model reads the result to choose its next step. Labeled, self-describing output ({ high_f: 58 }) lets it act; an opaque blob ({ t: 58 }) makes it guess.

Q. An agent with many tools picks wrong. First place to look?

The tool descriptions, before the model or the loop. With many tools, overlapping territory is common, and description quality is what keeps selection correct.

Q. Why are vague tool names like 'search' or 'process' a problem?

The name is the first and shortest description. A vague name tells the model nothing, making it harder to decide whether the tool fits the task.

Q. Whose fault is it, usually, when an agent calls the wrong tool?

The description’s, not the model’s. The model decides from the words you wrote; unclear or overlapping descriptions cause wrong or skipped calls.

Q. What is the single highest-leverage sentence you can add to a tool definition?

A ‘do not use this for X’ line. It is short and closes off the most common over-reach failures.