
Cheatsheet: The tool-use design pattern in depth

The model picks tools and fills arguments from descriptions alone. It cannot see the code. A tool the model misuses is almost always a tool you described badly.

Part | Job for the model
Name | Short handle (get_weather); the first and shortest description.
Description | What it does AND when to use it (and when not to). The model leans on this most.
Parameters | Each input with name, type, and its own short description (format + rules).
Expected output | What comes back, so the model can plan the next step.

All four are text the model reads. The definition is the model’s only window onto the tool.
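As a rough illustration, the four parts can be written down as plain data. The field names below (in particular `expected_output`) and the `city` and `date` parameters are assumptions made for this sketch, not the schema of any particular model provider.

```python
import json

# Hypothetical tool definition. Field names are illustrative only;
# real tool-calling APIs each define their own schema.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": (
        "Get the weather for a city on a given date. "
        "Use when the user asks about weather conditions. "
        "Do not use for non-weather questions."
    ),
    "parameters": {
        "city": {"type": "string", "description": "City name, e.g. 'Paris'."},
        "date": {"type": "string", "description": "Target date in YYYY-MM-DD."},
    },
    "expected_output": "Object with high_f (number) and condition (string).",
}

REQUIRED_PARTS = ("name", "description", "parameters", "expected_output")


def missing_parts(tool: dict) -> list:
    """Return the parts of the definition that are absent or empty."""
    return [part for part in REQUIRED_PARTS if not tool.get(part)]


def render_for_model(tool: dict) -> str:
    """Serialize the definition to text, which is all the model ever sees."""
    return json.dumps(tool, indent=2)
```

Calling `missing_parts` on a bare `{"name": "search"}` reports the three parts that were never written, which is a cheap check to run before a tool is registered.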

BAD:  name: search
      description: "Searches."
      parameters: query
GOOD: name: search_internal_docs
      description: "Search the company knowledge base of support articles
      and policies. Use for company-specific procedures/products/policies.
      Do not use for general world knowledge or live data."
      parameters: query (string): "User's question as search keywords."

Same tool, same code. Only the words changed, and the words are what the model acts on.

WEAK: date # model guesses the format
STRONG: date (string): "Target date in YYYY-MM-DD. Resolve relative dates
like 'tomorrow' before calling."

Undescribed parameters cause wrong-argument bugs. Describe format + rules.
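A described format can also be enforced on the tool side, so that a slip fails with a message the model can read and correct. The sketch below assumes a hypothetical helper, `check_date_arg`, that is called before the real tool logic; the name and the error wording are illustrative.

```python
from datetime import date
from typing import Optional


def check_date_arg(value: str) -> Optional[str]:
    """Return None if value is a valid YYYY-MM-DD date, else an error message."""
    error = f"Invalid date {value!r}. Expected YYYY-MM-DD, e.g. '2024-03-05'."
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        # Relative words like 'tomorrow' and formats like 03/05/2024 land here.
        return error
    # Newer Python versions accept other ISO forms (e.g. 20240305);
    # require the exact dashed form the description promised.
    if parsed.isoformat() != value:
        return error
    return None
```

Returning the message as data, instead of raising, lets the agent loop pass it back to the model as the tool result.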

The model reads the result to choose its next step (L2’s decide step). Return labeled, self-describing data.

HARD: { "t": 58, "c": 3 }
EASY: { "high_f": 58, "condition": "rain" }

Same data; only the second lets the model act without guessing.
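One way to get from the HARD shape to the EASY one is a thin wrapper that relabels the raw payload before it is returned to the model. The code table here is an assumption for illustration: only the mapping of 3 to "rain" comes from the example above, and the other entries are invented.

```python
# Hypothetical code table. Only 3 -> "rain" comes from the example;
# a real mapping depends on the upstream weather API.
CONDITION_CODES = {1: "clear", 2: "cloudy", 3: "rain"}


def label_weather_result(raw: dict) -> dict:
    """Turn a terse upstream payload into labeled, self-describing data."""
    return {
        "high_f": raw["t"],
        # Unknown codes are surfaced as "unknown" rather than passed through raw.
        "condition": CONDITION_CODES.get(raw["c"], "unknown"),
    }
```

With this wrapper, `label_weather_result({"t": 58, "c": 3})` yields `{"high_f": 58, "condition": "rain"}`.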

When two tools could match the same request, each description must mark its boundary.

get_current_weather(city) "Conditions right now. Use for 'what is it like now'."
get_forecast(city, days) "Predicted future conditions, up to 7 days. Use for
'will it rain tomorrow'. Not for right now."

Models over-reach to neighboring cases. A short “do not use this for X” line closes the near-misses that positive descriptions leave open. One of the best sentences you can add.
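A crude way to find descriptions that lack such a line is to scan for a few negative phrasings. This is only a heuristic sketch: the phrase list is an assumption, and a description can mark its boundary in words the pattern does not catch.

```python
import re

# Illustrative phrase list; extend it to match your own wording conventions.
NEGATIVE_PATTERN = re.compile(r"\b(do not use|don't use|not for)\b", re.IGNORECASE)


def lacks_negative_guidance(tools: dict) -> list:
    """Return names of tools whose description has no 'do not use' style clause."""
    return [
        name
        for name, description in tools.items()
        if not NEGATIVE_PATTERN.search(description)
    ]


tools = {
    "get_current_weather": "Conditions right now. Use for 'what is it like now'.",
    "get_forecast": (
        "Predicted future conditions, up to 7 days. "
        "Use for 'will it rain tomorrow'. Not for right now."
    ),
}
flagged = lacks_negative_guidance(tools)
```

Here `flagged` contains only `get_current_weather`, which suggests that description is a candidate for a matching boundary line of its own.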

3 tools: rough descriptions usually fine. 30 tools: overlapping territory is everywhere, and description quality is what keeps selection correct. Unreliable many-tool agent? Check the descriptions first, not the model.
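With many tools, a first pass at "check the descriptions" can be automated by flagging pairs whose descriptions share most of their keywords. The sketch below uses Jaccard similarity over word sets; the stopword list, the 0.5 threshold, and the tool names in the usage note are arbitrary assumptions, and a flagged pair is a prompt for human review, not proof of a problem.

```python
import re
from itertools import combinations

# Small illustrative stopword list; tune for your own descriptions.
STOPWORDS = {"the", "a", "an", "for", "of", "to", "and", "or",
             "use", "not", "do", "this", "is", "it", "in", "on"}


def keywords(text: str) -> set:
    """Lowercased words of the description, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def overlapping_pairs(tools: dict, threshold: float = 0.5) -> list:
    """Return (name_a, name_b, score) for description pairs at or above threshold."""
    pairs = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(tools.items(), 2):
        words_a, words_b = keywords(desc_a), keywords(desc_b)
        union = words_a | words_b
        # Jaccard similarity: shared keywords divided by all keywords.
        score = len(words_a & words_b) / len(union) if union else 0.0
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs
```

For example, two hypothetical tools described as "Search company documents." and "Search company documents and policies." score 0.75 and would be flagged, which is the cue to write an explicit boundary into each description.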

Common mistakes:

  • Blaming the model for selection errors (it picks from descriptions).
  • Vague names: search, process, handle, do_task.
  • Bare, undescribed parameters.
  • Only-positive descriptions (no “do not use for X”).
  • Two tools that overlap silently.

Key terms:

  • Tool definition: name + description + parameters + expected output; all text the model reads.
  • Negative guidance: an explicit “do not use this tool for X” clause.
  • Disambiguation: writing each overlapping tool’s description so its boundary with the others is explicit.