Tool prediction: Stage 1 of the function-call mechanism; LLM picks function and arguments. Bucket 1 of failures.
Tool execution: Stage 2; regular code runs the function. Bucket 2 of failures.
Response synthesis: Stage 3; LLM wraps the structured response in natural language. Bucket 3 of failures.
Punt: assistant-design term for when the model says “I can’t help” instead of using an available tool.
Tool hallucination: model emits a call to a function that doesn’t exist in the list of available tools.
Tool router (or tool selector): intermediary system that filters the list of available tools before showing them to the LLM. Used at scale when the tool inventory is large.
Grounding: model’s ability to use information that’s present in its context (in this case, the structured tool response).
Structured output: API feature that constrains the output of the LLM (or tool) to match a JSON schema. The schema fixes the shape of the output, not the correctness of the values in it. Treat it as a baseline for production reliability.
Tool-use failures fall into three buckets: tool prediction, tool execution, response synthesis. Categorize the failure before chasing the fix. Most “AI is broken” cases resolve cleanly once placed. Tool quality (names, docstrings, structured outputs) is the high-leverage non-AI work that makes AI features reliable.
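The "categorize before fixing" step can be sketched as a small triage function. This is one possible shape under stated assumptions, not a prescribed implementation: the `Trace` fields are hypothetical, and the expected call and the grounding judgment are assumed to come from a human reviewer or an evaluation set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Bucket(Enum):
    TOOL_PREDICTION = 1
    TOOL_EXECUTION = 2
    RESPONSE_SYNTHESIS = 3


@dataclass
class Trace:
    """One logged interaction (hypothetical record layout)."""
    expected_call: Optional[dict]   # what a reviewer says should have been called
    predicted_call: Optional[dict]  # what the model emitted; None means it punted
    tool_error: Optional[str]       # error text from running the function, if any
    answer_uses_result: bool        # reviewer judgment: is the answer grounded in the tool response?


def categorize(trace: Trace) -> Optional[Bucket]:
    """Return the first stage that failed, or None if no failure is visible."""
    if trace.predicted_call != trace.expected_call:
        return Bucket.TOOL_PREDICTION  # wrong function, wrong arguments, or a punt
    if trace.tool_error is not None:
        return Bucket.TOOL_EXECUTION  # regular code failed; not an LLM problem
    if not trace.answer_uses_result:
        return Bucket.RESPONSE_SYNTHESIS  # tool response ignored or misreported
    return None
```

The stages are checked in pipeline order because a failure in an earlier stage makes the later ones uninformative: if the wrong tool was called, the quality of the synthesized answer says little about stage 3.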