Cheatsheet: How models call functions
The one idea that matters
Section titled “The one idea that matters”Function calling is how an LLM acts on the world.The model picks a predefined function, code runs it,the model wraps the structured response in natural language.RAG vs function calling
Section titled “RAG vs function calling”| RAG | Function calling | |
|---|---|---|
| Closes the gap for… | Unstructured data (documents, text) | Structured data (APIs, databases, transactional actions) |
| Mechanism | Fetch documents, inject relevant chunks into prompt | LLM emits structured call; code executes; structured response goes back to LLM |
| Common use case | Q&A over a knowledge base | Real-time data, transactional actions, structured queries |
| What’s in the prompt | Retrieved text passages | Function signatures + docstrings |
The three-stage mechanism
Section titled “The three-stage mechanism”USER QUERY → STAGE 1 (LLM): pick function + arguments ↓ Structured call (e.g., JSON) ↓ STAGE 2 (CODE): execute the function (no LLM) ↓ Structured response ↓ STAGE 3 (LLM): wrap response in natural language ↓ FINAL RESPONSE → USERStages 1 and 3 are LLM-driven. Stage 2 is regular code.
The user sees only Stage 3. Stages 1 and 2 happen inside the system.
What the LLM sees vs doesn’t see
Section titled “What the LLM sees vs doesn’t see”| Sees | Does not see |
|---|---|
| User query | Function implementation |
| Function signature (name + arguments) | API internals |
| Function docstring (what it does, what it returns) | Sandbox/runtime details |
| Conversation history (prior calls + structured responses) | Anything in Stage 2 except the structured return |
Why this matters: docstrings have to be specific. The model has nothing else to infer behavior from.
A worked example: find a teddy bear
Section titled “A worked example: find a teddy bear”User: "Find a teddy bear store near me."
STAGE 1 (LLM emits):{ "function": "find_teddy_bear_store", "arguments": {"location": "37.4275,-122.1697", "radius": "1mi"}}
STAGE 2 (code runs):→ Hits maps API with structured arguments→ Returns: [{"name": "Bear Necessities", "address": "...", "distance": "0.3mi"}]
STAGE 3 (LLM responds):"There are three teddy bear stores within a mile of you.The closest is Bear Necessities at 525 University Ave..."Two SFT training pairs
Section titled “Two SFT training pairs”| Pair | Input | Expected output |
|---|---|---|
| Tool prediction | User query + function description | Structured function call (name + arguments) |
| Response formatting | Structured function response + conversation history | Natural-language answer |
The two pairs combine with the model’s regular SFT data. After training, the model has both new capabilities: emit a call when one is appropriate, format a response afterwards.
Newer pattern: sufficiently strong reasoning models can skip explicit SFT for tool prediction; they figure out the call from the description alone.
Where function calling shines
Section titled “Where function calling shines”| Task class | Example | Why function calling? |
|---|---|---|
| Real-time data | ”What’s the weather in Boston now?” | Model’s training data is stale; function fetches live data |
| Transactional actions | ”Book a 3pm meeting tomorrow” | The side effect is the point; model is the natural-language interface |
| Structured queries | ”Get customer record for ID 12345” | Structured data, structured access pattern |
Common failure modes
Section titled “Common failure modes”| Failure | What goes wrong | Where to debug |
|---|---|---|
| Argument hallucination | Plausible-but-wrong arguments (wrong format, fabricated values) | Inspect Stage 1 structured call before execution |
| Wrong-tool selection | Multiple tools available; model picks wrong one | Tool descriptions; SFT data quality; tool-selection patterns (next lesson) |
| Added latency | Stage 2 = real API call; user waits | Cache results; parallelize calls; skip detour when not needed |
Function calling vs code execution
Section titled “Function calling vs code execution”| Function calling | Code execution | |
|---|---|---|
| What’s executed | Predefined function; implementation exists already | LLM-generated new code |
| Risk | Constrained (only declared functions) | Higher (sandbox required; model could do anything) |
| Predictability | High (function contract is fixed) | Lower (depends on LLM’s code generation) |
| Common use | Most tool-augmented AI in production | Code interpreter features, data analysis sandboxes |
They often coexist in modern apps. Different tools for different jobs.
Pitfalls to dodge
Section titled “Pitfalls to dodge”| Pitfall | Reality |
|---|---|
| ”The model wrote the function.” | No. Implementation existed before. Model picks when to call and what arguments to fill. |
| ”Function calling = code execution.” | Different things. Function calling = predefined functions. Code execution = LLM generates new code. |
| ”Function calling eliminates hallucination.” | Only for the structured tool output. Natural-language framing in Stage 3 can still contain wrong claims. |
| ”If the AI does something, it must be tool calling.” | Not always. Could be RAG (text retrieval), pure prompt engineering, or just the model’s training data. Knowing which one was used helps you reason about reliability. |
Glossary
Section titled “Glossary”- Tool calling: general term for any LLM-emitted call to an external resource. Function calling is the structured subset.
- Function calling: specific protocol where the LLM emits a structured call to a predefined function with documented signature.
- Tool prediction: Stage 1 of the mechanism; the LLM picking the function and arguments.
- Response formatting: Stage 3 of the mechanism; the LLM turning structured tool output into natural language.
- Function definition: the signature + docstring shown to the LLM in the preamble. The contract the model uses to decide what to call.
- Argument hallucination: failure mode where the LLM emits a function call with plausible-but-wrong arguments.
- Code execution / code interpreter: sibling capability where the LLM generates new code (not just calls predefined functions).
- ReAct: one common pattern for combining reasoning and tool use. Mentioned in the next lesson on agent loops.
Function calling is how an LLM acts on the world.
Three stages: pick the function, run the function, explain the result.
The model never sees the implementation. Only the contract.