Skip to content

Cheatsheet: How models call functions

Function calling is how an LLM acts on the world.
The model picks a predefined function, code runs it,
the model wraps the structured response in natural language.
RAGFunction calling
Closes the gap for…Unstructured data (documents, text)Structured data (APIs, databases, transactional actions)
MechanismFetch documents, inject relevant chunks into promptLLM emits structured call; code executes; structured response goes back to LLM
Common use caseQ&A over a knowledge baseReal-time data, transactional actions, structured queries
What’s in the promptRetrieved text passagesFunction signatures + docstrings
USER QUERY → STAGE 1 (LLM): pick function + arguments
Structured call (e.g., JSON)
STAGE 2 (CODE): execute the function (no LLM)
Structured response
STAGE 3 (LLM): wrap response in natural language
FINAL RESPONSE → USER

Stages 1 and 3 are LLM-driven. Stage 2 is regular code.

The user sees only Stage 3. Stages 1 and 2 happen inside the system.

SeesDoes not see
User queryFunction implementation
Function signature (name + arguments)API internals
Function docstring (what it does, what it returns)Sandbox/runtime details
Conversation history (prior calls + structured responses)Anything in Stage 2 except the structured return

Why this matters: docstrings have to be specific. The model has nothing else to infer behavior from.

User: "Find a teddy bear store near me."
STAGE 1 (LLM emits):
{
"function": "find_teddy_bear_store",
"arguments": {"location": "37.4275,-122.1697", "radius": "1mi"}
}
STAGE 2 (code runs):
→ Hits maps API with structured arguments
→ Returns: [{"name": "Bear Necessities", "address": "...", "distance": "0.3mi"}]
STAGE 3 (LLM responds):
"There are three teddy bear stores within a mile of you.
The closest is Bear Necessities at 525 University Ave..."
PairInputExpected output
Tool predictionUser query + function descriptionStructured function call (name + arguments)
Response formattingStructured function response + conversation historyNatural-language answer

The two pairs combine with the model’s regular SFT data. After training, the model has both new capabilities: emit a call when one is appropriate, format a response afterwards.

Newer pattern: sufficiently strong reasoning models can skip explicit SFT for tool prediction; they figure out the call from the description alone.

Task classExampleWhy function calling?
Real-time data”What’s the weather in Boston now?”Model’s training data is stale; function fetches live data
Transactional actions”Book a 3pm meeting tomorrow”The side effect is the point; model is the natural-language interface
Structured queries”Get customer record for ID 12345”Structured data, structured access pattern
FailureWhat goes wrongWhere to debug
Argument hallucinationPlausible-but-wrong arguments (wrong format, fabricated values)Inspect Stage 1 structured call before execution
Wrong-tool selectionMultiple tools available; model picks wrong oneTool descriptions; SFT data quality; tool-selection patterns (next lesson)
Added latencyStage 2 = real API call; user waitsCache results; parallelize calls; skip detour when not needed
Function callingCode execution
What’s executedPredefined function; implementation exists alreadyLLM-generated new code
RiskConstrained (only declared functions)Higher (sandbox required; model could do anything)
PredictabilityHigh (function contract is fixed)Lower (depends on LLM’s code generation)
Common useMost tool-augmented AI in productionCode interpreter features, data analysis sandboxes

They often coexist in modern apps. Different tools for different jobs.

PitfallReality
”The model wrote the function.”No. Implementation existed before. Model picks when to call and what arguments to fill.
”Function calling = code execution.”Different things. Function calling = predefined functions. Code execution = LLM generates new code.
”Function calling eliminates hallucination.”Only for the structured tool output. Natural-language framing in Stage 3 can still contain wrong claims.
”If the AI does something, it must be tool calling.”Not always. Could be RAG (text retrieval), pure prompt engineering, or just the model’s training data. Knowing which one was used helps you reason about reliability.
  • Tool calling: general term for any LLM-emitted call to an external resource. Function calling is the structured subset.
  • Function calling: specific protocol where the LLM emits a structured call to a predefined function with documented signature.
  • Tool prediction: Stage 1 of the mechanism; the LLM picking the function and arguments.
  • Response formatting: Stage 3 of the mechanism; the LLM turning structured tool output into natural language.
  • Function definition: the signature + docstring shown to the LLM in the preamble. The contract the model uses to decide what to call.
  • Argument hallucination: failure mode where the LLM emits a function call with plausible-but-wrong arguments.
  • Code execution / code interpreter: sibling capability where the LLM generates new code (not just calls predefined functions).
  • ReAct: one common pattern for combining reasoning and tool use. Mentioned in the next lesson on agent loops.

Function calling is how an LLM acts on the world.
Three stages: pick the function, run the function, explain the result.
The model never sees the implementation. Only the contract.