Summary: How models call functions

Function calling is how an LLM acts on the world. The bare LLM has frozen weights and a prompt; it cannot directly fetch real-time data, query a database, or trigger an API. Function calling closes that gap by giving the model access to predefined tools (functions with documented signatures), letting it emit structured calls when relevant, and routing the structured responses back through the model for natural-language formatting.

It is the structured-data sibling of RAG. Where RAG fetches text from documents, function calling fetches structured data from APIs (or triggers structured side effects). Same problem; different kind of data.

Three-stage mechanism. Stage 1: the LLM reads the user query plus a function description (signature + docstring) and emits a structured call (function name + arguments). Stage 2: regular code parses the call and executes the function; this stage has no LLM involvement. Stage 3: the structured function output goes back to the LLM, which produces a natural-language answer wrapping the data.
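The three stages can be sketched as one short loop. This is a minimal illustration: `get_weather`, its stubbed data, and the two `fake_llm_*` helpers are hypothetical stand-ins. No real model or provider API is called, and the stage 1 stand-in returns a hard-coded call instead of reading the query. Only the middle stage is ordinary code, which matches the division of labour described above.

```python
import json

# Hypothetical tool. Its implementation exists before the model ever sees it.
def get_weather(city: str) -> dict:
    """Return current weather for a city (stubbed data for this sketch)."""
    return {"city": city, "temp_c": 18, "conditions": "cloudy"}

TOOLS = {"get_weather": get_weather}

def fake_llm_predict_call(query: str) -> str:
    """Stage 1 stand-in: a real model would emit this JSON from query + tool description."""
    # Hard-coded for illustration; the query is not actually parsed here.
    return json.dumps({"name": "get_weather", "arguments": {"city": "Paris"}})

def execute_call(call_json: str) -> dict:
    """Stage 2: regular code parses the structured call and runs the function. No LLM."""
    call = json.loads(call_json)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

def fake_llm_format(query: str, result: dict) -> str:
    """Stage 3 stand-in: a real model would phrase the structured result in natural language."""
    return (f"It is {result['temp_c']} degrees Celsius and "
            f"{result['conditions']} in {result['city']}.")

def answer(query: str) -> str:
    call_json = fake_llm_predict_call(query)  # Stage 1: tool prediction (LLM)
    result = execute_call(call_json)          # Stage 2: function execution (regular code)
    return fake_llm_format(query, result)     # Stage 3: response formatting (LLM)

print(answer("What's the weather in Paris right now?"))
```

In a real system, stages 1 and 3 are two separate model calls, and the user only ever sees the string returned by stage 3.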

The LLM sees the contract, not the implementation. It knows the function exists, what arguments it takes, and what it returns. It does not know how the function is implemented. This is why tool descriptions need to be specific and complete: the model has nothing else to go on.
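To make the contract/implementation split concrete, here is a sketch that extracts only what a model would be shown (name, signature, docstring) from a Python function. The `get_order_status` tool and `describe_tool` helper are hypothetical. Many hosted APIs expect the contract as a JSON Schema object rather than a signature string, so treat the plain-text form here as a simplification.

```python
import inspect

def get_order_status(order_id: str, include_items: bool = False) -> dict:
    """Look up the shipping status of an order by its ID.

    Set include_items=True to also return the line items.
    """
    # Implementation detail the model never sees: imagine a database query here.
    return {"order_id": order_id, "status": "shipped"}

def describe_tool(func) -> str:
    """Build the text a model would see: name, signature, docstring. Never the body."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    return f"{func.__name__}{signature}\n{doc}"

print(describe_tool(get_order_status))
```

Everything the model knows about when to call this tool and what to pass comes from those few lines; a vague docstring leaves it guessing.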

This summary is the scan-it-in-five-minutes version. The full lesson covers the lecturer’s teddy-bear example, the two SFT training pairs that typically produce a function-calling model, and the common failure modes.

Core ideas

Function calling fills the structured-data gap that RAG does not cover. RAG = documents (text). Function calling = structured input/output (APIs, databases, transactional services).
Three stages: tool prediction → function execution → response formatting. The LLM does the first and third; regular code does the second.
The LLM sees the function signature plus docstring in the preamble; it does not see the implementation. From that description alone, the model has to infer when to call the function and what arguments to use.
Two SFT training pairs. Tool prediction (query + function description to structured call) and response formatting (structured response + history to natural language). Sufficiently strong reasoning models can sometimes skip the explicit SFT for tool prediction.
What the user sees is the final natural-language response. Stages 1, 2, and 3 happen behind the chat UI.
Where function calling shines. Real-time data, transactional actions, structured queries.
Common failure modes. Argument hallucination (plausible-but-wrong function arguments), wrong-tool selection (when multiple tools are available), and added latency, since each call adds a tool round trip plus a second model pass.
Pitfall: function calling vs code execution. Different things. Function calling = predefined functions with structured inputs. Code execution = LLM generates new code, sandboxed runtime executes it. More flexible, more risky.
Pitfall: thinking the model wrote the function. It did not. The implementation existed before; the model’s job is to know when to call it and how to fill its arguments.
Pitfall: assuming function calling eliminates all hallucination. Only the structured tool output is grounded in real data. The model’s natural-language framing of the response can still contain wrong claims.
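The two SFT training pairs listed above can both be cut from one logged interaction. The sketch below shows that split; the field names and prompt layout are a hypothetical format chosen for readability, not any specific vendor’s training schema.

```python
import json

def build_sft_pairs(query, tool_description, call, tool_output, final_answer):
    """Turn one logged interaction into the two (input, target) training pairs.

    Hypothetical layout: real training formats differ by provider.
    """
    # Pair 1, tool prediction: query + function description -> structured call.
    tool_prediction = {
        "input": f"Tools:\n{tool_description}\n\nUser: {query}",
        "target": json.dumps(call),
    }
    # Pair 2, response formatting: structured response + history -> natural language.
    response_formatting = {
        "input": (f"User: {query}\nCall: {json.dumps(call)}\n"
                  f"Result: {json.dumps(tool_output)}"),
        "target": final_answer,
    }
    return tool_prediction, response_formatting

pred, fmt = build_sft_pairs(
    query="Is order A123 shipped?",
    tool_description="get_order_status(order_id: str) -> dict",
    call={"name": "get_order_status", "arguments": {"order_id": "A123"}},
    tool_output={"order_id": "A123", "status": "shipped"},
    final_answer="Yes, order A123 has shipped.",
)
print(pred["target"])
print(fmt["target"])
```

The first pair teaches the model to emit a structured call; the second teaches it to wrap a structured result in prose. Stage 2 never appears as a target because no model is involved there.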

What changes for you

After this lesson, “the AI just did something” moments in modern apps stop being mysterious. ChatGPT booking a reservation, Claude searching the web, an assistant pulling your calendar: all function calling under the hood. You can read what an AI app can and cannot do by looking at what tools it has been given access to, and you have a debugging frame: “did the model pick the right function?” and “did the model pass the right arguments?” are the questions to ask before “did the AI hallucinate?”
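The debugging frame can be turned into a small check over a logged call. This is a sketch under stated assumptions: the two tools are hypothetical, and `expected_tool` is assumed to come from your own labelled test case. It answers “right function?” first, then checks whether the arguments fit the tool’s signature and simple type annotations.

```python
import inspect

def get_order_status(order_id: str, include_items: bool = False) -> dict:
    """Hypothetical tool used only for this sketch."""
    return {"order_id": order_id, "status": "shipped"}

def search_docs(query: str) -> list:
    """Hypothetical second tool, so wrong-tool selection is possible."""
    return []

TOOLS = {"get_order_status": get_order_status, "search_docs": search_docs}

def diagnose(call: dict, expected_tool: str) -> str:
    """Ask the two debugging questions before blaming 'hallucination'."""
    name, args = call["name"], call["arguments"]
    # Question 1: did the model pick the right function?
    if name not in TOOLS:
        return f"unknown tool: {name}"
    if name != expected_tool:
        return f"wrong tool: picked {name}, expected {expected_tool}"
    # Question 2: did the model pass the right arguments?
    sig = inspect.signature(TOOLS[name])
    try:
        bound = sig.bind(**args)  # raises TypeError on missing or unexpected arguments
    except TypeError as exc:
        return f"bad arguments: {exc}"
    for param_name, value in bound.arguments.items():
        annotation = sig.parameters[param_name].annotation
        # Only plain class annotations (str, bool, ...) are checked here.
        if isinstance(annotation, type) and not isinstance(value, annotation):
            return f"bad arguments: {param_name} should be {annotation.__name__}"
    return "call looks well-formed"

print(diagnose({"name": "search_docs", "arguments": {"query": "A123"}}, "get_order_status"))
print(diagnose({"name": "get_order_status", "arguments": {"order_id": 123}}, "get_order_status"))
```

A call that passes both checks can still carry a plausible-but-wrong value (the wrong order ID, say); catching that kind of argument hallucination needs a comparison against ground truth, not just the signature.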

Function calling is how an LLM acts on the world.
Three stages: pick the function, run the function, explain the result.
The model never sees the implementation. Only the contract.