References: How models call functions

Source material

• Stanford CME 295: Transformers & Large Language Models, Autumn 2025
  Instructors: Afshine Amidi & Shervine Amidi, Stanford University
  Course site: https://cme295.stanford.edu/
  Cheatsheet: https://cme295.stanford.edu/cheatsheet/
  Source lecture (Lecture 7, Agentic LLMs):
    see course site at https://cme295.stanford.edu/ for the lecture URL
  License (lecture videos): as published on Stanford's public YouTube channel
  License (Amidi cheatsheets): MIT

This lesson adapts the function-calling section of Stanford CME 295 Lecture 7:
[00:59:32-01:00:31] motivation and the structured-data framing;
[01:00:31-01:06:04] the function definition and the LLM-sees-the-contract framing;
[01:06:04-01:14:00] the three-stage mechanism, the two SFT training pairs, and
the find-teddy-bear worked example. Clawdemy provides original notes,
summaries, and quizzes derived from this material for educational purposes.
All rights to the original lectures remain with Stanford and the instructors.

Foundational papers

The published research behind tool-use and function calling.

“Toolformer: Language Models Can Teach Themselves to Use Tools”, Schick et al., 2023. One of the first widely cited papers on training LLMs to use tools. Section 2 (the self-supervised data construction) is the conceptual core: the model learns to insert tool-call markers into text through a self-supervised process, sidestepping the need for hand-labeled SFT pairs. Read after this lesson for the training-side detail.
“ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al., 2023. Introduces the ReAct prompting pattern: interleave reasoning steps and tool-call actions in a single prompt. Even though ReAct is itself a prompting technique (not a fine-tuning recipe), it became a load-bearing pattern for production tool-use. The lecturer cites it briefly; the next lesson (agent loops) goes deeper.

Practical references

OpenAI’s function-calling guide. A widely used working reference for how function calling looks in production. Covers JSON schema validation, error handling, and the structured output the API exposes. Worth scanning if you’re building anything that uses tools.
Anthropic’s tool use documentation. Same idea, different vendor. Useful for seeing how the function-calling protocol differs from one vendor to another.

Going deeper

A short list, chosen for durability.

“Gorilla: Large Language Model Connected with Massive APIs”, Patil et al., 2023. Demonstrates fine-tuning an LLM specifically for tool selection across a large catalog of available APIs. Useful if you want to understand how the tool-selection problem (the next lesson’s territory) scales.
“A Survey on the Memorization of Tool Calls in LLMs”, Yang et al., 2024. A field overview of how tool-calling capabilities are evaluated, where they fail, and how the research community measures progress. Worth reading after this lesson and the agent-loops lesson for a higher-level view.

Adjacent topics

Code execution / code interpreter. Function calling’s flexible-but-riskier sibling. Search terms: “code interpreter,” “Python sandbox in LLMs,” “OpenAI Code Interpreter,” “Anthropic computer use.” The pattern is the same in shape (LLM produces code, runtime executes it, response feeds back), but the implementation involves a sandboxed runtime that can run arbitrary generated code. Risk profile is meaningfully different.
Function-call argument validation. The structured Stage 1 output is typically validated against a JSON schema before Stage 2 runs. This catches malformed calls (missing required arguments, wrong types, unknown fields), though not arguments that are well-formed but wrong. Worth understanding if you’re building production tool-using AI; the validation step is where most of the practical hardening lives.
Tool-augmented agents. The next lesson (how-agent-loops-work) builds on this lesson. Function calling is a single round-trip; agent loops chain many function calls into longer-horizon work, with the model deciding what to do next based on each tool’s output.
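
A minimal sketch of the validation step described above, placed between the model's Stage 1 output and the Stage 2 function execution. Everything named here is an assumption made for illustration: the find_teddy_bear tool and its parameters, the {"name", "arguments"} call format, and the hand-rolled validator, which checks only required keys and primitive types rather than full JSON Schema. It is not any vendor's API.

```python
import json

# Hypothetical contract for an illustrative tool (not from the lecture or any vendor API).
FIND_TEDDY_BEAR_PARAMS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "max_results": {"type": "integer"},
    },
    "required": ["location"],
}

# Mapping from JSON Schema primitive type names to Python types.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_arguments(arguments, schema):
    """Return a list of problems; an empty list means the arguments pass."""
    if not isinstance(arguments, dict):
        return ["arguments must be a JSON object"]
    errors = []
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        if key not in props:
            errors.append(f"unexpected argument: {key}")
            continue
        type_name = props[key]["type"]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        bool_misuse = isinstance(value, bool) and type_name != "boolean"
        if bool_misuse or not isinstance(value, _JSON_TYPES[type_name]):
            errors.append(f"argument {key} should be of type {type_name}")
    return errors


def find_teddy_bear(location, max_results=1):
    """Stand-in for the real function the runtime would execute in Stage 2."""
    return {"location": location, "max_results": max_results, "found": True}


# Registry mapping tool names to (callable, parameter schema).
TOOLS = {"find_teddy_bear": (find_teddy_bear, FIND_TEDDY_BEAR_PARAMS)}


def run_tool_call(raw_model_output):
    """Parse and validate a Stage 1 call string, then run Stage 2 only if it passes."""
    try:
        call = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        return {"ok": False, "errors": [f"invalid JSON: {exc.msg}"]}
    if not isinstance(call, dict):
        return {"ok": False, "errors": ["call must be a JSON object"]}
    name = call.get("name")
    if name not in TOOLS:
        return {"ok": False, "errors": [f"unknown tool: {name}"]}
    func, schema = TOOLS[name]
    errors = validate_arguments(call.get("arguments"), schema)
    if errors:
        return {"ok": False, "errors": errors}
    return {"ok": True, "result": func(**call["arguments"])}


if __name__ == "__main__":
    good = '{"name": "find_teddy_bear", "arguments": {"location": "living room"}}'
    bad = '{"name": "find_teddy_bear", "arguments": {"location": 42}}'
    print(run_tool_call(good))
    print(run_tool_call(bad))
```

In a single round-trip, the returned result (or the error list) would be passed back to the model so it can write its final response or retry the call. A production system would more likely rely on a full JSON Schema validator than on a hand-written check like this one.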

Stanford CME 295 cheatsheet

Stanford CME 295 cheatsheet by the Amidi twins. MIT-licensed. The function-calling and tool-use section covers the same material in their dense visual style. Worth using as a study reference after this lesson.

Community discussion

None selected for this lesson. Vendor docs (OpenAI, Anthropic) and academic sources are the better entry points right now. Durable community references will be added at a future quarterly review if any emerge.