
References: How models call functions

Source material:
• Stanford CME 295: Transformers & Large Language Models, Autumn 2025
Instructors: Afshine Amidi & Shervine Amidi, Stanford University
Course site: https://cme295.stanford.edu/
Cheatsheet: https://cme295.stanford.edu/cheatsheet/
Source lecture (Lecture 7, Agentic LLMs):
see course site at https://cme295.stanford.edu/ for the lecture URL
License (lecture videos): as published on Stanford's public YouTube channel
License (Amidi cheatsheets): MIT
This lesson adapts the function-calling section of Stanford CME 295 Lecture 7,
covering motivation and the structured-data framing [00:59:32-01:00:31];
the function definition and the LLM-sees-the-contract framing [01:00:31-01:06:04];
and the three-stage mechanism, the two SFT training pairs, and
the find-teddy-bear worked example [01:06:04-01:14:00]. Clawdemy provides original notes,
summaries, and quizzes derived from this material for educational purposes.
All rights to the original lectures remain with Stanford and the instructors.

The published research and vendor documentation behind tool use and function calling.

  • “Toolformer: Language Models Can Teach Themselves to Use Tools”, Schick et al., 2023. An early and widely cited paper on training LLMs to decide for themselves when and how to call tools. Section 2 (the self-supervised data construction) is the conceptual core: the model annotates ordinary text with candidate tool calls, keeps only the calls whose results make the following tokens easier to predict, and is then fine-tuned on that self-annotated data, sidestepping the need for hand-labeled SFT pairs. Read it after this lesson for the training-side detail.

  • “ReAct: Synergizing Reasoning and Acting in Language Models”, Yao et al., 2023. Introduces the ReAct prompting pattern, which interleaves reasoning steps and tool-call actions in a single prompt. Although ReAct is a prompting technique rather than a fine-tuning recipe, it became a load-bearing pattern for production tool use. The lecturer cites it only briefly; the next lesson (agent loops) goes deeper.

  • OpenAI’s function-calling guide. A widely used working reference for how function calling looks in production. Covers JSON Schema validation, error handling, and the structured output the API exposes. Worth scanning if you’re building anything that uses tools.

  • Anthropic’s tool use documentation. Same idea, different vendor. Useful for seeing where the two vendors’ function-calling protocols differ.
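To make the vendor comparison concrete, here is a minimal Python sketch that writes one function contract as plain JSON Schema and wraps it in the `tools` entry shape each vendor expects. OpenAI’s Chat Completions API nests the schema under `function.parameters`, while Anthropic’s Messages API takes it directly as `input_schema`. The function `find_item`, its parameters, and the helpers `to_openai_tool` / `to_anthropic_tool` are hypothetical illustrations, not the lecture’s schema, and other API versions may lay these fields out differently, so treat this as a sketch and check the current vendor docs.

```python
import json

# Hypothetical function contract, written once as plain JSON Schema.
# "find_item" and its parameters are illustrative, not taken from the lecture.
FIND_ITEM = {
    "name": "find_item",
    "description": "Look up where a household item was last seen.",
    "parameters": {
        "type": "object",
        "properties": {
            "item": {"type": "string", "description": "Name of the item, e.g. 'teddy bear'."},
            "room": {"type": "string", "enum": ["bedroom", "living_room", "kitchen"]},
        },
        "required": ["item"],
    },
}


def to_openai_tool(spec: dict) -> dict:
    """Wrap the contract in the OpenAI Chat Completions `tools` entry shape."""
    return {
        "type": "function",
        "function": {
            "name": spec["name"],
            "description": spec["description"],
            "parameters": spec["parameters"],  # schema nested under function.parameters
        },
    }


def to_anthropic_tool(spec: dict) -> dict:
    """Wrap the contract in the Anthropic Messages API `tools` entry shape."""
    return {
        "name": spec["name"],
        "description": spec["description"],
        "input_schema": spec["parameters"],  # same schema, different key, no wrapper
    }


if __name__ == "__main__":
    print(json.dumps(to_openai_tool(FIND_ITEM), indent=2))
    print(json.dumps(to_anthropic_tool(FIND_ITEM), indent=2))
```

The contract the model sees is the same JSON Schema in both cases; only the envelope differs, which is why keeping one canonical definition and converting it per vendor is a common design choice.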

A short list, chosen for durability.

  • Code execution / code interpreter. Function calling’s flexible-but-riskier sibling. Search terms: “code interpreter,” “Python sandbox in LLMs,” “OpenAI Code Interpreter,” “Anthropic computer use.” The pattern has the same shape (the LLM produces code, a runtime executes it, the result feeds back), but the implementation requires a sandboxed runtime that can run arbitrary generated code. The risk profile is meaningfully different: a fixed function can do only what its implementation allows, while generated code can do anything the sandbox fails to prevent.

  • Function-call argument validation. The structured Stage 1 output is typically validated against a JSON Schema before Stage 2 runs. This catches malformed arguments (missing required fields, wrong types, values outside an allowed set), a common form of argument hallucination, but not arguments that are well-formed yet wrong. Worth understanding if you’re building production tool-using AI; the validation step is where much of the practical hardening lives.

  • Tool-augmented agents. The next lesson (how-agent-loops-work) builds on this lesson. Function calling is a single round-trip; agent loops chain many function calls into longer-horizon work, with the model deciding what to do next based on each tool’s output.

  • Stanford CME 295 cheatsheet by the Amidi twins. MIT-licensed. The function-calling and tool-use section covers the same material in their dense visual style. Worth using as a study reference after this lesson.
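The argument-validation step from the list above can be sketched in a few lines. This is a minimal, stdlib-only Python sketch that checks a model’s raw argument string against a small subset of JSON Schema (`type`, `required`, `enum`, `additionalProperties`) before any function runs. The schema and the `validate_args` helper are hypothetical; a production system would more likely use a full validator such as the `jsonschema` package.

```python
import json

# Hypothetical contract for illustration; not the lecture's schema.
SCHEMA = {
    "type": "object",
    "properties": {
        "item": {"type": "string"},
        "room": {"type": "string", "enum": ["bedroom", "living_room", "kitchen"]},
        "max_results": {"type": "integer"},
    },
    "required": ["item"],
    "additionalProperties": False,
}

# Map JSON Schema type names to Python types (a subset only).
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_args(raw: str, schema: dict) -> list[str]:
    """Return a list of problems with the model's raw argument string; empty means OK."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    errors = []
    # Every required argument must be present.
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument '{name}'")

    props = schema.get("properties", {})
    for name, value in args.items():
        if name not in props:
            # Reject arguments the contract never declared, if the schema forbids extras.
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected argument '{name}'")
            continue
        rule = props[name]
        expected = _TYPES.get(rule.get("type"))
        # bool is a subclass of int in Python, so reject it unless "boolean" was declared.
        wrong_type = expected is not None and (
            not isinstance(value, expected)
            or (isinstance(value, bool) and rule["type"] != "boolean")
        )
        if wrong_type:
            errors.append(f"'{name}' should be of type {rule['type']}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"'{name}' must be one of {rule['enum']}")
    return errors
```

For example, `validate_args('{"item": "teddy bear", "room": "attic"}', SCHEMA)` reports the out-of-set `room`, and a call with no `item` reports the missing field. A well-formed but wrong call, such as `{"item": "teddy bear", "room": "kitchen"}` when the bear is actually in the bedroom, passes: schema validation guards the shape of the arguments, not their truth.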

Community references: none selected for this lesson. Vendor docs (OpenAI, Anthropic) and academic sources are the better entry points for now. Durable community references will be added at a future quarterly review if any consolidate.