Fine-tuning LLMs, supervised and instruction tuning
What you’ll learn
Section titled “What you’ll learn”This is the first lesson of the LLM-specific frontier. You will learn the kind of fine-tuning that turns a raw base model into an instruction-following assistant, and how it differs from the task fine-tuning you already know. The source curriculum is the Hugging Face LLM Course’s Supervised Fine-Tuning chapter, freely available and Apache-2.0 licensed at huggingface.co/learn/llm-course/chapter11.
You will distinguish task fine-tuning (a classifier head, one task) from supervised fine-tuning (the generative model learning to follow instructions); learn when SFT is the right tool versus simply prompting an instruction-tuned model; see the chat-formatted data and chat templates SFT needs (apply_chat_template); meet the SFTTrainer from TRL as the familiar training loop specialized for SFT; and understand how LoRA and parameter-efficient fine-tuning make fine-tuning large models affordable. The lesson stays at a mechanical, how-it-works level throughout.
Where this fits
Section titled “Where this fits”This is lesson 10 of 12, the second lesson of Phase 3 (demos and the LLM frontier). It is the direct counterpart to lesson 3: that was task fine-tuning, this is supervised fine-tuning, and the Trainer loop carries over to SFTTrainer. It sets up lesson 11 (the data quality that determines whether any fine-tuning works) and lesson 12 (the reasoning-model frontier).
Before you start
Section titled “Before you start”Prerequisites: lesson 3 (the Trainer fine-tuning loop and the idea of adapting a pretrained model), which this lesson contrasts with and builds on. Lesson 2 (prompting a model with pipeline) is useful, since “try prompting first” is the first step before any SFT. The worked code uses a GPU; you can follow the concepts without running it. Optional installs: pip install transformers trl peft datasets.
About the math
Section titled “About the math”None. This lesson describes how SFT works (data format, the training step, what changes versus a base model) at a mechanical level, with no derivations. Concepts like LoRA are explained by what they do (train small added matrices), not by their linear algebra.
By the end, you’ll be able to
Section titled “By the end, you’ll be able to”The single capability this lesson builds: distinguish task fine-tuning from supervised/instruction fine-tuning for large language models. Concretely, you will be able to:
- Distinguish task fine-tuning from supervised/instruction fine-tuning
- Decide when to use SFT versus prompting an existing instruction-tuned model
- Explain chat-formatted data and chat templates (
apply_chat_template) - Describe the
SFTTrainer(TRL) and how it relates to theTrainer - Explain how LoRA / PEFT makes fine-tuning large models affordable
Time and difficulty
Section titled “Time and difficulty”- Read time: about 12 minutes
- Practice time: about 10 minutes (a when-to-SFT diagnosis exercise plus flashcards; running the code is optional)
- Difficulty: standard (conceptual, with one illustrative training example; no math, kept at a how-it-works level)