Skip to content

Fine-tuning LLMs, supervised and instruction tuning

This is the first lesson of the LLM-specific frontier. You will learn the kind of fine-tuning that turns a raw base model into an instruction-following assistant, and how it differs from the task fine-tuning you already know. The source curriculum is the Hugging Face LLM Course’s Supervised Fine-Tuning chapter, freely available and Apache-2.0 licensed at huggingface.co/learn/llm-course/chapter11.

You will distinguish task fine-tuning (a classifier head, one task) from supervised fine-tuning (the generative model learning to follow instructions); learn when SFT is the right tool versus simply prompting an instruction-tuned model; see the chat-formatted data and chat templates SFT needs (apply_chat_template); meet the SFTTrainer from TRL as the familiar training loop specialized for SFT; and understand how LoRA and parameter-efficient fine-tuning make fine-tuning large models affordable. The lesson stays at a mechanical, how-it-works level throughout.

This is lesson 10 of 12, the second lesson of Phase 3 (demos and the LLM frontier). It is the direct counterpart to lesson 3: that was task fine-tuning, this is supervised fine-tuning, and the Trainer loop carries over to SFTTrainer. It sets up lesson 11 (the data quality that determines whether any fine-tuning works) and lesson 12 (the reasoning-model frontier).

Prerequisites: lesson 3 (the Trainer fine-tuning loop and the idea of adapting a pretrained model), which this lesson contrasts with and builds on. Lesson 2 (prompting a model with pipeline) is useful, since “try prompting first” is the first step before any SFT. The worked code uses a GPU; you can follow the concepts without running it. Optional installs: pip install transformers trl peft datasets.

None. This lesson describes how SFT works (data format, the training step, what changes versus a base model) at a mechanical level, with no derivations. Concepts like LoRA are explained by what they do (train small added matrices), not by their linear algebra.

The single capability this lesson builds: distinguish task fine-tuning from supervised/instruction fine-tuning for large language models. Concretely, you will be able to:

  • Distinguish task fine-tuning from supervised/instruction fine-tuning
  • Decide when to use SFT versus prompting an existing instruction-tuned model
  • Explain chat-formatted data and chat templates (apply_chat_template)
  • Describe the SFTTrainer (TRL) and how it relates to the Trainer
  • Explain how LoRA / PEFT makes fine-tuning large models affordable
  • Read time: about 12 minutes
  • Practice time: about 10 minutes (a when-to-SFT diagnosis exercise plus flashcards; running the code is optional)
  • Difficulty: standard (conceptual, with one illustrative training example; no math, kept at a how-it-works level)