Fine-tuning LLMs, in brief

What you’ll learn

This is the first lesson of the LLM-specific frontier. You will learn the kind of fine-tuning that turns a raw base model into an instruction-following assistant, and how it differs from the task fine-tuning you already know. The source curriculum is the Hugging Face LLM Course’s Supervised Fine-Tuning chapter, freely available and Apache-2.0 licensed at huggingface.co/learn/llm-course/chapter11.

You will distinguish task fine-tuning (a classifier head, one task) from supervised fine-tuning (the generative model learning to follow instructions); learn when SFT is the right tool versus simply prompting an instruction-tuned model; see the chat-formatted data and chat templates SFT needs (apply_chat_template); meet the SFTTrainer from TRL as the familiar training loop specialized for SFT; and understand how LoRA and parameter-efficient fine-tuning make fine-tuning large models affordable. The lesson stays at a mechanical, how-it-works level throughout.

Where this fits

This is lesson 10 of 12, the second lesson of Phase 3 (demos and the LLM frontier). It is the direct counterpart to lesson 3: that was task fine-tuning, this is supervised fine-tuning, and the Trainer loop carries over to SFTTrainer. It sets up lesson 11 (the data quality that determines whether any fine-tuning works) and lesson 12 (the reasoning-model frontier).

Before you start

Prerequisites: lesson 3 (the Trainer fine-tuning loop and the idea of adapting a pretrained model), which this lesson contrasts with and builds on. Lesson 2 (prompting a model with pipeline) is useful, since “try prompting first” is the first step before any SFT. The worked code uses a GPU; you can follow the concepts without running it. Optional installs: pip install transformers trl peft datasets.

About the math

None. This lesson describes how SFT works (data format, the training step, what changes versus a base model) at a mechanical level, with no derivations. Concepts like LoRA are explained by what they do (train small added matrices), not by their linear algebra.

By the end, you’ll be able to

The single capability this lesson builds: distinguish task fine-tuning from supervised/instruction fine-tuning for large language models. Concretely, you will be able to:

Distinguish task fine-tuning from supervised/instruction fine-tuning
Decide when to use SFT versus prompting an existing instruction-tuned model
Explain chat-formatted data and chat templates (apply_chat_template)
Describe the SFTTrainer (TRL) and how it relates to the Trainer
Explain how LoRA / PEFT makes fine-tuning large models affordable

Time and difficulty

Read time: about 12 minutes
Practice time: about 10 minutes (a when-to-SFT diagnosis exercise plus flashcards; running the code is optional)
Difficulty: standard (conceptual, with one illustrative training example; no math, kept at a how-it-works level)