Cheatsheet: Fine-tuning LLMs
Two kinds of fine-tuning
Section titled “Two kinds of fine-tuning”| Task fine-tuning (lesson 3) | Supervised fine-tuning (SFT) | |
|---|---|---|
| Head | A task head (classifier) | The language-modeling head |
| Trains for | One narrow task | Following instructions broadly |
| Output | A label | Text (assistant-style) |
| Makes the model | Good at a task | Good at being an assistant |
Decision order
Section titled “Decision order”- Prompt an existing instruction-tuned model. If it works, stop.
- SFT only when prompting is not enough:
- Template control (strict output format)
- Domain adaptation (specialized terms/style)
- Cost (a smaller fine-tuned model is cheaper to run)
SFT data: conversations + chat template
Section titled “SFT data: conversations + chat template”- Data is role-tagged messages:
system,user,assistant. - The model’s chat template lays them out the way it expects.
- Apply with
tokenizer.apply_chat_template(messages). - Wrong template -> broken behavior (markers no longer match).
SFTTrainer (TRL)
Section titled “SFTTrainer (TRL)”from trl import SFTConfig, SFTTrainerfrom transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)tokenizer = AutoTokenizer.from_pretrained(model_name)
args = SFTConfig(output_dir="./sft_output", max_steps=1000, per_device_train_batch_size=4, learning_rate=5e-5, eval_strategy="steps", eval_steps=50)
trainer = SFTTrainer(model=model, args=args, train_dataset=dataset["train"], eval_dataset=dataset["test"], processing_class=tokenizer)trainer.train()Same loop as lesson 3, specialized for generative SFT. Auto-applies the chat template when the dataset has a messages field. packing=True packs short examples for efficiency.
LoRA / PEFT (make it affordable)
Section titled “LoRA / PEFT (make it affordable)”- Full fine-tuning of a large model needs huge memory.
- LoRA freezes the base weights and trains small added low-rank matrices.
- Big memory savings; pretrained knowledge preserved; fits on a modest GPU.
- One of the PEFT (parameter-efficient fine-tuning) methods; the standard choice for large models. Configure with a
LoraConfig.
The assistant-building pipeline
Section titled “The assistant-building pipeline”Pretrain -> SFT -> (optional) preference tuning(learn (learn (RLHF / DPO: refine which language) to follow responses are preferred) instructions)This lesson stops at SFT; preference tuning is named only to place SFT in the pipeline.
Words to use precisely
Section titled “Words to use precisely”- SFT: training a generative model on instruction/response data to follow instructions.
- Chat template: the role-marker text layout a chat model expects.
- LoRA / PEFT: fine-tune by training a few added parameters, not all weights.
- TRL: the library providing
SFTTrainer, built ontransformers.
Recommended further study
Section titled “Recommended further study”- Hugging Face LLM Course, Chapter 11: “Supervised Fine-Tuning.”
huggingface.co/learn/llm-course/chapter11. Released under Apache 2.0; this lesson mirrors its structure with original prose.