Skip to content

Cheatsheet: Fine-tuning LLMs

Task fine-tuning (lesson 3)Supervised fine-tuning (SFT)
HeadA task head (classifier)The language-modeling head
Trains forOne narrow taskFollowing instructions broadly
OutputA labelText (assistant-style)
Makes the modelGood at a taskGood at being an assistant
  1. Prompt an existing instruction-tuned model. If it works, stop.
  2. SFT only when prompting is not enough:
    • Template control (strict output format)
    • Domain adaptation (specialized terms/style)
    • Cost (a smaller fine-tuned model is cheaper to run)
  • Data is role-tagged messages: system, user, assistant.
  • The model’s chat template lays them out the way it expects.
  • Apply with tokenizer.apply_chat_template(messages).
  • Wrong template -> broken behavior (markers no longer match).
from trl import SFTConfig, SFTTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
args = SFTConfig(output_dir="./sft_output", max_steps=1000,
per_device_train_batch_size=4, learning_rate=5e-5,
eval_strategy="steps", eval_steps=50)
trainer = SFTTrainer(model=model, args=args,
train_dataset=dataset["train"],
eval_dataset=dataset["test"],
processing_class=tokenizer)
trainer.train()

Same loop as lesson 3, specialized for generative SFT. Auto-applies the chat template when the dataset has a messages field. packing=True packs short examples for efficiency.

  • Full fine-tuning of a large model needs huge memory.
  • LoRA freezes the base weights and trains small added low-rank matrices.
  • Big memory savings; pretrained knowledge preserved; fits on a modest GPU.
  • One of the PEFT (parameter-efficient fine-tuning) methods; the standard choice for large models. Configure with a LoraConfig.
Pretrain -> SFT -> (optional) preference tuning
(learn (learn (RLHF / DPO: refine which
language) to follow responses are preferred)
instructions)

This lesson stops at SFT; preference tuning is named only to place SFT in the pipeline.

  • SFT: training a generative model on instruction/response data to follow instructions.
  • Chat template: the role-marker text layout a chat model expects.
  • LoRA / PEFT: fine-tune by training a few added parameters, not all weights.
  • TRL: the library providing SFTTrainer, built on transformers.
  • Hugging Face LLM Course, Chapter 11: “Supervised Fine-Tuning.” huggingface.co/learn/llm-course/chapter11. Released under Apache 2.0; this lesson mirrors its structure with original prose.