Instruction tuning (SFT): cheatsheet

The one idea that matters

pretraining (huge, raw text)  →  SFT (small, curated instructions)
       (knows)                          (follows)

Pretraining vs SFT, side by side

Property	Pretraining	SFT
Data	Trillions of tokens of unfiltered web text	A curated, much smaller corpus of instruction-response pairs
Objective	Next-token prediction	Same: next-token prediction (on response tokens only)
Output	Base model	Instruction-tuned model
Time	Months on a cluster	Hours to days, depending on model size
What it adds	Knowledge, grammar, idiom, the ability to continue text plausibly	Response shape: instruction comes in, response comes out
What it does not add	Instruction-following, helpfulness, response shape	New knowledge (except possibly at very high SFT data volumes)

The SFT mechanism

Take a base model.
Show it curated (instruction, response) pairs, often written or vetted by humans.
Train the same way pretraining did (predict the next token),
  but compute the loss only on the response tokens.
After enough examples, the model generalizes:
  any instruction → response in the same shape.
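
The response-only scoring in the steps above can be sketched in a few lines of plain Python. This is a minimal illustration and not a training loop: the token ids and probabilities are made up, the helper names (build_labels, masked_nll) are hypothetical, and the -100 sentinel is an assumption borrowed from the default ignore_index of PyTorch's cross-entropy loss.

```python
import math

IGNORE_INDEX = -100  # assumed sentinel meaning "do not score this position"


def build_labels(prompt_ids, response_ids):
    """Concatenate instruction and response tokens; mask the instruction out of the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_nll(target_probs, labels):
    """Mean negative log-likelihood over scored positions only.

    target_probs[i] is the probability the model gave to the true token at
    position i, given everything before it.
    """
    scored = [-math.log(p) for p, y in zip(target_probs, labels) if y != IGNORE_INDEX]
    return sum(scored) / len(scored)


# Hypothetical token ids: three instruction tokens, two response tokens.
input_ids, labels = build_labels([11, 12, 13], [21, 22])

# Made-up model probabilities for the true token at each of the five positions.
target_probs = [0.10, 0.20, 0.30, 0.50, 0.25]

sft_loss = masked_nll(target_probs, labels)                # response tokens only
pretrain_style_loss = masked_nll(target_probs, input_ids)  # every token scored

print(labels)
print(round(sft_loss, 4), round(pretrain_style_loss, 4))
```

The model still reads the instruction tokens as context; they just contribute nothing to the loss, so changing how well the model predicts the instruction itself leaves the SFT loss unchanged.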

What SFT changes (and what it does not)

Question	Answer	Why
Does SFT teach new knowledge?	No (at typical scales)	The base model already knows the content from pretraining. SFT teaches it when to deploy that knowledge in response form.
Does SFT change the architecture?	No	Same architecture, same parameter count; the weight values are slightly nudged.
Does SFT change the objective?	No	Same next-token prediction loss.
Does SFT make the model helpful in tone?	No	SFT produces a response shape rather than a continuation; which response is best in tone is a preference-tuning concern (lesson 2).
Can a few thousand examples change behavior dramatically?	Yes	The underlying capability is already in the weights. SFT just supplies the trigger that activates it.

LoRA in one line

Approach	What it changes	Why it matters
Full SFT	Updates all model weights	Slower, more expensive; needs more memory
LoRA	Freezes the base weights; trains a small set of low-rank matrices on the SFT data	Cheaper in memory and compute; conceptually the same SFT mechanism; lets you keep many specialized fine-tunes around one base

You will see “LoRA” and “PEFT” (parameter-efficient fine-tuning) in open-source training repos and model release notes. PEFT is the umbrella term; LoRA is the most common method under it.
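
To make the “small set of low-rank matrices” concrete, here is a rough sketch of a single LoRA-adapted layer in plain Python. The function names and the tiny matrices are illustrative assumptions, not any library's API. It follows the usual LoRA formulation, y = W x + scale * B (A x), where W is frozen, A and B are trained, and scale is a hyperparameter (alpha / r in the original LoRA formulation).

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for one weight matrix: full fine-tuning vs LoRA."""
    full = d_out * d_in            # every entry of W is updated
    lora = rank * (d_in + d_out)   # A is rank x d_in, B is d_out x rank
    return full, lora


def lora_forward(W, A, B, x, scale):
    """Frozen base path plus a scaled low-rank correction."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


# One 4096 x 4096 projection with rank 8 (sizes chosen for illustration).
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, lora / full)

# Tiny made-up example: 2 x 2 frozen W, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B_init = [[0.0], [0.0]]        # B starts at zero, so the adapter is initially a no-op
B_trained = [[0.5], [-1.0]]    # hypothetical values after training
x = [2.0, 3.0]

print(lora_forward(W, A, B_init, x, scale=1.0))
print(lora_forward(W, A, B_trained, x, scale=1.0))
```

Because W never changes, several (A, B) pairs can be stored side by side and swapped over the same base, which is what makes keeping many fine-tunes around one base practical.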

The structural limitation (the bridge to lesson 2)

The lecturer’s framing: SFT is all about teaching the model what it should predict, but it does not teach the model what it should not predict.

Question	SFT can teach this	SFT cannot teach this
What response shape to produce	Yes
Which of two valid responses is better		No
Which kinds of responses to refuse		No
Calibrated uncertainty (“I do not know”)		No

Everything in the right column is the next lesson’s territory.
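
A toy illustration of the missing negative signal, under loud assumptions: the two “models” below are just hand-written probability tables over whole candidate responses to one instruction, and sft_loss is a hypothetical helper. The point is only that the SFT loss looks at the probability of the reference response and nothing else.

```python
import math


def sft_loss(response_probs, reference):
    """Negative log-likelihood of the reference response only."""
    return -math.log(response_probs[reference])


# Two made-up models. Both give the reference answer the same probability,
# but they spread the remaining probability very differently.
model_a = {"good answer": 0.5, "okay answer": 0.45, "harmful answer": 0.05}
model_b = {"good answer": 0.5, "okay answer": 0.05, "harmful answer": 0.45}

loss_a = sft_loss(model_a, "good answer")
loss_b = sft_loss(model_b, "good answer")

print(loss_a, loss_b)
```

Both models receive the same loss, even though model_b is far more likely to produce the harmful answer. Raising the reference's probability does push the alternatives down collectively, but the loss carries no information about which alternative is worse.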

Pitfalls to dodge

Pitfall	Reality
SFT teaches new knowledge	At typical scales, no. SFT teaches response shape. Knowledge was in the base model from pretraining. (Hedge: at very high SFT volumes the line with continued pretraining starts to blur.)
SFT alone is sufficient for a polished assistant	No. SFT is a real capability jump, but it is not the last step. The structural “no negative signal” limit is what the next lessons fix.
Fine-tuning equals SFT	Fine-tuning is the umbrella term; SFT is one kind. Many other fine-tuning regimes exist for narrower domain tasks. SFT is specifically scoped to instruction-following.

Translating model release language

Vendor or repo claim	What it usually means
“Fine-tuned on N instruction-response pairs”	SFT, classic recipe. Stages 1 and 2 (pretraining and SFT) done. Says nothing about preference tuning.
“LoRA fine-tune of model X”	SFT-via-LoRA on a specific base model. Cheap to produce, often shipped as small adapter weights you load on top of the base.
“Instruction-tuned base”	SFT done, no preference tuning yet. Useful, often visibly less polished than a fully tuned chat assistant.

Glossary

Base model: the output of pretraining. Predicts next tokens; does not follow instructions.
SFT (supervised fine-tuning): trains the base model on curated instruction-response pairs using next-token prediction.
Instruction-tuned model: the post-SFT model. Follows instructions; has no learned preference among valid responses.
LoRA (Low-Rank Adaptation): a parameter-efficient fine-tuning technique, commonly used for SFT, that freezes the base weights and trains a small set of low-rank matrices.
PEFT (Parameter-Efficient Fine-Tuning): umbrella term for techniques like LoRA that update only a small fraction of model parameters.
Continued pretraining: training a base model on more raw text from a new domain. It differs from SFT in the data and in what is scored: SFT trains on instruction-response pairs and scores only the response tokens; continued pretraining trains on free text and scores every token.
Negative signal: information that some outputs are worse than others. SFT does not provide it; preference-based methods (next lesson) do.
Post-training: any training stage after pretraining. SFT is typically the first.

Pretraining fills the weights with everything the model knows.
Supervised fine-tuning teaches it to answer when someone asks.