Cheatsheet: How instruction tuning makes a model helpful

pretraining (huge, raw text)  →  SFT (small, curated instructions)
          (knows)                            (follows)
| Property | Pretraining | SFT |
| --- | --- | --- |
| Data | Trillions of tokens of unfiltered web text | A curated, much smaller corpus of instruction-response pairs |
| Objective | Next-token prediction | Same: next-token prediction (on response tokens only) |
| Output | Base model | Instruction-tuned model |
| Time | Months on a cluster | Hours to days, depending on model size |
| What it adds | Knowledge, grammar, idiom, the ability to continue text plausibly | Response shape: instruction comes in, response comes out |
| What it does not add | Instruction-following, helpfulness, response shape | New knowledge (with a hedge at very high volumes) |

Take a base model.
Show it (instruction, response) pairs hand-written by humans.
Train the same way pretraining did (predict the next token),
but compute the loss on the response tokens only.
After enough examples, the model generalizes:
any instruction → response in the same shape.
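
The "response tokens only" step can be sketched in a few lines. This is a minimal illustration, not any particular library's implementation: `masked_next_token_loss`, the log-probabilities, and the mask are all hypothetical stand-ins for what a real training loop would get from the model and the tokenizer.

```python
import math

def masked_next_token_loss(token_log_probs, is_response):
    """Average negative log-likelihood over response tokens only.

    token_log_probs[i]: log-probability the model gave the i-th target token.
    is_response[i]: True for response tokens, False for instruction tokens.
    """
    scored = [-lp for lp, keep in zip(token_log_probs, is_response) if keep]
    if not scored:
        raise ValueError("no response tokens to score")
    return sum(scored) / len(scored)

# Hypothetical sequence: 3 instruction tokens followed by 2 response tokens.
log_probs = [math.log(0.1), math.log(0.2), math.log(0.3),
             math.log(0.5), math.log(0.25)]
mask = [False, False, False, True, True]

loss = masked_next_token_loss(log_probs, mask)  # about 1.04 (= 1.5 * ln 2)
```

The instruction tokens still sit in the context the model conditions on; they just contribute nothing to the loss, so the model is never trained to generate instructions itself.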

| Question | Answer | Why |
| --- | --- | --- |
| Does SFT teach new knowledge? | No (at typical scales) | The base model already knows the content from pretraining. SFT teaches it when to deploy that knowledge in response form. |
| Does SFT change the architecture? | No | Same model, same weights, slightly nudged. |
| Does SFT change the objective? | No | Same next-token prediction loss. |
| Does SFT make the model helpful in tone? | No | SFT produces a response shape rather than a continuation; which response is best in tone is a preference-tuning concern (lesson 2). |
| Can a few thousand examples change behavior dramatically? | Yes | The underlying capability is already in the weights. SFT just activates the trigger. |

| Approach | What it changes | Why it matters |
| --- | --- | --- |
| Full SFT | Updates all model weights | Slower, more expensive; needs more memory |
| LoRA | Freezes the base weights; trains a small set of low-rank matrices on the SFT data | Cheaper; the same mechanism conceptually; lets you keep many specialized fine-tunes around one base |

You will see “LoRA” and “PEFT” (parameter-efficient fine-tuning) in open-source training repos and model release notes. PEFT is the umbrella term; LoRA is one PEFT method, and the one you will meet most often.
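
To make the low-rank idea concrete, here is a toy sketch in plain Python. Everything in it is illustrative: the sizes are hypothetical, the helper names (`matmul`, `lora_forward`, `param_counts`) are invented for this example, and zero-initializing one factor is a common convention assumed here rather than something the lesson specifies.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

random.seed(0)
d_out, d_in, rank = 8, 8, 2  # hypothetical toy sizes

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen base weight
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable factor
B = [[0.0] * rank for _ in range(d_out)]  # trainable, zero-initialized: adapter starts as a no-op

def lora_forward(x, W, A, B):
    """Base projection x @ W.T plus the low-rank correction (x @ A.T) @ B.T."""
    base = matmul(x, transpose(W))
    correction = matmul(matmul(x, transpose(A)), transpose(B))
    return add(base, correction)

def param_counts(d_out, d_in, rank):
    """(weights updated by full SFT, weights trained by the adapter) for one matrix."""
    return d_out * d_in, rank * d_in + d_out * rank

x = [[random.gauss(0, 1) for _ in range(d_in)]]
base_out = matmul(x, transpose(W))
adapted_out = lora_forward(x, W, A, B)  # identical to base_out until B is trained

full_params, adapter_params = param_counts(4096, 4096, 8)
```

For a single 4096 by 4096 matrix at rank 8, the adapter trains 65,536 values against 16,777,216 for the full matrix. That gap is why adapters are cheap to train and small enough to ship separately from the base.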

The structural limitation (the bridge to lesson 2)

The lecturer’s framing: SFT is all about teaching the model what it should predict, but it does not teach the model what it should not predict.

| Question | Can SFT teach this? |
| --- | --- |
| What response shape to produce | Yes |
| Which of two valid responses is better | No |
| Which kinds of responses to refuse | No |
| Calibrated uncertainty (“I do not know”) | No |

Every row marked “No” is the next lesson’s territory.
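
Writing the SFT objective down shows where the limitation comes from. The notation is an assumption made for this sketch, not taken from the lecture: $x$ is the instruction, $y = (y_1, \dots, y_T)$ is the demonstrated response, and $\theta$ are the model weights.

```latex
\mathcal{L}_{\text{SFT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid x, y_{<t}\right)
```

Every term raises the probability of a token from a demonstrated response. No term names an alternative response and compares it against the demonstration, so apart from the implicit effect of normalization, the loss carries no signal about which outputs to avoid.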

| Pitfall | Reality |
| --- | --- |
| SFT teaches new knowledge | At typical scales, no. SFT teaches response shape. Knowledge was in the base model from pretraining. (Hedge: at very high SFT volumes the line with continued pretraining starts to blur.) |
| SFT alone is sufficient for a polished assistant | No. SFT is a real capability jump, but it is not the last step. The structural “no negative signal” limit is what the next lessons fix. |
| Fine-tuning equals SFT | Fine-tuning is the umbrella term; SFT is one kind. Many other fine-tuning regimes exist for narrower domain tasks. SFT is specifically scoped to instruction-following. |

| Vendor or repo claim | What it usually means |
| --- | --- |
| “Fine-tuned on N instruction-response pairs” | SFT, classic recipe. Stages 1 and 2 (pretraining and SFT) are done. It says nothing about preference tuning. |
| “LoRA fine-tune of model X” | SFT-via-LoRA on a specific base model. Cheap to produce, often shipped as small adapter weights you load on top of the base. |
| “Instruction-tuned base” | SFT done, no preference tuning yet. Useful, often visibly less polished than a fully tuned chat assistant. |

  • Base model: the output of pretraining. Predicts next tokens; does not follow instructions.
  • SFT (supervised fine-tuning): trains the base model on curated instruction-response pairs using next-token prediction.
  • Instruction-tuned model: the post-SFT model. Follows instructions; has no learned preference among valid responses.
  • LoRA (Low-Rank Adaptation): a parameter-efficient fine-tuning technique, used here for SFT, that freezes the base weights and trains a small set of low-rank matrices.
  • PEFT (Parameter-Efficient Fine-Tuning): umbrella term for techniques like LoRA that update only a small fraction of model parameters.
  • Continued pretraining: training a base model on more raw text from a new domain (different from SFT in objective scope; SFT trains on instructions, continued pretraining trains on free text).
  • Negative signal: information that some outputs are worse than others. SFT does not provide it; preference-based methods (next lesson) do.
  • Post-training: any training stage after pretraining. SFT is the first.

Pretraining fills the weights with everything the model knows.
Supervised fine-tuning teaches it to answer when someone asks.