Summary: How instruction tuning makes a model helpful

A pretrained transformer is a great autocompleter, not an assistant: it produces plausible continuations of whatever text it is given. Supervised fine-tuning (SFT) is the first post-training stage that turns it into something that follows instructions. SFT uses the same loss function as pretraining (predict the next token), applied to a much smaller, much higher-quality dataset of instruction-response pairs, typically written or vetted by humans. After SFT, the model produces a response shape rather than a continuation shape when it sees an instruction. The knowledge that fills the response is already in the weights from pretraining; SFT teaches the model when to apply it.
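A minimal sketch of the "same loss, different data" point, in plain Python. The function name next_token_loss and the toy numbers are illustrative, not from the lesson. The optional loss_mask, which restricts the loss to response tokens, is a common practice assumed here rather than something the lesson states.

```python
import math

def next_token_loss(logits, targets, loss_mask=None):
    """Mean next-token cross-entropy over the positions kept by loss_mask.

    logits:    one list of vocabulary scores per position
    targets:   the correct next-token id at each position
    loss_mask: optional 0/1 list; positions marked 0 are ignored (assumption)
    """
    if loss_mask is None:
        loss_mask = [1] * len(targets)
    total, count = 0.0, 0
    for scores, target, keep in zip(logits, targets, loss_mask):
        if not keep:
            continue
        # Negative log-probability of the target, via a stable log-sum-exp.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
        count += 1
    return total / count if count else 0.0

# Toy 4-token vocabulary: 2 instruction positions followed by 2 response positions.
# The toy scores favor the wrong token on the instruction positions
# and the right token on the response positions.
logits = [[2.0, 0.0, 0.0, 0.0],
          [0.0, 2.0, 0.0, 0.0],
          [0.0, 0.0, 2.0, 0.0],
          [0.0, 0.0, 0.0, 2.0]]
targets = [1, 0, 2, 3]

# Pretraining style: every position contributes to the loss.
pretraining_style = next_token_loss(logits, targets)
# SFT style (assumed masking): only the response positions contribute.
sft_style = next_token_loss(logits, targets, loss_mask=[0, 0, 1, 1])
```

The function is identical in both calls. Only the data, and under the masking assumption the positions that count, differ between the two stages.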

This summary is the scan-it-in-four-minutes version. The full lesson covers each step in the SFT mechanism, what it changes versus what stays the same, where parameter-efficient methods (LoRA) fit, and the structural limitation that makes Phase 4 a multi-lesson story rather than a one-lesson one.

Core ideas

Two artifacts, two roles. The base model is the output of pretraining. It predicts next tokens, no more. The instruction-tuned model is the output of SFT. It follows instructions. Same architecture, mostly the same weights, slightly nudged.
Same objective, different data. SFT uses next-token prediction (the pretraining loss) on a curated, much smaller corpus of instruction-response pairs. The training code is essentially the same; what changes is the data and the scale. The model learns to predict the response tokens given the instruction tokens.
SFT teaches response shape, not new knowledge. The base model already knows the content from pretraining. SFT teaches it that “the user wrote an instruction, so the next thing should be a response in the matching shape.” A few thousand high-quality examples can change surface behavior decisively because the underlying capability was already there.
The volume drop is the whole point. Pretraining costs months of compute on trillions of tokens. SFT runs in hours to days on a tiny curated corpus. The cost asymmetry between the stages is massive, yet the SFT stage is what makes the model usable.
LoRA is the parameter-efficient variant. Instead of updating all the weights, freeze the pretrained weights and train a small set of low-rank matrices, added alongside them, on the SFT data. It is cheaper, and for the purposes of this lesson the mechanism is the same. You will see “LoRA” and “PEFT” in open-source training repos and model release notes.
The end-state is “correct on average.” The instruction-tuned model produces a valid response in the right shape. Among many possible responses, it picks something close to the average of the training examples. Often correct; rarely the best possible answer.
The structural limit: no negative signal. Every SFT example is positive (“here is what to predict”). There are no negative examples, no “this would be worse,” no “do not do this.” The lecturer’s framing: SFT teaches what to predict, not what not to predict. That gap is exactly what the next lesson opens.
Pitfall: SFT does not add knowledge at typical scales. Knowledge is from pretraining. SFT is response shape. (At very high SFT volumes the line with continued pretraining blurs, but the starting mental model holds.)
Pitfall: SFT is not “the last step.” It is a real capability jump, but it is not sufficient. An SFT-only model is useful but noticeably less polished than one that has also been preference-tuned. Phase 4 lessons 2 and 3 cover the rest.
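To make the LoRA idea from the list above concrete, here is a small sketch in plain Python. The sizes, variable names, and helper functions are illustrative assumptions. It follows the commonly described setup (a frozen weight matrix W plus a trainable low-rank product B·A, with B starting at zero) and leaves out the scaling factor and the training loop that real implementations include.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

d_out, d_in, rank = 6, 8, 2  # illustrative sizes; real layers are far larger

random.seed(0)
# Pretrained weight: frozen, never updated during LoRA fine-tuning.
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Low-rank factors: the only trainable parameters in this sketch.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero start, so the update begins at zero

def effective_weight(W, A, B):
    """Weight the layer actually uses: frozen W plus the low-rank update B @ A."""
    return add(W, matmul(B, A))

full_params = d_out * d_in            # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)   # parameters updated by LoRA in this sketch
```

Because B starts at zero, the layer initially behaves exactly like the pretrained one, and training moves only the small A and B matrices. The gap between lora_params and full_params widens quickly as the layer dimensions grow while the rank stays small.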

What changes for you

Before this lesson, the difference between “a base model” and “an instruction-tuned model” was probably either invisible or attributed to “the algorithm.” After it, you have a working frame for what changes when. When you see a model card that says “fine-tuned on N instruction-response pairs” or “trained with SFT on dataset X,” you know what stage it has been through and what it can and cannot do. When you see “LoRA” in a release note or training repo, you know it is parameter-efficient SFT, not a different concept. When you read about an open-source instruction-tuned model that “feels less polished” than a commercial chat assistant, you know that gap is the preference-tuning gap, and that the next two lessons cover what closes it.

Pretraining fills the weights with everything the model knows.
Supervised fine-tuning teaches it to answer when someone asks.