At typical scales, no. SFT teaches response shape. Knowledge was in the base model from pretraining. (Hedge: at very high SFT volumes the line with continued pretraining starts to blur.)
SFT alone is sufficient for a polished assistant
No. SFT is a real capability jump, but it is not the last step. The structural “no negative signal” limit is what the next lessons fix.
Fine-tuning equals SFT
Fine-tuning is the umbrella term; SFT is one kind of it. Other fine-tuning regimes target narrower domain tasks, while SFT is specifically scoped to instruction-following.
Base model: the output of pretraining. Predicts next tokens; does not follow instructions.
SFT (supervised fine-tuning): trains the base model on curated instruction-response pairs using next-token prediction.
Instruction-tuned model: the post-SFT model. Follows instructions; has no learned preference among valid responses.
LoRA (Low-Rank Adaptation): a parameter-efficient fine-tuning technique, commonly used for SFT, that freezes the base weights and trains a small set of low-rank matrices added alongside them.
PEFT (Parameter-Efficient Fine-Tuning): umbrella term for techniques like LoRA that update only a small fraction of model parameters.
Continued pretraining: training a base model on more raw text from a new domain. It uses the same next-token objective as SFT; the difference is the data. SFT trains on instruction-response pairs, continued pretraining trains on free text.
Negative signal: information that some outputs are worse than others. SFT does not provide it; preference-based methods (next lesson) do.
Post-training: any training stage after pretraining. SFT is usually the first such stage.
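To make the LoRA and PEFT entries concrete, here is a minimal pure-Python sketch of a single linear layer with a low-rank update. The names `matvec`, `lora_forward`, and `param_counts` are hypothetical helpers written for this illustration, not part of any library, and the zero initialization of `B` is an assumption based on the usual LoRA setup (so the adapted layer starts out identical to the base layer).

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    """Frozen base path W @ x plus trainable low-rank path B @ (A @ x)."""
    base = matvec(W, x)                # W is frozen: never updated during fine-tuning
    update = matvec(B, matvec(A, x))   # A is (r x d_in), B is (d_out x r)
    return [b + scale * u for b, u in zip(base, update)]

def param_counts(d_out, d_in, r):
    """Return (parameters in the full matrix, parameters in the LoRA pair)."""
    return d_out * d_in, r * (d_in + d_out)

random.seed(0)
d_out, d_in, r = 6, 8, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # assumed zero init: no change at step 0
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B at zero, the adapted layer reproduces the base layer exactly.
assert lora_forward(W, A, B, x) == matvec(W, x)

full, lora = param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536
```

For a 4096 by 4096 layer at rank 8, the trainable pair holds 65,536 parameters against 16,777,216 in the full matrix, which is the "small fraction" the PEFT entry refers to. The layer size and rank here are illustrative choices, not values from this lesson.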
Pretraining fills the weights with everything the model knows. Supervised fine-tuning teaches it to answer when someone asks.
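The idea that SFT is next-token prediction on instruction-response pairs can be sketched in a few lines. Everything here is a toy: `build_example` and `next_token_loss` are hypothetical helpers, tokenization is a plain whitespace split, and the probabilities stand in for what a real model would assign to each actual token. Scoring only the response tokens is an assumption; many implementations do this, but the lesson itself does not specify it.

```python
import math

def build_example(instruction, response):
    """Concatenate prompt and response into one sequence with a loss mask."""
    prompt = instruction.split()             # toy whitespace tokenizer
    answer = response.split() + ["<eos>"]    # end-of-sequence marker
    tokens = prompt + answer
    mask = [0] * len(prompt) + [1] * len(answer)  # 1 = position counts toward the loss
    return tokens, mask

def next_token_loss(token_probs, loss_mask):
    """Mean negative log-likelihood over the masked-in positions."""
    picked = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    return sum(picked) / len(picked)

tokens, mask = build_example("Translate cat to French", "chat")

# Made-up probabilities a model might assign to each actual token in the sequence.
probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5]

loss = next_token_loss(probs, mask)
print(tokens)
print(mask)
print(round(loss, 4))  # 0.6931
```

The objective only rewards the demonstrated response. Nothing in it marks any other output as worse, which is the missing negative signal described in the glossary.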