Training your own LLM, in brief

What you’ll learn

Lesson 8 sketched the build-vs-buy spectrum and named “fine-tune an open model” as the middle point most teams should consider before “train from scratch.” This lesson is the deep dive on that point. The source curriculum is the Full Stack Deep Learning LLM Bootcamp (Spring 2023) session on training your own LLM, with Reza Shabani (Replit) as the guest instructor, freely available at fullstackdeeplearning.com/llm-bootcamp with recorded lectures on the Full Stack Deep Learning YouTube channel.

You will apply the three-things-true-at-once test for whether to fine-tune (prompting fails consistently, retrieval/tools don’t fix it, and volume justifies the upfront cost); walk the staged fine-tuning pipeline (open checkpoint → curated small SFT dataset → LoRA training via TRL or Axolotl → optionally DPO → held-out eval → A/B test in production); apply the economics rule (per-task hosted cost × expected lifetime volume vs fine-tune cost + serving cost); place fine-tuning in the mix architecture from lesson 8 (inner sub-tasks fine-tuned; outer user-facing synthesis hosted); and recognize that train-from-scratch is almost never the right move for an application team (that is Track 15 territory).

§6 framing note: this lesson is taught at a strictly technical-primer level, with the same discipline as Track 14 lesson 10 and Track 15 lesson 13. In scope (mechanical): when and how to consider fine-tuning, the staged pipeline, and the economics. Out of scope: training-data policy, alignment debates, contested safety claims, and sector-specific compliance for trained models; those belong in their own forum with the right stakeholders.

Where this fits

This is lesson 9 of 11, the second lesson of Phase 3 (advanced and the field). It is the deep dive on the fine-tune point of lesson 8’s build-vs-buy spectrum, and it threads back to lesson 3 (“where prompts run out”), lesson 4 (retrieval/tools as the first move before fine-tuning), lesson 7 (LLMOps as the evaluation and A/B discipline that makes fine-tune adoption safe), and Track 14 lesson 10 / Track 15 lesson 13 / Track 15 lesson 12 for the using- and build-side companions on the methods and data.

Before you start

Prerequisites: lesson 8 of this track (the build-vs-buy spectrum this lesson deep-dives) and lesson 7 (the eval + A/B discipline that wraps the fine-tune adoption). Track 14 lesson 10 (using-side SFT mechanics) and Track 15 lesson 13 (post-training pipeline) are direct companions; familiarity with one helps but is not required.

About the math

Light. The “economics rule” is multiplication and a crossover calculation (one division). No derivations of LoRA or DPO mechanics; those live in Track 14 lesson 10 and Track 15 lesson 13. The decision-making here is criterion-and-arithmetic, not theory.
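To make the arithmetic concrete, here is a minimal Python sketch of the economics rule and its crossover. It assumes serving cost is charged per task, so the crossover reduces to one division; that modeling choice, the function names, and every dollar figure below are illustrative assumptions, not numbers from the lesson.

```python
import math


def hosted_total(hosted_cost_per_task: float, lifetime_tasks: float) -> float:
    """Left side of the rule: per-task hosted cost x expected lifetime volume."""
    return hosted_cost_per_task * lifetime_tasks


def fine_tuned_total(fine_tune_cost: float,
                     serving_cost_per_task: float,
                     lifetime_tasks: float) -> float:
    """Right side of the rule: upfront fine-tune cost + serving cost.

    Assumption: serving is modeled as a flat per-task cost.
    """
    return fine_tune_cost + serving_cost_per_task * lifetime_tasks


def crossover_tasks(fine_tune_cost: float,
                    hosted_cost_per_task: float,
                    serving_cost_per_task: float):
    """Volume at which both sides cost the same (the one division).

    Returns None when self-serving is not cheaper per task, because the
    upfront fine-tune cost is then never recovered.
    """
    saving_per_task = hosted_cost_per_task - serving_cost_per_task
    if saving_per_task <= 0:
        return None
    return fine_tune_cost / saving_per_task


# Hypothetical inputs: $0.01 per hosted task, $2,000 to fine-tune,
# $0.002 per task to serve the fine-tuned model.
breakeven = crossover_tasks(2000.0, 0.01, 0.002)
print(f"break-even volume: about {round(breakeven):,} tasks")
```

With these made-up inputs the break-even lands at roughly 250,000 tasks. If the expected lifetime volume sits well below the crossover, the hosted model stays cheaper; well above it, the fine-tune starts to pay for itself.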

By the end, you’ll be able to

The single capability this lesson builds: describe when and how to train your own (smaller, specialized) LLM versus using a hosted model. Concretely, you will be able to:

Apply the three-things-true-at-once test for whether to fine-tune
Walk the staged fine-tuning pipeline (checkpoint → SFT data → LoRA → optional DPO → eval → A/B)
Apply the economics rule (per-task hosted cost × volume vs fine-tune cost + serving cost)
Place fine-tuning in the mix architecture (inner sub-tasks fine-tuned; outer synthesis hosted)
Recognize that train-from-scratch is almost never the right move for an app team
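The first capability above, the three-things-true-at-once test, can be read as a plain conjunction: fine-tuning is worth considering only when all three conditions hold together. The sketch below is one hypothetical way to write that down; the class and field names are invented for illustration, and filling in each answer honestly is the real work.

```python
from dataclasses import dataclass


@dataclass
class FineTuneCheck:
    """Answers to the three questions (field names are hypothetical)."""
    prompting_fails_consistently: bool
    retrieval_or_tools_fix_it: bool
    volume_justifies_upfront_cost: bool


def should_consider_fine_tuning(check: FineTuneCheck) -> bool:
    """True only when all three conditions are true at once."""
    return (
        check.prompting_fails_consistently
        and not check.retrieval_or_tools_fix_it
        and check.volume_justifies_upfront_cost
    )


# Hypothetical scenario: prompting keeps failing and volume is high,
# but retrieval already fixes the problem, so the test says no.
scenario = FineTuneCheck(
    prompting_fails_consistently=True,
    retrieval_or_tools_fix_it=True,
    volume_justifies_upfront_cost=True,
)
print(should_consider_fine_tuning(scenario))
```

Writing the test as a single `and` chain makes the point that one failed condition is enough to fall back to prompting, retrieval, or tools.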

Time and difficulty

Read time: about 12 minutes
Practice time: about 12 minutes (fine-tune-or-not on four scenarios + a back-of-envelope economics calculation, plus flashcards)
Difficulty: standard (no math beyond arithmetic; the work is internalizing the criteria, the pipeline, and the build-economics framing)