
References: Training your own LLM

Source curriculum (structural mirror, cited as further study):
• Full Stack Deep Learning, "LLM Bootcamp" (Spring 2023): How to train your own LLM
Guest instructor: Reza Shabani (Replit), with Charles Frye, Sergey
Karayev, and Josh Tobin
Course page: https://fullstackdeeplearning.com/llm-bootcamp/
Lecture videos: publicly available on the Full Stack Deep Learning
YouTube channel
License: bootcamp materials are free to view, but no explicit license
(Creative Commons or otherwise) is published; lecture videos are on
YouTube under standard terms.
Required attribution: "Based on the structure of the Full Stack Deep
Learning LLM Bootcamp (Spring 2023), by Charles Frye, Sergey Karayev,
and Josh Tobin (fullstackdeeplearning.com/llm-bootcamp). This is an
independent structural mirror in original prose; it reproduces no
course materials, and Full Stack Deep Learning does not endorse it."
This lesson mirrors the structure of the corresponding bootcamp session (training your own LLM as
a production decision). Clawdemy's lessons are original prose taught at
a strictly technical-primer level; training-data policy, alignment debates,
and similar contested topics are out of scope here.

A short, durable list. Each link is a specific next step, not a generic pile.

  • The TRL library documentation. The reference for SFTTrainer and DPOTrainer; the canonical Python implementations of the fine-tuning recipes named in this lesson.

  • Axolotl. A config-driven (YAML) fine-tuning framework built on the Hugging Face stack (Transformers, PEFT, TRL) and widely used for LoRA fine-tunes. Worth reading the README + a few example configs before your first fine-tune; it saves significant boilerplate.

  • The Hugging Face PEFT library. Where LoRA + the other parameter-efficient methods live, with practical LoraConfig examples. The bridge between “I want to fine-tune” and “I have a LoRA-trained adapter to deploy.”
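
To make "parameter-efficient" concrete, here is a minimal arithmetic sketch of why a LoRA adapter is small. It assumes the standard LoRA formulation, in which a frozen d_out x d_in weight matrix receives a trainable low-rank update B·A of rank r. The layer size (4096 x 4096) and the rank (8) are illustrative assumptions, not values from this lesson, and the helper names are hypothetical; the sketch does not call the PEFT API.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameter count of a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer: one 4096 x 4096 projection adapted at rank 8.
d = 4096
full = full_params(d, d)          # 16,777,216 frozen weights
lora = lora_params(d, d, rank=8)  # 65,536 trainable weights
fraction = lora / full            # 1/256 of the layer, about 0.39%

print(f"full={full:,} lora={lora:,} trainable fraction={fraction:.4%}")
```

In PEFT, the rank corresponds to the `r` argument of `LoraConfig`. The real adapter size also depends on which modules are targeted, so treat this as an order-of-magnitude estimate rather than a deployment figure.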

Where this connects inside the track and the wider curriculum.

  • What’s next (lesson 8). This lesson is the deep dive on the “fine-tune an open model” point on the build-vs-buy spectrum named there.

  • Augmented language models (lesson 4). Covers the retrieval/tool-use boundary, which underlies the second of the three fine-tuning criteria (“retrieval/tools don’t fix it”).

  • LLMOps (lesson 7). The held-out evaluation discipline + A/B testing + regression suite are exactly what make adopting a fine-tune safe rather than silently risky.

  • Track 14 lesson 10 (Fine-tuning LLMs: supervised and instruction tuning). The using-side companion for the SFT mechanics this lesson assumes.

  • Track 15 lesson 13 (Post-training: SFT and RLHF). The from-scratch / lab-POV companion for the same pipeline.

  • Track 15 lesson 12 (Data: filtering, dedup, mixing, synthetic). The data-engineering companion; the synthetic-data + filtering discipline this lesson recommends is built there.