Augmented language models: brief

What you’ll learn

Lesson 3 named the place prompts run out: missing knowledge, missing external systems. This lesson is what you reach for in both cases. The source curriculum is the Full Stack Deep Learning LLM Bootcamp (Spring 2023), by Charles Frye, Sergey Karayev, and Josh Tobin, freely available at fullstackdeeplearning.com/llm-bootcamp with recorded lectures on the Full Stack Deep Learning YouTube channel.

You will distinguish RAG (fetch context into the prompt) from tool use (let the model call external systems); walk RAG’s seven moving parts (knowledge source, chunking, embedding model, vector store, retriever, prompt composition with sources, generation with citations); apply the trade-offs that decide whether RAG actually works (chunk size and overlap, top-k, embedding-model choice, re-ranking, hybrid search, metadata filtering); walk the four steps of tool use (declare schemas, model emits a tool-call request, your code executes, model continues) and see why RAG-as-a-tool is cleaner than always-retrieve; and recognize the recurring RAG failure mode (bad retrieval the model cannot detect) with the held-out-eval defense.

Where this fits

This is lesson 4 of 11, the opener of Phase 2 (building production apps). It connects directly to lesson 3 (“where prompts run out”) and uses lesson 2’s three productive limits as the frame for every trade-off (context, cost, latency). The next lesson (5) reads a real application end-to-end so these moving parts have a worked-example shape; lessons 6 and 7 add the UX and operational layers that wrap all of this.

Before you start

Prerequisites: lesson 3 of this track (prompt engineering, since retrieved chunks and tool results live inside well-engineered prompts). Familiarity with at least one vector database (Pinecone, Weaviate, Chroma, pgvector) helps but is not required; the lesson treats them as roughly equivalent at this depth.

About the math

None. The retrieval intuition is “embed query and chunks into a shared vector space; find nearest neighbors,” explained without the linear algebra. The trade-offs are decisions with empirical effects, not formulas.

By the end, you’ll be able to

The single capability this lesson builds: design retrieval-augmented and tool-using LLM applications, including their moving parts and trade-offs. Concretely, you will be able to:

Distinguish RAG from tool use, and explain when modern apps use each (or both)
Walk the seven moving parts of a RAG pipeline
Apply RAG trade-offs (chunk size, top-k, re-ranking, hybrid search, metadata filtering)
Walk the four steps of tool use and explain why “RAG-as-a-tool” is cleaner than always-retrieve
Recognize the recurring RAG failure mode (bad retrieval the model can’t detect) and the held-out-eval defense

Time and difficulty

Read time: about 13 minutes
Practice time: about 12 minutes (sketch a RAG pipeline for an internal-help use case, plus flashcards)
Difficulty: standard (no math; the work is internalizing the seven moving parts, the trade-offs, and the tool-use loop)