Skip to content

Augmented language models, retrieval and tools

Lesson 3 named the place prompts run out: missing knowledge, missing external systems. This lesson is what you reach for in both cases. The source curriculum is the Full Stack Deep Learning LLM Bootcamp (Spring 2023), by Charles Frye, Sergey Karayev, and Josh Tobin, freely available at fullstackdeeplearning.com/llm-bootcamp with recorded lectures on the Full Stack Deep Learning YouTube channel.

You will distinguish RAG (fetch context into the prompt) from tool use (let the model call external systems); walk RAG’s seven moving parts (knowledge source, chunking, embedding model, vector store, retriever, prompt composition with sources, generation with citations); apply the trade-offs that decide whether RAG actually works (chunk size and overlap, top-k, embedding-model choice, re-ranking, hybrid search, metadata filtering); walk the four steps of tool use (declare schemas, model emits a tool-call request, your code executes, model continues) and see why RAG-as-a-tool is cleaner than always-retrieve; and recognize the recurring RAG failure mode (bad retrieval the model cannot detect) with the held-out-eval defense.

This is lesson 4 of 11, the opener of Phase 2 (building production apps). It connects directly to lesson 3 (“where prompts run out”) and uses lesson 2’s three productive limits as the frame for every trade-off (context, cost, latency). The next lesson (5) reads a real application end-to-end so these moving parts have a worked-example shape; lessons 6 and 7 add the UX and operational layers that wrap all of this.

Prerequisites: lesson 3 of this track (prompt engineering, since retrieved chunks and tool results live inside well-engineered prompts). Familiarity with at least one vector database (Pinecone, Weaviate, Chroma, pgvector) helps but is not required; the lesson treats them as roughly equivalent at this depth.

None. The retrieval intuition is “embed query and chunks into a shared vector space; find nearest neighbors,” explained without the linear algebra. The trade-offs are decisions with empirical effects, not formulas.

The single capability this lesson builds: design retrieval-augmented and tool-using LLM applications, including their moving parts and trade-offs. Concretely, you will be able to:

  • Distinguish RAG from tool use, and explain when modern apps use each (or both)
  • Walk the seven moving parts of a RAG pipeline
  • Apply RAG trade-offs (chunk size, top-k, re-ranking, hybrid search, metadata filtering)
  • Walk the four steps of tool use and explain why “RAG-as-a-tool” is cleaner than always-retrieve
  • Recognize the recurring RAG failure mode (bad retrieval the model can’t detect) and the held-out-eval defense
  • Read time: about 13 minutes
  • Practice time: about 12 minutes (sketch a RAG pipeline for an internal-help use case, plus flashcards)
  • Difficulty: standard (no math; the work is internalizing the seven moving parts, the trade-offs, and the tool-use loop)