Project walkthrough: cheatsheet

The app, in one line

askFSDL = Q&A over the FSDL course materials: scoped, source-citing, streamed. Many production LLM apps share this shape.

Overall pipeline (lesson 1 + lesson 4)

user input
  -> prompt template
  -> RAG retrieval over scoped corpus (with source labels carried)
  -> hosted-model API call (streaming, citations asked)
  -> render with citations
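
The middle of the pipeline above (retrieval that keeps source labels, then a user message with the question placed last) can be sketched as below. This is a minimal illustration, not the askFSDL code: `Chunk`, `retrieve`, `build_user_message`, and the `SYSTEM_PROMPT` wording are all hypothetical, and the word-overlap ranking is a stand-in for a real vector search. The hosted-model call is left out.

```python
from dataclasses import dataclass

# Hypothetical scope-honest, citation-asking system prompt (wording is an assumption).
SYSTEM_PROMPT = (
    "Answer only from the numbered sources provided. "
    "Cite the sources you used as [n]. "
    "If the sources do not cover the question, say it is out of scope."
)


@dataclass
class Chunk:
    text: str
    source: str   # label carried through the whole pipeline
    section: str


def retrieve(question: str, corpus: list[Chunk], k: int = 2) -> list[Chunk]:
    """Toy retriever: rank chunks by word overlap with the question.

    Stand-in for vector search. It returns Chunk objects, so the
    source label is never separated from the text.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda c: len(q_words & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_user_message(question: str, chunks: list[Chunk]) -> str:
    """Numbered chunks with their sources first, the user question last."""
    blocks = [
        f"[{i}] source: {c.source} / {c.section}\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    ]
    return "\n\n".join(blocks) + f"\n\nQuestion: {question}"


corpus = [
    Chunk("Embeddings map text to vectors.", "Lecture A", "Embeddings"),
    Chunk("Streaming tokens improves perceived latency.", "Lecture B", "UX"),
]
question = "what are embeddings"
message = build_user_message(question, retrieve(question, corpus, k=1))
print(message)
```

Numbering the chunks in the message gives the model something concrete to cite, and the same numbers can later be mapped back to source labels when rendering.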

The production decisions, in order

Stage	Decision	Principle
Knowledge source	Narrow, well-known corpus	Narrow that works > broad that doesn’t
Chunking	Tutorial-shaped chunks + metadata (source, section)	Chunk for content’s natural unit; tag for citations + filters
Retrieval	top-k with source labels kept on every chunk	Citations are carried, not bolted on
System prompt	Scope-honest, citation-asking, refuses out-of-scope	Refusing OOS is part of the spec, not a failure
User message	Retrieved chunks (with sources) + user question at end	Put critical instructions at the end (lesson 3)
Generation	Stream + cite + capped `max_tokens`	UX + cost + latency decisions at the call site
Logging	Q + retrieval IDs + prompt version + model + params + response + user feedback	Seed of LLMOps; near-impossible to backfill later
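
The Logging row can be sketched as one record per interaction, written as a JSON line. This is an illustrative shape only: the class and field names are hypothetical, and `request_id` and `timestamp` are additions assumed here for joining feedback to a request later, not fields named in the walkthrough.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class InteractionLog:
    question: str
    retrieval_ids: list[str]      # IDs of the chunks that went into the prompt
    prompt_version: str           # which prompt template produced this call
    model: str
    params: dict                  # e.g. temperature, max_tokens
    response: str
    user_feedback: Optional[str] = None  # filled in later, if the user gives any
    # Assumed extras, not from the walkthrough:
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)


def to_json_line(record: InteractionLog) -> str:
    """Serialize one interaction as a single JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


record = InteractionLog(
    question="what are embeddings",
    retrieval_ids=["lecture-a#3"],
    prompt_version="v1",
    model="example-model",        # placeholder model name
    params={"temperature": 0.0, "max_tokens": 256},
    response="Embeddings map text to vectors [1].",
)
print(to_json_line(record))
```

Capturing the prompt version and retrieval IDs at call time is what makes later debugging possible; neither can be reconstructed reliably from the response alone.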

What the walkthrough defers (and where it goes)

Deferred	Lesson
Sophisticated UX (regeneration, hedging, recoverable failure)	6
Production observability + evaluation pipelines	7
Multiple-tool / agentic flow	10

Honest scoping: do the core well and name the missing pieces.

The “five hours, not five weeks” reframing

A real LLM app of this shape is small:
  a few hundred lines of Python pipeline
  + prompts (the spec)
  + indexed corpus
  + a hosted model someone else trained.
The complexity is in the DECISIONS, not the line count.

Read-this-design checklist (use on any LLM app)

Is the knowledge source scoped enough to retrieve over well?
Are chunks sized for the content’s natural unit, with metadata?
Does the source label travel through retrieval into the prompt?
Is the system prompt scope-honest (refuses out-of-scope) and citation-asking?
Are streaming + citations decided at the call site and rendered in the UI?
Are the 5-10 logging fields in place (Q, retrieval IDs, prompt version, model+params, response, user feedback)?
Is `max_tokens` capped deliberately?
Has retrieval been evaluated on a held-out set (separate from end-to-end answer quality)?
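
For the last checklist item, one way to score retrieval separately from answer quality is recall@k over a held-out set of (question, relevant chunk IDs) pairs. The metric choice is an assumption here, since the checklist does not name one, and `recall_at_k`, `mean_recall_at_k`, and the toy retriever are hypothetical names for illustration.

```python
from typing import Callable


def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("each eval question needs at least one relevant id")
    hits = relevant_ids & set(retrieved_ids[:k])
    return len(hits) / len(relevant_ids)


def mean_recall_at_k(
    eval_set: list[tuple[str, set[str]]],
    retrieve_ids: Callable[[str], list[str]],
    k: int,
) -> float:
    """Average recall@k over held-out (question, relevant_ids) pairs."""
    scores = [recall_at_k(retrieve_ids(q), relevant, k) for q, relevant in eval_set]
    return sum(scores) / len(scores)


# Toy held-out set and a fake retriever that always returns the same ranking.
held_out = [("q1", {"a"}), ("q2", {"z"})]
fake_retriever = lambda question: ["a", "b", "c"]
print(mean_recall_at_k(held_out, fake_retriever, k=2))
```

Because this never calls the model, it isolates retrieval failures from generation failures: a low score here means the right chunks never reached the prompt.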

Words to use precisely

Scope-honest system prompt: answers in-scope, cites sources, refuses out-of-scope plainly.
Source-carrying retrieval: every chunk keeps its source label through the entire pipeline.
Citation discipline: the model is asked to cite which chunks it used; UI renders citations as links back to source.
Production-decision eye: the ability to see the deliberate choices baked into a real app’s pipeline stages.
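
Citation discipline on the rendering side can be sketched as follows, assuming the model was asked to cite as `[n]` and that the UI accepts Markdown-style links. Both are assumptions for illustration, as are the names `render_citations` and `cited_ids`; a marker with no matching source is left untouched rather than turned into a dead link.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def cited_ids(answer: str) -> list[int]:
    """Chunk numbers the model cited, de-duplicated and sorted."""
    return sorted({int(n) for n in CITATION.findall(answer)})


def render_citations(answer: str, sources: dict[int, str]) -> str:
    """Turn [n] markers into links back to the source for chunk n."""
    def link(match: re.Match) -> str:
        n = int(match.group(1))
        url = sources.get(n)
        # Unknown numbers stay as plain text so a bad citation is visible.
        return f"[{n}]({url})" if url else match.group(0)

    return CITATION.sub(link, answer)


sources = {1: "https://example.com/lecture-a"}  # placeholder URL
answer = "Embeddings map text to vectors [1]. See also [3]."
print(render_citations(answer, sources))
print(cited_ids(answer))
```

Comparing `cited_ids` against the retrieval IDs that were actually in the prompt is a cheap check for citations the model made up.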

Source

Full Stack Deep Learning, LLM Bootcamp (Spring 2023): Project Walkthrough (askFSDL). fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.