Practice: Launch an LLM app in one hour

Self-check

Seven short questions. Answer each before opening the collapsible.

1. Name the five components of a minimum-viable LLM application.

Show answer

(1) A hosted model (Anthropic Claude, or another provider’s API). (2) An API key, safely managed (environment variable, never committed). (3) A prompt template (the spec). (4) Application code that takes input, fills the template, calls the model, returns the response. (5) A UI plus a deployment target (Streamlit / Gradio / web frontend / CLI; deployed somewhere it can be reached). Every later lesson refines one of these.

2. Why does the bootcamp claim “in one hour” rather than longer?

Show answer

Because the hosted model does the hard part: training, inference serving, scaling. You are not standing up infrastructure; you are calling an HTTPS endpoint with a JSON payload. The remaining work is orchestration (prompt, wiring, deploy), which fits comfortably in an hour even for a first-time builder.

3. What is the API-key mistake the lesson warns against, and how do you avoid it?

Show answer

Committing the API key to source control (or hard-coding it). The fix is to load it from an environment variable (or a secrets manager) and .gitignore whatever local file holds it. This sounds boring but is the single most common way a MVP becomes a credentials incident.

4. Roughly how many lines of Python is a complete minimal LLM app, and what does each part do?

Show answer

About thirty lines. The pieces: import the client and UI library; load the API key from the environment; define a system prompt; render a UI for input; on submit, call the model with the prompt and input; display the response. The client and UI library do most of the heavy lifting; your code is wiring.

5. What does the minimum-viable app not yet do, and which later lessons fill each gap?

Show answer

No knowledge beyond pretrained weights + prompt (lesson 4: retrieval and tool use), no rigorous prompt engineering (lesson 3), no real UX (lesson 6: streaming, citations, recoverable failure), and no observability or evaluation (lesson 7: LLMOps), and a thin understanding of how it actually generates (lesson 2: foundations).

6. Why is “ship first, then refine” the lesson’s order, instead of “learn first, then build”?

Show answer

Because shipping a small thing first is a forcing function: it surfaces every practical question (key management, deployment, model choice, prompt design) that the rest of the track teaches you to answer well. It also builds confidence that LLM applications are accessible engineering, not magic. The fastest way to demystify the work is to put a working version in your hands first.

7. What does it mean to say “the model is the easy part now”?

Show answer

Provider APIs hand you a state-of-the-art model with one HTTP call; you do not train, host, or scale it. Most production work for an LLM application is therefore application engineering on top of someone else’s model: orchestration, retrieval, prompt design, UX, evaluation, monitoring. The track is shaped around that reality.

Try it yourself: ship a tiny app

About 30 minutes if you have an account and key already. You will build and run the lesson’s minimum app.

Part A: set up + ship. Get a provider account (Anthropic Claude or another hosted-model API), generate an API key, store it in an environment variable, install the client library and Streamlit (pip install anthropic streamlit), and run the thirty-line script from the lesson body. Type a question into the rendered app and watch the response come back.

What you should see, and why

A browser window with a text input, a submit, and a response area; entering a question returns a model response in two to four sentences (matching the system prompt’s instruction). The pipeline shape is now concrete: input -> prompt template -> API call -> render. Five components, working. If the response is plausibly correct and on-format, the minimal app is shipped. If something fails (auth error, import error, no response), each kind of failure is exactly the kind of practical issue the rest of the track addresses.

Part B (extension). Change one thing at a time and observe: (i) replace the system prompt with “Answer in one sentence only.” (ii) set max_tokens=20. What changes, and what does it tell you about the role of each piece?

What you should notice

(i) The system prompt is the spec; changing it changes the model’s behavior on every input. This is why lesson 3 (prompt engineering) is so consequential, the prompt is doing more work than people realize. (ii) max_tokens=20 truncates the response; the model stops mid-sentence. This is your first taste of LLM productive limits (context length, max output, cost-per-token), which lesson 2 (foundations) opens up. Two one-line changes, two foundational lessons foreshadowed.

Part C (reasoning). Why is the minimum-viable app a fair starting point for the track even though it is missing retrieval, observability, and UX polish?

What you should notice

Because it captures the shape of every production LLM app you will ever build: input flows through a prompt, hits a model, and the response is rendered to a user. Every later lesson is a refinement of one of the five components, not a different structure. Retrieval changes step 3 (the prompt gets augmented with fetched context); observability adds telemetry around step 4 (the model call); UX changes step 5 (how the response is rendered). The minimum app is the template that holds the rest of the production work.

Flashcards

Nine cards. Click any card to reveal the answer. Use the Print flashcards button to lay the set out one card per page for offline review.

Q. The five components of a minimum-viable LLM app?

(1) Hosted model (provider’s API). (2) API key, safely managed (env var). (3) Prompt template. (4) Application code that wires input -> prompt -> API call -> response. (5) UI + deployment target.

Q. Why is 'in one hour' honest?

The hosted model does the hard part (training, inference serving, scaling). You are orchestrating: write a prompt, wire input to output, deploy. Those are tractable engineering tasks that fit comfortably in an hour.

Q. The most common credentials mistake, and the fix?

Committing the API key to source control or hard-coding it. Fix: load from an environment variable (or secrets manager); .gitignore any local file holding it. Boring, but the single most common way an MVP becomes an incident.

Q. Roughly how big is a minimum LLM app?

About thirty lines of Python (Streamlit + a model client). The client and UI library do most of the work; your code is wiring (env-var key, prompt, call, render).

Q. What does the minimum app not yet do?

No retrieval (lesson 4), no rigorous prompt engineering (lesson 3), no real UX (lesson 6), no observability/eval (lesson 7), thin understanding of generation (lesson 2). Each gap is a later lesson.

Q. Why ship first, then refine?

Shipping a small thing surfaces every practical question the rest of the track teaches you to answer (key management, deploy, model choice, prompts). Cheapest way to build confidence that LLM apps are accessible engineering.

Q. What does 'the model is the easy part' mean?

Provider APIs hand you a state-of-the-art model with one HTTP call; no training, hosting, or scaling. Most production work is therefore application engineering on top: orchestration, retrieval, prompts, UX, evaluation, monitoring.

Q. Changing the system prompt: what does it reveal?

The system prompt is the spec; changing it changes behavior on every input. The prompt does more work than people realize; lesson 3 (prompt engineering) opens this up.

Q. Setting max_tokens=20 reveals what?

Productive limits: context length, max output, cost-per-token. The response truncates mid-sentence. Lesson 2 (LLM foundations) makes these limits concrete.