Launch an LLM app: cheatsheet

The five components

#	Component	What it does	Don’t
1	Hosted model (provider API)	Inference; the provider serves it	Host your own first
2	API key	Authenticates your calls	Commit it to source; hard-code it
3	Prompt template	The spec for behavior	Treat it as an afterthought
4	Application code	Wires input -> prompt -> API -> response	Hide the wiring in magic
5	UI + deployment	How users reach it	Skip deployment; it’s the same app on a server

The pipeline shape (every LLM app)

user input  ->  prompt template  ->  hosted-model API call  ->  render

Every later lesson refines one stage. The shape does not change.
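
The four stages can be sketched as plain functions. This is a minimal illustration, not the course's code: `PROMPT_TEMPLATE`, `fill_prompt`, `render`, `run_pipeline`, and `fake_model` are hypothetical names, and `fake_model` stands in for the hosted-model call so the sketch runs offline.

```python
from typing import Callable

# Hypothetical template; the placeholder name `question` is an assumption.
PROMPT_TEMPLATE = "Answer in two to four sentences.\n\nQuestion: {question}"


def fill_prompt(user_input: str) -> str:
    """Stage 2: put the user's input into the prompt template."""
    return PROMPT_TEMPLATE.format(question=user_input.strip())


def render(model_output: str) -> str:
    """Stage 4: prepare the model's text for display."""
    return model_output.strip()


def run_pipeline(user_input: str, call_model: Callable[[str], str]) -> str:
    """user input -> prompt template -> hosted-model API call -> render."""
    prompt = fill_prompt(user_input)
    raw = call_model(prompt)  # Stage 3: the only stage that would touch the network
    return render(raw)


def fake_model(prompt: str) -> str:
    """Stand-in for a hosted-model call: echoes the last line of the prompt."""
    return f"  (echo) {prompt.splitlines()[-1]}  "


if __name__ == "__main__":
    print(run_pipeline("What is an LLM?", fake_model))
```

Passing `call_model` in as an argument keeps the shape fixed while any one stage is swapped: a real API call replaces `fake_model` without touching the template or the rendering.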

Minimal app: ~30 lines

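import os

import streamlit as st
from anthropic import Anthropic

# Read the key from the environment; never hard-code or commit it.
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

MODEL = "claude-sonnet-4-6"  # one constant, one place to swap models

# The system prompt is the spec for the assistant's behavior.
SYSTEM_PROMPT = (
    "You are a helpful, careful assistant. Answer the user's question "
    "in two to four sentences. If you do not know, say so plainly."
)

# UI: a title and a single text box.
st.title("Ask me anything")
question = st.text_input("Your question:")

# Streamlit reruns the script on each interaction; call the API only when there is input.
if question:
    response = client.messages.create(
        model=MODEL,
        max_tokens=400,  # cap on the length of the reply
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": question}],
    )
    # The reply is a list of content blocks; the first one holds the text here.
    st.write(response.content[0].text)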

Install: `pip install streamlit anthropic`. Run: `streamlit run app.py`, with `ANTHROPIC_API_KEY` set in the environment. The same shape works with other hosted-model APIs: swap the client and the model name.
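
One way to make that swap cheap is to keep the API call in a single function that the UI calls. The sketch below is an assumption about how you might organize it, not part of the source: `ask`, `FakeClient`, and `FakeMessages` are hypothetical names, and the fake client only imitates the shape of the `messages.create` call used above so the wiring can be checked without a key or a network connection.

```python
from types import SimpleNamespace

MODEL = "claude-sonnet-4-6"
SYSTEM_PROMPT = (
    "You are a helpful, careful assistant. Answer the user's question "
    "in two to four sentences. If you do not know, say so plainly."
)


def ask(client, question: str, model: str = MODEL, max_tokens: int = 400) -> str:
    """Send one question through the client and return the reply text."""
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text


class FakeMessages:
    """Imitates only the call shape used by ask(); not a real API."""

    def create(self, *, model, max_tokens, system, messages):
        text = f"[{model}] you asked: {messages[-1]['content']}"
        return SimpleNamespace(content=[SimpleNamespace(text=text)])


class FakeClient:
    def __init__(self):
        self.messages = FakeMessages()


if __name__ == "__main__":
    print(ask(FakeClient(), "What is a system prompt?"))
```

In the Streamlit app, the body of `if question:` would then reduce to `st.write(ask(client, question))`. Moving to another provider means rewriting `ask` alone.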

Why “in one hour” is honest

The provider does:     training, inference serving, scaling.
You do:                orchestration (prompt + wiring + deploy).

Orchestration fits in an hour even on a first attempt.

What the minimum app is NOT yet

Gap	Fixed in
No knowledge beyond pretraining + prompt	Lesson 4 (augmented LMs, RAG, tools)
No rigorous prompt engineering	Lesson 3 (prompt engineering toolkit)
No real UX (streaming, citations, hedging)	Lesson 6 (UX for language user interfaces, LUIs)
No observability / evaluation in production	Lesson 7 (LLMOps)
Thin understanding of the model’s behavior	Lesson 2 (LLM foundations)

Each gap is a later lesson; the minimum app is the template they all refine.

Common pitfalls (and the fix)

Pitfall	Fix
API key in code / committed	Env var; `.gitignore`; secrets manager
Setting `max_tokens` too low (replies cut off mid-sentence)	It is a cap on output length: set it high enough for a complete answer, tight enough to bound cost and latency
No system prompt	Use one; it is the spec
Hard-coding the model name in many places	One constant; one place to swap providers
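
Three of these fixes fit in a few lines. This is a sketch under stated assumptions: `load_api_key` and `was_truncated` are hypothetical helper names, and the truncation check relies on the Anthropic Messages API reporting `stop_reason == "max_tokens"` when a reply hits the cap. Other providers expose this signal under different names.

```python
import os
from types import SimpleNamespace

MODEL = "claude-sonnet-4-6"  # the one place the model name lives
MAX_TOKENS = 400             # the one place the output cap lives


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Read the key from the environment; fail with a clear message if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before starting the app.")
    return key


def was_truncated(response) -> bool:
    """True if the reply stopped because it hit the max_tokens cap."""
    return getattr(response, "stop_reason", None) == "max_tokens"


# Offline check with a stand-in response object (not a real API reply).
stub = SimpleNamespace(stop_reason="max_tokens")
print(was_truncated(stub))  # True
```

When `was_truncated` returns true, raise `MAX_TOKENS` or tighten the length instruction in the system prompt. If the key lives in a local `.env` file, that file belongs in `.gitignore`.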

The reframing

The model is the easy part now. The production work is application engineering on top of someone else’s model: orchestration, retrieval, prompts, UX, evaluation, monitoring.

Words to use precisely

Hosted model: a provider’s API you call (Anthropic’s Claude API, or another); not a model you host.
System prompt: the prompt that defines the assistant’s behavior, separate from the user message.
Application code: the wiring (input -> prompt fill -> API call -> render). Small.
Deployment target: where the application runs (cloud function, Space, server, etc.).

Source

Full Stack Deep Learning, LLM Bootcamp (Spring 2023): Launch an LLM App in One Hour. fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.