
Cheatsheet: Launch an LLM app in one hour

| # | Component | What it does | Don’t |
|---|-----------|--------------|-------|
| 1 | Hosted model (provider API) | Inference; the provider serves it | Host your own first |
| 2 | API key | Authenticates your calls | Commit it to source; hard-code it |
| 3 | Prompt template | The spec for behavior | Treat it as an afterthought |
| 4 | Application code | Wires input -> prompt -> API -> response | Hide the wiring in magic |
| 5 | UI + deployment | How users reach it | Skip deployment; it’s the same app on a server |

user input -> prompt template -> hosted-model API call -> render

Every later lesson refines one stage. The shape does not change.
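
The four stages in the flow above can be sketched as plain functions before any provider or UI is involved. This is a minimal illustration, not the app itself: `PROMPT_TEMPLATE`, `fill_prompt`, `run_pipeline`, and `echo_model` are hypothetical names, and `echo_model` is an offline stand-in for the hosted-model call.

```python
from typing import Callable

# Hypothetical template; the placeholder name `question` is an assumption.
PROMPT_TEMPLATE = "Answer in two to four sentences.\n\nQuestion: {question}"


def fill_prompt(user_input: str) -> str:
    """Stage 2: fill the prompt template with the user's input."""
    return PROMPT_TEMPLATE.format(question=user_input.strip())


def run_pipeline(user_input: str, call_model: Callable[[str], str]) -> str:
    """All stages: input -> prompt -> model call -> text ready to render."""
    prompt = fill_prompt(user_input)
    answer = call_model(prompt)
    return answer.strip()


def echo_model(prompt: str) -> str:
    """Stand-in for the hosted-model API call so the sketch runs offline."""
    return f"[model saw {len(prompt)} characters]"


if __name__ == "__main__":
    print(run_pipeline("What is a system prompt?", echo_model))
```

Because the model call is passed in as an argument, each stage can be replaced on its own, which is the sense in which later lessons refine one stage at a time.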

app.py
import os

import streamlit as st
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])  # env var, never committed

SYSTEM_PROMPT = (
    "You are a helpful, careful assistant. Answer the user's question "
    "in two to four sentences. If you do not know, say so plainly."
)

st.title("Ask me anything")
question = st.text_input("Your question:")

if question:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": question}],
    )
    st.write(response.content[0].text)

Run: streamlit run app.py (with the env var set). Same shape works with other hosted-model APIs by swapping the client and the model name.
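
One way to keep that swap cheap is to put the provider behind a small callable, so the application code never names a client directly. The sketch below is an assumption about how you might structure it, not part of the original app: `Completer`, `make_anthropic_completer`, and `answer` are hypothetical names, and the shortened `SYSTEM_PROMPT` is illustrative. The Anthropic calls mirror those in app.py.

```python
from typing import Callable

MODEL = "claude-sonnet-4-6"  # the one place to change the model name
SYSTEM_PROMPT = "You are a helpful, careful assistant."

# A "completer" takes (system_prompt, question) and returns the answer text.
Completer = Callable[[str, str], str]


def make_anthropic_completer(model: str = MODEL, max_tokens: int = 400) -> Completer:
    """Build a completer backed by the same Anthropic client that app.py uses."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from anthropic import Anthropic

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def complete(system_prompt: str, question: str) -> str:
        response = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            system=system_prompt,
            messages=[{"role": "user", "content": question}],
        )
        return response.content[0].text

    return complete


def answer(question: str, complete: Completer) -> str:
    """Application code: unchanged no matter which provider is behind `complete`."""
    return complete(SYSTEM_PROMPT, question)
```

Switching providers then means writing another factory that returns a `Completer`; `answer` and the UI stay as they are. In tests, a plain lambda can stand in for the provider.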

The provider does: training, inference serving, scaling.
You do: orchestration (prompt + wiring + deploy).

Orchestration fits in an hour even on a first attempt.

| Gap | Fixed in |
|-----|----------|
| No knowledge beyond pretraining + prompt | Lesson 4 (augmented LMs, RAG, tools) |
| No rigorous prompt engineering | Lesson 3 (prompt engineering toolkit) |
| No real UX (streaming, citations, hedging) | Lesson 6 (UX for LUIs) |
| No observability / evaluation in production | Lesson 7 (LLMOps) |
| Thin understanding of the model’s behavior | Lesson 2 (LLM foundations) |

Each gap is a later lesson; the minimum app is the template they all refine.

| Pitfall | Fix |
|---------|-----|
| API key in code / committed | Env var; .gitignore; secrets manager |
| Picking max_tokens too low | Set it generous enough to finish, not so high you waste cost |
| No system prompt | Use one; it is the spec |
| Hard-coding the model name in many places | One constant; one place to swap providers |
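
Two of these pitfalls can be checked in code. The sketch below is illustrative, with hypothetical helper names `load_api_key` and `was_truncated`: the first fails loudly when the environment variable is missing, and the second inspects the `stop_reason` field of an Anthropic Messages response, which is `"max_tokens"` when the reply was cut off by the cap. Other providers report truncation differently, so treat the field name as specific to this client.

```python
import os
from types import SimpleNamespace


def load_api_key(env=os.environ, name: str = "ANTHROPIC_API_KEY") -> str:
    """Read the key from the environment and fail loudly if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it instead of hard-coding it")
    return key


def was_truncated(response) -> bool:
    """True when the reply stopped because it hit the max_tokens cap."""
    return getattr(response, "stop_reason", None) == "max_tokens"


if __name__ == "__main__":
    # Fake response object so the check can be exercised without an API call.
    fake = SimpleNamespace(stop_reason="max_tokens")
    if was_truncated(fake):
        print("Reply was cut off; consider raising max_tokens.")
```

Calling `was_truncated(response)` right after `client.messages.create(...)` gives a cheap signal for tuning `max_tokens` instead of guessing.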

The model is the easy part now. The production work is application engineering on top of someone else’s model: orchestration, retrieval, prompts, UX, evaluation, monitoring.

  • Hosted model: a provider’s API you call (Anthropic’s Claude API, or another); not a model you host.
  • System prompt: the prompt that defines the assistant’s behavior, separate from the user message.
  • Application code: the wiring (input -> prompt fill -> API call -> render). Small.
  • Deployment target: where the application runs (cloud function, Space, server, etc.).
  • Full Stack Deep Learning, LLM Bootcamp (Spring 2023): Launch an LLM App in One Hour. fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.