Cheatsheet: What's next

| Direction | What it changes | Builder watch |
|---|---|---|
| Longer context | More fits per call (8K → 1M+ tokens) | Doesn’t eliminate retrieval; cost/latency still scale with input |
| Multimodality | Text + images + audio + (video) inputs and outputs | Lesson 1-7 patterns generalize; new categories open |
| Smaller specialized models | Narrow tasks at a fraction of frontier cost | Mix architecture (small inner sub-tasks + frontier outer synthesis) |
| Build-vs-buy spectrum | Hosted (start) → fine-tune → train from scratch | Bar for “train” rises; bar for “fine-tune small” falls |
| Agents | Multi-step tool-use loops | Latency stretches, cost compounds, eval gets harder, new failure modes |
| Reasoning models | Better on multi-step problems | 5-20x thinking tokens per answer; use deliberately, not as default |
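
The "cost still scales with input" note on longer context can be made concrete with a little arithmetic. The sketch below is illustrative only: the price is a made-up placeholder, not any provider's real rate, and the token counts are hypothetical.

```python
def input_cost_usd(input_tokens: int, usd_per_million_tokens: float) -> float:
    """Input-side cost of one call; it grows linearly with the tokens sent."""
    return input_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical price per 1M input tokens (an assumption, not a real quote).
PRICE = 3.0

# Stuff an entire corpus into a long context window on every call...
full_context_call = input_cost_usd(800_000, PRICE)
# ...versus sending only the passages a retriever selected.
retrieved_call = input_cost_usd(8_000, PRICE)

print(f"full context: ${full_context_call:.3f} per call")
print(f"retrieval:    ${retrieved_call:.3f} per call")
```

Under these assumed numbers the long-context call costs about a hundred times more per request, which is why a bigger window does not make retrieval obsolete.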

How each direction interacts with the three productive limits (L2)

| Direction | Context | Cost | Latency |
|---|---|---|---|
| Longer context | Bigger budget | Input cost scales with what you put in | TTFT scales with prefill |
| Multimodality | New token types (images/audio) | Often higher per multimodal input | Higher TTFT for multimodal prefill |
| Smaller specialized | Same budget | Lower per-call for inner sub-tasks | Often lower TTFT and higher tokens/sec |
| Build-vs-buy | Same | Hosted has provider pricing; fine-tune-then-serve has serving cost | Self-serve can be lower latency |
| Agents | Each step has its own budget | Compounds with steps | Multiplies with step count |
| Reasoning | Big “thinking” budget burned per task | Higher per-task (many invisible tokens) | Longer time to final answer |

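
To see why agent cost "compounds with steps", here is a minimal sketch. It assumes (this is not stated in the cheatsheet) that the loop re-sends its full history on every step, and the function name and token counts are hypothetical.

```python
def agent_input_tokens(steps: int, base_prompt: int, tokens_added_per_step: int) -> int:
    """Total input tokens across an agent loop that re-sends its history each step."""
    total = 0
    for step in range(steps):
        # At step k the model re-reads the base prompt plus everything
        # the previous k steps appended (tool calls, tool results, notes).
        total += base_prompt + step * tokens_added_per_step
    return total


single_call = agent_input_tokens(steps=1, base_prompt=1_000, tokens_added_per_step=500)
ten_step_loop = agent_input_tokens(steps=10, base_prompt=1_000, tokens_added_per_step=500)
print(single_call, ten_step_loop)
```

Under this assumption, total input grows faster than the step count: ten steps cost well over ten times a single call, because later steps carry the accumulated history.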
Start: hosted API (almost always correct)
  Prompting fails consistently on a specific recurring task at scale?
    -> fine-tune an open model (lesson 9)
  Research / structural data advantage no hosted model can match?
    -> train from scratch (Track 15 territory; rare for app teams)
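
The decision path above can be written as a small function. This is a sketch, not a prescribed procedure: the names are hypothetical, and letting the "train from scratch" condition take precedence when both questions are answered yes is an assumption the cheatsheet does not spell out.

```python
from enum import Enum


class Approach(Enum):
    HOSTED_API = "hosted API"
    FINE_TUNE = "fine-tune an open model"
    TRAIN_FROM_SCRATCH = "train from scratch"


def choose_approach(prompting_fails_at_scale: bool,
                    has_structural_data_advantage: bool) -> Approach:
    """Walk the build-vs-buy spectrum, starting from the hosted default."""
    # Assumption: the rarest case overrides the others when both apply.
    if has_structural_data_advantage:
        return Approach.TRAIN_FROM_SCRATCH
    if prompting_fails_at_scale:
        return Approach.FINE_TUNE
    # Default: start hosted (almost always correct).
    return Approach.HOSTED_API


print(choose_approach(False, False).value)
print(choose_approach(True, False).value)
```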

The “mix” architecture (worth knowing by name)

[Router] -> small specialized model
[Retriever-rewriter] -> small specialized model
[Re-ranker / classifier] -> small specialized model
[User-facing synthesis] -> frontier model
[LLMOps wrapping everything] -> per lesson 7

Lowers per-request cost without sacrificing user-facing quality.
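
A minimal sketch of the mix architecture follows. The models are passed in as plain callables, so nothing here depends on a particular provider. The prompt formats, the `needs_docs` route label, and the `answer` function are all hypothetical choices made for illustration.

```python
from typing import Callable

# A "model" here is any function from prompt text to response text.
Model = Callable[[str], str]


def answer(question: str, docs: list[str], small: Model, frontier: Model) -> str:
    """Send inner sub-tasks to a small model and the final synthesis to a frontier model."""
    # Router: decide whether retrieval is needed (small model).
    route = small(f"ROUTE: {question}")
    if route != "needs_docs":
        return frontier(question)

    # Retriever-rewriter: turn the question into a search query (small model).
    query = small(f"REWRITE: {question}")

    # Re-ranker / classifier: keep only documents judged relevant (small model).
    relevant = [d for d in docs if small(f"RELEVANT? query={query} doc={d}") == "yes"]

    # User-facing synthesis: the only frontier-model call in the request.
    context = "\n".join(relevant)
    return frontier(f"Context:\n{context}\n\nQuestion: {question}")
```

In this sketch the frontier model is called once per request no matter how many documents are screened, which is where the per-request saving comes from.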

  • Reasoning models fit multi-step math, code with constraints, logic puzzles, and other tasks where intermediate steps matter.
  • They are NOT a default. Per-task cost includes many “thinking” tokens (5-20x the visible response).
  • A lesson-7 A/B test on real traffic decides, per task, whether the quality lift earns the extra cost.
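
The 5-20x figure can be turned into a rough per-task budget. The sketch below assumes thinking tokens are billed at the same rate as visible output tokens. The price is a placeholder and the function name is hypothetical.

```python
def output_cost_usd(visible_tokens: int, thinking_multiplier: float,
                    usd_per_million_output: float) -> float:
    """Output-side cost when hidden thinking tokens are billed like visible ones."""
    thinking_tokens = visible_tokens * thinking_multiplier
    return (visible_tokens + thinking_tokens) / 1_000_000 * usd_per_million_output


PRICE = 10.0   # hypothetical USD per 1M output tokens (an assumption)
VISIBLE = 500  # tokens the user actually sees in the answer

no_thinking = output_cost_usd(VISIBLE, 0, PRICE)
low_estimate = output_cost_usd(VISIBLE, 5, PRICE)    # 5x thinking tokens
high_estimate = output_cost_usd(VISIBLE, 20, PRICE)  # 20x thinking tokens

print(f"no thinking: ${no_thinking:.4f}")
print(f"reasoning:   ${low_estimate:.4f} to ${high_estimate:.4f}")
```

With these assumed numbers the same visible answer costs roughly 6 to 21 times more on the output side, which is the gap the A/B test has to justify.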
1. Read the new release through the three productive limits.
2. Place it on the build-vs-buy spectrum.
3. Identify which lesson 1-7 patterns generalize and which need new techniques.
4. Run the lesson-7 regression suite on the new model BEFORE switching.
5. A/B test on real traffic if quality looks comparable.
6. Adopt with versioned prompts + logged comparison.
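
Steps 4 to 6 above amount to a gate that a candidate model must pass before adoption. The sketch below is one way to express that gate. The thresholds (`tolerance`, `min_win_rate`) and all names are hypothetical, and a real rollout would also weigh cost and latency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateResults:
    current_pass_rate: float            # regression suite, current model
    candidate_pass_rate: float          # regression suite, new model
    ab_win_rate: Optional[float] = None  # None until the A/B test has run


def next_step(results: CandidateResults, tolerance: float = 0.02,
              min_win_rate: float = 0.5) -> str:
    """Return the next action for a candidate model."""
    # Step 4: the regression suite runs BEFORE any switch.
    if results.candidate_pass_rate < results.current_pass_rate - tolerance:
        return "keep current model"
    # Step 5: quality looks comparable, so test on real traffic.
    if results.ab_win_rate is None:
        return "run A/B test on real traffic"
    # Step 6: adopt only if the candidate holds up in the A/B test.
    if results.ab_win_rate >= min_win_rate:
        return "adopt with versioned prompts and a logged comparison"
    return "keep current model"


print(next_step(CandidateResults(0.95, 0.94)))
```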

The builder’s instinct (the durable takeaway)

Specific models go stale within months. The way you READ each new release should not. Three productive limits + build-vs-buy spectrum + LLMOps discipline + lesson 1-7 patterns = the lens that outlasts the churn.

  • Survey lesson: lighter pedagogy, breadth-over-depth, points forward to deeper lessons.
  • Mix architecture: small specialized models for inner sub-tasks + frontier model for outer synthesis.
  • Build-vs-buy spectrum: hosted API → fine-tune → train from scratch.
  • Builder’s instinct: read new capability through productive limits + build-vs-buy + LLMOps + lesson 1-7 patterns.
  • Full Stack Deep Learning, LLM Bootcamp (Spring 2023): What’s Next? fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.