LLM landscape: cheatsheet

Six directions in the LLM landscape

Direction	What it changes	Builder watch
Longer context	More fits per call (8K → 1M+ tokens)	Doesn’t eliminate retrieval; cost/latency still scale with input
Multimodality	Text + images + audio + (video) inputs and outputs	Lesson 1-7 patterns generalize; new categories open
Smaller specialized models	Narrow tasks at fraction of frontier cost	Mix architecture (small inner sub-tasks + frontier outer synthesis)
Build-vs-buy spectrum	Hosted (start) → fine-tune → train from scratch	Bar for “train” rises; bar for “fine-tune small” falls
Agents	Multi-step tool-use loops	Latency stretches, cost compounds, eval gets harder, new failure modes
Reasoning models	Better on multi-step problems	Thinking tokens run 5-20x the visible answer; use deliberately, not as default

How each direction interacts with the three productive limits (L2)

Direction	Context	Cost	Latency
Longer context	Bigger budget	Input cost scales with what you put in	TTFT scales with prefill
Multimodality	New token types (images/audio)	Often higher per-multimodal-input	Higher TTFT for multimodal prefill
Smaller specialized	Same budget	Lower per-call for inner sub-tasks	Often lower TTFT and higher tokens/sec
Build-vs-buy	Same	Hosted has provider pricing; fine-tune-then-serve has serving cost	Self-hosted serving can have lower latency
Agents	Each step spends its own context budget	Compounds with steps	Multiplies with step count
Reasoning	Big “thinking” budget burned per task	Higher per-task (many invisible tokens)	Longer time to final answer (thinking comes before visible output)
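
To make the "compounds with steps" entry for agents concrete, here is a minimal sketch of how input tokens can grow over a run. It assumes each step re-sends the full transcript so far, which is common but not universal; the function name and the token counts are hypothetical and chosen only for illustration.

```python
def agent_input_tokens(base_prompt_tokens: int,
                       tokens_added_per_step: int,
                       steps: int) -> int:
    """Total input tokens billed across an agent run.

    Assumption: every step re-sends the whole transcript so far, and each
    step appends a fixed number of tokens (tool output plus model reply).
    """
    total = 0
    context = base_prompt_tokens
    for _ in range(steps):
        total += context                   # this step pays for everything so far
        context += tokens_added_per_step   # the transcript grows before the next step
    return total


single_call = agent_input_tokens(1_000, 500, steps=1)
five_steps = agent_input_tokens(1_000, 500, steps=5)
print(single_call, five_steps)  # 1000 10000
```

Under these assumptions five steps cost ten times the input of one call, not five times, because every later step pays again for the earlier ones.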

The build-vs-buy decision tree (T21 spec)

Start:                hosted API (almost always correct)
Prompting fails consistently on a specific recurring task at scale?
                     -> fine-tune an open model (lesson 9)
Research / structural data advantage no hosted model can match?
                     -> train from scratch (Track 15 territory; rare for app teams)
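
The tree above can be sketched as a small function. The names (`Path`, `choose_path`) and the two boolean inputs are hypothetical, and checking the rarest case first is an assumption about precedence when both conditions hold; the tree itself does not say.

```python
from enum import Enum


class Path(Enum):
    HOSTED_API = "hosted API"
    FINE_TUNE = "fine-tune an open model"
    TRAIN_FROM_SCRATCH = "train from scratch"


def choose_path(prompting_fails_at_scale: bool,
                structural_data_advantage: bool) -> Path:
    """Walk the build-vs-buy tree; hosted API is the default."""
    # Assumption: the rarest branch is checked first when both flags are set.
    if structural_data_advantage:
        return Path.TRAIN_FROM_SCRATCH
    if prompting_fails_at_scale:
        return Path.FINE_TUNE
    return Path.HOSTED_API


print(choose_path(False, False).value)  # hosted API
print(choose_path(True, False).value)   # fine-tune an open model
```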

The “mix” architecture (worth knowing by name)

[Router]                     -> small specialized model
[Retriever-rewriter]         -> small specialized model
[Re-ranker / classifier]     -> small specialized model
[User-facing synthesis]      -> frontier model
[LLMOps wrapping everything] -> per lesson 7

Lowers per-request cost without sacrificing user-facing quality.
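
A rough sketch of why the mix lowers per-request cost: assign each stage a model tier and compare against running every stage on the frontier model. The prices and token counts below are placeholders invented for the arithmetic, not real provider pricing, and the `Stage` and `request_cost` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    tier: str    # "small" or "frontier"
    tokens: int  # tokens processed per request (illustrative)


# Placeholder prices per 1K tokens; NOT real pricing.
PRICE_PER_1K = {"small": 0.1, "frontier": 2.0}

PIPELINE = [
    Stage("router", "small", 200),
    Stage("retriever-rewriter", "small", 300),
    Stage("re-ranker", "small", 1_000),
    Stage("user-facing synthesis", "frontier", 1_500),
]


def request_cost(stages: Sequence[Stage], force_tier: Optional[str] = None) -> float:
    """Sum per-stage cost; force_tier overrides every stage's tier."""
    total = 0.0
    for stage in stages:
        tier = force_tier or stage.tier
        total += stage.tokens / 1000 * PRICE_PER_1K[tier]
    return total


mixed = request_cost(PIPELINE)
all_frontier = request_cost(PIPELINE, force_tier="frontier")
print(f"mixed={mixed:.2f} all_frontier={all_frontier:.2f}")
```

With these placeholder numbers the mixed pipeline costs roughly half of the all-frontier one, while the only stage the user sees still runs on the frontier model.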

When to reach for a reasoning model

Multi-step math, code with constraints, logic puzzles, tasks where intermediate steps matter.
NOT a default. Per-task cost includes many “thinking” tokens (5-20x the visible response).
Lesson-7 A/B test on real traffic decides per-task whether the quality lift earns the extra cost.
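
The 5-20x figure translates into billed tokens roughly as follows. This is a sketch that assumes thinking tokens are billed like output tokens and that the multiplier applies to the visible response length; the function name and the 300-token example are hypothetical.

```python
def billed_output_tokens(visible_tokens: int, thinking_multiplier: int) -> int:
    """Visible tokens plus thinking tokens.

    Assumption: thinking tokens = thinking_multiplier * visible_tokens,
    and both are billed as output.
    """
    return visible_tokens * (1 + thinking_multiplier)


low = billed_output_tokens(300, thinking_multiplier=5)
high = billed_output_tokens(300, thinking_multiplier=20)
print(low, high)  # 1800 6300
```

A 300-token answer can therefore bill as 1,800 to 6,300 output tokens, which is the extra cost the A/B test has to justify.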

Adopting a new capability safely

1. Read it through the three productive limits.
2. Place it on the build-vs-buy spectrum.
3. Identify which lesson 1-7 patterns generalize and which need new techniques.
4. Run the lesson-7 regression suite on the new model BEFORE switching.
5. A/B test on real traffic if quality looks comparable.
6. Adopt with versioned prompts + logged comparison.
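
Steps 4 and 5 can be sketched as a gate: run the same regression cases against the current and candidate models, and move on to an A/B test only if the candidate does not regress. The model callables, case format, and `tolerance` parameter are hypothetical stand-ins, not the lesson-7 suite itself.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]
Case = Tuple[str, Callable[[str], bool]]  # (prompt, check applied to the output)


def pass_rate(model: Model, cases: List[Case]) -> float:
    """Fraction of regression cases whose check accepts the model's output."""
    if not cases:
        raise ValueError("regression suite is empty")
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def ready_for_ab_test(current: Model, candidate: Model,
                      cases: List[Case], tolerance: float = 0.0) -> bool:
    """True when the candidate's pass rate is within tolerance of the current model's."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - tolerance


# Stub models standing in for real API calls.
current_model = lambda prompt: prompt.upper()
good_candidate = lambda prompt: prompt.upper()
bad_candidate = lambda prompt: ""

suite = [
    ("hi", lambda out: out == "HI"),
    ("ok", lambda out: out == "OK"),
]

print(ready_for_ab_test(current_model, good_candidate, suite))  # True
print(ready_for_ab_test(current_model, bad_candidate, suite))   # False
```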

The builder’s instinct (the durable takeaway)

Specific models go stale within months. The way you READ each new release should not. Three productive limits + build-vs-buy spectrum + LLMOps discipline + lesson 1-7 patterns = the lens that outlasts the churn.

Words to use precisely

Survey lesson: lighter pedagogy, breadth-over-depth, points forward to deeper lessons.
Mix architecture: small specialized models for inner sub-tasks + frontier model for outer synthesis.
Build-vs-buy spectrum: hosted API → fine-tune → train from scratch.
Builder’s instinct: read new capability through productive limits + build-vs-buy + LLMOps + lesson 1-7 patterns.

Source

Full Stack Deep Learning, LLM Bootcamp (Spring 2023): What’s Next? fullstackdeeplearning.com/llm-bootcamp. Independent structural mirror in original prose; see references.