Summary: What's next

A survey opening Phase 3. The LLM landscape is moving in six named directions, and each changes lesson 2’s productive-limits math differently. Longer context (8K → 1M+ tokens) lets more fit in a single call but does not eliminate retrieval; selective retrieval still wins on cost and latency. Multimodality generalizes the patterns from lessons 1-7 with expanded input/output components; expect multimodal-by-default apps. Smaller specialized models often beat frontier models on narrow sub-tasks; the right architecture is often a mix (small models for inner sub-tasks, a frontier model for outer synthesis). The build-vs-buy spectrum runs from a hosted API (almost always the starting point) to fine-tuning an open model (lesson 9) to training from scratch (rare; Track 15). Agents scale lesson 4’s tool-use loop into multi-step task accomplishment, with longer latency budgets, compounding cost, harder evaluation, and new failure modes (lesson 10). Reasoning models lift multi-step quality at a higher per-task cost (many “thinking” tokens); use them deliberately, not as the default. The builder’s instinct outlasts specific models: read each new capability through the three productive limits, the build-vs-buy spectrum, and the LLMOps discipline. This is the scan version; the lesson maps the territory.
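The claim that selective retrieval still wins on cost can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the per-token prices, the 800K-token corpus, and the 4K-token retrieved slice are all assumed numbers, not figures from the lesson or from any provider's price list.

```python
# All prices and token counts below are hypothetical, for illustration only.
PRICE_IN_PER_M = 3.00    # assumed USD per million input tokens
PRICE_OUT_PER_M = 15.00  # assumed USD per million output tokens


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one call at the assumed prices."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000


# Option A: put the whole (assumed) 800K-token corpus in the context window.
full_context_cost = call_cost(input_tokens=800_000, output_tokens=500)

# Option B: retrieve a few relevant chunks (about 4K tokens) and send only those.
retrieval_cost = call_cost(input_tokens=4_000, output_tokens=500)

print(f"full context: ${full_context_cost:.4f} per call")
print(f"retrieval:    ${retrieval_cost:.4f} per call")
print(f"ratio:        {full_context_cost / retrieval_cost:.0f}x")
```

The sketch leaves out the cost of running the retrieval step itself (embedding and index lookups) and any prompt-caching discounts, so treat the ratio as a rough order of magnitude rather than a measurement.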

Core ideas

Six directions, each changing the productive-limits math differently: longer context, multimodality, smaller specialized models, the build-vs-buy spectrum, agents, and reasoning models.
Longer context ≠ no retrieval. A huge context costs more per call, raises time to first token (TTFT), and gets uneven attention across its length. Selective retrieval still wins.
Multimodality generalizes the patterns. Lessons 1-7 still apply; the input/output components expand. New categories: vision over documents, voice in/out, screenshot agents.
Smaller specialized models win on narrow sub-tasks. A mixed architecture (small inner + frontier outer) is the common shape.
Build-vs-buy spectrum: hosted API (start here) → fine-tune (lesson 9; when prompting consistently fails) → train from scratch (rare; Track 15). The bar for self-training rises; the bar for fine-tuning falls.
Agents = multi-step tool use (lesson 10). Longer latency budgets, compounding cost, harder evaluation, new failure modes.
Reasoning models = quality on multi-step tasks, at a cost. Use them deliberately for tasks that need reasoning; don’t make them the default.
Builder’s instinct: read each new capability through the three productive limits, the build-vs-buy spectrum, and the lesson-7 LLMOps discipline. Discipline outlasts specific models.
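The "compounding cost" of agents can be made concrete with a small sketch. It assumes a simple agent loop in which every step re-sends the full conversation so far (base prompt plus all earlier tool calls and results) and no prompt caching applies. The function name and all token counts are hypothetical.

```python
def agent_input_tokens(steps: int, base_prompt: int, tokens_added_per_step: int) -> int:
    """Total input tokens billed across an agent run.

    Assumes each step re-sends the whole history, and that every step
    appends a fixed number of tokens (tool call + tool result) to it.
    """
    total = 0
    context = base_prompt
    for _ in range(steps):
        total += context                  # this step pays for everything so far
        context += tokens_added_per_step  # history grows before the next step
    return total


# Hypothetical run: 2,000-token prompt, 10 steps, 1,500 tokens added per step.
single_call = agent_input_tokens(steps=1, base_prompt=2_000, tokens_added_per_step=1_500)
ten_steps = agent_input_tokens(steps=10, base_prompt=2_000, tokens_added_per_step=1_500)

print(f"single call: {single_call} input tokens")
print(f"10-step run: {ten_steps} input tokens")
print(f"multiplier:  {ten_steps / single_call:.2f}x")
```

Under these assumptions, input tokens grow roughly with the square of the step count rather than linearly, which is one reason an agent's budget needs separate attention from a single-call budget.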

What changes for you

This lesson’s payoff is the builder’s-eye reading discipline. A new model release lands every month. A reader who learned only the current models has to re-learn each one; a reader who internalizes the discipline reads every new release with the same eye: What does this change about context, cost, and latency? Where does it fit on build-vs-buy? What regression tests should I run before adopting it? Which patterns from lessons 1-7 generalize? The next three lessons take three of these directions deeper: training your own model (lesson 9), agents (lesson 10), and an industry-perspective capstone (lesson 11). With this map in hand, Phase 3 stops being “a grab bag of advanced topics” and reads as three deliberate deep dives on the directions a builder needs to know.

The field is moving in named directions; each changes the productive-limits math differently. Read new capabilities with a builder’s eye: what fits, what costs, and what generalizes from what you already know. The rest of Phase 3 takes three of these directions deeper.