References: Prompt caching and context management

Source material

Source curriculum (structural mirror, cited as further study):
• Anthropic Academy (https://anthropic.skilljar.com/):
    "Building with the Claude API" course
    (prompt-caching and context-management sections)
  License: Anthropic Academy course content is account-gated;
    Clawdemy structurally mirrors the Academy's lesson progression
    as inspiration and cites it as further study. Every substantive
    claim in this lesson is verifiable against the public Anthropic
    documentation.

Primary public-doc anchors (every substantive claim verified against):
• Anthropic, "Prompt caching" (the cache_control mechanism,
   placement layers, pricing multipliers, the four-breakpoint rule,
   minimum-cacheable sizes, usage field meanings, workspace
   isolation)
  https://platform.claude.com/docs/en/build-with-claude/prompt-caching
• Anthropic, "Context windows" (per-model context-window sizes,
   context rot, context awareness, overflow stop reason,
   token-counting API pointer)
  https://platform.claude.com/docs/en/build-with-claude/context-windows
• Anthropic, "Compaction" (the compact_20260112 edit shape, the
   compact-2026-01-12 beta header, the default 150K trigger, the
   pause_after_compaction pattern, the system-prompt-cache-survival
   recommendation, the usage.iterations billing breakdown,
   the tool-call-during-summary workaround)
  https://platform.claude.com/docs/en/build-with-claude/compaction
• Anthropic, "Context editing" (clear_tool_uses_20250919 and
   clear_thinking_20251015, the context-management-2025-06-27
   beta header, clear_at_least and keep parameters, the cache-
   interaction asymmetry, client-side SDK compaction note)
  https://platform.claude.com/docs/en/build-with-claude/context-editing
• Anthropic, "Effective context engineering for AI agents"
   (the broader curation principles)
  https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

Verbatim claims sourced from the public docs:
• "Prompt caching optimizes your API usage by allowing resuming
   from specific prefixes in your prompts. This significantly
   reduces processing time and costs for repetitive tasks or
   prompts with consistent elements" (prompt-caching docs opener)
• "By default, the cache has a 5-minute lifetime. The cache is
   refreshed for no additional cost each time the cached content
   is used" (prompt-caching docs)
• Pricing multipliers verbatim: "5-minute cache write tokens are
   1.25 times the base input tokens price; 1-hour cache write
   tokens are 2 times the base input tokens price; Cache read
   tokens are 0.1 times the base input tokens price"
   (prompt-caching docs pricing section)
• "Shorter prompts cannot be cached, even if marked with
   cache_control. Any requests to cache fewer than this number
   of tokens will be processed without caching, and no error is
   returned" (prompt-caching docs minimum-token section)
• "As token count grows, accuracy and recall degrade, a phenomenon
   known as context rot. This makes curating what's in context just
   as important as how much space is available" (context-windows
   docs)
• "Server-side context compaction for managing long conversations
   that approach context window limits" (compaction docs purpose)
• "As conversations get longer, models struggle to maintain focus
   across the full history. Compaction keeps the active context
   focused and performant by replacing stale content with concise
   summaries" (compaction docs)
• "Context editing allows you to selectively clear specific
   content from conversation history as it grows. Beyond optimizing
   costs and staying within limits, this is about actively curating
   what Claude sees" (context-editing docs)
• "For most use cases, server-side compaction is the primary
   strategy for managing context in long-running conversations.
   The strategies on this page are useful for specific scenarios
   where you need more fine-grained control over what content is
   cleared" (context-editing docs)

Required attribution: "Based on the structure of the Anthropic Academy
  'Building with the Claude API' course
  (https://anthropic.skilljar.com/). This lesson is an independent
  structural mirror in original prose; every substantive claim is
  verified against the public Anthropic Claude documentation at
  https://platform.claude.com/docs/. Anthropic does not endorse it."

Going deeper

A short, durable list. Each link is a specific next step inside Track 22.

Lesson 8 of this track, “From single call to agent loop.” Where the cached + curated capability set from this lesson becomes the per-step inventory inside a multi-turn loop. Compaction and tool result clearing become load-bearing for any agent that runs for more than a few iterations.
Lesson 9 of this track, “Six effective-agent patterns.” Where session shape (single-shot, chained, parallel, orchestrator-workers, evaluator-optimizer, autonomous) decides which context-management posture fits.
Lesson 12 of this track, “Shipping a Claude application.” Where the cache-hit ratio and usage.iterations breakdown become production monitoring signals; the eval-set discipline measures whether each cached prefix is actually paying back.

Adjacent tracks (the natural next destinations)

Track 20 (AI Agents and Tool Use): pick this if you want the full track-level depth on context engineering for agents, including the principles essay this lesson cites (Effective context engineering for AI agents).
Track 21 (LLM Ops and Production): pick this if you want the provider-agnostic view of cost-and-latency monitoring (lesson 7 LLMOps). The cache-hit-ratio telemetry this lesson recommends maps onto the observability pillar from there.

Where this connects inside the track

The cost-and-staleness layer the rest of T22 relies on:

Lesson 1 (first call): the iterate-the-content-array discipline now includes compaction blocks as another content type that lands in the response, and usage.iterations as the new place where compaction-step billing surfaces.
Lesson 2 (production patterns): L2’s stop_reason dispatch enumerated end_turn, max_tokens, stop_sequence, tool_use; L5 added pause_turn (server-tool mid-multi-iteration); this lesson adds model_context_window_exceeded and the “compaction” value. Per-request error handling from L2 extends to the content: null compaction-summary edge case introduced here.
Lesson 3 (model selection): cache-write multipliers stack onto the per-model input price from L3; the 1.25x and 2.0x and 0.1x apply to whatever per-MTok rate the chosen model has. The minimum-cacheable-size floor varies by model (4,096 on Opus 4.7 + Haiku 4.5; 1,024 on Sonnet 4.6 + Opus 4.8).
Lesson 4 (custom client tools): cache_control on tool definitions in the tools array applies to L4’s custom tools, often the second-highest-leverage breakpoint after the system prompt.
Lesson 5 (server tools + Anthropic-schema + tool_search): cache breakpoints apply to L5’s tool entries too; defer_loading keeps tool definitions out of the prefix when unused, which is complementary to caching.
Lesson 6 (MCP connector): the mcp_toolset entry’s cache_control field is the seam where MCP tool definitions enter the cached prefix.
Lesson 8 onward (agent loop): everything cached and curated here keeps the loop affordable across many iterations; tool result clearing becomes especially load-bearing in tool-heavy agent loops.
Lesson 12 (shipping): cache_read_input_tokens / total and usage.iterations are the production telemetry signals for measuring whether the cost-and-staleness posture is actually working.

References: Prompt caching and context management

Source material

Read this next

Going deeper

Adjacent tracks (the natural next destinations)

Where this connects inside the track