Skip to content

Cheatsheet: Shipping a Claude application

#DisciplinePulls from
1Cost monitoringL2 + L7 usage fields; Usage and Cost Admin API
2Latency budgetsL2 streaming; L3 effort; L7 caching; L9.2 routing; L11 Subagent parallelization; L2 batches
3Eval-set disciplineL9 pattern 5; T21 L7 LLMOps
4RolloutL2 request_id; feature flags + canary + A/B + rollback
5Lifecycle handlingL3 date-pinning; Anthropic deprecation policy + Console Export

Per-call logging shape (every Claude response)

Section titled “Per-call logging shape (every Claude response)”
FieldSource
request_idL2 (required for Anthropic Support)
modelRequest echo
stop_reasonL1 + L2 + L5 + L7 + L8
usage.input_tokensL1
usage.output_tokensL1
usage.cache_creation_input_tokensL7
usage.cache_read_input_tokensL7
usage.iterations (when compaction enabled)L7
Request latencyApplication

Pair with user_id, session_id, feature_flag_state.

Cache-hit ratio (the headline cost metric)

Section titled “Cache-hit ratio (the headline cost metric)”

Identity: total_input_tokens = cache_read_input_tokens + cache_creation_input_tokens + input_tokens.

Headline: cache_read_input_tokens / total_input_tokens. On a well-cached production stack with stable prefixes, often above 90 percent. If below 70 percent, fix cache placement BEFORE any further optimization.

Verbatim: The Usage & Cost Admin API provides programmatic and granular access to historical API usage and cost data for your organization.

EndpointReturnsGranularity
GET /v1/organizations/usage_report/messagesToken counts1m / 1h / 1d
GET /v1/organizations/cost_reportUSD (cents, decimal strings)1d only

Admin API key required (sk-ant-admin…). Distinct from regular API keys. Only org admins can provision via Console.

Terminal window
curl "https://api.anthropic.com/v1/organizations/usage_report/messages?\
starting_at=2025-01-08T00:00:00Z&\
ending_at=2025-01-15T00:00:00Z&\
group_by[]=model&\
bucket_width=1d" \
--header "anthropic-version: 2023-06-01" \
--header "x-api-key: $ANTHROPIC_ADMIN_KEY"
bucket_widthDefaultMaxUse case
1m601440Real-time monitoring
1h24168Daily patterns
1d731Weekly / monthly reports
DimensionUsageCost
modelYesVia description
workspace_idYesYes
api_key_idYesNo
service_tierYes (filter)No (Priority Tier excluded)
context_windowYesVia description
inference_geoYesVia description
speed (beta)Yes (with fast-mode-2026-02-01 header)No
  • Data freshness: typically appears within 5 minutes of API request completion.
  • Polling cadence: once per minute sustained. Cache results for dashboards.
  • Priority Tier costs use a different billing model and never appear in the Cost endpoint. Track Priority Tier usage via Usage endpoint (service_tier = priority).
  • NOT available on Claude Platform on AWS (use Console). NOT available on individual accounts (set up an organization first).
  • Pagination via has_more + next_page.
LeverLessonSavings
Model selection (Sonnet default; Haiku for volume)L3Up to 5x cheaper per token
Effort dial (medium on Sonnet)L3Per-call token spend
Prompt caching (cache_control on stable prefixes)L7~90 percent off hits (0.1x multiplier)
Batches API (bulk non-interactive)L250 percent off, async, under 1 hour
Per-subagent modelL11Per-step instance of the mix-and-match
SurfaceBudget shapePrimary lever
Chat UITTFT (200ms to 2s)Streaming (L2)
Long generationFull-response (per length)Streaming + Effort low (L3)
Agent loopSteps × per-step latencymax_iterations + per-subagent parallelization
Bulk / non-interactiveThroughput (under 1 hour)Batches API (L2)
Long-running automationWall-clock (minutes to hours)Managed Agents (L11)

Never tune latency without a budget. Treat budget breaches as incidents.

StepWhat
Build5-10 prompts (start) → 50+ as discipline matures
ScorePer-prompt rubric (LLM judge OR human-labeled)
Run on every changeModel swap, prompt edit, tool add, cache placement
Promote only on passEval pass rate is the deploy gate

Cross-ref: T21 L7 LLMOps (provider-agnostic playbook). L9 pattern 5 (evaluator-optimizer) is the loop.

MoveWhat
Feature flagsModel + system prompt + tool list + cache_control placement + MCP servers + Subagent configs behind flags. A swap is a config change, not a deploy.
Canary1 percent → 10 percent → 100 percent. Each stage gated by eval-set + dashboards + alerts staying green for 24-72h.
A/B vs current productionRun both for a fixed window; score live responses; pick cheapest that passes.
Rollback (rehearsed)Flag toggles back in one motion. Drill it.

request_id on every call (L2): the handle for incident response (Anthropic Support + your own logs).

StateMeaning (verbatim)
ActiveFully supported and recommended for use
LegacyWill no longer receive updates and may be deprecated in the future
DeprecatedStill functional but no longer recommended; replacement and retirement date assigned
RetiredNo longer available; requests fail

Notice window (verbatim): Anthropic notifies customers with active deployments for models with upcoming retirements, providing at least 60 days notice before model retirement for publicly released models.

Audit path: Console Usage page → Export → CSV by API key and model → migrate before retirement date.

GenerationFormUse in production?
4.6 and later (4.8 / 4.7 / 4.6 / Sonnet 4.6)DatelessYES (already pinned)
Pre-4.6 (Haiku 4.5 / Sonnet 4.5 / Opus 4.5 / etc.)Date-suffixedYES
Pre-4.6 dateless aliasConvenience pointerNO in production (use date-suffixed)

temperature / top_p / top_k deprecated on Opus 4.7+ (including Opus 4.8). Return 400 on non-default values. Omit; use prompting to guide behavior.

  • L1 smallest primitive solid (iterate content array; never index)
  • L2 production patterns (streaming + batches + stop_reason + request_id)
  • L3 model + effort chosen by eval (Sonnet default; effort per workload)
  • L4-L6 tools deliberate (custom + server + MCP; denylist destructive; sandbox computer-use)
  • L7 caching live (cache_control on system + tool stack); cache-hit ratio dashboard
  • L7 context-management posture chosen (compaction for long sessions; tool result clearing for heavy loops)
  • L8 agent loop disciplines (hard max_iterations; explicit stop_reason dispatch; tool inventory as safety surface)
  • L9 pattern by decision tree (simplest pattern that fits)
  • L10 Skills + harness chosen (.claude/skills/ + CLAUDE.md; security audit on third-party Skills)
  • L11 right primitive per workload (self-built loop / Subagents / Managed Agents)
  • L12 cost dashboards live (Usage + Cost Admin API; cache-hit ratio; per-model + per-workspace)
  • L12 latency budget defined + tracked
  • L12 eval set as deploy gate
  • L12 feature flags + canary + rollback (rehearsed)
  • L12 deprecation watch (Console Export reviewed quarterly)
FailureRecognize byFix
No request_id in logsAnthropic Support cannot find the failed callLog response._request_id on EVERY response from day one
Cache placement wrongcache_read_input_tokens always 0 on stable workloadVerify cache_control placement; check minimum-cacheable-size (L7); check workspace isolation
Latency tuning without a budget”Feels slow” or “feels fine” argumentsDefine per-surface TTFT + full-response budgets; treat breaches as incidents
Shipping without evalSilent quality regression discovered weeks laterBuild 5-10 prompt eval set BEFORE the next change
No feature flagModel swap requires a deployPut model + prompt + tool list behind flags; a swap is a config change
Rollback never rehearsedIncident response includes “how do we rollback?”Drill rollback at low-traffic time; flag toggles back in one motion
Missing deprecation noticeProduction fails when retirement landsSubscribe to Anthropic notifications; Console Usage Export quarterly; date-pin pre-4.6 models
temperature on Opus 4.7+400 errorsOmit; use prompting to guide behavior
Admin key in regular code pathSecurity gap (Admin keys can read org-wide usage)Admin keys live in a separate vault, used only by the dashboard/finance service
  • Anthropic public Claude docs: Usage and Cost API, Admin API, Model deprecations, Pricing, Rate limits, at https://platform.claude.com/docs/. See references for the full anchor list.
  • Migration guide at https://platform.claude.com/docs/en/about-claude/models/migration-guide.