| # | Discipline | Pulls from |
| --- | --- | --- |
| 1 | Cost monitoring | L2 + L7 usage fields; Usage and Cost Admin API |
| 2 | Latency budgets | L2 streaming; L3 effort; L7 caching; L9.2 routing; L11 Subagent parallelization; L2 batches |
| 3 | Eval-set discipline | L9 pattern 5; T21 L7 LLMOps |
| 4 | Rollout | L2 request_id; feature flags + canary + A/B + rollback |
| 5 | Lifecycle handling | L3 date-pinning; Anthropic deprecation policy + Console Export |
| Field | Source |
| --- | --- |
| request_id | L2 (required for Anthropic Support) |
| model | Request echo |
| stop_reason | L1 + L2 + L5 + L7 + L8 |
| usage.input_tokens | L1 |
| usage.output_tokens | L1 |
| usage.cache_creation_input_tokens | L7 |
| usage.cache_read_input_tokens | L7 |
| usage.iterations (when compaction enabled) | L7 |
| Request latency | Application |
Pair with user_id, session_id, feature_flag_state.
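A minimal sketch of how the fields above might be flattened into one log record per call. The function name `build_log_record`, the record layout, and the stand-in response object are hypothetical; the only attribute names used are the ones listed in this section (`_request_id`, `model`, `stop_reason`, and the `usage` counters).

```python
from types import SimpleNamespace


def build_log_record(response, latency_ms, user_id, session_id, feature_flag_state):
    """Flatten one response into a dict suitable for structured logging.

    `response` is any object exposing the attributes named in the field table;
    cache counters are treated as optional and default to 0.
    """
    usage = response.usage
    return {
        "request_id": getattr(response, "_request_id", None),
        "model": response.model,
        "stop_reason": response.stop_reason,
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cache_creation_input_tokens": getattr(usage, "cache_creation_input_tokens", 0) or 0,
        "cache_read_input_tokens": getattr(usage, "cache_read_input_tokens", 0) or 0,
        "latency_ms": latency_ms,  # measured by the application, not the API
        "user_id": user_id,
        "session_id": session_id,
        "feature_flag_state": feature_flag_state,
    }


# Stand-in response with made-up values, used only to exercise the function.
demo_response = SimpleNamespace(
    _request_id="req_example",
    model="example-model",
    stop_reason="end_turn",
    usage=SimpleNamespace(
        input_tokens=120,
        output_tokens=40,
        cache_creation_input_tokens=0,
        cache_read_input_tokens=880,
    ),
)
demo_record = build_log_record(demo_response, 412.5, "user-1", "sess-1", {"model_flag": "stable"})
```

Keeping the record flat makes it easy to aggregate by model, user, or flag state in whatever log store is already in use.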
Identity: total_input_tokens = cache_read_input_tokens + cache_creation_input_tokens + input_tokens.
Headline metric (cache hit ratio): cache_read_input_tokens / total_input_tokens. On a well-cached production stack with stable prefixes, this is often above 90 percent. If it is below 70 percent, fix cache placement BEFORE any further optimization.
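The identity and headline ratio can be sketched in a few lines. The `Usage` dataclass and the status labels are illustrative names, not part of any SDK; the 90 and 70 percent thresholds are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0


def total_input_tokens(u: Usage) -> int:
    # Identity: cache reads + cache writes + uncached input.
    return u.cache_read_input_tokens + u.cache_creation_input_tokens + u.input_tokens


def cache_hit_ratio(u: Usage) -> float:
    total = total_input_tokens(u)
    return u.cache_read_input_tokens / total if total else 0.0


def cache_status(ratio: float) -> str:
    # Labels are hypothetical; thresholds follow the text.
    if ratio > 0.90:
        return "healthy"
    if ratio < 0.70:
        return "fix-cache-placement"
    return "watch"


example = Usage(input_tokens=100, cache_creation_input_tokens=50, cache_read_input_tokens=850)
example_ratio = cache_hit_ratio(example)  # 850 / 1000
```

In practice the ratio is more useful summed over a time window than per request, since the first call after a prefix change is always a cache write.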
Verbatim: The Usage & Cost Admin API provides programmatic and granular access to historical API usage and cost data for your organization.
| Endpoint | Returns | Granularity |
| --- | --- | --- |
| GET /v1/organizations/usage_report/messages | Token counts | 1m / 1h / 1d |
| GET /v1/organizations/cost_report | USD (cents, decimal strings) | 1d only |
Admin API key required (sk-ant-admin…). Distinct from regular API keys. Only org admins can provision via Console.
curl "https://api.anthropic.com/v1/organizations/usage_report/messages?\
starting_at=2025-01-08T00:00:00Z&\
ending_at=2025-01-15T00:00:00Z" \
  --header "anthropic-version: 2023-06-01" \
  --header "x-api-key: $ANTHROPIC_ADMIN_KEY"
| bucket_width | Default buckets | Max buckets | Use case |
| --- | --- | --- | --- |
| 1m | 60 | 1440 | Real-time monitoring |
| 1h | 24 | 168 | Daily patterns |
| 1d | 7 | 31 | Weekly / monthly reports |
| Dimension | Usage | Cost |
| --- | --- | --- |
| model | Yes | Via description |
| workspace_id | Yes | Yes |
| api_key_id | Yes | No |
| service_tier | Yes (filter) | No (Priority Tier excluded) |
| context_window | Yes | Via description |
| inference_geo | Yes | Via description |
| speed (beta) | Yes (with fast-mode-2026-02-01 header) | No |
Data freshness: typically appears within 5 minutes of API request completion.
Polling cadence: once per minute sustained. Cache results for dashboards.
Priority Tier costs use a different billing model and never appear in the Cost endpoint. Track Priority Tier usage via the Usage endpoint (service_tier = priority).
NOT available on Claude Platform on AWS (use Console). NOT available on individual accounts (set up an organization first).
Pagination via has_more + next_page .
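A hedged sketch of the `has_more` + `next_page` loop. The HTTP call is hidden behind a caller-supplied `fetch_page` function so the loop stays library-agnostic; that function, the assumption that the token is passed back as a `page` parameter, and the `data` key for the bucket list are illustrative and should be checked against the API reference.

```python
from typing import Callable, Iterator, Optional


def iter_buckets(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across all pages.

    `fetch_page(token)` must return one decoded JSON response body;
    `token` is None for the first page and the previous `next_page` afterwards.
    """
    token: Optional[str] = None
    while True:
        body = fetch_page(token)
        # "data" as the list key is an assumption for this sketch.
        yield from body.get("data", [])
        if not body.get("has_more"):
            break
        token = body.get("next_page")
        if token is None:  # defensive: avoid looping forever on a malformed body
            break


# Fake two-page backend, used only to exercise the loop.
_fake_pages = {
    None: {"data": [{"bucket": 1}, {"bucket": 2}], "has_more": True, "next_page": "p2"},
    "p2": {"data": [{"bucket": 3}], "has_more": False, "next_page": None},
}
all_buckets = list(iter_buckets(lambda token: _fake_pages[token]))
```

Given the once-per-minute polling guidance above, a real `fetch_page` would typically sit behind a cache rather than being called on every dashboard render.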
| Lever | Lesson | Savings |
| --- | --- | --- |
| Model selection (Sonnet default; Haiku for volume) | L3 | Up to 5x cheaper per token |
| Effort dial (medium on Sonnet) | L3 | Per-call token spend |
| Prompt caching (cache_control on stable prefixes) | L7 | ~90 percent off hits (0.1x multiplier) |
| Batches API (bulk non-interactive) | L2 | 50 percent off, async, under 1 hour |
| Per-subagent model | L11 | Per-step instance of the mix-and-match |
| Surface | Budget shape | Primary lever |
| --- | --- | --- |
| Chat UI | TTFT (200ms to 2s) | Streaming (L2) |
| Long generation | Full-response (per length) | Streaming + Effort low (L3) |
| Agent loop | Steps × per-step latency | max_iterations + per-subagent parallelization |
| Bulk / non-interactive | Throughput (under 1 hour) | Batches API (L2) |
| Long-running automation | Wall-clock (minutes to hours) | Managed Agents (L11) |
Never tune latency without a budget. Treat budget breaches as incidents.
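One way to make "budget breach" concrete is a percentile check over recent samples. This is a sketch under assumptions: the choice of p95, the nearest-rank percentile method, and the function names are illustrative; the 2-second figure is the upper end of the Chat UI TTFT range above.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def budget_breached(samples_ms: Sequence[float], budget_ms: float, pct: float = 95.0) -> bool:
    """True when the chosen percentile of observed latency exceeds the budget."""
    if not samples_ms:
        return False  # nothing observed, nothing to alert on
    return percentile(samples_ms, pct) > budget_ms


CHAT_TTFT_BUDGET_MS = 2000.0  # upper end of the Chat UI range in the table

one_outlier = [100.0] * 19 + [5000.0]   # a single slow call out of 20
two_outliers = [100.0] * 18 + [5000.0] * 2
one_outlier_breach = budget_breached(one_outlier, CHAT_TTFT_BUDGET_MS)
two_outlier_breach = budget_breached(two_outliers, CHAT_TTFT_BUDGET_MS)
```

With 20 samples the p95 rank is 19, so a single slow call does not trip the check but two do; that tolerance is a design choice to tune per surface.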
| Step | What |
| --- | --- |
| Build | 5-10 prompts (start) → 50+ as discipline matures |
| Score | Per-prompt rubric (LLM judge OR human-labeled) |
| Run on every change | Model swap, prompt edit, tool add, cache placement |
| Promote only on pass | Eval pass rate is the deploy gate |
Cross-ref: T21 L7 LLMOps (provider-agnostic playbook). L9 pattern 5 (evaluator-optimizer) is the loop.
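The build, score, and gate steps can be sketched as a small harness. Everything here is hypothetical scaffolding: `generate` stands in for the model call, `judge` for the rubric (LLM judge or human label lookup), and the 0.9 threshold is an arbitrary placeholder rather than a recommended value.

```python
from typing import Callable, Sequence


def run_eval(
    prompts: Sequence[str],
    generate: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> float:
    """Return the fraction of eval prompts whose response passes the rubric."""
    if not prompts:
        raise ValueError("eval set is empty")
    passed = sum(1 for p in prompts if judge(p, generate(p)))
    return passed / len(prompts)


def may_promote(rate: float, threshold: float = 0.9) -> bool:
    """Deploy gate: promote only when the pass rate meets the threshold."""
    return rate >= threshold


# Toy stand-ins: "generation" upper-cases the prompt, the "judge" fails one case.
toy_prompts = ["a", "b", "c", "d"]
toy_rate = run_eval(toy_prompts, str.upper, lambda prompt, response: response != "C")
toy_gate = may_promote(toy_rate)
```

Running the same harness on every model swap, prompt edit, tool addition, or cache placement change keeps the gate comparable across changes.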
| Move | What |
| --- | --- |
| Feature flags | Model + system prompt + tool list + cache_control placement + MCP servers + Subagent configs behind flags. A swap is a config change, not a deploy. |
| Canary | 1 percent → 10 percent → 100 percent. Each stage gated by eval-set + dashboards + alerts staying green for 24-72h. |
| A/B vs current production | Run both for a fixed window; score live responses; pick cheapest that passes. |
| Rollback (rehearsed) | Flag toggles back in one motion. Drill it. |
request_id on every call (L2): the handle for incident response (Anthropic Support + your own logs).
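A minimal sketch of flag-driven canary routing, assuming a stable hash of the user ID decides which config a request gets. The config dicts, the placeholder model names, and the salt are hypothetical; a real setup would read `canary_percent` from a feature-flag service.

```python
import hashlib

# Hypothetical configs; the model names are placeholders, not real model IDs.
STABLE = {"model": "model-current", "prompt_version": "v1"}
CANARY = {"model": "model-candidate", "prompt_version": "v2"}


def bucket(user_id: str, salt: str = "canary") -> float:
    """Map a user to a stable value in [0, 100) with 0.01 resolution."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") % 10_000) / 100


def pick_config(user_id: str, canary_percent: float, stable: dict, canary: dict) -> dict:
    """Route canary_percent of users to the canary config; the rest stay on stable."""
    return canary if bucket(user_id) < canary_percent else stable


# Rollback in one motion: set canary_percent back to 0.
rolled_back = pick_config("user-42", 0, STABLE, CANARY)
full_rollout = pick_config("user-42", 100, STABLE, CANARY)
```

Hashing instead of random sampling keeps each user on the same side across requests, which makes A/B scoring and request_id-based incident tracing easier to interpret.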
| State | Meaning (verbatim) |
| --- | --- |
| Active | Fully supported and recommended for use |
| Legacy | Will no longer receive updates and may be deprecated in the future |
| Deprecated | Still functional but no longer recommended; replacement and retirement date assigned |
| Retired | No longer available; requests fail |
Notice window (verbatim): Anthropic notifies customers with active deployments for models with upcoming retirements, providing at least 60 days notice before model retirement for publicly released models.
Audit path: Console Usage page → Export → CSV by API key and model → migrate before retirement date.
| Generation | Form | Use in production? |
| --- | --- | --- |
| 4.6 and later (4.8 / 4.7 / 4.6 / Sonnet 4.6) | Dateless | YES (already pinned) |
| Pre-4.6 (Haiku 4.5 / Sonnet 4.5 / Opus 4.5 / etc.) | Date-suffixed | YES |
| Pre-4.6 dateless alias | Convenience pointer | NO in production (use date-suffixed) |
temperature / top_p / top_k are deprecated on Opus 4.7+ (including Opus 4.8). Requests with non-default values return 400. Omit them; use prompting to guide behavior.
| Failure | Recognize by | Fix |
| --- | --- | --- |
| No request_id in logs | Anthropic Support cannot find the failed call | Log response._request_id on EVERY response from day one |
| Cache placement wrong | cache_read_input_tokens always 0 on stable workload | Verify cache_control placement; check minimum-cacheable-size (L7); check workspace isolation |
| Latency tuning without a budget | "Feels slow" or "feels fine" arguments | Define per-surface TTFT + full-response budgets; treat breaches as incidents |
| Shipping without eval | Silent quality regression discovered weeks later | Build 5-10 prompt eval set BEFORE the next change |
| No feature flag | Model swap requires a deploy | Put model + prompt + tool list behind flags; a swap is a config change |
| Rollback never rehearsed | Incident response includes "how do we rollback?" | Drill rollback at low-traffic time; flag toggles back in one motion |
| Missing deprecation notice | Production fails when retirement lands | Subscribe to Anthropic notifications; Console Usage Export quarterly; date-pin pre-4.6 models |
| temperature on Opus 4.7+ | 400 errors | Omit; use prompting to guide behavior |
| Admin key in regular code path | Security gap (Admin keys can read org-wide usage) | Admin keys live in a separate vault, used only by the dashboard/finance service |
Anthropic public Claude docs: Usage and Cost API, Admin API, Model deprecations, Pricing, Rate limits, at https://platform.claude.com/docs/. See references for the full anchor list.
Migration guide at https://platform.claude.com/docs/en/about-claude/models/migration-guide.