Anthropic API monitoring: usage, cost and rate limits in Elastic

Track Anthropic API spend, token usage and rate limit headroom across every workspace and model without touching a line of application code. The new Elastic Anthropic Metrics integration polls Anthropic's Admin API on a schedule and routes org-wide usage, cost and rate limit data into Elasticsearch, where pre-built Kibana dashboards are ready to use within minutes. If your team has ever debugged a 429 in production or reconciled a Claude invoice after the fact, this integration is aimed at exactly that problem.

Anthropic API monitoring in Elastic

We're pleased to announce the new Elastic integration for Anthropic metrics, now available in Elastic Observability. The integration polls Anthropic's Admin API to ingest organization-wide telemetry from the Claude API platform (token usage, cost, and rate limit configuration) into Elasticsearch, with pre-built Kibana dashboards and out-of-the-box alerts. With a single Admin API key, platform teams get a unified view of how their organization is consuming Claude across every workspace, model, and service tier, alongside the rest of the telemetry they already monitor in Elastic.

Two Anthropic products, two monitoring stories: what this integration covers

Anthropic sells Claude through two distinct products. Claude apps (Claude.ai, Claude Code, Cowork, Claude Design) are used by employees across an organization, and the question their owners ask is "who's using Claude, and for what?" The Claude API platform powers the applications a company builds on Anthropic's models. The people accountable for it are the developers shipping those services and the budget holders paying the bill, asking "how much is our software consuming, and are we within our cost and capacity envelopes?"

This integration focuses on the second story — the Claude API platform — pulling data from Anthropic's Admin API for org-wide usage, cost, and rate limits across every workspace and model.

What do teams need to monitor when running on the Claude API?

Three operational needs come up over and over for teams running production workloads on the Claude API.

Cost attribution

A single Anthropic organization usually serves many internal teams and products, each with its own workspace, its own mix of models (Opus for the hardest reasoning tasks, Sonnet as the everyday workhorse, Haiku for simpler, high-volume tasks), and its own service tier choices (standard, batch, priority). When the monthly invoice arrives, the platform team needs to know which workspace, model, and tier each dollar went to, so they can split the bill back to the right team and decide which workloads should move to a cheaper model or to the batch tier.

Rate limit headroom

Anthropic enforces per-model rate limits on requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). The first time a team learns they're close to the ceiling is usually when production traffic starts being throttled. Surfacing configured limits alongside actual consumption lets platform teams see headroom in advance, plan capacity, and request limit increases before users feel the impact.

Granularity for every audience

The same data needs to serve different cadences. SREs want one-minute resolution to catch spikes and trigger alerts. Platform engineers want hourly views for capacity planning. Finance wants daily totals that reconcile cleanly to the Anthropic invoice. A single integration that exposes all three granularities removes the need to maintain separate pipelines for each audience.
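To make the relationship between the three granularities concrete, here is a minimal Python sketch that rolls one-minute token buckets up into hourly and daily totals. The input shape (timestamp and token count pairs) and the function name are hypothetical and chosen only for illustration; the integration itself does not require any such code.

```python
from collections import defaultdict
from datetime import datetime, timezone


def roll_up(buckets, width):
    """Sum token counts from fine-grained buckets into 'hour' or 'day' buckets.

    `buckets` is a list of (iso_timestamp, tokens) pairs (hypothetical shape).
    """
    totals = defaultdict(int)
    for ts, tokens in buckets:
        t = datetime.fromisoformat(ts).astimezone(timezone.utc)
        if width == "hour":
            key = t.replace(minute=0, second=0, microsecond=0)
        elif width == "day":
            key = t.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError(f"unsupported width: {width}")
        totals[key.isoformat()] += tokens
    return dict(totals)


# Made-up sample data: three one-minute buckets across two hours of one day.
minute_buckets = [
    ("2025-01-01T10:00:00+00:00", 1200),
    ("2025-01-01T10:01:00+00:00", 800),
    ("2025-01-01T11:30:00+00:00", 500),
]
hourly = roll_up(minute_buckets, "hour")
daily = roll_up(minute_buckets, "day")
```

The same records answer the SRE question (which minute spiked), the capacity question (which hour was busiest), and the finance question (what the day totaled), which is why a single source with selectable bucket width can replace separate pipelines.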

How Elastic polls the Anthropic Admin API for usage, cost and rate limit data

The integration runs on Elastic Agent and uses the Common Expression Language (CEL) input to poll Anthropic's Admin API on a schedule. Authentication is a single Admin API key, stored as an encrypted Fleet secret and redacted from agent logs. From a single configuration, the integration ingests three data streams into Elasticsearch:

Usage (metrics-anthropic_metrics.usage-*): token consumption per time bucket (1 minute, 1 hour, or 1 day), broken down by model, workspace, service tier, and inference geography.
Cost (metrics-anthropic_metrics.cost-*): daily cost in cents (converted to USD in dashboards), broken down by workspace, model, service tier, cost type, token type, context window, and inference geography.
Rate limits (metrics-anthropic_metrics.rate_limit-*): a snapshot of configured RPM, ITPM (cache-aware), and OTPM limits per model group, refreshed on each poll.

Ingest pipelines handle the parsing and field mapping so the data lands queryable, dashboard-ready, and aligned with the rest of Elastic Observability. Because the data is pulled from the Admin API at the organization level, you get this visibility without any application-side instrumentation or SDK changes.
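For readers who want a feel for what one poll looks like on the wire, the following Python sketch builds, but does not send, a request for one window of usage buckets. It is not the integration's implementation, which uses the CEL input. The endpoint path, query parameter names, header names, and group-by values are assumptions to verify against Anthropic's Admin API reference, and the key is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; check Anthropic's Admin API reference before relying on it.
USAGE_URL = "https://api.anthropic.com/v1/organizations/usage_report/messages"


def build_usage_request(admin_key, starting_at, bucket_width="1h",
                        group_by=("model", "workspace_id", "service_tier")):
    """Build (but do not send) a GET request for one window of usage buckets."""
    if bucket_width not in ("1m", "1h", "1d"):
        raise ValueError(f"unsupported bucket width: {bucket_width}")
    # Parameter names below are assumptions, not confirmed by this article.
    params = [("starting_at", starting_at), ("bucket_width", bucket_width)]
    params += [("group_by[]", dim) for dim in group_by]
    return Request(
        f"{USAGE_URL}?{urlencode(params)}",
        headers={"x-api-key": admin_key, "anthropic-version": "2023-06-01"},
        method="GET",
    )


# Placeholder key; a real Admin API key starts with sk-ant-admin.
req = build_usage_request("sk-ant-admin-PLACEHOLDER", "2025-01-01T00:00:00Z")
```

Because the request is scoped to the organization rather than to an individual application, nothing in the calling services has to change for this data to be collected.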

What you need to set up Anthropic API monitoring in Elastic

To get started with the Elastic Anthropic Metrics integration, you will need:

An Elastic deployment:
- Elastic Cloud Hosted (ECH): version 9.4.0 or higher
- Elastic Serverless: no version requirement, works out of the box
An Anthropic organization on a Team or Enterprise plan with Admin API access (individual Free, Pro, and Max accounts cannot create Admin API keys and are not supported)
An Admin API key (sk-ant-admin...) provisioned by an organization admin from the Claude Console under Settings → Admin keys
Elastic Agent installed on a host with outbound HTTPS access to api.anthropic.com (not needed if you choose the agentless deployment mode)

How to set up the Anthropic Metrics integration

Generate an Admin API key from the Claude Console (it starts with sk-ant-admin...).
Search for Anthropic Metrics in Kibana under Management → Integrations and click Add.
Pick your deployment mode: agentless for a zero-install experience, or Elastic Agent on your own host.
Tune the defaults if you like — each data stream has sensible defaults, but you can adjust them to match your needs:
- Usage: Polls every 5 minutes with 1-hour time buckets, grouped by model, workspace, service tier, and inference geography. Switch the bucket width to 1m for near-real-time alerting or 1d for finance-grade daily rollups. You can also add grouping dimensions such as api_key_id, context_window, or speed.
- Cost: Polls every hour. The Anthropic API returns daily cost buckets, so polling more frequently adds no new data. Cost is grouped by workspace and description; the description dimension carries model, service tier, cost type, token type, context window, and inference geography.
- Rate Limits: Polls every 15 minutes. This is a snapshot API — each poll returns the full current set of configured RPM, ITPM, and OTPM limits per model group.
Open the assets: within minutes, usage, cost, and rate limit data starts flowing and the pre-built dashboards and alerts are ready to use.
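The default schedules above can be summarized in a few lines of Python, which also shows how many polls per day each stream makes. The class and field names are hypothetical and are not the integration's configuration schema; only the intervals and bucket widths come from the defaults listed in the steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamDefaults:
    poll_interval_minutes: int
    bucket_width: Optional[str] = None  # None for the rate limit snapshot

    def polls_per_day(self) -> int:
        return 24 * 60 // self.poll_interval_minutes


# Values taken from the defaults described in the setup steps.
DEFAULTS = {
    "usage": StreamDefaults(5, "1h"),
    "cost": StreamDefaults(60, "1d"),
    "rate_limit": StreamDefaults(15),
}
polls = {name: d.polls_per_day() for name, d in DEFAULTS.items()}
```

With these defaults the usage stream polls 288 times a day, rate limits 96 times, and cost 24 times.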

For the full configuration reference, see the Anthropic Metrics integration documentation.

What the Anthropic dashboards show: usage, cost and rate limit views

The integration ships with pre-built Kibana dashboards that give you an immediate, queryable view of your organization's Claude API consumption. An executive overview pulls the headline numbers (total spend, total tokens, active workspaces, top models) into one place for a quick read on the state of your Anthropic usage.

From the overview, you can drill into the views that answer the three operational needs introduced earlier.

Token usage by model, workspace, and service tier

The usage dashboard breaks down token consumption (uncached input, cached input, cache-creation, and output) by model, workspace, and service tier (standard, batch, priority). This is the view that tells you where your token budget is actually going, which workloads are getting the most out of prompt caching, and which teams or models are driving the bulk of your consumption. Filter by workspace or model to scope the view to a single team or product.

Cost reporting and invoice reconciliation

The cost and billing dashboard reports daily spend in USD, broken down by workspace, model, cost type, token type, context window, and inference geography. An invoice reconciliation table maps spend back to the line items on your Anthropic bill, so finance and engineering can agree on the numbers without spreadsheet gymnastics. Inference geography views support data residency tracking for teams that need to know where their inference is running.
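As a rough illustration of the attribution behind these views, the sketch below sums cost records by any combination of dimensions and converts cents to USD. The record field names (`workspace`, `model`, `amount_cents`) and the sample values are hypothetical and do not reflect the integration's actual field mapping.

```python
from collections import defaultdict
from decimal import Decimal


def spend_by(records, *dims):
    """Sum cost (reported in cents) per combination of `dims`, returned in USD."""
    totals = defaultdict(Decimal)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        # Decimal avoids float rounding drift when reconciling to an invoice.
        totals[key] += Decimal(rec["amount_cents"]) / 100
    return dict(totals)


# Made-up sample records.
records = [
    {"workspace": "search", "model": "sonnet", "amount_cents": "1250.50"},
    {"workspace": "search", "model": "haiku", "amount_cents": "99.50"},
    {"workspace": "support", "model": "sonnet", "amount_cents": "400"},
]
by_workspace = spend_by(records, "workspace")
by_workspace_and_model = spend_by(records, "workspace", "model")
```

Grouping by workspace alone gives the chargeback view, while adding model to the key shows which workloads are candidates for a cheaper model or tier.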

Rate limit headroom: RPM, ITPM, OTPM

The rate limits dashboard surfaces your configured ceilings (requests per minute, input tokens per minute, output tokens per minute) for each model group, alongside actual consumption pulled from the usage stream. The headroom view tells you how close each model is to its limit, so platform teams can plan capacity and request increases before traffic starts getting throttled.
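The headroom calculation itself is simple arithmetic, sketched here in Python. The limit and peak values are made up, and the sketch does not model which token types count toward a cache-aware ITPM limit.

```python
def headroom(limit_per_minute, used_per_minute):
    """Return the fraction of a per-minute limit still unused (0.0 to 1.0)."""
    if limit_per_minute <= 0:
        raise ValueError("limit must be positive")
    return max(0.0, 1.0 - used_per_minute / limit_per_minute)


# Made-up configured limits and observed peak consumption for one model group.
limits = {"rpm": 4000, "itpm": 400_000, "otpm": 80_000}
peak = {"rpm": 1000, "itpm": 360_000, "otpm": 20_000}

report = {name: headroom(limits[name], peak[name]) for name in limits}
tightest = min(report, key=report.get)  # the limit closest to being hit
```

In this example the request rate has plenty of room, but input tokens are at 90% of their limit, so ITPM is the dimension to raise first.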

Pre-built alert rules for Anthropic API cost and usage

The integration ships with six pre-built alert rule templates covering cost, usage efficiency and routing.

Cost and budget alerts

Cost Anomaly: Fires when daily spend exceeds a configurable threshold, catching runaway workloads before their costs accumulate.
Monthly Budget Spend Limit: Tracks cumulative spend for the current calendar month and alerts when it crosses your budget ceiling.
Per-Workspace Daily Cost Spike: Surfaces individual workspaces whose daily spend exceeds a threshold, so a single team's spike doesn't hide under the org-wide total.
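The logic behind the budget and per-workspace rules can be sketched in a few lines of Python. This illustrates the checks only and is not the shipped ES|QL; the function names, thresholds, and sample figures are all hypothetical.

```python
def month_to_date(daily_spend, month):
    """Sum USD spend for days whose ISO date starts with `month` (YYYY-MM)."""
    return sum(usd for day, usd in daily_spend.items() if day.startswith(month))


def workspaces_over(daily_by_workspace, threshold_usd):
    """Return workspaces whose spend for one day exceeds the threshold, sorted."""
    return sorted(ws for ws, usd in daily_by_workspace.items() if usd > threshold_usd)


# Made-up daily totals in USD.
daily_spend = {"2025-01-30": 40.0, "2025-02-01": 120.0, "2025-02-02": 95.0}
mtd = month_to_date(daily_spend, "2025-02")
over_budget = mtd > 200.0  # hypothetical monthly budget ceiling

spiking = workspaces_over({"search": 80.0, "support": 15.0}, threshold_usd=50.0)
```

Checking each workspace separately is what keeps one team's spike from being averaged away in the organization-wide total.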

Usage and efficiency alerts

Token Consumption Spike: Alerts when any model's hourly token count exceeds a threshold, grouped by model so you can pinpoint the source.
Cache Hit Rate Drop: Fires when the input token cache hit ratio drops below 30%, signaling prompt changes or misconfigurations that increase cost and latency.
Single Model Dominance: Alerts when one model accounts for more than 90% of total token consumption, which may indicate a routing misconfiguration.

All thresholds are configurable directly in the ES|QL WHERE clause when you instantiate the template in Kibana.
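To show what the two ratio-based rules measure, here is a minimal Python sketch using the 30% and 90% thresholds mentioned above. The cache hit ratio definition (cached input divided by all input tokens, including cache creation) is an assumption, since the exact formula used by the shipped rule is not stated here, and the token counts are made up.

```python
def cache_hit_ratio(uncached_input, cached_input, cache_creation):
    """Share of input tokens served from cache. The denominator is an assumption."""
    total = uncached_input + cached_input + cache_creation
    return cached_input / total if total else 0.0


def dominant_model(tokens_by_model, threshold=0.90):
    """Return the model whose share of tokens exceeds `threshold`, else None."""
    total = sum(tokens_by_model.values())
    if total == 0:
        return None
    model, tokens = max(tokens_by_model.items(), key=lambda kv: kv[1])
    return model if tokens / total > threshold else None


ratio = cache_hit_ratio(uncached_input=700, cached_input=200, cache_creation=100)
cache_alert = ratio < 0.30  # would fire: only 20% of input came from cache

dominant = dominant_model({"opus": 950, "haiku": 50})  # opus holds 95% of tokens
```

Both checks are ratios rather than absolute counts, so they stay meaningful as overall traffic grows.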

Granularity for specific use cases

The same data powers different cadences. One-minute usage buckets feed near-real-time alerts when a workspace spikes or approaches its rate limit. Hourly views support operational monitoring and capacity planning. Daily aggregates roll up cleanly for finance reporting and reconcile with the Anthropic invoice. The pre-built alert rules described above already cover usage and spend thresholds, so you don't have to build them from scratch.

Get started with Anthropic API monitoring in Elastic

The Elastic Anthropic Metrics integration is available today in Elastic Cloud (both Elastic Cloud Hosted and Elastic Serverless). To get started, sign up for a free Elastic Cloud trial, provision an Admin API key in the Claude Console, and add the Anthropic Metrics integration from Kibana under Management → Integrations. Within minutes you'll have token usage, cost, and rate limit data flowing into Elasticsearch, with the pre-built dashboards and out-of-the-box alerts ready to use.
