Elasticsearch: best-in-class for logs, now best-in-class for metrics

Over the past few months, Elastic has shipped a columnar storage engine in Elasticsearch purpose-built for time series data, native Prometheus ingest and storage, and PromQL support. We've also delivered a new metrics exploration experience, pre-built infrastructure dashboards, agentic investigation, and a migration path from Datadog and Grafana. Capabilities now include:

Elasticsearch is a Prometheus-compatible metrics backend: it ingests Prometheus Remote Write, and PromQL now works natively in Kibana, with no translation layer required.
Metrics land in Elasticsearch's columnar TSDS architecture, which stores data up to 2.5× more efficiently than Prometheus and 2× more efficiently than ClickHouse.
ES|QL time series queries run up to 30× faster than Prometheus on gauge averages and counter rates, including high-cardinality workloads.
Elastic costs approximately 50% less than Datadog, with no custom metric classification and no cardinality-based billing.
Grafana can query Elasticsearch directly through the native Prometheus API, keeping your visualization layer while replacing the backend.
Kubernetes and AWS monitoring ship with pre-built dashboards, alert templates, ML anomaly jobs, and agentic investigation content ready at ingest. Agent skills and MCP Apps are also available.
A unified backend for metrics, logs, and traces enables agentic investigations without stitching context across tools.
Metrics exploration in Discover lets anyone start querying and analyzing metrics immediately, no query language expertise required.
Custom dashboarding is fast and flexible — dashboards-as-code, AI-assisted dashboard creation, variable controls, and collapsible panels mean less time building and more time investigating.
Migration tooling helps move dashboards and alerting rules or monitors from Datadog and Grafana.

Elasticsearch metrics now competes on every dimension that matters to SREs: you can afford to keep every metric at full resolution, query it up to 30× faster than Prometheus, pay approximately 50% less than Datadog, migrate dashboards and alerting rules from Grafana or Datadog easily, and go from alert to root cause without stitching context across disconnected tools. The rest of this post walks through each of these in detail.

Elasticsearch metrics performance: 30× faster than Prometheus and Mimir

Datadog and Prometheus force the same tradeoff: drop high-cardinality data or watch costs spiral. SREs managing Kubernetes, AWS, or any high-cardinality infrastructure know the specific shape of this problem. The Kubernetes labels, ephemeral pod data, and fine-grained OTel dimensions that matter most during an incident are the first to go when budgets tighten.

Elastic rebuilt the time series data store and ES|QL compute engine into a fully columnar metrics engine. Adding a new Kubernetes label, a new AWS instance tag, or a new application dimension doesn't strain the system; it adds far less cost than systems that index every label. OTel, Prometheus, and application-defined metrics all land in the same columnar backend at full resolution, with logs, traces, and metrics in a single store. No data dropped, no retention shortened.

Elasticsearch stores metrics up to 2.5× more efficiently than Prometheus (results may differ due to factors like compaction), and 2× more efficiently than ClickHouse. Query performance via ES|QL runs up to 30× faster than Prometheus on gauge averages and counter rates, including high-cardinality workloads where competitors stall. The architecture post covers how TSDS is organized and why the columnar layout produces these results.

Dimension	vs. Prometheus	vs. Mimir	vs. ClickHouse
Query performance (ES|QL)	Up to 30× faster	Up to 30× faster	Up to 8× faster
Storage efficiency	Up to 2.5× better	On par	2× better

The key architectural difference is that Elasticsearch metrics does not maintain per-series in-memory state that scales with cardinality, so adding thousands of new Kubernetes pod labels or OTel dimensions doesn't drive up memory pressure.

OTel, Prometheus-native, and application-defined metrics are all stored the same way at full resolution, queried fast, at half the cost of Datadog.

Elastic Observability metrics pricing without the Datadog custom metric penalties

Observability cost is the #1 reason teams switch platforms. For Datadog customers, the pain comes down to one pricing mechanic: custom metrics. Any user-defined value outside of Datadog's built-in integrations is classified as a custom metric and billed at a premium rate. That includes the high-cardinality data that Kubernetes, OpenTelemetry, and cloud-native workloads generate by default. The more granular your instrumentation, the faster the bill compounds. Teams running modern infrastructure hit this ceiling quickly, and the response is predictable: drop data, shorten retention, lose the context that matters most when an incident happens.

Elasticsearch metrics removes that classification. Every metric is priced the same, with no per-metric penalties, no cardinality-based billing, and no forced rollups. You keep every metric at full resolution without a surprise invoice at the end of the month. And because Elastic costs approximately 50% less than Datadog, the conversation with finance changes: not what data you had to drop to stay on budget, but what you found because you kept everything. Keeping everything in one store is also why the AI investigation works. Unlike Grafana's fragmented LGTM stack, the context is already unified when the alert fires, not assembled by hand across disconnected tools.

Native Prometheus and PromQL support in Elasticsearch

Most SRE teams aren't running a clean, single-format telemetry pipeline. Prometheus is deeply embedded in applications, services, platforms, and automations. Migrating metrics backends historically meant rewriting queries, rebuilding dashboards, and retraining engineers — enough friction that teams stay on platforms they've outgrown rather than go through it.

Elasticsearch metrics has removed most of that friction. Prometheus metrics arrive via Prometheus Remote Write and land in the same columnar store without semantic changes, preserving full metric fidelity end to end. Point your Remote Write endpoint at Elasticsearch instead of Mimir and the data flows. No translation layer, no changes to existing scrape configs.

PromQL now works natively in Kibana, so engineers who live in PromQL don't have to change how they work. Existing PromQL queries, dashboards, and alert rules migrate into Kibana directly.

PromQL queries work unchanged on Elasticsearch

If your team already writes PromQL, nothing needs to change. These queries run as-is against Elasticsearch as your backend — copy, paste, and go.

CPU usage rate (container-level): The per-second CPU rate across containers, grouped by pod. Useful for spotting which pods are burning CPU during an incident.

PROMQL sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))

Memory working set (container-level): Current memory in active use per container. This is the number that matters for OOM risk, not total allocated memory.

PROMQL sum by (container) (avg_over_time(container_memory_working_set_bytes[5m]))

HTTP request rate (application-level): Per-second request throughput grouped by instance. A standard first signal when investigating latency or error spikes.

PROMQL sum by (instance) (rate(http_requests_total[5m]))

All three follow standard PromQL syntax. If you use Elasticsearch as your backend, they run without modification. For the full syntax reference and what's covered, see the PromQL support documentation.
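For readers less familiar with what the rate() calls in these queries compute, here is a simplified Python sketch of a per-second counter rate with counter-reset handling. It is an illustration only, not how Elasticsearch or Prometheus implements it: the function name `simple_rate` and the sample values are hypothetical, and real PromQL `rate()` also extrapolates to the window boundaries, which this sketch omits.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, counter_value)


def simple_rate(samples: Sequence[Sample]) -> float:
    """Per-second increase of a counter over the samples' time span.

    Simplified: handles counter resets, but does not extrapolate to the
    window boundaries the way PromQL's rate() does.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")

    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new increase.
        increase += curr - prev if curr >= prev else curr

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("last timestamp must be later than the first")
    return increase / elapsed


# Hypothetical CPU-seconds counter scraped every 60 s, with one restart.
cpu_seconds = [(0.0, 100.0), (60.0, 130.0), (120.0, 10.0), (180.0, 40.0)]
print(simple_rate(cpu_seconds))
```

The restart between the second and third samples is why the sketch adds the post-reset value instead of a negative difference; without that step a pod restart would show up as a negative CPU rate.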

The native Prometheus API makes Elasticsearch a fully Prometheus-compatible backend. Any Prometheus-compatible frontend (Grafana included) can query Elasticsearch directly, so teams that want to keep Grafana as their visualization layer while consolidating onto Elasticsearch can do exactly that without modifying existing dashboards or alert rules.
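As a rough sketch of what a Prometheus-compatible frontend sends, the Python snippet below builds a range-query URL using the standard Prometheus HTTP API path `/api/v1/query_range` and its `query`, `start`, `end`, and `step` parameters. The base URL is a hypothetical placeholder, and whether your deployment serves the API under a path prefix is an assumption to check against the Elastic documentation. No request is sent.

```python
from urllib.parse import urlencode

# Hypothetical base URL: substitute the Prometheus-compatible endpoint
# documented for your deployment.
BASE_URL = "https://elasticsearch.example.com/prometheus"


def range_query_url(base_url: str, promql: str, start: int, end: int, step: str) -> str:
    """Build a standard Prometheus /api/v1/query_range request URL."""
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


url = range_query_url(
    BASE_URL,
    "sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))",
    start=1_700_000_000,  # Unix seconds, arbitrary example window
    end=1_700_003_600,
    step="60s",
)
print(url)
```

Because the request shape is the same one Grafana's Prometheus data source issues, only the base URL differs when the backend changes.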

When SREs need to go deeper than PromQL allows, ES|QL works across metrics, logs, and traces in a single interface. The TS command handles the time series specifics: counter rates, gauge averages, window functions, and multilevel aggregations across high-cardinality dimensions. The same query that pulls a CPU counter rate can join against logs from the same host and surface the deployment event that preceded the spike. No tool switching, no new query language. The query language, the dashboards, the alert rules, the visualization layer — all of it carries over. The only thing that changes is that Elasticsearch is the single backend powering everything.

Elastic Observability: out-of-the-box dashboards, alerts, and infrastructure content

Most observability vendors require you to build everything from scratch. Elastic Observability has reduced this need across three areas:

Metrics exploration in Discover. The new Elasticsearch metrics exploration experience lets SREs explore metrics in the same interface used for logs, with no tab switching and no duplicate queries. Connect an OTel pipeline or Prometheus scrape config, open Streams, and every metric in the data stream renders as a time series chart immediately. No dashboard to build, no query to write. This is where teams can validate data, spot patterns, start building alerts and SLOs from a live view of what's flowing, and cross-correlate with logs, traces, and other data indexed in Elasticsearch.

Dashboards. Kibana dashboards have gained collapsible panels with lazy loading, so panels that aren't immediately visible don't generate queries until they're needed, and ES|QL control variables that let SREs manipulate visualizations through dropdowns without writing new queries. Dashboards-as-code is also shipping, enabling version-controlled dashboard definitions that can be templated, shared, and deployed programmatically across environments.

Out-of-the-box infrastructure content. Elastic is shipping two new out-of-the-box (OOTB) infrastructure experiences:

The new Kubernetes integration ships with hierarchical dashboards, alert rule templates, ML anomaly detection jobs, and the context and prompts needed for AI-assisted root cause analysis — all pre-configured and ready the moment data starts flowing.

AWS infrastructure monitoring follows the same pattern: OOTB content for core AWS services activates at ingest, so teams aren't starting from scratch every time a new service or account comes online. The same approach extends to databases and other core infrastructure — the platform arrives opinionated, not blank.

Agentic investigations across your infrastructure with Elastic Observability

Elasticsearch correlates metrics, logs, and traces in a single backend, so the investigation context is assembled before an engineer is paged.

The hard part is 2am. An RDS instance hitting connection limits, starving services upstream. An Auto Scaling group failing health checks for a reason buried in application logs. A pod restart cascading across a namespace.

In a Grafana LGTM stack, you're opening three tabs before you have enough context to form a hypothesis.

In Datadog, the context is unified but the AI is a black box: no BYO-LLM, no data residency options.

In Elastic, metrics, logs, and traces share a single backend and a common schema, so the investigation context is already assembled when the alert fires — no manual correlation across tools, no context lost in translation between query languages. ML anomaly detection runs automatically against infrastructure metrics (Kubernetes, AWS, databases), so the investigation starts from a scored anomaly with context about what's typical, what changed, and how severe the deviation is, not just a raw threshold breach.

When an alert fires, Elastic's investigation workflow correlates signals, assembles root cause context, and surfaces recommended next steps before anyone is paged. The agentic Kubernetes observability post walks through a complete example end to end. The EKS troubleshooting walkthrough shows how Agent Builder and MCP work together for a full root cause loop across EC2, EKS, and related AWS services.

In addition to investigating issues in Elastic Observability, you can analyze them in Claude, Cursor, VS Code, or your favorite tool using MCP Apps and agent skills from Elastic. The Observability MCP App extends the analysis to wherever your team already works: the same investigation capabilities (infrastructure health rollup, service dependency graph, anomaly detail, blast radius analysis) render as interactive views directly in the conversation. Neither Grafana nor Datadog offers this.

Observability MCP App — Connects Claude, Cursor, VS Code, or any MCP-compatible tool directly to your Elasticsearch data, so infrastructure health, service dependencies, and anomaly context surface as interactive views inside the conversation without leaving your tool of choice. See how it works with Kubernetes.

Agent Skills — Pre-built skills for Kubernetes, AWS, and other core infrastructure let any agent — in Elastic or your own — run structured investigations against your observability data without custom prompt engineering. Drop them into Claude, Cursor, or your own agent pipeline and they work out of the box. Explore the observability skills or browse the skills library on GitHub.

Migrating from Datadog or Grafana to Elastic Observability

The most common reason SRE teams don't switch observability platforms is migration. Moving years of alert rules, hundreds of dashboards, and runbook-embedded PromQL queries is a daunting operational task, and the cost of maintaining parallel stacks while doing it compounds every day.

The Observability Migration Platform handles the translation automatically. Point the CLI, or Claude or Cursor with Elastic's agent skills, at your Datadog org or Grafana instance and it converts supported dashboards, alert rules, and PromQL queries into Kibana-native outputs. The tool reports what was fully migrated, what needed tweaks, and what still requires input from you to complete the migration. You move what you've already built.

On the ingest side, Prometheus Remote Write means the scrape pipeline requires no changes. The remote_write target points to Elasticsearch instead of another Prometheus-compatible backend, and the data lands in the same columnar store. Workflows, queries, and alert configurations carry over without change. For teams that want to keep Grafana as a visualization layer during or after migration, the native Prometheus API and PromQL support in Kibana mean the transition can be phased rather than cut over all at once.

Elasticsearch as a backend for Grafana

For teams not ready to leave Grafana, replacing the backend is a migration path in its own right, and there are two ways to do it depending on your workflow.

If your team runs Prometheus today, the lowest-friction path is Grafana's Prometheus data source. Elasticsearch now exposes a native Prometheus-compatible API, so you can point Grafana's existing Prometheus plugin directly at Elasticsearch. No sidecars, no adapters, no pipeline changes required. Existing PromQL dashboards, alert rules, and variable dropdowns work without modification, including Grafana's Metrics Drilldown explorer. Add Elasticsearch as a remote_write target in your Prometheus config and swap the data source URL. That's the full migration for most teams. See the end-to-end setup guide.
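The two changes described above can be sketched as configuration fragments. The Python snippet below renders a minimal `remote_write` entry for `prometheus.yml` and a Grafana provisioning entry for a Prometheus-type data source. Both URLs are hypothetical placeholders, authentication is deliberately left out, and the helper names are illustrative; take the real endpoints and credentials setup from the end-to-end setup guide.

```python
# Both URLs are hypothetical placeholders; use the endpoints documented
# for your Elasticsearch deployment. Authentication is omitted here.
REMOTE_WRITE_URL = "https://elasticsearch.example.com/REMOTE_WRITE_PATH"
PROMETHEUS_API_URL = "https://elasticsearch.example.com/PROMETHEUS_API_PATH"


def prometheus_remote_write(url: str) -> str:
    """Fragment for prometheus.yml: forward scraped samples to `url`."""
    return f'remote_write:\n  - url: "{url}"\n'


def grafana_datasource(name: str, url: str) -> str:
    """Grafana provisioning fragment for a Prometheus-type data source."""
    return (
        "apiVersion: 1\n"
        "datasources:\n"
        f"  - name: {name}\n"
        "    type: prometheus\n"
        "    access: proxy\n"
        f'    url: "{url}"\n'
    )


print(prometheus_remote_write(REMOTE_WRITE_URL))
print(grafana_datasource("Elasticsearch", PROMETHEUS_API_URL))
```

Scrape jobs are untouched in this sketch: only the destination of the samples and the URL Grafana queries change.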

For teams that want to go further and query logs, metrics, and traces together from a single Grafana query editor, the official Grafana Elasticsearch plugin now ships with ES|QL support. This unlocks cross-signal correlation directly in Grafana, with Elasticsearch handling all three data types in a unified columnar backend. See how to set it up.

Either way, keep Grafana, replace Mimir and Loki, and gain the full benefit of Elasticsearch's columnar storage and query performance underneath. Years of operational work, preserved. The migration that teams have been putting off becomes a backend swap.

What's GA and what's in tech preview

Capability	Status
Columnar metrics engine (TSDS)	GA
ES|QL time series support	GA
PromQL support in Kibana	GA
Prometheus Remote Write ingest	GA
Kubernetes infrastructure OOTB experience	GA
AWS infrastructure OOTB experience	Tech Preview
Observability MCP App	Tech Preview
Agent skills	Tech Preview
Observability Migration Platform	Tech Preview

The individual posts linked throughout cover GA versus preview specifics and known limitations.

All of this (the columnar metrics engine, native PromQL, agentic investigations, and migration tooling) runs across Elastic's three deployment modes: serverless, Elastic Cloud, and self-managed. Datadog has no on-prem option; Grafana Cloud limits its highest-value features to hosted deployments. With Elastic, you choose where your data lives.

Elastic Observability: lower cost without dropping data

Modern cloud infrastructure broke the observability model built around separate tools for separate signals. The cost is real: duplicate tooling bills, manual correlation during incidents, and data dropped just to stay on budget.

A single backend that stores every signal efficiently means you keep what you need without the bill that usually comes with it. That's a different kind of conversation to have with finance: not "we had to drop data to stay on budget," but "here's what we found." The AI gets the full picture because there's only one picture, and the platform arrives with enough pre-built content to be useful on day one, not after weeks of dashboard toil.

That's possible because of how Elasticsearch is built differently from the platforms you're likely replacing:

Columnar metrics storage in TSDS index mode stores metrics data highly efficiently.
Native Prometheus compatibility means existing scrape configs, PromQL queries, and dashboards work without rewriting.
Unified metrics, logs, and traces in a single backend means investigation context is assembled at query time, not manually across tabs.
Search and analytics in the same engine — an inverted index for logs, a columnar index for metrics, queried together with ES|QL.
Agentic investigations that correlate signals, surface anomalies, and suggest remediation before anyone is paged.
Serverless, Elastic Cloud, or self-managed — you choose where your data lives, which Datadog cannot offer.

The cost conversation with finance becomes about what you found, not what you spent.

Frequently asked questions

Is Elasticsearch now a production-ready metrics platform?

Yes. As of June 2026, Elasticsearch ships a rebuilt columnar storage engine purpose-built for time series data, native Prometheus Remote Write ingest, PromQL support in Kibana, ES|QL time series querying, and out-of-the-box infrastructure dashboards for Kubernetes and AWS. The columnar metrics engine, ES|QL time series support, PromQL, and Prometheus ingest are all generally available in Elastic Serverless and will soon be GA in Elastic Cloud Hosted.

How does Elasticsearch compare to Datadog for metrics cost?

In comparable metrics workloads, Elastic Observability Serverless costs significantly less than Datadog — in illustrative examples based on published list pricing, more than 50% less, and often closer to two-thirds less. The gap is structural: Datadog bills primarily per host, then adds charges for custom metrics and containers as instrumentation grows. The cost difference is largest for exactly the workloads where Datadog bills most: high-cardinality, densely instrumented environments like Kubernetes and OTel.

How does Elasticsearch metrics performance compare to Prometheus and Grafana Mimir?

ES|QL queries on Elasticsearch run up to 30× faster than Prometheus and Mimir on gauge averages and counter rates, including high-cardinality workloads. Elasticsearch stores OTel metrics at 3.75 bytes per data point, which is up to 2.5× more efficient than Prometheus and 2× more efficient than ClickHouse.
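To put the bytes-per-data-point figure in concrete terms, here is a back-of-the-envelope Python sketch. The workload (one million active series, a 15-second interval, 30 days of retention) is a hypothetical example, and the estimate ignores replicas, index metadata, and workload-dependent compression differences, so treat it as a rough sizing aid and not a benchmark result.

```python
BYTES_PER_POINT = 3.75  # figure quoted above for OTel metrics


def estimated_storage_bytes(
    series: int,
    scrape_interval_s: int,
    retention_days: int,
    bytes_per_point: float = BYTES_PER_POINT,
) -> float:
    """Rough primary-storage estimate: series x points per series x bytes."""
    points_per_series = retention_days * 86_400 // scrape_interval_s
    return series * points_per_series * bytes_per_point


# Hypothetical workload: 1M active series, 15 s interval, 30 days retention.
total = estimated_storage_bytes(1_000_000, 15, 30)
print(f"{total / 1e9:.0f} GB")  # prints "648 GB"
```

Each series contributes 172,800 points over the 30 days, so the total is 1.728 × 10^11 points, or about 648 GB at 3.75 bytes each.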

Can teams migrate from Datadog or Grafana to Elasticsearch without rebuilding everything?

Yes. Elastic's Observability Migration Platform converts supported Datadog and Grafana dashboards and alert rules into Kibana-native outputs and migrates PromQL queries into Kibana as-is. Teams can also keep Grafana as a visualization layer while replacing the backend with Elasticsearch, using the native Prometheus API and PromQL support in Kibana.

What makes Elasticsearch different from Grafana for metrics observability?

Elasticsearch stores metrics, logs, and traces in a single unified backend with one query language (ES|QL), while Grafana's LGTM stack splits metrics (Mimir/Prometheus) and logs (Loki) across separate backends requiring separate query languages. Elasticsearch also ships agentic investigation capabilities, including AI Agent, Workflows, MCP App, and Agent skills, a more comprehensive set than Grafana offers.

Does Elasticsearch support Prometheus and PromQL natively?

Yes, in two distinct ways. First, Elasticsearch accepts Prometheus metrics via Prometheus Remote Write and exposes a native Prometheus-compatible API, so it can serve as a backend for any Prometheus-compatible frontend, including Grafana. Second, Kibana supports PromQL natively, meaning existing queries, dashboards, and alert rules run directly in Kibana without a translation layer or modification.

What infrastructure monitoring content ships out of the box with Elastic Observability?

Elastic ships pre-built dashboards, alert templates, and ML anomaly detection jobs across hundreds of infrastructure integrations covering hosts, containers, cloud services, databases, network devices, and more. For Kubernetes and AWS specifically, the platform also includes agentic investigation content such as agent skills and an Observability MCP App that lets teams run investigations directly from Claude, Cursor, or VS Code. All of this is available at ingest with no configuration required.
