<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Prometheus</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Mon, 11 May 2026 19:10:25 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Prometheus</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Query Prometheus Metrics in Elasticsearch with Native PromQL Support]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elasticsearch-supports-promql</link>
            <guid isPermaLink="false">elasticsearch-supports-promql</guid>
            <pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Elasticsearch now supports PromQL natively as a first-class source command in ES|QL. Run familiar Prometheus queries on your time series data directly in Kibana.]]></description>
            <content:encoded><![CDATA[<p>Many teams already rely on PromQL in their day-to-day work.
We're making PromQL a first-class experience in Elasticsearch.</p>
<p>The new <a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/promql"><code>PROMQL</code></a> command in ES|QL lets you query time series data in Elasticsearch with PromQL, whether it came from Prometheus Remote Write, OpenTelemetry, or another source.</p>
<p>Metrics, logs, and traces - all in one place, ready to explore in Kibana.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-supports-promql/image1.png" alt="" /></p>
<h2>The PROMQL source command</h2>
<p><a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/promql"><code>PROMQL</code></a> is a source command in ES|QL, similar to <code>FROM</code> or <code>TS</code>.
It takes standard PromQL parameters and a PromQL expression, executes the query, and returns the results as regular ES|QL columns that you can continue to process with other commands.</p>
<p>Here is the general syntax:</p>
<pre><code class="language-esql">PROMQL [index=&lt;pattern&gt;] [step=&lt;duration&gt;] [start=&lt;timestamp&gt;] [end=&lt;timestamp&gt;]
  [&lt;value_column_name&gt;=](&lt;PromQL expression&gt;)
</code></pre>
<p>The parameters mirror the <a href="https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries">Prometheus HTTP API query parameters</a> (<code>step</code>, <code>start</code>, <code>end</code>), so they should feel familiar if you have used the Prometheus query API before.</p>
<h3>A basic range query</h3>
<p>This query calculates the per-second rate of HTTP requests over a sliding 5-minute window, grouped by instance:</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>
<p>The result contains three columns:</p>
<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>sum by (instance) (rate(http_requests_total[5m]))</code></td>
<td><code>double</code></td>
<td>The computed metric value</td>
</tr>
<tr>
<td><code>step</code></td>
<td><code>date</code></td>
<td>The timestamp for each evaluation step</td>
</tr>
<tr>
<td><code>instance</code></td>
<td><code>keyword</code></td>
<td>The grouping label from <code>by (instance)</code></td>
</tr>
</tbody>
</table>
<p>When the PromQL expression includes a cross-series aggregation like <code>sum by (instance)</code>, each grouping label becomes its own output column.
When there is no cross-series aggregation, all labels are returned in a single <code>_timeseries</code> column as a JSON string.</p>
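<p>For illustration, here is a sketch of a query with no cross-series aggregation (reusing the <code>http_requests_total</code> counter and parameters from the example above); it returns one row per series and evaluation step, with all labels packed into <code>_timeseries</code>:</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  rate(http_requests_total[5m])
</code></pre>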
<h3>Naming the value column</h3>
<p>By default, the value column name is the PromQL expression itself.
You can assign a custom name to make it easier to reference in downstream commands:</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| SORT http_rate DESC
</code></pre>
<p>This works the same way as naming aggregations in <code>STATS</code>, for example <code>STATS avg_cpu = avg(system.cpu.usage)</code>.</p>
<h3>Index patterns</h3>
<p>The <code>index</code> parameter accepts the same patterns as <code>FROM</code> and <code>TS</code>, including wildcards and comma-separated lists.
If omitted, it defaults to <code>*</code>, which queries all indices configured with <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/time-series-data-stream-tsds"><code>index.mode: time_series</code></a>.
In production, specifying an explicit index pattern avoids scanning unrelated data.</p>
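<p>For example, a sketch that scopes the query to production Prometheus data streams only (the index pattern here is illustrative, not a default):</p>
<pre><code class="language-esql">PROMQL index=metrics-*.prometheus-prod
  step=1m
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>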
<h2>How it works under the hood</h2>
<p>The <code>PROMQL</code> command does not run a separate query engine.
Instead, <code>PROMQL</code> commands execute inside the ES|QL compute engine, using the same logic as time-series aggregations through the <a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/ts"><code>TS</code></a> source command.</p>
<p>Consider this PromQL query:</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  sum by (host.name) (rate(http_requests_total[5m]))
</code></pre>
<p>Internally, the <code>PROMQL</code> command translates this into an equivalent ES|QL query using the <code>TS</code> source:</p>
<pre><code class="language-esql">TS metrics-*
| WHERE TRANGE(&quot;2026-04-01T00:00:00Z&quot;, &quot;2026-04-01T01:00:00Z&quot;)
| STATS SUM(RATE(http_requests_total, 5m)) BY TBUCKET(1m), host.name
</code></pre>
<p>Both queries produce the same result.
The <code>PROMQL</code> command parses the PromQL syntax, resolves functions to their ES|QL equivalents (<code>rate</code> to <code>RATE</code>, <code>sum</code> to <code>SUM</code>, <code>avg_over_time</code> to <code>AVG_OVER_TIME</code>, and so on), and constructs a logical plan that the ES|QL engine executes.</p>
<p>This translation approach has a practical benefit: PromQL queries automatically benefit from all the optimizations in the ES|QL engine, including segment-level parallelism and time series-aware data access patterns.</p>
<p>There are currently 19 time series functions available, covering rates, deltas, derivatives, and various <code>*_over_time</code> aggregations.</p>
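<p>As a quick illustration of the <code>*_over_time</code> family, here is a sketch that averages a gauge over a 10-minute window (the <code>go_goroutines</code> metric name is just an example and may not exist in your data):</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  step=1m
  avg by (instance) (avg_over_time(go_goroutines[10m]))
</code></pre>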
<h2>Smart defaults that simplify queries</h2>
<p>In Prometheus, a PromQL query requires explicit <code>start</code>, <code>end</code>, and <code>step</code> parameters.
In Kibana, those are usually determined by the date picker and panel size.
The <code>PROMQL</code> command has three features that make queries adapt automatically.</p>
<h3>Auto-step</h3>
<p>If you omit the <code>step</code> parameter, the command derives it automatically based on the time range and a target bucket count (default: 100).
You can also set the target explicitly with <code>buckets=&lt;n&gt;</code>.</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>
<p>With a 1-hour range and the default target of 100 buckets, the step would be 1m, resulting in 60 buckets.
This uses the same date-rounding logic as the ES|QL <a href="https://www.elastic.co/docs/reference/query-languages/esql/functions-operators/grouping-functions#esql-bucket"><code>BUCKET</code></a> function.</p>
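<p>If you want a coarser resolution, you can lower the bucket target instead of setting a step. A sketch (the derived step follows the same date-rounding logic, so treat the exact value as approximate):</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  buckets=20
  start=&quot;2026-04-01T00:00:00Z&quot;
  end=&quot;2026-04-01T01:00:00Z&quot;
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>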
<h3>Inferred start and end</h3>
<p>Kibana adds a time range filter to every ES|QL request via a Query DSL <code>range</code> filter on <code>@timestamp</code>.
The <code>PROMQL</code> command extracts those bounds and uses them as <code>start</code> and <code>end</code> when they are not specified in the query.
The command picks up the date picker range from the request context without any additional configuration.</p>
<h3>Implicit range selectors</h3>
<p>In standard PromQL, functions like <code>rate</code> require a range selector: <code>rate(http_requests_total[5m])</code>.
The <code>PROMQL</code> command allows omitting the range selector entirely:</p>
<pre><code class="language-esql">PROMQL sum by (instance) (rate(http_requests_total))
</code></pre>
<p>When the range selector is absent, the window is determined automatically as <code>max(step, scrape_interval)</code>.
The <code>scrape_interval</code> defaults to <code>1m</code> and can be overridden with the <code>scrape_interval</code> parameter if your data has a different collection interval, for example: <code>PROMQL scrape_interval=15s sum(rate(http_requests_total))</code>.</p>
<h3>The result</h3>
<p>Combining all three defaults, a fully adaptive query in Kibana looks like this:</p>
<pre><code class="language-esql">PROMQL sum(rate(http_requests_total))
</code></pre>
<p>This query responds to the date picker, adjusts the step size to the selected time range, and sizes the range selector window accordingly.
No manual tuning needed.</p>
<h2>Post-processing with ES|QL</h2>
<p>Because <code>PROMQL</code> is an ES|QL source command, its output flows into the rest of the ES|QL pipeline.
You can filter, sort, enrich, and transform PromQL results using any ES|QL command.</p>
<h3>Filter results</h3>
<pre><code class="language-esql">PROMQL index=metrics-*
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| WHERE http_rate &gt; 100
</code></pre>
<h3>Sort and limit</h3>
<pre><code class="language-esql">PROMQL index=metrics-*
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| SORT http_rate DESC
| LIMIT 10
</code></pre>
<h3>Enrich with a lookup</h3>
<pre><code class="language-esql">PROMQL index=metrics-*
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| LOOKUP JOIN instance_metadata ON instance
</code></pre>
<p>This is something you cannot do in Prometheus.
PromQL results are self-contained; there is no way to join them with external data or apply arbitrary post-processing.
In Elasticsearch, the PromQL output is just the first stage of a query that can continue with any ES|QL operation.</p>
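<p>Putting these together, here is a sketch of a single pipeline that enriches, filters, and re-aggregates the PromQL output (the <code>instance_metadata</code> lookup index and its <code>team</code> field are hypothetical):</p>
<pre><code class="language-esql">PROMQL index=metrics-*
  http_rate=(sum by (instance) (rate(http_requests_total[5m])))
| LOOKUP JOIN instance_metadata ON instance
| WHERE http_rate &gt; 100
| STATS max_rate = MAX(http_rate) BY team
| SORT max_rate DESC
</code></pre>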
<h2>Current coverage and what's next</h2>
<p>In 9.4, the <code>PROMQL</code> command will be available as a tech preview with over 80% query coverage, benchmarked against popular open-source Grafana dashboards.</p>
<p>The most notable gaps in the current tech preview:</p>
<ul>
<li><strong>Group modifiers</strong> like <code>on(chip) group_left(chip_name)</code> are not yet supported.</li>
<li><strong>Binary set operators</strong> (<code>or</code>, <code>and</code>, <code>unless</code>) are not yet available.</li>
<li><strong>Some functions</strong> are still missing, including <code>histogram_quantile</code>, <code>predict_linear</code>, and <code>label_join</code>.</li>
</ul>
<p>These are all planned for upcoming releases.
The roadmap includes broader PromQL function and operator coverage, Prometheus-aligned step semantics, and support for native histograms.</p>
<h2>Try it</h2>
<p>PromQL support is available as a tech preview on Elasticsearch Serverless with no additional configuration.
For self-managed clusters, it is available starting with version 9.4.</p>
<p>To try it in Kibana:</p>
<ol>
<li>Go to <strong>Dashboards</strong>, create a new panel, and select <strong>ES|QL</strong> as the query type.</li>
<li>Enter a <code>PROMQL</code> query, for example: <code>PROMQL index=metrics-* sum by (host.name) (rate(http_requests_total))</code>.</li>
<li>The command automatically infers the time range from the Kibana date picker, so no additional parameters are needed.</li>
</ol>
<p>You can also run PromQL queries in the ES|QL mode of <strong>Discover</strong>, which shows results in a table and an XY chart.
Stay tuned for a full walkthrough of using PromQL in Kibana Dashboards, Discover, and Alerting in a dedicated Kibana blog post.</p>
<p>For the full command reference, including all options and examples, see the <a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/promql"><code>PROMQL</code> command documentation</a>.</p>
<p>If you want to try it with a self-managed cluster, check out <a href="https://www.elastic.co/docs/deploy-manage/deploy/self-managed/local-development-installation-quickstart">start-local</a> to get up and running quickly.</p>
<p>If you run into issues or have feedback, open an issue on the <a href="https://github.com/elastic/elasticsearch">Elasticsearch repository</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elasticsearch-supports-promql/cover.svg" length="0" type="image/svg"/>
        </item>
        <item>
            <title><![CDATA[Ingesting and analyzing Prometheus metrics with Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/ingesting-analyzing-prometheus-metrics-observability</link>
            <guid isPermaLink="false">ingesting-analyzing-prometheus-metrics-observability</guid>
            <pubDate>Mon, 09 Oct 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[In this blog post, we will showcase the integration of Prometheus with Elastic, emphasizing how Elastic elevates metrics monitoring through extensive historical analytics, anomaly detection, and forecasting, all in a cost-effective manner.]]></description>
            <content:encoded><![CDATA[<p>In the world of monitoring and observability, <a href="https://prometheus.io/">Prometheus</a> has grown into the de-facto standard for monitoring in cloud-native environments because of its robust data collection mechanism, flexible querying capabilities, and integration with other tools for rich dashboarding and visualization.</p>
<p>Prometheus is primarily built for short-term metric storage, typically retaining data in memory or on local disk, with a focus on real-time monitoring and alerting rather than historical analysis. While it offers valuable insights into current metric values and trends, it can become costly at scale and falls short of the capabilities needed for in-depth historical analysis, long-term trend detection, and forecasting. This is particularly evident in large environments with a substantial number of targets or high data ingestion rates, where metric data accumulates rapidly.</p>
<p>Numerous organizations assess their unique needs and explore avenues to augment their Prometheus monitoring and observability capabilities. One effective approach is integrating Prometheus with Elastic®. In this blog post, we will showcase the integration of Prometheus with Elastic, emphasizing how Elastic elevates metrics monitoring through extensive historical analytics, anomaly detection, and forecasting, all in a cost-effective manner.</p>
<h2>Integrate Prometheus with Elastic seamlessly</h2>
<p>Organizations that have configured their cloud-native applications to expose metrics in Prometheus format can seamlessly transmit the metrics to Elastic by using the <a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-prometheus.html">Prometheus integration</a>. Elastic enables organizations to monitor their metrics in conjunction with all other data gathered through <a href="https://www.elastic.co/integrations/data-integrations">Elastic's extensive integrations</a>.</p>
<p>Go to Integrations and find the Prometheus integration.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-1-integrations.png" alt="1 - integrations" /></p>
<p>To gather metrics from Prometheus servers, the Elastic Agent is employed, with central management of Elastic agents handled through the <a href="https://www.elastic.co/guide/en/fleet/current/fleet-overview.html">Fleet server</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-2-set-up-prometheus-integration.png" alt="2 - set up integration" /></p>
<p>After enrolling the Elastic Agent in Fleet, users can choose from the following methods to ingest Prometheus metrics into Elastic.</p>
<h3>1. Prometheus collectors</h3>
<p><a href="https://docs.elastic.co/integrations/prometheus#prometheus-exporters-collectors">The Prometheus collectors</a> connect to the Prometheus server and pull metrics or scrape metrics from a Prometheus exporter.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-3-prometheus-collectors.png" alt="3 - Prometheus collectors" /></p>
<h3>2. Prometheus queries</h3>
<p><a href="https://docs.elastic.co/integrations/prometheus#prometheus-queries-promql">The Prometheus queries</a> execute specific Prometheus queries against <a href="https://prometheus.io/docs/prometheus/latest/querying/api/#expression-queries">Prometheus Query API</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-4-promtheus-queries.png" alt="4 - Prometheus queries" /></p>
<h3>3. Prometheus remote-write</h3>
<p><a href="https://docs.elastic.co/integrations/prometheus#prometheus-server-remote-write">The Prometheus remote_write</a> can receive metrics from a Prometheus server that has configured the <a href="https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write">remote_write</a> setting.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-5-prometheus-remote-write.png" alt="5 - Prometheus remote-write" /></p>
<p>After your Prometheus metrics are ingested, you have the option to visualize your data graphically within the <a href="https://www.elastic.co/guide/en/observability/current/explore-metrics.html">Metrics Explorer</a> and further segment it based on labels, such as hosts, containers, and more.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-10-metrics-explorer.png" alt="10 - metrics explorer" /></p>
<p>You can also query your metrics data in <a href="https://www.elastic.co/guide/en/kibana/current/discover.html">Discover</a> and explore the fields of your individual documents within the details panel.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-7-expanded-doc.png" alt="7 - expanded document" /></p>
<h2>Storing historical metrics with Elastic’s data tiering mechanism</h2>
<p>By exporting Prometheus metrics to Elasticsearch, organizations can extend the retention period and gain the ability to analyze metrics historically. Elastic optimizes data storage and access based on the frequency of data usage and the performance requirements of different data sets. The goal is to efficiently manage and store data, ensuring that it remains accessible when needed while keeping storage costs in check.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-8-hot-to-frozen.png" alt="8 - hot to frozen flow chart" /></p>
<p>After ingesting Prometheus metrics data, you have various retention options. You can set the duration for data to reside in the hot tier, which utilizes high IO hardware (SSD) and is more expensive. Alternatively, you can move the Prometheus metrics to the warm tier, employing cost-effective hardware like spinning disks (HDD) while maintaining consistent and efficient search performance. The cold tier mirrors the infrastructure of the warm tier for primary data but utilizes S3 for replica storage. Elastic automatically recovers replica indices from S3 in case of node or disk failure, ensuring search performance comparable to the warm tier while reducing disk cost.</p>
<p>The <a href="https://www.elastic.co/blog/introducing-elasticsearch-frozen-tier-searchbox-on-s3">frozen tier</a> allows direct searching of data stored in S3 or an object store, without the need for rehydration. The purpose is to further reduce storage costs for Prometheus metrics data that is less frequently accessed. By moving historical data into the frozen tier, organizations can optimize their storage infrastructure, ensuring that the recent, critical data remains in higher-performance tiers while less frequently accessed data is stored economically in the frozen tier. This way, organizations can perform historical analysis and trend detection, identify patterns and make informed decisions, and maintain compliance with regulatory standards in a cost-effective manner.</p>
<p>An alternative way to store your cloud-native metrics is to use an <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html">Elastic Time Series Data Stream</a> (TSDS). A TSDS stores metrics data more efficiently, using <a href="https://www.elastic.co/blog/70-percent-storage-savings-for-metrics-with-elastic-observability">~70% less disk space</a> than a regular data stream. The <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/downsampling.html">downsampling</a> functionality further reduces storage by rolling up the metrics within a fixed time interval into a single summary document. This not only helps organizations cut storage expenses for metric data but also simplifies the metric infrastructure, making it easier for users to correlate metrics with logs and traces through a unified interface.</p>
<h2>Advanced analytics</h2>
<p>Besides <a href="https://www.elastic.co/guide/en/observability/current/explore-metrics.html">Metrics Explorer</a> and <a href="https://www.elastic.co/guide/en/kibana/current/discover.html">Discover</a>, Elasticsearch® provides more advanced analytics capabilities and empowers organizations to gain deeper, more valuable insights into their Prometheus metrics data.</p>
<p>Out of the box, Prometheus integration provides a default overview dashboard.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-9-advacned-analytics.png" alt="9 - adv analytics" /></p>
<p>From Metrics Explorer or Discover, users can also edit their Prometheus metrics visualizations in <a href="https://www.elastic.co/kibana/kibana-lens">Elastic Lens</a> or create new ones directly in Lens.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-6-metrics-explorer.png" alt="6 - metrics explorer" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-11-green-bars.png" alt="11 - green bars" /></p>
<p>Elastic Lens enables users to explore and visualize data intuitively through dynamic visualizations. This user-friendly interface eliminates the need for complex query languages, making data analysis accessible to a broader audience. Elasticsearch also offers other powerful visualization methods with <a href="https://www.elastic.co/guide/en/kibana/current/add-aggregation-based-visualization-panels.html">aggregations</a> and <a href="https://www.youtube.com/watch?v=I8NtctS33F0">filters</a>, enabling users to perform advanced analytics on their Prometheus metrics data, including short-term and historical data. To learn more, check out the <a href="https://www.elastic.co/videos/training-how-to-series-stack">how-to series: Kibana</a>.</p>
<h2>Anomaly detection and forecasting</h2>
<p>When analyzing data, maintaining a constant watch on the screen is simply not feasible, especially when dealing with millions of time series of Prometheus metrics. Engineers frequently face the challenge of differentiating normal from abnormal data points, which involves analyzing historical data patterns, a process that is exceedingly time-consuming and often exceeds human capabilities. Thus, there is a pressing need for a more intelligent approach to detect anomalies efficiently.</p>
<p>Setting up alerts may seem like an obvious solution, but relying solely on rule-based alerts with static thresholds can be problematic. What's normal on a Wednesday at 9:00 a.m. might be entirely different from a Sunday at 2:00 a.m. This often leads to complex and hard-to-maintain rules or wide alert ranges that end up missing crucial issues. Moreover, as your business, infrastructure, users, and products evolve, these fixed rules don't keep up, resulting in lots of false positives or, even worse, important issues slipping through the cracks without detection. A more intelligent and adaptable approach is needed to ensure accurate and timely anomaly detection.</p>
<p>Elastic's machine learning anomaly detection excels in such scenarios. It automatically models the normal behavior of your Prometheus data, learning trends, and identifying anomalies, thereby reducing false positives and improving mean time to resolution (MTTR). With over 13 years of development experience in this field, Elastic has emerged as a trusted industry leader.</p>
<p>The key advantage of Elastic's machine learning anomaly detection lies in its unsupervised learning approach. By continuously observing real-time data, it acquires an understanding of the data's behavior over time. This includes grasping daily and weekly patterns, enabling it to establish a normalcy range of expected behavior. Behind the scenes, it constructs statistical models that allow accurate predictions, promptly identifying any unexpected variations. In cases where emerging data exhibits unusual trends, you can seamlessly integrate with alerting systems, operationalizing this valuable insight.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-12-LPO.png" alt="12 - LPO" /></p>
<p>Machine learning's ability to project into the future, forecasting data trends one day, a week, or even a month ahead, equips engineers not only with reporting capabilities but also with pattern recognition and failure prediction based on historical Prometheus data. This plays a crucial role in maintaining mission-critical workloads, offering organizations a proactive monitoring approach. By foreseeing and addressing issues before they escalate, organizations can avert downtime, cut costs, optimize resource utilization, and ensure uninterrupted availability of their vital applications and services.</p>
<p><a href="https://www.elastic.co/guide/en/machine-learning/current/ml-ad-run-jobs.html#ml-ad-create-job">Creating a machine learning job</a> for your Prometheus data is a straightforward task with a few simple steps. Simply specify the data index and set the desired time range in the single metric view. The machine learning job will then automatically process the historical data, building statistical models behind the scenes. These models will enable the system to predict trends and identify anomalies effectively, providing valuable and actionable insights for your monitoring needs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/elastic-blog-13-creating-ML-job.png" alt="13 - create ML job" /></p>
<p>In essence, Elastic machine learning empowers us to harness the capabilities of data scientists and effectively apply them in monitoring Prometheus metrics. By seamlessly detecting anomalies and predicting potential issues in advance, Elastic machine learning bridges the gap and enables IT professionals to benefit from the insights derived from advanced data analysis. This practical and accessible approach to anomaly detection equips organizations with a proactive stance toward maintaining the reliability of their systems.</p>
<h2>Try it out</h2>
<p><a href="https://www.elastic.co/cloud/cloud-trial-overview">Start a free trial</a> on Elastic Cloud and <a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-prometheus.html">ingest your Prometheus metrics into Elastic</a>. Enhance your Prometheus monitoring with Elastic Observability. Stay ahead of potential issues with advanced AI/ML anomaly detection and prediction capabilities. Eliminate data silos, reduce costs, and enhance overall response efficiency.</p>
<p>Elevate your monitoring capabilities with Elastic today!</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/ingesting-analyzing-prometheus-metrics-observability/illustration-machine-learning-anomaly-v2.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Migrating Datadog and Grafana dashboards and alerts to Kibana with the Observability Migration Platform]]></title>
            <link>https://www.elastic.co/observability-labs/blog/migrate-datadog-grafana-dashboards-alerts-to-kibana</link>
            <guid isPermaLink="false">migrate-datadog-grafana-dashboards-alerts-to-kibana</guid>
            <pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to migrate supported Datadog and Grafana dashboards and alerts to Kibana with the Observability Migration Platform.]]></description>
            <content:encoded><![CDATA[<p>The Observability Migration Platform is a CLI-driven workflow that translates supported Grafana and Datadog assets into Kibana-native outputs and produces the evidence needed to review the result. It changes migration from a manual rebuild into a translation-and-verification workflow that gets teams into <a href="https://www.elastic.co/docs/solutions/observability">Elastic Observability</a> faster.</p>
<h2>Migrations covered by the Observability Migration Platform</h2>
<p>The current scope covers Datadog and Grafana. The platform can work from exported assets or live APIs, and it focuses on dashboards and alerting content on the Datadog and Grafana paths it currently covers.</p>
<p>Support is not identical across the two sources. Datadog has end-to-end extraction, validation, compilation, upload, smoke-test, and verification workflows, but it currently covers a narrower slice of widgets and monitors. Grafana coverage is broader. The platform provides a practical translation pipeline for the supported paths.</p>
<p>The screenshots below show examples of dashboards after migration.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/migrated-dashboard-1.jpg" alt="Migrated Node Exporter Full dashboard in Kibana, top of page showing CPU, memory, network, and disk panels" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/migrated-dashboard-2.jpg" alt="Migrated Node Exporter Full dashboard in Kibana, scrolled to the Memory Meminfo section showing detailed memory panels" /></p>
<h2>How the Observability Migration Platform works</h2>
<p>At a high level, the workflow has two halves: source-aware translation on the way in and target-aware validation and delivery on the way out. That split matters because Grafana and Datadog differ not only in JSON shape, but also in query languages, panel types, controls, and alerting models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/overview.png" alt="End-to-end flow of the Observability Migration Platform: extract from Grafana or Datadog, normalize and plan, translate queries, panels, and alerts, emit Kibana-native output, validate against an Elastic target, then compile and upload to Kibana while producing verification and review artifacts" /></p>
<p>A run starts with exported assets or live source APIs. From there, the workflow normalizes source-specific objects, chooses a translation path for each supported dashboard, panel, and alerting artifact, and emits Kibana-native output. This is where most of the source-specific logic lives: translating queries or Datadog formulas, mapping panel semantics, carrying forward controls and links where possible, and deciding when an exact translation is not the right answer.</p>
<p>The second half is target-aware. The emitted output can be validated against an Elastic target, compiled, and uploaded to Kibana through the shared runtime. In the happy path, that yields a working translated dashboard. In rougher cases, validation may show that a panel cannot run safely as emitted. When that happens, the workflow is designed to fail conservatively: it can mark the panel for manual review or replace it with an upload-safe placeholder instead of shipping a broken runtime panel.</p>
<p>Just as important, the outcome is not simply &quot;a dashboard showed up in Kibana.&quot; The workflow also produces reviewer-facing evidence such as a migration report, manifest, verification packets, and rollout plan so you can see what translated cleanly, what was downgraded or routed to manual handling, and what still needs human judgment. Those artifacts are what make the process operationally credible: they give teams something concrete to inspect, compare, and act on.</p>
<h2>Running the migration</h2>
<p>The platform is CLI-driven, and a good fit for migration work that needs to be repeatable, reviewable, and easy to automate. Users can start with a representative slice of dashboards and alerting content from Grafana or Datadog, point the workflow at an Elastic target, and use that first run to understand translation quality, validation results, and how much follow-up review is required.</p>
<p>To run the full path against Elastic, create an <a href="https://www.elastic.co/docs/solutions/observability/get-started">Elastic Observability Serverless</a> project, generate a <a href="https://www.elastic.co/docs/deploy-manage/api-keys/serverless-project-api-keys">Serverless project API key</a>, and point the CLI at your Elasticsearch and Kibana endpoints:</p>
<pre><code class="language-shell">obs-migrate migrate \
  --source grafana \
  --input-mode files \
  --input-dir ./grafana_exports \
  --output-dir ./migration_output \
  --assets all \
  --native-promql \
  --data-view &quot;metrics-*&quot; \
  --validate \
  --es-url &quot;$ELASTICSEARCH_ENDPOINT&quot; \
  --es-api-key &quot;$KEY&quot; \
  --kibana-url &quot;$KIBANA_ENDPOINT&quot; \
  --kibana-api-key &quot;$KEY&quot; \
  --upload
</code></pre>
<p>The run validates the emitted queries against Elastic, compiles the generated dashboards, uploads them to Kibana, and produces the standard migration artifacts for review.</p>
<p>A typical run looks like this:</p>
<ol>
<li>Start with exported assets or live source APIs from Grafana or Datadog.</li>
<li>Choose the asset scope with <code>--assets dashboards</code>, <code>--assets alerts</code>, or <code>--assets all</code>.</li>
<li>Translate the supported dashboards, queries, controls, and alerting artifacts into Kibana-native output.</li>
<li>Validate the emitted content against an Elastic target (if configured), then compile and upload the translated dashboards for dashboard-capable runs.</li>
<li>Review the migration evidence, including <code>migration_report.json</code>, <code>verification_packets.json</code>, <code>run_summary.json</code>, etc., to understand what translated cleanly, where semantic gaps remain, and which dashboards, panels, or alert rules still require human review.</li>
<li>If alert rule creation is enabled, review the migrated rules (which are disabled by default) in Kibana before deciding which ones to enable or redesign.</li>
</ol>
<h2>What's next</h2>
<p>The platform is still evolving, and will continue to gain depth and self-service capabilities. The biggest open areas are stronger, measurable source-to-target semantic verification, broader Datadog coverage, deeper support for harder query families and non-dashboard surfaces, and cleaner shared runtime contracts across the workflow.</p>
<p>It is also built to grow over time. The source and target boundaries are explicit by design, which gives the platform room to expand coverage and support additional source paths in the future.</p>
<h2>In conclusion</h2>
<p>If you are planning a move into Elastic, a good starting point is to create an <a href="https://www.elastic.co/docs/solutions/observability/get-started">Elastic Observability Serverless</a> project. That gives you the target environment where translated dashboards and alerting content can be validated and reviewed.</p>
<p>To learn more about the migration workflow, talk to your Elastic representative about current access, supported coverage, and how it can help with your migration needs.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/header.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[How Prometheus Remote Write Ingestion Works in Elasticsearch]]></title>
            <link>https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch-architecture</link>
            <guid isPermaLink="false">prometheus-remote-write-elasticsearch-architecture</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A look under the hood at Elasticsearch's Prometheus Remote Write implementation: protobuf parsing, metric type inference, TSDS mapping, and data stream routing.]]></description>
            <content:encoded><![CDATA[<p>Elasticsearch recently added native support for the Prometheus Remote Write protocol.
You can point Prometheus (or Grafana Alloy) at an Elasticsearch endpoint and ship metrics without any adapter in between.</p>
<p>This post looks at what happens inside Elasticsearch when a Remote Write request arrives.</p>
<p>If you want to understand the implementation, evaluate how Elasticsearch compares to other Prometheus-compatible backends, or contribute, this is the post for you.
A companion post, <a href="https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch">Ship Prometheus Metrics to Elasticsearch with Remote Write</a>, covers the setup and configuration side.</p>
<h2>Request lifecycle: from HTTP to indexed documents</h2>
<p>A quick note on the Prometheus data model before we dive in: Prometheus stores all metric values as 64-bit floats and treats the metric name as just another label (<code>__name__</code>).
The storage engine itself is agnostic of whether a value is a counter or a gauge.
Keep this in mind as we walk through how Elasticsearch maps these concepts.</p>
<p>Here is the full path of a Remote Write request through Elasticsearch:</p>
<ol>
<li><strong>HTTP layer</strong> — The endpoint receives a compressed protobuf payload, checks indexing pressure, decompresses with Snappy, and parses the protobuf <code>WriteRequest</code>.</li>
<li><strong>Document construction</strong> — Each sample in each time series becomes an Elasticsearch document with <code>@timestamp</code>, <code>labels.*</code>, and <code>metrics.*</code> fields.</li>
<li><strong>Bulk indexing</strong> — All documents from a single request are written to the target data stream via a single bulk call.</li>
</ol>
<p>The sections below walk through each stage in detail.</p>
<h3>HTTP layer</h3>
<p>The endpoint accepts <code>application/x-protobuf</code> POST requests.
The incoming request body is tracked against the same <a href="https://www.elastic.co/docs/reference/elasticsearch/index-settings/pressure">indexing pressure limits</a> that protect the bulk indexing API.
If the cluster is already under heavy indexing load, the request gets rejected with a 429 before any parsing happens.</p>
<p>Prometheus compresses Remote Write payloads with Snappy.
Elasticsearch decompresses the body in a streaming fashion without materializing it into a single contiguous allocation, and validates the declared uncompressed size against a configurable maximum to guard against decompression bombs.</p>
<p>The decompressed body is then deserialized as a protobuf <code>WriteRequest</code>.
Each <code>WriteRequest</code> contains a list of <code>TimeSeries</code> entries, and each <code>TimeSeries</code> contains a set of labels (key-value pairs) and a list of samples (timestamp + float64 value).</p>
<h3>Document construction</h3>
<p>For each sample in each time series, Elasticsearch builds an index request.
Here is what a single document looks like:</p>
<pre><code class="language-json">{
  &quot;@timestamp&quot;: &quot;2026-04-01T12:00:00.000Z&quot;,
  &quot;data_stream&quot;: {
    &quot;type&quot;: &quot;metrics&quot;,
    &quot;dataset&quot;: &quot;generic.prometheus&quot;,
    &quot;namespace&quot;: &quot;default&quot;
  },
  &quot;labels&quot;: {
    &quot;__name__&quot;: &quot;http_requests_total&quot;,
    &quot;job&quot;: &quot;prometheus&quot;,
    &quot;instance&quot;: &quot;localhost:9090&quot;,
    &quot;method&quot;: &quot;GET&quot;,
    &quot;status&quot;: &quot;200&quot;
  },
  &quot;metrics&quot;: {
    &quot;http_requests_total&quot;: 1027.0
  }
}
</code></pre>
<p>All labels from the Prometheus time series (including <code>__name__</code>) end up in the <code>labels.*</code> fields.
The metric value goes into <code>metrics.&lt;metric_name&gt;</code>, where <code>&lt;metric_name&gt;</code> is the value of the <code>__name__</code> label.</p>
<p>Time series without a <code>__name__</code> label are dropped entirely, and the samples are counted as failures.
Non-finite values (NaN, Infinity, negative Infinity) are silently skipped.
This includes Prometheus staleness markers, which use a special NaN bit pattern (<code>0x7ff0000000000002</code>) to signal that a series has disappeared.</p>
<h3>One sample, one document</h3>
<p>You might wonder whether storing each individual sample as its own document creates significant storage overhead, especially for labels.
A common pattern to reduce that overhead was to group all metrics sharing the same labels and timestamp into a single document.</p>
<p>With recent TSDB improvements, that optimization is no longer necessary.
Elasticsearch has trimmed the per-document storage overhead to the point where there is negligible difference between packing many metrics in a single document and writing each sample separately.
A dedicated post covering these TSDB storage improvements in detail is coming soon.</p>
<h3>Bulk indexing</h3>
<p>All documents from a single Remote Write request are sent to Elasticsearch via a single bulk request.
Each document targets the data stream <code>metrics-{dataset}.prometheus-{namespace}</code> and is indexed as an append-only create operation.</p>
<h2>Metric type inference</h2>
<p>Remote Write v1 does not reliably transmit metric types alongside samples.
Prometheus sends metadata (type, help text, unit) in separate requests roughly once per minute, and those requests may land on a different node than the samples.
Buffering samples until metadata arrives is not practical in a distributed system, so Elasticsearch infers the type from naming conventions instead.</p>
<p>Metric names ending in <code>_total</code>, <code>_sum</code>, <code>_count</code>, or <code>_bucket</code> are mapped as counters.
Everything else defaults to gauge.
This is a well-established convention that other Prometheus-compatible backends use as well.</p>
<pre><code>http_requests_total             → counter
request_duration_seconds_sum    → counter
request_duration_seconds_count  → counter
request_duration_seconds_bucket → counter
process_resident_memory_bytes   → gauge
go_goroutines                   → gauge
</code></pre>
<p>The heuristic can be wrong.
A metric like <code>temperature_total</code> (if someone named a gauge that way) would be misclassified as a counter.
The main consequence today is that some ES|QL functions like <code>rate()</code> require the metric type to be a counter and will reject a misclassified gauge.
For PromQL, we plan to lift this restriction so that <code>rate()</code> works regardless of the declared type, which will make incorrect inference less consequential.</p>
<p>You can override the inference by creating a <code>metrics-prometheus@custom</code> component template with custom dynamic templates.
For example, to treat all <code>*_counter</code> fields as counters:</p>
<pre><code class="language-json">PUT /_component_template/metrics-prometheus@custom
{
  &quot;template&quot;: {
    &quot;mappings&quot;: {
      &quot;dynamic_templates&quot;: [
        {
          &quot;counter&quot;: {
            &quot;path_match&quot;: &quot;metrics.*_counter&quot;,
            &quot;mapping&quot;: {
              &quot;type&quot;: &quot;double&quot;,
              &quot;time_series_metric&quot;: &quot;counter&quot;
            }
          }
        }
      ]
    }
  }
}
</code></pre>
<p>Custom dynamic templates are merged with the built-in ones, so the default naming-convention rules still apply for metrics you don't explicitly override.</p>
<h2>The index template</h2>
<p>Elasticsearch installs a built-in index template that matches <code>metrics-*.prometheus-*</code>.
This template is what makes field type inference work without manual mapping configuration.</p>
<p><strong>TSDS mode</strong> is enabled, which gives you time-based partitioning, optimized storage, <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/time-series-data-stream-tsds#time-series-dimension">deduplication</a>, and the ability to downsample data as it ages.</p>
<p><strong><a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/passthrough">Passthrough</a> object fields</strong> are used for both the <code>labels</code> and <code>metrics</code> namespaces.
This serves three purposes:</p>
<ol>
<li>
<p><strong>Namespace isolation</strong>: Labels and metrics live in separate object namespaces (<code>labels.*</code> and <code>metrics.*</code>), so a label named <code>status</code> and a metric named <code>status</code> cannot conflict with each other.</p>
</li>
<li>
<p><strong>Dimension identification</strong>: The <code>labels</code> passthrough object is configured with <code>time_series_dimension: true</code>, which means every field under <code>labels.*</code> is automatically treated as a TSDS dimension.
When Prometheus sends a time series with a label you have never seen before, it becomes a dimension without any explicit field mapping.</p>
</li>
<li>
<p><strong>Transparent queries</strong>: You don't need to write the <code>labels.</code> or <code>metrics.</code> prefix in ES|QL or PromQL.
A query can reference <code>job</code> instead of <code>labels.job</code>, or <code>http_requests_total</code> instead of <code>metrics.http_requests_total</code>.
The passthrough mapping handles the resolution (see the sketch after this list).</p>
</li>
</ol>
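<p>As a sketch of the transparent-query behavior in point 3 (assuming the default data stream name and the <code>http_requests_total</code> counter from the example document earlier), both the label and the metric can be referenced without their <code>labels.</code> and <code>metrics.</code> prefixes:</p>
<pre><code class="language-esql">TS metrics-generic.prometheus-default
| WHERE job == &quot;prometheus&quot;
| STATS SUM(RATE(http_requests_total, 5m)) BY TBUCKET(1m)
</code></pre>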
<p><strong>Dynamic inference for metrics</strong> applies the naming-convention heuristics described above.
When a new metric name appears for the first time, its field mapping is created automatically under <code>metrics.*</code> with the correct <code>time_series_metric</code> annotation.</p>
<p><strong>Failure store</strong> is enabled.
Documents that fail indexing (for example, due to a mapping conflict where the same metric name appears with incompatible types) are routed to a separate failure store instead of being dropped silently.</p>
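<p>If you want to inspect what landed in the failure store, here is a hypothetical ES|QL sketch; the <code>::failures</code> selector and the <code>error.*</code> field names are assumptions, so check the failure store documentation for the exact syntax supported by your version:</p>
<pre><code class="language-esql">// Hypothetical: selector syntax and error.* field names may differ by version
FROM metrics-generic.prometheus-default::failures
| KEEP @timestamp, error.type, error.message
| LIMIT 10
</code></pre>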
<h2>Data stream routing</h2>
<p>The three URL patterns map directly to data stream names:</p>
<table>
<thead>
<tr>
<th>URL pattern</th>
<th>Data stream</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>/_prometheus/api/v1/write</code></td>
<td><code>metrics-generic.prometheus-default</code></td>
</tr>
<tr>
<td><code>/_prometheus/metrics/{dataset}/api/v1/write</code></td>
<td><code>metrics-{dataset}.prometheus-default</code></td>
</tr>
<tr>
<td><code>/_prometheus/metrics/{dataset}/{namespace}/api/v1/write</code></td>
<td><code>metrics-{dataset}.prometheus-{namespace}</code></td>
</tr>
</tbody>
</table>
<p>This lets you separate metrics from different Prometheus instances or environments into different data streams.
That separation is useful for a few reasons.</p>
<p><strong>Lifecycle isolation</strong>: you can apply different retention policies per data stream.
Production metrics might be kept for 90 days, while dev metrics might expire after 7 days.</p>
<p><strong>Access control</strong>: you can scope API keys to specific data streams.
A team's Prometheus instance writes to <code>metrics-teamA.prometheus-prod</code>, and their API key only has access to that stream.</p>
<p><strong>Query performance</strong>: PromQL queries and Grafana dashboards can be scoped to a specific index pattern, avoiding scans of unrelated data.</p>
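<p>For example, a sketch of a PromQL query scoped to the team data stream mentioned above (the data stream name is illustrative):</p>
<pre><code class="language-esql">PROMQL index=metrics-teamA.prometheus-prod
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>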
<h2>Error handling and the Remote Write spec</h2>
<p>The Remote Write spec defines two response classes: retryable (5xx, 429) and non-retryable (4xx).
Prometheus uses this distinction to decide whether to retry or drop a failed request.</p>
<p>Elasticsearch returns 429 (Too Many Requests) if any sample in the bulk request was rejected due to indexing pressure.
This signals Prometheus to back off and retry with exponential backoff.</p>
<p>For partial failures (some samples indexed, others rejected), the response includes a summary.
It reports how many samples failed, grouped by target index and status code, along with a sample error message from each group.</p>
<p>Time series without a <code>__name__</code> label result in a 400 error for those samples.
Non-finite values (NaN, Infinity) are silently dropped: Prometheus receives a success response and will not retry.</p>
<p>NaN appears most commonly for summary quantiles when no observations have been recorded (for example, a p99 latency metric before any requests arrive) and for staleness markers.
The practical impact of dropping these is limited today: for most queries, a missing sample behaves similarly to a NaN one, since PromQL's lookback window fills the gap with the last known value either way.
The more significant gap is staleness markers, which are covered below.</p>
<h2>What's next: Remote Write v2 and beyond</h2>
<p>Remote Write v2 is still experimental, which is why the current implementation starts with v1.
But v2 addresses several of v1's shortcomings.</p>
<p><strong>Metadata alongside samples</strong>: v2 sends metric type, unit, and description with each time series in the same request.
This eliminates the need for naming-convention heuristics entirely.</p>
<p><strong>Native histograms</strong>: v2 supports Prometheus native histograms, which map naturally to Elasticsearch's <a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/exponential-histogram"><code>exponential_histogram</code></a> field type.
Classic histograms (one counter per bucket boundary) are verbose and lose precision at query time.
Native histograms are more compact and more accurate.</p>
<p><strong>Dictionary encoding</strong>: v2 replaces repeated label strings with integer references, reducing payload size significantly for high-cardinality label sets.</p>
<p><strong>Created timestamps</strong>: counters in v2 include a &quot;created&quot; timestamp that marks when the counter was initialized.
This allows backends to detect counter resets more accurately than the current heuristic (value decreased since last sample).</p>
<p>Beyond v2, there are two other items in consideration for future enhancements.</p>
<p><strong>Staleness marker support</strong>: currently, staleness markers (the special NaN that Prometheus writes when a scrape target disappears) are dropped.
Supporting them would allow correct PromQL lookback behavior and avoid the 5-minute &quot;trailing data&quot; artifact where a disappeared series still appears in query results.</p>
<p><strong>Shared metric field</strong>: the current layout creates a separate field for each metric name (<code>metrics.http_requests_total</code>, <code>metrics.go_goroutines</code>, etc.).
This works, but it means the number of field mappings grows with the number of distinct metric names, which is why the field limit is set to 10,000 for Prometheus data streams.
A different approach we're considering is to store the metric name only in the <code>__name__</code> label and write the metric value to a single shared field.
This eliminates the field explosion problem entirely and more closely matches how Prometheus stores data internally.
This direction is part of the broader effort to make Elasticsearch's metrics storage more efficient and more compatible with Prometheus conventions.</p>
<h2>Availability</h2>
<p>The Prometheus Remote Write endpoint is available now on <a href="https://cloud.elastic.co/serverless-registration">Elasticsearch Serverless</a> with no additional configuration.</p>
<p>For self-managed clusters, check out <a href="https://www.elastic.co/docs/deploy-manage/deploy/self-managed/local-development-installation-quickstart">start-local</a> to get up and running quickly.</p>
<p>If you run into issues or have feedback, open an issue on the <a href="https://github.com/elastic/elasticsearch">Elasticsearch repository</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/prometheus-remote-write-elasticsearch-architecture/header.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Ship Prometheus Metrics to Elasticsearch with Remote Write]]></title>
            <link>https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch</link>
            <guid isPermaLink="false">prometheus-remote-write-elasticsearch</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Elasticsearch natively supports Prometheus Remote Write. Add a single remote_write block to your Prometheus config and use Elasticsearch as Prometheus-compatible long-term storage.]]></description>
            <content:encoded><![CDATA[<p>Prometheus has a well-defined protocol for shipping metrics to external storage: <a href="https://prometheus.io/docs/specs/prw/remote_write_spec/">Remote Write</a>.
Elasticsearch now implements this protocol natively, so you can add it as a <code>remote_write</code> destination with a single config block.</p>
<p>This lets you bring your Prometheus metrics into the same cluster that can also store your logs, traces, and other data.
One storage backend, one set of access controls, one place to query.</p>
<h2>Why store Prometheus metrics in Elasticsearch?</h2>
<p>Prometheus local storage is designed for short retention, typically 15 to 30 days.
For anything beyond that, you need a remote storage backend.</p>
<p>Elasticsearch's time series data streams (TSDS) are built for highly efficient long-term metrics storage: automatic rollover, time-based partitioning, compression via index sorting, and downsampling to reduce storage costs as data ages.
Your Prometheus scrape configs stay the same.</p>
<p>Recent Elasticsearch releases have significantly reduced the storage footprint for metrics.
A dedicated post with the numbers is coming soon.</p>
<p>On the query side, ES|QL embraces PromQL: the built-in <code>PROMQL</code> command lets your existing queries run unchanged, while the rest of ES|QL is available when you want joins, aggregations, or transformations that span multiple datasets.</p>
<p>And because metrics land in the same store as your logs, traces, and profiling data, correlating signals across types becomes a single query rather than a cross-system investigation.</p>
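<p>For example, once metrics are flowing, a sketch of a PromQL query over the default data stream (assuming an <code>http_requests_total</code> counter is being scraped):</p>
<pre><code class="language-esql">PROMQL index=metrics-generic.prometheus-default
  sum by (instance) (rate(http_requests_total[5m]))
</code></pre>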
<h2>How it works</h2>
<p>For a detailed look at what happens inside Elasticsearch when a Remote Write request arrives — protobuf parsing, metric type inference, TSDS mapping, and data stream routing — see <a href="https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch-architecture">How Prometheus Remote Write Ingestion Works in Elasticsearch</a>.</p>
<p>Prometheus sends metrics to Elasticsearch via the standard Remote Write protocol (v1).
The endpoint accepts protobuf-encoded, snappy-compressed <code>WriteRequest</code> payloads.</p>
<p>Each sample becomes an Elasticsearch document in a pre-defined time series data stream.
Prometheus labels become TSDS dimensions.
The metric value is stored in a typed field under <code>metrics.&lt;metric_name&gt;</code>.</p>
<p>Elasticsearch infers the metric type (counter vs gauge) from naming conventions.
Names ending in <code>_total</code>, <code>_sum</code>, <code>_count</code>, or <code>_bucket</code> are treated as counters.
Everything else is treated as a gauge.</p>
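<p>As an illustration (a sketch of the resulting field mappings, not the exact output), a <code>_total</code> metric and a plain gauge end up mapped roughly like this:</p>
<pre><code class="language-json">{
  &quot;metrics&quot;: {
    &quot;properties&quot;: {
      &quot;prometheus_http_requests_total&quot;: {
        &quot;type&quot;: &quot;double&quot;,
        &quot;time_series_metric&quot;: &quot;counter&quot;
      },
      &quot;process_resident_memory_bytes&quot;: {
        &quot;type&quot;: &quot;double&quot;,
        &quot;time_series_metric&quot;: &quot;gauge&quot;
      }
    }
  }
}
</code></pre>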
<h2>Setting it up</h2>
<h3>Step 1: Get an Elasticsearch endpoint</h3>
<p>You need an Elasticsearch cluster with the Prometheus endpoints enabled.
The simplest option is Elastic Cloud Serverless, where this works out of the box.</p>
<p>For serverless: sign in to <a href="https://cloud.elastic.co">cloud.elastic.co</a>, create an Observability project, and copy the Elasticsearch endpoint from the project settings page.
The endpoint looks like <code>https://&lt;project-id&gt;.es.&lt;region&gt;.&lt;provider&gt;.elastic.cloud</code>.</p>
<h3>Step 2: Create an API key</h3>
<p>Create an API key scoped to writing metrics data streams only.
In your Elastic Cloud Serverless project, go to <strong>Admin and settings</strong> (the gear icon at the bottom left of the side nav), then <strong>API keys</strong>.</p>
<p>Use the following role descriptor in the <strong>Control security privileges</strong> section:</p>
<pre><code class="language-json">{
  &quot;ingest&quot;: {
    &quot;indices&quot;: [
      {
        &quot;names&quot;: [&quot;metrics-*&quot;],
        &quot;privileges&quot;: [&quot;auto_configure&quot;, &quot;create_doc&quot;]
      }
    ]
  }
}
</code></pre>
<p>Copy the key value before closing the dialog.
You will not be able to retrieve it again.</p>
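<p>If you prefer the API over the UI, a roughly equivalent key can be created by POSTing a body like this to <code>/_security/api_key</code> (the key name is arbitrary):</p>
<pre><code class="language-json">{
  &quot;name&quot;: &quot;prometheus-remote-write&quot;,
  &quot;role_descriptors&quot;: {
    &quot;ingest&quot;: {
      &quot;indices&quot;: [
        {
          &quot;names&quot;: [&quot;metrics-*&quot;],
          &quot;privileges&quot;: [&quot;auto_configure&quot;, &quot;create_doc&quot;]
        }
      ]
    }
  }
}
</code></pre>
<p>The <code>encoded</code> value in the response is what you use as the API key credential in the next step.</p>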
<h3>Step 3: Configure Prometheus</h3>
<p>Add the following <code>remote_write</code> block to your <code>prometheus.yml</code>:</p>
<pre><code class="language-yaml">remote_write:
  - url: &quot;https://YOUR_ES_ENDPOINT/_prometheus/api/v1/write&quot;
    authorization:
      type: ApiKey
      credentials: YOUR_API_KEY
</code></pre>
<p>That's it.
Prometheus will start shipping metrics to Elasticsearch on the next scrape interval.</p>
<p>If you use <a href="https://grafana.com/docs/alloy/latest/">Grafana Alloy</a> instead of Prometheus, the equivalent configuration is:</p>
<pre><code>prometheus.remote_write &quot;elasticsearch&quot; {
  endpoint {
    url = &quot;https://YOUR_ES_ENDPOINT/_prometheus/api/v1/write&quot;
    headers = {&quot;Authorization&quot; = &quot;ApiKey YOUR_API_KEY&quot;}
  }
}
</code></pre>
<h2>Routing metrics to separate data streams</h2>
<p>By default, all metrics land in <code>metrics-generic.prometheus-default</code>.
You can route metrics from different environments or teams into separate data streams using the dataset and namespace path segments in the URL.</p>
<p>The three URL patterns are:</p>
<ul>
<li><code>/_prometheus/api/v1/write</code> routes to <code>metrics-generic.prometheus-default</code></li>
<li><code>/_prometheus/metrics/{dataset}/api/v1/write</code> routes to <code>metrics-{dataset}.prometheus-default</code></li>
<li><code>/_prometheus/metrics/{dataset}/{namespace}/api/v1/write</code> routes to <code>metrics-{dataset}.prometheus-{namespace}</code></li>
</ul>
<p>For example, using <code>/_prometheus/metrics/infrastructure/production/api/v1/write</code> routes data to <code>metrics-infrastructure.prometheus-production</code>.</p>
<p>This is useful for separating production from staging metrics, or giving different teams their own data streams with independent lifecycle policies.</p>
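<p>For example, an infrastructure team shipping production metrics would only change the URL in their <code>remote_write</code> block (a sketch; substitute your own endpoint and key):</p>
<pre><code class="language-yaml">remote_write:
  # routes to metrics-infrastructure.prometheus-production
  - url: &quot;https://YOUR_ES_ENDPOINT/_prometheus/metrics/infrastructure/production/api/v1/write&quot;
    authorization:
      type: ApiKey
      credentials: YOUR_API_KEY
</code></pre>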
<h2>What gets stored</h2>
<p>Here is what a sample document looks like in Elasticsearch:</p>
<pre><code class="language-json">{
  &quot;@timestamp&quot;: &quot;2026-04-02T10:30:00.000Z&quot;,
  &quot;data_stream&quot;: {
    &quot;type&quot;: &quot;metrics&quot;,
    &quot;dataset&quot;: &quot;generic.prometheus&quot;,
    &quot;namespace&quot;: &quot;default&quot;
  },
  &quot;labels&quot;: {
    &quot;__name__&quot;: &quot;prometheus_http_requests_total&quot;,
    &quot;handler&quot;: &quot;/api/v1/query&quot;,
    &quot;code&quot;: &quot;200&quot;,
    &quot;instance&quot;: &quot;localhost:9090&quot;,
    &quot;job&quot;: &quot;prometheus&quot;
  },
  &quot;metrics&quot;: {
    &quot;prometheus_http_requests_total&quot;: 42
  }
}
</code></pre>
<p>Labels map to keyword fields that serve as TSDS <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/time-series-data-stream-tsds#time-series-dimension">dimensions</a>.
The metric value is stored under <code>metrics.&lt;metric_name&gt;</code> with the inferred <code>time_series_metric</code> type (counter or gauge).</p>
<p>Elasticsearch installs a built-in index template matching <code>metrics-*.prometheus-*</code> that configures TSDS mode, <a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/passthrough">passthrough</a> dimension container objects, and a 10,000 field limit.
The <a href="https://www.elastic.co/docs/reference/elasticsearch/index-settings/mapping-limit">field limit</a> is configurable via a custom component template (see the custom metric type inference section below for how to use one).
You do not need to create any templates or mappings yourself.</p>
<h2>Custom metric type inference</h2>
<p>Metric type inference is based on naming conventions.
Metrics that don't follow Prometheus naming best practices may be classified incorrectly.
You can override the defaults by creating a <code>metrics-prometheus@custom</code> component template with your own dynamic templates.
For example, to mark all <code>*_counter</code> metrics as counters:</p>
<pre><code class="language-json">{
  &quot;template&quot;: {
    &quot;mappings&quot;: {
      &quot;dynamic_templates&quot;: [
        {
          &quot;counter&quot;: {
            &quot;path_match&quot;: &quot;metrics.*_counter&quot;,
            &quot;mapping&quot;: {
              &quot;type&quot;: &quot;double&quot;,
              &quot;time_series_metric&quot;: &quot;counter&quot;
            }
          }
        }
      ]
    }
  }
}
</code></pre>
<p>Custom rules are merged with the built-in patterns, so the defaults still apply for metrics you don't override.</p>
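<p>The same <code>metrics-prometheus@custom</code> component template is also where you can raise the default 10,000 field limit if you have an unusually large number of distinct metric names. A minimal sketch, alongside the mappings shown above (the value 20,000 is just an example):</p>
<pre><code class="language-json">{
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mapping.total_fields.limit&quot;: 20000
    }
  }
}
</code></pre>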
<h2>Current limitations</h2>
<p>Only Remote Write v1 is supported; support for v2, which brings native histograms and exemplars, is planned.</p>
<p>Staleness markers (special NaN values Prometheus uses to signal a series has disappeared) are not yet stored or respected in queries.</p>
<p>Non-finite values (NaN, Infinity) are silently dropped.</p>
<h2>Get started</h2>
<p>The Prometheus Remote Write endpoint is available now on <a href="https://cloud.elastic.co/serverless-registration?onboarding_token=observability">Elasticsearch Serverless</a> with no configuration needed.
For local experimentation, <a href="https://www.elastic.co/docs/deploy-manage/deploy/self-managed/local-development-installation-quickstart">start-local</a> gets you a single-node cluster running in minutes.</p>
<p>Once metrics are flowing, you can query them with ES|QL using the built-in <code>PROMQL</code> command for PromQL compatibility, or write native ES|QL queries to join metrics with logs and traces in the same store.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/prometheus-remote-write-elasticsearch/header.jpg" length="0" type="image/jpg"/>
        </item>
    </channel>
</rss>