Felix Barnsteiner

How Prometheus Remote Write Ingestion Works in Elasticsearch

A look under the hood at Elasticsearch's Prometheus Remote Write implementation: protobuf parsing, metric type inference, TSDS mapping, and data stream routing.

Elasticsearch recently added native support for the Prometheus Remote Write protocol. You can point Prometheus (or Grafana Alloy) at an Elasticsearch endpoint and ship metrics without any adapter in between.

This post looks at what happens inside Elasticsearch when a Remote Write request arrives.

If you want to understand the implementation, evaluate how Elasticsearch compares to other Prometheus-compatible backends, or contribute, this is the post for you.

Request lifecycle: from HTTP to indexed documents

A quick note on the Prometheus data model before we dive in: Prometheus stores every metric value as a 64-bit float and treats the metric name as just another label (__name__). The storage engine itself does not distinguish between counters and gauges. Keep this in mind as we walk through how Elasticsearch maps these concepts.
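That model can be sketched in a few lines of Python (illustrative only, not how Prometheus represents series internally):

```python
# A series is identified by its full label set; the metric name is
# just the reserved __name__ label. Every sample is a (timestamp, float).
series = {
    ("__name__", "http_requests_total"),
    ("job", "prometheus"),
    ("method", "GET"),
}
samples = [(1743508800000, 1027.0)]  # (unix millis, float64 value)

# Nothing in this structure says whether the value is a counter or a
# gauge - that distinction lives outside the storage model.
metric_name = dict(series)["__name__"]
```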

Here is the full path of a Remote Write request through Elasticsearch:

  1. HTTP layer — The endpoint receives a compressed protobuf payload, checks indexing pressure, decompresses with Snappy, and parses the protobuf WriteRequest.
  2. Document construction — Each sample in each time series becomes an Elasticsearch document with @timestamp, labels.*, and metrics.* fields.
  3. Bulk indexing — All documents from a single request are written to the target data stream via a single bulk call.

The sections below walk through each stage in detail.

HTTP layer

The endpoint accepts application/x-protobuf POST requests. The incoming request body is tracked against the same indexing pressure limits that protect the bulk indexing API. If the cluster is already under heavy indexing load, the request gets rejected with a 429 before any parsing happens.

Prometheus compresses Remote Write payloads with Snappy. Elasticsearch decompresses the body in a streaming fashion without materializing it into a single contiguous allocation, and validates the declared uncompressed size against a configurable maximum to guard against decompression bombs.
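The size validation can happen before any decompression because snappy's block format (which Remote Write uses) begins with the uncompressed length encoded as a little-endian varint. Here is a sketch of that guard in Python; the limit value is hypothetical and this is not Elasticsearch's actual code:

```python
MAX_UNCOMPRESSED = 64 * 1024 * 1024  # hypothetical limit for illustration

def check_declared_size(body: bytes, limit: int = MAX_UNCOMPRESSED) -> int:
    """Read the uncompressed-length varint that prefixes a snappy block
    and reject decompression bombs before inflating anything."""
    size, shift = 0, 0
    for byte in body[:5]:  # a 32-bit varint is at most 5 bytes
        size |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last varint byte
            if size > limit:
                raise ValueError(f"declared size {size} exceeds limit {limit}")
            return size
        shift += 7
    raise ValueError("invalid snappy length preamble")
```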

The decompressed body is then deserialized as a protobuf WriteRequest. Each WriteRequest contains a list of TimeSeries entries, and each TimeSeries contains a set of labels (key-value pairs) and a list of samples (timestamp + float64 value).
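The v1 wire format is small. Abridged from Prometheus's prompb protobuf definitions:

```protobuf
message WriteRequest {
  repeated TimeSeries timeseries = 1;
}

message TimeSeries {
  repeated Label labels   = 1;  // includes __name__
  repeated Sample samples = 2;
}

message Label {
  string name  = 1;
  string value = 2;
}

message Sample {
  double value    = 1;
  int64 timestamp = 2;  // unix milliseconds
}
```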

Document construction

For each sample in each time series, Elasticsearch builds an index request. Here is what a single document looks like:

{
  "@timestamp": "2026-04-01T12:00:00.000Z",
  "data_stream": {
    "type": "metrics",
    "dataset": "generic.prometheus",
    "namespace": "default"
  },
  "labels": {
    "__name__": "http_requests_total",
    "job": "prometheus",
    "instance": "localhost:9090",
    "method": "GET",
    "status": "200"
  },
  "metrics": {
    "http_requests_total": 1027.0
  }
}

All labels from the Prometheus time series (including __name__) end up in the labels.* fields. The metric value goes into metrics.<metric_name>, where <metric_name> is the value of the __name__ label.

Time series without a __name__ label are dropped entirely, and the samples are counted as failures. Non-finite values (NaN, Infinity, negative Infinity) are silently skipped. This includes Prometheus staleness markers, which use a special NaN bit pattern (0x7ff0000000000002) to signal that a series has disappeared.
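These filtering rules can be sketched as follows; the helper names are hypothetical, not the actual implementation:

```python
import math
import struct

STALE_NAN = 0x7ff0000000000002  # Prometheus staleness marker bit pattern

def build_docs(labels: dict, samples: list) -> tuple[list, int]:
    """Turn one time series into documents, mirroring the rules above:
    no __name__ means all samples fail; non-finite values are skipped."""
    name = labels.get("__name__")
    if name is None:
        return [], len(samples)  # dropped and counted as failures
    docs = []
    for timestamp_ms, value in samples:
        if not math.isfinite(value):  # NaN, +/-Inf, staleness markers
            continue
        docs.append({
            "@timestamp": timestamp_ms,
            "labels": labels,
            "metrics": {name: value},
        })
    return docs, 0

def is_stale_marker(value: float) -> bool:
    # An ordinary NaN check cannot distinguish the staleness marker
    # from any other NaN, so compare the raw IEEE-754 bits.
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    return bits == STALE_NAN
```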

One sample, one document

You might wonder whether storing each individual sample as its own document creates significant storage overhead, especially for labels. A common pattern for reducing that overhead has been to group all metrics that share the same labels and timestamp into a single document.

With recent TSDB improvements, that optimization is no longer necessary. Elasticsearch has trimmed the per-document storage overhead to the point where there is negligible difference between packing many metrics in a single document and writing each sample separately. A dedicated post covering these TSDB storage improvements in detail is coming soon.

Bulk indexing

All documents from a single Remote Write request are sent to Elasticsearch via a single bulk request. Each document targets the data stream metrics-{dataset}.prometheus-{namespace} and is indexed as an append-only create operation.
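Conceptually, the resulting bulk request looks like the following (abridged; the implementation builds the request internally rather than going through the REST layer, and data streams require the append-only create operation shown here):

```console
POST /metrics-generic.prometheus-default/_bulk
{ "create": { } }
{ "@timestamp": "2026-04-01T12:00:00.000Z", "labels": { "__name__": "http_requests_total", "job": "prometheus" }, "metrics": { "http_requests_total": 1027.0 } }
{ "create": { } }
{ "@timestamp": "2026-04-01T12:00:00.000Z", "labels": { "__name__": "go_goroutines", "job": "prometheus" }, "metrics": { "go_goroutines": 42.0 } }
```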

Metric type inference

Remote Write v1 does not reliably transmit metric types alongside samples. Prometheus sends metadata (type, help text, unit) in separate requests roughly once per minute, and those requests may land on a different node than the samples. Buffering samples until metadata arrives is not practical in a distributed system, so Elasticsearch infers the type from naming conventions instead.

Metric names ending in _total, _sum, _count, or _bucket are mapped as counters. Everything else defaults to gauge. This is a well-established convention that other Prometheus-compatible backends use as well.

http_requests_total             → counter
request_duration_seconds_sum    → counter
request_duration_seconds_count  → counter
request_duration_seconds_bucket → counter
process_resident_memory_bytes   → gauge
go_goroutines                   → gauge
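In code form, the heuristic amounts to a suffix check (a sketch, not the actual implementation):

```python
COUNTER_SUFFIXES = ("_total", "_sum", "_count", "_bucket")

def infer_metric_type(name: str) -> str:
    # Conventional Prometheus counter suffixes; everything else is
    # treated as a gauge.
    return "counter" if name.endswith(COUNTER_SUFFIXES) else "gauge"
```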

The heuristic can be wrong. A metric like temperature_total (if someone named a gauge that way) would be misclassified as a counter. The main consequence today is that some ES|QL functions like rate() require the metric type to be a counter and will reject a misclassified gauge. For PromQL, we plan to lift this restriction so that rate() works regardless of the declared type, which will make incorrect inference less consequential.

You can override the inference by creating a metrics-prometheus@custom component template with custom dynamic templates. For example, to treat all *_counter fields as counters:

PUT /_component_template/metrics-prometheus@custom
{
  "template": {
    "mappings": {
      "dynamic_templates": [
        {
          "counter": {
            "path_match": "metrics.*_counter",
            "mapping": {
              "type": "double",
              "time_series_metric": "counter"
            }
          }
        }
      ]
    }
  }
}

Custom dynamic templates are merged with the built-in ones, so the default naming-convention rules still apply for metrics you don't explicitly override.

The index template

Elasticsearch installs a built-in index template that matches metrics-*.prometheus-*. This template is what makes field type inference work without manual mapping configuration.

TSDS mode is enabled, which gives you time-based partitioning, optimized storage, deduplication, and the ability to downsample data as it ages.

Passthrough object fields are used for both the labels and metrics namespaces. This serves three purposes:

  1. Namespace isolation: Labels and metrics live in separate object namespaces (labels.* and metrics.*), so a label named status and a metric named status cannot conflict with each other.

  2. Dimension identification: The labels passthrough object is configured with time_series_dimension: true, which means every field under labels.* is automatically treated as a TSDS dimension. When Prometheus sends a time series with a label you have never seen before, it becomes a dimension without any explicit field mapping.

  3. Transparent queries: You don't need to write the labels. or metrics. prefix in ES|QL or PromQL. A query can reference job instead of labels.job, or http_requests_total instead of metrics.http_requests_total. The passthrough mapping handles the resolution.

Dynamic inference for metrics applies the naming-convention heuristics described above. When a new metric name appears for the first time, its field mapping is created automatically under metrics.* with the correct time_series_metric annotation.

Failure store is enabled. Documents that fail indexing (for example, due to a mapping conflict where the same metric name appears with incompatible types) are routed to a separate failure store instead of being dropped silently.
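Putting those pieces together, here is an abridged sketch of what such a template configures. This is illustrative, not the verbatim built-in template; the field limit value comes from the Prometheus data stream defaults discussed later in this post:

```json
{
  "index_patterns": ["metrics-*.prometheus-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "time_series",
      "index.mapping.total_fields.limit": 10000
    },
    "mappings": {
      "properties": {
        "labels": {
          "type": "passthrough",
          "priority": 10,
          "dynamic": true,
          "time_series_dimension": true
        },
        "metrics": {
          "type": "passthrough",
          "priority": 20,
          "dynamic": true
        }
      }
    }
  }
}
```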

Data stream routing

The three URL patterns map directly to data stream names:

/_prometheus/api/v1/write                               → metrics-generic.prometheus-default
/_prometheus/metrics/{dataset}/api/v1/write             → metrics-{dataset}.prometheus-default
/_prometheus/metrics/{dataset}/{namespace}/api/v1/write → metrics-{dataset}.prometheus-{namespace}
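The routing is a pure function of the URL's path segments; sketched with a hypothetical helper:

```python
def target_data_stream(dataset: str = "generic", namespace: str = "default") -> str:
    # Path segments fall back to "generic" and "default" when omitted
    # from the Remote Write URL.
    return f"metrics-{dataset}.prometheus-{namespace}"
```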

This lets you separate metrics from different Prometheus instances or environments into different data streams. That separation is useful for a few reasons.

Lifecycle isolation: you can apply different retention policies per data stream. Production metrics might be kept for 90 days, while dev metrics might expire after 7 days.

Access control: you can scope API keys to specific data streams. A team's Prometheus instance writes to metrics-teamA.prometheus-prod, and their API key only has access to that stream.

Query performance: PromQL queries and Grafana dashboards can be scoped to a specific index pattern, avoiding scans of unrelated data.

Error handling and the Remote Write spec

The Remote Write spec defines two response classes: retryable (5xx, 429) and non-retryable (4xx). Prometheus uses this distinction to decide whether to retry or drop a failed request.

Elasticsearch returns 429 (Too Many Requests) if any sample in the bulk request was rejected due to indexing pressure. This signals Prometheus to back off and retry with exponential backoff.
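The retry decision on the Prometheus side boils down to the status-code class. A sketch of the spec's rule (not Prometheus source code):

```python
def is_retryable(status: int) -> bool:
    # Per the Remote Write spec: retry on 5xx and 429, drop on other 4xx.
    return status >= 500 or status == 429
```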

For partial failures (some samples indexed, others rejected), the response includes a summary. It reports how many samples failed, grouped by target index and status code, along with a sample error message from each group.

Time series without a __name__ label result in a 400 error for those samples. Non-finite values (NaN, Infinity) are silently dropped: Prometheus receives a success response and will not retry.

NaN appears most commonly for summary quantiles when no observations have been recorded (for example, a p99 latency metric before any requests arrive) and for staleness markers. The practical impact of dropping these is limited today: for most queries, a missing sample behaves similarly to a NaN one, since PromQL's lookback window fills the gap with the last known value either way. The more significant gap is staleness markers, which are covered below.

What's next: Remote Write v2 and beyond

Remote Write v2 is still experimental, which is why the current implementation starts with v1. But v2 addresses several of v1's shortcomings.

Metadata alongside samples: v2 sends metric type, unit, and description with each time series in the same request. This eliminates the need for naming-convention heuristics entirely.

Native histograms: v2 supports Prometheus native histograms, which map naturally to Elasticsearch's exponential_histogram field type. Classic histograms (one counter per bucket boundary) are verbose and lose precision at query time. Native histograms are more compact and more accurate.

Dictionary encoding: v2 replaces repeated label strings with integer references, reducing payload size significantly for high-cardinality label sets.

Created timestamps: counters in v2 include a "created" timestamp that marks when the counter was initialized. This allows backends to detect counter resets more accurately than the current heuristic (value decreased since last sample).
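The reset heuristic mentioned above treats any decrease as a restart from zero. Sketched (illustrative, not any backend's actual code):

```python
def counter_increase(values: list[float]) -> float:
    """Total increase across consecutive samples, treating any decrease
    as a counter reset (the counter restarted from zero)."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # On a reset, the whole current value counts as the increase.
        total += cur - prev if cur >= prev else cur
    return total
```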

Beyond v2, two other enhancements are under consideration.

Staleness marker support: currently, staleness markers (the special NaN that Prometheus writes when a scrape target disappears) are dropped. Supporting them would allow correct PromQL lookback behavior and avoid the 5-minute "trailing data" artifact where a disappeared series still appears in query results.

Shared metric field: the current layout creates a separate field for each metric name (metrics.http_requests_total, metrics.go_goroutines, etc.). This works, but it means the number of field mappings grows with the number of distinct metric names, which is why the field limit is set to 10,000 for Prometheus data streams. A different approach we're considering is to store the metric name only in the __name__ label and write the metric value to a single shared field. This eliminates the field explosion problem entirely and more closely matches how Prometheus stores data internally. This direction is part of the broader effort to make Elasticsearch's metrics storage more efficient and more compatible with Prometheus conventions.

Availability

The Prometheus Remote Write endpoint is available now on Elasticsearch Serverless with no additional configuration.

For self-managed clusters, check out start-local to get up and running quickly.

If you run into issues or have feedback, open an issue on the Elasticsearch repository.
