<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Elastic Architecture Enhancements</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Wed, 22 Apr 2026 02:08:57 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Elastic Architecture Enhancements</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[How to cut Elasticsearch log storage costs with LogsDB]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elasticsearch-logsdb-index-mode-storage-savings</link>
            <guid isPermaLink="false">elasticsearch-logsdb-index-mode-storage-savings</guid>
            <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to enable LogsDB index mode in Elasticsearch and measure real storage savings. We compare a standard index against a LogsDB index using Apache logs and show how much storage you can reclaim.]]></description>
            <content:encoded><![CDATA[<p>LogsDB is a specialized Elasticsearch index mode that gives you full functionality at a fraction of the storage cost. Your Kibana dashboards, searches, alerts, and visualizations all continue to work exactly as before. No data is discarded. No queries need to be updated. No workflows break. It is one setting, and everything else gets cheaper.</p>
<p>In benchmarks, LogsDB brought a dataset from <strong>162.7 GB down to 39.4 GB</strong> — a <strong>76% reduction in storage</strong>. You can explore the full nightly benchmark results at <a href="https://elasticsearch-benchmarks.elastic.co/#tracks/logsdb/nightly/default/90d">elasticsearch-benchmarks.elastic.co</a>.</p>
<p>In this tutorial you'll reproduce the experiment yourself using Kibana Dev Tools and an Apache logs dataset. You'll create two identical indices, ingest the same documents into both, and measure the storage difference with the <code>_stats</code> API. By the end, you'll see a 44% reduction on your test data — and understand exactly why production numbers push even higher.</p>
<blockquote>
<p><strong>Already on Elasticsearch 9.2+?</strong> Any data stream with a <code>logs-</code> prefix already uses LogsDB by default. Jump to <a href="#what-about-your-existing-logs">What about your existing logs?</a> to verify your setup.</p>
</blockquote>
<blockquote>
<p><strong>Want the full picture?</strong> For the engineering history behind these savings — how Lucene doc values, synthetic <code>_source</code>, index sorting, and ZSTD were developed and stacked over twelve years — see <a href="https://www.elastic.co/observability-labs/blog/elasticsearch-logsdb-storage-evolution"><em>Elasticsearch over the years: how LogsDB cuts index size by up to 75%</em></a>.</p>
</blockquote>
<h2>Prerequisites</h2>
<ul>
<li>Elasticsearch 8.17+ cluster, Elastic Cloud deployment, or Serverless</li>
<li>Kibana with Dev Tools access</li>
<li>A source of logs to ingest (this tutorial uses Apache access logs collected by Elastic Agent)</li>
<li>Basic familiarity with running API calls in Kibana Dev Tools</li>
</ul>
<h2>How LogsDB saves storage</h2>
<p>LogsDB stacks three mechanisms to achieve its storage reduction:</p>
<ul>
<li><strong>Index sorting</strong> — documents are sorted by <code>host.name</code> then <code>@timestamp</code>, grouping similar log lines so compression codecs find far more repeated patterns. Sorting alone accounts for roughly 30% of the savings.</li>
<li><strong>ZSTD compression with delta/GCD/run-length encoding</strong> — <code>best_compression</code> switches from LZ4 to Zstandard and applies numeric codecs to each doc values column. The standard index in this tutorial uses LZ4, so part of what you're measuring is the full package LogsDB delivers automatically.</li>
<li><strong>Synthetic <code>_source</code></strong> — Elasticsearch skips storing the raw JSON blob entirely and reconstructs <code>_source</code> on demand from doc values, adding another 20–40% of savings on top.</li>
</ul>
<blockquote>
<p><strong>Synthetic <code>_source</code> trade-offs:</strong> Field ordering in returned documents may differ from the original, and some edge cases around multi-value array fields behave differently. For most log analytics workloads these differences are invisible, but check the <a href="#next-steps">synthetic <code>_source</code> documentation</a> before enabling it in latency-sensitive applications.</p>
</blockquote>
<p>For a deep dive into the architecture behind each mechanism, see <a href="https://www.elastic.co/observability-labs/blog/elasticsearch-logsdb-storage-evolution"><em>Elasticsearch over the years: how LogsDB cuts index size by up to 75%</em></a>.</p>
<p>Let's now walk through the steps you can take to enable LogsDB and measure the storage savings.</p>
<h2>Step 1: Collect logs with Elastic Agent</h2>
<p>The recommended way to ingest Apache logs into Elasticsearch is through Elastic Agent with the Apache integration. It handles collection, parsing, ECS field mapping, and routing automatically.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-index-mode-storage-savings/integration.png" alt="Elastic Agent Apache integration setup in Kibana" /></p>
<p>Browse all available integrations in the <a href="https://www.elastic.co/integrations">Elastic integrations catalog</a>.</p>
<p>Once the Agent is collecting logs and routing them to <code>logs-apache.access-*</code>, move to the next step.</p>
<h2>Step 2: Create the two indices</h2>
<p>All commands in this tutorial are run in <strong>Kibana Dev Tools</strong>.</p>
<p>Create one standard index and one LogsDB index with identical field mappings. The only difference is <code>&quot;index.mode&quot;: &quot;logsdb&quot;</code>.</p>
<p><strong>Standard index:</strong></p>
<pre><code class="language-json">PUT /apache-standard
{
  &quot;mappings&quot;: {
    &quot;properties&quot;: {
      &quot;@timestamp&quot;:                  { &quot;type&quot;: &quot;date&quot; },
      &quot;host.name&quot;:                   { &quot;type&quot;: &quot;keyword&quot; },
      &quot;http.request.method&quot;:         { &quot;type&quot;: &quot;keyword&quot; },
      &quot;url.path&quot;:                    { &quot;type&quot;: &quot;keyword&quot; },
      &quot;http.version&quot;:                { &quot;type&quot;: &quot;keyword&quot; },
      &quot;http.response.status_code&quot;:   { &quot;type&quot;: &quot;integer&quot; },
      &quot;http.response.bytes&quot;:         { &quot;type&quot;: &quot;integer&quot; },
      &quot;http.request.referrer&quot;:       { &quot;type&quot;: &quot;keyword&quot; },
      &quot;user_agent.original&quot;:         { &quot;type&quot;: &quot;keyword&quot; }
    }
  }
}
</code></pre>
<p><strong>LogsDB index:</strong></p>
<pre><code class="language-json">PUT /apache-logsdb
{
  &quot;settings&quot;: {
    &quot;index.mode&quot;: &quot;logsdb&quot;
  },
  &quot;mappings&quot;: {
    &quot;properties&quot;: {
      &quot;@timestamp&quot;:                  { &quot;type&quot;: &quot;date&quot; },
      &quot;host.name&quot;:                   { &quot;type&quot;: &quot;keyword&quot; },
      &quot;http.request.method&quot;:         { &quot;type&quot;: &quot;keyword&quot; },
      &quot;url.path&quot;:                    { &quot;type&quot;: &quot;keyword&quot; },
      &quot;http.version&quot;:                { &quot;type&quot;: &quot;keyword&quot; },
      &quot;http.response.status_code&quot;:   { &quot;type&quot;: &quot;integer&quot; },
      &quot;http.response.bytes&quot;:         { &quot;type&quot;: &quot;integer&quot; },
      &quot;http.request.referrer&quot;:       { &quot;type&quot;: &quot;keyword&quot; },
      &quot;user_agent.original&quot;:         { &quot;type&quot;: &quot;keyword&quot; }
    }
  }
}
</code></pre>
<p>That single <code>&quot;index.mode&quot;: &quot;logsdb&quot;</code> line activates all three storage mechanisms. Elasticsearch enables these additional settings behind the scenes — you don't set any of them manually:</p>
<pre><code class="language-json">{
  &quot;index.sort.field&quot;:              [&quot;host.name&quot;, &quot;@timestamp&quot;],
  &quot;index.sort.order&quot;:              [&quot;asc&quot;, &quot;desc&quot;],
  &quot;index.codec&quot;:                   &quot;best_compression&quot;,
  &quot;index.mapping.ignore_malformed&quot;: true,
  &quot;index.mapping.ignore_above&quot;:    8191
}
</code></pre>
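<p>You can verify that the derived settings took effect on the new index. The exact output varies by version, and some values only show up when defaults are included:</p>
<pre><code>GET /apache-logsdb/_settings?include_defaults=true&amp;filter_path=**.index.mode,**.index.sort.*,**.index.codec
</code></pre>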
<h2>Step 3: Reindex the logs</h2>
<p>Use the <code>_reindex</code> API to copy the same documents into both test indices:</p>
<pre><code class="language-json">POST /_reindex
{
  &quot;source&quot;: { &quot;index&quot;: &quot;logs-apache.access-*&quot; },
  &quot;dest&quot;:   { &quot;index&quot;: &quot;apache-standard&quot; }
}

POST /_reindex
{
  &quot;source&quot;: { &quot;index&quot;: &quot;logs-apache.access-*&quot; },
  &quot;dest&quot;:   { &quot;index&quot;: &quot;apache-logsdb&quot; }
}
</code></pre>
<p>Both indices now hold identical documents, so the storage comparison in the next step reflects only the index mode difference.</p>
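<p>A quick sanity check before measuring: both indices should report the same document count. If they don't, a reindex failed or is still running:</p>
<pre><code>GET /apache-standard/_count

GET /apache-logsdb/_count
</code></pre>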
<h2>Step 4: Force merge for a fair comparison</h2>
<p>Before measuring, force merge both indices to a single segment:</p>
<pre><code>POST /apache-standard/_forcemerge?max_num_segments=1

POST /apache-logsdb/_forcemerge?max_num_segments=1
</code></pre>
<p>These calls block until the merge finishes. Wait for both responses before continuing.</p>
<p><strong>Why this matters:</strong> Elasticsearch writes data into multiple Lucene segments before merging them in the background. Measuring mid-merge gives artificially inflated numbers because each segment is compressed independently. Forcing a single segment shows the real steady-state storage footprint you'd see in a mature production index.</p>
<blockquote>
<p><strong>Only run <code>_forcemerge</code> on indices that are no longer being written to.</strong> Force merging an index that is still receiving writes is resource-intensive and can impact ingestion performance. In production, you can use <a href="https://www.elastic.co/docs/manage-data/lifecycle/index-lifecycle-management">Index Lifecycle Management (ILM)</a> to automate force merges as part of the warm or cold phase, once an index is rolled over and no longer actively ingested into.</p>
</blockquote>
<h2>Step 5: Measure the difference</h2>
<pre><code>GET /apache-standard/_stats?filter_path=indices.*.primaries.store

GET /apache-logsdb/_stats?filter_path=indices.*.primaries.store
</code></pre>
<p>The <code>filter_path</code> parameter keeps the response focused. Look for <code>primaries.store.size_in_bytes</code> in each response.</p>
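<p>If you want to turn the two byte counts into a percentage, the arithmetic is a one-liner. Here it is as a small Python sketch, using the sizes from our run (your numbers will differ):</p>
<pre><code class="language-python">def reduction_pct(standard_bytes: int, logsdb_bytes: int) -> float:
    """Percentage of storage saved by the LogsDB index."""
    return (1 - logsdb_bytes / standard_bytes) * 100

# Sizes from our test run: 15.37 MB standard vs 8.6 MB LogsDB, in bytes.
standard = int(15.37 * 1024**2)
logsdb = int(8.6 * 1024**2)
print(f"{reduction_pct(standard, logsdb):.0f}% reduction")  # 44% reduction
</code></pre>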
<p>In our test with Apache log records, the results were:</p>
<table>
<thead>
<tr>
<th>Index</th>
<th>Documents</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>apache-standard</td>
<td>111,818</td>
<td>15.37 MB</td>
</tr>
<tr>
<td>apache-logsdb</td>
<td>111,818</td>
<td>8.6 MB</td>
</tr>
<tr>
<td><strong>Reduction</strong></td>
<td></td>
<td><strong>44%</strong></td>
</tr>
</tbody>
</table>
<p>To put this in perspective: at 1 TB of log data, a 44% reduction brings that down to around 560 GB. That's roughly 440 GB saved without any changes to your queries. At production scale with billions of documents and synthetic <code>_source</code> enabled, savings push to 76%, taking 162.7 GB down to 39.4 GB in our benchmark.</p>
<h2>Visualize in Kibana</h2>
<p>To see the storage difference visually, open Kibana and go to <strong>Management → Stack Management → Index Management</strong>. You'll see both indices listed with their current sizes side by side.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-index-mode-storage-savings/index-stats.png" alt="Kibana Index Management showing storage comparison between standard and LogsDB indices" /></p>
<blockquote>
<p><strong>Why Kibana shows larger numbers than <code>_stats</code>:</strong> Kibana Index Management displays the total index size including all replica shards. The <code>_stats</code> query above uses <code>primaries</code> to report primary shards only. The ratio between the two indices remains the same either way.</p>
</blockquote>
<h2>What about your existing logs?</h2>
<h3>Elasticsearch 9.2+ (already enabled by default)</h3>
<p>Since 9.2, any data stream matching the <code>logs-*</code> naming pattern automatically uses LogsDB. You're likely already saving storage without any configuration change.</p>
<p>Verify your existing data streams:</p>
<pre><code>GET /.ds-logs-*/_settings?filter_path=*.settings.index.mode
</code></pre>
<p>If you see <code>&quot;index.mode&quot;: &quot;logsdb&quot;</code> in the responses, you're already getting the savings.</p>
<h3>Elasticsearch 8.x or 9.0–9.1 (enable per data stream via index template)</h3>
<p>For earlier versions, enable LogsDB on a data stream by updating its index template. This affects all new indices created from that template — existing indices are not changed, so the transition is safe and gradual.</p>
<p><strong>Option A — Update an existing template:</strong></p>
<pre><code class="language-json">PUT _index_template/logs-myapp-template
{
  &quot;index_patterns&quot;: [&quot;logs-myapp-*&quot;],
  &quot;data_stream&quot;: {},
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;logsdb&quot;
    }
  },
  &quot;priority&quot;: 200
}
</code></pre>
<p><strong>Option B — Check and patch an existing integration template:</strong></p>
<p>First, find the template managing your data stream:</p>
<pre><code>GET _index_template/logs-apache*
</code></pre>
<p>Then add the <code>index.mode</code> setting to the <code>template.settings</code> block using a <code>PUT _index_template/&lt;name&gt;</code> call with the full template body including your addition.</p>
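<p>As a sketch, the patched template keeps everything the <code>GET</code> returned and adds the one setting. The template name and body below are illustrative; use the actual template returned for your data stream:</p>
<pre><code class="language-json">PUT _index_template/logs-apache.access
{
  &quot;index_patterns&quot;: [&quot;logs-apache.access-*&quot;],
  &quot;data_stream&quot;: {},
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;logsdb&quot;
    }
    // plus the template's existing mappings and settings, unchanged
  },
  &quot;priority&quot;: 200
}
</code></pre>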
<p>After updating the template, the next index rollover will use LogsDB. Trigger a rollover immediately if you don't want to wait:</p>
<pre><code>POST /logs-myapp-default/_rollover
</code></pre>
<p><strong>Upgrading from 8.x to 9.0+:</strong> Existing data streams are not changed automatically. Only new rollovers will use LogsDB. There is no data loss and no reindexing required — the savings accumulate as new indices roll over.</p>
<h2>What about query performance?</h2>
<p>LogsDB does not significantly impact query performance for typical log analytics workloads. The index sorting by <code>host.name</code> and <code>@timestamp</code> can actually <em>improve</em> range query and aggregation performance on those fields, since matching documents are stored adjacently. Queries that don't filter on those fields perform comparably to a standard index.</p>
<p>For indexing throughput data across releases, see the <a href="https://www.elastic.co/observability-labs/blog/elasticsearch-logsdb-storage-evolution#performance-not-just-storage">performance section</a> of the companion article.</p>
<h2>Conclusion</h2>
<p>LogsDB activates with a single <code>&quot;index.mode&quot;: &quot;logsdb&quot;</code> setting and delivers measurable storage savings immediately: 44% in our hands-on test, and 76% (162.7 GB → 39.4 GB) in production benchmarks with synthetic <code>_source</code>. On Elasticsearch 9.2+, <code>logs-*</code> data streams already use LogsDB by default. For 8.x or earlier 9.x clusters, a one-line index template change enables it on your next rollover with no data loss and no reindexing required.</p>
<h2>Next steps</h2>
<ul>
<li><a href="https://www.elastic.co/docs/reference/elasticsearch/index-settings/logsdb">LogsDB index mode documentation</a></li>
<li><a href="https://www.elastic.co/docs/reference/elasticsearch/mapping/synthetic-source">Synthetic <code>_source</code> documentation and limitations</a></li>
<li><a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/logs-data-stream">Configuring a logs data stream</a></li>
<li><a href="https://www.elastic.co/blog/logsdb-index-mode-generally-available">LogsDB GA announcement</a></li>
<li><a href="https://www.elastic.co/blog/elasticsearch-logsdb-tsds-benchmarks">LogsDB and TSDS performance benchmarks</a></li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-index-mode-storage-savings/header.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Elasticsearch over the years — how LogsDB cuts index size by up to 75% at no throughput cost]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elasticsearch-logsdb-storage-evolution</link>
            <guid isPermaLink="false">elasticsearch-logsdb-storage-evolution</guid>
            <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[By default, Elasticsearch is optimized for retrieval, not storage. LogsDB changes that. Here's the layered architecture behind a 77% index size reduction.]]></description>
            <content:encoded><![CDATA[<p>Elasticsearch was built as a search engine. That heritage has a cost for log storage: every event fans out to multiple on-disk structures, each optimized for retrieval rather than compression. LogsDB changes that. On our nightly benchmark, Enterprise mode produces a 37.5 GB index from the same data that takes 161.9 GB without LogsDB — a 77% reduction from a single setting.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/storage-breakdown-v3-bold@2x.png" alt="Standard vs LogsDB storage breakdown" /></p>
</div>
<h2>The write overhead</h2>
<p>Lucene, the library underneath, keeps multiple structures for every indexed document:</p>
<ul>
<li>The <strong>inverted index</strong> maps terms to documents. This is what makes text search fast.</li>
<li><strong><code>_source</code></strong> stores the original JSON blob, returned when you fetch a document.</li>
<li><strong>Doc values</strong> store field values in columns for sorting and aggregation.</li>
<li><strong>Points / BKD trees</strong> index numeric and date fields for range queries.</li>
</ul>
<p>The inverted index earns its keep: it's what lets you search a billion log lines by keyword in milliseconds, and there's no cheaper way to build that capability. <code>_source</code> exists to give you back exactly what you indexed: search results and <code>GET</code> requests return this blob directly. The problem is that it stores the full event even though the same field values are already available through doc values and the other structures.</p>
<p>Take a log event with fields like <code>host.name</code>, <code>@timestamp</code>, <code>http.response.status_code</code>, and <code>duration_ms</code>. The entire event is serialized as JSON in <code>_source</code>. The same field values are also written into doc values columns, indexed into the inverted index, and stored in BKD trees for range queries. Same data, multiple structures, each with its own on-disk footprint.</p>
<p>For a search engine where you need fast retrieval across all dimensions, that overhead is a reasonable tradeoff. For logs, where you rarely need the raw JSON and almost never do relevance-ranked search, much of it is pure waste.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/dual-storage-bold@2x.png" alt="One incoming log event fans out to four on-disk structures" /></p>
<p><em>One write, four on-disk structures: <code>_source</code> (the raw JSON blob), the inverted index, doc values columns, and BKD / points trees for numeric range queries. The same field values end up in multiple places.</em></p>
</div>
<h2>Why columnar storage matters for compression</h2>
<p>Doc values are the key to everything LogsDB does. Unlike <code>_source</code>, which stores entire documents as blobs, doc values store each field as a separate column across all documents in a Lucene segment.</p>
<p>Picture a segment with a million log events. The <code>_source</code> representation is a million JSON blobs, one per event, each containing all fields jumbled together. The doc values representation is a set of columns: one column of a million timestamps, one column of a million host names, one column of a million status codes, and so on.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/doc-values-columns-bold@2x.png" alt="Row-oriented vs column-oriented storage" /></p>
<p><em>Row-oriented <code>_source</code> keeps all fields for each document in one blob — doc0 through doc5 each carry <code>host.name</code>, <code>@timestamp</code>, <code>status</code>, <code>duration_ms</code>, and more jumbled together. Column-oriented doc values restructure the same data so all <code>host.name</code> values sit in one column, all timestamps in another, all status codes in another. Compression codecs can then run on each contiguous column independently.</em></p>
</div>
<p>That columnar layout is what makes per-column compression possible. When all values of <code>http.response.status_code</code> sit in a contiguous column, Lucene can apply codecs that exploit patterns in the sequence.</p>
<p>Delta encoding stores differences between adjacent values instead of full values. GCD encoding finds a common factor and divides everything down. Run-length encoding collapses repeats. Lucene picks the codec per segment and re-evaluates when segments merge.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/numeric-codec-pipeline-bold@2x.png" alt="Numeric codec pipeline: RAW → DELTA → GCD → BIT-PACK" /></p>
<p><em>Four sorted <code>@timestamp</code> values from the same host, compressed in four stages. RAW: four 32-bit integers, 128 bits total. DELTA: store differences instead of full values — base stays, deltas +100, +200, +300 take 59 bits. GCD: divide out the common factor of 100, leaving 1, 2, 3 at 39 bits. BIT-PACK: pack those three small integers into contiguous bit storage, 9 bits freed.</em></p>
</div>
<p>But here's the catch: these codecs only work well when adjacent documents have correlated values. Consider the <code>@timestamp</code> column.</p>
<p>If logs arrive from dozens of hosts interleaved randomly, the timestamps in the column jump around. The delta between adjacent values might be +3 seconds, then -47 seconds, then +120 seconds. Delta encoding can't do much with that.</p>
<p>Now consider what happens if you sort by <code>host.name</code> and <code>@timestamp</code> before writing to the segment. All logs from host-A land in a contiguous run, followed by all logs from host-B, and so on. Within each host's run, the timestamps are monotonically increasing and the deltas are predictable.</p>
<p>Four timestamps from the same host might look like 1706745600, +100s, +200s, +300s. Delta encoding shrinks those to a base value plus three small integers.</p>
<p>GCD encoding finds that 100, 200, 300 are all divisible by 100 and stores 1, 2, 3 instead. Bit-packing then fits those three values into a handful of bits. The same pattern applies to fields like <code>host.name</code>, <code>service.name</code>, or <code>http.response.status_code</code>: within a sorted run, long stretches of identical values collapse to near nothing under run-length encoding.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/index-sorting-bold@2x.png" alt="Index sorting: arrival order → sorted by host.name → after RLE" /></p>
<p><em>Five hosts — api-01, api-02, db-01, web-01, web-02 — scattered randomly in arrival order (left). Sorting by <code>host.name</code> groups them into five contiguous blocks of eight (center). Run-length encoding collapses each block to a single (value, count) pair — 5 pairs stored instead of 40, the remaining slots freed (right).</em></p>
</div>
<p>Yet Elasticsearch historically never sorted by default: documents landed on disk in arrival order, compressed with the general-purpose LZ4 codec (DEFLATE under <code>best_compression</code>). We left a lot on the table.</p>
<h2>How we got here: 2012–2026</h2>
<p>Not all of the individual techniques in LogsDB were designed for logs. They were built over twelve years to solve different problems, and LogsDB is what happens when you stack them.</p>
<p><strong>The foundation (2012–2017).</strong> Lucene 4.0 introduced doc values in 2012. By Elasticsearch 5.0 in 2016, they were on by default for all keyword and numeric fields. Lucene 7.0 added sparse doc values, so fields that only appear in some documents don't waste space on every document in the segment. That fixed a significant force-merge bloat problem (up to 10× on sparse fields) and set up the storage model everything else depends on.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/sparse-doc-values-bold@2x.png" alt="Dense vs sparse doc values encoding" /></p>
<p><em>Dense encoding reserves an 8-byte slot per document regardless of presence. Sparse encoding stores only documents that have a value at 12 bytes each (value + doc ID). For <code>error_code</code> with 2 of 16 docs populated (12% fill), sparse is 81% smaller: 24 B vs 128 B. For <code>request_path</code> at 88% fill, sparse is larger: 168 B vs 128 B. Lucene picks per field; sparse wins below ~67% fill.</em></p>
</div>
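<p>The break-even arithmetic from the figure is worth making explicit. Using the figure's simplified cost model (8 bytes per dense slot, 12 bytes per sparse entry; real Lucene encodings add headers and compression on top):</p>
<pre><code class="language-python">def dense_bytes(num_docs: int) -> int:
    return 8 * num_docs        # one 8-byte slot per document, value present or not

def sparse_bytes(num_values: int) -> int:
    return 12 * num_values     # 8-byte value + 4-byte doc ID per present value

# error_code: 2 of 16 docs populated (12% fill) -> sparse is far smaller
print(dense_bytes(16), sparse_bytes(2))    # 128 24
# request_path: 14 of 16 docs populated (88% fill) -> dense wins
print(dense_bytes(16), sparse_bytes(14))   # 128 168
# break-even fill ratio: 8/12, i.e. sparse wins below ~67% fill
print(round(8 / 12, 2))                    # 0.67
</code></pre>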
<p><strong>Incremental wins (2020–2021).</strong> Two smaller changes targeted observability workloads. Dictionary-based stored fields compression deduplicated repetitive string metadata for about a 10% win.</p>
<p>The <code>match_only_text</code> field type dropped term frequencies and positions from the inverted index. Term frequencies are what BM25 uses to score documents by relevance — how often a term appears in a document relative to the rest of the corpus. For log search that signal is meaningless: you don't care whether &quot;timeout&quot; appeared twice or seven times in a log line, you just want to find it. Positions are similar: they're stored so Elasticsearch can do exact phrase matching, but the position data is expensive and phrase queries on logs are rare enough that the tradeoff is worth it. When you do run a phrase query on a <code>match_only_text</code> field, it still works — it just falls back to a slower path that rescores candidates rather than using stored positions directly.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/match-only-text-bold@2x.png" alt="text vs match_only_text inverted index storage" /></p>
<p><em><code>text</code> stores each term with its frequency and every position it appears at. <code>match_only_text</code> keeps only the doc IDs — enough to find the document, nothing more. The <code>timeout</code> term appears twice in this message (positions 1 and 4), which is exactly the kind of data that gets dropped.</em></p>
</div>
<p>Dropping frequencies and positions cuts the inverted index for a text field by roughly 40%. The overall index impact in 2021 was only ~10%, which sounds like a poor return on a 40% field-level reduction. The reason is where storage was going at the time: <code>_source</code> was stored in full for every document as a raw JSON blob, doc values were uncompressed and unsorted, and nothing was using ZSTD. The <code>message</code> field's inverted index was a small slice of a much larger, poorly-compressed whole. As the next five years of work addressed those other structures, the same 40% field-level savings became a meaningful fraction of a much smaller total.</p>
<p>Neither change was decisive on its own, but they established that log-specific storage optimization was worth pursuing.</p>
<p><strong>The TSDB turning point (April 2023).</strong> This is where the story really starts. We shipped synthetic <code>_source</code> and index sorting for time series metrics in Elasticsearch 8.7.</p>
<p>Synthetic source changes the write-and-read contract. At write time, we skip storing the raw JSON blob entirely. At read time, when a query needs to return the original document, we reconstruct it by reading each field's value out of doc values and stored fields and assembling them back into JSON. The result is functionally equivalent to the original <code>_source</code> (with minor differences like field ordering), but we never stored the blob.</p>
<p>Index sorting groups documents by dimension fields and timestamp before writing to disk. Together, synthetic source and index sorting cut metrics storage by up to 70%.</p>
<p>That result told us something important: the same architecture could work for logs.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/synthetic-source-bold@2x.png" alt="Standard _source vs synthetic _source" /></p>
<p><em>Without LogsDB, Elasticsearch writes every log event twice: once as a raw <code>_source</code> blob on disk, once into doc values columns. LogsDB skips the blob entirely. At read time, a <code>GET &lt;index&gt;/_doc/1</code> request gathers field values from doc values and assembles the document on the fly.</em></p>
</div>
<p><strong>The TSDB codec (2024).</strong> In 8.13 and 8.14, we built a custom doc values codec with run-length encoding optimized for sorted consecutive values, PFOR-delta encoding, and cyclic ordinal encoding for multi-valued dimensions. The numbers were striking: <code>kubernetes.pod.name</code> doc values dropped from 110 MB to 7.25 MB in one benchmark. We extended coverage to all numeric and keyword types including <code>ip</code>, <code>scaled_float</code>, and <code>unsigned_long</code>.</p>
<p><strong>LogsDB Tech Preview (August 2024).</strong> In <a href="https://github.com/elastic/elasticsearch/pull/108896">8.15</a>, we combined everything into <code>index.mode: logsdb</code>: host-first sorting, synthetic <code>_source</code>, ZSTD compression, and the TSDB numeric codecs. One decision mattered more than expected: sort order. Sorting by <code>host.name</code> first, then <code>@timestamp</code>, delivers up to ~40% storage reduction. Sorting by timestamp first gives ≤10%. The host-first ordering co-locates documents that share field values, which is exactly what the numeric codecs need.</p>
<p><strong>ZSTD and GA (November–December 2024).</strong> In <a href="https://github.com/elastic/elasticsearch/pull/112665">8.16</a>, we switched <code>best_compression</code> from DEFLATE to ZSTD permanently (level 3, blocks up to 2,048 documents or 240 kB, native bindings via Panama FFI on JDK 21+). ZSTD gave us ~12% smaller stored fields and ~14% higher indexing throughput at the same time, which almost never happens. LogsDB went GA in 8.17.</p>
<p>At GA, we claimed up to 65% storage reduction.</p>
<p><strong>Routing and recovery (April 2025).</strong> In 8.18, <a href="https://github.com/elastic/elasticsearch/pull/116687"><code>route_on_sort_fields</code></a> started routing documents to shards by sort field values instead of <code>_id</code>. Without this optimization, Elasticsearch hashes the <code>_id</code> to pick a shard, so logs from the same host scatter across all shards. With routing on sort fields, logs with similar <code>host.name</code> values land on the same shard. This co-locates similar documents at the shard level, not just within segments, adding ~20% storage reduction at a 1–4% ingest penalty. Routing on sort fields requires auto-generated <code>_id</code>.</p>
<div align="center">
<p><img src="/assets/images/elasticsearch-logsdb-storage-evolution/shard-routing-bold@2x.png" alt="Shard routing: standard, routed, routed + sorted" /></p>
<p><em>Data stream <code>.ds-logs-nginx-default-00001</code> with six hosts across three shards. STANDARD (hashed by <code>_id</code>): all host colors scattered randomly. ROUTED (<code>route_on_sort_fields</code>): same-host logs land on the same shard, but remain in arrival order within it. ROUTED + SORTED (host-first sort): each shard contains contiguous blocks of a single host — the combination that lets numeric codecs and RLE reach their full potential.</em></p>
</div>
<p>We also <a href="https://github.com/elastic/elasticsearch/pull/119110">switched peer recovery to synthetic source reconstruction</a>, eliminating the duplicate <code>_recovery_source</code> blob. In <a href="https://github.com/elastic/elasticsearch/pull/121049">9.0</a>, <code>logs-*-*</code> indices default to LogsDB.</p>
<div align="center">
<img src="/assets/images/elasticsearch-logsdb-storage-evolution/recovery-source-bold@2x.png" alt="Index size written: _recovery_source eliminated"/>
<p><em>Nightly synthetic source benchmark, December 2024. Index size written drops 39% — from ~279 GB to ~171 GB — the day peer recovery switches from copying the raw <code>_recovery_source</code> blob to reconstructing documents from doc values.</em></p>
</div>
<p><strong>Merge and recovery overhaul: 9.1 (July 2025).</strong> We fully eliminated the recovery source. Peer recovery uses batched synthetic reconstruction, cutting write I/O by ~50% and boosting median indexing throughput ~19% over the 8.17 baseline. We replaced up to four separate doc values merge passes with a single pass, cutting background merge CPU by up to 40%. And we swapped <code>_seq_no</code>'s BKD tree for Lucene doc value skippers, halving <code>_seq_no</code> storage.</p>
<p><strong>pattern_text and Failure Store: 9.2–9.3 (October 2025–February 2026).</strong> In <a href="https://github.com/elastic/elasticsearch/pull/124323">9.2</a>, we shipped <code>pattern_text</code> as a Tech Preview: a new field type that decomposes log messages into static templates and dynamic variable parts. A log line like <code>Session opened for user alice from 10.0.1.42 via TLS</code> gets split into the template <code>Session opened for user {} from {} via TLS</code> (stored once, as a template ID) and the variables <code>alice</code>, <code>10.0.1.42</code> (stored per document). For logs with high template repetition, this cuts message field storage by up to 50%. A companion <code>template_id</code> sub-field lets you sort by template, and the LogsDB setting <code>index.logsdb.default_sort_on_message_template</code> enables this automatically. <code>pattern_text</code> <a href="https://github.com/elastic/elasticsearch/pull/135370">went GA in 9.3</a>.</p>
<div align="center">
<img src="/assets/images/elasticsearch-logsdb-storage-evolution/pattern-text-bold@2x.png" alt="TEXT vs PATTERN_TEXT field type"/>
<p><em>TEXT stores each log message as a full string per document — eight copies of near-identical blobs. PATTERN_TEXT decomposes them: the shared template <code>Session opened for user {} from {} via TLS</code> is stored once with ID T0, and only the variable columns (<code>user</code>, <code>ip</code>) are stored per document — alice/10.0.1.42, bob/10.0.1.87, carol/10.0.2.11, and so on.</em></p>
</div>
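<p>Opting in is a mapping change. A minimal sketch, with a hypothetical template name and a <code>message</code> field switched to the new type:</p>
<pre><code class="language-json">PUT _index_template/app-logs-template
{
  &quot;index_patterns&quot;: [&quot;logs-app-*&quot;],
  &quot;data_stream&quot;: {},
  &quot;priority&quot;: 200,
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;logsdb&quot;
    },
    &quot;mappings&quot;: {
      &quot;properties&quot;: {
        &quot;message&quot;: { &quot;type&quot;: &quot;pattern_text&quot; }
      }
    }
  }
}
</code></pre>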
<p><code>pattern_text</code> does come with an indexing CPU cost: decomposing each message into template and variables takes more work at write time than storing a raw string. Whether that tradeoff makes sense depends on your dataset and your priorities.</p>
<p>If your log messages follow highly repetitive patterns (structured application logs, Kubernetes events, access logs), the storage wins are large and the CPU overhead is bounded. If your messages are free-form or low-repetition, the compression gains shrink while the CPU cost stays roughly the same.</p>
<p>For data you keep for months or years, the cumulative storage reduction usually makes it worthwhile. For high-cardinality, rapidly changing messages where storage isn't the constraint, it may not be.</p>
<p>9.3 also brought compression for binary doc values, making <code>wildcard</code> field types significantly more storage-efficient. Internally, wildcard fields store an inverted index of trigrams in a binary doc values column; that column is now compressed with Zstandard instead of being stored raw. In one benchmark, a URL field dropped from 2.92 GB to 1.12 GB, a reduction of more than 60%. If you use <code>wildcard</code> fields heavily, the gain is automatic, with no mapping changes needed.</p>
<p>Also in 9.3, skip lists for <code>@timestamp</code> and <code>host.name</code> became available as an opt-in for LogsDB. Skip lists let Elasticsearch jump ahead in a doc values column without reading every entry, which speeds up time-range queries on large segments. Other index modes have skip lists disabled by default; in LogsDB you can enable them selectively for the fields you range-query most.</p>
<p>Also in 9.3, the <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/failure-store">Failure Store</a> <a href="https://github.com/elastic/elasticsearch/pull/131261">became enabled by default</a> for <code>logs-*-*</code> data streams. Failed documents (mapping conflicts, ingest pipeline errors) now land in dedicated <code>::failures</code> indices instead of being rejected, which means LogsDB's strict synthetic source requirements are less likely to cause silent data loss during migration.</p>
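<p>Failed documents can then be inspected by querying the data stream's failure store with the <code>::failures</code> selector. A sketch, with a hypothetical data stream name:</p>
<pre><code class="language-json">GET logs-app-default::failures/_search
{
  &quot;query&quot;: {
    &quot;match_all&quot;: {}
  }
}
</code></pre>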
<h2>Performance, not just storage</h2>
<p>LogsDB started as a storage optimization, and the early releases came with a throughput cost — sorting, synthetic source reconstruction, and ZSTD all add work at write time. Over two years of releases, we clawed that back. Indexing throughput is now on par with what users had before enabling LogsDB. You get the storage reduction without giving up the ingest rate you were used to.</p>
<div align="center">
<img src="/assets/images/elasticsearch-logsdb-storage-evolution/performance-over-time-bold@2x.png" alt="LogsDB throughput and storage on disk over time"/>
<p><em>Throughput (teal) has climbed from ~25k to ~35k docs/s since the Tech Preview. Storage on disk (blue) has dropped from ~65 GB to ~36 GB on the same benchmark dataset. Both curves move in the right direction, driven by the same layered releases: ZSTD in 8.16, routing optimization in 8.18, the merge and recovery overhaul in 9.1. Live numbers at <a href="https://elasticsearch-benchmarks.elastic.co/#tracks/logsdb/nightly/default/90d">elasticsearch-benchmarks.elastic.co</a>.</em></p>
</div>
<p>The two trends compound each other. Less storage means fewer segments to merge, which frees CPU for indexing. Reconstructing documents from synthetic source is cheaper than storing and replicating the raw blob. Each release that shrank the index also reduced background I/O, which fed back into throughput.</p>
<p>The practical result: if you were running standard Elasticsearch for log ingestion two years ago, the throughput you had then is roughly what LogsDB delivers now — with a 50–75% smaller index alongside it.</p>
<h2>How to enable it</h2>
<p>As of 9.0, <code>logs-*-*</code> data streams default to LogsDB automatically. If your data streams match that pattern, you're already using it.</p>
<blockquote>
<p><strong>Want a hands-on walkthrough?</strong> <a href="https://www.elastic.co/observability-labs/blog/elasticsearch-logsdb-index-mode-storage-savings"><em>Cut Elasticsearch log storage costs by 76% with LogsDB</em></a> walks through creating two indices, reindexing, and measuring the difference with the <code>_stats</code> API — including version-specific enable instructions for 8.x clusters.</p>
</blockquote>
<p>For other index patterns, set it in your template:</p>
<pre><code class="language-json">PUT _index_template/logs-template
{
  &quot;index_patterns&quot;: [&quot;logs-*&quot;],
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;logsdb&quot;
    }
  }
}
</code></pre>
<p>Synthetic <code>_source</code> turns on automatically with <code>index.mode: logsdb</code>.</p>
<p>For the routing optimization (8.18+), add one more setting:</p>
<pre><code class="language-json">PUT _index_template/logs-template
{
  &quot;index_patterns&quot;: [&quot;logs-*&quot;],
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;logsdb&quot;,
      &quot;index.logsdb.route_on_sort_fields&quot;: true
    }
  }
}
</code></pre>
<p>This routes documents to shards by sort field values instead of <code>_id</code>, adding ~20% storage reduction at a 1–4% ingestion penalty. It requires auto-generated <code>_id</code> values and at least two sort fields beyond <code>@timestamp</code>.</p>
<p>Switching an existing index to LogsDB requires a reindex. So does rolling back. There's no in-place conversion, so try it on new data streams first.</p>
<p>Storage improves further as segments merge — freshly written data compresses well, but merged segments compress even better.</p>
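<p>If you want to see the fully merged size sooner on data that is no longer being written to, you can trigger a force merge. A sketch with a hypothetical data stream name; only force-merge indices that no longer receive writes:</p>
<pre><code class="language-json">POST logs-app-default/_forcemerge?max_num_segments=1
</code></pre>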
<h2>What's next</h2>
<p>Elasticsearch still carries some structural overhead from its search engine roots. <code>_id</code> and <code>_seq_no</code> are two examples: both consume meaningful disk space (on small documents they can account for more than half the index size), but neither is essential for log analytics workloads.</p>
<p>We've already taken the first step for TSDB: <a href="https://github.com/elastic/elasticsearch/pull/144026">PR #144026</a> eliminated stored <code>_id</code> bytes from TSDB indices by reconstructing the field on the fly from doc values, the same approach synthetic <code>_source</code> uses. We're exploring the same direction for LogsDB.</p>
<p><strong>9.4 and beyond.</strong> The architecture still has room to improve, and we're on it.</p>
<p>For the full reference, see the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/logs-data-stream.html">logs data stream documentation</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elasticsearch-logsdb-storage-evolution/elasticsearch-logsdb-storage-evolution.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[How to use Elasticsearch and Time Series Data Streams for observability metrics]]></title>
            <link>https://www.elastic.co/observability-labs/blog/time-series-data-streams-observability-metrics</link>
            <guid isPermaLink="false">time-series-data-streams-observability-metrics</guid>
            <pubDate>Thu, 04 May 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[With Time Series Data Streams (TSDS), Elasticsearch introduces optimized storage for metrics time series. Check out how we use it for Elastic Observability.]]></description>
            <content:encoded><![CDATA[<p>Elasticsearch is used for a wide variety of data types — one of these is metrics. With the introduction of Metricbeat many years ago and later our APM Agents, the metric use case has become more popular. Over the years, Elasticsearch has made many improvements on how to handle things like metrics aggregations and sparse documents. At the same time, <a href="https://www.elastic.co/guide/en/kibana/current/tsvb.html">TSVB visualizations</a> were introduced to make visualizing metrics easier. One concept that was missing that exists for most other metric solutions is the concept of time series with dimensions.</p>
<p>In mid-2021, the Elasticsearch team <a href="https://github.com/elastic/elasticsearch/issues/74660">embarked</a> on making Elasticsearch a much better fit for metrics. The team created <a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/tsds.html">Time Series Data Streams (TSDS)</a>, which were released as generally available (GA) in 8.7.</p>
<p>This blog post dives into how TSDS works and how we use it in Elastic Observability, as well as how you can use it for your own metrics.</p>
<h2>A quick introduction to TSDS</h2>
<p><a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/tsds.html">Time Series Data Streams (TSDS)</a> are built on top of data streams in Elasticsearch that are optimized for time series. To create a data stream for metrics, an additional setting on the data stream is needed. As we are using data streams, first an Index Template has to be created:</p>
<pre><code class="language-json">PUT _index_template/metrics-laptop
{
  &quot;index_patterns&quot;: [
    &quot;metrics-laptop-*&quot;
  ],
  &quot;data_stream&quot;: {},
  &quot;priority&quot;: 200,
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;time_series&quot;
    },
    &quot;mappings&quot;: {
      &quot;properties&quot;: {
        &quot;host.name&quot;: {
          &quot;type&quot;: &quot;keyword&quot;,
          &quot;time_series_dimension&quot;: true
        },
        &quot;packages.sent&quot;: {
          &quot;type&quot;: &quot;integer&quot;,
          &quot;time_series_metric&quot;: &quot;counter&quot;
        },
        &quot;memory.usage&quot;: {
          &quot;type&quot;: &quot;double&quot;,
          &quot;time_series_metric&quot;: &quot;gauge&quot;
        }
      }
    }
  }
}
</code></pre>
<p>Let's have a closer look at this template. At the top, we set the index pattern to metrics-laptop-*. Any pattern can be selected, but it is recommended to use the <a href="https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme">data stream naming scheme</a> for all your metrics. The template then sets &quot;index.mode&quot;: &quot;time_series&quot; and, with &quot;data_stream&quot;: {}, ensures that a data stream is created.</p>
<h3>Dimensions</h3>
<p>Each time series data stream needs at least one dimension. In the example above, host.name is set as a dimension field with &quot;time_series_dimension&quot;: true. You can have up to 16 dimensions by default. Not every dimension must show up in each document. The dimensions define the time series. The general rule is to pick fields as dimensions that uniquely identify your time series. Often this is a unique description of the host/container, but for some metrics like disk metrics, the disk id is needed in addition. If you are curious about default recommended dimensions, have a look at this <a href="https://github.com/elastic/ecs/pull/2172">ECS contribution</a> with dimension properties.</p>
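<p>For a disk-metrics case like the one just mentioned, the template would declare two dimension fields. A hypothetical sketch following the same pattern as the earlier example (all names invented for illustration):</p>
<pre><code class="language-json">PUT _index_template/metrics-disk
{
  &quot;index_patterns&quot;: [&quot;metrics-disk-*&quot;],
  &quot;data_stream&quot;: {},
  &quot;priority&quot;: 200,
  &quot;template&quot;: {
    &quot;settings&quot;: {
      &quot;index.mode&quot;: &quot;time_series&quot;
    },
    &quot;mappings&quot;: {
      &quot;properties&quot;: {
        &quot;host.name&quot;: { &quot;type&quot;: &quot;keyword&quot;, &quot;time_series_dimension&quot;: true },
        &quot;disk.id&quot;: { &quot;type&quot;: &quot;keyword&quot;, &quot;time_series_dimension&quot;: true },
        &quot;disk.read.bytes&quot;: { &quot;type&quot;: &quot;long&quot;, &quot;time_series_metric&quot;: &quot;counter&quot; }
      }
    }
  }
}
</code></pre>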
<h2>Reduced storage and increased query speed</h2>
<p>At this point, you already have a functioning time series data stream. Setting the index mode to time series automatically turns on <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source">synthetic source</a>. By default, Elasticsearch typically duplicates data three times:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems">row-oriented storage</a> (_source field)</li>
<li><a href="https://en.wikipedia.org/wiki/Column-oriented_DBMS#Column-oriented_systems">column-oriented storage</a> (doc_values: true for aggregations)</li>
<li>indices (index: true for filtering and search)</li>
</ul>
<p>With synthetic source, the _source field is not persisted; instead, it is reconstructed from the doc values. Especially in the metrics use case, there is little benefit to keeping the original source.</p>
<p>Not storing it means a significant reduction in storage. Time series data streams sort the data based on the dimensions and the time stamp. This means data that is usually queried together is stored together, which speeds up query times. It also means that the data points for a single time series are stored alongside each other on disk. This enables further compression of the data as the rate at which a counter increases is often relatively constant.</p>
<h2>Metric types</h2>
<p>But to benefit from all the advantages of TSDS, the field properties of the metric fields must be extended with <code>time_series_metric: {type}</code>. Several <a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/tsds.html#time-series-metric">types are supported</a>; as an example, gauge and counter were used above. Telling Elasticsearch about the metric type allows it to offer more optimized queries for the different types and to reduce storage usage further.</p>
<p>When you create your own templates for data streams under the <a href="https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme">data stream naming scheme</a>, it is important that you set &quot;priority&quot;: 200 or higher, as otherwise the built-in default template will apply.</p>
<h2>Ingest a document</h2>
<p>Ingesting a document into a TSDS isn't in any way different from ingesting documents into Elasticsearch. You can use the following commands in Dev Tools to add a document, and then search for it and also check out the mappings. Note: You have to adjust the @timestamp field to be close to your current date and time.</p>
<pre><code class="language-bash"># Add a document with `host.name` as the dimension
POST metrics-laptop-default/_doc
{
  # This timestamp needs to be adjusted to be current
  &quot;@timestamp&quot;: &quot;2023-03-30T12:26:23+00:00&quot;,
  &quot;host.name&quot;: &quot;ruflin.com&quot;,
  &quot;packages.sent&quot;: 1000,
  &quot;memory.usage&quot;: 0.8
}

# Search for the added doc, _source will show up but is reconstructed
GET metrics-laptop-default/_search

# Check out the mappings
GET metrics-laptop-default
</code></pre>
<p>If you search, _source still shows up in the results, but it is reconstructed from the doc values. The one additional field in the document above is @timestamp, which is required for any data stream.</p>
<h2>Why is this all important for Observability?</h2>
<p>One of the advantages of the Elastic Observability solution is that in a single storage engine, all signals are brought together in a single place. Users can query logs, metrics, and traces together without having to jump from one system to another. Because of this, having a great storage and query engine not only for logs but also metrics is key for us.</p>
<h2>Usage of TSDS in integrations</h2>
<p>With <a href="https://www.elastic.co/integrations/data-integrations">integrations</a>, we give our users an out-of-the-box experience to integrate with their infrastructure and services. If you are using our integrations, you will eventually get all the benefits of TSDS for your metrics automatically, assuming you are on version 8.7 or newer.</p>
<p>We are currently working through the list of our integration packages, adding the dimensions and metric type fields and then turning on TSDS for the metrics data streams. As soon as a package has all the properties enabled, the only thing you have to do is upgrade the integration; everything else happens automatically in the background.</p>
<p>To visualize your time series in Kibana, use <a href="https://www.elastic.co/guide/en/kibana/current/lens.html">Lens</a>, which has native support built in for TSDS.</p>
<h2>Learn more</h2>
<p>If you switch over to TSDS, you will automatically benefit from all the future improvements Elasticsearch is making for metrics time series, be it more efficient storage, query performance, or new aggregation capabilities. If you want to learn more about how TSDS works under the hood and all available config options, check out the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/master/tsds.html">TSDS documentation</a>. What Elasticsearch supports in 8.7 is only the first iteration of the metrics time series in Elasticsearch.</p>
<p><a href="https://www.elastic.co/blog/whats-new-elasticsearch-8-7-0">TSDS has been available since 8.7</a> and will be enabled in more and more of our integrations automatically as they are upgraded. All you will notice is lower storage usage and faster queries. Enjoy!</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/time-series-data-streams-observability-metrics/ebpf-monitoring.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Improving the Elastic APM UI performance with continuous rollups and service metrics]]></title>
            <link>https://www.elastic.co/observability-labs/blog/apm-ui-performance-continuous-rollups-service-metrics</link>
            <guid isPermaLink="false">apm-ui-performance-continuous-rollups-service-metrics</guid>
            <pubDate>Thu, 29 Jun 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[We made significant improvements to the UI performance in Elastic APM to make it scale with even the most demanding workloads, by pre-aggregating metrics at the service level, and storing the metrics at different levels of granularity.]]></description>
            <content:encoded><![CDATA[<p>In today's fast-paced digital landscape, the ability to monitor and optimize application performance is crucial for organizations striving to deliver exceptional user experiences. At Elastic, we recognize the significance of providing our user base with a reliable <a href="https://www.elastic.co/observability">observability platform</a> that scales with you as you’re onboarding thousands of services that produce terabytes of data each day. We have been diligently working behind the scenes to enhance our solution to meet the demands of even the largest deployments.</p>
<p>In this blog post, we are excited to share the significant strides we have made in improving the UI performance of Elastic APM. Maintaining a snappy user interface can be a challenge when interactively summarizing the massive amounts of data needed to provide an overview of the performance for an entire enterprise-scale service inventory. We want to assure our customers that we have listened, taken action, and made notable architectural changes to elevate the scalability and maturity of our solution.</p>
<h2>Architectural enhancements</h2>
<p>Our journey began back in the 7.x series, where we noticed that doing ad-hoc aggregations on raw <a href="https://www.elastic.co/guide/en/apm/guide/current/data-model-transactions.html">transaction</a> data put Elasticsearch<sup>®</sup> under a lot of pressure in large-scale environments. Since then, we’ve begun to pre-aggregate the transactions into transaction metrics during ingestion. This has helped to keep the performance of the UI relatively stable. Regardless of how busy the monitored application is and how many transaction events it is creating, we’re just querying pre-aggregated metrics that are stored at a constant rate. We’ve enabled the metrics-powered UI by default in <a href="https://github.com/elastic/kibana/issues/92024">7.15</a>.</p>
<p>However, when showing an inventory of a large number of services over large time ranges, the number of metric data points that need to be aggregated can still be large enough to cause performance issues. We also create a time series for each distinct set of dimensions. The dimensions include metadata, such as the transaction name and the host name. Our <a href="https://www.elastic.co/guide/en/apm/guide/current/data-model-metrics.html#_transaction_metrics">documentation</a> includes a full list of all available dimensions. If there’s a very high number of unique transaction names, which could be a result of improper instrumentation (see <a href="https://www.elastic.co/guide/en/kibana/current/troubleshooting.html#troubleshooting-too-many-transactions">docs</a> for more details), this will create a lot of individual time series that will need to be aggregated when requesting a summary of the service’s overall performance. Global labels that are added to the APM Agent configuration are also added as dimensions to these metrics, and therefore they can also impact the number of time series. Refer to the FAQs section below for more details.</p>
<p>Within the 8.7 and 8.8 releases, we’ve addressed these challenges with the following architectural enhancements that aim to reduce the number of documents Elasticsearch needs to search and aggregate on-the-fly, resulting in faster response times:</p>
<ul>
<li><strong>Pre-aggregation of transaction metrics into service metrics.</strong> Instead of aggregating all distinct time series that are created for each individual transaction name on-the-fly for every user request, we’re already pre-aggregating a summary time series for each service during data ingestion. Depending on how many unique transaction names the services have, this reduces the number of documents Elasticsearch needs to look up and aggregate by a factor of typically 10–100. This is particularly useful for the <a href="https://www.elastic.co/guide/en/kibana/master/services.html">service inventory</a> and the <a href="https://www.elastic.co/guide/en/kibana/master/service-overview.html">service overview</a> pages.</li>
<li><strong>Pre-aggregation of all metrics into different levels of granularity.</strong> The APM UI chooses the most appropriate level of granularity, depending on the selected time range. In addition to the metrics that are stored at a 1-minute granularity, we’re also summarizing and storing metrics at a 10-minute and 60-minute granularity level. For example, when looking at a 7-day period, the 60-minute data stream is queried instead of the 1-minute one, resulting in 60x fewer documents for Elasticsearch to examine. This makes sure that all graphs are rendered quickly, even when looking at larger time ranges.</li>
<li><strong>Safeguards on the number of unique transactions per service for which we are aggregating metrics.</strong> Our agents are designed to keep the cardinality of the transaction name low. But in the wild, we’ve seen some services that have a huge amount of unique transaction names. This used to cause performance problems in the UI because APM Server would create many time series that the UI needed to aggregate at query time. In order to protect APM Server from running out of memory when aggregating a large number of time series for each unique transaction name, metrics were published without aggregating when limits for the number of time series were reached. This resulted in a lot of individual metric documents that needed to be aggregated at query time. To address the problem, we've introduced a system where we aggregate metrics in a dedicated overflow bucket for each service when limits are reached. Refer to our <a href="https://www.elastic.co/guide/en/kibana/8.8/troubleshooting.html#troubleshooting-too-many-transactions">documentation</a> for more details.</li>
</ul>
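<p>As an illustration of the granularity levels, a 7-day latency chart can be served from the 60-minute stream rather than the 1-minute one. The sketch below assumes the documented <code>metrics-apm.service_transaction.&lt;granularity&gt;-&lt;namespace&gt;</code> naming; verify the exact data stream name in your deployment:</p>
<pre><code class="language-json">GET metrics-apm.service_transaction.60m-default/_search
{
  &quot;size&quot;: 0,
  &quot;query&quot;: {
    &quot;range&quot;: { &quot;@timestamp&quot;: { &quot;gte&quot;: &quot;now-7d&quot; } }
  },
  &quot;aggs&quot;: {
    &quot;latency_over_time&quot;: {
      &quot;date_histogram&quot;: { &quot;field&quot;: &quot;@timestamp&quot;, &quot;fixed_interval&quot;: &quot;60m&quot; }
    }
  }
}
</code></pre>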
<p>The exact factor of the document count reduction depends on various conditions. But to get a feel for a typical scenario: if your services, on average, have 10 instances, no instance-specific global labels, and 100 unique transaction names each, and you’re looking at time ranges that can leverage the 60m granularity, you’d see a reduction of documents that Elasticsearch needs to aggregate by a factor of 180,000 (10 instances x 100 transaction names x 60m x 3, because we’re also collapsing the event.outcome dimension). While the response times of Elasticsearch aggregations don’t scale exactly linearly with the number of documents, there is a strong correlation.</p>
<h2>FAQs</h2>
<h3>When upgrading to the latest version, will my old data also load faster?</h3>
<p>Updating to 8.8 doesn’t immediately make the UI faster. Because the improvements are powered by pre-aggregations that APM Server performs during ingestion, only new data will benefit from them. For that reason, make sure to update APM Server as well. The UI can still display data that was ingested using an older version of the stack.</p>
<h3>If the UI is based on metrics, can I still slice and dice using custom labels?</h3>
<p>High cardinality analysis is a big strength of Elastic Observability, and this focus on pre-aggregated metrics does not compromise that in any way.</p>
<p>The UI implements a sophisticated fallback mechanism that uses service metrics, transaction metrics, or raw transaction events, depending on which filters are applied. We’re not creating metrics for each user.id, for example. But you can still filter the data by user.id and the UI will then use raw transaction events. Chances are that you’re looking at a narrow slice of data when filtering by a dimension that is not available on the pre-aggregated metrics, therefore aggregations on the raw data are typically very fast.</p>
<p>Note that all global labels that are added to the APM agent configuration are part of the dimension of the pre-aggregated metrics, with the exception of RUM (see more details in <a href="https://github.com/elastic/apm-server/issues/11037">this issue</a>).</p>
<h3>Can I use the pre-aggregated metrics in custom dashboards?</h3>
<p>Yes! If you use <a href="https://www.elastic.co/guide/en/kibana/current/lens.html">Lens</a> and select the &quot;APM&quot; data view, you can filter on either metricset.name:service_transaction or metricset.name:transaction, depending on the level of detail you need. Transaction latency is captured in transaction.duration.histogram, and successful outcomes and failed outcomes are stored in event.success_count. If you don't need a distribution of values, you can also select the transaction.duration.summary field for your metric aggregations, which should be faster. If you want to calculate the failure rate, here's a <a href="https://www.elastic.co/guide/en/kibana/current/lens.html#lens-formulas">Lens formula</a>: 1 - (sum(event.success_count) / count(event.success_count)). Note that the only granularity supported here is 1m.</p>
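<p>If you prefer raw queries to Lens, the same failure rate can be sketched as an Elasticsearch aggregation (the index pattern is an assumption for your deployment; <code>bucket_script</code> needs a parent multi-bucket aggregation, hence the date histogram):</p>
<pre><code class="language-json">GET metrics-apm*/_search
{
  &quot;size&quot;: 0,
  &quot;query&quot;: { &quot;term&quot;: { &quot;metricset.name&quot;: &quot;service_transaction&quot; } },
  &quot;aggs&quot;: {
    &quot;per_hour&quot;: {
      &quot;date_histogram&quot;: { &quot;field&quot;: &quot;@timestamp&quot;, &quot;fixed_interval&quot;: &quot;1h&quot; },
      &quot;aggs&quot;: {
        &quot;successes&quot;: { &quot;sum&quot;: { &quot;field&quot;: &quot;event.success_count&quot; } },
        &quot;total&quot;: { &quot;value_count&quot;: { &quot;field&quot;: &quot;event.success_count&quot; } },
        &quot;failure_rate&quot;: {
          &quot;bucket_script&quot;: {
            &quot;buckets_path&quot;: { &quot;s&quot;: &quot;successes&quot;, &quot;t&quot;: &quot;total&quot; },
            &quot;script&quot;: &quot;1 - params.s / params.t&quot;
          }
        }
      }
    }
  }
}
</code></pre>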
<h3>Do the additional metrics have an impact on the storage?</h3>
<p>While we’re storing more metrics than before, and we’re storing all metrics in different levels of granularity, we were able to offset that by enabling <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source">synthetic source</a> for all metric data streams. We’ve even increased the default retention for the metrics in the coarse-grained granularity levels, so that the 60m rollup data streams are now stored for 390 days. Please consult our <a href="https://www.elastic.co/guide/en/apm/guide/current/apm-data-streams.html">documentation</a> for more information about the different metric data streams.</p>
<h3>Are there limits on the amount of time series that APM Server can aggregate?</h3>
<p>APM Server performs pre-aggregations in memory, which is fast, but consumes a considerable amount of memory. There are limits in place to protect APM Server from running out of memory, and from 8.7, most of them scale with available memory by default, meaning that allocating more memory to APM Server will allow it to handle more unique pre-aggregation groups like services and transactions. These limits are described in <a href="https://www.elastic.co/guide/en/apm/guide/current/data-model-metrics.html#_aggregated_metrics_limits_and_overflows">APM Server Data Model docs</a>.</p>
<p>On the APM Server roadmap, we have plans to move to a LSM-based approach where pre-aggregations are performed with the help of disks in order to reduce memory usage. This will enable APM Server to scale better with the input size and cardinality.</p>
<p>A common pitfall when working with pre-aggregations is to add instance-specific global labels to APM agents. This may exhaust the aggregation limits and cause metrics to be aggregated under the overflow bucket instead of the corresponding service. Therefore, make sure to follow the best practice of only adding a limited set of global labels to a particular service.</p>
<h2>Validation</h2>
<p>To validate the effectiveness of the new architecture, and to ensure that the accuracy of the data is not negatively affected, we prepared a test environment in which we generated 35K+ transactions per minute over a timespan of 14 days, resulting in approximately 850 million documents.</p>
<p>We’ve tested the queries that power our service inventory, the service overview, and the transaction details using different time ranges (1d, 7d, 14d). Across the board, we’ve seen orders of magnitude improvements. Particularly, queries across larger time ranges that benefit from using the coarse-grained metrics in addition to the pre-aggregated service metrics saw incredible reductions of the response time.</p>
<p>We’ve also validated that there’s no loss in accuracy when using the more coarse-grained metrics for larger time ranges.</p>
<p>Every environment behaves a bit differently, but we’re confident that these improvements in response time will translate well to setups at even larger scale.</p>
<h2>Planned improvements</h2>
<p>As mentioned in the FAQs section, the number of time series for transaction metrics can grow quickly, as it is the product of multiple dimensions. For example, given a service that runs on 100 hosts and has 100 transaction names that each have 4 transaction results, APM Server needs to track 40,000 (100 x 100 x 4) different time series for that service. This would even exceed the maximum per-service limit of 32,000 for APM Servers with 64GB of main memory.</p>
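<p>As a quick sanity check, the arithmetic above can be reproduced in a few lines (a hypothetical helper for estimating cardinality, not part of APM Server):</p>
<pre><code class="language-python"># Rough estimate of the transaction-metric time series APM Server must track
# for one service: the product of its dimension cardinalities.
def estimated_time_series(hosts: int, transaction_names: int, transaction_results: int) -> int:
    return hosts * transaction_names * transaction_results

series = estimated_time_series(hosts=100, transaction_names=100, transaction_results=4)
print(series)           # 40000
print(series > 32_000)  # True: exceeds the per-service limit at 64GB of memory
</code></pre>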
<p>As a result, the Service overview page in the UI shows a “Remaining Transactions” entry, which collects the transaction metrics for a service once it hits the limit. You may therefore not see all transaction names of your service. It may also be that all distinct transaction names are listed, but that the transaction metrics for some instances of that service are combined in the “Remaining Transactions” category.</p>
<p>We’re currently considering restructuring the metric dimensions to prevent the combination of the transaction name dimension and service instance-specific dimensions (such as the host name) from causing an explosion of time series. Stay tuned for more details.</p>
<h2>Conclusion</h2>
<p>The architectural improvements we’ve delivered in the past releases provide a step-function improvement in the scalability and responsiveness of our UI. Instead of aggregating massive amounts of data on the fly as users navigate the user interface, we pre-aggregate the results for the most common queries as data comes in. This ensures we have the answers ready before users have even asked their most frequently asked questions, while still being able to answer ad-hoc questions.</p>
<p>We are excited to continue supporting our community members as they push boundaries on their growth journey, providing them with a powerful and mature platform that can effortlessly handle the demands of the largest workloads. Elastic is committed to its mission to enable everyone to find the answers that matter. From all data. In real time. At scale.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/apm-ui-performance-continuous-rollups-service-metrics/elastic-blog-header-ui.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[How Prometheus Remote Write Ingestion Works in Elasticsearch]]></title>
            <link>https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch-architecture</link>
            <guid isPermaLink="false">prometheus-remote-write-elasticsearch-architecture</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A look under the hood at Elasticsearch's Prometheus Remote Write implementation: protobuf parsing, metric type inference, TSDS mapping, and data stream routing.]]></description>
            <content:encoded><![CDATA[<p>Elasticsearch recently added native support for the Prometheus Remote Write protocol.
You can point Prometheus (or Grafana Alloy) at an Elasticsearch endpoint and ship metrics without any adapter in between.</p>
<p>This post looks at what happens inside Elasticsearch when a Remote Write request arrives.</p>
<p>If you want to understand the implementation, evaluate how Elasticsearch compares to other Prometheus-compatible backends, or contribute, this is the post for you.
A companion post, <a href="https://www.elastic.co/observability-labs/blog/prometheus-remote-write-elasticsearch">Ship Prometheus Metrics to Elasticsearch with Remote Write</a>, covers the setup and configuration side.</p>
<h2>Request lifecycle: from HTTP to indexed documents</h2>
<p>A quick note on the Prometheus data model before we dive in: Prometheus stores all metric values as 64-bit floats and treats the metric name as just another label (<code>__name__</code>).
The storage engine itself is agnostic of whether a value is a counter or a gauge.
Keep this in mind as we walk through how Elasticsearch maps these concepts.</p>
<p>Here is the full path of a Remote Write request through Elasticsearch:</p>
<ol>
<li><strong>HTTP layer</strong> — The endpoint receives a compressed protobuf payload, checks indexing pressure, decompresses with Snappy, and parses the protobuf <code>WriteRequest</code>.</li>
<li><strong>Document construction</strong> — Each sample in each time series becomes an Elasticsearch document with <code>@timestamp</code>, <code>labels.*</code>, and <code>metrics.*</code> fields.</li>
<li><strong>Bulk indexing</strong> — All documents from a single request are written to the target data stream via a single bulk call.</li>
</ol>
<p>The sections below walk through each stage in detail.</p>
<h3>HTTP layer</h3>
<p>The endpoint accepts <code>application/x-protobuf</code> POST requests.
The incoming request body is tracked against the same <a href="https://www.elastic.co/docs/reference/elasticsearch/index-settings/pressure">indexing pressure limits</a> that protect the bulk indexing API.
If the cluster is already under heavy indexing load, the request gets rejected with a 429 before any parsing happens.</p>
<p>Prometheus compresses Remote Write payloads with Snappy.
Elasticsearch decompresses the body in a streaming fashion without materializing it into a single contiguous allocation, and validates the declared uncompressed size against a configurable maximum to guard against decompression bombs.</p>
<p>The decompressed body is then deserialized as a protobuf <code>WriteRequest</code>.
Each <code>WriteRequest</code> contains a list of <code>TimeSeries</code> entries, and each <code>TimeSeries</code> contains a set of labels (key-value pairs) and a list of samples (timestamp + float64 value).</p>
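<p>A minimal Python sketch of that structure (labels are modeled as a dict here for brevity; the actual protobuf encodes them as a repeated list of name/value pairs and is parsed with generated protobuf classes, not dataclasses):</p>
<pre><code class="language-python">from dataclasses import dataclass, field

@dataclass
class Sample:
    value: float    # all Prometheus sample values are float64
    timestamp: int  # milliseconds since the Unix epoch

@dataclass
class TimeSeries:
    labels: dict[str, str]  # includes the __name__ label
    samples: list[Sample] = field(default_factory=list)

@dataclass
class WriteRequest:
    timeseries: list[TimeSeries] = field(default_factory=list)

req = WriteRequest(timeseries=[
    TimeSeries(
        labels={"__name__": "http_requests_total", "job": "prometheus"},
        samples=[Sample(value=1027.0, timestamp=1775044800000)],
    )
])
print(len(req.timeseries[0].samples))  # 1
</code></pre>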
<h3>Document construction</h3>
<p>For each sample in each time series, Elasticsearch builds an index request.
Here is what a single document looks like:</p>
<pre><code class="language-json">{
  &quot;@timestamp&quot;: &quot;2026-04-01T12:00:00.000Z&quot;,
  &quot;data_stream&quot;: {
    &quot;type&quot;: &quot;metrics&quot;,
    &quot;dataset&quot;: &quot;generic.prometheus&quot;,
    &quot;namespace&quot;: &quot;default&quot;
  },
  &quot;labels&quot;: {
    &quot;__name__&quot;: &quot;http_requests_total&quot;,
    &quot;job&quot;: &quot;prometheus&quot;,
    &quot;instance&quot;: &quot;localhost:9090&quot;,
    &quot;method&quot;: &quot;GET&quot;,
    &quot;status&quot;: &quot;200&quot;
  },
  &quot;metrics&quot;: {
    &quot;http_requests_total&quot;: 1027.0
  }
}
</code></pre>
<p>All labels from the Prometheus time series (including <code>__name__</code>) end up in the <code>labels.*</code> fields.
The metric value goes into <code>metrics.&lt;metric_name&gt;</code>, where <code>&lt;metric_name&gt;</code> is the value of the <code>__name__</code> label.</p>
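<p>That mapping can be sketched as a small pure-Python function (an illustrative helper, not the actual Elasticsearch code):</p>
<pre><code class="language-python">from datetime import datetime, timezone

def build_document(labels: dict[str, str], timestamp_ms: int, value: float,
                   dataset: str = "generic.prometheus", namespace: str = "default") -> dict:
    """Turn one Prometheus sample into an Elasticsearch document."""
    metric_name = labels["__name__"]  # series without __name__ are dropped earlier
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return {
        "@timestamp": ts.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        "data_stream": {"type": "metrics", "dataset": dataset, "namespace": namespace},
        "labels": labels,                 # all labels, including __name__
        "metrics": {metric_name: value},  # value keyed by the metric name
    }

doc = build_document(
    {"__name__": "http_requests_total", "job": "prometheus", "instance": "localhost:9090"},
    1775044800000,  # 2026-04-01T12:00:00Z, matching the example above
    1027.0)
print(doc["metrics"])  # {'http_requests_total': 1027.0}
</code></pre>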
<p>Time series without a <code>__name__</code> label are dropped entirely, and the samples are counted as failures.
Non-finite values (NaN, Infinity, negative Infinity) are silently skipped.
This includes Prometheus staleness markers, which use a special NaN bit pattern (<code>0x7ff0000000000002</code>) to signal that a series has disappeared.</p>
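<p>The staleness marker is an ordinary NaN at the float level, so it can only be recognized by its exact bit pattern. A quick Python illustration:</p>
<pre><code class="language-python">import math
import struct

STALE_NAN_BITS = 0x7FF0000000000002  # Prometheus staleness marker bit pattern

def is_stale_marker(value: float) -> bool:
    # Compare raw IEEE 754 bits: NaN != NaN, so an equality check cannot work.
    (bits,) = struct.unpack("&lt;Q", struct.pack("&lt;d", value))
    return bits == STALE_NAN_BITS

stale = struct.unpack("&lt;d", struct.pack("&lt;Q", STALE_NAN_BITS))[0]
print(math.isnan(stale))              # True: it is a NaN...
print(is_stale_marker(stale))         # True: ...but a very specific one
print(is_stale_marker(float("nan")))  # False: an ordinary (quiet) NaN
</code></pre>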
<h3>One sample, one document</h3>
<p>You might wonder whether storing each individual sample as its own document creates significant storage overhead, especially for labels.
A common pattern to reduce that overhead was to group all metrics sharing the same labels and timestamp into a single document.</p>
<p>With recent TSDB improvements, that optimization is no longer necessary.
Elasticsearch has trimmed the per-document storage overhead to the point where there is negligible difference between packing many metrics in a single document and writing each sample separately.
A dedicated post covering these TSDB storage improvements in detail is coming soon.</p>
<h3>Bulk indexing</h3>
<p>All documents from a single Remote Write request are sent to Elasticsearch via a single bulk request.
Each document targets the data stream <code>metrics-{dataset}.prometheus-{namespace}</code> and is indexed as an append-only create operation.</p>
<h2>Metric type inference</h2>
<p>Remote Write v1 does not reliably transmit metric types alongside samples.
Prometheus sends metadata (type, help text, unit) in separate requests roughly once per minute, and those requests may land on a different node than the samples.
Buffering samples until metadata arrives is not practical in a distributed system, so Elasticsearch infers the type from naming conventions instead.</p>
<p>Metric names ending in <code>_total</code>, <code>_sum</code>, <code>_count</code>, or <code>_bucket</code> are mapped as counters.
Everything else defaults to gauge.
This is a well-established convention that other Prometheus-compatible backends use as well.</p>
<pre><code>http_requests_total             → counter
request_duration_seconds_sum    → counter
request_duration_seconds_count  → counter
request_duration_seconds_bucket → counter
process_resident_memory_bytes   → gauge
go_goroutines                   → gauge
</code></pre>
<p>The heuristic can be wrong.
A metric like <code>temperature_total</code> (if someone named a gauge that way) would be misclassified as a counter.
The main consequence today is that some ES|QL functions like <code>rate()</code> require the metric type to be a counter and will reject a misclassified gauge.
For PromQL, we plan to lift this restriction so that <code>rate()</code> works regardless of the declared type, which will make incorrect inference less consequential.</p>
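<p>The naming heuristic boils down to a suffix check. A hypothetical Python equivalent (the actual Elasticsearch implementation may differ in details):</p>
<pre><code class="language-python">COUNTER_SUFFIXES = ("_total", "_sum", "_count", "_bucket")

def infer_metric_type(name: str) -> str:
    """Infer counter vs. gauge from Prometheus naming conventions."""
    return "counter" if name.endswith(COUNTER_SUFFIXES) else "gauge"

for name in ("http_requests_total", "request_duration_seconds_bucket",
             "process_resident_memory_bytes", "temperature_total"):
    print(name, infer_metric_type(name))
# temperature_total comes out as a counter even if it was meant as a gauge
</code></pre>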
<p>You can override the inference by creating a <code>metrics-prometheus@custom</code> component template with custom dynamic templates.
For example, to treat all <code>*_counter</code> fields as counters:</p>
<pre><code class="language-json">PUT /_component_template/metrics-prometheus@custom
{
  &quot;template&quot;: {
    &quot;mappings&quot;: {
      &quot;dynamic_templates&quot;: [
        {
          &quot;counter&quot;: {
            &quot;path_match&quot;: &quot;metrics.*_counter&quot;,
            &quot;mapping&quot;: {
              &quot;type&quot;: &quot;double&quot;,
              &quot;time_series_metric&quot;: &quot;counter&quot;
            }
          }
        }
      ]
    }
  }
}
</code></pre>
<p>Custom dynamic templates are merged with the built-in ones, so the default naming-convention rules still apply for metrics you don't explicitly override.</p>
<h2>The index template</h2>
<p>Elasticsearch installs a built-in index template that matches <code>metrics-*.prometheus-*</code>.
This template is what makes field type inference work without manual mapping configuration.</p>
<p><strong>TSDS mode</strong> is enabled, which gives you time-based partitioning, optimized storage, <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/time-series-data-stream-tsds#time-series-dimension">deduplication</a>, and the ability to downsample data as it ages.</p>
<p><strong><a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/passthrough">Passthrough</a> object fields</strong> are used for both the <code>labels</code> and <code>metrics</code> namespaces.
This serves three purposes:</p>
<ol>
<li>
<p><strong>Namespace isolation</strong>: Labels and metrics live in separate object namespaces (<code>labels.*</code> and <code>metrics.*</code>), so a label named <code>status</code> and a metric named <code>status</code> cannot conflict with each other.</p>
</li>
<li>
<p><strong>Dimension identification</strong>: The <code>labels</code> passthrough object is configured with <code>time_series_dimension: true</code>, which means every field under <code>labels.*</code> is automatically treated as a TSDS dimension.
When Prometheus sends a time series with a label you have never seen before, it becomes a dimension without any explicit field mapping.</p>
</li>
<li>
<p><strong>Transparent queries</strong>: You don't need to write the <code>labels.</code> or <code>metrics.</code> prefix in ES|QL or PromQL.
A query can reference <code>job</code> instead of <code>labels.job</code>, or <code>http_requests_total</code> instead of <code>metrics.http_requests_total</code>.
The passthrough mapping handles the resolution.</p>
</li>
</ol>
<p><strong>Dynamic inference for metrics</strong> applies the naming-convention heuristics described above.
When a new metric name appears for the first time, its field mapping is created automatically under <code>metrics.*</code> with the correct <code>time_series_metric</code> annotation.</p>
<p><strong>Failure store</strong> is enabled.
Documents that fail indexing (for example, due to a mapping conflict where the same metric name appears with incompatible types) are routed to a separate failure store instead of being dropped silently.</p>
<h2>Data stream routing</h2>
<p>The three URL patterns map directly to data stream names:</p>
<table>
<thead>
<tr>
<th>URL pattern</th>
<th>Data stream</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>/_prometheus/api/v1/write</code></td>
<td><code>metrics-generic.prometheus-default</code></td>
</tr>
<tr>
<td><code>/_prometheus/metrics/{dataset}/api/v1/write</code></td>
<td><code>metrics-{dataset}.prometheus-default</code></td>
</tr>
<tr>
<td><code>/_prometheus/metrics/{dataset}/{namespace}/api/v1/write</code></td>
<td><code>metrics-{dataset}.prometheus-{namespace}</code></td>
</tr>
</tbody>
</table>
<p>This lets you separate metrics from different Prometheus instances or environments into different data streams.
That separation is useful for a few reasons.</p>
<p><strong>Lifecycle isolation</strong>: you can apply different retention policies per data stream.
Production metrics might be kept for 90 days, while dev metrics might expire after 7 days.</p>
<p><strong>Access control</strong>: you can scope API keys to specific data streams.
A team's Prometheus instance writes to <code>metrics-teamA.prometheus-prod</code>, and their API key only has access to that stream.</p>
<p><strong>Query performance</strong>: PromQL queries and Grafana dashboards can be scoped to a specific index pattern, avoiding scans of unrelated data.</p>
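<p>The routing in the table above can be sketched as a one-liner (an illustrative helper; the actual path parsing lives in Elasticsearch's REST layer):</p>
<pre><code class="language-python">def target_data_stream(dataset=None, namespace=None):
    """Map the optional URL path segments to a data stream name."""
    return f"metrics-{dataset or 'generic'}.prometheus-{namespace or 'default'}"

print(target_data_stream())                 # metrics-generic.prometheus-default
print(target_data_stream("teamA"))          # metrics-teamA.prometheus-default
print(target_data_stream("teamA", "prod"))  # metrics-teamA.prometheus-prod
</code></pre>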
<h2>Error handling and the Remote Write spec</h2>
<p>The Remote Write spec defines two response classes: retryable (5xx, 429) and non-retryable (4xx).
Prometheus uses this distinction to decide whether to retry or drop a failed request.</p>
<p>Elasticsearch returns 429 (Too Many Requests) if any sample in the bulk request was rejected due to indexing pressure.
This signals Prometheus to back off and retry with exponential backoff.</p>
<p>For partial failures (some samples indexed, others rejected), the response includes a summary.
It reports how many samples failed, grouped by target index and status code, along with a sample error message from each group.</p>
<p>Time series without a <code>__name__</code> label result in a 400 error for those samples.
Non-finite values (NaN, Infinity) are silently dropped: Prometheus receives a success response and will not retry.</p>
<p>NaN appears most commonly for summary quantiles when no observations have been recorded (for example, a p99 latency metric before any requests arrive) and for staleness markers.
The practical impact of dropping these is limited today: for most queries, a missing sample behaves similarly to a NaN one, since PromQL's lookback window fills the gap with the last known value either way.
The more significant gap is staleness markers, which are covered below.</p>
<h2>What's next: Remote Write v2 and beyond</h2>
<p>Remote Write v2 is still experimental, which is why the current implementation starts with v1.
But v2 addresses several of v1's shortcomings.</p>
<p><strong>Metadata alongside samples</strong>: v2 sends metric type, unit, and description with each time series in the same request.
This eliminates the need for naming-convention heuristics entirely.</p>
<p><strong>Native histograms</strong>: v2 supports Prometheus native histograms, which map naturally to Elasticsearch's <a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/exponential-histogram"><code>exponential_histogram</code></a> field type.
Classic histograms (one counter per bucket boundary) are verbose and lose precision at query time.
Native histograms are more compact and more accurate.</p>
<p><strong>Dictionary encoding</strong>: v2 replaces repeated label strings with integer references, reducing payload size significantly for high-cardinality label sets.</p>
<p><strong>Created timestamps</strong>: counters in v2 include a &quot;created&quot; timestamp that marks when the counter was initialized.
This allows backends to detect counter resets more accurately than the current heuristic (value decreased since last sample).</p>
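<p>The reset heuristic mentioned above can be illustrated with a few lines of Python (a simplified sketch of the general technique, not the Elasticsearch implementation):</p>
<pre><code class="language-python">def counter_increase(samples: list[float]) -> float:
    """Total increase of a counter, treating any decrease as a reset.

    Without a created timestamp, a value that drops below its predecessor
    is assumed to mean the process restarted and the counter began anew.
    """
    increase = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:  # reset detected: count the post-restart value from zero
            increase += curr
    return increase

print(counter_increase([10, 15, 25, 3, 8]))  # 23.0: 5 + 10 + 3 + 5
</code></pre>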
<p>Beyond v2, there are two other items in consideration for future enhancements.</p>
<p><strong>Staleness marker support</strong>: currently, staleness markers (the special NaN that Prometheus writes when a scrape target disappears) are dropped.
Supporting them would allow correct PromQL lookback behavior and avoid the 5-minute &quot;trailing data&quot; artifact where a disappeared series still appears in query results.</p>
<p><strong>Shared metric field</strong>: the current layout creates a separate field for each metric name (<code>metrics.http_requests_total</code>, <code>metrics.go_goroutines</code>, etc.).
This works, but it means the number of field mappings grows with the number of distinct metric names, which is why the field limit is set to 10,000 for Prometheus data streams.
A different approach we're considering is to store the metric name only in the <code>__name__</code> label and write the metric value to a single shared field.
This eliminates the field explosion problem entirely and more closely matches how Prometheus stores data internally.
This direction is part of the broader effort to make Elasticsearch's metrics storage more efficient and more compatible with Prometheus conventions.</p>
<h2>Availability</h2>
<p>The Prometheus Remote Write endpoint is available now on <a href="https://cloud.elastic.co/serverless-registration">Elasticsearch Serverless</a> with no additional configuration.</p>
<p>For self-managed clusters, check out <a href="https://www.elastic.co/docs/deploy-manage/deploy/self-managed/local-development-installation-quickstart">start-local</a> to get up and running quickly.</p>
<p>If you run into issues or have feedback, open an issue on the <a href="https://github.com/elastic/elasticsearch">Elasticsearch repository</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/prometheus-remote-write-elasticsearch-architecture/header.jpg" length="0" type="image/jpeg"/>
        </item>
    </channel>
</rss>