Elasticsearch TSDS: storage reduction via sequence trimming


Elasticsearch time series indices now store metrics at 3.74 bytes per sample (down from 6.33 bytes, a 41% reduction). The savings come from trimming sequence numbers once replication no longer needs them. Combined with earlier synthetic _id optimizations, per-sample storage has dropped by roughly two-thirds overall. The index.disable_sequence_numbers setting is on by default for newly created TSDS indices in serverless and generally available in 9.4.

How sequence numbers work in Elasticsearch

Every write operation in Elasticsearch gets a sequence number (_seq_no) value. Sequence numbers are an integral part of:

Replication. The primary assigns a sequence number to each operation, and replicas use it to stay in sync and to figure out what operations they need to replay after a network blip or a restart.
Optimistic concurrency control (OCC). Clients can make a write conditional on if_seq_no and if_primary_term, so the write fails cleanly if the document changed in the meantime.
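
To make the OCC contract concrete, here is a toy in-memory model of a conditional write. It is a sketch, not Elasticsearch's implementation: `ToyShard`, its `index()` method and `VersionConflict` are hypothetical names. In the real API, if_seq_no and if_primary_term are passed as request parameters on index requests or in bulk action metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when the expected (seq_no, primary_term) no longer matches."""


@dataclass
class ToyShard:
    """Hypothetical stand-in for a primary shard; not Elasticsearch code."""
    primary_term: int = 1
    next_seq_no: int = 0
    # doc_id -> (source, seq_no, primary_term) of the latest write to that doc
    docs: Dict[str, Tuple[dict, int, int]] = field(default_factory=dict)

    def index(self, doc_id: str, source: dict,
              if_seq_no: Optional[int] = None,
              if_primary_term: Optional[int] = None) -> Tuple[int, int]:
        """Write a document, optionally only if it is unchanged since a read."""
        if (if_seq_no is None) != (if_primary_term is None):
            raise ValueError("this toy requires if_seq_no and if_primary_term together")
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            # Compare the stored (seq_no, primary_term) with what the caller expects.
            if current is None or current[1:] != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq_no = self.next_seq_no  # the primary hands out the next sequence number
        self.next_seq_no += 1
        self.docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no, self.primary_term


shard = ToyShard()
seq_no, term = shard.index("cpu-1", {"value": 0.5})
shard.index("cpu-1", {"value": 0.7}, if_seq_no=seq_no, if_primary_term=term)  # succeeds
try:
    # Same condition again: the document has moved on, so this write fails cleanly.
    shard.index("cpu-1", {"value": 0.9}, if_seq_no=seq_no, if_primary_term=term)
except VersionConflict:
    print("version conflict")
```

The rejected write never consumes a sequence number in this toy, so the stored document stays at the second version.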

To coordinate replication, the primary shard tracks a global checkpoint: the highest sequence number that every in-sync replica is known to have. Operations above it may still be in flight, or already replicated but awaiting confirmation back to the primary.
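
As a simplified model (not the actual checkpoint tracker), each in-sync copy has a local checkpoint, the highest sequence number up to which it has processed every operation without gaps, and the global checkpoint is the minimum of those local checkpoints. The function names below are hypothetical, and the real implementation also handles copies entering and leaving the in-sync set.

```python
from typing import Iterable, Mapping, Set


def local_checkpoint(processed: Set[int]) -> int:
    """Highest seq_no n such that operations 0..n have all been processed (-1 if none)."""
    checkpoint = -1
    while checkpoint + 1 in processed:
        checkpoint += 1
    return checkpoint


def global_checkpoint(in_sync_copies: Mapping[str, Iterable[int]]) -> int:
    """Minimum local checkpoint across the primary and its in-sync replicas."""
    return min(local_checkpoint(set(ops)) for ops in in_sync_copies.values())


copies = {
    "primary": {0, 1, 2, 3, 4, 5},
    "replica-1": {0, 1, 2, 3, 5},  # op 4 still in flight: local checkpoint is 3
    "replica-2": {0, 1, 2, 3, 4},  # op 5 not yet received: local checkpoint is 4
}
print(global_checkpoint(copies))  # -> 3
```

Operations 4 and 5 sit above the global checkpoint here: they are either still in flight or not yet confirmed on every in-sync copy.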

For a deeper history of how sequence numbers came to be and what they replaced, see Elasticsearch sequence IDs in 6.0.

Why do sequence numbers use so much storage in Elasticsearch?

Lucene writes documents in flush order, not sequence number order. In high-throughput indexing, that reordering breaks delta-encoding compression, so _seq_no compresses poorly compared to @timestamp and other structured columns. @timestamp is spared because it is part of the index sort, whereas sequence numbers are never used for sorting.
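
A rough illustration of why order matters: delta-encode each column and measure the bit width of the largest delta. This is a simplification (Lucene's doc values codecs combine several encodings, not just plain deltas), the sample values are made up, and zig-zag encoding is used only so that negative deltas fit in unsigned widths.

```python
from typing import List


def zigzag(n: int) -> int:
    """Map signed to unsigned ints (0, -1, 1, -2 -> 0, 1, 2, 3) so small deltas stay small."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def bits_per_delta(values: List[int]) -> int:
    """Bit width needed to store every delta between consecutive values."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return 0
    return max(zigzag(d) for d in deltas).bit_length()


timestamps = [1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070]  # sorted column
seq_in_order = list(range(8))                       # sequence numbers in arrival order
seq_reordered = [5, 120, 33, 871, 2, 440, 97, 650]  # after reordering within a segment

for name, col in [("@timestamp", timestamps),
                  ("_seq_no in order", seq_in_order),
                  ("_seq_no reordered", seq_reordered)]:
    print(f"{name}: {bits_per_delta(col)} bits per value")
```

This prints 5, 2 and 11 bits per value respectively: the same kind of values cost several times more once their order is scrambled.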

In a time series index, where most other columns compress efficiently thanks to structure and repetition, _seq_no ends up being one of the largest contributors to the storage footprint.

When Elasticsearch sequence numbers are no longer needed

_seq_no is only valuable during a narrow window after each write:

It's needed for replication and recovery until the global checkpoint advances past it. Once the checkpoint moves past a sequence number, every in-sync replica already has it, and a recovering replica wouldn't ask for it either.
It's needed for OCC only if a client later issues an if_seq_no update against that specific document.

For metrics workloads, that second condition almost never applies. Metrics are append-only: you ingest samples, query them, and purge them as they age out. Compare-and-swap updates against individual metric documents are rare.

Once replication is done, the _seq_no field continues occupying storage without providing further value for append-only time-series workloads.

How does Elasticsearch trim sequence numbers?

Trimming sequence numbers required two changes.

Firstly, we changed the backing data structure for sequence numbers from a searchable BKD tree to a simple column-based doc values structure with a skipper index. Range queries against doc values fields with skippers still perform well, so we avoid the cost of building a heavy BKD tree without harming recovery times. It also makes partially deleting the field at merge time straightforward: we just don't write values for documents that no longer need them, whereas removing values from a BKD tree requires rebuilding the tree from scratch.
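
A toy model of the layout described above: values live in a sparse per-document column split into fixed-size blocks, each block records its min and max, and a range query reads only blocks whose bounds overlap the query. Dropping values for some documents is just a matter of not writing them into the rewritten column. `SkippedColumn` and `BLOCK_SIZE` are hypothetical; Lucene's real doc values format and skip index are considerably more elaborate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

BLOCK_SIZE = 4  # hypothetical; real codecs use far larger blocks


@dataclass
class SkippedColumn:
    """Sparse doc values column with a per-block (min, max) skip index."""
    values: Dict[int, int]  # doc_id -> value; docs without a value are absent
    blocks: Dict[int, Tuple[int, int]] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Build the skip index: min/max of the values present in each block.
        for doc, v in self.values.items():
            b = doc // BLOCK_SIZE
            lo, hi = self.blocks.get(b, (v, v))
            self.blocks[b] = (min(lo, v), max(hi, v))

    def range_query(self, lo: int, hi: int) -> List[int]:
        """Doc ids whose value is in [lo, hi], skipping non-overlapping blocks."""
        hits = []
        for b, (bmin, bmax) in sorted(self.blocks.items()):
            if bmax < lo or bmin > hi:
                continue  # whole block skipped without reading its values
            for doc in range(b * BLOCK_SIZE, (b + 1) * BLOCK_SIZE):
                v = self.values.get(doc)
                if v is not None and lo <= v <= hi:
                    hits.append(doc)
        return hits

    def without(self, drop: Set[int]) -> "SkippedColumn":
        """Rewrite the column, simply not writing values for dropped docs."""
        return SkippedColumn({d: v for d, v in self.values.items() if d not in drop})


col = SkippedColumn({0: 10, 1: 11, 2: 12, 3: 13, 4: 50, 5: 51, 6: 52, 7: 53})
print(col.range_query(11, 12))          # only block 0 is read
print(col.without({0, 1, 2, 3}).blocks)  # block 0 disappears entirely
```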

Secondly, we added an index setting, index.disable_sequence_numbers, aimed at time series indices. Sequence numbers are still assigned and written at index time (replication depends on them), but they are trimmed during merges once the global checkpoint has advanced past the operations in the segments being merged. When Lucene merges segments whose maximum sequence number sits below the global checkpoint, the resulting merged segment no longer carries the _seq_no doc values column. This piggybacks on the normal merge lifecycle, so the trimming happens as a natural side effect of work Elasticsearch is already doing.
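
The merge-time rule can be sketched as follows, assuming each source segment knows its maximum sequence number. Following the per-segment wording used elsewhere in this post, a segment's _seq_no values are carried into the merged segment only if that maximum is not below the global checkpoint; if no source segment contributes values, the merged segment has no _seq_no column at all. `Segment` and `merge()` are hypothetical names, and real merges also remap doc ids and handle deletes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    docs: List[dict]
    # One _seq_no per doc (None = no value), or None when the column is absent.
    seq_nos: Optional[List[Optional[int]]]

    @property
    def max_seq_no(self) -> Optional[int]:
        present = [s for s in (self.seq_nos or []) if s is not None]
        return max(present) if present else None


def merge(segments: List[Segment], global_checkpoint: int) -> Segment:
    """Concatenate segments, trimming _seq_no for segments below the checkpoint."""
    docs: List[dict] = []
    seq_nos: List[Optional[int]] = []
    for seg in segments:
        docs.extend(seg.docs)
        top = seg.max_seq_no
        keep = top is not None and top >= global_checkpoint
        # A trimmed segment contributes no values: they are simply not written.
        seq_nos.extend(seg.seq_nos if keep else [None] * len(seg.docs))
    if all(s is None for s in seq_nos):
        return Segment(docs, None)  # no _seq_no column in the merged segment
    return Segment(docs, seq_nos)


old = Segment([{"v": 1}, {"v": 2}], [0, 1])
recent = Segment([{"v": 3}], [7])
print(merge([old, recent], global_checkpoint=5).seq_nos)  # old values trimmed
```

Because trimming only ever happens below the global checkpoint, a replica that needs to recover never asks for the values that were dropped.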

CPU savings: fewer merge operations over time

Dropping _seq_no also removes future merge work: once a segment no longer carries a sequence number column, subsequent merges have no _seq_no doc values to merge for it. For high-throughput metrics clusters, that is a CPU win on top of the storage savings. This fits a broader pattern of merge-time pruning that Elasticsearch already does.

What do you lose when you disable sequence numbers?

Dropping the _seq_no doc values column means a handful of operations that rely on it stop working. The full list is in the documentation, but the highlights are:

No optimistic concurrency control (OCC). if_seq_no and if_primary_term are rejected on index and bulk requests.
Single-document update operations are not supported. Update API calls and update actions in bulk requests are rejected.
Weaker consistency for update-by-query and delete-by-query. These run without sequence-number-based conflict detection, so concurrent modifications can be silently overwritten instead of raising a version conflict.
_seq_no is not queryable. You can't filter, sort, or aggregate on it.
random_score needs an explicit field. It can no longer fall back to _seq_no as a randomness source.
Bulk API responses return sentinel values for _seq_no and _primary_term.
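
If you are moving an existing ingest path onto indices with the setting enabled, a client-side pre-flight check can catch requests that would now be rejected. This is a hypothetical helper, not part of any Elasticsearch client: it inspects bulk action metadata lines for the update action and for if_seq_no / if_primary_term, covering only the first two restrictions above. The documentation remains the authoritative list.

```python
from typing import Dict, List

REJECTED_ACTIONS = {"update"}                       # single-document updates
REJECTED_PARAMS = {"if_seq_no", "if_primary_term"}  # OCC parameters


def incompatible_bulk_actions(actions: List[Dict[str, dict]]) -> List[str]:
    """Describe bulk action/metadata lines that would be rejected.

    `actions` holds only the action/metadata lines of a bulk body, e.g.
    {"index": {"_index": "my-metrics", "if_seq_no": 3, "if_primary_term": 1}}.
    """
    problems = []
    for i, line in enumerate(actions):
        op, meta = next(iter(line.items()))  # each metadata line has one key
        if op in REJECTED_ACTIONS:
            problems.append(f"action {i}: '{op}' is not supported")
        occ = sorted(REJECTED_PARAMS.intersection(meta))
        if occ:
            problems.append(f"action {i}: uses {', '.join(occ)}")
    return problems


bulk_metadata = [
    {"create": {"_index": "my-metrics"}},
    {"update": {"_index": "my-metrics", "_id": "1"}},
    {"index": {"_index": "my-metrics", "if_seq_no": 3, "if_primary_term": 1}},
]
for problem in incompatible_bulk_actions(bulk_metadata):
    print(problem)
```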

None of these are meaningful restrictions for metrics, which is why the setting is scoped to those workloads rather than turned on globally. For any index where OCC, update-by-query conflict detection, or querying on _seq_no actually matters, keep index.disable_sequence_numbers turned off for that index and nothing changes.
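
For a time series index that does rely on _seq_no, you could state the setting explicitly in its index template. The sketch below only builds a composable index template body as JSON; it assumes index.disable_sequence_numbers takes a boolean, and the template name, index pattern and `host.name` dimension are made up. Check the 9.4 documentation for the exact semantics before relying on it.

```python
import json
from typing import List


def tsds_template_body(index_pattern: str, dimensions: List[str],
                       needs_seq_no: bool) -> dict:
    """Build a composable index template body for a TSDS data stream.

    Assumption: index.disable_sequence_numbers accepts a boolean, so indices
    that rely on OCC, conflict detection or querying _seq_no set it to False.
    """
    properties = {"@timestamp": {"type": "date"}}
    for dim in dimensions:
        properties[dim] = {"type": "keyword", "time_series_dimension": True}
    return {
        "index_patterns": [index_pattern],
        "data_stream": {},
        "template": {
            "settings": {
                "index.mode": "time_series",
                "index.routing_path": dimensions,
                "index.disable_sequence_numbers": not needs_seq_no,
            },
            "mappings": {"properties": properties},
        },
    }


# Hypothetical body for: PUT _index_template/my-metrics
body = tsds_template_body("my-metrics-*", ["host.name"], needs_seq_no=True)
print(json.dumps(body, indent=2))
```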

How much storage does sequence number trimming save?

For representative time series indices, this change takes us from 6.33 bytes per sample down to 3.74 bytes per sample, a ~41% reduction in on-disk size. The graph below tracks per-sample storage on a benchmark cluster as recent storage optimizations have rolled in:

The first step down, from roughly 10.8 to 6.3 bytes per sample, comes from synthetic _ids for time series indices. The second step, from 6.3 to 3.74, is the sequence number trimming this post is about. Together, these two changes cut per-sample storage by roughly two-thirds compared with where we started.
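
The headline percentages follow directly from the bytes-per-sample figures. This short check uses 6.33 for the intermediate value, which the paragraph above rounds to 6.3.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after` bytes per sample."""
    return 100.0 * (before - after) / before


steps = {
    "synthetic _id (10.8 -> 6.33)": (10.8, 6.33),
    "seq_no trimming (6.33 -> 3.74)": (6.33, 3.74),
    "both combined (10.8 -> 3.74)": (10.8, 3.74),
}
for name, (before, after) in steps.items():
    print(f"{name}: {pct_reduction(before, after):.1f}%")
```

This prints 41.4%, 40.9% and 65.4%; the combined figure is the "roughly two-thirds" quoted above.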

Because the savings scale with your retention window and your ingest rate, the absolute savings grow quickly as the number of tracked time series grows. You can track more time series for longer, at a lower storage cost and with no loss of functionality for metrics workloads.
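
To see how quickly the absolute numbers grow, multiply bytes per sample by ingest rate and retention. The ingest rate and retention below are hypothetical, and the estimate covers a single copy of the data (replicas multiply it) while ignoring other on-disk overheads.

```python
SECONDS_PER_DAY = 86_400


def storage_bytes(samples_per_sec: float, retention_days: float,
                  bytes_per_sample: float) -> float:
    """On-disk bytes for one copy of the data over the retention window."""
    return samples_per_sec * retention_days * SECONDS_PER_DAY * bytes_per_sample


rate, days = 1_000_000, 30  # hypothetical: 1M samples/s kept for 30 days
before = storage_bytes(rate, days, 6.33)
after = storage_bytes(rate, days, 3.74)
print(f"before: {before / 1e12:.2f} TB, after: {after / 1e12:.2f} TB, "
      f"saved: {(before - after) / 1e12:.2f} TB")
```

At this hypothetical rate, trimming sequence numbers takes one copy of a month of data from about 16.4 TB to about 9.7 TB.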

This feature is generally available as of 9.4 and has been available in serverless environments. It is enabled by default for newly created time series indices. Existing indices aren't retroactively changed, but as your data ages out and fresh indices take over, the savings compound.

Frequently asked questions

Why do sequence numbers use so much storage in Elasticsearch time series indices?

Lucene writes documents in flush order, not sequence number order. In high-throughput indexing, that reordering breaks delta-encoding compression, so _seq_no compresses poorly compared to @timestamp and other structured columns. In a metrics index where most columns are highly repetitive, _seq_no contributes disproportionately to total on-disk size.

What is index.disable_sequence_numbers in Elasticsearch?

It is an index setting that tells Elasticsearch to drop the _seq_no doc values column from merged segments once the global checkpoint has advanced past them. Sequence numbers are still assigned and used during replication. They are trimmed only after every in-sync replica has confirmed the operations, as a side effect of normal Lucene segment merges.

How much storage does sequence number trimming save?

In representative time series benchmarks, this setting reduces per-sample storage from 6.33 bytes to 3.74 bytes, a 41% reduction. Combined with synthetic _id optimizations for TSDS, total per-sample storage drops by roughly two-thirds compared with the roughly 10.8 bytes per sample before either optimization.

What do I lose by disabling sequence numbers in Elasticsearch?

Optimistic concurrency control (OCC) via if_seq_no / if_primary_term is disabled, single-document update operations are rejected, update-by-query and delete-by-query lose sequence-number-based conflict detection, and _seq_no is no longer queryable. For append-only metrics workloads these restrictions have no practical impact.

Is it safe to disable sequence numbers for metrics workloads in Elasticsearch?

Yes. Metrics pipelines are append-only in practice (deletes, update-by-query, and delete-by-query remain available; single-document updates do not). They don't rely on conditional writes or on querying _seq_no. Replication correctness is unaffected because sequence numbers are trimmed only after the global checkpoint advances, meaning every in-sync replica already has the data before the column is dropped.

When does Elasticsearch actually trim sequence numbers?

Trimming happens during normal Lucene segment merges, not in a separate background job. A segment's sequence numbers are dropped in the merged result only if the segment's maximum sequence number is below the global checkpoint at merge time.

How does this compare to other Elasticsearch time series storage optimizations?

The two largest recent wins are synthetic _ids (10.8 to 6.3 bytes per sample) and sequence number trimming (6.3 to 3.74 bytes). Together they cut per-sample storage by roughly two-thirds. Both are on by default for new TSDS indices in serverless and are generally available as of 9.4.

How Elasticsearch cut metrics storage by 41% by dropping sequence numbers after replication