Less merging and faster ingestion in Elasticsearch 8.11

Discover how Elasticsearch 8.11 improved its indexing buffer, resulting in less segment merging and faster ingestion.

We made significant changes to how Elasticsearch 8.11 reclaims memory from its indexing buffer, which helps reduce merging overhead and thus makes indexing faster. Using our Logging track, we observed an 8% improvement in ingestion throughput with these changes when running with a 1GB heap.

How the indexing buffer works in Elasticsearch 8.10 and earlier

When indexing data, Elasticsearch starts building new segments in memory and writes index operations into a transaction log for durability. These in-memory segments eventually get serialized to disk, either when changes need to be made visible (an operation called "refresh" in Elasticsearch) or when memory needs to be reclaimed. This blog focuses on the latter.
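
To make this concrete, here is a minimal Lucene sketch (Lucene is the library that implements indexing under the hood of Elasticsearch) showing that buffered documents only become searchable once a new reader is opened over the writer, which is essentially what a refresh does. The field name and the use of ByteBuffersDirectory are illustrative choices for the demo, not Elasticsearch internals.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;

public class RefreshDemo {
  public static void main(String[] args) throws IOException {
    try (ByteBuffersDirectory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      doc.add(new TextField("message", "hello world", Field.Store.NO));
      writer.addDocument(doc); // buffered in memory, not yet searchable

      // Opening a near-real-time reader makes the in-memory buffer visible
      // as a searchable segment: this is what a "refresh" does conceptually.
      try (DirectoryReader reader = DirectoryReader.open(writer)) {
        System.out.println("visible docs: " + reader.numDocs()); // prints 1
      }
    }
  }
}
```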

To manage the memory of its indexing buffer, Elasticsearch keeps track of how much RAM is used across all shards on the local node. Whenever this amount exceeds the limit (10% of the heap size by default), it identifies the shard that uses the most memory and refreshes it.
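
As a rough illustration, the sketch below mimics this pre-8.11 strategy: track per-shard buffer usage and, when the total exceeds the budget, refresh the shard that uses the most memory. The Shard interface and its method names are hypothetical stand-ins for Elasticsearch's actual internal bookkeeping.

```java
import java.util.Comparator;
import java.util.List;

// Simplified sketch of the 8.10-and-earlier strategy: when the combined
// indexing buffer exceeds the budget (10% of heap by default), refresh the
// shard that currently uses the most memory. Shard is a hypothetical type.
class IndexingBufferController {
  private final long budgetBytes;

  IndexingBufferController(long heapBytes) {
    this.budgetBytes = heapBytes / 10; // 10% of the heap by default
  }

  void maybeReclaim(List<Shard> shards) {
    long used = shards.stream().mapToLong(Shard::indexingBufferBytes).sum();
    if (used > budgetBytes) {
      // Refresh the single biggest consumer to free the most memory at once.
      shards.stream()
          .max(Comparator.comparingLong(Shard::indexingBufferBytes))
          .ifPresent(Shard::refresh);
    }
  }

  interface Shard {
    long indexingBufferBytes(); // RAM currently held by this shard's buffer

    void refresh(); // writes the shard's in-memory segments to disk
  }
}
```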

How Elasticsearch 8.11 improved its indexing buffer

Change 1: flush a single segment at a time

When changes get buffered in memory for a given shard, there is not just a single pending segment. To support concurrent indexing, Lucene maintains a pool of pending segments. When a thread wants to index a new document, it picks a pending segment from this pool, updates it, and then moves the pending segment back to the pool. If no pending segment in the pool is free, a new one gets created. The number of pending segments in the pool is usually on the order of the peak indexing concurrency.
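
The sketch below illustrates this check-out/check-in pattern. In Lucene the real logic lives in DocumentsWriterPerThreadPool, where each pending segment corresponds to a DocumentsWriterPerThread; the names used here are simplified for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of Lucene's pending-segment pool. Each indexing thread
// checks out a pending segment, buffers its document into it, and returns it;
// a new pending segment is created only when none is free, so the pool size
// tracks the peak indexing concurrency.
class PendingSegmentPool {
  private final Deque<PendingSegment> free = new ArrayDeque<>();

  synchronized PendingSegment checkOut() {
    PendingSegment segment = free.pollFirst();
    return segment != null ? segment : new PendingSegment();
  }

  synchronized void checkIn(PendingSegment segment) {
    free.addFirst(segment);
  }

  static class PendingSegment {
    long ramBytesUsed; // grows as documents get buffered into this segment
  }
}
```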

The first change we applied was to update this logic to no longer flush all pending segments of a shard at once, and instead only flush the largest pending segment, using Lucene's IndexWriter#flushNextBuffer() API. This helps because pending segments are generally not uniform in size: Lucene has a bias towards updating the largest pending segments, so this new approach flushes fewer segments, which are also significantly larger. And since fewer, larger segments get flushed, less merging is required to keep the number of segments under control.
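
Here is a minimal, self-contained example of the API in question. IndexWriter#flushNextBuffer() is a real Lucene method that flushes a single pending segment rather than all of them; everything else in the snippet (directory, analyzer, field names) is just scaffolding for the demo.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;

public class FlushNextBufferDemo {
  public static void main(String[] args) throws IOException {
    try (ByteBuffersDirectory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (int i = 0; i < 10_000; i++) {
        Document doc = new Document();
        doc.add(new TextField("body", "document number " + i, Field.Store.NO));
        writer.addDocument(doc);
      }
      // Flush only the largest pending in-memory segment instead of flushing
      // every pending segment, leaving the others to keep growing.
      writer.flushNextBuffer();
    }
  }
}
```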

Change 2: flush shards in a round-robin fashion

Managing a shared indexing buffer across many shards is a hard problem. The existing logic assumed that the shard using the most memory for its indexing buffer is the sensible one to reclaim memory from next. After all, flushing it is the most efficient way to buy time before the indexing buffer hits its limit again. On the other hand, this penalizes the shards that ingest most actively, as they flush segments much more frequently than shards with a modest ingestion rate. There are many moving parts here, which makes it hard to build a good intuition about how these factors play together and to figure out the best strategy for picking the next shard to flush.

So we ran experiments with various approaches to picking the next shard to flush. Interestingly, picking the largest shard was the worst strategy, significantly outperformed by picking shards at random. The only approach that slightly outperformed random selection was picking shards in a round-robin fashion, which is now how Elasticsearch picks the next shard to flush.
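
A round-robin policy is simple to express: keep the shards in a queue, flush the one at the head, and move it to the tail so that every shard sheds memory at a similar frequency. The sketch below is a hypothetical distillation of the idea, not Elasticsearch's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the 8.11 strategy: instead of always refreshing the
// largest shard, cycle through shards in a fixed order.
class RoundRobinFlushPolicy {
  private final Deque<Shard> queue = new ArrayDeque<>();

  void track(Shard shard) {
    queue.addLast(shard);
  }

  // Pick the next shard to flush: take it from the head of the queue and
  // move it to the tail, so all shards get flushed at a similar frequency.
  Shard next() {
    Shard shard = queue.pollFirst();
    if (shard != null) {
      queue.addLast(shard);
    }
    return shard;
  }

  interface Shard {} // placeholder for a shard holding an indexing buffer
}
```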

Conclusion

These two changes should help reduce merging overhead and speed up ingestion, especially with small heaps and with field types that consume significant amounts of RAM in the indexing buffer, such as text and match_only_text fields, or field types that are expensive to merge, such as dense_vector. Enjoy the speedups!
