When tuning Elasticsearch for high-concurrency workloads, the standard approach is to maximize RAM so that the working set of documents stays in memory and search latency stays low. Consequently, best_compression is rarely considered for search workloads: it is primarily viewed as a storage-saving measure for Elastic Observability and Elastic Security use cases, where storage efficiency takes priority.
In this blog, we demonstrate that when the dataset size significantly exceeds the OS page cache, best_compression improves search performance and resource efficiency by reducing the I/O bottleneck.
The setup
Our use case is a high-concurrency search application running on Elastic Cloud CPU-optimized instances.
- Data volume: ~500 million documents
- Infrastructure: 6 Elastic Cloud (Elasticsearch service) instances (each instance: 1.76 TB storage | 60 GB RAM | 31.9 vCPU)
- Memory-to-storage ratio: ~5% of the total dataset fits into RAM
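For reference, this ratio can be estimated directly from the cluster APIs. The snippet below is a minimal sketch using the official Python client; the endpoint, API key, and index pattern are placeholders, and the RAM figure is only a rough upper bound since part of it is reserved for the JVM heap.

```python
# Rough estimate of the memory-to-storage ratio: total on-disk index size
# vs. total physical RAM across the nodes. Endpoint, API key, and index
# pattern are placeholders, not the actual deployment described here.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.example.com:443", api_key="...")

# On-disk size of the stored index data (primaries + replicas).
store = es.indices.stats(index="my-search-index-*", metric="store")
store_bytes = store["_all"]["total"]["store"]["size_in_bytes"]

# Physical memory reported by each node's OS stats.
nodes = es.nodes.stats(metric="os")
ram_bytes = sum(n["os"]["mem"]["total_in_bytes"] for n in nodes["nodes"].values())

print(f"index on disk: {store_bytes / 2**30:.0f} GiB, total RAM: {ram_bytes / 2**30:.0f} GiB")
print(f"memory-to-storage ratio: {ram_bytes / store_bytes:.1%}")
```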
The symptoms: high latency
We observed that when the number of concurrent requests spiked around 19:00, search latency deteriorated significantly. As shown in Figure 1 and Figure 2, while traffic peaked around 400 requests per minute per Elasticsearch instance, the average query service time degraded to over 60ms.

Figure 1. Requests per minute per Elasticsearch instance peaked just after 19:00 at about 400.

Figure 2. The average query service time spiked, climbing above 60ms and staying there.
The CPU usage remained relatively low after the initial spike from connection handling, indicating that compute was not the bottleneck.

Figure 3. After the initial jump, CPU usage remained relatively low.
A strong correlation emerged between query volume and page faults. As requests increased, we observed a proportional rise in page faults, peaking around 400k/minute. This indicated that the active dataset could not fit in the page cache.

Figure 4. The number of page faults was high, peaking around 400k/minute.
Simultaneously, the JVM heap usage appeared to be normal and healthy. This ruled out garbage collection issues and confirmed the bottleneck was I/O.

Figure 5. Heap usage remained flat.
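For completeness, heap and garbage collection health can be sanity-checked from the node stats API; a minimal sketch, with the same placeholder endpoint as above:

```python
# Quick heap/GC health check via the node stats API (placeholder endpoint).
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.example.com:443", api_key="...")

jvm = es.nodes.stats(metric="jvm")
for node in jvm["nodes"].values():
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    old_gc = node["jvm"]["gc"]["collectors"]["old"]
    print(f"{node['name']}: heap {heap_pct}%, "
          f"old GC {old_gc['collection_count']}x / {old_gc['collection_time_in_millis']}ms")
```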
The diagnosis: I/O bound
The system was I/O bound. Elasticsearch relies on the OS page cache to serve index data from memory. When the index is too large for the cache, queries trigger expensive disk reads. While the typical solution is to scale horizontally (add nodes/RAM), we wanted to exhaust efficiency improvements on our existing resources first.
The fix
By default, Elasticsearch uses LZ4 compression for its index segments, striking a balance between speed and size. We hypothesized that switching to best_compression (which uses zstd) would reduce the size of indices. A smaller footprint allows a larger percentage of the index to fit in the page cache, trading a negligible increase in CPU (for decompression) for a reduction in disk I/O.
To enable best_compression, we reindexed the data into a new index created with the index setting index.codec: best_compression. Alternatively, the same result can be achieved by closing the existing index, setting index.codec to best_compression, reopening it, and then force merging so that existing segments are rewritten with the new codec.
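In terms of API calls, the two approaches look roughly like the sketch below (official Python client, placeholder index names). Note that index.codec is a static setting, so changing it in place requires closing and reopening the index, and existing segments only pick up the new codec once they are rewritten by a merge or a reindex.

```python
# Two ways to move an existing index to best_compression (placeholder names).
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.example.com:443", api_key="...")

# Option A: reindex into a new index created with the new codec.
es.indices.create(index="products-v2", settings={"index.codec": "best_compression"})
es.reindex(
    source={"index": "products-v1"},
    dest={"index": "products-v2"},
    wait_for_completion=False,  # run as a background task for large indices
)

# Option B: change the codec in place, then force merge to rewrite the segments.
es.indices.close(index="products-v1")
es.indices.put_settings(index="products-v1", settings={"index.codec": "best_compression"})
es.indices.open(index="products-v1")
es.indices.forcemerge(index="products-v1", max_num_segments=1)
```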
The results
The results confirmed our hypothesis: improved storage efficiency directly translated into a substantial boost in search performance with no accompanying increase in CPU utilization.
Applying best_compression reduced the index size by approximately 25%. While smaller than the reduction typically seen on repetitive log data, it effectively meant that the same page cache could hold a correspondingly larger share of the index.
During the next load test (starting at 17:00), the traffic was even higher, peaking at 500 requests per minute per Elasticsearch instance.

Figure 6. The load test started around 17:00.
Despite the higher load, the CPU utilization was lower than in the previous run. The elevated usage in the earlier test was likely due to the overhead of excessive page fault handling and disk I/O management.

Figure 7. CPU utilization was lower than the previous run.
Crucially, page faults dropped significantly. Even at the higher throughput, faults stayed below 200k per minute, compared with more than 300k in the baseline test.

Figure 8. The number of page faults dropped significantly.
Although page faults were still higher than ideal, query service time was cut by about 50%, staying below 30ms even under the heavier load.

Figure 9. Average query service time was <30ms.
The conclusion: best_compression for search
For search use cases where data volume exceeds available physical memory, best_compression is a potent performance-tuning lever.
The conventional solution to cache misses is to scale out and add RAM. However, by reducing the index footprint, we achieved the same goal: fitting more of the index into the page cache. Our next step is to explore index sorting to further optimize storage and squeeze even more performance out of our existing resources.
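Index sorting has to be configured when an index is created, so it will require another reindex. Below is a minimal sketch of the kind of settings we plan to experiment with; the sort field is a placeholder, chosen to co-locate similar documents so they compress better.

```python
# Sketch of an index created with both best_compression and index sorting.
# "category" is a placeholder sort field, not the actual field from this use case.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.example.com:443", api_key="...")

es.indices.create(
    index="products-v3",
    settings={
        "index.codec": "best_compression",
        "index.sort.field": "category",
        "index.sort.order": "asc",
    },
    mappings={
        "properties": {
            "category": {"type": "keyword"},  # sort fields need doc values (on by default for keyword)
        }
    },
)
```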