Tune for indexing speed

Elasticsearch offers a wide range of indexing performance optimizations, which are especially useful for high-throughput ingestion workloads. This page provides practical recommendations to help you maximize indexing speed, from bulk sizing and refresh intervals to hardware and thread management.

Note

Indexing performance is also affected by your sharding and indexing strategies. Whether you’re indexing into a single index or hundreds in parallel, and how many shards each index has, can significantly influence indexing speed.

Also consider your cluster’s shard count, index layout, and overall data distribution when tuning for indexing speed. Refer to Size your shards for more details about sharding strategies and recommendations.

Use bulk requests

Bulk requests yield much better performance than single-document index requests. To find the optimal size of a bulk request, run a benchmark on a single node with a single shard. First try to index 100 documents at once, then 200, then 400, and so on, doubling the number of documents per bulk request in each benchmark run. When the indexing speed starts to plateau, you have reached the optimal bulk request size for your data. In case of a tie, it is better to err in the direction of too few rather than too many documents. Beware that overly large bulk requests can put the cluster under memory pressure when many of them are sent concurrently, so avoid going beyond a few tens of megabytes per request even if larger requests seem to perform better.
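The doubling procedure described above can be sketched as a small search loop. This is a minimal illustration, not a benchmark harness: `measure_docs_per_sec` is a hypothetical callable that you would implement to index one bulk request of the given size and return the observed documents per second, and the `min_gain` threshold of 5% is an arbitrary assumption for what counts as a plateau.

```python
from typing import Callable


def find_bulk_size(
    measure_docs_per_sec: Callable[[int], float],
    start: int = 100,
    max_docs: int = 100_000,
    min_gain: float = 0.05,
) -> int:
    """Double the bulk size until throughput stops improving by min_gain."""
    best_size = start
    best_rate = measure_docs_per_sec(start)
    size = start * 2
    while size <= max_docs:
        rate = measure_docs_per_sec(size)
        # Plateau: the larger request is not meaningfully faster, so keep
        # the smaller size (ties favor fewer documents per request).
        if rate < best_rate * (1 + min_gain):
            break
        best_size, best_rate = size, rate
        size *= 2
    return best_size
```

The loop only compares document counts. In a real run you would also track the request size in bytes and stop before it grows past a few tens of megabytes.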

Note

In Elastic Cloud Serverless, the minimum response time for a single bulk indexing request is 200ms.

Use multiple workers/threads to send data to Elasticsearch

A single thread sending bulk requests is unlikely to be able to max out the indexing capacity of an Elasticsearch cluster. In order to use all resources of the cluster, you should send data from multiple threads or processes. In addition to making better use of the resources of the cluster, this should help reduce the cost of each fsync.

On the other hand, sending data to a single shard from too many concurrent threads or processes can overwhelm the cluster. If the indexing load exceeds what Elasticsearch can handle, it may become a bottleneck and start rejecting requests or slowing down overall performance.

Make sure to watch for TOO_MANY_REQUESTS (429) response codes (EsRejectedExecutionException with the Java client), which is the way that Elasticsearch tells you that it cannot keep up with the current indexing rate. When it happens, you should pause indexing a bit before trying again, ideally with randomized exponential backoff.
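A minimal sketch of randomized exponential backoff is shown below. It assumes a hypothetical `send` callable that performs one bulk request and returns the HTTP status code. The default delays and retry count are illustrative assumptions, not recommended values.

```python
import random
import time
from typing import Callable

TOO_MANY_REQUESTS = 429


def send_with_backoff(
    send: Callable[[], int],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-429 status or retries run out."""
    status = send()
    for attempt in range(max_retries):
        if status != TOO_MANY_REQUESTS:
            break
        # The ceiling doubles on every attempt and is capped at max_delay.
        # Sleeping a random amount below it spreads retries from many
        # clients so they do not all come back at the same moment.
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, ceiling))
        status = send()
    return status
```

The `sleep` parameter exists only so that the wait can be replaced in tests. The function returns the last status it saw, so the caller can decide what to do if the cluster is still rejecting requests after the final retry.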

As with sizing bulk requests, only testing can tell what the optimal number of workers is. Test this by progressively increasing the number of workers until either I/O or CPU is saturated on the cluster.
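One way to send bulk requests from several workers is a thread pool, sketched below. The `send_bulk` callable is hypothetical: it stands in for whatever client call sends one batch, and is assumed to return the number of documents it indexed. The default `bulk_size` and `workers` values are placeholders to be replaced by the results of your own benchmarks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_in_parallel(
    docs: Iterable[dict],
    send_bulk: Callable[[List[dict]], int],
    bulk_size: int = 500,
    workers: int = 4,
) -> int:
    """Send batches from `workers` threads; return total documents sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map submits every batch up front, so all batches are held
        # in memory at once. A loader for large datasets would bound the
        # number of in-flight batches instead.
        return sum(pool.map(send_bulk, batches(docs, bulk_size)))
```

To look for the saturation point, rerun the same load with increasing `workers` values while watching CPU and I/O on the cluster.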

Unset or increase the refresh interval

The operation that makes changes visible to search, called a refresh, is costly, and calling it often while there is ongoing indexing activity can hurt indexing speed.

By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.

This is the optimal configuration if you have no or very little search traffic (for example, less than one search request every 5 minutes) and want to optimize for indexing speed. This behavior aims to automatically optimize bulk indexing in the default case when no searches are performed. To opt out of this behavior, set the refresh interval explicitly.

On the other hand, if your index experiences regular search requests, this default behavior means that Elasticsearch will refresh your index every 1 second. If you can afford to increase the amount of time between when a document gets indexed and when it becomes visible, increasing the index.refresh_interval to a larger value, e.g. 30s, might help improve indexing speed.

Disable refresh interval

To maximize indexing performance during large bulk operations, you can disable refreshing by setting the refresh interval to -1. This prevents Elasticsearch from performing any refreshes during the bulk indexing process.

To disable the refresh interval, run the following request:

PUT /my-index-000001/_settings
{
  "index" : {
    "refresh_interval" : "-1"
  }
}

While refresh is disabled, your newly indexed documents will not be visible to search operations. Only re-enable refreshing after your bulk indexing is complete and you need the data to be searchable.

To restore the refresh interval, run the following request with your desired value:

PUT /my-index-000001/_settings
{
  "index" : {
    "refresh_interval" : "5s"
  }
}

For Elastic Cloud Serverless deployments, refresh_interval must be either -1, or equal to or greater than 5s.

When bulk indexing is complete, consider running a force merge to optimize search performance. Force merging is not available on Elastic Cloud Serverless.

POST /my-index-000001/_forcemerge?max_num_segments=5

Warning

Force merge is an expensive operation.

Disable replicas for initial loads

If you have a large amount of data that you want to load all at once into Elasticsearch, it may be beneficial to set index.number_of_replicas to 0 in order to speed up indexing. Having no replicas means that losing a single node may incur data loss, so it is important that the data lives elsewhere so that this initial load can be retried in case of an issue. Once the initial load is finished, you can set index.number_of_replicas back to its original value.

If index.refresh_interval is configured in the index settings, it may further help to unset it during this initial load and set it back to its original value once the initial load is finished.
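The change-then-restore pattern described in this section can be wrapped in a context manager so that the original values are put back even if the load fails. The sketch below is an illustration under stated assumptions: `put_settings` is a hypothetical callable that sends an update index settings request for the given index and body, and the caller supplies the original values to restore. A `None` value is serialized as JSON `null`, which resets a setting to its default.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical signature: put_settings(index_name, settings_body)
PutSettings = Callable[[str, Dict[str, Any]], None]


@contextmanager
def initial_load_settings(
    put_settings: PutSettings,
    index: str,
    original_replicas: int,
    original_refresh_interval: Optional[str],
) -> Iterator[None]:
    """Drop replicas and unset refresh_interval for the duration of a load."""
    put_settings(
        index,
        {"index": {"number_of_replicas": 0, "refresh_interval": None}},
    )
    try:
        yield
    finally:
        # Restore the caller-supplied original values, even on failure.
        put_settings(
            index,
            {
                "index": {
                    "number_of_replicas": original_replicas,
                    "refresh_interval": original_refresh_interval,
                }
            },
        )
```

The bulk load itself would run inside a `with initial_load_settings(...)` block.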

Disable swapping

Make sure that the operating system is not swapping out the Java process by disabling swapping.

Give memory to the filesystem cache

The filesystem cache is used to buffer I/O operations and plays a critical role in Elasticsearch performance. You should make sure to give at least half of the system's memory to the filesystem cache.

By default, Elasticsearch automatically sets its JVM heap size to follow this best practice. However, in self-managed or Elastic Cloud on Kubernetes deployments, you have the flexibility to allocate even more memory to the filesystem cache.

While the filesystem cache primarily benefits search workloads, it can also improve indexing speed in certain scenarios—especially when indexing into many shards or performing frequent segment merges that involve reading existing data.

Note

On Linux, the filesystem cache uses any memory not actively used by applications. To allocate memory to the cache, ensure that enough system memory remains available and is not consumed by Elasticsearch or other processes.

Use auto-generated ids

When indexing a document that has an explicit id, Elasticsearch needs to check whether a document with the same id already exists within the same shard, which is a costly operation and gets even more costly as the index grows. By using auto-generated ids, Elasticsearch can skip this check, which makes indexing faster.
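As an illustration, the sketch below builds a bulk request body whose action lines omit `_id`, so that Elasticsearch assigns the ids itself. The helper name `bulk_body` is made up for this example, and the body is only constructed, not sent.

```python
import json
from typing import Iterable


def bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Build a newline-delimited bulk body that relies on auto-generated ids."""
    lines = []
    for doc in docs:
        # The action line has no "_id" field, so Elasticsearch generates one
        # and can skip the check for an existing document with the same id.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # Every line of a bulk body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)
```

If your documents need stable, caller-chosen ids, for example to make retries idempotent, this optimization does not apply.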

Use faster hardware

If indexing is I/O-bound, consider increasing the size of the filesystem cache (see above) or using faster storage. Elasticsearch generally creates individual files with sequential writes. However, indexing involves writing multiple files concurrently, and a mix of random and sequential reads too, so SSD drives tend to perform better than spinning disks.

Stripe your index across multiple SSDs by configuring a RAID 0 array. Remember that it will increase the risk of failure, since the failure of any one SSD destroys the index. However, this is typically the right tradeoff to make: optimize single shards for maximum performance, and then add replicas across different nodes so there’s redundancy for any node failures. You can also use snapshot and restore to back up the index for further insurance.

Note

In Elastic Cloud Hosted and Elastic Cloud Enterprise, you can choose the underlying hardware by selecting different hardware profiles or deployment templates. Refer to ECH > Manage hardware profiles and ECE > Manage deployment templates for more details.

Local vs. remote storage

Elasticsearch clusters using directly-attached (local) storage generally perform better than those using remote storage. Direct storage typically provides lower latency for I/O operations, which is more critical for most Elasticsearch workloads than the high throughput that remote storage can often achieve.

Some remote storage performs very poorly, especially under the kind of load that Elasticsearch imposes. However, on certain workloads and with careful tuning, it is sometimes possible to achieve acceptable performance using remote storage too. Before committing to a particular storage architecture, benchmark your system with a realistic workload to determine whether it will meet your performance goals. If you cannot achieve the performance you expect, work with the vendor of your storage system to identify suitable tuning parameter values.

Note

For Elastic Cloud on Kubernetes deployments, refer to the ECK storage recommendations for a complete overview of storage options in Kubernetes, along with their implications and best practices. In Kubernetes, remote storage solutions are commonly used and well-supported.

Indexing buffer size

If your node is doing only heavy indexing, make sure indices.memory.index_buffer_size is large enough to give each shard doing heavy indexing up to 512 MB of indexing buffer (beyond that, indexing performance does not typically improve). Elasticsearch takes that setting (a percentage of the Java heap or an absolute byte size) and uses it as a shared buffer across all active shards. Very active shards will naturally use this buffer more than shards that are performing lightweight indexing.

The default is 10% which is often plenty: for example, if you give the JVM 10GB of memory, it will give 1GB to the index buffer, which is enough to host two shards that are heavily indexing.
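The arithmetic in this example can be written out as a short calculation. The function below is only a back-of-the-envelope helper with a made-up name. It treats 1 GB as 1024 MB and takes the 512 MB per-shard figure from the guidance above.

```python
def heavy_shards_supported(
    heap_mb: int,
    buffer_percent: int = 10,
    per_shard_mb: int = 512,
) -> int:
    """How many heavily indexing shards the shared index buffer can hold."""
    # Integer arithmetic avoids floating-point rounding on the percentage.
    buffer_mb = heap_mb * buffer_percent // 100
    return buffer_mb // per_shard_mb


# A 10 GB heap with the default 10% gives a 1 GB buffer: room for 2 shards.
shards = heavy_shards_supported(heap_mb=10 * 1024)
```

If the result is smaller than the number of shards you expect to index into heavily on one node, that suggests raising the setting.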

Use cross-cluster replication to prevent searching from stealing resources from indexing

Within a single cluster, indexing and searching can compete for resources. By setting up two clusters, configuring cross-cluster replication to replicate data from one cluster to the other one, and routing all searches to the cluster that has the follower indices, search activity will no longer steal resources from indexing on the cluster that hosts the leader indices.

Avoid hot spotting

Hot spotting can occur when node resources, shards, or requests are not evenly distributed. Elasticsearch maintains cluster state by syncing it across nodes, so continually hot spotted nodes can cause overall cluster performance degradation.

Additional optimizations

Many of the strategies outlined in Tune for disk usage can also help improve indexing speed.