Replica management in Elasticsearch Serverless


In a recent post, we shared a new system in Elastic Cloud Serverless that automatically adjusts the number of replicas for your indices based on search load. In this post, we’ll zoom out and look at the entire flow for determining the optimal number of replicas. We’ll also take a closer look at a crucial consideration when adding replicas, namely cache capacity, and explain how the replica systems ensure that data stays fully cacheable within the currently provisioned disk space.

Some Serverless basics

In Elasticsearch Serverless, the object store is the single source of truth. Indexing nodes hold all primary shards, which index data and upload it to the object store. Search nodes hold replica shards, which handle all searches by reading data from the object store. Because only replicas serve searches, indices in Serverless always have at least one replica. The more search nodes there are, the more room there is for additional replicas.

Both tiers scale independently based on the needs of the specific Serverless project. The search tier scales based on search power, a user setting that determines the range of resources available to the cluster, including the minimum amount of disk for caching data. Specifically, this is disk for caching boosted data: recent time-based (@timestamp) documents within the user-defined boost window, plus any non-time-based documents.
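To make the definition of boosted data concrete, here is a minimal Python sketch of the per-document rule described above. The function name `is_boosted`, the assumption that `@timestamp` is already parsed into a timezone-aware `datetime`, and the inclusive window boundary are all illustrative assumptions, not Elasticsearch's actual implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping, Optional


def is_boosted(doc: Mapping[str, Any], now: datetime, boost_window: timedelta) -> bool:
    """Hypothetical check: does this document count as boosted data?

    Assumes @timestamp, when present, is a timezone-aware datetime.
    """
    ts: Optional[datetime] = doc.get("@timestamp")
    if ts is None:
        # Non-time-based documents always count as boosted.
        return True
    # Time-based documents count only if they fall inside the boost window
    # (boundary treated as inclusive, which is an assumption).
    return now - ts <= boost_window


now = datetime(2026, 1, 10, tzinfo=timezone.utc)
print(is_boosted({"@timestamp": now - timedelta(days=3)}, now, timedelta(days=7)))
```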

Replicas for instant failover (RfIF): one or two replicas based on search power

Back in 2024, we shipped the first replica system, Replicas for instant failover (RfIF), which decides whether to assign one or two replicas to an index in Serverless. In Serverless, the object store provides durability, but data cached directly on search nodes is what makes searches fast. Two replicas give us resiliency in case we lose a search node unexpectedly. However, more replicas are only effective if the additional copies of the data can fit in the cache.

RfIF looks at the amount of cache space guaranteed by search power. Lower-cost options can only fit one copy of the boosted data on disk. In these cases, all indices get one replica. As search power increases, we take advantage of additional cache space to provision two replicas for some indices. Finally, for the largest search power setting, there’s enough cache space for all indices to get two replicas.
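The capacity reasoning above can be sketched as a simple first-fit pass. This is a hypothetical illustration, not RfIF's actual code: the post does not say how RfIF chooses which indices get a second replica, so `rfif_recommendations` simply considers indices in a caller-supplied order and assumes each replica needs one full copy of the index's boosted data in cache.

```python
from typing import Dict, List, Tuple


def rfif_recommendations(
    indices: List[Tuple[str, int]],  # (index name, boosted bytes), in priority order
    guaranteed_cache_bytes: int,
) -> Dict[str, int]:
    """Hypothetical one-or-two-replica decision driven by cache capacity."""
    # Every index gets at least one replica; the first copy of all boosted
    # data is assumed to be covered by the cache that search power guarantees.
    recs = {name: 1 for name, _ in indices}
    spare = guaranteed_cache_bytes - sum(size for _, size in indices)
    for name, size in indices:
        # Grant a second replica only if another full copy still fits.
        if size <= spare:
            recs[name] = 2
            spare -= size
    return recs


sizes = [("a", 100), ("b", 200), ("c", 300)]
print(rfif_recommendations(sizes, 900))
```

For example, for three indices with 100, 200, and 300 bytes of boosted data, a 600-byte cache yields one replica everywhere, 900 bytes gives a second replica to some indices, and 1,200 bytes gives two replicas to all of them, mirroring the three regimes described above.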

Replicas for load balancing (RfLB): scaling replicas under search load

Replicas for load balancing (RfLB) shipped in early 2026. A previous blog post explains the system in detail (a must-read if you like pizza). In short, RfLB adds replicas to indices under high search load. More replicas mean higher throughput, because they let more search nodes participate in serving results.

Two systems, one recommendation

Both RfIF and RfLB run every five minutes and provide a recommended number of replicas for each index. RfIF recommends either one or two replicas based on search power and the amount of boosted data. RfLB recommends between one and N replicas based on search load, where N is the current number of search nodes. For each index, we take the larger of the two recommendations. These recommendations are then passed into a cache budgeting system, which may limit them based on available cache space.
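Merging the two recommendations is a per-index maximum. In this hedged sketch, `combine_recommendations` and its dictionary inputs are hypothetical; the clamp to the range 1..N restates the bounds on RfLB given above, and the result is what would then be handed to cache budgeting.

```python
from typing import Dict


def combine_recommendations(
    rfif: Dict[str, int], rflb: Dict[str, int], search_nodes: int
) -> Dict[str, int]:
    """Hypothetical merge: take the larger of the two recommendations per index."""
    combined: Dict[str, int] = {}
    for index, rfif_rec in rfif.items():
        # RfLB recommends between 1 and N replicas, where N is the current
        # number of search nodes; clamp defensively to that range.
        rflb_rec = max(1, min(rflb.get(index, 1), search_nodes))
        combined[index] = max(rfif_rec, rflb_rec)
    return combined


print(combine_recommendations({"a": 1, "b": 2}, {"a": 4, "b": 1}, search_nodes=3))
```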

Recall from the previous post that if the final recommendation increases replicas from the current state, we apply it immediately. However, if the recommendation is to decrease replicas, we simply capture the signal. Only after ~30 minutes of repeated signals to decrease replicas do we apply the new value.
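The asymmetric treatment of increases and decreases is a form of hysteresis. Here is a minimal sketch, assuming a five-minute run interval so that six consecutive decrease signals correspond to roughly 30 minutes, and assuming the most recent lower recommendation is the one applied (the post does not specify which value wins or whether signals must be strictly consecutive). `ReplicaState` is a hypothetical name.

```python
from dataclasses import dataclass

# Assumed timings: one run every 5 minutes, ~30 minutes before decreasing.
RUN_INTERVAL_MINUTES = 5
DECREASE_DELAY_MINUTES = 30
SIGNALS_REQUIRED = DECREASE_DELAY_MINUTES // RUN_INTERVAL_MINUTES  # 6 runs


@dataclass
class ReplicaState:
    current: int
    decrease_signals: int = 0

    def observe(self, recommended: int) -> int:
        """Feed one run's recommendation and return the replica count to use."""
        if recommended > self.current:
            # Increases are applied immediately.
            self.current = recommended
            self.decrease_signals = 0
        elif recommended < self.current:
            # Decreases are only recorded until they have persisted long enough.
            self.decrease_signals += 1
            if self.decrease_signals >= SIGNALS_REQUIRED:
                self.current = recommended
                self.decrease_signals = 0
        else:
            # A steady recommendation cancels any pending decrease.
            self.decrease_signals = 0
        return self.current


state = ReplicaState(current=2)
print(state.observe(4))
```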

Cache budgeting: ensuring replicas fit on disk

Now let’s look more closely at that last step: the cache budgeting system. If we provision too many replicas, a broad enough set of searches could cause frequent cache evictions, resulting in slow searches. To avoid this, we sometimes reduce the replica systems’ recommendations to ensure that all replicas fit in the cache. RfIF already accounts for cache space based on search power, but RfLB may recommend larger numbers of replicas for indices with high search load. These are exactly the recommendations we may limit here: those where RfLB exceeds RfIF.

Here’s some pseudocode of the algorithm (and explanation below):

That first line of code has a lot going on:

totalCacheBudget is the portion of the cluster’s provisioned disk allocated for caching data.
cacheUsedByCurrentReplicas is the amount of cache used by the current replica configuration.
cacheFreedByDecreases represents the amount of cache freed up by replica decreases that will occur as a result of the new recommendations. This is valuable cache space that we may want to use for more replicas!
cacheNeededForRfIFIncreases is the additional cache that RfIF’s recommended increases will use. The cache budgeting system never limits RfIF’s recommendations, so this space must be reserved up front.
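Reading the four definitions above together, the computation of remainingBytes amounts to the following (the numbers in the last line are purely illustrative, in GB):

```latex
\begin{aligned}
\text{remainingBytes} &= \text{totalCacheBudget} - \text{cacheUsedByCurrentReplicas} \\
&\quad + \text{cacheFreedByDecreases} - \text{cacheNeededForRfIFIncreases} \\[4pt]
\text{e.g.}\quad 250 &= 1000 - 700 + 50 - 100
\end{aligned}
```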

Accounting for all of this, remainingBytes is precisely the amount of cache space available for any additional replicas RfLB recommends over RfIF. The rest of the algorithm iterates over allIndicesRankedByLoad, an ordered list of all indices with the highest search load first, and spends these remainingBytes to grant replicas. This procedure squeezes every replica it can out of the bytes we have. For example, if RfLB recommends four additional replicas but there’s only space for two, we’ll take two!
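The budgeting pass described above can be sketched in Python as follows. This is an interpretation of the prose rather than the actual algorithm: it assumes `remaining_bytes` has already been computed as described, that each extra replica costs one full copy of the index's boosted data (`replica_bytes`), and that ranking is a plain descending sort on a hypothetical `search_load` metric. `IndexRecommendation` and `apply_cache_budget` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexRecommendation:
    name: str
    search_load: float  # hypothetical load metric used for ranking
    replica_bytes: int  # cache needed for one copy of the index's boosted data
    rfif: int           # RfIF recommendation (never limited here)
    rflb: int           # RfLB recommendation


def apply_cache_budget(
    indices: List[IndexRecommendation], remaining_bytes: int
) -> Dict[str, int]:
    """Grant RfLB replicas above RfIF, highest-load indices first, while cache remains."""
    final: Dict[str, int] = {}
    # Stand-in for allIndicesRankedByLoad: highest search load first.
    ranked = sorted(indices, key=lambda idx: idx.search_load, reverse=True)
    for idx in ranked:
        wanted = max(0, idx.rflb - idx.rfif)
        if idx.replica_bytes <= 0:
            granted = wanted  # nothing to cache, so extra copies cost nothing
        else:
            # Partial grants are allowed: take as many extra replicas as fit.
            affordable = max(0, remaining_bytes) // idx.replica_bytes
            granted = min(wanted, affordable)
        remaining_bytes -= granted * idx.replica_bytes
        # RfIF's recommendation is the floor; budgeting only limits the excess.
        final[idx.name] = idx.rfif + granted
    return final


demo = [
    IndexRecommendation("logs", 10.0, 100, 1, 5),
    IndexRecommendation("metrics", 5.0, 300, 2, 3),
    IndexRecommendation("small", 1.0, 50, 1, 2),
]
print(apply_cache_budget(demo, 250))
```

For example, with 250 bytes remaining and a top-ranked index that wants four extra 100-byte replicas, the sketch grants two and keeps scanning, so a lower-ranked index whose 50-byte replica fits can still receive one. Because grants start from the RfIF value, budgeting never reduces an index below RfIF's recommendation.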

Wrapping up

To summarize, replicas in Serverless are determined by two cooperating systems. One is focused on replication for instant failover, while the other replicates to provide better search throughput for indices under substantial search load. These systems run side by side, making a joint decision for each index and checking that the decision actually fits in the cache. The result is a Serverless offering you can rely on to adapt to changing conditions while respecting the bounds of the underlying resources to keep data cached and searches fast.

