While a charge per query may have been easier to reason about from a consumption perspective, it would be a lot harder to reason about from a compute perspective. Would we charge a hybrid query that includes full-text search, vector search, learning-to-rank, and a rerank step the same as a simple terms query? Would we charge a search that has to scan a 3-terabyte index the same as a search that has to scan a 100 megabyte index? These are hard, complicated questions that we didn't have a confident answer for.
Instead, we've decided to charge you for the actual resources we have to allocate to fulfill your requested workloads. That means you can be confident we're charging you for what your workloads actually consume.
Unfortunately, that also means understanding our pricing can be complex: to fully grasp how we get to the resources we use, you would need a complete understanding of our internal system.
VCU, ECU, and other terms
Let's start by defining a few terms that will keep coming back throughout this post.
VCU
A VCU is a Virtual Compute Unit. One VCU represents a slice of a virtual host with 1 GB of RAM and associated CPU and storage. We separate compute by the type of workload it supports, so we have three flavors of VCU:
- Search VCU
- Ingest VCU
- ML VCU
VCUs are charged by the hour.
Regional pricing
We charge you based on what we are charged by the cloud providers we use. This means that we have different prices for different regions and different cloud providers. You can find a full list of prices on this page.
ECU
An ECU is an Elastic Consumption Unit, which is the unit we bill you in. The nominal value of an ECU is $1.00 USD. All of the different components of consumption are charged at a specific rate of ECUs per time unit. For example, one gigabyte of storage might cost 0.047 ECU per month, so 100 GB of storage will cost you 4.7 ECU = $4.70 for one month. Similarly, if your search workload consumed 10 VCUs in a day and the Search VCU rate in your region is 0.09 ECU, your cost for that day would be $0.90.
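To make the conversion concrete, here is a minimal sketch of that arithmetic in Python. The rates are the illustrative figures from the example above, not actual regional prices, so substitute the published rates for your region and cloud provider.

```python
# Minimal sketch: converting example consumption figures into ECU and USD.
# The rates below are the illustrative numbers from the text, not real
# regional prices; look up the actual rates for your region and provider.

ECU_TO_USD = 1.00  # nominal value of one ECU

STORAGE_RATE_ECU_PER_GB_MONTH = 0.047  # illustrative storage rate
SEARCH_VCU_RATE_ECU = 0.09             # illustrative Search VCU rate

storage_gb = 100
search_vcus_consumed_today = 10

storage_ecu_per_month = storage_gb * STORAGE_RATE_ECU_PER_GB_MONTH
search_ecu_today = search_vcus_consumed_today * SEARCH_VCU_RATE_ECU

print(f"Storage: {storage_ecu_per_month:.2f} ECU = ${storage_ecu_per_month * ECU_TO_USD:.2f} per month")
print(f"Search:  {search_ecu_today:.2f} ECU = ${search_ecu_today * ECU_TO_USD:.2f} for the day")
# Storage: 4.70 ECU = $4.70 per month
# Search:  0.90 ECU = $0.90 for the day
```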
Price components
Serverless Elasticsearch contains many different pricing components. For most use cases, the components you will care most about are Search, Ingest, and ML VCUs, as well as the Elastic Inference Service's token consumption.
Search VCUs
Search VCU consumption is the most complex part of pricing. At its core, a single Search VCU is a slice of a virtual host with 1GB of RAM that is conducting searches. What makes this complex is all the different components that go into determining how many VCUs we need to fulfill your workloads. For more details on how our autoscaling logic works, see our earlier blog on the topic.
Search VCU inputs
Search VCUs are allocated based on a complicated set of factors, but for our purposes, we can boil it down to roughly three inputs: the interactive dataset size, the search load on the system, and Search Power.
The interactive dataset size is the portion of your data that we need to keep cached so it is immediately available for search. For traditional search use cases, this will generally be your entire dataset. For time series use cases, it will be the portion of your dataset that fits inside the Boost Window, which is the number of days of data that needs to be available. By default, the boost window is seven days, and you can change that in your project settings.
Search load measures the amount of load being placed on the system by currently active searches. The main contributing factors are the number of searches per second, the complexity of the searches (the more that needs to be computed, the higher the load), and the size of the dataset that needs to be searched to fulfill the result. If we can get you the right number of results by scanning 10% of the dataset, then the load will be much lower than if we need to scan the full dataset.
Finally, Search Power influences the minimum and maximum number of VCUs we allocate. At the default setting, Performant, we allocate a minimum of one VCU for every GB of interactive data, and will scale up to meet your demand. At the lowest setting, On Demand, we will never scale beyond one VCU of RAM per GB of interactive data. At the highest setting, High Throughput, we will allocate a minimum of 2.5 VCUs for every 1 GB of interactive data and scale to meet demand.
In short, the larger the dataset size we need to keep cached, the more RAM (and thus VCU) we need to be able to do that. And the higher the search load, the more CPU (and thus VCU) we need to fulfill your search requests. Search Power allows you to tune to what extent we will scale up and down.
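As a rough illustration of the Search Power baselines described above, here is a hedged sketch in Python. The per-GB ratios come from this post's descriptions; the actual autoscaling logic considers far more signals, so treat this only as a way to reason about the floor each setting implies while searches are active.

```python
# Illustrative sketch only: the per-GB ratios below come from the Search Power
# descriptions in this post; the real autoscaling logic is more involved.

SEARCH_POWER_MIN_VCU_PER_GB = {
    "on_demand": 0.0,        # no guaranteed baseline; scaling is also capped at 1 VCU/GB
    "performant": 1.0,       # default: at least 1 VCU per GB of interactive data
    "high_throughput": 2.5,  # at least 2.5 VCUs per GB of interactive data
}

def minimum_search_vcus(interactive_dataset_gb: float, search_power: str) -> float:
    """Rough baseline VCU allocation implied by the chosen Search Power."""
    return interactive_dataset_gb * SEARCH_POWER_MIN_VCU_PER_GB[search_power]

print(minimum_search_vcus(40, "performant"))       # 40.0 VCUs as a floor
print(minimum_search_vcus(40, "high_throughput"))  # 100.0 VCUs as a floor
```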
Minimum VCUs
We do not have a minimum number of VCUs assigned to search. If you have very little data stored and are not executing any searches, we will scale down our VCUs entirely. This happens after fifteen minutes of inactivity. Note that inactivity means actual inactivity: no user-initiated searches whatsoever. As soon as we need to serve a search of any kind, we need to allocate hardware resources to execute that search.
There is one complicating factor: if your interactive dataset is large enough (around 60GB or more), we do need to start allocating a minimum number of VCUs to keep that data available, even if no searches are being executed against that data. We try to keep that scaling minimal as long as no searches are coming in, which is why you may see a scale-up on the first search in these cases.
VCU consumption is not linear
Because our hardware is allocated in steps, consumption of VCUs does not necessarily scale linearly with workload size. Each scaling step can contain a wide range of workloads, and if your workload is at the bottom of that range, it may have a lot of room to grow before we need to jump to the next scaling step.
This can make estimating based on a non-representative workload hard: while you may be consuming 2 VCUs per hour on a small workload, that's our minimum scaling step. It's entirely possible that you could increase your workload size by a factor of 100 and still fit within that 2 VCU per hour step before we need to start increasing the number of VCUs we allocate to serve your workload.
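To make the step behavior concrete, here is a toy sketch. The step sizes are hypothetical and exist only to show that two workloads of very different sizes can land on the same step and therefore cost the same per hour.

```python
# Illustrative only: hypothetical scaling steps, not Elastic's real step sizes.
SCALING_STEPS_VCUS = [2, 4, 8, 16, 32, 64]

def billed_vcus(required_vcus: float) -> int:
    """Round the resources a workload needs up to the next scaling step."""
    for step in SCALING_STEPS_VCUS:
        if required_vcus <= step:
            return step
    return SCALING_STEPS_VCUS[-1]

print(billed_vcus(0.01))  # 2 -> a tiny workload still lands on the first step
print(billed_vcus(1.9))   # 2 -> ~100x more work, same step, same hourly cost
print(billed_vcus(2.1))   # 4 -> only now does the allocation (and cost) jump
```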
We know this makes estimating your cost a little harder, and we are working on ways to make that easier for you. If you need more help estimating your likely price, you can always talk to our customer team and get more personalized assistance.
Ingest VCUs
Ingest VCUs are much simpler than Search VCUs.
Ingest VCU Inputs
Ingest VCUs have essentially three inputs: the number of indices, the ingest rate, and the ingest complexity. We need to allocate a little bit of memory for every index in your system, which is why the number of indices matters. Read indices in data streams do not count for this calculation.
The faster you ingest, the more CPU we will need to process that ingestion. And the more complex your ingest requests, the more CPU we will need. Some factors that make ingest requests more expensive to execute are complicated field mappings or a lot of post-processing.
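As a hedged illustration of what post-processing means in practice, the sketch below uses the Elasticsearch Python client to route documents through an ingest pipeline. The endpoint, index names, and pipeline are hypothetical; the point is that every processor in the pipeline is extra CPU work per ingested document.

```python
# Hedged sketch using the Elasticsearch Python client: the pipeline below is a
# hypothetical example of "post-processing" that adds per-document CPU cost.
from elasticsearch import Elasticsearch

client = Elasticsearch("https://my-project.es.example.com", api_key="...")  # placeholder endpoint

# A pipeline with several processors means more work per ingested document.
client.ingest.put_pipeline(
    id="enrich-logs",  # hypothetical pipeline name
    processors=[
        {"grok": {"field": "message", "patterns": ["%{COMMONAPACHELOG}"]}},
        {"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}},
    ],
)

# Cheap ingest: plain document, simple mappings, no pipeline.
client.index(index="logs-plain", document={"message": "hello"})

# More expensive ingest: every document runs through the pipeline above.
client.index(index="logs-enriched", document={"message": "..."}, pipeline="enrich-logs")
```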
Minimum Ingest VCUs
We do not have a minimum number of VCUs we allocate to your ingest. If you do not ingest data, we do not need to allocate any VCUs to processing ingestion.
VCU consumption is not linear
As with Search VCUs, we allocate Ingest VCUs based on step functions. Each step can contain a wide range of workloads: it's entirely possible that if you have a minimal amount of ingest, you could increase your ingest rate by a factor of 100 and still fit in the first scaling step, thus not actually increasing your cost.
AI workloads
When running machine learning tasks in Serverless, we give you three options:
- You use our Elastic Inference Service (EIS) to run your inference and completion workloads. We take care of everything, and you are charged per token.
- You use traditional Elasticsearch Machine Learning capabilities to run your workloads. These use our Trained Models capabilities. We will scale up and down based on your machine learning workload requirements.
- You do it yourself, outside of our systems, and just bring your vectors or other inference results to store and search in Elasticsearch.
EIS
The pricing for EIS is quite straightforward: you get charged a rate per one million consumed tokens. Token consumption is straightforward to predict for inference workloads. For LLM-based tasks, particularly agentic ones, this can be more complex, and some experimentation and trial runs may be useful to determine how many tokens your workloads typically consume.
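As a back-of-the-envelope illustration, the sketch below estimates EIS cost from token counts. The per-million-token rate is a made-up placeholder, not an actual EIS price; use the published rate for your region.

```python
# Illustrative only: the per-million-token rate below is a placeholder,
# not an actual EIS price; substitute the published rate for your region.

EIS_ECU_PER_MILLION_TOKENS = 0.50  # hypothetical rate

def eis_cost_ecu(tokens_consumed: int) -> float:
    return tokens_consumed / 1_000_000 * EIS_ECU_PER_MILLION_TOKENS

# e.g. an agentic workflow that burns ~3M tokens per day across prompts,
# tool calls, and completions:
print(eis_cost_ecu(3_000_000))  # 1.5 ECU for that day at the hypothetical rate
```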
ML VCUs
Machine Learning VCUs work on one simple input: machine learning workloads. The more inference you require, the more VCUs we will consume. Once you stop performing inference, we will scale down. We will keep a trained model in memory for about 24 hours after you last used it so that we can be responsive, which means that the minimal number of VCUs required to keep that model available will remain allocated for 24 hours before scaling down entirely.
We generally recommend our customers use EIS instead of our Machine Learning nodes for inference, particularly if your usage is periodic. By switching to EIS, you will not have to wait for machine learning nodes to spin up, and we won't charge you for unused ML node time before scaling down. EIS charges on a per token basis.
Storage
We charge storage per gigabyte per month. Storage does serve as an input into other parts of our system, particularly the Search Tier Autoscaling (see Search VCU above), but the pricing for storage itself is quite straightforward.
Data Out (egress)
We charge you for the data you take out of the system.
To minimize your egress costs, we recommend a few optimizations on your queries:
- Do not return vectors in your query responses. We do this by default for indices created after October 2025. You can always return vectors in your responses explicitly if necessary.
- Return only the fields needed for your application. You can do this by using the fields and _source parameters, as shown in the sketch below.
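As a hedged example with the Elasticsearch Python client (the endpoint and index name are placeholders), a query that skips _source and requests only specific fields returns far less data per hit, and therefore incurs less egress:

```python
# Hedged sketch: request only the fields the application needs and skip
# _source (and any stored vectors) in search responses.
from elasticsearch import Elasticsearch

client = Elasticsearch("https://my-project.es.example.com", api_key="...")  # placeholder endpoint

resp = client.search(
    index="products",                      # hypothetical index name
    query={"match": {"title": "laptop"}},
    source=False,                          # do not return the full _source document
    fields=["title", "price"],             # return only what the application renders
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["fields"])
```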
Support
We charge support as a percentage of your total ECU usage. We currently have four levels of support:
- Limited support
- Base support
- Enhanced support
- Premium support
Project settings
We have three project settings that allow you to control your project's usage.
Search power
Search power dictates the minimum and maximum search autoscaling, relative to a project's interactive dataset size. At the default setting, Performant, we allocate a minimum of one Search VCU for every GB of interactive data, and will scale up to meet your demand. At the lowest setting, On Demand, we will never scale beyond one Search VCU of RAM per GB of interactive data. At the highest setting, High Throughput, we will allocate a minimum of 2.5 VCUs for every 1 GB of interactive data and scale to meet demand.
Boost window
For time series use cases, the boost window is the number of days of data that constitutes your interactive dataset size. The interactive dataset is the portion of your data that we keep cached, and that we use to determine how to scale the Search tier for your project. By default, the boost window is seven days.
Data retention
You can set the number of days of data that are retained in your project, which will affect the amount of storage we need. You can do this on a per-data stream basis in your project.
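As a hedged sketch, recent versions of the Elasticsearch Python client expose a data stream lifecycle helper that can set retention per data stream; the data stream name below is hypothetical, and you should check your client version and the serverless documentation for the exact call.

```python
# Hedged sketch: setting a per-data-stream retention period with the
# Elasticsearch Python client's data stream lifecycle helper.
from elasticsearch import Elasticsearch

client = Elasticsearch("https://my-project.es.example.com", api_key="...")  # placeholder endpoint

# Keep 30 days of data in this (hypothetical) data stream; older data is
# removed, reducing the storage you are billed for.
client.indices.put_data_lifecycle(name="logs-myapp-default", data_retention="30d")
```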
Infrastructure profiles
We currently offer two infrastructure options for Serverless Elasticsearch, referred to as “General Purpose” and “Vector Optimized”. All Serverless Elasticsearch projects created through the cloud console UI will be created using the “General Purpose” option. You may create a “Vector Optimized” project by calling the API directly with the optimized_for parameter (see documentation for all options).
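For illustration, here is a hedged sketch of such an API call in Python. The endpoint path, region identifier, and optimized_for value are assumptions based on the documentation referenced above, so verify them against the current API reference before use.

```python
# Hedged sketch: creating a project through the Elastic Cloud serverless API.
# The endpoint path, region identifier, and field values are assumptions;
# check the API documentation for the exact request shape.
import requests

API_KEY = "..."  # an Elastic Cloud API key

resp = requests.post(
    "https://api.elastic-cloud.com/api/v1/serverless/projects/elasticsearch",  # assumed endpoint
    headers={"Authorization": f"ApiKey {API_KEY}"},
    json={
        "name": "my-vector-project",
        "region_id": "aws-us-east-1",  # assumed region identifier
        "optimized_for": "vector",     # the parameter discussed above (assumed value)
    },
)
resp.raise_for_status()
print(resp.json())
```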
The difference between the two options is the allocation of resources. We allocate approximately four times more resources (aka VCUs) to the “Vector Optimized” profile, which will result in your costs being up to four times higher. This is why we recommend starting on the “General Purpose” profile and only using the “Vector Optimized” profile when your use case demands the use of uncompressed dense vectors with high dimensionality, and quantization and DiskBBQ will not serve your needs.
When Serverless Elasticsearch was envisioned years ago, we thought that vector workloads would require far more resources to remain performant. However, with innovations like semantic_text, sparse_vector models, and Better Binary Quantization (BBQ), we’ve found that many vector workloads perform well on the “General Purpose” profile at a fraction of the cost. Therefore, don’t let the “Vector Optimized” label fool you: you can get excellent price and performance for vector workloads on the “General Purpose” profile.
Monitoring costs
We recognize that keeping track of your costs, especially when you are new to Serverless Elasticsearch, is important to you. We built a few tools just for this purpose, and continue to improve them for even greater visibility.
Cloud console billing usage
The Elastic Cloud Console provides all billing details for your cloud account, across all cloud-based resources, including Serverless Elasticsearch. There, you can find a breakdown of all the price components described in this article. Powerful filters allow you to zoom in on specific time periods and resources.
To further monitor your costs, you can also configure custom budget alerts from the Budgets and notifications tab under the Billing and subscriptions page.
AutoOps monitoring
We’re bringing AutoOps to Serverless! One of the key value propositions of Serverless Elasticsearch is that we ensure everything runs smoothly, but that also means you have limited observability into the infrastructure. AutoOps for Serverless gives users visibility into what is driving usage, and, therefore, costs.
AutoOps is rolled out in new Serverless regions regularly, and we're always working to add new monitoring tools. Make sure to check out the region coverage and future planned monitoring tools.