Higher throughput and lower latency: Elastic Cloud Serverless on AWS gets a significant performance boost

We've upgraded the AWS infrastructure for Elasticsearch Serverless to newer, faster hardware. Learn how this massive performance boost delivers faster queries, better scaling, and lower costs.

Free yourself from operations with Elastic Cloud Serverless. Scale automatically, handle load spikes, and focus on building—start a 14-day free trial to test it out yourself!

You can follow these guides to build an AI-powered search experience or search across business systems and software.

Elastic Cloud Serverless is already the definitive solution for developers who want to build efficient search and AI applications without the operational burden of managing infrastructure. Now, we're taking the performance of your serverless projects to a whole new level.

We've completed a major infrastructure upgrade for all Elastic Cloud Serverless projects running on AWS, migrating to newer, faster hardware. This change has been rolled out to every serverless project automatically. It delivers higher throughput and lower latency for Elasticsearch, Elastic Observability, and Elastic Security serverless projects on AWS.

Key performance benefits for developers

The new AWS hardware infrastructure underpins everything you do with Elastic Cloud Serverless, translating to tangible benefits for your applications' speed and responsiveness.

Reduced query latency and increased throughput

The improved hardware dramatically enhances the speed of compute resources, which means your search queries are processed faster than ever.

  • Search and vector search: Whether you're running traditional full-text queries or using cutting-edge vector search for your generative AI and retrieval-augmented generation (RAG) applications, you'll see a marked decrease in latency. Internal benchmarking showed a 35% average decrease in search latency.
  • Faster indexing: Data ingestion rates are optimized, allowing you to index massive data volumes and complex documents with increased throughput. This is crucial for applications that require near–real-time data visibility. Internal benchmarking showed a 26% average increase in indexing throughput.
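To make the vector-search workload concrete, here is a minimal sketch of the kind of kNN request a RAG-style application sends to Elasticsearch. The index name, field name, and query vector are hypothetical placeholders, and a real call requires a live Serverless endpoint and API key:

```python
# Sketch: the shape of a kNN (dense vector) search request, as used in
# RAG-style workloads. Index/field names and the vector values below are
# hypothetical examples, not part of Elastic's benchmark code.

def knn_request(field: str, query_vector: list[float], k: int = 10,
                num_candidates: int = 100) -> dict:
    """Build the body of an Elasticsearch kNN search request."""
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": False,  # return only IDs and scores, not full documents
    }

body = knn_request("content_embedding", [0.12, -0.03, 0.55], k=5)

# Against a live Serverless project this would run roughly as:
# from elasticsearch import Elasticsearch
# es = Elasticsearch("https://<project-endpoint>", api_key="<api-key>")
# hits = es.search(index="docs", body=body)
```

Tuning `num_candidates` trades recall for latency: more candidates per shard improves result quality at the cost of extra work per query.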

Consistent performance under load

Elastic Cloud Serverless is designed to autoscale dynamically in real time to meet demand, minimizing latency regardless of your workload. With this hardware upgrade, that scaling is now more performant and responsive.

  • Handling spikes with ease: Whether you're facing a sudden surge in user traffic or a massive batch data ingest, the new infrastructure ensures that your search and indexing resources scale up more efficiently to maintain consistently low latency.
  • Optimized compute-storage decoupling: The serverless architecture separates compute and storage, which allows workloads to scale independently for optimal performance and cost efficiency. The faster hardware enhances the compute layer, maximizing the efficiency of this decoupled design.

Under the hood: Internal benchmarking results

To quantify the impact of our AWS infrastructure upgrade, the Elastic engineering team conducted comprehensive internal benchmarking against a range of serverless workloads. These workloads provided empirical evidence of performance improvements that you can expect across your applications, regardless of your use case.

The benchmarking approach

We focused our testing on the key metrics that directly affect the developer experience and application responsiveness: response time (that is, latency) and throughput, for both search and indexing operations.
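For intuition on why P99 matters more than an average, here is a simplified nearest-rank percentile sketch (this is our illustration, not Rally's actual implementation):

```python
import math

def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile: the response time that 99% of
    requests beat. It exposes tail latency that averages hide."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 98 fast responses and 2 slow outliers: the mean looks healthy,
# but the tail tells the real user-facing story.
samples = [10.0] * 98 + [250.0, 300.0]
print(p99(samples))                  # 250.0
print(sum(samples) / len(samples))   # 15.3
```

A user-facing application lives at the tail: one slow request in a hundred is still a slow page for that user, which is why the benchmarks below report P99 rather than mean latency.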

  • Workloads tested: The tests included high-concurrency search operations typical of user-facing applications, complex vector search queries, and high-volume data ingestion/indexing for observability and security use cases. In particular, our testing methodology used publicly available datasets for Rally, Elastic’s benchmarking tool.
    • wikipedia: A dataset derived from a snapshot of Wikipedia’s text contents, to measure general-purpose text search performance.
    • MSMARCO-Passage-Ranking: A dataset derived from Microsoft’s Machine Reading Comprehension (MS MARCO), to measure search performance on sparse vector fields.
    • OpenAI_Vector: A dataset derived from BEIR’s NQ and enriched with embeddings generated by OpenAI’s text-embedding-ada-002 model, to measure search performance on dense vector fields.
  • Measurement: We compared performance on the old and new infrastructure, measuring latency at the 99th percentile (P99) to capture the worst-case, tail-latency performance and operations per second. Each track was run five times for each hardware profile to ensure consistency in the results.
  • The goal: Our aim was to validate the infrastructure's ability to deliver consistently faster and more predictable performance across the board, even during periods of rapid autoscaling.
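The "Differential" columns in the result tables are simple percentage changes between the two hardware profiles. A sketch of that arithmetic (the helper name is ours, and rounding may differ by a point from the published figures):

```python
def differential(old: float, new: float) -> int:
    """Percent change from the old measurement to the new one,
    rounded to a whole percent. Positive is an increase."""
    return round((new - old) / old * 100)

# e.g. wikipedia search throughput went from 729 to 1107 ops/s:
print(f"{differential(729, 1107):+d}%")   # +52%
# and MSMARCO indexing latency dropped from 824 ms to 677 ms:
print(f"{differential(824, 677):+d}%")    # -18%
```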

Performance data summary

The results confirm significant gains in efficiency and speed. These gains translate directly into lower response times for your users and lower operational costs, since the same amount of work completes with fewer compute resources.

The following tables detail the quantitative improvements. Higher values are better for throughput; lower values are better for latency.

Search benchmark results:

| Benchmark | Comparison | Old infra | New infra | Differential |
|---|---|---|---|---|
| `wikipedia` (plain text) | Search operation throughput (ops/s) | 729 | 1107 | +52% |
| `wikipedia` (plain text) | Search operation latency (p99, ms) | 56 | 35 | -37% |
| `MSMARCO-Passage-Ranking` (sparse vectors) | Search operation throughput (ops/s) | 22 | 31 | +40% |
| `MSMARCO-Passage-Ranking` (sparse vectors) | Search operation latency (p99, ms) | 108 | 67 | -38% |
| `OpenAI_Vector` (dense vectors) | Search operation throughput (ops/s) | 475 | 624 | +31% |
| `OpenAI_Vector` (dense vectors) | Search operation latency (p99, ms) | 35 | 22 | -37% |

Indexing benchmark results:

| Benchmark | Comparison | Old infra | New infra | Differential |
|---|---|---|---|---|
| `wikipedia` (plain text) | Indexing throughput (ops/s) | 2845 | 3220 | +13% |
| `wikipedia` (plain text) | Indexing latency (p99, ms) | 1769 | 1120 | -37% |
| `MSMARCO-Passage-Ranking` (sparse vectors) | Indexing throughput (ops/s) | 7087 | 8900 | +26% |
| `MSMARCO-Passage-Ranking` (sparse vectors) | Indexing latency (p99, ms) | 824 | 677 | -18% |
| `OpenAI_Vector` (dense vectors) | Indexing throughput (ops/s) | 2972 | 3187 | +7% |
| `OpenAI_Vector` (dense vectors) | Indexing latency (p99, ms) | 2946 | 2944 | 0% |

The added bonus: Cost reduction

While our focus is on delivering low-latency performance, the efficiency of the new hardware also has a direct, positive impact on costs for Elasticsearch projects.

Elasticsearch Serverless pricing is usage-based, meaning that you only pay for the ingest and search resources you consume. Because the newer, faster hardware is more efficient, your workloads will often complete tasks using fewer resources, leading to an inherent cost reduction for most projects. You get a premium performance boost without the premium price tag—the definition of optimized efficiency.

What does this mean for you, the developer?

This infrastructure upgrade is entirely managed by Elastic, so you don't need to lift a finger—no migrations and no configuration changes. The improvement is immediate and automatic across all your AWS-based serverless projects.

This upgrade empowers you to:

  • Build faster applications: Focus on feature velocity, knowing that your underlying search platform is delivering the speed your users demand.
  • Innovate with confidence: Deploy new search, observability, and security features—including complex AI capabilities, like vector search and relevance ranking—with the assurance that the platform can handle the load at peak performance.
  • Simplify your stack: Use a fully managed service that handles infrastructure management, capacity planning, and scaling, so you can focus on your code and data.
