Up to 3x faster stored-vector queries in Elasticsearch

Elasticsearch 9.4 provides a simpler way to search with vectors stored in an Elasticsearch index, with up to 3x lower latency.

From vector search to powerful REST APIs, Elasticsearch offers developers the most extensive search toolkit. Dive into our sample notebooks in the Elasticsearch Labs repo to try something new. You can also start your free trial or run Elasticsearch locally today.

Finding documents similar to a stored vector in Elasticsearch used to require two round trips: Fetch the vector with GET, and then send it back in a k-nearest neighbor (kNN) query. Elasticsearch 9.4 collapses that flow into one request with query_vector_builder.lookup, simplifying the API and improving latency by up to 3x in a two-node Google Cloud Platform (GCP) benchmark.

Why stored-vector search used to require two requests

Previously, when you wanted to find documents similar to a stored vector, you needed to:

  • Call GET to fetch the vector value from Elasticsearch.
  • Call _search referencing that vector value in Elasticsearch:
    • Serialize the vector value via JSON.

This means paying serialization and network costs twice:

  • Serialization and deserialization of the vector for both requests.
  • Network latency costs in both directions.
  • Potential egress costs in cloud deployments.

In Python, the pattern would be:

While these two calls seem cheap, the overhead is unnecessary. Let’s make this better.

How query_vector_builder.lookup works in Elasticsearch 9.4

In Elasticsearch 9.4, we added lookup to simplify the API and eliminate unnecessary costs:

This request now grabs the dense_vector value stored in the product-vector field, in the document with ID product-123 in the seed-products index. This example is a “more like this” search, finding the nearest vectors to the one with ID product-123. You can refer to any index, effectively using lookup as a query vector store.

How much latency lookup vector search can remove

The goal is to simplify the experience and make it faster. The performance gains aren't just from eliminating the client round trip. Many Elasticsearch instances involve multiple nodes, and traffic between nodes can carry its own serialization and network costs. Elasticsearch actively biases execution toward the local node, which cuts network serialization costs on the server side, too.

To illustrate the potential performance improvements, here’s a benchmark we ran. We used a modified version of our so_vector, where instead of using the query vectors, one path did the GET and then _search pattern and the other used lookup. Running on two nodes in the same zone in GCP, the results were strong. Latency was consistently improved by almost 3x. Even when nodes are within the same data center and the same availability zone, network and serialization costs can have a real impact.

Percentileget-then-knn (ms)lookup-knn (ms)ReductionSpeedup
p5010.37963.1409369.74%3.30x
p9025.44295.8980776.82%4.31x
p9927.71678.0710970.88%3.43x
max (p100)28.52212.649755.65%2.25x

This benchmark ran with 2M documents, and the latency improvement will depend on your overall search costs. Even when the speedup is smaller, lookup still removes the extra client-side request. Less code, fewer round trips.

Sometimes small changes can have an outsized impact. While this is a simple feature, I hope it removes some unnecessary friction in your Elasticsearch usage and makes us that much more lovable.

이 콘텐츠가 얼마나 도움이 되었습니까?

도움이 되지 않음

어느 정도 도움이 됩니다

매우 도움이 됨

관련 콘텐츠

최첨단 검색 환경을 구축할 준비가 되셨나요?

충분히 고급화된 검색은 한 사람의 노력만으로는 달성할 수 없습니다. Elasticsearch는 여러분과 마찬가지로 검색에 대한 열정을 가진 데이터 과학자, ML 운영팀, 엔지니어 등 많은 사람들이 지원합니다. 서로 연결하고 협력하여 원하는 결과를 얻을 수 있는 마법 같은 검색 환경을 구축해 보세요.

직접 사용해 보세요