Up to 3x faster stored-vector queries in Elasticsearch

Elasticsearch 9.4 provides a simpler way to search with vectors stored in an Elasticsearch index, with up to 3x lower latency.

From vector search to powerful REST APIs, Elasticsearch offers developers the most extensive search toolkit. Dive into our sample notebooks in the Elasticsearch Labs repo to try something new. You can also start your free trial or run Elasticsearch locally today.

Finding documents similar to a stored vector in Elasticsearch used to require two round trips: Fetch the vector with GET, and then send it back in a k-nearest neighbor (kNN) query. Elasticsearch 9.4 collapses that flow into one request with query_vector_builder.lookup, simplifying the API and improving latency by up to 3x in a two-node Google Cloud Platform (GCP) benchmark.

Why stored-vector search used to require two requests

Previously, when you wanted to find documents similar to a stored vector, you needed to:

  • Call GET to fetch the vector value from Elasticsearch.
  • Call _search referencing that vector value in Elasticsearch:
    • Serialize the vector value via JSON.

This means paying serialization and network costs twice:

  • Serialization and deserialization of the vector for both requests.
  • Network latency costs in both directions.
  • Potential egress costs in cloud deployments.

In Python, the pattern would be:

While these two calls seem cheap, the overhead is unnecessary. Let’s make this better.

How query_vector_builder.lookup works in Elasticsearch 9.4

In Elasticsearch 9.4, we added lookup to simplify the API and eliminate unnecessary costs:

This request now grabs the dense_vector value stored in the product-vector field, in the document with ID product-123 in the seed-products index. This example is a “more like this” search, finding the nearest vectors to the one with ID product-123. You can refer to any index, effectively using lookup as a query vector store.

How much latency lookup vector search can remove

The goal is to simplify the experience and make it faster. The performance gains aren't just from eliminating the client round trip. Many Elasticsearch instances involve multiple nodes, and traffic between nodes can carry its own serialization and network costs. Elasticsearch actively biases execution toward the local node, which cuts network serialization costs on the server side, too.

To illustrate the potential performance improvements, here’s a benchmark we ran. We used a modified version of our so_vector, where instead of using the query vectors, one path did the GET and then _search pattern and the other used lookup. Running on two nodes in the same zone in GCP, the results were strong. Latency was consistently improved by almost 3x. Even when nodes are within the same data center and the same availability zone, network and serialization costs can have a real impact.

Percentileget-then-knn (ms)lookup-knn (ms)ReductionSpeedup
p5010.37963.1409369.74%3.30x
p9025.44295.8980776.82%4.31x
p9927.71678.0710970.88%3.43x
max (p100)28.52212.649755.65%2.25x

This benchmark ran with 2M documents, and the latency improvement will depend on your overall search costs. Even when the speedup is smaller, lookup still removes the extra client-side request. Less code, fewer round trips.

Sometimes small changes can have an outsized impact. While this is a simple feature, I hope it removes some unnecessary friction in your Elasticsearch usage and makes us that much more lovable.

这些内容对您有多大帮助?

没有帮助

有点帮助

非常有帮助

相关内容

准备好打造最先进的搜索体验了吗?

足够先进的搜索不是一个人的努力就能实现的。Elasticsearch 由数据科学家、ML 操作员、工程师以及更多和您一样对搜索充满热情的人提供支持。让我们联系起来,共同打造神奇的搜索体验,让您获得想要的结果。

亲自试用