Up to 3x faster stored-vector queries in Elasticsearch

Try out vector search for yourself using this self-paced hands-on learning for Search AI. You can start a free cloud trial or try Elastic on your local machine now.

Finding documents similar to a stored vector in Elasticsearch used to require two round trips: Fetch the vector with GET, and then send it back in a k-nearest neighbor (kNN) query. Elasticsearch 9.4 collapses that flow into one request with query_vector_builder.lookup, simplifying the API and improving latency by up to 3x in a two-node Google Cloud Platform (GCP) benchmark.

Why stored-vector search used to require two requests

Previously, when you wanted to find documents similar to a stored vector, you needed to:

- Call GET to fetch the vector value from Elasticsearch.
- Call _search, sending that same vector value back to Elasticsearch serialized as JSON.

This means paying serialization and network costs twice:

- Serialization and deserialization of the vector for both requests.
- Network latency in both directions.
- Potential egress costs in cloud deployments.
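To get a feel for the serialization side of that cost, here is a small sketch that measures how large one vector becomes once encoded as a JSON array. The 768-dimension vector of random floats is a hypothetical stand-in; real sizes depend on your embedding model's dimension count and on how many digits each float serializes to.

```python
import json
import random


def json_payload_bytes(vector):
    """Return the size in bytes of a vector serialized as a JSON array."""
    return len(json.dumps(vector).encode("utf-8"))


# Hypothetical 768-dimensional vector; real dimensions depend on your embedding model.
rng = random.Random(42)
vector = [rng.uniform(-1.0, 1.0) for _ in range(768)]

one_way = json_payload_bytes(vector)
# In the two-request pattern the vector crosses the client/cluster boundary twice:
# once in the GET response and once more in the _search request body.
round_trip = 2 * one_way
print(f"{one_way} bytes per copy, {round_trip} bytes across both requests")
```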

In Python, the pattern would be:
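A minimal sketch of that two-request pattern, assuming es is a client from the official elasticsearch Python package (8.x or later), whose get() and search() methods accept these keyword arguments. The helper names are hypothetical, and the sketch assumes the stored vector comes back in the document's _source.

```python
def knn_with_vector(field, query_vector, k=10, num_candidates=100):
    """Build a kNN clause that carries the query vector inline (the pre-9.4 pattern)."""
    return {
        "field": field,
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,  # should be >= k
    }


def find_similar_two_requests(es, seed_index, seed_id, vector_field, target_index, k=10):
    """Hypothetical helper: fetch a stored vector, then send it back in a kNN search."""
    # Round trip 1: GET the seed document; the vector is deserialized from JSON on the client.
    seed = es.get(index=seed_index, id=seed_id)
    query_vector = seed["_source"][vector_field]  # assumes the vector is returned in _source

    # Round trip 2: the same vector is serialized back into the _search request body.
    return es.search(index=target_index, knn=knn_with_vector(vector_field, query_vector, k=k))
```

For example, find_similar_two_requests(Elasticsearch("http://localhost:9200"), "seed-products", "product-123", "product-vector", "products") would perform both round trips; the target index name products is hypothetical.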

These two calls may seem cheap, but the overhead is unnecessary. Let’s make this better.

How query_vector_builder.lookup works in Elasticsearch 9.4

In Elasticsearch 9.4, we added lookup to simplify the API and eliminate unnecessary costs:
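A sketch of the same search using lookup, again assuming the official Python client. Only the search call remains, and the seed vector is resolved inside the cluster. The keys inside lookup (index, id, path) are our assumption about the parameter names, chosen to mirror the index, document ID, and field described in the next paragraph; check the Elasticsearch 9.4 kNN search reference for the exact names. The target index products and the helper names are hypothetical.

```python
def knn_with_lookup(field, seed_index, seed_id, seed_field, k=10, num_candidates=100):
    """Build a kNN clause whose query vector is resolved server-side via
    query_vector_builder.lookup. The lookup key names are assumptions."""
    return {
        "field": field,
        "k": k,
        "num_candidates": num_candidates,  # should be >= k
        "query_vector_builder": {
            "lookup": {
                "index": seed_index,  # index that stores the seed vector
                "id": seed_id,        # document whose vector becomes the query
                "path": seed_field,   # dense_vector field to read (assumed key name)
            }
        },
    }


def find_similar_one_request(es, target_index, vector_field, seed_index, seed_id, seed_field, k=10):
    """Hypothetical helper: one round trip; the seed vector never reaches the client."""
    return es.search(
        index=target_index,
        knn=knn_with_lookup(vector_field, seed_index, seed_id, seed_field, k=k),
    )


# Usage (requires a running cluster and client `es`):
# find_similar_one_request(es, "products", "product-vector",
#                          "seed-products", "product-123", "product-vector")
```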

This request retrieves the dense_vector value stored in the product-vector field of the document with ID product-123 in the seed-products index and uses it as the query vector. The result is a “more like this” search: it finds the nearest vectors to the one stored in product-123. You can refer to any index, effectively using lookup as a query vector store.

How much latency lookup vector search can remove

The goal is to simplify the experience and make it faster. The performance gains don't come only from eliminating the client round trip. Many Elasticsearch clusters span multiple nodes, and traffic between nodes carries its own serialization and network costs. Elasticsearch actively biases execution toward the local node, which cuts those serialization and network costs on the server side, too.

To illustrate the potential performance improvements, here’s a benchmark we ran. We used a modified version of our so_vector benchmark: instead of using the track's query vectors, one path ran the GET-then-_search pattern and the other used lookup. With both nodes running in the same GCP zone, the results were strong: median (p50) latency improved by 3.3x, p90 and p99 latency by 4.3x and 3.4x respectively, and even the maximum latency by 2.25x. Even when nodes are within the same data center and the same availability zone, network and serialization costs can have a real impact.

Percentile	get-then-knn (ms)	lookup-knn (ms)	Reduction	Speedup
p50	10.3796	3.14093	69.74%	3.30x
p90	25.4429	5.89807	76.82%	4.31x
p99	27.7167	8.07109	70.88%	3.43x
max (p100)	28.522	12.6497	55.65%	2.25x
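The Reduction and Speedup columns are both derived from the two latency columns: reduction is 1 - (lookup-knn / get-then-knn), and speedup is get-then-knn / lookup-knn. A quick check against the table values:

```python
# (percentile, get-then-knn ms, lookup-knn ms), copied from the table above.
ROWS = [
    ("p50", 10.3796, 3.14093),
    ("p90", 25.4429, 5.89807),
    ("p99", 27.7167, 8.07109),
    ("max (p100)", 28.522, 12.6497),
]


def reduction_and_speedup(before_ms, after_ms):
    """Reduction = share of latency removed; speedup = old latency / new latency."""
    reduction = 1.0 - after_ms / before_ms
    speedup = before_ms / after_ms
    return reduction, speedup


for name, before, after in ROWS:
    reduction, speedup = reduction_and_speedup(before, after)
    print(f"{name}: {reduction:.2%} reduction, {speedup:.2f}x speedup")

# p50: 69.74% reduction, 3.30x speedup
# p90: 76.82% reduction, 4.31x speedup
# p99: 70.88% reduction, 3.43x speedup
# max (p100): 55.65% reduction, 2.25x speedup
```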

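This benchmark ran with 2M documents. The improvement you see will depend on your overall search cost: the more time the kNN search itself takes, the smaller the share of total latency the eliminated round trip represents. Even when the speedup is smaller, lookup still removes the extra client-side request. Less code, fewer round trips.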
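That dependence on search cost can be made concrete with a deliberately simplified model, which is our assumption and not part of the benchmark: treat what lookup removes as a fixed overhead, and assume the rest of the request's latency is unchanged. Calibrated on the p50 row, the model matches the measured 3.30x by construction; the heavier search costs in the loop are hypothetical.

```python
def expected_speedup(saved_ms, remaining_ms):
    """Speedup if lookup removes a fixed overhead (saved_ms) from a request whose
    other costs (remaining_ms) stay the same. A simplified model, not a measurement."""
    return (saved_ms + remaining_ms) / remaining_ms


# Calibrate on the benchmark's p50 row: time removed by lookup, and time that remained.
P50_GET_THEN_KNN_MS = 10.3796
P50_LOOKUP_KNN_MS = 3.14093
saved = P50_GET_THEN_KNN_MS - P50_LOOKUP_KNN_MS

# The first value is the measured p50; 10, 50, and 200 ms are hypothetical heavier searches.
for remaining in (P50_LOOKUP_KNN_MS, 10.0, 50.0, 200.0):
    print(f"remaining {remaining:7.2f} ms -> {expected_speedup(saved, remaining):.2f}x")
```

Under this model, the relative gain shrinks as the search itself gets more expensive, while the absolute time saved per request stays the same.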

A simpler path for stored-vector search

Sometimes small changes can have an outsized impact. While this is a simple feature, I hope it removes some unnecessary friction in your Elasticsearch usage and makes us that much more lovable.


Further reading

A picture is worth 1.5x the words: What we learned benchmarking product search embeddings

Vector Database, Relevance

July 16, 2026

We benchmarked two embedding models on 5,000 real products and found that combining image and text beats either alone by up to 50%. Here's the data and the model that won.

By: Sofia Vasileva

The disk that never woke up: what actually decided our Qdrant vector search benchmark rematch

Vector Database

July 13, 2026

On the same hardware, Elasticsearch and Qdrant land in the same range at 56 QPS. The io_uring disk scorer and memory claims turned out to be the two things that mattered least.

By: Jim Ferenczi

How BBQ shrinks Jina v5 embeddings by 29x without losing recall in Elasticsearch

Vector Database, Jina AI

July 10, 2026

A hands-on test comparing BBQ and float32 vector indices in Elasticsearch, measuring memory, disk and recall@10 across five languages.

By: Jeffrey Rengifo

Short queries, formal documents: how HyDE improved semantic search precision by 50% in Elasticsearch

AI, Vector Database

July 7, 2026

HyDE boosts semantic search precision and recall by 50% on short queries. Here's how to implement it in Elasticsearch with the Inference API and semantic_text.

By: Jeffrey Rengifo

A simdvec deep-dive: How Elasticsearch uses neural-net and video-codec CPU instructions for vector search

Vector Database, Inside Elastic

July 2, 2026

Four ways Elasticsearch's vector search engine reuses neural-network, video-codec and cryptography CPU instructions for up to 6x speedups, with the math, the failed attempts and the benchmarks.

By: Lorenzo Dematte and Chris Hegarty