Simplifying kNN search

In this blog post we'll take a deep dive into our efforts to make the onboarding experience for kNN search just a bit easier!

Vector search has been available in Elasticsearch for quite some time through the dedicated knn search type, and in the 8.12.0 release we also introduced kNN as a query (more on that in this blog post we recently published!).

While the execution flow and the applications of each approach differ in some respects, the syntax for basic kNN retrieval is largely the same. A typical kNN search request looks like this:

GET products/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [1, 2, 3],
    "k": 5,
    "num_candidates": 10
  }
}

The first couple of params are fairly intuitive: we specify where the data is stored (field) and what we want to compare it against (query_vector).

The k and num_candidates params, on the other hand, are a bit more obscure and require some understanding to fine-tune. They are specific to the algorithm and data structure that we use, i.e. HNSW, and exist mainly to control how much of the graph we explore.

Elasticsearch docs are a great resource for all things search, so taking a look at the knn section we have:

k: Number of nearest neighbors to return as top hits. This value must be less than num_candidates.

num_candidates: The number of nearest neighbor candidates to consider per shard. Needs to be greater than k, or size if k is omitted, and cannot exceed 10,000. Elasticsearch collects num_candidates results from each shard, then merges them to find the top k results. Increasing num_candidates tends to improve the accuracy of the final k results.

However, when you're first met with something like this, it isn't at all obvious what these values should be, and configuring them properly can turn out to be a challenge. The larger these values are, the more vectors we get to explore, but this comes with a performance cost. Once again we're met with the ever-present tradeoff of accuracy vs performance.

To make kNN search a bit easier and more intuitive to use, we decided to make these parameters optional, so that you only need to provide where and what you want to search, and you can tune them only if you really need to. While seemingly a rather small change, it makes things much clearer! The above query can now simply be rewritten as:

GET products/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [1, 2, 3]
  }
}

Making k and num_candidates optional

So we want to make k and num_candidates optional. That's great! How should we set the default values though?


There are two options at this point: pick something that looks reasonable, ship it, and hope for the best, or do the hard work and run extensive evaluations so that data drives our decisions. Here at Elastic we love such challenges, and we want to make sure that any decision we take is justified and for good reason!

Just above, we stated that k for kNN search is the number of results we get back from each shard, so one obvious default is to just use size. Each shard would return size results, and we would merge & sort them to find the global top size results. This also works really nicely with the knn query, where we don't have a k param at all but instead operate based on the request's size (remember that the knn query behaves just like any other query, such as term, prefix, etc). So, size seems like a sensible default that covers most use cases (or is at least good enough during onboarding!).
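For example, under the new defaults, a request that only sets size drives everything from it; each shard returns its top 20 hits, and the coordinating node merges them into the global top 20 (the index and field names here are just placeholders):

GET products/_search
{
  "size": 20,
  "knn": {
    "field": "my_vector",
    "query_vector": [1, 2, 3]
  }
}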

num_candidates, on the other hand, is a different story. This parameter is specific to the HNSW algorithm and controls the size of the nearest neighbor queue we'll consider (for the curious: it is equivalent to the ef parameter in the original paper).

There are multiple approaches we could take here:

  • we could account for the size of each graph and come up with a function that would calculate the appropriate num_candidates for N indexed vectors
  • we could take a look at the underlying data distribution and try to guesstimate the needed exploration (maybe by also considering HNSW's entry points)
  • we could assume that num_candidates is not directly related to the indexed data, but rather to the search request, and ensure that we do enough exploration to provide good enough results

As a start, and to keep things simple, we looked into setting num_candidates relative to k (or size): the more results you actually want to retrieve, the more exploration we perform on each graph to make sure that we escape local minima. The candidate values we focused on are the following:

  • num_candidates = k
  • num_candidates = 1.5 * k
  • num_candidates = 2 * k
  • num_candidates = 3 * k
  • num_candidates = 4 * k
  • num_candidates = Math.max(100, k)

It's worth noting that more alternatives were initially examined, but higher values provided little benefit, so for the rest of the blog we'll focus on just the above.

Having a set of num_candidates candidates (no pun intended!), we now focus on the k parameter. We opted to account for both standard searches and very large k values (to see the actual impact of the exploration we do). So, the values we decided to focus on were:

  • k = 10 (to account for requests where no size is specified)
  • k = 20
  • k = 50
  • k = 100
  • k = 500
  • k = 1000
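To make the grid of combinations concrete, here's a small illustrative sketch in Java (matching the Math.max notation above) that prints the num_candidates value each heuristic yields for each k we evaluated. The class and heuristic names are ours for illustration, not Elasticsearch source:

import java.util.Map;
import java.util.function.IntUnaryOperator;

public class NumCandidatesHeuristics {
    // Each candidate heuristic maps k to a num_candidates value.
    static final Map<String, IntUnaryOperator> HEURISTICS = Map.of(
        "k",           k -> k,
        "1.5 * k",     k -> (int) Math.ceil(1.5 * k),  // rounding choice is ours
        "2 * k",       k -> 2 * k,
        "3 * k",       k -> 3 * k,
        "4 * k",       k -> 4 * k,
        "max(100, k)", k -> Math.max(100, k)
    );

    public static void main(String[] args) {
        int[] ks = {10, 20, 50, 100, 500, 1000};  // the k values we evaluated
        for (int k : ks) {
            HEURISTICS.forEach((name, f) ->
                System.out.printf("k=%d, num_candidates(%s) = %d%n", k, name, f.applyAsInt(k)));
        }
    }
}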

Data

As there is no one-size-fits-all solution, we wanted to test with datasets that have varying attributes: different numbers of total vectors, different dimensions, and vectors generated from different models, i.e. different data distributions.

At the same time we have Rally (https://github.com/elastic/rally) available, an awesome tool for benchmarking Elasticsearch that supports running a bunch of queries & extracting metrics for a number of vector datasets. Here at Elastic we make extensive use of Rally and rely on it for all our nightly benchmarks, which you can check here!

Running a benchmark with Rally is as simple as running the following:

pip3 install esrally && esrally race --track=dense-vector
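The command above also provisions an Elasticsearch node for you; if you'd rather benchmark an existing cluster, Rally's benchmark-only pipeline does just that (host and port here are placeholders):

esrally race --track=dense-vector --pipeline=benchmark-only --target-hosts=localhost:9200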

To this end, we slightly modified the tracks (i.e. Rally's test scenarios) to include additional metric configurations, added some new ones, and ended up with a set of tracks covering the dense_vector, so_vector, glove, cohere, and openai datasets.

It's also worth noting that for the first couple of datasets we wanted to account for having one vs multiple segments, so we included two variants of each:

  • one where we execute force_merge and have a single segment, and
  • one where we rely on the underlying MergeScheduler to do its magic and end up with however many segments it sees fit.
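For reference, the single-segment variant boils down to a call to the force merge API, along the lines of the following (the index name is a placeholder):

POST my_index/_forcemerge?max_num_segments=1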

Metrics

For each of the above tracks, we calculated the standard recall and precision metrics and latency, as well as the actual exploration performed on the graph, by reporting the number of nodes visited. The recall and precision metrics are evaluated against the true nearest neighbors, which in our scenario serve as the gold standard dataset (remember: we're evaluating how good the approximate search is, not the quality of the vectors themselves). The nodes_visited attribute was recently added to knn's profile output (https://github.com/elastic/elasticsearch/pull/102032), so with some minor changes to the track definitions to extract all the needed metrics, we should be good to go!
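In practice this means running the kNN searches with profiling enabled, along the lines of the following request; the profile section of the response then reports the per-graph exploration:

GET products/_search
{
  "profile": true,
  "knn": {
    "field": "my_vector",
    "query_vector": [1, 2, 3],
    "k": 10,
    "num_candidates": 100
  }
}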

Getting our hands dirty

Now that we know what we want to test, which datasets to use, and how to evaluate the results, it's time to actually run the benchmarks!

To have a standardized environment, for each test we used a clean n2-standard-8 cloud node (8 vCPUs, 4 cores, 32 GB memory). The Elasticsearch configuration, along with the necessary mappings and everything else needed, is configured and deployed through Rally, so it is consistent across all similar tests.

Each of the datasets above was run multiple times, gathering all available metrics for all the candidate sets defined, to make sure that no results were one-offs.

Results

The recall vs latency graphs for each of the specified datasets & param combinations can be found below (higher and to the left is better; latency is measured in ms):

[Recall vs latency plots: Dense Vector, Dense Vector Multiple Segments, SO Vector, SO Vector Multiple Segments, Glove, Cohere, OpenAI]

Narrowing down to the dense_vector and openai_vector tracks, the absolute values for latency at the 50th percentile and recall are as follows:

[Charts: Dense Vector latency@50, Dense Vector recall, OpenAI latency@50, OpenAI recall]

Similarly, the 99th percentile of the HNSW graph nodes visited for each scenario can be found below (smaller is better):

[Nodes visited (99th percentile) plots: Dense Vector, Dense Vector Multiple Segments, SO Vector, SO Vector Multiple Segments, Glove, Cohere, OpenAI]

Less is more*

*well not in all cases, but hey :)

Taking a look at the results, two things stand out:

  • Having one vs many segments clearly pulls the recall & latency metrics in opposite directions. Fewer segments means lower latency (as we have fewer graphs to traverse, i.e. fewer searches to run), which is great, but it also hurts recall: with fewer segments we gather fewer num_candidates lists overall, so it is quite possible that some good candidates are left out.
  • Even with little exploration we can achieve good enough recall in almost all cases, which is great!

We're constantly working on improving multi-segment search (a great example can be found in this post), so we expect this tradeoff to become less of an issue going forward (the numbers reported here do not include those improvements).

Taking all things into account, the two main options we discussed were the following:

  • num_candidates = 1.5 * k - this achieves good enough recall in almost all cases, with very good latency scores.
  • num_candidates = Math.max(100, k) - this achieves slightly better recall especially for lower k values, with the tradeoff though of increased graph exploration and latency.

After careful consideration and (lengthy!) discussions, we opted to go with the former as the default, i.e. num_candidates = 1.5 * k. It requires far less exploration, and recall was greater than 90% in almost all the cases we tested (with some significantly higher numbers reported as well), which should make for a good onboarding experience.
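In code, the chosen defaults boil down to something like the following sketch (illustrative only, not the actual Elasticsearch implementation; the 10,000 cap comes from the documented num_candidates limit):

// Illustrative sketch of the new defaults; not the actual Elasticsearch source.
static int effectiveK(Integer k, int size) {
    // k falls back to the request's size (which itself defaults to 10)
    return k != null ? k : size;
}

static int effectiveNumCandidates(Integer numCandidates, int k) {
    // num_candidates falls back to 1.5 * k, capped at the documented 10,000 limit
    return numCandidates != null ? numCandidates : (int) Math.min(Math.round(1.5 * k), 10_000);
}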

Conclusion

The way we handle kNN search at Elastic is always evolving, and we're constantly introducing new features and improvements, so these parameters, and this evaluation overall, could very well become outdated pretty soon! We're always on the lookout, and whenever that happens we'll make sure to follow up and adjust our configuration appropriately!

One important thing to remember is that these values only act as sensible defaults to simplify the onboarding experience and cover very generic use cases. You can easily experiment on your own datasets and adjust accordingly based on your needs (e.g. recall might be much more important than latency in some scenarios).
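For example, if recall matters much more than latency in your scenario, you can always bypass the defaults and crank up the exploration explicitly:

GET products/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [1, 2, 3],
    "k": 10,
    "num_candidates": 500
  }
}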

Happy tuning 😃
