Making Elastic BBQ work for every vector: Preconditioning vectors


Elasticsearch as a vector database offers comprehensive quantization techniques such as Better Binary Quantization (BBQ). BBQ and similar modern quantization techniques compress vectors down to as little as a single bit per dimension, reducing memory use while retaining impressively accurate distance approximations. This works really well for vectors generated by deep learning models, such as Cohere's models; however, for other kinds of vectors, such as image data or histogram features, recall can suffer heavily. Preconditioning fixes this by applying a random orthogonal rotation to your vectors before quantization, redistributing variance evenly across dimensions so every bit captures meaningful signal. In some cases, it improves recall by almost 75%.

Here, we’ll provide some intuition about the problem and how preconditioning solves it.

The problem

BBQ quantizes each dimension of a vector independently: values above the mean become 1, and values below it become 0. This works well when every dimension carries roughly the same amount of information. Transformer-based embeddings tend to have this property naturally, because their dimensions are learned representations that distribute variance evenly.
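To make the thresholding concrete, here is a toy 1-bit quantizer. It is a simplified sketch, not Elasticsearch's BBQ implementation: it uses the per-dimension mean of the dataset as the threshold and omits the extra bookkeeping a production quantizer keeps to correct its distance estimates. `one_bit_quantize` is a hypothetical helper name.

```python
from statistics import fmean


def one_bit_quantize(vectors):
    """Toy 1-bit quantizer (illustrative only, not Elasticsearch's BBQ).

    Each dimension is thresholded independently at its mean across the
    dataset: values above the mean become 1, all others become 0.
    """
    dims = len(vectors[0])
    # Per-dimension mean across all vectors acts as the threshold.
    means = [fmean(v[d] for v in vectors) for d in range(dims)]
    bits = [[1 if v[d] > means[d] else 0 for d in range(dims)] for v in vectors]
    return bits, means


# Usage: three 2-dimensional vectors.
bits, means = one_bit_quantize([[0.0, 10.0], [2.0, 11.0], [4.0, 9.0]])
print(means, bits)
```

Every dimension gets exactly one bit regardless of how much it varies, which is why the approach depends on variance being spread evenly.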

But many real-world vectors aren't like this. Consider a 784-dimensional vector representing a grayscale image, as in the Fashion-MNIST dataset. Some pixels near the center of the image, where the clothing actually appears, vary a lot across the dataset. Other pixels, such as those near the corners, are mostly one color and barely vary at all. When BBQ quantizes these vectors, the high-variance dimensions lose precision because a single bit can't capture their range, while the low-variance dimensions spend a bit on almost no signal and become useless. The resulting quantized vectors are poor approximations of the originals, and recall suffers.
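One way to check whether your own vectors look more like Fashion-MNIST pixels than transformer embeddings is to measure how concentrated their per-dimension variance is. This is a hypothetical heuristic, not something Elasticsearch computes: `top_share` returns the fraction of total variance held by the top 10% of dimensions, which is close to 0.1 when variance is evenly spread and much higher when a few dimensions dominate.

```python
from statistics import pvariance


def per_dim_variance(vectors):
    """Population variance of each dimension across the dataset."""
    return [pvariance([v[d] for v in vectors]) for d in range(len(vectors[0]))]


def top_share(variances, fraction=0.1):
    """Share of total variance held by the top `fraction` of dimensions.

    Hypothetical diagnostic: values far above `fraction` suggest variance
    is concentrated in a few dimensions, as with image pixels.
    """
    k = max(1, int(len(variances) * fraction))
    total = sum(variances)
    if total == 0:
        return 0.0
    return sum(sorted(variances, reverse=True)[:k]) / total


# Usage: only the first of ten dimensions varies.
skewed = [[-3.0] + [0.0] * 9, [3.0] + [0.0] * 9]
print(top_share(per_dim_variance(skewed)))
```

For the skewed toy dataset above, `top_share` returns 1.0; for ten dimensions that all vary equally, it returns 0.1.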

Sample Fashion-MNIST images: a grid of sixteen grayscale pictures of clothing and accessories, each shown with a numeric label indicating its category. (credit: geeksforgeeks.org)

Preconditioning

To fix the problem, we want to spread the information more evenly across dimensions so that each bit captures roughly the same amount of information.

Preconditioning applies a linear transformation to every vector before quantization. The transformation is an orthogonal rotation that reshuffles how information is distributed across dimensions without changing the distances between vectors. If you want to dig into the math, take a look at this in-depth analysis on optimized scalar quantization (OSQ) with preconditioners.
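The distance-preserving property is easy to check numerically. A minimal sketch, assuming a dense random rotation: it draws a Gaussian matrix and orthonormalizes its rows with Gram-Schmidt, which yields a uniformly random orthogonal matrix. Elasticsearch's own preconditioner may be constructed differently (for example, by rotating blocks of dimensions independently); `random_orthogonal` and `rotate` are hypothetical helper names.

```python
import math
import random


def random_orthogonal(n, seed=0):
    """Random n x n orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Subtract the components along rows already accepted.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip (astronomically unlikely) degenerate draws
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vector):
    """Matrix-vector product: apply the rotation to one vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Usage: distances are unchanged by the rotation (up to float rounding).
Q = random_orthogonal(8, seed=42)
a, b = [float(i) for i in range(8)], [1.0] * 8
print(math.dist(a, b), math.dist(rotate(Q, a), rotate(Q, b)))
```

Because the rotation is orthogonal, nearest-neighbor relationships among the original vectors are preserved; only the way each vector's energy is split across coordinates changes, and that split is what the quantizer sees.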

Here's a graphic to help illustrate how preconditioning helps when applying quantization. This simplified two-dimensional diagram shows how an orthogonal rotation increases the spread, or range, of information that was previously compressed. The animation is not an exact representation of preconditioning, but it gives a good intuition for what roughly happens in higher dimensions, where buckets of dimensions are transformed independently and a random projection can greatly improve the distribution. Imagine that the y-axis represents one of the Fashion-MNIST corner pixels, which are primarily one shade and have very low variance, and the x-axis represents a pixel of clothing at the center of the image, which has very high variance. Without preconditioning, the points barely spread along the y-axis, so a single quantization bit for that dimension is not a particularly good discriminator.
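A small numeric version of the animation, assuming uncorrelated toy data: x plays the high-variance center pixel and y the near-constant corner pixel. A fixed 45° rotation is used so the result is deterministic; because x and y are uncorrelated here, that angle splits the total variance exactly in half between the axes, whereas a random rotation would typically balance it only partially. `rotate_2d` is a hypothetical helper name.

```python
import math
from statistics import pvariance


def rotate_2d(points, theta):
    """Rotate 2-D points counterclockwise by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# x: high-variance "center" pixel; y: near-constant "corner" pixel.
points = [(-3.0, -0.1), (-3.0, 0.1), (3.0, -0.1), (3.0, 0.1)]
rotated = rotate_2d(points, math.pi / 4)

for name, pts in (("original", points), ("rotated", rotated)):
    xs, ys = zip(*pts)
    print(f"{name}: var(x)={pvariance(xs):.3f}, var(y)={pvariance(ys):.3f}")
```

Running it prints variances of 9.000 and 0.010 before the rotation and 4.505 on both axes after it: the total variance is unchanged, but each axis, and therefore each quantized bit, now sees a comparable share of it.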

Animated two-dimensional diagram showing points moving under an orthogonal rotation, illustrating how the spread of values increases before quantization.

Let’s look at the data

Preconditioning is currently supported in DiskBBQ. The benchmarks below show its impact on recall when visiting different percentages of the total vector dataset.

Fashion-MNIST (784 dimensions, 60K docs, 5x oversample, k: 10)

Vectors visited	Baseline recall	Preconditioned recall	% Improvement
0.5%	0.45	0.77	71%
3%	0.49	0.77	57%
5%	0.50	0.87	74%
10%	0.55	0.91	65%

GIST (960 dimensions, 1M docs, 5x oversample, k: 10)

Vectors visited	Baseline recall	Preconditioned recall	% Improvement
0.1%	0.49	0.69	41%
0.2%	0.70	0.77	10%
0.3%	0.73	0.85	16%
0.5%	0.78	0.88	13%

SIFT (128 dimensions, 1M docs, 5x oversample, k: 10)

Vectors visited	Baseline recall	Preconditioned recall	% Improvement
0.5%	0.48	0.60	25%
1%	0.59	0.71	20%
3%	0.71	0.87	23%
7%	0.72	0.90	25%
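The % Improvement column is a relative gain over the baseline, not a difference in recall points. For Fashion-MNIST at 0.5% visited, recall rises by 0.32 (from 0.45 to 0.77), which is (0.77 - 0.45) / 0.45, or about 71%. The short script below recomputes every row from the two recall columns as a sanity check; the numbers are copied from the tables above.

```python
# (vectors visited, baseline recall, preconditioned recall, reported % improvement)
benchmarks = {
    "Fashion-MNIST": [("0.5%", 0.45, 0.77, 71), ("3%", 0.49, 0.77, 57),
                      ("5%", 0.50, 0.87, 74), ("10%", 0.55, 0.91, 65)],
    "GIST": [("0.1%", 0.49, 0.69, 41), ("0.2%", 0.70, 0.77, 10),
             ("0.3%", 0.73, 0.85, 16), ("0.5%", 0.78, 0.88, 13)],
    "SIFT": [("0.5%", 0.48, 0.60, 25), ("1%", 0.59, 0.71, 20),
             ("3%", 0.71, 0.87, 23), ("7%", 0.72, 0.90, 25)],
}


def relative_improvement(baseline, preconditioned):
    """Percent improvement relative to the baseline recall."""
    return 100.0 * (preconditioned - baseline) / baseline


for dataset, table in benchmarks.items():
    for visited, base, pre, reported in table:
        computed = round(relative_improvement(base, pre))
        print(f"{dataset} @ {visited}: {computed}% (reported {reported}%)")
```

Every computed value matches the reported column, which confirms the tables use relative rather than absolute improvement.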

That's a nice boost in recall, but it comes with a cost. Applying preconditioning blindly to all embeddings is inefficient: it adds ~2–4% to query latency, with no recall improvement on datasets that don't need it, and upwards of 20% additional overhead at index time. For production use cases where you initially see low recall, you may want to evaluate the impact of preconditioning on your specific model and dataset.
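If you do evaluate preconditioning on your own data, a common way to measure it is recall@k: the fraction of the exact top-k neighbors (from a brute-force search) that the approximate search also returns. A minimal sketch, assuming you have already collected ranked document IDs for each query, with and without preconditioning; how you run those searches is left out, and `recall_at_k` and `mean_recall` are hypothetical helpers rather than an Elasticsearch API.

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k


def mean_recall(exact_results, approx_results, k=10):
    """Average recall@k over many queries (each a ranked list of doc IDs)."""
    scores = [recall_at_k(e, a, k) for e, a in zip(exact_results, approx_results)]
    return sum(scores) / len(scores)


# Usage: two queries with k=2; the first finds 1 of 2 true neighbors, the second both.
print(mean_recall([[1, 2], [3, 4]], [[1, 5], [3, 4]], k=2))
```

Compare `mean_recall` for the baseline and preconditioned runs at the same k and the same fraction of vectors visited, alongside query and indexing latency, since the recall gain has to outweigh the overhead described above.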

Here’s the how

Preconditioning is available for the bbq_disk index type. To enable it, set precondition to true in the field's index_options.
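A minimal sketch of such a mapping, built as a Python dict and printed as JSON so it can be sent as the body of a create-index request (PUT /<index-name>). The field name `my_vector` and `dims` of 784 are hypothetical placeholders, and `bbq_disk_mapping` is a hypothetical helper; the bbq_disk type and the precondition flag come from the description above, and the dense vector mapping docs remain the authority on exact option names for your version.

```python
import json


def bbq_disk_mapping(field="my_vector", dims=784, precondition=True):
    """Build an index mapping body with a bbq_disk dense_vector field.

    `field` and `dims` are illustrative placeholders; verify option names
    against the dense vector mapping docs for your Elasticsearch version.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index_options": {
                        "type": "bbq_disk",
                        "precondition": precondition,
                    },
                }
            }
        }
    }


print(json.dumps(bbq_disk_mapping(), indent=2))
```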

Take a look at the dense vector mapping docs for more details.

Conclusion

BBQ is highly effective for deep learning embeddings, but it can be less effective with embeddings that have uneven variance across dimensions, as can occur in feature-engineered vectors. Preconditioning redistributes that variance so quantization can be more effective. On some datasets, like Fashion-MNIST, we see as much as a 74% improvement in recall!

For now, we've made preconditioning optional. Hopefully, you now have a better sense of when it may be beneficial, so you can try it out yourself. In the future, we plan to iterate on performance and automatically detect when to apply preconditioning.
