Elasticsearch supports approximate k-nearest neighbor search for efficiently finding the k nearest vectors to a query vector. Since approximate kNN search works differently from other queries, there are special considerations around its performance.
Many of these recommendations help improve search speed. With approximate kNN, the indexing algorithm runs searches under the hood to create the vector index structures. So these same recommendations also help with indexing speed.
When indexing vectors for approximate kNN search, you need to specify the
similarity function for comparing the vectors.
If you’d like to compare vectors through cosine similarity, there are two
options. The cosine option accepts any float vector and computes the cosine
similarity. While this is convenient for testing, it’s not the most efficient
approach. Instead, we recommend using the dot_product option to compute the
similarity. To use dot_product, all vectors need to be normalized in advance
to have length 1. The dot_product option is significantly faster, since it
avoids performing extra vector length computations during the search.
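For example, an index mapping that uses dot_product similarity might look like the following (the index name, field name, and dimension count here are illustrative):

```
PUT my-knn-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
```

Every vector indexed into my_vector is then expected to already be normalized to length 1.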
Ensure data nodes have enough memory
Elasticsearch uses the HNSW algorithm for approximate
kNN search. HNSW is a graph-based algorithm which only works efficiently when
most vector data is held in memory. You should ensure that data nodes have at
least enough RAM to hold the vector data and index structures. To check the
size of the vector data, you can use the Analyze index disk usage API. As a
loose rule of thumb, and assuming the default HNSW options, the bytes used will be
num_vectors * 4 * (num_dimensions + 12). When using the byte element_type,
the space required will be closer to
num_vectors * (num_dimensions + 12). Note that
the required RAM is for the filesystem cache, which is separate from the Java
heap.
The data nodes should also leave a buffer for other ways that RAM is needed. For example, your index might also include text fields and numerics, which also benefit from using the filesystem cache. It’s recommended to run benchmarks with your specific dataset to ensure there’s a sufficient amount of memory to give good search performance. Some examples of datasets and configurations that we use for our nightly benchmarks are publicly available.
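As a worked example of the rule of thumb above, the following sketch estimates the filesystem cache needed for the vector data (the function name is ours for illustration, not an Elasticsearch API):

```python
def knn_ram_bytes(num_vectors: int, num_dimensions: int,
                  element_type: str = "float") -> int:
    """Rough RAM estimate for holding vector data and HNSW structures,
    assuming the default HNSW options. The +12 accounts for per-vector
    graph overhead; float vectors use 4 bytes per dimension."""
    per_vector = num_dimensions + 12
    if element_type == "float":
        return num_vectors * 4 * per_vector
    # byte element_type uses roughly 1 byte per dimension
    return num_vectors * per_vector

# Example: 1 million 768-dimensional float vectors
print(knn_ram_bytes(1_000_000, 768))  # 3120000000 bytes, about 2.9 GiB
```

This is only an estimate; use the Analyze index disk usage API to measure the actual size of the vector data for your index.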
Warm up the filesystem cache
If the machine running Elasticsearch is restarted, the filesystem cache will be
empty, so it will take some time before the operating system loads hot regions
of the index into memory so that search operations are fast. You can explicitly
tell the operating system which files should be loaded into memory eagerly,
depending on the file extension, using the index.store.preload setting.
Loading data into the filesystem cache eagerly on too many indices or too many files will make search slower if the filesystem cache is not large enough to hold all the data. Use with caution.
The following file extensions are used for the approximate kNN search: "vec" (for vector values), "vex" (for HNSW graph), "vem" (for metadata).
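For example, the kNN-related files can be preloaded with the following index settings (index.store.preload is a static setting, so it must be set at index creation time or on a closed index; the index name is illustrative):

```
PUT my-knn-index
{
  "settings": {
    "index.store.preload": ["vec", "vex", "vem"]
  }
}
```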
Reduce vector dimensionality
The speed of kNN search scales linearly with the number of vector dimensions, because each similarity computation considers each element in the two vectors. Whenever possible, it’s better to use vectors with a lower dimension. Some embedding models come in different "sizes", with both lower and higher dimensional options available. You could also experiment with dimensionality reduction techniques like PCA. When experimenting with different approaches, it’s important to measure the impact on relevance to ensure the search quality is still acceptable.
Exclude vector fields from _source
Elasticsearch stores the original JSON document that was passed at index time in the
_source field. By default, each hit in the search
results contains the full document
_source. When the documents contain
dense_vector fields, the
_source can be quite large and
expensive to load. This could significantly slow down the speed of kNN search.
You can disable storing
dense_vector fields in the
_source through the
excludes mapping parameter. This prevents loading and
returning large vectors during search, and also cuts down on the index size.
Vectors that have been omitted from
_source can still be used in kNN search,
since it relies on separate data structures to perform the search. Before
using the excludes parameter, make sure to review the
downsides of omitting fields from _source.
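A mapping that omits a vector field from _source might look like this (the index name, field name, and dimension count are illustrative):

```
PUT my-knn-index
{
  "mappings": {
    "_source": {
      "excludes": ["my_vector"]
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384
      }
    }
  }
}
```

kNN searches against my_vector still work, but search hits will no longer return the vector values in _source.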
Reduce the number of index segments
Elasticsearch shards are composed of segments, which are internal storage elements in the index. For approximate kNN search, Elasticsearch stores the dense vector values of each segment as an HNSW graph. kNN search must check each segment, searching through one HNSW graph after another. This means kNN search can be significantly faster if there are fewer segments. By default, Elasticsearch periodically merges smaller segments into larger ones through a background merge process. If this isn’t sufficient, you can take explicit steps to reduce the number of index segments.
Force merge to one segment
The force merge operation forces an index merge. If you
force merge to one segment, the kNN search only needs to check a single,
all-inclusive HNSW graph. Force merging
dense_vector fields is an expensive
operation that can take significant time to complete.
We recommend only force merging a read-only index (meaning the index is no longer receiving writes). When documents are updated or deleted, the old version is not immediately removed, but instead soft-deleted and marked with a "tombstone". These soft-deleted documents are automatically cleaned up during regular segment merges. But force merge can cause very large (> 5GB) segments to be produced, which are not eligible for regular merges. So the number of soft-deleted documents can then grow rapidly, resulting in higher disk usage and worse search performance. If you regularly force merge an index receiving writes, this can also make snapshots more expensive, since the new documents can’t be backed up incrementally.
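For a read-only index, a force merge down to a single segment can be requested like this (index name illustrative):

```
POST my-knn-index/_forcemerge?max_num_segments=1
```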
Create large segments during bulk indexing
A common pattern is to first perform an initial bulk upload, then make an index available for searches. Instead of force merging, you can adjust the index settings to encourage Elasticsearch to create larger initial segments:
Ensure there are no searches during the bulk upload, and disable
index.refresh_interval by setting it to
-1. This prevents refresh operations and avoids creating extra segments.
Give Elasticsearch a large indexing buffer so it can accept more documents before
flushing. By default,
indices.memory.index_buffer_size is set to 10% of the heap size. With a substantial heap size like 32GB, this is often enough. To allow the full indexing buffer to be used, you should also increase the limit index.translog.flush_threshold_size.
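For example, before the bulk upload you might disable refreshes on the index, re-enabling them afterwards by resetting the value (index name illustrative):

```
PUT my-knn-index/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}
```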
Avoid heavy indexing during searches
Actively indexing documents can have a negative impact on approximate kNN search performance, since indexing threads steal compute resources from search. When indexing and searching at the same time, Elasticsearch also refreshes frequently, which creates several small segments. This also hurts search performance, since approximate kNN search is slower when there are more segments.
When possible, it’s best to avoid heavy indexing during approximate kNN search. If you need to reindex all the data, perhaps because the vector embedding model changed, then it’s better to reindex the new documents into a separate index rather than update them in-place. This helps avoid the slowdown mentioned above, and prevents expensive merge operations due to frequent document updates.
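If you do need to re-embed and reindex everything, the reindex API can copy documents into a fresh index instead of updating them in place (index names illustrative):

```
POST _reindex
{
  "source": { "index": "my-knn-index" },
  "dest": { "index": "my-knn-index-v2" }
}
```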
Avoid page cache thrashing by using modest readahead values on Linux
Search can cause a lot of randomized read I/O. When the underlying block device has a high readahead value, there may be a lot of unnecessary read I/O done, especially when files are accessed using memory mapping (see storage types).
Most Linux distributions use a sensible readahead value of 128KiB for a
single plain device. However, when using software RAID, LVM, or dm-crypt, the
resulting block device (backing Elasticsearch path.data)
may end up having a very large readahead value (in the range of several MiB).
This usually results in severe page (filesystem) cache thrashing adversely
affecting search (or update) performance.
You can check the current value using
lsblk -o NAME,RA,MOUNTPOINT,TYPE,SIZE.
Consult the documentation of your distribution on how to alter this value
(for example with a
udev rule to persist across reboots, or via
blockdev --setra as a transient setting). We recommend a readahead value of
128KiB.
Note that blockdev expects values in 512-byte sectors, whereas lsblk reports values in
KiB. As an example, to temporarily set the readahead value to 128KiB for /dev/nvme0n1, run
blockdev --setra 256 /dev/nvme0n1.
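Putting the check and the transient fix together (/dev/nvme0n1 is an example device; substitute the block device backing your path.data):

```shell
# Show the current readahead (RA column, reported in KiB) per block device
lsblk -o NAME,RA,MOUNTPOINT,TYPE,SIZE

# Temporarily set readahead to 128KiB: blockdev counts in 512-byte
# sectors, so 128KiB = 256 sectors
sudo blockdev --setra 256 /dev/nvme0n1
```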