Apache Lucene development has always been vibrant, but the last few months have seen an especially high number of optimizations to query evaluation. There isn't one optimization that can be singled out; rather, it's the combination of many improvements around mechanical sympathy and better algorithms.
What is especially interesting here is that these optimizations don't only benefit a few very specific cases: they translate into actual speedups in Lucene's nightly benchmarks, which aim to track the performance of queries that are representative of the real world. Hover over the annotations to see where a speedup (or, sometimes, a slowdown!) comes from. By the way, special thanks to Mike McCandless for maintaining Lucene's nightly benchmarks on his own time and hardware for almost 13 years now!
Key speedup benchmarks in Lucene
Here are some of the speedups that the nightly benchmarks observed between Lucene 9.6 (May 2023) and Lucene 9.9 (December 2023), with a rough sketch of what these tasks look like after the list:
- AndHighHigh: 35% faster
- AndHighMed: 15% faster
- OrHighHigh: 60% faster
- OrHighMed: 38% faster
- CountAndHighHigh: 15% faster
- CountAndHighMed: 11% faster
- CountOrHighHigh: 145% faster
- CountOrHighMed: 155% faster
- TermDTSort: 24% faster
- TermTitleSort: 290% faster (not a typo!)
- TermMonthSort: 7% faster
- DayOfYearSort: 25% faster
- VectorSearch: 5% faster
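For context, these task names come from Lucene's nightly benchmark suite (luceneutil): AndHighHigh is a conjunction of two high-frequency terms, OrHighMed a disjunction of a high-frequency and a medium-frequency term, and the Count* variants only count matching documents instead of collecting top hits. The sketch below is a rough, hypothetical illustration of what such tasks boil down to at the Lucene API level; the index path, field, and terms are made-up examples, not the actual benchmark definitions.

```java
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class BenchmarkShapedQueries {
  public static void main(String[] args) throws Exception {
    // Hypothetical index location, for illustration only.
    try (DirectoryReader reader =
        DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
      IndexSearcher searcher = new IndexSearcher(reader);

      // "AndHighHigh"-style query: a conjunction of two high-frequency terms.
      BooleanQuery and = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("body", "http")), Occur.MUST)
          .add(new TermQuery(new Term("body", "from")), Occur.MUST)
          .build();

      // Top-k evaluation: dynamic pruning can skip documents that cannot
      // make it into the top 10 hits.
      TopDocs topHits = searcher.search(and, 10);

      // "CountAndHighHigh"-style evaluation: only the hit count is needed,
      // which allows different shortcuts than top-k scoring.
      int hitCount = searcher.count(and);

      System.out.println(topHits.totalHits + " / exact count: " + hitCount);
    }
  }
}
```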
Lucene optimization resources
In case you are curious about these changes, here are resources that describe some of the optimizations we applied (a small sketch of the SIMD/FMA idea follows the list):
- Bringing speedups to top-k queries with many and/or high-frequency terms (annotation FK)
- More skipping with block-max MAXSCORE (annotation FU)
- Accelerating vector search with SIMD instructions
- Vector similarity computations FMA-style
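To give a flavor of the SIMD and FMA ideas referenced above, here is a minimal dot-product sketch using the JDK's incubating Panama Vector API. This is an illustration of the technique only, not Lucene's actual implementation, which lives behind its internal vectorization support.

```java
// Requires a recent JDK with --add-modules jdk.incubator.vector at compile and run time.
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class DotProductSketch {
  private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

  // Scalar baseline for comparison.
  static float dotScalar(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // SIMD version: process SPECIES.length() floats per iteration and use
  // fused multiply-add to combine the multiply and the accumulation.
  static float dotSimd(float[] a, float[] b) {
    FloatVector acc = FloatVector.zero(SPECIES);
    int i = 0;
    int bound = SPECIES.loopBound(a.length);
    for (; i < bound; i += SPECIES.length()) {
      FloatVector va = FloatVector.fromArray(SPECIES, a, i);
      FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
      acc = va.fma(vb, acc); // acc += va * vb, fused into one instruction where supported
    }
    float sum = acc.reduceLanes(VectorOperators.ADD);
    for (; i < a.length; i++) { // scalar tail for the remaining elements
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```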
Lucene 9.9 was just released and is expected to be integrated into Elasticsearch 8.12, which should get released soon. Stay tuned!