More skipping with block-max MAXSCORE

How do you quickly identify the top-k matches of a disjunctive query using an inverted index? This is the problem that the WAND [1] and MAXSCORE [2] algorithms try to solve. Both algorithms are based on the same idea: take advantage of the maximum impact score of each term (the maximum contribution that term can make to the overall score) to skip hits whose score cannot possibly exceed the score of the current k-th top hit (referred to as the "minimum competitive score" below). For instance, if you are searching for hits that contain "the" or "fox", "the" has a maximum contribution of 0.2 to the score, and the minimum competitive score is 0.5, then there is no point in evaluating hits that only contain "the" anymore: they have no chance of making it to the top hits.
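The pruning test at the heart of both algorithms can be sketched in a few lines of Python. This is purely illustrative (the function name and data shapes are made up for this example; Lucene's actual implementation is Java operating on postings):

```python
# Illustrative sketch: a hit can be skipped when the sum of the maximum
# impact scores of the terms it matches cannot exceed the minimum
# competitive score.

def can_skip(matched_terms, max_scores, min_competitive_score):
    """Return True if a hit matching only `matched_terms` has no chance
    of entering the top-k hits."""
    upper_bound = sum(max_scores[term] for term in matched_terms)
    return upper_bound <= min_competitive_score

max_scores = {"the": 0.2, "fox": 1.0}
print(can_skip({"the"}, max_scores, 0.5))         # True: 0.2 <= 0.5
print(can_skip({"the", "fox"}, max_scores, 0.5))  # False: 1.2 > 0.5
```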

However, WAND and MAXSCORE come with different performance characteristics: WAND typically evaluates fewer hits than MAXSCORE, but with a higher per-hit overhead. As a result, MAXSCORE generally performs better with high k values or many terms, when skipping hits is hard, while WAND performs better otherwise.

While Lucene first implemented a variant of WAND called block-max WAND, the lower overhead of MAXSCORE later proved attractive, and Lucene started using block-max MAXSCORE for top-level disjunctions in July 2022 (annotation EN in Lucene's nightly benchmarks). The MAXSCORE algorithm is rather simple: it sorts terms by increasing maximum impact score and partitions them into two groups, essential terms and non-essential terms. Non-essential terms are the terms with the lowest maximum impact scores, as many of them as possible such that the sum of their maximum scores remains less than the minimum competitive score. All other terms are essential. Essential terms are used to find candidate matches, while non-essential terms are only used to compute the scores of these candidates.
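The essential/non-essential split can be sketched as follows, assuming a simple dict of per-term maximum impact scores (again an illustration, not Lucene's actual code):

```python
# Illustrative sketch of MAXSCORE's term partitioning.

def partition_terms(max_scores, min_competitive_score):
    """Split terms into (non_essential, essential): sort by increasing
    maximum impact score, then greedily mark terms as non-essential
    while the sum of their maximum scores stays below the minimum
    competitive score."""
    terms = sorted(max_scores, key=max_scores.get)
    non_essential, running_sum = [], 0.0
    for term in terms:
        if running_sum + max_scores[term] < min_competitive_score:
            non_essential.append(term)
            running_sum += max_scores[term]
        else:
            break
    return non_essential, terms[len(non_essential):]

max_scores = {"the": 0.2, "quick": 0.5, "fox": 1.0}
print(partition_terms(max_scores, 0.3))  # (['the'], ['quick', 'fox'])
```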

Let's take an example: you are searching for the quick fox, and the maximum impact scores of the terms are respectively 0.2 for "the", 0.5 for "quick" and 1 for "fox". As you start collecting hits, the minimum competitive score is 0, so all terms are essential and no term is non-essential. Then at some point the minimum competitive score reaches e.g. 0.3, meaning that a hit that only contains "the" has no chance of making it to the top-k hits. "the" moves from the set of essential terms to the set of non-essential terms, and the query effectively runs as (the) +(quick fox). The + sign here expresses that a query clause is required, as in Lucene's classic query parser. Said another way, from that point on, the query will only match hits that contain "quick" or "fox", and will only use "the" to compute the final score. The table below summarizes the cases that MAXSCORE considers:

| Minimum competitive score interval | Query runs as      |
|------------------------------------|--------------------|
| [0, 0.2]                           | +(the quick fox)   |
| (0.2, 0.7]                         | (the) +(quick fox) |
| (0.7, 1.7]                         | (the quick) +(fox) |
| (1.7, +∞)                          | No more matches    |

The last case happens when the minimum competitive score is greater than the sum of the maximum impact scores across all terms. It essentially never happens with regular MAXSCORE, but may happen on some blocks with block-max MAXSCORE, which works with per-block maximum impact scores.
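As a side note, the interval boundaries in the table above are simply the prefix sums of the maximum impact scores, sorted in increasing order. A tiny sketch (illustrative only) makes this explicit:

```python
# Crossing each boundary moves one more term into the non-essential set;
# above the last boundary, no hit can be competitive anymore.

def interval_boundaries(max_scores):
    bounds, running_sum = [], 0.0
    for score in sorted(max_scores.values()):
        running_sum += score
        bounds.append(running_sum)
    return bounds

print(interval_boundaries({"the": 0.2, "quick": 0.5, "fox": 1.0}))
# [0.2, 0.7, 1.7]
```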

Something that WAND does better than MAXSCORE is to progressively evaluate queries less as a disjunction and more as a conjunction as the minimum competitive score increases, which yields more skipping. This raised the question of whether MAXSCORE could be improved to also intersect terms. The answer is yes: for instance, if the minimum competitive score is 1.3, then a hit cannot be competitive unless it matches both "quick" and "fox". So we modified our block-max MAXSCORE implementation to consider the following cases instead:

| Minimum competitive score interval | Query runs as      |
|------------------------------------|--------------------|
| [0, 0.2]                           | +(the quick fox)   |
| (0.2, 0.7]                         | (the) +(quick fox) |
| (0.7, 1.2]                         | (the quick) +(fox) |
| (1.2, 1.5]                         | (the) +quick +fox  |
| (1.5, 1.7]                         | +the +quick +fox   |
| (1.7, +∞)                          | No more matches    |
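The extended case analysis can be sketched like this: a term becomes required once a hit that misses it cannot reach the minimum competitive score, and the usual MAXSCORE split is applied to the remaining terms. Function name and output formatting are made up for this example, not taken from Lucene:

```python
def run_as(max_scores, min_competitive_score):
    """Describe how the query effectively runs for a given minimum
    competitive score."""
    total = sum(max_scores.values())
    if min_competitive_score > total:
        return "No more matches"
    terms = sorted(max_scores, key=max_scores.get)
    # A hit missing `t` scores at most total - max_scores[t],
    # so `t` is required once that bound is below the threshold.
    required = [t for t in terms if total - max_scores[t] < min_competitive_score]
    rest = [t for t in terms if t not in required]
    # Usual MAXSCORE split of the remaining terms.
    non_essential, running_sum = [], 0.0
    for t in rest:
        if running_sum + max_scores[t] < min_competitive_score:
            non_essential.append(t)
            running_sum += max_scores[t]
        else:
            break
    essential = rest[len(non_essential):]
    parts = []
    if non_essential:
        parts.append("(" + " ".join(non_essential) + ")")
    if essential:
        parts.append("+(" + " ".join(essential) + ")")
    parts.extend("+" + t for t in required)
    return " ".join(parts)

max_scores = {"the": 0.2, "quick": 0.5, "fox": 1.0}
print(run_as(max_scores, 0.5))  # (the) +(quick fox)
print(run_as(max_scores, 1.3))  # (the) +quick +fox
print(run_as(max_scores, 1.6))  # +the +quick +fox
```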

Now the interesting question is whether these new cases are actually likely to occur. The answer depends on how good your score upper bounds are, the actual k value, whether terms have matches in common, etc. In practice, the optimization seems to kick in especially often on queries that either have two terms, or that combine two high-scoring terms with zero or more low-scoring terms (e.g. stop words), such as the query in the example above. This should cover a sizable share of the queries in many query logs.

Implementing this optimization yielded a noticeable improvement on Lucene's nightly benchmarks (annotation FU), see OrHighHigh (11% speedup) and OrHighMed (6% speedup). It was released in Lucene 9.9 and should be included in Elasticsearch 8.12. We hope you'll enjoy the speedups!

Footnotes

  1. Broder, A. Z., Carmel, D., Herscovici, M., Soffer, A., & Zien, J. (2003, November). Efficient query evaluation using a two-level retrieval process. In Proceedings of the twelfth international conference on Information and knowledge management (pp. 426-434).

  2. Turtle, H., & Flood, J. (1995). Query evaluation: strategies and optimizations. Information Processing & Management, 31(6), 831-850.
