We are working on updating this book for the latest version. Some content might be out of date.
The DSL used by Elasticsearch has a single set of components called queries, which can be mixed and matched in endless combinations. This single set of components can be used in two contexts: filtering context and query context.
When used in filtering context, the query is said to be a "non-scoring" or "filtering" query. That is, the query simply asks the question: "Does this document match?". The answer is always a simple, binary yes|no.
createddate in the range
statusfield contain the term
10kmof a specified point?
When used in a querying context, the query becomes a "scoring" query. Similar to its non-scoring sibling, this determines if a document matches. But it also determines how well does the document matches.
A typical use for a query is to find documents:
Best matching the words
full text search
Containing the word
run, but maybe also matching
Containing the words
fox—the closer together they are, the more relevant the document
java—the more tags, the more relevant the document
A scoring query calculates how relevant each document
is to the
query, and assigns it a relevance
_score, which is later used to
sort matching documents by relevance. This concept of relevance is
well suited to full-text search, where there is seldom a completely
Historically, queries and filters were separate components in Elasticsearch. Starting in Elasticsearch 2.0, filters were technically eliminated, and all queries gained the ability to become non-scoring.
However, for clarity and simplicity, we will use the term "filter" to mean a query which is used in a non-scoring, filtering context. You can think of the terms "filter", "filtering query" and "non-scoring query" as being identical.
Similarly, if the term "query" is used in isolation without a qualifier, we are referring to a "scoring query".
Filtering queries are simple checks for set inclusion/exclusion, which make them very fast to compute. There are various optimizations that can be leveraged when at least one of your filtering query is "sparse" (few matching documents), and frequently used non-scoring queries can be cached in memory for faster access.
In contrast, scoring queries have to not only find matching documents, but also calculate how relevant each document is, which typically makes them heavier than their non-scoring counterparts. Also, query results are not cacheable.
Thanks to the inverted index, a simple scoring query that matches just a few documents may perform as well or better than a filter that spans millions of documents. In general, however, a filter will outperform a scoring query. And it will do so consistently.
The goal of filtering is to reduce the number of documents that have to be examined by the scoring queries.