Queries and Filters

The DSL used by Elasticsearch has a single set of components called queries, which can be mixed and matched in endless combinations. This single set of components can be used in two contexts: filtering context and query context.

When used in filtering context, the query is said to be a "non-scoring" or "filtering" query. That is, the query simply asks the question "Does this document match?" The answer is always a simple, binary yes or no. For example:

  • Is the created date in the range 2013 - 2014?
  • Does the status field contain the term published?
  • Is the lat_lon field within 10km of a specified point?
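The three questions above could be expressed roughly as follows. This is a minimal sketch that only builds request bodies as plain Python dictionaries; the field names (`created`, `status`, `lat_lon`) come from the bullets, while the date bounds, the coordinates, and the `as_filter` helper are assumptions made for illustration.

```python
import json

# "Is the created date in the range 2013 - 2014?"
# Assumption: the range covers all of 2013 and 2014.
date_filter = {"range": {"created": {"gte": "2013-01-01", "lt": "2015-01-01"}}}

# "Does the status field contain the term published?"
status_filter = {"term": {"status": "published"}}

# "Is the lat_lon field within 10km of a specified point?"
# Assumption: the point is arbitrary example coordinates.
geo_filter = {
    "geo_distance": {
        "distance": "10km",
        "lat_lon": {"lat": 40.73, "lon": -74.1},
    }
}


def as_filter(*clauses):
    """Hypothetical helper: wrap clauses in the filter section of a bool query,
    so they run in filtering (non-scoring) context."""
    return {"query": {"bool": {"filter": list(clauses)}}}


body = as_filter(date_filter, status_filter, geo_filter)
print(json.dumps(body, indent=2))
```

Each clause can only answer yes or no for a given document, so none of them contributes to the relevance score.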

When used in query context, the query becomes a "scoring" query. Like its non-scoring sibling, it determines whether a document matches, but it also determines how well the document matches.

A typical use for a query is to find documents:

  • Best matching the words full text search
  • Containing the word run, but maybe also matching runs, running, jog, or sprint
  • Containing the words quick, brown, and fox—the closer together they are, the more relevant the document
  • Tagged with lucene, search, or java—the more tags, the more relevant the document
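The uses listed above might look like this as scoring queries. This is a sketch only: the `title` and `tags` field names and the `slop` value are assumptions, and whether `run` also matches `runs`, `running`, or `jog` depends on how the field is analyzed (stemming, synonyms), which is not shown here.

```python
import json

# Best matching the words "full text search" (hypothetical field: title).
full_text = {"match": {"title": "full text search"}}

# Containing the word "run"; related forms match only if the field's
# analyzer applies stemming or synonyms.
stemmed = {"match": {"title": "run"}}

# "quick", "brown", and "fox": a phrase query with slop tolerates gaps,
# and documents where the words sit closer together score higher.
proximity = {"match_phrase": {"title": {"query": "quick brown fox", "slop": 10}}}

# Tagged with lucene, search, or java: each matching should clause
# adds to the score, so more matching tags means a higher _score.
tagged = {
    "bool": {
        "should": [{"term": {"tags": tag}} for tag in ("lucene", "search", "java")]
    }
}

for query in (full_text, stemmed, proximity, tagged):
    print(json.dumps({"query": query}))
```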

A scoring query calculates how relevant each document is to the query, and assigns it a relevance _score, which is later used to sort matching documents by relevance. This concept of relevance is well suited to full-text search, where there is seldom a completely “correct” answer.

Historically, queries and filters were separate components in Elasticsearch. Starting in Elasticsearch 2.0, filters were technically eliminated, and all queries gained the ability to become non-scoring.

However, for clarity and simplicity, we will use the term "filter" to mean a query which is used in a non-scoring, filtering context. You can think of the terms "filter", "filtering query" and "non-scoring query" as being identical.

Similarly, if the term "query" is used in isolation without a qualifier, we are referring to a "scoring query".

Performance Differences

Filtering queries are simple checks for set inclusion or exclusion, which makes them very fast to compute. Various optimizations can be applied when at least one of your filtering queries is "sparse" (matches few documents), and frequently used non-scoring queries can be cached in memory for faster access.

In contrast, scoring queries have to not only find matching documents, but also calculate how relevant each document is, which typically makes them heavier than their non-scoring counterparts. Also, the results of scoring queries are not cacheable.

Thanks to the inverted index, a simple scoring query that matches just a few documents may perform as well as or better than a filter that spans millions of documents. In general, however, a filter will outperform a scoring query, and it will do so consistently.

The goal of filtering is to reduce the number of documents that have to be examined by the scoring queries.
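The idea can be pictured with a toy model. The sketch below is not how Elasticsearch is implemented; it only assumes that each filter yields a set of matching document IDs, intersects those sets starting with the sparsest one, and then scores the survivors. All names and data are hypothetical.

```python
from functools import reduce

# Toy data: IDs of the documents matching each (hypothetical) filter.
published = {1, 2, 3, 5, 8, 13}
recent = {2, 3, 5, 7, 11, 13}
nearby = {3, 13}  # the "sparse" filter: few matching documents


def apply_filters(*doc_sets):
    """Intersect filter results, starting with the smallest set so that
    every later step works on as few documents as possible."""
    ordered = sorted(doc_sets, key=len)
    return reduce(set.intersection, ordered)


# Toy term counts standing in for a real relevance calculation.
term_counts = {3: 4, 5: 9, 13: 1}


def score(doc_ids):
    """Score only the documents that survived filtering, best first."""
    return sorted(((term_counts.get(d, 0), d) for d in doc_ids), reverse=True)


candidates = apply_filters(published, recent, nearby)
ranked = score(candidates)
print(candidates, ranked)
```

Document 5 has the highest toy score, but it never reaches the scoring step because the filters have already excluded it.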

When to Use Which

As a general rule, use query clauses for full-text search or for any condition that should affect the relevance score, and use filters for everything else.
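One way to apply this rule is to put both kinds of clause into a single bool query. The following is a minimal sketch under stated assumptions: the `title` and `status` fields and the `search_body` helper are hypothetical, and the function only builds the request body without sending it anywhere.

```python
import json


def search_body(text, status):
    """Hypothetical helper: full-text condition in must (scoring),
    exact-value condition in filter (non-scoring)."""
    return {
        "query": {
            "bool": {
                # Affects the relevance _score.
                "must": [{"match": {"title": text}}],
                # Only includes or excludes documents.
                "filter": [{"term": {"status": status}}],
            }
        }
    }


body = search_body("quick brown fox", "published")
print(json.dumps(body, indent=2))
```

Moving the status condition into `must` would still restrict the results, but the clause would then be scored as well, which is unnecessary for a plain yes or no check.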