Filter Orderedit

The order of filters in a bool clause is important for performance. More-specific filters should be placed before less-specific filters in order to exclude as many documents as possible, as early as possible.

If Clause A could match 10 million documents, and Clause B could match only 100 documents, then Clause B should be placed before Clause A.

Cached filters are very fast, so they should be placed before filters that are not cacheable. Imagine that we have an index that contains one month’s worth of log events. However, we’re mostly interested only in log events from the previous hour:

GET /logs/2014-01/_search
{
    "query" : {
        "filtered" : {
            "filter" : {
                "range" : {
                    "timestamp" : {
                        "gt" : "now-1h"
                    }
                }
            }
        }
    }
}

This filter is not cached because it uses the now function, the value of which changes every millisecond. That means that we have to examine one month’s worth of log events every time we run this query!

We could make this much more efficient by combining it with a cached filter: we can exclude most of the month’s data by adding a filter that uses a fixed point in time, such as midnight last night:

"bool": {
    "must": [
        { "range" : {
            "timestamp" : {
                "gt" : "now-1h/d" 
            }
        }},
        { "range" : {
            "timestamp" : {
                "gt" : "now-1h" 
            }
        }}
    ]
}

This filter is cached because it uses now rounded to midnight.

This filter is not cached because it uses now without rounding.

The now-1h/d clause rounds to the previous midnight and so excludes all documents created before today. The resulting bitset is cached because now is used with rounding, which means that it is executed only once a day, when the value for midnight-last-night changes. The now-1h clause isn’t cached because now produces a time accurate to the nearest millisecond. However, thanks to the first filter, this second filter need only check documents that have been created since midnight.

The order of these clauses is important. This approach works only because the since-midnight clause comes before the last-hour clause. If they were the other way around, then the last-hour clause would need to examine all documents in the index, instead of just documents created since midnight.