Combining Filters

edit

The previous two examples showed a single filter in use. In practice, you will probably need to filter on multiple values or fields. For example, how would you express this SQL in Elasticsearch?

SELECT product
FROM   products
WHERE  (price = 20 OR productID = "XHDK-A-1293-#fJ3")
  AND  (price != 30)

In these situations, you will need to use a bool query inside the constant_score query. This allows us to build filters that can have multiple components in boolean combinations.

Bool Filter

edit

Recall that the bool query is composed of four sections:

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : [],
      "filter":    []
   }
}
must
All of these clauses must match. The equivalent of AND.
must_not
All of these clauses must not match. The equivalent of NOT.
should
At least one of these clauses must match. The equivalent of OR.
filter
Clauses that must match, but are run in non-scoring, filtering mode.

In this secondary boolean query, we can ignore the filter clause: the queries are already running in non-scoring mode, so the extra filter clause is useless.

Each section of the bool filter is optional (for example, you can have a must clause and nothing else), and each section can contain a single query or an array of queries.

To replicate the preceding SQL example, we will take the two term queries that we used previously and place them inside the should clause of a bool query, and add another clause to deal with the NOT condition:

GET /my_store/products/_search
{
   "query" : {
      "constant_score" : { 
         "filter" : {
            "bool" : {
              "should" : [
                 { "term" : {"price" : 20}}, 
                 { "term" : {"productID" : "XHDK-A-1293-#fJ3"}} 
              ],
              "must_not" : {
                 "term" : {"price" : 30} 
              }
           }
         }
      }
   }
}

Note that we still need to use a constant_score query to wrap everything with its filter clause. This is what enables non-scoring mode

These two term queries are children of the bool query, and since they are placed inside the should clause, at least one of them needs to match.

If a product has a price of 30, it is automatically excluded because it matches a must_not clause.

Notice how boolean is placed inside the constant_score, but the individual term queries are just placed in the should and must_not. Because everything is wrapped with the constant_score, the rest of the queries are executing in filtering mode.

Our search results return two hits, each document satisfying a different clause in the bool query:

"hits" : [
    {
        "_id" :     "1",
        "_score" :  1.0,
        "_source" : {
          "price" :     10,
          "productID" : "XHDK-A-1293-#fJ3" 
        }
    },
    {
        "_id" :     "2",
        "_score" :  1.0,
        "_source" : {
          "price" :     20, 
          "productID" : "KDKE-B-9947-#kL5"
        }
    }
]

Matches the term query for productID = "XHDK-A-1293-#fJ3"

Matches the term query for price = 20

Nesting Boolean Queries

edit

You can already see how nesting boolean queries together can give rise to more sophisticated boolean logic. If you need to perform more complex operations, you can continue nesting boolean queries in any combination, giving rise to arbitrarily complex boolean logic.

For example, if we have this SQL statement:

SELECT document
FROM   products
WHERE  productID      = "KDKE-B-9947-#kL5"
  OR (     productID = "JODL-X-1937-#pV7"
       AND price     = 30 )

We can translate it into a pair of nested bool filters:

GET /my_store/products/_search
{
   "query" : {
      "constant_score" : {
         "filter" : {
            "bool" : {
              "should" : [
                { "term" : {"productID" : "KDKE-B-9947-#kL5"}}, 
                { "bool" : { 
                  "must" : [
                    { "term" : {"productID" : "JODL-X-1937-#pV7"}}, 
                    { "term" : {"price" : 30}} 
                  ]
                }}
              ]
           }
         }
      }
   }
}

Because the term and the bool are sibling clauses inside the Boolean should, at least one of these queries must match for a document to be a hit.

These two term clauses are siblings in a must clause, so they both have to match for a document to be returned as a hit.

The results show us two documents, one matching each of the should clauses:

"hits" : [
    {
        "_id" :     "2",
        "_score" :  1.0,
        "_source" : {
          "price" :     20,
          "productID" : "KDKE-B-9947-#kL5" 
        }
    },
    {
        "_id" :     "3",
        "_score" :  1.0,
        "_source" : {
          "price" :      30, 
          "productID" : "JODL-X-1937-#pV7" 
        }
    }
]

This productID matches the term in the first bool.

These two fields match the term filters in the nested bool.

This was a simple example, but it demonstrates how Boolean queries can be used as building blocks to construct complex logical conditions.