Query-Time Search-as-You-Type

edit

Leaving postcodes behind, let’s take a look at how prefix matching can help with full-text queries. Users have become accustomed to seeing search results before they have finished typing their query—​so-called instant search, or search-as-you-type. Not only do users receive their search results in less time, but we can guide them toward results that actually exist in our index.

For instance, if a user types in johnnie walker bl, we would like to show results for Johnnie Walker Black Label and Johnnie Walker Blue Label before they can finish typing their query.

As always, there are more ways than one to skin a cat! We will start by looking at the way that is simplest to implement. You don’t need to prepare your data in any way; you can implement search-as-you-type at query time on any full-text field.

In Phrase Matching, we introduced the match_phrase query, which matches all the specified words in the same positions relative to each other. For-query time search-as-you-type, we can use a specialization of this query, called the match_phrase_prefix query:

{
    "match_phrase_prefix" : {
        "brand" : "johnnie walker bl"
    }
}

This query behaves in the same way as the match_phrase query, except that it treats the last word in the query string as a prefix. In other words, the preceding example would look for the following:

  • johnnie
  • Followed by walker
  • Followed by words beginning with bl

If you were to run this query through the validate-query API, it would produce this explanation:

"johnnie walker bl*"

Like the match_phrase query, it accepts a slop parameter (see Mixing It Up) to make the word order and relative positions somewhat less rigid:

{
    "match_phrase_prefix" : {
        "brand" : {
            "query": "walker johnnie bl", 
            "slop":  10
        }
    }
}

Even though the words are in the wrong order, the query still matches because we have set a high enough slop value to allow some flexibility in word positions.

However, it is always only the last word in the query string that is treated as a prefix.

Earlier, in prefix Query, we warned about the perils of the prefix—​how prefix queries can be resource intensive. The same is true in this case. A prefix of a could match hundreds of thousands of terms. Not only would matching on this many terms be resource intensive, but it would also not be useful to the user.

We can limit the impact of the prefix expansion by setting max_expansions to a reasonable number, such as 50:

{
    "match_phrase_prefix" : {
        "brand" : {
            "query":          "johnnie walker bl",
            "max_expansions": 50
        }
    }
}

The max_expansions parameter controls how many terms the prefix is allowed to match. It will find the first term starting with bl and keep collecting terms (in alphabetical order) until it either runs out of terms with prefix bl, or it has more terms than max_expansions.

Don’t forget that we have to run this query every time the user types another character, so it needs to be fast. If the first set of results isn’t what users are after, they’ll keep typing until they get the results that they want.