In That Case | Elasticsearch: The Definitive Guide [2.x]

WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.

This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.



In That Case

The most frequently used token filter is the lowercase filter, which does exactly what you would expect; it transforms each token into its lowercase form:

GET /_analyze?tokenizer=standard&filters=lowercase
The QUICK Brown FOX!

Emits the tokens the, quick, brown, fox.

It doesn’t matter whether users search for fox or FOX, as long as the same analysis process is applied at index time and at search time. The lowercase filter will transform a query for FOX into a query for fox, which is the same token that we have stored in our inverted index.
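To make the point about applying the same analysis on both sides concrete, here is a minimal Python sketch. It is not how Elasticsearch is implemented: the regular-expression tokenizer is only a rough stand-in for the standard tokenizer, and the names tokenize, lowercase_filter, analyze, build_index, and search are hypothetical helpers invented for this illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Rough stand-in for the standard tokenizer (an assumption for this
    # sketch): split on anything that is not a word character.
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    # The token filter itself: map every token to its lowercase form.
    return [token.lower() for token in tokens]


def analyze(text):
    # One analysis chain, shared by indexing and searching.
    return lowercase_filter(tokenize(text))


def build_index(docs):
    # Toy inverted index: token -> set of IDs of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    # Analyze the query with the same chain, then look up each token.
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits


docs = {1: "The QUICK Brown FOX!", 2: "A lazy dog"}
index = build_index(docs)

print(analyze("The QUICK Brown FOX!"))
print(search(index, "FOX"), search(index, "fox"))
```

Because both build_index and search go through analyze, a query for FOX and a query for fox look up the same stored token. If search skipped the lowercase step, the query FOX would find nothing, since only fox is in the index.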

To use token filters as part of the analysis process, we can create a custom analyzer:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_lowercaser": {
          "tokenizer": "standard",
          "filter":  [ "lowercase" ]
        }
      }
    }
  }
}

And we can test it out with the analyze API:

GET /my_index/_analyze?analyzer=my_lowercaser
The QUICK Brown FOX!

Emits the tokens the, quick, brown, fox.
