NOTE: You are looking at documentation for an older release. For the latest information, see the current release documentation.

Lowercase Tokenizer

IMPORTANT: This documentation is no longer updated. Refer to Elastic's version policy and the latest documentation.

The lowercase tokenizer, like the letter tokenizer, breaks text into terms whenever it encounters a character that is not a letter, but it also lowercases all terms. It is functionally equivalent to the letter tokenizer combined with the lowercase token filter, but it is more efficient because it performs both steps in a single pass.

Example output

POST _analyze
{
  "tokenizer": "lowercase",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The above sentence would produce the following terms:

[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
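
The equivalence with the letter tokenizer plus the lowercase token filter can be checked with a second request. The following is a sketch rather than an official example: it assumes the analyze API in this release accepts a "filter" array alongside "tokenizer", and it reuses the sample sentence from the request above.

POST _analyze
{
  "tokenizer": "letter",
  "filter": [ "lowercase" ],
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

If that assumption holds, this request should produce the same terms as the lowercase tokenizer, only in two steps (tokenize, then lowercase) instead of one.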

Configuration

The lowercase tokenizer is not configurable.
