WARNING: Version 5.6 of Elasticsearch has passed its EOL date.

This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.

Whitespace Tokenizer

The whitespace tokenizer breaks text into terms whenever it encounters a whitespace character.
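As an illustration only, the splitting rule can be approximated in Python with str.split(), which splits on runs of whitespace and never produces empty terms. This is a hypothetical sketch, not the Lucene code Elasticsearch runs. Lucene decides what counts as whitespace with Java's Character.isWhitespace, which differs from Python in edge cases (for example, Python treats the non-breaking space U+00A0 as whitespace and Java does not). The function name whitespace_tokenize is made up for this example, and the sketch ignores the maximum token length.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    # Hypothetical helper: split on runs of whitespace only.
    # Punctuation and letter case are left untouched, and leading or
    # trailing whitespace does not produce empty terms.
    return text.split()


terms = whitespace_tokenize("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
print(terms)
```

Run on the same sentence as the _analyze request, it yields the same ten terms as the Elasticsearch output.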

Example output

POST _analyze
{
  "tokenizer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

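The above sentence would produce the following terms:

[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]

Because only whitespace acts as a separator, punctuation stays attached to its term: Brown-Foxes keeps its hyphen, dog's keeps its apostrophe, and bone. keeps its trailing period. The case of each term is also preserved.

The maximum token length is 255 characters. A token that exceeds this length is split into consecutive chunks of at most 255 characters.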
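The 255-character limit can be sketched the same way. In this hypothetical Python helper (whitespace_tokenize_with_limit is an illustrative name, not an Elasticsearch API), each whitespace-separated run longer than the limit is cut into consecutive chunks of at most 255 characters. The sketch assumes the split behaves like fixed-size chunking and counts Python code points, whereas Lucene counts UTF-16 code units, so results can differ for characters outside the Basic Multilingual Plane.

```python
from typing import List

MAX_TOKEN_LENGTH = 255  # limit stated in this documentation


def whitespace_tokenize_with_limit(text: str, max_len: int = MAX_TOKEN_LENGTH) -> List[str]:
    # Hypothetical helper: split on whitespace, then cut any run longer
    # than max_len into consecutive chunks of at most max_len characters
    # (lengths are Python code points, an assumption of this sketch).
    terms: List[str] = []
    for word in text.split():
        for start in range(0, len(word), max_len):
            terms.append(word[start:start + max_len])
    return terms


long_word = "a" * 600
print([len(t) for t in whitespace_tokenize_with_limit("short " + long_word)])
# prints [5, 255, 255, 90]
```

A 600-character run therefore becomes three terms of 255, 255, and 90 characters, while shorter terms pass through unchanged.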

Configuration

The whitespace tokenizer is not configurable: it accepts no parameters, so the 255-character maximum token length cannot be changed.

« Lowercase Tokenizer UAX URL Email Tokenizer »