WARNING: Version 2.3 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
A tokenizer of type
standard provides grammar-based tokenization and is
a good choice for most European-language documents. The tokenizer
implements the Unicode Text Segmentation algorithm, as specified in
Unicode Standard Annex #29.
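To give a feel for what grammar-based word segmentation produces, here is a minimal Python sketch. It is only a rough approximation built on a regular expression, not an implementation of Unicode Standard Annex #29 and not the tokenizer's actual code; the function name rough_word_tokens is hypothetical.

```python
import re

# Rough stand-in for word segmentation: keep runs of letters, digits and
# underscores, and drop whitespace and punctuation. This is NOT UAX #29.
_WORD = re.compile(r"\w+")


def rough_word_tokens(text):
    """Return word-like tokens from text (approximation only)."""
    return _WORD.findall(text)


if __name__ == "__main__":
    print(rough_word_tokens("The 2 QUICK Brown-Foxes jumped over the lazy dog."))
```

The real algorithm has additional rules that this sketch ignores. For example, UAX #29 keeps a word with an internal apostrophe together, whereas the regular expression above would split it in two.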
The following settings can be set for a standard tokenizer:
The maximum token length. If a token is seen that
exceeds this length then it is split at intervals of that length.
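The splitting rule for over-long tokens can be sketched in a few lines of Python. This is an illustration of the described behavior under stated assumptions, not the tokenizer's own code. The helper split_long_token and the names my_tokenizer and my_analyzer are hypothetical, and the setting name max_token_length is assumed from the Elasticsearch reference, since it does not appear in the text above.

```python
def split_long_token(token, max_token_length):
    """Split token into consecutive chunks of at most max_token_length characters."""
    if max_token_length < 1:
        raise ValueError("max_token_length must be a positive integer")
    return [
        token[start:start + max_token_length]
        for start in range(0, len(token), max_token_length)
    ]


# Hypothetical index settings that configure a tokenizer of type standard.
# The key "max_token_length" is an assumption (see the paragraph above).
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_tokenizer": {"type": "standard", "max_token_length": 5}
            },
            "analyzer": {
                "my_analyzer": {"type": "custom", "tokenizer": "my_tokenizer"}
            },
        }
    }
}

if __name__ == "__main__":
    limit = index_settings["settings"]["analysis"]["tokenizer"]["my_tokenizer"][
        "max_token_length"
    ]
    print(split_long_token("elasticsearch", limit))
```

With a limit of 5, a 13-character token is cut into pieces of 5, 5 and 3 characters. Tokens at or under the limit pass through as a single piece.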