A tokenizer of type
standard provides grammar-based tokenization and is
a good choice for most European language documents. The tokenizer
implements the Unicode Text Segmentation algorithm, as specified in
Unicode Standard Annex #29.
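
To give a feel for what word-boundary tokenization produces, here is a minimal Python sketch. It is only a rough approximation built on a regular expression, not an implementation of the Unicode Text Segmentation rules and not the tokenizer's actual code. The function name `approximate_tokens` and the sample sentence are assumptions made for illustration.

```python
import re

# Rough stand-in for word-boundary segmentation: runs of word characters,
# optionally joined by a single internal apostrophe or period.
# This is NOT the UAX #29 algorithm; it only mimics a few of its outcomes.
_WORD = re.compile(r"\w+(?:['.]\w+)*")


def approximate_tokens(text: str) -> list[str]:
    """Return word-like tokens, dropping whitespace and most punctuation."""
    return _WORD.findall(text)


if __name__ == "__main__":
    sample = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
    print(approximate_tokens(sample))
```

In this sketch the hyphen splits "Brown-Foxes" into two tokens, the apostrophe inside "dog's" is kept, and the trailing period is dropped. A real UAX #29 implementation handles many more cases (scripts without spaces, numeric separators, combining marks), so treat the regular expression as a teaching aid only.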
The following settings can be configured for a
The maximum token length. A token that
exceeds this length is discarded. Defaults to