WARNING: Version 1.7 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
A tokenizer of type pattern that can flexibly separate text into terms via a regular expression. Accepts the following settings:
pattern: The regular expression pattern, defaults to \W+.
flags: The regular expression flags.
group: Which group to extract into tokens. Defaults to -1 (split).
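As a rough illustration of how these settings fit together, the sketch below builds an index settings body as a Python dictionary and prints it as JSON. The names comma_tokenizer and comma_analyzer and the chosen values are hypothetical, picked only for this example, and the flags value assumes that Java regular expression flag names are accepted.

```python
import json

# Hypothetical index settings that register a pattern tokenizer.
# "comma_tokenizer" and "comma_analyzer" are made-up names for illustration.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "comma_tokenizer": {
                    "type": "pattern",
                    "pattern": ",",               # matches the separator, not the token
                    "flags": "CASE_INSENSITIVE",  # assumed: a Java regex flag name
                    "group": -1,                  # -1 means "split" on the pattern
                }
            },
            "analyzer": {
                "comma_analyzer": {
                    "type": "custom",
                    "tokenizer": "comma_tokenizer",
                }
            },
        }
    }
}

# Serialize to the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```

The resulting JSON would be supplied as the request body when the index is created; how it is sent (client library or plain HTTP) is left out here.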
IMPORTANT: The regular expression should match the token separators, not the tokens themselves.
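To see why this matters in split mode, the following minimal sketch uses Python's re.split as a stand-in for the tokenizer (the real tokenizer uses Java regular expressions, so this is an approximation). The helper split_tokens is hypothetical and exists only for this illustration.

```python
import re

def split_tokens(text, separator_pattern):
    """Approximate split mode: cut the text at every match and drop empty pieces."""
    return [piece for piece in re.split(separator_pattern, text) if piece]

text = "foo, bar; baz"

# Pattern matches the separators: the words come out as tokens.
good = split_tokens(text, r"[,;]\s*")
print(good)  # ['foo', 'bar', 'baz']

# Pattern matches the words themselves: only the separators are left over.
bad = split_tokens(text, r"\w+")
print(bad)   # [', ', '; ']
```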
group set to -1 (the default) is equivalent to "split". Using group >= 0 selects the matching group as the token. For example, if you have:
pattern = '([^']+)'
group   = 0
input   = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
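The group behaviour described above can be approximated in a few lines of Python. This is only a sketch of the semantics, not the tokenizer's implementation: pattern_tokenize is a hypothetical helper, Python's re module stands in for Java regular expressions, and empty tokens are assumed to be dropped.

```python
import re

def pattern_tokenize(text, pattern=r"\W+", group=-1):
    """Emulate the group setting: -1 splits on matches, >= 0 emits that group."""
    regex = re.compile(pattern)
    tokens = []
    if group == -1:
        # Split mode: tokens are the stretches of text between matches.
        last = 0
        for match in regex.finditer(text):
            if match.start() > last:
                tokens.append(text[last:match.start()])
            last = match.end()
        if last < len(text):
            tokens.append(text[last:])
    else:
        # Group mode: each match contributes the selected group as a token.
        for match in regex.finditer(text):
            token = match.group(group)
            if token:  # skip empty or non-participating groups
                tokens.append(token)
    return tokens

text = "aaa 'bbb' 'ccc'"
quoted = r"'([^']+)'"

print(pattern_tokenize(text, quoted, group=0))  # ["'bbb'", "'ccc'"]
print(pattern_tokenize(text, quoted, group=1))  # ['bbb', 'ccc']
print(pattern_tokenize(text))                   # ['aaa', 'bbb', 'ccc']
```

The last call uses the default pattern in split mode, so the unquoted aaa is kept and the quote marks are treated as separators.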