WARNING: Version 5.5 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
The fingerprint token filter emits a single token, which is useful for fingerprinting
a body of text and/or providing a token that can be clustered on. It does this by
sorting the tokens, deduplicating them, and then concatenating them back into a single token.
For example, the tokens
["the", "quick", "quick", "brown", "fox", "was", "very", "brown"] will be
transformed into a single token:
"brown fox quick the very was". Notice how the tokens were sorted
alphabetically, and each duplicated token ("quick" and "brown") appears only once.
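The sort, deduplicate, and concatenate steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the filter's actual implementation: the function name fingerprint is hypothetical, and Python's default string ordering is assumed to be close enough to the filter's sort order for plain ASCII tokens.

```python
def fingerprint(tokens, separator=" "):
    """Sort the tokens, drop duplicates, and join them into one token."""
    # set() removes duplicates; sorted() orders the survivors alphabetically.
    unique_sorted = sorted(set(tokens))
    return separator.join(unique_sorted)


tokens = ["the", "quick", "quick", "brown", "fox", "was", "very", "brown"]
print(fingerprint(tokens))  # brown fox quick the very was
```

Because the result depends only on which tokens occur, and not on their order or frequency, two texts with the same vocabulary produce the same fingerprint, which is what makes the token suitable for clustering.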
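The following settings can be set for a fingerprint token filter:
separator: the string used to join the tokens. Defaults to a space.
max_output_size: the maximum size of the concatenated fingerprint, described below.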
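As a rough sketch of how these settings might appear in an index definition, the snippet below builds a settings body as a Python dictionary and prints it as JSON. The names my_fingerprint and my_analyzer, the "+" separator, and the value 100 are placeholders chosen for illustration; separator and max_output_size are assumed to be the setting names for the fingerprint filter type.

```python
import json

# Hypothetical index settings: a custom fingerprint filter wired into a
# custom analyzer. All names and values here are illustrative only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_fingerprint": {
                    "type": "fingerprint",
                    "separator": "+",         # join tokens with "+" instead of a space
                    "max_output_size": 100,   # emit nothing if the fingerprint exceeds this
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_fingerprint"],
                }
            },
        }
    }
}

print(json.dumps(index_body, indent=2))
```

The printed JSON is the kind of body that would be sent when creating an index; placing lowercase before the fingerprint filter means case differences do not produce distinct tokens.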
Because a field may have many unique tokens, it is important to set a cutoff so that fields do not grow
too large. The max_output_size setting controls this behavior. If the concatenated fingerprint
grows larger than max_output_size, the token filter will exit and will not emit a token (i.e. the
field will be empty).
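The cutoff behavior can be illustrated with a small Python sketch. The function name fingerprint_with_limit is hypothetical, and measuring the size in characters is an assumption made for this example; the real filter may measure the output differently.

```python
def fingerprint_with_limit(tokens, max_output_size, separator=" "):
    """Return the fingerprint, or None when it would exceed max_output_size."""
    result = separator.join(sorted(set(tokens)))
    # Assumption: size is counted in characters of the concatenated output.
    if len(result) > max_output_size:
        return None  # nothing is emitted, so the field ends up empty
    return result


tokens = ["the", "quick", "quick", "brown", "fox"]
print(fingerprint_with_limit(tokens, max_output_size=100))  # brown fox quick the
print(fingerprint_with_limit(tokens, max_output_size=10))   # None
```

Note that the oversized fingerprint is dropped entirely rather than truncated, so a limit that is too low results in missing values instead of partial ones.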