WARNING: Version 5.2 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Character filters are used to preprocess the stream of characters before it is passed to the tokenizer.
A character filter receives the original text as a stream of characters and
can transform the stream by adding, removing, or changing characters. For
instance, a character filter could be used to convert Hindu-Arabic numerals
(٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), or to strip HTML
elements like <b> from the stream.
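
As a rough illustration of where a character filter sits in the analysis chain, the following Python sketch mimics the numeral example above. It is not how Elasticsearch implements character filters. The function names (digit_char_filter, whitespace_tokenizer, analyze) are hypothetical, and a plain whitespace split stands in for a real tokenizer.

```python
# Arabic-Indic digits occupy the code points U+0660 to U+0669.
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

# Translation table mapping each Arabic-Indic digit to its ASCII digit.
DIGIT_TABLE = str.maketrans(ARABIC_INDIC_DIGITS, "0123456789")


def digit_char_filter(text):
    """Hypothetical character filter: rewrites characters before tokenizing."""
    return text.translate(DIGIT_TABLE)


def whitespace_tokenizer(text):
    """Stand-in tokenizer (an assumption): splits on runs of whitespace."""
    return text.split()


def analyze(text):
    """Run the character filter first, then hand the result to the tokenizer."""
    return whitespace_tokenizer(digit_char_filter(text))


if __name__ == "__main__":
    print(analyze("My license plate is \u0662\u0665\u0660\u0661\u0665"))
```

Because the filter runs first, the tokenizer only ever sees the converted digits.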
Elasticsearch has a number of built-in character filters that can be used to build custom analyzers.
- HTML Strip Character Filter
The html_strip character filter strips out HTML elements like <b> and decodes HTML entities like &amp;.
- Mapping Character Filter
The mapping character filter replaces any occurrences of the specified strings with the specified replacements.
- Pattern Replace Character Filter
The pattern_replace character filter replaces any characters matching a regular expression with the specified replacement.
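
To show how these filters might be wired into a custom analyzer, the sketch below builds an index settings body as a Python dictionary. The html_strip, mapping, and pattern_replace types, along with the mappings, pattern, and replacement parameters, are the built-in ones. The names my_analyzer, digit_mapping, and dash_to_underscore are made up for this example, as is the choice of the standard tokenizer. The sketch only builds and prints the JSON; sending it to a cluster (for example as the body of an index creation request) is left out.

```python
import json


def build_index_settings():
    """Build a settings body for a custom analyzer with three character filters."""
    # Mapping rules use the "key => value" form, one rule per Arabic-Indic digit.
    digit_rules = [f"{chr(0x0660 + i)} => {i}" for i in range(10)]
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    # Hypothetical name for a configured mapping filter.
                    "digit_mapping": {
                        "type": "mapping",
                        "mappings": digit_rules,
                    },
                    # Hypothetical name; replaces a dash between digits with "_".
                    "dash_to_underscore": {
                        "type": "pattern_replace",
                        "pattern": r"(\d+)-(?=\d)",
                        "replacement": "$1_",
                    },
                },
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Character filters are listed in the order they run;
                        # html_strip needs no configuration of its own here.
                        "char_filter": [
                            "html_strip",
                            "digit_mapping",
                            "dash_to_underscore",
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), indent=2, ensure_ascii=False))
```

The built-in html_strip filter is referenced directly by name, while the mapping and pattern_replace filters need their own configured entries because they have no useful behavior without rules.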