In the same way that the lowercase token filter is a good starting point for
many languages but falls short when exposed to the entire tower of Babel, the
asciifolding token filter needs a more effective Unicode character-folding
counterpart for dealing with the many languages of the world.
The icu_folding token filter (provided by the ICU plug-in) does the same job
as the asciifolding filter, but extends the transformation to scripts that are
not ASCII-based, such as Greek, Hebrew, and Han. It also converts numbers in
other scripts into their Latin equivalents and applies various other numeric,
symbolic, and punctuation transformations.
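
As a rough sketch of how this filter might be wired into an index, the settings below define a custom analyzer that pairs the icu_tokenizer with icu_folding. The analyzer name my_folder is purely illustrative, and the settings assume that the ICU plug-in is installed on every node. They would be sent as the body of a create-index request:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_folder": {
          "tokenizer": "icu_tokenizer",
          "filter": [ "icu_folding" ]
        }
      }
    }
  }
}
```

Running a few mixed-script strings through this analyzer with the analyze API is a quick way to see which characters are folded in your own data.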
If there are particular characters that you would like to protect from
folding, you can use a UnicodeSet (much like a character class in regular
expressions) to specify which Unicode characters may be folded. For instance,
to exclude the Swedish letters å, ä, ö, Å, Ä, and Ö from folding, you would
specify a character class representing all Unicode characters except for those
letters: [^åäöÅÄÖ] (^ means everything except).
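
A minimal sketch of such a configuration follows, using a set that excludes the Swedish letters å, ä, ö, Å, Ä, and Ö. The filter and analyzer names (swedish_folding, swedish_analyzer) are hypothetical. The parameter is spelled unicode_set_filter in recent releases of the plug-in, while older releases used unicodeSetFilter, so check the documentation for your version. As before, this is the body of a create-index request:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "swedish_folding": {
          "type": "icu_folding",
          "unicode_set_filter": "[^åäöÅÄÖ]"
        }
      },
      "analyzer": {
        "swedish_analyzer": {
          "tokenizer": "icu_tokenizer",
          "filter": [ "swedish_folding", "lowercase" ]
        }
      }
    }
  }
}
```

The lowercase filter comes after the folding step because the protected letters skip folding entirely, including case folding, so an uppercase Å, Ä, or Ö would otherwise pass through unchanged.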