Elasticsearch ships with a collection of language analyzers that provide good, basic, out-of-the-box support for many of the world’s most common languages.
These analyzers typically perform four roles:
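- Tokenize text into individual words:
  The quick brown foxes → [The, quick, brown, foxes]
- Lowercase tokens:
  The → the
- Remove common stopwords:
  [The, quick, brown, foxes] → [quick, brown, foxes]
- Stem tokens to their root form:
  foxes → fox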
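To make the four roles concrete, here is a toy pipeline that runs them in the same order. It is only a sketch of the idea, not how Elasticsearch implements analysis: the tiny STOPWORDS set and the naive_stem function are hypothetical stand-ins for the much larger stopword lists and real stemmers that the language analyzers use.

```python
import re

# Hypothetical, tiny stopword list; real analyzers ship much longer lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}


def tokenize(text: str) -> list[str]:
    """Split text into words on anything that is not a word character or apostrophe."""
    return [token for token in re.split(r"[^\w']+", text) if token]


def lowercase(tokens: list[str]) -> list[str]:
    return [token.lower() for token in tokens]


def remove_stopwords(tokens: list[str]) -> list[str]:
    return [token for token in tokens if token not in STOPWORDS]


def naive_stem(token: str) -> str:
    """Hypothetical plural stripper standing in for a real stemmer."""
    # "foxes" -> "fox", "classes" -> "class", "churches" -> "church"
    for suffix in ("sses", "shes", "ches", "xes"):
        if token.endswith(suffix):
            return token[:-2]
    # "dogs" -> "dog", but leave words like "glass" alone
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text: str) -> list[str]:
    tokens = tokenize(text)                 # 1. tokenize
    tokens = lowercase(tokens)              # 2. lowercase
    tokens = remove_stopwords(tokens)       # 3. remove stopwords
    return [naive_stem(t) for t in tokens]  # 4. stem


print(analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```

Note that lowercasing runs before stopword removal, so the stopword list only needs lowercase entries: "The" becomes "the" and is then dropped.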
Each analyzer may also apply other transformations specific to its language in order to make words from that language more searchable:
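- The english analyzer removes the possessive 's:
  John's → john
- The french analyzer removes elisions like l' and qu' and diacritics like ¨ or ^:
  l'église → eglis
- The german analyzer normalizes terms, replacing ä and ae with a, or ß with ss, among others:
  äußerst → ausserst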
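The language-specific steps above can be approximated with a few string transformations. This is a simplified sketch under stated assumptions, not the analyzers' actual code: the FRENCH_ELISIONS tuple is a hypothetical subset of elided forms, and german_normalize only roughly follows the german analyzer's normalization rules. The real french analyzer also stems its tokens, which is why Elasticsearch produces eglis while this sketch stops at eglise.

```python
import re
import unicodedata

# Hypothetical subset of French elided articles and pronouns.
FRENCH_ELISIONS = ("l'", "d'", "j'", "m'", "n'", "s'", "t'", "c'", "qu'")


def remove_english_possessive(token: str) -> str:
    """Drop a trailing 's, as in John's -> John."""
    return token[:-2] if token.endswith("'s") else token


def remove_french_elision(token: str) -> str:
    """Drop a leading elided article or pronoun such as l' or qu'."""
    token = token.replace("\u2019", "'")  # treat the typographic apostrophe like '
    for prefix in FRENCH_ELISIONS:
        if token.lower().startswith(prefix):
            return token[len(prefix):]
    return token


def strip_diacritics(token: str) -> str:
    """Decompose accented letters (é -> e + accent) and drop the accent marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def german_normalize(token: str) -> str:
    """Lowercase, then fold ß, umlauts and the digraphs ae/oe/ue (simplified)."""
    token = token.lower().replace("ß", "ss")
    token = token.replace("ä", "a").replace("ö", "o").replace("ü", "u")
    token = token.replace("ae", "a").replace("oe", "o")
    # Keep "ue" after "q", as in "quelle"; fold it elsewhere.
    return re.sub(r"(?<!q)ue", "u", token)


print(remove_english_possessive("John's").lower())          # john
print(strip_diacritics(remove_french_elision("l'église")))  # eglise
print(german_normalize("äußerst"))                          # ausserst
```

Each function handles one token at a time, so in a full pipeline they would run after tokenization, alongside the lowercasing, stopword, and stemming steps.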