Getting Started with Languages | Elasticsearch: The Definitive Guide [master]

WARNING: This documentation covers Elasticsearch 2.x. The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.

This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.


Getting Started with Languages

Elasticsearch ships with a collection of language analyzers that provide good, basic, out-of-the-box support for many of the world’s most common languages.

These analyzers typically perform four roles:

Tokenize text into individual words:

The quick brown foxes → [The, quick, brown, foxes]

Lowercase tokens:

The → the

Remove common stopwords:

[The, quick, brown, foxes] → [quick, brown, foxes]

Stem tokens to their root form:

foxes → fox
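
The four roles above can be sketched as a small pipeline. This is only a toy illustration in Python, not how Elasticsearch implements its analyzers: the names `toy_analyze`, `toy_stem`, and `STOPWORDS` are hypothetical, the stopword list is deliberately tiny, and the suffix stripping stands in for a real stemming algorithm.

```python
import re

# Tiny illustrative stopword list (an assumption for this sketch);
# real language analyzers use larger, language-specific lists.
STOPWORDS = {"the", "a", "an", "and", "of"}


def toy_stem(token):
    # Naive suffix stripping for illustration only; not a real stemmer.
    if token.endswith("es") and len(token) > 4:
        return token[:-2]
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def toy_analyze(text):
    tokens = re.findall(r"\w+", text)                    # 1. tokenize
    tokens = [t.lower() for t in tokens]                 # 2. lowercase
    tokens = [t for t in tokens if t not in STOPWORDS]   # 3. remove stopwords
    return [toy_stem(t) for t in tokens]                 # 4. stem


print(toy_analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```

Each step consumes the output of the previous one, which is why lowercasing comes before stopword removal: the stopword list only needs to contain lowercase entries.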

Each analyzer may also apply other transformations specific to its language in order to make words from that language more searchable:

The english analyzer removes the possessive 's:

John's → john

The french analyzer removes elisions like l' and qu' and diacritics like ¨ or ^:

l'église → eglis

The german analyzer normalizes terms, replacing ä and ae with a, or ß with ss, among others:

äußerst → ausserst
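
The language-specific transformations above can be approximated with a few string operations. The Python sketch below is an assumption-laden illustration, not the analyzers' actual code: all four function names are hypothetical, only the l' and qu' elisions and the three German replacements named in the text are handled, and no stemming is applied (so the French example yields eglise rather than the stemmed eglis).

```python
import re
import unicodedata


def strip_english_possessive(token):
    # Drop a trailing 's, written with a straight or curly apostrophe.
    return re.sub(r"['’]s$", "", token)


def strip_french_elision(token):
    # Drop a leading elided form; only l' and qu' are covered in this sketch.
    return re.sub(r"^(l|qu)['’]", "", token, flags=re.IGNORECASE)


def strip_diacritics(token):
    # Decompose accented characters, then discard the combining marks.
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_german(token):
    # Only the replacements named in the text: ß -> ss, ä -> a, ae -> a.
    for old, new in (("ß", "ss"), ("ä", "a"), ("ae", "a")):
        token = token.replace(old, new)
    return token


print(strip_english_possessive("John's").lower())       # john
print(strip_diacritics(strip_french_elision("l'église")))  # eglise
print(normalize_german("äußerst"))                      # ausserst
```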
