WARNING: Version 5.5 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Keyword Marker Token Filter
Protects words from being modified by stemmers. Must be placed before any stemming filters.
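Setting | Description |
---|---|
`keywords` | A list of words to use. |
`keywords_path` | A path (either relative to `config` location, or absolute) to a list of words. |
`keywords_pattern` | A regular expression pattern to match against words in the text. |
`ignore_case` | Set to `true` to lower case all words first. Defaults to `false`. |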
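To see why the filter has to come before any stemming filter, here is a minimal Python sketch of the idea. It is a toy model, not Elasticsearch's or Lucene's implementation: the `Token` class, `mark_keywords`, and `toy_stem` are hypothetical names, and the "stemmer" only strips a trailing "s". The sketch assumes that the marker flags matching tokens and that the stemmer skips flagged ones.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Token:
    text: str
    keyword: bool = False  # True means "stemmers should leave this token alone"


def mark_keywords(tokens: Iterable[Token], keywords: Set[str]) -> List[Token]:
    """Flag every token whose text is in `keywords`; the text itself is unchanged."""
    return [replace(t, keyword=True) if t.text in keywords else t for t in tokens]


def toy_stem(tokens: Iterable[Token]) -> List[Token]:
    """Hypothetical stemmer: strips a trailing "s" unless the token is flagged."""
    out = []
    for t in tokens:
        if not t.keyword and len(t.text) > 1 and t.text.endswith("s"):
            t = replace(t, text=t.text[:-1])
        out.append(t)
    return out


def texts(tokens: Iterable[Token]) -> List[str]:
    return [t.text for t in tokens]


tokens = [Token(w) for w in "i like cats".split()]

# Marker first, stemmer second: "cats" is flagged and survives.
protected = texts(toy_stem(mark_keywords(tokens, {"cats"})))

# Stemmer first, marker second: "cats" is already "cat" when the marker runs.
wrong_order = texts(mark_keywords(toy_stem(tokens), {"cats"}))

print(protected)    # ['i', 'like', 'cats']
print(wrong_order)  # ['i', 'like', 'cat']
```

In the second ordering the marker never sees the word it was asked to protect, which is why the filter only helps when it runs ahead of the stemmer.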
You can configure it like this:
PUT /keyword_marker_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "protect_cats": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "protect_cats", "porter_stem"]
        },
        "normal": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "porter_stem"]
        }
      },
      "filter": {
        "protect_cats": {
          "type": "keyword_marker",
          "keywords": ["cats"]
        }
      }
    }
  }
}
And test it with:
POST /keyword_marker_example/_analyze
{
  "analyzer" : "protect_cats",
  "text" : "I like cats"
}
It responds with:
{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "like",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "cats",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
Compare this with the `normal` analyzer, which stems `cats` to `cat`:
POST /keyword_marker_example/_analyze
{
  "analyzer" : "normal",
  "text" : "I like cats"
}
Response:
{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "like",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "cat",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
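When checking more than a few words, it can help to diff the two `_analyze` responses programmatically. The Python sketch below is one possible way to do that. The helper name `changed_tokens` is hypothetical, and the response dictionaries are the two responses shown above, trimmed to the `token` and `position` fields that the comparison uses.

```python
from typing import Dict, List, Tuple


def changed_tokens(a: Dict, b: Dict) -> List[Tuple[int, str, str]]:
    """Return (position, token_in_a, token_in_b) wherever the responses disagree."""
    b_by_position = {t["position"]: t["token"] for t in b["tokens"]}
    return [
        (t["position"], t["token"], b_by_position[t["position"]])
        for t in a["tokens"]
        if t["position"] in b_by_position
        and b_by_position[t["position"]] != t["token"]
    ]


# Responses from the two analyzers, reduced to the fields compared here.
protect_cats_response = {"tokens": [
    {"token": "i", "position": 0},
    {"token": "like", "position": 1},
    {"token": "cats", "position": 2},
]}
normal_response = {"tokens": [
    {"token": "i", "position": 0},
    {"token": "like", "position": 1},
    {"token": "cat", "position": 2},
]}

diff = changed_tokens(protect_cats_response, normal_response)
print(diff)  # [(2, 'cats', 'cat')]
```

Only position 2 differs, which matches the single keyword configured in the `protect_cats` filter.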