IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
Keyword Marker Token Filter
Protects words from being modified by stemmers. This filter must be placed before any stemming filters.
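| Setting | Description |
|---|---|
| `keywords` | A list of words to use. |
| `keywords_path` | A path (either relative to `config` location, or absolute) to a list of words. |
| `keywords_pattern` | A regular expression pattern to match against words in the text. |
| `ignore_case` | Set to `true` to lower case all words first. Defaults to `false`. |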
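As a rough illustration of how a word list or a regular expression could decide whether a token is protected, here is a minimal Python sketch. It is not Elasticsearch code: `is_protected` and its parameters are hypothetical names chosen for this example. It assumes that a word list matches whole tokens, that a pattern has to match the entire token, and that case-insensitive matching applies only to the word list.

```python
import re


def is_protected(token, keywords=None, pattern=None, ignore_case=False):
    """Return True if this sketch would mark the token as a keyword."""
    if keywords is not None:
        if ignore_case:
            # Compare lowercased forms so "Cats" matches the keyword "cats".
            return token.lower() in {k.lower() for k in keywords}
        return token in set(keywords)
    if pattern is not None:
        # Assumption: the pattern must match the whole token, not a substring.
        return re.fullmatch(pattern, token) is not None
    # Nothing configured, so nothing is protected.
    return False
```

Under these assumptions, `is_protected("Cats", keywords=["cats"])` is false unless `ignore_case=True` is passed, which is one reason the example analyzers run a `lowercase` filter before the marker.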
You can configure it as follows:
PUT /keyword_marker_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "protect_cats": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "protect_cats", "porter_stem"]
        },
        "normal": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "porter_stem"]
        }
      },
      "filter": {
        "protect_cats": {
          "type": "keyword_marker",
          "keywords": ["cats"]
        }
      }
    }
  }
}
And test it with:
POST /keyword_marker_example/_analyze
{
  "analyzer": "protect_cats",
  "text": "I like cats"
}
The response is:
{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "like",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "cats",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
Compare this with the `normal` analyzer, which stems `cats` to `cat`:
POST /keyword_marker_example/_analyze
{
  "analyzer": "normal",
  "text": "I like cats"
}
Response:
{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "like",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "cat",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
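The two responses differ because the marker filter sets a keyword flag on matching tokens, and a stemmer leaves flagged tokens alone. The Python sketch below imitates that hand-off. It is a toy model, not the Elasticsearch implementation: `Token`, `mark_keywords`, `toy_stem`, and `analyze` are hypothetical names, whitespace splitting stands in for the standard tokenizer, and `toy_stem` only strips a trailing "s" rather than applying the Porter algorithm.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    keyword: bool = False  # set by the marker, read by the stemmer


def lowercase(tokens):
    return [Token(t.text.lower(), t.keyword) for t in tokens]


def mark_keywords(tokens, keywords):
    # Flag tokens whose text is in the protected word list.
    return [Token(t.text, t.keyword or t.text in keywords) for t in tokens]


def toy_stem(tokens):
    # Stand-in stemmer: strips a trailing "s" unless the token is flagged.
    out = []
    for t in tokens:
        if not t.keyword and t.text.endswith("s"):
            out.append(Token(t.text[:-1], t.keyword))
        else:
            out.append(t)
    return out


def analyze(text, filters):
    # Whitespace splitting stands in for a real tokenizer.
    tokens = [Token(word) for word in text.split()]
    for token_filter in filters:
        tokens = token_filter(tokens)
    return [t.text for t in tokens]


def protect(tokens):
    return mark_keywords(tokens, {"cats"})


protected = analyze("I like cats", [lowercase, protect, toy_stem])
normal = analyze("I like cats", [lowercase, toy_stem])
too_late = analyze("I like cats", [lowercase, toy_stem, protect])
```

In this sketch, `protected` mirrors the `protect_cats` analyzer and `normal` mirrors the `normal` analyzer. The `too_late` chain runs the marker after the stemmer, so the token has already been changed by the time the marker sees it, which is why the marker has to come first in the filter list.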