Keyword Marker Token Filter | Elasticsearch Guide [6.6]

IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

› › ›

« Stemmer Override Token Filter Keyword Repeat Token Filter »

Keyword Marker Token Filteredit

Protects words from being modified by stemmers. Must be placed before any stemming filters.

Setting	Description
`keywords`	A list of words to use.
`keywords_path`	A path (either relative to `config` location, or absolute) to a list of words.
`keywords_pattern`	A regular expression pattern to match against words in the text.
`ignore_case`	Set to `true` to lower case all words first. Defaults to `false`.

You can configure it like:

PUT /keyword_marker_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "protect_cats": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "protect_cats", "porter_stem"]
        },
        "normal": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "porter_stem"]
        }
      },
      "filter": {
        "protect_cats": {
          "type": "keyword_marker",
          "keywords": ["cats"]
        }
      }
    }
  }
}

And test it with:

POST /keyword_marker_example/_analyze
{
  "analyzer" : "protect_cats",
  "text" : "I like cats"
}

And it’d respond:

{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "like",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "cats",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}

As compared to the normal analyzer which has cats stemmed to cat:

POST /keyword_marker_example/_analyze
{
  "analyzer" : "normal",
  "text" : "I like cats"
}

Response:

{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "like",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "cat",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}

« Stemmer Override Token Filter Keyword Repeat Token Filter »