Keep types token filter

Keeps or removes tokens of a specific type. For example, you can use this filter to change 3 quick foxes to quick foxes by keeping only <ALPHANUM> (alphanumeric) tokens.

Token types

Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers.

For example, the standard tokenizer can produce a variety of token types, including <ALPHANUM>, <HANGUL>, and <NUM>. Simpler tokenizers, like the lowercase tokenizer, only produce the word token type.

Certain token filters can also add token types. For example, the synonym filter can add the <SYNONYM> token type.

Some tokenizers, such as the keyword, simple_pattern, and simple_pattern_split tokenizers, don’t support this token filter because they don’t support setting the token type attribute.

This filter uses Lucene’s TypeTokenFilter.
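
To make the idea of token types concrete, here is a minimal Python sketch of a tokenizer that tags each token with a type. It is a toy model, not the standard tokenizer: the `Token` class and `toy_tokenize` function are hypothetical names, and the rule "digits-only means <NUM>, anything else means <ALPHANUM>" is a simplifying assumption.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """A token together with the type its tokenizer assigned."""
    text: str
    token_type: str


def toy_tokenize(text: str) -> List[Token]:
    """Split on whitespace and tag each token with a simplified type.

    Assumption: digits-only tokens are tagged <NUM> and everything else
    is tagged <ALPHANUM>. The real standard tokenizer uses different
    splitting rules and can emit more types (for example <HANGUL>).
    """
    tokens = []
    for word in text.split():
        token_type = "<NUM>" if word.isdigit() else "<ALPHANUM>"
        tokens.append(Token(word, token_type))
    return tokens


typed = toy_tokenize("3 quick foxes")
for token in typed:
    print(token.text, token.token_type)
```

A type-based filter such as keep_types only ever looks at the second field, which is why tokenizers that don’t set a token type can’t be combined with it.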

The following analyze API request uses the keep_types filter to keep only <NUM> (numeric) tokens from 1 quick fox 2 lazy dogs.

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}

The filter produces the following tokens:

[ 1, 2 ]

The following analyze API request uses the keep_types filter to remove <NUM> tokens from 1 quick fox 2 lazy dogs. Note the mode parameter is set to exclude.

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}

The filter produces the following tokens:

[ quick, fox, lazy, dogs ]

Configurable parameters

types
    (Required, array of strings) List of token types to keep or remove.

mode
    (Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:

    include
        (Default) Keep only the specified token types.
    exclude
        Remove the specified token types.
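
The effect of the types and mode parameters can be sketched in a few lines of Python. This is an illustrative model of the behavior described above, not the Lucene implementation: the `keep_types` function is a hypothetical helper, and the token types in the sample list are written out by hand to match what the examples above imply for the standard tokenizer.

```python
from typing import Iterable, List, Tuple

TypedToken = Tuple[str, str]  # (token text, token type)


def keep_types(
    tokens: Iterable[TypedToken],
    types: Iterable[str],
    mode: str = "include",
) -> List[TypedToken]:
    """Keep or remove tokens by type, mirroring the two documented modes."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    wanted = set(types)
    if mode == "include":
        # Keep only tokens whose type is listed.
        return [token for token in tokens if token[1] in wanted]
    # Exclude mode: drop tokens whose type is listed.
    return [token for token in tokens if token[1] not in wanted]


# Hand-written typed tokens for "1 quick fox 2 lazy dogs" (assumed types).
tokens = [
    ("1", "<NUM>"), ("quick", "<ALPHANUM>"), ("fox", "<ALPHANUM>"),
    ("2", "<NUM>"), ("lazy", "<ALPHANUM>"), ("dogs", "<ALPHANUM>"),
]

kept = [text for text, _ in keep_types(tokens, ["<NUM>"])]
removed = [text for text, _ in keep_types(tokens, ["<NUM>"], mode="exclude")]
print(kept)     # ['1', '2']
print(removed)  # ['quick', 'fox', 'lazy', 'dogs']
```

The two printed lists correspond to the include and exclude examples shown earlier.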

To customize the keep_types filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following create index API request uses a custom keep_types filter to configure a new custom analyzer. The custom keep_types filter keeps only <ALPHANUM> (alphanumeric) tokens.

PUT keep_types_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "extract_alpha" ]
        }
      },
      "filter": {
        "extract_alpha": {
          "type": "keep_types",
          "types": [ "<ALPHANUM>" ]
        }
      }
    }
  }
}
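
If you generate index settings from code, a request body like the one above could be assembled as follows. This is a sketch under stated assumptions: `keep_types_index_body` is a hypothetical helper, not part of any client library, and it only builds and prints the JSON body without sending it to a cluster.

```python
import json
from typing import Any, Dict, List


def keep_types_index_body(
    filter_name: str,
    types: List[str],
    mode: str = "include",
    analyzer_name: str = "my_analyzer",
) -> Dict[str, Any]:
    """Build a create index body with a custom keep_types filter.

    The layout mirrors the request shown above; "mode" is written out
    explicitly even when it has its default value.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "tokenizer": "standard",
                        "filter": [filter_name],
                    }
                },
                "filter": {
                    filter_name: {
                        "type": "keep_types",
                        "types": list(types),
                        "mode": mode,
                    }
                },
            }
        }
    }


body = keep_types_index_body("extract_alpha", ["<ALPHANUM>"])
print(json.dumps(body, indent=2))
```

Passing mode="exclude" to the same helper would produce a filter that removes the listed types instead of keeping them.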