Keep types token filter
Keeps or removes tokens of a specific type. For example, you can use this filter to change 3 quick foxes to quick foxes by keeping only <ALPHANUM> (alphanumeric) tokens.
Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers. For example, the standard tokenizer can produce a variety of token types, including <ALPHANUM>, <HANGUL>, and <NUM>. Simpler tokenizers, like the lowercase tokenizer, only produce the word token type.
Certain token filters can also add token types. For example, the synonym filter can add the <SYNONYM> token type.
Some tokenizers, such as the keyword, simple_pattern, and simple_pattern_split tokenizers, don’t support this token filter because they don’t support setting the token type attribute.
This filter uses Lucene’s TypeTokenFilter.
The following analyze API request uses the keep_types filter to keep only <NUM> (numeric) tokens from 1 quick fox 2 lazy dogs.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ 1, 2 ]
The following analyze API request uses the keep_types filter to remove <NUM> tokens from 1 quick fox 2 lazy dogs. Note the mode parameter is set to exclude.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ quick, fox, lazy, dogs ]
types
- (Required, array of strings) List of token types to keep or remove.
mode
- (Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:
include
- (Default) Keep only the specified token types.
exclude
- Remove the specified token types.
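The include and exclude modes described above can be modeled in a few lines of Python. This is only a toy sketch of the selection logic, not Lucene's TypeTokenFilter: the keep_types function and the TOKENS list are hypothetical, and the token types are assigned by hand to match the standard tokenizer examples rather than produced by a real tokenizer. Rejecting an unknown mode with an error is a choice made for this sketch.

```python
from typing import Iterable, List, Tuple

# A token is modeled as (term, token type), e.g. ("quick", "<ALPHANUM>").
Token = Tuple[str, str]


def keep_types(tokens: Iterable[Token], types: Iterable[str],
               mode: str = "include") -> List[str]:
    """Toy model of the filter: return the terms that survive."""
    if mode not in ("include", "exclude"):
        raise ValueError(f"mode must be 'include' or 'exclude', got {mode!r}")
    listed = set(types)
    keep_listed = mode == "include"
    # include: keep tokens whose type is listed; exclude: keep all the others.
    return [term for term, token_type in tokens
            if (token_type in listed) == keep_listed]


# Types assigned by hand (assumption) to mirror the analyze API examples.
TOKENS = [
    ("1", "<NUM>"), ("quick", "<ALPHANUM>"), ("fox", "<ALPHANUM>"),
    ("2", "<NUM>"), ("lazy", "<ALPHANUM>"), ("dogs", "<ALPHANUM>"),
]

print(keep_types(TOKENS, ["<NUM>"]))                  # ['1', '2']
print(keep_types(TOKENS, ["<NUM>"], mode="exclude"))  # ['quick', 'fox', 'lazy', 'dogs']
```

The two printed lists correspond to the token output of the two analyze API requests shown earlier.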
To customize the keep_types filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following create index API request uses a custom keep_types filter to configure a new custom analyzer. The custom keep_types filter keeps only <ALPHANUM> (alphanumeric) tokens.
PUT keep_types_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "extract_alpha" ]
        }
      },
      "filter": {
        "extract_alpha": {
          "type": "keep_types",
          "types": [ "<ALPHANUM>" ]
        }
      }
    }
  }
}
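If you generate index settings programmatically, a request body like the one above can be assembled from plain dictionaries. The following is a minimal sketch using only the Python standard library. The keep_types_settings helper is hypothetical and not part of any Elasticsearch client, and it only builds the JSON body without sending it to a cluster.

```python
import json
from typing import Dict, List


def keep_types_settings(analyzer: str, filter_name: str, types: List[str],
                        mode: str = "include") -> Dict:
    """Build a create index body with one custom keep_types filter."""
    token_filter = {"type": "keep_types", "types": list(types)}
    if mode != "include":
        # include is the default, so it is only written out when overridden.
        token_filter["mode"] = mode
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer: {
                        "tokenizer": "standard",
                        # Must match the key under "filter" below.
                        "filter": [filter_name],
                    }
                },
                "filter": {filter_name: token_filter},
            }
        }
    }


body = keep_types_settings("my_analyzer", "extract_alpha", ["<ALPHANUM>"])
print(json.dumps(body, indent=2))
```

The printed JSON has the same structure as the request body above. Passing mode="exclude" adds the mode parameter to the custom filter definition.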