Keep types token filter
Keeps or removes tokens of a specific type. For example, you can use this filter
to change 3 quick foxes to quick foxes by keeping only <ALPHANUM>
(alphanumeric) tokens.
Token types
Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers.
For example, the standard tokenizer can
produce a variety of token types, including <ALPHANUM>, <HANGUL>, and
<NUM>. Simpler tokenizers, like the
lowercase tokenizer, produce only the word
token type.
Certain token filters can also add token types. For example, the
synonym filter can add the <SYNONYM> token
type.
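To check which types a tokenizer assigns, one option is to call the analyze API without any token filter. This is a minimal sketch: the sample text is only illustrative, and it relies on each token in the analyze API response carrying a type field.

```console
GET _analyze
{
  "tokenizer": "standard",
  "text": "1 quick fox 2 lazy dogs"
}
```

With the standard tokenizer, 1 and 2 should be reported as <NUM> and the remaining words as <ALPHANUM>.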
This filter uses Lucene’s TypeTokenFilter.
Include example
The following analyze API request uses the keep_types
filter to keep only <NUM> (numeric) tokens from 1 quick fox 2 lazy dogs.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ 1, 2 ]
Exclude example
The following analyze API request uses the keep_types
filter to remove <NUM> tokens from 1 quick fox 2 lazy dogs. Note the mode
parameter is set to exclude.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ quick, fox, lazy, dogs ]
Configurable parameters
- types - (Required, array of strings) List of token types to keep or remove.
- mode - (Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:
  - include - (Default) Keep only the specified token types.
  - exclude - Remove the specified token types.
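Because types is an array, a single filter can list several token types. The following sketch reuses the earlier sample text and excludes both <NUM> and <HANGUL> tokens. The choice of <HANGUL> as the second type is only illustrative.

```console
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>", "<HANGUL>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
```

The sample text contains no Hangul, so the request should return the same tokens as excluding <NUM> alone: quick, fox, lazy, and dogs.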
Customize and add to an analyzer
To customize the keep_types filter, duplicate it to create the basis
for a new custom token filter. You can modify the filter using its configurable
parameters.
For example, the following create index API request
uses a custom keep_types filter to configure a new
custom analyzer. The custom keep_types filter
keeps only <ALPHANUM> (alphanumeric) tokens.
PUT keep_types_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "extract_alpha" ]
        }
      },
      "filter": {
        "extract_alpha": {
          "type": "keep_types",
          "types": [ "<ALPHANUM>" ]
        }
      }
    }
  }
}
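Once the index exists, one way to try the custom analyzer is to send an analyze API request to that index and name the analyzer. This is a sketch that assumes the create index request above succeeded. The sample text is the one used in the introduction.

```console
GET keep_types_example/_analyze
{
  "analyzer": "my_analyzer",
  "text": "3 quick foxes"
}
```

Given the behavior described at the top of this page, the response should contain quick and foxes but not 3, because the standard tokenizer types 3 as <NUM> and not <ALPHANUM>.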