Keep types token filter
Keeps or removes tokens of a specific type. For example, you can use this filter to change 3 quick foxes to quick foxes by keeping only <ALPHANUM> (alphanumeric) tokens.
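The keep-or-remove decision can be modeled in a few lines of Python. This is a minimal sketch, not Lucene's TypeTokenFilter: it assumes each token is a (term, type) pair whose type the tokenizer has already assigned, and the names keep_types and Token are hypothetical, not part of any Elasticsearch client.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical model: a token is a (term, type) pair. Real Lucene tokens carry more attributes.
Token = Tuple[str, str]


def keep_types(tokens: Iterable[Token], types: Sequence[str], mode: str = "include") -> List[str]:
    """Return the terms whose type is listed (include) or not listed (exclude) in `types`."""
    if mode not in ("include", "exclude"):
        raise ValueError(f"mode must be 'include' or 'exclude', got {mode!r}")
    wanted = set(types)
    keep_listed = mode == "include"
    # A token survives when its membership in `wanted` matches the mode.
    return [term for term, ttype in tokens if (ttype in wanted) == keep_listed]


# Types as the standard tokenizer labels "3 quick foxes".
tokens = [("3", "<NUM>"), ("quick", "<ALPHANUM>"), ("foxes", "<ALPHANUM>")]
print(keep_types(tokens, ["<ALPHANUM>"]))  # ['quick', 'foxes']
```

The filter only reads the type attribute; it never inspects the term text itself.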
Token types
Token types are set by the tokenizer when converting characters to tokens. Token types can vary between tokenizers.
For example, the standard tokenizer can produce a variety of token types, including <ALPHANUM>, <HANGUL>, and <NUM>. Simpler tokenizers, like the lowercase tokenizer, only produce the word token type.
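To make the dependence on the tokenizer concrete, here is a rough Python approximation of two tokenizers. Both toy_standard and toy_lowercase are hypothetical stand-ins: the real standard tokenizer follows Unicode text segmentation rules and emits more types than the two modeled here, and the sketch only mimics the types each one assigns.

```python
import re
from typing import List, Tuple


def toy_standard(text: str) -> List[Tuple[str, str]]:
    """Rough stand-in for the standard tokenizer: whitespace split, digit-only terms typed <NUM>."""
    tokens = []
    for term in text.split():
        ttype = "<NUM>" if term.isdigit() else "<ALPHANUM>"
        tokens.append((term, ttype))
    return tokens


def toy_lowercase(text: str) -> List[Tuple[str, str]]:
    """Rough stand-in for the lowercase tokenizer: runs of letters, lowercased, all typed 'word'."""
    # [^\W\d_]+ matches runs of letters only, so digits act as separators and are dropped.
    return [(m.lower(), "word") for m in re.findall(r"[^\W\d_]+", text)]


text = "1 quick fox 2 lazy dogs"
print(toy_standard(text))
print(toy_lowercase(text))
```

Under this model, a keep_types filter that keeps only <NUM> would keep nothing after the lowercase tokenizer, because every token it emits has the word type.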
Certain token filters can also add token types. For example, the synonym filter can add the <SYNONYM> token type.
Some tokenizers, such as the keyword, simple_pattern, and simple_pattern_split tokenizers, don’t support this token filter because they don’t support setting the token type attribute.
This filter uses Lucene’s TypeTokenFilter.
Include example
The following analyze API request uses the keep_types filter to keep only <NUM> (numeric) tokens from 1 quick fox 2 lazy dogs.
resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        {
            "type": "keep_types",
            "types": ["<NUM>"]
        }
    ],
    text="1 quick fox 2 lazy dogs",
)
print(resp)

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      {
        type: 'keep_types',
        types: ['<NUM>']
      }
    ],
    text: '1 quick fox 2 lazy dogs'
  }
)
puts response

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: [
    {
      type: "keep_types",
      types: ["<NUM>"],
    },
  ],
  text: "1 quick fox 2 lazy dogs",
});
console.log(response);

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ 1, 2 ]
Exclude example
The following analyze API request uses the keep_types filter to remove <NUM> tokens from 1 quick fox 2 lazy dogs. Note that the mode parameter is set to exclude.
resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        {
            "type": "keep_types",
            "types": ["<NUM>"],
            "mode": "exclude"
        }
    ],
    text="1 quick fox 2 lazy dogs",
)
print(resp)

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      {
        type: 'keep_types',
        types: ['<NUM>'],
        mode: 'exclude'
      }
    ],
    text: '1 quick fox 2 lazy dogs'
  }
)
puts response

const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: [
    {
      type: "keep_types",
      types: ["<NUM>"],
      mode: "exclude",
    },
  ],
  text: "1 quick fox 2 lazy dogs",
});
console.log(response);

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ],
      "mode": "exclude"
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
The filter produces the following tokens:
[ quick, fox, lazy, dogs ]
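The include and exclude examples are two sides of the same rule. The Python sketch below reproduces both outputs under a simplified tokenization (digit-only terms typed <NUM>, everything else <ALPHANUM>; the real standard tokenizer follows Unicode rules). The keep_types helper is hypothetical and only models the filter's behavior.

```python
from typing import List, Sequence, Tuple

# Hypothetical model: each token is a (term, type) pair.
Token = Tuple[str, str]


def keep_types(tokens: List[Token], types: Sequence[str], mode: str = "include") -> List[str]:
    """Keep terms whose type is listed (include) or not listed (exclude)."""
    wanted = set(types)
    keep_listed = mode == "include"
    return [term for term, ttype in tokens if (ttype in wanted) == keep_listed]


# Simplified typing of the example text; not the real standard tokenizer.
tokens = [(t, "<NUM>" if t.isdigit() else "<ALPHANUM>") for t in "1 quick fox 2 lazy dogs".split()]

kept = keep_types(tokens, ["<NUM>"], mode="include")
removed = keep_types(tokens, ["<NUM>"], mode="exclude")
print(kept)     # ['1', '2']
print(removed)  # ['quick', 'fox', 'lazy', 'dogs']
```

For the same types list, the two modes partition the token stream: every token appears in exactly one of the two outputs, and relative order is preserved.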
Configurable parameters
types
    (Required, array of strings) List of token types to keep or remove.
mode
    (Optional, string) Indicates whether to keep or remove the specified token types. Valid values are:
    include
        (Default) Keep only the specified token types.
    exclude
        Remove the specified token types.
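Before sending a filter definition, a client can check it against the parameters listed above. This is a client-side sketch; keep_types_filter and VALID_MODES are hypothetical names, and treating an empty types list as missing is an assumption of this sketch rather than documented server behavior.

```python
from typing import Any, Dict, Optional, Sequence

VALID_MODES = ("include", "exclude")


def keep_types_filter(types: Sequence[str], mode: Optional[str] = None) -> Dict[str, Any]:
    """Build a keep_types filter definition, checking the documented parameters client-side."""
    # types is required. Assumption: an empty list is treated as missing here.
    if not types:
        raise ValueError("keep_types requires a non-empty 'types' list")
    definition: Dict[str, Any] = {"type": "keep_types", "types": list(types)}
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
        definition["mode"] = mode
    # When mode is omitted, the server applies the documented default, include.
    return definition


print(keep_types_filter(["<NUM>"]))
print(keep_types_filter(["<NUM>"], mode="exclude"))
```

Leaving mode out of the definition, rather than writing "include" explicitly, keeps the generated JSON identical to the include example.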
Customize and add to an analyzer
To customize the keep_types filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.
For example, the following create index API request uses a custom keep_types filter to configure a new custom analyzer. The custom keep_types filter keeps only <ALPHANUM> (alphanumeric) tokens.
resp = client.indices.create(
    index="keep_types_example",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["extract_alpha"]
                }
            },
            "filter": {
                "extract_alpha": {
                    "type": "keep_types",
                    "types": ["<ALPHANUM>"]
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'keep_types_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'standard',
            filter: ['extract_alpha']
          }
        },
        filter: {
          extract_alpha: {
            type: 'keep_types',
            types: ['<ALPHANUM>']
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "keep_types_example",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          filter: ["extract_alpha"],
        },
      },
      filter: {
        extract_alpha: {
          type: "keep_types",
          types: ["<ALPHANUM>"],
        },
      },
    },
  },
});
console.log(response);

PUT keep_types_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "extract_alpha" ]
        }
      },
      "filter": {
        "extract_alpha": {
          "type": "keep_types",
          "types": [ "<ALPHANUM>" ]
        }
      }
    }
  }
}
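When several indices need the same kind of analyzer, the settings body can be generated rather than written by hand. The helper below is a hypothetical sketch (keep_types_index_settings is not part of any client library); it only builds the dictionary, and it assumes a single keep_types filter applied after the chosen tokenizer.

```python
import json
from typing import Any, Dict, Sequence


def keep_types_index_settings(analyzer: str, filter_name: str, types: Sequence[str],
                              tokenizer: str = "standard") -> Dict[str, Any]:
    """Return index settings defining a custom analyzer that applies one keep_types filter."""
    return {
        "analysis": {
            "analyzer": {
                # The analyzer references the custom filter by name.
                analyzer: {"tokenizer": tokenizer, "filter": [filter_name]},
            },
            "filter": {
                filter_name: {"type": "keep_types", "types": list(types)},
            },
        }
    }


settings = keep_types_index_settings("my_analyzer", "extract_alpha", ["<ALPHANUM>"])
print(json.dumps({"settings": settings}, indent=2))
```

The returned dictionary has the same shape as the settings argument in the Python create index request above, so it could be passed as settings= to client.indices.create.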