Predicate script token filter

The predicate_token_filter token filter takes a predicate script and removes tokens that do not match the predicate.

Options

script

A predicate script that determines whether the current token is emitted. Only inline scripts are supported.

Settings example

You can configure the filter as follows:

PUT /condition_example
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : [ "my_script_filter" ]
                }
            },
            "filter" : {
                "my_script_filter" : {
                    "type" : "predicate_token_filter",
                    "script" : {
                        "source" : "token.getTerm().length() > 5"  
                    }
                }
            }
        }
    }
}

This filter emits only tokens that are more than 5 characters long.
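
The comparison in the script is strict, so a token of exactly 5 characters is dropped. The following is a minimal Python sketch of that boundary behaviour, not the filter's implementation; the function name keeps is hypothetical and only mirrors the script source above.

```python
def keeps(term: str) -> bool:
    """Mirror of the script source: token.getTerm().length() > 5.

    Assumption: Python's len() counts code points, whereas the Java
    length() call counts UTF-16 code units. The two agree for the
    plain ASCII terms used here.
    """
    return len(term) > 5


for term in ["What", "Hello", "Flapdoodle"]:
    print(term, keeps(term))
```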

You can then test the analyzer with the analyze API:

POST /condition_example/_analyze
{
  "analyzer" : "my_analyzer",
  "text" : "What Flapdoodle"
}

The response is:

{
  "tokens": [
    {
      "token": "Flapdoodle",        
      "start_offset": 5,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1                 
    }
  ]
}

The token What has been removed from the token stream because it does not match the predicate.

The position and offset values are unaffected by the removal of earlier tokens: Flapdoodle keeps position 1 and offsets 5 to 15, as if What were still present.
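
To illustrate why the position and offsets stay the same, here is a small Python simulation of the behaviour described above. It is a sketch under stated assumptions, not how the filter is implemented: the Token class, tokenize and predicate_filter are hypothetical names, and the regular expression is only a rough stand-in for the standard tokenizer. Positions and offsets are assigned during tokenization, and the filter only decides which tokens to keep.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    term: str
    start_offset: int
    end_offset: int
    position: int


def tokenize(text: str) -> List[Token]:
    """Rough stand-in for the standard tokenizer (assumption):
    one token per run of word characters, numbered from position 0."""
    return [
        Token(m.group(), m.start(), m.end(), i)
        for i, m in enumerate(re.finditer(r"\w+", text))
    ]


def predicate_filter(tokens: List[Token],
                     predicate: Callable[[Token], bool]) -> List[Token]:
    """Keep only tokens matching the predicate; kept tokens are not renumbered."""
    return [t for t in tokens if predicate(t)]


tokens = tokenize("What Flapdoodle")
kept = predicate_filter(tokens, lambda t: len(t.term) > 5)

for t in kept:
    print(t.term, t.start_offset, t.end_offset, t.position)
# prints: Flapdoodle 5 15 1
```

Because the simulated filter never rewrites the surviving tokens, the values match the response shown above.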