IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

phonetic token filter

The phonetic token filter takes the following settings:

encoder: Which phonetic encoder to use. Accepts metaphone (default), double_metaphone, soundex, refined_soundex, caverphone1, caverphone2, cologne, nysiis, koelnerphonetik, haasephonetik, beider_morse, daitch_mokotoff.
replace: Whether to replace the original token with the phonetic token. Accepts true (default) and false. When set to false, both the phonetic token and the original token are emitted. Not supported by the beider_morse encoder.

PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs"
}

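The analyze request returns the tokens J, joe, BLKS, bloggs. Because replace is false, each original token is kept alongside its phonetic code.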

Double metaphone settings

If the double_metaphone encoder is used, then this additional setting is supported:

max_code_len: The maximum length of the emitted metaphone token. Defaults to 4.
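
As a minimal sketch of how this setting might be applied, the request below defines a double_metaphone filter with a longer code length. The index name phonetic_double_sample, the filter name my_double_metaphone, and the value 6 are illustrative assumptions, not values taken from the plugin documentation.

```console
# Hypothetical index and filter names; max_code_len of 6 is an arbitrary example value
PUT phonetic_double_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
```

Running the _analyze request from the earlier example against this index should show the effect of the longer code length on the emitted tokens.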

Beider Morse settings

If the beider_morse encoder is used, then these additional settings are supported:

rule_type: Whether matching should be exact or approx (default).
name_type: Whether names are ashkenazi, sephardic, or generic (default).
languageset: An array of languages to check. If not specified, then the language will be guessed. Accepts: any, common, cyrillic, english, french, german, hebrew, hungarian, polish, romanian, russian, spanish.
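
The settings above could be combined roughly as follows. This is a sketch under stated assumptions: the index name phonetic_bm_sample, the filter name my_beider_morse, and the particular choice of exact matching with English and German are illustrative only.

```console
# Hypothetical index and filter names; the setting values are example choices
PUT phonetic_bm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": ["english", "german"]
          }
        }
      }
    }
  }
}
```

The replace setting is left out here because it is not supported by the beider_morse encoder.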
