IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.


Elision token filter



Removes specified elisions from the beginning of tokens. For example, you can use this filter to change l'avion to avion.

When not customized, the filter removes the following French elisions by default:

l', m', t', qu', n', s', j', d', c', jusqu', quoiqu', lorsqu', puisqu'
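The removal rule can be illustrated outside Elasticsearch with a few lines of plain Ruby. This is only a sketch of the behavior described above, not Lucene's implementation: `strip_elision` and `DEFAULT_ARTICLES` are hypothetical names, and the sketch accepts both the straight (') and curly (’) apostrophe because the example text on this page uses the curly form.

```ruby
# Default French elisions from the list above (illustrative constant).
DEFAULT_ARTICLES = %w[l m t qu n s j d c jusqu quoiqu lorsqu puisqu].freeze

# Strip a leading elision from a single token.
# The text before the first apostrophe must be one of the listed elisions;
# otherwise the token is returned unchanged.
def strip_elision(token, articles = DEFAULT_ARTICLES)
  index = token.index(/['’]/)
  return token if index.nil?

  prefix = token[0...index]
  articles.include?(prefix) ? token[(index + 1)..-1] : token
end

puts strip_elision("l'avion")     # avion
puts strip_elision("jusqu'ici")   # ici
puts strip_elision("aujourd'hui") # aujourd'hui (prefix is not a listed elision)
```

A token such as aujourd'hui is left intact because the text before its apostrophe is not in the list.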

Customized versions of this filter are included in several of Elasticsearch’s built-in language analyzers.

This filter uses Lucene’s ElisionFilter.

Example


The following analyze API request uses the elision filter to remove j' from j’examine près du wharf:

response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      'elision'
    ],
    text: 'j’examine près du wharf'
  }
)
puts response

GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["elision"],
  "text" : "j’examine près du wharf"
}

The filter produces the following tokens:

[ examine, près, du, wharf ]

Add to an analyzer


The following create index API request uses the elision filter to configure a new custom analyzer.

response = client.indices.create(
  index: 'elision_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          whitespace_elision: {
            tokenizer: 'whitespace',
            filter: [
              'elision'
            ]
          }
        }
      }
    }
  }
)
puts response

PUT /elision_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_elision": {
          "tokenizer": "whitespace",
          "filter": [ "elision" ]
        }
      }
    }
  }
}

Configurable parameters


articles

(Required*, array of strings) List of elisions to remove.

To be removed, the elision must be at the beginning of a token and be immediately followed by an apostrophe. Both the elision and apostrophe are removed.

For custom elision filters, either this parameter or articles_path must be specified.

articles_path

(Required*, string) Path to a file that contains a list of elisions to remove.

This path must be absolute or relative to the config location, and the file must be UTF-8 encoded. Each elision in the file must be separated by a line break.

To be removed, the elision must be at the beginning of a token and be immediately followed by an apostrophe. Both the elision and apostrophe are removed.

For custom elision filters, either this parameter or articles must be specified.
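To make the file layout concrete, the following Ruby sketch writes a small UTF-8 file with one elision per line and reads it back. It only illustrates the format described above and does not show how Elasticsearch loads the file. The helper name `read_articles` and the temporary file name are hypothetical, and skipping blank lines is an assumption of the sketch.

```ruby
require "tempfile"

# Read elisions from a UTF-8 file, one per line.
# Surrounding whitespace is trimmed and blank lines are skipped
# (assumptions of this sketch, not documented filter behavior).
def read_articles(path)
  File.readlines(path, chomp: true, encoding: "UTF-8")
      .map(&:strip)
      .reject(&:empty?)
end

Tempfile.create("elision_articles") do |file|
  file.write("l\nm\nqu\n")
  file.flush
  p read_articles(file.path) # ["l", "m", "qu"]
end
```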

articles_case

(Optional, Boolean) If true, elision matching is case insensitive. If false, elision matching is case sensitive. Defaults to false.
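The effect of this flag can be sketched in plain Ruby. The helper below is hypothetical: its `ignore_case` keyword stands in for `articles_case`, and `casecmp?` is used as an approximation of case-insensitive matching rather than a copy of Lucene's logic.

```ruby
# Hypothetical helper: ignore_case plays the role of the articles_case parameter.
def strip_elision(token, articles, ignore_case: false)
  index = token.index(/['’]/)
  return token if index.nil?

  prefix = token[0...index]
  matched =
    if ignore_case
      articles.any? { |article| article.casecmp?(prefix) }
    else
      articles.include?(prefix)
    end
  matched ? token[(index + 1)..-1] : token
end

articles = %w[l qu]
puts strip_elision("L'avion", articles)                    # L'avion
puts strip_elision("L'avion", articles, ignore_case: true) # avion
```

With the default of false, a capitalized elision at the start of a sentence is kept, which is why the customization example on this page enables the flag.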

Customize


To customize the elision filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following request creates a custom case-insensitive elision filter that removes the l', m', t', qu', n', s', and j' elisions:

response = client.indices.create(
  index: 'elision_case_insensitive_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          default: {
            tokenizer: 'whitespace',
            filter: [
              'elision_case_insensitive'
            ]
          }
        },
        filter: {
          elision_case_insensitive: {
            type: 'elision',
            articles: [
              'l',
              'm',
              't',
              'qu',
              'n',
              's',
              'j'
            ],
            articles_case: true
          }
        }
      }
    }
  }
)
puts response

PUT /elision_case_insensitive_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "elision_case_insensitive" ]
        }
      },
      "filter": {
        "elision_case_insensitive": {
          "type": "elision",
          "articles": [ "l", "m", "t", "qu", "n", "s", "j" ],
          "articles_case": true
        }
      }
    }
  }
}
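As a rough illustration of what this configuration does to text, the following Ruby sketch imitates the analyzer locally: a whitespace split followed by case-insensitive removal of the seven listed elisions. All method names are hypothetical, the sample sentence is invented for the example, and `downcase` is only an approximation of the filter's case-insensitive matching.

```ruby
# Elisions configured in the custom filter above.
CUSTOM_ARTICLES = %w[l m t qu n s j].freeze

# Stand-in for the whitespace tokenizer: split on runs of whitespace.
def whitespace_tokenize(text)
  text.split
end

# Case-insensitive elision removal, mirroring articles_case: true.
def elide(token)
  prefix, apostrophe, rest = token.partition(/['’]/)
  return token if apostrophe.empty?

  CUSTOM_ARTICLES.include?(prefix.downcase) ? rest : token
end

def analyze(text)
  whitespace_tokenize(text).map { |token| elide(token) }
end

p analyze("L'avion arrive d'Orly") # ["avion", "arrive", "d'Orly"]
```

Because d' is not in the custom list, d'Orly passes through unchanged, whereas the default filter would strip it.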
