Stop analyzer

The `stop` analyzer is the same as the `simple` analyzer but adds support for removing stop words. It defaults to using the `_english_` stop words.

Example output

response = client.indices.analyze(
  body: {
    analyzer: 'stop',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

POST _analyze
{
  "analyzer": "stop",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The above sentence would produce the following terms:

[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]
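To see why the sentence yields exactly these terms, the two stages can be imitated in plain Ruby. This is a minimal sketch, not the Elasticsearch implementation: the helper names `lowercase_tokenize` and `stop_analyze` are hypothetical, the tokenizer is approximated as "split on anything that is not a letter, then lowercase", and the contents of `ENGLISH_STOP_WORDS` are an assumption about what the `_english_` list contains.

```ruby
# Sketch only: imitates the stop analyzer's two stages without Elasticsearch.
# Assumed contents of the `_english_` stop word list.
ENGLISH_STOP_WORDS = %w[
  a an and are as at be but by for if in into is it no not of on or such
  that the their then there these they this to was will with
].freeze

# Stage 1: split on every non-letter character and lowercase each token.
# Digits ("2"), hyphens and apostrophes act as separators, so "dog's"
# becomes the two tokens "dog" and "s".
def lowercase_tokenize(text)
  text.scan(/\p{L}+/).map(&:downcase)
end

# Stage 2: drop tokens that appear in the stop word list.
def stop_analyze(text, stopwords = ENGLISH_STOP_WORDS)
  lowercase_tokenize(text).reject { |token| stopwords.include?(token) }
end

terms = stop_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
puts terms.inspect
# => ["quick", "brown", "foxes", "jumped", "over", "lazy", "dog", "s", "bone"]
```

Both occurrences of "the" are removed by the stop word stage, while "over" survives because it is not in the assumed default list.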

Configuration

The stop analyzer accepts the following parameters:

`stopwords`	A pre-defined stop words list like `_english_` or an array containing a list of stop words. Defaults to `_english_`.
`stopwords_path`	The path to a file containing stop words. This path is relative to the Elasticsearch `config` directory.

See the Stop Token Filter for more information about stop word configuration.

Example configuration

In this example, we configure the stop analyzer to use a specified list of words as stop words:

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_stop_analyzer: {
            type: 'stop',
            stopwords: [
              'the',
              'over'
            ]
          }
        }
      }
    }
  }
)
puts response

response = client.indices.analyze(
  index: 'my-index-000001',
  body: {
    analyzer: 'my_stop_analyzer',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "stop",
          "stopwords": ["the", "over"]
        }
      }
    }
  }
}

POST my-index-000001/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The above example produces the following terms:

[ quick, brown, foxes, jumped, lazy, dog, s, bone ]
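The effect of the custom list can be reproduced offline with a small Ruby sketch. The helper `analyze_with_stopwords` is hypothetical and only approximates the analyzer (split on non-letters, lowercase, drop listed words); it does not call Elasticsearch.

```ruby
# Sketch only: a stand-in for a stop analyzer with a configurable word list.
def analyze_with_stopwords(text, stopwords)
  text.scan(/\p{L}+/)                            # split on non-letter characters
      .map(&:downcase)                           # lowercase every token
      .reject { |token| stopwords.include?(token) } # drop configured stop words
end

text = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
custom_terms = analyze_with_stopwords(text, %w[the over])
puts custom_terms.inspect
# => ["quick", "brown", "foxes", "jumped", "lazy", "dog", "s", "bone"]

# In this sketch the supplied list is the only list consulted, so a word
# that is missing from it (here "and") is kept.
puts analyze_with_stopwords("cats and dogs", %w[the over]).inspect
# => ["cats", "and", "dogs"]
```

Compared with the default output, "over" is now removed because it appears in the configured list.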

Definition

The `stop` analyzer consists of:

Tokenizer

Lower Case Tokenizer

Token filters

Stop Token Filter

If you need to customize the `stop` analyzer beyond its configuration parameters, recreate it as a `custom` analyzer and modify it, usually by adding token filters. The following request recreates the built-in `stop` analyzer, which you can use as a starting point for further customization:

response = client.indices.create(
  index: 'stop_example',
  body: {
    settings: {
      analysis: {
        filter: {
          english_stop: {
            type: 'stop',
            stopwords: '_english_'
          }
        },
        analyzer: {
          rebuilt_stop: {
            tokenizer: 'lowercase',
            filter: [
              'english_stop'
            ]
          }
        }
      }
    }
  }
)
puts response

PUT /stop_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "rebuilt_stop": {
          "tokenizer": "lowercase",
          "filter": [
            "english_stop"
          ]
        }
      }
    }
  }
}

In the `english_stop` filter, the default stop words can be overridden with the `stopwords` or `stopwords_path` parameters.
In the `filter` array of `rebuilt_stop`, add any further token filters after `english_stop`.
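The idea of appending filters after `english_stop` can be pictured with a toy model in Ruby, where an analyzer is a tokenizer plus an ordered list of token filters. Everything here is illustrative: the `Analyzer` struct, the shortened stop list, and the `min_length` filter are hypothetical names invented for this sketch and are not Elasticsearch APIs.

```ruby
# Toy model: an analyzer is a tokenizer followed by an ordered filter chain.
Analyzer = Struct.new(:tokenizer, :filters) do
  def analyze(text)
    # Feed the tokenizer output through each filter in list order.
    filters.reduce(tokenizer.call(text)) { |tokens, filter| filter.call(tokens) }
  end
end

# Approximation of the lowercase tokenizer: split on non-letters, lowercase.
lowercase_tokenizer = ->(text) { text.scan(/\p{L}+/).map(&:downcase) }

# Shortened stand-in for the `_english_` stop word list (assumption).
english_stop = ->(tokens) { tokens.reject { |t| %w[a an and of the to].include?(t) } }

# Hypothetical extra filter: drop one-letter tokens such as the "s" from "dog's".
min_length = ->(tokens) { tokens.select { |t| t.length >= 2 } }

rebuilt_stop = Analyzer.new(lowercase_tokenizer, [english_stop])
customized   = Analyzer.new(lowercase_tokenizer, [english_stop, min_length])

text = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
puts rebuilt_stop.analyze(text).inspect
# => ["quick", "brown", "foxes", "jumped", "over", "lazy", "dog", "s", "bone"]
puts customized.analyze(text).inspect
# => ["quick", "brown", "foxes", "jumped", "over", "lazy", "dog", "bone"]
```

Because filters run in list order, anything placed after `english_stop` only sees tokens that survived stop word removal.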
