Get tokens from text analysis (Generally available)

GET /{index}/_analyze

The analyze API performs analysis on a text string and returns the resulting tokens.

Generating an excessive amount of tokens may cause a node to run out of memory. The index.analyze.max_token_count setting enables you to limit the number of tokens that can be produced. If more tokens than this limit are generated, an error occurs. The _analyze endpoint without a specified index always uses 10000 as its limit.

Required authorization

  • Index privileges: index
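
The index.analyze.max_token_count limit described above can be raised for a specific index through the update index settings API. A minimal sketch; the index name my-index and the new value are illustrative:

PUT /my-index/_settings
{
  "index.analyze.max_token_count": 20000
}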

Path parameters

  • index string Required

    Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
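
    For example, assuming an index named my-index whose title field has an analyzer in its mapping (both names are illustrative), the following sketch derives the analyzer from the field mapping rather than from the index default:

    GET /my-index/_analyze
    {
      "field": "title",
      "text": "Quick Brown Foxes"
    }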

Query parameters

  • index string

    Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.

Body (application/json)

  • analyzer string

    The name of the analyzer that should be applied to the provided text. This could be a built-in analyzer, or an analyzer that’s been configured in the index.

  • attributes array[string]

    Array of token attributes used to filter the output of the explain parameter.

  • char_filter array[string | object]

    Array of character filters used to preprocess characters before the tokenizer. Each entry is either the name of a built-in character filter or an inline character filter definition object (see the sketch after this parameter list).

  • explain boolean

    If true, the response includes token attributes and additional details.

  • field string

    Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

  • filter array[string | object]

    Array of token filters to apply after the tokenizer.

    One of: TokenFilter (string), or one of the following token filter definition objects: ApostropheTokenFilter, ArabicNormalizationTokenFilter, AsciiFoldingTokenFilter, CjkBigramTokenFilter, CjkWidthTokenFilter, ClassicTokenFilter, CommonGramsTokenFilter, ConditionTokenFilter, DecimalDigitTokenFilter, DelimitedPayloadTokenFilter, EdgeNGramTokenFilter, ElisionTokenFilter, FingerprintTokenFilter, FlattenGraphTokenFilter, GermanNormalizationTokenFilter, HindiNormalizationTokenFilter, HunspellTokenFilter, HyphenationDecompounderTokenFilter, IndicNormalizationTokenFilter, KeepTypesTokenFilter, KeepWordsTokenFilter, KeywordMarkerTokenFilter, KeywordRepeatTokenFilter, KStemTokenFilter, LengthTokenFilter, LimitTokenCountTokenFilter, LowercaseTokenFilter, MinHashTokenFilter, MultiplexerTokenFilter, NGramTokenFilter, NoriPartOfSpeechTokenFilter, PatternCaptureTokenFilter, PatternReplaceTokenFilter, PersianNormalizationTokenFilter, PorterStemTokenFilter, PredicateTokenFilter, RemoveDuplicatesTokenFilter, ReverseTokenFilter, ScandinavianFoldingTokenFilter, ScandinavianNormalizationTokenFilter, SerbianNormalizationTokenFilter, ShingleTokenFilter, SnowballTokenFilter, SoraniNormalizationTokenFilter, StemmerOverrideTokenFilter, StemmerTokenFilter, StopTokenFilter, SynonymGraphTokenFilter, SynonymTokenFilter, TrimTokenFilter, TruncateTokenFilter, UniqueTokenFilter, UppercaseTokenFilter, WordDelimiterGraphTokenFilter, WordDelimiterTokenFilter, JaStopTokenFilter, KuromojiStemmerTokenFilter, KuromojiReadingFormTokenFilter, KuromojiPartOfSpeechTokenFilter, IcuCollationTokenFilter, IcuFoldingTokenFilter, IcuNormalizationTokenFilter, IcuTransformTokenFilter, PhoneticTokenFilter, DictionaryDecompounderTokenFilter.
  • normalizer string

    Normalizer to use to convert text into a single token.

  • tokenizer string | object

    The name of a built-in tokenizer or an inline tokenizer definition object (see the sketch after this parameter list).
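
    Both tokenizer and char_filter also accept inline definition objects instead of built-in names. A minimal sketch (the ngram and mapping settings shown are illustrative):

    GET /_analyze
    {
      "tokenizer": {
        "type": "ngram",
        "min_gram": 3,
        "max_gram": 3,
        "token_chars": [ "letter" ]
      },
      "char_filter": [
        {
          "type": "mapping",
          "mappings": [ "ph => f" ]
        }
      ],
      "text": "Graph"
    }

    Here the mapping character filter rewrites ph to f before the ngram tokenizer splits the result into 3-character tokens.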

Responses

  • 200 application/json
    • detail object
      • analyzer object
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
      • charfilters array[object]
        • filtered_text array[string] Required
        • name string Required
      • custom_analyzer boolean Required
      • tokenfilters array[object]
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
      • tokenizer object
        • name string Required
        • tokens array[object] Required
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
    • tokens array[object]
      • end_offset number Required
      • position number Required
      • positionLength number
      • start_offset number Required
      • token string Required
      • type string Required
GET /_analyze
{
  "tokenizer": "standard",
  "filter" : [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["pc => personal computer", "computer, pc, laptop"]
    }
  ],
  "text" : "Check how PC synonyms work"
}
curl \
 --request GET 'http://api.example.com/{index}/_analyze' \
 --header "Content-Type: application/json" \
 --data '{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["pc => personal computer", "computer, pc, laptop"]
    }
  ],
  "text": "Check how PC synonyms work"
}'
Request examples
An example body for a `GET /_analyze` request.
{
  "tokenizer": "standard",
  "filter" : [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["pc => personal computer", "computer, pc, laptop"]
    }
  ],
  "text" : "Check how PC synonyms work"
}
You can apply any of the built-in analyzers to the text string without specifying an index.
{
  "analyzer": "standard",
  "text": "this is a test"
}
If the text parameter is provided as an array of strings, it is analyzed as a multi-value field.
{
  "analyzer": "standard",
  "text": [
    "this is a test",
    "the second text"
  ]
}
You can test a custom transient analyzer built from tokenizers, token filters, and char filters. Token filters use the filter parameter.
{
  "tokenizer": "keyword",
  "filter": [
    "lowercase"
  ],
  "char_filter": [
    "html_strip"
  ],
  "text": "this is a <b>test</b>"
}
Custom tokenizers, token filters, and character filters can be specified in the request body.
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": [
        "a",
        "is",
        "this"
      ]
    }
  ],
  "text": "this is a test"
}
Run `GET /analyze_sample/_analyze` to analyze the text using the default analyzer associated with the `analyze_sample` index. Alternatively, the analyzer can be derived from a field mapping.
{
  "field": "obj1.field1",
  "text": "this is a test"
}
Run `GET /analyze_sample/_analyze` and supply a normalizer for a keyword field if there is a normalizer associated with the specified index.
{
  "normalizer": "my_normalizer",
  "text": "BaR"
}
If you want to get more advanced details, set `explain` to `true`. It will output all token attributes for each token. You can filter the token attributes you want to output by setting the `attributes` option. NOTE: The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.
{
  "tokenizer": "standard",
  "filter": [
    "snowball"
  ],
  "text": "detailed output",
  "explain": true,
  "attributes": [
    "keyword"
  ]
}
Response examples (200)
A successful response for an analysis with `explain` set to `true`.
{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "standard",
      "tokens": [
        {
          "token": "detailed",
          "start_offset": 0,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "output",
          "start_offset": 9,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    },
    "tokenfilters": [
      {
        "name": "snowball",
        "tokens": [
          {
            "token": "detail",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0,
            "keyword": false
          },
          {
            "token": "output",
            "start_offset": 9,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 1,
            "keyword": false
          }
        ]
      }
    ]
  }
}
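A response without `explain` contains only the top-level `tokens` array. As a sketch, running the `standard` analyzer on the text "this is a test" returns tokens shaped approximately like this:
{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 5,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 8,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "test",
      "start_offset": 10,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}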
