The analyze API performs analysis on a text string and returns the resulting tokens.
Generating an excessive amount of tokens may cause a node to run out of memory. The index.analyze.max_token_count setting enables you to limit the number of tokens that can be produced. If more tokens than this limit are generated, an error occurs. The _analyze endpoint without a specified index always uses 10000 as its limit.
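For example, the limit can be set when an index is created. This is a minimal sketch; the index name my-index-000001 and the value 20000 are illustrative:
curl \
  --request PUT 'http://api.example.com/my-index-000001' \
  --header "Content-Type: application/json" \
  --data '{
  "settings": {
    "index.analyze.max_token_count": 20000
  }
}'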
Required authorization
- Index privileges: index
Query parameters
- index: Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
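For example, a request that names an index in the request path derives the analyzer from that index. The index name my-index-000001 is illustrative; if the index has no default analyzer, the standard analyzer is used:
curl \
  --request GET 'http://api.example.com/my-index-000001/_analyze' \
  --header "Content-Type: application/json" \
  --data '{
  "text": "this is a test"
}'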
Body (required)
- analyzer: The name of the analyzer that should be applied to the provided text. This could be a built-in analyzer, or an analyzer that's been configured in the index.
- attributes: Array of token attributes used to filter the output of the explain parameter.
- char_filter: Array of character filters used to preprocess characters before the tokenizer.
- explain: If true, the response includes token attributes and additional details. Default value is false.
- field: Field used to derive the analyzer. To use this parameter, you must specify an index. If specified, the analyzer parameter overrides this value.
- filter: Array of token filters to apply after the tokenizer.
- normalizer: Normalizer to use to convert text into a single token.
- tokenizer: Tokenizer to use to convert text into tokens.
curl \
  --request GET 'http://api.example.com/_analyze' \
  --header "Content-Type: application/json" \
  --data '{
  "analyzer": "standard",
  "text": "this is a test"
}'
Analyze a text string with a built-in analyzer:
{
"analyzer": "standard",
"text": "this is a test"
}
Analyze an array of text strings:
{
"analyzer": "standard",
"text": [
"this is a test",
"the second text"
]
}
Build a custom transient analyzer from a tokenizer, token filters, and character filters:
{
"tokenizer": "keyword",
"filter": [
"lowercase"
],
"char_filter": [
"html_strip"
],
"text": "this is a <b>test</b>"
}
Define a custom token filter inline as part of the request:
{
"tokenizer": "whitespace",
"filter": [
"lowercase",
{
"type": "stop",
"stopwords": [
"a",
"is",
"this"
]
}
],
"text": "this is a test"
}
Derive the analyzer from a field mapping (an index must be specified):
{
"field": "obj1.field1",
"text": "this is a test"
}
Convert text into a single token with a normalizer:
{
"normalizer": "my_normalizer",
"text": "BaR"
}
Get detailed token information with explain, filtered to the keyword attribute:
{
"tokenizer": "standard",
"filter": [
"snowball"
],
"text": "detailed output",
"explain": true,
"attributes": [
"keyword"
]
}
The response for the explain request includes details for the tokenizer and each token filter:
{
"detail": {
"custom_analyzer": true,
"charfilters": [],
"tokenizer": {
"name": "standard",
"tokens": [
{
"token": "detailed",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "output",
"start_offset": 9,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 1
}
]
},
"tokenfilters": [
{
"name": "snowball",
"tokens": [
{
"token": "detail",
"start_offset": 0,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0,
"keyword": false
},
{
"token": "output",
"start_offset": 9,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 1,
"keyword": false
}
]
}
]
}
}