The analyze API performs analysis on a text string and returns the resulting tokens.
Generating an excessive number of tokens can cause a node to run out of memory.
The index.analyze.max_token_count setting limits the number of tokens that can be produced.
If more tokens than that limit are generated, an error occurs.
The _analyze endpoint without a specified index always uses 10,000 as its limit.
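For example, the limit can be adjusted per index by setting index.analyze.max_token_count in the index settings. The sketch below assumes a hypothetical index named my-index-000001 and an illustrative limit of 20000; adapt both to your own deployment:

PUT /my-index-000001
{
  "settings": {
    "index.analyze.max_token_count": 20000
  }
}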
## Required authorization
- Index privileges: index
Path parameters
- index (string): Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
Query parameters
- index (string): Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
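As a sketch of how the index parameter is used, the following request derives the analyzer from a field mapping in a specific index. The index name my-index-000001, the field name title, and the sample text are illustrative, not part of the API:

GET /my-index-000001/_analyze
{
  "field": "title",
  "text": "Quick Brown Foxes"
}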
Body
- analyzer (string): The name of the analyzer that should be applied to the provided text. This could be a built-in analyzer, or an analyzer that has been configured in the index.
- attributes (array of strings): Array of token attributes used to filter the output of the explain parameter.
- char_filter (array): Array of character filters used to preprocess characters before the tokenizer.
- explain (boolean): If true, the response includes token attributes and additional details.
- field (string | array of strings): Path to a field or array of paths. Some APIs support wildcards in the path to select multiple fields.
- filter (array): Array of token filters to apply after the tokenizer. Each element is either a token filter name (TokenFilter, a string) or one of the following token filter objects: ApostropheTokenFilter, ArabicNormalizationTokenFilter, AsciiFoldingTokenFilter, CjkBigramTokenFilter, CjkWidthTokenFilter, ClassicTokenFilter, CommonGramsTokenFilter, ConditionTokenFilter, DecimalDigitTokenFilter, DelimitedPayloadTokenFilter, EdgeNGramTokenFilter, ElisionTokenFilter, FingerprintTokenFilter, FlattenGraphTokenFilter, GermanNormalizationTokenFilter, HindiNormalizationTokenFilter, HunspellTokenFilter, HyphenationDecompounderTokenFilter, IndicNormalizationTokenFilter, KeepTypesTokenFilter, KeepWordsTokenFilter, KeywordMarkerTokenFilter, KeywordRepeatTokenFilter, KStemTokenFilter, LengthTokenFilter, LimitTokenCountTokenFilter, LowercaseTokenFilter, MinHashTokenFilter, MultiplexerTokenFilter, NGramTokenFilter, NoriPartOfSpeechTokenFilter, PatternCaptureTokenFilter, PatternReplaceTokenFilter, PersianNormalizationTokenFilter, PorterStemTokenFilter, PredicateTokenFilter, RemoveDuplicatesTokenFilter, ReverseTokenFilter, ScandinavianFoldingTokenFilter, ScandinavianNormalizationTokenFilter, SerbianNormalizationTokenFilter, ShingleTokenFilter, SnowballTokenFilter, SoraniNormalizationTokenFilter, StemmerOverrideTokenFilter, StemmerTokenFilter, StopTokenFilter, SynonymGraphTokenFilter, SynonymTokenFilter, TrimTokenFilter, TruncateTokenFilter, UniqueTokenFilter, UppercaseTokenFilter, WordDelimiterGraphTokenFilter, WordDelimiterTokenFilter, JaStopTokenFilter, KuromojiStemmerTokenFilter, KuromojiReadingFormTokenFilter, KuromojiPartOfSpeechTokenFilter, IcuCollationTokenFilter, IcuFoldingTokenFilter, IcuNormalizationTokenFilter, IcuTransformTokenFilter, PhoneticTokenFilter, DictionaryDecompounderTokenFilter.
- normalizer (string): Normalizer to use to convert text into a single token.
- tokenizer (string | object): Either a tokenizer name (Tokenizer, a string) or one of the following tokenizer objects: CharGroupTokenizer, ClassicTokenizer, EdgeNGramTokenizer, KeywordTokenizer, LetterTokenizer, LowercaseTokenizer, NGramTokenizer, PathHierarchyTokenizer, PatternTokenizer, SimplePatternTokenizer, SimplePatternSplitTokenizer, StandardTokenizer, ThaiTokenizer, UaxEmailUrlTokenizer, WhitespaceTokenizer, IcuTokenizer, KuromojiTokenizer, NoriTokenizer.
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "synonym_graph",
      "synonyms": ["pc => personal computer", "computer, pc, laptop"]
    }
  ],
  "text": "Check how PC synonyms work"
}
curl \
  --request POST 'http://api.example.com/{index}/_analyze' \
  --header "Content-Type: application/json" \
  --data '{
    "tokenizer": "standard",
    "filter": [
      "lowercase",
      {
        "type": "synonym_graph",
        "synonyms": ["pc => personal computer", "computer, pc, laptop"]
      }
    ],
    "text": "Check how PC synonyms work"
  }'
{
  "analyzer": "standard",
  "text": "this is a test"
}
{
  "analyzer": "standard",
  "text": [
    "this is a test",
    "the second text"
  ]
}
{
  "tokenizer": "keyword",
  "filter": [
    "lowercase"
  ],
  "char_filter": [
    "html_strip"
  ],
  "text": "this is a <b>test</b>"
}
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": [
        "a",
        "is",
        "this"
      ]
    }
  ],
  "text": "this is a test"
}
{
  "field": "obj1.field1",
  "text": "this is a test"
}
{
  "normalizer": "my_normalizer",
  "text": "BaR"
}
{
  "tokenizer": "standard",
  "filter": [
    "snowball"
  ],
  "text": "detailed output",
  "explain": true,
  "attributes": [
    "keyword"
  ]
}
{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "standard",
      "tokens": [
        {
          "token": "detailed",
          "start_offset": 0,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "output",
          "start_offset": 9,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    },
    "tokenfilters": [
      {
        "name": "snowball",
        "tokens": [
          {
            "token": "detail",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0,
            "keyword": false
          },
          {
            "token": "output",
            "start_offset": 9,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 1,
            "keyword": false
          }
        ]
      }
    ]
  }
}