Term Vectors API
The Term Vectors API returns information and statistics on terms in the fields of a particular document. The document can be stored in the index or artificially provided by the user.
Term Vectors Request
A TermVectorsRequest expects an index, a type and an id to specify a certain document, and the fields for which the information is retrieved. The example below passes only the index and the id to the constructor.
TermVectorsRequest request = new TermVectorsRequest("authors", "1");
request.setFields("user");
Term vectors can also be generated for artificial documents, that is, for documents not present in the index:
XContentBuilder docBuilder = XContentFactory.jsonBuilder();
docBuilder.startObject().field("user", "guest-user").endObject();
TermVectorsRequest request = new TermVectorsRequest("authors", docBuilder);
An artificial document is provided as an XContentBuilder object.
Optional arguments
request.setFieldStatistics(false);
request.setTermStatistics(true);
request.setPositions(false);
request.setOffsets(false);
request.setPayloads(false);

Map<String, Integer> filterSettings = new HashMap<>();
filterSettings.put("max_num_terms", 3);
filterSettings.put("min_term_freq", 1);
filterSettings.put("max_term_freq", 10);
filterSettings.put("min_doc_freq", 1);
filterSettings.put("max_doc_freq", 100);
filterSettings.put("min_word_length", 1);
filterSettings.put("max_word_length", 10);
request.setFilterSettings(filterSettings);

Map<String, String> perFieldAnalyzer = new HashMap<>();
perFieldAnalyzer.put("user", "keyword");
request.setPerFieldAnalyzer(perFieldAnalyzer);

request.setRealtime(false);
request.setRouting("routing");
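Set fieldStatistics to false to omit the document count, the sum of document frequencies and the sum of total term frequencies.
Set termStatistics to true to return the total term frequency and the document frequency.
Set positions to false to omit positions from the output.
Set offsets to false to omit offsets from the output.
Set payloads to false to omit payloads from the output.
Set filterSettings to filter the terms that are returned.
Set perFieldAnalyzer to use a different analyzer than the one the field has.
Set realtime to false to retrieve term vectors in near real time.
Set a routing parameter.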
Synchronous execution
When executing a TermVectorsRequest in the following manner, the client waits for the TermVectorsResponse to be returned before continuing with code execution:
TermVectorsResponse response = client.termvectors(request, RequestOptions.DEFAULT);
Synchronous calls may throw an IOException if the high-level REST client fails to parse the REST response, if the request times out, or in similar cases where no response comes back from the server.
In cases where the server returns a 4xx or 5xx error code, the high-level client tries to parse the error details from the response body instead, and then throws a generic ElasticsearchException with the original ResponseException added to it as a suppressed exception.
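The wrapping described above relies on Java's standard suppressed-exception mechanism (Throwable.addSuppressed and Throwable.getSuppressed). The following is a minimal sketch of how a caller might recover the original failure from the generic one. So that the snippet runs without the client library, ServerErrorException and RawResponseException are hypothetical stand-ins for ElasticsearchException and ResponseException, and failingCall is a hypothetical method that only simulates a 4xx/5xx failure.

```java
import java.io.IOException;

public class SuppressedExceptionSketch {

    // Hypothetical stand-in for the low-level ResponseException.
    static class RawResponseException extends IOException {
        RawResponseException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for the generic ElasticsearchException.
    static class ServerErrorException extends RuntimeException {
        ServerErrorException(String message) {
            super(message);
        }
    }

    // Simulates a call that received an error status: the generic exception
    // is thrown and the original exception is attached as suppressed.
    static void failingCall() {
        ServerErrorException generic = new ServerErrorException("parsed error details");
        generic.addSuppressed(new RawResponseException("HTTP 404"));
        throw generic;
    }

    // Catches the generic exception and reads the first suppressed exception, if any.
    static String describeFailure() {
        try {
            failingCall();
            return "no failure";
        } catch (ServerErrorException e) {
            Throwable[] suppressed = e.getSuppressed();
            String original = suppressed.length > 0 ? suppressed[0].getMessage() : "none";
            return e.getMessage() + " (original: " + original + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

With the real client, the same pattern would apply to the exception thrown by the synchronous call, with the stand-in types replaced by the client's own exception classes.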
Asynchronous execution
Executing a TermVectorsRequest can also be done asynchronously so that the client can return directly. Users need to specify how the response or potential failures will be handled by passing the request and a listener to the asynchronous term-vectors method.
The asynchronous method does not block and returns immediately. Once it is completed, the ActionListener is called back using the onResponse method if the execution completed successfully, or using the onFailure method if it failed. Failure scenarios and expected exceptions are the same as in the synchronous execution case.
A typical listener for term-vectors handles both the onResponse and the onFailure callbacks.
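The callback flow described in this section can be sketched without the client library. In the snippet below, Listener is a hypothetical stand-in for ActionListener, executeAsync is a hypothetical stand-in for the asynchronous term-vectors method, and a plain String stands in for the TermVectorsResponse. With the real client, the listener would be an ActionListener<TermVectorsResponse> passed along with the request and RequestOptions.DEFAULT to termvectorsAsync.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class ListenerSketch {

    // Hypothetical stand-in for ActionListener<T>.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Hypothetical stand-in for the asynchronous client method: it returns
    // immediately and notifies the listener from another thread.
    static CompletableFuture<Void> executeAsync(boolean succeed, Listener<String> listener) {
        return CompletableFuture.runAsync(() -> {
            if (succeed) {
                listener.onResponse("term vectors for document 1");
            } else {
                listener.onFailure(new IllegalStateException("request failed"));
            }
        });
    }

    // Registers a listener, triggers the call and reports which callback fired.
    static String run(boolean succeed) {
        AtomicReference<String> outcome = new AtomicReference<>();
        Listener<String> listener = new Listener<String>() {
            @Override
            public void onResponse(String response) {
                // Called when the execution completed successfully.
                outcome.set("response: " + response);
            }

            @Override
            public void onFailure(Exception e) {
                // Called when the execution failed.
                outcome.set("failure: " + e.getMessage());
            }
        };
        // join() is used only so that this sketch can return the outcome;
        // real callers would normally continue without waiting.
        executeAsync(succeed, listener).join();
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

Keeping all handling inside the two callbacks means the calling thread never has to wait for the response, which is the point of the asynchronous variant.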
Term Vectors Response
TermVectorsResponse contains the following information:
String index = response.getIndex();
String type = response.getType();
String id = response.getId();
boolean found = response.getFound();
The index name of the document.
The type name of the document.
The id of the document.
Indicates whether or not the document was found.
Inspecting Term Vectors
If TermVectorsResponse contains a non-null list of term vectors, more information about each term vector can be obtained using the following:
for (TermVectorsResponse.TermVector tv : response.getTermVectorsList()) {
    // Field name and field-level statistics
    String fieldname = tv.getFieldName();
    int docCount = tv.getFieldStatistics().getDocCount();
    long sumTotalTermFreq = tv.getFieldStatistics().getSumTotalTermFreq();
    long sumDocFreq = tv.getFieldStatistics().getSumDocFreq();
    if (tv.getTerms() != null) {
        List<TermVectorsResponse.TermVector.Term> terms = tv.getTerms();
        for (TermVectorsResponse.TermVector.Term term : terms) {
            // Term-level statistics
            String termStr = term.getTerm();
            int termFreq = term.getTermFreq();
            int docFreq = term.getDocFreq();
            long totalTermFreq = term.getTotalTermFreq();
            float score = term.getScore();
            if (term.getTokens() != null) {
                List<TermVectorsResponse.TermVector.Token> tokens = term.getTokens();
                for (TermVectorsResponse.TermVector.Token token : tokens) {
                    // Token-level details
                    int position = token.getPosition();
                    int startOffset = token.getStartOffset();
                    int endOffset = token.getEndOffset();
                    String payload = token.getPayload();
                }
            }
        }
    }
}
The name of the current field.
Field statistics for the current field: document count.
Field statistics for the current field: sum of total term frequencies.
Field statistics for the current field: sum of document frequencies.
Terms for the current field.
The name of the term.
Term frequency of the term.
Document frequency of the term.
Total term frequency of the term.
Score of the term.
Tokens of the term.
Position of the token.
Start offset of the token.
End offset of the token.
Payload of the token.