Search API

Search Request

The SearchRequest is used for any operation that has to do with searching documents, aggregations, or suggestions. It also offers ways of requesting highlighting on the resulting documents.

In its most basic form, we can add a query to the request:

SearchRequest searchRequest = new SearchRequest(); 
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder(); 
searchSourceBuilder.query(QueryBuilders.matchAllQuery()); 
searchRequest.source(searchSourceBuilder); 

Creates the SearchRequest. Without arguments this runs against all indices.

Most search parameters are added to the SearchSourceBuilder. It offers setters for everything that goes into the search request body.

Add a match_all query to the SearchSourceBuilder.

Add the SearchSourceBuilder to the SearchRequest.

Optional arguments

Let’s first look at some of the optional arguments of a SearchRequest:

SearchRequest searchRequest = new SearchRequest("posts"); 
searchRequest.types("doc"); 

Restricts the request to an index

Limits the request to a type

There are a couple of other interesting optional parameters:

searchRequest.routing("routing"); 

Set a routing parameter

searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen()); 

Setting IndicesOptions controls how unavailable indices are resolved and how wildcard expressions are expanded

searchRequest.preference("_local"); 

Use the preference parameter, for example, to prefer local shards when executing the search. The default is to randomize across shards.

Using the SearchSourceBuilder

Most options controlling the search behavior can be set on the SearchSourceBuilder, which contains more or less the equivalent of the options in the search request body of the Rest API.

Here are a few examples of some common options:

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder(); 
sourceBuilder.query(QueryBuilders.termQuery("user", "kimchy")); 
sourceBuilder.from(0); 
sourceBuilder.size(5); 
sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); 

Create a SearchSourceBuilder with default options.

Set the query. Can be any type of QueryBuilder

Set the from option that determines the result index to start searching from. Defaults to 0.

Set the size option that determines the number of search hits to return. Defaults to 10.

Set an optional timeout that controls how long the search is allowed to take.

After this, the SearchSourceBuilder only needs to be added to the SearchRequest:

SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("posts");
searchRequest.source(sourceBuilder);
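
The from and size options described above are commonly combined to page through results. The following is a minimal sketch of the offset arithmetic only, using plain Java; the PageOffsets class and its fromForPage method are hypothetical names introduced for illustration and are not part of the Elasticsearch client.

```java
// Hypothetical helper (not part of the Elasticsearch client): converts a
// zero-based page number into the value passed to SearchSourceBuilder.from().
class PageOffsets {

    /** Returns the offset of the first hit on the given zero-based page. */
    static int fromForPage(int page, int size) {
        if (page < 0 || size <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and size must be > 0");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(page, size);
    }

    public static void main(String[] args) {
        int size = 5;
        for (int page = 0; page < 3; page++) {
            System.out.println("page " + page + " -> from=" + fromForPage(page, size));
        }
    }
}
```

With the builder shown earlier, the result would be passed as sourceBuilder.from(PageOffsets.fromForPage(page, 5)) together with sourceBuilder.size(5).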

Building queries

Search queries are created using QueryBuilder objects. A QueryBuilder exists for every search query type supported by Elasticsearch’s Query DSL.

A QueryBuilder can be created using its constructor:

MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("user", "kimchy"); 

Create a full text Match Query that matches the text "kimchy" over the field "user".

Once created, the QueryBuilder object provides methods to configure the options of the search query it creates:

matchQueryBuilder.fuzziness(Fuzziness.AUTO); 
matchQueryBuilder.prefixLength(3); 
matchQueryBuilder.maxExpansions(10); 

Enable fuzzy matching on the match query

Set the prefix length option on the match query

Set the max expansion options to control the fuzzy process of the query

QueryBuilder objects can also be created using the QueryBuilders utility class. This class provides helper methods that can be used to create QueryBuilder objects using a fluent programming style:

QueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("user", "kimchy")
                                                .fuzziness(Fuzziness.AUTO)
                                                .prefixLength(3)
                                                .maxExpansions(10);

Whatever the method used to create it, the QueryBuilder object must be added to the SearchSourceBuilder as follows:

searchSourceBuilder.query(matchQueryBuilder);

The Building Queries page gives a list of all available search queries with their corresponding QueryBuilder objects and QueryBuilders helper methods.

Specifying Sorting

The SearchSourceBuilder lets you add one or more SortBuilder instances. There are four special implementations (Field-, Score-, GeoDistance- and ScriptSortBuilder).

sourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC)); 
sourceBuilder.sort(new FieldSortBuilder("_uid").order(SortOrder.ASC));  

Sort descending by _score (the default)

Also sort ascending by the _uid field

Source filtering

By default, search requests return the contents of the document _source but, like in the Rest API, you can override this behavior. For example, you can turn off _source retrieval completely:

sourceBuilder.fetchSource(false);

The method also accepts an array of one or more wildcard patterns to control which fields get included or excluded in a more fine grained way:

String[] includeFields = new String[] {"title", "user", "innerObject.*"};
String[] excludeFields = new String[] {"_type"};
sourceBuilder.fetchSource(includeFields, excludeFields);

Requesting Highlighting

Highlighting search results can be achieved by setting a HighlightBuilder on the SearchSourceBuilder. Different highlighting behaviour can be defined for each field by adding one or more HighlightBuilder.Field instances to a HighlightBuilder.

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
HighlightBuilder highlightBuilder = new HighlightBuilder(); 
HighlightBuilder.Field highlightTitle =
        new HighlightBuilder.Field("title"); 
highlightTitle.highlighterType("unified");  
highlightBuilder.field(highlightTitle);  
HighlightBuilder.Field highlightUser = new HighlightBuilder.Field("user");
highlightBuilder.field(highlightUser);
searchSourceBuilder.highlighter(highlightBuilder);

Creates a new HighlightBuilder

Create a field highlighter for the title field

Set the field highlighter type

Add the field highlighter to the highlight builder

There are many options which are explained in detail in the Rest API documentation. The Rest API parameters (e.g. pre_tags) are usually changed by setters with a similar name (e.g. #preTags(String ...)).

Highlighted text fragments can later be retrieved from the SearchResponse.

Requesting Aggregations

Aggregations can be added to the search by first creating the appropriate AggregationBuilder and then setting it on the SearchSourceBuilder. In the following example we create a terms aggregation on company names with a sub-aggregation on the average age of employees in the company:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
TermsAggregationBuilder aggregation = AggregationBuilders.terms("by_company")
        .field("company.keyword");
aggregation.subAggregation(AggregationBuilders.avg("average_age")
        .field("age"));
searchSourceBuilder.aggregation(aggregation);

The Building Aggregations page gives a list of all available aggregations with their corresponding AggregationBuilder objects and AggregationBuilders helper methods.

We will later see how to access aggregations in the SearchResponse.

Requesting Suggestions

To add Suggestions to the search request, use one of the SuggestionBuilder implementations that are easily accessible from the SuggestBuilders factory class. Suggestion builders need to be added to the top level SuggestBuilder, which itself can be set on the SearchSourceBuilder.

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
SuggestionBuilder termSuggestionBuilder =
    SuggestBuilders.termSuggestion("user").text("kmichy"); 
SuggestBuilder suggestBuilder = new SuggestBuilder();
suggestBuilder.addSuggestion("suggest_user", termSuggestionBuilder); 
searchSourceBuilder.suggest(suggestBuilder);

Creates a new TermSuggestionBuilder for the user field and the text kmichy

Adds the suggestion builder and names it suggest_user

We will later see how to retrieve suggestions from the SearchResponse.

Profiling Queries and Aggregations

The Profile API can be used to profile the execution of queries and aggregations for a specific search request. In order to use it, the profile flag must be set to true on the SearchSourceBuilder:

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.profile(true);

Once the SearchRequest is executed the corresponding SearchResponse will contain the profiling results.

Synchronous Execution

When executing a SearchRequest in the following manner, the client waits for the SearchResponse to be returned before continuing with code execution:

SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

Asynchronous Execution

Executing a SearchRequest can also be done in an asynchronous fashion so that the client can return directly. Users need to specify how the response or potential failures will be handled by passing the request and a listener to the asynchronous search method:

client.searchAsync(searchRequest, RequestOptions.DEFAULT, listener); 

The SearchRequest to execute and the ActionListener to use when the execution completes

The asynchronous method does not block and returns immediately. Once it is completed the ActionListener is called back using the onResponse method if the execution successfully completed or using the onFailure method if it failed.

A typical listener for SearchResponse looks like:

ActionListener<SearchResponse> listener = new ActionListener<SearchResponse>() {
    @Override
    public void onResponse(SearchResponse searchResponse) {
        
    }

    @Override
    public void onFailure(Exception e) {
        
    }
};

Called when the execution is successfully completed.

Called when the whole SearchRequest fails.
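
When callback-style code becomes hard to compose, one option is to bridge the two callbacks to a CompletableFuture. The sketch below shows only the bridging idea in plain Java. The nested Listener interface is a local stand-in that mirrors the onResponse and onFailure methods described above; it is not the Elasticsearch ActionListener type, and AsyncBridge and completing are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;

class AsyncBridge {

    // Stand-in for illustration only: same two callbacks as described above,
    // but NOT the Elasticsearch ActionListener interface.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Returns a listener that completes the given future from either callback. */
    static <T> Listener<T> completing(CompletableFuture<T> future) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        };
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = new CompletableFuture<>();
        future.thenAccept(r -> System.out.println("got: " + r));
        // A real client would invoke the callback; here it is called by hand.
        completing(future).onResponse("response");
    }
}
```

With the real client, the same two method bodies could presumably be placed in an ActionListener<SearchResponse> that is passed to searchAsync.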

SearchResponse

The SearchResponse that is returned by executing the search provides details about the search execution itself as well as access to the documents returned. First, there is useful information about the request execution itself, like the HTTP status code, execution time or whether the request terminated early or timed out:

RestStatus status = searchResponse.status();
TimeValue took = searchResponse.getTook();
Boolean terminatedEarly = searchResponse.isTerminatedEarly();
boolean timedOut = searchResponse.isTimedOut();

Second, the response also provides information about the execution on the shard level by offering statistics about the total number of shards that were affected by the search, and the successful vs. unsuccessful shards. Possible failures can also be handled by iterating over an array of ShardSearchFailures like in the following example:

int totalShards = searchResponse.getTotalShards();
int successfulShards = searchResponse.getSuccessfulShards();
int failedShards = searchResponse.getFailedShards();
for (ShardSearchFailure failure : searchResponse.getShardFailures()) {
    // failures should be handled here
}
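
The status flags and shard counters shown above can be combined into a single check for whether the results are complete. This is a minimal sketch in plain Java; ResponseChecks and isComplete are hypothetical names, and the sketch assumes that the Boolean returned by isTerminatedEarly() may be null when early termination does not apply to the request.

```java
// Hypothetical helper (not part of the Elasticsearch client).
class ResponseChecks {

    /**
     * Returns true only if the search neither timed out, nor terminated early,
     * nor had failing shards.
     */
    static boolean isComplete(boolean timedOut, Boolean terminatedEarly, int failedShards) {
        // Boolean.TRUE.equals(...) is null-safe: a null flag counts as "not terminated early".
        return !timedOut && !Boolean.TRUE.equals(terminatedEarly) && failedShards == 0;
    }

    public static void main(String[] args) {
        System.out.println(isComplete(false, null, 0));
        System.out.println(isComplete(false, Boolean.TRUE, 0));
    }
}
```

The arguments would come from searchResponse.isTimedOut(), searchResponse.isTerminatedEarly() and searchResponse.getFailedShards().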

Retrieving SearchHits

To get access to the returned documents, we need to first get the SearchHits contained in the response:

SearchHits hits = searchResponse.getHits();

The SearchHits provides global information about all hits, like total number of hits or the maximum score:

long totalHits = hits.getTotalHits();
float maxScore = hits.getMaxScore();

Nested inside the SearchHits are the individual search results that can be iterated over:

SearchHit[] searchHits = hits.getHits();
for (SearchHit hit : searchHits) {
    // do something with the SearchHit
}

The SearchHit provides access to basic information like index, type, docId and score of each search hit:

String index = hit.getIndex();
String type = hit.getType();
String id = hit.getId();
float score = hit.getScore();

Furthermore, it lets you get back the document source, either as a simple JSON-String or as a map of key/value pairs. In this map, regular fields are keyed by the field name and contain the field value. Multi-valued fields are returned as lists of objects, nested objects as another key/value map. These cases need to be cast accordingly:

String sourceAsString = hit.getSourceAsString();
Map<String, Object> sourceAsMap = hit.getSourceAsMap();
String documentTitle = (String) sourceAsMap.get("title");
List<Object> users = (List<Object>) sourceAsMap.get("user");
Map<String, Object> innerObject =
        (Map<String, Object>) sourceAsMap.get("innerObject");
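
The unchecked casts above throw a ClassCastException, or yield null, when a field is missing or has a different type than expected. A more defensive variant checks the runtime type before casting. The sketch below uses only the JDK and operates on any Map<String, Object>, such as the one returned by getSourceAsMap(); SourceFields and its get method are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of the Elasticsearch client).
class SourceFields {

    /** Returns the value for key if it is present and of the expected type, else empty. */
    static <T> Optional<T> get(Map<String, Object> source, String key, Class<T> type) {
        Object value = source.get(key);
        // isInstance(null) is false, so missing keys and null values yield empty.
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by hit.getSourceAsMap().
        Map<String, Object> source = new HashMap<>();
        source.put("title", "Hello");
        source.put("user", Arrays.asList("kimchy", "elastic"));
        System.out.println(get(source, "title", String.class).orElse("(no title)"));
        System.out.println(get(source, "user", List.class).isPresent());
    }
}
```

Only the container type is checked this way; the element types inside a List or nested Map still need their own checks.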

Retrieving Highlighting

If requested, highlighted text fragments can be retrieved from each SearchHit in the result. The hit object offers access to a map of field names to HighlightField instances, each of which contains one or many highlighted text fragments:

SearchHits hits = searchResponse.getHits();
for (SearchHit hit : hits.getHits()) {
    Map<String, HighlightField> highlightFields = hit.getHighlightFields();
    HighlightField highlight = highlightFields.get("title"); 
    Text[] fragments = highlight.fragments();  
    String fragmentString = fragments[0].string();
}

Get the highlighting for the title field

Get one or many fragments containing the highlighted field content

Retrieving Aggregations

Aggregations can be retrieved from the SearchResponse by first getting the root of the aggregation tree, the Aggregations object, and then getting the aggregation by name.

Aggregations aggregations = searchResponse.getAggregations();
Terms byCompanyAggregation = aggregations.get("by_company"); 
Bucket elasticBucket = byCompanyAggregation.getBucketByKey("Elastic"); 
Avg averageAge = elasticBucket.getAggregations().get("average_age"); 
double avg = averageAge.getValue();

Get the by_company terms aggregation

Get the bucket that is keyed with Elastic

Get the average_age sub-aggregation from that bucket

Note that if you access aggregations by name, you need to specify the aggregation interface according to the type of aggregation you requested, otherwise a ClassCastException will be thrown:

Range range = aggregations.get("by_company"); 

This will throw an exception because "by_company" is a terms aggregation but we try to retrieve it as a range aggregation

It is also possible to access all aggregations as a map that is keyed by the aggregation name. In this case, the cast to the proper aggregation interface needs to happen explicitly:

Map<String, Aggregation> aggregationMap = aggregations.getAsMap();
Terms companyAggregation = (Terms) aggregationMap.get("by_company");

There are also getters that return all top level aggregations as a list:

List<Aggregation> aggregationList = aggregations.asList();

And last but not least you can iterate over all aggregations and then e.g. decide how to further process them based on their type:

for (Aggregation agg : aggregations) {
    String type = agg.getType();
    if (type.equals(TermsAggregationBuilder.NAME)) {
        Bucket elasticBucket = ((Terms) agg).getBucketByKey("Elastic");
        long numberOfDocs = elasticBucket.getDocCount();
    }
}

Retrieving Suggestions

To get back the suggestions from a SearchResponse, use the Suggest object as an entry point and then retrieve the nested suggestion objects:

Suggest suggest = searchResponse.getSuggest(); 
TermSuggestion termSuggestion = suggest.getSuggestion("suggest_user"); 
for (TermSuggestion.Entry entry : termSuggestion.getEntries()) { 
    for (TermSuggestion.Entry.Option option : entry) { 
        String suggestText = option.getText().string();
    }
}

Use the Suggest class to access suggestions

Suggestions can be retrieved by name. You need to assign them to the correct type of Suggestion class (here TermSuggestion), otherwise a ClassCastException is thrown

Iterate over the suggestion entries

Iterate over the options in one entry

Retrieving Profiling Results

Profiling results are retrieved from a SearchResponse using the getProfileResults() method. This method returns a Map containing a ProfileShardResult object for every shard involved in the SearchRequest execution. ProfileShardResult objects are stored in the Map using a key that uniquely identifies the shard the profile result corresponds to.

Here is sample code that shows how to iterate over all the profiling results of every shard:

Map<String, ProfileShardResult> profilingResults =
        searchResponse.getProfileResults(); 
for (Map.Entry<String, ProfileShardResult> profilingResult : profilingResults.entrySet()) { 
    String key = profilingResult.getKey(); 
    ProfileShardResult profileShardResult = profilingResult.getValue(); 
}

Retrieve the Map of ProfileShardResult from the SearchResponse

Profiling results can be retrieved by shard’s key if the key is known, otherwise it might be simpler to iterate over all the profiling results

Retrieve the key that identifies which shard the ProfileShardResult belongs to

Retrieve the ProfileShardResult for the given shard

The ProfileShardResult object itself contains one or more query profile results, one for each query executed against the underlying Lucene index:

List<QueryProfileShardResult> queryProfileShardResults =
        profileShardResult.getQueryProfileResults(); 
for (QueryProfileShardResult queryProfileResult : queryProfileShardResults) { 

}

Retrieve the list of QueryProfileShardResult

Iterate over each QueryProfileShardResult

Each QueryProfileShardResult gives access to the detailed query tree execution, returned as a list of ProfileResult objects:

for (ProfileResult profileResult : queryProfileResult.getQueryResults()) {
    String queryName = profileResult.getQueryName();
    long queryTimeInNanos = profileResult.getTime();
    List<ProfileResult> profiledChildren = profileResult.getProfiledChildren();
}

Iterate over the profile results

Retrieve the name of the Lucene query

Retrieve the time in nanoseconds spent executing the Lucene query

Retrieve the profile results for the sub-queries (if any)

The Rest API documentation contains more information about Profiling Queries with a description of the query profiling information.
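
Because each profile result can carry child results, the query profile forms a tree that is naturally processed recursively. The sketch below shows a depth-first walk in plain Java. The nested Node class is a stand-in that holds a name, a time and a list of children, mirroring the three getters used above; it is not the Elasticsearch ProfileResult class, and ProfileTree and describe are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProfileTree {

    // Stand-in for illustration only, NOT the Elasticsearch ProfileResult class.
    static final class Node {
        final String name;
        final long time;
        final List<Node> children;

        Node(String name, long time, Node... children) {
            this.name = name;
            this.time = time;
            this.children = Arrays.asList(children);
        }
    }

    /** Depth-first walk that produces one indented line per node. */
    static List<String> describe(Node root) {
        List<String> lines = new ArrayList<>();
        describe(root, 0, lines);
        return lines;
    }

    private static void describe(Node node, int depth, List<String> out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        out.add(indent + node.name + " time=" + node.time);
        for (Node child : node.children) {
            describe(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("BooleanQuery", 30,
                new Node("TermQuery", 10),
                new Node("TermQuery", 15));
        describe(root).forEach(System.out::println);
    }
}
```

Against the real client, the same recursion would presumably call getQueryName(), getTime() and getProfiledChildren() on each ProfileResult instead of reading the stand-in fields.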

The QueryProfileShardResult also gives access to the profiling information for the Lucene collectors:

CollectorResult collectorResult = queryProfileResult.getCollectorResult();
String collectorName = collectorResult.getName();
long collectorTimeInNanos = collectorResult.getTime();
List<CollectorResult> profiledChildren = collectorResult.getProfiledChildren();

Retrieve the profiling result of the Lucene collector

Retrieve the name of the Lucene collector

Retrieve the time in nanoseconds spent executing the Lucene collector

Retrieve the profile results for the sub-collectors (if any)

The Rest API documentation contains more information about profiling information for Lucene collectors.

In a very similar manner to the query tree execution, the ProfileShardResult object gives access to the detailed aggregations tree execution:

AggregationProfileShardResult aggsProfileResults =
        profileShardResult.getAggregationProfileResults(); 
for (ProfileResult profileResult : aggsProfileResults.getProfileResults()) { 
    String aggName = profileResult.getQueryName(); 
    long aggTimeInNanos = profileResult.getTime();
    List<ProfileResult> profiledChildren = profileResult.getProfiledChildren();
}

Retrieve the AggregationProfileShardResult

Iterate over the aggregation profile results

Retrieve the type of the aggregation (corresponds to Java class used to execute the aggregation)

Retrieve the time in nanoseconds spent executing the aggregation

Retrieve the profile results for the sub-aggregations (if any)

The Rest API documentation contains more information about Profiling Aggregations.