Ranking Evaluation API

The rankEval method evaluates the quality of ranked search results over a set of search requests. Given a set of manually rated documents for each search request, ranking evaluation performs a multi search request and calculates information retrieval metrics such as mean reciprocal rank, precision or discounted cumulative gain on the returned results.
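To make the metrics named above concrete, here is a minimal, self-contained sketch of how precision at k, reciprocal rank and discounted cumulative gain can be computed from the ratings of one ranked result list. The class and method names (RankingMetricsSketch, precisionAtK, reciprocalRank, dcg) are illustrative assumptions and not part of the client. The relevance threshold of 1, the treatment of unrated documents as rating 0, and the gain formula (2^rating - 1) / log2(rank + 1) are common textbook choices that may differ from what the server computes.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: these helpers are NOT part of the Elasticsearch client.
class RankingMetricsSketch {

    // Fraction of the top k results whose rating reaches the relevance threshold.
    public static double precisionAtK(List<Integer> ratings, int k, int relevantThreshold) {
        int retrieved = Math.min(k, ratings.size());
        if (retrieved == 0) {
            return 0.0;
        }
        int relevant = 0;
        for (int i = 0; i < retrieved; i++) {
            if (ratings.get(i) >= relevantThreshold) {
                relevant++;
            }
        }
        return (double) relevant / retrieved;
    }

    // 1 / (1-based rank of the first relevant result), or 0 if none is relevant.
    // Mean reciprocal rank is the average of this value over all queries.
    public static double reciprocalRank(List<Integer> ratings, int relevantThreshold) {
        for (int i = 0; i < ratings.size(); i++) {
            if (ratings.get(i) >= relevantThreshold) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }

    // Discounted cumulative gain: sum of (2^rating - 1) / log2(rank + 1), rank starting at 1.
    public static double dcg(List<Integer> ratings) {
        double sum = 0.0;
        for (int i = 0; i < ratings.size(); i++) {
            double gain = Math.pow(2, ratings.get(i)) - 1;
            double discount = Math.log(i + 2) / Math.log(2);
            sum += gain / discount;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Ratings of the returned hits in ranked order; 0 stands for "unrated or irrelevant".
        List<Integer> ratings = Arrays.asList(0, 1, 0);
        System.out.println("precision@3     = " + precisionAtK(ratings, 3, 1));
        System.out.println("reciprocal rank = " + reciprocalRank(ratings, 1));
        System.out.println("DCG             = " + dcg(ratings));
    }
}
```

With one relevant document among three hits, the precision in this sketch is one third, which is the same value the response example further down asserts.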

Ranking Evaluation Request

To build a RankEvalRequest, you first need to create an evaluation specification (RankEvalSpec). This specification defines the evaluation metric to calculate, as well as a list of rated documents per search request. The ranking evaluation request is then created from the specification and a list of target indices:

EvaluationMetric metric = new PrecisionAtK();                 
List<RatedDocument> ratedDocs = new ArrayList<>();
ratedDocs.add(new RatedDocument("posts", "1", 1));            
SearchSourceBuilder searchQuery = new SearchSourceBuilder();
searchQuery.query(QueryBuilders.matchQuery("user", "kimchy"));
RatedRequest ratedRequest =                                   
        new RatedRequest("kimchy_query", ratedDocs, searchQuery);
List<RatedRequest> ratedRequests = Arrays.asList(ratedRequest);
RankEvalSpec specification =
        new RankEvalSpec(ratedRequests, metric);              
RankEvalRequest request =                                     
        new RankEvalRequest(specification, new String[] { "posts" });

Define the metric used in the evaluation

Add rated documents, specified by index name, id and rating

Create the search query to evaluate

Combine the request id, the rated documents and the search query into a RatedRequest

Create the ranking evaluation specification

Create the ranking evaluation request

Synchronous Execution

The rankEval method executes a RankEvalRequest synchronously:

RankEvalResponse response = client.rankEval(request);

Asynchronous Execution

The rankEvalAsync method executes a RankEvalRequest asynchronously, calling the provided ActionListener when the response is ready.

client.rankEvalAsync(request, listener); 

The RankEvalRequest to execute and the ActionListener to use when the execution completes

The asynchronous method does not block and returns immediately. Once the request completes, the ActionListener is called back through its onResponse method if the execution succeeded, or through its onFailure method if it failed.

A typical listener for RankEvalResponse looks like:

ActionListener<RankEvalResponse> listener = new ActionListener<RankEvalResponse>() {
    @Override
    public void onResponse(RankEvalResponse response) {
    }
    @Override
    public void onFailure(Exception e) {
    }
};

Called when the execution is successfully completed.

Called when the whole RankEvalRequest fails.
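When a caller needs to wait for the asynchronous result, for example in a test, one option is to bridge the two callbacks into a CompletableFuture. The sketch below uses a stand-in Listener interface with the same two methods as the ActionListener described above, so that it compiles without the client on the classpath. Listener, futureListener and fakeAsyncCall are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative only: a stand-in for the callback pattern, NOT the client's own classes.
class ListenerBridgeSketch {

    // Assumption: mirrors the two callbacks of the ActionListener described above.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Returns a listener that completes the given future from whichever callback fires.
    public static <T> Listener<T> futureListener(CompletableFuture<T> future) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        };
    }

    // Hypothetical asynchronous call: returns immediately and invokes the
    // listener later from another thread.
    public static void fakeAsyncCall(boolean succeed, Listener<String> listener) {
        new Thread(() -> {
            if (succeed) {
                listener.onResponse("ok");
            } else {
                listener.onFailure(new IllegalStateException("request failed"));
            }
        }).start();
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = new CompletableFuture<>();
        fakeAsyncCall(true, futureListener(future));
        // join() blocks until one of the callbacks has completed the future.
        System.out.println(future.join());
    }
}
```

In real code the same bridge would wrap an ActionListener<RankEvalResponse> passed to rankEvalAsync. Blocking on the future gives up the benefit of the asynchronous call, so this is mainly useful in tests.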


Ranking Evaluation Response

The RankEvalResponse returned by executing the request contains the overall evaluation score, the score of each individual search request in the set of queries, and, for each partial result, detailed information about the search hits and about the metric calculation.

double evaluationResult = response.getEvaluationResult();   
assertEquals(1.0 / 3.0, evaluationResult, 0.0);
Map<String, EvalQueryQuality> partialResults =
        response.getPartialResults();
EvalQueryQuality evalQuality =
        partialResults.get("kimchy_query");
assertEquals("kimchy_query", evalQuality.getId());
double qualityLevel = evalQuality.getQualityLevel();        
assertEquals(1.0 / 3.0, qualityLevel, 0.0);
List<RatedSearchHit> hitsAndRatings = evalQuality.getHitsAndRatings();
RatedSearchHit ratedSearchHit = hitsAndRatings.get(0);
assertEquals("2", ratedSearchHit.getSearchHit().getId());   
MetricDetail metricDetails = evalQuality.getMetricDetails();
String metricName = metricDetails.getMetricName();
assertEquals(PrecisionAtK.NAME, metricName);                
PrecisionAtK.Detail detail = (PrecisionAtK.Detail) metricDetails;
assertEquals(1, detail.getRelevantRetrieved());             
assertEquals(3, detail.getRetrieved());

The overall evaluation result

Partial results that are keyed by their query id

The metric score for each partial result

Rated search hits contain a fully fledged SearchHit

Rated search hits also contain an Optional<Integer> rating that is not present if the document did not get a rating in the request

Metric details are named after the metric used in the request

After casting to the metric used in the request, the metric details offer insight into parts of the metric calculation
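The asserted values fit together arithmetically: one relevant document out of three retrieved gives a precision of 1/3, and with a single rated request the overall result equals that request's score. The sketch below reproduces this calculation under the assumption that the overall score is the arithmetic mean of the per-query scores. ScoreArithmeticSketch and its methods are hypothetical helpers, not client classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: reproduces the arithmetic behind the asserted values.
class ScoreArithmeticSketch {

    // Precision as relevant retrieved documents divided by all retrieved documents.
    public static double precision(int relevantRetrieved, int retrieved) {
        return retrieved == 0 ? 0.0 : (double) relevantRetrieved / retrieved;
    }

    // Assumption: the overall score is the mean of the per-query scores.
    public static double meanScore(Map<String, Double> scoresByQueryId) {
        if (scoresByQueryId.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double score : scoresByQueryId.values()) {
            sum += score;
        }
        return sum / scoresByQueryId.size();
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        // getRelevantRetrieved() == 1 and getRetrieved() == 3 in the example above.
        scores.put("kimchy_query", precision(1, 3));
        System.out.println("per-query score = " + scores.get("kimchy_query"));
        System.out.println("overall score   = " + meanScore(scores));
    }
}
```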