Hadoop Metrics

Hadoop records a set of metric counters for each job that it runs. elasticsearch-hadoop builds on this and, by leveraging the Hadoop Counters infrastructure, provides metrics about its own activity for each job run. During each run, elasticsearch-hadoop sends statistics from each running task instance; the Map/Reduce infrastructure aggregates them and makes them available through the standard Hadoop APIs.
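
The aggregation described above can be pictured with a small, self-contained sketch. It does not use the Hadoop API: the `CounterAggregation` class and its `TaskStat` enum are hypothetical names chosen for illustration, and the values are made up. The sketch only models the idea that each task reports its own counts and the totals are summed per counter at the job level.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class CounterAggregation {

    // Hypothetical stand-in for a counter enum; not the elasticsearch-hadoop one.
    enum TaskStat { DOCS_SENT, BULK_TOTAL }

    // Sums the per-task values of every counter into one job-level total.
    static Map<TaskStat, Long> aggregate(List<Map<TaskStat, Long>> perTask) {
        Map<TaskStat, Long> totals = new EnumMap<>(TaskStat.class);
        for (Map<TaskStat, Long> task : perTask) {
            for (Map.Entry<TaskStat, Long> e : task.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Illustrative values reported by two task instances.
        Map<TaskStat, Long> task1 = new EnumMap<>(TaskStat.class);
        task1.put(TaskStat.DOCS_SENT, 500L);
        task1.put(TaskStat.BULK_TOTAL, 10L);

        Map<TaskStat, Long> task2 = new EnumMap<>(TaskStat.class);
        task2.put(TaskStat.DOCS_SENT, 493L);
        task2.put(TaskStat.BULK_TOTAL, 10L);

        // Prints the summed, job-level view of both tasks.
        System.out.println(aggregate(List.of(task1, task2)));
    }
}
```

In a real job this summing is done by the Map/Reduce infrastructure itself; user code only reads the aggregated totals.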

elasticsearch-hadoop provides the following counters, available under the org.elasticsearch.hadoop.mr.Counter enum:

Table 11. Available counters

Counter name (as shown in the job report)    Purpose

Data focused
Bytes Sent                                   Total number of data/communication bytes sent over the network to Elasticsearch
Bytes Accepted                               Data/documents accepted by Elasticsearch, in bytes
Bytes Retried                                Data/documents rejected by Elasticsearch, in bytes
Bytes Received                               Data/documents received from Elasticsearch, in bytes

Document focused
Documents Sent                               Number of documents sent over the network to Elasticsearch
Documents Accepted                           Number of documents sent and accepted by Elasticsearch
Documents Retried                            Number of documents sent but rejected by Elasticsearch
Documents Received                           Number of documents received from Elasticsearch

Network focused
Bulk Total                                   Number of bulk requests made to Elasticsearch
Bulk Retries                                 Number of bulk retries (caused by document rejections)
Scroll Total                                 Number of scrolls pulled from Elasticsearch
Node Retries                                 Number of node fallbacks (caused by network errors)
Network Retries                              Number of network retries (caused by network errors)

Time focused
Network Total Time(ms)                       Overall time (in ms) spent over the network
Bulk Total Time(ms)                          Time (in ms) spent over the network by the bulk requests
Bulk Retries Total Time(ms)                  Time (in ms) spent over the network retrying bulk requests
Scroll Total Time(ms)                        Time (in ms) spent over the network reading the scroll requests

The counters can be used programmatically through either the mapred or the mapreduce API, depending on which one the job uses. Either way, elasticsearch-hadoop reports them automatically, without any user intervention: the statistics appear at the end of the job run, for example:

13:55:08,100  INFO main mapreduce.Job - Job job_local127738678_0013 completed successfully
13:55:08,101  INFO main mapreduce.Job - Counters: 35
Elasticsearch Hadoop Counters
    Bulk Retries=0
    Bulk Retries Total Time(ms)=0
    Bulk Total=20
    Bulk Total Time(ms)=518
    Bytes Accepted=159129
    Bytes Sent=159129
    Bytes Received=79921
    Bytes Retried=0
    Documents Accepted=993
    Documents Sent=993
    Documents Received=0
    Documents Retried=0
    Network Retries=0
    Network Total Time(ms)=937
    Node Retries=0
    Scroll Total=0
    Scroll Total Time(ms)=0
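
A report like the one above can also be post-processed to derive figures such as documents per bulk request. The following is a minimal sketch using only the Java standard library; the `CounterReport` class and its methods are hypothetical helpers written for this example, not part of elasticsearch-hadoop or Hadoop, and it assumes each counter is printed as a `Name=value` line as in the sample output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CounterReport {

    // Parses "Name=value" lines from a job report into a name -> value map.
    // Lines without '=' or with a non-numeric value are skipped.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue; // header or log line, not a counter
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            try {
                counters.put(name, Long.parseLong(value));
            } catch (NumberFormatException ignored) {
                // value is not a number, so this is not a counter line
            }
        }
        return counters;
    }

    // Returns numerator / denominator, or 0 when the denominator is 0.
    static double ratio(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // A few lines copied from the sample report.
        Map<String, Long> c = parse(List.of(
                "Elasticsearch Hadoop Counters",
                "    Bulk Total=20",
                "    Bulk Total Time(ms)=518",
                "    Documents Sent=993"));

        System.out.printf("documents per bulk request: %.2f%n",
                ratio(c.get("Documents Sent"), c.get("Bulk Total")));
        System.out.printf("ms per bulk request: %.2f%n",
                ratio(c.get("Bulk Total Time(ms)"), c.get("Bulk Total")));
    }
}
```

Parsing log text is only a convenience for ad hoc inspection; inside a job driver, reading the counters through the standard Hadoop APIs avoids depending on the report layout.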