Percentiles Bucket Aggregation
A sibling pipeline aggregation which calculates percentiles across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
Syntax
A percentiles_bucket aggregation looks like this in isolation:
{ "percentiles_bucket": { "buckets_path": "the_sum" } }
Table 14. percentiles_bucket Parameters

Parameter Name | Description | Required | Default Value
buckets_path | The path to the buckets we wish to find the percentiles for (see buckets_path Syntax for more details) | Required |
gap_policy | The policy to apply when gaps are found in the data (see Dealing with gaps in the data for more details) | Optional | skip
format | Format to apply to the output value of this aggregation | Optional | null
percents | The list of percentiles to calculate | Optional | [ 1, 5, 25, 50, 75, 95, 99 ]
keyed | Flag which returns the percentiles as a hash instead of an array of key-value pairs | Optional | true
The following snippet calculates the percentiles for the total monthly sales buckets:
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": { "field": "price" }
        }
      }
    },
    "percentiles_monthly_sales": {
      "percentiles_bucket": {
        "buckets_path": "sales_per_month>sales",
        "percents": [ 25.0, 50.0, 75.0 ]
      }
    }
  }
}

The response may look like this:
{
  "took": 11,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "sales_per_month": {
      "buckets": [
        {
          "key_as_string": "2015/01/01 00:00:00",
          "key": 1420070400000,
          "doc_count": 3,
          "sales": { "value": 550.0 }
        },
        {
          "key_as_string": "2015/02/01 00:00:00",
          "key": 1422748800000,
          "doc_count": 2,
          "sales": { "value": 60.0 }
        },
        {
          "key_as_string": "2015/03/01 00:00:00",
          "key": 1425168000000,
          "doc_count": 2,
          "sales": { "value": 375.0 }
        }
      ]
    },
    "percentiles_monthly_sales": {
      "values": {
        "25.0": 375.0,
        "50.0": 375.0,
        "75.0": 550.0
      }
    }
  }
}
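The values under percentiles_monthly_sales can be reproduced by hand from the three monthly sums. The minimal Python sketch below assumes that each percentile is the sorted data point at index percent / 100 * (n - 1), rounded half up. That rule matches the response above, but it is an illustrative assumption rather than the Elasticsearch source, and the function name nearest_rank_percentile is hypothetical.

```python
import math


def nearest_rank_percentile(values, percent):
    """Return an existing data point for the requested percentile (no interpolation).

    Assumed rule (hypothetical, chosen to match the example response):
    index = percent / 100 * (n - 1), rounded half up.
    """
    if not values:
        raise ValueError("need at least one data point")
    data = sorted(values)  # exact percentiles require the full, sorted data set
    position = percent / 100.0 * (len(data) - 1)
    # Round half up explicitly; Python's built-in round() rounds half to even.
    index = math.floor(position + 0.5)
    return data[index]


# Monthly "sales" sums from the example response, in bucket order.
monthly_sales = [550.0, 60.0, 375.0]
result = {str(p): nearest_rank_percentile(monthly_sales, p) for p in [25.0, 50.0, 75.0]}
print(result)  # → {'25.0': 375.0, '50.0': 375.0, '75.0': 550.0}
```

With three data points the 25th percentile falls exactly halfway between the first and second sorted values (index 0.5), and rounding half up selects 375.0, which is the value the response reports.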
Percentiles_bucket implementation
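The Percentiles Bucket returns the input data point nearest in rank to the requested percentile (when two points are equally near, the larger one is returned); it does not interpolate between data points. For example, the 25th percentile of the sorted monthly totals 60.0, 375.0 and 550.0 is reported as 375.0, an actual data point, rather than a value somewhere between 60.0 and 375.0.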
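To make the difference from interpolation concrete, here is a minimal Python sketch comparing a nearest-data-point percentile with linear interpolation on the monthly totals from the example response. The helper names are hypothetical; the nearest-point rule (rank index percent / 100 * (n - 1), rounded half up) is an assumption that matches the example output, and the interpolating variant uses the same rule as NumPy's default (linear) method for numpy.percentile.

```python
import math


def nearest_point(sorted_data, percent):
    # Assumed rule: round the rank index half up and return that existing data point.
    position = percent / 100.0 * (len(sorted_data) - 1)
    return sorted_data[math.floor(position + 0.5)]


def linear_interpolation(sorted_data, percent):
    # Blend the two neighbouring data points in proportion to the fractional rank.
    position = percent / 100.0 * (len(sorted_data) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_data) - 1)
    fraction = position - lower
    return sorted_data[lower] + fraction * (sorted_data[upper] - sorted_data[lower])


data = sorted([550.0, 60.0, 375.0])
print(nearest_point(data, 25.0))         # 375.0, an actual data point
print(linear_interpolation(data, 25.0))  # 217.5, a value not present in the data
```

Returning an existing data point keeps every reported percentile equal to one of the input bucket values, at the cost of jumping between values when the data set is small.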
The percentiles are calculated exactly and are not an approximation (unlike the Percentiles Metric). This means the implementation keeps an in-memory, sorted list of your data to compute the percentiles before discarding the data. You may run into memory pressure issues if you attempt to calculate percentiles over many millions of data points in a single percentiles_bucket.