Percentiles Bucket Aggregation

Warning

This functionality is experimental and may be changed or removed completely in a future release. Elastic will take a best effort approach to fix any issues, but experimental features are not subject to the support SLA of official GA features.

A sibling pipeline aggregation which calculates percentiles across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.

Syntax

A percentiles_bucket aggregation looks like this in isolation:

{
    "percentiles_bucket": {
        "buckets_path": "the_sum"
    }
}

Table 8. percentiles_bucket Parameters

| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| buckets_path | The path to the buckets we wish to calculate percentiles for (see buckets_path Syntax for more details) | Required | |
| gap_policy | The policy to apply when gaps are found in the data (see Dealing with gaps in the data for more details) | Optional | skip |
| format | Format to apply to the output value of this aggregation | Optional | null |
| percents | The list of percentiles to calculate | Optional | [ 1, 5, 25, 50, 75, 95, 99 ] |
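
As an illustration of how these parameters fit together, the following Python sketch assembles a percentiles_bucket body as a plain dictionary. The helper name build_percentiles_bucket is hypothetical and is not part of any Elasticsearch client. The insert_zeros gap policy and the 0.00 format pattern are example values chosen for illustration only. Parameters that are left unset are omitted from the body, so the defaults from the table apply.

```python
import json


def build_percentiles_bucket(buckets_path, percents=None, gap_policy=None, fmt=None):
    """Return a percentiles_bucket body (hypothetical helper, for illustration only)."""
    if not buckets_path:
        raise ValueError("buckets_path is required")
    body = {"buckets_path": buckets_path}
    # Optional parameters are added only when set; otherwise the defaults apply.
    if percents is not None:
        body["percents"] = list(percents)
    if gap_policy is not None:
        body["gap_policy"] = gap_policy
    if fmt is not None:
        body["format"] = fmt
    return {"percentiles_bucket": body}


# Only the required parameter.
minimal = build_percentiles_bucket("the_sum")

# All four parameters; gap_policy and fmt are example values.
full = build_percentiles_bucket(
    "sales_per_month>sales",
    percents=[25.0, 50.0, 75.0],
    gap_policy="insert_zeros",
    fmt="0.00",
)
print(json.dumps(full, indent=4))
```

The minimal call produces the same body as the isolated example at the top of this section.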


The following snippet calculates percentiles of the total monthly sales buckets:

{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "percentiles_monthly_sales": {
            "percentiles_bucket": {
                "buckets_path": "sales_per_month>sales",
                "percents": [ 25.0, 50.0, 75.0 ]
            }
        }
    }
}

buckets_path instructs this percentiles_bucket aggregation that we want to calculate percentiles for the sales aggregation in the sales_per_month date histogram.

percents specifies which percentiles we wish to calculate; in this case, the 25th, 50th and 75th percentiles.

The response may look like this:

{
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375
               }
            }
         ]
      },
      "percentiles_monthly_sales": {
         "values": {
            "25.0": 60,
            "50.0": 375,
            "75.0": 550
         }
      }
   }
}
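
Note that the keys under values are strings, not numbers. The following Python sketch shows one way to read such a response, assuming the body has already been received as JSON text. A trimmed copy of the example response is inlined so that the sketch runs on its own; the variable names are illustrative.

```python
import json

# Trimmed copy of the example response, inlined so the sketch is self-contained.
response = json.loads("""
{
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            { "key_as_string": "2015/01/01 00:00:00", "doc_count": 3, "sales": { "value": 550 } },
            { "key_as_string": "2015/02/01 00:00:00", "doc_count": 2, "sales": { "value": 60 } },
            { "key_as_string": "2015/03/01 00:00:00", "doc_count": 2, "sales": { "value": 375 } }
         ]
      },
      "percentiles_monthly_sales": {
         "values": { "25.0": 60, "50.0": 375, "75.0": 550 }
      }
   }
}
""")

aggs = response["aggregations"]

# The per-bucket metric values that the pipeline aggregation was computed over.
monthly_sales = [bucket["sales"]["value"] for bucket in aggs["sales_per_month"]["buckets"]]

# Convert the string keys ("25.0") to floats for easier lookup.
percentiles = {
    float(key): value
    for key, value in aggs["percentiles_monthly_sales"]["values"].items()
}
median = percentiles[50.0]

print(monthly_sales)
print(percentiles)
```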

Percentiles_bucket implementation

The percentiles_bucket aggregation returns the nearest input data point that is not greater than the requested percentile; it does not interpolate between data points.
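
The no-interpolation behaviour can be illustrated with a short Python sketch. The function name and the index rule used here (the floor of percent / 100 times the number of values, clamped to the last element) are assumptions made for illustration and are not taken from the Elasticsearch source. The rule was chosen because it reproduces the values in the example response above.

```python
def select_percentiles(values, percents):
    """Pick one existing data point per requested percentile, without interpolation."""
    if not values:
        raise ValueError("at least one bucket value is required")
    # The whole data set is copied and sorted in memory before any lookup.
    data = sorted(values)
    result = {}
    for p in percents:
        if not 0 <= p <= 100:
            raise ValueError("percent must be between 0 and 100")
        # Assumed index rule: floor(p / 100 * n), clamped so that p = 100
        # maps to the largest value instead of running past the end.
        index = min(int(p / 100.0 * len(data)), len(data) - 1)
        result[p] = data[index]
    return result


monthly_sales = [550, 60, 375]
picked = select_percentiles(monthly_sales, [25.0, 50.0, 75.0])
print(picked)  # {25.0: 60, 50.0: 375, 75.0: 550}
```

Every returned value is one of the inputs, so a percentile that falls between two buckets is reported as an actual bucket value rather than a blend of its neighbours.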

The percentiles are calculated exactly and are not approximations (unlike the Percentiles Metric). This means the implementation maintains an in-memory, sorted list of your data to compute the percentiles before discarding the data. You may run into memory pressure if you attempt to calculate percentiles over many millions of data points in a single percentiles_bucket.