Bucket correlation aggregation
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
A sibling pipeline aggregation that executes a correlation function on the configured sibling multi-bucket aggregation.
Parameters
buckets_path
(Required, string) Path to the buckets that contain one set of values to correlate. For syntax, see buckets_path Syntax.
function
(Required, object) The correlation function to execute.
Properties of function
count_correlation
(Required*, object) The configuration to calculate a count correlation. This function is designed for determining the correlation of a term value and a given metric. Consequently, it must meet the following requirements:

- The buckets_path must point to a _count metric.
- The total count of all the buckets_path count values must be less than or equal to indicator.doc_count.
- When using this function, the required indicator values must first be gathered in an initial calculation.

Properties of count_correlation

indicator
(Required, object) The indicator with which to correlate the configured buckets_path values.

Properties of indicator

expectations
(Required, array) An array of numbers with which to correlate the configured buckets_path values. The length of this array must always equal the number of buckets returned by the buckets_path.
fractions
(Optional, array) An array of fractions to use when averaging and calculating variance. Use this if the precalculated data and the buckets_path have known gaps. If provided, the length of fractions must equal the length of expectations.
doc_count
(Required, integer) The total number of documents that initially created the expectations. It must be greater than or equal to the sum of all values in the buckets_path, because this is the originating superset of data to which the term values are correlated.
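The length and doc_count rules above can be checked on the client side before a request is sent. The following Python sketch assumes the bucket counts and the indicator values are already available as plain lists; validate_count_correlation is a hypothetical helper, not part of any Elasticsearch client, and it only checks the documented constraints without computing the correlation itself.

```python
from typing import List, Optional, Sequence


def validate_count_correlation(
    bucket_counts: Sequence[int],
    expectations: Sequence[float],
    doc_count: int,
    fractions: Optional[Sequence[float]] = None,
) -> List[str]:
    """Return a list of constraint violations; an empty list means none were found.

    Hypothetical helper: it mirrors the documented rules for count_correlation,
    it does not call Elasticsearch.
    """
    problems: List[str] = []

    # expectations must have exactly one entry per bucket returned by buckets_path.
    if len(expectations) != len(bucket_counts):
        problems.append(
            f"expectations has {len(expectations)} entries, "
            f"but buckets_path returned {len(bucket_counts)} buckets"
        )

    # fractions is optional, but when present it must match expectations in length.
    if fractions is not None and len(fractions) != len(expectations):
        problems.append(
            f"fractions has {len(fractions)} entries, "
            f"but expectations has {len(expectations)}"
        )

    # doc_count is the originating superset, so the summed bucket counts may not exceed it.
    total = sum(bucket_counts)
    if total > doc_count:
        problems.append(f"sum of bucket counts ({total}) exceeds doc_count ({doc_count})")

    return problems


if __name__ == "__main__":
    # Counts for the "1.0" term and the indicator values from the example on this page.
    counts = [0, 1, 9, 0, 0, 0, 10, 20, 20, 20, 20]
    expectations = [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775]
    print(validate_count_correlation(counts, expectations, doc_count=200))
```

With the example values every check passes, because each term contributes 100 documents and the indicator's doc_count is 200.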

Syntax
A bucket_correlation aggregation looks like this in isolation:
Example
The following snippet correlates the individual terms in the version field with the latency metric. The precalculation of the latency indicator values, which was done using the percentiles aggregation, is not shown. This example uses only the percentiles in steps of 10.
POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": {
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": {
            "field": "latency",
            "ranges": [
              { "to": 0.0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "bucket_correlation": {
          "bucket_correlation": {
            "buckets_path": "latency_ranges>_count",
            "function": {
              "count_correlation": {
                "indicator": {
                  "expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
                  "doc_count": 200
                }
              }
            }
          }
        }
      }
    }
  }
}
The terms aggregation, whose buckets contain the range aggregation and the bucket correlation aggregation. Both are used to calculate the correlation of the term values with the latency.

The range aggregation on the latency field. The ranges were created from the percentiles of the latency field.

The bucket correlation aggregation, which calculates the correlation between the number of term values within each range and the previously calculated indicator values.
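In the example request, each expectation is the midpoint of its latency range, and the two open-ended ranges use their single finite bound (0 and 1775). The Python sketch below reproduces that pattern from a sorted list of percentile boundaries. The boundary list is copied from the request; the midpoint rule is inferred from the example values and is not a documented requirement of count_correlation. The helper name ranges_and_expectations is hypothetical, and the percentiles query that would produce the boundaries is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Range = Dict[str, float]


def ranges_and_expectations(boundaries: Sequence[float]) -> Tuple[List[Range], List[float]]:
    """Build range-aggregation ranges plus one expectation per range.

    Assumption: expectations are range midpoints, and open-ended ranges use
    their finite bound, matching the example on this page.
    """
    if not boundaries:
        raise ValueError("at least one boundary is required")

    # Open-ended lower range: everything below the first boundary.
    ranges: List[Range] = [{"to": boundaries[0]}]
    expectations: List[float] = [boundaries[0]]

    # One closed range per pair of consecutive boundaries, expectation at the midpoint.
    for lo, hi in zip(boundaries, boundaries[1:]):
        ranges.append({"from": lo, "to": hi})
        expectations.append((lo + hi) / 2)

    # Open-ended upper range: everything from the last boundary upward.
    ranges.append({"from": boundaries[-1]})
    expectations.append(boundaries[-1])
    return ranges, expectations


if __name__ == "__main__":
    # Boundaries taken from the ranges in the example request.
    bounds = [0, 105, 225, 445, 665, 885, 1115, 1335, 1555, 1775]
    ranges, expectations = ranges_and_expectations(bounds)
    print(len(ranges), expectations)
```

Generating both lists from the same boundaries keeps the number of expectations equal to the number of range buckets, which the expectations parameter requires.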
The response may look like the following:
{
  "aggregations" : {
    "buckets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              { "key" : "*-0.0", "to" : 0.0, "doc_count" : 0 },
              { "key" : "0.0-105.0", "from" : 0.0, "to" : 105.0, "doc_count" : 1 },
              { "key" : "105.0-225.0", "from" : 105.0, "to" : 225.0, "doc_count" : 9 },
              { "key" : "225.0-445.0", "from" : 225.0, "to" : 445.0, "doc_count" : 0 },
              { "key" : "445.0-665.0", "from" : 445.0, "to" : 665.0, "doc_count" : 0 },
              { "key" : "665.0-885.0", "from" : 665.0, "to" : 885.0, "doc_count" : 0 },
              { "key" : "885.0-1115.0", "from" : 885.0, "to" : 1115.0, "doc_count" : 10 },
              { "key" : "1115.0-1335.0", "from" : 1115.0, "to" : 1335.0, "doc_count" : 20 },
              { "key" : "1335.0-1555.0", "from" : 1335.0, "to" : 1555.0, "doc_count" : 20 },
              { "key" : "1555.0-1775.0", "from" : 1555.0, "to" : 1775.0, "doc_count" : 20 },
              { "key" : "1775.0-*", "from" : 1775.0, "doc_count" : 20 }
            ]
          },
          "bucket_correlation" : {
            "value" : 0.8402398981360937
          }
        },
        {
          "key" : "2.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              { "key" : "*-0.0", "to" : 0.0, "doc_count" : 0 },
              { "key" : "0.0-105.0", "from" : 0.0, "to" : 105.0, "doc_count" : 19 },
              { "key" : "105.0-225.0", "from" : 105.0, "to" : 225.0, "doc_count" : 11 },
              { "key" : "225.0-445.0", "from" : 225.0, "to" : 445.0, "doc_count" : 20 },
              { "key" : "445.0-665.0", "from" : 445.0, "to" : 665.0, "doc_count" : 20 },
              { "key" : "665.0-885.0", "from" : 665.0, "to" : 885.0, "doc_count" : 20 },
              { "key" : "885.0-1115.0", "from" : 885.0, "to" : 1115.0, "doc_count" : 10 },
              { "key" : "1115.0-1335.0", "from" : 1115.0, "to" : 1335.0, "doc_count" : 0 },
              { "key" : "1335.0-1555.0", "from" : 1335.0, "to" : 1555.0, "doc_count" : 0 },
              { "key" : "1555.0-1775.0", "from" : 1555.0, "to" : 1775.0, "doc_count" : 0 },
              { "key" : "1775.0-*", "from" : 1775.0, "doc_count" : 0 }
            ]
          },
          "bucket_correlation" : {
            "value" : 0.5759855613334943
          }
        }
      ]
    }
  }
}
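To compare terms, the correlation value of each term bucket can be read out of the response. The following is a minimal Python sketch, assuming the response body has already been parsed into a dict (for example with json.loads) and that the terms aggregation is named buckets, as in the request above. The helper name correlations_by_term is hypothetical.

```python
from typing import Any, Dict


def correlations_by_term(response: Dict[str, Any]) -> Dict[str, float]:
    """Map each term key to its bucket_correlation value.

    Assumes the terms aggregation is named "buckets" and the pipeline
    aggregation is named "bucket_correlation", as in the example request.
    """
    term_buckets = response["aggregations"]["buckets"]["buckets"]
    return {b["key"]: b["bucket_correlation"]["value"] for b in term_buckets}


if __name__ == "__main__":
    # Trimmed copy of the sample response: only the fields this helper reads.
    sample = {
        "aggregations": {
            "buckets": {
                "buckets": [
                    {"key": "1.0", "bucket_correlation": {"value": 0.8402398981360937}},
                    {"key": "2.0", "bucket_correlation": {"value": 0.5759855613334943}},
                ]
            }
        }
    }
    scores = correlations_by_term(sample)
    # Term with the highest correlation value.
    print(max(scores, key=scores.get))
```

In the sample response, version 1.0 has the larger correlation value, so the script prints that key.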