T-test aggregation
A t_test metrics aggregation performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It operates on numeric values extracted from the aggregated documents. In practice, it tells you whether the difference between two population means is statistically significant or could plausibly have occurred by chance alone.
Syntax
A t_test aggregation looks like this in isolation:

{
  "t_test": {
    "a": "value_before",
    "b": "value_after",
    "type": "paired"
  }
}
Assuming that we have a record of node startup times before and after an upgrade, let's use a t-test to see whether the upgrade affected the node startup time in a meaningful way.
GET node_upgrade/_search
{
  "size": 0,
  "aggs": {
    "startup_time_ttest": {
      "t_test": {
        "a": { "field": "startup_time_before" },
        "b": { "field": "startup_time_after" },
        "type": "paired"
      }
    }
  }
}
The fields startup_time_before and startup_time_after must both be numeric fields.

Since we have data from the same nodes, we are using a paired t-test.
The response returns the p-value, or probability value, for the test. It is the probability of obtaining results at least as extreme as the result processed by the aggregation, assuming that the null hypothesis is correct (that is, that there is no difference between the population means). A smaller p-value is stronger evidence against the null hypothesis and suggests that the population means do differ.
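As a rough illustration of how the returned p-value might be used, the Python sketch below compares it against a significance level. It assumes the p-value is exposed under aggregations.<aggregation name>.value in the search response; the helper name is_significant, the 0.05 threshold, and the sample response are hypothetical and not taken from a real cluster.

```python
def is_significant(response, agg_name, alpha=0.05):
    """Return True when the p-value reported for agg_name is below alpha.

    Assumes the response carries the p-value at
    response["aggregations"][agg_name]["value"].
    """
    p_value = response["aggregations"][agg_name]["value"]
    return p_value < alpha


# Hypothetical, trimmed response body used only for illustration.
sample_response = {"aggregations": {"startup_time_ttest": {"value": 0.19}}}

print(is_significant(sample_response, "startup_time_ttest"))
```

The threshold is a convention chosen by the analyst, not something the aggregation decides; 0.05 is only a common default.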
T-test types
The t_test aggregation supports unpaired and paired two-sample t-tests. The type of the test can be specified using the type parameter:

"type": "paired"
 performs a paired t-test

"type": "homoscedastic"
 performs a two-sample equal variance test

"type": "heteroscedastic"
 performs a two-sample unequal variance test (this is the default)
Filters
It is also possible to run an unpaired t-test on different sets of records using filters. For example, to test the difference in startup times before the upgrade between two different groups of nodes, we use the same field, startup_time_before, for both populations and separate the groups of nodes using term filters on the group name field:
GET node_upgrade/_search
{
  "size": 0,
  "aggs": {
    "startup_time_ttest": {
      "t_test": {
        "a": {
          "field": "startup_time_before",
          "filter": {
            "term": { "group": "A" }
          }
        },
        "b": {
          "field": "startup_time_before",
          "filter": {
            "term": { "group": "B" }
          }
        },
        "type": "heteroscedastic"
      }
    }
  }
}
The field startup_time_before must be a numeric field.

Any query that separates the two groups can be used as a filter here.

We are using the same field for both populations, but with different filters.

Since we have data from different nodes, we cannot use a paired t-test.
Populations don't have to be in the same index. If the data sets are located in different indices, a term filter on the _index field can be used to select the populations.
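A request that selects populations by index could be assembled as in the Python sketch below. The helper name ttest_across_indices and the index names node_upgrade_a and node_upgrade_b are hypothetical; the body follows the same shape as the filter example above, with the term filter applied to _index instead of the group field.

```python
import json


def ttest_across_indices(field, index_a, index_b, test_type="heteroscedastic"):
    """Build a search body comparing one field across two indices."""

    def population(index_name):
        # Each population reads the same field, restricted to a single index.
        return {"field": field, "filter": {"term": {"_index": index_name}}}

    return {
        "size": 0,
        "aggs": {
            "startup_time_ttest": {
                "t_test": {
                    "a": population(index_a),
                    "b": population(index_b),
                    "type": test_type,
                }
            }
        },
    }


# Hypothetical index names, used only for illustration.
body = ttest_across_indices("startup_time_before", "node_upgrade_a", "node_upgrade_b")
print(json.dumps(body, indent=2))
```

The search itself would need to target both indices so that documents from each population are available to the aggregation. Because the documents come from different indices, an unpaired type is used.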
Script
If you need to run the t_test on values that aren't represented cleanly by a field, you should run the aggregation on a runtime field. For example, if you want to adjust the load times for the before values:
GET node_upgrade/_search
{
  "size": 0,
  "runtime_mappings": {
    "startup_time_before.adjusted": {
      "type": "long",
      "script": {
        "source": "emit(doc['startup_time_before'].value - params.adjustment)",
        "params": {
          "adjustment": 10
        }
      }
    }
  },
  "aggs": {
    "startup_time_ttest": {
      "t_test": {
        "a": { "field": "startup_time_before.adjusted" },
        "b": { "field": "startup_time_after" },
        "type": "paired"
      }
    }
  }
}