T-digest field type
A field to store pre-aggregated numerical data constructed using the T-Digest algorithm.
A tdigest field requires two arrays:
- A centroids array of double, containing the computed centroids. These must be provided in ascending order.
- A counts array of long, containing the computed counts for each of the centroids. This must be the same length as the centroids array.
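The two constraints above can be checked on the client before a document is sent. The following is a minimal Python sketch; validate_tdigest is a hypothetical helper, not part of any Elasticsearch client. It checks only the rules stated here, and it assumes that equal neighbouring centroids are acceptable, which this page does not confirm.

```python
def validate_tdigest(value):
    """Return a list of problems found in a tdigest field value (empty if none).

    Hypothetical helper: it mirrors the documented rules only and does not
    reproduce Elasticsearch's own validation.
    """
    problems = []
    centroids = value.get("centroids")
    counts = value.get("counts")
    if centroids is None or counts is None:
        problems.append("both 'centroids' and 'counts' are required")
        return problems
    if len(centroids) != len(counts):
        problems.append("'counts' must be the same length as 'centroids'")
    # Flag any centroid that is smaller than the one before it.
    # Assumption: equal neighbours are treated as valid.
    if any(b < a for a, b in zip(centroids, centroids[1:])):
        problems.append("'centroids' must be in ascending order")
    return problems


if __name__ == "__main__":
    doc = {"centroids": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]}
    print(validate_tdigest(doc))
```

An empty list means the value satisfies both documented constraints; anything else can be reported before the indexing request is made.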
The field also accepts three optional summary fields:
- sum, a double, representing the sum of the values being summarized by the t-digest
- min, a double, representing the minimum of the values being summarized by the t-digest
- max, a double, representing the maximum of the values being summarized by the t-digest
Specifying the summary values enables them to be calculated with
higher accuracy from the raw data. If not specified, Elasticsearch
computes them based on the given centroids and counts, with some loss of
accuracy.
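To see why supplying the summary values can be more accurate, the sketch below compares summaries taken from raw data with summaries estimated from centroids and counts alone. The estimation shown (weighted sum, first and last centroid) is an assumption made for illustration; this page does not specify how Elasticsearch derives the values. Both function names are hypothetical.

```python
def exact_summary(raw_values):
    """Summary values computed directly from the raw observations."""
    return {"sum": sum(raw_values), "min": min(raw_values), "max": max(raw_values)}


def estimate_summary(centroids, counts):
    """Summary values estimated from the sketch alone.

    Assumption: sum is the count-weighted sum of centroids, and min/max are
    the smallest and largest centroid. Elasticsearch's method may differ.
    """
    total = sum(c * n for c, n in zip(centroids, counts))
    return {"sum": total, "min": min(centroids), "max": max(centroids)}


if __name__ == "__main__":
    raw = [0.08, 0.10, 0.12]          # three raw observations
    centroids, counts = [0.10], [3]   # the same data collapsed into one centroid
    print(exact_summary(raw))
    print(estimate_summary(centroids, counts))
```

In this example the estimated minimum and maximum both collapse to the single centroid, while the raw data spans 0.08 to 0.12. Sending sum, min, and max with the document preserves that information.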
- A tdigest field can only store a single sketch per document. Multi-values or nested arrays are not supported.
- tdigest fields do not support sorting and are not searchable.
T-Digest fields accept two field-specific configuration parameters:
- compression, a double between 0 and 10000 (excluding 0), which corresponds to the parameter of the same name in the T-Digest algorithm. In general, the higher this number, the more space on disk the field will use but the more accurate the sketch approximations will be. Default is 100.
- digest_type, which selects the merge strategy to use with the sketch. Valid values are default and high_accuracy. The default is default. The default variant is optimized for storage and performance, while still producing a good approximation. The high_accuracy variant uses more memory, disk, and CPU for a better approximation.
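As an illustration of the two parameters, the following Python sketch builds a mapping body and rejects values outside the documented ranges. build_tdigest_mapping is a hypothetical helper, and placing compression and digest_type next to type in the field mapping is an assumption based on them being described as field configuration parameters. The upper bound of 10000 is treated as inclusive, which is also an assumption.

```python
import json

VALID_DIGEST_TYPES = {"default", "high_accuracy"}


def build_tdigest_mapping(field_name, compression=100, digest_type="default"):
    """Build an index mapping body for a single tdigest field.

    Hypothetical helper; the range checks mirror the documented limits only.
    """
    # Documented range: greater than 0, up to 10000 (assumed inclusive).
    if not (0 < compression <= 10000):
        raise ValueError("compression must be greater than 0 and at most 10000")
    if digest_type not in VALID_DIGEST_TYPES:
        raise ValueError("digest_type must be 'default' or 'high_accuracy'")
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "tdigest",
                    "compression": compression,
                    "digest_type": digest_type,
                }
            }
        }
    }


if __name__ == "__main__":
    body = build_tdigest_mapping("latency", compression=200, digest_type="high_accuracy")
    print(json.dumps(body, indent=2))
```

The printed JSON could be used as the body of an index creation request; raising compression or choosing high_accuracy trades storage and CPU for a closer approximation, as described above.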
tdigest fields are primarily intended for use with aggregations. To make them
efficient for aggregations, the data are stored as compact doc values and not
indexed.
tdigest fields are supported in the following ES|QL aggregation functions:
tdigest fields support synthetic _source in their default configuration.
To save space, zero-count buckets are not stored in tdigest doc values. If you index a tdigest field with zero-count buckets and synthetic _source is enabled, those buckets won't appear when you retrieve the field.
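The visible effect of dropping zero-count buckets can be sketched as follows. This is an illustration of what a reader might see when retrieving the field with synthetic _source, not Elasticsearch's implementation; drop_zero_count_buckets is a hypothetical name.

```python
def drop_zero_count_buckets(value):
    """Return a copy of a tdigest value without buckets whose count is zero."""
    kept = [(c, n) for c, n in zip(value["centroids"], value["counts"]) if n != 0]
    result = dict(value)  # keep any other keys, such as sum, min, max
    result["centroids"] = [c for c, _ in kept]
    result["counts"] = [n for _, n in kept]
    return result


if __name__ == "__main__":
    indexed = {"centroids": [0.1, 0.2, 0.3], "counts": [3, 0, 23]}
    print(drop_zero_count_buckets(indexed))
```

Here the bucket at 0.2 has a count of zero, so it is absent from the result while the other two buckets are unchanged.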
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "latency": {
        "type": "tdigest"
      }
    }
  }
}
PUT my-index-000001/_doc/1
{
  "latency": {
    "centroids": [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts": [3, 7, 23, 12, 6]
  }
}
POST /_query?format=txt
{
  "query": "FROM my-index-000001 | STATS PERCENTILE(latency, 99)"
}