T-digest field type

A field to store pre-aggregated numerical data constructed using the T-Digest algorithm.

Structure of a `tdigest` field

A tdigest field requires two arrays:

A centroids array of double, containing the computed centroids. These must be provided in ascending order.
A counts array of long, containing the computed counts for each of the centroids. This must be the same length as the centroids array.

The field also accepts three optional summary fields:

sum, a double, representing the sum of the values being summarized by the t-digest
min, a double, representing the minimum of the values being summarized by the t-digest
max, a double, representing the maximum of the values being summarized by the t-digest

Supplying the summary values lets you calculate them from the raw data, which is more accurate. If they are not specified, Elasticsearch computes them from the given centroids and counts, with some loss of accuracy.
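
The structure above can be sketched as a small client-side helper. This is a minimal illustration, not part of Elasticsearch: the function name `build_tdigest_value` and its `raw_values` argument are hypothetical, and the check rejects only descending centroid pairs, because the text does not say whether equal adjacent centroids are accepted.

```python
import json
import math


def build_tdigest_value(centroids, counts, raw_values=None):
    """Build a dict shaped like a tdigest field value (hypothetical helper)."""
    if len(centroids) != len(counts):
        raise ValueError("centroids and counts must have the same length")
    # Reject any pair that goes down; the field expects ascending centroids.
    if any(nxt < cur for cur, nxt in zip(centroids, centroids[1:])):
        raise ValueError("centroids must be in ascending order")

    value = {"centroids": list(centroids), "counts": list(counts)}
    if raw_values:
        # Optional summaries, taken from the raw data rather than the centroids.
        value["sum"] = math.fsum(raw_values)
        value["min"] = min(raw_values)
        value["max"] = max(raw_values)
    return value


# Four raw values summarized by two centroids of two values each.
raw = [1.0, 1.0, 2.0, 4.0]
doc = {"latency": build_tdigest_value([1.0, 3.0], [2, 2], raw_values=raw)}
print(json.dumps(doc, indent=2))
```

The printed dictionary can be used as the body of an index request. Leaving out `raw_values` produces a value with only the two required arrays.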

Limitations

A tdigest field can only store a single sketch per document. Multi-values or nested arrays are not supported.
tdigest fields do not support sorting and are not searchable.

Configuring T-Digest Fields

T-Digest fields accept two field-specific configuration parameters:

compression, a double between 0 and 10000 (excluding 0), which corresponds to the parameter of the same name in the T-Digest algorithm. In general, the higher this number, the more disk space the field uses, but the more accurate the sketch approximations are. The default is 100.
digest_type, which selects the merge strategy to use with the sketch. Valid values are default and high_accuracy; the default is default. The default strategy is optimized for storage and performance while still producing a good approximation. The high_accuracy variant uses more memory, disk, and CPU in exchange for a better approximation.
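
As a rough sketch of how these two parameters might be checked before creating an index, the helper below builds a mapping body in Python. The function name `tdigest_mapping` is hypothetical. Two things are assumptions rather than documented behavior: that both parameters sit next to `type` in the field mapping, and that the upper bound of 10000 is inclusive.

```python
import json

VALID_DIGEST_TYPES = {"default", "high_accuracy"}


def tdigest_mapping(field_name, compression=100, digest_type="default"):
    """Return an index-creation body for one tdigest field (hypothetical helper)."""
    # Range taken from the text: above 0, up to 10000 (upper bound assumed inclusive).
    if not (0 < compression <= 10000):
        raise ValueError("compression must be greater than 0 and at most 10000")
    if digest_type not in VALID_DIGEST_TYPES:
        raise ValueError("digest_type must be 'default' or 'high_accuracy'")
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "tdigest",
                    # Assumed placement: directly beside "type".
                    "compression": compression,
                    "digest_type": digest_type,
                }
            }
        }
    }


body = tdigest_mapping("latency", compression=200, digest_type="high_accuracy")
print(json.dumps(body, indent=2))
```

Doubling compression from the default of 100 to 200 trades disk space for accuracy, as described above. Combining it with high_accuracy adds memory and CPU cost on top.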

Use cases

tdigest fields are primarily intended for use with aggregations. To make them efficient for aggregations, the data are stored as compact doc values and not indexed.

tdigest fields are supported in ES|QL aggregation functions such as PERCENTILE, which is used in the ES|QL query in the Examples section of this page.

Synthetic `_source`

tdigest fields support synthetic _source in their default configuration.

Note

To save space, zero-count buckets are not stored in tdigest doc values. If you index a tdigest field with zero-count buckets and synthetic _source is enabled, those buckets won't appear when you retrieve the field.
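
The effect described in this note can be modeled with a few lines of Python. This is only an illustration of the visible result, not the Elasticsearch implementation, and `drop_zero_count_buckets` is a hypothetical name.

```python
def drop_zero_count_buckets(value):
    """Return a copy of a tdigest value without its zero-count buckets."""
    # Pair each centroid with its count and keep only the non-empty buckets.
    kept = [(c, n) for c, n in zip(value["centroids"], value["counts"]) if n != 0]
    result = dict(value)
    result["centroids"] = [c for c, _ in kept]
    result["counts"] = [n for _, n in kept]
    return result


indexed = {"centroids": [0.1, 0.2, 0.3], "counts": [3, 0, 5]}
retrieved = drop_zero_count_buckets(indexed)
print(retrieved)
```

Here the bucket at 0.2 has a count of zero, so the modeled retrieved value contains only the centroids 0.1 and 0.3 with their counts.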

Examples

Create an index with a `tdigest` field

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "latency": {
        "type": "tdigest"
      }
    }
  }
}

Index a simple document

PUT my-index-000001/_doc/1
{
  "latency": {
    "centroids": [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts": [3, 7, 23, 12, 6]
  }
}

Query via ES|QL

POST /_query?format=txt
{
  "query": "FROM my-index-000001 | STATS Percentile(latency, 99)"
}