Histogram field type
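A field to store pre-aggregated numerical data representing a histogram. This data is defined using two paired arrays:

 A values array of double numbers, representing the buckets of the histogram. These values must be provided in increasing order.
 A corresponding counts array of long integers, representing how many values fall into each bucket. These numbers must be positive or zero.

Because the elements in the values array correspond to the elements in the same position of the counts array, these two arrays must have the same length.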

A histogram field can only store a single pair of values and counts arrays per document. Nested arrays are not supported.
histogram fields do not support sorting.
Uses
histogram fields are primarily intended for use with aggregations. To make the data more readily accessible for aggregations, histogram field data is stored as binary doc values and is not indexed. Its size in bytes is at most 13 * numValues, where numValues is the length of the provided arrays.
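As a quick sanity check of that bound, the arithmetic can be sketched in Ruby. The helper name max_histogram_bytes and the constant are hypothetical and not part of any Elasticsearch client; the sketch only multiplies out the 13 * numValues upper limit quoted above.

```ruby
# Hypothetical helper: upper bound on the doc-values size of one histogram,
# based on the "13 * numValues" limit stated in the text.
BYTES_PER_VALUE_UPPER_BOUND = 13

def max_histogram_bytes(values, counts)
  unless values.length == counts.length
    raise ArgumentError, 'values and counts must have the same length'
  end

  BYTES_PER_VALUE_UPPER_BOUND * values.length
end

# Five buckets, so at most 13 * 5 bytes.
puts max_histogram_bytes([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6]) # prints 65
```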
Because the data is not indexed, you can only use histogram fields for the following aggregations and queries:
 min aggregation
 max aggregation
 sum aggregation
 value_count aggregation
 avg aggregation
 percentiles aggregation
 percentile ranks aggregation
 boxplot aggregation
 histogram aggregation
 range aggregation
 exists query
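To give a feel for what some of the metric aggregations in this list could return, the sketch below works through one histogram in plain Ruby. It assumes that each entry in values is weighted by the entry at the same position in counts; that weighting is an assumption of this sketch, not something this page states, and the function name histogram_metrics is hypothetical.

```ruby
# Hypothetical client-side sketch, assuming count-weighted metrics:
#   value_count = total of counts
#   sum         = total of value * count
#   avg         = sum / value_count
def histogram_metrics(values, counts)
  unless values.length == counts.length
    raise ArgumentError, 'values and counts must have the same length'
  end

  value_count = counts.sum
  sum = values.zip(counts).sum { |value, count| value * count }

  {
    value_count: value_count,
    sum: sum,
    # No average exists for an empty histogram.
    avg: value_count.zero? ? nil : sum.to_f / value_count
  }
end

metrics = histogram_metrics([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6])
puts metrics[:value_count] # prints 51
puts metrics[:sum].round(4)
puts metrics[:avg].round(4)
```

Running the real aggregations against an index remains the authoritative way to check these numbers.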
Building a histogram
When using a histogram as part of an aggregation, the accuracy of the results depends on how the histogram was constructed. It is important to consider the percentiles aggregation mode that will be used to build it. Some possibilities include:

For the TDigest mode, the values array represents the mean centroid positions and the counts array represents the number of values that are attributed to each centroid. If the algorithm has already started to approximate the percentiles, this inaccuracy is carried over into the histogram.
For the High Dynamic Range (HDR) histogram mode, the values array represents fixed upper limits of each bucket interval, and the counts array represents the number of values that are attributed to each interval. This implementation maintains a fixed worst-case percentage error (specified as a number of significant digits), so the value used when generating the histogram is the maximum accuracy you can achieve at aggregation time.
The histogram field is "algorithm agnostic" and does not store data specific to either TDigest or HDRHistogram. While this means the field can technically be aggregated with either algorithm, in practice you should choose one algorithm and index data in that manner (e.g. centroids for TDigest or intervals for HDRHistogram) to ensure the best accuracy.
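The two shapes of input described above can be illustrated with a small Ruby sketch. Neither function is TDigest or HDRHistogram; they are simplified stand-ins that only show what "upper limits plus counts" and "centroid means plus counts" look like as values and counts arrays. The function names, bucket limits, and group size are all hypothetical.

```ruby
# Interval style: `upper_limits` must be sorted in increasing order.
# Each sample is counted in the first bucket whose upper limit is >= the sample.
def fixed_bucket_histogram(samples, upper_limits)
  counts = Array.new(upper_limits.length, 0)
  samples.each do |sample|
    index = upper_limits.index { |limit| sample <= limit }
    raise ArgumentError, "sample #{sample} exceeds the last upper limit" if index.nil?

    counts[index] += 1
  end
  { values: upper_limits.map(&:to_f), counts: counts }
end

# Centroid style: sort the samples, split them into groups of at most
# `group_size`, and keep the mean and the size of each group.
def centroid_histogram(samples, group_size)
  groups = samples.sort.each_slice(group_size).to_a
  {
    values: groups.map { |group| group.sum.to_f / group.length },
    counts: groups.map(&:length)
  }
end

samples = [0.12, 0.18, 0.25, 0.31, 0.33, 0.47]
p fixed_bucket_histogram(samples, [0.2, 0.4, 0.6])
p centroid_histogram([1, 2, 3, 4, 5], 2)
```

Both functions return values in increasing order, so either result can be used as the body of a histogram field.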
Synthetic _source
Synthetic _source is Generally Available only for TSDB indices (indices that have index.mode set to time_series). For other indices, synthetic _source is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
histogram fields support synthetic _source in their default configuration. Synthetic _source cannot be used together with ignore_malformed or copy_to.
To save space, zero-count buckets are not stored in the histogram doc values. As a result, if you index a histogram that includes zero-count buckets into an index with synthetic _source enabled, those buckets will be missing when you fetch the histogram back.
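The effect on a fetched document can be mimicked with a few lines of Ruby. This is only an illustration of the behavior described above, not Elasticsearch's own code, and the function name drop_zero_count_buckets is hypothetical.

```ruby
# Hypothetical sketch: remove every bucket whose count is zero, which is
# how a histogram would look when rebuilt from doc values.
def drop_zero_count_buckets(histogram)
  kept = histogram[:values].zip(histogram[:counts]).reject { |_value, count| count.zero? }
  { values: kept.map(&:first), counts: kept.map(&:last) }
end

indexed = { values: [0.1, 0.2, 0.3], counts: [3, 0, 5] }
p drop_zero_count_buckets(indexed)
```

In this example the 0.2 bucket is absent from the result, so a client that relies on a fixed set of buckets needs to re-add them itself.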
Examples
The following create index API request creates a new index with two field mappings:

 my_histogram, a histogram field used to store percentile data
 my_text, a keyword field used to store a title for the histogram
response = client.indices.create(
  index: 'myindex000001',
  body: {
    mappings: {
      properties: {
        my_histogram: {
          type: 'histogram'
        },
        my_text: {
          type: 'keyword'
        }
      }
    }
  }
)
puts response

PUT myindex000001
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
The following index API requests store pre-aggregated data for two histograms: histogram_1 and histogram_2.
response = client.index(
  index: 'myindex000001',
  id: 1,
  body: {
    my_text: 'histogram_1',
    my_histogram: {
      values: [0.1, 0.2, 0.3, 0.4, 0.5],
      counts: [3, 7, 23, 12, 6]
    }
  }
)
puts response

response = client.index(
  index: 'myindex000001',
  id: 2,
  body: {
    my_text: 'histogram_2',
    my_histogram: {
      values: [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
      counts: [8, 17, 8, 7, 6, 2]
    }
  }
)
puts response

PUT myindex000001/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  }
}

PUT myindex000001/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
    "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts" : [8, 17, 8, 7, 6, 2]
  }
}
values: Values for each bucket. Values in the array are treated as doubles and must be given in increasing order. For TDigest histograms, this value represents the mean value. For HDR histograms, it represents the value iterated to.

counts: Count for each bucket. Values in the array are treated as long integers and must be positive or zero. Negative values will be rejected. The relation between a bucket and a count is given by the position in the array.
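The rules stated on this page can be checked before a document is sent. The following is a minimal client-side sketch in Ruby; the function name histogram_errors and its messages are hypothetical, and it does not reproduce the exact checks or error text that Elasticsearch applies. Equal neighbouring values are accepted here, which is an assumption of the sketch.

```ruby
# Hypothetical pre-flight check for a histogram body.
# Returns a list of problems; an empty list means no rule was violated.
def histogram_errors(values, counts)
  errors = []
  errors << 'values and counts must have the same length' unless values.length == counts.length

  if values.all? { |value| value.is_a?(Integer) || value.is_a?(Float) }
    # Ties are accepted in this sketch (assumption).
    ordered = values.each_cons(2).all? { |previous, current| previous <= current }
    errors << 'values must be in increasing order' unless ordered
  else
    errors << 'values must be numbers (nested arrays are not supported)'
  end

  valid_counts = counts.all? { |count| count.is_a?(Integer) && count >= 0 }
  errors << 'counts must be integers that are positive or zero' unless valid_counts

  errors
end

p histogram_errors([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6]) # prints []
p histogram_errors([0.2, 0.1], [1, -1])
```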