Elasticsearch downsampling methods: last-value vs. aggregate sampling

Elasticsearch downsampling now gives you a choice: last-value sampling for maximum storage savings or aggregate sampling for precise rate calculations and counter resets, both fully queryable in ES|QL.


Elasticsearch downsampling cuts time series storage by 94% in our OTel benchmark, and as of 9.4 it's fully queryable in ES|QL. For each metric type you now choose: last-value keeps one observation per bucket for maximum storage savings, aggregate keeps min/max/sum/count and preserves counter resets for accurate rate calculations. Both methods support histograms. Any ES|QL dashboard built on raw time series data runs unchanged on downsampled data; the trade-off is per-bucket averages instead of raw values.

Downsampling (available since Elasticsearch 8.7) shrinks the footprint of your time series data by summarising data points into broader time buckets. It frees up storage and speeds up queries by orders of magnitude. Recently, our engineering focus has shifted from simply optimising the underlying downsampling engine to expanding what it can do. The new features give you more control over how your data is summarised.

In this post, we’ll explore the new downsampling features including:

  • A choice between two distinct sampling methods: lightweight last-value sampling (for maximum storage savings) and high-fidelity aggregate sampling (for precise mathematical accuracy, such as counter resets).
  • Expanded support for new metric types, including histograms.
  • Full ES|QL support for downsampled gauges.

To begin, here is a quick review of the terminology used for time series data streams (TSDS):

  • Metrics are the actual measurements that change over time, such as CPU usage.
  • Dimensions are the identifying names and values associated with a measurement, which collectively determine the unique time series ID (_tsid).
  • The timestamp marks the exact moment a measurement was taken.
  • Finally, a (downsampled) bucket represents the result of reducing a metric's values across a specified time interval for a single _tsid.

How does downsampling work?

The downsampling process is initiated via the downsample API. It can be automated using ILM (Index Lifecycle Management) or data stream lifecycle, which downsample a backing index once indexing into it has finished and replace the raw index with the downsampled one.

Since downsampling operates on a whole backing index (which must be read-only), and because the backing indices are time-bound for time series data, the system can generate the downsampled buckets from the data in a single index.

The downsampling task is optimised for efficiency: it reads all documents sorted by their time series dimensions and their timestamp in descending order. This specific sorting ensures that all data points belonging to a single time series bucket are collected sequentially without interleaving with other time series data. Once all documents that contribute to a single bucket have been read, their field values are collected for summarisation.
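The read order described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual implementation: the document shape, the `ts` field (epoch seconds) and the helper names are assumptions made for the example.

```python
from itertools import groupby

INTERVAL = 600  # bucket width in seconds (10 minutes)

def bucket_key(doc):
    # A bucket is identified by the time series ID and the bucket start time.
    return doc["_tsid"], doc["ts"] - doc["ts"] % INTERVAL

def iter_buckets(docs):
    # Sort by time series first, then by timestamp in descending order.
    ordered = sorted(docs, key=lambda d: (d["_tsid"], -d["ts"]))
    # With this order, all documents of one bucket are adjacent, so a single
    # sequential pass is enough to collect them.
    for key, group in groupby(ordered, key=bucket_key):
        yield key, list(group)

# Hypothetical documents, deliberately out of order.
docs = [
    {"_tsid": "node-0002", "ts": 1200, "cpu": 0.4},
    {"_tsid": "node-0001", "ts": 10, "cpu": 0.1},
    {"_tsid": "node-0001", "ts": 610, "cpu": 0.3},
    {"_tsid": "node-0001", "ts": 20, "cpu": 0.2},
]
buckets = list(iter_buckets(docs))
for key, members in buckets:
    print(key, [d["ts"] for d in members])
```

Because the sort key groups each time series together and orders it by time, `groupby` never sees a bucket split into two runs, which is the property the paragraph above relies on.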

We will use the following data to show the effects of downsampling throughout this post. This data represents two nodes reporting their CPU usage and their number of requests every 10 seconds.

Let’s see what downsampling (up to version 9.3) produces from this data using a 10-minute interval:

A time series is uniquely identified by its dimension values, the node field in our example. Therefore, all documents summarised into a single bucket will share the same dimension values, meaning only one instance of the dimension values needs to be stored per bucket.

The timestamp and metrics are the fields whose values vary per document. For the timestamp (@timestamp), a rounding operation is performed to align it with the beginning of the bucket interval. For instance, in our example, the resulting timestamps are normalised to 2025-09-08T21:20:00.000Z (UTC).
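As a small sketch of the rounding step, the following Python snippet floors a timestamp to the start of its 10-minute bucket. The input timestamp (21:24:37) and the function name are assumptions chosen for illustration; only the resulting bucket start comes from the example above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_start(ts, interval):
    # Subtract the time elapsed since the last interval boundary.
    return ts - (ts - EPOCH) % interval

# Hypothetical measurement time inside the bucket.
ts = datetime(2025, 9, 8, 21, 24, 37, tzinfo=timezone.utc)
start = bucket_start(ts, timedelta(minutes=10))
print(start.isoformat(timespec="milliseconds").replace("+00:00", "Z"))
# prints 2025-09-08T21:20:00.000Z
```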

Up to 9.3, metrics were downsampled according to their metric type: for gauges (cpu.usage) we stored the min, max, sum, and count of the encountered values, and for counters (requests) we stored the last observed value.

As you can see, up to 9.3 we effectively used the last value sampling method for counters and the aggregate method for gauges. From 9.4 onward, both sampling methods are available for all metric types, so you can choose the one that best fits your data and the available system resources.

How downsampling sampling methods work

Different use cases require different trade-offs. Some applications demand maximum storage reduction and fast downsampling; for these, last-value sampling keeps only the most recent observation per bucket. Other use cases prioritise the highest possible fidelity to the original data over sheer speed or space savings; for these, the aggregate method keeps min, max, sum and count.

For this reason, in versions 9.3 and 9.4 we worked on offering two distinct ways of downsampling metrics. In 9.3 we introduced a new sampling method called last value and in 9.4 we differentiated the way we downsample counters between the last value and the aggregate method.

Last value sampling method

The last value method consistently downsamples data across all field types. For each time series bucket, it creates a single document. This document is timestamped with the start of the bucket, and all fields retain the last observed value from that period. The fact that it only needs one value makes it very efficient since it does not need to go over all values. Looking at our previous example, the downsampled documents look like this:

While this method sacrifices data accuracy by discarding data points, it is a standard practice in time series solutions. Its primary benefit is preserving long-term trends while lowering the cost of data storage and querying. It is also lightweight, reducing the resources needed to generate downsampled data.
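To make the behaviour concrete, here is a minimal Python sketch of last value sampling. The documents, their values and the `ts` field (epoch seconds) are hypothetical; the field names cpu.usage and requests follow the running example.

```python
def last_value_downsample(docs, interval=600):
    latest = {}
    for doc in docs:
        key = (doc["node"], doc["ts"] - doc["ts"] % interval)
        # Keep only the most recent document seen for each bucket.
        if key not in latest or doc["ts"] > latest[key]["ts"]:
            latest[key] = doc
    # Emit one document per bucket, timestamped with the bucket start.
    return [
        {**doc, "ts": bucket}
        for (node, bucket), doc in sorted(latest.items())
    ]

# Hypothetical raw documents.
docs = [
    {"node": "node-0001", "ts": 0, "cpu.usage": 20, "requests": 100},
    {"node": "node-0001", "ts": 10, "cpu.usage": 90, "requests": 130},
    {"node": "node-0001", "ts": 590, "cpu.usage": 50, "requests": 900},
    {"node": "node-0001", "ts": 600, "cpu.usage": 70, "requests": 950},
    {"node": "node-0002", "ts": 5, "cpu.usage": 10, "requests": 7},
]
downsampled = last_value_downsample(docs)
for doc in downsampled:
    print(doc)
```

Note how the spike to 90 in the first bucket of node-0001 disappears from the output: that is the accuracy cost described above.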

Aggregate sampling method

The aggregate sampling method does not skip any metric values: it collects all of them and then summarises them appropriately. The aggregate method processes each metric type differently, as explained below. Our example data, when downsampled with the aggregate method, looks like this:

In the next sections we will see how the aggregate method summarises each metric type.

Gauges

Gauges are a fundamental type of metric. Unlike counters or histograms, their values can both increase and decrease, reflecting the current state of a system component (in our example, cpu.usage). A single value per bucket isn't enough to keep gauge aggregations accurate, since gauge values fluctuate. Instead, we track multiple statistical aggregates over each downsample interval:

  1. min tracks the lowest value recorded for the gauge within the aggregation interval.
  2. max tracks the highest value recorded for the gauge within the aggregation interval.
  3. sum is the arithmetic sum of all individual gauge readings taken during the interval.
  4. value_count tracks the total number of individual gauge measurements that contributed to the aggregation interval.

With these four statistics, we can answer the direct aggregations exactly (we know the true minimum, maximum, sum and count) and compute the average over the interval by dividing the recorded sum by the value_count. This approach preserves the shape of the gauge's behaviour over time, regardless of the downsampling interval.
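The four statistics can be sketched in Python as follows. The readings are hypothetical, and the `merge` helper is an illustrative assumption showing why summaries of this shape can be combined into a wider interval without going back to the raw values.

```python
def summarise(values):
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "value_count": len(values),
    }

def merge(a, b):
    # Two summaries combine exactly into the summary of their union.
    return {
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
        "sum": a["sum"] + b["sum"],
        "value_count": a["value_count"] + b["value_count"],
    }

def average(summary):
    return summary["sum"] / summary["value_count"]

# Hypothetical cpu.usage readings from two adjacent buckets.
first = summarise([20, 55, 35, 10])
second = summarise([40, 30])
combined = merge(first, second)
print(first, average(first))
print(combined, average(combined))
```

Merging `first` and `second` gives the same result as summarising all six readings at once, which is what keeps min, max, sum, count and average exact at any interval.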

In our example, we see that for node-0001 the downsampled document is stored in the format of an aggregate metric double:

Some operations such as value filtering or standard deviation aren't covered by the summary statistics. In 9.4, we address this by using the average value in ES|QL. This is a major milestone because now the pre-aggregated gauges can be fully supported by ES|QL. Any ES|QL dashboard built on top of raw data can now use downsampled data with no errors, just with the expected loss of accuracy from using the average value per downsample interval.

We chose the average as the most representative signal we have available for the original samples in each downsample interval. For example, let’s think of the following query:

The first query is applied to the original data and uses the individual data points. The second query is applied on the aggregated cpu usage values so we do not have individual data points. We could use min or max, but then the values used would be skewed towards the min or the max point. For this reason, we decided to use the average that would better capture the values within an interval.
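The reasoning behind this choice can be illustrated with a toy Python example. It does not show how ES|QL is implemented; the readings, thresholds and helper names are assumptions used only to compare min, max and average as the stand-in value for a filter such as "cpu.usage greater than a threshold".

```python
# Hypothetical cpu.usage readings in one bucket and their stored summary.
values = [20, 55, 35, 10]
summary = {"min": 10, "max": 55, "sum": 120, "value_count": 4}

def representative(summary, mode):
    if mode == "avg":
        return summary["sum"] / summary["value_count"]
    return summary[mode]  # "min" or "max"

def bucket_decisions(summary, threshold):
    # Would the bucket pass the filter if we used min, avg or max?
    return {
        mode: representative(summary, mode) > threshold
        for mode in ("min", "avg", "max")
    }

def raw_share(values, threshold):
    # Fraction of raw data points that pass the filter.
    return sum(v > threshold for v in values) / len(values)

for threshold in (15, 40):
    print(threshold, raw_share(values, threshold), bucket_decisions(summary, threshold))
```

With a threshold of 40, only one of four raw points passes, yet max would let the whole bucket through. With a threshold of 15, three of four points pass, yet min would reject the bucket. In both cases the average follows the majority of the points.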

Counters

Counters (requests in our examples), which are measurements that only ever increase (for cumulative temporality), seem straightforward for downsampling: just keeping a single value should be enough. However, when a process restarts, the counter value resets to zero.

The most common aggregation on a counter is its rate of change. Missing a counter reset when downsampling can drastically skew this rate, leading to inaccurate monitoring. Therefore, our downsampling process ensures that the rate calculation algorithm can still detect a reset, even when analysing downsampled data.

A rate algorithm detects a reset when a counter's current value is lower than its previous recorded value. To maintain accuracy in downsampled data, we need to ensure that the last value before the reset is preserved, and the next value seen is correctly compared against it.
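A reset-aware delta can be sketched as below. This is a simplified illustration rather than the algorithm Elasticsearch uses: it assumes the counter restarts from zero after a reset, and the sample values are hypothetical.

```python
def counter_delta(values):
    total = 0
    for previous, current in zip(values, values[1:]):
        if current < previous:
            # Reset detected. Assuming the counter restarted from zero,
            # everything counted since the restart is new increase.
            total += current
        else:
            total += current - previous
    return total

# Hypothetical counter samples with a reset between 15 and 3.
samples = [10, 15, 3, 8]
print(counter_delta(samples))     # prints 13
print(samples[-1] - samples[0])   # prints -2, the naive last-minus-first result
```

The comparison against the previous value is the only signal the algorithm has, which is why the value just before the reset must survive downsampling.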

So, let’s see what the requests counter looks like in a downsampled bucket that contains a reset:

The main downsampled bucket stores the first counter value observed in that time frame. We choose the first value because it is the closest to the bucket's start timestamp. In addition to that, when a counter reset occurs within this time bucket, we store auxiliary documents with their original timestamps that hold the values to preserve the reset event:

  1. Last value before reset is stored in an auxiliary document. This records the maximum value the counter reached before the reset event.
  2. Value after reset (conditional) may also be stored in a second auxiliary document. This is optional because if the first value of the next downsampled bucket is already lower than the stored pre-reset value, the rate algorithm can infer the reset without needing this intermediate data point.

Let’s consider a more elaborate example. We have a stream of counter values within a time window, simulating a reset:

Bucket 1: 1000 1003 1010 1040 1060 (reset) 20 30 40 70 80

Bucket 2: 90 ...

This sequence results in the following downsampled documents:

  • Downsampled bucket 1: stores the first value (1000).
  • Auxiliary document: stores the pre-reset value (1060).
  • Post-reset value: we do not need to store value 20, because the next downsampled bucket's starting value (90) is already lower than the pre-reset value (1060). The rate calculation will correctly detect the reset by comparing 90 against 1060.
  • Next downsampled bucket 2: stores the first value (90).
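The selection of stored values in this example can be sketched in Python. The function name, the document shape and the sample timestamps are assumptions for illustration, and the sketch handles at most one reset per bucket, which is a simplification.

```python
def downsample_counter_bucket(bucket_start, samples, next_first):
    """samples: (timestamp, value) pairs in ascending time order."""
    # Main bucket document: the first value, stamped with the bucket start.
    docs = [{"ts": bucket_start, "value": samples[0][1]}]
    for (prev_ts, prev), (ts, value) in zip(samples, samples[1:]):
        if value < prev:
            # Auxiliary document: last value before the reset, original timestamp.
            docs.append({"ts": prev_ts, "value": prev})
            # The post-reset value is only needed when the next bucket's first
            # value would not reveal the reset on its own.
            if next_first is None or next_first >= prev:
                docs.append({"ts": ts, "value": value})
            break  # simplification: only the first reset is handled
    return docs

# Values from the example above with hypothetical timestamps, one per minute.
values = [1000, 1003, 1010, 1040, 1060, 20, 30, 40, 70, 80]
samples = [(i * 60, v) for i, v in enumerate(values)]

stored = downsample_counter_bucket(0, samples, next_first=90)
print([d["value"] for d in stored])

# If the next bucket started above the pre-reset value, the reset would be
# invisible without the post-reset document.
stored_high = downsample_counter_bucket(0, samples, next_first=2000)
print([d["value"] for d in stored_high])
```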

The total change (delta) in the counter value is calculated via:

delta = (preResetMax - bucket_1) + bucket_2

Notice that it remains the same for the original and the aggregated data because we have all the values available in both data sets: (1060 - 1000) + 90 = 150. In comparison, the last value misses the increase within the first bucket and only observes an increase of 90, 40% less.

Although the delta is accurately observed, the final downsampled rate may not be exactly the same as the raw data rate because of the rate extrapolation and interpolation. Still, this method keeps the result close to the raw rate, even when counters reset.

How the aggregate method handles histogram metrics

The aggregate downsampling method merges histogram metrics into a single representative histogram per bucket, preserving distribution shape while reducing volume. The process depends on the histogram type.

For exponential histograms, which are designed for efficient storage of data with a wide range, their values and counts are aggregated and merged into a single, representative exponential histogram. This maintains the proportional distribution and statistical characteristics of the original datasets.

Conversely, for the older histogram field type and the newer tdigest field type, the merging is performed using the TDigest algorithm. This ensures that essential metrics derived from the distributions, such as the median or the 95th percentile, remain statistically reliable after the downsampling process is complete.

Let’s see an example with exponential histograms:

As you can see, the downsampled histogram is capturing the values of all three histograms and not just the last one as the last value method would have done.
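The idea of merging can be shown with a heavily simplified Python sketch. It assumes each histogram is just a mapping from bucket index to count and that all histograms share the same scale; real exponential histograms also carry a scale, a zero bucket and negative buckets, none of which are modelled here. The three input histograms are hypothetical.

```python
from collections import Counter

def merge_histograms(histograms):
    merged = Counter()
    for histogram in histograms:
        # Counts that fall into the same bucket index are added together.
        merged.update(histogram)
    return dict(sorted(merged.items()))

# Three hypothetical histograms observed within one downsampling interval.
histograms = [
    {0: 2, 1: 5},
    {1: 1, 2: 4},
    {0: 1, 3: 2},
]
merged = merge_histograms(histograms)
print(merged)
print(sum(merged.values()))  # total observations across all three inputs
```

The merged histogram accounts for all 15 observations, whereas keeping only the last histogram would retain 3 of them.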

Note: the histogram field type does not record which algorithm was used to build it. Histograms built with TDigest are downsampled correctly with the aggregate method. Histograms built with High Dynamic Range (HDR) are not, because the aggregate method cannot detect that. For HDR data, use the last value method instead.

How to configure Elasticsearch downsampling methods

You can configure the sampling method of a time series data stream using either data stream lifecycle or ILM:

Note: all downsampling actions in a single ILM policy should have the same sampling method configured.

How do I switch between downsampling methods?

It is possible to switch from one sampling method to another, but a downsampled index can be downsampled further only with the same sampling method.

The data stream lifecycle takes care of this for you: if you change the sampling method in a data stream lifecycle configuration, it applies the new method to the original indices, and for indices that are already downsampled it continues to apply the compatible one.

On the other hand, if you are using ILM and the policy has more than one downsampling action, it is recommended to create a new policy with the new sampling method. This way, the existing indices will continue downsampling with the old sampling method and the new ones will transition to the new one. More specifically, we recommend the following steps:

  1. Copy the ILM policy and change the sampling method in the downsampling actions.
  2. Update the relevant index templates to use the new ILM policy.

How do last-value and aggregate downsampling compare?

Now let’s compare the two methods on more realistic data. We generated an hour of OTel metrics from a receiver. This was done using metricsgenreceiver with the following configuration:

  • scenario: hostmetricsreceiver
  • interval: 5s

Next, we downsampled the data into 10-minute buckets using both sampling methods. We then compared key performance characteristics, including the resulting document count, the size of the dataset (after being force-merged into a single segment), and the time taken for the downsampling process.

Method     | Doc count          | Data set size      | Duration
Raw        | 11509670           | 176.2 MB           | -
Aggregate  | 134370 (-98.83%)   | 11.2 MB (-93.64%)  | 2.8s
Last value | 127170 (-98.89%)   | 9.3 MB (-94.72%)   | 2.6s (-7.14%)

Both downsampling methods (last value and aggregate) reduce the initial 11.51 million documents by more than 98%. However, the aggregate method retains slightly more documents to account for counter resets.

When considering the overall data set size, both methods show a substantial reduction compared to the raw data, but a more noticeable difference emerges between the two: the last value method achieves a greater size reduction than the aggregate method. This difference is not solely due to counter resets; it's also because the aggregate method stores more detailed information per gauge to maintain the accuracy of common aggregations.

The last value downsampling method is also faster, completing the process 7% sooner than the aggregate method. This efficiency is because the last value method does not need to collect every value; after the last value is found, the rest are ignored.
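The percentages in the comparison can be re-derived with a few lines of Python. The figures are the ones reported in the table, reading the raw row as 11509670 documents and 176.2 MB; the helper name is an assumption.

```python
def reduction(before, after):
    # Relative reduction in percent, rounded to two decimals.
    return round((before - after) / before * 100, 2)

raw_docs, raw_mb = 11_509_670, 176.2

print("aggregate docs:", reduction(raw_docs, 134_370))
print("aggregate size:", reduction(raw_mb, 11.2))
print("last value docs:", reduction(raw_docs, 127_170))
print("last value size:", reduction(raw_mb, 9.3))
print("last value vs aggregate duration:", reduction(2.8, 2.6))
```

The results agree with the table to within rounding of the last digit.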

The last value method wins on data size and speed; the aggregate method is more accurate on queries, which is its main advantage.

The following section will demonstrate this with three ES|QL query examples using the recently introduced TS command, which applies time series semantics and lets us use time series aggregations to examine how time series change over time. These queries run against all three indices (raw, last value, and aggregate) to compare the query accuracy of the downsampled data against the raw data.

Gauges

For downsampling gauges, we will compare the following time series aggregations:

  • The minimum of min_over_time
  • The maximum of max_over_time
  • The average of avg_over_time
  • The standard deviation (std_dev) of last_over_time
  • The standard deviation (std_dev) of avg_over_time

           | min | max | avg     | std_dev of last value | std_dev of avg
Raw data   | 0   | 55  | 23.3369 | 0.79372               | 0.04318
Aggregate  | 0   | 55  | 23.3369 | 0.13253               | 0.04318
Last value | 0   | 41  | 18.5261 | 0.79372               | 0.46364

The aggregate sampling method preserves accuracy for most results, such as min, max, and average, because it uses pre-aggregated data from the buckets. This data accurately reflects the raw data for all metrics except the standard deviation of the last value.

Since the pre-aggregated data does not include the raw last value, the system defaults to using the average value for the bucket to calculate the last value over time. This substitution reduces the overall variance and, consequently, the standard deviation compared to the raw data.

Conversely, the calculation based on the last value alone loses accuracy for min, max, and average because it omits certain data points. However, this method maintains the accuracy of the standard deviation of the last value query, as both rely on the same single last value within the bucket.

Counters

For counters, we will focus solely on the rate, as it is the most practical aggregation. Specifically, we will query the minimum and maximum rate observed across the time series.

           | Min rate     | Max rate
Raw        | 332214509.24 | 381462158.55
Aggregate  | 322801874.77 | 370653936.12
Last value | 11.78        | 370653936.71

The maximum rate observed closely matches that of the raw data for both sampling methods. The minimum rate, however, differs significantly: the last value method loses all reset information and therefore calculates a far lower value.

Choosing the right downsampling method

As we demonstrated in the previous sections, downsampling can substantially reduce data size and improve query performance, regardless of the sampling method used. The default sampling method is still the aggregate method, so accuracy is preserved by default. However, if your primary concern is reducing resource consumption during the downsampling process and in the resulting data, you now have the option to accept a slight decrease in accuracy for a more resource-efficient outcome.

What's next for Elasticsearch downsampling?

Three areas are in active development: sparse data performance, layered downsampling, and lifting the read-only index requirement.

Performance improvements for sparse data

Downsampling has been optimised keeping in mind that all the documents of a time series data stream will have the same fields. For example, in a data stream containing node metrics, we expected that all nodes would be defined by the same set of dimensions and all documents would contain the same metrics about these nodes.

However, in practice we see that this is not always the case. Quite often a time series data stream contains documents that represent measurements defined by different sets of dimensions and containing different metrics. For example, the same data stream could hold Kubernetes metrics alongside application metrics.

Recent improvements reflect this: since 9.3, time series indices use doc values skippers, a form of sparse index, instead of inverted indices and BKD trees, which makes them very efficient in this case. Following this direction, we will focus on improving the downsampling algorithm to exploit the optimisation opportunities that this type of data provides.

The future of downsampling

We believe the improvements described above enable adoption of downsampling for most metrics applications. Three constraints still shape how teams use it:

  • downsampling depends on ILM or data stream lifecycle for automation,
  • requires the source index to be read-only,
  • and replaces the raw data once complete.

We plan to address all three with a more flexible downsampling solution built around multiple downsampling layers.

Layers will lift the requirement to delete the original data, so you can start downsampling much earlier and query the pre-aggregated data even for the most recent information.

Multiple layers will also let you downsample at different granularities and manage each layer separately. For example, you could move the original data to the frozen tier sooner and keep only the pre-aggregated data in the hot tier, balancing cost and query performance per layer.

We expect to share design details and an early preview of the layered model in an upcoming post.

Frequently asked questions

How do I reduce Elasticsearch storage costs for time series metrics?

Use Elasticsearch downsampling, available since 8.7. On our OTel benchmark it cuts time series storage by ~94% (from 176 MB to 11 MB) by replacing individual metric documents with aggregated buckets. As of 9.4 you can choose between two sampling methods to trade storage for accuracy.

Does downsampling break my ES|QL queries or dashboards?

No. As of Elasticsearch 9.4, downsampled gauges are fully supported in ES|QL by substituting the per-bucket average for the raw value. Any ES|QL dashboard built on raw time series data runs unchanged on downsampled data, with the expected accuracy loss from using interval averages.

What is the difference between last-value and aggregate downsampling in Elasticsearch?

Last-value sampling keeps a single value per bucket (the most recent observation), giving maximum storage savings and faster downsampling. Aggregate sampling stores min, max, sum, value_count, and preserves counter resets, giving better query accuracy at slightly higher storage cost. In Elasticsearch 9.4 both methods are available for every metric type.

How does Elasticsearch handle counter resets when downsampling?

The aggregate sampling method preserves counter resets by storing auxiliary documents alongside the main downsampled bucket: the pre-reset value, and optionally the post-reset value if the next bucket's first value isn't already lower. This keeps rate calculations more accurate even when a counter resets mid-bucket.

When should I pick last-value over aggregate downsampling?

Choose last-value when storage and downsampling speed matter more than precise aggregations, for example long-term trend monitoring where individual bucket statistics don't need to be exact. Choose aggregate when min, max, average, or rate calculations need to closely match the raw data, especially for counters with frequent resets.

Can Elasticsearch downsample OpenTelemetry histogram metrics?

Yes, as of 9.4. The aggregate method supports exponential histograms merging into a single representative histogram, and tdigest field histograms merging using the TDigest algorithm. For the older histogram field type built with High Dynamic Range (HDR), use the last-value method instead, because the aggregate method cannot merge HDR histograms.

Can I switch downsampling methods on an existing time series data stream?

Yes. With data stream lifecycle, the new method is applied to original indices and the existing method continues on already-downsampled indices. With ILM, if your policy has more than one downsampling action, create a new policy with the new sampling method and update the relevant index templates instead of editing the original policy.
