Achieve faster cardinality aggregations via dynamic pruning

July 27, 2023

Elasticsearch 8.9 introduces a speedup to cardinality aggregations through support for dynamic pruning. This optimization needs specific conditions to be met to kick in, but when it does it often yields spectacular results. We have observed some cardinality aggregations that ran by as much as 1,000x faster with this change.

For instance, computing the number of unique Kubernetes deployments monitored by the Elastic Kubernetes integration benefits from this optimization:

POST metrics-*/_search
{
  "query": { // giving an example query, but any query would work
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-1d/h" } } },
        { "match": { "data_stream.dataset": "kubernetes.pod" } }
      ]
    }
  },
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "deployment_count": {
      "cardinality": {
        "field": "kubernetes.deployment.name"
      }
    }
  }
}

How does it work?

Dynamic pruning is the process of using index structures to dynamically reduce the set of matches that need to be evaluated when running a query. For instance, if you query for the top 10 events sorted by descending timestamp, start evaluating matches, and find 10 hits whose timestamp is in the last hour, then you can dynamically introduce a filter on the timestamp field to ignore events that are more than one hour old: they have no chance of making it to the top-10.

The optimization to the cardinality aggregation follows a similar idea: once you have seen a value, there is no point in looking at that value again later on since it will not influence the count of unique values of the field. So during query evaluation, the cardinality integration automatically introduces a filter on a disjunctive query that only matches values that haven't been seen so far. As documents with new values get collected, these values get removed from the disjunction.

For instance, imagine that you are computing the cardinality of a field that has two unique values: a and b. The following table lists all matches from the query, with the Lucene doc IDs that match the query in the first column, and the value that is associated with this doc ID in the second column.

Doc ID	Value
3	b
10	b
12	a
19	b
30	a

When it starts evaluating the query, Elasticsearch implicitly adds a filter on a OR b to the main query. After seeing the first match, doc ID 3, value b doesn't need to be seen another time, so the filter will be mutated into a more selective filter that only matches value a. This helps save evaluating doc ID 10, since it has b as a value as well, and jumps directly to the next document that has a as a value: doc ID 12. At this point, a gets removed from the filter and Elasticsearch knows that there is no point in evaluating more matches, since it has already seen all unique values of the field. This helped save evaluating doc IDs 19 and 30.

The first phase of this optimization when a filter is dynamically introduced already helps significantly reduce the number of documents that the query needs to evaluate, which in turn speeds up query evaluation. But this second phase when the query exits when it has seen all unique values is what triggers the most spectacular speedups, as it could help skip the majority of documents of the index. Note that this second phase will not always happen depending on the query — it is possible that some values only exist in documents that do not match the query.

When does it kick in?

Disjunctive queries don't scale well with the number of clauses, so the main limitation of this optimization is that it can only work on fields that have a relatively small cardinality. As a consequence, Elasticsearch only enables this optimization on segments that have no more than 1,024 unique values.

Furthermore, this optimization is only supported on keyword fields to take advantage of the fact that they are indexed with an inverted index and that their doc values give us the number of unique values per segment.

Finally, the cardinality aggregation must be the only aggregation and be at the top level of the aggregation tree.

Conclusion

This optimization was evaluated against dashboards of the Elastic Kubernetes integration where it triggered noticeable speedups to dashboard loading times, especially when working with large amounts of data. In particular, the example query that was shared in the introduction of this blog got a 90% reduction in latency. We hope you'll enjoy the speedup!

What else is new in Elastic 8.9? Check out the 8.9 announcement post to learn more.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

Context engineering

Vector database

Search powered applications

Logs

Threat protection

Workflows

Elasticsearch

Kibana (Discover, Dashboards)

Elastic Agent Builder

AutoOps

Piped query language

Jina AI search models

Elastic Cloud Serverless

Elastic Cloud Hosted

Self-managed Elasticsearch

Ecommerce search

Customer support search

Search-driven apps

Log analytics

Infrastructure monitoring

Digital experience monitoring

App performance monitoring

AIOps

LLM observability

Next-gen SIEM

Workflows for security

XDR and endpoint security

AI for security

10x your data's value

Cloud providers

Elastic AI Ecosystem

Search AI Partner Program

AV-Comparatives

Forrester Wave™ XDR

Gartner Magic Quadrant Leader

IDC MarketScape

Search

Security

Observability

Get started

Demo gallery

Downloads

Integrations

Docs

Elasticsearch Labs

Elastic Security Labs

Elastic Observability Labs

Blog

Community

Events

Webinars

Discuss

Training

Support

Consulting

Achieve faster cardinality aggregations via dynamic pruning

How does it work?

When does it kick in?

Conclusion

Share

Sign up for Elastic Cloud free trial