How to

Improving the performance of high-cardinality terms aggregations

Introduction

An Elasticsearch terms aggregation is used to create buckets corresponding to unique values of a given field. For example, a terms aggregation on a field containing country names would create a bucket for USA, a bucket for Canada, a bucket for Spain, and so on. Under normal circumstances, terms aggregations are very fast, however in some exceptional cases they may be slow. One reason for slow terms aggregations may be a mis-configured cluster. Another reason for poor performance may be high-cardinality values on the field which a terms aggregation is executed on.

In this blog post, I’ll first give a brief overview of general instructions that should be followed to ensure the best performance of an Elasticsearch cluster. This is then followed by several sections that present background material that will help to understand the underlying mechanics of terms aggregations, including (1) a definition of high cardinality, (2) a description of the refresh interval, and (3) a description of global ordinals. Next, I’ll show how to view the impact of building global ordinals on terms aggregation performance. Finally, I’ll present several techniques to improve the performance of high-cardinality terms aggregations, including (1) time-based indices, (2) eager global ordinals, and (3) techniques to prevent Elasticsearch from building global ordinals.

In one instance, the techniques documented in this blog post were able to reduce the execution time of a high-cardinality terms aggregation that was running at a very large retail bank, from 15 seconds to below 15 milliseconds.

General suggestions

Tuning a cluster can have a large impact on overall cluster performance. The following Elasticsearch documentation provides details on configuring and tuning an Elasticsearch cluster, and should be followed:

Additional information on tuning slow queries can be found in this blog about advanced Elasticsearch tuning.

In addition to the above, having too many shards is a common cause of performance problems. This blog about sharding gives good rules of thumb to follow.

The remainder of this blog focuses specifically on understanding and tuning terms aggregations.

Cardinality

The performance of terms aggregations can be greatly impacted by the cardinality of the field that is being aggregated. Cardinality refers to the uniqueness of values stored in a particular field. High cardinality means that a field contains a large percentage of unique values. Low cardinality means that a field contains a lot of repeated values. For example, a field storing country names will be relatively low cardinality since there are less than two hundred countries in the world. Alternatively, a field storing IBAN numbers or email addresses is high cardinality since there may be millions of unique values stored.

When discussing high cardinality in this blog post, we are referring to fields with hundreds of thousands or millions of unique values.

Elasticsearch refresh interval

In order to understand the remainder of this blog, we must have a general understanding of the refresh interval.

As documents are inserted into Elasticsearch, they are written into a buffer and then periodically flushed from that buffer into segments. This flush operation is known as a refresh, and newly inserted documents are only searchable after a refresh. By default refreshes occur every second, however the refresh interval is configurable.

The refresh interval is relevant to performance because in the background, Elasticsearch merges small segments into larger segments, and those larger segments are merged into even larger segments, and so on. Therefore, by enabling frequent refreshes, Elasticsearch needs to do more background work merging small segments than it would need to do with less frequent refreshes which would create larger segments.

While frequent refreshes are necessary if near real-time search functionality is required for newly inserted data, such frequent refreshes may not be necessary in other use cases. If an application can wait longer for recent data to appear in its search results, then the refresh interval may be increased in order to improve the efficiency of data ingestion, which in turn should free up resources to help overall cluster performance.

Global Ordinals

Terms aggregations rely on global ordinals to improve efficiency. Global ordinals is a data-structure that maintains an incremental numbering for each unique term for a given field. Global ordinals are computed on each shard, and by default subsequent terms aggregations will rely purely on those global ordinals to efficiently perform the aggregation at the shard level.  Global ordinals are then converted  to the real term for the final reduce phase, which combines results from different shards. If a shard is modified, then new global ordinals will need to be calculated for that shard.

The performance of terms aggregations on high-cardinality fields may be slow and unpredictable in-part because (1) the time taken to build global ordinals will increase as the cardinality of the field increases, and (2) by default, global ordinals are lazily built on the first aggregation that occurs since the previous refresh. Furthermore, a combination of frequent document insertions, frequent refreshes, and frequently executed terms aggregations would cause global ordinals to be frequently recomputed. 

Additional details on global ordinal performance can be found in this GitHub issue.

How to view the impact of global ordinals on terms aggregations

Elasticsearch log files are very helpful for detecting performance issues. Set log levels to appropriate values, keeping in mind that excessive logging may increase disk IO and could negatively impact performance. Monitor the log file called elasticsearch_index_search_slowlog.log. If slow terms aggregations appear in the slowlog file, then their poor performance may be due to the building of global ordinals, which can be checked as follows:

  • Continue to insert data into Elasticsearch to ensure that global ordinals will be rebuilt when we execute a terms aggregation. Copy the slow terms aggregation from the slowlog file and manually execute and profile it, to get a feeling for where execution time is spent.
  • Continue to insert data into Elasticsearch. Copy the terms aggregation from the slowlog file and manually execute it. While the terms aggregation is executing, simultaneously execute the hot_threads api. If hot threads generally returns results that include references to GlobalOrdinalsBuilder, then the code may be spending significant time building global ordinals.
  • Temporarily enable logging of global ordinals information by executing the following command:

    PUT _cluster/settings
    {
        "transient": {
            "logger.org.elasticsearch.index.fielddata": "TRACE"
        }
    }
        
    This will write information about time spent building global ordinals to the elasticsearch.log file, such as the following:

    global-ordinals [<field_name>][1014089] took [592.3ms]
        
    Be sure to set the logging back to the default value after completing the above, as performance may be impacted while logging is set to TRACE.
  • The size of the global ordinals data structure for a given field can be seen by viewing memory_size_in_bytes when executing the indices stats command as follows:

    GET <index_name>/_stats/fielddata?fielddata_fields=<field_name>
        

The above commands should give an idea if building global ordinals is consuming significant resources. The remainder of this blog focuses on steps to mitigate the impact of building global ordinals on terms aggregations.

Use time-based indices

Global ordinals only need to be re-created on a shard if that shard has been modified since the last computation of its global ordinals. If a shard is unmodified since the last computation of its global ordinals, then previously calculated global ordinals will continue to be used. For time-series data, implementing time-based indices is a good way to ensure that the majority of indices/shards remain unmodified, which will reduce the size of the global ordinals that need to be recomputed after a refresh operation.

For example, if two years of data is stored in monthly indices instead of in one large index, then each monthly index is 1/24th the size that one large index would be. Since we are considering time-series data, we know that only the most recent monthly index will have new documents inserted. This means that only one of the 24 indices is actively written into. Since global ordinals are only rebuilt on shards that have been modified, the shards in 23 of the 24 monthly indices will continue to use previously computed global ordinals. This would reduce the work required to build global ordinals by a factor of up to 24 times when compared to storing two years worth of data in one large index.

Enable eager global ordinals

The performance of high-cardinality terms aggregations can be improved by eager building of global ordinals. Enabling eager building of global ordinals will create the global ordinals data structure when segments are refreshed, as opposed to the first query after each refresh. However, the trade-off is that eager building of global ordinals will potentially negatively impact ingest performance because new global ordinals will be computed on every refresh, even if they might not be used. To minimize the additional workload caused by frequently building global ordinals due to frequent refreshes, the refresh interval should be increased.

Do not build global ordinals

Another approach to improving the performance of high-cardinality terms aggregations is to avoid the building of global ordinals entirely, and instead execute terms aggregations directly on the raw terms. This can be beneficial because computing global ordinals on a high-cardinality field may be slow, and not building global ordinals will eliminate this delay. This comes at the expense of making each subsequent terms aggregation less efficient, as they cannot leverage global ordinals. 

Additionally, be aware that if global ordinals are not built for a terms aggregation, the terms aggregation will use more memory per request because Elasticsearch will need to keep a map of all of the unique terms that appear in the result set. This could potentially trigger an internal circuit breaker to prevent memory overuse if a large number of unique terms are to be aggregated, which could result in a failed aggregation.

Therefore, this approach should only be applied if a terms aggregation is expected to execute on a relatively small number of documents. For example, this would likely be the case if the aggregation is defined along with a selective bool query executing in filter context. Run an experiment with real data to determine if this approach improves the performance of a specific use case.

In Elasticsearch 6.7 or newer this can be accomplished by specifying "execution_hint": "map" which tells Elasticsearch to aggregate field values directly without leveraging global ordinals, or in older versions this can be accomplished by using the alternate technique of executing a script inside a terms aggregation, which is described below.

By default a terms aggregation will return buckets for the top ten terms ordered by the number of documents in each bucket. Below is an example terms aggregation for top IBAN identifiers. By default this terms aggregation will rebuild global ordinals on the first execution after the last refresh.

"aggregations": {
    "top-ibans": {
        "terms": {
            "field": "IBAN_keyword"
        }
    }
}

This could be re-written to avoid using global ordinals in Elasticsearch version 6.7 or newer as follows.

"aggregations": {
    "top-ibans": {
        "terms": {
            "field": "IBAN_keyword",
            "execution_hint": "map"
        }
    }
}

In versions of Elasticsearch prior to 6.7, there was a bug that caused global ordinals to be computed even if "execution_hint": "map" was specified. This was fixed in this github pull request. For older versions, the following technique can be applied to prevent global ordinals from being built for the above terms aggregation. 

"aggregations" : {
    "top-ibans" : {
         "terms" : {
             "script": {
                 "source" : "doc['IBAN_keyword'].value",
                 "lang" : "painless"
             }
         }
     }
}

Conclusion

In this blog post I first presented a brief overview of documentation that should be followed to ensure the best overall performance of an Elasticsearch cluster. This was followed by a deep dive into terms aggregations including an overview of how they work and several options to improve their performance.

If you have any questions about terms aggregations, Elasticsearch performance, or any other Elasticsearch-related topics, have a look at our Discuss forums for valuable discussion, insights, and information. Also, don't forget to try out our Elasticsearch Service, the only hosted Elasticsearch and Kibana offering powered by the creators of Elasticsearch.