July 24, 2015

Support in the Wild: My Biggest Elasticsearch Problem at Scale

As a Support Engineer at Elastic, I come across a lot of different issues from our customers, ranging from development questions surrounding Kibana to helping a user to understand why their Elasticsearch cluster had issues. In this blog post, I want to describe the number one problem that I run into with users running Elasticsearch as they begin to scale their workloads. Without fail, when a user approaches me with issues running Elasticsearch, whether it is a customer or not, I point them to this issue.

Java Heap Pressure

Elasticsearch has so many wildly different use cases that I could not write a reasonably short blog post describing what can and cannot consume memory. However, there is one thing that constantly stands out above all of the other concerns that you might have while running an Elasticsearch cluster at scale.

For the users that I help, fielddata is the problem that is the most likely to cause their cluster's instability. Fielddata is the bane of my existence and it's the most frequent cause of the highest severity issues that I handle with our customers.

Understanding Fielddata

The inverted index is the magic that makes Elasticsearch queries so fast. This data structure holds a sorted list of all the unique terms that appear in a field, and each term points to the list of documents that contain that term:

Term:    Docs: 1  2  3  4  5
----------------------------
brown          X     X  X 
fox                     X  X
quick             X     X
----------------------------

Search asks the question: What documents contain term brown in the foo field? The inverted index is the perfect structure to answer this question: look up the term of interest in the sorted list and you immediately know which documents match your query.

Sorting or aggregations, however, need to be able to answer this question: What terms does Document 1 contain in the foo field? To answer this, we need a data structure that is the opposite of the inverted index:

Docs:   Terms:
----------------------------
1       [ brown ]
2       [ quick ]
3       [ brown ]
4       [ brown, fox, quick ]
5       [ fox ]
----------------------------

This is the purpose of fielddata. Fielddata can be generated at query time by reading the inverted index, inverting the term <-> doc data structure, and storing the results in memory. The two major downsides of this approach should be obvious:

- Loading fielddata can be slow, especially with big segments.
- It consumes a lot of valuable heap space.
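
To make the "inverting" step concrete, here is a minimal Python sketch that rebuilds the document-to-terms table from the toy inverted index shown earlier. The dictionary and the `uninvert` function are purely illustrative assumptions; this is not how Elasticsearch or Lucene implement fielddata internally.

```python
from collections import defaultdict

# Toy inverted index mirroring the tables above: term -> documents containing it.
inverted_index = {
    "brown": [1, 3, 4],
    "fox": [4, 5],
    "quick": [2, 4],
}


def uninvert(index):
    """Build a doc -> terms mapping from a term -> docs mapping."""
    doc_terms = defaultdict(list)
    for term in sorted(index):  # visit terms in sorted order
        for doc_id in index[term]:
            doc_terms[doc_id].append(term)
    return dict(doc_terms)


fielddata = uninvert(inverted_index)
for doc_id in sorted(fielddata):
    print(doc_id, fielddata[doc_id])
```

Even in this toy version, every term and every posting has to be visited, and the result has to be held in memory, which hints at why doing this at query time on large segments is both slow and heap hungry.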

Because loading fielddata is costly, we try to do it as seldom as possible. Once loaded, we keep it in memory for as long as possible.

By default, fielddata is loaded on demand, which means that you will not see it until you are using it. Also, because fielddata is loaded per segment, newly created segments slowly add to your overall memory usage until the field's fielddata is evicted from memory. Eviction happens in only a few ways:

1. Deleting the index or indices that contain it.
2. Closing the index or indices that contain it.
3. Segment fielddata is removed when segments are removed (e.g., background merging).
   - This usually just means that the problem is moving rather than going away.
4. Restarting the node containing the fielddata.
5. Clearing the relevant fielddata cache.
6. Automatically evicting the fielddata to make room for other fielddata.
   - This will not happen with default settings.

While the first two ways will cause the memory to be evicted, they're not useful in terms of solving the problem because they make the index unusable. Segment merging is happening in the background and it is not a way to clear fielddata. The fourth and fifth ways are unlikely to be a long term solution because they do not prevent fielddata from being reloaded.

The sixth option, evicting fielddata when the cache is full, leads to different issues: one request triggers fielddata loading for one field and the next request triggers loading for another, causing the first field to be evicted. This causes memory thrashing and slow garbage collections, and your users suffer from very slow queries while they wait for their fielddata to be loaded.
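
The thrashing pattern can be illustrated with a toy simulation. The `BoundedFielddataCache` class below is a hypothetical, simplified least-recently-used cache with made-up sizes; it is only a sketch of the behavior described above, not Elasticsearch's actual cache.

```python
from collections import OrderedDict


class BoundedFielddataCache:
    """Toy LRU cache: evicts the least recently used field when over capacity."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.loaded = OrderedDict()  # field name -> size in GB
        self.loads = 0  # number of (expensive) fielddata loads

    def access(self, field, size_gb):
        if field in self.loaded:
            self.loaded.move_to_end(field)  # cache hit: mark as recently used
            return
        self.loads += 1  # cache miss: fielddata must be rebuilt
        self.loaded[field] = size_gb
        # Evict least recently used fields until we fit again.
        while sum(self.loaded.values()) > self.capacity_gb and len(self.loaded) > 1:
            self.loaded.popitem(last=False)


cache = BoundedFielddataCache(capacity_gb=4)
# Two fields of 3 GB each cannot both fit in a 4 GB cache,
# so alternating requests keep evicting each other's fielddata.
for _ in range(3):
    cache.access("number", 3)
    cache.access("@timestamp", 3)
print(cache.loads)
```

In this model, six requests cause six full loads: nothing is ever served from the cache, which is the situation your users experience as very slow queries.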

Simply put, once fielddata becomes a problem, then it stays a problem.

Why Fielddata is Bad

At small scales, you can generally get away with fielddata usage without even realizing that you are using it. In highly controlled environments, you may even enjoy that specific fields are being loaded into memory for theoretically faster access.

However, almost without fail, you are bound to run into a problem with it eventually. Whether it's because someone ran a test request on the production system without thinking that it would be a problem (it's just one query, right?), your queries changed to match new data, or you just finally reached a scale where it no longer works: you will eventually run into memory pressure that does not go away.

As I noted earlier, fielddata does not go away on its own. In Elasticsearch 1.3 and later, we allow up to 60% of your Java heap's memory to be consumed by fielddata per node. We control this via the Fielddata Circuit Breaker, which checks incoming requests for potential fielddata usage and then blocks them if they require more memory than is currently available. The purpose of any circuit breaker is to stop a bad request before it gets the chance to cause a problem (e.g., allocate even more fielddata), but it's important to note that it will not clear any existing fielddata.

For example, if a node has 10 GB of Java heap, then 60% of that is 6 GB. If a new request requires 1 GB of fielddata to be loaded on a node that is already using 4 GB of the heap for fielddata, then the circuit breaker will allow it because 4 GB + 1 GB = 5 GB, which is less than 6 GB. However, if the next request needed 2 GB for yet another field's fielddata, then the entire request would be rejected because it would exceed the fielddata limit (5 GB + 2 GB = 7 GB, which is clearly greater than 6 GB).
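
The arithmetic in this example can be written down as a few lines of Python. The `breaker_allows` function is a hypothetical helper that only models the comparison against the limit; the real circuit breaker works from estimates of the fielddata size, so treat this as a sketch of the idea rather than its implementation.

```python
def breaker_allows(heap_gb, used_gb, requested_gb, limit_fraction=0.60):
    """Return True if loading `requested_gb` more fielddata stays within the limit."""
    limit_gb = heap_gb * limit_fraction  # e.g., 60% of a 10 GB heap is 6 GB
    return used_gb + requested_gb <= limit_gb


# First request: 4 GB already loaded, 1 GB more is requested.
print(breaker_allows(heap_gb=10, used_gb=4, requested_gb=1))
# Second request: 5 GB now loaded, 2 GB more is requested.
print(breaker_allows(heap_gb=10, used_gb=5, requested_gb=2))
```

Note that a rejected request leaves `used_gb` unchanged: the 5 GB that is already loaded stays exactly where it is.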

Note: for versions prior to Elasticsearch 1.3, we allowed an unlimited amount of your Java heap to be consumed by fielddata.

Finding Your Fielddata

Fortunately, it's not all bad news. Not only do we have a solution to the problem, but we also provide a way to find and understand your problem with it.

$ curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'

This will provide a list of each node with its fielddata usage. For instance, at startup, my local node is using absolutely no fielddata:

id                     host            ip          node  total 
iExRFXn1Qw23iRzhwor-Wg Chriss-MBP.home 192.168.1.2 WallE    0b

To see it change, it's as simple as sorting, scripting, or aggregating any field. So let's do all three!

$ curl -XGET localhost:9200/test-index/_search -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc[\"percentage\"].value > 0.5"
        }
      }
    }
  },
  "aggs": {
    "max_number": {
      "max": {
        "field": "number"
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}'

Although order is irrelevant for this, the first field that will be impacted will be the percentage field that is accessed inside of the scripted filter. The second field used will be the number field from the aggregation. Finally, the last field is the @timestamp field used to sort the filtered results. Taking another look at the _cat/fielddata command above confirms this:

id                     host            ip          node   total number @timestamp percentage
iExRFXn1Qw23iRzhwor-Wg Chriss-MBP.home 192.168.1.2 WallE 49.9kb 24.8kb     24.8kb       192b
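
If you want to track this over time, the text output is easy to process. The following Python sketch parses output shaped like the sample above into per-field byte counts. The function names are my own, and it assumes the five leading columns shown here (id, host, ip, node, total), node names without spaces, and 1024-based size units.

```python
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def to_bytes(size):
    """Convert a size such as '24.8kb' or '192b' to a number of bytes."""
    # Check 'b' last, because 'kb', 'mb', etc. also end in 'b'.
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if size.endswith(suffix):
            return float(size[: -len(suffix)]) * UNITS[suffix]
    raise ValueError("unrecognized size: " + size)


def fielddata_by_field(cat_output):
    """Return {node: {field: bytes}} from `_cat/fielddata?v` text output."""
    lines = cat_output.strip().splitlines()
    fields = lines[0].split()[5:]  # columns after id, host, ip, node, total
    usage = {}
    for line in lines[1:]:
        columns = line.split()
        node = columns[3]
        usage[node] = dict(zip(fields, map(to_bytes, columns[5:])))
    return usage


sample = """
id                     host            ip          node   total number @timestamp percentage
iExRFXn1Qw23iRzhwor-Wg Chriss-MBP.home 192.168.1.2 WallE 49.9kb 24.8kb     24.8kb       192b
"""
usage = fielddata_by_field(sample)
for field, size in sorted(usage["WallE"].items(), key=lambda item: -item[1]):
    print(field, int(size))
```

Sorting by size puts the fields that are worth converting to doc values first.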

Use Doc Values

The solution to this fielddata problem is to avoid it altogether. Fortunately, you can avoid the use of fielddata by manually mapping all of your fields to use doc values. Without repeating too much from the guide, doc values offload this burden by writing the fielddata to disk at index time, thereby allowing Elasticsearch to load the values outside of your Java heap as they are needed.

By taking the burden out of your heap, you get fast access to the on-disk fielddata through the file system cache, which gives in-memory performance without the cost of garbage collections coming into play. This also frees up a lot of headroom for the Elasticsearch heap so that more operations (e.g., bulk indexing and concurrent searches) can use the heap without placing the node under memory pressure, which leads to garbage collection that will slow it down.

You might be asking: How do I enable these doc values? It's as simple as adding this to every field's mapping, except analyzed string fields:

"doc_values" : true

Unfortunately, you must do this before you index any data. This means that, for any existing index that is not already using doc values, you cannot flip the switch to enable them. You would have to reindex.
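
As a rough illustration, this Python sketch builds the body you might send when creating a new index with doc values enabled. The type name `my_type`, the `status` and `message` fields, and the field types chosen for `number` and `percentage` are assumptions for the sake of the example; adapt them to your own data.

```python
import json

# Hypothetical index-creation body using 1.x-style mappings.
index_body = {
    "mappings": {
        "my_type": {  # hypothetical type name
            "properties": {
                "number": {"type": "long", "doc_values": True},
                "percentage": {"type": "double", "doc_values": True},
                "@timestamp": {"type": "date", "doc_values": True},
                # not_analyzed strings can use doc values ...
                "status": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": True,
                },
                # ... but analyzed strings cannot, so none are requested here.
                "message": {"type": "string"},
            }
        }
    }
}

print(json.dumps(index_body, indent=2))
```

The printed JSON is what you would pass as the request body when creating the index, before any documents are indexed into it.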

Updating an Active Cluster

Naturally, at this point, you may be wondering how to remove fielddata from your cluster. The answer depends on your data.

Time Based Indices

If you are using time based indices (e.g., logstash-2015.07.18), such as with logging use cases, then you should update your template(s) to use doc values so that future indices (e.g., tomorrow's) get created with doc values. From there, the problem should take care of itself as indices that use fielddata will age themselves out.
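
One way to sketch such a template is with dynamic templates that turn on doc values for common non-string types. The helper name, the template layout, and the list of types below are my own assumptions based on the 1.x template syntax; check them against the documentation for your version (and against any template that Logstash already manages for you) before using anything like this.

```python
import json


def doc_values_template(index_pattern, types=("long", "double", "date")):
    """Build an index template body that enables doc values for the given types."""
    dynamic_templates = [
        {
            field_type + "_doc_values": {
                "match_mapping_type": field_type,
                "mapping": {"type": field_type, "doc_values": True},
            }
        }
        for field_type in types
    ]
    return {
        "template": index_pattern,  # which new indices the template applies to
        "mappings": {"_default_": {"dynamic_templates": dynamic_templates}},
    }


body = doc_values_template("logstash-*")
print(json.dumps(body, indent=2))
```

Because templates only apply when an index is created, existing indices are left untouched; only tomorrow's index and later ones pick up the new mapping.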

If you maintain indices for significant periods of time (e.g., many months), then it may be worth looking into reindexing older indices to take advantage of doc values.

Other Indices

For non-time based indices (e.g., my-index), the problem gets more complicated because it depends entirely on how you use the indices. However, to avoid using fielddata, you will have to reindex the data. In Elasticsearch: The Definitive Guide, we discuss using aliases for this purpose as a mechanism for transitioning between two indices without impacting clients.
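
The alias approach boils down to one atomic request to the aliases API once the new index has been populated. This Python sketch only builds the request body; the index names `my-index_v1` and `my-index_v2` are hypothetical, and it assumes your clients already talk to the alias `my-index` rather than to a concrete index.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build an aliases request body that moves `alias` from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: v1 uses fielddata, v2 was reindexed with doc values.
swap = alias_swap_actions("my-index", "my-index_v1", "my-index_v2")
print(json.dumps(swap, indent=2))
```

Both actions are sent in a single request so that clients never see a moment where the alias points at nothing or at both indices.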

Potential Downsides to Doc Values

Unfortunately, nothing is perfect. From my perspective, the only important downside to doc values is that they cannot be used with analyzed strings, and by default, all strings are mapped as analyzed strings! That is the only exception to the use of doc values.

Looking back at the Understanding Fielddata section of this post, it is critical to think about why you might use fielddata with an analyzed string. For regular, unstructured search, you will not use any fielddata. With that in mind, the only time that you should catch yourself using fielddata for analyzed strings is with the significant terms aggregation. All other uses of fielddata should be avoided by using a not_analyzed version of the string because sorting, aggregating, and scripting against an analyzed string are not going to give you the results you expect.

We are currently discussing ways to simplify this issue with the release of Elasticsearch 2.0, such as issue #11901 and issue #12394, and it is worth taking a look at these issues to see some common pitfalls so that you can avoid them yourself in the meantime! The second issue shows a very good way to take advantage of both analyzed and not analyzed strings by using multifields, thereby allowing you to use each field when it is appropriate.
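
For reference, a multifield mapping of this kind might look like the following Python sketch. The field name `title` and the sub-field name `raw` are hypothetical choices of mine, not something taken from the linked issues.

```python
import json

# Analyzed for full-text search, with a not_analyzed sub-field for
# sorting and aggregating that can be backed by doc values.
title_mapping = {
    "title": {
        "type": "string",  # analyzed by default
        "fields": {
            "raw": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            }
        },
    }
}

# Search against the analyzed field, sort on the not_analyzed sub-field.
search_body = {
    "query": {"match": {"title": "quick brown fox"}},
    "sort": [{"title.raw": {"order": "asc"}}],
}

print(json.dumps(title_mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

With this layout, full-text queries never touch fielddata, and sorting or aggregating on `title.raw` reads doc values instead of loading fielddata onto the heap.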

There are other minor issues that I want to list for completeness:

- Prior to Elasticsearch 1.4, we did not actively push doc values because they did not get as much attention as they deserved. However, with Elasticsearch 1.4 and later, you should be using them.
  - As a support engineer, my honest opinion is that you should be running Elasticsearch 1.6 or later, so this is not an issue.
- Technically, doc values write extra data when documents are indexed.
  - In practice, this does not degrade performance at all. The overall benefits of doc values significantly outweigh the cost.
- Minor note: Facets do not use doc values, which means that Kibana 3, which relies on facets, does not use doc values.
  - You should upgrade to Kibana 4 to take advantage of doc values, as well as support for aggregations, which allows for more powerful visualizations and dashboards.
  - Facets were deprecated in Elasticsearch 1.0 and they were replaced with aggregations. They will be completely removed in Elasticsearch 2.0.

Fixing It For Good

We're not there yet, but Elasticsearch 2.0 is on the horizon and we expect to release the first beta very soon. I cannot wait for this release because it will do away with this issue for any new index by applying "doc_values" : true by default for every possible field (read: everything that is not an analyzed string!).

Elasticsearch 2.0 contains a lot of changes that I am excited to see. Changing doc values to default to true is just one major change that I am looking forward to, but continuing the analyzed string discussion from above, there are numerous changes that will simplify your deployment and your life (and mine too!). From my support perspective, such changes range from further improvements to shard recovery, to diff-based cluster state communication. Feel free to read about some of the other changes in the Elasticsearch 2.0.0.beta1 announcement post and keep an eye out for the release of beta1 in the very near future.

Hope That Helps

It's literally my job to help people to be successful running our products and I have seen a lot. This is truly the number one concern that I have with users running Elasticsearch at scale and I hope that you can catch it before it becomes a problem for you, or fix it now that you are aware of it!

Stay tuned to this blog for future announcements regarding Elasticsearch 2.0. And as always, feel free to tweet us @elastic, visit our forums, and check out our webinars such as Configuring Your Elasticsearch Cluster for Performance & Scale.