January 21, 2015

Understanding the Memory Pressure Indicator

By Konrad Beiske

UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.

We all know memory is critical to Elasticsearch, but when should you add more? In the Found console we've included a memory pressure indicator so that you can easily check where your cluster is at in terms of memory usage and capacity. For those familiar with how the JVM garbage collector works: "The indicator uses the fill percentage of the old generation pool". In this article I will cover the basics of the old pool and why we chose that as the indicator.

Introduction

If you monitor the total memory used on the JVM you will typically see a sawtooth pattern where the memory usage steadily increases and then drops suddenly.

Sawtooth

The reason for this sawtooth pattern is that the JVM continuously needs to allocate memory on the heap as new objects are created as part of normal program execution. Most of these objects are, however, short-lived and quickly become available for collection by the garbage collector. When the garbage collector finishes you’ll see a drop on the memory usage graph. This constant state of flux makes the current total memory usage a poor indicator of memory pressure.

The JVM garbage collector is designed to take advantage of the fact that most objects are short-lived. There are separate pools in the heap for new objects and old objects, and these pools are garbage collected separately. After a collection of the new objects pool, surviving objects are moved to the old objects pool, which is garbage collected less frequently because it is less likely to contain any substantial amount of garbage. If you monitor each of these pools separately, you will see the same sawtooth pattern in both, but the old pool is fairly steady while the new pool frequently moves between full and empty. This is why we have based our memory pressure indicator on the fill rate of the old pool.

Eden (new objects) and Oldgen (objects having survived collection in Eden)
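
As a rough illustration of the idea, the fill percentage of the old pool can be computed from the used and maximum sizes of that pool. The sketch below assumes a dictionary shaped like the `jvm.mem.pools` section that the Elasticsearch node stats API reports for a node; the function name and the sample numbers are hypothetical, and this is not necessarily how the Found console computes its indicator.

```python
def old_pool_fill_percent(node_stats: dict) -> float:
    """Return the fill percentage of the old generation pool for one node.

    Assumes the dict mirrors the jvm.mem.pools section of a node stats
    response (used_in_bytes / max_in_bytes per pool).
    """
    old = node_stats["jvm"]["mem"]["pools"]["old"]
    used = old["used_in_bytes"]
    maximum = old["max_in_bytes"]
    if maximum <= 0:
        raise ValueError("old pool maximum size is not reported")
    return 100.0 * used / maximum


# Hypothetical sample values, for illustration only.
MIB = 1024 ** 2
sample_node = {
    "jvm": {
        "mem": {
            "pools": {
                "young": {"used_in_bytes": 200 * MIB, "max_in_bytes": 256 * MIB},
                "old": {"used_in_bytes": 600 * MIB, "max_in_bytes": 768 * MIB},
            }
        }
    }
}

print(f"old pool fill: {old_pool_fill_percent(sample_node):.1f}%")
# prints "old pool fill: 78.1%"
```

Note that the young pool in the sample is nearly full as well, but that number is ignored on purpose: it would drop to almost zero after the next young collection.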

So when do you need to upgrade your cluster? Well, this of course depends on your performance and stability needs, and hence the decision is yours, but I’ll provide you with a guideline below.

Green is Good

Anything below 75% - the green range of the indicator - is good and there is no need to upgrade unless you expect an increase in requests. In fact, there might be a substantial amount of garbage in the old pool, but there is no way to tell how much until the garbage collector starts, and that does not happen until the indicator reaches 75%. In other words, when the indicator reports anything up to 75%, the amount of live data in the old pool might actually be less.

Above 75% you should start paying closer attention to what is going on, both in terms of indexing requests and data stored. Consider whether there are appropriate optimizations that you can make. Check your mappings, your shard count and your cache utilization. If for some reason Elasticsearch seems to use more heap than you have data, you should probably check your mappings and your sharding strategy. More on that in Sizing Elasticsearch and Six ways to crash Elasticsearch.

Heavy Memory Pressure Consumes CPU

If the old pool is still above 75% after the collector finishes, the Java virtual machine will schedule a new collection, expecting to finish just before the pool runs out of memory. This means that a higher fill rate in the old pool will result in more frequent collections, and thus more CPU will be spent on garbage collection as the fill rate approaches 100%. This is the behaviour of the Concurrent Mark Sweep (CMS) collector that Elasticsearch uses.
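
One way to observe this effect is to sample the old generation collector counters twice and look at how much collection time accumulated in between. The sketch below assumes dictionaries shaped like the `jvm.gc.collectors` section of the Elasticsearch node stats API; the function name and the sample numbers are hypothetical, and the result is only a rough proxy for the CPU actually spent, since CMS does much of its work concurrently with the application.

```python
def old_gc_time_fraction(before: dict, after: dict, interval_millis: int) -> float:
    """Fraction of the sampling interval attributed to old generation collections."""
    if interval_millis <= 0:
        raise ValueError("interval_millis must be positive")
    old_before = before["jvm"]["gc"]["collectors"]["old"]
    old_after = after["jvm"]["gc"]["collectors"]["old"]
    spent = (old_after["collection_time_in_millis"]
             - old_before["collection_time_in_millis"])
    return spent / interval_millis


def make_sample(count: int, millis: int) -> dict:
    """Build a hypothetical stats sample holding only the old collector counters."""
    return {"jvm": {"gc": {"collectors": {"old": {
        "collection_count": count,
        "collection_time_in_millis": millis,
    }}}}}


# Two hypothetical samples taken 60 seconds apart.
sample_before = make_sample(count=40, millis=12_000)
sample_after = make_sample(count=46, millis=15_000)

fraction = old_gc_time_fraction(sample_before, sample_after, interval_millis=60_000)
print(f"time in old collections: {fraction:.1%}")
# prints "time in old collections: 5.0%"
```

A fraction that keeps climbing between samples while the old pool stays above 75% is a hint that the collector is running back to back.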

Is High Memory Pressure Always a Problem?

If you are happy with your current search performance, then a fill rate of around 80% is no big cause for concern, but be aware that you do not have much headroom in case of spikes. The likelihood of a spike depends a lot on how much control you have over the data flow in and out of your cluster. Common sources are:

  • A burst in new documents for indexing - If you use bulk indexing, do you have a cap on the bulk size?
  • A burst in searches - Are searches triggered by users on the internet? Do you have good cache utilization?
  • Ad hoc analytics - Trying out some new aggregations or selecting a wide time range in Kibana can consume surprisingly large amounts of memory.
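
For the first point, a cap on bulk size can be enforced on the client side before anything is sent to the cluster. The following is a minimal sketch, assuming documents arrive as already serialized JSON strings; the function name and both limits are hypothetical, and the right values depend on your documents and your cluster.

```python
from typing import Iterable, Iterator, List


def capped_batches(docs: Iterable[str], max_docs: int, max_bytes: int) -> Iterator[List[str]]:
    """Group serialized documents into batches capped by count and by total bytes.

    A single document larger than max_bytes is still emitted, alone in its batch.
    """
    if max_docs < 1 or max_bytes < 1:
        raise ValueError("max_docs and max_bytes must be at least 1")
    batch: List[str] = []
    batch_bytes = 0
    for doc in docs:
        size = len(doc.encode("utf-8"))
        # Close the current batch if adding this document would exceed a cap.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


# Five small hypothetical documents, capped at two documents per batch.
docs = ['{"n": %d}' % i for i in range(5)]
for batch in capped_batches(docs, max_docs=2, max_bytes=1_000_000):
    print(len(batch), "documents")
```

Capping by bytes as well as by count matters because a handful of unusually large documents can otherwise produce a request much bigger than expected.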

High memory usage does not necessarily mean that your cluster cannot accept any more documents either. This depends on the memory profile of your searches. Does the memory requirement of a search grow with index size? If you use field data it will, but then doc values can be your saviour.

Approaching 100%

If for some reason the Concurrent Mark Sweep collector was not started or did not finish in time to free up enough memory for a needed allocation, we get an allocation failure. An allocation failure means that there was not enough free memory at the time it was requested and execution has to wait for the garbage collector. When this happens the JVM gives up on using the CMS collector and triggers the old-style stop-the-world collector. That collector is actually faster, but at the cost of blocking everything. All threads on the JVM will be halted until the collector finishes. This blocks Elasticsearch from receiving and sending network traffic as well. If the garbage collector takes too long, other nodes might think the node is down and the stability of your cluster could be at risk.

If the collector is able to free up enough memory for the allocation, the node continues from exactly where it left off. Before it learns how much time passed during the pause and what the rest of the cluster did in the meantime, the node might send off a few messages based on outdated information. The resilience of Elasticsearch has improved a lot in the latest versions, but concurrency and distribution are not only difficult to program with, they can be a challenge to test properly as well. In my experience, long garbage collection pauses are a recurring source of the more obscure concurrency bugs.

Running out of Memory

If the collector is not able to free up enough memory, then an allocation error is thrown and whichever thread tried to allocate memory is killed. The problem is that the heap is a shared resource among all threads and thus there is no guarantee that the thread causing the spike is the one receiving the allocation error. Most of the threads in Elasticsearch belong to pools that are able to recreate them. But it is very hard to know for sure that an instance is in a healthy state after an out of memory error has occurred. It’s for this reason that we at Found automatically reboot instances after out of memory errors. When that happens you will receive an email notification. For repeated alerts the emails are throttled and aggregated. We don’t want to overload your inbox and we certainly don’t want spam filters to start filtering out these alerts.

Conclusion

Our recommendation is the following:

  • Below 75%: No worries.
  • 75% to 85%: Acceptable if performance is satisfactory and load is expected to be stable.
  • Above 85%: Now is the time for action. Either reduce memory consumption or add more memory.
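
The guideline above can be written down as a small function, for example to drive an alert from your own monitoring. This is only a sketch: the function name and the returned strings are hypothetical, and treating exactly 75% and exactly 85% as part of the middle band is an assumption, since the list does not say which band the boundary values belong to.

```python
def memory_pressure_advice(fill_percent: float) -> str:
    """Map an old pool fill percentage to the recommendation bands."""
    if not 0.0 <= fill_percent <= 100.0:
        raise ValueError("fill_percent must be between 0 and 100")
    if fill_percent < 75.0:
        return "No worries."
    if fill_percent <= 85.0:
        # Assumption: the boundary values 75 and 85 fall in the middle band.
        return "Acceptable if performance is satisfactory and load is expected to be stable."
    return "Time for action: reduce memory consumption or add more memory."


for value in (60.0, 80.0, 92.5):
    print(f"{value:5.1f}% -> {memory_pressure_advice(value)}")
```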

The 75% limit matches the CMSInitiatingOccupancyFraction setting that Elasticsearch uses. The 85% limit is based on our experience. There are no hard limits between 75% and 100%, but the performance will gradually worsen as the fill rate increases until you either run out of memory or the cluster falls apart due to long garbage collection pauses.