Engineering

Managing and troubleshooting Elasticsearch memory

Hiya! With Elastic’s expansion of our Elasticsearch Service Cloud offering and automated onboarding, we’ve expanded the Elastic Stack audience from full ops teams to data engineers, security teams, and consultants. As an Elastic support rep, I’ve enjoyed interacting with users from more backgrounds and with even wider use cases.

With a wider audience, I’m seeing more questions about managing resource allocation, in particular the mystical shard-heap ratio and avoiding circuit breakers. I get it! When I started with the Elastic Stack, I had the same questions. It was my first intro to managing Java heap and time series database shards, and scaling my own infrastructure.

When I joined the Elastic team, I loved that on top of documentation, we had blogs and tutorials so I could onboard quickly. But then I struggled during my first month to correlate my theoretical knowledge with the errors users would send through my ticket queue. Eventually, I figured out, like other support reps, that a lot of the reported errors were just symptoms of allocation issues, and that the same seven-ish links would bring users up to speed to successfully manage their resource allocation.

Speaking as a support rep, in the following sections, I’m going to go over the top allocation management theory links we send users, the top symptoms we see, and where we direct users to update their configurations to resolve their resource allocation issues.

Theory

As a Java application, Elasticsearch requires some logical memory (heap) allocation from the system’s physical memory. This should be up to half of the physical RAM, capping at 32GB. Higher heap allocations are usually in response to expensive queries and larger data storage. The parent circuit breaker defaults to 95% of heap, but we recommend scaling resources once you’re consistently hitting 85%.
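
As a quick sanity check of that rule of thumb (half of physical RAM, capped at 32GB), here’s a minimal sketch. The function name is mine, not an Elastic API:

```python
def recommended_heap_gb(physical_ram_gb: float) -> float:
    """Rule of thumb from above: heap = half of physical RAM, capped at 32GB."""
    return min(physical_ram_gb / 2, 32.0)

# An 8GB node gets 4GB of heap; a 64GB node hits the 32GB cap.
print(recommended_heap_gb(8))   # 4.0
print(recommended_heap_gb(64))  # 32.0
```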

I highly recommend these overview articles by our team for more info:

Config

Out of the box, Elasticsearch’s default settings automatically size your JVM heap based on node role and total memory. However, as needed, you can configure it directly in the following three ways:

1. Directly in your config > jvm.options file of your local Elasticsearch files

## JVM configuration 
################################################################ 
## IMPORTANT: JVM heap size 
################################################################ 
… 
# Xms represents the initial size of total heap space 
# Xmx represents the maximum size of total heap space 
-Xms4g
-Xmx4g

2. As an Elasticsearch environment variable in your docker-compose

version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.12.0
    environment:
      - node.name=es01
      - cluster.name=es
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200

3. Via our Elasticsearch Service > Deployment > Edit view. Note: The slider assigns physical memory and roughly half will be allotted to the heap.

blog-elasticsearch-memory.png

Troubleshooting

If you’re currently experiencing performance issues with your cluster, it’ll most likely come down to the usual suspects:

  • Configuration issues: Oversharding, no ILM policy
  • Volume induced: High request pace/load, overlapping expensive queries / writes

All following cURL / API requests can be made in the Elasticsearch Service > API Console, as a cURL to the Elasticsearch API, or under Kibana > Dev Tools.

Oversharding

Indices store data in shards, which consume heap for maintenance and during search/write requests. Shard size should cap at 50GB, and shard count should cap as determined via this equation:

shards <= sum(nodes.max_heap) * 20

Taking the above Elasticsearch Service example with 8GB of physical memory across two zones (which will allocate two nodes in total):

# node.max_heap 
8GB of physical memory / 2 = 4GB of heap  
# sum(nodes.max_heap) 
4GB of heap * 2 nodes = 8GB 
# max shards 
8GB * 20 
160
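
If you’d rather script that arithmetic than redo it by hand, here’s a minimal sketch combining the half-RAM heap rule with the 20-shards-per-GB guideline (the helper name is mine, not an Elastic tool):

```python
def max_recommended_shards(physical_ram_gb_per_node: float, node_count: int) -> int:
    """Max recommended shards: 20 per GB of heap, summed across nodes."""
    heap_per_node_gb = min(physical_ram_gb_per_node / 2, 32.0)  # half RAM, 32GB cap
    return int(heap_per_node_gb * node_count * 20)

# The worked example above: 8GB physical per node, 2 nodes.
print(max_recommended_shards(8, 2))  # 160
```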

Then cross-compare this to either _cat/allocation

GET /_cat/allocation?v=true&h=shards,node
shards node
    41 instance-0000000001
    41 instance-0000000000

Or to _cluster/health

GET /_cluster/health?filter_path=status,*_shards
{
  "status": "green",
  "unassigned_shards": 0,
  "initializing_shards": 0,
  "active_primary_shards": 41,
  "relocating_shards": 0,
  "active_shards": 82,
  "delayed_unassigned_shards": 0
}

So this deployment has 82 shards against the recommended max of 160. If the count were higher than the recommendation, you might experience the symptoms covered in the next two sections.

If any shard count other than active_shards or active_primary_shards reports >0, you’ve pinpointed a major configuration cause of performance issues.

Most commonly, if this reports an issue, it’ll be unassigned_shards>0. If these shards are primaries, your cluster will report status:red; if only replicas, it’ll report status:yellow. (This is why setting replicas on indices is important: if the cluster encounters an issue, it can recover rather than experience data loss.)

Let’s pretend we have a status:yellow with a single unassigned shard. To investigate, we’d take a look at which index shard is having trouble via _cat/shards

GET _cat/shards?v=true&s=state
index                                     shard prirep state        docs   store ip           node
logs                                      0     p      STARTED         2  10.1kb 10.42.255.40 instance-0000000001
logs                                      0     r      UNASSIGNED
kibana_sample_data_logs                   0     p      STARTED     14074  10.6mb 10.42.255.40 instance-0000000001
.kibana_1                                 0     p      STARTED      2261   3.8mb 10.42.255.40 instance-0000000001

So the trouble is our non-system index logs, which has an unassigned replica shard. Let’s see what’s giving it grief by running _cluster/allocation/explain (Pro tip: When you escalate to support, this is exactly what we do)

GET _cluster/allocation/explain?pretty&filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*
{ "index": "logs",
  "node_allocation_decisions": [{
      "node_name": "instance-0000000005",
      "deciders": [{
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "node does not match any index setting [index.routing.allocation.include._tier] tier filters [data_hot]"
}]}]}

This error message points to data_hot, which is part of an index lifecycle management (ILM) policy and indicates that our ILM policy is incongruent with our current index settings. In this case, the cause of this error is from setting up a hot-warm ILM policy without having designated hot-warm nodes. (I needed to guarantee something would fail, this is me forcing error examples for y’all. See what you’ve done to me 😂.)

FYI, if you run this command when you don’t have any unassigned shards, you’ll get a 400 error saying unable to find any unassigned shards to explain, because there’s nothing wrong to report on.

If you get a non-logic cause (e.g., a temporary network error like node left cluster during allocation), then you can use Elastic’s handy-dandy _cluster/reroute.

POST /_cluster/reroute

This request without customizations starts an asynchronous background process that attempts to allocate all current state:UNASSIGNED shards. (Don’t be like me: I didn’t wait for it to finish before contacting our dev team, because I thought it would be instantaneous, and I coincidentally escalated just in time for them to say nothing was wrong, because nothing was anymore.)

Circuit breakers

Maxing out your heap allocation can cause requests to your cluster to timeout or error and frequently will cause your cluster to experience circuit breaker exceptions. Circuit breaking causes elasticsearch.log events like

Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [num/numGB], which is larger than the limit of [num/numGB], usages [request=0/0b, fielddata=num/numKB, in_flight_requests=num/numGB, accounting=num/numGB]

To investigate, check your heap.percent, either via _cat/nodes

GET /_cat/nodes?v=true&h=name,node*,heap*
# heap = JVM (logical memory reserved for heap)
# ram  = physical memory
name                  node.role heap.current heap.percent heap.max
tiebreaker-0000000002 mv        119.8mb      23           508mb
instance-0000000001   himrst    1.8gb        48           3.9gb
instance-0000000000   himrst    2.8gb        73           3.9gb
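
If you’re scripting this check rather than eyeballing it, here’s a hypothetical sketch (not an Elastic tool) that parses the _cat/nodes text output and flags nodes at or past the 85% scaling threshold mentioned earlier. The column layout is assumed to match the sample above:

```python
CAT_NODES = """\
name                  node.role heap.current heap.percent heap.max
tiebreaker-0000000002 mv        119.8mb      23           508mb
instance-0000000001   himrst    1.8gb        48           3.9gb
instance-0000000000   himrst    2.8gb        73           3.9gb
"""

def nodes_over_threshold(cat_output, threshold=85):
    """Return node names whose heap.percent meets or exceeds the threshold."""
    rows = cat_output.strip().splitlines()[1:]  # skip the header row
    flagged = []
    for row in rows:
        name, _role, _current, percent, _max = row.split()
        if int(percent) >= threshold:
            flagged.append(name)
    return flagged

print(nodes_over_threshold(CAT_NODES))  # [] — no node is at 85% yet
```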

Or if you’ve previously enabled it, by navigating to Kibana > Stack Monitoring.

blog-elasticsearch-memory-2.png

If you’ve confirmed you’re hitting your memory circuit breakers, consider increasing heap temporarily to give yourself breathing room to investigate. When investigating root cause, look through your cluster proxy logs or elasticsearch.log for the events immediately preceding the exception. You’ll be looking for:

  • expensive queries, especially:
    • high bucket aggregations
      •  I felt so silly when I found out that searches temporarily allocate a portion of your heap up front, based on the search size or bucket dimensions, before running the query, so setting 10,000,000 really was giving my ops team heartburn.
    • non-optimized mappings
      • The second reason to feel silly was when I thought doing hierarchical reporting would search better than flattened out data (it does not).
  • Request volume/pace: Usually batch or async queries
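
On the bucket-aggregation point: a rough mental model (not Elasticsearch’s exact accounting) is that nested bucket aggregations multiply, so the worst-case bucket count, and the heap reserved up front for it, grows as the product of each nesting level’s size:

```python
from math import prod

def worst_case_buckets(sizes_per_level):
    """Worst-case bucket count for nested bucket aggregations:
    the product of each level's requested size."""
    return prod(sizes_per_level)

# e.g., a terms agg (size 100) nesting a date_histogram (100 intervals)
# nesting another terms agg (size 1000):
print(worst_case_buckets([100, 100, 1000]))  # 10000000 buckets requested
```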

Time to scale

If this isn’t your first time hitting circuit breakers, or you suspect it’ll be an ongoing issue (e.g., consistently hitting 85%, so it’s time to look at scaling resources), you’ll want to take a closer look at JVM Memory Pressure as your long-term heap indicator. You can check this in Elasticsearch Service > Deployment.

blog-elasticsearch-memory-3.png

Or you can calculate it from _nodes/stats

GET /_nodes/stats?filter_path=nodes.*.jvm.mem.pools.old
{"nodes": { "node_id": { "jvm": { "mem": { "pools": { "old": {
  "max_in_bytes": 532676608,
  "peak_max_in_bytes": 532676608,
  "peak_used_in_bytes": 104465408,
  "used_in_bytes": 104465408
}}}}}}}

where

JVM Memory Pressure = used_in_bytes / max_in_bytes
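
Using the sample _nodes/stats response above, that works out to roughly 20% pressure:

```python
import json

# The sample _nodes/stats payload from above, trimmed to the fields we need.
stats = json.loads("""
{"nodes": {"node_id": {"jvm": {"mem": {"pools": {"old": {
  "max_in_bytes": 532676608,
  "used_in_bytes": 104465408
}}}}}}}
""")

for node_id, node in stats["nodes"].items():
    old_pool = node["jvm"]["mem"]["pools"]["old"]
    pressure = old_pool["used_in_bytes"] / old_pool["max_in_bytes"]
    print(f"{node_id}: {pressure:.1%}")  # node_id: 19.6%
```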

A potential symptom of this is high-frequency, long-duration garbage collector (gc) events in your elasticsearch.log

[timestamp_short_interval_from_last][INFO ][o.e.m.j.JvmGcMonitorService] [node_id] [gc][number] overhead, spent [21s] collecting in the last [40s]
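
For that log line, the overhead is simply the time spent collecting divided by the window it was measured over; here that’s 21s of the last 40s, i.e., more than half the node’s time went to garbage collection. A small parser sketch (log format assumed as in the sample line above):

```python
import re

line = ("[timestamp][INFO ][o.e.m.j.JvmGcMonitorService] [node_id] [gc][number] "
        "overhead, spent [21s] collecting in the last [40s]")

# Pull the collection time and window out of the gc overhead message.
spent, window = map(int, re.search(
    r"spent \[(\d+)s\] collecting in the last \[(\d+)s\]", line).groups())
print(f"GC overhead: {spent / window:.1%}")  # GC overhead: 52.5%
```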

If you confirm this scenario, you’ll need to take a look either at scaling your cluster or at reducing the demands hitting it. You’ll want to investigate/consider:

  • increasing heap resources (heap/node, number of nodes)
  • decreasing shards (delete unnecessary/old data, use ILM to put data into warm/cold storage so you can shrink it, turn off replicas for data you don’t care if you lose)

Conclusion

Wooh! From what I see in Elastic Support, that’s the rundown of most common user tickets: unassigned shards, unbalanced shard-heap, circuit breakers, high garbage collection, and allocation errors. All are symptoms of the core resource allocation management conversation. Hopefully, you now know the theory and resolution steps, too.

At this point, though, if you’re stuck resolving an issue, feel free to reach out. We’re here and happy to help! You can contact us via Elastic Discuss, Elastic Community Slack, consulting, training, and support.

Cheers to our ability to self-manage Elastic Stack’s resource allocation as non-Ops (love Ops, too)!