Rejected requests
When Elasticsearch rejects a request, it stops the operation and returns an HTTP 429 (TOO_MANY_REQUESTS) response code. The response body explains why the operation was rejected. You can retry requests that fail with HTTP 429, but it's generally best to use exponential backoff so that retries don't exacerbate the performance issues that caused the rejection.
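As an illustration of exponential backoff, here is a minimal Python sketch. It assumes a hypothetical send callable that performs one HTTP request and returns the status code and body; it is not part of any Elasticsearch client. The retry limit and delay values are illustrative, not recommended settings.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical signature: send() performs one HTTP request and returns
# (status_code, body). It is not part of any Elasticsearch client.
Sender = Callable[[], Tuple[int, str]]


def retry_on_429(
    send: Sender,
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry send() while it returns HTTP 429, backing off exponentially."""
    attempt = 0
    while True:
        status, body = send()
        # Return anything other than a rejection, or give up after max_retries.
        if status != 429 or attempt >= max_retries:
            return status, body
        # Full jitter: sleep a random time up to base_delay * 2**attempt,
        # capped at max_delay, so that many clients don't retry in lockstep.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, ceiling))
        attempt += 1
```

The random jitter spreads retries from many clients over time instead of having them all hit the cluster again at the same moment.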
Rejected requests are frequently caused by depleted resources. The most common of these are described here.
To check the number of rejected tasks for each thread pool, use the cat thread pool API:
GET /_cat/thread_pool?v=true&h=id,name,queue,active,rejected,completed
A high ratio of rejected to completed tasks, particularly in the search and write thread pools, indicates that Elasticsearch regularly rejects requests.
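To compute that ratio programmatically, you can request the same columns as JSON, for example GET /_cat/thread_pool?format=json&h=id,name,queue,active,rejected,completed. The following Python sketch assumes the response has been parsed into a list of dictionaries with numeric columns returned as strings. The function name is hypothetical, and what counts as a "high" ratio is left to you.

```python
from typing import Dict, Iterable, List, Tuple


def rejection_ratios(
    rows: List[Dict[str, str]],
    pools: Iterable[str] = ("search", "write"),
) -> Dict[Tuple[str, str], float]:
    """Map (node id, thread pool) to rejected/completed for the chosen pools."""
    wanted = set(pools)
    ratios: Dict[Tuple[str, str], float] = {}
    for row in rows:
        if row["name"] not in wanted:
            continue
        # Assumes numeric columns arrive as strings, as in cat API JSON output.
        rejected = int(row["rejected"])
        completed = int(row["completed"])
        if completed == 0:
            # A pool with rejections but no completions is still a problem;
            # report it as an infinite ratio instead of dividing by zero.
            ratio = float("inf") if rejected else 0.0
        else:
            ratio = rejected / completed
        ratios[(row["id"], row["name"])] = ratio
    return ratios
```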
The following examples demonstrate typical queue capacity errors.
The API response body returns an es_rejected_execution_exception error:
{
  "shard" : 0,
  "node" : "XXXX",
  "reason" : {
    "reason" : "rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@26c03d4a on QueueResizingEsThreadPoolExecutor[name = XXXXX/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 968.1ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@70be0765[Running, pool size = 25, active threads = 25, queued tasks = 1000, completed tasks = 616499351]]",
    "type" : "es_rejected_execution_exception"
  },
  "index" : "my-index-000001"
}
The error log returns an EsRejectedExecutionException error:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@1a25fe82 on QueueResizingEsThreadPoolExecutor[name = XXXXX/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 10.7ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@6312a0bb[Running, pool size = 25, active threads = 25, queued tasks = 1000, completed tasks = 616499351]]
To troubleshoot ongoing thread pool rejection errors, check whether the affected thread pool has a task queue backlog. Refer to the Threadpool Rejections video for a troubleshooting walkthrough.
To check the number of tripped circuit breakers, use the node stats API.
GET /_nodes/stats/breaker
These statistics are cumulative from node startup. For more information, see circuit breaker errors.
Refer to the Circuit Breaker Error video for a troubleshooting walkthrough.
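Because the breaker statistics are cumulative, comparing two snapshots of GET /_nodes/stats/breaker shows whether breakers are tripping now rather than at some point since startup. The following Python sketch assumes each snapshot has been parsed into a dictionary shaped like nodes.<node id>.breakers.<breaker name>.tripped. The function name is hypothetical.

```python
from typing import Any, Dict, Tuple


def tripped_since(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[Tuple[str, str], int]:
    """Return (node id, breaker name) -> number of trips between two snapshots."""
    increases: Dict[Tuple[str, str], int] = {}
    old_nodes = before.get("nodes", {})
    for node_id, node in after.get("nodes", {}).items():
        old_breakers = old_nodes.get(node_id, {}).get("breakers", {})
        for name, stats in node.get("breakers", {}).items():
            delta = stats.get("tripped", 0) - old_breakers.get(name, {}).get("tripped", 0)
            # Counters reset when a node restarts, so a negative delta is
            # ignored rather than reported.
            if delta > 0:
                increases[(node_id, name)] = delta
    return increases
```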
Elasticsearch reserves part of its JVM heap for indexing. Requests can be rejected if the memory used by in-flight indexing operations exceeds the indexing_pressure.memory.limit setting. To check the number of indexing pressure rejections, use the node stats API.
GET _nodes/stats?human&filter_path=nodes.*.indexing_pressure
These statistics are cumulative from node startup.
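To summarize the rejection counters per node, the following Python sketch assumes the response of the request above has been parsed into a dictionary and that the counters live at indexing_pressure.memory.total.<stage>_rejections for the coordinating, primary, and replica stages. Verify these field names against the node stats documentation for your version; the function name is hypothetical.

```python
from typing import Any, Dict

# Assumed stage names, matching the <stage>_rejections counters.
STAGES = ("coordinating", "primary", "replica")


def indexing_pressure_rejections(stats: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Return node id -> rejection count per stage, from node stats output."""
    result: Dict[str, Dict[str, int]] = {}
    for node_id, node in stats.get("nodes", {}).items():
        # Assumed path: indexing_pressure.memory.total.<stage>_rejections
        total = node.get("indexing_pressure", {}).get("memory", {}).get("total", {})
        result[node_id] = {stage: int(total.get(f"{stage}_rejections", 0)) for stage in STAGES}
    return result
```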
The following examples demonstrate indexing pressure rejections.
The API response body returns an es_rejected_execution_exception error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "es_rejected_execution_exception",
        "reason" : "rejected execution of primary operation [coordinating_and_primary_bytes=XXXXX, replica_bytes=XXXXX, all_bytes=XXXXX, coordinating_operation_bytes=XXXXX, max_coordinating_and_primary_bytes=XXXXX]"
      }
    ],
    "type" : "es_rejected_execution_exception",
    "reason" : "rejected execution of coordinating operation [coordinating_and_primary_bytes=XXXXX, replica_bytes=XXXXX, all_bytes=XXXXX, coordinating_operation_bytes=XXXXX, max_coordinating_and_primary_bytes=XXXXX]"
  },
  "status" : 429
}
The error log returns an EsRejectedExecutionException error:
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of primary operation [coordinating_and_primary_bytes=XXXXX, replica_bytes=XXXXX, all_bytes=XXXXX, coordinating_operation_bytes=XXXXX, max_coordinating_and_primary_bytes=XXXXX]
Following the model outlined in Reading and writing documents, the <category> in the "rejected execution of <category> operation" portion of the error is one of the following: combined_coordinating_and_primary, coordinating, primary, or replica.
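If you collect these errors from logs or responses, you can extract the category and the byte fields from the reason string. The following Python sketch assumes the reason follows the "rejected execution of <category> operation [key=value, ...]" pattern shown in the examples above. Values are kept as strings because the examples only show XXXXX placeholders; the function name is hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Matches "rejected execution of <category> operation [key=value, ...]".
_REJECTION = re.compile(r"rejected execution of (\w+) operation \[([^\]]*)\]")


def parse_indexing_rejection(reason: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Split an indexing pressure rejection reason into category and fields."""
    match = _REJECTION.search(reason)
    if match is None:
        return None  # not an indexing pressure rejection (e.g. a thread pool one)
    category, body = match.groups()
    fields: Dict[str, str] = {}
    for pair in body.split(","):
        key, _, value = pair.strip().partition("=")
        if key:
            fields[key] = value
    return category, fields
```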
These errors are often related to:
- A large number of backlogged tasks.
- Bulk index requests that are too large.
- Large search response sizes.
- Use of the semantic_text field type, which can cause rejections when indexing large batches of documents: Elasticsearch rejects a batch that might otherwise cause an Out of Memory (OOM) error.
Refer to the Index Pressure Rejections video for a troubleshooting walkthrough.
AutoOps is a monitoring tool that simplifies cluster management through performance recommendations, resource utilization visibility, and real-time issue detection with resolution paths. Learn more about AutoOps.
If Elasticsearch regularly rejects requests and other tasks, your cluster likely has high CPU usage or high JVM memory pressure. For tips, refer to High CPU usage and High JVM memory pressure.
When bulk indexing documents with the semantic_text field type, you might encounter rejections due to high memory usage during inference processing. These rejections appear as an InferenceException in your cluster logs.
To resolve this issue:
- Reduce the batch size of documents in your indexing requests.
- If reducing the batch size doesn't resolve the issue, consider scaling up your machine resources.
- As a last resort, adjust the indexing_pressure.memory.coordinating.limit cluster setting. The default value is 10% of the heap. Increasing this limit allows more memory to be used for coordinating operations before rejections occur.
  Consider this adjustment only after exhausting other options, because setting the value too high can cause Out of Memory (OOM) errors in your cluster. A cluster restart is required for this change to take effect.
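Reducing the batch size can be automated on the client side. The following Python sketch halves the batch size whenever a batch is rejected with HTTP 429 and retries the same documents. It assumes a hypothetical send_batch callable that sends one bulk request and returns its HTTP status. Note that a bulk response can report rejections per item even when the overall status is 200; this sketch ignores item-level errors, and it retries immediately, whereas in practice you would also pause with exponential backoff.

```python
from typing import Callable, List, Sequence


def index_in_batches(
    docs: Sequence[dict],
    send_batch: Callable[[Sequence[dict]], int],
    batch_size: int = 100,
    min_batch_size: int = 1,
) -> List[int]:
    """Send docs in order, halving the batch size whenever a batch gets HTTP 429.

    Returns the sizes of the batches that were accepted.
    """
    accepted: List[int] = []
    start = 0
    while start < len(docs):
        batch = docs[start:start + batch_size]
        status = send_batch(batch)
        if status == 429:
            if len(batch) <= min_batch_size:
                raise RuntimeError("rejected even at the minimum batch size")
            # Retry the same documents with half as many per request.
            batch_size = max(min_batch_size, len(batch) // 2)
            continue
        if status >= 400:
            raise RuntimeError(f"bulk request failed with HTTP {status}")
        accepted.append(len(batch))
        start += len(batch)
    return accepted
```

Halving from the size of the rejected batch, rather than from the configured size, avoids repeated rejections when only a small tail of documents remains.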