A backlogged task queue can prevent tasks from completing and put the cluster into an unhealthy state. Resource constraints, a large number of tasks being triggered at once, and long-running tasks can all contribute to a backlogged task queue.
Diagnose a task queue backlog
Check the thread pool status
You can use the cat thread pool API to see the number of active threads in each thread pool and how many tasks are queued, how many have been rejected, and how many have completed.
response = client.cat.thread_pool(
  v: true,
  s: 't,n',
  h: 'type,name,node_name,active,queue,rejected,completed'
)
puts response
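Once you have the thread pool statistics, a small filter can highlight the pools that deserve attention. The following is a minimal sketch, not part of the API: `backlogged_pools` is a hypothetical helper, and it assumes the rows were requested with `format: 'json'`, so that each row is a hash of string values. The sample rows are made up for illustration.

```ruby
# Hypothetical helper: flags thread pools with a non-empty queue or rejections.
# `rows` is assumed to be cat thread pool output requested with format: 'json',
# that is, an array of hashes whose values are strings.
def backlogged_pools(rows, queue_threshold: 0)
  rows.select do |row|
    row['queue'].to_i > queue_threshold || row['rejected'].to_i > 0
  end
end

# Sample rows for illustration only (not real cluster output).
sample_rows = [
  { 'name' => 'write',       'node_name' => 'node-1', 'active' => '8', 'queue' => '120', 'rejected' => '4' },
  { 'name' => 'search',      'node_name' => 'node-1', 'active' => '1', 'queue' => '0',   'rejected' => '0' },
  { 'name' => 'force_merge', 'node_name' => 'node-2', 'active' => '1', 'queue' => '3',   'rejected' => '0' }
]

backlogged_pools(sample_rows).each do |row|
  puts "#{row['node_name']} #{row['name']}: queue=#{row['queue']} rejected=#{row['rejected']}"
end
```

A pool is reported when either its queue exceeds the threshold or it has rejected any tasks, since rejections indicate that the queue has already overflowed at some point.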
Inspect the hot threads on each node
If a particular thread pool queue is backed up, you can periodically poll the Nodes hot threads API to determine if the thread has sufficient resources to progress and gauge how quickly it is progressing.
response = client.nodes.hot_threads
puts response
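To poll periodically rather than once, you can collect several snapshots and compare them. This is a rough sketch under stated assumptions: `poll_hot_threads` is a hypothetical helper, `client` is assumed to be the same client object used in the example above, and the default count and interval are arbitrary.

```ruby
# Hypothetical helper: captures several hot threads reports for comparison.
# `client` is assumed to respond to nodes.hot_threads, as in the example above.
def poll_hot_threads(client, times: 3, interval_seconds: 10)
  snapshots = []
  times.times do |i|
    snapshots << { taken_at: Time.now, report: client.nodes.hot_threads.to_s }
    # Wait between polls, but not after the final one.
    sleep(interval_seconds) if i < times - 1
  end
  snapshots
end

# Usage (requires a connected client):
#   poll_hot_threads(client, times: 5, interval_seconds: 30).each do |snapshot|
#     puts "--- #{snapshot[:taken_at]} ---"
#     puts snapshot[:report]
#   end
```

If the same stack trace appears at the top of every snapshot, the thread is probably not progressing.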
Look for long-running tasks
Long-running tasks can also cause a backlog.
You can use the task management API to get information about the tasks that are running. Inspect running_time_in_nanos to identify tasks that are taking an excessive amount of time to complete.
response = client.tasks.list(
  filter_path: 'nodes.*.tasks'
)
puts response
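The task list can be long, so it helps to sort it by running time. The sketch below assumes the response is a hash shaped as nodes, then node ID, then tasks, then task ID, which matches the `nodes.*.tasks` filter path used above. `long_running_tasks` and the 30 second threshold are hypothetical, and the sample data is invented.

```ruby
# Hypothetical helper: returns tasks whose running_time_in_nanos exceeds a
# threshold, longest first. `response` is assumed to be shaped as
# nodes -> <node id> -> tasks -> <task id> -> task hash.
def long_running_tasks(response, threshold_seconds:)
  threshold_nanos = threshold_seconds * 1_000_000_000
  tasks = response.fetch('nodes', {}).flat_map do |_node_id, node|
    node.fetch('tasks', {}).map do |task_id, task|
      { id: task_id, action: task['action'], nanos: task['running_time_in_nanos'].to_i }
    end
  end
  tasks.select { |t| t[:nanos] > threshold_nanos }.sort_by { |t| -t[:nanos] }
end

# Sample response for illustration only (not real cluster output).
sample_tasks = {
  'nodes' => {
    'nodeA' => {
      'tasks' => {
        'nodeA:101' => { 'action' => 'indices:admin/forcemerge', 'running_time_in_nanos' => 95_000_000_000 },
        'nodeA:102' => { 'action' => 'cluster:monitor/tasks/lists', 'running_time_in_nanos' => 40_000 }
      }
    }
  }
}

long_running_tasks(sample_tasks, threshold_seconds: 30).each do |t|
  puts "#{t[:id]} #{t[:action]} #{t[:nanos] / 1_000_000_000}s"
end
```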
Resolve a task queue backlog
Increase available resources
If tasks are progressing slowly and the queue is backing up, you might need to take steps to reduce CPU usage.
In some cases, increasing the thread pool size might help.
For example, the force_merge thread pool defaults to a single thread.
Increasing the size to 2 might help reduce a backlog of force merge requests.
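As a sketch of what that change could look like, the fragment below sets the pool size in `elasticsearch.yml`. It assumes the thread pool is configured as a static node setting, which would mean the change only takes effect after the node is restarted. Check the thread pool documentation for your version before applying it.

```yaml
# elasticsearch.yml
# Assumption: static node setting, applied on node restart.
thread_pool.force_merge.size: 2
```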
Cancel stuck tasks
If you find the active task’s hot thread isn’t progressing and there’s a backlog, consider canceling the task.
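A cancellation request might look like the following sketch. It assumes the client exposes `tasks.cancel` with a `task_id` parameter and that each task entry from the task list carries a `cancellable` flag. `cancel_if_cancellable` is a hypothetical wrapper, and the task ID in the usage comment is a placeholder.

```ruby
# Hypothetical helper: cancels a task only if the task list marks it cancellable.
# `client` is assumed to respond to tasks.cancel(task_id: ...).
# `task` is assumed to be one task hash from the task management API response.
def cancel_if_cancellable(client, task_id, task)
  return false unless task['cancellable']

  client.tasks.cancel(task_id: task_id)
  true
end

# Usage (requires a connected client; the task ID is a placeholder):
#   task = { 'action' => 'indices:admin/forcemerge', 'cancellable' => true }
#   cancel_if_cancellable(client, 'nodeA:101', task)
```

Checking the flag first avoids sending a cancellation request for tasks that do not support it.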