This functionality is experimental and may be changed or removed completely in a future release. Elastic will take a best effort approach to fix any issues, but experimental features are not subject to the support SLA of official GA features.
The Task Manager has an internal monitoring mechanism to keep track of a variety of metrics, which can be consumed with either the health monitoring API or the Kibana server log.
The health monitoring API provides a reliable endpoint that can be monitored. Consuming this endpoint doesn’t cause additional load; it returns the latest health checks already made by the system. This design lets external monitoring services poll the endpoint at a regular cadence without adding load to the system.
Each Kibana instance exposes its own endpoint at:
$ curl -X GET api/task_manager/_health
Monitoring the _health endpoint of each Kibana instance in the cluster is the recommended method of ensuring confidence in mission-critical services such as Alerting and Actions.
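As a rough illustration of how an external monitor might poll one instance, here is a minimal Python sketch. The base URL (for example http://localhost:5601), the helper names, and the assumption that the response is a JSON object with a top-level "status" field are illustrative and not confirmed by this page. Authentication is left out and will usually be required.

```python
import json
import urllib.request

# Path taken from the curl example above.
HEALTH_PATH = "/api/task_manager/_health"


def health_url(base_url: str) -> str:
    """Join a Kibana base URL (assumed, e.g. http://localhost:5601) with the health path."""
    return base_url.rstrip("/") + HEALTH_PATH


def parse_status(body: bytes) -> str:
    """Return the top-level status from a response body, or "unknown" if it is absent.

    Assumption: the body is a JSON object with a top-level "status" field.
    """
    payload = json.loads(body)
    return str(payload.get("status", "unknown"))


def fetch_status(base_url: str, timeout: float = 5.0) -> str:
    """GET the health endpoint of a single instance and return its status.

    Credentials are omitted in this sketch; add them for a secured deployment.
    """
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return parse_status(resp.read())
```

Because each instance exposes its own endpoint, a monitor would call `fetch_status` once per instance rather than once per cluster.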
Configuring the monitored health statistics
The health monitoring API monitors the performance of Task Manager out of the box. However, certain performance considerations are deployment specific and you can configure them.
A health threshold is the threshold for failed task executions. Once a task exceeds this threshold, a status of
error is set on the task type execution. To configure a health threshold, use the
xpack.task_manager.monitored_task_execution_thresholds setting. You can apply this setting to all task types in the system, or to a custom task type.
By default, this setting marks the health of every task type as
warning when it exceeds 80% failed executions, and as
error at 90%.
Set this value to a number between 0 and 100. The threshold is hit when the percentage of failed executions exceeds this number.
To avoid a status of
error, set the threshold at 100. To hit
error the moment any task fails, set the threshold to 0.
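The threshold rule described above can be sketched in a few lines of Python. This is an illustration of the comparison only, not Task Manager's actual implementation; the function name and the "OK" label for a healthy result are assumptions, while the 80 and 90 defaults come from the text.

```python
def classify(failed_pct: float,
             warn_threshold: float = 80,
             error_threshold: float = 90) -> str:
    """Map a failed-execution percentage (0 to 100) to a health status.

    A threshold is hit only when the value strictly exceeds it, so a
    threshold of 100 can never be hit and a threshold of 0 is hit as
    soon as any execution fails.
    """
    if failed_pct > error_threshold:
        return "error"
    if failed_pct > warn_threshold:
        return "warning"
    return "OK"  # assumed label for a healthy task type
```

For example, `classify(85)` returns `"warning"` with the defaults, while `classify(100, error_threshold=100)` never reaches `"error"`.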
Create a custom configuration to set lower thresholds for task types you consider critical, such as alerting tasks whose failures you want to detect sooner in an external monitoring service.
A default configuration that sets the system-wide
A custom configuration for the
Consuming health stats
The health API is best consumed via the _health endpoint.
Additionally, the metrics are logged in the Kibana
DEBUG logger at a regular cadence.
To enable Task Manager DEBUG logging in your Kibana instance, add the following to your
logging:
  loggers:
    - context: plugins.taskManager
      appenders: [console]
      level: debug
These stats are logged at an interval based on the number of milliseconds set in your
xpack.task_manager.poll_interval setting, which means they could add substantial noise to your logs. Only enable this level of logging temporarily.
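To get a feel for the volume, the following Python sketch estimates how many health entries would be written per hour. It assumes exactly one entry per poll interval, which is a simplification, and the 3000 ms value in the usage note is only an example, not a value stated on this page.

```python
MS_PER_HOUR = 60 * 60 * 1000


def entries_per_hour(poll_interval_ms: int) -> int:
    """Estimate health log entries per hour, assuming one entry per poll interval."""
    if poll_interval_ms <= 0:
        raise ValueError("poll_interval_ms must be positive")
    return MS_PER_HOUR // poll_interval_ms
```

With an example interval of 3000 ms, `entries_per_hour(3000)` gives 1200 entries per hour for each Kibana instance.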
Making sense of Task Manager health stats
The health monitoring API exposes three sections:
This section summarizes the current configuration of Task Manager. This includes dynamic configurations that change over time, such as
This section summarizes the workload across the cluster, including the tasks in the system, their types, and current status.
This section tracks the execution performance of Task Manager, including task drift, worker load, and execution stats broken down by type, such as duration and execution results.
Each section has a
timestamp and a
status that indicate when the last update to this section took place and how the health of this section was evaluated. The top-level
status indicates the
status of the system overall.
By monitoring the
status of the system overall, and the
status of specific task types of interest, you can evaluate the health of the Kibana Task Management system.
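One way to combine these signals in an external monitor is to report the most severe status seen. The Python sketch below assumes a response shaped as a JSON object with a top-level "status" and a "stats" object whose entries each carry a "status"; that shape, the status labels, and the task type name "alerting:example" are assumptions for illustration. The per-task-type statuses are passed in separately because this page does not spell out where they appear in the response.

```python
# Assumed status labels, ordered by severity. Unknown labels rank as most severe
# so that an unexpected response is never reported as healthy.
SEVERITY = {"OK": 0, "warn": 1, "warning": 1, "error": 2}


def worst(statuses):
    """Return the most severe status in the list (first one wins on ties)."""
    return max(statuses, key=lambda s: SEVERITY.get(s, 2), default="OK")


def evaluate(payload, task_type_statuses, watched):
    """Combine overall, per-section, and watched task type statuses.

    payload            -- parsed health response (shape assumed, see above)
    task_type_statuses -- mapping of task type name to status, extracted by the caller
    watched            -- task type names the operator considers critical
    """
    statuses = [payload.get("status", "unknown")]
    statuses += [section.get("status", "unknown")
                 for section in payload.get("stats", {}).values()]
    statuses += [task_type_statuses.get(name, "unknown") for name in watched]
    return worst(statuses)
```

Treating a missing status as "unknown" and ranking it alongside errors is a deliberate choice here: a monitor that silently ignores missing data can hide an outage.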