Troubleshooting machine learning anomaly detection
Use the information in this section to troubleshoot common problems and find answers to frequently asked questions.
Restart failed anomaly detection jobs
If an anomaly detection job fails, try to restart it by following the procedure in this section. If the restarted job runs as expected, the problem that caused the failure was transient and no further investigation is needed. If the job fails again soon after the restart, the problem is persistent and needs further investigation. In that case, check the job stats on the Job management pane in Kibana to find out which node the failed job was running on. Then get the logs for that node and look for exceptions and errors whose message contains the ID of the anomaly detection job.
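The node lookup can also be done from the job stats returned by the Get anomaly detection job statistics API (`GET _ml/anomaly_detectors/<job_id>/_stats`). The following is a minimal sketch, assuming the response has already been parsed into a Python dictionary. The helper name `failed_jobs` and the sample data are hypothetical and hand-written for illustration; they are not real cluster output.

```python
from typing import Any, Dict, List, Optional, Tuple


def failed_jobs(stats: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Return (job_id, node_name) for every job whose state is "failed".

    node_name is None when the stats entry carries no node information.
    """
    result = []
    for job in stats.get("jobs", []):
        if job.get("state") == "failed":
            node = job.get("node") or {}
            result.append((job["job_id"], node.get("name")))
    return result


# Hand-written sample shaped like a job stats response (illustrative only).
sample_stats = {
    "count": 2,
    "jobs": [
        {"job_id": "my_job", "state": "failed",
         "node": {"id": "node-1-id", "name": "node-1"}},
        {"job_id": "other_job", "state": "opened",
         "node": {"id": "node-2-id", "name": "node-2"}},
    ],
}

print(failed_jobs(sample_stats))  # [('my_job', 'node-1')]
```

The node name returned here tells you which node's logs to search for the job ID.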
To recover an anomaly detection job from the failed state, do the following:
- Force stop the corresponding datafeed by using the Stop datafeed API with the force parameter set to true. For example, the following request force stops the my_datafeed datafeed:
  POST _ml/datafeeds/my_datafeed/_stop
  {
    "force": "true"
  }
- Force close the anomaly detection job by using the Close anomaly detection job API with the force parameter set to true. For example, the following request force closes the my_job anomaly detection job:
  POST _ml/anomaly_detectors/my_job/_close?force=true
- Restart the anomaly detection job on the Job management pane in Kibana.
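The recovery steps above can also be scripted. The following is a minimal sketch that builds the requests in order, using only the Python standard library. The function names `recovery_requests` and `send` are hypothetical, and authentication and TLS settings are omitted. The last two requests (open the job, then start the datafeed) are an assumed API equivalent of restarting the job in Kibana; the procedure itself only describes the Kibana route.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional, Tuple

Request = Tuple[str, str, Optional[Dict[str, Any]]]


def recovery_requests(job_id: str, datafeed_id: str) -> List[Request]:
    """Build the (method, path, body) requests for recovering a failed job."""
    return [
        # Step 1: force stop the datafeed.
        ("POST", f"_ml/datafeeds/{datafeed_id}/_stop", {"force": True}),
        # Step 2: force close the anomaly detection job.
        ("POST", f"_ml/anomaly_detectors/{job_id}/_close?force=true", None),
        # Step 3 (assumption): reopen the job and restart its datafeed
        # through the APIs instead of the Job management pane in Kibana.
        ("POST", f"_ml/anomaly_detectors/{job_id}/_open", None),
        ("POST", f"_ml/datafeeds/{datafeed_id}/_start", None),
    ]


def send(base_url: str, method: str, path: str,
         body: Optional[Dict[str, Any]] = None) -> Any:
    """Send one request and return the parsed JSON response.

    base_url is a placeholder such as "http://localhost:9200"; a secured
    cluster also needs credentials, which this sketch does not handle.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        f"{base_url}/{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the planned requests without contacting a cluster.
    for method, path, body in recovery_requests("my_job", "my_datafeed"):
        print(method, path, json.dumps(body) if body is not None else "")
```

To run the steps against a cluster, call `send(base_url, method, path, body)` for each tuple in order and stop at the first error, since each step depends on the previous one succeeding.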