Troubleshooting machine learning anomaly detection

Use the information in this section to troubleshoot common problems and find answers for frequently asked questions.

Restart failed anomaly detection jobs

If an anomaly detection job fails, try restarting it by following the procedure below. If the restarted job runs as expected, the problem that caused the failure was transient and no further investigation is needed. If the job fails again shortly after the restart, the problem is persistent and needs further investigation. In that case, check the job stats on the Job management pane in Kibana to find out which node the failed job was running on. Then get the logs for that node and look for exceptions and errors whose messages contain the ID of the anomaly detection job to better understand the issue.
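When you have the log file for the node, a small script can narrow it down to the lines that mention the job. The following is a minimal sketch, not an official tool: the function name `find_job_problems` and the marker list are assumptions made for illustration, and the matching is plain substring search, so adjust it to the log format your cluster actually produces.

```python
from typing import Iterable, List

# Substrings treated as signs of a problem. This list is an assumption;
# extend or change it to match your own log format.
PROBLEM_MARKERS = ("ERROR", "Exception")


def find_job_problems(lines: Iterable[str], job_id: str) -> List[str]:
    """Return the log lines that mention job_id and contain a problem marker.

    Matching is a simple substring test, so a job ID such as "my_job"
    also matches lines that mention "my_job_2".
    """
    return [
        line.rstrip("\n")
        for line in lines
        if job_id in line and any(marker in line for marker in PROBLEM_MARKERS)
    ]
```

To use it, open the node's log file and pass the file object together with the job ID, for example `find_job_problems(open(path_to_log), "my_job")`, where `path_to_log` is the location of the log on your system.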

If an anomaly detection job has failed, do the following to recover from the failed state:

  1. Force stop the corresponding datafeed by using the Stop datafeed API with the force parameter set to true. For example, the following request force stops the my_datafeed datafeed:

    POST _ml/datafeeds/my_datafeed/_stop
    {
      "force": "true"
    }

  2. Force close the anomaly detection job by using the Close anomaly detection job API with the force parameter set to true. For example, the following request force closes the my_job anomaly detection job:

    POST _ml/anomaly_detectors/my_job/_close?force=true

  3. Restart the anomaly detection job on the Job management pane in Kibana.
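
If you need to repeat this recovery for several jobs, the steps above can be sketched as a small script. This is an illustrative outline rather than an official client: `recovery_requests`, `recover`, and the `send` callback are hypothetical names, and the script only builds the requests, leaving the actual HTTP call to whatever client you already use. The first two requests mirror the examples above. The last two use the Open anomaly detection job and Start datafeed APIs on the assumption that they are an acceptable substitute for restarting the job in Kibana.

```python
from typing import Callable, List, Optional, Tuple

# One request: (HTTP method, path relative to the cluster URL, optional JSON body)
Request = Tuple[str, str, Optional[dict]]


def recovery_requests(job_id: str, datafeed_id: str) -> List[Request]:
    """Build the ordered requests for recovering a failed job."""
    return [
        # Step 1: force stop the datafeed (same body as the example above).
        ("POST", f"_ml/datafeeds/{datafeed_id}/_stop", {"force": "true"}),
        # Step 2: force close the anomaly detection job.
        ("POST", f"_ml/anomaly_detectors/{job_id}/_close?force=true", None),
        # Step 3 (assumption): reopen the job and restart its datafeed
        # through the API instead of the Job management pane in Kibana.
        ("POST", f"_ml/anomaly_detectors/{job_id}/_open", None),
        ("POST", f"_ml/datafeeds/{datafeed_id}/_start", None),
    ]


def recover(
    job_id: str,
    datafeed_id: str,
    send: Callable[[str, str, Optional[dict]], None],
) -> None:
    """Send each recovery request in order using the caller's send function."""
    for method, path, body in recovery_requests(job_id, datafeed_id):
        send(method, path, body)
```

Passing a `send` function that only prints its arguments gives a dry run of the sequence, which is a reasonable way to check the paths before pointing the script at a real cluster.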