Jobs

You can use APIs to perform the following activities:

Close Jobs

The close job API enables you to close a job. A job can be opened and closed multiple times throughout its lifecycle.

A closed job cannot receive data or perform analysis operations, but you can still explore and navigate results.

Request

POST _xpack/ml/anomaly_detectors/<job_id>/_close

Description

When you close a job, it runs housekeeping tasks such as pruning the model history, flushing buffers, calculating final results, and persisting the model snapshots. Depending on the size of the job, it could take several minutes to close and the equivalent time to re-open.

After it is closed, the job has a minimal overhead on the cluster except for maintaining its metadata. Therefore, it is best practice to close jobs that are no longer required to process data.

When a datafeed that has a specified end date stops, it automatically closes the job.

Note

If you use the force query parameter, the request returns before the associated actions such as flushing buffers and persisting the model snapshots complete. Therefore, do not use that parameter in a script that expects the job to be in a consistent state after the close job API returns.

Path Parameters

job_id (required)
(string) Identifier for the job

Query Parameters

force
(boolean) Use to close a failed job or to forcefully close a job that has not responded to its initial close request.
timeout
(time units) Controls the time to wait until a job has closed. The default value is 30 minutes.
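As a rough sketch of how these options fit together, the following Python helper builds the method and path for a close call. The helper name and the plain-tuple return value are illustrative only, not part of any client library; send the result with any HTTP client against your Elasticsearch host.

```python
from urllib.parse import urlencode

def close_job_request(job_id, force=False, timeout=None):
    # Illustrative helper: returns (method, path) for a close-job call.
    params = {}
    if force:
        params["force"] = "true"
    if timeout is not None:
        params["timeout"] = timeout  # a time value such as "3m"
    path = "/_xpack/ml/anomaly_detectors/%s/_close" % job_id
    if params:
        path += "?" + urlencode(params)
    return ("POST", path)

# A normal close waits for housekeeping to finish; a force close returns early.
print(close_job_request("event_rate"))
print(close_job_request("event_rate", force=True, timeout="3m"))
```

Remember that a force close skips the housekeeping steps, so only use it when you do not need the job to be in a consistent state afterwards.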

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example closes the event_rate job:

POST _xpack/ml/anomaly_detectors/event_rate/_close

When the job is closed, you receive the following results:

{
  "closed": true
}

Create Jobs

The create job API enables you to instantiate a job.

Request

PUT _xpack/ml/anomaly_detectors/<job_id>

Path Parameters

job_id (required)
(string) Identifier for the job

Request Body

analysis_config
(object) The analysis configuration, which specifies how to analyze the data. See analysis configuration objects.
analysis_limits
(object) Optionally specifies runtime limits for the job. See analysis limits.
data_description (required)
(object) Describes the format of the input data. This object is required, but it can be empty ({}). See data description objects.
description
(string) An optional description of the job.
model_snapshot_retention_days
(long) The time in days that model snapshots are retained for the job. Older snapshots are deleted. The default value is 1 day. For more information about model snapshots, see Model Snapshot Resources.
results_index_name
(string) The name of the index in which to store the machine learning results. The default value is shared, which corresponds to the index name .ml-anomalies-shared.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example creates the it-ops-kpi job:

PUT _xpack/ml/anomaly_detectors/it-ops-kpi
{
    "description":"First simple job",
    "analysis_config":{
      "bucket_span": "5m",
      "latency": "0ms",
      "detectors":[
        {
          "detector_description": "low_sum(events_per_min)",
          "function":"low_sum",
          "field_name": "events_per_min"
        }
      ]
    },
    "data_description": {
      "time_field":"@timestamp",
      "time_format":"epoch_ms"
    }
}

When the job is created, you receive the following results:

{
  "job_id": "it-ops-kpi",
  "job_type": "anomaly_detector",
  "description": "First simple job",
  "create_time": 1491948238874,
  "analysis_config": {
    "bucket_span": "5m",
    "latency": "0ms",
    "detectors": [
      {
        "detector_description": "low_sum(events_per_min)",
        "function": "low_sum",
        "field_name": "events_per_min",
        "detector_rules": []
      }
    ],
    "influencers": []
  },
  "data_description": {
    "time_field": "@timestamp",
    "time_format": "epoch_ms"
  },
  "model_snapshot_retention_days": 1,
  "results_index_name": "shared"
}

Delete Jobs

The delete job API enables you to delete an existing anomaly detection job.

Request

DELETE _xpack/ml/anomaly_detectors/<job_id>

Description

All job configuration, model state and results are deleted.

Important

Deleting a job must be done via this API only. Do not delete the job directly from the .ml-* indices using the Elasticsearch DELETE Document API. When X-Pack security is enabled, make sure no write privileges are granted to anyone over the .ml-* indices.

Before you can delete a job, you must delete the datafeeds that are associated with it. See Delete Datafeeds.

It is not currently possible to delete multiple jobs using wildcards or a comma separated list.

Path Parameters

job_id (required)
(string) Identifier for the job

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example deletes the event_rate job:

DELETE _xpack/ml/anomaly_detectors/event_rate

When the job is deleted, you receive the following results:

{
  "acknowledged": true
}

Get Jobs

The get jobs API enables you to retrieve configuration information for jobs.

Request

GET _xpack/ml/anomaly_detectors/

GET _xpack/ml/anomaly_detectors/<job_id>

Path Parameters

job_id
(string) Identifier for the job. This parameter does not support wildcards, but you can specify _all or omit the job_id to get information about all jobs.

Results

The API returns the following information:

jobs
(array) An array of job resources. For more information, see Job Resources.

Authorization

You must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example gets configuration information for the farequote job:

GET _xpack/ml/anomaly_detectors/farequote

The API returns the following results:

{
  "count": 1,
  "jobs": [
    {
      "job_id": "farequote",
      "job_type": "anomaly_detector",
      "description": "Multi-metric job",
      "create_time": 1491948149563,
      "finished_time": 1491948166289,
      "analysis_config": {
        "bucket_span": "5m",
        "detectors": [
          {
            "detector_description": "mean(responsetime)",
            "function": "mean",
            "field_name": "responsetime",
            "partition_field_name": "airline",
            "detector_rules": []
          }
        ],
        "influencers": [
          "airline"
        ]
      },
      "data_description": {
        "time_field": "@timestamp",
        "time_format": "epoch_ms"
      },
      "model_snapshot_retention_days": 1,
      "model_snapshot_id": "1491948163",
      "results_index_name": "shared"
    }
  ]
}

Get Job Statistics

The get job statistics API enables you to retrieve usage information for jobs.

Request

GET _xpack/ml/anomaly_detectors/_stats

GET _xpack/ml/anomaly_detectors/<job_id>/_stats

Path Parameters

job_id
(string) Identifier for the job. This parameter does not support wildcards, but you can specify _all or omit the job_id to get statistics for all jobs.

Results

The API returns the following information:

jobs
(array) An array of job statistics objects. For more information, see Job Statistics.

Authorization

You must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example gets usage information for the farequote job:

GET _xpack/ml/anomaly_detectors/farequote/_stats

The API returns the following results:

{
  "count": 1,
  "jobs": [
    {
      "job_id": "farequote",
      "data_counts": {
        "job_id": "farequote",
        "processed_record_count": 86275,
        "processed_field_count": 172550,
        "input_bytes": 6744714,
        "input_field_count": 172550,
        "invalid_date_count": 0,
        "missing_field_count": 0,
        "out_of_order_timestamp_count": 0,
        "empty_bucket_count": 0,
        "sparse_bucket_count": 15,
        "bucket_count": 1528,
        "earliest_record_timestamp": 1454803200000,
        "latest_record_timestamp": 1455235196000,
        "last_data_time": 1491948163685,
        "latest_sparse_bucket_timestamp": 1455174900000,
        "input_record_count": 86275
      },
      "model_size_stats": {
        "job_id": "farequote",
        "result_type": "model_size_stats",
        "model_bytes": 387594,
        "total_by_field_count": 21,
        "total_over_field_count": 0,
        "total_partition_field_count": 20,
        "bucket_allocation_failures_count": 0,
        "memory_status": "ok",
        "log_time": 1491948163000,
        "timestamp": 1455234600000
      },
      "state": "closed"
    }
  ]
}

Flush Jobs

The flush job API forces any buffered data to be processed by the job.

Request

POST _xpack/ml/anomaly_detectors/<job_id>/_flush

Description

The flush job API is only applicable when sending data for analysis using the post data API. Depending on the content of the buffer, it might additionally calculate new results.

The flush and close operations are similar; however, flushing is more efficient if you expect to send more data for analysis. When flushing, the job remains open and is available to continue analyzing data. A close operation additionally prunes and persists the model state to disk, and the job must be opened again before analyzing further data.

Path Parameters

job_id (required)
(string) Identifier for the job

Query Parameters

advance_time
(string) Specifies that no data prior to the date advance_time is expected.
calc_interim
(boolean) If true, calculates the interim results for the most recent bucket or all buckets within the latency period.
start
(string) When used in conjunction with calc_interim, specifies the start of the range of buckets on which to calculate interim results.
end
(string) When used in conjunction with calc_interim, specifies the end of the range of buckets on which to calculate interim results.
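These parameters are easiest to see side by side. The sketch below uses a hypothetical helper, not part of any client library, to assemble the request body for flushing a specific window of buckets:

```python
import json

def flush_body(calc_interim=False, start=None, end=None, advance_time=None):
    # Illustrative helper: assemble the flush-job request body.
    body = {}
    if calc_interim:
        body["calc_interim"] = True
    if start is not None:
        body["start"] = start
    if end is not None:
        body["end"] = end
    if advance_time is not None:
        body["advance_time"] = advance_time
    return json.dumps(body, sort_keys=True)

# Interim results for buckets within a specific window:
print(flush_body(calc_interim=True, start="1455234000000", end="1455237600000"))
```

The resulting JSON is what you would send as the body of the POST _flush request shown in the example later in this section.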

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example flushes the farequote job:

POST _xpack/ml/anomaly_detectors/farequote/_flush
{
  "calc_interim": true
}

When the operation succeeds, you receive the following results:

{
  "flushed": true
}

Open Jobs

A job must be opened in order for it to be ready to receive and analyze data. A job can be opened and closed multiple times throughout its lifecycle.

Request

POST _xpack/ml/anomaly_detectors/{job_id}/_open

Description

A job must be open in order for it to accept and analyze data.

When you open a new job, it starts with an empty model.

When you open an existing job, the most recent model state is automatically loaded. The job is ready to resume its analysis from where it left off, once new data is received.

Path Parameters

job_id (required)
(string) Identifier for the job

Request Body

open_timeout
(time) Controls the time to wait until a job has opened. The default value is 30 minutes.
ignore_downtime
(boolean) If true (default), any gap in data since the job was last closed is treated as a maintenance window. That is to say, it is not an anomaly.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example opens the event_rate job and sets an optional property:

POST _xpack/ml/anomaly_detectors/event_rate/_open
{
  "ignore_downtime": false
}

When the job opens, you receive the following results:

{
  "opened": true
}

Post Data to Jobs

The post data API enables you to send data to an anomaly detection job for analysis.

Request

POST _xpack/ml/anomaly_detectors/<job_id>/_data --data-binary @<data-file.json>

Description

The job must have a state of open to receive and process the data.

The data that you send to the job must use the JSON format.

File sizes are limited to 100 MB. If your file is larger, split it into multiple files and upload each one separately in sequential time order. When running in real time, it is generally recommended that you perform many small uploads rather than queueing data to upload larger files.

When uploading data, check the job data counts for progress. The following records will not be processed:

  • Records not in chronological order and outside the latency window
  • Records with an invalid timestamp

Important

Data can only be accepted from a single connection. Use a single connection synchronously to send data, close, flush, or delete a single job. It is not currently possible to post data to multiple jobs using wildcards or a comma-separated list.

Path Parameters

job_id (required)
(string) Identifier for the job

Request Body

reset_start
(string) Specifies the start of the bucket resetting range
reset_end
(string) Specifies the end of the bucket resetting range

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example posts data from the it_ops_new_kpi.json file to the it_ops_new_kpi job:

$ curl -s -H "Content-Type: application/json" \
-X POST http://localhost:9200/_xpack/ml/anomaly_detectors/it_ops_new_kpi/_data \
--data-binary @it_ops_new_kpi.json

When the data is sent, you receive information about the operational progress of the job. For example:

{
  "job_id": "it_ops_new_kpi",
  "processed_record_count": 21435,
  "processed_field_count": 64305,
  "input_bytes": 2589063,
  "input_field_count": 85740,
  "invalid_date_count": 0,
  "missing_field_count": 0,
  "out_of_order_timestamp_count": 0,
  "empty_bucket_count": 16,
  "sparse_bucket_count": 0,
  "bucket_count": 2165,
  "earliest_record_timestamp": 1454020569000,
  "latest_record_timestamp": 1455318669000,
  "last_data_time": 1491952300658,
  "latest_empty_bucket_timestamp": 1454541600000,
  "input_record_count": 21435
}

For more information about these properties, see Job Stats.

Update Jobs

The update job API enables you to update certain properties of a job.

Request

POST _xpack/ml/anomaly_detectors/<job_id>/_update

Path Parameters

job_id (required)
(string) Identifier for the job

Request Body

The following properties can be updated after the job is created:

analysis_limits: model_memory_limit
(Requires restart: Yes) The approximate maximum amount of memory resources required for analytical processing, in MiB. See the section called “Analysis Limits”.

background_persist_interval
(Requires restart: Yes) Advanced configuration option. The time between each periodic persistence of the model. See Job Resources.

custom_settings
(Requires restart: No) Contains custom metadata about the job.

description
(Requires restart: No) An optional description of the job. See Job Resources.

model_plot_config: enabled
(Requires restart: No) If true, enables calculation and storage of the model bounds for each entity that is being analyzed. See the section called “Model Plot Config”.

model_snapshot_retention_days
(Requires restart: Yes) The time in days that model snapshots are retained for the job. See Job Resources.

renormalization_window_days
(Requires restart: Yes) Advanced configuration option. The period over which adjustments to the score are applied, as new data is seen. See Job Resources.

results_retention_days
(Requires restart: Yes) Advanced configuration option. The number of days for which job results are retained. See Job Resources.

For the properties that require a restart, if the job is open when you make the update, you must stop the datafeed, close the job, apply the update, reopen the job, and restart the datafeed for the changes to take effect.
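That restart sequence can be written down as an ordered list of calls. In the sketch below, the function and the (method, path, body) tuples are illustrative; the datafeed _stop and _start paths are an assumption, since they are not shown in this document, so verify them against the datafeed API reference.

```python
def restart_update_plan(job_id, datafeed_id, update_body):
    # Ordered calls for updating a restart-required property on an open
    # job. The datafeed _stop/_start paths are assumed, not documented
    # in this section.
    return [
        ("POST", "/_xpack/ml/datafeeds/%s/_stop" % datafeed_id, None),
        ("POST", "/_xpack/ml/anomaly_detectors/%s/_close" % job_id, None),
        ("POST", "/_xpack/ml/anomaly_detectors/%s/_update" % job_id, update_body),
        ("POST", "/_xpack/ml/anomaly_detectors/%s/_open" % job_id, None),
        ("POST", "/_xpack/ml/datafeeds/%s/_start" % datafeed_id, None),
    ]
```

Issuing the calls in this order ensures the model state is persisted before the update and reloaded afterwards.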

Note
  • You can update the analysis_limits only while the job is closed.
  • The model_memory_limit property value cannot be decreased.
  • If the memory_status property in the model_size_stats object has a value of hard_limit, the job was unable to process some data. You might want to re-run the job with an increased model_memory_limit.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example updates the it_ops_new_logs job:

POST _xpack/ml/anomaly_detectors/it_ops_new_logs/_update
{
  "description":"An updated job",
  "model_plot_config": {
    "enabled": true
  },
  "analysis_limits": {
    "model_memory_limit": 1024
  },
  "renormalization_window_days": 30,
  "background_persist_interval": "2h",
  "model_snapshot_retention_days": 7,
  "results_retention_days": 60,
  "custom_settings": {
    "custom_urls" : [{
      "url_name" : "Lookup IP",
      "url_value" : "http://geoiplookup.net/ip/$clientip$"
    }]
  }
}

When the job is updated, you receive a summary of the job configuration information, including the updated property values. For example:

{
  "job_id": "it_ops_new_logs",
  "job_type": "anomaly_detector",
  "description": "An updated job",
  "create_time": 1493678314204,
  "finished_time": 1493678315850,
  "analysis_config": {
    "bucket_span": "1800s",
    "categorization_field_name": "message",
    "detectors": [
      {
        "detector_description": "Unusual message counts",
        "function": "count",
        "by_field_name": "mlcategory",
        "detector_rules": []
      }
    ],
    "influencers": []
  },
  "analysis_limits": {
    "model_memory_limit": 1024
  },
  "data_description": {
    "time_field": "time",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": true
  },
  "renormalization_window_days": 30,
  "background_persist_interval": "2h",
  "model_snapshot_retention_days": 7,
  "results_retention_days": 60,
  "custom_settings": {
    "custom_urls": [
      {
        "url_name": "Lookup IP",
        "url_value": "http://geoiplookup.net/ip/$clientip$"
      }
    ]
  },
  "model_snapshot_id": "1493678315",
  "results_index_name": "shared"
}

Validate Detectors

The validate detectors API validates detector configuration information.

Request

POST _xpack/ml/anomaly_detectors/_validate/detector

Description

This API enables you to validate the detector configuration before you create a job.

Request Body

For a list of the properties that you can specify in the body of this API, see detector configuration objects.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example validates detector configuration information:

POST _xpack/ml/anomaly_detectors/_validate/detector
{
  "function": "metric",
  "field_name": "responsetime",
  "by_field_name": "airline"
}

When the validation completes, you receive the following results:

{
  "acknowledged": true
}

Validate Jobs

The validate jobs API validates job configuration information.

Request

POST _xpack/ml/anomaly_detectors/_validate

Description

This API enables you to validate the job configuration before you create the job.
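A typical pattern is to validate first and create only on success. In the sketch below, the helper is hypothetical and the validation_acknowledged flag stands in for the "acknowledged" value returned by the _validate call:

```python
def create_if_valid_plan(job_id, job_config, validation_acknowledged):
    # Illustrative: issue the _validate call first; send the PUT that
    # creates the job only when the validate response is acknowledged.
    calls = [("POST", "/_xpack/ml/anomaly_detectors/_validate", job_config)]
    if validation_acknowledged:
        calls.append(("PUT", "/_xpack/ml/anomaly_detectors/%s" % job_id, job_config))
    return calls
```

Validating first catches configuration errors without creating a job that would then need to be deleted.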

Request Body

For a list of the properties that you can specify in the body of this API, see Job Resources.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example validates job configuration information:

POST _xpack/ml/anomaly_detectors/_validate
{
  "description": "Unusual response times by airlines",
  "analysis_config": {
    "bucket_span": "300s",
    "detectors": [
      {
        "function": "metric",
        "field_name": "responsetime",
        "by_field_name": "airline"
      }
    ],
    "influencers": [ "airline" ]
  },
  "data_description": {
    "time_field": "time",
    "time_format": "yyyy-MM-dd'T'HH:mm:ssX"
  }
}

When the validation is complete, you receive the following results:

{
  "acknowledged": true
}