Create an anomaly detection job
Added in 5.4.0
If you include a datafeed_config, you must have read index privileges on the source index.
If you include a datafeed_config but do not provide a query, the datafeed uses {"match_all": {"boost": 1}}.
Path parameters
-
job_id
string Required The identifier for the anomaly detection job. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.
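The naming rule for job_id can be checked on the client before sending the request. The following is a minimal sketch, assuming only the character rules stated above; the function name is hypothetical, and the server may enforce further limits that this check does not cover.

```python
import re

# Pattern derived from the rule stated above: lowercase a-z, 0-9, hyphens,
# and underscores, starting and ending with an alphanumeric character.
_JOB_ID_RE = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented character rules."""
    return _JOB_ID_RE.fullmatch(job_id) is not None
```

For example, is_valid_job_id("test-job1") returns True, while "_job", "job-", and "Job1" are rejected.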
Query parameters
-
allow_no_indices
boolean If true, wildcard index expressions that resolve into no concrete indices are ignored. This includes the _all string or when no indices are specified.
-
expand_wildcards
string | array[string] Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values. Valid values are:
all: Match any data stream or index, including hidden ones.
closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
none: Wildcard patterns are not accepted.
open: Match open, non-hidden indices. Also matches any non-hidden data stream.
-
ignore_throttled
boolean Deprecated If true, concrete, expanded, or aliased indices are ignored when frozen.
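The query parameters above can be assembled into a query string on the client. The following is a hedged sketch: the function name and the client-side validation are assumptions for illustration, and the check on hidden follows the wording above literally (it must be combined with open, closed, or both).

```python
from urllib.parse import urlencode

# Values listed for expand_wildcards in this reference.
VALID_EXPAND_WILDCARDS = {"all", "closed", "hidden", "none", "open"}


def build_query_string(allow_no_indices=None, expand_wildcards=None):
    """Build the query string for the create-job request (illustrative helper)."""
    params = {}
    if allow_no_indices is not None:
        # Booleans are sent as lowercase "true"/"false".
        params["allow_no_indices"] = "true" if allow_no_indices else "false"
    if expand_wildcards:
        # Accept either a comma-separated string or a list of strings.
        if isinstance(expand_wildcards, str):
            values = expand_wildcards.split(",")
        else:
            values = list(expand_wildcards)
        unknown = set(values) - VALID_EXPAND_WILDCARDS
        if unknown:
            raise ValueError(f"unknown expand_wildcards values: {sorted(unknown)}")
        if "hidden" in values and not {"open", "closed"} & set(values):
            raise ValueError("'hidden' must be combined with 'open', 'closed', or both")
        params["expand_wildcards"] = ",".join(values)
    # Keep commas unescaped so the list stays readable in the URL.
    return urlencode(params, safe=",")
```

Append the result to the request path after a "?" when it is non-empty.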
Body
Required
-
allow_lazy_open
boolean Advanced configuration option. Specifies whether this job can open when there is insufficient machine learning node capacity for it to be immediately assigned to a node. By default, if a machine learning node with capacity to run the job cannot immediately be found, the open anomaly detection jobs API returns an error. However, this is also subject to the cluster-wide xpack.ml.max_lazy_ml_nodes setting. If this option is set to true, the open anomaly detection jobs API does not return an error and the job waits in the opening state until sufficient machine learning node capacity is available.
-
analysis_config
object Required Additional properties are allowed.
-
analysis_limits
object Additional properties are allowed.
-
background_persist_interval
string A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours), and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.
-
custom_settings
object Custom metadata about the job
Additional properties are allowed.
-
daily_model_snapshot_retention_after_days
number Advanced configuration option, which affects the automatic removal of old model snapshots for this job. It specifies a period of time (in days) after which only the first snapshot per day is retained. This period is relative to the timestamp of the most recent snapshot for this job. Valid values range from 0 to model_snapshot_retention_days.
-
data_description
object Required Additional properties are allowed.
-
datafeed_config
object Additional properties are allowed.
-
description
string A description of the job.
-
job_id
string
-
groups
array[string] A list of job groups. A job can belong to no groups or many.
-
model_plot_config
object Additional properties are allowed.
-
model_snapshot_retention_days
number Advanced configuration option, which affects the automatic removal of old model snapshots for this job. It specifies the maximum period of time (in days) that snapshots are retained. This period is relative to the timestamp of the most recent snapshot for this job. By default, snapshots ten days older than the newest snapshot are deleted.
-
renormalization_window_days
number Advanced configuration option. The period over which adjustments to the score are applied, as new data is seen. The default value is the longer of 30 days or 100 bucket spans.
-
results_index_name
string
-
results_retention_days
number Advanced configuration option. The period of time (in days) that results are retained. Age is calculated relative to the timestamp of the latest bucket result. If this property has a non-null value, once per day at 00:30 (server time), results that are the specified number of days older than the latest bucket result are deleted from Elasticsearch. The default value is null, which means all results are retained. Annotations generated by the system also count as results for retention purposes; they are deleted after the same number of days as results. Annotations added by users are retained forever.
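The duration format described for background_persist_interval can be interpreted with a small parser. The following is a sketch under stated assumptions: the function name is hypothetical, only whole-number amounts are handled, and "-1" is mapped to None to represent an unspecified value.

```python
import re

# Nanoseconds per unit, for the units listed in this reference.
_UNIT_NANOS = {
    "nanos": 1,
    "micros": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
}
_DURATION_RE = re.compile(r"(\d+)(nanos|micros|ms|s|m|h|d)")


def parse_duration_nanos(value: str):
    """Convert a duration string to nanoseconds; return None for "-1"."""
    if value == "-1":
        return None  # unspecified value
    if value == "0":
        return 0  # zero is accepted without a unit
    match = _DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a recognized duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_NANOS[unit]
```

With this sketch, "15m" and "900s" parse to the same number of nanoseconds, which makes it easy to compare intervals written in different units.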
curl \
--request PUT http://api.example.com/_ml/anomaly_detectors/{job_id} \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"analysis_config":{"detectors":[{"function":"sum","field_name":"bytes","detector_description":"Sum of bytes"}],"bucket_span":"15m"},"analysis_limits":{"model_memory_limit":"11MB"},"datafeed_config":{"query":{"bool":{"must":[{"match_all":{}}]}},"indices":["kibana_sample_data_logs"],"datafeed_id":"datafeed-test-job1","runtime_mappings":{"hour_of_day":{"type":"long","script":{"source":"emit(doc['timestamp'].value.getHour());"}}}},"data_description":{"time_field":"timestamp","time_format":"epoch_ms"},"model_plot_config":{"enabled":true,"annotations_enabled":true},"results_index_name":"test-job1"}'
{
  "analysis_config": {
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes",
        "detector_description": "Sum of bytes"
      }
    ],
    "bucket_span": "15m"
  },
  "analysis_limits": {
    "model_memory_limit": "11MB"
  },
  "datafeed_config": {
    "query": {
      "bool": {
        "must": [
          {
            "match_all": {}
          }
        ]
      }
    },
    "indices": [
      "kibana_sample_data_logs"
    ],
    "datafeed_id": "datafeed-test-job1",
    "runtime_mappings": {
      "hour_of_day": {
        "type": "long",
        "script": {
          "source": "emit(doc['timestamp'].value.getHour());"
        }
      }
    }
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": true,
    "annotations_enabled": true
  },
  "results_index_name": "test-job1"
}
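The same PUT request can be prepared from Python using only the standard library. This is a minimal sketch that mirrors the curl command above: the function name is hypothetical, the base URL is whatever your cluster uses, and the authorization argument is passed through as the full Authorization header value, exactly as the curl example does.

```python
import json
import urllib.request


def build_put_job_request(base_url, job_id, body, authorization):
    """Prepare (but do not send) the create-job request (illustrative helper)."""
    url = f"{base_url.rstrip('/')}/_ml/anomaly_detectors/{job_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # JSON request body
        headers={
            "Authorization": authorization,
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; the response body is the JSON description of the created job.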
{
  "job_id": "test-job1",
  "job_type": "anomaly_detector",
  "create_time": 1656087283340,
  "job_version": "8.4.0",
  "allow_lazy_open": false,
  "analysis_config": {
    "detectors": [
      {
        "function": "sum",
        "field_name": "bytes",
        "detector_index": 0,
        "detector_description": "Sum of bytes"
      }
    ],
    "bucket_span": "15m",
    "influencers": [],
    "model_prune_window": "30d"
  },
  "analysis_limits": {
    "model_memory_limit": "11mb",
    "categorization_examples_limit": 4
  },
  "datafeed_config": {
    "query": {
      "bool": {
        "must": [
          {
            "match_all": {}
          }
        ]
      }
    },
    "job_id": "test-job1",
    "indices": [
      "kibana_sample_data_logs"
    ],
    "datafeed_id": "datafeed-test-job1",
    "query_delay": "61499ms",
    "scroll_size": 1000,
    "authorization": {
      "roles": [
        "superuser"
      ]
    },
    "chunking_config": {
      "mode": "auto"
    },
    "indices_options": {
      "allow_no_indices": true,
      "expand_wildcards": [
        "open"
      ],
      "ignore_throttled": true,
      "ignore_unavailable": false
    },
    "runtime_mappings": {
      "hour_of_day": {
        "type": "long",
        "script": {
          "source": "emit(doc['timestamp'].value.getHour());"
        }
      }
    },
    "delayed_data_check_config": {
      "enabled": true
    }
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": true,
    "annotations_enabled": true
  },
  "results_index_name": "custom-test-job1",
  "model_snapshot_retention_days": 10,
  "daily_model_snapshot_retention_after_days": 1
}
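The two snapshot retention fields in the response are related: daily_model_snapshot_retention_after_days must lie between 0 and model_snapshot_retention_days. The following sketch, with a hypothetical function name, applies that constraint to a job configuration or response held as a Python dict; it skips the check when either field is absent.

```python
def check_snapshot_retention(job: dict) -> None:
    """Raise ValueError if the documented retention range is violated."""
    retention = job.get("model_snapshot_retention_days")
    daily_after = job.get("daily_model_snapshot_retention_after_days")
    if retention is None or daily_after is None:
        return  # nothing to compare
    if not 0 <= daily_after <= retention:
        raise ValueError(
            "daily_model_snapshot_retention_after_days must be between 0 and "
            f"model_snapshot_retention_days ({retention}), got {daily_after}"
        )
```

The response above passes this check, since 1 falls within the range 0 to 10.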