Datafeeds

Create Datafeeds

The create datafeed API enables you to instantiate a datafeed.

Request

PUT _xpack/ml/datafeeds/<feed_id>

Description

You must create a job before you create a datafeed. You can associate only one datafeed with each job.

Path Parameters

feed_id (required)
(string) A character string that uniquely identifies the datafeed.

Request Body

aggregations
(object) If set, the datafeed performs aggregation searches. For more information, see Datafeed Resources.
chunking_config
(object) Specifies how data searches are split into time chunks. See Chunking Configuration Objects.
frequency
(time units) The interval at which scheduled queries are made while the datafeed runs in real time. The default value is either the bucket span for short bucket spans, or, for longer bucket spans, a sensible fraction of the bucket span. For example: 150s.
indexes (required)
(array) An array of index names. Wildcards are supported. For example: ["it_ops_metrics", "server*"].
job_id (required)
(string) A character string that uniquely identifies the job.
query
(object) The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {"boost": 1}}.
query_delay
(time units) The number of seconds behind real time that data is queried. For example, if data from 10:04 a.m. might not be searchable in Elasticsearch until 10:06 a.m., set this property to 120 seconds. The default value is 60s.
script_fields
(object) Specifies scripts that evaluate custom expressions and return script fields to the datafeed. The detector configuration objects in a job can contain functions that use these script fields. For more information, see Script Fields.
scroll_size
(unsigned integer) The size parameter that is used in Elasticsearch searches. The default value is 1000.
types (required)
(array) A list of types to search for within the specified indices. For example: ["network","sql","kpi"].

For more information about these properties, see Datafeed Resources.
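As a rough illustration of how the required properties fit together, the following Python sketch assembles a create request without sending it. The helper name build_create_datafeed_request and the client-side check for missing properties are assumptions made for this example; they are not part of the API.

```python
import json

# Properties marked "(required)" in the Request Body list above.
REQUIRED_PROPERTIES = ("job_id", "indexes", "types")


def build_create_datafeed_request(feed_id, body):
    """Return (method, path, payload) for a create datafeed call.

    Hypothetical helper: it only assembles the request and checks that
    the required properties are present. It does not contact a cluster.
    """
    missing = [name for name in REQUIRED_PROPERTIES if name not in body]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    path = "_xpack/ml/datafeeds/" + feed_id
    return "PUT", path, json.dumps(body, sort_keys=True)


method, path, payload = build_create_datafeed_request(
    "datafeed-it-ops-kpi",
    {
        "job_id": "it-ops-kpi",
        "indexes": ["it_ops_metrics"],
        "types": ["kpi", "network", "sql"],
    },
)
print(method, path)
print(payload)
```

Optional properties such as query or query_delay can be added to the same dictionary. If you omit them, the defaults described above apply.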

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example creates the datafeed-it-ops-kpi datafeed:

PUT _xpack/ml/datafeeds/datafeed-it-ops-kpi
{
  "job_id": "it-ops-kpi",
  "indexes": ["it_ops_metrics"],
  "types": ["kpi","network","sql"],
  "query": {
    "match_all": {
      "boost": 1
    }
  }
}

When the datafeed is created, you receive the following results:

{
  "datafeed_id": "datafeed-it-ops-kpi",
  "job_id": "it-ops-kpi",
  "query_delay": "1m",
  "indexes": [
    "it_ops_metrics"
  ],
  "types": [
    "kpi",
    "network",
    "sql"
  ],
  "query": {
    "match_all": {
      "boost": 1
    }
  },
  "scroll_size": 1000,
  "chunking_config": {
    "mode": "auto"
  }
}

Delete Datafeeds

The delete datafeed API enables you to delete an existing datafeed.

Request

DELETE _xpack/ml/datafeeds/<feed_id>

Description

Note

You must stop the datafeed before you can delete it.
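Because a running datafeed cannot be deleted, a client typically issues a stop request first. The following Python sketch plans that sequence of requests without sending them. The helper name plan_delete is hypothetical, and the sketch assumes that you already know the datafeed state (for example, from the get datafeed statistics API).

```python
def plan_delete(feed_id, state):
    """Return the ordered (method, path) requests needed to delete a datafeed.

    Hypothetical helper: `state` is assumed to be the value reported by
    the get datafeed statistics API, such as "started" or "stopped".
    """
    base = "_xpack/ml/datafeeds/" + feed_id
    steps = []
    if state != "stopped":
        # The datafeed must be stopped before it can be deleted.
        steps.append(("POST", base + "/_stop"))
    steps.append(("DELETE", base))
    return steps


for method, path in plan_delete("datafeed-it-ops", "started"):
    print(method, path)
```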

Path Parameters

feed_id (required)
(string) Identifier for the datafeed.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example deletes the datafeed-it-ops datafeed:

DELETE _xpack/ml/datafeeds/datafeed-it-ops

When the datafeed is deleted, you receive the following results:

{
  "acknowledged": true
}

Get Datafeeds

The get datafeeds API enables you to retrieve configuration information for datafeeds.

Request

GET _xpack/ml/datafeeds/

GET _xpack/ml/datafeeds/<feed_id>

Path Parameters

feed_id
(string) Identifier for the datafeed. This parameter does not support wildcards, but you can specify _all or omit the feed_id to get information about all datafeeds.

Results

The API returns the following information:

datafeeds
(array) An array of datafeed objects. For more information, see Datafeed Resources.

Authorization

You must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example gets configuration information for the datafeed-it-ops-kpi datafeed:

GET _xpack/ml/datafeeds/datafeed-it-ops-kpi

The API returns the following results:

{
  "count": 1,
  "datafeeds": [
    {
      "datafeed_id": "datafeed-it-ops-kpi",
      "job_id": "it-ops-kpi",
      "query_delay": "60s",
      "frequency": "150s",
      "indexes": [
        "it_ops_metrics"
      ],
      "types": [
        "kpi",
        "network",
        "sql"
      ],
      "query": {
        "match_all": {
          "boost": 1
        }
      },
      "aggregations": {
        "buckets": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": 30000,
            "offset": 0,
            "order": {
              "_key": "asc"
            },
            "keyed": false,
            "min_doc_count": 0
          },
          "aggregations": {
            "events_per_min": {
              "sum": {
                "field": "events_per_min"
              }
            },
            "@timestamp": {
              "max": {
                "field": "@timestamp"
              }
            }
          }
        }
      },
      "scroll_size": 1000,
      "chunking_config": {
        "mode": "manual",
        "time_span": "30000000ms"
      }
    }
  ]
}

Get Datafeed Statistics

The get datafeed statistics API enables you to retrieve usage information for datafeeds.

Request

GET _xpack/ml/datafeeds/_stats

GET _xpack/ml/datafeeds/<feed_id>/_stats

Description

If the datafeed is stopped, the only information you receive is the datafeed_id and the state.

Path Parameters

feed_id
(string) Identifier for the datafeed. This parameter does not support wildcards, but you can specify _all or omit the feed_id to get information about all datafeeds.

Results

The API returns the following information:

datafeeds
(array) An array of datafeed count objects. For more information, see the section called “Datafeed Counts”.

Authorization

You must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example gets usage information for the datafeed-farequote datafeed:

GET _xpack/ml/datafeeds/datafeed-farequote/_stats

The API returns the following results:

{
  "count": 1,
  "datafeeds": [
    {
      "datafeed_id": "datafeed-farequote",
      "state": "started",
      "node": {
        "id": "IO_gxe2_S8mrzu7OpmK5Jw",
        "name": "IO_gxe2",
        "ephemeral_id": "KHMWPZoMToOzSsZY9lDDgQ",
        "transport_address": "127.0.0.1:9300",
        "attributes": {
          "max_running_jobs": "10"
        }
      },
      "assignment_explanation": ""
    }
  ]
}
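A response of this shape can be reduced to a per-datafeed summary. The following Python sketch is one way to do that. The helper name summarize_datafeed_stats is hypothetical, and the sketch assumes that node details can be absent, as the description says they are for a stopped datafeed.

```python
def summarize_datafeed_stats(stats_response):
    """Map each datafeed_id to its state and, if present, its node name."""
    summary = {}
    for entry in stats_response.get("datafeeds", []):
        # A stopped datafeed is described as returning only datafeed_id
        # and state, so treat the node object as optional.
        node = entry.get("node") or {}
        summary[entry["datafeed_id"]] = {
            "state": entry["state"],
            "node_name": node.get("name"),
        }
    return summary


sample = {
    "count": 1,
    "datafeeds": [
        {
            "datafeed_id": "datafeed-farequote",
            "state": "started",
            "node": {"id": "IO_gxe2_S8mrzu7OpmK5Jw", "name": "IO_gxe2"},
            "assignment_explanation": "",
        }
    ],
}
print(summarize_datafeed_stats(sample))
```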

Preview Datafeeds

The preview datafeed API enables you to preview a datafeed.

Request

GET _xpack/ml/datafeeds/<datafeed_id>/_preview

Description

The API returns the first "page" of results from the search that is created by using the current datafeed settings. This preview shows the structure of the data that will be passed to the anomaly detection engine.

Path Parameters

datafeed_id (required)
(string) Identifier for the datafeed.

Authorization

You must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example obtains a preview of the datafeed-farequote datafeed:

GET _xpack/ml/datafeeds/datafeed-farequote/_preview

The data that is returned for this example is as follows:

[
  {
    "@timestamp": 1454803200000,
    "airline": "AAL",
    "responsetime": 132.20460510253906
  },
  {
    "@timestamp": 1454803200000,
    "airline": "JZA",
    "responsetime": 990.4628295898438
  },
  {
    "@timestamp": 1454803200000,
    "airline": "JBU",
    "responsetime": 877.5927124023438
  },
  ...
]

Start Datafeeds

A datafeed must be started in order to retrieve data from Elasticsearch. A datafeed can be started and stopped multiple times throughout its lifecycle.

Request

POST _xpack/ml/datafeeds/<feed_id>/_start

Description

Note

Before you can start a datafeed, the job must be open. Otherwise, an error occurs.

When you start a datafeed, you can specify a start time. This enables you to include a training period, provided that the data is available in Elasticsearch. If you want to analyze from the beginning of a dataset, you can specify any date earlier than that beginning date.

If you do not specify a start time and the datafeed is associated with a new job, the analysis starts from the earliest time for which data is available.

When you start a datafeed, you can also specify an end time. If you do so, the job analyzes data from the start time until the end time, at which point the analysis stops. This scenario is useful for a one-off batch analysis. If you do not specify an end time, the datafeed runs continuously.

The start and end times can be specified by using one of the following formats:

  • ISO 8601 format with milliseconds, for example 2017-01-22T06:00:00.000Z
  • ISO 8601 format without milliseconds, for example 2017-01-22T06:00:00+00:00
  • Seconds from the Epoch, for example 1390370400

Date-time arguments using either of the ISO 8601 formats must have a time zone designator, where Z is accepted as an abbreviation for UTC time.

Note

When a URL is expected (for example, in browsers), the + used in time zone designators must be encoded as %2B.
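The following Python sketch shows one way to produce each of the three accepted formats from a time zone aware datetime, and to percent-encode a value for use in a URL. The function names are illustrative only. It uses the standard library and does not contact a cluster.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def iso_with_millis(moment):
    """ISO 8601 with milliseconds, expressed in UTC with the Z designator."""
    if moment.tzinfo is None:
        raise ValueError("a time zone aware datetime is required")
    utc = moment.astimezone(timezone.utc)
    base = utc.strftime("%Y-%m-%dT%H:%M:%S")
    return f"{base}.{utc.microsecond // 1000:03d}Z"


def iso_without_millis(moment):
    """ISO 8601 without milliseconds, keeping the numeric UTC offset."""
    if moment.tzinfo is None:
        raise ValueError("a time zone aware datetime is required")
    return moment.replace(microsecond=0).isoformat()


def epoch_seconds(moment):
    """Whole seconds from the Epoch."""
    return int(moment.timestamp())


def encode_for_url(value):
    """Percent-encode a date-time so that + becomes %2B."""
    return quote(value, safe="")


moment = datetime(2017, 1, 22, 6, 0, 0, tzinfo=timezone.utc)
print(iso_with_millis(moment))     # 2017-01-22T06:00:00.000Z
print(iso_without_millis(moment))  # 2017-01-22T06:00:00+00:00
print(epoch_seconds(moment))
print(encode_for_url(iso_without_millis(moment)))
```

With safe="", quote also percent-encodes the colons in the value. The note above only requires encoding the +.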

If the system restarts, any jobs that had datafeeds running are also restarted.

When a stopped datafeed is restarted, it continues processing input data from the next millisecond after it was stopped. If new data was indexed for that exact millisecond between stopping and starting, it will be ignored. If you specify a start value that is earlier than the timestamp of the latest processed record, the datafeed continues from 1 millisecond after the timestamp of the latest processed record.
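The restart rule can be modeled as a small function. The following Python sketch is a simplified reading of the paragraph above, not the actual implementation. The name effective_start_ms is hypothetical, and all times are assumed to be milliseconds from the Epoch.

```python
def effective_start_ms(requested_start_ms, latest_processed_ms):
    """Return the time from which a datafeed would continue, in milliseconds.

    requested_start_ms: the start value in the request, or None if omitted.
    latest_processed_ms: timestamp of the latest processed record, or None
        for a new job that has not processed any data.
    A return value of None stands for "earliest available data".
    """
    if latest_processed_ms is None:
        # New job: use the requested start, if any.
        return requested_start_ms
    resume_ms = latest_processed_ms + 1
    if requested_start_ms is None or requested_start_ms < latest_processed_ms:
        # Continue 1 millisecond after the latest processed record.
        return resume_ms
    return requested_start_ms


print(effective_start_ms(1000, 5000))  # 5001
print(effective_start_ms(9000, 5000))  # 9000
```

The text does not say what happens when the requested start equals the timestamp of the latest processed record. The sketch returns the requested value in that case, which is an assumption.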

Path Parameters

feed_id (required)
(string) Identifier for the datafeed.

Request Body

end
(string) The time that the datafeed should end. This value is exclusive. The default value is an empty string.
start
(string) The time that the datafeed should begin. This value is inclusive. The default value is an empty string.
timeout
(time) Controls the amount of time to wait until a datafeed starts. The default value is 20 seconds.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example starts the datafeed-it-ops-kpi datafeed:

POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_start
{
  "start": "2017-04-07T18:22:16Z"
}

When the datafeed starts, you receive the following results:

{
  "started": true
}

Stop Datafeeds

A datafeed that is stopped ceases to retrieve data from Elasticsearch. A datafeed can be started and stopped multiple times throughout its lifecycle.

Request

POST _xpack/ml/datafeeds/<feed_id>/_stop

Path Parameters

feed_id (required)
(string) Identifier for the datafeed.

Request Body

force
(boolean) If true, the datafeed is stopped forcefully.
timeout
(time) Controls the amount of time to wait until a datafeed stops. The default value is 20 seconds.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example stops the datafeed-it-ops-kpi datafeed:

POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_stop
{
  "timeout": "30s"
}

When the datafeed stops, you receive the following results:

{
  "stopped": true
}

Update Datafeeds

The update datafeed API enables you to update certain properties of a datafeed.

Request

POST _xpack/ml/datafeeds/<feed_id>/_update

Path Parameters

feed_id (required)
(string) Identifier for the datafeed.

Request Body

The following properties can be updated after the datafeed is created:

aggregations
(object) If set, the datafeed performs aggregation searches. For more information, see Datafeed Resources.
chunking_config
(object) Specifies how data searches are split into time chunks. See Chunking Configuration Objects.
frequency
(time units) The interval at which scheduled queries are made while the datafeed runs in real time. The default value is either the bucket span for short bucket spans, or, for longer bucket spans, a sensible fraction of the bucket span. For example: 150s.
indexes
(array) An array of index names. Wildcards are supported. For example: ["it_ops_metrics", "server*"].
job_id
(string) A character string that uniquely identifies the job.
query
(object) The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {"boost": 1}}.
query_delay
(time units) The number of seconds behind real time that data is queried. For example, if data from 10:04 a.m. might not be searchable in Elasticsearch until 10:06 a.m., set this property to 120 seconds. The default value is 60s.
script_fields
(object) Specifies scripts that evaluate custom expressions and return script fields to the datafeed. The detector configuration objects in a job can contain functions that use these script fields. For more information, see Script Fields.
scroll_size
(unsigned integer) The size parameter that is used in Elasticsearch searches. The default value is 1000.
types
(array) A list of types to search for within the specified indices. For example: ["network","sql","kpi"].

For more information about these properties, see Datafeed Resources.

Authorization

You must have manage_ml or manage cluster privileges to use this API. For more information, see Cluster Privileges.

Examples

The following example updates the query for the datafeed-it-ops-kpi datafeed so that only log entries of error level are analyzed:

POST _xpack/ml/datafeeds/datafeed-it-ops-kpi/_update
{
  "query": {
    "term": {
      "level": "error"
    }
  }
}

When the datafeed is updated, you receive the full datafeed configuration with the updated values:

{
  "datafeed_id": "datafeed-it-ops-kpi",
  "job_id": "it-ops-kpi",
  "query_delay": "1m",
  "indexes": ["it-ops"],
  "types": ["logs"],
  "query": {
    "term": {
      "level": {
        "value": "error",
        "boost": 1
      }
    }
  },
  "scroll_size": 1000,
  "chunking_config": {
    "mode": "auto"
  }
}