There are a few concepts that are core to machine learning in X-Pack. Understanding these concepts from the outset makes the learning process considerably easier.
Machine learning jobs contain the configuration information and metadata necessary to perform an analytics task. For a list of the properties associated with a job, see Job Resources.
Jobs can analyze either a one-off batch of data or a continuous stream in real time. Datafeeds retrieve data from Elasticsearch for analysis. Alternatively, you can POST data from any source directly to an API.
As part of the configuration information that is associated with a job, detectors define the type of analysis that needs to be done. They also specify which fields to analyze. You can have more than one detector in a job, which is more efficient than running multiple jobs against the same data. For a list of the properties associated with detectors, see Detector Configuration Objects.
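As a rough illustration of how detectors sit inside a job's configuration, the sketch below builds a job body as a Python dictionary with two detectors. The field names responsetime, airline, and timestamp and the 15m bucket span are hypothetical placeholders. The property names are intended to follow the job and detector resources referenced above, but they are an assumption here and should be checked against Job Resources and Detector Configuration Objects for your version.

```python
import json

# Hypothetical job body: one job with two detectors over the same data.
# Field names and values are placeholders, not taken from a real cluster.
job_config = {
    "description": "Example job (illustrative only)",
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            # Model the mean response time separately for each airline.
            {"function": "mean", "field_name": "responsetime", "by_field_name": "airline"},
            # Flag unusually high event counts only.
            {"function": "high_count"},
        ],
    },
    "data_description": {"time_field": "timestamp"},
}


def detector_summary(config):
    """Return a short label for each detector in the job configuration."""
    labels = []
    for detector in config["analysis_config"]["detectors"]:
        field = detector.get("field_name")
        labels.append(detector["function"] + (f"({field})" if field else ""))
    return labels


if __name__ == "__main__":
    print(json.dumps(job_config, indent=2))
    print(detector_summary(job_config))
```

Keeping both detectors in one job means the data is read once and analyzed twice, which is the efficiency gain described above.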
The X-Pack machine learning features use the concept of a bucket to divide the time series into batches for processing. The bucket span is part of the configuration information for a job. It defines the time interval that is used to summarize and model the data. This is typically between 5 minutes and 1 hour, depending on your data characteristics. When you set the bucket span, take into account the granularity at which you want to analyze, the frequency of the input data, the typical duration of the anomalies, and the frequency at which alerting is required.
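To make the idea of a bucket concrete, here is a minimal sketch of dividing a time series into fixed-width buckets and summarizing each one. It is not how X-Pack implements bucketing internally; the function name bucketize, the (epoch seconds, value) record format, and the count and mean summaries are all assumptions chosen for illustration.

```python
from collections import defaultdict


def bucketize(records, bucket_span_seconds):
    """Group (epoch_seconds, value) records into fixed-width time buckets
    and summarize each bucket by its count and mean."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        # Align each record to the start of the bucket that contains it.
        bucket_start = (timestamp // bucket_span_seconds) * bucket_span_seconds
        buckets[bucket_start].append(value)
    return {
        start: {"count": len(values), "mean": sum(values) / len(values)}
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Five hypothetical readings, summarized with a 5-minute (300 s) bucket span.
    records = [(0, 10.0), (120, 14.0), (310, 20.0), (590, 22.0), (600, 5.0)]
    for start, summary in bucketize(records, 300).items():
        print(start, summary)
```

A shorter span gives finer granularity but fewer records per bucket. A longer span smooths the data but can hide anomalies that last less than one bucket.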
A machine learning node is a node that has node.ml set to true, which is the
default behavior. If you set node.ml to false, the node can service API requests
but it cannot run jobs. If you want to use X-Pack machine learning features,
there must be at least one machine learning node in your cluster. For more
information about this setting, see Machine Learning Settings.
The X-Pack machine learning features include analysis functions that provide a wide variety of flexible ways to analyze data for anomalies.
When you create jobs, you specify one or more detectors, which define the type of analysis that needs to be done. If you are creating your job by using machine learning APIs, you specify the functions in Detector Configuration Objects. If you are creating your job in Kibana, you specify the functions differently depending on whether you are creating single metric, multi-metric, or advanced jobs. For a demonstration of creating jobs in Kibana, see Getting Started.
Most functions detect anomalies in both low and high values. In statistical
terminology, they apply a two-sided test. Some functions offer low and high
variations (for example, low_count and high_count). These variations
apply one-sided tests, detecting anomalies only when the values are low or
high, depending on which alternative is used.
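The difference between two-sided and one-sided tests can be sketched with a simple z-score check. This is only an illustration of the statistical idea, not the model that X-Pack machine learning uses; the function name is_anomalous, the side parameter, and the threshold of 3 standard deviations are assumptions made for the example.

```python
from statistics import mean, stdev


def is_anomalous(history, value, side="both", threshold=3.0):
    """Flag `value` when its z-score against `history` exceeds `threshold`.

    side="both" is a two-sided test; "low" and "high" are one-sided tests.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return False  # No variation in the history, so no z-score is defined.
    z = (value - mu) / sigma
    if side == "both":
        return abs(z) > threshold
    if side == "high":
        return z > threshold
    if side == "low":
        return z < -threshold
    raise ValueError("side must be 'both', 'low', or 'high'")


if __name__ == "__main__":
    history = [100, 102, 98, 101, 99, 100, 103, 97]
    for side in ("both", "low", "high"):
        print(side, is_anomalous(history, 120, side), is_anomalous(history, 80, side))
```

With this history, an unusually high value is flagged by the "both" and "high" variants but ignored by "low", which mirrors how a high_count style function disregards unusually low values.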