Data frame analytics job resources

Data frame analytics resources relate to APIs such as Create data frame analytics jobs and Get data frame analytics jobs.

Properties

analysis
(object) The type of analysis that is performed on the source. For example: outlier_detection. For more information, see Analysis objects.
analyzed_fields
(object) Specifies includes and/or excludes patterns that select the fields to analyze. If analyzed_fields is not set, only the relevant fields are included; for example, all the numeric fields for outlier detection.
PUT _ml/data_frame/analytics/loganalytics
{
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_out"
  },
  "analysis": {
    "outlier_detection": {
    }
  },
  "analyzed_fields": {
    "includes": [ "request.bytes", "response.counts.error" ],
    "excludes": [ "source.geo" ]
  }
}
dest
(object) The destination configuration of the analysis. The index property (string) is the name of the index in which to store the results of the data frame analytics job. The results_field property (string) defines the name of the field in which to store the results of the analysis; its default value is ml.
id
(string) The unique identifier for the data frame analytics job. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters. This property is informational; you cannot change the identifier for existing jobs.
model_memory_limit
(string) The approximate maximum amount of memory resources that are permitted for analytical processing. The default value for data frame analytics jobs is 1gb. If your elasticsearch.yml file contains an xpack.ml.max_model_memory_limit setting, an error occurs when you try to create data frame analytics jobs that have model_memory_limit values greater than that setting. For more information, see Machine learning settings.
source
(object) The source configuration. The index property (array) lists the names of the indices on which to perform the analysis; it can be a single index or index pattern, or an array of indices or patterns. Optionally, source can have a query property (object), written in the Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that Elasticsearch supports can be used, because this object is passed verbatim to Elasticsearch. By default, query has the following value: {"match_all": {}}.
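
The optional properties described above can be combined in a single request. The following is a minimal sketch rather than a definitive configuration: the job identifier loganalytics_errors, the index names, the response.status_code field, the results_field name, and the 512mb limit are all assumptions chosen for illustration.

```console
# Illustrative only: the job ID, index names, queried field, results field,
# and memory limit below are assumptions, not values from a real cluster.
PUT _ml/data_frame/analytics/loganalytics_errors
{
  "source": {
    "index": [ "logdata" ],
    "query": {
      "range": { "response.status_code": { "gte": 500 } }
    }
  },
  "dest": {
    "index": "logdata_errors_out",
    "results_field": "outliers"
  },
  "analysis": {
    "outlier_detection": {}
  },
  "model_memory_limit": "512mb"
}
```

In this sketch, only documents that match the range query are analyzed, and the results are written under the outliers field of the destination index instead of the default ml field.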

Analysis objects

Data frame analytics resources contain analysis objects. For example, when you create a data frame analytics job, you must define the type of analysis it performs. Currently, outlier_detection is the only available type of analysis; other types, such as regression, will be added.

Outlier detection configuration objects

An outlier detection configuration object has the following properties:

n_neighbors
(integer) The number of nearest neighbors that each outlier detection method uses to calculate its outlier score. When the value is not set, the system dynamically detects an appropriate value.
method
(string) Sets the method that outlier detection uses. If the method is not set, outlier detection uses an ensemble of different methods, then normalizes and combines their individual outlier scores to obtain the overall outlier score. We recommend using the ensemble method. Available methods are lof, ldof, distance_kth_nn, and distance_knn.
feature_influence_threshold
(double) The minimum outlier score that a document must have for its feature influence score to be calculated. Value range: 0-1 (0.1 by default).
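
As an illustration of the properties above, the following sketch pins outlier detection to a single method instead of the default ensemble. The job identifier, index names, and parameter values are assumptions chosen for the example, not recommended settings.

```console
# Hypothetical job ID, index names, and parameter values; adjust for your data.
PUT _ml/data_frame/analytics/loganalytics_lof
{
  "source": {
    "index": "logdata"
  },
  "dest": {
    "index": "logdata_lof_out"
  },
  "analysis": {
    "outlier_detection": {
      "n_neighbors": 20,
      "method": "lof",
      "feature_influence_threshold": 0.5
    }
  }
}
```

Leaving method unset keeps the recommended ensemble behavior, and leaving n_neighbors unset lets the system choose a value dynamically.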