Explain data frame analytics API

Explains a data frame analytics config.

This functionality is experimental and may be changed or removed completely in a future release. Elastic will take a best effort approach to fix any issues, but experimental features are not subject to the support SLA of official GA features.

Request

GET _ml/data_frame/analytics/_explain

POST _ml/data_frame/analytics/_explain

GET _ml/data_frame/analytics/<data_frame_analytics_id>/_explain

POST _ml/data_frame/analytics/<data_frame_analytics_id>/_explain
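The four request forms above differ only in whether a job identifier appears in the path. The following minimal Python sketch builds the endpoint path for either case. The function name explain_path is hypothetical. The sketch only builds the path and does not send a request, and either GET or POST can be used with the result.

```python
from typing import Optional
from urllib.parse import quote


def explain_path(job_id: Optional[str] = None) -> str:
    """Return the _explain endpoint path, with or without a job ID (hypothetical helper)."""
    base = "_ml/data_frame/analytics"
    if job_id is None:
        # No ID: the config to explain is supplied in the request body.
        return f"{base}/_explain"
    # With an ID: explain an existing job. Quote the ID defensively for use in a URL path.
    return f"{base}/{quote(job_id, safe='')}/_explain"
```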

Prerequisites

If the Elasticsearch security features are enabled, you must have the following privileges:

  • cluster: monitor_ml

For more information, see Security privileges and Built-in roles.

Description

This API provides explanations for a data frame analytics config that either already exists or has not been created yet. It explains:

  • which fields are included in or excluded from the analysis, and why;
  • how much memory the analysis is estimated to require. You can use this estimate later when deciding on an appropriate value for the model_memory_limit setting.

If you have object fields or fields that are excluded via source filtering, they are not included in the explanation.

Path parameters

<data_frame_analytics_id>
(Optional, string) Identifier for the data frame analytics job.

Request body

data_frame_analytics_config
(Optional, object) The intended configuration of the data frame analytics job. The id and dest properties do not need to be provided in the context of this API.
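If you already have a full job configuration, for example one you intend to create, you can reuse it as the request body. The following Python sketch assumes the configuration is available as a dict with the same structure as the job config. The helper name explain_body is hypothetical. Because id and dest are not needed by this API, the sketch drops them and wraps the remaining properties in data_frame_analytics_config, matching the example request on this page.

```python
import json
from typing import Any, Dict


def explain_body(job_config: Dict[str, Any]) -> str:
    """Build a JSON request body for the explain API (hypothetical helper)."""
    # Build a new dict so the caller's config is not mutated; id and dest are not needed here.
    config = {key: value for key, value in job_config.items() if key not in ("id", "dest")}
    return json.dumps({"data_frame_analytics_config": config})
```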

Response body

The API returns a response that contains the following:

field_selection

(array) An array of objects that explain the selection for each field, sorted by field name. Each object in the array has the following properties:

name
(string) The field name.
mapping_types
(array) The mapping types of the field.
is_included
(boolean) Whether the field is selected to be included in the analysis.
is_required
(boolean) Whether the field is required.
feature_type
(string) The feature type of this field for the analysis. Can be categorical or numerical.
reason
(string) The reason a field is not selected to be included in the analysis.
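To review the field selection programmatically, you can split the entries into included and excluded fields using the properties listed above. The following Python sketch is illustrative only: the function name summarize_fields is hypothetical, and it assumes the response has already been parsed into Python objects. Since feature_type and reason are not present on every entry, it falls back to a placeholder when they are missing.

```python
from typing import Any, Dict, List, Tuple


def summarize_fields(
    field_selection: List[Dict[str, Any]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split field_selection entries into included and excluded fields.

    Returns (included, excluded): included maps a field name to its
    feature_type, excluded maps a field name to the reason it was left out.
    """
    included: Dict[str, str] = {}
    excluded: Dict[str, str] = {}
    for entry in field_selection:
        name = entry["name"]
        if entry["is_included"]:
            # feature_type may be absent, so fall back to a placeholder.
            included[name] = entry.get("feature_type", "unknown")
        else:
            # reason explains why the field was not selected.
            excluded[name] = entry.get("reason", "no reason given")
    return included, excluded
```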
memory_estimation

(object) An object containing the memory estimates. The object has the following properties:

expected_memory_without_disk
(string) Estimated memory usage under the assumption that the whole data frame analytics job runs in memory (that is, without overflowing to disk).
expected_memory_with_disk
(string) Estimated memory usage under the assumption that overflowing to disk is allowed during data frame analytics. expected_memory_with_disk is usually smaller than expected_memory_without_disk because using disk limits the amount of main memory needed to perform the analysis.
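The estimates are returned as byte-size strings such as 128MB. The following Python sketch shows one way to turn them into a candidate model_memory_limit value. It assumes Elasticsearch-style units (b, kb, mb, gb, tb, pb, matched case-insensitively, with 1024-based multipliers). The helper names parse_byte_size and suggest_model_memory_limit, the max_bytes cap, and the selection policy are all hypothetical and not part of the API. The policy prefers the in-memory estimate when it fits under the cap and otherwise falls back to the with-disk estimate.

```python
import re
from typing import Dict

# 1024-based multipliers for byte-size units (assumed Elasticsearch-style notation).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_byte_size(value: str) -> int:
    """Parse a string such as '128MB' or '1.5gb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgtp]?b)\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognized byte size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def suggest_model_memory_limit(estimation: Dict[str, str], max_bytes: int) -> str:
    """Hypothetical policy: prefer the in-memory estimate if it fits under max_bytes."""
    without_disk = parse_byte_size(estimation["expected_memory_without_disk"])
    with_disk = parse_byte_size(estimation["expected_memory_with_disk"])
    chosen = without_disk if without_disk <= max_bytes else with_disk
    if chosen > max_bytes:
        raise ValueError("even the with-disk estimate exceeds max_bytes")
    # Round up to whole megabytes so the limit is never below the estimate.
    megabytes = -(-chosen // 1024**2)
    return f"{megabytes}mb"
```

Whether to allow overflowing to disk is a trade-off between memory use and speed. Treat this policy as a starting point rather than a recommendation.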

Examples

POST _ml/data_frame/analytics/_explain
{
  "data_frame_analytics_config": {
    "source": {
      "index": "houses_sold_last_10_yrs"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "price"
      }
    }
  }
}

The API returns the following results:

{
  "field_selection": [
    {
      "name": "number_of_bedrooms",
      "mapping_types": ["integer"],
      "is_included": true,
      "is_required": false,
      "feature_type": "numerical"
    },
    {
      "name": "postcode",
      "mapping_types": ["text"],
      "is_included": false,
      "is_required": false,
      "reason": "[postcode.keyword] is preferred because it is aggregatable"
    },
    {
      "name": "postcode.keyword",
      "mapping_types": ["keyword"],
      "is_included": true,
      "is_required": false,
      "feature_type": "categorical"
    },
    {
      "name": "price",
      "mapping_types": ["float"],
      "is_included": true,
      "is_required": true,
      "feature_type": "numerical"
    }
  ],
  "memory_estimation": {
    "expected_memory_without_disk": "128MB",
    "expected_memory_with_disk": "32MB"
  }
}