Inference processor

This functionality is experimental and may be changed or removed completely in a future release. Elastic will take a best effort approach to fix any issues, but experimental features are not subject to the support SLA of official GA features.

Uses a pre-trained data frame analytics model to infer against the data that is being ingested in the pipeline.

Table 23. Inference Options

| Name | Required | Default | Description |
| --- | --- | --- | --- |
| model_id | yes | - | (String) The ID of the model to load and infer against. |
| target_field | no | ml.inference.<processor_tag> | (String) Field added to incoming documents to contain results objects. |
| field_map | no | If defined, the model’s default field map | (Object) Maps the document field names to the known field names of the model. This mapping takes precedence over any default mappings provided in the model configuration. |
| inference_config | no | The default settings defined in the model | (Object) Contains the inference type and its options. There are two types: regression and classification. |
| if | no | - | Conditionally execute this processor. |
| on_failure | no | - | Handle failures for this processor. See Handling Failures in Pipelines. |
| ignore_failure | no | false | Ignore failures for this processor. See Handling Failures in Pipelines. |
| tag | no | - | An identifier for this processor. Useful for debugging and metrics. |

{
  "inference": {
    "model_id": "flight_delay_regression-1571767128603",
    "target_field": "FlightDelayMin_prediction_infer",
    "field_map": {
      "your_field": "my_field"
    },
    "inference_config": { "regression": {} }
  }
}
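The processor definition above sits inside the processors array of an ingest pipeline. As a minimal sketch, the following Python assembles such a pipeline body as a plain dictionary. The helper name build_inference_pipeline and the description text are assumptions made for illustration, and nothing here contacts Elasticsearch.

```python
import json


def build_inference_pipeline(model_id, target_field=None, field_map=None,
                             inference_config=None, tag=None):
    """Build an ingest pipeline body with a single inference processor.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    processor = {"model_id": model_id}
    # Leave optional settings out when unset so the documented defaults apply.
    optional = {
        "target_field": target_field,
        "field_map": field_map,
        "inference_config": inference_config,
        "tag": tag,
    }
    processor.update({k: v for k, v in optional.items() if v is not None})
    return {
        "description": "Pipeline with one inference processor (illustrative)",
        "processors": [{"inference": processor}],
    }


pipeline = build_inference_pipeline(
    model_id="flight_delay_regression-1571767128603",
    target_field="FlightDelayMin_prediction_infer",
    field_map={"your_field": "my_field"},
    inference_config={"regression": {}},
)
print(json.dumps(pipeline, indent=2))
```

The printed JSON could then be used as the request body when creating the pipeline, for example with PUT _ingest/pipeline/<pipeline_id>, where the pipeline ID is yours to choose.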

Regression configuration options

Regression configuration for inference.

results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the results_field value of the data frame analytics job that was used to train the model, which defaults to <dependent_variable>_prediction.
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.

Classification configuration options

Classification configuration for inference.

num_top_classes
(Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of feature importance values per document. By default, it is zero and no feature importance calculation occurs.
results_field
(Optional, string) The field that is added to incoming documents to contain the inference prediction. Defaults to the results_field value of the data frame analytics job that was used to train the model, which defaults to <dependent_variable>_prediction.
top_classes_results_field
(Optional, string) Specifies the field to which the top classes are written. Defaults to top_classes.
prediction_field_type
(Optional, string) Specifies the type of the predicted field to write. Acceptable values are: string, number, boolean. When boolean is provided, 1.0 is transformed to true and 0.0 to false.
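The effect of prediction_field_type can be pictured with a small Python sketch. This is an illustration of the rule described above, not Elasticsearch's implementation. The function name coerce_prediction is hypothetical, and raising an error for boolean inputs other than 1.0 and 0.0 is an assumption, since the text does not say how such values are handled.

```python
def coerce_prediction(value, prediction_field_type):
    """Convert a predicted value to the requested field type (illustrative)."""
    if prediction_field_type == "string":
        return str(value)
    if prediction_field_type == "number":
        return float(value)
    if prediction_field_type == "boolean":
        number = float(value)
        if number == 1.0:
            return True
        if number == 0.0:
            return False
        # Assumption: other values are rejected; the documentation is silent here.
        raise ValueError(f"cannot convert {value!r} to boolean")
    raise ValueError(f"unsupported prediction_field_type: {prediction_field_type!r}")


print(coerce_prediction(1.0, "boolean"))
print(coerce_prediction(0.0, "boolean"))
print(coerce_prediction("3.5", "number"))
```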

inference_config examples

{
  "inference_config": {
    "regression": {
      "results_field": "my_regression"
    }
  }
}

This configuration specifies a regression inference. The results are written to the my_regression field contained in the target_field results object.

{
  "inference_config": {
    "classification": {
      "num_top_classes": 2,
      "results_field": "prediction",
      "top_classes_results_field": "probabilities"
    }
  }
}

This configuration specifies a classification inference. The number of categories for which the predicted probabilities are reported is 2 (num_top_classes). The result is written to the prediction field and the top classes to the probabilities field. Both fields are contained in the target_field results object.
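To see which documented defaults remain in effect for a partial configuration like the one above, the following Python sketch merges user-supplied classification options over the defaults listed earlier. The function name and the dependent variable name are hypothetical. The sketch also assumes the training job kept its own default results_field, <dependent_variable>_prediction, and it leaves out prediction_field_type because no default is stated for it.

```python
def resolve_classification_config(user_config, dependent_variable):
    """Merge user options over the documented defaults (illustrative only)."""
    defaults = {
        "num_top_classes": 0,
        "num_top_feature_importance_values": 0,
        # Assumption: the training job used its default results_field.
        "results_field": f"{dependent_variable}_prediction",
        "top_classes_results_field": "top_classes",
    }
    # User-supplied values take precedence over the defaults.
    return {**defaults, **user_config}


resolved = resolve_classification_config(
    {
        "num_top_classes": 2,
        "results_field": "prediction",
        "top_classes_results_field": "probabilities",
    },
    dependent_variable="my_dependent_variable",  # hypothetical name
)
print(resolved)
```

Here only num_top_feature_importance_values keeps its default of zero, so no feature importance values would be calculated.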

Feature importance object mapping

To get the full benefit of aggregating and searching on feature importance, update the index mapping of the feature importance result field as shown below:

"ml.inference.feature_importance": {
  "type": "nested",
  "dynamic": true,
  "properties": {
    "feature_name": {
      "type": "keyword"
    },
    "importance": {
      "type": "double"
    }
  }
}

The mapping field name for feature importance is compounded as follows:

<ml.inference.target_field>.<inference.tag>.feature_importance

If inference.tag is not provided in the processor definition, it is not part of the field path. The <ml.inference.target_field> defaults to ml.inference.

For example, suppose you provide the tag foo in the processor definition:

{
  "tag": "foo",
  ...
}

The feature importance value is written to the ml.inference.foo.feature_importance field.

You can also specify a target field as follows:

{
  "tag": "foo",
  "target_field": "my_field"
}

In this case, feature importance is exposed in the my_field.foo.feature_importance field.
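The path rule described in this section can be sketched in a few lines of Python. The function names are hypothetical, and the mapping body simply mirrors the nested mapping shown earlier, keyed by the computed field path.

```python
def feature_importance_field(target_field="ml.inference", tag=None):
    """Return the field path that holds feature importance values."""
    parts = [target_field]
    if tag:
        # The tag is part of the path only when the processor defines one.
        parts.append(tag)
    parts.append("feature_importance")
    return ".".join(parts)


def feature_importance_mapping(target_field="ml.inference", tag=None):
    """Return a mapping fragment keyed by the computed field path."""
    return {
        feature_importance_field(target_field, tag): {
            "type": "nested",
            "dynamic": True,
            "properties": {
                "feature_name": {"type": "keyword"},
                "importance": {"type": "double"},
            },
        }
    }


print(feature_importance_field(tag="foo"))
print(feature_importance_field(target_field="my_field", tag="foo"))
print(feature_importance_field())
```

The three calls correspond to the cases covered in this section: a tag with the default target field, a tag with a custom target field, and no tag at all.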