Simulate a pipeline | Elasticsearch API documentation

Simulate a pipeline Generally available; Added in 5.0.0

POST /_ingest/pipeline/{id}/_simulate

All methods and paths for this operation:

GET /_ingest/pipeline/_simulate

POST /_ingest/pipeline/_simulate

GET /_ingest/pipeline/{id}/_simulate

POST /_ingest/pipeline/{id}/_simulate

Run an ingest pipeline against a set of provided documents. You can either specify an existing pipeline to use with the provided documents or supply a pipeline definition in the body of the request.

Required authorization

Cluster privileges: read_pipeline

Path parameters

id string Required

The pipeline to test. If you don't specify a pipeline in the request body, this parameter is required.

Query parameters

verbose boolean

If true, the response includes output data for each processor in the executed pipeline.
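
The choice between an existing pipeline (path parameter) and an inline definition (request body), together with the verbose flag, can be sketched with the Python standard library alone. This is a minimal sketch, not an official client: `build_simulate_request` and its arguments are hypothetical names, and the `Authorization: ApiKey` header mirrors the curl example on this page. The request is only built, not sent.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_simulate_request(base_url, api_key, docs,
                           pipeline_id=None, pipeline=None, verbose=False):
    """Build (but do not send) a simulate-pipeline request.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    if pipeline_id is None and pipeline is None:
        raise ValueError("provide pipeline_id (path) or pipeline (body)")

    # Path: include the pipeline ID only when testing an existing pipeline.
    if pipeline_id is not None:
        path = f"/_ingest/pipeline/{quote(pipeline_id, safe='')}/_simulate"
    else:
        path = "/_ingest/pipeline/_simulate"

    # Query: verbose=true asks for output data for each processor.
    query = "?" + urlencode({"verbose": "true"}) if verbose else ""

    # Body: docs is always required. The inline pipeline is sent only when
    # no ID is given (a choice made by this sketch, since the path wins).
    body = {"docs": docs}
    if pipeline is not None and pipeline_id is None:
        body["pipeline"] = pipeline

    return urllib.request.Request(
        url=base_url.rstrip("/") + path + query,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The sketch drops the inline definition when an ID is present because the API uses only the path parameter in that case.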

application/json

Body Required

docs array[object] Required

Sample documents to test in the pipeline.
- _id string
  
  Unique identifier for the document. This ID must be unique within the _index.
- _index string
  
  Name of the index containing the document.
- _source object Required
  
  JSON body for the document.
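
As a small illustration of the attributes above, the following sketch wraps plain dictionaries into entries for the docs array. `make_sim_doc` is a hypothetical helper name; it only encodes what the list states, namely that _source is required while _id and _index are optional.

```python
def make_sim_doc(source, index=None, doc_id=None):
    """Wrap a JSON body into a sample document for the docs array.

    Hypothetical helper: _source is required, _index and _id are optional.
    """
    if not isinstance(source, dict):
        raise TypeError("_source must be a JSON object (dict)")
    doc = {"_source": source}
    if index is not None:
        doc["_index"] = index
    if doc_id is not None:
        doc["_id"] = doc_id
    return doc


# Sample documents with the same shape as the request example on this page.
docs = [
    make_sim_doc({"foo": "bar"}, index="index", doc_id="id"),
    make_sim_doc({"foo": "rab"}),
]
```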
pipeline object Additional properties

The pipeline to test. If you don't specify the pipeline request path parameter, this parameter is required. If you specify both this and the request path parameter, the API only uses the request path parameter.
- description string
  
  Description of the ingest pipeline.
- on_failure array[object]
  
  Processors to run immediately after a processor failure.
  
  append object
  
  Appends one or more values to an existing array if the field already exists and it is an array. Converts a scalar to an array and appends one or more values to it if the field exists and it is a scalar. Creates an array containing the provided values if the field doesn’t exist. Accepts a single value or an array of values.
  
  attachment object
  
  The attachment processor lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by using the Apache text extraction library Tika.
  
  bytes object
  
  Converts a human-readable byte value (for example 1kb) to its value in bytes (for example 1024). If the field is an array of strings, all members of the array will be converted. Supported human-readable units are "b", "kb", "mb", "gb", "tb", and "pb" (case-insensitive). An error will occur if the field is not in a supported format or the resulting value exceeds 2^63.
  
  cef object
  
  Converts a CEF message into a structured format.
  
  circle object
  
  Converts circle definitions of shapes to regular polygons which approximate them.
  
  community_id object
  
  Computes the Community ID for network flow data as defined in the Community ID Specification. You can use a community ID to correlate network events related to a single flow.
  
  convert object
  
  Converts a field in the currently ingested document to a different type, such as converting a string to an integer. If the field value is an array, all members will be converted.
  
  csv object
  
  Extracts fields from a CSV line in a single text field within a document. Any empty field in the CSV will be skipped.
  
  date object
  
  Parses dates from fields, and then uses the date or timestamp as the timestamp for the document.
  
  date_index_name object
  
  Points documents to the right time-based index, based on a date or timestamp field in the document, by using the date math index name support.
  
  dissect object
  
  Extracts structured fields out of a single text field by matching the text field against a delimiter-based pattern.
  
  dot_expander object
  
  Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can’t be accessed by any processor.
  
  drop object
  
  Drops the document without raising any errors. This is useful to prevent the document from getting indexed based on some condition.
  
  enrich object
  
  The enrich processor can enrich documents with data from another index.
  
  fail object
  
  Raises an exception. This is useful for when you expect a pipeline to fail and want to relay a specific message to the requester.
  
  fingerprint object
  
  Computes a hash of the document’s content. You can use this hash for content fingerprinting.
  
  foreach object
  
  Runs an ingest processor on each element of an array or object.
  
  ip_location object
  
  Currently an undocumented alias for GeoIP Processor.
  
  geo_grid object
  
  Converts geo-grid definitions of grid tiles or cells to regular bounding boxes or polygons which describe their shape. This is useful if there is a need to interact with the tile shapes as spatially indexable fields.
  
  geoip object
  
  The geoip processor adds information about the geographical location of an IPv4 or IPv6 address.
  
  grok object
  
  Extracts structured fields out of a single text field within a document. You choose which field to extract matched fields from, as well as the grok pattern you expect will match. A grok pattern is like a regular expression that supports aliased expressions that can be reused.
  
  gsub object
  
  Converts a string field by applying a regular expression and a replacement. If the field is an array of strings, all members of the array will be converted. If any non-string values are encountered, the processor will throw an exception.
  
  html_strip object
  
  Removes HTML tags from the field. If the field is an array of strings, HTML tags will be removed from all members of the array.
  
  inference object
  
  Uses a pre-trained data frame analytics model or a model deployed for natural language processing tasks to infer against the data that is being ingested in the pipeline.
  
  join object
  
  Joins each element of an array into a single string using a separator character between each element. Throws an error when the field is not an array.
  
  json object
  
  Parses a string containing JSON data into a structured object, string, or other value.
  
  kv object
  
  This processor helps automatically parse messages (or specific event fields) which are of the foo=bar variety.
  
  lowercase object
  
  Converts a string to its lowercase equivalent. If the field is an array of strings, all members of the array will be converted.
  
  network_direction object
  
  Calculates the network direction given a source IP address, destination IP address, and a list of internal networks.
  
  pipeline object
  
  Executes another pipeline.
  
  redact object
  
  The Redact processor uses the Grok rules engine to obscure text in the input document that matches the given Grok patterns. The processor can be used to obscure Personal Identifying Information (PII) by configuring it to detect known patterns such as email or IP addresses. Text that matches a Grok pattern is replaced with a configurable string, such as <EMAIL> where an email address is matched, or all matches can simply be replaced with the text <REDACTED> if preferred.
  
  registered_domain object
  
  Extracts the registered domain (also known as the effective top-level domain or eTLD), sub-domain, and top-level domain from a fully qualified domain name (FQDN). Uses the registered domains defined in the Mozilla Public Suffix List.
  
  remove object
  
  Removes existing fields. If one field doesn’t exist, an exception will be thrown.
  
  rename object
  
  Renames an existing field. If the field doesn’t exist or the new name is already used, an exception will be thrown.
  
  reroute object
  
  Routes a document to another target index or data stream. When setting the destination option, the target is explicitly specified and the dataset and namespace options can’t be set. When the destination option is not set, this processor is in a data stream mode. Note that in this mode, the reroute processor can only be used on data streams that follow the data stream naming scheme.
  
  script object
  
  Runs an inline or stored script on incoming documents. The script runs in the ingest context.
  
  set object
  
  Adds a field with the specified value. If the field already exists, its value will be replaced with the provided one.
  
  set_security_user object
  
  Sets user-related details (such as username, roles, email, full_name, metadata, api_key, realm and authentication_type) from the current authenticated user to the current document by pre-processing the ingest.
  
  sort object
  
  Sorts the elements of an array in ascending or descending order. Homogeneous arrays of numbers will be sorted numerically, while arrays of strings or heterogeneous arrays of strings and numbers will be sorted lexicographically. Throws an error when the field is not an array.
  
  split object
  
  Splits a field into an array using a separator character. Only works on string fields.
  
  terminate object
  
  Terminates the current ingest pipeline, causing no further processors to be run. This will normally be executed conditionally, using the if option.
  
  trim object
  
  Trims whitespace from a field. If the field is an array of strings, all members of the array will be trimmed. This only works on leading and trailing whitespace.
  
  uppercase object
  
  Converts a string to its uppercase equivalent. If the field is an array of strings, all members of the array will be converted.
  
  urldecode object
  
  URL-decodes a string. If the field is an array of strings, all members of the array will be decoded.
  
  uri_parts object
  
  Parses a Uniform Resource Identifier (URI) string and extracts its components as an object. This URI object includes properties for the URI’s domain, path, fragment, port, query, scheme, user info, username, and password.
  
  user_agent object
  
  The user_agent processor extracts details from the user agent string a browser sends with its web requests. This processor adds this information by default under the user_agent field.
- processors array[object]
  
  Processors used to perform transformations on documents before indexing. Processors run sequentially in the order specified.
  
  append object
  
  Appends one or more values to an existing array if the field already exists and it is an array. Converts a scalar to an array and appends one or more values to it if the field exists and it is a scalar. Creates an array containing the provided values if the field doesn’t exist. Accepts a single value or an array of values.
  
  attachment object
  
  The attachment processor lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by using the Apache text extraction library Tika.
  
  bytes object
  
  Converts a human-readable byte value (for example 1kb) to its value in bytes (for example 1024). If the field is an array of strings, all members of the array will be converted. Supported human-readable units are "b", "kb", "mb", "gb", "tb", and "pb" (case-insensitive). An error will occur if the field is not in a supported format or the resulting value exceeds 2^63.
  
  cef object
  
  Converts a CEF message into a structured format.
  
  circle object
  
  Converts circle definitions of shapes to regular polygons which approximate them.
  
  community_id object
  
  Computes the Community ID for network flow data as defined in the Community ID Specification. You can use a community ID to correlate network events related to a single flow.
  
  convert object
  
  Converts a field in the currently ingested document to a different type, such as converting a string to an integer. If the field value is an array, all members will be converted.
  
  csv object
  
  Extracts fields from a CSV line in a single text field within a document. Any empty field in the CSV will be skipped.
  
  date object
  
  Parses dates from fields, and then uses the date or timestamp as the timestamp for the document.
  
  date_index_name object
  
  Points documents to the right time-based index, based on a date or timestamp field in the document, by using the date math index name support.
  
  dissect object
  
  Extracts structured fields out of a single text field by matching the text field against a delimiter-based pattern.
  
  dot_expander object
  
  Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can’t be accessed by any processor.
  
  drop object
  
  Drops the document without raising any errors. This is useful to prevent the document from getting indexed based on some condition.
  
  enrich object
  
  The enrich processor can enrich documents with data from another index.
  
  fail object
  
  Raises an exception. This is useful for when you expect a pipeline to fail and want to relay a specific message to the requester.
  
  fingerprint object
  
  Computes a hash of the document’s content. You can use this hash for content fingerprinting.
  
  foreach object
  
  Runs an ingest processor on each element of an array or object.
  
  ip_location object
  
  Currently an undocumented alias for GeoIP Processor.
  
  geo_grid object
  
  Converts geo-grid definitions of grid tiles or cells to regular bounding boxes or polygons which describe their shape. This is useful if there is a need to interact with the tile shapes as spatially indexable fields.
  
  geoip object
  
  The geoip processor adds information about the geographical location of an IPv4 or IPv6 address.
  
  grok object
  
  Extracts structured fields out of a single text field within a document. You choose which field to extract matched fields from, as well as the grok pattern you expect will match. A grok pattern is like a regular expression that supports aliased expressions that can be reused.
  
  gsub object
  
  Converts a string field by applying a regular expression and a replacement. If the field is an array of strings, all members of the array will be converted. If any non-string values are encountered, the processor will throw an exception.
  
  html_strip object
  
  Removes HTML tags from the field. If the field is an array of strings, HTML tags will be removed from all members of the array.
  
  inference object
  
  Uses a pre-trained data frame analytics model or a model deployed for natural language processing tasks to infer against the data that is being ingested in the pipeline.
  
  join object
  
  Joins each element of an array into a single string using a separator character between each element. Throws an error when the field is not an array.
  
  json object
  
  Parses a string containing JSON data into a structured object, string, or other value.
  
  kv object
  
  This processor helps automatically parse messages (or specific event fields) which are of the foo=bar variety.
  
  lowercase object
  
  Converts a string to its lowercase equivalent. If the field is an array of strings, all members of the array will be converted.
  
  network_direction object
  
  Calculates the network direction given a source IP address, destination IP address, and a list of internal networks.
  
  pipeline object
  
  Executes another pipeline.
  
  redact object
  
  The Redact processor uses the Grok rules engine to obscure text in the input document that matches the given Grok patterns. The processor can be used to obscure Personal Identifying Information (PII) by configuring it to detect known patterns such as email or IP addresses. Text that matches a Grok pattern is replaced with a configurable string, such as <EMAIL> where an email address is matched, or all matches can simply be replaced with the text <REDACTED> if preferred.
  
  registered_domain object
  
  Extracts the registered domain (also known as the effective top-level domain or eTLD), sub-domain, and top-level domain from a fully qualified domain name (FQDN). Uses the registered domains defined in the Mozilla Public Suffix List.
  
  remove object
  
  Removes existing fields. If one field doesn’t exist, an exception will be thrown.
  
  rename object
  
  Renames an existing field. If the field doesn’t exist or the new name is already used, an exception will be thrown.
  
  reroute object
  
  Routes a document to another target index or data stream. When setting the destination option, the target is explicitly specified and the dataset and namespace options can’t be set. When the destination option is not set, this processor is in a data stream mode. Note that in this mode, the reroute processor can only be used on data streams that follow the data stream naming scheme.
  
  script object
  
  Runs an inline or stored script on incoming documents. The script runs in the ingest context.
  
  set object
  
  Adds a field with the specified value. If the field already exists, its value will be replaced with the provided one.
  
  set_security_user object
  
  Sets user-related details (such as username, roles, email, full_name, metadata, api_key, realm and authentication_type) from the current authenticated user to the current document by pre-processing the ingest.
  
  sort object
  
  Sorts the elements of an array in ascending or descending order. Homogeneous arrays of numbers will be sorted numerically, while arrays of strings or heterogeneous arrays of strings and numbers will be sorted lexicographically. Throws an error when the field is not an array.
  
  split object
  
  Splits a field into an array using a separator character. Only works on string fields.
  
  terminate object
  
  Terminates the current ingest pipeline, causing no further processors to be run. This will normally be executed conditionally, using the if option.
  
  trim object
  
  Trims whitespace from a field. If the field is an array of strings, all members of the array will be trimmed. This only works on leading and trailing whitespace.
  
  uppercase object
  
  Converts a string to its uppercase equivalent. If the field is an array of strings, all members of the array will be converted.
  
  urldecode object
  
  URL-decodes a string. If the field is an array of strings, all members of the array will be decoded.
  
  uri_parts object
  
  Parses a Uniform Resource Identifier (URI) string and extracts its components as an object. This URI object includes properties for the URI’s domain, path, fragment, port, query, scheme, user info, username, and password.
  
  user_agent object
  
  The user_agent processor extracts details from the user agent string a browser sends with its web requests. This processor adds this information by default under the user_agent field.
- version number
  
  Version number used by external systems to track ingest pipelines.
- deprecated boolean
  
  Marks this ingest pipeline as deprecated. When a deprecated ingest pipeline is referenced as the default or final pipeline when creating or updating a non-deprecated index template, Elasticsearch will emit a deprecation warning.
  
  Default value is false.
- _meta object
  
  Arbitrary metadata about the ingest pipeline. This map is not automatically generated by Elasticsearch.
  
  * object Additional properties
- created_date string | number Generally available; Added in 9.2.0
  
  Date and time when the pipeline was created. Only returned if the human query parameter is true.
  
  One of: a string, or a number (UnitMillis, a time value in milliseconds).
- created_date_millis number Generally available; Added in 9.2.0
  
  Date and time when the pipeline was created, in milliseconds since the epoch.
- modified_date string | number Generally available; Added in 9.2.0
  
  Date and time when the pipeline was last modified. Only returned if the human query parameter is true.
  
  One of: a string, or a number (UnitMillis, a time value in milliseconds).
- modified_date_millis number Generally available; Added in 9.2.0
  
  Date and time when the pipeline was last modified, in milliseconds since the epoch.
- field_access_pattern string Generally available; Added in 9.2.0
  
  Controls how processors in this pipeline should read and write data on a document's source.
  
  Values are classic or flexible.
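
To make the shape of the pipeline object concrete, here is a minimal sketch of a definition written as a Python dictionary, with a small check that every entry in processors and on_failure names exactly one processor type. The set options come from the request example on this page. The lowercase processor's field option, the description text, and the _meta contents are assumptions added for illustration, and `processor_types` is a hypothetical helper.

```python
import json

# Illustrative pipeline definition; values are made up for this sketch.
pipeline = {
    "description": "Tag documents and normalize the foo field",
    "processors": [
        # Processors run sequentially in the order listed.
        {"set": {"field": "field2", "value": "_value"}},
        {"lowercase": {"field": "foo"}},  # option name assumed, not from this page
    ],
    "on_failure": [
        # Runs immediately after a processor failure.
        {"set": {"field": "error_note", "value": "pipeline failed"}},
    ],
    "version": 1,
    "_meta": {"owner": "docs-example"},
}


def processor_types(pipeline_def, key="processors"):
    """Return the processor type of each entry, in execution order."""
    types = []
    for entry in pipeline_def.get(key, []):
        if len(entry) != 1:
            raise ValueError(f"expected one processor type per entry, got {list(entry)}")
        types.append(next(iter(entry)))
    return types


print(processor_types(pipeline))                # ['set', 'lowercase']
print(processor_types(pipeline, "on_failure"))  # ['set']
print(json.dumps(pipeline)[:40])                # the dict serializes to a JSON body
```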

Responses

200 application/json
- docs array[object] Required
  
  
  doc object
  
  The simulated document, with optional metadata.
  
  
  _id string Required
  
  Unique identifier for the document. This ID must be unique within the _index.
  
  _index string Required
  
  Name of the index containing the document.
  
  _ingest object Required
  
  _routing string
  
  Value used to send the document to a specific primary shard.
  
  _source object Required
  
  JSON body for the document.
  
  
  * object Additional properties
  
  _version
  
  _version_type string
  
  Supported values include:
  
  internal: Use internal versioning that starts at 1 and increments with each update or delete.
  
  external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
  
  external_gte: Only index the document if the specified version is equal to or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
  
  Values are internal, external, or external_gte.
  
  error object
  
  Cause and details about a request failure. This class defines the properties common to all error types. Additional details that depend on the error type are also provided.
  
  
  type string Required
  
  The type of error
  
  reason string | null
  
  A human-readable explanation of the error, in English.
  
  One of: string, or null.
  
  stack_trace string
  
  The server stack trace. Present only if the error_trace=true parameter was sent with the request.
  
  caused_by object
  
  Cause and details about a request failure. This class defines the properties common to all error types. Additional details that depend on the error type are also provided.
  
  root_cause array[object]
  
  Cause and details about a request failure. This class defines the properties common to all error types. Additional details that depend on the error type are also provided.
  
  suppressed array[object]
  
  Cause and details about a request failure. This class defines the properties common to all error types. Additional details that depend on the error type are also provided.
  
  processor_results array[object]
  
  
  doc object
  
  The simulated document, with optional metadata.
  
  tag string
  
  processor_type string
  
  status string
  
  Values are success, error, error_ignored, skipped, or dropped.
  
  description string
  
  ignored_error object
  
  Cause and details about a request failure. This class defines the properties common to all error types. Additional details that depend on the error type are also provided.
  
  error object
  
  Cause and details about a request failure. This class defines the properties common to all error types. Additional details that depend on the error type are also provided.
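
Each entry in the response's docs array can carry a doc, an error, or processor_results. A minimal sketch of telling these apart might look as follows. `summarize_simulation` is a hypothetical helper, the sample data is made up, and the error type shown is only an illustrative value.

```python
def summarize_simulation(response):
    """Reduce a simulate response to one short summary per input document."""
    summary = []
    for entry in response["docs"]:
        if "processor_results" in entry:
            # Per-processor output: collect (processor_type, status) pairs.
            steps = [(r.get("processor_type"), r.get("status"))
                     for r in entry["processor_results"]]
            summary.append({"kind": "verbose", "steps": steps})
        elif "error" in entry:
            err = entry["error"]
            # type is required; reason may be missing or null.
            summary.append({"kind": "error", "type": err["type"],
                            "reason": err.get("reason")})
        else:
            summary.append({"kind": "doc", "source": entry["doc"]["_source"]})
    return summary


# Made-up response covering the three shapes described above.
sample = {
    "docs": [
        {"doc": {"_id": "id", "_index": "index",
                 "_source": {"field2": "_value", "foo": "bar"},
                 "_ingest": {"timestamp": "2017-05-04T22:30:03.187Z"}}},
        {"error": {"type": "illegal_argument_exception", "reason": None}},
        {"processor_results": [
            {"processor_type": "set", "status": "success"},
            {"processor_type": "lowercase", "status": "skipped"},
        ]},
    ]
}

for item in summarize_simulation(sample):
    print(item)
```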

POST /_ingest/pipeline/_simulate
{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "set" : {
          "field" : "field2",
          "value" : "_value"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "rab"
      }
    }
  ]
}

resp = client.ingest.simulate(
    pipeline={
        "description": "_description",
        "processors": [
            {
                "set": {
                    "field": "field2",
                    "value": "_value"
                }
            }
        ]
    },
    docs=[
        {
            "_index": "index",
            "_id": "id",
            "_source": {
                "foo": "bar"
            }
        },
        {
            "_index": "index",
            "_id": "id",
            "_source": {
                "foo": "rab"
            }
        }
    ],
)

const response = await client.ingest.simulate({
  pipeline: {
    description: "_description",
    processors: [
      {
        set: {
          field: "field2",
          value: "_value",
        },
      },
    ],
  },
  docs: [
    {
      _index: "index",
      _id: "id",
      _source: {
        foo: "bar",
      },
    },
    {
      _index: "index",
      _id: "id",
      _source: {
        foo: "rab",
      },
    },
  ],
});

response = client.ingest.simulate(
  body: {
    "pipeline": {
      "description": "_description",
      "processors": [
        {
          "set": {
            "field": "field2",
            "value": "_value"
          }
        }
      ]
    },
    "docs": [
      {
        "_index": "index",
        "_id": "id",
        "_source": {
          "foo": "bar"
        }
      },
      {
        "_index": "index",
        "_id": "id",
        "_source": {
          "foo": "rab"
        }
      }
    ]
  }
)

$resp = $client->ingest()->simulate([
    "body" => [
        "pipeline" => [
            "description" => "_description",
            "processors" => array(
                [
                    "set" => [
                        "field" => "field2",
                        "value" => "_value",
                    ],
                ],
            ),
        ],
        "docs" => array(
            [
                "_index" => "index",
                "_id" => "id",
                "_source" => [
                    "foo" => "bar",
                ],
            ],
            [
                "_index" => "index",
                "_id" => "id",
                "_source" => [
                    "foo" => "rab",
                ],
            ],
        ),
    ],
]);

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"pipeline":{"description":"_description","processors":[{"set":{"field":"field2","value":"_value"}}]},"docs":[{"_index":"index","_id":"id","_source":{"foo":"bar"}},{"_index":"index","_id":"id","_source":{"foo":"rab"}}]}' "$ELASTICSEARCH_URL/_ingest/pipeline/_simulate"

client.ingest().simulate(s -> s
    .docs(List.of(Document.of(d -> d
            .id("id")
            .index("index")
            .source(JsonData.fromJson("""
{"foo":"bar"}
"""))
        ),Document.of(d -> d
            .id("id")
            .index("index")
            .source(JsonData.fromJson("""
{"foo":"rab"}
"""))
        )))
    .pipeline(p -> p
        .description("_description")
        .onFailure(List.of())
        .processors(pr -> pr
            .set(se -> se
                .field("field2")
                .value(JsonData.fromJson("\"_value\""))
                .onFailure(List.of())
            )
        )
        .meta(Map.of())
    )
);

Request example

You can specify the pipeline to use either in the request body or as a path parameter.

{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "set" : {
          "field" : "field2",
          "value" : "_value"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "rab"
      }
    }
  ]
}

Response examples (200)

A successful response for running an ingest pipeline against a set of provided documents.

{
   "docs": [
      {
         "doc": {
            "_id": "id",
            "_index": "index",
            "_version": "-3",
            "_source": {
               "field2": "_value",
               "foo": "bar"
            },
            "_ingest": {
               "timestamp": "2017-05-04T22:30:03.187Z"
            }
         }
      },
      {
         "doc": {
            "_id": "id",
            "_index": "index",
            "_version": "-3",
            "_source": {
               "field2": "_value",
               "foo": "rab"
            },
            "_ingest": {
               "timestamp": "2017-05-04T22:30:03.188Z"
            }
         }
      }
   ]
}
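
To see what the pipeline changed, the request and response examples can be compared document by document. The sketch below assumes the response lists documents in the same order as the request; `added_fields` is a hypothetical helper, and the response is trimmed to the _source fields.

```python
def added_fields(source_before, source_after):
    """Return the fields present after simulation but absent before."""
    return {k: v for k, v in source_after.items() if k not in source_before}


# _source bodies from the request example.
request_sources = [{"foo": "bar"}, {"foo": "rab"}]

# The response example, reduced to what this comparison needs.
response = {
    "docs": [
        {"doc": {"_source": {"field2": "_value", "foo": "bar"}}},
        {"doc": {"_source": {"field2": "_value", "foo": "rab"}}},
    ]
}

changes = [
    added_fields(before, entry["doc"]["_source"])
    for before, entry in zip(request_sources, response["docs"])
]
print(changes)  # [{'field2': '_value'}, {'field2': '_value'}]
```

Both documents gain field2 with the value _value, which is what the set processor in the request was configured to add.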
