Simulate a pipeline Generally available; Added in 5.0.0

POST /_ingest/pipeline/{id}/_simulate

All methods and paths for this operation:

GET /_ingest/pipeline/_simulate
POST /_ingest/pipeline/_simulate
GET /_ingest/pipeline/{id}/_simulate
POST /_ingest/pipeline/{id}/_simulate

Run an ingest pipeline against a set of provided documents without indexing them. You can either specify an existing pipeline to use with the provided documents or supply a pipeline definition in the body of the request.

Required authorization

  • Cluster privileges: read_pipeline

Path parameters

  • id string

    The ID of the pipeline to test, used by the paths that include {id}. If you don't specify a pipeline in the request body, this parameter is required.

Query parameters

  • verbose boolean

    If true, the response includes output data for each processor in the executed pipeline.
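As a minimal sketch of how the id path parameter, the verbose query parameter, and a body-level pipeline fit together, the hypothetical helper below (build_simulate_request is not part of any Elasticsearch client) chooses the path and assembles the body. It assumes you send the result with an HTTP client of your choice, and it leaves the body-level pipeline out when an id is given because the API would only use the path parameter anyway.

```python
from typing import Any, Optional
from urllib.parse import quote, urlencode


def build_simulate_request(
    docs: list[dict[str, Any]],
    pipeline_id: Optional[str] = None,
    pipeline: Optional[dict[str, Any]] = None,
    verbose: bool = False,
) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: return (path_with_query, body) for the simulate API."""
    if not docs:
        raise ValueError("docs is required and must contain at least one document")
    for i, doc in enumerate(docs):
        if "_source" not in doc:
            raise ValueError(f"docs[{i}] is missing the required _source field")
    if pipeline_id is None and pipeline is None:
        raise ValueError("specify either pipeline_id (path) or pipeline (body)")

    if pipeline_id is not None:
        # Path form: the API uses only the path parameter, so a body-level
        # pipeline would be ignored; omit it to avoid confusion.
        path = f"/_ingest/pipeline/{quote(pipeline_id, safe='')}/_simulate"
        body: dict[str, Any] = {"docs": docs}
    else:
        # Body form: the pipeline definition travels in the request body.
        path = "/_ingest/pipeline/_simulate"
        body = {"pipeline": pipeline, "docs": docs}

    if verbose:
        # verbose=true adds per-processor output to the response.
        path += "?" + urlencode({"verbose": "true"})
    return path, body


path, body = build_simulate_request(
    [{"_source": {"foo": "bar"}}], pipeline_id="my-pipeline", verbose=True
)
print(path)  # /_ingest/pipeline/my-pipeline/_simulate?verbose=true
```

The pipeline ID "my-pipeline" is an illustrative placeholder; quoting it keeps IDs with reserved characters from breaking the path.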

application/json

Body Required

  • docs array[object] Required

    Sample documents to test in the pipeline.

    • _id string

      Unique identifier for the document. This ID must be unique within the _index.

    • _index string

      Name of the index containing the document.

    • _source object Required

      JSON body for the document.

  • pipeline object Additional properties

    The pipeline to test. If you don't specify the id path parameter, this parameter is required. If you specify both, the API uses only the path parameter.

    • description string

      Description of the ingest pipeline.

    • on_failure array[object]

      Processors to run immediately after a processor failure.

      Each item accepts the same processor objects as processors, described below.

    • processors array[object]

      Processors used to perform transformations on documents before indexing. Processors run sequentially in the order specified.

      • append object

        Appends one or more values to an existing array if the field already exists and it is an array. Converts a scalar to an array and appends one or more values to it if the field exists and it is a scalar. Creates an array containing the provided values if the field doesn’t exist. Accepts a single value or an array of values.

      • attachment object

        The attachment processor lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by using the Apache text extraction library Tika.

      • bytes object

        Converts a human-readable byte value (for example 1kb) to its value in bytes (for example 1024). If the field is an array of strings, all members of the array are converted. Supported human-readable units are "b", "kb", "mb", "gb", "tb", and "pb" (case-insensitive). An error occurs if the field is not in a supported format or if the resulting value exceeds 2^63.

      • cef object

        Converts a CEF message into a structured format.

      • circle object

        Converts circle definitions of shapes to regular polygons which approximate them.

      • community_id object

        Computes the Community ID for network flow data as defined in the Community ID Specification. You can use a community ID to correlate network events related to a single flow.

      • convert object

        Converts a field in the currently ingested document to a different type, such as converting a string to an integer. If the field value is an array, all members will be converted.

      • csv object

        Extracts fields from a CSV line in a single text field within a document. Any empty field in the CSV line is skipped.

      • date object

        Parses dates from fields, and then uses the date or timestamp as the timestamp for the document.

      • date_index_name object

        Points documents to the right time-based index, based on a date or timestamp field in the document, by using date math index name support.

      • dissect object

        Extracts structured fields out of a single text field by matching the text field against a delimiter-based pattern.

      • dot_expander object

        Expands a field with dots into an object field. This processor allows fields with dots in the name to be accessible by other processors in the pipeline. Otherwise these fields can’t be accessed by any processor.

      • drop object

        Drops the document without raising any errors. This is useful to prevent the document from getting indexed based on some condition.

      • enrich object

        The enrich processor can enrich documents with data from another index.

      • fail object

        Raises an exception. This is useful when you expect a pipeline to fail and want to relay a specific message to the requester.

      • fingerprint object

        Computes a hash of the document’s content. You can use this hash for content fingerprinting.

      • foreach object

        Runs an ingest processor on each element of an array or object.

      • ip_location object

        Currently an undocumented alias for the GeoIP processor.

      • geo_grid object

        Converts geo-grid definitions of grid tiles or cells to regular bounding boxes or polygons which describe their shape. This is useful if there is a need to interact with the tile shapes as spatially indexable fields.

      • geoip object

        The geoip processor adds information about the geographical location of an IPv4 or IPv6 address.

      • grok object

        Extracts structured fields out of a single text field within a document. You choose which field to extract matched fields from, as well as the grok pattern you expect will match. A grok pattern is like a regular expression that supports aliased expressions that can be reused.

      • gsub object

        Converts a string field by applying a regular expression and a replacement. If the field is an array of strings, all members of the array are converted. If any non-string values are encountered, the processor throws an exception.

      • html_strip object

        Removes HTML tags from the field. If the field is an array of strings, HTML tags will be removed from all members of the array.

      • inference object

        Uses a pre-trained data frame analytics model or a model deployed for natural language processing tasks to infer against the data that is being ingested in the pipeline.

      • join object

        Joins each element of an array into a single string using a separator character between each element. Throws an error when the field is not an array.

      • json object

        Parses a string containing JSON data into a structured object, string, or other value.

      • kv object

        Automatically parses messages (or specific event fields) of the foo=bar variety.

      • lowercase object

        Converts a string to its lowercase equivalent. If the field is an array of strings, all members of the array will be converted.

      • network_direction object

        Calculates the network direction given a source IP address, destination IP address, and a list of internal networks.

      • pipeline object

        Executes another pipeline.

      • redact object

        The redact processor uses the Grok rules engine to obscure text in the input document that matches the given Grok patterns. It can be used to obscure personally identifiable information (PII) by configuring it to detect known patterns such as email or IP addresses. Text that matches a Grok pattern is replaced with a configurable string, such as <EMAIL> where an email address is matched; alternatively, all matches can simply be replaced with the text <REDACTED>.

      • registered_domain object

        Extracts the registered domain (the effective top-level domain plus one label, or eTLD+1), sub-domain, and top-level domain from a fully qualified domain name (FQDN). Uses the registered domains defined in the Mozilla Public Suffix List.

      • remove object

        Removes existing fields. If any of the fields doesn’t exist, an exception is thrown.

      • rename object

        Renames an existing field. If the field doesn’t exist or the new name is already used, an exception will be thrown.

      • reroute object

        Routes a document to another target index or data stream. When the destination option is set, the target is explicitly specified and the dataset and namespace options can’t be set. When the destination option is not set, the processor operates in data stream mode. In this mode, the reroute processor can only be used on data streams that follow the data stream naming scheme.

      • script object

        Runs an inline or stored script on incoming documents. The script runs in the ingest context.

      • set object

        Adds a field with the specified value. If the field already exists, its value will be replaced with the provided one.

      • set_security_user object

        Sets user-related details (such as username, roles, email, full_name, metadata, api_key, realm, and authentication_type) from the currently authenticated user on the document being ingested.

      • sort object

        Sorts the elements of an array in ascending or descending order. Homogeneous arrays of numbers are sorted numerically, while arrays of strings, or heterogeneous arrays of strings and numbers, are sorted lexicographically. Throws an error when the field is not an array.

      • split object

        Splits a field into an array using a separator character. Only works on string fields.

      • terminate object

        Terminates the current ingest pipeline, so no further processors run. It is normally run conditionally, using the if option.

      • trim object

        Trims whitespace from a field. If the field is an array of strings, all members of the array will be trimmed. This only works on leading and trailing whitespace.

      • uppercase object

        Converts a string to its uppercase equivalent. If the field is an array of strings, all members of the array will be converted.

      • urldecode object

        URL-decodes a string. If the field is an array of strings, all members of the array will be decoded.

      • uri_parts object

        Parses a Uniform Resource Identifier (URI) string and extracts its components as an object. This URI object includes properties for the URI’s domain, path, fragment, port, query, scheme, user info, username, and password.

      • user_agent object

        The user_agent processor extracts details from the user agent string a browser sends with its web requests. This processor adds this information by default under the user_agent field.

    • version number

      Version number used by external systems to track ingest pipelines.

    • deprecated boolean

      Marks this ingest pipeline as deprecated. When a deprecated ingest pipeline is referenced as the default or final pipeline when creating or updating a non-deprecated index template, Elasticsearch will emit a deprecation warning.

      Default value is false.

    • _meta object

      Arbitrary metadata about the ingest pipeline. This map is not automatically generated by Elasticsearch.

      • * object Additional properties

    • created_date string | number Generally available; Added in 9.2.0

      Date and time when the pipeline was created. Only returned if the human query parameter is true.

      Either a date-time string or a time value in milliseconds since the epoch.

    • created_date_millis number Generally available; Added in 9.2.0

      Date and time when the pipeline was created, in milliseconds since the epoch.

    • modified_date string | number Generally available; Added in 9.2.0

      Date and time when the pipeline was last modified. Only returned if the human query parameter is true.

      Either a date-time string or a time value in milliseconds since the epoch.

    • modified_date_millis number Generally available; Added in 9.2.0

      Date and time when the pipeline was last modified, in milliseconds since the epoch.

    • field_access_pattern string Generally available; Added in 9.2.0

      Controls how processors in this pipeline should read and write data on a document's source.

      Values are classic or flexible.
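To make the semantics of two entries in the processors list concrete, the sketch below emulates append and bytes in plain Python, following their descriptions above. It is an illustration under stated assumptions, not Elasticsearch's implementation: the function names are hypothetical, append_values handles only top-level fields (no dotted paths), bytes_to_int accepts whole numbers only, and the real processors accept additional options that the sketch ignores.

```python
from typing import Any

# Assumed unit table for this sketch: binary multiples, matching the
# "1kb -> 1024" example in the bytes processor description.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def append_values(doc: dict[str, Any], field: str, values: Any) -> dict[str, Any]:
    """Emulate append on a top-level field of a document's _source."""
    new = values if isinstance(values, list) else [values]
    if field not in doc:
        doc[field] = list(new)           # missing field: create an array
    elif isinstance(doc[field], list):
        doc[field].extend(new)           # existing array: append to it
    else:
        doc[field] = [doc[field], *new]  # scalar: convert to an array, then append
    return doc


def bytes_to_int(value: str) -> int:
    """Emulate bytes for one string such as "1kb" (whole numbers only)."""
    text = value.strip().lower()  # units are case-insensitive
    # Try two-letter units before "b" so "1kb" is not read as "1k" + "b".
    for unit in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            number = text[: -len(unit)].strip()
            break
    else:
        raise ValueError(f"unsupported byte value: {value!r}")
    if not number.isdigit():
        raise ValueError(f"unsupported byte value: {value!r}")
    result = int(number) * _UNITS[unit]
    if result >= 2**63:  # the result must fit in a signed 64-bit integer
        raise ValueError(f"byte value too large: {value!r}")
    return result


def convert_bytes(value: Any) -> Any:
    """Convert a single string, or every member of an array of strings."""
    if isinstance(value, list):
        return [bytes_to_int(v) for v in value]
    return bytes_to_int(value)


doc = {"tags": "a"}
append_values(doc, "tags", ["b", "c"])
print(doc)                              # {'tags': ['a', 'b', 'c']}
print(convert_bytes(["1KB", "2mb"]))   # [1024, 2097152]
```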

Responses

  • 200 application/json
    • docs array[object] Required
      • doc object

        The simulated document, with optional metadata.

        • _id string Required

          Unique identifier for the document. This ID must be unique within the _index.

        • _index string Required

          Name of the index containing the document.

        • _ingest object Required

          Ingest metadata for the document, such as the ingest timestamp.

        • _routing string

          Value used to send the document to a specific primary shard.

        • _source object Required

          JSON body for the document.

          • * object Additional properties
        • _version
        • _version_type string

          Supported values include:

          • internal: Use internal versioning that starts at 1 and increments with each update or delete.
          • external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
          • external_gte: Only index the document if the specified version is equal to or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.

          Values are internal, external, or external_gte.

      • error object

        Cause and details about a request failure. This class defines the properties common to all error types; additional details that depend on the error type may also be present.

        • type string Required

          The type of error.

        • reason string | null

          A human-readable explanation of the error, in English.

        • stack_trace string

          The server stack trace. Present only if the error_trace=true parameter was sent with the request.

        • caused_by object

          The underlying cause of this failure, with the same properties as this error object.

        • root_cause array[object]

          The root causes of the failure. Each element has the same properties as this error object.

        • suppressed array[object]

          Additional failures that were suppressed. Each element has the same properties as this error object.

      • processor_results array[object]

        Per-processor results for the document. Returned when the verbose query parameter is true.

        • doc object

          The simulated document, with optional metadata.

        • tag string
        • processor_type string
        • status string

          Values are success, error, error_ignored, skipped, or dropped.

        • description string
        • ignored_error object

          Cause and details about a failure in this processor that was ignored. Additional details that depend on the error type may also be present.

        • error object

          Cause and details about a failure in this processor. Additional details that depend on the error type may also be present.
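When consuming the response programmatically, each entry in docs may carry a doc, an error, or (with verbose) processor_results. The hypothetical helper below (summarize_simulate_response is not part of any client library) branches on which key is present. It assumes the response body has already been decoded from JSON into a Python dict.

```python
from typing import Any


def summarize_simulate_response(resp: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: reduce each entry of resp["docs"] to a short summary."""
    summaries: list[dict[str, Any]] = []
    for entry in resp["docs"]:
        entry = entry or {}  # defensively treat a missing entry as empty
        if "processor_results" in entry:
            # Verbose output: one (processor_type, status) pair per processor.
            steps = [
                (result.get("processor_type"), result.get("status"))
                for result in entry["processor_results"]
            ]
            summaries.append({"kind": "verbose", "steps": steps})
        elif "error" in entry:
            error = entry["error"]
            summaries.append(
                {"kind": "error", "type": error["type"], "reason": error.get("reason")}
            )
        else:
            # Non-verbose success: keep only the transformed _source.
            doc = entry.get("doc") or {}
            summaries.append({"kind": "doc", "source": doc.get("_source")})
    return summaries
```

Status values to expect in the verbose branch are the ones listed above: success, error, error_ignored, skipped, or dropped.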

The client examples below assume an already configured client instance named client ($client in PHP).

Console:
POST /_ingest/pipeline/_simulate
{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "set" : {
          "field" : "field2",
          "value" : "_value"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "rab"
      }
    }
  ]
}

Python:
resp = client.ingest.simulate(
    pipeline={
        "description": "_description",
        "processors": [
            {
                "set": {
                    "field": "field2",
                    "value": "_value"
                }
            }
        ]
    },
    docs=[
        {
            "_index": "index",
            "_id": "id",
            "_source": {
                "foo": "bar"
            }
        },
        {
            "_index": "index",
            "_id": "id",
            "_source": {
                "foo": "rab"
            }
        }
    ],
)

JavaScript:
const response = await client.ingest.simulate({
  pipeline: {
    description: "_description",
    processors: [
      {
        set: {
          field: "field2",
          value: "_value",
        },
      },
    ],
  },
  docs: [
    {
      _index: "index",
      _id: "id",
      _source: {
        foo: "bar",
      },
    },
    {
      _index: "index",
      _id: "id",
      _source: {
        foo: "rab",
      },
    },
  ],
});

Ruby:
response = client.ingest.simulate(
  body: {
    "pipeline": {
      "description": "_description",
      "processors": [
        {
          "set": {
            "field": "field2",
            "value": "_value"
          }
        }
      ]
    },
    "docs": [
      {
        "_index": "index",
        "_id": "id",
        "_source": {
          "foo": "bar"
        }
      },
      {
        "_index": "index",
        "_id": "id",
        "_source": {
          "foo": "rab"
        }
      }
    ]
  }
)

PHP:
$resp = $client->ingest()->simulate([
    "body" => [
        "pipeline" => [
            "description" => "_description",
            "processors" => array(
                [
                    "set" => [
                        "field" => "field2",
                        "value" => "_value",
                    ],
                ],
            ),
        ],
        "docs" => array(
            [
                "_index" => "index",
                "_id" => "id",
                "_source" => [
                    "foo" => "bar",
                ],
            ],
            [
                "_index" => "index",
                "_id" => "id",
                "_source" => [
                    "foo" => "rab",
                ],
            ],
        ),
    ],
]);

curl:
curl -X POST \
  -H "Authorization: ApiKey $ELASTIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"pipeline":{"description":"_description","processors":[{"set":{"field":"field2","value":"_value"}}]},"docs":[{"_index":"index","_id":"id","_source":{"foo":"bar"}},{"_index":"index","_id":"id","_source":{"foo":"rab"}}]}' \
  "$ELASTICSEARCH_URL/_ingest/pipeline/_simulate"

Java:
client.ingest().simulate(s -> s
    .docs(List.of(Document.of(d -> d
            .id("id")
            .index("index")
            .source(JsonData.fromJson("""
{"foo":"bar"}
"""))
        ),Document.of(d -> d
            .id("id")
            .index("index")
            .source(JsonData.fromJson("""
{"foo":"rab"}
"""))
        )))
    .pipeline(p -> p
        .description("_description")
        .onFailure(List.of())
        .processors(pr -> pr
            .set(se -> se
                .field("field2")
                .value(JsonData.fromJson("\"_value\""))
                .onFailure(List.of())
            )
        )
        .meta(Map.of())
    )
);
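All of the examples above send the same request body. As a standard-library-only cross-check, building that body as a Python dict and serializing it with compact separators reproduces the exact -d payload used in the curl example, which is a convenient way to generate a curl payload from a body you already have in code.

```python
import json

# Request body shared by the Console and client examples above.
body = {
    "pipeline": {
        "description": "_description",
        "processors": [{"set": {"field": "field2", "value": "_value"}}],
    },
    "docs": [
        {"_index": "index", "_id": "id", "_source": {"foo": "bar"}},
        {"_index": "index", "_id": "id", "_source": {"foo": "rab"}},
    ],
}

# Compact separators drop the spaces json.dumps inserts by default;
# dicts keep insertion order, so keys appear as written above.
payload = json.dumps(body, separators=(",", ":"))
print(payload)
```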
Request example
You can specify the pipeline to use either in the request body or as a path parameter.
{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "set" : {
          "field" : "field2",
          "value" : "_value"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "rab"
      }
    }
  ]
}
Response examples (200)
A successful response for running an ingest pipeline against a set of provided documents.
{
   "docs": [
      {
         "doc": {
            "_id": "id",
            "_index": "index",
            "_version": "-3",
            "_source": {
               "field2": "_value",
               "foo": "bar"
            },
            "_ingest": {
               "timestamp": "2017-05-04T22:30:03.187Z"
            }
         }
      },
      {
         "doc": {
            "_id": "id",
            "_index": "index",
            "_version": "-3",
            "_source": {
               "field2": "_value",
               "foo": "rab"
            },
            "_ingest": {
               "timestamp": "2017-05-04T22:30:03.188Z"
            }
         }
      }
   ]
}
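In the response, each document's _source contains field2 set to "_value" by the set processor, alongside the original foo field. A minimal local emulation can help sanity-check such expectations before calling the API. The helper run_set_pipeline is hypothetical: it supports only set processors on top-level fields with literal values, and ignores templating and the other set options.

```python
import copy
from typing import Any


def run_set_pipeline(
    pipeline: dict[str, Any], docs: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Apply only `set` processors, in order, to a copy of each document's _source."""
    results = []
    for doc in docs:
        source = copy.deepcopy(doc["_source"])  # leave the input documents untouched
        for processor in pipeline.get("processors", []):
            if "set" not in processor:
                raise NotImplementedError(f"only set is emulated, got {list(processor)}")
            params = processor["set"]
            # set adds the field, replacing any existing value.
            source[params["field"]] = params["value"]
        results.append(source)
    return results


# Pipeline and documents from the request example above.
pipeline = {
    "description": "_description",
    "processors": [{"set": {"field": "field2", "value": "_value"}}],
}
docs = [
    {"_index": "index", "_id": "id", "_source": {"foo": "bar"}},
    {"_index": "index", "_id": "id", "_source": {"foo": "rab"}},
]
results = run_set_pipeline(pipeline, docs)
print(results)  # [{'foo': 'bar', 'field2': '_value'}, {'foo': 'rab', 'field2': '_value'}]
```

The emulated sources match the _source objects in the response example; metadata such as _version and _ingest.timestamp is produced by Elasticsearch and is not modeled here.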