Dissect strings

edit

The dissect processor tokenizes incoming strings using defined patterns.

processors:
  - dissect:
      tokenizer: "%{key1} %{key2}"
      field: "message"
      target_prefix: "dissect"

The dissect processor has the following configuration settings:

tokenizer
The field used to define the dissection pattern.
field
(Optional) The event field to tokenize. Default is message.
target_prefix
(Optional) The name of the field where the values will be extracted. When an empty string is defined, the processor will create the keys at the root of the event. Default is dissect. When the target key already exists in the event, the processor won’t replace it and log an error; you need to either drop or rename the key before using dissect.

For tokenization to be successful, all keys must be found and extracted, if one of them cannot be found an error will be logged and no modification is done on the original event.

A key can contain any characters except reserved suffix or prefix modifiers: /,&, +, # and ?.

See Conditions for a list of supported conditions.

Dissect example

edit

For this example, imagine that an application generates the following messages:

"App01 - WebServer is starting"
"App01 - WebServer is up and running"
"App01 - WebServer is scaling 2 pods"
"App02 - Database is will be restarted in 5 minutes"
"App02 - Database is up and running"
"App02 - Database is refreshing tables"

Use the dissect processor to split each message into two fields, for example, service.name and service.status:

processors:
  - dissect:
      tokenizer: '"%{service.name} - %{service.status}"'
      field: "message"
      target_prefix: ""

This configuration produces fields like:

"service": {
  "name": "App01",
  "status": "WebServer is up and running"
},

service.name is an ECS keyword field, which means that you can use it in Elasticsearch for filtering, sorting, and aggregations.

When possible, use ECS-compatible field names. For more information, see the Elastic Common Schema documentation.