Simplifying log data management: Harness the power of flexible routing with Elastic


In Elasticsearch 8.8, we’re introducing the reroute processor in technical preview. It makes it possible to send documents, such as logs, to different data streams according to flexible routing rules. When using Elastic Observability, this gives you more granular control over your data with regard to retention, permissions, and processing, with all the benefits of the data stream naming scheme. While optimized for data streams, the reroute processor also works with classic indices. This blog post contains examples of how to use the reroute processor, which you can try yourself by running the snippets in the Kibana Dev Tools console.

Elastic Observability offers a wide range of integrations that help you monitor your applications and infrastructure. These integrations are added as policies to Elastic Agents, which ingest telemetry into Elastic Observability. Several of these integrations ingest log streams that multiplex logs from different applications, such as Amazon Kinesis Data Firehose, Kubernetes container logs, and syslog. One challenge is that these multiplexed log streams send data to the same Elasticsearch data stream, such as logs-syslog-default. This makes it difficult to create parsing rules in ingest pipelines and dashboards for specific technologies, such as the ones from the Nginx and Apache integrations. That’s because in Elasticsearch, in combination with the data stream naming scheme, the processing and the schema are both encapsulated in a data stream.

The reroute processor helps you tease apart data from a generic data stream and send it to a more specific one. For example, you can use it to send logs to a data stream that is set up by the Nginx integration, so that the logs are parsed by that integration and you can use its prebuilt dashboards, or build custom ones on top of fields such as the URL, status code, and response time that the Nginx pipeline parses out of the Nginx log message. You can also use the reroute processor to split regular Nginx logs from Nginx error logs, giving you an additional level of separation and categorization.

[Diagram: routing pipeline]
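As a minimal sketch of what such a routing pipeline could look like, the following example routes logs from an nginx container to the Nginx integration's data streams. The pipeline name, the container.name condition, and the simple message check are assumptions for illustration only; it also assumes the Nginx integration is installed and uses the nginx.access and nginx.error datasets:

PUT _ingest/pipeline/logs-routing-example
{
  "description": "Example: route nginx container logs to the Nginx integration",
  "processors": [
    {
      "reroute": {
        "tag": "nginx-error",
        "if": "ctx.container?.name == 'nginx' && ctx.message != null && ctx.message.contains('[error]')",
        "dataset": "nginx.error"
      }
    },
    {
      "reroute": {
        "tag": "nginx-access",
        "if": "ctx.container?.name == 'nginx'",
        "dataset": "nginx.access"
      }
    }
  ]
}

Once a reroute processor matches, the remaining processors of the pipeline are skipped, so a document that matches the error condition is not also sent to nginx.access.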

Example use case

To use the reroute processor, first:

  1. Ensure you are on Elasticsearch 8.8

  2. Ensure you have permissions to manage indices and data streams

  3. If you don’t already have an account on Elastic Cloud, sign up for one

Next, you’ll need to set up a data stream and create a custom Elasticsearch ingest pipeline that is set as the data stream’s default pipeline. Below, we go through this step by step for a “mydata” data set into which we’ll simulate ingesting container logs. We start with a basic example and extend it from there.

The following steps are executed in the Kibana Dev Tools console, which you can find under Management -> Dev Tools -> Console. First, we need an ingest pipeline and a template for the data stream:

PUT _ingest/pipeline/logs-mydata
{
  "description": "Routing for mydata",
  "processors": [
    {
      "reroute": {
      }
    }
  ]
}

This creates an ingest pipeline with an empty reroute processor. To make use of it, we need an index template:

PUT _index_template/logs-mydata
{
  "index_patterns": [
    "logs-mydata-*"
  ],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "index.default_pipeline": "logs-mydata"
    },
    "mappings": {
      "properties": {
        "container.name": {
          "type": "keyword"
        }
      }
    }
  }
}

The above template is applied to all data that is shipped to logs-mydata-*. We have mapped container.name as a keyword, as this is the field we will be using for routing later on. Now, we send a document to the data stream and it will be ingested into logs-mydata-default:

POST logs-mydata-default/_doc
{
  "@timestamp": "2023-05-25T12:26:23+00:00",
  "container": {
    "name": "foo"
  }
}

We can check that it was ingested with the command below, which will show 1 result.

GET logs-mydata-default/_search

Without any further configuration, the reroute processor already allows us to route documents. As soon as a reroute processor is specified, it looks for the data_stream.dataset and data_stream.namespace fields by default and sends documents to the corresponding data stream, according to the data stream naming scheme logs-<dataset>-<namespace>. Let’s try this out:

POST logs-mydata-default/_doc
{
  "@timestamp": "2023-03-30T12:27:23+00:00",
  "container": {
"name": "foo"
  },
  "data_stream": {
    "dataset": "myotherdata"
  }
}

A GET logs-myotherdata-default/_search shows that this document ended up in the logs-myotherdata-default data stream, rather than in logs-mydata-default where we sent it. But instead of using the default rules, we want to create our own rule for the field container.name: if container.name is foo, the document should be sent to logs-foo-default. For this, we modify our routing pipeline:

PUT _ingest/pipeline/logs-mydata
{
  "description": "Routing for mydata",
  "processors": [
    {
      "reroute": {
        "tag": "foo",
        "if" : "ctx.container?.name == 'foo'",
        "dataset": "foo"
      }
    }
  ]
}

Let's test this with a document:

POST logs-mydata-default/_doc
{
  "@timestamp": "2023-05-25T12:26:23+00:00",
  "container": { 
    "name": "foo"
  }
}
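If the condition matched, the document is no longer ingested into logs-mydata-default but into logs-foo-default, which we can verify with:

GET logs-foo-default/_search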

Instead of specifying a routing rule for each container name, you can also route based on the value of a field in the document:

PUT _ingest/pipeline/logs-mydata
{
  "description": "Routing for mydata",
  "processors": [
    {
      "reroute": {
        "tag": "mydata",
        "dataset": [
          "{{container.name}}",
          "mydata"
        ]
      }
    }
  ]
}

In this example, we use a field reference as the routing rule. If the container.name field exists in the document, the document is routed to a data stream named after the field’s value; otherwise, the dataset falls back to mydata. This can be tested with:

POST logs-mydata-default/_doc
{
  "@timestamp": "2023-05-25T12:26:23+00:00",
  "container": { 
    "name": "foo1"
  }
}

POST logs-mydata-default/_doc
{
  "@timestamp": "2023-05-25T12:26:23+00:00",
  "container": { 
    "name": "foo2"
  }
}

This creates the data streams logs-foo1-default and logs-foo2-default.
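We can confirm this by searching the new data streams; each should contain one of the documents above:

GET logs-foo1-default/_search

GET logs-foo2-default/_search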

NOTE: There is currently a limitation in the processor that requires the fields specified in a {{field.reference}} to be in a nested object notation. A dotted field name does not currently work. Also, you’ll get errors when the document contains dotted field names for any data_stream.* field. This limitation will be fixed in 8.8.2 and 8.9.0.
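To illustrate the limitation with the field-reference pipeline above, the two documents below (with arbitrary container names foo3 and foo4) differ only in how the field is written: the first uses nested object notation and gets routed to logs-foo3-default, whereas the second uses a dotted field name, which the field reference cannot currently resolve, so it falls back to the mydata dataset:

POST logs-mydata-default/_doc
{
  "@timestamp": "2023-05-25T12:26:23+00:00",
  "container": {
    "name": "foo3"
  }
}

POST logs-mydata-default/_doc
{
  "@timestamp": "2023-05-25T12:26:23+00:00",
  "container.name": "foo4"
}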

API keys

When using the reroute processor, it is important that the API keys specified have permissions for the source and target indices. For example, if a pattern is used for routing from logs-mydata-default, the API key must have write permissions for logs-*-*, as data could end up in any of these indices (see the example below).

We’re currently working on extending the API key permissions for our integrations so that they allow for routing by default if you’re running a Fleet-managed Elastic Agent.

If you’re using a standalone Elastic Agent, or any other shipper, you can use this as a template to create your API key:

POST /_security/api_key
{
  "name": "ingest_logs",
  "role_descriptors": {
    "ingest_logs": {
      "cluster": [
        "monitor"
      ],
      "indices": [
        {
          "names": [
            "logs-*-*"
          ],
          "privileges": [
            "auto_configure",
            "create_doc"
          ]
        }
      ]
    }
  }
}

Future plans

In Elasticsearch 8.8, the reroute processor was released in technical preview. The plan is to adopt it in integrations that act as data sinks, such as syslog, Kubernetes, and others. Elastic will provide default routing rules that work out of the box, and users will also be able to add their own rules. If you are using our integrations, follow this guide on how to add a custom ingest pipeline.
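As a sketch of what user-defined rules could look like, and assuming the integration exposes a custom pipeline hook named logs-mydataset@custom (the actual name depends on the integration and its dataset; see the guide mentioned above), you could add your own reroute processors like this; the condition and target dataset are purely illustrative:

PUT _ingest/pipeline/logs-mydataset@custom
{
  "processors": [
    {
      "reroute": {
        "tag": "custom-routing",
        "if": "ctx.container?.name == 'nginx'",
        "dataset": "nginx.access"
      }
    }
  ]
}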

Try it out!

This blog post has shown some sample use cases for document-based routing. Try it out by adapting the index template and ingest pipeline commands to your own data, and get started with Elastic Cloud through a 7-day free trial. Let us know via this feedback form how you’re planning to use the reroute processor and whether you have suggestions for improvement.