Pruning incoming log volumes with Elastic


To log or not to log? It has always been a difficult question that software engineers struggle with, often to the detriment of their site reliability engineering (SRE) colleagues. Developers don't always get the level or context of the warnings and errors they capture right, and often log messages that aren't helpful for SREs. I can admit to being one of those developers! This often leads to a flood of events being ingested into logging platforms, making application monitoring and issue investigation for SREs feel a bit like this:

![I Love Lucy Conveyor Belt Gif](./images/1.gif)

Source: GIPHY

When looking to reduce your log volume, you can drop information along two dimensions: individual fields within an event, or the entire event itself. Removing the fields and events that are not of interest ensures we can focus on known events of interest, as well as unknown events that may turn out to be of interest.

![Log event versus field](./images/event-vs-field.jpg)

In this blog, we will discuss approaches for dropping known irrelevant events and fields from logs at different points in the collection pipeline. Specifically, we will focus on Beats, Logstash, Elastic Agent, ingest pipelines, and filtering with OpenTelemetry Collectors.

Beats

Beats are a family of lightweight shippers that forward events from a particular source. They are commonly used to ingest events not just into Elasticsearch, but also into other outputs such as Logstash, Kafka, or Redis, as shown in the Filebeat documentation. There are six types of Beat available, which are summarized here.

Our example will focus on Filebeat specifically, but both drop processors discussed here apply to all Beats. After following the quick start guide in the Filebeat documentation, you will have a running process with a configuration file, filebeat.yml, dictating which log files you are monitoring from any of the supported input types. Your configuration should specify a series of inputs in a format similar to the below:

filebeat.inputs:
- type: filestream
  id: my-logging-app
  paths:
    - /var/log/*.log

Filebeat has many configurable options, a full listing of which is given in the filebeat.reference.yml in the documentation. However, it is the drop_event and drop_fields processors in particular that can help us exclude unhelpful messages and isolate only the relevant fields in a given event, respectively. When using the drop_event processor, make sure at least one condition is present; if no condition is specified, the processor will drop all events. For example, if we are not interested in HTTP requests against the /profile endpoint, we can amend the configuration to use the following condition:

filebeat.inputs:
- type: filestream
  id: my-logging-app
  paths:
    - /var/tmp/other.log
    - /var/log/*.log
processors:
  - drop_event:
      when:
        and:
          - equals:
              url.scheme: http
          - equals:
              url.path: /profile

Meanwhile, the drop_fields processor will drop the specified fields, except for the @timestamp and type fields, if the specified condition is fulfilled. Similar to the drop_event processor, if the condition is missing then the fields will always be dropped. If we wanted to exclude the error message field for successful HTTP requests, we could configure a processor similar to the below:

filebeat.inputs:
- type: filestream
  id: my-logging-app
  paths:
    - /var/tmp/other.log
    - /var/log/*.log
processors:
  - drop_fields:
      when:
        and:
          - equals:
              url.scheme: http
          - equals:
              http.response.status_code: 200
      fields: ["event.message"]
      ignore_missing: false

When dropping fields, there is always a possibility that a field does not exist on a given log message. With the default setting of ignore_missing: false, Filebeat will raise an error if a specified field is missing from an event; setting ignore_missing to true tells the processor to silently skip fields that are not present.
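
For instance, a minimal sketch that always drops event.message (no when condition) while tolerating events where the field is absent would be:

processors:
  - drop_fields:
      fields: ["event.message"]
      # skip events where the field is absent instead of raising an error
      ignore_missing: true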

Logstash filtering

Logstash is a free and open data processing pipeline tool that allows you to ingest, transform, and output data between a myriad of sources; it sits within the Extract, Transform, and Load (ETL) domain. Compared with the Beats approach above, Logstash is recommended if you want to centralize the transformation logic, whereas Beats or Elastic Agent allow events to be dropped closer to the source, reducing network traffic. Logstash provides a variety of transformation plugins out of the box that can be used to format and transform events from any source that Logstash is connected to. A typical Logstash pipeline configuration contains three main sections:

  1. input denotes the source of data to the pipeline.
  2. filter contains the relevant data transformation logic.
  3. output defines the target for the transformed data.

To prevent events from making it to the output, the drop filter plugin drops any events that meet the stated condition. A typical example of reading from an input file, dropping events for HTTP requests against the /profile endpoint, and outputting to Elasticsearch is as follows:

input {
  file {
    id => "my-logging-app"
    path => [ "/var/tmp/other.log", "/var/log/*.log" ]
  }
}
filter {
  if [url.scheme] == "http" && [url.path] == "/profile" {
    drop {
      percentage => 80
    }
  }
}
output {
  elasticsearch {
        hosts => "https://my-elasticsearch:9200"
        data_stream => "true"
    }
}

One of the lesser-known options of this filter is the ability to configure a drop rate using the percentage option. A common fear when filtering out log events is that you will inadvertently drop unknown but relevant entries that could be useful in an outage situation. There is also the possibility that your software sends such a large volume of messages that it floods your instance, takes up vital hot storage, and increases your costs. The percentage option addresses both concerns by ingesting only a subset of matching events into Elasticsearch. In our example above, the filter drops 80% of matching messages, so 20% are ingested into Elasticsearch.

Similar to the drop_fields processor found in Beats, Logstash has a remove_field option for removing individual fields. Although this option is available in many Logstash filter plugins, it is commonly used within the mutate plugin to transform events, similar to the below:

# Input configuration omitted
filter {
  if [url.scheme] == "http" && [http.response.status_code] == 200 {
    drop {
      percentage => 80
    }
    mutate {
      remove_field => [ "event.message" ]
    }
  }
}
# Output configuration omitted

Just like our Beats example, this will remove event.message from the events that are retained from the drop filter.

Elastic Agent

Elastic Agent is a single agent for logs, metrics, and security data that can execute on your host and send events from multiple services and infrastructure to Elasticsearch. Similar to Beats, you can use the drop_event and drop_fields processors in any integrations that support processors. For standalone installations, you should specify the processors within your elastic-agent.yml config. When using Fleet, the processing transforms are normally specified when configuring the integration under the Advanced options pop-out section, as shown below:

![Elastic Agent Kafka Integration Sample Processor](./images/elastic-agent-kafka-processor.png)

Comparing the above example with our Beats example, you'll notice that both use the same YAML-based processor format. There are some limitations to be aware of when using Elastic Agent processors, which are covered in the Fleet documentation. If you are unsure whether processing data via Elastic Agent processors is the right thing for your use case, check out this handy matrix.
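
For standalone (non-Fleet) installations, a minimal sketch of the equivalent drop_event processor in elastic-agent.yml might look like the following; the input type, stream settings, and dataset name are illustrative assumptions based on our earlier Filebeat example, so consult the standalone Elastic Agent reference for your integration:

inputs:
  # the input and stream settings below are illustrative assumptions
  - type: filestream
    id: my-logging-app
    streams:
      - data_stream:
          dataset: my_logging_app
        paths:
          - /var/log/*.log
    processors:
      - drop_event:
          when:
            and:
              - equals:
                  url.scheme: http
              - equals:
                  url.path: /profile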

Ingest pipelines

The Elastic Agent processors discussed in the previous section act on the raw event data, meaning they execute before ingest pipelines. As a result, proceed with caution when using both approaches together, as removing or altering fields expected by an ingest pipeline can cause the pipeline to break. As covered in the Create and manage pipelines documentation, new pipelines can be created either within the Stack Management > Ingest Pipelines screen or via the _ingest API, which we will use here. Just like the other tools covered in this piece, the drop processor allows any event that meets the required condition to be dropped, and if no condition is specified, all events coming through will be dropped. What is different is that the conditional logic is written in Painless, a Java-like scripting language, rather than the YAML syntax we have used previously:

PUT _ingest/pipeline/my-logging-app-pipeline
{
  "description": "Event and field dropping for my-logging-app",
  "processors": [
    {
      "drop": {
        "description" : "Drop event",
        "if": "ctx?.url?.scheme == 'http' && ctx?.url?.path == '/profile'",
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "description" : "Drop field",
        "field" : "event.message",
        "if": "ctx?.url?.scheme == 'http' && ctx?.http?.response?.status_code == 200",
        "ignore_failure": false
      }
    }
  ]
}

The ctx variable is a map representation of the fields within the document passing through the pipeline, meaning our example compares the values of the url.scheme and http.response.status_code fields. JavaScript developers will recognize the ?. null safe operator, which guards field access against missing or null values. As visible in the second processor in the example above, Painless conditional logic also applies to the remove processor, which drops the specified fields from the event when the condition is met. One of the capabilities that gives ingest pipelines an edge over the other approaches is the ability to specify failure processors, either on the pipeline as a whole or on an individual processor. Although Beats do have the ignore_missing option discussed previously, ingest pipelines allow us to add exception handling, such as setting an error message that gives details of the processor failure:

PUT _ingest/pipeline/my-logging-app-pipeline
{
  "description": "Event and field dropping for my-logging-app with failures",
  "processors": [
    {
      "drop": {
        "description" : "Drop event",
        "if": "ctx?.url?.scheme == 'http' && ctx?.url?.path == '/profile'",
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "description" : "Drop field",
        "field" : "event.message",
        "if": "ctx?.url?.scheme == 'http' && ctx?.http?.response?.status_code == 200",
        "ignore_failure": false
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "description": "Set 'ingest.failure.message'",
        "field": "ingest.failure.message",
        "value": "Ingestion issue"
      }
    }
  ]
}

The pipeline can then be used on a single indexing request, set as the default pipeline for an index, or even used alongside Beats and Elastic Agent.
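
As a sketch, assuming a hypothetical index named my-logging-app-index, the pipeline can be referenced on an individual indexing request or set as the default for the index:

# Apply the pipeline to a single document
POST my-logging-app-index/_doc?pipeline=my-logging-app-pipeline
{
  "url": { "scheme": "http", "path": "/profile" }
}

# Or make it the default pipeline for every document indexed into the index
PUT my-logging-app-index/_settings
{
  "index.default_pipeline": "my-logging-app-pipeline"
}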

OpenTelemetry collectors

OpenTelemetry, or OTel, is an open standard that provides APIs, tooling, and integrations to enable the capture of telemetry data such as logs, metrics, and traces from applications. Application developers commonly use the OpenTelemetry agent for their programming language of choice to send traces and metrics directly to the Elastic Stack, since Elastic supports the OpenTelemetry protocol (OTLP). In some cases, however, having every application send data directly to the observability platform may be unwise: large enterprises may have centralized observability capabilities, or may run large microservice ecosystems where adopting a standard tracing practice is difficult, and sanitizing events and traces becomes harder as the number of applications and services grows. These are the situations where using one or more collectors as a router of data to the Elastic Stack makes sense. As demonstrated in the OpenTelemetry documentation and the example collector in the Elastic APM documentation, the basic configuration for an OTel collector has four main sections:

  1. receivers that define the sources of data, which can be push or pull-based.
  2. processors that can filter or transform the received data before export, which is what we are interested in doing.
  3. exporters which define how the data is sent to the final destination, in this case Elastic!
  4. service defines which of the configured components are enabled in the collector's pipelines.

Dropping events and fields can be achieved using the filter and attributes processors, respectively, in the collector config. A selection of examples of both processors is shown below:

receivers: 
  filelog:
    include: [ /var/tmp/other.log, /var/log/*.log ]
processors: 
  filter/denylist:
    error_mode: ignore
    logs:
      log_record:
        - 'attributes["url.scheme"] == "http"'
        - 'attributes["url.path"] == "/profile"'
        - 'attributes["http.response.status_code"] == 200'
  attributes/errors:
    actions:
      - key: error.message
        action: delete
  memory_limiter:
    check_interval: 1s
    limit_mib: 2000
  batch:
exporters:
  # Exporters configuration omitted 
service:
  pipelines:
    # Pipelines configuration omitted

The filter processor applied to the relevant telemetry type (logs, metrics, or traces) will drop an event if it matches any of the specified conditions. Meanwhile, the attributes processor applied to our error fields will delete the error.message attribute from all events. The pattern option can also be used in place of the key option to remove attributes matching a specified regular expression. Conditional field deletion, as performed in our Beats, Logstash, and ingest pipeline examples, is not supported by the attributes processor. However, an alternative is to use the transform processor, which can conditionally set or delete attribute values.
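
As an illustration only, a sketch of a transform processor that deletes error.message when a log record's http.response.status_code attribute equals 200 might look like the following (it assumes both values are stored as log record attributes):

processors:
  transform/remove-error-message:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # delete_key removes the attribute only when the where clause matches
          - delete_key(attributes, "error.message") where attributes["http.response.status_code"] == 200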

Conclusions

The aim of the DevOps movement is to align the processes and practices of software engineering and SRE. That includes working together to ensure that relevant logs, metrics, and traces are sent from applications to our Observability platform. As we have seen first-hand with Beats, Logstash, Elastic Agent, Ingest Pipelines, and OTel collectors, the approaches for dropping events and individual fields vary according to the tool used. You may be wondering, which option is right for you?

  1. If the overhead of sending large messages over the network is a concern, transforming closer to the source using Beats or Logstash is the better option. If you're looking to minimize the system resources used in your collection and transformation, Beats may be preferred over Logstash as they have a small footprint.
  2. For centralizing transformation logic to apply to many application logs, using processors in an OTel collector may be the right approach.
  3. If you want to make use of centrally managed ingestion and transformation policies with popular services and systems such as Kafka, Nginx, or AWS, using Elastic Agent with Fleet is recommended.
  4. Ingest pipelines are great for transforming events at ingestion if you are less concerned about network overhead and would like your logic centralized within Elasticsearch.

Although not covered here, other techniques such as runtime fields, index-level compression, and synthetic _source can also be used to reduce disk storage requirements and CPU overhead. If your favorite way to drop events or fields is not listed here, do let us know!

Resources

  1. Elastic Beats
  2. Filebeat | Filter and enhance data with processors
  3. Logstash
  4. Logstash | Filter plugins
  5. Elastic Agent
  6. Elastic Agent | Processors
  7. Ingest Pipelines
  8. Elasticsearch | Ingest processor reference
  9. OTel Collectors
  10. OTel Transforming telemetry