Brewing in Beats: New community Beats, custom fields, performance improvements

Welcome to Weekly Beats! With this series, we're keeping you up to date with all that's new in Beats, from the details of work-in-progress pull requests to releases and learning resources.

There's a lot of positive energy in the team after Elastic{ON}. Thank you to everyone who attended and gave us feedback.

Custom fields for all Beats

A common request was to add Filebeat-like custom fields to all the other Beats, since the tags we had in the common section of the configuration files just weren’t enough. Now all Beats support fields and tags in the common section, and it is also easy to add them per Beat-specific element (for example, per prospector in Filebeat or per protocol in Packetbeat).
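To make the two levels concrete, here is a minimal Go sketch of how common fields and tags might be combined with the ones set on a single element, such as a prospector, before an event is published. The `FieldsConfig` type, the `applyFields` function, and the merge rules (element fields override common fields with the same key; tags are combined without duplicates) are hypothetical assumptions for illustration, not libbeat's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// FieldsConfig holds custom fields and tags at one configuration level:
// either the common section or a Beat-specific element such as a
// Filebeat prospector (hypothetical type for illustration).
type FieldsConfig struct {
	Fields map[string]string
	Tags   []string
}

// applyFields returns the fields and tags to attach to an event.
// Assumed merge rules: element fields override common fields with the
// same key, and tags from both levels are combined without duplicates.
func applyFields(common, elem FieldsConfig) (map[string]string, []string) {
	fields := make(map[string]string, len(common.Fields)+len(elem.Fields))
	for k, v := range common.Fields {
		fields[k] = v
	}
	for k, v := range elem.Fields {
		fields[k] = v // element-level value wins
	}

	seen := make(map[string]bool)
	var tags []string
	for _, list := range [][]string{common.Tags, elem.Tags} {
		for _, t := range list {
			if !seen[t] {
				seen[t] = true
				tags = append(tags, t)
			}
		}
	}
	return fields, tags
}

func main() {
	common := FieldsConfig{
		Fields: map[string]string{"env": "staging", "team": "ops"},
		Tags:   []string{"web"},
	}
	prospector := FieldsConfig{
		Fields: map[string]string{"env": "production", "log_type": "nginx"},
		Tags:   []string{"web", "frontend"},
	}
	fields, tags := applyFields(common, prospector)

	// Print fields in sorted order so the output is deterministic.
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, fields[k])
	}
	fmt.Println("tags:", tags)
}
```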

New Community Beats

More goodies from the community:

  • Flowbeat: Collects sFlow data.
  • Udpbeat: Receives structured logs over UDP and sends them to Elasticsearch. The author wrote a blog post about how to use it to get trace data for Go errors.
  • Batterybeat: Colin wrote a Beat to poll information about his Mac battery.
  • Twitterbeat: Polls the tweets of a preconfigured list of screen names and inserts them into Elasticsearch.

Better configuration file handling

Until now, we used a generic YAML parser to read our configuration files. This is clean and convenient from a code perspective, but several things about it annoyed us:

  • Because YAML is context sensitive, it’s difficult to delegate parsing to another function. This makes it hard to keep our modules self-contained from a configuration point of view.
  • YAML is very permissive: almost any file is valid. Because we have no visibility into values that don't match any setting, it’s hard for us to provide good error messages when something is misconfigured.
  • Whitespace is simply problematic in configuration files. People copy and paste configuration examples, and a single missing space is enough to make nothing work. Add to that our inability to provide good error messages, and you can guess what our number one source of Discuss and GitHub tickets is.

That's the bad news. The good news is that Steffen is working on fixing these problems, not only for the Beats but also for other Go programs that use YAML. He is writing ucfg (universal configuration), a layer on top of the YAML parser (or any other parser) that will eventually be able to validate configurations, accept dots in field names (so we don’t always have to rely on whitespace), and make configuration definitions pluggable.
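Accepting dots in field names means a nested setting can be written on a single line, such as `output.elasticsearch.hosts`, instead of depending on indentation. As a rough illustration of the idea (this is not ucfg's code or API), the hypothetical `expandDots` function below expands dotted keys from a flat map into a nested tree and reports keys that conflict, for example `a` and `a.b` in the same input.

```go
package main

import (
	"fmt"
	"strings"
)

// expandDots turns keys like "output.elasticsearch.hosts" into nested maps.
// It returns an error when a key would need a value to be both a leaf and
// a subtree, e.g. "a" and "a.b" in the same input.
func expandDots(flat map[string]interface{}) (map[string]interface{}, error) {
	root := map[string]interface{}{}
	for key, value := range flat {
		parts := strings.Split(key, ".")
		node := root
		// Walk (and create) the intermediate maps for all but the last part.
		for _, p := range parts[:len(parts)-1] {
			child, ok := node[p]
			if !ok {
				m := map[string]interface{}{}
				node[p] = m
				node = m
				continue
			}
			m, isMap := child.(map[string]interface{})
			if !isMap {
				return nil, fmt.Errorf("key %q conflicts with a value at %q", key, p)
			}
			node = m
		}
		leaf := parts[len(parts)-1]
		if _, exists := node[leaf]; exists {
			return nil, fmt.Errorf("key %q conflicts with another key", key)
		}
		node[leaf] = value
	}
	return root, nil
}

func main() {
	tree, err := expandDots(map[string]interface{}{
		"output.elasticsearch.hosts": []string{"localhost:9200"},
		"output.elasticsearch.index": "packetbeat",
		"logging.level":              "info",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tree)
}
```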

It is already used to make the libbeat output modules, as well as the Packetbeat modules, truly self-contained.
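A self-contained module owns the parsing of its own configuration section, so the core only has to hand over the raw subtree. Here is a minimal sketch of that delegation, with `encoding/json` and `json.RawMessage` standing in for ucfg (whose API is not shown here). The `registry` of output factories, the `consoleOutput` module, and its `pretty` setting are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Output is the behaviour the core expects from any output module.
type Output interface {
	Name() string
}

// outputFactory builds an output from its own raw config section.
type outputFactory func(raw json.RawMessage) (Output, error)

// registry maps a section name under "output" to the module that parses it.
var registry = map[string]outputFactory{}

// consoleOutput is a hypothetical module; only it knows its settings.
type consoleOutput struct {
	Pretty bool `json:"pretty"`
}

func (c *consoleOutput) Name() string { return "console" }

func init() {
	registry["console"] = func(raw json.RawMessage) (Output, error) {
		c := &consoleOutput{}
		if err := json.Unmarshal(raw, c); err != nil {
			return nil, fmt.Errorf("console output: %w", err)
		}
		return c, nil
	}
}

// loadOutputs splits the "output" section and delegates each part to the
// registered module; the core never looks inside a module's settings.
func loadOutputs(config []byte) ([]Output, error) {
	var top struct {
		Output map[string]json.RawMessage `json:"output"`
	}
	if err := json.Unmarshal(config, &top); err != nil {
		return nil, err
	}
	var outputs []Output
	for name, raw := range top.Output {
		factory, ok := registry[name]
		if !ok {
			return nil, fmt.Errorf("unknown output %q", name)
		}
		out, err := factory(raw)
		if err != nil {
			return nil, err
		}
		outputs = append(outputs, out)
	}
	return outputs, nil
}

func main() {
	outs, err := loadOutputs([]byte(`{"output": {"console": {"pretty": true}}}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, o := range outs {
		fmt.Println("loaded output:", o.Name())
	}
}
```

Adding a new output then means registering one factory, without touching the core's configuration code.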

Topbeat performance improvement

When monitoring a large number of processes, fetching each process's command line on every poll was expensive, especially on Windows. This PR fixes that by caching the command line strings. The PR description includes benchmarks for Windows and OS X that show the difference.
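Caching pays off because a process's command line normally doesn't change while the process runs. Below is a minimal sketch of the idea, assuming a hypothetical `fetch` function that performs the expensive OS lookup. Keying the cache by PID plus process start time is an assumption of this sketch (not necessarily what the PR does): it avoids returning a stale command line when the OS reuses a PID. Entries for processes that have exited are pruned after each poll.

```go
package main

import "fmt"

// procKey identifies a process instance; the start time guards against
// PID reuse (an assumption made for this sketch).
type procKey struct {
	PID       int
	StartTime uint64
}

// cmdlineCache remembers command lines between polls.
type cmdlineCache struct {
	fetch   func(pid int) (string, error) // the expensive OS lookup
	entries map[procKey]string
}

func newCmdlineCache(fetch func(pid int) (string, error)) *cmdlineCache {
	return &cmdlineCache{fetch: fetch, entries: map[procKey]string{}}
}

// Get returns the cached command line, calling fetch only on a cache miss.
func (c *cmdlineCache) Get(k procKey) (string, error) {
	if cmd, ok := c.entries[k]; ok {
		return cmd, nil
	}
	cmd, err := c.fetch(k.PID)
	if err != nil {
		return "", err
	}
	c.entries[k] = cmd
	return cmd, nil
}

// Prune drops entries for processes not seen in the latest poll.
func (c *cmdlineCache) Prune(alive map[procKey]bool) {
	for k := range c.entries {
		if !alive[k] {
			delete(c.entries, k)
		}
	}
}

func main() {
	calls := 0
	cache := newCmdlineCache(func(pid int) (string, error) {
		calls++ // count expensive lookups
		return fmt.Sprintf("/usr/bin/proc-%d --flag", pid), nil
	})
	p := procKey{PID: 42, StartTime: 1000}
	for i := 0; i < 3; i++ { // three polls of the same process
		cmd, _ := cache.Get(p)
		fmt.Println(cmd)
	}
	fmt.Println("expensive lookups:", calls)
	cache.Prune(map[procKey]bool{}) // process 42 has exited
}
```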

Improved shutdown logic in libbeat

Shutting down is never easy when dealing with lots of channels. McStork, a community contributor, stepped up and helped improve the logic we use to shut down the publisher.
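Part of what makes this hard in Go is that sending on a closed channel panics, so producers must be stopped before the queue is closed, and queued events should still be drained. The sketch below shows one common generic pattern: a read/write lock keeps `Publish` from racing with `close`, workers drain the channel with `range`, and a `sync.WaitGroup` waits for them to exit. It is not the libbeat publisher code; the `publisher` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// publisher fans events out to worker goroutines (hypothetical type).
type publisher struct {
	mu       sync.RWMutex // guards closed; held for reading while sending
	closed   bool
	events   chan string
	wg       sync.WaitGroup
	stopOnce sync.Once

	outMu sync.Mutex
	sent  []string // stands in for events shipped to an output
}

func newPublisher(workers, queueSize int) *publisher {
	p := &publisher{events: make(chan string, queueSize)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go p.worker()
	}
	return p
}

// worker ships events until the channel is closed and fully drained.
func (p *publisher) worker() {
	defer p.wg.Done()
	for e := range p.events {
		p.outMu.Lock()
		p.sent = append(p.sent, e)
		p.outMu.Unlock()
	}
}

// Publish queues an event. It returns false once Stop has been called,
// so callers never send on a closed channel.
func (p *publisher) Publish(e string) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if p.closed {
		return false
	}
	p.events <- e
	return true
}

// Stop rejects new events, lets workers drain the queue, and waits for
// them to exit. It is safe to call more than once.
func (p *publisher) Stop() {
	p.stopOnce.Do(func() {
		p.mu.Lock() // waits for in-flight Publish calls to finish
		p.closed = true
		close(p.events)
		p.mu.Unlock()
		p.wg.Wait()
	})
}

func main() {
	p := newPublisher(2, 4)
	for i := 0; i < 10; i++ {
		p.Publish(fmt.Sprintf("event-%d", i))
	}
	p.Stop()
	fmt.Println("published:", len(p.sent))
	fmt.Println("accepted after stop:", p.Publish("late"))
}
```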

Metricbeat progress

Nicolas is making good progress on getting the Metricbeat infrastructure ready before more modules are added. This includes reworking the configuration handling, adding more metricsets to the existing modules, testing whether Topbeat could be included in Metricbeat, and starting a contributor guide for new Metricbeat modules.
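In Metricbeat, each module targets one monitored service and groups one or more metricsets. As a purely hypothetical sketch of what such a plugin structure can look like in Go, here is a small registry keyed by module and metricset name. The `MetricSet` interface, its `Fetch` method, the `register` and `fetchAll` functions, and the `fakeCPU` metricset are illustrative assumptions, not Metricbeat's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// MetricSet fetches one group of related metrics (hypothetical interface).
type MetricSet interface {
	Fetch() (map[string]interface{}, error)
}

// registry maps "module/metricset" to a constructor.
var registry = map[string]func() MetricSet{}

// register adds a metricset and refuses duplicate registrations.
func register(module, metricset string, factory func() MetricSet) {
	key := module + "/" + metricset
	if _, dup := registry[key]; dup {
		panic("metricset registered twice: " + key)
	}
	registry[key] = factory
}

// fakeCPU is a stand-in metricset that returns fixed values.
type fakeCPU struct{}

func (fakeCPU) Fetch() (map[string]interface{}, error) {
	return map[string]interface{}{"user_pct": 0.25, "system_pct": 0.10}, nil
}

func init() {
	register("system", "cpu", func() MetricSet { return fakeCPU{} })
}

// fetchAll runs every registered metricset once, in a stable order,
// tagging each event with the metricset it came from.
func fetchAll() ([]map[string]interface{}, error) {
	keys := make([]string, 0, len(registry))
	for k := range registry {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var events []map[string]interface{}
	for _, k := range keys {
		data, err := registry[k]().Fetch()
		if err != nil {
			return nil, fmt.Errorf("%s: %w", k, err)
		}
		events = append(events, map[string]interface{}{"metricset": k, "data": data})
	}
	return events, nil
}

func main() {
	events, err := fetchAll()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range events {
		fmt.Println(e)
	}
}
```

Registering through `init` keeps each metricset in its own file or package, which is one way to let contributors add modules without editing shared code.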

Filebeat fix for duplicates on restarting

We discovered a bug that could cause Filebeat to re-read an entire file if it was restarted at the wrong moment. It is now fixed, and we’re preparing a 1.1.2 release for next week that includes the fix.

Other notable merges since the last update

And from the newly opened, in-progress, and under-discussion PRs: