11 May 2016

Brewing in Beats: Metricbeat progress

By Tudor Golubenco

Welcome to Brewing in Beats! With this series, we're keeping you up to date with all that's new in Beats, from the details of work-in-progress pull requests to releases and learning resources.

Last week we released 5.0.0-alpha2, with several important bug fixes and new features. Apart from that, here is what we are busy with:

Per process stats in Metricbeat

We had a lot of back and forth on this PR, debating how best to represent in Elasticsearch the metrics that are per “dynamic entity”. Processes in an operating system are a good example of such entities, but they can also be mounted file systems or tables in an SQL database. After considering nested objects and dynamically created fields, we went for the simpler solution of having an individual object for each entity. These objects are grouped in their own metricset.
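To make the "one object per entity" idea concrete, here is a minimal sketch in Python (not the actual Metricbeat code, which is written in Go). The function name, the field names and the "process" metricset label are assumptions made up for illustration.

```python
from typing import Dict, List


def process_events(samples: Dict[int, dict], timestamp: str) -> List[dict]:
    """Build one flat event per process instead of one big nested document."""
    events = []
    for pid, stats in sorted(samples.items()):
        event = {
            "@timestamp": timestamp,
            "metricset": "process",  # hypothetical metricset label
            "pid": pid,
        }
        event.update(stats)  # per-process metrics become top-level fields
        events.append(event)
    return events
```

Each returned dictionary would be indexed as its own document, so a process that appears or disappears between two collection rounds never changes the mapping.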

Metricbeat API refactoring

In order to have a clean and stable API before adding more modules, Andrew did a lot of refactoring and documentation work on the Metricbeat module APIs.

Configuration migration tool

Between 1.x and 5.0 there are a few configuration file changes that break compatibility. We now have a Python script that should help with migrating existing configuration files to the 5.0 format. The script doesn’t do real YAML parsing, meaning that it won’t work on every possible configuration file, but it should do the job in most cases. It also requires no dependencies besides Python itself and preserves the comments.
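The line-based approach can be sketched roughly as follows. This is not the migration script itself: the function name and the rename table are hypothetical, and the real script handles the actual 1.x to 5.0 changes, which are not reproduced here.

```python
import re
from typing import Dict, List

# Matches "<indent><key><rest starting at the colon>" on a single line.
KEY_LINE = re.compile(r"^(\s*)([A-Za-z0-9_.]+)(\s*:.*)$", re.DOTALL)


def migrate_lines(lines: List[str], renames: Dict[str, str]) -> List[str]:
    """Rename keys line by line, without parsing the file as YAML."""
    out = []
    for line in lines:
        if line.lstrip().startswith("#"):
            out.append(line)  # comment lines pass through untouched
            continue
        match = KEY_LINE.match(line)
        if match and match.group(2) in renames:
            indent, key, rest = match.groups()
            line = indent + renames[key] + rest
        out.append(line)
    return out
```

Because every line is either copied or edited in place, comments and formatting survive, which a parse-and-dump round trip through a YAML library would typically lose. The price is that unusual layouts (keys split across lines, flow-style mappings) are not recognized.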

Zookeeper module in Metricbeat

Zkbeat, created by Erik Redding, was converted into a Metricbeat module. It uses the Zookeeper mntr command to get simple stats. Thanks Erik!
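For readers unfamiliar with mntr: it is one of Zookeeper's four-letter commands, sent as plain text over the client port, and the reply is a list of tab-separated key/value lines. A minimal Python sketch of fetching and parsing it could look like this; the function names, the default host and port, and the timeout are assumptions, and the module itself is written in Go.

```python
import socket
from typing import Dict, Union


def fetch_mntr(host: str = "localhost", port: int = 2181, timeout: float = 5.0) -> str:
    """Send the four-letter command and read until the server closes the socket."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"mntr")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> Dict[str, Union[int, str]]:
    """Turn 'key<TAB>value' lines into a dict, converting integers where possible."""
    stats: Dict[str, Union[int, str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip()
        try:
            stats[key.strip()] = int(value)
        except ValueError:
            stats[key.strip()] = value  # e.g. version strings, server state
    return stats
```

With a reachable server, `parse_mntr(fetch_mntr())` would return the stats as a dictionary ready to be shipped as one event.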

Kafka output deadlock fix

Due to a bug in the Go library we use for outputting to Kafka, a deadlock is possible when infinite retries are used. In time for alpha2, a set of refactorings was done to work around this issue. We now avoid asking the library for infinite retries and instead simulate them in our own code.
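The general pattern of simulating infinite retries on top of a bounded call can be sketched like this. It is an illustration in Python, not the Beats implementation: `send_bounded` stands in for a library call that gives up after a finite number of attempts, and the exponential backoff is an assumption added for the example.

```python
import time
from typing import Callable


def publish_with_infinite_retries(
    send_bounded: Callable[[], bool],
    backoff: float = 1.0,
    max_backoff: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call a bounded-retry sender in a loop until it reports success.

    Returns the number of calls that were needed.
    """
    attempts = 0
    delay = backoff
    while True:
        attempts += 1
        if send_bounded():  # the library itself only retries a finite number of times
            return attempts
        sleep(delay)  # assumed backoff policy, not taken from the PR
        delay = min(delay * 2, max_backoff)
```

Keeping the loop in the caller means the library never enters its own unbounded retry path, which is where the deadlock could occur.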

Topbeat: fix high CPU usage on Windows

We had a couple of interesting reports indicating that, on Windows, Topbeat uses more CPU when it is configured to monitor fewer processes. It turns out that the command line cache was not being used for the processes that were filtered out, and getting the full command line is expensive on Windows. The fix refactors the logic so that the command line is not read at all if the process is filtered out.
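The order of operations after the fix can be sketched as follows: filter first, and only then pay for the command line, going through the cache. This is a simplified Python illustration with made-up names, not the Topbeat code.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_processes(
    processes: Iterable[Tuple[int, str]],
    wanted: Callable[[str], bool],
    read_cmdline: Callable[[int], str],
    cache: Dict[int, str],
) -> List[dict]:
    """Return details for wanted processes, reading each command line at most once."""
    result = []
    for pid, name in processes:
        if not wanted(name):
            continue  # filtered out: the expensive lookup is never attempted
        if pid not in cache:
            cache[pid] = read_cmdline(pid)  # expensive call, done once per pid
        result.append({"pid": pid, "name": name, "cmdline": cache[pid]})
    return result
```

A real implementation would also need to evict cache entries when a process exits and its pid is reused, which this sketch leaves out.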

Filebeat refactoring

Heavy cleanup work is going on in the Filebeat code, making the code more readable and easier to maintain. Previously the file state was loaded from disk every time a new file was found. The state from the registry file is now loaded only once, during startup, and from then on the prospector’s internal in-memory state is used. This is more efficient and prevents race conditions.

It also makes the Registrar and the Harvesters less coupled. This refactoring work is crucial for us to be able to stay on top of possible races due to all the file rotation and file system variations.
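The "load once, then trust memory" pattern described above can be sketched like this. It is a simplified Python illustration, not Filebeat's Go code: the class shape, the method names and the path-to-offset mapping are assumptions.

```python
from typing import Callable, Dict


class Prospector:
    """Keeps file offsets in memory after a single registry read at startup."""

    def __init__(self, load_registry: Callable[[], Dict[str, int]]) -> None:
        # The persisted state is read exactly once, here.
        self._offsets = dict(load_registry())

    def on_file_found(self, path: str) -> int:
        # Known files resume at their stored offset; new files start at 0.
        # No disk access happens on this path.
        return self._offsets.setdefault(path, 0)

    def on_progress(self, path: str, offset: int) -> None:
        # Updates go to the in-memory state, which is now the source of truth.
        self._offsets[path] = offset
```

With a single owner of the state, a newly found file can no longer observe a registry file that another component is in the middle of rewriting.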

Normalize newline character after multiline

In order for the regular expressions to work in a consistent way on multiline events, this PR makes sure \n is used as a line separator after multiline stitching.
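As a rough illustration of the idea (in Python, with a hypothetical function name, and not the code from the PR), stitching can strip whatever line ending each physical line had and join the pieces with a plain \n:

```python
from typing import Iterable


def stitch(lines: Iterable[str]) -> str:
    """Join the lines of one multiline event using "\n" as the only separator."""
    # rstrip("\r\n") removes trailing CR and LF characters in any combination,
    # so Windows-style and Unix-style inputs produce the same event text.
    return "\n".join(line.rstrip("\r\n") for line in lines)
```

A pattern written against \n then behaves the same regardless of whether the log file was produced with Windows or Unix line endings.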