
Logstash 1.5.0.Beta1 Released

We are pleased to announce the release of Logstash 1.5.0 Beta 1! You can download it here.

Note: This is a Beta release – please do not use in production.

What's in it?

The main themes of 1.5.0 are plugin management, performance improvements, and Apache Kafka integration. One of the core strengths of Logstash is the availability of plugins and ease of adding them to your pipeline to extend behavior. With this release we are making it easier to develop, manage, and publish plugins. We also made Logstash faster, so you can process more data in less time. Intrigued? Let's dive into the details.

Plugin Ecosystem Changes

Logstash has a rich collection of over 165 plugins (inputs, filters, outputs, and codecs), developed both by Elasticsearch and by the community. Managing that many plugins poses a trade-off between usability and agility. On one hand, bundling all the plugins with Logstash makes it easier to use, but forces the community to wait for a new release of Logstash to pick up plugin updates. On the other hand, distributing plugins separately from Logstash makes updates easier, but hurts ease of use (especially for new users).

As we move the project forward, we are trying to balance both of these aspects. Previously, we divided all available plugins into 'core' and 'contrib' packages. The most commonly used plugins shipped with Logstash by default in 'core', while community-contributed plugins shipped separately in the 'contrib' package. With the 1.5.0 release, we are taking a step closer to making plugin management even better for our users. We have moved all of the plugins into their own self-contained packages: using RubyGems as the packaging framework, we publish and distribute these plugins via rubygems.org. We have also added tooling to Logstash for easily installing, updating, and removing plugins.

For example, to install the S3 output plugin, you simply run:

$LS_HOME/bin/plugin install logstash-output-s3

and that's it! Logstash will download the gem and its dependencies from rubygems.org and install them so you can start sending your data to S3 buckets.
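
Once it's installed, a minimal S3 output configuration might look like the following. This is just a sketch: the bucket name and credentials are placeholders, and the plugin supports more options than shown here.

output {
  s3 {
    access_key_id => "AKIA..."       # placeholder AWS credentials
    secret_access_key => "..."
    bucket => "my-log-archive"       # hypothetical bucket name
  }
}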

Downloadable Logstash releases will still include lots of plugins out of the box, but now you can upgrade individual plugins and install new ones at any time! Keep an eye out for a more detailed blog post on our plugin ecosystem changes soon.
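
The same tool handles the rest of the plugin lifecycle. For example (subcommand names as of this beta; run bin/plugin --help on your install to confirm):

$LS_HOME/bin/plugin update logstash-output-s3     # upgrade a single plugin
$LS_HOME/bin/plugin uninstall logstash-output-s3  # remove it again
$LS_HOME/bin/plugin list                          # show installed plugins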

Performance Improvements

Logstash 1.5.0 is much faster. Let's highlight two areas where performance has gone way up.

Grok Filter

The grok filter uses named patterns to extract structured data from unstructured text in Logstash. In this release, we have increased the throughput of this popular filter by up to 100% for some patterns. Put another way, you can process more data through Logstash in less time when using the grok filter.

In our benchmark testing, we compared throughput in 1.5.0 and 1.4.2 by processing 6.9 million Apache web access log lines using the COMBINEDAPACHELOG grok pattern. Throughput in 1.5.0 increased from 34K events per second (eps) to 50K eps. Both tests were run on an eight-core machine with eight worker threads in Logstash, using a single grok filter and measuring pipeline throughput with a stdin input and a stdout output. Please note that overall performance will vary with your hardware and Logstash configuration.
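
For reference, the shape of that benchmark pipeline is easy to reproduce. A minimal configuration along these lines (our exact test harness may have differed slightly) is:

input {
  stdin { }
}
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}
output {
  stdout { codec => dots }   # one dot per event; cheap enough to measure throughput
}

Piping a log file into bin/logstash agent and passing -w 8 selects eight filter worker threads.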

JSON Serialization/Deserialization

We reimplemented JSON serialization and deserialization using the JrJackson library, which improved throughput by over 100%. In the performance tests mentioned above, we sent 500,000 JSON events of 1.3KB each and measured a throughput increase from 16K eps to 30K eps. With 45KB events, throughput increased from 850 eps to 3.5K eps. Yeah, we thought that was pretty good, too.
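
The speedup applies wherever Logstash parses or emits JSON. As a minimal sketch, a pipeline that decodes JSON on input and re-encodes it on output exercises both paths:

input {
  stdin {
    codec => json    # deserialization path, now backed by JrJackson
  }
}
output {
  stdout {
    codec => json    # serialization path benefits as well
  }
}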

Apache Kafka Integration

Today, Apache Kafka is ubiquitous in large-scale data processing systems. When scaling Logstash deployments, Kafka can also serve as an intermediate message buffer that stores data between the shipping instances and the indexing instances.

In 1.5.0, we have added built-in support for the Logstash Kafka input and output plugins, originally developed by Joseph Lawson. We have enhanced these plugins with integration tests and documentation, and we will continue to develop new Kafka features. We have also added an Apache Avro codec so you can easily consume events stored in Kafka, enrich them, and analyze them using the ELK stack.

Adding the Kafka input is as simple as:

$LS_HOME/bin/plugin install logstash-input-kafka

And the Kafka output:

$LS_HOME/bin/plugin install logstash-output-kafka
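
Once both are installed, a pipeline that reads from one Kafka topic and writes enriched events to another might look like the sketch below. The option names reflect the plugins at the time of this release, and the addresses and topic names are placeholders; check the plugin documentation for the full set of options.

input {
  kafka {
    zk_connect => "localhost:2181"    # ZooKeeper the consumer registers with
    topic_id => "raw-logs"            # hypothetical source topic
  }
}
output {
  kafka {
    broker_list => "localhost:9092"   # Kafka broker(s) to produce to
    topic_id => "enriched-logs"       # hypothetical destination topic
  }
}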

Improved Security

We have improved the security of the Elasticsearch output, input, and filter by adding authentication and transport-encryption support. For instance, with the HTTP protocol you can configure SSL/TLS to encrypt traffic and HTTP basic authentication to supply a username and password with each request. These capabilities will enable Logstash to integrate natively with the forthcoming Elasticsearch Shield security product.
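
As a rough sketch, turning these on for the Elasticsearch output over HTTP looks like this (the hostname and credentials are placeholders; see the output's documentation for the authoritative option names):

output {
  elasticsearch {
    protocol => "http"
    host => "es.example.org"    # placeholder Elasticsearch host
    ssl => true                 # encrypt traffic with SSL/TLS
    user => "logstash"          # HTTP basic authentication credentials
    password => "changeme"
  }
}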

Documentation

Previously, Logstash documentation was hosted on logstash.net, which made it cumbersome to find information when working with the rest of the ELK stack. We are now moving documentation for 1.5.0 and all future releases to the elasticsearch.org website under the Logstash Guide. With this migration, elasticsearch.org/guide becomes the single location for all the reference documentation you need to learn and use the ELK stack. As we iterate on this beta release, we are actively working on improving the presentation and documentation quality. (Note: All of the old Logstash documentation will continue to be available in the current logstash.net location.)

Bug Fixes and Enhancements

In addition to these new features, Logstash 1.5.0 fixes a number of bugs and enhances many existing features. We would like to highlight a few of them here:

  • Allow storing 'metadata' in an event, which is not sent/encoded on output. This eliminates the need for intermediate fields, for example when using the date filter; see the sketch after this list (#1834, #LOGSTASH-1798)
  • Fixed file descriptor leaks when using HTTP. The fix prevents Logstash from stalling, and in some cases crashing from out-of-memory errors (#1604)
  • Twitter input: added improvements, robustness, and fixes. The full_tweet option now works, and Twitter rate-limiting errors are now handled (#1471)
  • Filters that generate events (multiline, clone, split, metrics) now propagate those events correctly to subsequent conditionals (#1431)
  • Elasticsearch output: Logstash no longer creates a message.raw field by default. The message field is already analyzed by Elasticsearch, and adding a not_analyzed multi-field essentially doubled the required disk space with no benefit
  • Removed the ability to run multiple subcommands from bin/logstash, like bin/logstash agent -f sample.conf -- web (#1797)
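
To illustrate the @metadata change called out above: anything stored under the [@metadata] field is visible to filters but never serialized on output, so a scratch field can be parsed and then silently dropped. A small sketch (the field name and pattern here are illustrative):

filter {
  grok {
    # capture the raw timestamp into a field that will never appear in the output
    match => [ "message", "%{HTTPDATE:[@metadata][timestamp]}" ]
  }
  date {
    # parse it into @timestamp; no mutate/remove_field cleanup is needed afterwards
    match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}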

You can read about all of the features, enhancements, and bug fixes in this release in the Logstash 1.5.0.Beta1 changelog.

Give it a spin!

Please download Logstash 1.5.0 Beta 1, try it out, and let us know what you think on Twitter (@elasticsearch). You can also report any problems on the GitHub issues page.