If you’ve been following our recent announcements, you know we have been releasing beta versions of our Elastic products that work with the recent Elasticsearch 2.0 beta. Logstash is getting on the beta train too! Read on to find out what’s in this release.
IMPORTANT: This is a beta release and is intended for testing purposes only. There is no guarantee that Logstash 2.0.0-beta1 will be compatible with Logstash 2.0.0 GA.
Why version 2.0?
We'll be starting a new chapter in Logstash history with the forthcoming release of Logstash 2.0! This release is primarily about two things:
- Adhering to better versioned releases.
- Providing out of the box compatibility with Elasticsearch 2.0.
To provide a better out-of-the-box experience with Elasticsearch, we realized we needed to make breaking changes, and our current ad-hoc versioning strategy would not work for users. We would like to adhere to a versioning and release strategy that can better inform you, our users, about any breaking changes to the Logstash configuration formats, plugin APIs and other functionality.
Logstash releases follow a three-part numbering scheme, X.Y.Z, where X denotes a major release which may break compatibility with existing configuration or functionality, Y denotes a release which includes backward-compatible features, and Z denotes a release which includes bug fixes and patches. In compliance with this scheme, today we are releasing a beta1 version of 2.0.0, which introduces some breaking changes you can read about below.
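As a rough illustration of the X.Y.Z scheme described above, the following Python sketch classifies an upgrade by the most significant component that changes. The names parse_version and upgrade_kind are hypothetical and not part of Logstash. Pre-release suffixes such as -beta1 are simply ignored here, even though a beta carries no compatibility guarantee with the GA release.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'X.Y.Z', dropping any pre-release suffix such as '-beta1'."""
    core = text.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return Version(major, minor, patch)


def upgrade_kind(current: str, target: str) -> str:
    """Classify an upgrade by the most significant component that differs."""
    old, new = parse_version(current), parse_version(target)
    if new.major != old.major:
        return "major"  # may break existing configuration or functionality
    if new.minor != old.minor:
        return "minor"  # new features, backward compatible
    if new.patch != old.patch:
        return "patch"  # bug fixes and patches
    return "same"
```

Under this reading, moving from a 1.5.x release to 2.0.0 is a "major" upgrade, so configuration files should be reviewed before switching, while a later move from 2.0.0 to 2.1.0 would be "minor".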
If you've followed along with our product roadmap, please note that the overall path of the roadmap remains the same; only the version numbers have changed. Our team is still focused on delivering the themes mentioned in the roadmap: resiliency, manageability and performance. We will target new enhancements like persistent queues as minor releases (like 2.1, 2.2) since they'll be backward compatible.
The 2.0.0-beta1 release, specifically, will have the following changes:
- Compatibility with Elasticsearch 2.0 and 1.0 with a massively refactored and improved Elasticsearch plugin.
- Up to a 3.7x improvement in common weblog parsing operations (GeoIP / User Agent). These have been bottlenecks for a while, and are now vastly improved.
- Kafka Output changes.
Elasticsearch Output now Defaults to HTTP
We've altered the Elasticsearch output to default to HTTP. For those who want to use the 'node' and 'transport' protocols, support is now provided in a separate Elasticsearch Java plugin that must be installed on its own. We decided not to bundle this functionality with the core Logstash distribution because it adds a good 30MB to the download and creates a weird situation for users wanting to use Elasticsearch 1.x node/transport. While we have no plans to deprecate support for the node and transport protocols, we strongly discourage their use for the reasons below:
- Out of the box integration: Defaulting to HTTP allows Logstash to integrate with Elasticsearch out of the box. This provides a seamless first-user experience for users to get their data, enrich it and store and analyze with Elasticsearch.
- Better Operational Experience: HTTP is not tied to specific Elasticsearch versions, whereas the other protocols are. This allows users to upgrade their Elasticsearch cluster without having a dependency on Logstash or any other clients.
- Debuggability: If you misconfigure the node/transport protocols it is very hard for us to provide detailed feedback for why a failure is occurring.
- A leaner Logstash distribution: Shipping node/transport support in the default Logstash package added 30MB to our base install.
- The theoretical performance gains aren't there: There is a negligible speed difference between HTTP and node, typically only ~4%. This gets even smaller once filters are factored in.
We will still be supporting both plugins. We have, in fact, performed a major refactor on both plugins to remove dead code and make more efficient use of internal client objects. So, if you still prefer to use the native protocols, by all means install the new logstash-output-elasticsearch_java plugin. If you want to use these Java plugins with an Elasticsearch 1.x cluster, be sure to install specific versions in the 1.x plugin range. The 2.x releases of logstash-output-elasticsearch_java will only work with Elasticsearch 2.0.
Installing java clients for Elasticsearch 1.x:
bin/plugin install --version 1.5.x logstash-output-elasticsearch_java
Installing java clients for 2.x:
bin/plugin install --version 2.0.0.beta5 logstash-output-elasticsearch_java
The Elasticsearch output has a few configuration changes to be compatible with the 2.0 beta1 release of Elasticsearch. Please make sure to read the updated documentation for configuring the HTTP and Java protocols. These configurations are not backward compatible, so you will have to update your existing config files.
How We Benchmarked the new HTTP Elasticsearch Output
As mentioned above, we recently benchmarked the different protocols and found HTTP was only about 3% slower (when using multiple output workers) given a realistic Logstash config for parsing Apache weblogs. This is a small price to pay for a considerable improvement in operational simplicity. Moving to HTTP also provides much better compatibility across Elasticsearch version upgrades: if you use HTTP, you won't have to upgrade the Logstash Elasticsearch output every time you upgrade your Elasticsearch cluster. We hit these issues ourselves in our benchmarking, finding that some versions of the new Elasticsearch betas don't work with the beta jars we ship.
The test was set up with a 3-node Elasticsearch cluster, each node being an m3.large in a single AZ in us-east-1. There was a single Logstash node running on an m3.large in the same AZ. A variety of configs were tested to get the numbers below. Note that the HTTP-based Elasticsearch output benefits greatly from having multiple workers set. There is very little reason not to increase the workers setting for this plugin.
As you can see in the chart below, HTTP was slightly slower than the other protocols, but not enough to make much of a difference for real-world use cases. You can find the raw data backing this chart here. If you're interested in running the benchmarks yourself, you can check out our Logstash Cloud Benchmarker git repository.
Elasticsearch HTTP Sniffing
One of the most common reasons to use the node or transport protocols was that these protocols supported 'sniffing', whereby the Elasticsearch output would be able to connect to all nodes in the cluster in a round-robin fashion, and update its list of hosts as members joined or left the cluster. We've added sniffing support to the HTTP plugin, giving it these same capabilities. You can define the interval between 'sniffs' with the sniffing_delay configuration option, which specifies how long in seconds to wait between sniffs.
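To make the idea concrete, here is a minimal Python sketch of round-robin host selection with periodic sniffing. It is not how the Logstash plugin is implemented. The SniffingHostPool class, its sniff callback and the injectable clock are all hypothetical names chosen for illustration, and only the sniffing_delay parameter name is borrowed from the option described above.

```python
import time
from typing import Callable, Iterable, List


class SniffingHostPool:
    """Hands out hosts round-robin and refreshes the list every sniffing_delay seconds."""

    def __init__(
        self,
        seed_hosts: Iterable[str],
        sniff: Callable[[], Iterable[str]],
        sniffing_delay: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._hosts: List[str] = list(seed_hosts)
        if not self._hosts:
            raise ValueError("at least one seed host is required")
        self._sniff = sniff            # hypothetical callback returning current cluster members
        self._delay = sniffing_delay   # seconds to wait between sniffs
        self._clock = clock
        self._last_sniff = clock()
        self._index = 0

    def next_host(self) -> str:
        now = self._clock()
        if now - self._last_sniff >= self._delay:
            fresh = list(self._sniff())
            if fresh:                  # keep the old list if the sniff returned nothing
                self._hosts = fresh
            self._last_sniff = now
        host = self._hosts[self._index % len(self._hosts)]
        self._index += 1
        return host
```

In this sketch a larger sniffing_delay means fewer calls to the sniff callback, at the cost of noticing joined or departed nodes later.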
Optimizations to UserAgent and GeoIP Lookups
Logstash is often used to parse logs from webservers such as Apache and Nginx. It is often desirable to perform GeoIP lookups on IP addresses found in these logs and to classify their user agent strings. Both of these operations are surprisingly expensive. Logstash 2.0 will see a large boost in performance parsing these common fields. In the case of the user agent filter we saw a boost of ~3.7x on our sample dataset. In the case of GeoIP we saw a boost of 1.69x. This was achieved by adding an LRU cache that takes advantage of the clustering of IPs and user agents commonly seen in web requests. We highly recommend playing with the new lru_cache_size option for both of these plugins, to see what gives your configuration the best bang for your buck. Keep in mind that larger values will speed up lookups (to a point), at the expense of memory.
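The effect of an LRU cache on clustered traffic can be sketched in a few lines of Python. This uses the standard library's functools.lru_cache rather than anything from Logstash; slow_lookup is a hypothetical stand-in for an expensive GeoIP or user agent lookup, and the size argument only plays a role analogous to lru_cache_size.

```python
from functools import lru_cache
from typing import Callable, List

lookups_performed: List[str] = []


def slow_lookup(ip: str) -> str:
    """Hypothetical stand-in for an expensive GeoIP or user agent lookup."""
    lookups_performed.append(ip)
    return f"location-of-{ip}"


def make_cached_lookup(lookup: Callable[[str], str], size: int):
    """Wrap a lookup in an LRU cache holding at most `size` entries."""
    return lru_cache(maxsize=size)(lookup)


# Web traffic tends to repeat the same few clients, so most requests hit the cache.
requests = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1", "10.0.0.2"]
cached_lookup = make_cached_lookup(slow_lookup, size=2)
results = [cached_lookup(ip) for ip in requests]
stats = cached_lookup.cache_info()  # hits=3, misses=2 for the requests above
```

If the cache is smaller than the set of values that recur, entries are evicted before they can be reused and the hit rate drops, which is why the size is worth tuning against available memory.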
Kafka Output 2.0
This beta also packages a new version of the Kafka output that uses the Java producer APIs included in the 0.8.2 release of Kafka. Unfortunately, the Logstash config options are not backward compatible with the previous release. If you are using the old version of the Kafka output, you will have to update to the new config options. You can still use this output with any 0.8.x Kafka server since the clients themselves are protocol compatible.
Give the 2.0.0-beta1 a spin! If you find any bugs please report them as issues either on the logstash core repo or on the appropriate logstash-plugins repository. You can also head over to our forum. We're excited to release Logstash 2.0, but can't do it without your help!