29 October 2015 Releases

Elasticsearch for Apache Hadoop 2.2 beta1 and 2.1.2 released

By Costin Leau

Continuing the “release bonanza” from yesterday (or, depending on your timezone, this morning), I am happy to announce the releases of Elasticsearch for Apache Hadoop (aka ES-Hadoop) 2.2.0-beta1 and 2.1.2.

Both releases contain a number of important bug fixes as well as new features such as:

Optimized data nodes routing

To minimize memory pressure on master nodes, the default routing has changed to read and write data only through data nodes, excluding masters altogether. Consolidated and crowded clusters should see increased throughput and better stability, especially in long-running jobs.
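As a rough illustration, the routing behaviour can be pictured as a single switch in the job configuration. The sketch below builds the settings as a plain Python dict. It assumes the switch is exposed as `es.nodes.data.only` (as listed in the connector's configuration reference; please verify against the version you run), and the host names are hypothetical.

```python
# Minimal sketch: ES-Hadoop settings expressed as a plain dict of strings.
# Assumption: data-node-only routing is controlled by "es.nodes.data.only".
# The host names used below are hypothetical placeholders.

def routing_conf(nodes, data_only=True):
    """Return connector settings that route traffic through data nodes only."""
    return {
        # Comma-separated list of nodes used for the initial connection
        "es.nodes": ",".join(nodes),
        # Settings are passed as strings, hence "true"/"false"
        "es.nodes.data.only": "true" if data_only else "false",
    }

conf = routing_conf(["es-data-1:9200", "es-data-2:9200"])
print(conf)
```

Since this is now the default, the setting would normally only need to be spelled out when switching the behaviour off.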


Support for Spark 1.5

Spark 1.5 is officially supported, while compatibility with earlier Spark 1.x releases is maintained.

Enhanced push down operations in Spark

Speaking of Spark, the push-down operations have been improved through support for the null-safe equality comparator (in Spark 1.5) and a better-generated IN clause.
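To give a feel for what pushing a filter down means, here is a small, purely illustrative sketch of how the two operations could be expressed as Elasticsearch query fragments. This is not the connector's actual translation code; the function names are hypothetical and the generated queries are only one plausible mapping.

```python
# Illustrative only: one possible mapping from SQL-style filters to
# Elasticsearch query DSL fragments. Function names are hypothetical and
# this is NOT what ES-Hadoop necessarily generates internally.

def in_clause(field, values):
    """Express `field IN (v1, v2, ...)` as a single terms query."""
    unique = []
    for value in values:
        # NULL never matches inside IN, and duplicates add nothing
        if value is not None and value not in unique:
            unique.append(value)
    return {"terms": {field: unique}}

def null_safe_equals(field, value):
    """Express the null-safe comparison `field <=> value`."""
    if value is None:
        # NULL <=> NULL is true, so match documents lacking the field
        return {"bool": {"must_not": {"exists": {"field": field}}}}
    return {"term": {field: value}}

print(in_clause("tag", ["a", "b", "a", None]))
print(null_safe_equals("tag", None))
```

The benefit of push-down is that the filtering happens inside Elasticsearch, so only matching documents travel to Spark.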

On top of that, ES-Hadoop 2.2.0-beta1 features:

Elasticsearch 2.0 GA support

Yup, take the latest and greatest Elasticsearch release and run your Hadoop and Spark jobs against it. As before, compatibility with Elasticsearch 1.x is preserved.
Not only is beta1 compatible at the REST level, but the repository-hdfs plugin has also been rewritten to take advantage of the new plugin architecture in Elasticsearch 2.0.

Support for restricted/WAN installs

Running Elasticsearch in the cloud or behind a restricted firewall? Is access allowed only through a predefined number of gateways (that might or might not be part of Elasticsearch)? With beta1 this scenario is supported out of the box through a simple configuration option.
Do note that, for performance reasons, full access to the Elasticsearch data nodes is still preferable whenever it is available.
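A minimal sketch of what such a configuration might look like, again as a plain Python dict. It assumes the option is `es.nodes.wan.only` (as named in the connector's configuration reference; please double-check for your version), and the gateway address is a hypothetical placeholder.

```python
# Minimal sketch: settings for a restricted/WAN install.
# Assumption: the option is "es.nodes.wan.only".
# The gateway address used below is a hypothetical placeholder.

def wan_conf(gateways):
    """Return settings that keep all traffic on the declared gateways."""
    if not gateways:
        raise ValueError("at least one gateway address is required")
    return {
        # Only these addresses are reachable from the job
        "es.nodes": ",".join(gateways),
        # Ask the connector to talk only to the nodes declared above
        "es.nodes.wan.only": "true",
    }

conf = wan_conf(["gateway.example.com:9200"])
print(conf)
```

The trade-off is that every request goes through the gateways instead of directly to the node holding the data, which is why direct access remains the faster choice.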

Option to restrict the number of documents being read

For cases where only a restricted number of matches need to be returned for a given query, beta1 introduces a new configuration option to limit the results.
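For illustration, a read limited to a handful of matches could be configured along these lines. The sketch assumes the option is `es.scroll.limit` (as named in the connector's configuration reference; please verify). The index name and query are hypothetical. As far as I understand it, the limit applies to each individual scroll, so a job reading several partitions may return more documents in total.

```python
# Minimal sketch: limiting the number of documents read.
# Assumption: the option is "es.scroll.limit".
# The resource and query used below are hypothetical.

def limited_read_conf(resource, query, limit):
    """Return read settings that cap the number of returned matches."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return {
        "es.resource": resource,       # index/type to read from
        "es.query": query,             # query whose matches are read
        "es.scroll.limit": str(limit), # settings are passed as strings
    }

conf = limited_read_conf("logs/event", "?q=status:error", 100)
print(conf)
```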


ES-Hadoop 2.1.x users are encouraged to upgrade to 2.1.2, while those who want to move to Elasticsearch 2.0 should use ES-Hadoop 2.2.0-beta1.

Let us know what you think on Twitter (@elastic) or on our forum. You can report any problems on the GitHub issues page.

Better yet, if you would like to chat live about Elasticsearch and Hadoop/Spark, yours truly will be attending the Elastic{ON} Tour (London, Paris, New York and Chicago).

Looking forward to it!