Product release

Elasticsearch for Apache Hadoop 2.2 RC1 is out

To celebrate the start of 2016, we have released Elasticsearch for Apache Hadoop (ES-Hadoop) 2.2 RC1.

Packed with a significant number of bug fixes and enhancements, this release candidate is the last step towards general availability for the current development branch. As always, the artifacts are available from the download page or through Maven.

Spark-ling updates

ES-Hadoop 2.2 RC1 introduces support for the just-released Spark 1.6. In particular, filters that are pushed down and already handled by the connector are now skipped instead of being processed again in Spark. For large result sets, this is an important optimization.


The push-down translation has also been improved, in particular for IN filters, which are now matched more accurately by distinguishing raw terms from values (such as dates or timestamps).
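To make the raw-terms-versus-values distinction more concrete, here is a minimal Python sketch of how an IN filter might be turned into an Elasticsearch query. It is an illustration only, not the connector's actual code: the function names and the `strict` parameter are hypothetical, and rendering dates as ISO-8601 strings is an assumption.

```python
import json
from datetime import date, datetime


def to_json_value(value):
    """Render dates and timestamps as ISO-8601 strings; pass other values through."""
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    return value


def translate_in(field, values, strict=False):
    """Hypothetical translation of `field IN (values)` into an Elasticsearch query.

    strict=True  -> exact matching on raw terms through a `terms` query
    strict=False -> analyzed matching through one `match` query per value
    """
    rendered = [to_json_value(v) for v in values if v is not None]
    if strict:
        return {"terms": {field: rendered}}
    return {"bool": {"should": [{"match": {field: v}} for v in rendered]}}


# Raw terms: the values are compared exactly as given.
print(json.dumps(translate_in("airport", ["OTP", "SFO"], strict=True)))

# Values: a date is converted to a string before it is matched.
print(json.dumps(translate_in("departure", [date(2016, 1, 4)])))
```

The point of the sketch is that the same IN clause can lead to different queries depending on whether its arguments are treated as raw terms or as typed values.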

Speaking of Spark SQL, the schema declaration has been improved to handle multi-valued/array fields in a simple and elegant fashion (whether the fields are nested or not).
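As a rough sketch of what declaring array fields can look like, the Python helper below assembles reader options for the connector. The `es.resource` and `es.read.field.as.array.include` settings are taken from the ES-Hadoop configuration reference as we understand it; the helper name, the index name and the field names are made up for illustration.

```python
def es_read_options(resource, array_fields=()):
    """Build connector read options, marking the given fields as arrays.

    `resource` is the index/type to read from; `array_fields` lists the
    (possibly nested, dot-separated) fields to be treated as multi-valued.
    """
    options = {"es.resource": resource}
    if array_fields:
        # The setting takes a comma-separated list of field names.
        options["es.read.field.as.array.include"] = ",".join(array_fields)
    return options


# Hypothetical index and fields, one of them nested.
options = es_read_options("blog/posts", ["tags", "comments.votes"])
print(options)

# With PySpark and the connector on the classpath, the dict could be passed
# along the lines of:
#   sqlContext.read.format("org.elasticsearch.spark.sql").options(**options).load()
```

Keeping the options in a plain dict makes it easy to reuse the same declaration across several DataFrames.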

In addition, the connector configuration is now sanitized and safely passed throughout a Spark job. This addresses a subtle bug in which command-line-only properties were discarded during a job stage, causing abnormal behavior.

YARN Enhancements

The YARN module received a batch of updates: it has been upgraded to Elasticsearch 2.1.x and now allows JVM system properties to be passed directly to the child containers.

Repository HDFS



The repository HDFS plugin has seen a lot of activity. On Elasticsearch 2.0 and 2.1, the plugin currently requires the JVM security manager to be disabled, as Hadoop is significantly greedier than Elasticsearch itself in terms of permissions. Starting with Elasticsearch 2.2, thanks to the security improvements, the plugin can customize the grants for its own code base.

Please note that the migration of the plugin to Elasticsearch core has already started and is currently scheduled for Elasticsearch 2.3.

More about that in a future blog post!

Network improvements

The WAN/cloud feature has seen a lot of uptake, which exposed the connector to more varied network topologies and configurations. This led to a number of fixes in the way ES-Hadoop handles Elasticsearch clusters that use hostnames and IPs (typically with network publishing enabled) and the translation between the two. Overall, the connector now picks up more information about its environment, reducing the amount of extra configuration required from the user.
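For readers who have not used the WAN/cloud mode, a minimal configuration sketch in Python follows. The `es.nodes`, `es.port` and `es.nodes.wan.only` settings come from the ES-Hadoop configuration reference as we understand it; the helper name and the host name are hypothetical.

```python
def es_wan_options(nodes, port=9200):
    """Build connector options for a cluster reached over a WAN or cloud endpoint.

    In this mode the connector is expected to talk only to the declared
    nodes instead of discovering the rest of the cluster.
    """
    return {
        "es.nodes": ",".join(nodes),
        "es.port": str(port),
        "es.nodes.wan.only": "true",
    }


# Hypothetical public endpoint of a hosted cluster.
print(es_wan_options(["es.example.com"]))
```

The resulting dict can be merged with any other connector options before it is handed to the job configuration.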

Feedback

Please let us know what you think about RC1! We love to hear from you on GitHub, Twitter or the forums (IRC works too).

Looking forward to 2016!