27 August 2015 Engineering

Elasticsearch for Apache Hadoop 2.2.0-m1 and 2.1.1

By Costin Leau

Hot on the heels of the Elasticsearch 2.0.0-beta1 release (check it out!), we are pleased to announce two releases of Elasticsearch for Apache Hadoop, or simply ES-Hadoop: 2.2.0-m1 and 2.1.1.

ES-Hadoop 2.2.0-m1

This first milestone in the 2.2 branch introduces several new features alongside bug fixes:

  • Elasticsearch 2.0.0-beta1 support

Want to use the new goodies in Elasticsearch 2.0 in Spark or Hadoop? That’s what ES-Hadoop is here for!
Note that compatibility with Elasticsearch 1.x is preserved in ES-Hadoop 2.2.x.

  • Classpath duplication detection

At start-up, ES-Hadoop checks whether there’s another version used in the job and properly informs the user. This helps the not-so-uncommon case of multiple versions being picked arbitrarily from different sources (task vs jar vs runtime vs Java classpath).
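
One common way to implement such a check on the JVM is to ask the class loader for every copy of a well-known marker resource and warn if more than one comes back. The sketch below illustrates that idea only; it is not ES-Hadoop's actual code. The class name `ClasspathDuplicateCheck`, its methods and the default marker are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, for illustration only.
final class ClasspathDuplicateCheck {

    /** Returns every distinct classpath location that provides the given resource. */
    static List<String> locationsOf(String resourceName, ClassLoader loader) {
        try {
            // Compare URLs by their string form; LinkedHashSet keeps discovery order.
            LinkedHashSet<String> unique = new LinkedHashSet<>();
            for (URL url : Collections.list(loader.getResources(resourceName))) {
                unique.add(url.toExternalForm());
            }
            return new ArrayList<>(unique);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a message for the user; more than one location means a duplicate. */
    static String describe(String resourceName, List<String> locations) {
        if (locations.size() <= 1) {
            return "OK: " + resourceName + " found in " + locations.size() + " location(s)";
        }
        return "WARNING: " + resourceName + " found in " + locations.size()
                + " locations: " + String.join(", ", locations);
    }

    public static void main(String[] args) {
        // Pass the path of a class file that only the library ships, e.g. "com/example/Marker.class".
        String marker = args.length > 0 ? args[0] : "java/lang/Object.class";
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        System.out.println(describe(marker, locationsOf(marker, loader)));
    }
}
```

Running a check like this once at start-up is cheap, and the warning lists the conflicting locations so the user can see which jar to remove.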

  • Detect (and use) dedicated date format libraries

While ES-Hadoop provides its own date formatting, it now detects whether a dedicated library (such as Joda-Time) is available and uses that instead, allowing richer formatting of dates. Future versions might be extended to support the new java.time package in Java 8.
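
Optional-dependency detection of this kind is typically done by probing for a class at runtime and falling back to a built-in implementation when it is missing. The sketch below is an assumption about how that can look, not ES-Hadoop's actual implementation. `DateFormatterDetection` and its methods are hypothetical, and Joda-Time is called via reflection so the example compiles without it.

```java
import java.lang.reflect.Method;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, for illustration only.
final class DateFormatterDetection {

    // A Joda-Time class used as the presence marker.
    static final String JODA_MARKER = "org.joda.time.format.ISODateTimeFormat";

    /** True if the named class can be loaded, i.e. the library is on the classpath. */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // false: do not run static initializers
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Built-in fallback: ISO 8601 in UTC using only the JDK. */
    static String formatBuiltIn(long epochMillis) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(epochMillis));
    }

    /** Calls ISODateTimeFormat.dateTime().withZoneUTC().print(millis) reflectively. */
    static String formatWithJoda(long epochMillis, ClassLoader loader) throws ReflectiveOperationException {
        Class<?> iso = Class.forName(JODA_MARKER, true, loader);
        Object formatter = iso.getMethod("dateTime").invoke(null);
        Object utc = formatter.getClass().getMethod("withZoneUTC").invoke(formatter);
        Method print = utc.getClass().getMethod("print", long.class);
        return (String) print.invoke(utc, epochMillis);
    }

    /** Prefers the dedicated library when detected, otherwise uses the built-in formatter. */
    static String format(long epochMillis, ClassLoader loader) {
        if (isPresent(JODA_MARKER, loader)) {
            try {
                return formatWithJoda(epochMillis, loader);
            } catch (ReflectiveOperationException e) {
                // Detected but unusable: fall through to the built-in formatter.
            }
        }
        return formatBuiltIn(epochMillis);
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        System.out.println("Joda-Time present: " + isPresent(JODA_MARKER, loader));
        System.out.println(format(System.currentTimeMillis(), loader));
    }
}
```

Because the probe is by class name, the library stays an optional dependency: nothing breaks when it is absent, and it is picked up automatically when the user adds it to the job.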

  • Upgrade to Gradle 2.6

Those building from source will notice that the build system has been upgraded to the latest stable Gradle version, resulting in some nice speed improvements and simplifications (such as being able to use Java 8 for building the Scala modules).

ES-Hadoop 2.1.1

Alongside the milestone, ES-Hadoop 2.1.1 was released.

It is the latest stable version of ES-Hadoop and contains important bug fixes and improvements such as:

  • Push-down Spark RDD count

The count operations in Elasticsearch Spark are now pushed down to Elasticsearch, resulting in an instant response no matter how big the esRDD is.
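
To make the idea of push-down concrete, here is a small conceptual sketch that contrasts counting by streaming every document to the client with asking the data source for the count directly. It does not use the Spark or ES-Hadoop APIs. `DocumentSource` and `InMemorySource` are hypothetical stand-ins invented for this example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Conceptual illustration only; none of these types exist in ES-Hadoop.
final class PushDownCountSketch {

    /** Hypothetical stand-in for a remote index. */
    interface DocumentSource {
        Iterator<String> scan(); // streams every document to the caller
        long count();            // answered by the source itself, no documents transferred
    }

    /** Naive count: pulls every document across just to count it. */
    static long countByScanning(DocumentSource source) {
        long total = 0;
        for (Iterator<String> it = source.scan(); it.hasNext(); it.next()) {
            total++;
        }
        return total;
    }

    /** Pushed-down count: delegates the work to the source. */
    static long countPushedDown(DocumentSource source) {
        return source.count();
    }

    /** In-memory source that records how many documents were handed to the caller. */
    static final class InMemorySource implements DocumentSource {
        private final List<String> docs;
        int documentsTransferred = 0;

        InMemorySource(String... docs) {
            this.docs = Arrays.asList(docs);
        }

        @Override
        public Iterator<String> scan() {
            final Iterator<String> inner = docs.iterator();
            return new Iterator<String>() {
                @Override public boolean hasNext() { return inner.hasNext(); }
                @Override public String next() { documentsTransferred++; return inner.next(); }
            };
        }

        @Override
        public long count() {
            return docs.size();
        }
    }

    public static void main(String[] args) {
        InMemorySource source = new InMemorySource("a", "b", "c");
        System.out.println("pushed down: " + countPushedDown(source)
                + ", transferred: " + source.documentsTransferred);
        System.out.println("scanned: " + countByScanning(source)
                + ", transferred: " + source.documentsTransferred);
    }
}
```

Both paths return the same number, but only the scanning path moves documents, which is why the pushed-down version does not slow down as the data set grows.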

  • Faster handling of unrecoverable network errors

Fatal network errors are detected early and reported better, simplifying diagnosis.

  • Restore Pig compatibility across Hadoop versions

A previous bug sometimes prevented Pig support from being used across Hadoop versions. This has now been addressed.

Feedback

We recommend that ES-Hadoop 2.0.x and 2.1.x users upgrade to 2.1.1, while those interested in trying out Elasticsearch 2.0.0-beta1 should take ES-Hadoop 2.2.0-m1 for a spin.

Let us know what you think on Twitter (@elastic) or on our forum. You can report any problems on the GitHub issues page.