Elasticsearch for Apache Hadoop 2.2.0-m1 and 2.1.1
This first milestone in the 2.2 branch introduces, in addition to bug fixes, several new features:
- Elasticsearch 2.0.0-beta1 support
Want to use the new goodies in Elasticsearch 2.0 in Spark or Hadoop? That’s what ES-Hadoop is here for!
Note that compatibility with Elasticsearch 1.x is preserved in ES-Hadoop 2.2.x.
- Classpath duplication detection
At start-up, ES-Hadoop checks whether more than one version of the library is present in the job and, if so, informs the user. This helps in the not-so-uncommon case of multiple versions being picked up arbitrarily from different sources (task vs jar vs runtime vs Java classpath).
- Detect (and use) dedicated date format libraries
While ES-Hadoop does provide date formatting of its own, it now detects whether a dedicated library (such as Joda-Time) is available and uses that instead, allowing richer formatting of dates. Future versions might be extended to support the new java.time package in Java 8.
- Upgrade to Gradle 2.6
Those building the source manually will notice that the build system has been upgraded to the latest stable version, resulting in some nice speed improvements and simplifications (such as being able to use Java 8 for building the Scala modules).
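For the curious, both start-up checks mentioned above (duplicate versions on the classpath and detection of an optional date library) come down to probing the classpath. The sketch below is only an illustration of that general technique using plain JDK calls; it is not ES-Hadoop's actual implementation, and the class name ClasspathProbe and its two helper methods are made up for this example. Joda-Time's org.joda.time.DateTime is used as the class to look for.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of ES-Hadoop: illustrates classpath probing only.
final class ClasspathProbe {

    // Returns every classpath location that provides the given resource,
    // e.g. "org/joda/time/DateTime.class". More than one entry suggests
    // that several copies (possibly different versions) are on the classpath.
    static List<URL> locationsOf(String resourcePath, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the named class can be loaded. The class is not
    // initialized, so the probe has no side effects beyond class loading.
    static boolean isAvailable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        String library = "org.joda.time.DateTime";

        // Optional-library detection: prefer the dedicated library when present.
        if (isAvailable(library, loader)) {
            System.out.println("Joda-Time found; date formatting could be delegated to it");
        } else {
            System.out.println("Joda-Time not found; falling back to built-in formatting");
        }

        // Duplicate detection: the same class file visible from several locations.
        List<URL> copies = locationsOf(library.replace('.', '/') + ".class", loader);
        if (copies.size() > 1) {
            System.err.println("Multiple copies detected on the classpath: " + copies);
        }
    }
}
```

In a real Hadoop or Spark job the relevant class loader is often not the system one (task, jar and runtime class loaders can differ), so a check like this would likely need to be run against each loader of interest.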
Alongside the milestone, ES-Hadoop 2.1.1 was released.
It is the latest stable version of ES-Hadoop and contains important bug fixes and improvements such as:
- Push-down Spark RDD count
count operations in Elasticsearch Spark are now pushed down to Elasticsearch, resulting in an instant response no matter how big the data set is.
- Faster handling of unrecoverable network errors
Fatal network errors are detected early and reported better, simplifying diagnosis.
- Restore Pig compatibility across Hadoop versions
A previous bug sometimes prevented Pig support from being used across Hadoop versions. This has now been addressed.