Elasticsearch for Apache Hadoop version 6.0.0-beta1

July 31, 2017

Tested against the latest Elasticsearch 6.0.0-beta1, the first beta of ES-Hadoop for 6.0.0 includes much-needed fixes and features to work in harmony with the many changes landing in Elasticsearch 6.0.

Please note that this is a beta release and that we do not recommend running it in production! See our Breaking Changes in 6.0 page for more information on what you might need to modify.

Breaking Changes

  • Elasticsearch on YARN integration has been removed #1001 #1027
  • Using Hadoop versions 1.x is deprecated as of 5.5.0 #1001
  • ES-Hadoop will support Scala 2.11 by default in 6.0.0 #1001

New Features

Serialization
  • Support Elasticsearch "join" types #1012

Enhancements

Spark
Scripting
  • Allow users to use file based scripts during updates #918
  • Support script id/file/inline options #538
REST
  • Remove uses of features deprecated in 5.0 #881

Bug Fixes

Serialization
  • Support nested collections of Java Bean classes in Spark #1021
  • Not able to use KryoSerializer for writing data into Elasticsearch #1019
  • JacksonJsonGenerator.getParentPath will always be empty #1006
Pig
  • Using es.mapping.exclude/include and still getting StrictDynamicMappingException on excluded fields #1015
Spark
  • Spark JavaRDD gives partial documents #946
REST
  • Remove unmatched format specifier from RestService #1029

Documentation

  • Unclear Docs (about multi-resource writes) #990