Major releases don’t come every day, which is why I am astonishingly excited to announce the release of Elasticsearch for Apache Hadoop (aka ES-Hadoop) 6.0.0, built and tested against the latest and greatest Elasticsearch 6.0.0. This release is the culmination of a monumental effort across Elastic as well as our awesome community. A special thank you to all who checked out the preview releases and provided invaluable feedback on them. And now, on to the shiny new stuff!
Spark 2.2.0 and Stable Support for Spark Structured Streaming
Spark 2.2.0 landed on July 11th and we wasted no time in making sure we work seamlessly with it. What’s with all the excitement? Why, Structured Streaming is no longer an “Experimental” feature in this release! This means that we’re treating our Structured Streaming integration in ES-Hadoop as an evolving integration as of this release. Please note that because Structured Streaming was experimental in earlier Spark versions, we will only be supporting our Structured Streaming integration on Spark versions 2.2.0 and above. Don’t fret though - this doesn’t impact our existing Spark integrations at all.
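To give a feel for the integration, here is a minimal PySpark sketch of starting a streaming write to Elasticsearch. It assumes the ES-Hadoop jar is on the Spark classpath and that the sink is selected with the "es" format name; the helper name start_es_sink, the index name and the checkpoint path are made up for illustration and are not part of the connector.

```python
def start_es_sink(streaming_df, resource, checkpoint_dir, es_nodes="localhost:9200"):
    """Start a Structured Streaming query that writes to Elasticsearch.

    streaming_df   -- a streaming DataFrame, e.g. one built from spark.readStream
    resource       -- target index/type such as "spark/people" (hypothetical name)
    checkpoint_dir -- directory Spark uses to track streaming progress
    es_nodes       -- Elasticsearch node(s) to connect to (assumed local default)
    """
    return (
        streaming_df.writeStream
        .format("es")                                  # assumed ES-Hadoop sink name
        .option("checkpointLocation", checkpoint_dir)  # standard Spark streaming option
        .option("es.nodes", es_nodes)                  # ES-Hadoop connection setting
        .start(resource)                               # returns the running query
    )
```

A call such as start_es_sink(people_stream, "spark/people", "/tmp/es-checkpoint") would return a streaming query handle that can later be stopped or awaited, provided Spark 2.2.0 or above and a reachable Elasticsearch node.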
RIP Elasticsearch on YARN
The Elasticsearch on Apache YARN (ES-on-YARN) beta integration has been removed in this release. ES-on-YARN was an experiment for deploying Elasticsearch on top of Hadoop’s YARN cluster resource negotiator. The project was never recommended for production use and has been in perpetual beta status since its inception. The core limitation of the project has been YARN’s lack of formal support for long-running services, which is a requirement for Elasticsearch to achieve production-level stability. The ecosystem around long-running services in YARN has improved since the start of the beta, but much of the improvement is based on systems that sit on top of YARN, like Apache Slider. These systems are still fairly young and would require quite a bit of work to migrate toward. With all this in mind, we have decided to cease development of the ES-on-YARN project. We’re always eager to hear your feedback, so if you have any about ES-on-YARN, make it known on the GitHub issue.
Have no fear though. When one door closes, another one opens: for users looking to easily orchestrate and manage a fleet of Elasticsearch clusters, either on-prem or in the cloud, Elastic Cloud Enterprise is the recommended solution.
Support for new Join Fields
The days are numbered for multi-typed indices in Elasticsearch. Users who work with parent-child data need not worry about the future, thanks to the new “join” field type in Elasticsearch. This release adds support for reading and writing data with this new field type. We’re excited to hear your feedback on this new feature!
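For readers who have not met the join field yet, the sketch below builds the JSON shapes involved using only the Python standard library: a mapping that declares one parent/child relation, plus one parent and one child document. The field name my_join_field, the type name doc, the question/answer relation and the helper functions are all illustrative assumptions, not names used by ES-Hadoop.

```python
import json

JOIN_FIELD = "my_join_field"  # hypothetical name for the join field


def join_mapping(parent, child, doc_type="doc"):
    """Index mapping declaring a single parent -> child relation."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    JOIN_FIELD: {"type": "join", "relations": {parent: child}}
                }
            }
        }
    }


def parent_doc(relation, **source):
    """A parent document only names its side of the relation."""
    return {**source, JOIN_FIELD: {"name": relation}}


def child_doc(relation, parent_id, **source):
    """A child document names its relation and the id of its parent."""
    return {**source, JOIN_FIELD: {"name": relation, "parent": parent_id}}


if __name__ == "__main__":
    print(json.dumps(join_mapping("question", "answer"), indent=2))
    print(json.dumps(parent_doc("question", text="How do join fields work?")))
    print(json.dumps(child_doc("answer", "1", text="Like parent-child, in one type.")))
```

Keep in mind that Elasticsearch expects a child document to be routed to the same shard as its parent. On the connector side, the setting that names the join field is, to the best of our understanding, es.mapping.join; check the configuration reference for your version before relying on it.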
Multiple Mappings and Multiple Index Reads
We took a long hard look at how we handle Elasticsearch mappings in the connector. After that long hard look we re-wrote a healthy chunk of code to fix an unhealthy bunch of problems. In this release you will no longer be bitten by common errors when reading from multiple indices (each with varying field types). ES-Hadoop will also alert you when the indices you’re reading from have conflicting mappings in them.
Check Out Our Bug Collection
Nested Java Bean serialization problems, field exclusion problems on Pig and SparkSQL, partial document reads and serialization exceptions, parsing errors from index auto-creation, backwards compatibility errors with scroll IDs, missing support for timestamps in params, and much more are all fixed in this release. Take a look at all of the items that have been spruced up in this release!