IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

›

Appendix C: Breaking Changes

edit

A newer version is available. Check out the latest documentation.

Appendix C: Breaking Changes

edit

For clarity, we always list any breaking changes at the top of the release notes for each version.

Deprecations in 8.18

edit

The following functionality has been deprecated in elasticsearch-hadoop 8.18 and will be removed in a future version. While this won’t have an immediate impact on your applications, we strongly encourage you take the described steps to update your code after upgrading to 8.18.

Spark 2.x support is deprecated

edit

Spark 2.x is no longer maintained. Spark 3 is still supported.

Breaking Changes in 8.9

edit

Removal of supported technologies in 8.9

edit

Support for the following less popular technologies has been removed in elasticsearch-hadoop 8.9.

Spark 1.x (Spark 2.x and 3.x are still supported)
Scala 2.10 support for Spark 2.x (Scala 2.11 and higher are still supported)
Apache Pig
Apache Storm

Breaking Changes in 8.0

edit

The default scroll size is now `1000`

edit

We’ve increased the es.scroll.size default from 50 to 1000. If you deploy ES-hadoop in a low memory environment, consider lowering es.scroll.size to avoid issues.

Deprecations in 8.0

edit

The following functionality has been deprecated in elasticsearch-hadoop 8.0 and will be removed in a future version. While this won’t have an immediate impact on your applications, we strongly encourage you take the described steps to update your code after upgrading to 8.0.

Spark 1.x support is deprecated

edit

Spark 1.x is no longer maintained. Spark 2 and Spark 3 are still supported.

Scala 2.10 support for Spark 2.x is deprecated

edit

Spark deprecated support for Scala 2.10 in the Spark 2.0.0 release and removed it in the 2.3.0 release. Scala 2.11 and 2.12 are supported for Spark 2.x.

Hadoop 1.x support is deprecated

edit

Hadoop 1.x is no longer maintained. Support for it has not been tested since elasticsearch-hadoop 6.0. Support is now formally deprecated. Hadoop 2 and Hadoop 3 are supported.

Apache Pig support is deprecated

edit

Apache Pig is no longer maintained.

Apache Storm support is deprecated

edit

Apache Storm has not been a popular Elasticsearch integration.

Breaking Changes in 7.0

edit

Cascading Integration has been removed

edit

Each integration that we support incurs a cost for testing and maintenance as new features are added in ES-Hadoop. We’ve seen the usage statistics dropping for the Cascading integration over the last couple of years. These include download numbers, community posts, and open issues. Additionally, ES-Hadoop only supports the Cascading 2 line. Cascading is already into it’s 3.X line and we haven’t seen much interest or pressure to get on to any of the newer versions. Due to these factors, Cascading was deprecated in the 6.6 release, and has been removed in the 7.0.0 release.

Java 8 or higher now required

edit

ES-Hadoop 7.0.0 and above requires a minimum Java version of 1.8 or above. The library will no longer be compiled for Java 1.6 or Java 1.7.

Configuration Changes

edit

es.input.use.sliced.partitions has been removed.

Breaking Changes in 6.0

edit

This section discusses the changes that you should be aware of if you upgrade ES-Hadoop from version 5.x to 6.x.

Configuration Changes

edit

es.input.use.sliced.partitions is deprecated in 6.5.0, and will be removed in 7.0.0. The default value for es.input.max.docs.per.partition (100000) will also be removed in 7.0.0, thus disabling sliced scrolls by default, and switching them to be an explicitly opt-in feature.

es.update.script has been deprecated in favor of the new es.update.script. inline option. The es.update.script configuration will continue to function as a legacy configuration with the same meaning as es.update.script.inline.

Pattern Formatting in Index and Type Names

edit

Coming soon in 6.0 is the removal of multiple types per index. All project tests were migrated to using single types where possible. This migration unearthed a problem with Pig and Cascading in regards to how those frameworks interact with the provided index/type names. During setup, Pig and Cascading attempt to parse these index names as paths, and when the index has a formatted pattern in it (like index-{event:yyyy.MM.dd}/type) the path parsing breaks. Going forward, the colon character : that delimits the field name from the format will be changed to the pipe character |. An example of the new patterned index names is index-{event|yyyy.MM.dd}/type. 5.x will support both separators, but will log a deprecation message when a colon : is encountered.

TTL and Timestamp Meta-fields

edit

Elasticsearch 6.0 is seeing the removal of the TTL and Timestamp meta-fields from being allowed on document index and update requests. Because of this, we will be removing the ability to specify them in 6.0 and deprecating their usage in 5.x.

Hadoop 1.x Support

edit

As of 6.0.0 we will no longer be testing on Hadoop 1.x versions. These versions have been considered end of life for quite a while. While the connector may continue to work for those versions, we no longer officially support them. Use these versions at your own risk!

YARN Integration Discontinued

edit

Due to the limitations in YARN for support of long running processes, we have decided to cease development and discontinue the YARN integration in ES-Hadoop. We thank the community for participating in the beta. If you have any questions or concerns about this, please feel free to discuss them on our forums.

Default Supported Scala Version is now 2.11

edit

In 6.0.0, our ES-Hadoop jar will ship with Spark compiled against Scala 2.11 by default. We will continue to support Scala 2.10 in compatibility artifacts, but the main project artifact will target Scala 2.11 by default.

Breaking Changes in 5.0

edit

This section discusses the changes that you should be aware of if you upgrade ES-Hadoop from version 2.x to 5.x.

Supported Hadoop Versions

edit

ES-Hadoop 5.0 has added support for new versions of software from the Hadoop ecosystem. Support for older versions is normally dropped during these cycles for the purposes of stability and to allow for cleanup of older code. Please note the following version compatibility changes in 5.0:

Support for Hive 0.13 and 0.14 has been removed. These versions of Hive are known to have serious issues that have since been repaired in 1.0. Hive 1.0 has been released for a while and a majority of distributions have already switched to it. Hive 1.0 will continue to be supported by ES-Hadoop 5.0.
With the addition of support for Storm 1.x, support for Storm 0.9 has been removed due to backwards compatibility issues.
Support for SparkSQL 1.0-1.2 has been removed. SparkSQL was originally released in Spark 1.0-1.2 as an alpha, but has since become stable in Spark 1.3 with a drastically changed API. Additionally, we have added support for Spark 2.0, which is binary-incompatible with previous versions of Spark. Instead of supporting three different version compatibilities of Spark at the same time we have decided to drop support for Spark SQL 1.0-1.2.

Names of Included Spark Jars in ES-Hadoop

edit

With the removal of support for Spark 1.0-1.2 and the addition of support for Spark 2.0, the names of all the Spark artifacts included in ES-Hadoop 5.0 have changed. All Spark artifacts in this release now have both of their Spark and Scala versions explicitly demarcated in their names (instead of just Scala and sometimes Spark).

Table1: Spark Jar Name Changes from 2.x to 5.x

Spark Version	Scala Version	ES-Hadoop 2.x name	ES-Hadoop 5.x Name
1.0 - 1.2	2.10	elasticsearch-spark-1.2_2.10-2.4.0.jar	<removed>
1.0 - 1.2	2.11	elasticsearch-spark-1.2_2.11-2.4.0.jar	<removed>
1.3 - 1.6	2.10	elasticsearch-spark_2.10-2.4.0.jar	elasticsearch-spark-13_2.10-5.0.0.jar
1.3 - 1.6	2.11	elasticsearch-spark_2.11-2.4.0.jar	elasticsearch-spark-13_2.11-5.0.0.jar
2.0+	2.10	<N/A>	elasticsearch-spark-20_2.10-5.0.0.jar
2.0+	2.11	<N/A>	elasticsearch-spark-20_2.11-5.0.0.jar

HDFS Repository

edit

In 5.0, the HDFS Repository plugin has been all but re-built from the ground up to allow for compatibility with the changes to security policies in Elasticsearch’s plugin framework.

Code has Moved

edit

In 5.0, the Repository HDFS plugin has been moved to the main Elasticsearch project. The documentation page in the elasticsearch-hadoop repository has been updated with a header signaling this move.

Security

edit

Disabling the Java SecurityManager in Elasticsearch is no longer required for the HDFS Repository plugin to function. Elasticsearch 5.0 requires all plugins to operate properly with the configured SecurityManager. The plugin was heavily modified to allow for compatibility with this new constraint. This should allow you to maintain a secured Elasticsearch instance while still using HDFS as a location for storing snapshots.

Changes in Configurations

edit

Due to constraints in the underlying security system as well as changes to the way the plugin functions, the following configurations have been removed in 5.0 with no replacement options:

concurrent_streams
user_keytab
user_principal
user_principal_hostname

Supported Versions

edit

Previously, the HDFS Repository supported both Apache Hadoop 1.x (default) and Apache Hadoop 2.x through two distributions. In 5.0, there is now only one distribution which is built against the latest Apache Hadoop 2.x (at this time 2.7.1). The distribution for Apache Hadoop 1.x has been removed.

Version `light` removed

edit

Even if Hadoop is already installed on the Elasticsearch nodes, for security reasons, the required libraries need to be placed under the plugin folder. Because of this, the light distribution of the repository plugin which contained no Hadoop client dependencies is no longer available in 5.0.

Strict Query Parsing

edit

In previous versions, users were able to specify options that modify search properties in Query DSL strings provided to the client. In some cases these properties would conflict with how the framework executed searches during read operations. In 5.0, when specifying a Query DSL string, if a query field is present, its contents are extracted and all other contents are discarded (such as source or size). If there is no query field, the entire text is nested inside of the query field during execution.

« Appendix B: License Release Notes »

Appendix C: Breaking Changes

Appendix C: Breaking Changes

Deprecations in 8.18

Spark 2.x support is deprecated

Breaking Changes in 8.9

Removal of supported technologies in 8.9

Breaking Changes in 8.0

The default scroll size is now 1000

Deprecations in 8.0

Spark 1.x support is deprecated

Scala 2.10 support for Spark 2.x is deprecated

Hadoop 1.x support is deprecated

Apache Pig support is deprecated

Apache Storm support is deprecated

Breaking Changes in 7.0

Cascading Integration has been removed

Java 8 or higher now required

Configuration Changes

Breaking Changes in 6.0

Configuration Changes

Pattern Formatting in Index and Type Names

TTL and Timestamp Meta-fields

Hadoop 1.x Support

YARN Integration Discontinued

Default Supported Scala Version is now 2.11

Breaking Changes in 5.0

Supported Hadoop Versions

Names of Included Spark Jars in ES-Hadoop

HDFS Repository

Code has Moved

Security

Changes in Configurations

Supported Versions

Version light removed

Strict Query Parsing

The default scroll size is now `1000`

Version `light` removed