elasticsearch-hadoop binaries can be obtained either by downloading them from the elastic.co site as a ZIP (containing project jars, sources and documentation) or by using any Maven-compatible tool with the following dependency:
<dependency> <groupId>org.elasticsearch</groupId> <artifactId>elasticsearch-hadoop</artifactId> <version>7.0.0-alpha1</version> </dependency>
The jar above contains all the features of elasticsearch-hadoop and does not require any other dependencies at runtime; in other words it can be used as is.
In addition to the uber jar, elasticsearch-hadoop provides minimalistic jars for each integration, tailored for those who use just one module (in all other situations the
uber jar is recommended); the jars are smaller in size and use a dedicated pom, covering only the needed dependencies.
These are available under the same
groupId, using an
artifactId with the pattern
spark artifact. Notice the
The Spark connector framework is the most sensitive to version incompatibilities. For your convenience, a version compatibility matrix has been provided below:
|Spark Version||Scala Version||ES-Hadoop Artifact ID|
1.0 - 1.2
1.0 - 1.2
1.3 - 1.6
1.3 - 1.6
Note that Cascading itself is not available in Maven central but rather in its own repo conjars.org. Make sure to add this repository to your build configuration in order for the Cascading dependencies to be properly resolved:
<repositories> <repository> <id>conjars.org</id> <url>http://conjars.org/repo</url> </repository> </repositories>
Releases are available in the central Maven repository.
Development (or nightly or snapshots) builds are published daily at sonatype-oss repository (see below). Make sure to use snapshot versioning:
but also enable the dedicated snapshots repository :
Elasticsearch for Apache Hadoop is a client library for Elasticsearch, albeit one with extended functionality for supporting operations on Hadoop/Spark. When upgrading Hadoop/Spark versions, it is best to check to make sure that your new versions are supported by the connector, upgrading your elasticsearch-hadoop version as appropriate.
Elasticsearch for Apache Hadoop maintains backwards compatibility with the most recent minor version of Elasticsearch’s previous major release (5.X supports back to 2.4.X, 6.X supports back to 5.6.X, etc…). When you are upgrading your version of Elasticsearch, it is best to upgrade elasticsearch-hadoop to the new version (or higher) first. The new elasticsearch-hadoop version should continue to work for your previous Elasticsearch version, allowing you to upgrade as normal.
Elasticsearch for Apache Hadoop does not support rolling upgrades well. During a rolling upgrade, nodes that elasticsearch-hadoop is communicating with will be regularly disappearing and coming back online. Due to the constant connection failures that elasticsearch-hadoop will experience during the time frame of a rolling upgrade there is high probability that your jobs will fail. Thus, it is recommended that you disable any elasticsearch-hadoop based write or read jobs against Elasticsearch during your rolling upgrade process.