Elasticsearch for Apache Hadoopedit

Elasticsearch for Apache Hadoop is an open-source, stand-alone, self-contained, small library that allows Hadoop jobs (whether using Map/Reduce or libraries built upon it such as Hive, Pig or Cascading or new upcoming libraries like Apache Spark ) to interact with Elasticsearch. One can think of it as a connector that allows data to flow bi-directionaly so that applications can leverage transparently the Elasticsearch engine capabilities to significantly enrich their capabilities and increase the performance.

Elasticsearch for Apache Hadoop offers first-class support for vanilla Map/Reduce, Cascading, Pig and Hive so that using Elasticsearch is literally like using resources within the Hadoop cluster. As such, Elasticsearch for Apache Hadoop is a passive component, allowing Hadoop jobs to use it as a library and interact with Elasticsearch through Elasticsearch for Apache Hadoop APIs.

While the official name of the project is Elasticsearch for Apache Hadoop throughout the documentation the term elasticsearch-hadoop will be used instead to increase readability.

Tip

If you are looking for Elasticsearch HDFS Snapshot/Restore plugin (a separate project), please refer to its home page.

Tip

If you are looking for Elasticsearch on YARN (a separate project), please refer to its dedicated section.