WARNING: Version 2.0 has passed its EOL date.

This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.

Setup

IMPORTANT: This documentation is no longer updated. Refer to Elastic's version policy and the latest documentation.

Elasticsearch for Apache Hadoop is an open-source, stand-alone, self-contained, small library that allows Hadoop jobs (whether using Map/Reduce or libraries built upon it such as Hive, Pig or Cascading) to interact with Elasticsearch. Data flows bi-directionally, so applications can transparently use the Elasticsearch engine to enrich their capabilities and improve performance. Elasticsearch for Apache Hadoop offers first-class support for vanilla Map/Reduce, Cascading, Pig and Hive, so that using Elasticsearch feels like using resources within the Hadoop cluster.

While the official name of the project is Elasticsearch for Apache Hadoop, throughout the documentation the term elasticsearch-hadoop is used instead to improve readability.

This document assumes the reader already has a basic familiarity with Elasticsearch and Hadoop concepts; see the Appendix A, Resources section for more information.

While every effort has been made to ensure that this documentation is comprehensive and error-free, some topics might need more explanation and some typos might have crept in. If you spot any mistakes, or more serious errors, and have a few cycles during lunch, please bring them to the attention of the elasticsearch-hadoop team by raising an issue or contacting us.

Thank you.

If you are looking for the Elasticsearch HDFS Snapshot/Restore plugin (a separate project), please refer to its home page.
