IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

› ›

Key features

edit

IMPORTANT: This documentation is no longer updated. Refer to Elastic's version policy and the latest documentation.

Key features

edit

The key features of Elasticsearch for Apache Hadoop include:

Scalable Map/Reduce model: elasticsearch-hadoop is built around Map/Reduce: every operation done in elasticsearch-hadoop results in multiple Hadoop tasks (based on the number of target shards) that interact, in parallel with Elasticsearch.
REST based: elasticsearch-hadoop uses Elasticsearch REST interface for communication, allowing for flexible deployments by minimizing the number of ports needed to be open within a network.
Self contained: the library has been designed to be small and efficient. At around 300KB and no extra dependencies outside Hadoop itself, distributing elasticsearch-hadoop within your cluster is simple and fast.
Universal jar: whether you are using vanilla Apache Hadoop or a certain distro, the same elasticsearch-hadoop jar works transparently across all of them.
Memory and I/O efficient: elasticsearch-hadoop is focused on performance. From pull-based parsing, to bulk updates and direct conversion to/of native types, elasticsearch-hadoop keeps its memory and network I/O usage finely-tuned.
Adaptive I/O: elasticsearch-hadoop detects transport errors and retries automatically. If the Elasticsearch node died, re-routes the request to the available nodes (which are discovered automatically). Additionally, if Elasticsearch is overloaded, elasticsearch-hadoop detects the data rejected and resents it, until it is either processed or the user-defined policy applies.
Facilitates data co-location: elasticsearch-hadoop fully integrates with Hadoop exposing its network access information, allowing co-located Elasticsearch and Hadoop clusters to be aware of each other and reduce network IO.
Map/Reduce API support: At its core, elasticsearch-hadoop uses the low-level Map/Reduce API to read and write data to Elasticsearch allowing for maximum integration flexibility and performance.
old(mapred) & new(mapreduce) Map/Reduce APIs supported: elasticsearch-hadoop automatically adjusts to your environment; one does not have to change between using the mapred or mapreduce APIs - both are supported, by the same classes, at the same time.
Apache Hive support: Run Hive queries against Elasticsearch for advanced analystics and real_time responses. elasticsearch-hadoop exposes Elasticsearch as a Hive table so your scripts can crunch through data faster then ever.
Apache Pig support: elasticsearch-hadoop supports Apache Pig exposing Elasticsearch as a native Pig Storage. Run your Pig scripts against Elasticsearch without any modifications to your configuration or the Pig client.
Cascading support: Cascading is an application framework for Java developers to simply develop robust applications on Apache Hadoop. And with elasticsearch-hadoop, Cascading can run its flows directly onto Elasticsearch.
Apache Spark: Run fast transformations directly against Elasticsearch, either by streaming data or indexing arbitrary RDDs. Available in both Java and Scala flavors.
Apache Storm: elasticsearch-hadoop supports Apache Storm exposing Elasticsearch as both a Spout (source) or a Bolt (sink).

« Documentation sections Requirements »