Elasticsearch for Apache Hadoop

Immediate Insight Into Your Big Data

Elasticsearch for Apache Hadoop, also known as es-hadoop, is a two-way connector that lets you read data from and write data to Elasticsearch and query it in real time, so you can use Hadoop and Elasticsearch together.


Elasticsearch for Apache Hadoop 2.4.0

Installation Steps

  • Download and unpack the latest Elasticsearch for Apache Hadoop distribution

  • Add the es-hadoop jar to the classpath of your Hadoop jobs

  • Read from and write to Elasticsearch from those jobs

    Simple and friendly start up. No high barriers or infrastructure needed.
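Once the jar is on the classpath, a job is pointed at Elasticsearch through a handful of connector settings. The following is a minimal sketch of how those settings might be assembled, not part of es-hadoop itself: the helper name `es_hadoop_conf`, the host `localhost` and the resource `radio/artists` are placeholders chosen for illustration, while `es.nodes`, `es.port` and `es.resource` are the connector's configuration keys.

```python
def es_hadoop_conf(nodes, resource, port=9200, extra=None):
    """Build a dict of es-hadoop settings (hypothetical helper).

    nodes    -- list of Elasticsearch host names
    resource -- target in "index/type" form
    port     -- HTTP port of the Elasticsearch nodes
    extra    -- optional dict of additional es.* settings
    """
    if not nodes:
        raise ValueError("at least one Elasticsearch node is required")
    conf = {
        "es.nodes": ",".join(nodes),   # comma-separated host list
        "es.port": str(port),          # settings are passed as strings
        "es.resource": resource,       # where to read from / write to
    }
    conf.update(extra or {})
    return conf


if __name__ == "__main__":
    # Placeholder values; replace with your own cluster and index.
    for key, value in sorted(es_hadoop_conf(["localhost"], "radio/artists").items()):
        print(key, "=", value)
```

The resulting key/value pairs would then be set on the job configuration of whichever integration you use (Map/Reduce, Hive, Pig or Spark).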

Enhancements

  • Spark:
    • Explicitly specify buildScan return type #826

Bug fixes

  • Rest:
    • Elasticsearch-spark connector failing to save data with an Illegal Argument Exception : "No class name given" #837
    • Fix memory leak caused by re-wrapping DelegatedProtocol #823
    • Fixed a bug in the Resource URI Query parsing
  • Spark:
    • Fix: Non empty BINARY fields are considered empty #834
    • Spark Datasource: Non empty BINARY fields are always considered empty #835

Non Issue

  • Hive:
    • Error Loading Data #839
  • Pig:
    • Get fields with documents in Elasticsearch #825

Elasticsearch for Apache Hadoop 5.0.0-beta1

New features

  • MR:
    • Integrating with ES ingest node pipeline #806
  • Mapping:
    • Add support for 16-bit/half floats in ES 5.x #800
    • New ES type scaled_float #822
  • Spark:
    • Provided dedicated support for Spark 1.3-1.6 DStreams #802
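
To make the mapping and ingest items above more concrete, here is a small sketch under stated assumptions. The type name `reading`, the field names `temperature` and `price`, and the pipeline name `my-pipeline` are hypothetical. `half_float` and `scaled_float` are the Elasticsearch 5.x numeric types referred to above (`scaled_float` requires a `scaling_factor`), and `es.ingest.pipeline` is assumed to be the connector setting that names the target ingest pipeline.

```python
import json


def numeric_mapping(type_name):
    """Return an ES 5.x index body that uses the two new numeric types."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    # 16-bit half-precision floating point value
                    "temperature": {"type": "half_float"},
                    # stored internally as a long: value * scaling_factor
                    "price": {"type": "scaled_float", "scaling_factor": 100},
                }
            }
        }
    }


# Connector setting that routes indexed documents through an ingest pipeline.
# The pipeline name is a placeholder and must already exist in Elasticsearch.
ingest_conf = {"es.ingest.pipeline": "my-pipeline"}


if __name__ == "__main__":
    print(json.dumps(numeric_mapping("reading"), indent=2))
```

An index created with such a body can be read and written by the connector like any other; the ingest setting would be added alongside the usual `es.nodes` and `es.resource` entries.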

Enhancements

  • Spark:
    • Improve ElasticsearchRelation extensibility #829
    • Explicitly specify buildScan return type #826

Bug fixes

  • Rest:
    • Update/Upsert bulk actions with scripts are broken for ES 1.x as of 5.0.0-alpha5 #817
    • Shard size estimations for Slice API do not target shards #843
    • Elasticsearch-spark connector failing to save data with an Illegal Argument Exception : "No class name given" #837
    • Spark Datasource: Non empty BINARY fields are always considered empty #835
    • Fix memory leak caused by re-wrapping DelegatedProtocol #823
  • Spark:
    • Dropping a whole column of a DataFrame while indexing the DF into ES #841
    • Fix: Non empty BINARY fields are considered empty #834
    • Version compatibility detection wrong in elasticsearch-spark-20 5.0.0-alpha5 #824

Non Issue

  • Hive:
    • Error Loading Data #839
  • Pig:
    • Get fields with documents in Elasticsearch #825
  • Rest:
    • Some way to query "total" field from result set? #830
  • Spark:
    • Array types not supported in automatic mapping #847
    • Connection error with Elasticsearch 2.4.0 #846
    • Handling decimal type in dataset #842
    • Fix import of JavaEsSparkSQL #840
    • PROBLEM: Failed to write data back to ES by using Spark DataFrame save() API !! #836