Elasticsearch for Apache Hadoop

Immediate Insight Into Your Big Data

Elasticsearch for Apache Hadoop, also known as es-hadoop, is a two-way connector that lets you read data from and write data to Elasticsearch and query it in real time, so you can use Hadoop and Elasticsearch together.


Elasticsearch for Apache Hadoop 2.4.0

Installation Steps

  • Download and unpack the latest Elasticsearch for Apache Hadoop distribution

  • Add the es-hadoop jar to your Hadoop job's classpath

  • Interact seamlessly with Elasticsearch for Hadoop

    Simple and friendly startup, with no extra infrastructure needed.
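
Once the jar is on the classpath, a job mainly needs to know where Elasticsearch is and which index to use. The sketch below collects those settings in a plain dictionary. The keys `es.nodes`, `es.port`, and `es.resource` are es-hadoop settings. The helper name `es_hadoop_conf`, the host `es-host.example`, and the resource `logs/entry` are made up for illustration.

```python
# Minimal sketch, not an official API: gather the es-hadoop connection
# settings a job would typically pass to the connector.
def es_hadoop_conf(nodes="localhost", port=9200, resource="my-index/my-type"):
    """Return es-hadoop settings as a dict of strings (hypothetical helper)."""
    return {
        "es.nodes": nodes,        # Elasticsearch host(s) to connect to
        "es.port": str(port),     # HTTP/REST port, kept as a string
        "es.resource": resource,  # target index/type to read from or write to
    }


# Hypothetical host and resource names, used only for illustration.
conf = es_hadoop_conf(nodes="es-host.example", resource="logs/entry")
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

A dictionary like this is what you would hand to the connector, for example as Hadoop job properties or as Spark data source options. The exact mechanism depends on the integration you use.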


  • Spark:
    • Explicitly specify buildScan return type #826

Bug fixes

  • Rest:
    • Elasticsearch-spark connector failing to save data with an Illegal Argument Exception : "No class name given" #837
    • Fix memory leak caused by re-wrapping DelegatedProtocol #823
    • Fixed a bug in the Resource URI Query parsing
  • Spark:
    • Fix: Non empty BINARY fields are considered empty #834
    • Spark Datasource: Non empty BINARY fields are always considered empty #835

Non Issue

  • Hive:
    • Error Loading Data #839
  • Pig:
    • Get fields with documents in Elasticsearch #825

Elasticsearch for Apache Hadoop 5.0.0-rc1


  • Rest:
    • Expire Unused Pooled Transports in Spark Streaming #849


  • Rest:
    • Documentation: the setting is es.read.field.as.array.include, not es.field.read.as.array.include #860
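
The two names in the documentation fix above are easy to confuse. As a rough illustration, a job's settings could be checked for the misordered key before they reach the connector. The helper `fix_array_include` and the sample field name `tags` are hypothetical and not part of es-hadoop.

```python
# Minimal sketch: rename the misordered key to the setting name that
# issue #860 identifies as correct.
CORRECT = "es.read.field.as.array.include"
MISNAMED = "es.field.read.as.array.include"


def fix_array_include(conf):
    """Return a copy of conf with the misnamed key moved to the correct one."""
    fixed = dict(conf)
    if MISNAMED in fixed:
        value = fixed.pop(MISNAMED)
        # Keep an existing correctly named entry if one is already present.
        fixed.setdefault(CORRECT, value)
    return fixed


# "tags" is a made-up field name for illustration.
settings = fix_array_include({MISNAMED: "tags", "es.nodes": "localhost"})
print(settings)
```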


  • Pig:
    • Store Map as object with Pig in Elasticsearch #848
  • Spark:
    • pyspark fails to write with 5.0.0-beta1 #852
    • Class Not Found Exception #863
    • java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class #862