Elasticsearch for Apache Hadoop 2.1.0.Beta4


See issues on GitHub

Release Notes

Breaking changes:

  • Upgrade integration with Spark 1.3.0 #400

New features:

  • Introduce configuration for returning `Date` objects as primitives #422
  • Create Scala 2.11 artifact in addition to Scala 2.10 #376
  • Allow client-only routing #375
  • [Spark] Is there a way to make elasticsearch-hadoop stick to the load balancer (client) instead of trying to ping all the data nodes? #373
  • Support serialization of case classes and java beans #365
  • Allow per-doc metadata to be specified at runtime #358
  • Add support for the newly introduced `sources` in Spark 1.2 #350
  • elasticsearch-hive0.14 issue #333
  • Feature - Setting ids in spark w/o using a Map #255

Enhancements:

  • Cannot set es-resource properties from the command-line #434
  • Add warning when running Spark 1.2 jar against Spark 1.3 and vice versa #415
  • Add warnings when invoking saveToEs on SchemaRDD but not using the sql package #414
  • Upgrade to Hive 1.1 #413
  • Exception in SparkSQL when es.read.metadata=true #408
  • Keep date formatting behavior aligned with ES API for spark serialization #397
  • Added ability to exclude fields from the es mapping via 'es.mapping.exclude' config to ScalaValueWriter and SchemaRDDValueWriter #391
  • Saving case class with mapping id error #384
  • Precise message for ValueWriter #370
  • Hadoop tries to connect to an Elasticsearch non-data node and fails #368
  • Upgrade to Spark 1.2 #347
  • TaskAttemptId string is not properly formed #346
  • Spark SQL can't INSERT into an ES table #330
  • Hive Column Comments causing Hive Query to fail #322
  • Allow the source document to be returned as is #232
  • Allow fields in a doc to be excluded/included #230
  • Exclude master/client nodes from data requests #214
  • org.apache.calcite#calcite-core;0.9.2-incubating-SNAPSHOT: not found #337

Bug fixes:

  • Enforce time zone for index formatting when none is specified #435
  • Failed unit test DateIndexFormatterTest #433
  • HeartBeat vs. mapreduce.task.timeout doesn't consider "0 == infinite" case #426
  • SchemaRDD seems to be lost when loading parquet files #403
  • [spark] esRDD doesn't respect config setting "es.field.read.empty.as.null" #402
  • Elasticsearch and Hive integration on YARN #393
  • Invalid position given exception #386
  • EsSpark.esJsonRDD error #385
  • Unable to index JSON from HDFS using SchemaRDD.saveToEs() #382
  • Excluding fields when writing JSON documents from Spark to Elasticsearch doesn't work #381
  • [Spark] Case Class example failed after compile #378
  • Duplicate documents returned on alias scan #363
  • Dynamic es.resource.write fails to find nested field #362
  • Elasticsearch Hive integration issues #359
  • Constant values should not be quoted by default #353
  • When using nested objects with MR the returned array does not use the correct type #342
  • java.util.Date cannot be cast to org.apache.hadoop.io.Writable #340
  • java.lang.UnsupportedOperationException caused by org.elasticsearch.hadoop.mr.EsInputFormat? #338
  • group by error #331
  • elasticsearch-spark_2.10-2.1.0.Beta2 exception when joining with parquetFile #323
  • JSON serialization error #311
  • Document Count in ES Different from Number of Entries Pushed #283
  • Spark: UpdateScriptParams: JSON serialization error #351
  • Cannot Find Node #436
  • Hive Runtime Error while Writing into ES table #432
  • saveToEs can't use pre-defined mapping #424
  • Not able to transfer data from Hive to Elasticsearch #417
  • Not able to insert data into ES using elasticsearch-hadoop-2.1.0.Beta3.jar #416
  • internal.es.yarn.file default configuration not present in YARN/cfg.properties #411
  • Hive-Elasticsearch Write Operation #409
  • SparkSQL fails inserting data into Elasticsearch index #406
  • Es-Hadoop ingestion through Pig is missing the mappings #405
  • How can I be sure of data colocation on my Spark/ES cluster? #383
  • HashMap[String,String] and Elasticsearch type mapping is not kicking in to map String to Integer #372
  • Indexing ES using RDD over HTTPS connection #371
  • Serialization Issue from scala.collection.immutable.HashMap$HashTrieMap #369
  • Support external versioning of documents #343
  • Can one increase number of partitions and hence spark nodes used? #339
  • Fix nested type serialization #327
  • [2.1.0.Beta2] [ES 1.3.2] [Spark 1.1.0] EsHadoopNoNodesLeftException #303
  • Issue while joining two hive tables stored on ES #293
  • Any way to silence 'WARN EsInputFormat: Cannot determine task id...' #427
  • Bug in ElasticSearch and Spark SQL: Using SQL to query out data from JSON documents is totally wrong! #377

Docs:

  • Incorrect parameter names es.update.params and es.update.params.json in config examples #430
  • Mistakes in documentation #421
  • Update Spark doc for new API InputFormat #401
  • Fixed include/exclude examples #392
  • Not able to locate scala.XML while adding 2.1.0.Beta3 version in dependency #374
  • Fix typo, cleanup paragraph #367
  • Better document the date formatting feature for dynamic writing #360
  • Document needs correction #329
  • Demonstrate Storm's tick feature #312
  • Improve Pig file size to increase parallelism according to shard size #294