Elasticsearch for Apache Hadoop version 5.0.0

October 26, 2016

Breaking Changes

  • Update for compatibility with Storm 1.x.x #769
  • Why does EsBolt still use the old JAR #754
  • Drop HDFS repository #741
  • Drop support for Spark 1.0-1.2 #740
  • Remove legacy Hive version #739

New Features

Mapping
  • Add support for 16-bit/half floats in ES 5.x #800
  • New ES type scaled_float #822
MR
  • Integrating with ES ingest node pipeline #806
New Slice API
  • Add the ability to create IndexPartition based on the desired number of documents per split #812 (issue: #778)
Spark
  • Support for Spark 2.0 / Datasets #647
  • Provided dedicated support for Spark 1.3-1.6 DStreams #802

Enhancements

Build
  • 2.3.1 release notes list many bugs as fixed without commits #770
REST
  • Expire Unused Pooled Transports in Spark Streaming #849
  • Validate field exclusion settings when EsSpark.saveJsonToEs() is executed #782
  • Review the error messages in ES 5.x and properly parse them #779
  • Align extended Boolean parsing with Elasticsearch #798
  • Fields replacement in es.update.script.params fails for objects #760
  • Replace/Ignore DOT character in field names before inserting into Elastic Search #758
Spark
  • Improve ElasticsearchRelation extensibility #829
  • Explicitly specify buildScan return type #826
  • Upgrade to Spark 1.6.2 #797
  • Fix example for Spark #768 (issue: #767)

Bug Fixes

Build
  • Repository HDFS still referred in the build system #775
  • Spark 1.2 artifacts are still referenced #774
Hive
  • Date type not properly inserted into ES #757
MR
  • Reading boolean 0/1 values from ES into Spark does not work (although false/true is ok) #795
  • Elasticsearch Mapreduce write "es.mapping.exclude" not working #752
MR and Spark
  • Restore data locality preference for hadoop and spark #819 (issue: #814)
NetworkClient
  • Node [127.0.0.1:9200] failed (Read timed out); no other nodes left - aborting… #753
  • Invalid target URI POST@null/pharmadata/test/_search #748
Pig
  • Pig maps can have values of different types #777
REST
  • Update/Upsert bulk actions with scripts are broken for ES 1.x as of 5.0.0-alpha5 #817
  • Shard size estimations for Slice API do not target shards #843
  • Elasticsearch-spark connector failing to save data with an Illegal Argument Exception : "No class name given" #837
  • Spark Datasource: Non empty BINARY fields are always considered empty #835
  • Fix memory leak caused by re-wrapping DelegatedProtocol #823
  • Fixed parsing of ES mapping with OBJECT field named properties #810 (issue: #809)
  • Not all errors in ES are reported when writing #720
  • es.index.auto.create setting doesn’t work as expected #793
  • Parsing error messages can cause a NPE #776
  • shard preference concatenation with | gives query error #874
Spark
  • Dropping a whole column of a DataFrame while indexing the DF into ES #841
  • Fix: Non empty BINARY fields are considered empty #834
  • saveToEs saves fields with NULL values #792
  • Failed to write data back to ES by using Spark DataFrame save() API !! #749
  • Pushdown option not working as expected with Spark data frames #734
URLDecoder
  • Illegal hex characters in escape (%) pattern - For input string: " S" #747
Yarn
  • Authenticate with a local keytab in the YARN Application Master #807

Documentation

  • es.read.field.as.array.include NOT es.field.read.as.array.include #860
  • Corrected property name #805
  • Repository HDFS doc improvements #781
  • Snapshot Creation Exception #813
  • Corrected link to ADD command #788
  • Update spark.adoc #785
  • Update cloud.adoc #773
  • Update configuration.adoc #772
  • Tests fail in build #755
  • Documentation Issue : Wrong package name #838
  • ES_CLASSPATH removed from ES 2.x #861

Non-issue

Hive
  • Error Loading Data #839
  • EsHadoopIllegalArgumentException: Cannot detect ES version #794
  • ES-Hive throwing exception while reading for bulk no. of columns #790
  • Create a external table with location command success while elasticsearch-hadoop do not support location function #786
Pig
  • Store Map as object with Pig in Elasticsearch #848
  • Get fields with documents in Elasticsearch #825
  • Failure while using EsStorage twice on a single Pig script to store a Parent Child relation #756
  • Store Tuple as object with Pig in Elasticsearch #746
REST
  • Some way to query "total" field from result set? #830
  • PROBLEM: Failed to write data back to ES by using Spark DataFrame save() API !! #836
  • Elasticsearch : Cannot detect ES version #791
Spark
  • Class Not Found Exception #863
  • java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class #862
  • Array types not supported in automatic mapping #847
  • Connection error with Elasticsearch 2.4.0 #846
  • Handling decimal type in dataset #842
  • Fix import of JavaEsSparkSQL #840
  • Got exception when I tried to invoke _mtermvector #796
  • ES-Hadoop on Spark 2.0 #759
  • elasticsearch-hadoop exception about mapping #766
  • Parameter ES_MAPPING_TIMESTAMP is not working using saveToEs #765