January 8, 2016

Elasticsearch for Apache Hadoop 2.2 RC1 is out

Celebrating the start of 2016, Elasticsearch for Apache Hadoop (ES-Hadoop) 2.2 RC1 has been released.

Packing a significant number of bug fixes and enhancements, this release candidate is the last step towards a full general availability release for the current development branch. As always, the artifacts are available at the download page or Maven.

Spark-ling updates

ES-Hadoop 2.2 RC1 introduces support for the just-released Spark 1.6. In particular, filters that have been pushed down are now skipped in Spark instead of being processed a second time after the connector has already handled them. For large result sets, this is an important optimization.
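As a rough sketch of how this behavior might be tuned from the user side, the snippet below assembles Spark SQL data source options in Python. The `pushdown` and `double.filtering` option names reflect the connector's Spark SQL settings as we understand them, so please check them against the documentation for your version. The helper `pushdown_options` and the `index/type` resource are hypothetical, and the commented-out read call assumes a PySpark `sqlContext` is available.

```python
def pushdown_options(pushdown=True, double_filtering=False):
    """Build data source options as strings, the form Spark options take.

    Assumption: "pushdown" toggles filter push-down, and "double.filtering"
    controls whether Spark re-applies filters the connector already handled.
    """
    return {
        "pushdown": str(pushdown).lower(),
        "double.filtering": str(double_filtering).lower(),
    }


opts = pushdown_options()

# Hypothetical usage inside a PySpark job (not executed here):
# df = (sqlContext.read
#       .format("org.elasticsearch.spark.sql")
#       .options(**opts)
#       .load("index/type"))
```

Keeping the options in a plain dictionary makes it easy to switch double filtering back on when comparing results between the two modes.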

The push-down translation has also been improved, in particular for IN filters, which now match better when dealing with raw terms versus values (such as dates or timestamps).
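To make the idea concrete, here is a minimal, purely illustrative sketch of turning an IN filter into an Elasticsearch `terms` query. This is not the connector's actual code: the function name `in_filter_to_terms_query` is hypothetical, and the choice to serialize dates and timestamps as ISO-8601 strings while passing raw terms through untouched is an assumption made for the example.

```python
from datetime import date, datetime


def in_filter_to_terms_query(field, values):
    """Translate `field IN (values)` into a terms query body (illustrative only)."""

    def normalize(value):
        # Assumption: dates and timestamps are sent as ISO-8601 strings,
        # while raw terms (strings, numbers) are passed through unchanged.
        if isinstance(value, (datetime, date)):
            return value.isoformat()
        return value

    return {"terms": {field: [normalize(v) for v in values]}}


query = in_filter_to_terms_query(
    "released", [date(2016, 1, 8), datetime(2016, 1, 8, 12, 30), "rc1"]
)
```

The resulting dictionary can be serialized with `json.dumps` to inspect what such a pushed-down filter might look like on the wire.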

Speaking of Spark SQL, the schema declaration has been improved to handle multi-valued/array fields in a simple and elegant fashion (whether the fields are nested or not).
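As a small sketch of what such a declaration might look like, the helper below builds the value for the `es.read.field.as.array.include` setting, which, to our understanding, takes a comma-separated list of field names with an optional `:depth` suffix for multi-dimensional arrays. The helper name `array_fields_setting` and the field names are hypothetical; verify the setting against the documentation for your version.

```python
def array_fields_setting(fields):
    """Build the array-include setting from (field_name, depth) pairs.

    Nested fields use dot notation; a depth above 1 is appended as ":depth"
    (assumed syntax for multi-dimensional arrays).
    """
    parts = []
    for name, depth in fields:
        parts.append(name if depth <= 1 else "%s:%d" % (name, depth))
    return {"es.read.field.as.array.include": ",".join(parts)}


# Hypothetical fields: a flat array and a nested two-dimensional one.
array_conf = array_fields_setting([("tags", 1), ("nested.scores", 2)])
```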

In addition, the connector configuration is now sanitized and safely passed throughout a Spark job; this addresses a subtle bug in which command-line-only properties were discarded during a job stage, causing abnormal behavior.

YARN Enhancements

The YARN module received a batch of updates: it has been upgraded to Elasticsearch 2.1.x and now allows JVM system properties to be passed directly to the child containers.

Repository HDFS

The repository HDFS plugin has seen a lot of activity. For Elasticsearch 2.0 and 2.1, it currently requires the JVM security manager to be disabled, as Hadoop is significantly greedier than Elasticsearch itself in terms of permissions. Starting with Elasticsearch 2.2, thanks to the security improvements, the plugin can customize the grants for its own code base.

Please note that the migration of the plugin to Elasticsearch core has already started and is currently scheduled for Elasticsearch 2.3.

More about that in a future blog post!

Network improvements

The WAN/cloud feature has seen a lot of uptake, which exposed the connector to more varied network topologies and configurations. This led to a number of fixes in the way ES-Hadoop handles Elasticsearch clusters with hostnames and IPs (typically with network publishing enabled) and the translation between the two. Overall, the connector picks up more information about its environment, reducing the amount of extra configuration needed from the user.
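For readers who have not used the WAN/cloud mode yet, a minimal configuration sketch follows. It assumes the `es.nodes`, `es.port` and `es.nodes.wan.only` settings behave as described in the connector documentation; the helper `wan_settings` and the host name are hypothetical.

```python
def wan_settings(nodes, port=9200):
    """Build connector settings for a cluster reachable only through declared nodes."""
    return {
        "es.nodes": ",".join(nodes),
        "es.port": str(port),
        # Assumption: restricts the connector to the nodes declared above.
        "es.nodes.wan.only": "true",
    }


# Hypothetical cloud endpoint used purely for illustration.
wan_conf = wan_settings(["es.example.com"], port=9243)
```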

Feedback

Please let us know what you think about RC1! We love to hear from you on GitHub, Twitter or the forums (IRC works too).

Looking forward to 2016!
