PRESS RELEASE

Elastic Adds Support for Spark, Spark SQL, and Storm to Hadoop Connector

24 June 2015

Contact information

Amy White

Elastic Communications


Mountain View, Calif. and Amsterdam, The Netherlands - 24 June 2015 - Elastic, the company behind the popular open source projects Elasticsearch, Logstash, and Kibana with more than 20 million downloads, today released version 2.1 of its Hadoop connector, Elasticsearch for Apache Hadoop, adding support for Spark, Spark SQL, and Storm.

Elasticsearch brings real-time search and analytics to Hadoop applications 

While Hadoop provides value as a high-scale, low-cost storage medium for batch analytics, it was not designed for real-time data availability or responsive queries. Elasticsearch for Apache Hadoop makes it simple for businesses to index, query, and back up data from Hadoop into Elasticsearch, adding the capabilities of a scalable, real-time search and analytics engine to their Hadoop platform – whether it’s for internal or external applications, or for performing back-office system monitoring, security, and log analysis. With the release of Elasticsearch for Apache Hadoop 2.1, Elastic’s Hadoop connector now supports Spark, Spark SQL, and Storm, in addition to native integration with MapReduce, Hive, Pig, and Cascading. It is also certified with Cloudera, Databricks, Hortonworks, and MapR.

“We see Hadoop being used more and more in our customer base, especially in industries that deal with vast amounts of varied data, like financial services, healthcare, and telecommunications. With added support for Spark, Spark SQL, and Storm, Elasticsearch for Apache Hadoop 2.1 provides Elastic’s rich search and analytics to the next-generation run-times in the Hadoop ecosystem,” said Costin Leau, Hadoop engineering lead at Elastic. 

“We use Spark to process large batches of sensor data that we then index in Elasticsearch to predict, and in turn, prevent equipment failure for our industrial and manufacturing customers,” said Roy Russo, vice president of engineering at Predikto, Inc. “Spark helped us improve our ETL processes by a factor of 10, and thanks to Elasticsearch for Apache Hadoop, we are able to get real-time insights, build dynamic dashboards, and plot maps with Elasticsearch’s rich querying capabilities and GeoJSON support.”

Key features of Elasticsearch for Apache Hadoop 2.1

  • Native integration with all GA releases of Spark and Spark SQL (1.0-1.4), including pushdown translation of Spark SQL to Elasticsearch Query DSL
  • Native integration with Apache Storm, providing the ability to read data from Elasticsearch via a Spout (read streams) and write data to Elasticsearch via a Bolt (write streams)
  • Security enhancements, including: 
    • Basic HTTP authentication allowing Hadoop jobs running against a restricted Elasticsearch cluster to identify themselves 
    • SSL/TLS support for encrypted connections between Elasticsearch and Hadoop, enabling data-sensitive environments to transparently encrypt data at the transport level to prevent snooping and preserve data confidentiality
    • PKI support for Shield-enabled Elasticsearch clusters
  • Certification with CDH5.x, Databricks Spark, HDP2.x, and MapR 4.x 
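
As a rough illustration of how the security features listed above are typically switched on, the sketch below assembles a set of connector settings for a job that authenticates over basic HTTP and encrypts traffic with SSL/TLS. It is a minimal sketch under stated assumptions, not an official example: the helper function is hypothetical, the host, index, and credentials are placeholders, and the setting names (es.nodes, es.resource, es.net.http.auth.user, es.net.http.auth.pass, es.net.ssl) should be checked against the connector's configuration reference for the version in use.

```python
# Hypothetical helper (not part of the connector): collects
# Elasticsearch for Apache Hadoop settings into a plain dict that a
# job could hand to its Hadoop or Spark configuration.
def build_es_hadoop_settings(nodes, resource, user=None, password=None, use_ssl=False):
    settings = {
        "es.nodes": nodes,        # Elasticsearch host(s) to connect to
        "es.resource": resource,  # target index/type, e.g. "sensors/readings"
    }
    if user is not None:
        # Basic HTTP authentication so the job can identify itself
        settings["es.net.http.auth.user"] = user
        settings["es.net.http.auth.pass"] = password or ""
    if use_ssl:
        # Encrypt traffic between the Hadoop job and Elasticsearch
        settings["es.net.ssl"] = "true"
    return settings


# Placeholder values for illustration only.
settings = build_es_hadoop_settings(
    "es-host.example:9200",
    "sensors/readings",
    user="hadoop_job",
    password="changeme",
    use_ssl=True,
)

for key in sorted(settings):
    print(f"{key}={settings[key]}")
```

Keeping the settings in a single dict makes it easy to reuse the same values across MapReduce, Spark, or Storm jobs; in practice, credentials would come from a secure store rather than being written inline.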

About Elastic

Founded in 2012 by the people behind the Elasticsearch and Apache Lucene open source projects, Elastic provides real-time insights and makes massive amounts of structured and unstructured data usable for developers and enterprises. By focusing on scalability, ease of use, and ease of integration, Elasticsearch, Logstash, and Kibana power many of the world’s leading mobile, social, consumer, and business applications. Since its initial release, the open source stack has achieved more than 25 million cumulative downloads.

Elastic is backed by Benchmark Capital, Index Ventures, and NEA with headquarters in Amsterdam and Mountain View, California, and offices and employees around the world. To learn more, visit www.elastic.co.