I am pleased to announce Elasticsearch for Apache Hadoop releases 2.0.2 and 2.1.Beta2. (If you haven't been following our story so far, es-hadoop is our connector that serves up real-time search & analytics for your Hadoop deployments.)
2.0.2 is the latest stable release; it contains several bug fixes and is a recommended upgrade for all existing users.
Spark SQL Support
2.1 Beta2 extends our native Spark support through Spark SQL integration. One can save SchemaRDDs to Elasticsearch or materialize them based on indices or queries (effectively creating views).
For example, finding all the "Smith"s is a one-liner:
    import org.apache.spark.sql.SQLContext
    import org.elasticsearch.spark.sql._
    ...
    val people = sqlContext.esRDD("spark/people", "?q=Smith")
    // check the associated schema
    println(people.schema)
    // root
    //  |-- name: string (nullable = true)
    //  |-- surname: string (nullable = true)
    //  |-- age: long (nullable = true)
The data and its associated schema are loaded through the returned SchemaRDD and can be further interrogated through Spark SQL.
Writing to Elasticsearch looks strikingly similar, as any SchemaRDD can be indexed. For this example, let's use the Java support:
    import org.apache.spark.sql.api.java.*;
    import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

    // sqlContext is an existing JavaSQLContext
    JavaSchemaRDD people = sqlContext.parquetFile("people.dat");
    // filter data using SQL
    people.registerTempTable("people");
    JavaSchemaRDD teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
    // index it to Elasticsearch
    JavaEsSparkSQL.saveToEs(teenagers, "spark/teens");
Again, saving the data to Elasticsearch is just a one-liner.
In addition to the Spark SQL support, the Spark module has had several improvements: the existing RDDs have been enhanced to PairRDDs, and the code base has been upgraded to Spark 1.1 while maintaining backwards compatibility with Spark 1.0.
CDH 5.1 Certified
Happy to announce that @elasticsearch for Apache Hadoop 2.1 is now certified on Cloudera 5, including support for Apache Spark!
— Cloudera Connect (@ClouderaConnect) October 2, 2014
Speaking of Spark, we are glad to report that es-hadoop is now officially certified for CDH 5.1 (in addition to CDH 5.0), this time including the Spark category. We are tracking our releases to Hadoop's releases to make sure our product evolves in step with its ecosystem, giving our users peace of mind that es-hadoop will simply work out of the box.
Apache Storm Integration
2.1 Beta2 makes Apache Storm a first-class citizen. (And, by the way, congrats to the Storm team for graduating from the Apache Incubator to a top-level project.) es-hadoop brings real-time search and analytics to Storm's stream data processing platform through dedicated native Spout and Bolt implementations that ingest data and fan out queries from and to Storm topologies.
To index data to Elasticsearch, simply use:

    TopologyBuilder builder = new TopologyBuilder();
    builder.setBolt("esBolt", new EsBolt("twitter/tweets"));
Executing queries in Elasticsearch for Storm is yet another one-liner:
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("es-spout", new EsSpout("twitter/tweets", "?q=nfl*"), 5);
    builder.setBolt("bolt", new PrinterBolt()).shuffleGrouping("es-spout");
Under the covers, es-hadoop uses its parallelized infrastructure to map the Spout and Bolt instances across the index shards for what we call a partition-to-partition architecture.
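As a rough illustration of the partition-to-partition idea, the sketch below spreads index shards over parallel task instances in round-robin fashion. It is not es-hadoop's actual assignment logic; the class `ShardAssignment` and its `assign` method are hypothetical names made up for this example, and the real connector may take other factors into account.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of es-hadoop): maps shard ids to task instances. */
final class ShardAssignment {

    /** Returns one list of shard ids per task; shard s goes to task (s % taskCount). */
    static List<List<Integer>> assign(int shardCount, int taskCount) {
        if (shardCount < 0 || taskCount <= 0) {
            throw new IllegalArgumentException("shardCount must be >= 0 and taskCount > 0");
        }
        List<List<Integer>> perTask = new ArrayList<>();
        for (int task = 0; task < taskCount; task++) {
            perTask.add(new ArrayList<>());
        }
        for (int shard = 0; shard < shardCount; shard++) {
            perTask.get(shard % taskCount).add(shard);
        }
        return perTask;
    }

    public static void main(String[] args) {
        // 5 shards consumed by 2 parallel tasks
        System.out.println(assign(5, 2)); // prints [[0, 2, 4], [1, 3]]
    }
}
```

With more tasks than shards, the surplus tasks simply receive an empty list, which is one reason to keep the parallelism hint close to the number of shards.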
Low-latency, high-performance patterns like micro-batching and tick tuples are supported to provide excellent throughput out of the box and closely integrate the real-time capabilities of Storm and Elasticsearch.
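To make the micro-batching and tick-tuple pattern concrete, here is a minimal, self-contained sketch. It assumes documents are buffered and handed to a bulk sink either when the buffer fills up or when a periodic tick arrives. The `MicroBatcher` class and its method names are hypothetical and do not reflect es-hadoop's or Storm's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical micro-batcher: flushes buffered documents by size or on a tick. */
final class MicroBatcher {
    private final int maxEntries;
    private final Consumer<List<String>> bulkSink;
    private final List<String> buffer = new ArrayList<>();

    MicroBatcher(int maxEntries, Consumer<List<String>> bulkSink) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be > 0");
        }
        this.maxEntries = maxEntries;
        this.bulkSink = bulkSink;
    }

    /** Called for every regular tuple; flushes once the batch is full. */
    void onDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxEntries) {
            flush();
        }
    }

    /** Called for every tick; flushes so a slow stream never holds documents back. */
    void onTick() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return; // nothing to send
        }
        bulkSink.accept(new ArrayList<>(buffer)); // hand over a copy of the batch
        buffer.clear();
    }

    public static void main(String[] args) {
        MicroBatcher batcher = new MicroBatcher(2, batch -> System.out.println("bulk: " + batch));
        batcher.onDocument("a");
        batcher.onDocument("b"); // size threshold reached: prints bulk: [a, b]
        batcher.onDocument("c");
        batcher.onTick();        // tick flushes the remainder: prints bulk: [c]
    }
}
```

The size threshold bounds memory use and amortizes the cost of each bulk request, while the tick bounds how long a document can wait when traffic is light.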
Elasticsearch 1.4 Repository Support
Elasticsearch 1.4 Beta 1 was released last week, bringing significant enhancements, especially in the area of resilience. Among them, the snapshot and restore infrastructure has been revisited, and the new version is supported by 2.1 Beta2. (For Elasticsearch 1.0 – 1.3 please use es-hadoop 2.0.x.)
Elasticsearch Comes to NYC
If you happen to be in NYC next week and are interested in Elasticsearch, we'd love to talk to you!
Join us for the meetup (please RSVP – seats are limited) on Oct 15th at Twitter or, if you are attending Strata NYC, please pass by our booth. Many thanks to Twitter for hosting us!
We look forward to your feedback on 2.1.Beta2 – the binaries are available on the download page and the new features are explained in the reference documentation. As always, you can file bugs or feature requests on GitHub.