Elasticsearch & MapR Hadoop: The Best of Both Worlds

It’s been a big week for Elasticsearch and Hadoop! On Tuesday, we announced the release of Elasticsearch for Apache Hadoop 1.3 M3, which includes a ton of new features. And just last week, we announced our latest partnership in the Hadoop ecosystem, with MapR Technologies. We’re beyond thrilled to partner with MapR and to help their customers, and ours, add real-time search and analytics capabilities to their MapR Hadoop Distribution clusters.

By combining MapR and Elasticsearch, developers gain a scalable, distributed architecture for fast search and discovery across very large volumes of data. The combined solution is already in use at enterprises including Solutionary, a leading pure-play managed security service provider, and several Fortune 100 financial services institutions.

For example, we worked closely with MapR Technologies to help a large financial institution store all of its raw access logs, billions of documents, in Hadoop. The documents were indexed into Elasticsearch using the Elasticsearch for Apache Hadoop integration and then visualized with Kibana. This approach gave the customer near real-time visibility into its data through Kibana, while still letting it run batch-oriented jobs over all of the raw data when needed.

Moreover, by using Elasticsearch for Apache Hadoop, the customer could also use Elasticsearch's search and analytics capabilities from within those batch Map/Reduce, Hive, or Pig jobs. The distributed Map/Reduce model fits well on top of Elasticsearch because the number of Map/Reduce tasks is tied to the number of Elasticsearch shards a query targets. Every time a query is run, the system dynamically generates a number of Hadoop splits proportional to the number of available shards, so the tasks run in parallel; your Hadoop cluster scales alongside Elasticsearch and vice versa.
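To make the shard-to-split idea concrete, here is a minimal, self-contained Python sketch. It is not the connector's actual code: `ShardSplit`, `plan_splits`, and `run_query` are hypothetical names, the shard layout is passed in as a plain dictionary rather than discovered from a live cluster, and a thread pool stands in for Hadoop tasks. It assumes the simplest mapping, one split per shard, with each split remembering which node holds its shard so a scheduler could run the task close to the data.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ShardSplit:
    """Hypothetical input split: one per Elasticsearch shard."""
    index: str
    shard_id: int
    node: str  # node holding the shard; usable as a data-locality hint


def plan_splits(index: str, shard_locations: Dict[int, str]) -> List[ShardSplit]:
    """Create one split per shard (shard_locations maps shard id -> node name)."""
    return [ShardSplit(index, sid, node) for sid, node in sorted(shard_locations.items())]


def run_query(splits: List[ShardSplit],
              read_shard: Callable[[ShardSplit], List[dict]]) -> List[dict]:
    """Run one task per split in parallel, then concatenate results in split order."""
    with ThreadPoolExecutor(max_workers=max(1, len(splits))) as pool:
        parts = list(pool.map(read_shard, splits))  # map preserves input order
    return [doc for part in parts for doc in part]


if __name__ == "__main__":
    # Hypothetical layout: an index with three shards spread over two nodes.
    splits = plan_splits("access-logs", {0: "node-a", 1: "node-b", 2: "node-a"})
    fake_data = {0: [{"id": 1}], 1: [{"id": 2}, {"id": 3}], 2: []}
    docs = run_query(splits, lambda s: fake_data[s.shard_id])
    print(len(splits), [d["id"] for d in docs])
```

Adding shards to the index adds splits, and therefore parallel tasks, without any change to the job itself, which is the scaling property described above.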

Best of all, Elasticsearch for Apache Hadoop ships as a single jar that enables real-time search and analytics across different versions of Hadoop, Cascading, Hive, and Pig, and across multiple Hadoop distributions, whether vanilla Apache Hadoop, CDH, HDP, MapR, or Pivotal. No dependencies, all the functionality.

We look forward to hearing what you’re building with Elasticsearch and MapR. Most of all, we want to hear how using this software together makes your life better, and how we can improve. Talk to us on Twitter any time!