Elasticsearch on YARN and SSL Support in Elasticsearch Hadoop

I am happy to announce that Elasticsearch for Apache Hadoop 2.1.Beta3 has just been released. It introduces two new features: SSL connectivity with enhanced HTTP authentication, and dedicated support for running Elasticsearch on YARN.

Elasticsearch on YARN

With 2.1.Beta3, we introduce the Elasticsearch-on-YARN (aka es-yarn) project for running an Elasticsearch cluster within a YARN environment. Similar to the repository-hdfs plugin, es-yarn is distributed as part of the Elasticsearch for Apache Hadoop (aka es-hadoop) project, but is independent and has no dependencies outside YARN itself.

With es-yarn, one can now provision, start and stop Elasticsearch directly on a YARN cluster. In YARN lingo, es-yarn bootstraps a client that deploys a dedicated ApplicationMaster in YARN which, on the client's behalf, creates one container for each Elasticsearch node required.

For the user, es-yarn is a straightforward CLI (Command-Line Interface) for deploying and managing the life cycle of the Elasticsearch cluster within YARN.

To wit, simply download elasticsearch-yarn-2.1.Beta3.jar and run:

$ hadoop jar elasticsearch-yarn-2.1.Beta3.jar
No command specified
Usage:
 -download-es  : Downloads Elasticsearch.zip
 -install      : Installs/Provisions Elasticsearch-YARN into HDFS
 -install-es   : Installs/Provisions Elasticsearch into HDFS
 -start        : Starts provisioned Elasticsearch in YARN
 -status       : Reports status of Elasticsearch in YARN
 -stop         : Stops Elasticsearch in YARN
 -help         : Prints this help
Configuration options can be specified _after_ each command; see the documentation for more information.

Each command should be self-explanatory. Typically one would:

Download Elasticsearch

You can do this yourself. However, out of the box, es-yarn can do this for you:

$ hadoop jar elasticsearch-yarn-2.1.Beta3.jar -download-es
Downloading Elasticsearch 1.4.0
Downloading ...........................................................................DONE

Provision Elasticsearch and Elasticsearch on YARN in HDFS

To start a YARN application, YARN needs access to the required artifacts in HDFS. es-yarn can upload them to HDFS on your behalf:

$ hadoop jar elasticsearch-yarn-2.1.Beta3.jar -install-es
Uploaded /opt/es-yarn/downloads/elasticsearch-1.4.0.zip to HDFS at hdfs://127.0.0.1:50463/apps/elasticsearch/elasticsearch-1.4.0.zip
$ hadoop jar elasticsearch-yarn-2.1.Beta3.jar -install
Uploaded opt/es-yarn/elasticsearch-yarn-2.1.Beta3.jar to HDFS at hdfs://127.0.0.1:50463/apps/elasticsearch/elasticsearch-yarn-2.1.Beta3.jar

Start Elasticsearch on YARN

$ hadoop jar elasticsearch-yarn-2.1.Beta3.jar -start
Launched a 1 node Elasticsearch-YARN cluster [application_1415921358606_0001@http://hadoop:8088/proxy/application_1415921358606_0001/] at Fri Nov 14 21:11:39 EET 2014

And voilà, the cluster is up and running.

Want to run multiple nodes? Just tell es-yarn so:

$ hadoop jar elasticsearch-yarn-2.1.Beta3.jar -start containers=2
Launched a 2 node Elasticsearch-YARN cluster [application_1415921359403_0001@http://hadoop:8088/proxy/application_1415921359403_0001/] at Fri Nov 14 21:24:53 EET 2014

Stop the cluster

When you are done, shut down the cluster like this:

$ hadoop jar elasticsearch-yarn-2.1.Beta3.jar -stop
Stopped Elasticsearch-YARN cluster with id application_1415921358606_0001

It’s that simple!
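
The steps above can also be chained into a single script. What follows is only a minimal sketch and not part of es-yarn: the ES_YARN_JAR, CONTAINERS and DRY_RUN variables and the helper functions are hypothetical names chosen for illustration. By default it only prints the commands it would run; the commands and the containers option themselves are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: drive the es-yarn life cycle (provision, start, status, stop).
# ES_YARN_JAR, CONTAINERS, DRY_RUN and the functions below are hypothetical
# names used for illustration; they are not defined by es-yarn itself.
set -euo pipefail

ES_YARN_JAR="${ES_YARN_JAR:-elasticsearch-yarn-2.1.Beta3.jar}"
CONTAINERS="${CONTAINERS:-1}"   # number of Elasticsearch nodes to launch
DRY_RUN="${DRY_RUN:-1}"         # 1 = print the commands, 0 = execute them

# Run (or just print) a single es-yarn command; options follow the command.
es_yarn() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "hadoop jar $ES_YARN_JAR $*"
  else
    hadoop jar "$ES_YARN_JAR" "$@"
  fi
}

# Download Elasticsearch, then upload it and es-yarn into HDFS.
provision() {
  es_yarn -download-es
  es_yarn -install-es
  es_yarn -install
}

# Start the cluster; pass containers=N only when more than one node is wanted.
start_cluster() {
  if [ "$CONTAINERS" -gt 1 ]; then
    es_yarn -start "containers=$CONTAINERS"
  else
    es_yarn -start
  fi
}

provision
start_cluster
es_yarn -status
es_yarn -stop
```

Saved under a name of your choice, the script could be run with CONTAINERS=2 DRY_RUN=0 to actually launch and then stop a two-node cluster. In practice you would likely split the start and stop steps, since the cluster is meant to stay up between them.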

SSL and HTTP authentication

es-hadoop uses REST over HTTP to communicate with Elasticsearch. Release 2.1.Beta3 introduces official support for basic HTTP authentication, allowing Hadoop jobs running against a restricted Elasticsearch cluster to identify themselves accordingly. While es-hadoop has previously supported authentication through its proxy options, with 2.1.Beta3 this has graduated to a standalone component and can thus be used within or outside the context of a proxy configuration.

Furthermore, the new 2.1 Beta release introduces SSL/TLS support for encrypted connections between Elasticsearch and your Hadoop cluster. Data-sensitive environments can thus transparently encrypt data at the transport level to prevent snooping and preserve confidentiality.

Note that while self-signed certificates are supported for development (though disabled by default), for production environments we strongly recommend using a proper certificate authority to create your certificates.
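
As a rough illustration of how these features might be switched on for a job, the sketch below assembles the relevant es-hadoop settings as Hadoop -D options. The property names are the ones described in the es-hadoop configuration reference; everything else (the user name, the password, the trust store path, my-job.jar and the es_security_opts helper) is a hypothetical placeholder. It also assumes the job parses generic options, for example through Hadoop's ToolRunner, which is not a given for every job.

```shell
#!/usr/bin/env bash
# Sketch: build es-hadoop authentication and SSL settings as -D options.
# The values and the es_security_opts helper are hypothetical placeholders;
# -D options only take effect for jobs that parse Hadoop generic options.
set -euo pipefail

ES_USER="${ES_USER:-hadoop-job}"
ES_PASS="${ES_PASS:-changeme}"
ES_TRUSTSTORE="${ES_TRUSTSTORE:-file:///etc/es-hadoop/truststore.jks}"
ALLOW_SELF_SIGNED="${ALLOW_SELF_SIGNED:-false}"   # keep false outside development

# Emit one -D option per line.
es_security_opts() {
  printf '%s\n' \
    "-Des.net.http.auth.user=$ES_USER" \
    "-Des.net.http.auth.pass=$ES_PASS" \
    "-Des.net.ssl=true" \
    "-Des.net.ssl.truststore.location=$ES_TRUSTSTORE" \
    "-Des.net.ssl.cert.allow.self.signed=$ALLOW_SELF_SIGNED"
}

# Print the resulting command instead of running it, so the sketch is safe to try.
echo "hadoop jar my-job.jar $(es_security_opts | tr '\n' ' ')"
```

Passing a password on the command line exposes it to other users on the machine, so a real deployment would more likely set these properties in the job configuration itself.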

Strata Barcelona

If you happen to be in Barcelona next week and are interested in Elasticsearch, we'd love to talk to you!

We are headed to Strata Barcelona, where we'll be giving two presentations, and you can always stop by our booth (P5). Yours truly is presenting on Thursday, Nov. 20th and Shay, the man himself, is on Friday, Nov. 21st. Furthermore, on Friday evening you can get your copy of Elasticsearch: The Definitive Guide signed by Clinton and Zack!

Also, join us for the meetup (please RSVP – seats are limited) on Thursday, Nov. 20th at CCIB.

We look forward to your feedback on Elasticsearch Hadoop 2.1.Beta3. The binaries are available on the download page, and the new features are explained in the reference documentation. As always, you can file bugs or feature requests on GitHub.