Azure Cloud Plugin for Elasticsearch

UPDATE September 21, 2015: This blog post was updated to reflect the new settings in elasticsearch.yml. These settings were supported starting with Elasticsearch 1.5 and the old settings were deprecated at that time. Starting with Elasticsearch 2.0, only the new settings will be supported.

In cloud environments like Azure, multicast is often (always?) forbidden, so nodes cannot find each other automatically. Instead, you need to give Elasticsearch a unicast list of nodes to help it discover the rest of the cluster. Starting new instances can then be tricky, as you have to maintain at least a minimal list of nodes.

And what happens when a virtual machine goes down? When it comes back up, it could have a new IP address. So you need to edit unicast settings for each node, right?
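
To make the problem concrete, here is a minimal sketch of what a static unicast setup looks like. It assumes the discovery.zen.ping.multicast.enabled and discovery.zen.ping.unicast.hosts settings of Elasticsearch 0.90/1.x; the unicast_hosts_setting helper and the IP addresses are made up for illustration:

```shell
# Hypothetical helper: print static unicast discovery settings for a list
# of node addresses. Every node needs this list in elasticsearch.yml, and
# it has to be edited again whenever an address changes.
unicast_hosts_setting() {
  hosts=""
  for ip in "$@"; do
    # Append each address as a quoted, comma-separated YAML list item
    hosts="${hosts:+$hosts, }\"$ip\""
  done
  echo "discovery.zen.ping.multicast.enabled: false"
  echo "discovery.zen.ping.unicast.hosts: [$hosts]"
}

# Example with two made-up addresses:
unicast_hosts_setting 10.0.0.4 10.0.0.5
```

With ten nodes, that is ten files to keep in sync by hand.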

We are pleased to announce the first release of the Azure cloud plugin for Elasticsearch. This first release uses the Azure API for the unicast discovery mechanism, which makes growing your cluster much simpler to manage.

Azure Virtual Machine Discovery

The Azure plugin uses the Azure REST API to perform automatic discovery, which gives you something similar to multicast discovery in multicast-friendly environments. You just have to:

  • Create Azure instances (VMs)
  • Install Elasticsearch
  • Install Azure cloud plugin
  • Modify the elasticsearch.yml file and define the Azure cloud settings (keystore, subscription ID and cloud service name)
  • Start Elasticsearch

And… You're done!

Want More Details?

Suppose that you want to build an Ubuntu 13.10 virtual machine running Elasticsearch. Let's say that you already have an Azure account, that your SSH key and certificate are already created and uploaded to Azure, that you have installed the Windows Azure Command-Line Tool, and that you have a storage account ready to use.

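# You first need to generate a Java keystore (azurekeystore.pkcs12)
# from your existing SSH key (azure-private.key) and certificate (azure-certificate.pem).
# The later commands expect these files in /tmp, so run the following from that directory.
# The last openssl command asks for an export password: it is the keystore password
# you will set in elasticsearch.yml.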
openssl x509 -outform der -in azure-certificate.pem -out azure-certificate.cer
openssl pkcs8 -topk8 -nocrypt -in azure-private.key -inform PEM -out azure-pk.pem -outform PEM
openssl x509 -inform der -in azure-certificate.cer -out azure-cert.pem
cat azure-cert.pem azure-pk.pem > azure.pem.txt
openssl pkcs12 -export -in azure.pem.txt -out azurekeystore.pkcs12 -name azure -noiter -nomaciter
# Deploy an Ubuntu image on an extra small instance in West Europe:
azure vm create azure-elasticsearch-cluster \
  b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20130808-alpha3-en-us-30GB \
  --vm-name myesnode1 \
  --location "West Europe" \
  --vm-size extrasmall \
  --ssh 22 \
  --ssh-cert /tmp/azure-certificate.pem \
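  elasticsearch 'password1234!!'
# "elasticsearch" and "password1234!!" are the SSH login and password for this instance.
# The password is single-quoted so that an interactive shell does not apply history expansion to it.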
# Connect to your instance when started
# SSH settings for convenience
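# The host name is the cloud service name given to "azure vm create", plus ".cloudapp.net"
HOST=azure-elasticsearch-cluster.cloudapp.net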
SSH_OPTIONS="-o User=elasticsearch -o IdentityFile=/tmp/azure-private.key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
# Copy your keystore to the VM
scp $SSH_OPTIONS /tmp/azurekeystore.pkcs12 $HOST:/home/elasticsearch
# Connect to the VM
ssh $SSH_OPTIONS $HOST
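
Before relying on the keystore, it can be worth checking that it opens with the password you chose. A minimal sketch, assuming openssl is installed where the keystore lives; check_keystore is a hypothetical helper name, not something the plugin provides:

```shell
# Hypothetical helper: succeed if the PKCS#12 file ($1) can be opened with
# the given password ($2). -noout keeps keys and certificates from being
# printed; only the exit status matters here.
check_keystore() {
  openssl pkcs12 -in "$1" -noout -passin "pass:$2" 2>/dev/null
}

# Usage (the password is the one entered at the export password prompt):
# check_keystore /tmp/azurekeystore.pkcs12 'your_password_for_keystore' \
#   && echo "keystore OK" || echo "cannot open keystore"
```

A wrong password or a damaged file makes openssl exit with a non-zero status, which is cheaper to find out here than from a node that fails to discover its peers.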

Install a Java runtime, either the latest OpenJDK (sudo apt-get install openjdk-7-jre-headless) or the Oracle JDK, and then install Elasticsearch and its Azure cloud plugin:

curl -s https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.10.deb \
  -o elasticsearch-0.90.10.deb
sudo dpkg -i elasticsearch-0.90.10.deb
sudo /usr/share/elasticsearch/bin/plugin -install \
  elasticsearch/elasticsearch-cloud-azure/1.0.0.alpha1

Edit /etc/elasticsearch/elasticsearch.yml and add:

cloud.azure.management:
   subscription.id: your_azure_subscription_id
   cloud.service.name: your_azure_cloud_service_name
   keystore:
      path: /home/elasticsearch/azurekeystore.pkcs12
      password: your_password_for_keystore

discovery:
  type: azure
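
If you would rather script this step than edit the file by hand, the settings can be generated. This is only a sketch: azure_settings is a hypothetical helper, and it assumes the file does not already define a discovery or cloud.azure.management section and that the values need no YAML quoting:

```shell
# Hypothetical helper: print the Azure settings block for elasticsearch.yml.
# Arguments: subscription ID, cloud service name, keystore path, keystore password.
azure_settings() {
  cat <<EOF
cloud.azure.management:
   subscription.id: $1
   cloud.service.name: $2
   keystore:
      path: $3
      password: $4

discovery:
  type: azure
EOF
}

# Usage (placeholder values; tee -a appends to the existing file as root):
# azure_settings your_azure_subscription_id your_azure_cloud_service_name \
#   /home/elasticsearch/azurekeystore.pkcs12 your_password_for_keystore \
#   | sudo tee -a /etc/elasticsearch/elasticsearch.yml
```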

Restart Elasticsearch and you're done! Now this instance uses the Azure API to get a list of available nodes.

sudo service elasticsearch restart
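
To check that the node came back up, you can ask the cluster health API how many nodes it sees. A small sketch, assuming Elasticsearch listens on the default HTTP port 9200; node_count is a hypothetical helper that pulls number_of_nodes out of the JSON response without extra tools:

```shell
# Hypothetical helper: read a cluster health JSON document on stdin and
# print the value of its number_of_nodes field.
node_count() {
  grep -o '"number_of_nodes" *: *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Usage on the VM:
# curl -s 'http://localhost:9200/_cluster/health?pretty' | node_count
```

It should report 1 while this is the only node running, and the same check can be repeated as more nodes join.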

Scaling Out!

Hey! But we have started only one node! It's not really a cluster, right? Let's scale out and bring more nodes to the party!

# From your local machine, shut down the Azure node and capture it as an image:
azure vm shutdown myesnode1
azure vm capture myesnode1 esnode-image --delete
# Start 10 instances; node x gets the public SSH port 21 + x (22 to 31):
for x in $(seq 1 10); do
    echo "Launching azure instance #$x..."
    azure vm create azure-elasticsearch-cluster \
        esnode-image \
        --vm-name "myesnode$x" \
        --vm-size extrasmall \
        --ssh $((21 + $x)) \
        --ssh-cert /tmp/azure-certificate.pem \
        --connect \
        elasticsearch 'password1234!!'
done

You should now have a cluster running with 10 nodes!
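
All ten VMs share one cloud service host name, so what tells them apart from the outside is the public SSH port that the loop assigned to each. The sketch below only spells out that mapping; ssh_port_for_node and print_ssh_commands are hypothetical helper names:

```shell
# Hypothetical helper: public SSH port of node number $1, following the
# "--ssh $((21 + $x))" rule of the launch loop (node 1 -> 22, node 10 -> 31).
ssh_port_for_node() {
  echo $((21 + $1))
}

# Print one ssh command per node; $1 is the cloud service host name.
print_ssh_commands() {
  for x in $(seq 1 10); do
    echo "ssh -p $(ssh_port_for_node "$x") elasticsearch@$1   # myesnode$x"
  done
}

# Usage, with HOST set as in the SSH settings earlier:
# print_ssh_commands "$HOST"
```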

What's Next?

First, we love to hear feedback from our community! Feel free to ask questions on the mailing list and raise issues or ask for feature requests on GitHub. Pull requests are warmly welcomed too!

We plan to publish a blog post on how to use the plugin with Microsoft Windows virtual machines.

Also, Elasticsearch 1.0 comes with the great Snapshot and Restore feature. We plan to build on it so that you can use Azure Blob Storage to store your snapshots.

Stay tuned!