Here Comes the Cloud

The above configuration enables auto-discovery on Amazon AWS. Simply replace amazon with rackspace to run on the Rackspace cloud. A long list of compute cloud providers is supported, including GoGrid and Terremark.

Gateway

ElasticSearch has been designed to provide reliable, asynchronous long-term persistency. This enables several features, including fast local “runtime” storage (even in-memory) backed by long-term storage that can be slower. The Gateway concept is described in the Search Engine Time Machine post.

But first, a step back. When designing a system that will be deployed on the cloud (let's take a search engine, for example ;)), keep in mind that things come and go. One of those things is disks, so local storage in cloud environments is considered transient. In Amazon AWS, for example, EBS (Elastic Block Store) was introduced to provide a mountable disk that survives restarts. So we could configure our search engine to store the index on EBS. But EBS requires periodic snapshotting to S3 (Amazon's blob store) for “safe” persistency, since EBS can certainly suffer from failures as well. Of course, this means more money spent on your cloud deployment, since one now pays for both EBS and S3.

One way to work around this is to persist directly from the local store to S3 by writing some sort of synchronization script or code. But if the machine fails, we lose all the data written since the script last ran. The next step is to add replication (and sharding for performance), and so on. All of this is provided by elasticsearch out of the box.

Here is how elasticsearch can be configured to store both its cluster metadata (to survive full cluster failure) and indices in the cloud:

cloud:
    account: <Your Amazon AWS Account Here>
    key: <Your Amazon AWS Secret Key Here>
    blobstore:
        type: amazon
gateway:
    type: cloud
    cloud:
        container: mycontainerhere

The simple configuration above stores everything in Amazon S3. Simply change amazon to rackspace to use Rackspace CloudFiles. A long list of blobstore providers is supported, including Azureblob.
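
For illustration, here is roughly what the Rackspace variant might look like. This is only a sketch: it assumes the layout stays the same as in the Amazon example, with the blobstore type as the only setting that changes, and that the account and key placeholders take your Rackspace credentials instead of your AWS ones.

```yaml
# Sketch only: same layout as the Amazon example, with the blobstore type swapped.
cloud:
    # Assumption: these now hold Rackspace credentials rather than AWS ones.
    account: <Your Rackspace Account Here>
    key: <Your Rackspace Key Here>
    blobstore:
        type: rackspace   # was: amazon
gateway:
    type: cloud
    cloud:
        container: mycontainerhere
```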

Final Words

As you can see, elasticsearch is now a first-class citizen when running on the cloud. I believe it has actually created a new level of intimate integration between products and the cloud. Together, Discovery and Gateway mean that managing an elasticsearch deployment on the cloud is a breeze.

As a side note, cross-cloud support is done using jclouds. Highly recommended.

-shay.banon
