October 22, 2013

Managing Elasticsearch with Found

UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.

Introduction

This article will show how you can easily create a cluster, upgrade versions and make it a high availability cluster. I will also briefly describe what happens behind the scenes of various types of operations.

Whether you need a small cluster for a few hours of testing or a highly available, mission-critical production cluster, we work hard to make Elasticsearch clusters easier to manage, so our clients can focus on their core business: building cool stuff!

Note: This article mainly describes how Found’s service works. Elasticsearch in Production has many important details about running Elasticsearch – whether you use Found or manage your own clusters. The aforementioned article goes into depth on why enough memory is key to Elasticsearch, and how high availability can be achieved in cloud environments.

Update August 30, 2015: Please note that some of the information in this article may be a little outdated. For up-to-date information about how to manage Elasticsearch with Found, see our Documentation.

Making a Cluster

Assuming you have signed up for Found and signed in to our administration console, you can quickly create a new cluster. Clusters are dedicated and completely isolated from each other.

Step 1: Choosing region and optionally specifying a name for your cluster

First, we need to know in which region you want to host your cluster. We recommend choosing a region geographically close to your systems, to keep latency low. Naming your cluster is optional, but it is wise if you have several clusters!

Step 2: Choosing capacity and high availability

You will now be asked to choose the size of the cluster, as well as whether you need high availability. Unlike the region, capacity and high availability can be changed later on a running cluster, without downtime, while indexing and searching continue.

Step 3: Choosing Elasticsearch version and plugins

Lastly, you can specify which Elasticsearch version you want to provision and which plugins you want enabled. As we will see in the next section, this can also be changed without downtime, assuming you upgrade to a compatible version.

When you click “Create cluster”, we will provision the cluster. This usually takes less than a minute. When the cluster is created, you will be redirected to the cluster’s overview page, which will list the instance(s) making up the cluster as starting. After a few seconds, the instances will be listed as started, as in the following figure:

Cluster overview

In the overview, you can see your Elasticsearch endpoints, links to various dashboards and the status of the instances making up your cluster.

Changing a Cluster

There are several ways to change a running cluster:

Upgrade Elasticsearch version, where the two versions are compatible and can communicate with each other. This is the case when upgrading a minor version, e.g. 0.90.2 to 0.90.3.
Major version upgrade of Elasticsearch, where the two versions are not compatible and cannot communicate with each other. This is the case when upgrading from e.g. 0.20.4 to 0.90.2.
Version downgrade. This is less common, and one has to be careful with the on-disk compatibility of the Lucene indexes. For example, Elasticsearch 0.90.3 uses Lucene 4.4, while Elasticsearch 0.90.2 uses Lucene 4.3. Lucene 4.3 cannot read indexes modified by Lucene 4.4.
Capacity upgrade or downgrade, where the amount of reserved memory is changed.
Change available plugins.
Adjust the number of data centers the cluster runs in, i.e. changing the high availability configuration.
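To make the difference between the first three kinds of change concrete, here is a minimal sketch that classifies a version change. It is not part of Found's tooling. The function names are hypothetical, and the rule that the first two version components identify a compatible release line is an assumption inferred only from the examples above.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as '0.90.3' into (0, 90, 3)."""
    return tuple(int(part) for part in version.split("."))


def classify_change(current: str, target: str) -> str:
    """Classify a version change.

    Assumption (inferred from the examples in this article): versions that
    share their first two components can communicate with each other.
    """
    cur, new = parse_version(current), parse_version(target)
    if new == cur:
        return "none"
    if new < cur:
        # Downgrades need extra care: the older Lucene may not be able
        # to read indexes written by the newer one.
        return "downgrade"
    if new[:2] == cur[:2]:
        return "minor upgrade"
    return "major upgrade"


print(classify_change("0.90.2", "0.90.3"))  # minor upgrade
print(classify_change("0.20.4", "0.90.2"))  # major upgrade
print(classify_change("0.90.3", "0.90.2"))  # downgrade
```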

Configuring a running cluster

You can change the configuration of a running cluster from the “Configuration” pane.

With the exception of major version upgrades, we can perform all these changes without having to interrupt your cluster - you can continue searching and indexing. The changes can also be done in bulk: in one action, you can add more memory, upgrade, adjust the number of plugins and adjust the number of availability zones.

We perform all of these types of changes by making a cluster with the new configuration, in its entirety, join the existing cluster. After joining, the new nodes recover the indexes. When they are done, they start receiving requests. When all the new nodes are ready, we bring down the old ones.

By doing it this way, we reduce the risk of making changes. If the new nodes have any problems, the old ones are still there, processing requests.
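The join, recover and retire sequence can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not how Found implements it: `Node`, `Cluster`, `apply_change` and the `recover` callback are all hypothetical names, and dropping the new nodes when recovery fails is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    version: str
    recovered: bool = False  # True once the node holds the indexes


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)

    def serving(self) -> List[str]:
        """Names of the nodes that can currently answer requests."""
        return [node.name for node in self.nodes if node.recovered]


def apply_change(cluster: Cluster, new_nodes: List[Node],
                 recover: Callable[[Node], bool]) -> bool:
    """Join new nodes, let them recover, then retire the old nodes."""
    old_nodes = list(cluster.nodes)
    cluster.nodes = old_nodes + new_nodes       # 1. new nodes join
    for node in new_nodes:
        node.recovered = recover(node)          # 2. new nodes recover the indexes
    if new_nodes and all(node.recovered for node in new_nodes):
        cluster.nodes = list(new_nodes)         # 3. old nodes are brought down
        return True
    cluster.nodes = old_nodes                   # old nodes keep serving requests
    return False


cluster = Cluster([Node("old-1", "0.90.2", recovered=True)])
changed = apply_change(
    cluster,
    [Node("new-1", "0.90.3"), Node("new-2", "0.90.3")],
    recover=lambda node: True,  # stand-in for real index recovery
)
print(changed, cluster.serving())  # True ['new-1', 'new-2']
```

At no point in the sketch is the list of serving nodes empty, which is the property that lets searching and indexing continue during the change.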

Major Version Upgrades

For major version upgrades, we have to bring the cluster to a full stop before upgrading, as nodes running the two versions cannot communicate with each other. We first flush all changes so we are sure they can be recovered, and then start the cluster with the new version.

While Elasticsearch is working on making upgrades across major versions possible, major version upgrades often include so many changes that upgrading can be risky. This is usually true for any kind of software. Our recommended approach for major version upgrades is to make a new cluster with the latest major version, reindex everything and make sure index requests are temporarily sent to both clusters. With the new cluster ready, you can then do a hot swap and send requests to the new cluster. Since you are only billed for the hours a cluster is running, the few extra dollars added to your bill for running an extra cluster for a while are money well spent. And since the cluster with the version known to work well is still running, you can quickly roll back if the new version has errors.
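The recommended approach can be sketched on the application side as a thin wrapper that writes to both clusters and reads from whichever one is active. The classes below are hypothetical stand-ins that keep documents in memory; a real application would call its Elasticsearch client of choice in their place.

```python
from typing import Dict


class InMemoryCluster:
    """Stand-in for an Elasticsearch cluster; stores documents by id."""

    def __init__(self, version: str) -> None:
        self.version = version
        self.docs: Dict[str, dict] = {}

    def index(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc

    def get(self, doc_id: str) -> dict:
        return self.docs[doc_id]


class DualWriter:
    """Sends index requests to both clusters, reads from the active one."""

    def __init__(self, old: InMemoryCluster, new: InMemoryCluster) -> None:
        self.old = old
        self.new = new
        self.active = old  # searches go to the old cluster until the swap

    def index(self, doc_id: str, doc: dict) -> None:
        self.old.index(doc_id, doc)
        self.new.index(doc_id, doc)

    def get(self, doc_id: str) -> dict:
        return self.active.get(doc_id)

    def swap(self) -> None:
        """Hot swap: start sending read requests to the new cluster."""
        self.active = self.new

    def roll_back(self) -> None:
        """Return to the cluster running the version known to work."""
        self.active = self.old


def reindex(source: InMemoryCluster, target: InMemoryCluster) -> None:
    """Copy every existing document into the new cluster."""
    for doc_id, doc in source.docs.items():
        target.index(doc_id, doc)


old = InMemoryCluster("0.20.4")
old.index("1", {"title": "indexed before the upgrade"})
new = InMemoryCluster("0.90.2")

writer = DualWriter(old, new)
reindex(old, new)
writer.index("2", {"title": "indexed during the upgrade"})
writer.swap()
print(writer.active.version, sorted(new.docs))  # 0.90.2 ['1', '2']
```

Because the old cluster keeps receiving every write until it is deleted, `roll_back()` loses no data in this sketch.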

We make it easy to manage multiple clusters with different versions. We do not force customers to upgrade their clusters. If we need to end-of-life a very old version, you can expect to be notified in due time.

Note: If you use a Platform-as-a-Service provider like Heroku, the administration console is slightly different and does not allow you to make changes that affect the price. Such changes must be made in the platform provider’s addon system. You can still do things like change the Elasticsearch version or the plugins.

Tearing it Down

The cluster’s configuration pane allows you to delete a cluster:

Deleting a running cluster

Deleting is final and cannot be undone. Billing stops immediately when the cluster has been deleted, rounding up to the nearest hour. This means you can easily start a cluster, run some tests and tear it down again when you are done.
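As a rough illustration of the rounding, the sketch below computes the billable hours for a short-lived cluster. It assumes that "rounding up" means any started hour counts as a full hour; the function name is hypothetical and no actual prices are implied.

```python
import math
from datetime import datetime


def billable_hours(created: datetime, deleted: datetime) -> int:
    """Running time in hours, with any started hour counted in full."""
    seconds = (deleted - created).total_seconds()
    if seconds < 0:
        raise ValueError("a cluster cannot be deleted before it is created")
    return math.ceil(seconds / 3600)


created = datetime(2013, 10, 22, 9, 0)
deleted = datetime(2013, 10, 22, 11, 10)
print(billable_hours(created, deleted))  # 3
```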

Before long, we will make our cluster management APIs available. This will enable you to automate tasks such as running final integration tests before deploying to production, for a few cents. We are also looking into making it possible to pause a small running cluster. This can be quite useful when you only occasionally use a staging cluster.

On High Availability

High availability is achieved by running a cluster with replicas in multiple availability zones, to protect against downtime when inevitable infrastructure problems occur. Our article on Elasticsearch in Production covers this more extensively.

We offer the option of running in one, two or three availability zones. Running in two AZs is our default high availability configuration. It provides reasonably high protection against infrastructure failures and intermittent network problems. You might want three zones if you need even higher protection, or perhaps just one zone if the cluster is mainly used for testing or development.

As mentioned above, this is something that you can change while the cluster is running. For example, when you prepare a new cluster for production use, you can first run it in a single zone, then add another zone right before deploying to production.

While running in multiple zones increases a cluster’s reliability, it does not protect against problematic searches causing nodes to run out of memory, for example. For a cluster to be highly reliable and available, it is also important to have enough memory.
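The effect of spreading replicas across zones can be shown with a toy model: a cluster survives the loss of a zone only if every shard still has a copy somewhere else. The function and the shard and zone names below are hypothetical, and the model deliberately ignores memory pressure, which zones do not protect against.

```python
from typing import Dict, Set


def survives_zone_loss(shard_zones: Dict[str, Set[str]], failed_zone: str) -> bool:
    """True if every shard has at least one copy outside the failed zone."""
    return all(len(zones - {failed_zone}) > 0 for zones in shard_zones.values())


single_zone = {"shard-0": {"zone-a"}, "shard-1": {"zone-a"}}
two_zones = {"shard-0": {"zone-a", "zone-b"}, "shard-1": {"zone-a", "zone-b"}}

print(survives_zone_loss(single_zone, "zone-a"))  # False
print(survives_zone_loss(two_zones, "zone-a"))    # True
```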

Accessing the Logs

We provide easy access to the logs of all the nodes in your cluster. You can browse and search through logs produced in the last 14 days:

Searching the logs

Metadata such as level, logger and instance/zone can be clicked to filter on those types of logs.

Access Control

Note: We strongly advise configuring access control for your cluster.

The default configuration implies that anyone who knows the cluster ID has full access to your cluster (bear in mind that not all Elasticsearch clients support basic authentication).

We highly recommend using the access control feature to at least require authentication. Authentication uses HTTP Basic authentication, which most, but not all, HTTP and Elasticsearch libraries support.
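If your client library lacks built-in support, the header is simple to construct by hand. The sketch below uses only the Python standard library and builds a request without sending it. The URL, username and password are placeholders, not real Found values.

```python
import base64
import urllib.request


def basic_auth_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a request carrying an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder endpoint and credentials, for illustration only.
request = basic_auth_request("https://example.invalid/_search", "readonly", "secret")
print(request.get_header("Authorization"))
```

Basic authentication only encodes the credentials and does not encrypt them, so it makes sense to combine it with an ACL rule that requires SSL.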

You can limit access based on path, source IP, method, username/password and whether SSL is used. The access control section of the dashboard has annotated samples to use as templates for your own ACLs.

Modifying access control

Summary and Further Reading

We have demonstrated how to create a cluster, upgrade it in various ways (while indexing and searching!) and tear it down. Furthermore, we have described how to access your cluster’s logs, and limit who can access your cluster.

Even though Found takes care of most tasks when it comes to managing clusters, it’s advisable to have some knowledge about what is important when running an Elasticsearch cluster in production. We provide all user-facing features of Elasticsearch without any restrictions, which also means there are many ways to run into problems. The article on Elasticsearch as a NoSQL Database goes through what expectations you can have of Elasticsearch when comparing it to other NoSQL stores. To get the most out of Elasticsearch, it helps to understand some of its internal data structures as well. Our article series on Elasticsearch from the Bottom up covers exactly that.