June 22, 2016 Engineering

Running Elasticsearch on AWS

By Kosho Owa, Dara Gies

Part I - Provisioning EC2 Instances

We often talk to customers running Elasticsearch clusters on Amazon Web Services (AWS). AWS is a convenient way to provision and scale machine resources in response to changing business requirements. Elasticsearch takes advantage of EC2's on-demand machine architecture: EC2 instances, and the Elasticsearch nodes running on them, can be added and removed as capacity and performance requirements change.

In this article we will show you how to deploy Elasticsearch 2.3.3 on Amazon EC2, using a three-node Elasticsearch cluster as the example.

Step 1: Choose an Amazon Machine Image (AMI)

Elasticsearch runs on various operating systems such as CentOS, Red Hat, Ubuntu, and Amazon Linux. We suggest using the latest Amazon Linux AMI: "Amazon Linux AMI 2016.03.0 (HVM), SSD Volume Type".

Step 2: Choose an Instance Type

A reasonable starting instance type is m3.2xlarge which provides 8 vCPUs, 30 GiB of memory, 2 x 80 GB SSD drives and comes with High Network Performance. Solid State Drives are preferred as indexing is IO intensive and High Network Performance is essential for cluster performance and reliability.

The m3.2xlarge is a baseline recommendation. Benchmark your solution on it to determine whether it meets your performance and scaling requirements.

Click the "Next: Configure Instance Details" button.

Step 3: Configure Instance Details

Each Elasticsearch node will run on its own dedicated EC2 instance, so set the number of instances to 3.

Note that AWS accounts created after December 4, 2013 support only EC2-VPC, so the "Launch into EC2-Classic" choice under "Network" won't be available for those accounts. It should not be selected in any case.

Selecting "Enable termination protection" is a good idea as it prevents accidental deletion of nodes and their data.

Leave the default values for remaining fields and click the "Next: Add Storage" button.

Step 4: Add Storage

Let's leave the storage Size at 8 GiB. If you happen to know your index storage requirements at this time, you can adjust the storage now. Leave the Volume Type set to General Purpose SSD.

Click the "Next: Tag Instance" button.

Step 5: Tag Instance

Provide a key-value pair, for example "name" and "esonaws", to make the EC2 instances easy to identify later.

Click the "Next: Configure Security Group" button.

Step 6: Configure Security Group

This configuration panel allows you to configure a set of firewall rules for accessing your instance. By default, Elasticsearch exposes TCP port 9200 for REST API access and TCP port 9300 for internal cluster communication. Consider adding rules to allow connecting to TCP port 9200 from desired subnets, typically private subnets, and TCP port 9300 from the subnets where Elasticsearch nodes live. If you plan to change the default port settings in elasticsearch.yml, configure rules for those ports rather than TCP ports 9200 and 9300.

Also, add a rule to allow SSH connections on port 22, so you can connect to the instance in the later steps.
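The same rules could also be created from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials. The helper name `open_es_ports`, the security group ID, and the CIDR ranges are placeholders for illustration and do not come from this walkthrough.

```shell
# Sketch: add ingress rules for the default Elasticsearch ports and SSH.
# Assumes the AWS CLI is installed and configured.
# open_es_ports is a hypothetical helper; all IDs and CIDRs are placeholders.
open_es_ports() {
  sg_id=$1        # security group ID
  client_cidr=$2  # private subnet of the REST API clients
  node_cidr=$3    # subnet where the Elasticsearch nodes live
  admin_cidr=$4   # address range allowed to connect over SSH

  # TCP 9200: REST API access
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 9200 --cidr "$client_cidr"
  # TCP 9300: internal cluster communication
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 9300 --cidr "$node_cidr"
  # TCP 22: SSH access for the installation steps
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "$admin_cidr"
}

# Example call with placeholder values:
# open_es_ports sg-0123456789abcdef0 10.0.1.0/24 10.0.2.0/24 203.0.113.0/24
```

If you change the default ports in elasticsearch.yml, the port numbers in the sketch would need to change accordingly.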

Click the "Review and Launch" button.

Step 7: Review Instance Launch

Note any warnings, review the Instance Launch settings, and click the "Launch" button when ready.

At this point you will be prompted to provide a key pair or create a new key pair.

This is necessary to enable SSH access to the EC2 instance. If you need help setting up a key pair, the Amazon EC2 Key Pairs article provides an overview and instructions for creating new key pairs.

Start up the EC2 instances and take note of the assigned private IP addresses, which we will use in a later step.
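For readers who prefer scripting over the console, the launch settings chosen in Part I and the private IP lookup can be approximated with the AWS CLI. This is a sketch under assumptions, not a tested deployment script: the helper names `launch_es_nodes` and `list_private_ips` are hypothetical, and the AMI ID, key pair name, security group ID, and subnet ID are values you must supply yourself.

```shell
# Sketch: launch three tagged instances and list their private IPs.
# Assumes the AWS CLI is installed and configured.
# Both helpers are hypothetical; every argument is a placeholder you supply.
launch_es_nodes() {
  ami_id=$1      # ID of the Amazon Linux AMI chosen in Step 1
  key_name=$2    # name of the key pair used for SSH
  sg_id=$3       # security group configured in Step 6
  subnet_id=$4   # VPC subnet to launch into

  aws ec2 run-instances \
    --image-id "$ami_id" \
    --count 3 \
    --instance-type m3.2xlarge \
    --key-name "$key_name" \
    --security-group-ids "$sg_id" \
    --subnet-id "$subnet_id" \
    --disable-api-termination \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=name,Value=esonaws}]'
}

# Print the private IP addresses of the running instances tagged name=esonaws.
list_private_ips() {
  aws ec2 describe-instances \
    --filters "Name=tag:name,Values=esonaws" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PrivateIpAddress' \
    --output text
}
```

Here `--disable-api-termination` corresponds to the "Enable termination protection" checkbox, and the tag matches the "name" and "esonaws" pair from Step 5.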


Part II - Installing Elasticsearch

RPM

Log into each EC2 instance via SSH.

$ ssh -v -i /pathto/[certfilename].pem ec2-user@[ec2hostname]

Then install the Elasticsearch RPM package on each EC2 instance as instructed below.

$ sudo rpm -i https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.3/elasticsearch-2.3.3.rpm

Other versions of Elasticsearch are available here. Refer to the guide if you prefer installing with yum.

Register Elasticsearch as a system service.

$ sudo chkconfig --add elasticsearch

Install Plugins

You need to install the AWS Cloud plugin on each EC2 instance in the cluster.

$ cd /usr/share/elasticsearch/ 
$ sudo bin/plugin install cloud-aws

If you need any additional plugins, such as Marvel for monitoring or ICU for additional language support, now is a good time to install them.

Configure Elasticsearch

The maximum JVM heap size should be based on the machine's memory. Open "/etc/sysconfig/elasticsearch" on each EC2 instance with your favorite editor and set the "ES_HEAP_SIZE" and "MAX_LOCKED_MEMORY" parameters. The recommended "ES_HEAP_SIZE" is half of the machine's memory, but not more than 32 GB. The following configuration fits an m3.2xlarge instance.

ES_HEAP_SIZE=15g 
MAX_LOCKED_MEMORY=unlimited
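The sizing rule above (half of the memory, capped at 32 GB) is simple arithmetic. As an illustration, it could be written as a small shell function. The name `heap_size_gb` is a hypothetical helper, and the input is assumed to be the instance memory in whole GiB.

```shell
# Sketch: derive an ES_HEAP_SIZE value from the instance memory.
# heap_size_gb is a hypothetical helper; the argument is memory in whole GiB.
heap_size_gb() {
  total_gb=$1
  half=$(( total_gb / 2 ))     # half of the machine's memory
  if [ "$half" -gt 32 ]; then  # but not more than 32 GB
    half=32
  fi
  echo "${half}g"
}

heap_size_gb 30   # m3.2xlarge with 30 GiB: prints 15g
```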

Open "/etc/elasticsearch/elasticsearch.yml" on every machine and edit the following settings.

cluster.name: esonaws
bootstrap.mlockall: true 
discovery.zen.ping.unicast.hosts: [_ip_address_,…] 
network.host: [_ip_address_]

"discovery.zen.ping.unicast.hosts" is a list of EC2 instance private IP addresses. All the master-eligible nodes must be listed. In a small cluster all nodes can be configured as both master nodes and data nodes. "network.host" is the EC2 instance private IP address of this host that is shared with the other nodes in the cluster.

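An explicit IP address is not required for a single node cluster. The special values "_site_" and "_local_" stand for the host's private (site-local) address and the local loopback address "127.0.0.1" respectively. Binding to "_site_" makes the node reachable from other hosts on the private network, while "_local_" keeps it reachable on the loopback interface.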
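Because every node needs the same host list but its own "network.host", the four settings can be generated per node rather than typed by hand. The sketch below assumes you already know the private IP addresses. The helper name `render_es_settings` and the addresses in the example call are placeholders, not values from this walkthrough.

```shell
# Sketch: print the elasticsearch.yml settings for one node.
# render_es_settings is a hypothetical helper.
#   $1        private IP of this node (used for network.host)
#   $2 ...    private IPs of all master-eligible nodes
render_es_settings() {
  this_ip=$1
  shift
  hosts=""
  for ip in "$@"; do
    if [ -n "$hosts" ]; then
      hosts="$hosts, "          # separate list entries with a comma
    fi
    hosts="$hosts\"$ip\""       # quote each address
  done
  printf 'cluster.name: esonaws\n'
  printf 'bootstrap.mlockall: true\n'
  printf 'discovery.zen.ping.unicast.hosts: [%s]\n' "$hosts"
  printf 'network.host: %s\n' "$this_ip"
}

# Example with placeholder addresses for the first of three nodes:
render_es_settings 10.0.2.11 10.0.2.11 10.0.2.12 10.0.2.13
```

The output is meant to be reviewed and then merged into "/etc/elasticsearch/elasticsearch.yml" on the corresponding node.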

Starting Up and Verification

If you are setting up multiple Elasticsearch nodes, they must all run the same version, have the same plugins installed, and use equivalent configurations. Start up Elasticsearch on each EC2 instance.

$ sudo service elasticsearch start

Once started, let's verify the Elasticsearch cluster by using curl to request the cluster health.

$ curl 'localhost:9200/_cluster/health?pretty'

If successful, the "status" will be "green" (or "yellow" for a single node). The "number_of_nodes" should match the number of nodes you started. Depending on your index settings, you will need a minimum of two nodes for the cluster "status" to turn "green", because a replica shard is never allocated on the same node as its primary. A minimum of three nodes is recommended to avoid master election conflicts. If "status" isn't "green", check the Security Group configuration and the logs under "/var/log/elasticsearch" for errors.

{
  "cluster_name" : "esonaws",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 8,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
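The two checks described above, a "green" status and the expected node count, can also be scripted. The following is a rough sketch that extracts both fields with sed rather than a real JSON parser, so it assumes the response has the shape shown above. The helper name `cluster_ready` is hypothetical.

```shell
# Sketch: decide whether a _cluster/health response describes a ready cluster.
# cluster_ready is a hypothetical helper. It reads the JSON response on stdin
# and takes the expected number of nodes as its only argument.
# It succeeds only if status is "green" and number_of_nodes matches.
cluster_ready() {
  expected_nodes=$1
  json=$(cat)
  status=$(printf '%s\n' "$json" \
    | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p')
  nodes=$(printf '%s\n' "$json" \
    | sed -n 's/.*"number_of_nodes" *: *\([0-9]*\).*/\1/p')
  [ "$status" = "green" ] && [ "$nodes" = "$expected_nodes" ]
}

# Example: let Elasticsearch wait up to 30 seconds for green, then check.
# curl -s 'localhost:9200/_cluster/health?wait_for_status=green&timeout=30s' \
#   | cluster_ready 3 && echo "cluster is ready"
```

The `wait_for_status` and `timeout` parameters make the health request block until the cluster reaches the requested status or the timeout expires, which avoids writing a polling loop.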

Your Elasticsearch cluster is ready!

Summary

Deploying an Elasticsearch cluster on Amazon EC2 is relatively easy, but it requires a number of configuration steps and familiarity with SSH and key pair management. It also assumes that you will be managing the machines yourself.

If you prefer the ease-of-use of a managed service, Elastic Cloud, Elastic's official hosted Elasticsearch and Kibana offering on AWS, is a great choice. You can spin up a cluster in just a few clicks. Elastic Cloud also comes with Security, Kibana, supported Plugins, on-demand cluster scaling, automatic version backup and more.

The Elastic Cloud trial is free and doesn't require a credit card. Here's a link to a short video that describes Elastic Cloud in a little more detail.