Migrating from Amazon Elasticsearch Service

This is a fairly technical guide for migrating from Amazon Web Services Elasticsearch (AWS ES) to Elasticsearch Service on Elastic Cloud. These steps may require some programming experience. AWS ES clusters are commonly provisioned into a Virtual Private Cloud (VPC), but they can also be located on a public-facing endpoint. In order to keep this guide universal, we describe how to migrate your data in either scenario.

Before you begin

Before you proceed with the steps to migrate your AWS Elasticsearch data, take note of the following.

Permissions

It’s important to understand the IAM security steps in this process. First, in order to snapshot an AWS ES cluster into S3, your AWS ES cluster needs permission to write to a private S3 bucket. This requires an IAM role and policy that have the necessary permissions. Next, you’ll need to attach an IAM policy to an IAM user, creating a new user if necessary. You can use that IAM user to connect to your AWS ES cluster, and later your Elastic-managed deployment can use the same credentials to read the snapshot from your S3 bucket.

To learn more about setting up an IAM role, policy, and user, see Working with Amazon Elasticsearch Service Index Snapshots in the AWS documentation.

Tools

During this procedure, if you don’t already have a current snapshot of your ES data, you will run a manual snapshot request on your AWS Elasticsearch cluster. If you can access your ES cluster directly, you can use the Postman client to run the request. If your ES cluster is inside a Virtual Private Cloud (VPC), you can use the Python AWS SDK. Details about each tool are provided in Part 2.

AWS variables

There are several variables that you’ll need to make note of along the way. We suggest that you copy and paste the following table to a notes file, where you can reference it as you proceed through this guide. This will make it easy to fill in the values specific for your migration.

Table 1. Data migration variables

Description                   Variable          Value
AWS ES Domain ARN             DOMAIN_ARN        -
AWS ES Endpoint URL           ES_ENDPOINT       -
AWS ES Region                 ES_REGION         -
AWS S3 Bucket Name            S3_BUCKET_NAME    -
AWS S3 Region                 S3_REGION_NAME    -
AWS IAM Role ARN              ROLE_ARN          -
AWS IAM Access Key ID         ACCESS_KEY        -
AWS IAM Secret Access Key     SECRET_KEY        -
AWS ES Snapshot Repository    SNAPSHOT_REPO     -
AWS ES Snapshot Name          SNAPSHOT_NAME     -

You can change the values of SNAPSHOT_REPO and SNAPSHOT_NAME or use the values provided in these examples, namely my-snapshot-repo and my-snapshot.
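
For illustration only, here is what a completed notes file might look like. Every value below is fabricated (the key pair is the well-known example pair from the AWS documentation) and must be replaced with your own:

DOMAIN_ARN        arn:aws:es:us-east-1:123456789012:domain/my-domain
ES_ENDPOINT       search-my-domain-abcdefghij.us-east-1.es.amazonaws.com
ES_REGION         us-east-1
S3_BUCKET_NAME    my-snapshot-bucket
S3_REGION_NAME    us-east-1
ROLE_ARN          arn:aws:iam::123456789012:role/TheSnapshotRole
ACCESS_KEY        AKIAIOSFODNN7EXAMPLE
SECRET_KEY        wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
SNAPSHOT_REPO     my-snapshot-repo
SNAPSHOT_NAME     my-snapshot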

Procedure

Migrating from AWS involves three main tasks.

Part 1
Set up an AWS Identity and Access Management (IAM) user with access to an AWS S3 storage bucket.
Part 2
Take a snapshot of your existing Elasticsearch data. If you can’t run commands on your Elasticsearch instance because it’s within a VPC that you can’t access, you’ll need to run a lightweight client on a host within your VPC.
Part 3
Set up a deployment on Elasticsearch Service and restore the snapshot data to the new Elasticsearch cluster.

If you already have your AWS ES cluster manually snapshotted to S3, you can skip ahead to Part 3 of this guide, to create a new deployment in Elasticsearch Service and populate it with data restored from your snapshot.

Part 1 - Set up an IAM user with access to an S3 bucket

1. Get your AWS ES details

You will need some basic information about your AWS ES cluster to snapshot it to S3.

  1. In your AWS Console, go to the Elasticsearch Service.
  2. Click on the domain of the cluster you want to snapshot.
  3. Copy the Endpoint URL value to your notes file (ES_ENDPOINT).
  4. Copy the Domain ARN value to your notes file (DOMAIN_ARN).
  5. Note which AWS region (for example, us-east-1) your AWS ES cluster is located in (ES_REGION).

This information will be used later on, first when creating the IAM policy and then when issuing commands to the Elasticsearch cluster.

2. Create an AWS S3 bucket

We’ll need an S3 bucket to store the snapshot.

Your S3 bucket must be in the same region as your AWS ES cluster. You will be able to restore from there to an Elastic-managed deployment in any region or cloud provider (AWS, GCP, or Azure).

  1. In your AWS Console, go to the S3 service.
  2. Click Create bucket to create a private S3 bucket.
  3. Choose your privacy and security settings.
  4. Copy the name of the bucket to your notes file (S3_BUCKET_NAME).
  5. Copy the region of the bucket to your notes file (S3_REGION_NAME).

3. Create an IAM role

Next, we’ll create a role to delegate permission to Amazon Elasticsearch Service to take a snapshot into S3.

  1. In your AWS Console, go to the IAM service.
  2. Open the Roles page.
  3. Click Create role.
  4. Select EC2 as the service that will use this new role (we will change it later).
  5. Click Next: Permissions.
  6. Leave the policies on the role empty for now.
  7. Click Next: Tags.
  8. Click Next: Review.
  9. Name the role: TheSnapshotRole.
  10. Click Create role.
  11. From the list of roles, click on the role you just created: TheSnapshotRole.
  12. Open the Trust relationships tab.
  13. Click Edit trust relationship.
  14. Copy and paste the following JSON into the Policy Document field, replacing the sample text:

    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {
          "Service": "es.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }]
    }
  15. Click Update Trust Policy.
  16. Open the Permissions tab.
  17. Click Add inline policy.
  18. Open the JSON tab.
  19. Copy and paste the following JSON, replacing the sample text.

    • Replace S3_BUCKET_NAME with the correct value (in two places).

      {
        "Version": "2012-10-17",
        "Statement": [{
            "Action": [
              "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
              "arn:aws:s3:::S3_BUCKET_NAME"
            ]
          },
          {
            "Action": [
              "s3:GetObject",
              "s3:PutObject",
              "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
              "arn:aws:s3:::S3_BUCKET_NAME/*"
            ]
          }
        ]
      }
  20. Click Review policy.
  21. Name the policy: TheSnapshotS3Policy.
  22. Click Create policy.
  23. Copy the Role ARN value to your notes file (ROLE_ARN).

You have created an IAM role with an inline policy that can read from and write to your S3 bucket.

4. Create an IAM policy

We need to create a new IAM policy that allows passing the IAM role created in the previous step, which is required to register the snapshot repository.

  1. In your AWS Console, go to the IAM service.
  2. Open the Policies page.
  3. Click Create policy.
  4. Open the JSON tab.
  5. Copy and paste the following JSON, replacing the sample text.

    • Replace ROLE_ARN with the correct value.
    • Replace DOMAIN_ARN with the correct value.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "ROLE_ARN"
          },
          {
            "Effect": "Allow",
            "Action": "es:ESHttpPut",
            "Resource": "DOMAIN_ARN/*"
          }
        ]
      }
  6. Click Review policy.
  7. Name the policy: TheSnapshotPolicy.
  8. Click Create policy.

You have created an IAM policy that allows an IAM user to pass the role and issue requests to your AWS ES domain.

5. Create an IAM user

If you don’t already have an IAM user, we’ll need to create one and give it access to your private S3 bucket. If you do have an IAM user, you can simply attach the following IAM policy to it.

  1. In your AWS Console, go to the IAM service.
  2. Open the Users page.
  3. Click Add user.
  4. Name the user: TheSnapshotUser.
  5. For the access type, select Programmatic access.
  6. Click Next: Permissions.
  7. Click Attach existing policies directly.
  8. Filter the policies by entering TheSnapshot in the search field.
  9. Select the checkbox next to the policy TheSnapshotPolicy.
  10. Click Next: Tags.
  11. Click Next: Review.
  12. Click Create user.
  13. Copy the Access key ID value to your notes file (ACCESS_KEY).
  14. Under Secret access key, click Show.
  15. Copy the Secret access key value to your notes file (SECRET_KEY).
  16. Click Close.
  17. From the list, click the user that you created: TheSnapshotUser.
  18. Click Add inline policy.
  19. Open the JSON tab.
  20. Copy and paste the following JSON, replacing the sample text.

    • Replace S3_BUCKET_NAME with the correct value (in two places).

      {
        "Version": "2012-10-17",
        "Statement": [{
            "Action": [
              "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
              "arn:aws:s3:::S3_BUCKET_NAME"
            ]
          },
          {
            "Action": [
              "s3:GetObject",
              "s3:PutObject",
              "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
              "arn:aws:s3:::S3_BUCKET_NAME/*"
            ]
          }
        ]
      }

  21. Click Review policy.
  22. Name the policy: TheSnapshotUserS3Policy.
  23. Click Create policy.

Your AWS S3 bucket is set up, along with an IAM role, policy, and user to access it. The next step is to take a snapshot of your current ES data.

Part 2 - Take a snapshot of your Elasticsearch data

In this section we’ll take a snapshot to record the latest state of your AWS Elasticsearch indices.

Choose the instructions that match your AWS Elasticsearch configuration. If your Elasticsearch cluster is not in a VPC and can be accessed directly, follow steps 1a and 1b. If your Elasticsearch cluster is in a VPC that you cannot access directly, follow steps 2a and 2b.

1a. Register a snapshot repository using Postman

Before running a manual snapshot, you need to register a snapshot repository with your AWS ES domain. This requires sending a signed request to the domain.

If your Elasticsearch cluster is not in a VPC and can be accessed directly, you can execute a snapshot request manually by calling the Elasticsearch snapshot API. Postman is a great tool for managing and running API requests. We will use it here to simplify the signing of our AWS API requests.

  1. Create a new Postman request.
  2. Under the Authorization tab, in the TYPE drop-down box, select AWS Signature.
  3. Enter your ACCESS_KEY, your SECRET_KEY, your ES_REGION as the AWS Region, and es as the Service Name. Leave the Session Token field blank.
  4. Under the Body tab, select raw and set the format to JSON. Add the following payload, replacing the values for S3_REGION_NAME, S3_BUCKET_NAME, and ROLE_ARN:

    {
      "type": "s3",
      "settings": {
        "region": "S3_REGION_NAME",
        "bucket": "S3_BUCKET_NAME",
        "role_arn": "ROLE_ARN"
      }
    }
  5. Set the request type to PUT and enter:

    https://ES_ENDPOINT/_snapshot/SNAPSHOT_REPO

    where:

    • ES_ENDPOINT is the Elasticsearch endpoint URL.
    • SNAPSHOT_REPO is a name of your choosing for the new repository.
  6. Click Send.

The snapshot repository for your S3 bucket is now registered.
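
If you'd rather use the command line than Postman, note that curl 7.75.0 and later can sign AWS requests natively. The following is a sketch only, not the documented path; it assumes a recent enough curl and that you substitute the placeholder values as above:

$ curl --request PUT "https://ES_ENDPOINT/_snapshot/SNAPSHOT_REPO" \
    --user "ACCESS_KEY:SECRET_KEY" \
    --aws-sigv4 "aws:amz:ES_REGION:es" \
    --header "Content-Type: application/json" \
    --data '{"type":"s3","settings":{"region":"S3_REGION_NAME","bucket":"S3_BUCKET_NAME","role_arn":"ROLE_ARN"}}'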

1b. Take a snapshot of your data

We will take a snapshot of your current ES data and store it in the newly registered repository.

  1. Using Postman, create a new request.
  2. Under the Authorization tab, in the TYPE drop-down box, select AWS Signature.
  3. Enter your ACCESS_KEY, your SECRET_KEY, your ES_REGION as the AWS Region, and es as the Service Name. Leave the Session Token field blank.
  4. Set the request type to PUT and enter:

    https://ES_ENDPOINT/_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME

    where:

    • ES_ENDPOINT is the Elasticsearch endpoint URL.
    • SNAPSHOT_REPO is the name of the repository that you registered.
    • SNAPSHOT_NAME is the name of the snapshot to create. The snapshot name must be lowercase.
  5. Click Send.

The time required to take a snapshot depends on the size of the AWS ES domain. According to AWS documentation, long-running snapshot operations sometimes show a 504 GATEWAY_TIMEOUT. That documentation suggests that you can ignore this error and just wait for the snapshot to complete successfully.

You can check the status of your snapshot by calling:

GET https://ES_ENDPOINT/_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME?pretty

Once you’ve taken a snapshot successfully, you can skip ahead to Part 3.

2a. Configure the Python AWS SDK

Before running a manual snapshot, you need to register a snapshot repository with your AWS ES domain. This requires sending a signed request to the cluster.

If your Elasticsearch cluster is in a VPC that you cannot access directly, you will need access to a host within your VPC, such as an EC2 instance, that can execute the scripts that follow. In these steps and examples we use the Python AWS SDK, but you can use any language that has an AWS SDK (for example, Java, Ruby, or Go).

We’ll install the Python AWS SDK using pip3, Python’s package installer. This requires Python version 3. If you don’t have Python 3 installed, installing pip3 will also install it, since Python 3 is a dependency of pip3. If you get stuck, refer to the Python installation documentation.

Install pip3

To install pip3 on Red Hat and derivatives, use yum:

$ sudo yum -y install python3-pip

Alternatively, some Fedora distributions label the pip3 package differently:

$ sudo yum -y install python36-pip

If neither of the previous package install commands works, you can search for the correct package name:

$ yum search pip

On Debian derivatives such as Ubuntu, use apt-get:

$ sudo apt-get -y install python3-pip

Install the Python AWS SDK

Once pip3 is installed, you can install the Python AWS SDK, named boto3, along with the requests_aws4auth package used to sign requests:

$ pip3 install --user boto3 requests_aws4auth
Collecting boto3
...
Successfully installed boto3-1.9.106 requests-aws4auth-0.9 ...

Note that root access is not needed if you specify the --user flag.

Create a ~/.aws directory to hold your AWS credentials:

$ mkdir ~/.aws

Create a file called credentials with your favorite editor. We’ll use nano for simplicity:

$ nano ~/.aws/credentials

Copy and paste the following contents into the file, replacing ACCESS_KEY and SECRET_KEY with the actual values:

[default]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY

Type Control + X to exit nano, and follow the prompts to save the file.
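
As a quick sanity check, you can confirm that boto3 picks up the credentials file by printing the access key it resolves; it should match the ACCESS_KEY you saved:

$ python3 -c "import boto3; print(boto3.Session().get_credentials().access_key)"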

In the next steps, we’ll write a few Python scripts to perform the tasks we need.

2b. Manually snapshot AWS ES

Let’s run a quick test using a Python script to list the indices in our AWS ES cluster. This will ensure that our AWS credentials are working and prove that we can access the cluster.

Create a file called indices.py with your favorite editor. We’ll use nano for simplicity:

$ nano indices.py

Copy and paste the following contents, replacing ES_ENDPOINT and ES_REGION with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth

# ES_ENDPOINT is the endpoint copied from the AWS console; the https://
# scheme is required for the requests library to build a valid URL.
host = 'https://ES_ENDPOINT'
region = 'ES_REGION'

# Load the credentials from ~/.aws/credentials and sign requests with SigV4.
creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Listing Indices from AWS ES ...")
req = requests.get(host + '/_cat/indices?v', auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)

Type Control + X to exit nano, and follow the prompts to save the file.

Run the Python script.

$ python3 indices.py

Your output should look similar to the following:

Listing Indices from AWS ES ...
HTTP Response Code: 200
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   testindex yME2BphgR3Gt1ln6n03nHQ   5   1          1            0      4.4kb          4.4kb

Now create a file called register.py with your favorite editor.

$ nano register.py

Copy and paste the following contents, replacing ES_ENDPOINT, ES_REGION, SNAPSHOT_REPO, S3_REGION_NAME, S3_BUCKET_NAME, and ROLE_ARN with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth

# The https:// scheme is required; ES_ENDPOINT is the endpoint copied earlier.
host = 'https://ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
s3_region_name = 'S3_REGION_NAME'
s3_bucket_name = 'S3_BUCKET_NAME'
role_arn = 'ROLE_ARN'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
headers = {"Content-Type": "application/json"}

# Repository settings: the S3 bucket location and the IAM role that AWS ES
# assumes to write into the bucket.
payload = {
        "type": "s3",
        "settings": {
                "region": s3_region_name,
                "bucket": s3_bucket_name,
                "role_arn": role_arn
        }
}

print("Registering Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name
req = requests.put(url, auth=auth, json=payload, headers=headers)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)

Type Control + X to exit nano, and follow the prompts to save the file.

Run the Python script.

$ python3 register.py

Your output should look similar to the following:

Registering Snapshot with AWS ES ...
HTTP Response Code: 200
{"acknowledged":true}

Next, create a file called snapshot.py with your favorite editor.

$ nano snapshot.py

Copy and paste the following contents, replacing ES_ENDPOINT, ES_REGION, SNAPSHOT_REPO, and SNAPSHOT_NAME with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth

# The https:// scheme is required; the snapshot name must be lowercase.
host = 'https://ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Starting Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name
req = requests.put(url, auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)

Type Control + X to exit nano, and follow the prompts to save the file.

Run the Python script.

$ python3 snapshot.py

Your output should look similar to the following:

Starting Snapshot with AWS ES ...
HTTP Response Code: 200
{"accepted":true}

The time required to take a snapshot depends on the size of the AWS ES domain. According to AWS documentation, long-running snapshot operations sometimes show a 504 GATEWAY_TIMEOUT. That documentation suggests that you can ignore this error and just wait for the snapshot to complete successfully.

Finally, let’s check the status of our snapshot. Create a file called status.py.

$ nano status.py

Copy and paste the following contents, replacing ES_ENDPOINT, ES_REGION, SNAPSHOT_REPO, and SNAPSHOT_NAME with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth

# The https:// scheme is required, as in the previous scripts.
host = 'https://ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Getting Status of Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name + '?pretty'
req = requests.get(url, auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)

Type Control + X to exit nano, and follow the prompts to save the file.

Run the Python script.

$ python3 status.py

Your output should look similar to the following:

Getting Status of Snapshot with AWS ES ...
HTTP Response Code: 200
{
  "snapshots" : [ {
    "snapshot" : "my-snapshot",
    "uuid" : "ClYKt5g8QFO6r3kTCEzjqw",
    "version_id" : 6040299,
    "version" : "6.4.2",
    "indices" : [ "testindex" ],
    "include_global_state" : true,
    "state" : "SUCCESS",
    "start_time" : "2019-03-03T14:46:04.094Z",
    "start_time_in_millis" : 1551624364094,
    "end_time" : "2019-03-03T14:46:04.847Z",
    "end_time_in_millis" : 1551624364847,
    "duration_in_millis" : 753,
    "failures" : [ ],
    "shards" : {
      "total" : 5,
      "failed" : 0,
      "successful" : 5
    }
  } ]
}

If you see "state":"SUCCESS" then you have successfully taken a snapshot to S3 and are ready for Part 3!

Part 3 - Restore your snapshot to a new deployment

1. Create a deployment in Elasticsearch Service

Navigate to Elastic Cloud and register for an account to gain access to the 14-day free trial. Once you’ve logged in, follow the instructions to create a deployment in AWS, Google Cloud, or Microsoft Azure.

For detailed instructions and descriptions of all of the options, see Create your deployment.

2. Add your secrets to the keystore

Once your deployment is ready, store your ACCESS_KEY and SECRET_KEY in the Keystore.

  1. Navigate to the Security page of your new deployment.
  2. Click Create settings.
  3. With Type set to Single string, add the following keys and their values:

    • s3.client.default.access_key
    • s3.client.default.secret_key

3. Register your snapshot repository in your new deployment

To follow this step your deployment must be at Elastic Stack version 7.2 or higher. If you are using an earlier deployment version, see our more detailed instructions for configuring a snapshot repository using AWS.

  1. In your same deployment in Elasticsearch Service, open Kibana and go to Management > Snapshot and Restore.
  2. On the Repositories tab, click Register a repository.
  3. Provide a name for your repository and select type AWS S3.
  4. Provide the following settings:

    • Client: default
    • Bucket: YOUR_S3_BUCKET_NAME
  5. Add any other settings that you wish to configure.
  6. Click Register.
  7. Click Verify to confirm that your settings are correct and the deployment can connect to your repository.
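
Equivalently, if you prefer Kibana's Dev Tools console to the UI, the registration is a single API request. A minimal sketch, using the client and bucket values from the steps above (you can also add "readonly": true to protect the repository from accidental writes while the AWS ES domain still owns it):

PUT _snapshot/SNAPSHOT_REPO
{
  "type": "s3",
  "settings": {
    "client": "default",
    "bucket": "S3_BUCKET_NAME"
  }
}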

4. Restore from your new snapshot repository

Still on the Snapshot and Restore page in Kibana:

  1. Click the Snapshots tab.
  2. Search for the snapshot that you created earlier.
  3. Click Restore.
  4. Select the indices you wish to restore.
  5. Configure any other relevant settings.
  6. Click Restore snapshot to begin the process.

The time required to restore from a snapshot varies based on the size of your data.
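
The restore can also be started from the Dev Tools console. A minimal sketch that restores only the example index seen earlier and skips cluster-wide state ("indices" accepts comma-separated names and wildcards):

POST _snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME/_restore
{
  "indices": "testindex",
  "include_global_state": false
}

You can monitor progress from the same Snapshot and Restore page in Kibana.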

5. Explore Elasticsearch Service

Now that you are up and running with your own data, explore the power of the latest version of the Elastic Stack by trying: SIEM, Lens, Machine Learning, APM, Maps, Index Lifecycle Management, Snapshot Lifecycle Management, Logs, Metrics, Monitoring, Canvas, Uptime, and more!