Migrate from Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)
This is a fairly technical guide for migrating from Amazon OpenSearch Service (formerly referred to as Amazon Elasticsearch Service, Amazon ES, or AWS ES) to Elasticsearch Service on Elastic Cloud. These steps may require some programming experience. Amazon OpenSearch Service clusters are commonly provisioned into a Virtual Private Cloud (VPC) with a private IP address, but they can also be located on a public-facing endpoint. In order to keep this guide universal, we describe how to migrate your data in either scenario. Please note, these steps assume that your Amazon OpenSearch Service domain runs an Elasticsearch OSS version.
Before you begin
There are a few things to note before you proceed with the steps to migrate your Amazon OpenSearch Service data.
It’s important to understand the IAM security steps in this process. First, in order to snapshot an AWS ES cluster into S3, your AWS ES cluster needs permission to write to a private S3 bucket. This requires an IAM role and policy that have the necessary permissions. Next, you’ll need to attach an IAM policy to an IAM user, creating a new user if necessary. You can use that IAM user to connect to your AWS ES cluster, and later your Elastic-managed deployment can use the same credentials to read the snapshot from your S3 bucket.
To learn more about setting up an IAM role, policy, and user, check Creating index snapshots in Amazon OpenSearch Service in the AWS documentation.
During this procedure, if you don’t already have a current snapshot of your ES data, you will run a manual snapshot request on your AWS Elasticsearch cluster. If you can access your ES cluster directly, you can use the Postman client to run the request. If your ES cluster is inside a Virtual Private Cloud (VPC), you can use the Python AWS SDK. Details about each tool are provided in Part 2.
There are several variables that you’ll need to make note of along the way. We suggest that you copy and paste the following table to a notes file, where you can reference it as you proceed through this guide. This will make it easy to fill in the values specific for your migration.
Table 1. Data migration variables

Description                | Variable       | Value
---------------------------|----------------|------
AWS ES Domain ARN          | DOMAIN_ARN     | -
AWS ES Endpoint URL        | ES_ENDPOINT    | -
AWS ES Region              | ES_REGION      | -
AWS S3 Bucket Name         | S3_BUCKET_NAME | -
AWS S3 Region              | S3_REGION_NAME | -
AWS IAM Role ARN           | ROLE_ARN       | -
AWS IAM Access Key ID      | ACCESS_KEY     | -
AWS IAM Secret Access Key  | SECRET_KEY     | -
AWS ES Snapshot Repository | SNAPSHOT_REPO  | -
AWS ES Snapshot Name       | SNAPSHOT_NAME  | -
You can change the values of SNAPSHOT_REPO and SNAPSHOT_NAME or use the values provided in these examples, namely my-snapshot-repo and my-snapshot.
Procedure
Migrating from AWS involves three main tasks.
- Part 1: Set up an AWS Identity and Access Management (IAM) user with access to an AWS S3 storage bucket.
- Part 2: Take a snapshot of your existing Elasticsearch data. If you can't run commands on your Elasticsearch instance because it's within a VPC that you can't access, you'll need to run a lightweight client on a host within your VPC.
- Part 3: Set up a deployment on Elasticsearch Service and restore the snapshot data to the new Elasticsearch cluster.
If you already have your Amazon OpenSearch Service cluster manually snapshotted to S3, you can skip ahead to Part 3 of this guide, to create a new deployment in Elasticsearch Service and populate it with data restored from your snapshot.
Part 1 - Set up an IAM user with access to an S3 bucket
Part 2 - Take a snapshot of your Elasticsearch data
- If you can access your Elasticsearch cluster directly: follow steps 1a and 1b, using Postman.
- If your Elasticsearch cluster is inside a VPC that you don't have access to (such as through a VPN): follow step 2a onward, using the Python AWS SDK.
Part 3 - Restore your snapshot to a new deployment
Part 1 - Set up an IAM user with access to an S3 bucket
You will need some basic information about your Amazon OpenSearch Service cluster to snapshot it to S3.
- In your AWS Console, go to the Amazon OpenSearch Service.
- Select the domain of the cluster you want to snapshot.
- Copy the Endpoint URL value to your notes file (ES_ENDPOINT).
- Copy the Domain ARN value to your notes file (DOMAIN_ARN).
- Note which AWS region (for example, us-east-1) your AWS ES cluster is located in (ES_REGION).
This information will be used later, first when creating the IAM policy and then when issuing commands to the cluster.
We’ll need an S3 bucket to store the snapshot.
Your S3 bucket must be in the same region as your Amazon OpenSearch Service cluster. You will be able to restore from there to an Elastic-managed deployment in any region or cloud provider (AWS, GCP, or Azure).
- In your AWS Console, go to the S3 service.
- Select Create bucket to create a private S3 bucket.
- Choose your privacy and security settings.
- Copy the name of the bucket to your notes file (S3_BUCKET_NAME).
- Copy the region of the bucket to your notes file (S3_REGION_NAME).
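If you'd rather script this step, the bucket can also be created with boto3, the Python AWS SDK that we install in Part 2. This is a minimal sketch, assuming the bucket name and region from your notes file; the console steps above achieve the same result.

import boto3

s3_bucket_name = 'S3_BUCKET_NAME'   # your bucket name
s3_region_name = 'S3_REGION_NAME'   # must match your AWS ES cluster's region

s3 = boto3.client('s3', region_name=s3_region_name)

# us-east-1 is the default location and must not be passed as a LocationConstraint
if s3_region_name == 'us-east-1':
    s3.create_bucket(Bucket=s3_bucket_name)
else:
    s3.create_bucket(Bucket=s3_bucket_name,
                     CreateBucketConfiguration={'LocationConstraint': s3_region_name})

# keep the snapshot bucket private by blocking all public access
s3.put_public_access_block(
    Bucket=s3_bucket_name,
    PublicAccessBlockConfiguration={'BlockPublicAcls': True, 'IgnorePublicAcls': True,
                                    'BlockPublicPolicy': True, 'RestrictPublicBuckets': True})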
Next, we’ll create a role to delegate permission to Amazon OpenSearch Service to take a snapshot into S3.
- In your AWS Console, go to the IAM service.
- Open the Roles page.
- Select Create role.
- Select EC2 as the service that will use this new role (we will change it later).
- Select Next: Permissions.
- Leave the policies on the role empty for now.
- Select Next: Tags.
- Select Next: Review.
- Name the role: TheSnapshotRole.
- Select Create role.
- From the list of roles, choose the role you just created: TheSnapshotRole.
- Open the Trust relationships tab.
- Select Edit trust relationship.
- Copy and paste the following JSON into the Policy Document field, replacing the sample text:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "es.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}

- Select Update Trust Policy.
- Open the Permissions tab.
- Select Add inline policy.
- Open the JSON tab.
- Copy and paste the following JSON, replacing S3_BUCKET_NAME with the correct value (in two places):

{
  "Version": "2012-10-17",
  "Statement": [{
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3_BUCKET_NAME"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3_BUCKET_NAME/*"
      ]
    }
  ]
}

- Select Review policy.
- Name the policy: TheSnapshotS3Policy.
- Select Create policy.
- Copy the Role ARN value to your notes file (ROLE_ARN).
You have created an IAM role with an inline policy that can read and write to your S3 bucket.
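As an alternative to the console steps, here is a minimal boto3 sketch that creates the same role and inline policy. It assumes your AWS credentials have IAM permissions; the policy documents are the ones shown above.

import boto3, json

s3_bucket_name = 'S3_BUCKET_NAME'  # your bucket name
iam = boto3.client('iam')

# trust policy that allows the Amazon OpenSearch Service to assume the role
trust = {"Version": "2012-10-17",
         "Statement": [{"Effect": "Allow",
                        "Principal": {"Service": "es.amazonaws.com"},
                        "Action": "sts:AssumeRole"}]}

# inline policy granting read and write access to the snapshot bucket
s3_policy = {"Version": "2012-10-17",
             "Statement": [{"Action": ["s3:ListBucket"],
                            "Effect": "Allow",
                            "Resource": ["arn:aws:s3:::" + s3_bucket_name]},
                           {"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                            "Effect": "Allow",
                            "Resource": ["arn:aws:s3:::" + s3_bucket_name + "/*"]}]}

role = iam.create_role(RoleName='TheSnapshotRole',
                       AssumeRolePolicyDocument=json.dumps(trust))
iam.put_role_policy(RoleName='TheSnapshotRole',
                    PolicyName='TheSnapshotS3Policy',
                    PolicyDocument=json.dumps(s3_policy))

print("ROLE_ARN: " + role['Role']['Arn'])  # copy this value to your notes file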
We need to create a new IAM policy that has permission to assume the IAM role created in the previous step, in order to register the snapshot repository.
- In your AWS Console, go to the IAM service.
- Open the Policies page.
- Select Create policy.
- Open the JSON tab.
- Copy and paste the following JSON, replacing ROLE_ARN and DOMAIN_ARN with the correct values:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "ROLE_ARN"
    },
    {
      "Effect": "Allow",
      "Action": "es:ESHttpPut",
      "Resource": "DOMAIN_ARN/*"
    }
  ]
}

- Select Review policy.
- Name the policy: TheSnapshotPolicy.
- Select Create policy.
You have created an IAM policy that allows the IAM role to talk to your Amazon OpenSearch Service domain.
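If you are scripting these steps, a minimal boto3 sketch for the same policy, assuming the ROLE_ARN and DOMAIN_ARN values from your notes file:

import boto3, json

role_arn = 'ROLE_ARN'      # ARN of TheSnapshotRole
domain_arn = 'DOMAIN_ARN'  # ARN of your AWS ES domain

iam = boto3.client('iam')

policy = {"Version": "2012-10-17",
          "Statement": [{"Effect": "Allow",
                         "Action": "iam:PassRole",
                         "Resource": role_arn},
                        {"Effect": "Allow",
                         "Action": "es:ESHttpPut",
                         "Resource": domain_arn + "/*"}]}

resp = iam.create_policy(PolicyName='TheSnapshotPolicy',
                         PolicyDocument=json.dumps(policy))
print("Policy ARN: " + resp['Policy']['Arn'])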
If you don’t already have an IAM user, we’ll need to create one and give it access to your private S3 bucket. If you do have an IAM user, you can simply attach the following IAM policy to it.
- In your AWS Console, go to the IAM service.
- Open the Users page.
- Select Add user.
- Name the user: TheSnapshotUser.
- For the access type, select Programmatic access.
- Select Next: Permissions.
- Select Attach existing policies directly.
- Filter the policies by entering TheSnapshot in the search field.
- Select the checkbox next to the policy TheSnapshotPolicy.
- Select Next: Tags.
- Select Next: Review.
- Select Create user.
- Copy the Access key ID value to your notes file (ACCESS_KEY).
- Under Secret access key, select Show.
- Copy the Secret access key value to your notes file (SECRET_KEY).
- Select Close.
- From the list, choose the user that you created: TheSnapshotUser.
- Select Add inline policy.
- Open the JSON tab.
- Copy and paste the following JSON, replacing S3_BUCKET_NAME with the correct value (in two places):

{
  "Version": "2012-10-17",
  "Statement": [{
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3_BUCKET_NAME"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3_BUCKET_NAME/*"
      ]
    }
  ]
}

- Select Review policy.
- Name the policy: TheSnapshotUserS3Policy.
- Select Create policy.
Your AWS S3 bucket is set up, along with an IAM role, policy, and user to access it. The next step is to take a snapshot of your current ES data.
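If you are scripting Part 1 with boto3, the user can be created the same way. A sketch, assuming the managed policy ARN printed in the previous step (the ACCOUNT_ID below is a placeholder) and the same inline S3 policy document used for the role:

import boto3, json

policy_arn = 'arn:aws:iam::ACCOUNT_ID:policy/TheSnapshotPolicy'  # placeholder: use the ARN from the previous step
s3_bucket_name = 'S3_BUCKET_NAME'

iam = boto3.client('iam')
iam.create_user(UserName='TheSnapshotUser')
iam.attach_user_policy(UserName='TheSnapshotUser', PolicyArn=policy_arn)

# same inline S3 policy document as TheSnapshotS3Policy on the role
s3_policy = {"Version": "2012-10-17",
             "Statement": [{"Action": ["s3:ListBucket"],
                            "Effect": "Allow",
                            "Resource": ["arn:aws:s3:::" + s3_bucket_name]},
                           {"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                            "Effect": "Allow",
                            "Resource": ["arn:aws:s3:::" + s3_bucket_name + "/*"]}]}
iam.put_user_policy(UserName='TheSnapshotUser',
                    PolicyName='TheSnapshotUserS3Policy',
                    PolicyDocument=json.dumps(s3_policy))

key = iam.create_access_key(UserName='TheSnapshotUser')['AccessKey']
print("ACCESS_KEY: " + key['AccessKeyId'])      # copy to your notes file
print("SECRET_KEY: " + key['SecretAccessKey'])  # copy to your notes file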
Part 2 - Take a snapshot of your Elasticsearch data
In this section we’ll take a snapshot to record the latest state of your Amazon OpenSearch Service indices.
1a. Register a snapshot repository using Postman
Before running a manual snapshot, you need to register a snapshot repository with your deployment. This requires sending a signed request to your AWS ES domain.
If your Amazon OpenSearch Service cluster can be accessed directly, you can execute a snapshot request manually by calling the Elasticsearch snapshot API. Postman is a great tool for managing and running API requests. We will use it here to simplify the signing of our AWS API requests.
- Create a new Postman request.
- Under the Authorization tab, in the TYPE drop-down box, select AWS Signature.
- Enter your ACCESS_KEY, SECRET_KEY, and ES_REGION (as the AWS Region), and enter es as the Service Name. Leave the Session Token field blank.
- Under the Body tab, select raw and set the format to JSON. Add the following payload, replacing the values for S3_REGION_NAME, S3_BUCKET_NAME, and ROLE_ARN:

{
  "type": "s3",
  "settings": {
    "region": "S3_REGION_NAME",
    "bucket": "S3_BUCKET_NAME",
    "role_arn": "ROLE_ARN"
  }
}

- Set the request type to PUT and enter the URL https://ES_ENDPOINT/_snapshot/SNAPSHOT_REPO, where:
  - ES_ENDPOINT is the Elasticsearch endpoint URL.
  - SNAPSHOT_REPO is a name of your choosing for the new repository.
- Select Send.
The snapshot repository for your S3 bucket will now be created.
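To verify that the repository was registered, you can send a GET request to the same URL with the same AWS Signature authorization; the response should echo back the repository settings:

GET https://ES_ENDPOINT/_snapshot/SNAPSHOT_REPO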
1b. Take a snapshot of your data
We will take a snapshot of your current ES data and store it in the newly registered repository.
- Using Postman, create a new request.
- Under the Authorization tab, in the TYPE drop-down box, select AWS Signature.
- Enter your ACCESS_KEY, SECRET_KEY, and ES_REGION (as the AWS Region), and enter es as the Service Name. Leave the Session Token field blank.
- Set the request type to PUT and enter the URL https://ES_ENDPOINT/_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME, where:
  - ES_ENDPOINT is the Elasticsearch endpoint URL.
  - SNAPSHOT_REPO is the name of the repository that you registered.
  - SNAPSHOT_NAME is the name of the snapshot to create. The snapshot name must be lower-case.
- Select Send.
The time required to take a snapshot depends on the size of the AWS ES domain. According to AWS documentation, long-running snapshot operations sometimes show a 504 GATEWAY_TIMEOUT error. That documentation suggests that you can ignore this error and just wait for the snapshot to complete successfully.
You can check the status of your snapshot by calling:
GET https://ES_ENDPOINT/_snapshot/SNAPSHOT_REPO/SNAPSHOT_NAME?pretty
Once you’ve taken a snapshot successfully, you can skip ahead to Part 3.
2a. Configure the Python AWS SDK
Before running a manual snapshot, you need to register a snapshot repository with your deployment. This requires sending a signed request to your Amazon OpenSearch Service cluster.
If your Elasticsearch cluster is in a VPC that you cannot access directly, you will need access to a host, such as EC2, that is within your VPC and that can execute the scripts that follow. In these steps and examples we use the Python AWS SDK, but you can use any language that has an AWS SDK (for example, Java, Ruby, Go, or others).
We'll install the Python AWS SDK using Python's package installer, pip (pip3). This requires Python version 3. If you don't have Python version 3 installed, you can get it by installing pip3: your operating system's package manager will install Python version 3 automatically, since it's a dependency of pip3. If you get stuck, refer to the Python installation documentation.
Install pip3
To install pip3 on Red Hat and derivatives, use yum:

$ sudo yum -y install python3-pip

Alternatively, some Fedora distributions label the pip3 package differently:

$ sudo yum -y install python36-pip

If neither of the previous package install commands works, you can search for the correct package name:

$ yum search pip

On Debian derivatives such as Ubuntu, use apt-get:

$ sudo apt-get -y install python3-pip
Install the Python AWS SDK
Once pip3 is installed, you can install the Python AWS SDK, named boto3:

$ pip3 install --user boto3 requests_aws4auth
Collecting boto3
...
Successfully installed boto3-1.9.106 requests-aws4auth-0.9
...

Note that root access is not needed if you specify the --user flag.
Create an ~/.aws directory to hold your AWS credentials. Run the following command to create the directory:
$ mkdir ~/.aws
Create a file called credentials with your favorite editor. We'll use nano for simplicity:
$ nano ~/.aws/credentials
Copy and paste the following contents into the file, replacing ACCESS_KEY and SECRET_KEY with the actual values:

[default]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY
Type Control+X to exit nano, and follow the prompts to save the file.
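As an optional sanity check, you can confirm that boto3 picks up the credentials file before writing the scripts. A minimal sketch:

import boto3

# boto3 reads ~/.aws/credentials automatically; this fails if the file is missing or malformed
creds = boto3.Session().get_credentials()
print("Loaded access key ID: " + creds.access_key)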
In the next steps, we’ll write a few Python scripts to perform the tasks we need.
Let’s run a quick test using a Python script to list the indices in our AWS ES cluster. This will ensure that our AWS credentials are working and prove that we can access the cluster.
Create a file called indices.py with your favorite editor. We'll use nano for simplicity:
$ nano indices.py
Copy and paste the following contents, replacing ES_ENDPOINT and ES_REGION with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'    # full endpoint URL, including the https:// scheme
region = 'ES_REGION'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Listing Indices from AWS ES ...")
req = requests.get(host + '/_cat/indices?v', auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)
Type Control+X to exit nano, and follow the prompts to save the file.
Run the Python script.
$ python3 indices.py
Your output should look similar to the following:
Listing Indices from AWS ES ...
HTTP Response Code: 200
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   testindex yME2BphgR3Gt1ln6n03nHQ   5   1          1            0      4.4kb          4.4kb
Now create a file called register.py with your favorite editor.
$ nano register.py
Copy and paste the following contents, replacing ES_ENDPOINT, ES_REGION, SNAPSHOT_REPO, SNAPSHOT_NAME, S3_REGION_NAME, S3_BUCKET_NAME, and ROLE_ARN with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'
s3_region_name = 'S3_REGION_NAME'
s3_bucket_name = 'S3_BUCKET_NAME'
role_arn = 'ROLE_ARN'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
headers = {"Content-Type": "application/json"}

payload = {
    "type": "s3",
    "settings": {
        "region": s3_region_name,
        "bucket": s3_bucket_name,
        "role_arn": role_arn
    }
}

print("Registering Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name
req = requests.put(url, auth=auth, json=payload, headers=headers)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)
Type Control+X to exit nano, and follow the prompts to save the file.
Run the Python script.
$ python3 register.py
Your output should look similar to the following:
Registering Snapshot with AWS ES ...
HTTP Response Code: 200
{"acknowledged":true}
Next, create a file called snapshot.py with your favorite editor.
$ nano snapshot.py
Copy and paste the following contents, replacing ES_ENDPOINT, ES_REGION, SNAPSHOT_REPO, and SNAPSHOT_NAME with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Starting Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name
req = requests.put(url, auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)
Type Control+X to exit nano, and follow the prompts to save the file.
Run the Python script.
$ python3 snapshot.py
Your output should look similar to the following:
Starting Snapshot with AWS ES ...
HTTP Response Code: 200
{"accepted":true}
The time required to take a snapshot depends on the size of the AWS ES domain. According to AWS documentation, long-running snapshot operations sometimes show a 504 GATEWAY_TIMEOUT error. That documentation suggests that you can ignore this error and just wait for the snapshot to complete successfully.
Finally, let's check the status of our snapshot. Create a file called status.py.
$ nano status.py
Copy and paste the following contents, replacing ES_ENDPOINT, ES_REGION, SNAPSHOT_REPO, and SNAPSHOT_NAME with your values:

import boto3, requests
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)

print("Getting Status of Snapshot with AWS ES ...")
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name + '?pretty'
req = requests.get(url, auth=auth)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)
Type Control+X to exit nano, and follow the prompts to save the file.
Run the Python script.
$ python3 status.py
Your output should look similar to the following:
Getting Status of Snapshot with AWS ES ...
HTTP Response Code: 200
{
  "snapshots" : [
    {
      "snapshot" : "my-snapshot",
      "uuid" : "ClYKt5g8QFO6r3kTCEzjqw",
      "version_id" : 6040299,
      "version" : "6.4.2",
      "indices" : [
        "testindex"
      ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2019-03-03T14:46:04.094Z",
      "start_time_in_millis" : 1551624364094,
      "end_time" : "2019-03-03T14:46:04.847Z",
      "end_time_in_millis" : 1551624364847,
      "duration_in_millis" : 753,
      "failures" : [ ],
      "shards" : {
        "total" : 5,
        "failed" : 0,
        "successful" : 5
      }
    }
  ]
}
If you get "state":"SUCCESS", then you have successfully taken a snapshot to S3 and are ready for Part 3!
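For a larger domain, you may prefer to poll rather than re-run status.py by hand. A small variation on the same script (a sketch; the variables are as in status.py) that checks every 30 seconds until the snapshot leaves the IN_PROGRESS state:

import boto3, requests, time
from requests_aws4auth import AWS4Auth

host = 'ES_ENDPOINT'
region = 'ES_REGION'
repo_name = 'SNAPSHOT_REPO'
snapshot_name = 'SNAPSHOT_NAME'

creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, region, 'es', session_token=creds.token)
url = host + '/_snapshot/' + repo_name + '/' + snapshot_name

while True:
    state = requests.get(url, auth=auth).json()['snapshots'][0]['state']
    print("Snapshot state: " + state)
    if state != 'IN_PROGRESS':
        break  # SUCCESS, PARTIAL, or FAILED
    time.sleep(30)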
Part 3 - Restore your snapshot to a new deployment
1. Create a deployment in Elasticsearch Service

Navigate to Elastic Cloud and register for an account to gain access to the 14-day free trial. Once you've logged in, follow the instructions to create a deployment on AWS, Google Cloud, or Microsoft Azure.
For detailed instructions and descriptions of all of the options, check Create your deployment.
2. Add your secrets to the keystore
Once your deployment is ready, store your ACCESS_KEY and SECRET_KEY in the keystore.
- Navigate to the Security page of your new deployment.
- Locate Elasticsearch keystore and select Add settings.
- With Type set to Single string, add the following keys and their values:
  - s3.client.default.access_key
  - s3.client.default.secret_key
3. Register your snapshot repository in your new deployment
To follow this step, your deployment must be at Elastic Stack version 7.2 or higher. If you are using an earlier deployment version, check our more detailed instructions for configuring a snapshot repository using AWS.
- In your same deployment in Elasticsearch Service, open Kibana and go to Management > Snapshot and Restore.
- On the Repositories tab, select Register a repository.
- Provide a name for your repository and select type AWS S3.
- Provide the following settings:
  - Client: default
  - Bucket: YOUR_S3_BUCKET_NAME
- Add any other settings that you wish to configure.
- Select Register.
- Select Verify to confirm that your settings are correct and the deployment can connect to your repository.
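If you prefer the Elasticsearch API over the Kibana UI, the same repository can be registered with a single PUT request to your deployment. A sketch in Python; the deployment URL, password, and repository name below are placeholders for your own values:

import requests

deployment_url = 'https://YOUR_DEPLOYMENT_URL:9243'  # placeholder: your deployment's Elasticsearch endpoint
auth = ('elastic', 'PASSWORD')                       # placeholder: your deployment credentials

payload = {
    "type": "s3",
    "settings": {
        "client": "default",        # matches the s3.client.default.* keystore keys
        "bucket": "S3_BUCKET_NAME"  # your snapshot bucket
    }
}

req = requests.put(deployment_url + '/_snapshot/my-aws-repo', auth=auth, json=payload)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)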
4. Restore from your new snapshot repository
Still on the Snapshot and Restore page in Kibana:
- Open the Snapshots tab.
- Search for the snapshot that you created earlier.
- Select Restore.
- Select the indices you wish to restore.
- Configure any other relevant settings.
- Select Restore snapshot to begin the process.
The time required to restore from a snapshot varies based on the size of your data.
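The restore can likewise be triggered through the snapshot restore API instead of the Kibana UI. A minimal sketch reusing the placeholder endpoint and credentials from the registration example; replace the index list with the indices from your snapshot:

import requests

deployment_url = 'https://YOUR_DEPLOYMENT_URL:9243'  # placeholder
auth = ('elastic', 'PASSWORD')                       # placeholder

payload = {"indices": "testindex"}  # comma-separated list of indices to restore

req = requests.post(deployment_url + '/_snapshot/my-aws-repo/my-snapshot/_restore',
                    auth=auth, json=payload)
print("HTTP Response Code: " + str(req.status_code) + '\n' + req.text)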
5. Explore Elasticsearch Service
Now that you are up and running with your own data, explore the power of the latest version of the Elastic Stack by trying: SIEM, Lens, Machine Learning, APM, Maps, Index Lifecycle Management, Snapshot Lifecycle Management, Logs, Metrics, Monitoring, Canvas, Uptime, and more!