Back up a cluster’s data

As with any software that stores data, it is important to routinely back up your data. Elasticsearch replicas provide high availability during runtime; they enable you to tolerate sporadic node loss without an interruption of service.

Replicas do not provide protection from catastrophic failure, however. For that, you need a real backup of your cluster—a complete copy in case something goes wrong.

To back up your cluster’s data, you can use the snapshot API.

A snapshot is a backup taken from a running Elasticsearch cluster. You can take a snapshot of individual indices or of the entire cluster and store it in a repository on a shared filesystem. Repository plugins add support for remote repositories on S3, HDFS, Azure, Google Cloud Storage, and more.
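As a sketch of the snapshot API, the following registers a shared-filesystem repository and then takes a snapshot. The repository name `my_backup`, the snapshot name `snapshot_1`, the index names, the mount path, and the `localhost:9200` address are placeholders; the path must also be listed under `path.repo` in `elasticsearch.yml` on every node.

```shell
# Register a shared-filesystem ("fs") snapshot repository named "my_backup".
# Assumes /mount/backups is mounted on every node and listed in path.repo.
curl -X PUT "localhost:9200/_snapshot/my_backup" \
  -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}'

# Take a snapshot of two specific indices; omit the "indices" body to
# snapshot the entire cluster. wait_for_completion=true blocks the call
# until the snapshot finishes, which is convenient in backup scripts.
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true" \
  -H 'Content-Type: application/json' -d'
{
  "indices": "index_1,index_2"
}'
```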

Snapshots are taken incrementally. This means that when it creates a snapshot of an index, Elasticsearch avoids copying any data that is already stored in the repository as part of an earlier snapshot of the same index. Therefore it can be efficient to take snapshots of your cluster quite frequently.
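Because snapshots are incremental, scheduling them frequently is cheap. One way to do that is with snapshot lifecycle management (SLM); a sketch, assuming a repository named `my_backup` already exists and the policy name and schedule are placeholders:

```shell
# Create an SLM policy that snapshots the whole cluster nightly at 01:30.
# Each run copies only data not already present in the repository.
curl -X PUT "localhost:9200/_slm/policy/nightly-snapshots" \
  -H 'Content-Type: application/json' -d'
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_backup",
  "config": { "include_global_state": true },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}'
```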

Tip

If Elasticsearch security features are enabled on your cluster, the snapshot API call must be authorized when you back up your data.

The snapshot_user role is a reserved role that can be assigned to the user who calls the snapshot endpoint. If the user’s only task is taking periodic snapshots as part of the backup procedure, this is the only role they need. It includes the privileges to list all existing snapshots (of any repository) and to list and view the settings of all indices, including the .security index. It does not grant privileges to create repositories, restore snapshots, or search within indices. The user can therefore view and snapshot all indices, but cannot access or modify any data.
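One way to set this up is to create a dedicated user that holds only the snapshot_user role; a sketch using the create-user security API, where the username `snapshot_runner`, the password, and the cluster address are placeholders:

```shell
# Create a user whose only role is the built-in snapshot_user role.
# Run as a user with sufficient privileges (e.g. the elastic superuser).
curl -X POST -u elastic "localhost:9200/_security/user/snapshot_runner" \
  -H 'Content-Type: application/json' -d'
{
  "password": "<long-random-password>",
  "roles": [ "snapshot_user" ]
}'

# The new user can now take snapshots, but cannot restore them or
# search the data they contain:
curl -X PUT -u snapshot_runner \
  "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"
```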

For more information, see Security privileges and Built-in roles.