S3 repository settings
You can use AWS S3 as a repository for Snapshot and restore. This page lists the settings you can use to configure how Elasticsearch connects to S3 and how it stores data in your repository.
There are two categories of settings:
- Client settings control how Elasticsearch connects to S3, including authentication credentials, endpoints, proxies, and timeouts. These are defined per client in elasticsearch.yml or the Elasticsearch keystore, and can be shared across multiple repositories.
- Repository settings control per-repository behavior such as the target bucket, chunk size, throttling, and encryption. These are specified when creating or updating a repository via the API.
For step-by-step setup instructions, refer to S3 repository.
The S3 client used to connect to S3 has a number of available settings. The settings have the form s3.client.CLIENT_NAME.SETTING_NAME. By default, s3 repositories use a client named default, but this can be modified using the repository setting client. For example, to use an S3 client named my-alternate-client, register the repository as follows:
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "client": "my-alternate-client"
  }
}
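Every client setting is namespaced by client name, so a repository reads the settings of whichever client it names. The following Python sketch builds the registration body shown above and the setting keys the selected client would read. It uses hypothetical helpers written for illustration; they are not part of Elasticsearch or any of its client libraries.

```python
import json
from typing import Optional

# Hypothetical helpers for illustration only; not part of any Elasticsearch library.


def client_setting_key(client_name: str, setting_name: str) -> str:
    """Return a key of the form s3.client.CLIENT_NAME.SETTING_NAME."""
    return f"s3.client.{client_name}.{setting_name}"


def s3_repository_body(bucket: str, client: Optional[str] = None) -> dict:
    """Build the JSON body for PUT _snapshot/<repository>."""
    settings = {"bucket": bucket}
    if client is not None:
        # Omit "client" when not given, so the repository uses the client named "default".
        settings["client"] = client
    return {"type": "s3", "settings": settings}


def client_name_for(body: dict) -> str:
    """Repositories use the client named "default" unless "client" is set."""
    return body["settings"].get("client", "default")


if __name__ == "__main__":
    body = s3_repository_body("my-bucket", client="my-alternate-client")
    print(json.dumps(body, indent=2))
    print(client_setting_key(client_name_for(body), "endpoint"))
```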
Most client settings can be added to the elasticsearch.yml configuration file with the exception of the secure settings, which you add to the Elasticsearch keystore. For more information about creating and updating the Elasticsearch keystore, refer to Secure settings.
For example, if you want to use specific credentials to access S3, then run the following commands to add these credentials to the keystore.
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
# a session token is optional so the following command may not be needed
bin/elasticsearch-keystore add s3.client.default.session_token
If you do not configure these settings then Elasticsearch will attempt to automatically obtain credentials from the environment in which it is running:
- Nodes running on an instance in AWS EC2 will attempt to use the EC2 Instance Metadata Service (IMDS) to obtain instance role credentials. Elasticsearch supports IMDS version 2 only.
- Nodes running in a container in AWS ECS and AWS EKS will attempt to obtain container role credentials similarly.
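The fallback described above can be summarized in a small Python model, together with the pairing rules stated for the access_key, secret_key and session_token client settings. This sketches the documented rules only; it is not Elasticsearch's implementation. The keystore is modeled as a plain dictionary, and the returned labels ("static", "session", "environment") are hypothetical names.

```python
from typing import Mapping


def credential_source(keystore: Mapping[str, str], client: str = "default") -> str:
    """Classify which credentials a client would use (illustrative model only).

    Returns "static" (access key + secret key), "session" (access key + secret
    key + session token) or "environment" (EC2 instance role via IMDSv2, or an
    ECS/EKS container role). Raises ValueError for incomplete combinations.
    """
    prefix = f"s3.client.{client}."
    access_key = keystore.get(prefix + "access_key")
    secret_key = keystore.get(prefix + "secret_key")
    session_token = keystore.get(prefix + "session_token")

    if access_key is None and secret_key is None:
        if session_token is not None:
            raise ValueError("session_token requires access_key and secret_key")
        # No explicit credentials: fall back to the instance or container role.
        return "environment"
    if access_key is None or secret_key is None:
        raise ValueError("access_key and secret_key must be set together")
    return "session" if session_token is not None else "static"
```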
You can switch from using specific credentials back to the default of using the instance role or container role by removing these settings from the keystore as follows:
bin/elasticsearch-keystore remove s3.client.default.access_key
bin/elasticsearch-keystore remove s3.client.default.secret_key
# a session token is optional so the following command may not be needed
bin/elasticsearch-keystore remove s3.client.default.session_token
Define the relevant secure settings in each node's keystore before starting the node. The secure settings described here are all reloadable, so you can update the keystore contents on each node while it is running and then call the Nodes reload secure settings API to apply the updated settings to the nodes in the cluster. After this API completes, Elasticsearch uses the updated setting values for all future snapshot operations, but ongoing operations may continue to use the older values.
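After you update the keystore on every node, you trigger the reload with a POST request to the _nodes/reload_secure_settings endpoint. The following Python sketch builds that request with the standard library. It assumes, for illustration, a node reachable at http://localhost:9200 without authentication. Secured clusters also need TLS settings and credentials. The request is only constructed here, not sent.

```python
import urllib.request


def reload_secure_settings_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a Nodes reload secure settings API request.

    Assumption: the keystore is not password-protected, so no request body is sent.
    """
    url = base_url.rstrip("/") + "/_nodes/reload_secure_settings"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    request = reload_secure_settings_request()
    print(request.get_method(), request.full_url)
    # To send it against a live cluster: urllib.request.urlopen(request)
```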
The following list contains the available S3 client settings. Those that must be stored in the keystore are marked as "secure" and are reloadable; the other settings belong in the elasticsearch.yml file.
s3.client.CLIENT_NAME.region
Determines the region to use to sign requests made to the service. Also determines the regional endpoint to which Elasticsearch sends its requests, unless you specify a particular endpoint using the endpoint setting. If not set, Elasticsearch will attempt to determine the region automatically using the AWS SDK. Elasticsearch must use the correct region to sign requests because this value is required by the S3 request-signing process.
If you are using an S3-compatible service, then it is unlikely the AWS SDK will be able to determine the correct region name automatically, so you must set it manually. Your service's region name is under the control of your service administrator and need not refer to a real AWS region, but the value to which you configure this setting must match the region name your service expects.
s3.client.CLIENT_NAME.access_key (Secure, reloadable)
An S3 access key. If set, the secret_key setting must also be specified. If unset, the client will use the instance or container role instead.
s3.client.CLIENT_NAME.secret_key (Secure, reloadable)
An S3 secret key. If set, the access_key setting must also be specified.
s3.client.CLIENT_NAME.session_token (Secure, reloadable)
An S3 session token. If set, the access_key and secret_key settings must also be specified.
s3.client.CLIENT_NAME.endpoint
The S3 service endpoint to connect to. This defaults to the regional endpoint corresponding to the configured region, but the AWS documentation lists alternative S3 endpoints. If you are using an S3-compatible service, then you should set this to the service's endpoint. The endpoint should specify the protocol and host name, for example https://s3.ap-southeast-4.amazonaws.com or http://minio.local:9000.
When using HTTPS, this repository type validates the repository's certificate chain using the JVM-wide truststore. Ensure that the root certificate authority is in this truststore using the JVM's keytool tool. If you have a custom certificate authority for your S3 repository and you use the Elasticsearch bundled JDK, then you will need to reinstall your CA certificate every time you upgrade Elasticsearch.
s3.client.CLIENT_NAME.protocol
The protocol scheme to use to connect to S3, if endpoint is set to an incomplete URL which does not specify the scheme. Valid values are either http or https. Defaults to https. Avoid using this setting. Instead, set the endpoint setting to a fully-qualified URL that starts with either http:// or https://.
s3.client.CLIENT_NAME.proxy.host
The host name of a proxy to connect to S3 through.
s3.client.CLIENT_NAME.proxy.port
The port of a proxy to connect to S3 through.
s3.client.CLIENT_NAME.proxy.scheme
The scheme to use for the proxy connection to S3. Valid values are either http or https. Defaults to http. This setting specifies the protocol used to communicate with the proxy server.
s3.client.CLIENT_NAME.proxy.username (Secure, reloadable)
The username to connect to the proxy.host with.
s3.client.CLIENT_NAME.proxy.password (Secure, reloadable)
The password to connect to the proxy.host with.
s3.client.CLIENT_NAME.read_timeout
(time value) The maximum time Elasticsearch will wait to receive the next byte of data over an established, open connection to the repository before it closes the connection. The default value is 50 seconds.
s3.client.CLIENT_NAME.max_connections
The maximum number of concurrent connections to S3. The default value is 50.
s3.client.CLIENT_NAME.max_retries
The number of retries to use when an S3 request fails. The default value is 3.
s3.client.CLIENT_NAME.connection_max_idle_time
(time value) The timeout after which Elasticsearch will close an idle connection. The default value is 60 seconds.
s3.client.CLIENT_NAME.path_style_access
Whether to force the use of the path style access pattern. If true, the path style access pattern will be used. If false, the access pattern will be automatically determined by the AWS Java SDK (see the AWS documentation for details). Defaults to false.
In versions 7.0, 7.1, 7.2 and 7.3 all bucket operations used the now-deprecated path style access pattern. If your deployment requires the path style access pattern then you should set this setting to true when upgrading.
s3.client.CLIENT_NAME.disable_chunked_encoding
Whether chunked encoding should be disabled or not. If false, chunked encoding is enabled and will be used where appropriate. If true, chunked encoding is disabled and will not be used, which may mean that snapshot operations consume more resources and take longer to complete. It should only be set to true if you are using a storage service that does not support chunked encoding. See the AWS Java SDK documentation for details. Defaults to false.
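The following Python sketch illustrates how the endpoint, protocol and path_style_access client settings interact. It models the documented rule that protocol applies only when endpoint lacks a scheme. It also shows the two S3 addressing styles: path style (scheme://host/bucket/key) and virtual-hosted style (scheme://bucket.host/key). When path_style_access is false the AWS SDK chooses the style itself, so treating false as always virtual-hosted is a simplification. The function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical helpers for illustration; the AWS SDK's real URL logic is more involved.


def effective_endpoint(endpoint: str, protocol: str = "https") -> str:
    """Prefix the protocol only when the endpoint has no scheme of its own."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be http or https")
    # Check for "://" instead of relying on urlsplit(), whose handling of inputs
    # like "minio.local:9000" (no scheme, but a colon) has varied across Python versions.
    if "://" in endpoint:
        return endpoint.rstrip("/")
    return f"{protocol}://{endpoint.rstrip('/')}"


def object_url(endpoint: str, bucket: str, key: str,
               path_style: bool, protocol: str = "https") -> str:
    """Build an object URL in path style or virtual-hosted style."""
    parts = urlsplit(effective_endpoint(endpoint, protocol))
    if path_style:
        # Path style keeps the bucket name in the URL path.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style puts the bucket name in the host name.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"
```

Path style is often the practical choice for S3-compatible services such as a local MinIO host, where per-bucket host names may not resolve.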
The s3 repository type supports a number of settings to customize how data is stored in S3. These can be specified when creating the repository. For example:
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "another_setting": "setting-value"
  }
}
The following settings are supported:
bucket
(Required) Name of the S3 bucket to use for snapshots.
The bucket name must adhere to Amazon's S3 bucket naming rules.
client
The name of the S3 client to use to connect to S3. Defaults to default.
base_path
Specifies the path to the repository data within its bucket. Defaults to an empty string, meaning that the repository is at the root of the bucket. The value of this setting should not start or end with a /.
Note: Don't set base_path when configuring a snapshot repository for Elastic Cloud Enterprise. Elastic Cloud Enterprise automatically generates the base_path for each deployment so that multiple deployments may share the same bucket.
chunk_size
(byte value) The maximum size of object that Elasticsearch will write to the repository when creating a snapshot. Files which are larger than chunk_size will be chunked into several smaller objects. Elasticsearch may also split a file across multiple objects to satisfy other constraints such as the max_multipart_parts limit. Defaults to 5TB, which is the maximum size of an object in AWS S3.
compress
When set to true, metadata files are stored in compressed format. This setting doesn't affect index files that are already compressed by default. Defaults to true.
max_restore_bytes_per_sec
(Optional, byte value) Maximum snapshot restore rate per node. Defaults to unlimited. Note that restores are also throttled through recovery settings.
max_snapshot_bytes_per_sec
(Optional, byte value) Maximum snapshot creation rate per node. Defaults to 40mb per second. Note that if the recovery settings for managed services are set, then it defaults to unlimited, and the rate is additionally throttled through recovery settings.
readonly
(Optional, Boolean) If true, the repository is read-only. The cluster can retrieve and restore snapshots from the repository but not write to the repository or create snapshots in it.
Only a cluster with write access can create snapshots in the repository. All other clusters connected to the repository should have the readonly parameter set to true.
If false, the cluster can write to the repository and create snapshots in it. Defaults to false.
Important: If you register the same snapshot repository with multiple clusters, only one cluster should have write access to the repository. Having multiple clusters write to the repository at the same time risks corrupting the contents of the repository.
server_side_encryption
When set to true, files are encrypted on the server side using the AES256 algorithm. Defaults to false.
buffer_size
(byte value) Minimum threshold below which the chunk is uploaded using a single request. Beyond this threshold, the S3 repository will use the AWS Multipart Upload API to split the chunk into several parts, each of buffer_size length, and to upload each part in its own request. Note that setting a buffer size lower than 5mb is not allowed since it will prevent the use of the Multipart API and may result in upload errors. It is also not possible to set a buffer size greater than 5gb as it is the maximum upload size allowed by S3. Defaults to 100mb or 5% of JVM heap, whichever is smaller.
max_multipart_parts
(integer) The maximum number of parts that Elasticsearch will write during a multipart upload of a single object. Files which are larger than buffer_size × max_multipart_parts will be chunked into several smaller objects. Elasticsearch may also split a file across multiple objects to satisfy other constraints such as the chunk_size limit. Defaults to 10000, which is the maximum number of parts in a multipart upload in AWS S3.
canned_acl
The S3 repository supports all S3 canned ACLs: private, public-read, public-read-write, authenticated-read, log-delivery-write, bucket-owner-read, bucket-owner-full-control. Defaults to private. You can specify a canned ACL using the canned_acl setting. When the S3 repository creates buckets and objects, it adds the canned ACL into the buckets and objects.
storage_class
Sets the S3 storage class for objects written to the repository. Values may be standard, reduced_redundancy, standard_ia, onezone_ia and intelligent_tiering. Defaults to standard. Refer to S3 storage classes for more information.
delete_objects_max_size
(integer) Sets the maximum batch size, between 1 and 1000, used for DeleteObjects requests. Defaults to 1000, which is the maximum number supported by the AWS DeleteObjects API.
max_multipart_upload_cleanup_size
(integer) Sets the maximum number of possibly-dangling multipart uploads to clean up in each batch of snapshot deletions. Defaults to 1000, which is the maximum number supported by the AWS ListMultipartUploads API. If set to 0, Elasticsearch will not attempt to clean up dangling multipart uploads.
throttled_delete_retry.delay_increment
(time value) This value is used as the delay before the first retry and the amount the delay is incremented by on each subsequent retry. Default is 50ms, minimum is 0ms.
throttled_delete_retry.maximum_delay
(time value) This is the upper bound on how long the delays between retries will grow to. Default is 5s, minimum is 0ms.
throttled_delete_retry.maximum_number_of_retries
(integer) Sets the number of times to retry a throttled snapshot deletion. Defaults to 10. The minimum value is 0, which disables retries altogether. Note that if retries are enabled in the S3 client (s3.client.CLIENT_NAME.max_retries), each of these retries comprises that many client-level retries.
get_register_retry_delay
(time value) Sets the time to wait before trying again if an attempt to read a linearizable register fails. Defaults to 5s.
unsafely_incompatible_with_s3_conditional_writes
(boolean) Elasticsearch uses AWS S3's support for conditional writes to protect against repository corruption. If your repository is based on a storage system which claims to be S3-compatible but does not accept conditional writes, set this setting to true to make Elasticsearch perform unconditional writes, bypassing the repository corruption protection, while you work with your storage supplier to address this incompatibility with AWS S3. Defaults to false.
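The buffer_size, chunk_size and max_multipart_parts settings together bound how large a single object can get and how it is uploaded. The Python sketch below works through that arithmetic under simplifying assumptions:

- Byte values use binary units (1mb = 1024 × 1024 bytes).
- A file is split into objects no larger than min(chunk_size, buffer_size × max_multipart_parts).
- An object no larger than buffer_size is uploaded in one request.

Elasticsearch's actual splitting may differ, and the helper names are hypothetical.

```python
from typing import List

KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB

# Hypothetical helpers modeling the documented limits; not Elasticsearch code.


def default_buffer_size(heap_bytes: int) -> int:
    """Documented default: 100mb or 5% of the JVM heap, whichever is smaller."""
    return min(100 * MB, heap_bytes * 5 // 100)


def check_buffer_size(buffer_size: int) -> None:
    """Reject values outside the documented 5mb..5gb range."""
    if not 5 * MB <= buffer_size <= 5 * GB:
        raise ValueError("buffer_size must be between 5mb and 5gb")


def max_object_size(chunk_size: int = 5 * TB, buffer_size: int = 100 * MB,
                    max_multipart_parts: int = 10_000) -> int:
    """Largest object both limits allow (simplified model)."""
    return min(chunk_size, buffer_size * max_multipart_parts)


def split_into_objects(file_size: int, chunk_size: int = 5 * TB,
                       buffer_size: int = 100 * MB,
                       max_multipart_parts: int = 10_000) -> List[int]:
    """Sizes of the objects a file of file_size bytes would be written as."""
    check_buffer_size(buffer_size)
    limit = max_object_size(chunk_size, buffer_size, max_multipart_parts)
    full, rest = divmod(file_size, limit)
    return [limit] * full + ([rest] if rest else [])


def upload_requests(object_size: int, buffer_size: int = 100 * MB) -> int:
    """1 for a single-request upload, otherwise the number of multipart parts."""
    # Assumption: an object exactly equal to buffer_size still uses one request.
    if object_size <= buffer_size:
        return 1
    # Integer ceiling division avoids float rounding on very large sizes.
    return (object_size + buffer_size - 1) // buffer_size
```

With the defaults, buffer_size × max_multipart_parts is 100mb × 10000, or about 976.6gb. That is smaller than the 5TB chunk_size, so in this model the multipart limit, not chunk_size, is what caps object size.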
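The delete_objects_max_size setting caps how many keys go into each DeleteObjects request. The Python sketch below partitions a list of keys under the assumption that deletions are simply split into consecutive batches. Elasticsearch's actual grouping of keys is not described on this page, and the function name is hypothetical.

```python
from typing import List, Sequence


def delete_batches(keys: Sequence[str], delete_objects_max_size: int = 1000) -> List[List[str]]:
    """Split keys into consecutive batches of at most delete_objects_max_size.

    Hypothetical helper; each inner list stands for one DeleteObjects request.
    """
    if not 1 <= delete_objects_max_size <= 1000:
        raise ValueError("delete_objects_max_size must be between 1 and 1000")
    return [list(keys[i:i + delete_objects_max_size])
            for i in range(0, len(keys), delete_objects_max_size)]
```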
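Read literally, the throttled_delete_retry descriptions mean the delay grows linearly by delay_increment on each retry and is capped at maximum_delay. The Python sketch below computes the resulting schedule under that reading. This page does not say whether Elasticsearch adds jitter or other adjustments, so treat the numbers as an approximation. The function name is hypothetical.

```python
from typing import List


def throttled_delete_retry_delays(delay_increment_ms: int = 50,
                                  maximum_delay_ms: int = 5000,
                                  maximum_number_of_retries: int = 10) -> List[int]:
    """Delay in milliseconds before each retry of a throttled snapshot deletion.

    Hypothetical model of the documented settings; defaults mirror the documented ones.
    """
    if min(delay_increment_ms, maximum_delay_ms, maximum_number_of_retries) < 0:
        raise ValueError("settings must not be negative")
    # Retry n (1-based) waits n * delay_increment, never more than maximum_delay.
    return [min(n * delay_increment_ms, maximum_delay_ms)
            for n in range(1, maximum_number_of_retries + 1)]
```

With the defaults, this gives delays of 50, 100, ..., 500 ms, or 2.75 s of waiting in total across the ten retries. With delay_increment set to 1s, the 5s cap is reached at the fifth retry.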
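You can also specify non-secure client settings in the repository settings, in which case they take precedence over the settings of the named client. This approach is deprecated; define client settings in elasticsearch.yml or the keystore instead.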
For example, the following overrides the endpoint from the named client:
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "client": "my-client",
    "bucket": "my-bucket",
    "endpoint": "https://my.s3.endpoint"
  }
}
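The override in this example follows a simple precedence rule. A client setting given in the repository settings wins over the same setting on the named client, and every other client setting is kept. The Python sketch below models that rule with plain dictionaries. The set of overridable names is a hypothetical subset of the non-secure client settings. Secure settings are deliberately excluded because they belong in the keystore. The sketch only models precedence and is not a recommendation to configure clients this way.

```python
from typing import Dict, Mapping

# Hypothetical subset of non-secure client settings, for illustration only.
OVERRIDABLE_CLIENT_SETTINGS = frozenset({
    "endpoint", "protocol", "region", "proxy.host", "proxy.port",
    "read_timeout", "max_retries", "path_style_access",
})


def effective_client_settings(client_settings: Mapping[str, str],
                              repository_settings: Mapping[str, str]) -> Dict[str, str]:
    """Merge a named client's settings with overrides from the repository settings."""
    merged = dict(client_settings)  # copy, so the client's own settings are not mutated
    for name, value in repository_settings.items():
        # Repository-only settings such as bucket or client are not client settings.
        if name in OVERRIDABLE_CLIENT_SETTINGS:
            merged[name] = value
    return merged
```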