While Elasticsearch requires very little configuration, a number of settings must be configured manually, and they should be set before going into production.
If you are using the .tar.gz archives, the data and logs directories are sub-folders of $ES_HOME. If these important folders are left in their default locations, there is a high risk of them being deleted while upgrading Elasticsearch to a new version.
In production use, you will almost certainly want to change the locations of the data and logs folders:
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
The RPM and Debian distributions already use custom paths for data and logs.
The path.data setting can be set to multiple paths, in which case all paths will be used to store data (although the files belonging to a single shard will all be stored on the same data path):
path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3
A node can only join a cluster when it shares its cluster.name with all the other nodes in the cluster. The default name is elasticsearch, but you should change it to an appropriate name that describes the purpose of the cluster.
Make sure that you don’t reuse the same cluster names in different environments; otherwise, you might end up with nodes joining the wrong cluster.
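As a minimal sketch, the cluster name is set in each node's elasticsearch.yml. The name logging-prod below is a hypothetical example of a descriptive name, not a required value:

```yaml
# elasticsearch.yml
# Hypothetical descriptive cluster name; every node that should join
# this cluster must use exactly the same value.
cluster.name: logging-prod
```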
By default, Elasticsearch uses the first seven characters of the node's randomly generated UUID (its node id) as the node name. The node id is persisted and does not change when the node restarts, so the default node name does not change either.
It is still worth configuring a more meaningful name with node.name, which will also persist after the node restarts.
The node.name setting can also be set to the server’s HOSTNAME environment variable.
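A sketch of both options in elasticsearch.yml. The name prod-data-2 is hypothetical, and the ${HOSTNAME} form assumes that the HOSTNAME environment variable is actually set (exported) in the environment Elasticsearch is started from:

```yaml
# elasticsearch.yml
# Option 1: a fixed, descriptive node name (hypothetical value).
node.name: prod-data-2

# Option 2: take the name from the HOSTNAME environment variable instead.
# node.name: ${HOSTNAME}
```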
It is vitally important to the health of your node that none of the JVM is ever swapped out to disk. One way of achieving that is to set the bootstrap.memory_lock setting to true.
For this setting to have effect, other system settings need to be configured first. See Enable bootstrap.memory_lock for more details about how to set up memory locking correctly.
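In elasticsearch.yml this is a single line, sketched below. On its own it does not lock memory; the system settings referred to above must also be in place:

```yaml
# elasticsearch.yml
# Ask Elasticsearch to lock the JVM's memory so it is never swapped to disk.
# Only effective once the required OS-level settings are configured.
bootstrap.memory_lock: true
```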
By default, Elasticsearch binds to loopback addresses only (for example, 127.0.0.1 and [::1]). This is sufficient to run a single development node on a server.
In fact, more than one node can be started from the same $ES_HOME location on a single server. This can be useful for testing Elasticsearch’s ability to form clusters, but it is not a configuration recommended for production.
In order to communicate and form a cluster with nodes on other servers, your node will need to bind to a non-loopback address. While there are many network settings, usually all you need to configure is network.host.
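A minimal sketch in elasticsearch.yml, assuming the server's non-loopback address is 192.168.1.10. This is a hypothetical value; substitute the address of the interface your node should bind to:

```yaml
# elasticsearch.yml
# Bind to a non-loopback address (hypothetical) so nodes on other
# servers can reach this node.
network.host: 192.168.1.10
```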
Out of the box, without any network configuration, Elasticsearch will bind to the available loopback addresses and will scan ports 9300 to 9305 to try to connect to other nodes running on the same server. This provides an auto-clustering experience without having to do any configuration.
When the time comes to form a cluster with nodes on other servers, you have to provide a seed list of other nodes in the cluster that are likely to be live and contactable. This list is specified with the discovery.zen.ping.unicast.hosts setting.
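A sketch of a seed list in elasticsearch.yml, using the Zen discovery setting discovery.zen.ping.unicast.hosts. All three entries are hypothetical. They show an address with an explicit port, an address without one, and a hostname:

```yaml
# elasticsearch.yml
# Seed list of other nodes that are likely to be live and contactable
# (hypothetical addresses and hostname).
discovery.zen.ping.unicast.hosts:
  - 192.168.1.10:9300   # explicit transport port
  - 192.168.1.11        # no port given, so the default transport port is used
  - seeds.example.com   # hostname; every address it resolves to is tried
```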
If no port is specified, it defaults to transport.profiles.default.port and falls back to transport.tcp.port.
If a hostname resolves to multiple IP addresses, all of the resolved addresses are tried.
To prevent data loss, it is vital to configure the
discovery.zen.minimum_master_nodes setting so that each master-eligible node
knows the minimum number of master-eligible nodes that must be visible in
order to form a cluster.
Without this setting, a cluster that suffers a network failure is at risk of splitting into two independent clusters (a split brain), which will lead to data loss. A more detailed explanation is provided in Avoiding split brain with minimum_master_nodes.
To avoid a split brain, this setting should be set to a quorum of master-eligible nodes:
(master_eligible_nodes / 2) + 1
In other words, if there are three master-eligible nodes, then minimum_master_nodes should be set to (3 / 2) + 1, which is 2 (the division rounds down to a whole number).
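A sketch of the resulting setting for three master-eligible nodes in elasticsearch.yml. The five-node value in the final comment is an extra illustration computed with the same formula. It is not a figure from the text above:

```yaml
# elasticsearch.yml, set on every master-eligible node
# Three master-eligible nodes: (3 / 2) + 1 = 1 + 1 = 2 (division rounds down).
discovery.zen.minimum_master_nodes: 2

# With five master-eligible nodes the quorum would be (5 / 2) + 1 = 2 + 1 = 3:
# discovery.zen.minimum_master_nodes: 3
```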