Nodes orchestration

This section covers the following topics:

NodeSets overview
Cluster upgrade
Cluster upgrade patterns
StatefulSets orchestration
Limitations
Advanced control during rolling upgrades
Cluster rolling restarts
Restart allocation delay

NodeSets overview

NodeSets are used to specify the topology of the Elasticsearch cluster. Each NodeSet represents a group of Elasticsearch nodes that share the same Elasticsearch configuration and Kubernetes Pod configuration.

Tip

You can use YAML anchors to declare the configuration change once and reuse it across all the node sets.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 9.4.4
  nodeSets:
  - name: master-nodes
    count: 3
    config:
      node.roles: ["master"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard
  - name: data-nodes
    count: 10
    config:
      node.roles: ["data"]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1000Gi
        storageClassName: standard

In this example, the Elasticsearch resource defines two NodeSets:

master-nodes with 10Gi volumes
data-nodes with 1000Gi volumes

The Elasticsearch cluster is composed of 13 nodes: 3 master nodes and 10 data nodes.
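
The tip about YAML anchors can be illustrated with a minimal sketch. It assumes both NodeSets should share one podTemplate. The anchor name shared-pod-template and the 4Gi memory values are illustrative and not part of the example above, and the volume claims are omitted for brevity. Anchors are expanded by the YAML parser on the client side, so the stored resource contains the full podTemplate in both NodeSets.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 9.4.4
  nodeSets:
  - name: master-nodes
    count: 3
    config:
      node.roles: ["master"]
    # "&shared-pod-template" (hypothetical name) marks this block for reuse.
    podTemplate: &shared-pod-template
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi   # illustrative value
            limits:
              memory: 4Gi   # illustrative value
  - name: data-nodes
    count: 10
    config:
      node.roles: ["data"]
    # "*shared-pod-template" expands to the podTemplate defined above.
    podTemplate: *shared-pod-template
```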

Upgrading the cluster

ECK handles smooth upgrades from one cluster specification to another. You can apply a new Elasticsearch specification at any time. For example, based on the Elasticsearch specification described in the NodeSets overview, you can:

Add five additional Elasticsearch data nodes: In data-nodes, change the value in the count field from 10 to 15.
Increase the memory limit of data nodes to 32Gi: Set a different resource limit in the existing data-nodes NodeSet.
Replace dedicated master and dedicated data nodes with nodes having both master and data roles: Replace the two existing NodeSets with a single one that has a different name and the appropriate Elasticsearch configuration settings.
Upgrade Elasticsearch from version 7.2.0 to 7.3.0: Change the value in the version field.
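
As a sketch of the first two changes in this list, the data-nodes NodeSet could be edited as follows. Only the relevant part of the specification is shown. The comments mark what differs from the NodeSets overview example, and the volumeClaimTemplates are assumed to stay unchanged.

```yaml
spec:
  nodeSets:
  - name: data-nodes
    count: 15                   # previously 10: adds five data nodes
    config:
      node.roles: ["data"]
    podTemplate:
      spec:
        containers:
        - name: elasticsearch   # the main Elasticsearch container
          resources:
            limits:
              memory: 32Gi      # new memory limit for each data node
    # volumeClaimTemplates unchanged from the original example
```

Applying the modified resource is enough to start the change. ECK reconciles the difference between the old and the new specification.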

ECK orchestrates NodeSet changes with no downtime and makes sure that:

Before a node is removed, the relevant data is migrated to other nodes (with some limitations).
When a cluster topology changes, these Elasticsearch settings are adjusted accordingly:
- discovery.seed_hosts
- cluster.initial_master_nodes
- discovery.zen.minimum_master_nodes
- _cluster/voting_config_exclusions
Rolling upgrades are performed safely with existing PersistentVolumes reused where possible.

StatefulSets orchestration

Behind the scenes, ECK translates each NodeSet specified in the Elasticsearch resource into a StatefulSet in Kubernetes. The NodeSet specification is based on the StatefulSet specification:

count corresponds to the number of replicas in the StatefulSet. Each StatefulSet replica is a Pod, which corresponds to an Elasticsearch node.
podTemplate can be used to customize some aspects of the Elasticsearch Pods created by the underlying StatefulSet.
The StatefulSet name is derived from the Elasticsearch resource name and the NodeSet name. Each Pod in the StatefulSet gets a name generated by suffixing the pod ordinal to the StatefulSet name. Elasticsearch nodes have the same name as the Pod they are running on.

The actual Pod creation is handled by the StatefulSet controller in Kubernetes. ECK relies on the OnDelete StatefulSet update strategy since it needs full control over when and how Pods get upgraded to a new revision.
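
To make the mapping concrete, here is an abridged, illustrative view of the StatefulSet that could result from the data-nodes NodeSet of the quickstart example. It assumes the <cluster-name>-es-<nodeset-name> naming pattern commonly seen in ECK deployments. Most generated fields are omitted.

```yaml
# Illustrative only: ECK generates and manages this object itself.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: quickstart-es-data-nodes   # assumed pattern: <cluster-name>-es-<nodeset-name>
spec:
  replicas: 10                     # taken from the NodeSet count
  updateStrategy:
    type: OnDelete                 # Pods are replaced only after they are deleted
# Resulting Pods, which are also the Elasticsearch node names:
#   quickstart-es-data-nodes-0 ... quickstart-es-data-nodes-9
```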

When a Pod is removed and recreated (maybe with a newer revision), the StatefulSet controller makes sure that the PersistentVolumes attached to the original Pod are then attached to the new Pod.

Cluster upgrade patterns

Depending on how the NodeSets are updated, ECK handles the Kubernetes resource reconciliation in various ways.

A new NodeSet is added to the Elasticsearch resource: ECK creates the corresponding StatefulSet. It also sets up Secrets and ConfigMaps to hold the TLS certificates and Elasticsearch configuration files.
The node count of an existing NodeSet is increased: ECK increases the replicas of the corresponding StatefulSet.
The node count of an existing NodeSet is decreased: ECK migrates data away from the Elasticsearch nodes due to be removed and then decreases the replicas of the corresponding StatefulSet. PersistentVolumeClaims belonging to the removed nodes are automatically removed as well.
An existing NodeSet is removed: ECK migrates data away from the Elasticsearch nodes in the NodeSet and removes the underlying StatefulSet.
The specification of an existing NodeSet is updated (for example, the Elasticsearch configuration or the podTemplate resource requirements): ECK performs a rolling upgrade of the corresponding Elasticsearch nodes. It follows the Elasticsearch rolling upgrade best practices to update the underlying Pods while maintaining the availability of the Elasticsearch cluster where possible. In most cases, the process simply involves restarting Elasticsearch nodes one by one. Note that some cluster topologies may be impossible to deploy without making the cluster unavailable (check Limitations).
An existing NodeSet is renamed: ECK creates a new NodeSet with the new name, migrates data away from the old NodeSet, and then removes it. During this process the Elasticsearch cluster could temporarily have more nodes than normal. The Elasticsearch update strategy controls how many nodes can exist above or below the target node count during the upgrade.
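
The update strategy mentioned for renamed NodeSets is set on the Elasticsearch resource. The following minimal sketch uses the changeBudget fields of the ECK update strategy. The values are illustrative and should be chosen to fit the capacity of the Kubernetes cluster.

```yaml
spec:
  updateStrategy:
    changeBudget:
      maxSurge: 3        # at most 3 Pods above the target node count
      maxUnavailable: 1  # at most 1 Pod below the target node count
```

A lower maxSurge limits the extra resources needed while a NodeSet is being replaced, at the cost of a slower migration.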

In all these cases, ECK handles StatefulSet operations according to the Elasticsearch orchestration best practices by adjusting the following orchestration settings:

discovery.seed_hosts
cluster.initial_master_nodes
discovery.zen.minimum_master_nodes
_cluster/voting_config_exclusions

Limitations

Due to relying on Kubernetes primitives such as StatefulSets, the ECK orchestration process has some inherent limitations.

Cluster availability during rolling upgrades is not guaranteed for the following cases:
- Single-node clusters
- Clusters containing indices with no replicas

If an Elasticsearch node holds the only copy of a shard, this shard becomes unavailable while the node is upgraded. To ensure high availability, configure clusters with three master nodes, more than one node per data tier, and at least one replica per index.

Elasticsearch Pods may stay Pending during a rolling upgrade if the Kubernetes scheduler cannot reschedule them. This is especially important when using local PersistentVolumes. If the Kubernetes node bound to a local PersistentVolume does not have enough capacity to host an upgraded Pod that was temporarily removed, that Pod will stay Pending.
Rolling upgrades can only make progress if the Elasticsearch cluster health is green. There are exceptions to this rule if the cluster health is yellow and if the following conditions are satisfied:
- A cluster version upgrade is in progress and some Pods are not up to date.
- There are no initializing or relocating shards.

If these conditions are met, then ECK can delete a Pod for upgrade even if the cluster health is yellow, as long as the Pod is not holding the last available replica of a shard.

The health of the cluster is deliberately ignored in the following cases:

If all the Elasticsearch nodes of a NodeSet are unavailable, probably caused by a misconfiguration, the operator ignores the cluster health and upgrades nodes of the NodeSet.
If an Elasticsearch node to upgrade is not healthy, and not part of the Elasticsearch cluster, the operator ignores the cluster health and upgrades the Elasticsearch node.
Elasticsearch versions cannot be downgraded. For example, it is impossible to downgrade an existing cluster from version 7.3.0 to 7.2.0. This is not supported by Elasticsearch.

Advanced users may force an upgrade by manually deleting Pods themselves. The deleted Pods are automatically recreated at the latest revision.

Operations that reduce the number of nodes in the cluster cannot make progress without user intervention, if the Elasticsearch index replica settings are incompatible with the intended downscale. Specifically, if the Elasticsearch index settings demand a higher number of shard copies than data nodes in the cluster after the downscale operation, ECK cannot migrate the data away from the node about to be removed. You can address this in the following ways:

Adjust the Elasticsearch index settings to a number of replicas that allows the desired node removal.
Use auto_expand_replicas to automatically adjust the replicas to the number of data nodes in the cluster.

Advanced control during rolling upgrades

During Elasticsearch rolling upgrades, ECK follows a set of rules (also known as predicates) to ensure the upgrade process is safe and does not put the cluster at risk. For example, one of these predicates ensures that only a single master node is upgraded at a time, while another prevents nodes from being restarted if the cluster is in a red state.

These predicates can be selectively disabled for certain scenarios where the ECK operator will not proceed with an Elasticsearch cluster upgrade because it deems it to be "unsafe".

For a complete list of available predicates, their meaning, and example usage, refer to ECK upgrade predicates.

Warning

Selectively disabling the predicates is extremely risky and carries a high chance of either data loss or a completely unavailable cluster. Use this option only if you are sure that you are not causing permanent damage to an Elasticsearch cluster.
These predicates might change in the future. We will be adding, removing, and renaming them over time, so be careful when adding them to any automation.
Also, make sure you remove them after use by running kubectl annotate elasticsearch.elasticsearch.k8s.elastic.co/elasticsearch-sample eck.k8s.elastic.co/disable-upgrade-predicates-
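
For orientation, the annotation removed by the command above sits in the resource metadata. The following fragment is only a sketch: the value is a placeholder, and the actual predicate names and the accepted value format must be taken from ECK upgrade predicates.

```yaml
metadata:
  name: elasticsearch-sample
  annotations:
    # Placeholder value: replace it with predicate names from the
    # ECK upgrade predicates list before applying.
    eck.k8s.elastic.co/disable-upgrade-predicates: "<predicate-name>"
```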

Cluster rolling restarts

You can trigger a graceful rolling restart of an Elasticsearch cluster without changing the cluster spec. The operator reuses the same rolling upgrade path: it uses the Elasticsearch node shutdown API, respects the same ECK upgrade predicates, and restarts one node at a time.

Trigger a rolling restart

To schedule a rolling restart, set the eck.k8s.elastic.co/restart-trigger annotation on the Elasticsearch resource metadata.

Set or change this value to start a rolling restart. The value is propagated to pod annotations and is visible in the Elasticsearch node shutdown API response as the shutdown reason.

You can also set the eck.k8s.elastic.co/restart-allocation-delay annotation to control the shard allocation delay during the restart.

To trigger another rolling restart later, update the restart-trigger value.

Rolling restart progress is visible in the Elasticsearch resource status under In Progress Operations → Upgrade, with node-level messages such as "Deleting pod for rolling restart".

Behavior

Keep the following behaviors in mind when working with the restart-trigger annotation:

Removing the annotation does not trigger a new restart. The operator retains the last trigger value on the pod template.
Removing the annotation does not cancel an in-progress restart. Pods not yet restarted will still restart with the previous trigger value.
Re-applying the same value might not trigger a new restart if all pods already have that value.
The operator may emit a non-blocking admission webhook warning when the annotation is removed or set to an unchanged value.

Example

In the following example, a timestamp trigger is used as the restart-trigger value. This value is visible in the Elasticsearch node shutdown API response as the shutdown reason.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
  annotations:
    eck.k8s.elastic.co/restart-trigger: "2026-01-14T12:00:00Z"
spec:
  version: 9.4.4
  nodeSets:
    - name: default
      count: 3
      config:
        node.roles: ["master", "data", "ingest", "ml"]
        node.store.allow_mmap: false

Restart allocation delay

The eck.k8s.elastic.co/restart-allocation-delay annotation controls the allocation_delay parameter passed to the Elasticsearch node shutdown API when nodes are taken offline. Any value set on this annotation is used during both upgrades and custom triggered rolling restarts.

Set this annotation on the Elasticsearch resource metadata:

eck.k8s.elastic.co/restart-allocation-delay accepts a duration string, such as "5m" or "20m", that tells Elasticsearch how long to wait before reallocating shards from a node that is shutting down. If unset, the Elasticsearch default is used. Invalid or negative values are logged and ignored.

By default, when a node begins shutting down, Elasticsearch waits a short period before it starts moving shards to other nodes. Setting a longer allocation_delay avoids unnecessary shard movements during planned restarts where the node is expected to return quickly. Setting a shorter value causes Elasticsearch to start rebalancing sooner, which can be useful if the restart is expected to take a long time.

Example

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
  annotations:
    eck.k8s.elastic.co/restart-allocation-delay: "20m"
spec:
  version: 9.4.4
  nodeSets:
    - name: default
      count: 3
      config:
        node.roles: ["master", "data", "ingest", "ml"]
        node.store.allow_mmap: false
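
Because both annotations are set on the same resource metadata, they can be combined. The following fragment is a sketch that reuses the values from the two examples above. It would trigger a rolling restart in which the 20m allocation delay applies to each node being shut down.

```yaml
metadata:
  name: my-cluster
  annotations:
    # Changing this value starts a new rolling restart.
    eck.k8s.elastic.co/restart-trigger: "2026-01-14T12:00:00Z"
    # Applies to this restart and to later upgrades while the annotation is set.
    eck.k8s.elastic.co/restart-allocation-delay: "20m"
```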