Data disks

Managed disks can be attached to data nodes to use as the data directory for the node. The ARM template can attach Standard HDD disks or, for VM SKUs that support them, Premium managed disks:

storageAccountType
The performance tier of managed disks. Standard uses Standard HDD disks. Default uses Premium managed disks for VM SKUs that support them, and Standard HDD disks for those that do not. The default is Default.
vmDataDiskSize

The size of each attached managed disk. Choose between:

XXLarge: 4095 GB
XLarge: 2048 GB
Large: 1024 GB
Medium: 512 GB
Small: 128 GB

Default is Large.

vmDataDiskCount

The number of managed disks to attach to each data node. The total number of managed disks will be

vmDataNodeCount * vmDataDiskCount

If the number of disks selected exceeds the maximum that can be attached to the data node VM SKU, the maximum for that VM SKU is used instead. This is equivalent to

Math.min(vmDataDiskCount, data node VM SKU maximum attached disks)

Must be greater than or equal to 0. Default is the maximum number of disks supported by the data node VM SKU.
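The clamping rule and the total disk count can be sketched in Python. This is a minimal illustration, not the template's implementation: `effective_disk_count` and `total_managed_disks` are hypothetical helper names, the SKU maximum is passed in as an argument because it depends on the chosen VM SKU, and the sketch assumes the clamped count is the one multiplied by the number of data nodes.

```python
from typing import Optional


def effective_disk_count(vm_data_disk_count: Optional[int], sku_max_disks: int) -> int:
    """Number of managed disks attached to each data node.

    None models vmDataDiskCount being left at its default, which is the
    maximum number of disks supported by the data node VM SKU.
    """
    if vm_data_disk_count is None:
        return sku_max_disks
    if vm_data_disk_count < 0:
        raise ValueError("vmDataDiskCount must be greater than or equal to 0")
    # Equivalent to Math.min(vmDataDiskCount, SKU maximum attached disks).
    return min(vm_data_disk_count, sku_max_disks)


def total_managed_disks(
    vm_data_node_count: int,
    vm_data_disk_count: Optional[int],
    sku_max_disks: int,
) -> int:
    """Total managed disks across all data nodes (assumes the clamped count)."""
    return vm_data_node_count * effective_disk_count(vm_data_disk_count, sku_max_disks)
```

For example, with 3 data nodes, a requested count of 20, and a SKU that can attach at most 16 disks (an illustrative figure, not taken from any SKU table), each node would get 16 disks and the cluster 48.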

Disks smaller than 2 TB are partitioned with fdisk, and larger disks with parted. Each is formatted with an ext4 filesystem using a 4096-byte block size.
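As a rough sketch of how the partitioning rule maps onto the vmDataDiskSize options, the Python below picks a tool by disk size. The helper `partition_tool` is hypothetical, and two assumptions are made that the text does not confirm: 2 TB is taken to mean 2048 GB, and a disk of exactly that size is handled by parted.

```python
# Sizes in GB for each vmDataDiskSize option, as listed for the parameter.
DISK_SIZES_GB = {
    "XXLarge": 4095,
    "XLarge": 2048,
    "Large": 1024,
    "Medium": 512,
    "Small": 128,
}

# Assumption: the 2 TB threshold is 2048 GB.
TWO_TB_IN_GB = 2048


def partition_tool(disk_size_gb: int) -> str:
    """Return the partitioning tool for a disk of the given size."""
    # Assumption: a disk of exactly 2 TB falls on the parted side.
    return "fdisk" if disk_size_gb < TWO_TB_IN_GB else "parted"


# Map each size option to the tool it would get under these assumptions.
tools_by_size = {name: partition_tool(gb) for name, gb in DISK_SIZES_GB.items()}
```

Under these assumptions, Small, Medium, and Large disks would be partitioned with fdisk, and XLarge and XXLarge with parted. The XLarge result depends entirely on how the boundary case is treated.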

Data is striped across the disks attached to each data node in a RAID 0 array, using mdadm on Linux. When only one managed disk is attached, no RAID 0 array is configured. When vmDataDiskCount is 0, the data node uses the temp storage of the VM.
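The three cases can be summarized in a short Python sketch. The function names are hypothetical, and the capacity figure is only the raw sum of the attached disks (RAID 0 adds no redundancy, so equal-sized disks simply add up), ignoring filesystem overhead.

```python
def data_storage_layout(disk_count: int) -> str:
    """Describe where a data node keeps its data for a given disk count."""
    if disk_count < 0:
        raise ValueError("disk count must be greater than or equal to 0")
    if disk_count == 0:
        # No managed disks: the node falls back to the VM's temp storage.
        return "temp storage"
    if disk_count == 1:
        # A single disk is used directly; no RAID 0 array is configured.
        return "single managed disk"
    # Two or more disks are striped together with mdadm.
    return "RAID 0 array"


def managed_capacity_gb(disk_count: int, disk_size_gb: int) -> int:
    """Raw capacity of the attached managed disks, before filesystem overhead."""
    if disk_count < 0 or disk_size_gb < 0:
        raise ValueError("disk count and size must be non-negative")
    return disk_count * disk_size_gb
```

For instance, four Large (1024 GB) disks would give a RAID 0 array with 4096 GB of raw capacity, whereas a count of 0 gives no managed capacity at all and leaves the node on temp storage.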

Important

Temp storage, with filesystem /dev/sdb1 mounted on /mnt in Ubuntu, is present on the physical machine hosting the VM. It is ephemeral and not persistent: a VM can move to a different host at any point in time for various reasons, including hardware failures. When this happens, the VM will be created on the new host using the OS disk from the storage account, and new temp storage will be created on the new host.

Using temp storage can be a cost-effective way of running an Elasticsearch cluster on Azure with decent performance, so long as you understand the tradeoffs and mitigate them by snapshotting frequently and ensuring adequate data redundancy through sufficient replica shards.

Striping data across attached disks is recommended to improve input/output operations per second (IOPS) performance, since the per-disk IOPS and throughput limits can be combined. Premium disks offer higher IOPS than Standard HDD disks, so Premium disks are recommended where application performance is paramount.
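The arithmetic behind combining per-disk limits can be sketched as follows. `DiskLimits` and `striped_limits` are hypothetical names, and the sketch treats the combined figure as a simple sum over identical disks. It ignores any limit imposed by the VM itself, so the result is best read as an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskLimits:
    """Performance limits for one disk, or for a striped set of disks."""
    iops: int
    throughput_mb_per_s: int


def striped_limits(per_disk: DiskLimits, disk_count: int) -> DiskLimits:
    """Combine the limits of identical disks striped in a RAID 0 array."""
    if disk_count < 1:
        raise ValueError("at least one disk is required")
    return DiskLimits(
        iops=per_disk.iops * disk_count,
        throughput_mb_per_s=per_disk.throughput_mb_per_s * disk_count,
    )


# Placeholder per-disk figures for illustration only; check the limits
# published for the disk tier and size you actually deploy.
example = striped_limits(DiskLimits(iops=500, throughput_mb_per_s=60), disk_count=4)
```

With the placeholder figures above, four striped disks would combine to 2000 IOPS and 240 MB/s.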