
Spinning up a cluster with Elastic's Azure Marketplace template


Last week we pushed out a new release of our Azure Resource Manager (ARM) solution template on the Azure Marketplace. It delivers more features and choices than ever for configuring an Elasticsearch cluster deployment on Azure to suit a wide range of needs. We want to take this opportunity to highlight some of the options available in our offering and to demonstrate just how easy it is to get up and running.

For those unfamiliar with ARM templates, they have two essential components:

  • A UI definition template, which defines a step-by-step wizard that gathers all of the inputs required for a deployment and emits them as a set of key/value pairs.
  • A deployment template, which takes a set of key/value pairs as input and defines all of the resources to create and configure within Azure.

The deployment template can be used independently of the UI definition, for example with the Azure command line interface (CLI). The beauty of the UI definition, however, is that it is integrated into the Azure portal. It can also query existing resources within an Azure subscription to help fill out each step with valid values.
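The hand-off between the two components works in both directions. The key/value pairs that the UI definition emits can equally be written to an ARM parameters file and passed to the deployment template directly. The following Python sketch shows this. Its parameter names, such as `esClusterName` and `vmDataNodeCount`, are purely hypothetical; the real template defines its own names.

```python
import json

# ARM deployment parameters files follow this schema and shape:
# each parameter is an object with a single "value" property.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2015-01-01/"
    "deploymentParameters.json#"
)


def build_parameters_file(values: dict) -> str:
    """Wrap plain key/value pairs (as the UI definition would emit) for ARM."""
    document = {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    print(build_parameters_file({"esClusterName": "my-cluster", "vmDataNodeCount": 3}))
```

You could then pass the resulting file to the current Azure CLI, for example with `az deployment group create --resource-group <group> --template-file <template> --parameters @parameters.json`.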

Before we dive into the nitty gritty, we’d like to take a moment to pay homage to the Azure Elasticsearch Quickstart template, from which Elastic’s template was forked. The quickstart templates are a collection of community-contributed templates for provisioning a multitude of different resources and applications on Azure. Elastic has been happy to contribute features back to the quickstart template to continually improve it.

Getting started

When you want to use the Elastic Stack with existing services running on Azure, it can make sense to deploy the Stack to Azure as well. Doing so lets you manage everything from a single dashboard and also reduces the egress costs of moving data out of Azure data centers. Finding the template on the Azure Marketplace is simple: choose + New and search for Elasticsearch to find the “Elasticsearch and Kibana” template published by Elastic (that’s us!).

[Screenshot: finding the template in the Azure Marketplace]

The template follows a Bring-Your-Own-License (BYOL) model. It deploys with a 30-day trial license of our commercial X-Pack offering, giving you access to all of the goodness that comes with it, including monitoring, security, alerting and graph capabilities. Once the trial license expires, you can install your own license to keep using these critical commercial enhancements, or simply uninstall them. We do, however, always recommend having some form of access control on a publicly available cluster!

Deploy into an existing Virtual Network

Since the template’s initial launch in December 2015, one of the most requested features has been the ability to deploy a cluster into an existing virtual network. With the latest template version, we’re pleased to announce that this request is now a reality.

[Screenshot: choosing an existing virtual network]

Whilst configuring the Elasticsearch version and cluster name, you can also choose where the cluster’s network comes from. One option is to create a new virtual network in the resource group into which all resources will be deployed. The other is to use an existing virtual network in the same subscription and location as the new resource group. This is particularly useful when you already have resources deployed in another resource group, such as a farm of servers running your website. You can then make a cluster available on the same network, possibly even the same subnet, for those web servers to send requests to.

Previous incarnations of the template always set up a new virtual network as part of the deployment, deploying master nodes into one subnet and data and client nodes into another subnet. Now, all nodes are deployed into one subnet.
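Because all nodes now land in a single subnet, it is worth checking that an existing subnet has enough free addresses before deploying into it. The following Python sketch does a rough check. It assumes one private IP per VM and relies on the fact that Azure reserves five addresses in every subnet. You supply the node counts and the `already_used` figure, which covers existing VMs, load balancer front ends and so on.

```python
import ipaddress

# Azure reserves the first four addresses and the last address of every subnet.
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(subnet_cidr: str) -> int:
    """Number of addresses in the subnet that can be assigned to VMs."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AZURE_RESERVED_PER_SUBNET


def subnet_fits(subnet_cidr: str, data_nodes: int, master_nodes: int = 3,
                client_nodes: int = 0, kibana_nodes: int = 1,
                already_used: int = 0) -> bool:
    """True if every new node can get its own private IP in the subnet."""
    needed = data_nodes + master_nodes + client_nodes + kibana_nodes
    return already_used + needed <= usable_addresses(subnet_cidr)


if __name__ == "__main__":
    print(usable_addresses("10.0.1.0/24"))           # 251
    print(subnet_fits("10.0.1.0/28", data_nodes=3))  # 7 needed, 11 usable: True
```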

One to 50 data nodes (and more!)

The portal UI provides the choice to configure anywhere from one to 50 data nodes, subscription core quota permitting, defaulting to three data nodes with three dedicated master nodes.
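Since your subscription’s core quota bounds the node count, a quick back-of-the-envelope check helps before you pick a cluster size. The following Python sketch performs that check. The per-VM core counts are hypothetical inputs that depend on the VM sizes you choose for each role.

```python
def cores_required(data_nodes: int, data_vm_cores: int,
                   master_nodes: int = 3, master_vm_cores: int = 2,
                   client_nodes: int = 0, client_vm_cores: int = 2,
                   kibana_vm_cores: int = 0) -> int:
    """Total cores a deployment consumes across all node roles."""
    return (data_nodes * data_vm_cores
            + master_nodes * master_vm_cores
            + client_nodes * client_vm_cores
            + kibana_vm_cores)


def within_quota(required: int, quota: int, already_used: int = 0) -> bool:
    """True if the deployment fits in the cores left in the subscription quota."""
    return already_used + required <= quota


if __name__ == "__main__":
    # Default shape: three data nodes and three dedicated master nodes.
    needed = cores_required(data_nodes=3, data_vm_cores=4)
    print(needed, within_quota(needed, quota=20))  # 18 True
```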

[Screenshot: configuring the number of data nodes]

Depending on your use case, you can forgo dedicated master nodes and opt instead for master-eligible data nodes. In that case, we recommend at least three data nodes. This lets the template set minimum master nodes to a quorum of the master-eligible nodes, which here means every data node.
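The quorum here is a strict majority of master-eligible nodes, which is what Elasticsearch’s `discovery.zen.minimum_master_nodes` setting expects. The following Python sketch shows that arithmetic. The helper names are illustrative and do not come from the template.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: (n // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("at least one master-eligible node is required")
    return master_eligible // 2 + 1


def tolerable_master_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can fail while a master can still be elected."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, minimum_master_nodes(n), tolerable_master_failures(n))
```

With two master-eligible nodes the quorum is also two, so losing either node halts master election. Three is the smallest count that tolerates a failure.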

In the same way as data nodes, up to 20 client nodes can also be deployed. These forward cluster-level requests to the master node and data-related requests to the appropriate data nodes. For clusters larger than 100 nodes, client nodes are required to scale, because the internal load balancer’s backend pool can only be attached to a single availability set.

The UI definition limits the number of data and client nodes to 50 and 20 respectively. By using the deployment template directly, however, it is possible to deploy even bigger clusters. For clusters with more than 100 nodes, virtual machines are deployed into more than one availability set in a round-robin fashion. This overcomes the maximum capacity of a single availability set within Azure.
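Round-robin placement simply cycles through the availability sets as VMs are created. The following Python sketch illustrates the idea. It assumes a capacity of 100 VMs per set to match the threshold mentioned above; the template’s actual implementation and limits may differ.

```python
import math
from collections import Counter

# Assumption: use the 100-node threshold from the text as the per-set capacity.
VMS_PER_AVAILABILITY_SET = 100


def assign_availability_sets(vm_count: int,
                             per_set: int = VMS_PER_AVAILABILITY_SET) -> list[int]:
    """Return the availability set index for each VM, assigned round-robin."""
    set_count = max(1, math.ceil(vm_count / per_set))
    return [vm_index % set_count for vm_index in range(vm_count)]


if __name__ == "__main__":
    placement = assign_availability_sets(150)
    print(Counter(placement))  # Counter({0: 75, 1: 75})
```

Round-robin spreads VMs evenly, so no set exceeds the assumed capacity.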

[Image: a big cluster deployment]

The Hostname prefix input distinguishes one cluster from another when deploying into the same subnet. Setting it is crucial to prevent nodes from joining the wrong cluster, for three reasons:

  • Virtual machines are dynamically assigned IP addresses within the template.
  • Unicast discovery therefore lists the hosts in each node’s configuration by hostname.
  • Azure networking does not prevent a device from joining a network with the same hostname as an already connected device.

Hostnames are also used as the node names within the cluster.
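To see why the prefix matters, consider how each node’s discovery settings could be derived from it. The following Python sketch assumes a hypothetical `<prefix>-<role>-<index>` hostname scheme; the template’s real naming may differ. It lists the master-eligible hosts under `discovery.zen.ping.unicast.hosts`.

```python
def node_hostnames(prefix: str, role: str, count: int) -> list[str]:
    """Hypothetical naming scheme: <prefix>-<role>-<index>."""
    return [f"{prefix}-{role}-{index}" for index in range(count)]


def discovery_config(cluster_name: str, prefix: str,
                     data_nodes: int, master_nodes: int = 3) -> str:
    """elasticsearch.yml lines pointing unicast discovery at master-eligible hosts."""
    if master_nodes > 0:
        seeds = node_hostnames(prefix, "master", master_nodes)
    else:
        # Without dedicated masters, the data nodes are the master-eligible nodes.
        seeds = node_hostnames(prefix, "data", data_nodes)
    hosts = ", ".join(f'"{host}"' for host in seeds)
    return "\n".join([
        f"cluster.name: {cluster_name}",
        f"discovery.zen.ping.unicast.hosts: [{hosts}]",
    ])


if __name__ == "__main__":
    print(discovery_config("logs", "prod", data_nodes=3))
```

With distinct prefixes, two clusters in the same subnet end up with disjoint host lists, so each node only pings its own cluster’s machines.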

Premium Locally Redundant Storage

For scenarios where performance is paramount, you can deploy a cluster onto virtual machines that support Premium Locally Redundant Storage (LRS). Premium disks are backed by solid state drives (SSDs) and offer much greater IOPS than Standard Storage, which uses hard disk drives (HDDs). For Standard Storage disks on standard tier virtual machines, the maximum 8 KB IOPS per persistent disk is 500, with a limit of 20,000 IOPS per Standard storage account. Premium Storage disks, on the other hand, can offer up to 5,000 IOPS per disk, with up to 50 Gbps of bandwidth per Premium storage account.
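The storage limits above translate into simple arithmetic. The following Python sketch uses only the figures quoted in this section. The disk count per node is a hypothetical input, the aggregate assumes the disks are striped together, and real throughput also depends on VM size and workload.

```python
STANDARD_IOPS_PER_DISK = 500        # standard tier VM, 8 KB IOPS
STANDARD_IOPS_PER_ACCOUNT = 20_000  # per Standard storage account
PREMIUM_IOPS_PER_DISK = 5_000       # upper bound for a Premium disk


def aggregate_iops(disks: int, iops_per_disk: int) -> int:
    """Peak IOPS if a node stripes across `disks` identical disks."""
    return disks * iops_per_disk


def standard_disks_per_account() -> int:
    """Disks a Standard account can serve at full IOPS before hitting its cap."""
    return STANDARD_IOPS_PER_ACCOUNT // STANDARD_IOPS_PER_DISK


if __name__ == "__main__":
    print(aggregate_iops(4, STANDARD_IOPS_PER_DISK))  # 2000
    print(aggregate_iops(4, PREMIUM_IOPS_PER_DISK))   # 20000
    print(standard_disks_per_account())               # 40
```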

[Screenshot: Premium Locally Redundant Storage options]

With greater performance comes greater cost, so the template allows the choice of the right “horses” (boxes) for the right “courses” (scenarios), allowing different virtual machine sizes for different node roles within the cluster. 

NOTE: For dedicated master nodes that do not store data per se but do need to persist cluster state, the OS disk of each machine is used, which is persisted in Azure Blob storage.

Outside in

In addition to deploying an Elasticsearch cluster, the template can also deploy an instance of Kibana to a separate virtual machine, allowing visualizations and dashboards to be built upon the data in the cluster.

[Screenshot: external access options]

Kibana connects to the cluster through an internal load balancer. That load balancer’s internal IP address is configured as the default URL when the Sense plugin is installed. No more digging around in the portal to find the right IP address for making requests to the cluster!

Finally, an external load balancer can be configured in addition to an internal load balancer for scenarios where external access to the cluster is required.

Protected with Shield

This brings us to an important point. The X-Pack trial license secures the cluster with Shield, and a dedicated blade step lets you configure passwords for the admin, read-only and Kibana roles. Right now, the template does not configure Transport Layer Security (TLS) by default. If you require it, for example when using Kibana or an external load balancer, you need to set it up yourself. This is something we would very much like to add in a future version, to turn things up to 11. In the meantime, Azure provides Application Gateway, a resource that can load balance at the application level and perform SSL termination. It offers an immediate solution, albeit with a little configuration.
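With Shield enabled, clients authenticate over HTTP with basic authentication, so any client going through the internal or external load balancer needs to send credentials. The following Python sketch builds such a request using only the standard library. The load balancer IP and the `es_admin` username are hypothetical placeholders, and port 9200 is Elasticsearch’s default HTTP port. It uses plain `http` because the template does not set up TLS.

```python
import base64
import urllib.request


def cluster_health_request(host: str, username: str, password: str,
                           port: int = 9200) -> urllib.request.Request:
    """Build a GET /_cluster/health request carrying HTTP basic auth credentials."""
    url = f"http://{host}:{port}/_cluster/health"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Hypothetical internal load balancer IP and credentials.
    request = cluster_health_request("10.0.0.100", "es_admin", "changeme")
    print(request.full_url)
    # To actually send it from a machine inside the virtual network:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Over an external load balancer without TLS, these credentials travel in clear text. This is why SSL termination, for example with Application Gateway, is worth the extra configuration.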

Making cluster changes

The ARM template simply facilitates spinning up a cluster on Azure infrastructure. It does not prevent access to individual machines in the cluster when needed, for example to read logs, install plugins or change configuration. Since the Kibana machine is set up with a public IP address, it can also serve as a jumpbox for reaching any node in the cluster. When Kibana is not required, a separate jumpbox machine can be configured for this purpose.

You can access the jumpbox over SSH, using the username and the password or SSH key specified in the Basic Settings step:

ssh <username>@<jumpbox ip>

[Screenshot: admin credentials]

Once connected to the jumpbox, you can SSH to any machine in the cluster using the same credentials as before and the internal IP address of the target machine. On an Elasticsearch node:

  • The bin directory can be found at /usr/share/elasticsearch/bin
  • The configuration file can be found at /etc/elasticsearch/elasticsearch.yml
  • A log of the ARM deployment can be found at /var/log/arm-install.log
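The two hops can also be collapsed into one command with OpenSSH’s `-J` (ProxyJump) option, available since OpenSSH 7.3. The following Python sketch builds such commands. The username and IP addresses are hypothetical, and reading the deployment log may require `sudo` depending on its permissions.

```python
import shlex


def ssh_via_jumpbox(username: str, jumpbox_ip: str, target_ip: str,
                    *remote_command: str) -> list[str]:
    """ssh argv that hops through the jumpbox to a node's internal IP."""
    return ["ssh", "-J", f"{username}@{jumpbox_ip}", f"{username}@{target_ip}",
            *remote_command]


def tail_arm_install_log(username: str, jumpbox_ip: str, target_ip: str,
                         lines: int = 50) -> list[str]:
    """Show the end of the ARM deployment log on an Elasticsearch node."""
    return ssh_via_jumpbox(username, jumpbox_ip, target_ip,
                           "tail", "-n", str(lines), "/var/log/arm-install.log")


if __name__ == "__main__":
    command = tail_arm_install_log("azureuser", "203.0.113.10", "10.0.0.10")
    print(shlex.join(command))
    # Run it with: subprocess.run(command, check=True)
```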

Summary

That was a quick whirlwind tour of our ARM template. We hope it has been useful in demonstrating its capabilities and how easy it is to get up and running on Azure. Stay tuned for further improvements and features by following the template repository on GitHub.