30 May 2018 Engineering

Understanding your team’s usage with Elastic Cloud Enterprise

By Anurag Gupta

Elastic Cloud Enterprise (ECE) is an orchestration product that lets you centrally provision and manage your Elastic Stack deployments from a single pane of glass. Many enterprises have adopted ECE so that their internal teams can quickly find insights with one of Elasticsearch’s many use cases rather than spending time on Elastic Stack operations. As more teams onboard to a centrally managed Elastic Cloud Enterprise environment, understanding how and which resources are being used becomes increasingly important.

Elastic Cloud Enterprise comes with a powerful API that you can use to understand what is happening in the environment at any given time. This blog post walks through how you can use ECE’s API to gather statistics on running Elasticsearch clusters over time and start to understand each team’s usage.

Requirements:

  • Elastic Cloud Enterprise 1.1+
  • Cluster names that include the team name

Retrieving information from the API

The ECE REST API for retrieving Elasticsearch cluster information returns quite a bit of useful detail about running clusters, snapshot status, and the history of any configuration changes. In this blog post we are going to capture the output of this API call over time to give our ECE admins the following insights:

  • Total Elasticsearch Memory capacity by internal team
  • Total Elasticsearch Disk usage by internal team
  • Change of Elasticsearch Disk and Memory use over time

Example API Call

Curl command:

curl -k -X GET -u readonly:<READONLY_PASSWORD> <ECE_HOST>:12400/api/v1/clusters/elasticsearch

Output:

A subset of the output of the API call:

{
"return_count": 7,
"elasticsearch_clusters": [{
  "cluster_name": "Security-TrafficCluster",
  "plan_info": {
    "healthy": true,
    "history": []
  },
  "snapshots": {
    "healthy": true,
    "count": 0
  },
  "associated_kibana_clusters": [],
  "elasticsearch": {
    "healthy": true,
    "shard_info": {
      "healthy": true,
      "available_shards": [{
        "instance_name": "instance-0000000000",
        "shard_count": 0
      }],
      "unavailable_shards": [{
        "instance_name": "instance-0000000000",
        "shard_count": 0
      }],
      "unavailable_replicas": [{
        "instance_name": "instance-0000000000",
        "replica_count": 0
      }]
    },
    "master_info": {
      "healthy": true,
      "masters": [{
        "master_node_id": "LfyJXzR3Tg6eJo1IA2Vv9w",
        "master_instance_name": "instance-0000000000",
        "instances": ["instance-0000000000"]
      }],
      "instances_with_no_master": []
    },
    "blocking_issues": {
      "healthy": true,
      "cluster_level": [],
      "index_level": []
    }
  },
  "links": {
  },
  "healthy": true,
  "status": "started",
  "topology": {
    "healthy": true,
    "instances": [{
      "disk": {
        "disk_space_available": 131072,
        "disk_space_used": 0
      },
      "maintenance_mode": false,
      "service_running": true,
      "healthy": true,
      "instance_name": "instance-0000000000",
      "service_version": "6.2.4",
      "service_roles": ["master", "data", "ingest"],
      "allocator_id": "192.168.44.11",
      "service_id": "LfyJXzR3Tg6eJo1IA2Vv9w",
      "zone": "ece-zone-1",
      "container_started": true,
      "memory": {
        "instance_capacity": 4096,
        "memory_pressure": 3
      }
    }]
  },
  "metadata": {
    "version": 4,
    "last_modified": "2018-04-23T20:36:15.925Z",
    "endpoint": "2e707a6e6cec43048dd2d14ba0fb49f7.54.227.233.0.ip.es.io"
  },
  "external_links": [],
  "cluster_id": "2e707a6e6cec43048dd2d14ba0fb49f7"
....
....
}]
}
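The same call can be made from a script when you want to inspect the response programmatically. The following is a minimal Python sketch, not an official client: the function names fetch_clusters and summarize are made up for this illustration, and it assumes the API is reachable over plain http on port 12400 with the readonly user, as in the examples in this post.

```python
import base64
import json
import urllib.request


def fetch_clusters(host, password, user="readonly", port=12400):
    """GET /api/v1/clusters/elasticsearch and return the parsed JSON body.

    Assumes plain http and basic auth; adjust for your environment.
    """
    url = f"http://{host}:{port}/api/v1/clusters/elasticsearch"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(response):
    """Return one (cluster_name, memory_capacity, disk_used) tuple per cluster."""
    rows = []
    for cluster in response.get("elasticsearch_clusters", []):
        instances = cluster.get("topology", {}).get("instances", [])
        memory = sum(i["memory"]["instance_capacity"] for i in instances)
        disk = sum(i["disk"]["disk_space_used"] for i in instances)
        rows.append((cluster["cluster_name"], memory, disk))
    return rows
```

Calling summarize(fetch_clusters("<ECE_HOST>", "<READONLY_PASSWORD>")) would return one row per cluster, with values in whatever units the API reports.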


Example Environment

In my example, I have three separate teams using ECE: Security, SalesOps, and AppDev. Altogether these three teams have 7 different clusters deployed. To differentiate between clusters used by the different teams I’ve assigned cluster names following <team-name>-<use-case>Cluster. In the future, we’ll bring easier team support to clusters.
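Because the team is encoded in the cluster name, it can be recovered with a simple split on the first hyphen. Here is a small Python sketch of that convention; the helper name team_of and the "unknown" fallback are assumptions for illustration, and any cluster name other than Security-TrafficCluster is hypothetical.

```python
def team_of(cluster_name):
    """Return the team prefix of a '<team-name>-<use-case>Cluster' name."""
    team, separator, _ = cluster_name.partition("-")
    # Names that do not follow the convention are grouped under "unknown".
    return team if separator else "unknown"


for name in ["Security-TrafficCluster", "SalesOps-ForecastCluster", "scratch"]:
    print(name, "belongs to", team_of(name))
```

A team name that itself contains a hyphen would be cut short by this split, so the convention works best with single-word team names.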

A screenshot of my ECE dashboard is pasted below:

Creating a Watch

As the API gives us a point-in-time view of the running Elasticsearch clusters, we want to call it periodically to start to understand changes to the clusters. Thankfully, Elastic Cloud Enterprise comes with X-Pack’s powerful alerting capability, Watcher. As X-Pack is automatically enabled for all clusters provisioned in ECE, I can provision a brand new dedicated cluster to gather this insight.

In addition, using Watcher I can set a schedule for how often to poll the ECE API with Watcher’s http input. Once the API response is retrieved as a large JSON array, we can use Watcher’s transform capabilities to separate the array of Elasticsearch clusters into individual documents to be indexed into a fixed index.


In Kibana, go to Management, then Watcher. From here, click add and use the following watch definition.

The following watch triggers every 10 minutes and indexes the documents into the ece-running-clusters index. The transform script takes the JSON array named elasticsearch_clusters from the payload and returns it under the field name `_doc`, which activates Watcher’s multi-document index feature. Each cluster is therefore indexed as a separate document, which is what allows us to create the visualizations below.

{
  "trigger": {
    "schedule": {
      "interval": "10m"
    }
  },
  "input": {
    "http": {
      "request": {
        "scheme": "http",
        "host": "<INSERT YOUR ECE HOST>",
        "port": 12400,
        "method": "get",
        "path": "/api/v1/clusters/elasticsearch",
        "params": {},
        "headers": {},
        "auth": {
          "basic": {
            "username": "readonly",
            "password": "<INSERT YOUR ECE CREDENTIALS>"
          }
        }
      }
    }
  },
  "condition": {
    "always": {}
  },
  "actions": {
    "index_payload": {
      "transform": {
        "script": {
          "source": "return ['_doc' : ctx.payload.elasticsearch_clusters]",
          "lang": "painless"
        }
      },
      "index": {
        "index": "ece-running-clusters",
        "doc_type": "ece-running-cluster",
        "execution_time_field": "timestamp"
      }
    }
  }
}
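To make the effect of the transform and index action easier to picture, here is a rough Python sketch of what ends up in the index after one execution. It only illustrates the resulting documents and is not how Watcher is implemented; the function name split_payload is made up, and the sample cluster names other than Security-TrafficCluster are hypothetical.

```python
def split_payload(payload, execution_time):
    """Turn one API response into one document per cluster.

    Mirrors the watch above: the elasticsearch_clusters array becomes
    separate documents, each stamped with the execution time in the
    field named by execution_time_field ("timestamp").
    """
    docs = []
    for cluster in payload["elasticsearch_clusters"]:
        doc = dict(cluster)  # shallow copy so the payload is left untouched
        doc["timestamp"] = execution_time
        docs.append(doc)
    return docs


payload = {
    "return_count": 2,
    "elasticsearch_clusters": [
        {"cluster_name": "Security-TrafficCluster", "status": "started"},
        {"cluster_name": "AppDev-LoggingCluster", "status": "started"},
    ],
}
for doc in split_payload(payload, "2018-05-30T10:00:00Z"):
    print(doc)
```

Every run of the watch adds another set of documents with a new timestamp, which is what makes the over-time views later in this post possible.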

Creating Visualizations and Dashboards

Once the watch is created, you can begin to explore the incoming data. To gain insights quickly, though, we’ll create an index pattern in Kibana and build some visualizations for dashboards.

Memory Capacity

To understand the total memory capacity of a deployed cluster, we can create a visualization that summarizes memory capacity per cluster. We’ll choose the Visual Builder visualization and use a Sum aggregation on the topology.instances.memory.instance_capacity field. We’ll also add a Filter aggregation to show these results in terms of each team’s total capacity.

To understand a team’s usage and cluster breakdown we can filter by the team names (Security, AppDev, or SalesOps).
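The calculation behind this visualization can be sketched outside Kibana as well. The Python below sums instance_capacity per team over the documents from a single poll; it is an illustration of the aggregation rather than what Kibana runs, the function name memory_by_team is made up, and the sample clusters other than Security-TrafficCluster are hypothetical.

```python
from collections import defaultdict


def memory_by_team(docs):
    """Sum topology.instances.memory.instance_capacity per team.

    Pass only the documents from one poll; summing across several
    polls would count the same cluster more than once.
    """
    totals = defaultdict(int)
    for doc in docs:
        team = doc["cluster_name"].split("-", 1)[0]
        for instance in doc["topology"]["instances"]:
            totals[team] += instance["memory"]["instance_capacity"]
    return dict(totals)


def cluster_doc(name, capacities):
    """Build a minimal document with one instance per capacity value."""
    instances = [{"memory": {"instance_capacity": c}} for c in capacities]
    return {"cluster_name": name, "topology": {"instances": instances}}


docs = [
    cluster_doc("Security-TrafficCluster", [4096]),
    cluster_doc("Security-AuditCluster", [2048]),
    cluster_doc("AppDev-LoggingCluster", [1024, 1024]),
]
print(memory_by_team(docs))  # {'Security': 6144, 'AppDev': 2048}
```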

 

Disk Space Usage

In addition to Memory Capacity, we can do the same with the topology.instances.disk.disk_space_used metric to get an overview of Disk Space Usage per cluster.

Memory and Disk Space Use over time

For understanding usage over time we can use a Line Chart or Timelion visualization with both the disk space and memory capacity metrics.

Note: In the following we can see that AppDev’s total capacity grew from 1 GB to 3 GB and then settled at 2 GB. This is because a change to 2 GB was requested, and ECE spins up the 2 GB cluster while it drains the connections from the 1 GB cluster, leaving a short period in which both are alive at the same time.
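The temporary bump is easy to reproduce with a small calculation. The Python sketch below sums memory capacity per poll for one team; it assumes that, during the change, the old and the new instance both show up under the same cluster’s topology.instances, and the cluster name, timestamps, and function name capacity_over_time are hypothetical.

```python
from collections import defaultdict


def capacity_over_time(docs, team):
    """Return (timestamp, total instance_capacity) pairs for one team."""
    series = defaultdict(int)
    for doc in docs:
        if doc["cluster_name"].split("-", 1)[0] != team:
            continue
        for instance in doc["topology"]["instances"]:
            series[doc["timestamp"]] += instance["memory"]["instance_capacity"]
    # ISO 8601 timestamps in the same format sort chronologically as strings.
    return sorted(series.items())


def poll(timestamp, capacities):
    """Build one indexed document for a hypothetical AppDev cluster."""
    instances = [{"memory": {"instance_capacity": c}} for c in capacities]
    return {
        "cluster_name": "AppDev-LoggingCluster",
        "timestamp": timestamp,
        "topology": {"instances": instances},
    }


docs = [
    poll("2018-05-30T10:00:00Z", [1024]),        # before the change
    poll("2018-05-30T10:10:00Z", [1024, 2048]),  # old and new alive together
    poll("2018-05-30T10:20:00Z", [2048]),        # old instance drained
]
for timestamp, total in capacity_over_time(docs, "AppDev"):
    print(timestamp, total)
```

The middle poll adds up to 3072, which is the short-lived 3 GB value seen in the chart.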

More fun

The walkthrough goes through an example using the Elasticsearch Cluster info, but you can also add the following to gain more insight

Future of Metering

While the API for ECE gives information about Memory and Disk, we understand that organizations also want to have additional usage information about Network Bandwidth and Usage. In the future we plan to add more usage metrics and have this be an integral feature of ECE.