11 April 2017 User Stories

Software Asset Management with Elasticsearch @ Red Mint Network

By Vianney Bajart

When your company has reached a significant size, with an IT infrastructure that includes a substantial number of workstations and servers, it can be painful to manage your software assets and get an accurate inventory picture. When the time comes to audit (e.g. to comply with a Microsoft licensing review), it may be too late if you lack the analytics to reconstruct accurate historical data. How many users does a specific piece of software have? How long does each user run it? Are there more running instances than purchased licenses? Is anyone running illegal copies of the software? Or, on the contrary, have you purchased more licenses than you actually need?

Ignoring this information can expose you to extra costs: paying for unused licenses, billing adjustments and legal fees. In this article, we explain how we enable smart and pragmatic software asset management with Elasticsearch and SDN (Software Defined Networking) telemetry.

Network as data source

Fortunately, many software programs are very verbose. They constantly contact their vendor's servers for various purposes such as authorization requests, update checks, telemetry or access to cloud services. This generates network traffic that can be dissected by a VNF (Virtual Network Function) running in the Internet Service Provider's infrastructure and then explored in Elasticsearch.

Knowing the source of a network packet (where the software is installed), its destination and its timestamp, it is possible to tell when a piece of software is running just by detecting meaningful events on the network link.

For instance, we have seen that when Adobe® Photoshop® CC or Adobe® Illustrator® CC are running, they open a TLS-encrypted session to ans.oobesaas.adobe.com:

{
    "@timestamp": "2017-03-06T12:42:05.230651762+01:00",
    "track_id": "fa9b4f9539a18daeb7578e47ea2fb0b6544ea527",
    "type": "track",
    "track": {
      "sni": "ans.oobesaas.adobe.com",
      "appname": "SSL",
      "host_hmac": "71b5e991310e75d05c3e242ab1fc86a5dce6f3e6",
      ...
    },
    ...
  }

Here, we use the anonymized host_hmac field to identify the data source (computer, tablet, etc.) and the SNI (Server Name Indication) extracted from the TLS handshake's ClientHello to identify the destination.
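The article does not detail how host_hmac is derived, but its 40-hex-character length suggests a SHA-1-based digest. A plausible scheme (an assumption, not the actual Red Mint Network implementation) is a keyed HMAC over a host identifier, which yields a stable pseudonym without exposing the raw identity:

```python
import hashlib
import hmac

def anonymize_host(host_id: str, secret: bytes) -> str:
    """Return a stable, pseudonymous identifier for a host.

    Keying the hash with a site-local secret prevents reversing the
    pseudonym by brute-forcing known host identifiers (MACs, IPs...).
    """
    return hmac.new(secret, host_id.encode("utf-8"), hashlib.sha1).hexdigest()

key = b"site-local secret"
# The same host always maps to the same 40-hex-char pseudonym...
assert anonymize_host("aa:bb:cc:dd:ee:ff", key) == anonymize_host("aa:bb:cc:dd:ee:ff", key)
assert len(anonymize_host("aa:bb:cc:dd:ee:ff", key)) == 40
# ...while different hosts map to different ones.
assert anonymize_host("aa:bb:cc:dd:ee:ff", key) != anonymize_host("11:22:33:44:55:66", key)
```

Because the mapping is deterministic, the pseudonym still supports cardinality and grouping in Elasticsearch while keeping the raw host identity out of the index.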

Kibana shows us that this exchange is performed every 9 minutes:

[Figure: Kibana view of the SSL tracks with SNI ans.oobesaas.adobe.com, showing one event every 9 minutes]

That’s enough information to determine the number of users on the network and the duration of use over a given time window!

Computing the number of users

In this example, we want to compute the number of users for each day of the last week. First, we filter the data to select only the documents that match the related SNI and the requested date range. The range query defines a relative date range (from now-7d/d to now/d). Then, we use the date histogram aggregation to gather the data into daily buckets. Finally, computing the number of users is straightforward using the cardinality aggregation on the host_hmac field.

Let’s query Elasticsearch to get the number of Adobe software users for each day of the last week:

{
  "size": 0,
  "timeout": "1s",
  "aggs": {
    "telemetry": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {"gte": "now-7d/d", "lte": "now/d"}
              }
            }
          ],
          "should": [
            {"match": {"track.sni": "ans.oobesaas.adobe.com"}}
          ],
          "minimum_should_match": 1
        }
      },
      "aggs": {
        "time_slot": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "day"
          },
          "aggs": {
            "user_count": {"cardinality": {"field": "track.host_hmac"}}
          }
        }
      }
    }
  }
}

Computing the duration of use

To compute the duration of use, a Time-To-Live (TTL) value must be defined. If no signal is seen during this period of time, the software is considered stopped. In the case of a periodic signal, the TTL must be slightly greater than the signal period to avoid glitches due to network latency or host slowdowns. The example below uses a 10-minute TTL (slightly greater than the 9-minute signal period observed for the Adobe® applications).
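The capping logic of the reduce script further down can be sketched in a few lines of plain Python (the timestamps are illustrative; as in the Painless code, the gap before the first signal of the window is capped at one TTL):

```python
def duration_of_use(timestamps, ttl_ms=10 * 60 * 1000):
    """Estimate usage duration (in ms) from periodic signal timestamps.

    Gaps between consecutive signals count in full when they fit within
    the TTL; larger gaps (software stopped in between) are capped at one TTL.
    """
    total, prev = 0, 0
    for ts in sorted(timestamps):
        delta = ts - prev
        total += min(delta, ttl_ms)
        prev = ts
    return total

minute = 60 * 1000
base = 1489190400000  # 2017-03-11T00:00:00Z in epoch milliseconds
# Signals every 9 minutes for half an hour, a 2-hour idle gap, one last signal:
signals = [base + m * minute for m in (0, 9, 18, 27, 147)]
# 10 (first signal, capped) + 9 + 9 + 9 + 10 (idle gap, capped) = 47 minutes
assert duration_of_use(signals) == 47 * minute
```

This is exactly what the reduce script does per host before averaging across hosts.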

[Figure: timeline of the periodic signal with the TTL window used to detect when the software stops]

Let’s use the scripted metric aggregation to compute the duration of use per user and return the average value.

This type of aggregation lets us implement a Map/Reduce model through a combination of four scripts written in the Painless language.

init

Initialize an empty hash-map to aggregate results per host.

    params._agg = [:];

map

Gather the timestamps of each signal occurrence.

    def host = doc['track.host_hmac'].value;
    def ts = doc['@timestamp'].value;

    /* Gather signal timestamps for each host */
    params._agg.putIfAbsent(host, []);
    params._agg[host].add(ts);

combine

Sort the results of each shard.

    for (ts_list in params._agg.values()) {
        ts_list.sort(Long::compare);
    }
    return params._agg;

reduce

  1. Merge the results
  2. Compute the duration of use per user
  3. Compute and return the average value

    long ttl = 10 * 60 * 1000;
    long total_time = 0;
    def per_host = [:];
    int count;

    /* Merge results */
    for (agg in params._aggs) {
        if (agg == null) {
            continue;
        }
        for (e in agg.entrySet()) {
            def host = e.getKey();
            per_host.putIfAbsent(host, []);
            per_host[host].addAll(e.getValue());
        }
    }

    /* 0-division is evil */
    count = per_host.size();
    if (count == 0) {
        return 0;
    }

    /* Compute the duration of use per host */
    for (ts_list in per_host.values()) {
         long prev = 0;
         ts_list.sort(Long::compare);

         for (ts in ts_list) {
             long delta = ts - prev;
             total_time += delta > ttl ? ttl : delta;
             prev = ts;
         }
    }

    /* Compute average */
    return (total_time / count) / 1000;

At the time of writing, the scripted metric aggregation is still experimental, and pipelining its result into another aggregation does not seem to be supported. That's why the average is computed inside the Painless script rather than with the Avg Bucket aggregation.

Data visualisation

It is possible to combine these two aggregations into a single query:

{
  "aggs": {
    "time_slot": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day"
      },
      "aggs": {
        "user_count": {
          "cardinality": {
            "field": "track.host_hmac"
          }
        },
        "time_used": {
          "scripted_metric": {
            "init_script": "...",
            "map_script": "...",
            "combine_script": "...",
            "reduce_script": "..."
          }
        }
      }
    }
  }
}

The result is easy to parse: for each day of the requested date range, we get the number of users and the average duration of use.

"buckets": [
  {
    "key_as_string": "2017-03-11T00:00:00.000Z",
    "key": 1489190400000,
    "doc_count": 4352,
    "user_count": { "value": 128 },
    "time_used": { "value": 20340 }
  },
  {
    "key_as_string": "2017-03-12T00:00:00.000Z",
    "key": 1489276800000,
    "doc_count": 4860,
    "user_count": { "value": 180 },
    "time_used": { "value": 16080 }
  },
  ...
]
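Flattening this response into rows ready for charting takes only a few lines; here is a sketch in Python (field names follow the sample response above; the date truncation to YYYY-MM-DD is a presentation choice):

```python
def summarize_buckets(buckets):
    """Flatten aggregation buckets into (day, user count, avg seconds) rows."""
    return [
        (b["key_as_string"][:10], b["user_count"]["value"], b["time_used"]["value"])
        for b in buckets
    ]

buckets = [
    {"key_as_string": "2017-03-11T00:00:00.000Z", "key": 1489190400000,
     "doc_count": 4352, "user_count": {"value": 128}, "time_used": {"value": 20340}},
    {"key_as_string": "2017-03-12T00:00:00.000Z", "key": 1489276800000,
     "doc_count": 4860, "user_count": {"value": 180}, "time_used": {"value": 16080}},
]
assert summarize_buckets(buckets) == [
    ("2017-03-11", 128, 20340),
    ("2017-03-12", 180, 16080),
]
```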

The result of this query is sufficient to display the two metrics in the same mixed bar/line chart, built with Chart.js.

[Figure: mixed bar/line chart of daily user count and average duration of use]

Mining network data sources, combined with the Elastic Stack, can produce powerful analytics for IT. The ongoing revolution towards Software Defined Networking and Virtual Network Functions is a great opportunity to deliver new data services to customers in a data-as-a-service paradigm. Extracting the signal from the noise, giving it meaning with Elasticsearch and presenting the data from a valuable perspective is what we do, and we hope you enjoyed reading this blog article. Vianney Bajart @ Red Mint Network.

Adobe Photoshop CC and Adobe Illustrator CC are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries.