
Tuning Go Apps with Metricbeat

The Elastic Stack can easily be leveraged to monitor Go applications. It lets you do things like analyze memory usage (memory leaks, anyone?), perform long-term monitoring, tune performance, and capture diagnostics. Beats in particular, the lightweight data shippers in the Stack, are designed to sit alongside your applications and are a natural fit for this kind of monitoring.

Metricbeat is a Beat specialized in shipping service and server metrics, and it also happens to be written in Go. It ships in a relatively small package (only about 10MB) and brings no additional dependencies with it. Its CPU overhead and memory footprint are also very light, yet it comes with modules for a variety of services, such as:

  • Apache
  • Couchbase
  • Docker
  • HAProxy
  • Kafka
  • MongoDB
  • MySQL
  • Nginx
  • PostgreSQL
  • Prometheus
  • Redis
  • System
  • ZooKeeper

If the service you're looking for is not listed, don't worry: Metricbeat is extensible, and you can easily implement a module yourself (this post is proof of that!). We'd like to introduce you to the Golang Module for Metricbeat. It has been merged into the master branch of elastic/beats and is expected to be released in version 6.0.

Sneak preview

Here’s a Kibana Dashboard that captures the potential of the Golang Module for Metricbeat:

Kibana Dashboard

The top panel shows a summary of the heap usage, which gives us a general idea of Go's memory usage and GC status:

  • System Total Memory: the total number of bytes obtained from the system
  • Bytes allocated: overall bytes allocated, including memory that has since been freed
  • GC cycles: the number of garbage collections (GC) that occurred
  • GC limit: when heap memory allocation reaches this limit, the garbage collector is started. This limit can change after each GC cycle

The middle panel has three charts in it with a breakdown of:

  • heap memory
  • system memory
  • object statistics

Heap Allocated represents memory that is in use plus memory that has not yet been reclaimed, while Heap Inuse is the size of the live objects on the heap. Heap Idle accounts for memory that has been allocated but can be reclaimed as free memory.

The two charts in the bottom panel show the GC time and GC count statistics. CPU Fraction represents the fraction of CPU time spent on GC: the greater the value, the more frequently GC runs and, in other words, the more time is spent on GC instead of on the application. The trend here looks upward and fairly steep, but the values range between 0.41% and 0.52%, so it is not too worrisome. Normally a GC fraction that climbs into the single digits (percent) is a warning that the code needs optimization.
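For reference, these panels map directly onto fields of Go's runtime.MemStats, which is what expvar reports under the hood. Here is a minimal sketch (not part of the original dashboard) that prints the corresponding values from inside a process:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    fmt.Println("System Total Memory:", m.Sys)        // bytes obtained from the OS
    fmt.Println("Bytes allocated:    ", m.TotalAlloc) // cumulative, includes freed memory
    fmt.Println("GC cycles:          ", m.NumGC)      // number of completed GC cycles
    fmt.Println("GC limit:           ", m.NextGC)     // heap size that triggers the next GC
    fmt.Println("Heap Allocated:     ", m.HeapAlloc)  // live objects plus not-yet-reclaimed memory
    fmt.Println("Heap Inuse:         ", m.HeapInuse)  // heap spans currently holding live objects
    fmt.Println("Heap Idle:          ", m.HeapIdle)   // heap memory that could be returned to the OS
    fmt.Println("GC CPU Fraction:    ", m.GCCPUFraction)
}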

Memory leaks

With this information we can understand in detail how Go uses memory, how that memory is distributed, and how GC behaves. For instance, to analyze whether there is a memory leak, we can check whether memory usage and heap allocation remain reasonably stable. If, for example, GC Limit and Bytes Allocated keep rising, it could be due to a memory leak.
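As a toy illustration (hypothetical code, not from the module), here is the kind of leak this dashboard would reveal: a handler that appends to a global slice on every request and never evicts anything, so Bytes Allocated and GC Limit climb steadily instead of levelling off:

package main

import (
    "log"
    "net/http"
)

var cache [][]byte // grows forever; nothing is ever evicted

func leakyHandler(w http.ResponseWriter, r *http.Request) {
    buf := make([]byte, 1<<20) // allocate 1 MB per request
    cache = append(cache, buf) // kept alive by the global reference
    w.WriteHeader(http.StatusOK)
}

func main() {
    http.HandleFunc("/", leakyHandler)
    log.Fatal(http.ListenAndServe("localhost:8080", nil))
}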

Historical information gives us great granularity in analyzing memory usage and GC patterns across different versions, or even commits!

Great, now how do I get it?

expvar

First things first: we need to enable Go's expvar service. expvar is a package in Go's standard library that exposes internal variables and statistics. Using it is very simple; it's basically just a matter of importing the package in the application, and it automatically registers its /debug/vars handler with the default HTTP server:

import _ "expvar"
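For instance, if the application already serves HTTP on the default mux, the blank import alone is enough and /debug/vars shows up automatically. A minimal sketch, assuming a hypothetical app listening on port 8080:

package main

import (
    _ "expvar" // registers the /debug/vars handler on http.DefaultServeMux
    "log"
    "net/http"
)

func main() {
    // The application's existing handlers...
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello"))
    })
    // /debug/vars is now served automatically alongside them.
    log.Fatal(http.ListenAndServe("localhost:8080", nil))
}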

If no HTTP server is available, the code below lets us start one on port 6060 with a handler that serves the expvar data:

package main

import (
    "expvar"
    "fmt"
    "log"
    "net/http"
)

// metricsHandler renders all registered expvar variables as a single JSON object.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json; charset=utf-8")

    first := true
    report := func(key string, value interface{}) {
        if !first {
            fmt.Fprintf(w, ",\n")
        }
        first = false
        if str, ok := value.(string); ok {
            fmt.Fprintf(w, "%q: %q", key, str)
        } else {
            fmt.Fprintf(w, "%q: %v", key, value)
        }
    }

    fmt.Fprintf(w, "{\n")
    expvar.Do(func(kv expvar.KeyValue) {
        report(kv.Key, kv.Value)
    })
    fmt.Fprintf(w, "\n}\n")
}

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/debug/vars", metricsHandler)
    log.Fatal(http.ListenAndServe("localhost:6060", mux))
}

The path registered by default is /debug/vars, so we can access it at http://localhost:6060/debug/vars. It exposes the data in JSON format; by default it provides Go's runtime.MemStats, but of course we can also register our own variables.

Go Metricbeat!

Now that we have an application exposing expvar data, we can use Metricbeat to get this information into Elasticsearch. Installing Metricbeat is very simple; it's just a matter of downloading a package. Before starting Metricbeat we just need to modify the configuration file (metricbeat.yml) to enable the golang module:

metricbeat.modules:
- module: golang
  metricsets: ["heap"]
  enabled: true
  period: 10s
  hosts: ["localhost:6060"]
  heap.path: "/debug/vars"

The above configuration enables the Go monitoring module to poll the endpoint at heap.path for data every 10 seconds. The other setting that matters in the configuration file is the output:

output.elasticsearch:
  hosts: ["localhost:9200"]

Now assuming Elasticsearch is already running, we can finally start Metricbeat:

./metricbeat -e -v

Now we are in business! Elasticsearch should be receiving data, so we can start Kibana and customize the visualizations for our needs. For this type of analysis Timelion is a particularly good fit, and to get started quickly we can leverage the existing sample Kibana Dashboards for Metricbeat.

More than memory

In addition to monitoring the memory information that is already exposed, we can use expvar to publish additional internal variables. For example, we could do something like:

var innerInt int64 = 1024
pubInt := expvar.NewInt("your_metric_key") // registers the variable under /debug/vars
pubInt.Set(innerInt)
pubInt.Add(2)
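We can also publish values that are recomputed every time /debug/vars is scraped by using expvar.Func. A small sketch (not from the original post), exposing the current goroutine count under a hypothetical num_goroutines key:

package main

import (
    "expvar"
    "log"
    "net/http"
    "runtime"
)

func main() {
    // expvar.Func is re-evaluated on every scrape of /debug/vars.
    expvar.Publish("num_goroutines", expvar.Func(func() interface{} {
        return runtime.NumGoroutine()
    }))
    // Importing expvar already registered /debug/vars on the default mux.
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}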

It's also possible to expose Metricbeat's internal stats, so it can basically monitor itself. This can be done via the -httpprof option:

./metricbeat -httpprof "127.0.0.1:6060" -e -v

Now we can navigate to http://127.0.0.1:6060/debug/vars and see statistics about the Elasticsearch output, such as output.elasticsearch.events.acked, which represents the number of messages sent to Elasticsearch for which Metricbeat received an ACK:

{
"output.events.acked": 1088,
"output.write.bytes": 1027455,
"output.write.errors": 0,
"output.messages.dropped": 0,
"output.elasticsearch.publishEvents.call.count": 24,
"output.elasticsearch.read.bytes": 12215,
"output.elasticsearch.read.errors": 0,
"output.elasticsearch.write.bytes": 1027455,
"output.elasticsearch.write.errors": 0,
"output.elasticsearch.events.acked": 1088,
"output.elasticsearch.events.not_acked": 0,
"output.kafka.events.acked": 0,
"output.kafka.events.not_acked": 0,
"output.kafka.publishEvents.call.count": 0,
"output.logstash.write.errors": 0,
"output.logstash.write.bytes": 0,
"output.logstash.events.acked": 0,
"output.logstash.events.not_acked": 0,
"output.logstash.publishEvents.call.count": 0,
"output.logstash.read.bytes": 0,
"output.logstash.read.errors": 0,
"output.redis.events.acked": 0,
"output.redis.events.not_acked": 0,
"output.redis.read.bytes": 0,
"output.redis.read.errors": 0,
"output.redis.write.bytes": 0,
"output.redis.write.errors": 0,
"Beat.memstats.memory_total": 155721720,
"Beat.memstats.memory_alloc": 3632728,
"Beat.memstats.gc_next": 6052800,
"cmdline": ["./metricbeat","-httpprof=127.0.0.1:6060","-e","-v"],
"fetches": {"system-cpu": {"events": 4, "failures": 0, "success": 4}, "system-filesystem": {"events": 20, "failures": 0, "success": 4}, "system-fsstat": {"events": 4, "failures": 0, "success": 4}, "system-load": {"events": 4, "failures": 0, "success": 4}, "system-memory": {"events": 4, "failures": 0, "success": 4}, "system-network": {"events": 44, "failures": 0, "success": 4}, "system-process": {"events": 1008, "failures": 0, "success": 4}},
"libbeat.config.module.running": 0,
"libbeat.config.module.starts": 0,
"libbeat.config.module.stops": 0,
"libbeat.config.reloads": 0,
"Memstats": { "alloc": 3637704, "TotalAlloc": 155,
[...]

With Metricbeat exposing its own metrics, we can modify its configuration to collect both sets of metrics. We can do so by adding the expvar metricset:

- module: golang
  metricsets: ["heap","expvar"]
  enabled: true
  period: 1s
  hosts: ["localhost:6060"]
  heap.path: "/debug/vars"
  expvar:
    namespace: "metricbeat"
    path: "/debug/vars"

As you can see, we also used the namespace parameter and set it to metricbeat, so the expvar values are indexed under that prefix (golang.metricbeat.*), as the Timelion expression below shows. We can now restart Metricbeat, and we should start seeing the new metrics.

Timelion

We can take the output.elasticsearch.events.acked and output.elasticsearch.events.not_acked fields and use a simple Timelion expression to plot the successes and failures of messages sent from Metricbeat to Elasticsearch:

.es("metricbeat*", metric="max:golang.metricbeat.output.elasticsearch.events.acked")
  .derivative().label("Elasticsearch Success"),
.es("metricbeat*", metric="max:golang.metricbeat.output.elasticsearch.events.not_acked")
  .derivative().label("Elasticsearch Failed")

Here's the result in Kibana:

ACKs chart

From the chart, the channel between Metricbeat and Elasticsearch appears to be stable, and no messages were lost.

Finally, we can also compare the Metricbeat memory stats around the same time on the dashboard:

Kibana Dashboard 2

Coming up in Beats 6

This module will be released with Beats 6.0, but you can start using it right now by cloning (or forking ;) the Beats repo on GitHub and building the binary yourself.

Banner image credit: golang.org