
Elastic Stack monitoring with Metricbeat via Logstash or Kafka

In a previous blog post, we introduced a new method of monitoring the Elastic Stack with Metricbeat. Using Metricbeat to externally collect monitoring information about Elastic Stack products improves the reliability of monitoring those products. It also provides flexibility in how the monitoring data may be routed to the Elasticsearch monitoring cluster. In this blog post we drill a bit deeper into that second aspect by showing how you can route the monitoring data collected by Metricbeat to the monitoring cluster via Logstash or Kafka. So if you are already using the logstash or kafka output in your Metricbeat configuration for your business data, you can continue to use those outputs to route your Elastic Stack monitoring data as well.

Let's start where the previous blog post left off. It introduced the following architecture for monitoring Elastic Stack products with Metricbeat.

[Architecture diagram: each Metricbeat instance collects monitoring data from an Elastic Stack product instance and ships it directly to the Elasticsearch monitoring cluster]

Note that every Metricbeat instance monitors an instance or node of an Elastic Stack product. To do this, the correct Metricbeat modules (the *-xpack configuration variants) must be enabled. For example, to monitor a Logstash node, the logstash-xpack module must be enabled.
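
To make this concrete, here is a rough sketch of enabling the module on the host running Metricbeat; the hostname is a placeholder, and the exact contents of the module configuration shipped in modules.d can vary between versions:

metricbeat modules enable logstash-xpack

The resulting modules.d/logstash-xpack.yml would contain something along these lines, pointing at the Logstash node's HTTP API endpoint (9600 by default):

- module: logstash
  xpack.enabled: true
  period: 10s
  hosts: ["localhost:9600"]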

In this architecture, every Metricbeat instance ships data to a monitoring cluster. This implies that there needs to be network connectivity from the Metricbeat hosts to the monitoring cluster hosts.

Sometimes, however, it might be desirable to minimize the number of ingress points into Elasticsearch. In that case it can be preferable to funnel all the stack monitoring traffic emanating from the Metricbeat instances into a Logstash instance, and have Logstash forward the data to the monitoring cluster. In this blog post, we'll take a look at implementing such an architecture for stack monitoring with Metricbeat.

Adding Logstash to your stack monitoring data flow

First, we'll set up a Logstash pipeline that receives stack monitoring data from Metricbeat and forwards it on to the monitoring cluster. The pipeline is shown below, and its parts are explained afterwards.

input {
  beats {
    port => 5044
  }
}
filter {
  # Boilerplate for compatibility across Beats versions
  mutate {
    rename => { "[@metadata][id]" => "[@metadata][_id]" }
  }
}
output {
  if [@metadata][index] =~ /^\.monitoring-/ {
    # route stack monitoring data to monitoring Elasticsearch cluster
    if [@metadata][_id] {
      elasticsearch {
        index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
        document_id => "%{[@metadata][_id]}"
        hosts => ["https://node1:9200"]
      }
    } else {
      elasticsearch {
        index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
        hosts => ["https://node1:9200"]
      }
    }
  } else {
    # route non-stack monitoring data
  }
}

At a high level this pipeline:

  • Uses the beats input plugin to read the stack monitoring data sent by Metricbeat.
  • Uses the elasticsearch output plugin to send the stack monitoring data to the monitoring cluster.

Note the ladder of if-else statements in the output section of the pipeline. The top-level if-else allows you to separate data intended for stack monitoring — to be indexed into .monitoring-* indices — from other data potentially being collected by the same Metricbeat instances, e.g. if you've enabled the system module.

Inside the if clause for stack monitoring data, there is a nested if-else statement. This construction ensures that any ID set on a stack monitoring event by Metricbeat is passed through as the document _id when the event is indexed into the monitoring cluster. This is essential, in particular, for Elasticsearch shard monitoring data to be indexed correctly. Without it, the Stack Monitoring UI for Elasticsearch will incorrectly show an ever-increasing number of shards over time!
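
For reference, here is roughly the shape of the @metadata on a stack monitoring event arriving over the beats input. The values shown are hypothetical placeholders and vary by product and version; the point is that [@metadata][index] names the target .monitoring-* index, while [@metadata][id], when present, carries the document ID that the filter above renames to [@metadata][_id]:

{
  "@metadata": {
    "beat": "metricbeat",
    "type": "_doc",
    "index": ".monitoring-logstash-7-mb",
    "id": "..."
  }
}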

Configuring Metricbeat to ship to Logstash

Once you have the Logstash pipeline set up, you will need to configure your Metricbeat instances to send their data to the Logstash host instead of directly to the monitoring cluster. Note that Beats support exactly one output at a time, so be sure to disable the elasticsearch output when you enable the logstash output shown below.

output.logstash:
  hosts: [ "logstash_hostname:5044" ]

A variation on this setup might be to introduce Kafka between Metricbeat and Logstash. In this case the Logstash pipeline would look pretty much the same as above, except you would use the kafka input plugin instead of the beats one. And correspondingly, you would configure your Metricbeat instances to send their data to the Kafka cluster instead of Logstash.
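
As a rough sketch, assuming a broker reachable at kafka1:9092 and a topic named stack-monitoring (both placeholder names), the Metricbeat side might look like this:

output.kafka:
  hosts: ["kafka1:9092"]
  topic: "stack-monitoring"

The input section of the Logstash pipeline would then become something like the following, with the filter and output sections unchanged. The json codec is needed because Metricbeat writes events to Kafka as JSON, and it restores the @metadata fields the pipeline relies on:

input {
  kafka {
    bootstrap_servers => "kafka1:9092"
    topics => ["stack-monitoring"]
    codec => "json"
  }
}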

Wrapping up

Hopefully this post has given you a concrete implementation for routing your Elastic Stack monitoring data from Metricbeat to Elasticsearch via Logstash (or Kafka). I also hope it clearly demonstrated the kind of flexibility you gain by using Metricbeat to externally collect monitoring data from Elastic Stack products.

If you have questions or run into any issues with this setup, feel free to post them on discuss.elastic.co. And if you want to see what you get with Stack Monitoring, head over to our interactive demo and give it a try.