
Elasticsearch Observability: Embracing Prometheus and OpenMetrics Standards for Metrics

In this blog we will cover the following:

  • Why open standards are important
  • The Prometheus exposition format
  • How Elastic views observability
  • Three ways Elasticsearch can consume Prometheus metrics
  • An example of how to collect and visualize metrics exposed by the Prometheus Redis exporter

Open Standards

At opensource.com, there's an informative resource entitled "What are Open Standards?". There are many great points in that document, but to me, coming from many years in ops, these are the ones that resonate:

  1. Availability: Open standards are available for all to read and implement.
  2. Maximize end-user choice
  3. No discrimination (vendor neutrality): Open standards and the organizations that administer them do not favor one implementor over another.
  4. No Intentional Secrets: The standard must not withhold any detail necessary for interoperable implementation.

Those are compelling reasons why open standards are good. Now let's talk about why the Prometheus exposition format is the basis for OpenMetrics. In his talks at PromCon 2018 and KubeCon + CloudNativeCon North America 2018, Richard Hartmann summed up the reasons for creating an open standard influenced by the Prometheus exposition format:

  • Most data formats are proprietary, hard to implement, or both
  • Prometheus has become a de-facto standard in cloud-native metric monitoring
  • Ease of exposing data has led to an explosion in compatible metrics endpoints
  • Prometheus' exposition format is based on a lot of operational experience, but was designed by only a few people
  • Some other projects and vendors are torn about adopting something from a "competing" product

Prometheus exposition format

You can read about the exposition format in the Prometheus GitHub repo. For now, let's just look at an example. I have an exporter, Oliver006's Redis exporter, publishing metrics at port 9121 at the /metrics endpoint. I am only showing information about the Redis "instantaneous ops per second" metric here. There are three lines for the reading:

  1. Help text
  2. The type of metric (gauge in this case)
  3. The Redis server being measured (localhost port 6379), and its current reading (9 ops per sec)

[Image: Prometheus exposition format output for the Redis instantaneous ops per second metric (roscigno-prometheus-exposition-format.png)]
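
Reconstructed from that screenshot, the three lines look roughly like this. The exact HELP text and label set come from Oliver006's exporter and vary a bit between versions, so treat this as a sketch rather than verbatim output:

    # HELP redis_instantaneous_ops_per_sec instantaneous_ops_per_sec metric
    # TYPE redis_instantaneous_ops_per_sec gauge
    redis_instantaneous_ops_per_sec{addr="localhost:6379"} 9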

    Observability at Elastic

    I encourage you to read about how Elastic views observability, but here is my favorite line from the post:

    The goal of designing and building an 'observable' system is to make sure that when it is run in production, operators responsible for it can detect undesirable behaviors (e.g., service downtime, errors, slow responses) and have actionable information to pin down root cause in an effective manner (e.g., detailed event logs, granular resource usage information, and application traces).

    That statement, which I wholeheartedly support, tells me that we need all of the logs, metrics, and trace information to run, repair, and manage the services we provide. Prometheus is a very important part of observability because of its widespread adoption and active community. The OpenMetrics standard will only increase the value by removing barriers, whether they are real or perceived, to adoption of a common sense "born in ops" metrics format.

    Most people I speak with are very familiar with the Elastic Stack, or ELK, for logging. If you did not know that the Elastic Stack is also great for metrics and APM, check out our solutions for metrics and APM / distributed tracing.

    The main reasons we see interest in a deep integration between the Elastic Stack and the Prometheus way of exporting metrics are:

    • Combining metrics with logs and APM in Elasticsearch and correlating them in Kibana. Check out a user story from NS1 about combining logs and metrics in Elastic Stack.
    • Using Elasticsearch as long term storage for metrics collected by Prometheus Server, which does not currently support clustering natively.
    • Achieving a global view of your metrics across geographically dispersed Prometheus instances.

    The rest of the blog describes in detail how we approach these integrations.

    A sample exporter

    My demo environment is running in Google Kubernetes Engine (GKE), so I am running my application, Metricbeat, and the Prometheus exporter, all in Kubernetes. This is a bit of Oliver006's manifest for deploying a Redis exporter as a sidecar alongside the Redis image. As you can see, the exporter is publishing at port 9121, which is the default assigned port number for the Prometheus Redis exporter.

    ...
      - name: redis-exporter
        image: oliver006/redis_exporter:v0.17.2
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
    ...
    

    Full source in GitHub

    Scraping metrics with the Metricbeat Prometheus module

    Metricbeat is Elastic's lightweight shipper for metrics. The Prometheus module that ships with Metricbeat can gather metrics in three ways:

    1. Connect to the Prometheus server at port 9090 and pull already collected metrics using the Prometheus Federation API (to get the metrics that Prometheus is collecting)
    2. Connect to the Prometheus server on port 9090 using the /metrics endpoint (Prometheus self monitoring)
    3. Connect to Prometheus exporters individually and parse the exposition format

    Why would you choose one approach over another? It depends on your level of comfort with Prometheus Server.

    • If you already have Prometheus Server set up to scrape metrics and would like to directly query these metrics for integration purposes, you may start with options (1) and (2).
    • If, on the other hand, you don't already have Prometheus Server or don't mind scraping your exporters in parallel by multiple tools, you may choose option (3).

    Note: Some of the above Metricbeat functionality is beta in Metricbeat version 7.0. We would encourage you to download the 7.0 beta or copy the container links from https://www.docker.elastic.co/ and run the beta in a non-production environment.

    Prometheus Federation API

    In general, federation is used to enable scaling, bring datasets together, or make a copy of the data available in a different location (for disaster recovery). Prometheus server provides a /federate endpoint, and Elastic connects to this endpoint to copy the metrics collected by Prometheus for all of the above reasons.

    ...
      - module: prometheus
        period: 10s
        hosts: ["prometheus-service.monitoring.svc.cluster.local:9090"]
        metrics_path: '/federate'
        query:
          'match[]': '{__name__!=""}'
    ...
    

    Full source in GitHub

    In the example above, the query is set to "anything that has a non-blank name". You may not want to grab everything, and the Prometheus docs have information on how to write a more restrictive match condition. The example also connects to the Prometheus server every ten seconds; my demo server is only collecting from a few pods and kube-state-metrics, but you may want to change the interval.
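
    For example, if you only wanted Redis metrics from the federation endpoint, a more restrictive match condition could look something like the snippet below. The redis_.* name pattern is just an illustration and not part of the example repo:

    ...
      - module: prometheus
        period: 10s
        hosts: ["prometheus-service.monitoring.svc.cluster.local:9090"]
        metrics_path: '/federate'
        query:
          # Only federate time series whose metric name starts with redis_
          'match[]': '{__name__=~"redis_.*"}'
    ...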

    Prometheus self monitoring

    Prometheus provides a /metrics endpoint, just like the exporters do, so that you can collect metrics about the Prometheus server itself. This is configured like so:

    ...
      - module: prometheus
        period: 10s
        hosts: ["prometheus-service.monitoring.svc.cluster.local:9090"]
        metrics_path: /metrics
    ...
    

    Full source in GitHub

    Prometheus exporter scraping

    This bit of YAML from a manifest to deploy a Metricbeat DaemonSet tells Metricbeat to autodiscover pods with kubernetes.labels.app == redis and read metrics from port 9121 of those pods. Recall that the containerPort set for the Redis exporter container is 9121.

    ...
    - condition.equals:
        kubernetes.labels.app: "redis"
      config:
        - module: prometheus
          period: 10s
          # Redis pods
          hosts: ["${data.host}:9121"]
          metrics_path: /metrics
    ...
    

    Once Metricbeat is deployed, any pods that satisfy the condition kubernetes.labels.app == redis have the Prometheus module applied to them and the metrics are collected from the exporter sidecar at port 9121.
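
    If you are curious what these scraped metrics look like once they land in Elasticsearch, a document produced by the Metricbeat Prometheus module looks roughly like the trimmed sketch below. The exact field layout (and the addr label) varies by Metricbeat and exporter version, so treat the names as an approximation and check your own index:

    {
      "prometheus": {
        "labels": { "addr": "localhost:6379" },
        "metrics": { "redis_instantaneous_ops_per_sec": 9 }
      },
      "kubernetes": { "labels": { "app": "redis" } }
    }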

    But, metadata makes the k8s world go round, right? Let's do more with metadata and the Beats autodiscover feature. Look at this replacement for the above bit of YAML:

    ...
    - condition.equals:
        kubernetes.annotations.prometheus.io/scrape: "true"
      config:
        - module: prometheus
          period: 10s
          hosts: ["${data.host}:${data.kubernetes.annotations.prometheus.io/port}"]
          metrics_path: /metrics
    ...
    

    Full source in GitHub

    Now instead of looking for exporters for Redis pods, we are looking for exporters for any pod with an annotation of kubernetes.annotations.prometheus.io/scrape set to true. This is also how Prometheus autodiscovery is set up. Generally, Metricbeat autodiscover is driven by an annotation in the elastic.co namespace, but since we are talking about reading from Prometheus exporters we should honor the standard k8s annotations associated with Prometheus. If you look at the hosts list from above:

    hosts: ["${data.host}:${data.kubernetes.annotations.prometheus.io/port}"]

    You can see that we are no longer hard coding port 9121, since that port is specific to the Redis exporter. Instead, the annotation prometheus.io/port is set to the port number for the exporter. For completeness, here is a bit of the guestbook.yaml where these annotations were set:

    ...
    kind: Deployment
    metadata:
      name: redis-master
    spec:
      replicas: 1
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "9121"
          labels:
            app: redis
    ...
    

    Full source in GitHub

    Did I mention that metadata makes the k8s world go round? Wasn't that a song in the late 70's?

    Visualizing to gain insight

    Bringing the data into the Elastic Stack is great, but you have to be able to interact with the data. In the video below we will see how to approach building a useful visualization using Redis metrics scraped by Prometheus (and then imported into the Elastic Stack) and Kubernetes events collected from kube-state-metrics with Metricbeat directly.

    If you would like to follow along with the video and have detailed instructions, see the example repo.
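
    If you want to sanity check the data before building visualizations, you can run the same kind of aggregation Kibana builds for you directly against Elasticsearch. The prometheus.metrics.* field name below is an assumption based on how the Metricbeat Prometheus module names fields, so adjust it to match what you see in your metricbeat-* indices:

    POST metricbeat-*/_search
    {
      "size": 0,
      "query": {
        "exists": { "field": "prometheus.metrics.redis_instantaneous_ops_per_sec" }
      },
      "aggs": {
        "ops_over_time": {
          "date_histogram": { "field": "@timestamp", "interval": "30s" },
          "aggs": {
            "avg_ops": { "avg": { "field": "prometheus.metrics.redis_instantaneous_ops_per_sec" } }
          }
        }
      }
    }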

    Back to Observability

    In the last section we created a Kibana visualization for a key Redis metric (instantaneous ops per second) exposed by Oliver006's Redis exporter. Our next step would be to collect logs and then create a dashboard combining logs and metrics across our applications.

    To learn about collecting logs in a Kubernetes environment, I would suggest that you follow the instructions in the elastic/examples GitHub repo. In just a few minutes you can have Filebeat, Metricbeat, and Packetbeat collecting data and publishing to Elasticsearch. There are sample dashboards that ship with the different Beats, and you should feel free to create your own visualizations for Prometheus data and mix them into dashboards that fit the way you work. And if you run into any issues or want to talk observability, reach out on the Discuss forums.