
Istio monitoring with Elastic Observability

Istio is an open source service mesh that developers and operators can use to control, secure, and connect services in a distributed microservices environment. While Istio is a powerful tool for teams, administrators also need full visibility into its health. In this blog post, we'll take a look at monitoring Istio and its microservices with Elastic Observability.

As the Istio docs mention:

Istio makes it easy to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, with few or no code changes in service code. You add Istio support to services by deploying a special sidecar proxy throughout your environment that intercepts all network communication between microservices, then configure and manage Istio using its control plane functionality, which includes:

  • Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.
  • Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection.
  • A pluggable policy layer and configuration API supporting access controls, rate limits and quotas.
  • Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress.
  • Secure service-to-service communication in a cluster with strong identity-based authentication and authorization.

Prior to version 1.5, Istio was built with a microservice architecture and its control plane and management components consisted of multiple microservices: Pilot, Mixer, Galley, and Citadel.

Metricbeat already supported monitoring these microservices. In version 1.5, however, Istio moved to a monolithic architecture, and the control plane now ships as a single application called istiod. Pilot, Galley, and Citadel are now part of istiod, while the functionality of Mixer, which was responsible for collecting traffic metrics from the Envoy proxies, is now provided directly by the Istio proxies. The current architecture of Istio looks like this:

Istio Architecture

Crafting a monitoring solution with Elastic

Versions prior to 1.5 were already supported by the multiple metricsets of the Istio Metricbeat module. In this blog, we will focus on support for newer versions of Istio running on Kubernetes.

Control plane metrics

As shown in the Istio architecture illustration above, istiod is now the only component from which we can collect control plane metrics. It exposes them through a Prometheus exporter.

To consume metrics from this Prometheus endpoint, we need a metricset that collects them, filters out the ones we don't need, and stores the rest appropriately. This is easy to achieve by creating a lightweight metricset based on the Prometheus module that leverages options such as metric filtering, histogram support, and metric types.
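To make "types" concrete: a Prometheus exporter such as istiod's serves plain text in which each metric family can be preceded by a # TYPE line declaring it a counter, gauge, histogram, and so on. The Python sketch below, which is not Metricbeat code, simply extracts those declarations. The sample payload is a made-up excerpt; the family names are illustrative and real istiod output will differ.

```python
import re

# Matches "# TYPE <family> <type>" comment lines in the Prometheus text format.
TYPE_LINE = re.compile(r"^#\s*TYPE\s+(\S+)\s+(\S+)\s*$")


def declared_types(exposition):
    """Map each metric family name to the type declared by its # TYPE line."""
    types = {}
    for line in exposition.splitlines():
        match = TYPE_LINE.match(line.strip())
        if match:
            family, metric_type = match.groups()
            types[family] = metric_type
    return types


# Illustrative excerpt only; names and values are not real istiod output.
SAMPLE = """\
# HELP pilot_xds Number of endpoints connected using XDS.
# TYPE pilot_xds gauge
pilot_xds 12
# HELP citadel_server_csr_count Number of CSRs received.
# TYPE citadel_server_csr_count counter
citadel_server_csr_count 40
"""

if __name__ == "__main__":
    for family, metric_type in declared_types(SAMPLE).items():
        print(f"{family}: {metric_type}")
```

Knowing that a family is a counter rather than a gauge is what allows counters to be turned into rates and each family to be stored in a suitable field type.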

Let’s look at the definition of this new lightweight metricset:

input:
  module: prometheus
  metricset: collector
  defaults:
    metrics_path: /metrics
    metrics_filters:
      include: ["citadel_*", "galley_*", "pilot_*"]
      exclude: ["^up$"]
    use_types: true
    rate_counters: true

This definition sets the path from which the collector metricset scrapes metrics, keeps only the Citadel, Galley, and Pilot metric families while dropping the up metric, and enables metric types and counter rates. As a result, the data is stored properly in Elasticsearch, allowing us to make the most of it.
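A minimal Python sketch of how the include and exclude lists can be read, assuming they are unanchored regular expressions (like Python's re.search), that a family must match at least one include pattern when any are given, and that any exclude match drops it. This illustrates the semantics only and is not Metricbeat's implementation; the family names are hypothetical.

```python
import re


def compile_patterns(patterns):
    """Compile each pattern string as a regular expression."""
    return [re.compile(p) for p in patterns]


def keep_family(family, include, exclude):
    """Return True if a metric family passes the include/exclude filters.

    Assumptions: patterns are unanchored regexes (re.search); when include
    patterns exist, at least one must match; any exclude match drops the family.
    """
    if include and not any(p.search(family) for p in include):
        return False
    return not any(p.search(family) for p in exclude)


INCLUDE = compile_patterns(["citadel_*", "galley_*", "pilot_*"])
EXCLUDE = compile_patterns(["^up$"])

# Hypothetical family names for illustration.
families = ["pilot_xds", "galley_validation_passed", "go_goroutines", "up"]
kept = [f for f in families if keep_family(f, INCLUDE, EXCLUDE)]
print(kept)  # ['pilot_xds', 'galley_validation_passed']
```

Under this reading, citadel_* is a regular expression rather than a shell glob: it matches any family name that contains citadel, so an anchored pattern such as ^citadel_ would be the stricter choice if that distinction matters.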

When deploying Metricbeat on a Kubernetes cluster, this metricset is configured as follows:

- module: istio 
  metricsets: ['istiod'] 
  period: 10s 
  hosts: ['istiod.istio-system:15014']

Here, istiod is the name of the Kubernetes service exposing the istiod Pod, istio-system is the namespace where it's deployed, and 15014 is the port on which istiod serves its monitoring metrics.

And that's all! The istiod metricset now collects metrics from istiod, and it comes with a pre-built dashboard that provides an overview of the service mesh's control plane, complete with several visualizations that you can also reuse in your own custom dashboards:

Overview Dashboard

Data plane metrics

Now that we are collecting metrics from the control plane with the istiod metricset, we can extend our monitoring by collecting metrics from the data plane. This will give us a powerful overview of the traffic between the services that are managed by Istio.

As mentioned earlier, Mixer was the microservice responsible for collecting and providing these data plane metrics. Since version 1.5, however, these metrics are collected and exposed directly by the Istio proxies through a Prometheus exporter.

All we need to do is define another lightweight metricset, similar to the one we created for istiod, to collect these additional metrics:

input:
  module: prometheus
  metricset: collector
  defaults:
    metrics_path: /stats/prometheus
    metrics_filters:
      include: ["istio_*"]
      exclude: ["^up$"]
    use_types: true
    rate_counters: true

As before, we set the metrics_path, keep only the metrics we want (the istio_ families), and store them using types and counter rates.
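Proxy metrics such as istio_requests_total are cumulative counters, which is why rate_counters matters here. The sketch below models the idea as the increase between consecutive scrapes of the same series. How it treats the first sample (no rate) and a counter reset (rate 0) are assumptions made for illustration, not a description of Metricbeat's exact behavior; the series key is hypothetical.

```python
class CounterRates:
    """Simplified model: turn cumulative counter samples into per-scrape increases.

    Assumptions for illustration: the first sample of a series yields None,
    and a value lower than the previous one is treated as a reset (rate 0.0).
    """

    def __init__(self):
        self._previous = {}  # last value seen per series key

    def rate(self, series_key, value):
        previous = self._previous.get(series_key)
        self._previous[series_key] = value
        if previous is None:
            return None  # nothing to compare against yet
        if value < previous:
            return 0.0  # counter went backwards, e.g. the proxy restarted
        return value - previous


rates = CounterRates()
# Hypothetical series: requests to one service that returned HTTP 200.
key = ("istio_requests_total", "reviews", "200")
for value in [100.0, 130.0, 175.0, 20.0]:
    print(rates.rate(key, value))  # None, 30.0, 45.0, 0.0 in turn
```

Storing the increase rather than the raw running total is what makes traffic per collection period easy to chart and aggregate.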

There's one piece missing, though: we don't know how to reach these proxy containers because we don't know their IP addresses. Even if we knew them before deploying Metricbeat, we would not be able to collect data from services deployed after Metricbeat started. We need a way to identify these containers automatically and start collecting metrics as soon as they start, which is the perfect job for the autodiscover feature of Metricbeat. We define an autodiscover condition that identifies these containers, and whenever a new istio-proxy sidecar container is spotted, Metricbeat automatically enables the proxy metricset and starts collecting data from it.
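The autodiscover idea can be sketched as an event handler: each container start or stop event either adds or removes a scrape target. The event dictionaries below are a hypothetical shape, not Metricbeat's internal format, and the IP addresses are made up. Matching on the sidecar container name istio-proxy and appending port 15090 mirror the proxy metricset's hosts setting used in this post.

```python
PROXY_CONTAINER = "istio-proxy"  # name Istio gives its injected sidecar
PROXY_METRICS_PORT = 15090       # port used for the proxy metricset in this post


def handle_event(active, event):
    """Update the set of active scrape targets for one container event.

    event (hypothetical shape): {"type": "start" | "stop",
                                 "container": <container name>,
                                 "host": <pod IP>}
    """
    if event["container"] != PROXY_CONTAINER:
        return active  # not a sidecar: nothing to scrape
    target = f"{event['host']}:{PROXY_METRICS_PORT}"
    if event["type"] == "start":
        active.add(target)
    elif event["type"] == "stop":
        active.discard(target)
    return active


events = [
    {"type": "start", "container": "istio-proxy", "host": "10.0.0.5"},
    {"type": "start", "container": "reviews", "host": "10.0.0.5"},
    {"type": "start", "container": "istio-proxy", "host": "10.0.0.9"},
    {"type": "stop", "container": "istio-proxy", "host": "10.0.0.5"},
]
active = set()
for event in events:
    handle_event(active, event)
print(sorted(active))  # ['10.0.0.9:15090']
```

Because targets are derived from events rather than a fixed host list, services deployed after Metricbeat starts are covered without any configuration change.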

Here is an example of this autodiscover configuration:

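metricbeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      templates:
        - condition:
            # Assumed condition: match the injected sidecar by container name
            contains:
              kubernetes.container.name: "istio-proxy"
          config:
            - module: istio
              metricsets: ["proxy"]
              hosts: "${data.host}:15090"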

And there we are! We are now collecting metrics from all of the Istio sidecar containers running on the cluster, and new ones are picked up on the fly. This is the proxy metricset of the Istio module, which also comes with a pre-built dashboard:

Traffic Dashboard

Additionally, we can leverage graph analytics in Kibana to explore correlations between our data and the services. For instance, the graph below gives an overview of how our services are connected to each other and how strongly they are associated with HTTP status codes. A service strongly associated with the 500 status code would indicate an issue that we should investigate.
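The graph surfaces these relationships visually. As a simpler and different technique, a plain aggregation can approximate the same question: the share of 5xx responses per destination service. The sketch assumes istio_requests_total samples already reduced to (destination_service, response_code, request count) tuples; the services, counts, and the 0.2 threshold are made up for illustration.

```python
from collections import defaultdict


def error_ratios(samples):
    """Share of 5xx responses per destination service.

    samples: iterable of (destination_service, response_code, request_count).
    """
    totals = defaultdict(float)
    errors = defaultdict(float)
    for service, code, count in samples:
        totals[service] += count
        if str(code).startswith("5"):
            errors[service] += count
    return {s: errors[s] / totals[s] for s in totals if totals[s] > 0}


def flag_services(ratios, threshold):
    """Services whose 5xx share exceeds the threshold, sorted by name."""
    return sorted(s for s, r in ratios.items() if r > threshold)


# Made-up request counts for three hypothetical services.
SAMPLES = [
    ("reviews", "200", 90), ("reviews", "500", 10),
    ("ratings", "200", 50),
    ("details", "200", 25), ("details", "503", 25),
]

ratios = error_ratios(SAMPLES)
print(ratios)  # {'reviews': 0.1, 'ratings': 0.0, 'details': 0.5}
print(flag_services(ratios, threshold=0.2))  # ['details']
```

A ratio hides traffic volume, so in practice it is worth reading it alongside the request rate before deciding which service to investigate first.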

Service Mesh Graph

Monitoring Istio today

If you want to start monitoring your Istio service mesh, download Metricbeat 7.11 and get started exploring your metrics efficiently with Elasticsearch and Kibana. The fastest way to deploy your cluster is to spin up a free trial of Elasticsearch Service. And if you have any questions, remember that we are always happy to help on the Discuss forums.