Elastic Observability Labs - Articles by Lorenzo Soligo

TLS Certificate Monitoring with the OpenTelemetry Collector

Fri, 09 Jan 2026 00:00:00 GMT

In modern distributed systems, TLS certificates are the glue that holds everything together while keeping it safe. Certificates aren't only used for encrypting user traffic; they are fundamental building blocks of trust for your entire system.

Indeed, an expired certificate is not just a minor technical glitch. It is a direct hit on your most critical systems:

Your CI/CD pipeline grinds to a halt because it cannot trust the internal image registry.
Your Single Sign-On (SSO) system fails, locking all your internal users out.
Your external clients see scary browser warnings, shattering user trust and forcing support tickets.
Your SLOs burn due to services not being able to communicate with one another.

In Kubernetes, certificates are usually dynamically generated and auto-renewed by tools like cert-manager. In less fortunate scenarios, certificates might be tucked away inside Secrets and ConfigMaps, which makes them hard to inventory. It is neither hard nor unheard of to have a dozen critical certificates and no centralized way to know when they are about to expire.

Additionally, monitoring only the certificates on external load balancers leaves large internal risks uncovered, since many certificates are never exposed to external users.

In this blog post, we will guide you through establishing comprehensive, cluster-wide certificate monitoring using the OpenTelemetry Collector, the x509-certificate-exporter, and Elastic Observability.

Classical approach: HTTP monitoring

The classical approach to monitoring TLS certificate expiration in Elastic Observability is to treat it like any other service availability check. Historically, this was accomplished using Heartbeat or, more recently, Elastic Observability's Synthetics. These tools perform an external check against a public HTTPS endpoint and automatically extract the certificate's validity dates, allowing you to configure a Synthetics TLS certificate rule in Kibana to trigger an alert when expiration is within a specified threshold (e.g., 30 days).
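To make the idea of an external check concrete, here is a minimal Python sketch of what "connect to an HTTPS endpoint and read the certificate's validity dates" can look like. It is an illustration only and not how Heartbeat or Synthetics are implemented; the function names and the example.com host are placeholders chosen for this sketch.

```python
import socket
import ssl
from datetime import datetime, timezone


def not_after_to_datetime(not_after: str) -> datetime:
    """Convert a certificate 'notAfter' string (e.g. 'Jan 15 14:44:02 2026 GMT') to UTC."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> datetime:
    """Open a TLS connection to host:port and return the server certificate's expiration."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return not_after_to_datetime(cert["notAfter"])


# Usage (hypothetical host; needs network access to the endpoint):
#   fetch_not_after("example.com")
```

Note that the check only works if the machine running it can reach the endpoint over the network, and that the default context refuses to complete the handshake with a certificate that has already expired.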

While effective for external-facing services, this "classical" approach has two major shortcomings when dealing with Kubernetes:

It only works for certificates exposed via HTTP(S), meaning you cannot use this for internal services, databases, or message queues using other protocols. In other words, this won't work to monitor common, critical TLS certificates such as Kafka's.
The monitoring agent must have network access to the endpoint. In a segmented or private Kubernetes environment, deploying agents with the necessary access often introduces unnecessary complexity or security risks.

To gain true cluster-wide visibility, we need to inspect the certificates at their source: inside Kubernetes Secrets or ConfigMaps.

A Kubernetes-native approach: monitor Secrets and ConfigMaps

Monitoring TLS certificate expiration directly within Kubernetes Secrets and ConfigMaps is the only reliable way to gain visibility into internal, non-HTTP-exposed certificates, such as those used for service meshes, internal registries, or databases. In this section, we will use the OpenTelemetry Collector to monitor certificate expiration.

The OpenTelemetry Collector provides a mechanism to read up-to-date information from the Kubernetes API, including Secrets, via the k8sobjects receiver. However, this receiver only fetches raw TLS certificate resource data, which the OpenTelemetry Transformation Language (OTTL) cannot properly parse. Therefore, we need a dedicated exporter to collect the certificate data and expose the results in a digestible format.

The industry-standard solution

As mentioned above, simply reading certificate information from the Kubernetes API is not a feasible solution. We will therefore use a specialized, lightweight exporter (specifically, the popular x509-certificate-exporter) to collect TLS certificate data and expose the results, allowing the OpenTelemetry Collector's Prometheus receiver to seamlessly scrape the data and send it to Elastic Observability. This approach immediately and easily enables us to monitor both certificates generated by cert-manager and self-managed ones, such as the ones created for ECK.

A fully working configuration example and a script to set up a complete local development environment are available here. Feel free to use them to follow along as you read through this guide and try out the examples. Please note that, while this repository uses the Elastic Distribution of OpenTelemetry (EDOT), it can be easily adapted to use the OpenTelemetry Collector.

Helm Chart Configuration

We configured the x509-certificate-exporter with the official Helm Chart and used the following minimal configuration:

secretsExporter:
  secretTypes:
  - type: kubernetes.io/tls
    key: tls.crt
  # For ECK that uses different secret types
  - type: Opaque
    key: tls.crt
  - type: Opaque
    key: ca.crt
  configMapKeys:
  - tls.crt
  - ca.crt

# Create a service to have a stable endpoint for scraping metrics
service:
  create: true
  # -- TCP port to expose the Service on
  port: 9793

# Disable prometheus service monitor and prometheus rules
prometheusServiceMonitor:
  create: false
prometheusRules:
  create: false

Refer to the reference values.yaml for the full range of configuration options.

OpenTelemetry Collector Configuration

Afterward, we configured the OpenTelemetry Collector to scrape the metrics from the service:

prometheus/cert-expiration:
  config:
    scrape_configs:
      - job_name: "cert-expiration"
        scrape_interval: 60m
        static_configs:
          - targets:
              - "x509-certificate-exporter.monitoring.svc.cluster.local:9793"

We deliberately used a long scrape interval of 60 minutes, because certificate expiration is a low-frequency concern.
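The Prometheus receiver scrapes plain-text samples from the exporter's endpoint. As a rough aid for checking what the exporter exposes before wiring up the collector, the sketch below parses a small excerpt in the Prometheus text format. The excerpt is hypothetical (its label values are borrowed from the example document in this post), and the parser is deliberately minimal: it only handles samples that carry labels and is not a full implementation of the format.

```python
import re

# Hypothetical excerpt of the exporter's metrics output; values are illustrative.
SAMPLE = """\
# TYPE x509_cert_not_after gauge
x509_cert_not_after{secret_key="tls.crt",secret_name="tls-cert-secret",secret_namespace="test-certs"} 1.768488242e+09
"""

# name{labels} value
LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
# key="value" pairs, allowing escaped characters inside the value
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) for labeled samples."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match["labels"]))
            samples.append((match["name"], labels, float(match["value"])))
    return samples


for name, labels, value in parse_samples(SAMPLE):
    print(name, labels["secret_namespace"], labels["secret_name"], int(value))
```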

Visualizing the data in Kibana

Once the data is ingested, we can explore it using Discover. We can select the metrics-* Data View and search for our data with the filter data_stream.dataset : "prometheusreceiver.otel".

An example document looks like the following:

{
  "@timestamp": "2025-12-19T09:43:45.317Z",
  "_metric_names_hash": "7d113f55b70019d9",
  "attributes": {
    "issuer_CN": "tls-cert.example.com",
    "issuer_O": "TLS Cert",
    "secret_key": "tls.crt",
    "secret_name": "tls-cert-secret",
    "secret_namespace": "test-certs",
    "serial_number": "250887723804527203192865532237673843132727735771",
    "subject_CN": "tls-cert.example.com",
    "subject_O": "TLS Cert"
  },
  "data_stream": {
    "dataset": "prometheusreceiver.otel",
    "namespace": "default",
    "type": "metrics"
  },
  "metrics": {
    "x509_cert_expired": 0,
    "x509_cert_not_after": 1768488242,
    "x509_cert_not_before": 1765896242
  },
  "resource": {
    "attributes": {
      "server.address": "x509-certificate-exporter.monitoring.svc.cluster.local",
      "server.port": "9793",
      "service.instance.id": "x509-certificate-exporter.monitoring.svc.cluster.local:9793",
      "service.name": "cert-expiration",
      "url.scheme": "http"
    }
  },
  "scope": {
    "name": "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver",
    "version": "9.2.2"
  }
}

The core metric reported by the x509-certificate-exporter is x509_cert_not_after, which represents the Unix Epoch timestamp (in seconds) of the certificate's expiration date. This metric has some attributes associated with it. In the case of Secrets, the following attributes are relevant:

secret_namespace: The namespace of the Secret containing the certificate.
secret_name: The name of the Secret containing the certificate.
secret_key: The specific key within the Secret where the certificate is stored.

In the case of ConfigMaps, we can infer the attributes of interest from the filepath attribute.
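Because x509_cert_not_after is expressed in seconds since the Unix Epoch, turning it into an expiration date and a number of remaining days is simple arithmetic. The following Python sketch shows the conversion using the values from the example document above; the function name is a placeholder, and the whole-day rounding is a choice made for this illustration.

```python
from datetime import datetime, timezone


def remaining_days(not_after_epoch_s: float, now: datetime) -> int:
    """Whole days between 'now' and the certificate expiration.

    timedelta.days floors the result, so an already expired certificate
    yields a negative number.
    """
    expiration = datetime.fromtimestamp(not_after_epoch_s, tz=timezone.utc)
    return (expiration - now).days


# Values taken from the example document: metrics.x509_cert_not_after and @timestamp.
not_after = 1768488242
observed_at = datetime(2025, 12, 19, 9, 43, 45, tzinfo=timezone.utc)

print(datetime.fromtimestamp(not_after, tz=timezone.utc))  # 2026-01-15 14:44:02+00:00
print(remaining_days(not_after, observed_at))              # 27
```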

Finally, we can leverage ES|QL to compute the remaining days until expiration. In the following examples, we will use the TS command, which is optimized and recommended for interacting with time-series data.

For Secrets:

TS metrics-*
| WHERE metrics.x509_cert_not_after IS NOT NULL
| STATS expiration_date = MAX(LAST_OVER_TIME(metrics.x509_cert_not_after)) BY attributes.secret_namespace, attributes.secret_name, attributes.secret_key
| EVAL remaining_days = DATE_DIFF("days", NOW(), TO_DATETIME(1000 * expiration_date))
| EVAL expiration_date = TO_DATETIME(1000 * expiration_date)
| SORT expiration_date ASC

And for ConfigMaps:

TS metrics-*
| WHERE metrics.x509_cert_not_after IS NOT NULL
| WHERE attributes.filepath IS NOT NULL
| DISSECT attributes.filepath "k8s/%{namespace}/%{configmap}"
| WHERE configmap != "kube-root-ca.crt" // Filter out the Kubernetes API server certificate's signing CA
| STATS expiration_date = MAX(LAST_OVER_TIME(metrics.x509_cert_not_after)) BY namespace, configmap, filename
| EVAL remaining_days = DATE_DIFF("days", NOW(), TO_DATETIME(1000 * expiration_date))
| EVAL expiration_date = TO_DATETIME(1000 * expiration_date)
| SORT expiration_date ASC
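The DISSECT step in the ConfigMap query splits the filepath attribute on its separators. As an illustration of that step, the Python sketch below mimics the pattern "k8s/%{namespace}/%{configmap}", assuming the attribute has the shape the pattern implies. The function name and the sample values are hypothetical.

```python
def dissect_filepath(filepath: str):
    """Mimic the pattern k8s/%{namespace}/%{configmap}.

    Returns None when the value does not match; the last key takes
    whatever remains after the second separator.
    """
    prefix, sep, rest = filepath.partition("/")
    if prefix != "k8s" or not sep:
        return None
    namespace, sep, configmap = rest.partition("/")
    if not sep:
        return None
    return {"namespace": namespace, "configmap": configmap}


# Hypothetical attribute values
print(dissect_filepath("k8s/test-certs/my-ca-bundle"))
print(dissect_filepath("k8s/kube-system/kube-root-ca.crt"))
```

The second value corresponds to the kind of entry that the query filters out with the condition on kube-root-ca.crt.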

Based on these core queries, we can easily build a dashboard that shows the remaining days until expiration for all the certificates in the cluster. We can also create alerts for certificates that are about to expire by appending a condition to the query:

| WHERE remaining_days < 30

Conclusion

In this blog post, we explored how to monitor TLS certificate expiration within a Kubernetes cluster using the OpenTelemetry Collector. We discussed the limitations of traditional HTTP-based monitoring approaches and introduced a Kubernetes-native solution leveraging the x509-certificate-exporter to extract certificate expiration data directly from Kubernetes Secrets and ConfigMaps. This method provides comprehensive visibility into all certificates used within the cluster, including those not exposed via HTTP(S).

For the sake of simplicity, we focused only on monitoring certificate expiration with the OpenTelemetry Collector on Kubernetes. However, this approach can easily be applied with the classical Elastic Agent by leveraging the Prometheus input package (read more on how to use input packages here), and it can also be extended to monitor certificates on virtual machines or bare-metal servers by deploying the x509-certificate-exporter there.

Finally, it is worth knowing that Elastic Observability offers an officially supported distribution of the OpenTelemetry Collector, called Elastic Distributions of OpenTelemetry (EDOT).

If you are an Elastic user, you could consider using the EDOT Collector to monitor certificates with OpenTelemetry: since it is supported by Elastic Observability, it will be easier to manage and keep up to date. Alternatively, you can also use upstream OTel components.

What's next?

Now that Elastic supports Rule Templates and OpenTelemetry content packs, our near-term objective is to contribute to the integration repository to make the setup of certificate monitoring even easier for our users. Stay tuned for more updates on this!

Monitoring Proxmox VE deployments with Elastic Observability

Wed, 23 Jul 2025 00:00:00 GMT

In this blog post, you will learn how to leverage Elastic Observability to monitor Proxmox VE and the software running on top of it, both in the form of Linux Containers (LXCs) and Virtual Machines (VMs).

Why use Elastic Observability with Proxmox?

Here at Elastic, we are passionate about efficiently managing and monitoring infrastructure and applications. Many of us have fun playing with home labs, oftentimes running Proxmox VE, a powerful open-source virtualization platform used to run virtual machines and Linux Containers (LXCs) with ease. While Proxmox provides robust tools for managing virtualized resources, gaining deep insights into the performance and health of your LXCs, VMs, and hosts requires a comprehensive monitoring solution. This blog post will guide you through leveraging the power of Elastic Observability, in conjunction with Elastic Agent, to effectively monitor your Proxmox VE deployment, ensuring optimal performance and proactive issue resolution thanks to Kibana Alerts.

The homelab setup

Our homelab setup centers around an Intel N100 mini PC, serving as the host for Proxmox VE. This setup is simple and minimal, yet effective for showcasing a few interesting capabilities. On top of this mini PC, we run several Linux Containers (LXCs) for various services, along with a dedicated virtual machine for Home Assistant.

Elastic Agent installation and configuration

Before beginning, it is worth noting that there are numerous ways to install and configure the Elastic Agent. For the sake of simplicity, we will showcase a setup in which only one instance of the Elastic Agent is running on the host machine. The Elastic Agent reports to an Elastic Cloud Observability deployment and is managed via Fleet, which makes it tremendously easy to upgrade and re-configure it whenever needed.

Diving into the host

Kibana offers various panes that make it nice and easy to learn about a system's health at a quick glance.

As a first step, let's take a look at the Infrastructure > Hosts page in Kibana:

[Image: Hosts Kibana page for our Proxmox host]

Here we can see various information about our Proxmox VE host (i.e. the mini PC). The top processes running on it are presented, including processes running in LXCs such as pia-daemon. We can also see a kvm process, specifically running a Home Assistant virtual machine, and a Proxmox pve-firewall process.

Let's now take a look at Universal Profiling > Flamegraph. This graph shows how much CPU time is consumed by different stack traces from processes running on the host system. You can drill down into specific processes using the search bar at the top. For instance, you can filter by kvm to only see information regarding this specific process.

[Image: Flamegraph Kibana page for our Proxmox host]

The Observability AI Assistant

All the Kibana panes we visited so far have proved to be highly interesting, but they struggle to answer urgent questions such as:

Did anything happen on our mini PC recently?
Was there any significant change in functionality?
Is there any precious information hidden among the thousands of data points collected?

The Elastic Observability AI Assistant helps us by answering these questions in natural language. By default, on Elastic Cloud, it uses the Elastic-managed LLM connector, which means users do not need to configure anything to get started with it. It just works!

Let's go to the Observability > AI Assistant pane in Kibana and let's try to ask a generic prompt such as: "please give me an overview of the health of my prox host".

Let's then give it a moment to dig into the data... et voilà, here comes plenty of relevant information in the form of graphs and natural language explanations. The Observability AI Assistant understood our question, went through all the data for our Proxmox host, ran data analytics on it, and reported back in a matter of seconds!

Alerting upon disruption with Kibana Alerts

As a final step, let's try to define a Kibana Alert to help us understand whether our host is overloaded. Let's head to Observability > Alerts > Rules and create a new rule. We will create a Custom Threshold rule that will fire if CPU usage for the host is higher than 80% on average for the last 15 minutes. Kibana will send us an email in case the rule fires. The rule is also configured to fire if no data appears for the last 15 minutes, which is extremely helpful as it would imply the presence of some issues to be debugged: broken network or no electricity in the house, a faulty Agent deployment, or even a hardware issue with the mini PC.
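The logic of this rule can be summarized in a few lines. The Python sketch below is only an illustration of the decision described above (average over a window, compared with a threshold, plus a no-data case); it is not how Kibana evaluates Custom Threshold rules, and the function name, the state names, and the 0.0 to 1.0 scale for CPU usage are assumptions of this example.

```python
from statistics import mean


def evaluate_rule(cpu_samples, threshold=0.80):
    """Evaluate one 15-minute window of CPU usage ratios (0.0 to 1.0).

    Returns "no_data" when the window is empty, "alert" when the average
    exceeds the threshold, and "ok" otherwise.
    """
    if not cpu_samples:
        return "no_data"
    return "alert" if mean(cpu_samples) > threshold else "ok"


print(evaluate_rule([0.90, 0.85, 0.95]))  # sustained load
print(evaluate_rule([0.20, 0.90]))        # a short spike, low average
print(evaluate_rule([]))                  # the host stopped reporting
```

Averaging over the window means that a brief spike does not fire the rule, while the no-data case catches the situations in which the host cannot report at all.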

Conclusion

In this blog post, we showcased how to effectively use the Elastic Stack to monitor Proxmox VE deployments. If you would like to try out such a setup first-hand, you are more than welcome to enjoy Elastic Cloud's 14-day free trial.

In future blog posts, we will investigate how to dig deeper into LXCs and VMs to gather even more information from our home lab and create more tailored alerts. Stay tuned!