<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Articles by Lorenzo Soligo</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted security news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Wed, 13 May 2026 18:20:09 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Articles by Lorenzo Soligo</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[TLS Certificate Monitoring with the OpenTelemetry Collector]]></title>
            <link>https://www.elastic.co/observability-labs/blog/edot-certificate-monitoring</link>
            <guid isPermaLink="false">edot-certificate-monitoring</guid>
            <pubDate>Fri, 09 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to monitor TLS certificate expiration in Kubernetes clusters with the OpenTelemetry Collector and Elastic Observability, gaining comprehensive visibility into both external and internal certificates.
]]></description>
            <content:encoded><![CDATA[<p>In modern distributed systems, TLS certificates are the glue that holds
everything together while keeping it safe. Certificates aren't only used for
encrypting user traffic; they are fundamental building blocks of trust for your
entire system.</p>
<p>Indeed, an expired certificate is <em>not</em> just a minor technical glitch.
It is a direct hit on your most critical systems:</p>
<ul>
<li>
<p>Your CI/CD pipeline grinds to a halt because it cannot trust the internal
image registry.</p>
</li>
<li>
<p>Your Single Sign-On (SSO) system fails, locking all your internal users out.</p>
</li>
<li>
<p>Your external clients see scary browser warnings, shattering user trust and
forcing support tickets.</p>
</li>
<li>
<p>Your SLOs burn due to services not being able to communicate with one another.</p>
</li>
</ul>
<p>In Kubernetes, certificates are usually dynamically generated and auto-renewed
by tools like <code>cert-manager</code>. In less fortunate scenarios, certificates are
tucked away inside <code>Secrets</code> and <code>ConfigMaps</code>, which makes them hard to
inventory. It is neither hard nor unheard of to have a dozen critical
certificates and no centralized way to know when they are about to expire.</p>
<p>Additionally, monitoring only the certificates on external load balancers
leaves significant <em>internal</em> risks uncovered, since many certificates are
never exposed to external users.</p>
<p>In this blog post, we will guide you through establishing comprehensive,
cluster-wide certificate monitoring using the OpenTelemetry Collector,
the <a href="https://github.com/enix/x509-certificate-exporter">x509-certificate-exporter</a>,
and Elastic Observability.</p>
<h2>Classical approach: HTTP monitoring</h2>
<p>The classical approach to monitoring TLS certificate expiration in Elastic
Observability is to treat it like any other service availability check. Historically,
this was accomplished using Heartbeat or, more recently, Elastic Observability's Synthetics.
These tools perform an external check against a public HTTPS endpoint and
automatically extract the certificate's validity dates, allowing you to
configure a
<a href="https://www.elastic.co/docs/solutions/observability/incident-management/create-tls-certificate-rule">Synthetics TLS certificate rule</a>
in Kibana to trigger an alert when expiration is within a specified threshold
(e.g., 30 days).</p>
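<p>Conceptually, such a check boils down to opening a TLS connection and reading
the peer certificate's expiration date. The following is a minimal Python sketch
of that idea using only the standard library. It is not how Heartbeat or
Synthetics are implemented, and the function names <code>fetch_not_after</code> and
<code>days_until_expiry</code> are hypothetical.</p>
<pre><code class="language-python">import socket
import ssl

SECONDS_PER_DAY = 86400


def fetch_not_after(host, port=443, timeout=5.0):
    # Open a verified TLS connection and return the certificate's notAfter
    # field, a string such as 'Jan 15 14:44:02 2026 GMT'.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()['notAfter']


def days_until_expiry(not_after, now_epoch):
    # ssl.cert_time_to_seconds converts the notAfter string to Unix epoch seconds.
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    # Floor division keeps only whole days.
    return (expires_epoch - int(now_epoch)) // SECONDS_PER_DAY
</code></pre>
<p>Calling <code>days_until_expiry(fetch_not_after(host), now)</code> with the current Unix
time yields the number of whole days left. Because the default context verifies
the certificate, an already expired certificate surfaces as a verification error
rather than as a date, and the check only works if the machine running it can
reach the endpoint.</p>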
<p>While effective for external-facing services, this &quot;classical&quot; approach has two
major shortcomings when dealing with Kubernetes:</p>
<ul>
<li>
<p>It only works for certificates exposed via HTTP(S), meaning you cannot use
this for internal services, databases, or message queues using other protocols.
In other words, this won't work to monitor common, critical TLS certificates
such as Kafka's.</p>
</li>
<li>
<p>The monitoring agent must have network access to the endpoint. In a segmented
or private Kubernetes environment, deploying agents with the necessary access
often introduces unnecessary complexity or security risks.</p>
</li>
</ul>
<p>To gain true cluster-wide visibility, we need to inspect the certificates at
their source: <em>inside</em> Kubernetes Secrets or ConfigMaps.</p>
<h2>A Kubernetes-native approach: monitor Secrets and ConfigMaps</h2>
<p>Monitoring TLS certificate expiration directly within Kubernetes Secrets and
ConfigMaps is the only reliable way to gain visibility into internal,
non-HTTP-exposed certificates, such as those used for service meshes, internal
registries, or databases. In this section, we will use the OpenTelemetry Collector to
monitor certificate expiration.</p>
<p>The OpenTelemetry Collector provides a mechanism to read
up-to-date information from the Kubernetes API, including Secrets, via the
<a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sobjectsreceiver">k8sobjects receiver</a>.
However, this receiver only fetches <em>raw</em> TLS certificate resource data,
which the OpenTelemetry Transformation Language (OTTL) cannot properly parse.
Therefore, we need to use a dedicated exporter to collect the certificate data
and expose the results in a digestible format.</p>
<h3>The industry-standard solution</h3>
<p>As mentioned above, simply reading certificate information from the Kubernetes API
is not a feasible solution. We will therefore use a specialized,
lightweight exporter (specifically, the popular
<a href="https://github.com/enix/x509-certificate-exporter">x509-certificate-exporter</a>)
to collect TLS certificate data and expose the results,
allowing the OpenTelemetry Collector's Prometheus receiver to seamlessly
scrape the data and send it to Elastic Observability.
This approach immediately and easily enables us to monitor both certificates
generated by <code>cert-manager</code> and self-managed ones, such as the ones created for
ECK.</p>
<p>A fully working configuration example and a script to set up a complete local
development environment are available <a href="https://github.com/elastic/edot-certificate-monitoring-blog-post">here</a>.
Feel free to use them to follow along as you read through this guide and try out the examples.
Please note that, while this repository uses the Elastic Distribution of OpenTelemetry (EDOT),
it can be easily adapted to use the upstream OpenTelemetry Collector.</p>
<h4>Helm Chart Configuration</h4>
<p>We configured the <code>x509-certificate-exporter</code> with the official Helm Chart and
used the following minimal configuration:</p>
<pre><code class="language-yaml">secretsExporter:
  secretTypes:
  - type: kubernetes.io/tls
    key: tls.crt
  # For ECK that uses different secret types
  - type: Opaque
    key: tls.crt
  - type: Opaque
    key: ca.crt
  configMapKeys:
  - tls.crt
  - ca.crt

# Create a service to have a stable endpoint for scraping metrics
service:
  create: true
  # -- TCP port to expose the Service on
  port: 9793

# Disable prometheus service monitor and prometheus rules
prometheusServiceMonitor:
  create: false
prometheusRules:
  create: false
</code></pre>
<p>Refer to the chart's reference <code>values.yaml</code> for the full range of
configuration options.</p>
<h4>OpenTelemetry Collector Configuration</h4>
<p>Afterward, we configured the OpenTelemetry Collector to scrape the metrics from the
service:</p>
<pre><code class="language-yaml">prometheus/cert-expiration:
  config:
    scrape_configs:
      - job_name: &quot;cert-expiration&quot;
        scrape_interval: 60m
        static_configs:
          - targets:
              - &quot;x509-certificate-exporter.monitoring.svc.cluster.local:9793&quot;
</code></pre>
<p>We deliberately used a long scrape interval of 60 minutes because certificate
expiration dates change rarely, so more frequent scraping would add little value.</p>
<h4>Visualizing the data in Kibana</h4>
<p>Once the data is ingested, we can explore it using Discover. We can select the
<code>metrics-*</code> Data View and search for our
data with the filter <code>data_stream.dataset : &quot;prometheusreceiver.otel&quot;</code>.</p>
<p>An example document looks like the following:</p>
<pre><code class="language-json">{
  &quot;@timestamp&quot;: &quot;2025-12-19T09:43:45.317Z&quot;,
  &quot;_metric_names_hash&quot;: &quot;7d113f55b70019d9&quot;,
  &quot;attributes&quot;: {
    &quot;issuer_CN&quot;: &quot;tls-cert.example.com&quot;,
    &quot;issuer_O&quot;: &quot;TLS Cert&quot;,
    &quot;secret_key&quot;: &quot;tls.crt&quot;,
    &quot;secret_name&quot;: &quot;tls-cert-secret&quot;,
    &quot;secret_namespace&quot;: &quot;test-certs&quot;,
    &quot;serial_number&quot;: &quot;250887723804527203192865532237673843132727735771&quot;,
    &quot;subject_CN&quot;: &quot;tls-cert.example.com&quot;,
    &quot;subject_O&quot;: &quot;TLS Cert&quot;
  },
  &quot;data_stream&quot;: {
    &quot;dataset&quot;: &quot;prometheusreceiver.otel&quot;,
    &quot;namespace&quot;: &quot;default&quot;,
    &quot;type&quot;: &quot;metrics&quot;
  },
  &quot;metrics&quot;: {
    &quot;x509_cert_expired&quot;: 0,
    &quot;x509_cert_not_after&quot;: 1768488242,
    &quot;x509_cert_not_before&quot;: 1765896242
  },
  &quot;resource&quot;: {
    &quot;attributes&quot;: {
      &quot;server.address&quot;: &quot;x509-certificate-exporter.monitoring.svc.cluster.local&quot;,
      &quot;server.port&quot;: &quot;9793&quot;,
      &quot;service.instance.id&quot;: &quot;x509-certificate-exporter.monitoring.svc.cluster.local:9793&quot;,
      &quot;service.name&quot;: &quot;cert-expiration&quot;,
      &quot;url.scheme&quot;: &quot;http&quot;
    }
  },
  &quot;scope&quot;: {
    &quot;name&quot;: &quot;github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver&quot;,
    &quot;version&quot;: &quot;9.2.2&quot;
  }
}
</code></pre>
<p>The core metric reported by the <code>x509-certificate-exporter</code> is
<code>x509_cert_not_after</code>, which represents the Unix epoch timestamp (in seconds) of the certificate's
expiration date. This metric carries several attributes.
In the case of <code>Secrets</code>, the following attributes are relevant:</p>
<ul>
<li><code>secret_namespace</code>: The namespace of the Secret containing the certificate.</li>
<li><code>secret_name</code>: The name of the Secret containing the certificate.</li>
<li><code>secret_key</code>: The specific key within the Secret where the certificate is stored.</li>
</ul>
<p>In the case of <code>ConfigMaps</code>, we can infer the attributes of interest
from the <code>filepath</code> attribute.</p>
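<p>To make the metric's unit concrete, the following Python sketch converts the
values from the sample document above into a readable date and a number of
remaining days. The function name <code>expiration_summary</code> is hypothetical, and
the certificate's <code>x509_cert_not_before</code> value is used as a stand-in for the
current time purely for illustration.</p>
<pre><code class="language-python">from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def expiration_summary(not_after_epoch, now_epoch):
    # The metric is expressed in seconds since the Unix epoch (UTC).
    expires_at = datetime.fromtimestamp(not_after_epoch, tz=timezone.utc)
    # Floor division keeps only whole days.
    remaining_days = (not_after_epoch - now_epoch) // SECONDS_PER_DAY
    return expires_at.isoformat(), remaining_days


# x509_cert_not_after and x509_cert_not_before from the sample document.
summary = expiration_summary(1768488242, 1765896242)
print(summary)  # ('2026-01-15T14:44:02+00:00', 30)
</code></pre>
<p>In other words, the sample certificate expires on 15 January 2026 and is valid
for exactly 30 days.</p>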
<p>Finally, we can leverage ES|QL to compute the remaining days until expiration.
In the following examples, we will use the <a href="https://www.elastic.co/docs/reference/query-languages/esql/commands/ts"><code>TS</code> command</a>,
which is optimized and recommended for interacting with time-series data.</p>
<p>For <code>Secrets</code>:</p>
<pre><code class="language-sql">TS metrics-*
| WHERE metrics.x509_cert_not_after IS NOT NULL
| STATS expiration_date = MAX(LAST_OVER_TIME(metrics.x509_cert_not_after)) BY attributes.secret_namespace, attributes.secret_name, attributes.secret_key
| EVAL remaining_days = DATE_DIFF(&quot;days&quot;, NOW(), TO_DATETIME(1000 * expiration_date))
| EVAL expiration_date = TO_DATETIME(1000 * expiration_date)
| SORT expiration_date ASC
</code></pre>
<p>And for <code>ConfigMaps</code>:</p>
<pre><code class="language-sql">TS metrics-*
| WHERE metrics.x509_cert_not_after IS NOT NULL
| WHERE attributes.filepath IS NOT NULL
| DISSECT attributes.filepath &quot;k8s/%{namespace}/%{configmap}&quot;
| WHERE configmap != &quot;kube-root-ca.crt&quot; // Filter out the Kubernetes API server certificate's signing CA
| STATS expiration_date = MAX(LAST_OVER_TIME(metrics.x509_cert_not_after)) BY namespace, configmap, filename
| EVAL remaining_days = DATE_DIFF(&quot;days&quot;, NOW(), TO_DATETIME(1000 * expiration_date))
| EVAL expiration_date = TO_DATETIME(1000 * expiration_date)
| SORT expiration_date ASC
</code></pre>
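<p>For readers less familiar with <code>DISSECT</code>, the following Python sketch
approximates what the pattern in the ConfigMap query does to the <code>filepath</code>
attribute and how the <code>kube-root-ca.crt</code> filter is applied. The function names
and the example path <code>k8s/test-certs/internal-ca</code> are hypothetical.</p>
<pre><code class="language-python">def parse_configmap_path(filepath):
    # Split into at most three parts so the last field keeps any remaining text.
    parts = filepath.split('/', 2)
    if len(parts) != 3 or parts[0] != 'k8s':
        # The pattern did not match, so no fields are extracted.
        return None
    namespace, configmap = parts[1], parts[2]
    return namespace, configmap


def is_monitored(filepath):
    # Keep only paths that match and are not the cluster root CA ConfigMap.
    parsed = parse_configmap_path(filepath)
    return parsed is not None and parsed[1] != 'kube-root-ca.crt'


print(parse_configmap_path('k8s/test-certs/internal-ca'))  # ('test-certs', 'internal-ca')
print(is_monitored('k8s/kube-system/kube-root-ca.crt'))    # False
</code></pre>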
<p>Based on these core queries, we can easily build a dashboard that shows the
remaining days until expiration for all the certificates in the cluster:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/edot-certificate-monitoring/dashboard.png" alt="Kibana Certificate Expiration Dashboard" /></p>
<p>We can also create alerts for certificates that are about to expire by
appending a condition to the query:</p>
<pre><code class="language-sql">WHERE remaining_days &lt; 30
</code></pre>
<h3>Conclusion</h3>
<p>In this blog post, we explored how to monitor TLS certificate expiration
within a Kubernetes cluster using the OpenTelemetry Collector.
We discussed the limitations of traditional HTTP-based monitoring
approaches and introduced a Kubernetes-native solution leveraging the
<code>x509-certificate-exporter</code> to extract certificate expiration data directly from
Kubernetes Secrets and ConfigMaps. This method provides comprehensive visibility
into all certificates used within the cluster, including those not exposed via
HTTP(S).</p>
<p>For the sake of simplicity, we focused only on monitoring certificate expiration
with the OpenTelemetry Collector on Kubernetes. However, this approach can easily be applied
with the classic Elastic Agent by leveraging the
<a href="https://www.elastic.co/docs/reference/integrations/prometheus_input">Prometheus input package</a>
(read more on how to use input packages
<a href="https://www.elastic.co/observability-labs/blog/customize-data-ingestion-input-packages">here</a>)
and can also be extended to monitor certificates on virtual machines or
bare-metal servers by deploying the <code>x509-certificate-exporter</code> there.</p>
<p>Finally, it is worth knowing that Elastic Observability offers an officially supported
distribution of the OpenTelemetry Collector,
called <a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">Elastic Distributions of OpenTelemetry (EDOT)</a>.</p>
<p>If you are an Elastic user, consider using the EDOT Collector to monitor certificates with
OpenTelemetry: since it is supported by Elastic Observability, it will be easier to manage and keep up to date. Alternatively, you can use upstream OTel components.</p>
<h3>What's next?</h3>
<p>Now that Elastic supports
<a href="https://www.elastic.co/docs/reference/fleet/alerting-rule-templates">Rule Templates</a>
and <a href="https://www.elastic.co/docs/solutions/observability/apm/opentelemetry">OpenTelemetry content packs</a>,
our near-term objective is to contribute to the integration repository to make
the setup of certificate monitoring even easier for our users.
Stay tuned for more updates on this!</p>
<p>Check out other resources on Elastic and OpenTelemetry:</p>
<p><a href="https://www.elastic.co/observability-labs/blog/elastic-managed-otlp-endpoint-for-opentelemetry">Elastic's OTLP Endpoint</a></p>
<p><a href="https://www.elastic.co/observability-labs/blog/opentelemetry-accepts-elastics-donation-of-edot">Elastic's EDOT PHP Contribution</a></p>
<p><a href="https://www.elastic.co/observability-labs/blog/elastic-distribution-opentelemetry-sdk-central-configuration-opamp">OpenTelemetry SDK Central Management with EDOT</a></p>
<p>You can also sign up for <a href="https://cloud.elastic.co">Elastic Cloud</a> and try out your application with OpenTelemetry in Elastic.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/edot-certificate-monitoring/edot-certificate-monitoring.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Monitoring Proxmox VE deployments with Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/monitoring-proxmox-ve-with-elastic</link>
            <guid isPermaLink="false">monitoring-proxmox-ve-with-elastic</guid>
            <pubDate>Wed, 23 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Monitoring Proxmox VE deployments, VMs, and Linux Containers with Elastic Observability.]]></description>
            <content:encoded><![CDATA[<p>In this blog post, you will learn how to leverage Elastic Observability to monitor Proxmox VE and the software running on top of it, both in the form of Linux Containers (LXCs) and Virtual Machines (VMs).</p>
<h2>Why use Elastic Observability with Proxmox?</h2>
<p>Here at Elastic, we are passionate about efficiently managing and monitoring infrastructure and applications. Many of us have fun playing with home labs, oftentimes running Proxmox VE, a powerful open-source virtualization platform used to run virtual machines and Linux Containers (LXCs) with ease. While Proxmox provides robust tools for managing virtualized resources, gaining deep insights into the performance and health of your LXCs, VMs, and hosts requires a comprehensive monitoring solution. This blog post will guide you through leveraging the power of Elastic Observability, in conjunction with Elastic Agent, to effectively monitor your Proxmox VE deployment, ensuring optimal performance and proactive issue resolution thanks to Kibana Alerts.</p>
<h2>The homelab setup</h2>
<p>Our homelab setup centers around an Intel N100 mini PC, serving as the host for Proxmox VE. This setup is simple and minimal, yet effective for showcasing a few interesting capabilities. On top of this mini PC, we run several Linux Containers (LXCs) for various services, along with a dedicated virtual machine for Home Assistant.</p>
<h2>Elastic Agent installation and configuration</h2>
<p>Before beginning, it is worth noting that there are numerous ways to install and configure the Elastic Agent. For the sake of simplicity, we will showcase a setup in which only one instance of the Elastic Agent is running on the host machine. The Elastic Agent reports to an Elastic Cloud Observability deployment and is managed via Fleet, which makes it tremendously easy to upgrade and re-configure it whenever needed.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitoring-proxmox-ve-with-elastic/fleet-prox.jpg" alt="The Elastic Integrations enabled for our Proxmox host" /></p>
<h2>Diving into the host</h2>
<p>Kibana offers various panes that make it nice and easy to learn about a system's health at a quick glance.</p>
<p>As a first step, let's take a look at the <code>Infrastructure &gt; Hosts</code> page in Kibana:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitoring-proxmox-ve-with-elastic/kibana-infrastructure-hosts-proxmox.jpg" alt="The Infrastructure &gt; Hosts Kibana page for our Proxmox host" /></p>
<p>Here we can see various information about our Proxmox VE host (i.e. the mini PC). The top processes running on it are presented, including processes running in LXCs such as <code>pia-daemon</code>. We can also see a <code>kvm</code> process, specifically running a Home Assistant virtual machine, and a Proxmox <code>pve-firewall</code> process.</p>
<p>Let's now take a look at <code>Universal Profiling &gt; Flamegraph</code>. This graph shows how much CPU time is consumed by different stack traces from processes running on the host system. You can drill down into specific processes using the search bar at the top. For instance, you can filter by <code>kvm</code> to only see information regarding this specific process.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitoring-proxmox-ve-with-elastic/universal-profiling-flamegraph-kvm.jpg" alt="The Universal Profiling &gt; Flamegraph Kibana page for our Proxmox host" /></p>
<h2>The Observability AI Assistant</h2>
<p>All the Kibana panes we visited so far have proved to be highly interesting, but they struggle to answer urgent questions such as:</p>
<ul>
<li>did anything happen in our mini PC recently?</li>
<li>was there any significant change in functionality?</li>
<li>is there any precious information hidden among the thousands of data points collected?</li>
</ul>
<p>The Elastic Observability AI Assistant helps us by answering these questions in natural language. By default, on Elastic Cloud, it uses the Elastic-managed LLM connector, which means users do not need to configure anything to get started with it. It just works!</p>
<p>Let's go to the <code>Observability &gt; AI Assistant</code> pane in Kibana and try a generic prompt such as: &quot;please give me an overview of the health of my <code>prox</code> host&quot;.</p>
<p>Let's then give it a moment to dig into the data... et voilà, here comes lots of relevant information in the form of graphs and natural language explanations. The Observability AI Assistant understood our question, went through all the data for our Proxmox host, ran data analytics on it, and reported back in a matter of seconds!</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitoring-proxmox-ve-with-elastic/observability-ai-assistant-1.jpg" alt="The Observability AI Assistant's first reply" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitoring-proxmox-ve-with-elastic/observability-ai-assistant-2.jpg" alt="The Observability AI Assistant's second reply" /></p>
<h2>Alerting upon disruption with Kibana Alerts</h2>
<p>As a final step, let's try to define a Kibana Alert to help us understand whether our host is overloaded. Let's head to <code>Observability &gt; Alerts &gt; Rules</code> and create a new rule. We will create a Custom Threshold rule that will fire if CPU usage for the host is higher than 80% on average for the last 15 minutes. Kibana will send us an email in case the rule fires. The rule is also configured to fire if no data appears for the last 15 minutes, which is extremely helpful as it would imply the presence of some issues to be debugged: broken network or no electricity in the house, a faulty Agent deployment, or even a hardware issue with the mini PC.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitoring-proxmox-ve-with-elastic/rule-cpu-over-80.jpg" alt="The Kibana Alerting Rule for CPU being over 80 percent" /></p>
<h2>Conclusion</h2>
<p>In this blog post, we showcased how to effectively use the Elastic Stack to monitor Proxmox VE deployments. If you would like to try out such a setup first-hand, you are more than welcome to enjoy <a href="https://www.elastic.co/cloud/cloud-trial-overview">Elastic Cloud's 14-day free trial</a>.</p>
<p>In future blog posts, we will investigate how to dig deeper into LXCs and VMs to gather even more information from our home lab and create more tailored alerts. Stay tuned!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/monitoring-proxmox-ve-with-elastic/article-image.jpg" length="0" type="image/jpg"/>
        </item>
    </channel>
</rss>