<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Articles by Christos Kalkanis</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted security news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Mon, 08 Jun 2026 15:18:17 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Articles by Christos Kalkanis</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Elastic contributes its Universal Profiling agent to OpenTelemetry]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-profiling-agent-acceptance-opentelemetry</link>
            <guid isPermaLink="false">elastic-profiling-agent-acceptance-opentelemetry</guid>
            <pubDate>Thu, 06 Jun 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic is advancing the adoption of OpenTelemetry with the contribution of its universal profiling agent. Elastic is committed to ensuring a vendor-agnostic ingestion and collection of observability and security telemetry through OpenTelemetry.]]></description>
            <content:encoded><![CDATA[<p>Following great collaboration between Elastic and OpenTelemetry's profiling community, which included a thorough review process, the OpenTelemetry community has accepted Elastic's donation of our continuous profiling agent. This marks a significant milestone in helping establish profiling as the fourth telemetry signal in OpenTelemetry. Elastic’s eBPF-based continuous profiling agent observes code across different programming languages and runtimes, third-party libraries, kernel operations, and system resources with low CPU and memory overhead in production. SREs can now benefit from these capabilities: quickly identifying performance bottlenecks, maximizing resource utilization, reducing carbon footprint, and optimizing cloud spend.
Over the past year, we have been instrumental in <a href="https://opentelemetry.io/blog/2023/ecs-otel-semconv-convergence/">enhancing OpenTelemetry's Semantic Conventions</a> through the donation of Elastic Common Schema (ECS), have contributed to the OpenTelemetry Collector and language SDKs, and have worked with OpenTelemetry’s Profiling Special Interest Group (SIG) to lay the foundation needed to make profiling stable.</p>
<p>With today’s acceptance, we are officially contributing our continuous profiler technology to OpenTelemetry. We will also dedicate a team of profiling domain experts to co-maintain and advance the profiling capabilities within OTel.</p>
<p>We want to thank the OpenTelemetry community for the great and constructive cooperation on the donation proposal. We look forward to jointly establishing continuous profiling as an integral part of OpenTelemetry.</p>
<h2>What is continuous profiling?</h2>
<p>Profiling is a technique used to understand the behavior of a software application by collecting information about its execution. This includes tracking the duration of function calls, memory usage, CPU usage, and other system resources.</p>
<p>However, traditional profiling solutions have significant drawbacks limiting adoption in production environments:</p>
<ul>
<li>Significant cost and performance overhead due to code instrumentation</li>
<li>Disruptive service restarts</li>
<li>Inability to get visibility into third-party libraries</li>
</ul>
<p>Unlike traditional profiling, which is often done only in a specific development phase or under controlled test conditions, continuous profiling runs in the background with minimal overhead. This provides real-time, actionable insights without replicating issues in separate environments. SREs, DevOps, and developers can see how code affects performance and cost, making code and infrastructure improvements easier.</p>
<h2>Contribution of production-grade features</h2>
<p>Elastic Universal Profiling is a whole-system, always-on, continuous profiling solution that eliminates the need for code instrumentation, recompilation, on-host debug symbols or service restarts. Leveraging eBPF, Elastic Universal Profiling profiles every line of code running on a machine, including application code, kernel, and third-party libraries. The solution measures code efficiency in three dimensions (CPU utilization, CO2 emissions, and cloud cost) to help organizations run efficient services by minimizing computational waste.</p>
<p>The Elastic profiling agent facilitates identifying non-optimal code paths, uncovering &quot;unknown unknowns&quot;, and provides comprehensive visibility into the runtime behavior of all applications. Elastic’s continuous profiling agent supports various runtimes and languages, such as C/C++, Rust, Zig, Go, Java, Python, Ruby, PHP, Node.js, V8, Perl, and .NET.</p>
<p>Additionally, organizations can meet sustainability objectives by minimizing computational wastage, ensuring seamless alignment with their strategic <a href="https://en.wikipedia.org/wiki/Environmental,_social,_and_corporate_governance">ESG</a> goals.</p>
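<p>The following Go sketch makes the three dimensions concrete. It converts a count of CPU samples into core-seconds, an estimated CO2 mass, and an estimated cost. It is not how Elastic Universal Profiling computes these values. The sampling rate, per-core power draw, grid carbon intensity, and price per core-hour are hypothetical coefficients chosen only for illustration.</p>
<pre><code class="language-go">package main

import "fmt"

// Hypothetical coefficients for illustration only; real values depend on
// hardware, region, and pricing.
const (
	samplingHz     = 19.0  // samples per second per CPU core (assumed)
	wattsPerCore   = 7.0   // average power draw of a busy core, in watts (assumed)
	gramsCO2PerKWh = 400.0 // grid carbon intensity (assumed)
	usdPerCoreHour = 0.04  // cloud price per vCPU-hour (assumed)
)

// Efficiency holds the three dimensions derived from a CPU sample count.
type Efficiency struct {
	CoreSeconds float64
	GramsCO2    float64
	USD         float64
}

// FromSamples converts the CPU samples attributed to some code into
// core-seconds, an estimated CO2 mass, and an estimated cost.
func FromSamples(samples uint64) Efficiency {
	coreSeconds := float64(samples) / samplingHz
	coreHours := coreSeconds / 3600
	kWh := coreHours * wattsPerCore / 1000
	return Efficiency{
		CoreSeconds: coreSeconds,
		GramsCO2:    kWh * gramsCO2PerKWh,
		USD:         coreHours * usdPerCoreHour,
	}
}

func main() {
	// 68,400 samples at 19 Hz equal one core-hour of CPU time.
	e := FromSamples(68400)
	fmt.Printf("%.0f core-seconds, %.2f g CO2, $%.2f\n", e.CoreSeconds, e.GramsCO2, e.USD)
}
</code></pre>
<p>With these assumed coefficients, one core-hour works out to roughly 2.8 g of CO2 and $0.04. All three outputs scale linearly with the sample count, so removing CPU samples from a hot path reduces cost and emissions proportionally.</p>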
<h2>Benefits to OpenTelemetry</h2>
<p>This contribution not only boosts the standardization of continuous profiling for observability but also accelerates the practical adoption of profiling as the fourth key signal in OTel. Customers get a vendor-agnostic way of collecting profiling data and enabling correlation with existing signals, like tracing, metrics, and logs, opening <a href="https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation">new potential for observability insights and a more efficient troubleshooting experience</a>. </p>
<p>OTel-based continuous profiling unlocks the following possibilities for users:</p>
<ul>
<li>Improved customer experience: continuous profiling helps deliver consistent service quality and performance, so applications perform optimally, remain responsive, and stay reliable.</li>
<li>Maximize gross margins: businesses can optimize their cloud spend and improve profitability by reducing the computational resources needed to run applications. Whole-system continuous profiling identifies the most expensive functions (down to the lines of code) across diverse environments that may span multiple cloud providers. In the cloud, every CPU cycle saved translates to money saved.</li>
<li>Minimize environmental impact: energy consumption associated with computing is a growing concern (source: <a href="https://energy.mit.edu/news/energy-efficient-computing/">MIT Energy Initiative</a>). More efficient code translates to lower energy consumption, reducing the carbon (CO2) footprint.</li>
<li>Accelerate engineering workflows: continuous profiling provides detailed insights that help troubleshoot complex issues faster, guide development, and improve overall code quality.</li>
<li>Improved vendor neutrality and increased efficiency: an OTel eBPF-based profiling agent removes the need for proprietary APM agents and offers a more efficient way to collect profiling telemetry.</li>
</ul>
<p>With these benefits, customers can manage the overall efficiency of their applications in the cloud while their engineering teams optimize them.</p>
<h2>What comes next?</h2>
<p>While the acceptance of Elastic’s donation of the profiling agent marks a significant milestone in the evolution of OTel’s eBPF-based continuous profiling capabilities, it represents the beginning of a broader journey. Moving forward, we will continue collaborating closely with the OTel Profiling and Collector SIGs to ensure seamless integration of the profiling agent within the broader OTel ecosystem. During this phase, users can test early preview versions of the OTel profiling integration by following the directions in the <a href="https://github.com/elastic/otel-profiling-agent/">otel-profiling-agent</a> repository.</p>
<p>Elastic remains deeply committed to OTel’s vision of enabling cross-signal correlation. We plan to further contribute to the community by sharing our innovative research and implementations, specifically those facilitating the correlation between profiling data and distributed traces, across several OTel language SDKs and the profiling agent.</p>
<p>We are excited about our <a href="https://opentelemetry.io/blog/2023/ecs-otel-semconv-convergence/">growing relationship with OTel</a> and the opportunity to donate our profiling agent in a way that benefits both the Elastic community and the broader OTel community. Learn more about <a href="https://www.elastic.co/observability/opentelemetry">Elastic’s OpenTelemetry support</a> and how to contribute to the ongoing profiling work in the community.</p>
<h2>Additional Resources</h2>
<p>Additional details on Elastic’s Universal Profiling can be found in the <a href="https://www.elastic.co/observability-labs/blog/elastic-profiling-agent-acceptance-opentelemetry-faq">FAQ</a>.</p>
<p>For more insights into observability, visit Observability Labs, where OTel-specific articles are also available.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-profiling-agent-acceptance-opentelemetry/profiling-acceptance.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Elastic Universal Profiling agent, a continuous profiling solution, is now open source]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-universal-profiling-agent-open-source</link>
            <guid isPermaLink="false">elastic-universal-profiling-agent-open-source</guid>
            <pubDate>Mon, 15 Apr 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[At Elastic, open source isn't just a philosophy; it's our DNA. Dive into the future with our open-sourced Universal Profiling agent, revolutionizing software efficiency and sustainability.]]></description>
            <content:encoded><![CDATA[<p>Elastic Universal Profiling™ agent is now open source! The industry’s most advanced fleetwide continuous profiling solution empowers users to identify performance bottlenecks, reduce cloud spend, and minimize their carbon footprint. This post explores the history of the agent, its move to open source, and its future integration with OpenTelemetry.</p>
<h2>Elastic Universal Profiling™ Agent goes open source under Apache 2</h2>
<p>At Elastic, open source is more than just a philosophy: it's our DNA. We believe the benefits of whole-system continuous profiling extend far beyond performance optimization. It's a win for businesses and the planet alike. Since launching Elastic Universal Profiling in general availability (GA), we've observed a wide variety of customer use cases.</p>
<p>These range from customers relying fully on Universal Profiling's <a href="https://www.elastic.co/guide/en/observability/current/universal-profiling.html#profiling-differential-views-intro">differential flame graphs and topN functions</a> for insights during release management to utilizing AI assistants for quickly optimizing expensive functions. This includes using profiling data to identify the optimal energy-efficient cloud region to run certain workloads. Additionally, customers are using insights that Universal Profiling provides to build evidence to challenge cloud provider bills. As it turns out, cloud providers' in-VM agents can consume a significant portion of the CPU time, which customers are billed for.</p>
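<p>The Go sketch below roughly illustrates what a topN functions view computes. It ranks functions by "self" samples, meaning the function itself was on CPU. It also reports "total" samples, meaning the function appeared anywhere on the stack. This is a simplified version of the general technique, not Elastic's implementation. The <code>Sample</code> and <code>FuncStat</code> types are hypothetical.</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"sort"
)

// Sample is one stacktrace observed Count times, with frames ordered from
// the root (outermost caller) to the leaf (the function on CPU).
type Sample struct {
	Frames []string
	Count  int
}

// FuncStat aggregates CPU samples for one function.
type FuncStat struct {
	Name  string
	Self  int // samples where the function was the leaf frame
	Total int // samples where the function appeared anywhere on the stack
}

// TopN returns the n functions with the most self samples.
func TopN(samples []Sample, n int) []FuncStat {
	stats := map[string]FuncStat{}
	for _, s := range samples {
		if len(s.Frames) == 0 {
			continue
		}
		// Count each function at most once per stack for the inclusive
		// total, so recursive calls are not double-counted.
		seen := map[string]bool{}
		for _, f := range s.Frames {
			if seen[f] {
				continue
			}
			seen[f] = true
			st := stats[f]
			st.Name = f
			st.Total += s.Count
			stats[f] = st
		}
		leaf := s.Frames[len(s.Frames)-1]
		st := stats[leaf]
		st.Self += s.Count
		stats[leaf] = st
	}
	out := make([]FuncStat, 0, len(stats))
	for _, st := range stats {
		out = append(out, st)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Self != out[j].Self {
			return out[i].Self > out[j].Self
		}
		return out[j].Name > out[i].Name // tie-break alphabetically
	})
	if len(out) > n {
		out = out[:n]
	}
	return out
}

func main() {
	samples := []Sample{
		{Frames: []string{"main", "handle", "json.Marshal"}, Count: 40},
		{Frames: []string{"main", "handle", "tls.Write"}, Count: 25},
		{Frames: []string{"main", "gc"}, Count: 10},
	}
	for _, st := range TopN(samples, 2) {
		fmt.Printf("%-14s self=%d total=%d\n", st.Name, st.Self, st.Total)
	}
}
</code></pre>
<p>Ranking by self samples surfaces the functions that burn CPU directly. The total column shows which callers are ultimately responsible for that work.</p>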
<p>In a move that will empower the community to take advantage of continuous profiling's benefits, <strong>we're thrilled to announce that the Elastic Universal Profiling agent</strong>, a pioneering eBPF-based continuous profiling agent, <strong>is now open source under the Apache 2 license!</strong></p>
<p>This move democratizes <strong>hyper-scaler efficiency for everyone</strong>, opening exciting new possibilities for the future of continuous profiling, as well as its role in observability and <strong>OpenTelemetry</strong>.</p>
<h2>Implementation of the OpenTelemetry (OTel) Profiling protocol</h2>
<p>Our commitment to open source goes beyond just the agent itself. We recently <a href="https://www.elastic.co/blog/elastic-donation-proposal-to-contribute-profiling-agent-to-opentelemetry">announced our intent to donate</a> the agent to OpenTelemetry and have further solidified this goal by implementing the experimental <a href="https://github.com/open-telemetry/oteps/blob/main/text/profiles/0239-profiles-data-model.md">OTel Profiling data model</a>. This allows the open-sourced eBPF-based continuous profiling agent to communicate seamlessly with OpenTelemetry backends.</p>
<p>But that's not all! We've also launched an innovative feature that <a href="https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation">correlates profiling data with OpenTelemetry distributed traces</a>. This powerful capability offers a deeper level of insight into application performance, enabling the identification of bottlenecks with greater precision. Upon donating the profiling agent to OTel, Elastic will also upstream to the OTel Java SDK the critical components that enable distributed trace correlation in the <a href="https://github.com/elastic/elastic-otel-java">Elastic distribution of the OTel Java agent</a>. This underscores Elastic Observability's commitment to both open source and the support of open standards like OpenTelemetry while pushing the boundaries of what is possible in observability.</p>
<h2>What does this mean for Elastic Universal Profiling customers?</h2>
<p>We'd like to express our <strong>immense gratitude to all our customers</strong> who have been part of this journey, from the early stages of private beta to GA. Your feedback has been invaluable in shaping Universal Profiling into the powerful product it is today.</p>
<p>By open-sourcing the Universal Profiling agent and contributing it to OpenTelemetry, we're fostering a win-win situation for both you and the broader community. This move opens doors for innovation and collaboration, ultimately leading to a more robust and versatile whole-system continuous profiling solution for everyone.</p>
<p>Furthermore, we're actively working on exciting novel ways to integrate Universal Profiling seamlessly within Elastic Observability. Expect further announcements soon, outlining how you can unlock even greater value from your profiling data within a unified observability experience in a way that has never been done before.</p>
<p>The open-sourced agent is using the recently released (experimental) OTel Profiling <a href="https://github.com/open-telemetry/opentelemetry-proto/pull/534">signal</a>. As a precaution, we recommend not using it in production environments.</p>
<p>Please continue using the official Elastic distribution of the Universal Profiling agent until the agent is formally accepted by OTel and the protocol reaches a stable phase. There's no need to take any action at this time, and we will ensure a smooth transition plan is in place for you.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-universal-profiling-agent-open-source/image1.png" alt="1 - Elastic Universal Profiling" /></p>
<h2>What does this mean for the OpenTelemetry community?</h2>
<p>OpenTelemetry is adopting continuous profiling as a key signal. By open-sourcing the eBPF-based profiling agent and working towards donating it to OTel, Elastic is making it possible to accelerate the standardization of continuous profiling within OpenTelemetry. This move has a massive impact on the observability community, empowering everyone to continuously profile their systems with a standardized protocol.</p>
<p>This is particularly timely as <a href="https://www.bbc.co.uk/news/technology-32335003">Moore's Law</a> slows down and cloud computing takes hold, making computational efficiency critical for businesses.</p>
<p>Here's how whole-system continuous profiling benefits you:</p>
<ul>
<li>
<p><strong>Maximize gross margins:</strong> By reducing the computational resources needed to run applications, businesses can optimize their cloud spend and improve profitability. Whole-system continuous profiling is one way of identifying the most expensive applications (down to the lines of code) across diverse environments that may span multiple cloud providers. This principle aligns with the familiar adage, <em>&quot;a penny saved is a penny earned.&quot;</em> In the cloud context, every CPU cycle saved translates to money saved.</p>
</li>
<li>
<p><strong>Minimize environmental impact:</strong> Energy consumption associated with computing is a growing concern (source: <a href="https://energy.mit.edu/news/energy-efficient-computing/">MIT Energy Initiative</a>). More efficient code translates to lower energy consumption, contributing to a reduction in carbon footprint.</p>
</li>
<li>
<p><strong>Accelerate engineering workflows:</strong> Continuous profiling provides detailed insights to help debug complex issues faster, guide development, and improve overall code quality.</p>
</li>
</ul>
<p>This is where Elastic Universal Profiling comes in: it is designed to help organizations run efficient services by minimizing computational waste. To this end, it measures code efficiency in three dimensions: <strong>CPU utilization</strong>, <strong>CO<sub>2</sub></strong>, and <strong>cloud cost</strong>.</p>
<p>Elastic's journey with continuous profiling began by joining forces with <a href="https://www.elastic.co/about/press/elastic-and-optimyze-join-forces-to-deliver-continuous-profiling-of-infrastructure-applications-and-services">optimyze.cloud</a>; this became the foundation for <a href="https://www.elastic.co/observability/universal-profiling">Elastic Universal Profiling</a>. We are excited to see this product evolve into its next growth phase in the open-source world.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-universal-profiling-agent-open-source/image2.png" alt="2 - car manufacturers" /></p>
<h2>Ready to give it a spin?</h2>
<p>As Elastic Universal Profiling transitions into this new open source era, the potential for transformative impact on performance optimization, cost efficiency, and environmental sustainability is immense. Elastic's approach — balancing innovation with responsibility — paves the way for a future where technology not only powers our world but does so in a way that is sustainable and accessible to all.</p>
<p>Get started with the open source Elastic Universal Profiling agent today! <a href="https://github.com/elastic/otel-profiling-agent/">Download it directly from GitHub</a> and follow the instructions in the repository.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-universal-profiling-agent-open-source/image3.png" alt="3 - dripping graph and data" /></p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-universal-profiling-agent-open-source/tree_tunnel.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry Profiles Signal Enters Alpha: Elastic’s Continuous Commitment to Profiling]]></title>
            <link>https://www.elastic.co/observability-labs/blog/otel-profiling-alpha</link>
            <guid isPermaLink="false">otel-profiling-alpha</guid>
            <pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[OpenTelemetry Profiles has officially reached Alpha, entrenching profiling as the fourth observability signal. Elastic's core contribution of its eBPF profiling agent, continued OpenTelemetry Profiles signal work and commitment to a vendor-agnostic ecosystem are driving this industry-wide standard forward.]]></description>
            <content:encoded><![CDATA[<p>Following intensive collaboration between Elastic and the OpenTelemetry community, we are thrilled to announce that the OpenTelemetry Profiles signal has officially entered public Alpha.
This milestone is a testament to the community's dedication and marks a significant step towards establishing profiling as the fourth key observability signal in OpenTelemetry, alongside logs, metrics and traces.</p>
<p>As a core contributor, Elastic is proud to have accelerated this effort by previously donating its Universal Profiling™ eBPF-based continuous profiling agent to OpenTelemetry.
This production-grade agent enables whole-system visibility across all applications, covering a multitude of programming languages and runtimes including third-party libraries and kernel operations with minimal overhead.
It allows SREs and developers to quickly identify performance bottlenecks, maximize resource utilization, and optimize cloud spend.</p>
<p>Additionally, over the last two years, Elastic has been heavily contributing to the OpenTelemetry Collector, Semantic Conventions and Profiling Special Interest Groups (SIGs) to lay the technical foundation for the promotion of Profiles to Alpha.</p>
<p>This Alpha milestone not only boosts the standardization of continuous profiling but also accelerates the practical adoption of profiling as the fourth key signal in observability.
Customers now have a vendor-agnostic way of collecting profiling data and enabling correlation with existing signals, like logs, metrics and traces, unveiling new potential for observability insights and a more efficient troubleshooting experience.</p>
<h2>What is continuous profiling?</h2>
<p>Profiling is a technique used to understand the behavior of a software application by collecting information about its execution.
This includes tracking the duration of function calls, memory usage, CPU usage, and other system resources.</p>
<p>However, traditional profiling solutions have significant drawbacks limiting adoption in production environments:</p>
<ul>
<li>Significant cost and performance overhead due to code instrumentation</li>
<li>Disruptive service restarts</li>
<li>Inability to get visibility into third-party libraries</li>
</ul>
<p>Unlike traditional profiling, which is often done only in a specific development phase or under controlled test conditions, continuous profiling runs in the background with minimal overhead, eliminating the need for service restarts or manual intervention.
This provides real-time, actionable insights without replicating issues in separate environments.
SREs, DevOps, and developers can see how code affects performance and cost, making code and infrastructure improvements easier.</p>
<h2>Elastic's contribution: Powering the Alpha</h2>
<p>The Elastic-donated profiler now forms the reference eBPF-based profiler implementation within OpenTelemetry: <a href="https://github.com/open-telemetry/opentelemetry-ebpf-profiler/">opentelemetry-ebpf-profiler</a>.
With the Alpha release, the eBPF profiler operates as an OpenTelemetry Collector receiver and contains numerous improvements such as automatic Go symbolization and support for new language runtimes.
Operating as an OpenTelemetry Collector receiver enables the profiler to seamlessly leverage existing OpenTelemetry processing and filtering pipelines.</p>
<p>For example, the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor">k8sattributesprocessor</a> can use the <code>container.id</code> resource attribute to automatically enrich every profile with its corresponding Kubernetes context.
This means you don't just see a raw stack trace; you see exactly which namespace, pod, and deployment produced it.</p>
<pre><code class="language-yaml">receivers:
  # Profiling receiver
  profiling: {}

processors:
  k8sattributes:
    passthrough: false 
    pod_association:
      - sources:
          - from: resource_attribute
            name: container.id
    extract:
      metadata:
        - &quot;k8s.namespace.name&quot;
        - &quot;k8s.deployment.name&quot;
        - &quot;k8s.replicaset.name&quot;
        - &quot;k8s.statefulset.name&quot;
        - &quot;k8s.daemonset.name&quot;
        - &quot;k8s.node.name&quot;
        - &quot;k8s.pod.name&quot;
        - &quot;k8s.pod.ip&quot;
        - &quot;k8s.pod.uid&quot;
</code></pre>
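<p>Conceptually, the association step is a lookup keyed by the profile's <code>container.id</code>. The Go sketch below is a hypothetical, minimal model of that idea and is not the <code>k8sattributesprocessor</code> code. A hard-coded index stands in for the metadata the processor obtains from the Kubernetes API, and the sketch copies only a few of the attributes listed under <code>extract.metadata</code>.</p>
<pre><code class="language-go">package main

import "fmt"

// PodMeta is a hypothetical subset of the Kubernetes metadata listed in the
// extract section of the configuration above.
type PodMeta struct {
	Namespace, Deployment, PodName, NodeName string
}

// Enrich looks up the container.id resource attribute and, on a hit, adds
// Kubernetes attributes to attrs. It returns false and leaves attrs
// unchanged when the attribute is missing or no pod matches.
func Enrich(attrs map[string]string, byContainer map[string]PodMeta) bool {
	id, ok := attrs["container.id"]
	if !ok {
		return false
	}
	meta, ok := byContainer[id]
	if !ok {
		return false
	}
	attrs["k8s.namespace.name"] = meta.Namespace
	attrs["k8s.deployment.name"] = meta.Deployment
	attrs["k8s.pod.name"] = meta.PodName
	attrs["k8s.node.name"] = meta.NodeName
	return true
}

func main() {
	// Hard-coded stand-in for an index built from the Kubernetes API.
	index := map[string]PodMeta{
		"3f2a9c": {Namespace: "shop", Deployment: "cart", PodName: "cart-7d9f-x2k4q", NodeName: "node-1"},
	}
	attrs := map[string]string{"container.id": "3f2a9c"}
	if Enrich(attrs, index) {
		fmt.Println(attrs["k8s.namespace.name"], attrs["k8s.pod.name"])
	}
}
</code></pre>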
<p>Besides improvements to the eBPF profiler, Elastic has made significant contributions to:</p>
<ul>
<li>Correlating profiles with the information produced by OpenTelemetry eBPF instrumentation (<a href="https://opentelemetry.io/docs/zero-code/obi/">OBI</a>), a powerful auto-instrumentation tool that can enable distributed tracing.</li>
<li><a href="https://github.com/open-telemetry/opentelemetry-specification/pull/4719">Process Context Sharing OTEP</a> which is designed to bridge the gap between application SDKs and the profiler. This mechanism will allow OpenTelemetry SDKs to &quot;publish&quot; their resource attributes (like <code>service.name</code>) into a small, standardized memory region. Because this data is stored in the process's own memory map, the eBPF Profiler can instantly discover and associate it with its corresponding Profile.</li>
<li>Semantic conventions and integration of OpenTelemetry Profiles with Google's pprof format (transparent conversion)</li>
<li>OpenTelemetry Collector processing pipelines, so that they integrate better with the profiling receiver</li>
</ul>
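<p>The OTEP defines its own format. The Go sketch below only illustrates the general publish/discover pattern, using a made-up layout: an 8-byte magic string, then a little-endian payload length, then <code>key=value</code> lines. <code>Publish</code> plays the role of an SDK, and <code>Read</code> plays the role of the profiler. How the profiler locates the region in another process's memory is out of scope here.</p>
<pre><code class="language-go">package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// Hypothetical layout, for illustration only (not the OTEP's format):
//   bytes 0..7   magic "OTELCTX1"
//   bytes 8..11  payload length, little-endian uint32
//   bytes 12..   payload of "key=value\n" lines
const magic = "OTELCTX1"
const headerSize = 12

// Publish encodes resource attributes into a fixed-size region, as an SDK
// might do once at startup.
func Publish(attrs map[string]string, regionSize int) ([]byte, error) {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic payload
	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k + "=" + attrs[k] + "\n")
	}
	payload := b.String()
	if headerSize+len(payload) > regionSize {
		return nil, errors.New("attributes do not fit in region")
	}
	region := make([]byte, regionSize)
	copy(region, magic)
	binary.LittleEndian.PutUint32(region[8:12], uint32(len(payload)))
	copy(region[headerSize:], payload)
	return region, nil
}

// Read validates and parses a region, as a profiler might after finding it.
func Read(region []byte) (map[string]string, error) {
	if headerSize > len(region) || string(region[:8]) != magic {
		return nil, errors.New("no process context found")
	}
	n := int(binary.LittleEndian.Uint32(region[8:12]))
	if headerSize+n > len(region) {
		return nil, errors.New("truncated payload")
	}
	attrs := map[string]string{}
	for _, line := range strings.Split(string(region[headerSize:headerSize+n]), "\n") {
		if k, v, ok := strings.Cut(line, "="); ok {
			attrs[k] = v
		}
	}
	return attrs, nil
}

func main() {
	region, err := Publish(map[string]string{"service.name": "checkout"}, 256)
	if err != nil {
		panic(err)
	}
	attrs, err := Read(region)
	if err != nil {
		panic(err)
	}
	fmt.Println(attrs["service.name"])
}
</code></pre>
<p>A fixed magic value and an explicit length let the reader reject regions that were never published or were truncated, rather than misparsing arbitrary memory.</p>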
<h2>Elastic's Next-Generation Profiling Development</h2>
<p>Elastic remains deeply committed to OpenTelemetry's vision and is pushing the boundaries of what is possible with profiling data.
We are dedicating a team of profiling domain experts to co-maintain and advance profiling capabilities within OpenTelemetry, while simultaneously working on groundbreaking features built on this new open standard.</p>
<p>Exciting areas of internal profiling-specific development include:</p>
<ul>
<li>OpenTelemetry Profiles-derived metrics: We are developing innovative ways to automatically generate actionable performance metrics directly from the raw OTel Profiles data, providing a new dimension for infrastructure modeling and alerting.</li>
<li>Rapid Integration with the Elastic Stack: We are making swift progress on first-class support for OTLP Profiles within the Elastic Stack, ensuring seamless ingestion (the ebpf-profiler receiver is already integrated with the <a href="https://github.com/elastic/elastic-agent/tree/main/internal/edot#components">Elastic Distributions of OpenTelemetry (EDOT) collector</a>), storage, and visualization of this new signal alongside your existing logs, metrics and traces.</li>
<li>AI-Powered Workflows: We are leveraging the deep insights provided by continuous profiling data to power new AI-driven workflows, enabling automatic root-cause analysis, anomaly detection, and intelligent optimization suggestions for both code and infrastructure.</li>
</ul>
<p>While the Alpha release marks a significant milestone, it is just the beginning.
We encourage the community to start testing early preview versions of the OTel Profiles integration and contribute to the ongoing profiling work.
To get started with an actual, local deployment, you can use the <a href="https://github.com/open-telemetry/opentelemetry-ebpf-profiler">OpenTelemetry eBPF profiler</a> in combination with a self-hosted <a href="https://www.elastic.co/docs/solutions/observability">Elastic Observability Stack</a> or <a href="https://github.com/elastic/devfiler">devfiler</a>, a standalone desktop application that acts as an OpenTelemetry Profiles compliant backend aimed at experimentation and development.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/otel-profiling-alpha/header.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Self-Driving Observability: From Stacktraces to Profiling-Derived Metrics]]></title>
            <link>https://www.elastic.co/observability-labs/blog/otel-profiling-metrics</link>
            <guid isPermaLink="false">otel-profiling-metrics</guid>
            <pubDate>Mon, 01 Jun 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Profiling-derived metrics turn raw stacktraces into time-series KPIs, unlock continuous profiling for every user and lay the foundation for an observability system that detects, investigates, and acts on its own.]]></description>
            <content:encoded><![CDATA[<p>Continuous profiling has come a long way. With the <a href="https://opentelemetry.io/blog/2026/profiles-alpha/">OpenTelemetry Profiles signal entering Alpha</a> and the <a href="https://github.com/open-telemetry/opentelemetry-ebpf-profiler">OpenTelemetry eBPF profiler</a> — donated by Elastic — now operating as a first-class OpenTelemetry Collector receiver, low-overhead, whole-system profiling on Linux is finally available to every OpenTelemetry user. No instrumentation, no recompilation, no service restarts. Just deploy the profiler and get visibility from the kernel, through native code, all the way up into HotSpot, Python, V8, .NET, Go, PHP, Perl, BEAM Erlang and Ruby runtimes.</p>
<p>The processing pipeline is straightforward: The profiler samples every CPU core on the system at a fixed rate
(19Hz by default), unwinds execution stacks, symbolizes the resulting stacktraces and ships the profiles to
a backend like Elasticsearch.</p>
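<p>Because sampling happens at a fixed frequency, sample counts convert directly into CPU time. At 19 Hz, each sample stands for 1/19 s, or about 52.6 ms, of CPU. The Go sketch below shows that arithmetic. The function names are illustrative only.</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"time"
)

// CPUTime converts a sample count into the CPU time it represents: at hz
// samples per second per core, each sample stands for 1/hz seconds on CPU.
func CPUTime(samples uint64, hz float64) time.Duration {
	return time.Duration(float64(samples) / hz * float64(time.Second))
}

// MaxSamples is the upper bound on samples a host can produce in a window,
// reached only when every core is busy for the whole window.
func MaxSamples(cores int, hz float64, window time.Duration) uint64 {
	return uint64(float64(cores) * hz * window.Seconds())
}

func main() {
	const hz = 19 // default sampling frequency mentioned above
	fmt.Println(CPUTime(1, hz).Round(time.Millisecond)) // 53ms
	fmt.Println(CPUTime(570, hz))                       // 30s
	fmt.Println(MaxSamples(8, hz, time.Minute))         // 9120
}
</code></pre>
<p>An 8-core host that is fully busy for one minute therefore produces at most 9,120 samples, which bounds the raw per-host data volume before any deduplication.</p>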
<p>And then... the user has to figure out what to do with them.</p>
<p>That last step is where continuous profiling has historically faced adoption challenges, as
the path from &quot;profiling is on&quot; to &quot;profiling is useful&quot; is steeper than it should be.</p>
<h2>Four barriers to adoption</h2>
<ul>
<li>
<p><strong>Storage cost:</strong> Full stacktraces, even after deduplication and clever storage schemas, are expensive to store at fleet scale. That cost makes continuous profiling an opt-in feature in practice: many potential users never enable it, and those who do tend to enable it only on a subset of hosts.</p>
</li>
<li>
<p><strong>Query friction:</strong> A normalized stacktrace schema is optimized for ingestion and storage but complicates ad-hoc questions. &quot;How much CPU time does my service spend in TLS?&quot; is a simple question that may require intricate ES|QL or custom code in order to be answered.</p>
</li>
<li>
<p><strong>AI-hostile data:</strong> Normalized stacktrace data (typically involving multiple levels of indirection) resists straightforward algorithmic analysis. LLMs in particular struggle with it, so the data must first be transformed into representations more amenable to LLM processing.</p>
</li>
<li>
<p><strong>UX barrier:</strong> Flamegraphs are extremely useful when you know how to read them but intimidating when you don't.</p>
</li>
</ul>
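<p>The Go sketch below illustrates the query-friction and indirection problems with a simplified, dictionary-encoded profile. In it, samples point to stacks, stacks to locations, locations to functions, and functions to a string table. The schema is hypothetical and much simpler than Elastic's storage format or the OTLP Profiles data model. Even so, asking "how much CPU time is spent in TLS?" means resolving four levels of indirection for every frame of every sample. Matching on the Go package prefix <code>crypto/tls.</code> is likewise only an illustrative heuristic.</p>
<pre><code class="language-go">package main

import (
	"fmt"
	"strings"
)

// Profile is a simplified, dictionary-encoded profile: each level refers to
// the next by index instead of repeating data.
type Profile struct {
	Strings   []string // string table
	Functions []int    // function -> index into Strings (its name)
	Locations []int    // location -> index into Functions
	Stacks    [][]int  // stack -> indices into Locations
	Samples   []Sample
}

// Sample records how often a stack was observed.
type Sample struct {
	Stack int // index into Stacks
	Count int
}

// SamplesMatching counts samples with at least one frame whose function
// name starts with prefix, resolving sample -> stack -> location ->
// function -> string for each frame.
func SamplesMatching(p Profile, prefix string) int {
	total := 0
	for _, s := range p.Samples {
		for _, loc := range p.Stacks[s.Stack] {
			name := p.Strings[p.Functions[p.Locations[loc]]]
			if strings.HasPrefix(name, prefix) {
				total += s.Count
				break // count each sample once even if several frames match
			}
		}
	}
	return total
}

func main() {
	p := Profile{
		Strings:   []string{"main.main", "net/http.(*conn).serve", "crypto/tls.(*Conn).Write"},
		Functions: []int{0, 1, 2},
		Locations: []int{0, 1, 2},
		Stacks:    [][]int{{0, 1, 2}, {0, 1}},
		Samples:   []Sample{{Stack: 0, Count: 30}, {Stack: 1, Count: 70}},
	}
	fmt.Println(SamplesMatching(p, "crypto/tls.")) // 30
}
</code></pre>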
<p>These four barriers compound: storage cost limits coverage, the UX barrier limits who benefits from coverage, query friction limits what questions users can ask and the AI-hostile data representation limits what the system can do when users don't know what questions to ask.</p>
<h2>How profiling-derived metrics work: classify at the edge</h2>
<p>The core idea is simple: instead of sending full stacktraces all the way to a backend and asking the user to make sense of them there, we classify and count at the edge, inside an OpenTelemetry Collector pipeline, and emit ordinary OpenTelemetry time-series counters. The profiling logic itself doesn't change; it's still the OpenTelemetry eBPF profiler running inside the OpenTelemetry Collector. All the new work happens in a stateless connector inside the Collector: the connector inspects each stacktrace produced by the profiler, classifies its frames into one or more categories and increments counters.</p>
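<p>A minimal sketch of the classify-and-count step, assuming a simplified stacktrace shape (leaf frame first, each frame tagged with its type) and naming counters after the leaf frame's type, as in <code>kernel.count</code> or <code>native.count</code>. The <code>Frame</code>, <code>Stacktrace</code> and <code>aggregate</code> names are hypothetical; the real connector's rules are richer and its code is not reproduced here.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// Frame is a hypothetical, simplified view of one symbolized frame.
type Frame struct {
	Kind string // frame type, e.g. "kernel", "native", "hotspot", "go", "python"
	Name string // function name, or shared library name for native frames
}

// Stacktrace is one sampled stack (leaf frame first) and how often it was seen.
type Stacktrace struct {
	Frames []Frame
	Count  int64
}

// classifyLeaf picks a counter name from the leaf frame's type.
func classifyLeaf(st Stacktrace) (string, bool) {
	if len(st.Frames) == 0 {
		return "", false
	}
	return st.Frames[0].Kind + ".count", true
}

// aggregate turns one batch of stacktraces into counter increments. It keeps
// no state between batches, and the stacktraces themselves are not retained.
func aggregate(batch []Stacktrace) map[string]int64 {
	counters := make(map[string]int64)
	for _, st := range batch {
		if name, ok := classifyLeaf(st); ok {
			counters[name] += st.Count
		}
	}
	return counters
}

func main() {
	batch := []Stacktrace{
		{Frames: []Frame{{"kernel", "tcp_sendmsg"}, {"go", "net.(*conn).Write"}}, Count: 3},
		{Frames: []Frame{{"native", "libz"}, {"python", "compress"}}, Count: 2},
		{Frames: []Frame{{"kernel", "ext4_file_read_iter"}}, Count: 1},
	}
	counters := aggregate(batch)
	names := make([]string, 0, len(counters))
	for name := range counters {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, name := range names {
		fmt.Println(name, counters[name])
	}
	// kernel.count 4
	// native.count 2
}
```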
<p>We've released <a href="https://github.com/elastic/opentelemetry-collector-components/tree/main/connector/profilingmetricsconnector"><code>profilingmetricsconnector</code></a> as part of Elastic's <code>opentelemetry-collector-components</code> repository. It sits between the OpenTelemetry eBPF profiler receiver and any metrics exporter, and turns symbolized stacktraces into named, aggregated counters with attributes.</p>
<div align="center"><img src="/assets/images/otel-profiling-metrics/profilingmetricsconnector-pipeline.svg" alt="profilingmetricsconnector pipeline" /></div>
<p>Because the profilingmetricsconnector lives inside the standard OpenTelemetry Collector pipeline, every metric it produces flows through the same processors as the rest of your telemetry. In the following example, the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md"><code>resourcedetectionprocessor</code></a> enriches each counter with host-derived attributes.</p>
<pre><code class="language-yaml">connectors:
  profilingmetrics:
    flush_interval: 30s

receivers:
  profiling: {}

exporters:
  elasticsearch:
    endpoints:
      - ${env:ELASTICSEARCH_ENDPOINT} # your Elasticsearch endpoint
    api_key: ${env:ELASTICSEARCH_API_KEY} # your API key
    mapping:
      mode: otel

processors:
  resourcedetection:
    detectors: [&quot;system&quot;]
    system:
      hostname_sources: [&quot;os&quot;]
      resource_attributes:
        host.name:
          enabled: true
        host.id:
          enabled: false
        host.arch:
          enabled: true
        os.description:
          enabled: true
        os.type:
          enabled: true

service:
  pipelines:
    profiles:
      receivers: [ profiling ]
      exporters: [ profilingmetrics ]
    metrics:
      receivers: [ profilingmetrics ]
      processors: [resourcedetection]
      exporters: [ elasticsearch ]
</code></pre>
<h2>Profiling-derived CPU metrics: what gets emitted</h2>
<p>The connector ships with a set of pre-baked counters built from useful classification rules. Each metric counts the stacktrace samples whose leaf frame matched a particular category; because samples are taken at a fixed rate, the count serves as a proxy for CPU time.</p>
<table>
<thead>
<tr>
<th>Metric</th>
<th>Classifies</th>
<th>Attached metadata</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>kernel.count</code></td>
<td>Kernel leaf frames</td>
<td><code>syscall</code>, <code>category</code> (<code>disk/rw</code>, <code>ipc/rw</code>, <code>network/{tcp,udp,other}/rw</code>, <code>memory</code>, <code>synchronization</code>, …)</td>
</tr>
<tr>
<td><code>native.count</code></td>
<td>Native C/C++/Rust leaf frames</td>
<td>shared library name (<code>libcrypto</code>, <code>libclrjit</code>, <code>libsystemd</code>, …)</td>
</tr>
<tr>
<td><code>hotspot.count</code>, <code>go.count</code>, <code>python.count</code>, …</td>
<td>Runtime-specific leaf frames</td>
<td>runtime-specific attributes</td>
</tr>
</tbody>
</table>
<p>The kernel categorization is worth a closer look. A modern Linux kernel has more than 400 system calls, but most of what shows up in CPU stacktraces falls into a handful of subsystems: filesystem read/write, network read/write, memory management, scheduling and synchronization. Some syscalls (e.g. <code>read</code>, <code>write</code>) are ambiguous on their own and only become specific when one examines more frames down the stack: <code>ext4_file_read_iter</code> points to the filesystem, <code>tcp_v4_rcv</code> to the network. The connector handles this disambiguation as part of frame iteration.</p>
<div align="center"><img src="/assets/images/otel-profiling-metrics/kibana-kernel-cpu-by-category.png" alt="Kernel CPU breakdown by category in Kibana" /></div>
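<p>The syscall-plus-frames disambiguation can be sketched as follows. The marker table, syscall lists and category strings are assumptions chosen to mirror the categories listed for <code>kernel.count</code>; the connector's actual rule set is larger and may be structured differently.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// subsystemMarkers is a small, hypothetical rule table: kernel function
// prefixes that pin an ambiguous read/write syscall to a subsystem.
var subsystemMarkers = []struct {
	prefix   string
	category string
}{
	{"ext4_", "disk"},
	{"xfs_", "disk"},
	{"tcp_", "network/tcp"},
	{"udp_", "network/udp"},
	{"pipe_", "ipc"},
}

// classifyKernel derives a category from the syscall name and, for
// read/write-style syscalls, from the first subsystem marker found while
// walking the kernel frames (leaf first).
func classifyKernel(syscall string, frames []string) string {
	var direction string
	switch syscall {
	case "futex":
		return "synchronization"
	case "mmap", "munmap", "brk", "madvise":
		return "memory"
	case "read", "readv", "pread64", "recvfrom", "recvmsg":
		direction = "read"
	case "write", "writev", "pwrite64", "sendto", "sendmsg":
		direction = "write"
	default:
		return "other"
	}
	for _, f := range frames {
		for _, m := range subsystemMarkers {
			if strings.HasPrefix(f, m.prefix) {
				return m.category + "/" + direction
			}
		}
	}
	return "unknown/" + direction // the syscall alone was not specific enough
}

func main() {
	// Simplified kernel stacks, leaf first.
	fmt.Println(classifyKernel("read", []string{"copy_user_enhanced_fast_string", "ext4_file_read_iter", "vfs_read", "ksys_read"})) // disk/read
	fmt.Println(classifyKernel("write", []string{"mlx5e_xmit", "tcp_sendmsg", "sock_write_iter", "vfs_write"}))                // network/tcp/write
}
```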
<p>Native frames typically lack symbolic information beyond shared library names, but those names are still informative: <code>libssl</code> and <code>libcrypto</code> mean cryptographic work in OpenSSL or one of its variants; <code>libz</code> means compression; <code>libclrjit</code> means the .NET JIT is busy. There is no need to enumerate libraries statically: the connector generates <code>shlib_name</code> attribute values dynamically from the trimmed library name (e.g. <code>libssl</code> rather than <code>libssl.so.3</code>), which keeps cardinality low.</p>
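<p>One plausible way to trim library names, assuming it is acceptable to drop the directory and everything from the <code>.so</code> suffix onward; the connector's exact trimming rules may differ, and <code>trimLibName</code> is a hypothetical helper.</p>

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// trimLibName reduces a shared-object path such as
// "/usr/lib/x86_64-linux-gnu/libssl.so.3" to "libssl", so that different
// library versions and install paths share a single attribute value.
func trimLibName(file string) string {
	base := path.Base(file)
	if i := strings.Index(base, ".so"); i > 0 {
		rest := base[i+len(".so"):]
		// Only cut at a real ".so" suffix: end of name or a version like ".3".
		if rest == "" || strings.HasPrefix(rest, ".") {
			return base[:i]
		}
	}
	return base
}

func main() {
	for _, f := range []string{
		"libssl.so.3",
		"/usr/lib/x86_64-linux-gnu/libcrypto.so.3",
		"/usr/share/dotnet/shared/Microsoft.NETCore.App/8.0.0/libclrjit.so",
	} {
		fmt.Println(trimLibName(f))
	}
	// libssl
	// libcrypto
	// libclrjit
}
```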
<p>Currently, for each stacktrace, the connector computes a <strong>Self CPU</strong> count, which corresponds to exclusive CPU usage: the sample is attributed to the category its leaf frame matches. Fine-grained kernel categories such as <code>network/tcp/write</code> complicate this, because the actual leaf frame is usually a device-driver function that cannot be meaningfully matched. In that case, the connector tries to match frames further up the stack; finding <code>tcp_sendmsg</code>, for example, is enough to classify the sample correctly.</p>
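<p>The fallback for driver leaf frames can be sketched as a bounded walk toward the root. The <code>Rule</code> type and the <code>maxDepth</code> limit are hypothetical; they illustrate how each sample still lands in exactly one category, so per-category Self CPU counts sum to the total when unmatched stacks go to a catch-all.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// Rule maps a kernel function-name prefix to a category (hypothetical shape).
type Rule struct{ Prefix, Category string }

// selfCategory attributes a stacktrace (leaf first) to exactly one category:
// the leaf frame's if it matches a rule, otherwise the nearest matching
// caller within maxDepth frames, e.g. tcp_sendmsg above a NIC driver's
// transmit function. Stacks with no match fall into "other".
func selfCategory(frames []string, rules []Rule, maxDepth int) string {
	for depth, f := range frames {
		if depth >= maxDepth {
			break
		}
		for _, r := range rules {
			if strings.HasPrefix(f, r.Prefix) {
				return r.Category
			}
		}
	}
	return "other"
}

func main() {
	rules := []Rule{
		{"tcp_sendmsg", "network/tcp/write"},
		{"udp_sendmsg", "network/udp/write"},
	}
	// Simplified stack: the leaf is a driver function; tcp_sendmsg decides.
	stack := []string{"mlx5e_xmit", "dev_hard_start_xmit", "tcp_sendmsg", "sock_sendmsg"}
	fmt.Println(selfCategory(stack, rules, 8)) // network/tcp/write
	fmt.Println(selfCategory(stack, rules, 2)) // other: search stopped too early
}
```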
<p>Users can also add their own categories by specifying a frame pattern (e.g. a function or package) and the connector will generate counters for them.</p>
<h2>Benefits of profiling-derived metrics for observability</h2>
<p>This shift looks small from the outside — &quot;we're emitting counters&quot; — but it changes almost everything about how profiling fits into an observability stack.</p>
<ul>
<li>
<p><strong>Orders of magnitude less storage:</strong> A counter aggregated over a 5-second (or 30-second or one-minute) window is dramatically cheaper than the full stacktraces it distills. The pre-aggregation interval is configurable, and a longer interval trades away time resolution, not categorization fidelity. For most &quot;where is my CPU being spent?&quot; questions, 30 seconds is plenty.</p>
</li>
<li>
<p><strong>On by default:</strong> Because the storage cost is now in line with regular metrics, profiling-derived metrics can be on for everyone, on every host, from the moment the profiler is deployed. Users get a CPU breakdown by runtime, syscall, kernel category and shared library on day one.</p>
</li>
<li>
<p><strong>Standard dashboards:</strong> These are ordinary OpenTelemetry time-series counters and can be visualized ad-hoc using stacked bar graphs, pie charts, top-N panels or any other visualization Kibana supports out of the box. The same Lens and TSDB-backed views used for application metrics work here.</p>
</li>
</ul>
<div align="center"><img src="/assets/images/otel-profiling-metrics/kibana-user-cpu-over-time.png" alt="User CPU by frame type over time in Kibana" /></div>
<ul>
<li>
<p><strong>AI and query-friendly:</strong> Standard time-series data is trivially consumable by ES|QL, ML jobs, anomaly detectors and by LLMs. &quot;Show me the top services by <code>network/udp/write</code> time, filtered to the payments namespace, over the last six hours&quot; is one query that is not only simple for the system to answer but also simple for an LLM to generate.</p>
</li>
<li>
<p><strong>Cross-signal correlation:</strong> Because the metrics flow through the standard OpenTelemetry Collector pipeline, they pick up the same resource attributes (e.g. <code>service.name</code>, <code>k8s.pod.name</code>, <code>host.name</code>, <code>deployment.environment</code>) that logs, other metrics and traces already carry.</p>
</li>
<li>
<p><strong>Instant value, with a path to more detail:</strong> A user who just wants to know &quot;what's burning my CPU?&quot; gets a meaningful answer without ever opening a flamegraph. A user who wants to dig deeper still has the full eBPF profiler underneath, ready to hand back complete stacktraces when they're warranted.</p>
</li>
</ul>
<h2>User-programmable profiling and adaptive sampling</h2>
<p>The longer-term direction is for the profiler to stop being something users <em>consume</em> and start being something they <em>program</em>. User-defined metrics are the first step in this direction, complemented by on-demand (full) profiling and adaptive sampling.</p>
<p>Profiling-derived metrics or other signals can act as a trigger for on-demand profiling, in which the system enables full profiling on a specific host or service to capture complete stacktraces. That way, the full processing and storage cost of profiling is paid only when it matters.</p>
<p>We can apply the same idea to the sampling rate. 19Hz is a sensible steady-state baseline, but when the metrics signal an interesting event or an anomaly, the system can automatically ramp up to 100Hz or higher to capture high-fidelity data for the time window in which it's relevant, and then ramp back down to baseline.</p>
<h2>How profiling-derived metrics enable self-driving observability</h2>
<p>Most observability stacks today use an open-loop model: the profiler emits data with a fixed configuration. Then a human looks at flamegraphs and dashboards, potentially correlates with logs, other metrics and traces, forms a hypothesis and triggers a deeper investigation. Every link in this chain requires a human decision. Nothing feeds back into the profiler at speed and the system cannot act on its own observations.</p>
<p>Profiling-derived metrics close that loop.</p>
<ol>
<li>
<p>Something crosses a threshold: a &quot;significant host events&quot; metric, an anomaly in <code>network/udp/write</code> or a spike in <code>native.count</code> for <code>libz</code>.</p>
</li>
<li>
<p>The profiler adjusts in response: sampling rate increases, full profiling turns on for the affected hosts.</p>
</li>
<li>
<p>The richer data is correlated against logs, traces, and other metrics by an LLM, by a human or both. The same resource attributes that make cross-signal correlation easy for the user make it easy for the system.</p>
</li>
<li>
<p>A root cause is identified. A remediation is suggested or applied. The metric returns to baseline and the loop continues.</p>
</li>
</ol>
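<p>The first step does not require hand-tuned static thresholds. One common technique, sketched here as an assumption rather than a description of any shipped detector, is an exponentially weighted moving average (EWMA) with an exponentially weighted variance that flags values several standard deviations above the running mean. <code>Alpha</code>, <code>K</code> and <code>MinStd</code> are illustrative parameters.</p>

```go
package main

import (
	"fmt"
	"math"
)

// Detector tracks an EWMA and exponentially weighted variance of one metric
// and flags values more than K standard deviations above the running mean.
// MinStd keeps tiny fluctuations on a flat series from being flagged.
type Detector struct {
	Alpha, K, MinStd float64
	mean, variance   float64
	warm             bool
}

// Update consumes the next value and reports whether it is anomalous.
func (d *Detector) Update(x float64) bool {
	if !d.warm {
		d.mean, d.warm = x, true
		return false
	}
	diff := x - d.mean
	std := math.Max(math.Sqrt(d.variance), d.MinStd)
	anomalous := diff > d.K*std
	// Incremental EWMA update of mean and variance.
	incr := d.Alpha * diff
	d.mean += incr
	d.variance = (1 - d.Alpha) * (d.variance + diff*incr)
	return anomalous
}

func main() {
	d := &Detector{Alpha: 0.3, K: 3, MinStd: 1}
	// e.g. per-window native.count samples for libz on one host
	series := []float64{100, 102, 98, 101, 99, 100, 300}
	for i, x := range series {
		if d.Update(x) {
			fmt.Printf("anomaly at index %d (value %.0f)\n", i, x)
		}
	}
	// anomaly at index 6 (value 300)
}
```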
<p>This is what we mean when we talk about <em>self-driving observability</em>. The profiler is no longer just an instrument that someone wields. It is the sensory organ of an autonomous feedback loop: a system that observes itself, decides what to look at more closely and adjusts its own configuration in response to what it sees.</p>
<h2>What's next: inclusive CPU, off-CPU metrics, and runtime-specific profiling</h2>
<p>Any piece of data visible in a stacktrace can be a metric source and several extensions are already on the roadmap.</p>
<ul>
<li>
<p><strong>Inclusive-CPU metrics:</strong> Today's pre-baked counters attribute CPU to the leaf frame (exclusive CPU). Inclusive-CPU metrics will attribute each sample to every function in its call chain, which is useful when you care about the total cost of a function call (the function plus everything it transitively calls), not just the work done directly in its own body.</p>
</li>
<li>
<p><strong>Runtime-specific metrics:</strong> GC time per runtime, JSON/Protobuf serialization, RPC frameworks, FFI boundaries. The kinds of questions every team eventually asks about their language runtime, answered by default.</p>
</li>
<li>
<p><strong>Off-CPU metrics:</strong> On-CPU profiling tells you where you're spending CPU; off-CPU profiling tells you where you're <em>not</em> (e.g. blocked on I/O or locks). The same classification logic applies; only the source signal changes.</p>
</li>
</ul>
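<p>The difference between the two attribution modes fits in a few lines. In this hypothetical sketch, each sample adds to the self count of its leaf function only and to the inclusive count of every distinct function on its stack; counting each function once per sample keeps recursive calls from being double-counted.</p>

```go
package main

import "fmt"

// Stacktrace is a hypothetical sampled stack: function names, leaf first.
type Stacktrace struct {
	Frames []string
	Count  int64
}

// selfAndTotal returns per-function exclusive (self) and inclusive (total)
// sample counts for a batch of stacktraces.
func selfAndTotal(stacks []Stacktrace) (self, total map[string]int64) {
	self = make(map[string]int64)
	total = make(map[string]int64)
	for _, st := range stacks {
		if len(st.Frames) == 0 {
			continue
		}
		self[st.Frames[0]] += st.Count // only the leaf does "self" work
		seen := make(map[string]bool)
		for _, f := range st.Frames {
			if !seen[f] { // count recursive functions once per sample
				seen[f] = true
				total[f] += st.Count
			}
		}
	}
	return self, total
}

func main() {
	self, total := selfAndTotal([]Stacktrace{
		{Frames: []string{"memcpy", "SSL_write", "handleRequest"}, Count: 4},
		{Frames: []string{"SSL_write", "handleRequest"}, Count: 2},
		{Frames: []string{"json.Marshal", "handleRequest"}, Count: 3},
	})
	fmt.Println(self["SSL_write"], total["SSL_write"])         // 2 6
	fmt.Println(self["handleRequest"], total["handleRequest"]) // 0 9
}
```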
<p>Profiling-derived metrics are an active area of work within Elastic and the <a href="https://github.com/elastic/opentelemetry-collector-components/tree/main/connector/profilingmetricsconnector">profilingmetricsconnector</a> is the place to start if you want to play with this today. A ready-made <a href="https://www.elastic.co/docs/reference/integrations/profilingmetrics_otel">Kibana integration</a> ships dashboards for all the metrics described above.</p>
<p>If you're already using Elastic's continuous profiling, expect these metrics to show up as first-class citizens in the Elastic stack. If you're not, this is a very low-friction way in: no flamegraph expertise is required and the storage cost is minimal.</p>
<p>The flamegraph isn't going anywhere, but for the first time, it isn't the <em>only</em> way profiling yields results.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/otel-profiling-metrics/header.jpg" length="0" type="image/jpg"/>
        </item>
    </channel>
</rss>