Elastic Observability Labs - Articles by Christos Kalkanis

Elastic contributes its Universal Profiling agent to OpenTelemetry

Thu, 06 Jun 2024 00:00:00 GMT

Following great collaboration between Elastic and OpenTelemetry's profiling community, which included a thorough review process, the OpenTelemetry community has accepted Elastic's donation of our continuous profiling agent. This marks a significant milestone in helping establish profiling as the fourth telemetry signal in OpenTelemetry. Elastic’s eBPF-based continuous profiling agent observes code across different programming languages and runtimes, third-party libraries, kernel operations, and system resources with low CPU and memory overhead in production. SREs can now benefit from these capabilities: quickly identifying performance bottlenecks, maximizing resource utilization, reducing carbon footprint, and optimizing cloud spend. Over the past year, we have helped enhance OpenTelemetry's Semantic Conventions by donating the Elastic Common Schema (ECS), contributed to the OpenTelemetry Collector and language SDKs, and worked with OpenTelemetry’s Profiling Special Interest Group (SIG) to lay the foundation needed to make profiling stable.

With today’s acceptance, we are officially contributing our continuous profiler technology to OpenTelemetry. We will also dedicate a team of profiling domain experts to co-maintain and advance the profiling capabilities within OTel.

We want to thank the OpenTelemetry community for the great and constructive cooperation on the donation proposal. We look forward to jointly establishing continuous profiling as an integral part of OpenTelemetry.

What is continuous profiling?

Profiling is a technique used to understand the behavior of a software application by collecting information about its execution. This includes tracking the duration of function calls, memory usage, CPU usage, and other system resources.

However, traditional profiling solutions have significant drawbacks limiting adoption in production environments:

Significant cost and performance overhead due to code instrumentation
Disruptive service restarts
Inability to get visibility into third-party libraries

Unlike traditional profiling, which is often done only in a specific development phase or under controlled test conditions, continuous profiling runs in the background with minimal overhead. This provides real-time, actionable insights without replicating issues in separate environments. SREs, DevOps, and developers can see how code affects performance and cost, making code and infrastructure improvements easier.
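To make the sampling idea concrete, here is a minimal sketch in Python. It is not how the Elastic agent works: the agent uses eBPF to capture stacks for the whole system, while this toy samples a single thread inside one CPython process. The function names (`fold_stack`, `sample_thread`, `leaf_shares`, `demo`) and the sampling interval are assumptions made for illustration only.

```python
import collections
import sys
import threading
import time


def fold_stack(frame):
    """Walk a frame's callers and return 'outermost;...;innermost' function names."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return ";".join(reversed(names))


def sample_thread(thread_id, interval_s=0.01, duration_s=0.2):
    """Sample one thread's stack at a fixed interval and count identical stacks."""
    counts = collections.Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # sys._current_frames() is CPython-specific: it maps thread id -> current frame.
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[fold_stack(frame)] += 1
        time.sleep(interval_s)
    return counts


def leaf_shares(counts):
    """Fraction of samples in which each function was the innermost frame."""
    total = sum(counts.values())
    shares = collections.Counter()
    for stack, n in counts.items():
        shares[stack.rsplit(";", 1)[-1]] += n / total
    return dict(shares)


def demo():
    """Profile a hypothetical busy worker thread without modifying its code."""
    stop = threading.Event()

    def spin():
        while not stop.is_set():
            sum(range(1000))

    worker = threading.Thread(target=spin)
    worker.start()
    counts = sample_thread(worker.ident)
    stop.set()
    worker.join()
    return counts
```

Calling `leaf_shares(demo())` gives a rough view of where the worker spent its time. The worker is observed from the outside, with no instrumentation added to `spin`, which is the property that lets a sampling profiler run in the background with low overhead.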

Contribution of production-grade features

Elastic Universal Profiling is a whole-system, always-on, continuous profiling solution that eliminates the need for code instrumentation, recompilation, on-host debug symbols, or service restarts. Leveraging eBPF, Elastic Universal Profiling profiles every line of code running on a machine, including application code, kernel, and third-party libraries. The solution measures code efficiency in three dimensions (CPU utilization, CO2, and cloud cost) to help organizations run efficient services by minimizing computational waste.

The Elastic profiling agent helps identify non-optimal code paths, uncover "unknown unknowns", and gain comprehensive visibility into the runtime behavior of all applications. Elastic’s continuous profiling agent supports various runtimes and languages, such as C/C++, Rust, Zig, Go, Java, Python, Ruby, PHP, Node.js, V8, Perl, and .NET.

Additionally, organizations can meet sustainability objectives by minimizing computational wastage, ensuring seamless alignment with their strategic ESG goals.

Benefits to OpenTelemetry

This contribution not only boosts the standardization of continuous profiling for observability but also accelerates the practical adoption of profiling as the fourth key signal in OTel. Customers get a vendor-agnostic way of collecting profiling data and enabling correlation with existing signals, like tracing, metrics, and logs, opening new potential for observability insights and a more efficient troubleshooting experience.

OTel-based continuous profiling unlocks the following possibilities for users:

Improved customer experience: delivering consistent service quality and performance through continuous profiling ensures customers have an application that performs optimally, remains responsive, and is reliable.

Maximize gross margins: businesses can optimize their cloud spend and improve profitability by reducing the computational resources needed to run applications. Whole-system continuous profiling identifies the most expensive functions (down to the lines of code) across diverse environments that may span multiple cloud providers. In the cloud context, every CPU cycle saved translates to money saved.

Minimize environmental impact: energy consumption associated with computing is a growing concern (source: MIT Energy Initiative). More efficient code translates to lower energy consumption, reducing the carbon (CO2) footprint.

Accelerate engineering workflows: continuous profiling provides detailed insights to help troubleshoot complex issues faster, guide development, and improve overall code quality.

Improved vendor neutrality and increased efficiency: an OTel eBPF-based profiling agent removes the need to use proprietary APM agents and offers a more efficient way to collect profiling telemetry.

With these benefits, customers can now manage the overall application’s efficiency on the cloud while ensuring their engineering teams optimize it.
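As a back-of-the-envelope illustration of the "every CPU cycle saved translates to money saved" point, the sketch below turns a function's share of CPU samples into a rough yearly cost. The formula and every input are assumptions for illustration (`estimated_annual_cost` is a hypothetical helper, and the price and fleet size are made-up numbers); it is not how Elastic Universal Profiling computes cloud cost.

```python
HOURS_PER_YEAR = 24 * 365


def estimated_annual_cost(cpu_share, total_cores, price_per_core_hour):
    """Rough yearly cost attributable to one function.

    cpu_share: fraction of all CPU samples spent in the function (0.0 to 1.0).
    total_cores: number of CPU cores in the profiled fleet (hypothetical input).
    price_per_core_hour: assumed cloud price for one core for one hour.
    """
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    return cpu_share * total_cores * HOURS_PER_YEAR * price_per_core_hour


# Hypothetical fleet: 64 cores at an assumed 0.04 per core-hour,
# with one function accounting for 10% of all CPU samples.
cost = estimated_annual_cost(0.10, 64, 0.04)
```

Under these assumed inputs, halving the function's CPU share would halve its estimated cost, which is why ranking functions by their share of samples is a natural starting point for optimization.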

What comes next?

While the acceptance of Elastic’s donation of the profiling agent marks a significant milestone in the evolution of OTel’s eBPF-based continuous profiling capabilities, it represents the beginning of a broader journey. Moving forward, we will continue collaborating closely with the OTel Profiling and Collector SIGs to ensure seamless integration of the profiling agent within the broader OTel ecosystem. During this phase, users can test early preview versions of the OTel profiling integration by following the directions in the otel-profiling-agent repository.

Elastic remains deeply committed to OTel’s vision of enabling cross-signal correlation. We plan to further contribute to the community by sharing our innovative research and implementations, specifically those facilitating the correlation between profiling data and distributed traces, across several OTel language SDKs and the profiling agent.

We are excited about our growing relationship with OTel and the opportunity to donate our profiling agent in a way that benefits both the Elastic community and the broader OTel community. Learn more about Elastic’s OpenTelemetry support and learn how to contribute to the ongoing profiling work in the community.

Additional Resources

Additional details on Elastic’s Universal Profiling can be found in the FAQ.

For insights into observability, visit Observability Labs, where OTel-specific articles are also available.

Elastic Universal Profiling agent, a continuous profiling solution, is now open source

Mon, 15 Apr 2024 00:00:00 GMT

Elastic Universal Profiling™ agent is now open source! The industry’s most advanced fleetwide continuous profiling solution empowers users to identify performance bottlenecks, reduce cloud spend, and minimize their carbon footprint. This post explores the history of the agent, its move to open source, and its future integration with OpenTelemetry.

Elastic Universal Profiling™ Agent goes open source under Apache 2

At Elastic, open source is more than just a philosophy — it's our DNA. We believe the benefits of whole-system continuous profiling extend far beyond performance optimization. It's a win for businesses and the planet alike. For instance, since launching Elastic Universal Profiling in general availability (GA), we've observed a wide variety of use cases from customers.

These range from customers relying fully on Universal Profiling's differential flame graphs and topN functions for insights during release management to utilizing AI assistants for quickly optimizing expensive functions. Other customers use profiling data to identify the most energy-efficient cloud region for running certain workloads. Additionally, customers are using the insights that Universal Profiling provides to build evidence to challenge cloud provider bills. As it turns out, cloud providers' in-VM agents can consume a significant portion of the CPU time, which customers are billed for.

In a move that will empower the community to take advantage of continuous profiling's benefits, we're thrilled to announce that the Elastic Universal Profiling agent, a pioneering eBPF-based continuous profiling agent, is now open source under the Apache 2 license!

This move democratizes hyper-scaler efficiency for everyone, opening exciting new possibilities for the future of continuous profiling, as well as its role in observability and OpenTelemetry.

Implementation of the OpenTelemetry (OTel) Profiling protocol

Our commitment to open source goes beyond just the agent itself. We recently announced our intent to donate the agent to OpenTelemetry and have further solidified this goal by implementing the experimental OTel Profiling data model. This allows the open-sourced eBPF-based continuous profiling agent to communicate seamlessly with OpenTelemetry backends.

But that's not all! We've also launched an innovative feature that correlates profiling data with OpenTelemetry distributed traces. This powerful capability offers a deeper level of insight into application performance, enabling the identification of bottlenecks with greater precision. Upon donating the Profiling agent to OTel, Elastic will also contribute critical components that enable distributed trace correlation within the Elastic distribution of the OTel Java agent to the upstream OTel Java SDK. This underscores Elastic Observability's commitment to both open source and the support of open standards like OpenTelemetry while pushing the boundaries of what is possible in observability.

What does this mean for Elastic Universal Profiling customers?

We'd like to express our immense gratitude to all our customers who have been part of this journey, from the early stages of private beta to GA. Your feedback has been invaluable in shaping Universal Profiling into the powerful product it is today.

By open-sourcing the Universal Profiling agent and contributing it to OpenTelemetry, we're fostering a win-win situation for both you and the broader community. This move opens doors for innovation and collaboration, ultimately leading to a more robust and versatile whole-system continuous profiling solution for everyone.

Furthermore, we're actively working on exciting novel ways to integrate Universal Profiling seamlessly within Elastic Observability. Expect further announcements soon, outlining how you can unlock even greater value from your profiling data within a unified observability experience in a way that has never been done before.

The open-sourced agent is using the recently released (experimental) OTel Profiling signal. As a precaution, we recommend not using it in production environments.

Please continue using the official Elastic distribution of the Universal Profiling agent until the agent is formally accepted by OTel and the protocol reaches a stable phase. There's no need to take any action at this time, and we will make sure a smooth transition plan is in place for you.

What does this mean for the OpenTelemetry community?

OpenTelemetry is adopting continuous profiling as a key signal. By open-sourcing the eBPF-based profiling agent and working towards donating it to OTel, Elastic is making it possible to accelerate the standardization of continuous profiling within OpenTelemetry. This move has a massive impact on the observability community, empowering everyone to continuously profile their systems with a standardized protocol.

This is particularly timely as Moore's Law slows down and cloud computing takes hold, making computational efficiency critical for businesses.

Here's how whole-system continuous profiling benefits you:

Maximize gross margins: By reducing the computational resources needed to run applications, businesses can optimize their cloud spend and improve profitability. Whole-system continuous profiling is one way of identifying the most expensive applications (down to the lines of code) across diverse environments that may span multiple cloud providers. This principle aligns with the familiar adage, "a penny saved is a penny earned." In the cloud context, every CPU cycle saved translates to money saved.
Minimize environmental impact: Energy consumption associated with computing is a growing concern (source: MIT Energy Initiative). More efficient code translates to lower energy consumption, contributing to a reduction in carbon footprint.
Accelerate engineering workflows: Continuous profiling provides detailed insights to help debug complex issues faster, guide development, and improve overall code quality.

This is where Elastic Universal Profiling comes in: it is designed to help organizations run efficient services by minimizing computational wastage. To this end, it measures code efficiency in three dimensions: CPU utilization, CO2, and cloud cost.

Elastic's journey with continuous profiling began by joining forces with optimyze.cloud, which became the foundation for Elastic Universal Profiling. We are excited to see this product evolve into its next growth phase in the open-source world.

Ready to give it a spin?

As Elastic Universal Profiling transitions into this new open source era, the potential for transformative impact on performance optimization, cost efficiency, and environmental sustainability is immense. Elastic's approach — balancing innovation with responsibility — paves the way for a future where technology not only powers our world but does so in a way that is sustainable and accessible to all.

Get started with the open source Elastic Universal Profiling agent today! Download it directly from GitHub and follow the instructions in the repository.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.