Auto-instrumentation of .NET applications with OpenTelemetry


In the fast-paced universe of software development, especially in the cloud-native realm, DevOps and SRE teams are increasingly emerging as essential partners in application stability and growth.

DevOps engineers continuously optimize software delivery, while SRE teams act as the stewards of application reliability, scalability, and top-tier performance. The challenge? These teams require a cutting-edge observability solution, one that encompasses full-stack insights, empowering them to rapidly manage, monitor, and rectify potential disruptions before they culminate in operational challenges.

Observability in our modern distributed software ecosystem goes beyond mere monitoring — it demands limitless data collection, precision in processing, and the correlation of this data into actionable insights. However, the road to achieving this holistic view is paved with obstacles, from navigating version incompatibilities to wrestling with restrictive proprietary code.

Enter OpenTelemetry (OTel), with the following benefits for those who adopt it:

  • Escape vendor constraints with OTel, freeing yourself from vendor lock-in and ensuring top-notch observability.
  • See the harmony of unified logs, metrics, and traces come together to provide a complete system view.
  • Improve your application oversight through richer and enhanced instrumentations.
  • Embrace the benefits of backward compatibility to protect your prior instrumentation investments.
  • Embark on the OpenTelemetry journey with an easy learning curve, simplifying onboarding and scalability.
  • Rely on a proven, future-ready standard to boost your confidence in every investment.
  • Explore manual instrumentation, enabling customized data collection to fit your unique needs.
  • Ensure monitoring consistency across layers with a standardized observability data framework.
  • Decouple development from operations, driving peak efficiency for both.

Given this context, OpenTelemetry emerges as an unmatched observability solution for cloud-native software, seamlessly enabling tracing, monitoring, and debugging. One of its strengths is the ability to auto-instrument applications, allowing developers the luxury of collecting invaluable telemetry without delving into code modifications.

In this post, we will dive into the methodology to instrument a .NET application using Docker, blending the best of both worlds: powerful observability without the code hassles.

What's covered?

  • How APM works with .NET using CLR Profiler functionality
  • Creating a Docker image for a .NET application with the OpenTelemetry instrumentation baked in
  • Installing and running the OpenTelemetry .NET Profiler for automatic instrumentation

How APM works with .NET using CLR Profiler functionality

Before we delve into the details, let's clear up some confusion between .NET profilers and CPU profilers like Elastic®’s Universal Profiling tool. Both are used to diagnose and optimize applications, but they serve different primary purposes and operate at different levels, so we don’t want to get the two mixed up. Let's clarify the distinction:

.NET Profiler

  1. Scope: Specifically targets .NET applications. It is designed to work with the .NET runtime (i.e., the Common Language Runtime (CLR)).

  2. Functionality: 

    • Code level insights: Provides detailed insights into the .NET application's operation, such as method call counts, memory allocations, and garbage collection events.
    • IL instrumentation: Can modify the Intermediate Language (IL) code of a .NET application at runtime, allowing for deeper monitoring and diagnostics.
    • Runtime events: Tracks events in the CLR like JIT compilation, class loading, and thread execution.
  3. Use cases:

    • Performance optimization of .NET applications
    • Memory leak detection
    • Understanding application behavior at the code level

CPU Profiler

  1. Scope: More general than a .NET profiler. It can profile any application, irrespective of the language or runtime, as long as it runs on the CPU being profiled.

  2. Functionality: 

    • CPU utilization: Monitors the time spent by the CPU on executing different functions or methods, indicating which parts of the application are most CPU-intensive.
    • Sampling: Typically takes periodic samples to see which functions are being executed at those moments. Over time, a profile of CPU usage by function or method emerges.
    • Call stacks: Can often provide call stacks to show how specific functions or methods were invoked.
  3. Use cases: 

    • Identifying CPU bottlenecks in any application
    • Optimizing overall application performance
    • Getting a high-level view of where the CPU spends most of its time

While both .NET profilers and CPU profilers aid in optimizing and diagnosing application performance, their approach and depth differ. A .NET profiler offers deep insights specifically into the .NET ecosystem, allowing for fine-grained analysis and instrumentation. In contrast, a CPU profiler provides a broader view, focusing on CPU usage patterns across any application, regardless of its development platform.

It's worth noting that for comprehensive profiling of a .NET application, you might use both: the .NET profiler to understand code-level behaviors specific to .NET and the CPU profiler to get an overview of CPU resource utilization.

Now that we've cleared that up, let's focus on the .NET Profiler, which we are discussing in this blog for automatic instrumentation of .NET applications. First, let's familiarize ourselves with some foundational concepts and terminologies relevant to a .NET Profiler:

  • CLR (Common Language Runtime): CLR is a core component of the .NET framework, acting as the execution engine for .NET apps. It provides key services like memory management, exception handling, and type safety. 
  • Profiler API: .NET provides a set of APIs for profiling applications. These APIs let tools and developers monitor or manipulate .NET applications during runtime. 
  • IL (Intermediate Language): After compiling, .NET source code turns into IL, a low-level, platform-agnostic representation. This IL code is then compiled just-in-time (JIT) into machine code by the CLR during application execution.
  • JIT compilation: JIT stands for just-in-time. In .NET, the CLR compiles IL to native code just before its execution.

Now, let's explore how automatic instrumentation works using CLR Profiler.

Automatic instrumentation in .NET, much like Java's bytecode instrumentation, revolves around modifying the behavior of your application's methods during runtime, without changing the actual source code.

Here’s a step-by-step breakdown:

  1. Attach the profiler: When launching your .NET application, you have to tell the runtime to load the profiler. The CLR checks for a profiler by reading environment variables at startup. If one is configured, the CLR initializes the profiler before any user code is executed.

  2. Use Profiler API to monitor events: The Profiler API allows a profiler to monitor various events. For instance, method JIT compilation events can be tracked. When a method is about to be JIT compiled, the profiler gets notified.

  3. Manipulate IL code: Upon getting notified of a JIT compilation, the profiler can manipulate the IL code of the method. Using the Profiler API, the profiler can insert, delete, or replace IL instructions. This is analogous to how Java agents modify bytecode. For example, if you want to measure a method's execution time, you'd modify the IL to insert calls to start and stop a timer at the beginning and end of the method, respectively.

  4. Execution of transformed code: Once the IL has been modified, the JIT compiler will translate it into machine code. The application will then execute this machine code, which includes the additions made by the profiler.

  5. Gather and report data: The added instrumentation can collect various data, such as method execution times or call counts. This data can then be relayed to an application performance management (APM) tool, which can provide insights, visualizations, and alerts based on the data.
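Step 1 above can be sketched in shell. On .NET (Core), the runtime reads the CORECLR_ENABLE_PROFILING, CORECLR_PROFILER, and CORECLR_PROFILER_PATH environment variables when a process starts. The CLSID and library path below are placeholders chosen for illustration, not the values of any real profiler; later in this post, the OpenTelemetry instrument.sh script sets up the required variables for you.

```shell
# Placeholder values (assumptions for illustration only): use the CLSID and
# native library path documented by the profiler you are attaching.
PROFILER_CLSID="{00000000-0000-0000-0000-000000000000}"
PROFILER_LIB="/opt/example-profiler/profiler.so"

# The CLR reads these variables when the process starts.
export CORECLR_ENABLE_PROFILING=1              # turn profiler loading on
export CORECLR_PROFILER="$PROFILER_CLSID"      # which profiler to load
export CORECLR_PROFILER_PATH="$PROFILER_LIB"   # where its native library lives

# Any .NET process started from this shell would now try to load the profiler:
#   dotnet login.dll
```

Because these are ordinary environment variables, they can be set in a shell, in a Dockerfile, or by a wrapper script, which is what makes the approach work without touching application code.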

In essence, automatic instrumentation with CLR Profiler is about modifying the behavior of your .NET methods at runtime. This is invaluable for monitoring, diagnosing, and fine-tuning the performance of .NET applications without intruding on the application's actual source code.
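As a loose analogy for steps 3 to 5, the shell function below wraps an arbitrary command with a start and a stop timestamp and reports the elapsed time. This is not how the CLR profiler is implemented (the real work happens at the IL level inside the runtime); the function name timed and the output format are made up for illustration.

```shell
# Run a command with timing calls "inserted" before and after it,
# without changing the command itself.
timed() {
  start=$(date +%s)          # inserted at entry: start the timer
  "$@"                       # the original, unmodified work
  status=$?
  end=$(date +%s)            # inserted at exit: stop the timer
  # Report the measurement on stderr, the way instrumentation would hand
  # its data to an APM backend.
  echo "call=$1 elapsed_s=$((end - start)) status=$status" >&2
  return "$status"
}

# Usage: the wrapped command behaves as before, plus one line of telemetry.
timed sleep 1
```

The wrapped command keeps its exit status, which mirrors the goal of instrumentation: observe the method without changing what it does.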

Prerequisites

  • A basic understanding of Docker and .NET
  • Elastic Cloud
  • Docker installed on your machine (we recommend Docker Desktop)

View the example source code

The full source code, including the Dockerfile used in this blog, can be found on GitHub. The repository also contains the same application without instrumentation. This allows you to compare each file and see the differences.

The following steps will show you how to instrument this application and run it on the command line or in Docker. If you are interested in a more complete OTel example, take a look at the docker-compose file here, which will bring up the full project.

Step-by-step guide

This blog assumes you have an Elastic Cloud account — if not, follow the instructions to get started on Elastic Cloud.

Step 1. Base image setup

Start with the .NET runtime image for the base layer of our Dockerfile:

# Optional image prefix; empty by default so the plain image name is used
ARG ARCH=
FROM ${ARCH}mcr.microsoft.com/dotnet/aspnet:7.0 AS base
WORKDIR /app
EXPOSE 8000

Here, we're setting up the application's runtime environment. 

Step 2. Building the .NET application

Multi-stage builds are one of Docker's best features. Here, we compile our .NET application using the SDK image. In the bad old days, we used to build on a different platform and then copy the compiled code into the Docker container. By using Docker all the way through, we can be much more confident that our build will behave the same on a developer’s desktop and in production.

FROM --platform=$BUILDPLATFORM mcr.microsoft.com/dotnet/sdk:8.0-preview AS build
ARG TARGETPLATFORM

WORKDIR /src
COPY ["login.csproj", "./"]
RUN dotnet restore "./login.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "login.csproj" -c Release -o /app/build

This section ensures that our .NET code is properly restored and compiled.

Step 3. Publishing the application

Once built, we'll publish the app:

FROM build AS publish
RUN dotnet publish "login.csproj" -c Release -o /app/publish

Step 4. Preparing the final image

Now, let's set up the final runtime image:

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .

Step 5. Installing OpenTelemetry

We'll install dependencies and download the OpenTelemetry auto-instrumentation script:

RUN apt-get update && apt-get install -y zip curl
RUN mkdir /otel
RUN curl -L -o /otel/otel-dotnet-install.sh https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/releases/download/v0.7.0/otel-dotnet-auto-install.sh
RUN chmod +x /otel/otel-dotnet-install.sh

Step 6. Configure OpenTelemetry

Designate where OpenTelemetry should reside and execute the installation script. Note that the ENV OTEL_DOTNET_AUTO_HOME is required as the script looks for it:

ENV OTEL_DOTNET_AUTO_HOME=/otel
RUN /bin/bash /otel/otel-dotnet-install.sh

Step 7. Additional configuration

Make sure the auto-instrumentation and platform detection scripts are executable and run the platform detection script.

COPY platform-detection.sh /otel/
RUN chmod +x /otel/instrument.sh
RUN chmod +x /otel/platform-detection.sh && /otel/platform-detection.sh

This platform detection script will check if the Docker build is for ARM64 and implement a workaround to get the OpenTelemetry instrumentation to work on macOS. If you happen to be running locally on a Mac with an M1 or M2 processor, you will be grateful for this script.
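The contents of platform-detection.sh are not reproduced in this post. As a minimal sketch of what the detection half of such a script could look like, the function below maps the machine name reported by uname -m to the architecture names Docker uses. The function name normalize_arch is hypothetical, and the ARM64 workaround itself is deliberately left out because it is specific to the repository's script.

```shell
# Map a `uname -m` machine name to the architecture name Docker uses.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "unknown" ;;
  esac
}

arch=$(normalize_arch "$(uname -m)")
if [ "$arch" = "arm64" ]; then
  # Placeholder: the real script applies its workaround at this point.
  echo "ARM64 build detected: a workaround step would run here"
else
  echo "Detected $arch: no workaround needed"
fi
```

Running the check during the image build, as the RUN line above does, means the decision is made once per image rather than on every container start.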

Step 8. Entry point setup

Lastly, set the Docker image's entry point to first source the OpenTelemetry instrumentation script, which sets up the environment variables required to bootstrap the .NET Profiler, and then start our .NET application:

ENTRYPOINT ["/bin/bash", "-c", "source /otel/instrument.sh && dotnet login.dll"]

Step 9. Running the Docker image with environment variables

To build and run the Docker image, you'd typically follow these steps:

Build the Docker image

First, you'd want to build the Docker image from your Dockerfile. Let's assume the Dockerfile is in the current directory, and you'd like to name/tag your image dotnet-login-otel-image.

   docker build -t dotnet-login-otel-image .

Run the Docker image

After building the image, you'd run it with the specified environment variables. For this, the docker run command is used with the -e flag for each environment variable.

   docker run \
       -e OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer ${ELASTIC_APM_SECRET_TOKEN}" \
       -e OTEL_EXPORTER_OTLP_ENDPOINT="${ELASTIC_APM_SERVER_URL}" \
       -e OTEL_METRICS_EXPORTER="otlp" \
       -e OTEL_RESOURCE_ATTRIBUTES="service.version=1.0,deployment.environment=production" \
       -e OTEL_SERVICE_NAME="dotnet-login-otel-auto" \
       -e OTEL_TRACES_EXPORTER="otlp" \
       dotnet-login-otel-image

Make sure that ${ELASTIC_APM_SECRET_TOKEN} and ${ELASTIC_APM_SERVER_URL} are set in your shell environment, or replace them with the actual values from your Elastic Cloud deployment, which you can find as described below.

Getting Elastic Cloud variables

You can copy the endpoints and token from Kibana® under the path `/app/home#/tutorial/apm`.

[Image: APM agents]

You can also use an environment file with docker run --env-file to make the command less verbose if you have multiple environment variables. 
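A minimal sketch of the env-file approach follows, assuming the two Elastic variables are already set in your shell. The function name write_otel_env_file and the file name otel.env are made up for illustration. Docker reads each line of the file literally as KEY=value, so the values are written without surrounding quotes; the shell expands the ${...} references once, at the moment the file is written.

```shell
# Write the OTel settings from the docker run command above to an env file.
# Values are unquoted because Docker would treat quotes as part of the value.
write_otel_env_file() {
  cat > "$1" <<EOF
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer ${ELASTIC_APM_SECRET_TOKEN}
OTEL_EXPORTER_OTLP_ENDPOINT=${ELASTIC_APM_SERVER_URL}
OTEL_METRICS_EXPORTER=otlp
OTEL_RESOURCE_ATTRIBUTES=service.version=1.0,deployment.environment=production
OTEL_SERVICE_NAME=dotnet-login-otel-auto
OTEL_TRACES_EXPORTER=otlp
EOF
}

# Usage (hypothetical file name):
#   write_otel_env_file otel.env
#   docker run --env-file otel.env dotnet-login-otel-image
```

Since the file ends up containing the secret token in plain text, it is worth keeping it out of version control.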

Once you have this up and running, you can ping the endpoint for your instrumented service (in our case, this is /login), and you should see the app appear in Elastic APM, as shown below:

[Image: services]
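To generate some traffic for the service, a small helper like the one below can call the /login endpoint repeatedly. This is a sketch under two assumptions: the container's port is published to the host (for example with -p 8000:8000 on docker run, which the command earlier in this step does not include), and the app listens on port 8000, as the EXPOSE line in the Dockerfile suggests. The function names login_url and send_requests are hypothetical.

```shell
# Build the URL of the /login endpoint (host and port default to assumptions).
login_url() {
  printf 'http://%s:%s/login\n' "${1:-localhost}" "${2:-8000}"
}

# Send N requests (default 10) and print the HTTP status code of each one.
send_requests() {
  count="${1:-10}"
  url=$(login_url)
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
    i=$((i + 1))
  done
}

# Usage: send_requests 20
```

A handful of requests is usually enough for the service to show up; more requests give the latency and throughput charts something to plot.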

It will begin by tracking throughput and latency, critical metrics for SREs to pay attention to.

[Image: dotnet-login-otel-auto-1]

Digging in, we can see an overview of all our Transactions.

[Image: dotnet-login-otel-auto-2]

And look at specific transactions:

[Image: specific transactions]

There is clearly an outlier here, where one transaction took over 200ms. This is likely to be due to the .NET CLR warming up. Click on Logs, and we see that logs are also brought over. The OTel Agent will automatically bring in logs and correlate them with traces for you:

[Image: OTel agent]

Wrapping up

With this Dockerfile, you've transformed your simple .NET application into one that's automatically instrumented with OpenTelemetry. This will aid greatly in understanding application performance, tracing errors, and gaining insights into how users interact with your software.

Remember, observability is a crucial aspect of modern application development, especially in distributed systems. With tools like OpenTelemetry, understanding complex systems becomes a tad bit easier.

In this blog, we discussed the following:

  • How to auto-instrument .NET with OpenTelemetry. 
  • Using standard commands in a Dockerfile, auto-instrumentation was done efficiently and without adding code in multiple places, which keeps it manageable.
  • Using OpenTelemetry and its support for multiple languages, DevOps and SRE teams can auto-instrument their applications with ease, gaining immediate insights into the health of the entire application stack and reducing mean time to resolution (MTTR).

Since Elastic can support a mix of methods for ingesting data, whether it be using auto-instrumentation of open-source OpenTelemetry or manual instrumentation with its native APM agents, you can plan your migration to OTel by focusing on a few applications first and then using OpenTelemetry across your applications later on in a manner that best fits your business needs.

Don’t have an Elastic Cloud account yet? Sign up for Elastic Cloud and try out the auto-instrumentation capabilities that I discussed above. I would be interested in getting your feedback about your experience in gaining visibility into your application stack with Elastic.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.