How auto instrumentation with the next iteration of the Elastic APM .NET Agent works


In this blog post, we look at the next iteration of the Elastic APM .NET Agent and its new auto instrumentation capabilities: what a profiler-based agent is, and what the profiler-based version of the Elastic APM .NET Agent can do.

APM .NET agent profiler auto instrumentation

We're pleased to announce the first public beta release of profiler-based auto instrumentation for our .NET APM agent. With this approach, the agent can automatically instrument .NET Framework, .NET Core, and .NET applications without requiring code changes or recompilation. A profiler-based approach uses the CLR Profiling APIs to rewrite methods of interest at runtime. This powerful capability allows the .NET APM agent to instrument and trace code that would otherwise be infeasible to instrument. With this first public beta release, auto instrumentation is supported for:

  • Kafka
  • RabbitMQ
  • Common ADO.NET providers for SQL Server, PostgreSQL, MySQL, Oracle, and SQLite
  • ASP.NET Full Framework

with plans to expand support for other popular technologies in the future.

A Kafka example

The Kafka .NET client is a good example of a library that is infeasible to instrument without a profiler-based approach. In a typical application, the Kafka client library is used to create producers, which write messages to Kafka, and consumers, which read messages from Kafka.

A typical producer looks as follows:

```c#
var config = new ProducerConfig
{
    BootstrapServers = "localhost:9091,localhost:9092",
    ClientId = Dns.GetHostName(),
};

using var producer = new ProducerBuilder<Null, string>(config).Build();
await producer.ProduceAsync("messages", new Message<Null, string> { Value = "message 1" });
```

For the purposes of Elastic APM, when a message is sent to a messaging system like Kafka and there is an active APM transaction, the sending operation should be captured as an APM messaging span. The Kafka client, however, does not provide a mechanism to know when the produce operation starts or finishes.

A typical consumer looks as follows:

```c#
var cts = new CancellationTokenSource();
var config = new ConsumerConfig
{
    BootstrapServers = "localhost:9091,localhost:9092",
    GroupId = "consumer",
    AutoOffsetReset = AutoOffsetReset.Earliest
};

using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
var topic = "messages";

consumer.Subscribe(topic);

while (!cts.IsCancellationRequested)
{
    var consumeResult = consumer.Consume(cts.Token);
    var message = consumeResult.Message.Value;

    // perform handling on the message...
}

consumer.Close();
```

A consumer subscribes to a topic and consumes messages in a loop until cancellation is requested. For Elastic APM, when a message is received from a messaging system like Kafka, the message handling flow should be captured in an APM messaging transaction. Similar to the “produce” operation, the Kafka client does not provide a mechanism to know when a “consume” operation starts or finishes.

With profiler auto instrumentation configured for an application, both Kafka produce and consume operations can be automatically traced. We've put together a sample solution containing two projects to demonstrate:

  1. An ASP.NET Core 6 minimal API with endpoints to send a single message or multiple messages to Kafka, and an endpoint to read messages from a SQLite database.

  2. A .NET 6 console application to consume messages from Kafka and write each message to a SQLite database.

Both projects are compiled into Docker images, and a Docker Compose file is provided to start Kafka and both applications. After creating an Elastic Stack deployment on Elastic Cloud and populating the APM server URL and secret token in the solution's .env file, all the services can be started with

```sh
docker compose up -d
```

The services should then be running, which can be checked with

```sh
docker compose ps
```

Now, send a collection of messages to the ASP.NET Core application mapped to port 5000 on localhost:

```sh
curl -XPOST http://localhost:5000/messages/bulk -H 'Content-Type: application/json' -d '
[{"value":"message 1"},
 {"value":"message 2"},
 {"value":"message 3"},
 {"value":"message 4"},
 {"value":"message 5"}]
'
```

The APM UI captures the API request as a transaction and the produce operations to Kafka as spans, as well as a transaction for each message consumed from Kafka by the console application, with a span for each SQLite database operation.

The consume transactions are linked to the produce operations in distributed traces; profiler auto instrumentation automatically propagates the W3C Trace Context in Kafka record headers.
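The propagated header follows the W3C Trace Context format. As a minimal sketch (in Rust, the profiler's implementation language, covered later in this post), the traceparent value stored in a Kafka record header is built from the trace id, parent span id, and trace flags; the function name here is illustrative, not part of the agent's API:

```rust
// Sketch: build a W3C Trace Context `traceparent` header value, as would be
// stored in a Kafka record header. The function name is illustrative and
// not part of the agent's actual API.
fn traceparent(trace_id: u128, parent_span_id: u64, sampled: bool) -> String {
    // version "00", 32 hex digits of trace id, 16 hex digits of parent
    // span id, and 2 hex digits of trace flags (bit 0 = sampled)
    format!(
        "00-{:032x}-{:016x}-{:02x}",
        trace_id,
        parent_span_id,
        if sampled { 1 } else { 0 }
    )
}
```

For example, `traceparent(0x4bf92f3577b34da6a3ce929d0e0e4736, 0x00f067aa0ba902b7, true)` produces `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`, the example value from the W3C specification.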

All of this is achieved without making code changes or recompiling the applications. The ASP.NET Core minimal API does take a dependency on the Elastic.Apm.AspNetCore package to register for ASP.NET Core diagnostic events and create transactions for requests, but we expect to perform this registration with the profiler in the future, removing the need to reference this package.

A closer look at profiling

CLR profilers are typically written in C++, compiled against source and header files from the .NET runtime repository that define the COM interfaces for the Profiling APIs, along with common helper functions.

We decided to explore what a CLR profiler written in Rust could look like. Rust has great interoperability with C and C++, with a good dependency and package management story. In addition, Rust would allow the parts of the profiler that are not part of its COM foreign function interface (FFI) boundary to be written with memory safety in mind, encapsulating the unsafe parts in the FFI boundary.

A profiler is loaded by the runtime into the same process as the application being profiled, using environment variables to identify the profiler library. The profiler implements the ICorProfilerCallback* notification interfaces to be notified by the runtime when particular events happen. Upon initialization, the runtime passes the profiler an ICorProfilerInfo* interface that the profiler can subsequently use to take action when those events happen, such as rewriting Intermediate Language (IL) instructions.
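For example, a .NET (Core) application is attached to the profiler through the standard CLR profiling environment variables. The CLSID and paths below are illustrative placeholders; consult the agent documentation for the exact values:

```sh
# Illustrative only — the CLSID and paths are placeholders; consult the
# Elastic APM .NET agent documentation for the exact values.

# .NET (Core) applications:
export CORECLR_ENABLE_PROFILING=1
export CORECLR_PROFILER='{<profiler-clsid>}'
export CORECLR_PROFILER_PATH=/path/to/libelastic_apm_profiler.so

# .NET Framework applications use the COR_* equivalents:
# export COR_ENABLE_PROFILING=1
# export COR_PROFILER='{<profiler-clsid>}'
# export COR_PROFILER_PATH='C:\path\to\elastic_apm_profiler.dll'
```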

COM interfaces

Representing COM interfaces in Rust is succinctly handled with the com crate. For example, the COM interface for ICorProfilerFunctionControl is defined as follows:

```rust
interfaces! {
    #[uuid("F0963021-E1EA-4732-8581-E01B0BD3C0C6")]
    pub unsafe interface ICorProfilerFunctionControl: IUnknown {
        pub fn SetCodegenFlags(&self, flags: DWORD) -> HRESULT;
        pub fn SetILFunctionBody(
            &self,
            cbNewILMethodHeader: ULONG,
            pbNewILMethodHeader: LPCBYTE,
        ) -> HRESULT;
        pub fn SetILInstrumentedCodeMap(
            &self,
            cILMapEntries: ULONG,
            rgILMapEntries: *const COR_IL_MAP,
        ) -> HRESULT;
    }
}
```

The order of functions and the unique identifier must match the COM interface definition for ICorProfilerFunctionControl. The interfaces! macro takes care of generating the vtable structures and function pointers that form the COM machinery. The generated ICorProfilerFunctionControl struct can then be augmented to create a more idiomatic Rust API for calling COM functions; here, a pointer and length are determined from a slice of bytes and used to set the IL instructions of a function:

```rust
impl ICorProfilerFunctionControl {
    pub fn set_il_function_body(&self, new_method: &[u8]) -> Result<(), HRESULT> {
        let len = new_method.len() as ULONG;
        let ptr = new_method.as_ptr();
        let hr = unsafe { self.SetILFunctionBody(len, ptr) };
        if FAILED(hr) {
            Err(hr)
        } else {
            Ok(())
        }
    }
}
```

Method rewriting

When the profiler is initialized, a file is read that contains details about the methods of interest for instrumentation. These are identified by assembly name, version, method name and signature.

Upon initialization, the profiler subscribes to module load events. When a module load has finished, information about the module is retrieved and checked against the collection of methods of interest. If methods of interest are found within the module, their identifiers are sent to a separate thread in which a request to reJIT compile the method is made.

At the point at which the runtime will reJIT a method, the runtime calls the profiler's ICorProfilerCallback4::GetReJITParameters implementation, allowing the profiler the opportunity to rewrite the IL instructions of the method.

In its simplest form, rewriting IL instructions for the purposes of APM instrumentation requires:

  1. Inserting a call instruction before the original instructions, to a _begin_ method in a managed assembly that is shipped with the profiler. The _begin_ method starts an APM transaction or span, and is passed the method arguments so that it can pull information from them and set properties on the trace.
  2. Enclosing the original instructions in a try/catch exception handling block, to capture any thrown exception as part of the APM transaction or span.
  3. Capturing the return value of the original instructions, to return at the end of the rewritten instructions.
  4. Inserting a call instruction after the original instructions, to an _end_ method in a managed assembly that is shipped with the profiler. The _end_ method ends the APM transaction or span.
  5. Returning the return value, if there is one.
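The steps above can be sketched abstractly. In this simplified Rust model, instructions are plain strings rather than real IL, and the begin/end calls stand in for the managed helper methods shipped with the profiler; the real profiler manipulates parsed IL instructions and exception handling clauses:

```rust
// Simplified model of the five rewrite steps. Instructions are plain
// strings rather than real IL — purely illustrative.
fn rewrite(original: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    // 1. call the begin helper to start the transaction/span
    out.push("call BeginMethod".to_string());
    // 2. enclose the original body in exception handling
    out.push(".try {".to_string());
    for instr in original {
        if *instr == "ret" {
            // 3. capture the return value instead of returning directly
            out.push("stloc ret_val".to_string());
            out.push("leave end".to_string());
        } else {
            out.push(instr.to_string());
        }
    }
    out.push("} .finally {".to_string());
    // 4. call the end helper to end the transaction/span
    out.push("call EndMethod".to_string());
    out.push("endfinally }".to_string());
    // 5. return the captured return value
    out.push("end: ldloc ret_val".to_string());
    out.push("ret".to_string());
    out
}
```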

Referring back to the Kafka example, the producer.ProduceAsync() instructions are rewritten from:

```
2542064073096   ldloca.s 0
2542064073098       call 167772699  | System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Create()
2542064073103      stfld 167772700
2542064073108   ldloca.s 0
2542064073110    ldarg.0
2542064073111      stfld 167772701
2542064073116   ldloca.s 0
2542064073118    ldarg.1
2542064073119      stfld 167772702
2542064073124   ldloca.s 0
2542064073126    ldarg.2
2542064073127      stfld 167772703
2542064073132   ldloca.s 0
2542064073134    ldarg.3
2542064073135      stfld 167772704
2542064073140   ldloca.s 0
2542064073142  ldc.i4.m1
2542064073143      stfld 167772705
2542064073148   ldloca.s 0
2542064073150     ldflda 167772700
2542064073155   ldloca.s 0
2542064073157       call 721420323  | System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start(1 argument{s})
2542064073162   ldloca.s 0
2542064073164     ldflda 167772700
2542064073169       call 167772707  | System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.get_Task()
2542064073174        ret
```

to

```
.try {
  .try {
    .try {
      2542064073096       call 721420425  | Elastic.Apm.Profiler.Managed.CallTarget.CallTargetInvoker.GetDefaultValue()
      2542064073101    stloc.1
      2542064073102       call 167773072  | Elastic.Apm.Profiler.Managed.CallTarget.CallTargetReturn`1.GetDefault()
      2542064073107    stloc.3
      2542064073108     ldnull
      2542064073109    stloc.2
      2542064073110    ldarg.0
      2542064073111    ldarg.1
      2542064073112    ldarg.2
      2542064073113    ldarg.3
      2542064073114       call 721420426  | Elastic.Apm.Profiler.Managed.CallTarget.CallTargetInvoker.BeginMethod(4 argument{s})
      2542064073119    stloc.s 4
      2542064073121    leave.s 2542064073130
    }
    .catch {
      2542064073123       call 721420427  | Elastic.Apm.Profiler.Managed.CallTarget.CallTargetInvoker.LogException(1 argument{s})
      2542064073128    leave.s 2542064073130
    }
    2542064073130   ldloca.s 0
    2542064073132       call 167772699  | System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Create()
    2542064073137      stfld 167772700
    2542064073142   ldloca.s 0
    2542064073144    ldarg.0
    2542064073145      stfld 167772701
    2542064073150   ldloca.s 0
    2542064073152    ldarg.1
    2542064073153      stfld 167772702
    2542064073158   ldloca.s 0
    2542064073160    ldarg.2
    2542064073161      stfld 167772703
    2542064073166   ldloca.s 0
    2542064073168    ldarg.3
    2542064073169      stfld 167772704
    2542064073174   ldloca.s 0
    2542064073176  ldc.i4.m1
    2542064073177      stfld 167772705
    2542064073182   ldloca.s 0
    2542064073184     ldflda 167772700
    2542064073189   ldloca.s 0
    2542064073191       call 721420323  | System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.Start(1 argument{s})
    2542064073196   ldloca.s 0
    2542064073198     ldflda 167772700
    2542064073203       call 167772707  | System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.get_Task()
    2542064073208    stloc.1
    2542064073209    leave.s 2542064073243
  }
  .catch {
    2542064073211    stloc.2
    2542064073212    rethrow
  }
}
.finally {
  .try {
    2542064073214    ldarg.0
    2542064073215    ldloc.1
    2542064073216    ldloc.2
    2542064073217    ldloc.s 4
    2542064073219       call 721420428  | Elastic.Apm.Profiler.Managed.CallTarget.CallTargetInvoker.EndMethod(4 argument{s})
    2542064073224    stloc.3
    2542064073225   ldloca.s 3
    2542064073227       call 167773076  | Elastic.Apm.Profiler.Managed.CallTarget.CallTargetReturn`1.GetReturnValue()
    2542064073232    stloc.1
    2542064073233    leave.s 2542064073242
  }
  .catch {
    2542064073235       call 721420427  | Elastic.Apm.Profiler.Managed.CallTarget.CallTargetInvoker.LogException(1 argument{s})
    2542064073240    leave.s 2542064073242
  }
  2542064073242 endfinally
}
2542064073243    ldloc.1
2542064073244        ret
```

Since a profiler retrieves the IL instructions as a pointer to a collection of bytes, the first step is to parse the original method bytes into a collection of instructions that can be manipulated. The instruction set supported by the .NET runtime is documented in the ECMA-335 Common Language Infrastructure (CLI) specification. Once parsed, the instructions can be rewritten. The rules for rewriting when new instructions are added are not well documented, but they are essentially as follows.

For a _tiny_ method:

  1. The method header code size must be updated with the length of the new instructions. If the new code size is 64 bytes or more, the _tiny_ method must be expanded to a _fat_ method.
  2. If local variables are added, the _tiny_ method must be expanded to a _fat_ method.
  3. If exception handling clauses are added, the _tiny_ method must be expanded to a _fat_ method.

For a _fat_ method:

  1. The method header code size must be updated with the length of the new instructions.
  2. The method header max stack must be updated with the stack size of the new instructions.
  3. If there are exception handling clauses, the offset and length must be updated for each clause that is affected by the offset at which new instructions are inserted. If the exception handling clause is a _small_ version, and the offset or length overflows the size of the offset and length allowed for a _small_ exception handling clause, it must be expanded to a _fat_ exception handling clause.
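In ECMA-335 (II.25.4) terms, the tiny-versus-fat decision reduces to a simple predicate. The sketch below is illustrative rather than the profiler's actual code:

```rust
// ECMA-335 II.25.4: a method body keeps the one-byte tiny header only
// while the code is under 64 bytes, no local variables are declared, there
// are no exception handling clauses, and the max stack depth is at most 8.
// Otherwise the 12-byte fat header is required.
fn fits_tiny_header(code_size: usize, has_locals: bool, has_eh_clauses: bool, max_stack: u16) -> bool {
    code_size < 64 && !has_locals && !has_eh_clauses && max_stack <= 8
}

// The tiny header packs the code size into the top six bits of a single
// byte, with the low two bits set to 0x2 (CorILMethod_TinyFormat).
fn tiny_header_byte(code_size: u8) -> u8 {
    debug_assert!(code_size < 64);
    (code_size << 2) | 0x2
}
```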

More generally:

  1. For short branching instructions, the offset must be updated for each instruction that is affected by the offset at which new instructions are inserted. If the newly calculated offset for a short branching instruction overflows the size of the offset that the short branching instruction allows, the short branching instruction must be expanded to its larger branching instruction counterpart. Differences in instruction length must be reflected by updating the method header code size and exception handling clause offsets and lengths.
  2. For branching instructions, the offset must be updated for each instruction that is affected by the offset at which new instructions are inserted.
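For example, a short branch carries a signed 8-bit offset, so once an insertion pushes a target out of that range, the opcode must be swapped for its 4-byte-offset counterpart. A sketch covering a few opcodes from the branch family (opcode values per ECMA-335):

```rust
// A short branch stores its target as a signed 8-bit offset relative to
// the next instruction. If inserting new instructions pushes a target
// outside that range, the short opcode must be expanded to the long form,
// which carries a 32-bit offset.
fn fits_short_branch(offset: i32) -> bool {
    offset >= i8::MIN as i32 && offset <= i8::MAX as i32
}

// A few short-to-long branch opcode mappings (values per ECMA-335).
fn expand_short_branch(opcode: u8) -> Option<u8> {
    match opcode {
        0x2B => Some(0x38), // br.s      -> br
        0x2C => Some(0x39), // brfalse.s -> brfalse
        0x2D => Some(0x3A), // brtrue.s  -> brtrue
        0xDE => Some(0xDD), // leave.s   -> leave
        _ => None,
    }
}
```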

Once rewritten, a pointer to the bytes of the new instructions is passed back to the runtime to reJIT compile. As you can see, there's quite a bit to it, but thankfully it's all taken care of by the profiler.
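As a flavour of the parsing step mentioned earlier, here is a minimal sketch that decodes a few of the single-byte opcodes defined in ECMA-335 Partition III; a real parser also handles the two-byte (0xFE-prefixed) opcodes, inline operands of varying sizes, and prefix instructions:

```rust
// Decode a few of the single-byte CIL opcodes from ECMA-335 Partition III.
// A real parser also handles two-byte (0xFE-prefixed) opcodes, inline
// operands, and prefix instructions.
fn opcode_name(byte: u8) -> Option<&'static str> {
    match byte {
        0x00 => Some("nop"),
        0x02 => Some("ldarg.0"),
        0x03 => Some("ldarg.1"),
        0x04 => Some("ldarg.2"),
        0x05 => Some("ldarg.3"),
        0x2A => Some("ret"),
        _ => None,
    }
}
```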

Summary

We're keen to hear feedback on the profiler auto instrumentation public beta. Please try it out by downloading the elastic_apm_profiler_*.zip from the GitHub release page, and let us know your thoughts, particularly if you are using technologies that are supported by the currently available instrumentations.

We're excited about the future capabilities that this approach can provide; it opens up the potential to allow customers to trace their own code without code changes, similar to the APM Java agent's trace methods feature, and to deploy and configure .NET auto instrumentation on target hosts with Elastic Agent.

A special thank you to Microsoft for open sourcing the .NET runtime, and to the open source profiler implementations that have shared knowledge of profiling internals. The Elastic APM profiler auto instrumentation is released under the Apache 2.0 license.