Understanding APM: How to add extensions to the OpenTelemetry Java Agent


Without code access, SREs and IT Operations cannot always get the visibility they need

As an SRE, have you ever had a situation where you were working on an application that was written with non-standard frameworks, or you wanted to get some interesting business data from an application (number of orders processed for example) but you didn’t have access to the source code? 

We all know this can be a challenging scenario resulting in visibility gaps, inability to fully trace code end to end, and missing critical business monitoring data that is useful for understanding the true impact of issues. 

How can we solve this? We discussed one approach in the following three blogs:

In those blogs, we developed a plugin for the Elastic® APM Agent to get access to critical business data for monitoring and to add tracing where none existed.
  
What we will discuss in this blog is how you can do the same with the OpenTelemetry Java Agent using the Extensions framework.

Basic concepts: How APM works

Before we continue, let's first understand a few basic concepts and terms.

  • Java Agent: This is a tool that can be used to instrument (or modify) the bytecode of class files in the Java Virtual Machine (JVM). Java agents are used for many purposes like performance monitoring, logging, security, and more.
  • Bytecode: This is the intermediary code generated by the Java compiler from your Java source code. This code is interpreted or compiled on the fly by the JVM to produce machine code that can be executed.
  • Byte Buddy: Byte Buddy is a code generation and manipulation library for Java. It is used to create, modify, or adapt Java classes at runtime. In the context of a Java Agent, Byte Buddy provides a powerful and flexible way to modify bytecode. Both the Elastic APM Agent and the OpenTelemetry Agent use Byte Buddy under the covers.

Now, let's talk about how automatic instrumentation works with Byte Buddy:

Automatic instrumentation is the process by which an agent modifies the bytecode of your application's classes, often to insert monitoring code. The agent doesn't modify the source code directly, but rather the bytecode that is loaded into the JVM. This is done while the JVM is loading the classes, so the modifications are in effect during runtime.

Here's a simplified explanation of the process:

  1. Start the JVM with the agent: When starting your Java application, you specify the Java agent with the -javaagent command line option. This instructs the JVM to load your agent before the main method of your application is invoked. At this point, the agent has the opportunity to set up class transformers.

  2. Register a class file transformer with Byte Buddy: Your agent will register a class file transformer with Byte Buddy. A transformer is a piece of code that is invoked every time a class is loaded into the JVM. This transformer receives the bytecode of the class and it can modify this bytecode before the class is actually used.

  3. Transform the bytecode: When your transformer is invoked, it will use Byte Buddy's API to modify the bytecode. Byte Buddy allows you to specify your transformations in a high-level, expressive way rather than manually writing complex bytecode. For example, you could specify a certain class and method within that class that you want to instrument and provide an "interceptor" that will add new behavior to that method.

    1. For instance, let's say you want to measure the execution time of a method. You would instruct Byte Buddy to target the specific class and method and then provide an interceptor that wraps the method call with timing code. Every time this method is invoked, your interceptor is called first, measures the start time, then it calls the original method, and finally measures the end time and prints the duration.

  4. Use the transformed classes: Once the agent has set up its transformers, the JVM continues to load classes as usual. Each time a class is loaded, your transformers are invoked, allowing them to modify the bytecode. Your application then uses these transformed classes as if they were the original ones, but they now have the extra behavior that you've injected through your interceptor.
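The steps above can be sketched with nothing but the JDK's java.lang.instrument API, which is the layer Byte Buddy hooks into. This is a minimal sketch and not the OpenTelemetry agent's code: the class names AgentSketch and TargetClassTransformer are made up for illustration, and the transformer only reports the class it would rewrite instead of changing any bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical transformer: watches for one class and leaves every class unchanged.
class TargetClassTransformer implements ClassFileTransformer {
    // The JVM passes class names in internal form, with '/' instead of '.'.
    private final String targetInternalName;
    private final AtomicInteger matches = new AtomicInteger();

    TargetClassTransformer(String targetClassName) {
        this.targetInternalName = targetClassName.replace('.', '/');
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (targetInternalName.equals(className)) {
            matches.incrementAndGet();
            System.out.println("Would instrument " + className
                    + " (" + classfileBuffer.length + " bytes)");
            // A real agent would return rewritten bytecode here, typically produced by Byte Buddy.
        }
        // Returning null tells the JVM to keep the original bytecode.
        return null;
    }

    // Number of times the target class was seen by this transformer.
    int matchCount() {
        return matches.get();
    }
}

// Hypothetical agent entry point; the agent jar's manifest would name it in Premain-Class.
class AgentSketch {
    // Called by the JVM before the application's main method when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        instrumentation.addTransformer(new TargetClassTransformer("org.davidgeorgehope.Main"));
    }
}
```

Byte Buddy spares you from producing the rewritten bytes by hand: you describe which classes and methods to match, and it generates the new bytecode for you.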


In essence, automatic instrumentation with Byte Buddy is about modifying the behavior of your Java classes at runtime, without needing to alter the source code directly. This is especially useful for cross-cutting concerns like logging, monitoring, or security, as it allows you to centralize this code in your Java Agent, rather than scattering it throughout your application.
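To make the interceptor idea from the timing example concrete without pulling in Byte Buddy, here is a sketch that uses the JDK's dynamic proxies (java.lang.reflect.Proxy) instead of bytecode rewriting. It is a different technique that works only through interfaces, but the shape is the same: the interceptor runs first, calls the original method, then measures the elapsed time. The WordCounter interface and the TimingInterceptor class are hypothetical names for this illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical interface standing in for the code we want to observe.
interface WordCounter {
    int countWords(String input);
}

// Interceptor that wraps every call to the target with timing code.
class TimingInterceptor implements InvocationHandler {
    private final WordCounter target;
    private long lastDurationNanos = -1;

    TimingInterceptor(WordCounter target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            // Call the original method on the real object.
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // Rethrow whatever the original method threw.
            throw e.getCause();
        } finally {
            lastDurationNanos = System.nanoTime() - start;
            System.out.println(method.getName() + " took " + lastDurationNanos + " ns");
        }
    }

    // Duration of the most recent intercepted call, or -1 if none has happened yet.
    long lastDurationNanos() {
        return lastDurationNanos;
    }

    // Returns a WordCounter whose calls all pass through this interceptor.
    WordCounter asWordCounter() {
        return (WordCounter) Proxy.newProxyInstance(
                WordCounter.class.getClassLoader(),
                new Class<?>[] {WordCounter.class},
                this);
    }
}
```

Calling new TimingInterceptor(realCounter).asWordCounter() gives back an object that behaves like the original but prints a duration for every call. An agent gets the same effect on classes that implement no interface at all, because it changes the bytecode of the method itself.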

Application, prerequisites, and config

There is a really simple application in this GitHub repository that is used throughout this blog. It simply asks you to input some text and then counts the number of words.

It’s also listed below:

package org.davidgeorgehope;
import java.util.Scanner;
import java.util.logging.Logger;

public class Main {
    private static Logger logger = Logger.getLogger(Main.class.getName());

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        while (true) {
            System.out.println("Please enter your sentence:");
            String input = scanner.nextLine();
            Main main = new Main();
            int wordCount = main.countWords(input);
            System.out.println("The input contains " + wordCount + " word(s).");
        }
    }
    public int countWords(String input) {

        try {
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        if (input == null || input.isEmpty()) {
            return 0;
        }

        String[] words = input.split("\\s+");
        return words.length;
    }
}

For the purposes of this blog, we will be using Elastic Cloud to capture the data generated by OpenTelemetry — follow the instructions here to get started on Elastic Cloud.

Once you are started with Elastic Cloud, go grab the OpenTelemetry config from the APM pages:


You will need this later. 

Finally, download the OpenTelemetry Agent.

Firing up the application and OpenTelemetry

Start with this simple application: build it, then run it with the OpenTelemetry Agent as shown below, filling in the appropriate variables with the values you got earlier.

java -javaagent:opentelemetry-javaagent.jar -Dotel.exporter.otlp.endpoint=XX -Dotel.exporter.otlp.headers=XX -Dotel.metrics.exporter=otlp -Dotel.logs.exporter=otlp -Dotel.resource.attributes=XX -Dotel.service.name=your-service-name -jar simple-java-1.0-SNAPSHOT.jar

You will find nothing happens. The reason for this is that the OpenTelemetry Agent has no way of knowing what to monitor. The way that APM with automatic instrumentation works is that it “knows” about standard frameworks, like Spring or HTTPClient, and is able to get visibility by “injecting” trace code into those standard frameworks automatically.

It has no knowledge of org.davidgeorgehope.Main from our simple Java application.

Luckily, there is a way we can add this using the OpenTelemetry Extensions framework.

The OpenTelemetry Extension

In the repository above, aside from the simple-java application, there is also a plugin for Elastic APM and an extension for OpenTelemetry. The relevant files for the OpenTelemetry Extension are located here: WordCountInstrumentation.java and WordCountInstrumentationModule.java.

You’ll notice that OpenTelemetry Extensions and Elastic APM Plugins both make use of Byte Buddy, which is a common library for code instrumentation. There are some key differences in the way the code is bootstrapped, though.

The WordCountInstrumentationModule class extends an OpenTelemetry-specific class, InstrumentationModule, whose purpose is to describe a set of TypeInstrumentation classes that need to be applied together to correctly instrument a specific library. The WordCountInstrumentation class is one such TypeInstrumentation.

Type instrumentations grouped in a module share helper classes, muzzle runtime checks, and applicable class loader criteria, and can only be enabled or disabled as a set. 

This is a little different from how the Elastic APM Plugin works, because OpenTelemetry injects advice code inline by default, and you can inject dependencies into the core application classloader using the InstrumentationModule configuration (as shown below). The Elastic APM method is safer because it isolates helper classes and makes debugging with normal IDEs easier; we are contributing this method to OpenTelemetry. Here we inject the TypeInstrumentation class and the WordCountInstrumentation class into the classloader.

    @Override
    public List<String> getAdditionalHelperClassNames() {
        return List.of(WordCountInstrumentation.class.getName(),"io.opentelemetry.javaagent.extension.instrumentation.TypeInstrumentation");
    }

The other interesting part of the InstrumentationModule class is the setup.

Here we give our instrumentation “group” a name. An InstrumentationModule needs to have at least one name. The user of the javaagent can suppress a chosen instrumentation by referring to it by one of its names. The instrumentation module names use kebab-case. 

    public WordCountInstrumentationModule() {
        super("wordcount-demo", "wordcount");
    }

Apart from this, we see methods in this class to specify the order in which this module is loaded relative to other instrumentation if needed, and to specify the class that implements TypeInstrumentation and is responsible for the bulk of the instrumentation work.

Let's now take a look at the WordCountInstrumentation class, which implements TypeInstrumentation:

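// The WordCountInstrumentation class implements the TypeInstrumentation interface.
// This allows us to specify which types of classes (based on some matching criteria) will have their methods instrumented.

import static net.bytebuddy.matcher.ElementMatchers.namedOneOf;

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.javaagent.extension.instrumentation.TypeInstrumentation;
import io.opentelemetry.javaagent.extension.instrumentation.TypeTransformer;
import java.util.logging.Logger;
import net.bytebuddy.asm.Advice;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;
import net.bytebuddy.matcher.ElementMatchers;

public class WordCountInstrumentation implements TypeInstrumentation {

    // Logger used by the debug messages below (assumed here to be java.util.logging, as in the sample application).
    private static final Logger logger = Logger.getLogger(WordCountInstrumentation.class.getName());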

    // The typeMatcher method is used to define which classes the instrumentation should apply to.
    // In this case, it's the "org.davidgeorgehope.Main" class.
    @Override
    public ElementMatcher<TypeDescription> typeMatcher() {
        logger.info("TEST typeMatcher");
        return ElementMatchers.named("org.davidgeorgehope.Main");
    }

    // In the transform method, we specify which methods of the classes matched above will be instrumented, 
    // and also the advice (a piece of code) that will be added to these methods.
    @Override
    public void transform(TypeTransformer typeTransformer) {
        logger.info("TEST transform");
        typeTransformer.applyAdviceToMethod(namedOneOf("countWords"),this.getClass().getName() + "$WordCountAdvice");
    }

    // The WordCountAdvice class contains the actual pieces of code (advices) that will be added to the instrumented methods.
    @SuppressWarnings("unused")
    public static class WordCountAdvice {
        // This advice is added at the beginning of the instrumented method (OnMethodEnter).
        // It creates and starts a new span, and makes it active.
        @Advice.OnMethodEnter(suppress = Throwable.class)
        public static Scope onEnter(@Advice.Argument(value = 0) String input, @Advice.Local("otelSpan") Span span) {
            // Get a Tracer instance from OpenTelemetry.
            Tracer tracer = GlobalOpenTelemetry.getTracer("instrumentation-library-name","semver:1.0.0");
            System.out.print("Entering method");

            // Start a new span with the name "mySpan".
            span = tracer.spanBuilder("mySpan").startSpan();

            // Make this new span the current active span.
            Scope scope = span.makeCurrent();

            // Return the Scope instance. This will be used in the exit advice to end the span's scope.
            return scope; 
        }

        // This advice is added at the end of the instrumented method (OnMethodExit).
        // It first closes the span's scope, then checks if any exception was thrown during the method's execution.
        // If an exception was thrown, it sets the span's status to ERROR and ends the span.
        // If no exception was thrown, it sets a custom attribute "wordCount" on the span, and ends the span.
        @Advice.OnMethodExit(onThrowable = Throwable.class, suppress = Throwable.class)
        public static void onExit(@Advice.Return(readOnly = false) int wordCount,
                                  @Advice.Thrown Throwable throwable,
                                  @Advice.Local("otelSpan") Span span,
                                  @Advice.Enter Scope scope) {
            // Close the scope to end it.
            scope.close();

            // If an exception was thrown during the method's execution, set the span's status to ERROR.
            if (throwable != null) {
                span.setStatus(StatusCode.ERROR, "Exception thrown in method");
            } else {
                // If no exception was thrown, set a custom attribute "wordCount" on the span.
                span.setAttribute("wordCount", wordCount);
            }

            // End the span. This makes it ready to be exported to the configured exporter (e.g. Elastic).
            span.end();
        }
    }
}

The target class for our instrumentation is defined in the typeMatcher method, and the method we want to instrument is defined in the transform method. We are targeting the Main class and the countWords method.

As you can see, we have an inner class here that does most of the work of defining an onEnter and onExit method, which tells us what to do when we enter the countWords method and when we exit the countWords method. 

In the onEnter method, we set up a new OpenTelemetry span, and in the onExit method, we end the span. If the method ends successfully, we also grab the word count and add it to the span as the wordCount attribute.
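Conceptually, once the advice is applied, countWords behaves roughly like the hand-written method below. This is a plain-Java sketch of the enter/exit flow only: SketchSpan is a hypothetical stand-in for an OpenTelemetry span so that the example runs without the agent, AdvisedWordCounter is an invented name, and the code is not what Byte Buddy literally generates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an OpenTelemetry Span; it only records what happened to it.
class SketchSpan {
    final String name;
    final Map<String, Object> attributes = new HashMap<>();
    boolean error;
    boolean ended;

    SketchSpan(String name) {
        this.name = name;
    }
}

class AdvisedWordCounter {
    // The most recent span, kept so that callers can inspect it.
    SketchSpan lastSpan;

    // What countWords looks like, in spirit, after the enter and exit advice are applied.
    int countWords(String input) {
        // Enter advice: start a span before the original body runs.
        SketchSpan span = new SketchSpan("mySpan");
        lastSpan = span;
        try {
            int wordCount = originalCountWords(input);
            // Exit advice, normal return: record the return value as an attribute.
            span.attributes.put("wordCount", wordCount);
            return wordCount;
        } catch (Throwable t) {
            // Exit advice, exceptional return: flag the span, then let the exception propagate.
            span.error = true;
            throw t;
        } finally {
            // In both cases the span is ended so that it can be exported.
            span.ended = true;
        }
    }

    // The original method body from the sample application, minus the sleep.
    private int originalCountWords(String input) {
        if (input == null || input.isEmpty()) {
            return 0;
        }
        return input.split("\\s+").length;
    }
}
```

The point of the sketch is that the application code itself never changes: the enter and exit logic live in the extension, and the agent weaves them around the original method body when the class is loaded.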

Now let's take a look at what happens when we run this. The good news is that we have made this extremely simple by providing a Dockerfile that does all the work for you.

Pulling this all together

Clone the GitHub repository if you have not already done so. Before continuing, let’s take a quick look at the Dockerfile we are using.

# Build stage
FROM maven:3.8.7-openjdk-18 as build

COPY simple-java /home/app/simple-java
COPY opentelemetry-custom-instrumentation /home/app/opentelemetry-custom-instrumentation

WORKDIR /home/app/simple-java
RUN mvn install

WORKDIR /home/app/opentelemetry-custom-instrumentation
RUN mvn install

# Package stage
FROM maven:3.8.7-openjdk-18
COPY --from=build /home/app/simple-java/target/simple-java-1.0-SNAPSHOT.jar /usr/local/lib/simple-java-1.0-SNAPSHOT.jar
COPY --from=build /home/app/opentelemetry-custom-instrumentation/target/opentelemetry-custom-instrumentation-1.0-SNAPSHOT.jar /usr/local/lib/opentelemetry-custom-instrumentation-1.0-SNAPSHOT.jar

WORKDIR /

RUN curl -L -o opentelemetry-javaagent.jar https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

COPY start.sh /start.sh
RUN chmod +x /start.sh

ENTRYPOINT ["/start.sh"]

This Dockerfile works in two parts: during the docker build process, we build the simple-java application from source, followed by the custom instrumentation. After this, we download the latest OpenTelemetry Java Agent. At runtime, we simply execute the start.sh file described below:

#!/bin/sh
java \
-javaagent:/opentelemetry-javaagent.jar \
-Dotel.exporter.otlp.endpoint=${SERVER_URL} \
-Dotel.exporter.otlp.headers="Authorization=Bearer ${SECRET_KEY}" \
-Dotel.metrics.exporter=otlp \
-Dotel.logs.exporter=otlp \
-Dotel.resource.attributes=service.name=simple-java,service.version=1.0,deployment.environment=production \
-Dotel.service.name=your-service-name \
-Dotel.javaagent.extensions=/usr/local/lib/opentelemetry-custom-instrumentation-1.0-SNAPSHOT.jar \
-Dotel.javaagent.debug=true \
-jar /usr/local/lib/simple-java-1.0-SNAPSHOT.jar

There are two important things to note with this script. The first is the -javaagent parameter, which is set to opentelemetry-javaagent.jar: this starts the OpenTelemetry Java agent before any application code is executed.

Inside this jar there has to be a class with a premain method, which the JVM looks for and uses to bootstrap the Java agent. As described above, every class that is loaded is then passed through the agent's transformers, so the agent can modify its bytecode before it is executed.

The second important thing here is the configuration of the javaagent.extensions, which loads our extension that we built to add instrumentation for our simple-java application.

Now run the following commands:

docker build -t djhope99/custom-otel-instrumentation:1 .
docker run -it -e 'SERVER_URL=XXX' -e 'SECRET_KEY=XX' djhope99/custom-otel-instrumentation:1

If you use the SERVER_URL and SECRET_KEY you got earlier in here, you should see this connect to Elastic.

When it starts up, it will ask you to enter a sentence. Type one and press Enter, then repeat this a few times. The application sleeps for 10 seconds on each request to force a long-running transaction.

Eventually you will see the service show up in the service map:


Traces will appear:


And in the span you will see the wordcount attribute we collected:


This can be used for further dashboarding and AI/ML, including anomaly detection if you need it. This is easy to do, as you can see below.

First, click the hamburger menu on the left side and select Dashboard to create a new dashboard:


From here, click Create Visualization.


Search for the wordcount label in the APM index as shown below:


As you can see, because we set this attribute in the span code with an integer value, as in the line below, it was automatically mapped as a numeric field in Elastic:

span.setAttribute("wordCount", wordCount);
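One detail worth knowing about the type: the Span interface offers setAttribute overloads for String, long, double, and boolean values, so the int returned by countWords is widened to long at compile time. The sketch below mimics that with a hypothetical AttributeRecorder class (it is not part of OpenTelemetry) so that it runs on its own and shows which overload the compiler picks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical recorder that mirrors the four value types of the Span.setAttribute overloads.
class AttributeRecorder {
    // Maps each attribute key to the name of the overload that stored it.
    final Map<String, String> typesByKey = new LinkedHashMap<>();

    AttributeRecorder setAttribute(String key, String value) {
        typesByKey.put(key, "string");
        return this;
    }

    AttributeRecorder setAttribute(String key, long value) {
        typesByKey.put(key, "long");
        return this;
    }

    AttributeRecorder setAttribute(String key, double value) {
        typesByKey.put(key, "double");
        return this;
    }

    AttributeRecorder setAttribute(String key, boolean value) {
        typesByKey.put(key, "boolean");
        return this;
    }
}

class AttributeTypeDemo {
    // Records the same count twice, once as a number and once as text.
    static Map<String, String> record() {
        int wordCount = 3;
        AttributeRecorder recorder = new AttributeRecorder();
        recorder.setAttribute("wordCount", wordCount);                     // int widens to long
        recorder.setAttribute("wordCountText", String.valueOf(wordCount)); // stored as a string
        return recorder.typesByKey;
    }
}
```

If the count were passed as a string instead, it would most likely arrive as a text value rather than a number, which makes it harder to aggregate in a visualization.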

From here we can drag and drop it into the visualization for display on our Dashboard! Super easy.


In conclusion

This blog showed how the OpenTelemetry Java Agent can fill visibility gaps and capture crucial business monitoring data, especially when you do not have access to the source code.

We covered the basics of Java agents, bytecode, and Byte Buddy, and then walked through how automatic instrumentation with Byte Buddy works.

We then used the Extensions framework of the OpenTelemetry Java Agent on a simple Java application to show how the agent can inject trace code into an application to facilitate monitoring.

Finally, we showed how to configure the agent, load the OpenTelemetry Extension, and run the sample application so that you can see these ideas in practice. We hope this is a useful resource for SREs and IT Operations teams working with applications that use OpenTelemetry's automatic instrumentation.

Don’t have an Elastic Cloud account yet? Sign up for Elastic Cloud.
