Create your own instrumentation with the Java Agent Plugin

The Elastic APM (application performance monitoring) Java Agent automatically instruments many frameworks and technologies to trace and monitor the performance of applications and track errors. Developers can also add code to instrument their own applications using the OpenTelemetry API (more on that below).

But if you can’t or don’t want to add code, and the technology you want to monitor is not automatically supported, there’s another option: use the Elastic APM Java Agent Plugin API to create your own instrumentation that runs within the Java Agent.

In this article, I’ll step through a straightforward example to show you how easy it is to create your own instrumentation that will run when both the Java Agent and the targeted technology are loaded in your application. The repo holding the example is available to use alongside this article, so you can read, run, or modify the code, whichever way you learn best.

Agent basics

If you already know about or use Elastic APM and the Java Agent, skip to the next section. For everyone else, I’ll quickly cover the components of our overall architecture.

To start, you’ll have a Java application running somewhere within a larger system. By adding in Elastic APM, you’ll have an Elastic APM server installation (probably in the cloud), and your pre-existing Java application will now run with the Java Agent attached to it. The Java Agent communicates with the Elastic APM server.

If you’d like to get into the details of running your Java application with Elastic APM, this article is a great starting point.

Our example Java application

For this walkthrough, we need an example application to instrument. I'll use a custom webserver: it is non-trivial enough to be realistic, but deliberately kept very simple so that it’s easy to understand (though that leaves it quite limited). You can run it with:

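    // Create the example webserver from the repo
    var server = new ExampleBasicHttpServer();
    // Run start() on its own thread so the calling thread can continue
    new Thread(() -> {
        try {
            server.start();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }).start();
    // Wait for the server to be ready before asking for its port
    server.blockUntilReady();
    System.out.println("WebServer started on port " + server.getLocalPort());
    System.out.println("Check at http://localhost:" + server.getLocalPort());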

The Elastic APM Java Agent automatically instruments the webserver included with the JDK. We have an equivalent (though more fully featured) webserver in the repo using that, so you can do a direct like-for-like comparison of all aspects of the plugin (implementation, outputs).

Choosing what to instrument

Our aim is to add specific instrumentation to monitor our example application. To select what to instrument, either go into the source code (preferred, if you have the sources available) or examine stack traces and find the methods where the application can provide useful monitoring information. 

You can instrument technology that you don’t have sources for, but this is more difficult than if you have the source code. Instrumenting requires choosing which methods and fields let you “hook” in to monitor the technology. You want the instrumentation to provide span duration and child spans from callouts, as well as any contextual metadata useful for tracing issues.

For our ExampleBasicHttpServer, there is a private void handleRequest(String, BufferedReader, PrintWriter) method, which does the interesting work of taking an incoming HTTP request, processing it, and producing the resulting HTML page for the browser. That looks like the ideal method to track what is happening in the application for any request flow. Request handling entry and exit points are often good places to target instrumentation.

Using the Plugin/OpenTelemetry API

The Elastic APM Java Agent Plugin API uses the OpenTelemetry API to keep the learning load minimal (or optimal, depending on how you look at it). This means OpenTelemetry examples of adding tracing to code are completely applicable. Using the OpenTelemetry documentation for our example, we'd wrap the handleRequest() method call in a Span. If you did that directly in the code, it would look like this:

Span span = tracer.spanBuilder(request).startSpan();
try (Scope scope = span.makeCurrent()) {
  handleRequest(request, ... 
} catch (Throwable thrown) {
  span.setStatus(StatusCode.ERROR);
  span.recordException(thrown);
  throw thrown;
} finally {
  span.end();
}

If we were going to alter the code directly in the application, this is exactly the code we would add and we’d be done — the Elastic APM Java Agent (and any other OpenTelemetry compatible agent) would pick up the manual tracing we’ve added. 

But in this article, we’re assuming we can’t or don’t want to directly change the code (e.g., for maintainability/separation of concerns and responsibilities). Below, we’ll use the above wrapping code as our template for how to instrument the handleRequest() method. (Note that on exiting the try(Scope ...) block, the Scope resource is closed — that’s the contract of try(resource) blocks.)
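
The closing order of a try(resource) block matters when we later split this template into separate enter and exit steps. The following is a minimal sketch using only the JDK: RecordingScope is a hypothetical stand-in for the OpenTelemetry Scope (it is not part of any library), and it simply records when each part of the block runs.

```java
import java.util.ArrayList;
import java.util.List;

final class ScopeOrderDemo {

    /** Hypothetical stand-in for an OpenTelemetry Scope: it only records when it is closed. */
    static final class RecordingScope implements AutoCloseable {
        private final List<String> events;

        RecordingScope(List<String> events) {
            this.events = events;
        }

        @Override
        public void close() {
            events.add("scope closed");
        }
    }

    /** Runs a body that may fail and returns the order in which each part ran. */
    static List<String> run(boolean fail) {
        List<String> events = new ArrayList<>();
        try (RecordingScope scope = new RecordingScope(events)) {
            events.add("body");
            if (fail) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            events.add("catch");
        } finally {
            events.add("finally");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [body, scope closed, finally]
        System.out.println(run(true));  // [body, scope closed, catch, finally]
    }
}
```

In both cases the resource is closed before the catch and finally blocks run, which is why the wrapping code above can still record the exception on the span and end it after the scope has gone.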

The full instrumentation specification

Here’s how to do the equivalent of manually adding the above wrapping code in to the application:

  1. Find the class co.elastic.apm.example.webserver.ExampleBasicHttpServer
  2. Within that class, find the method with signature void handleRequest(String request, BufferedReader clientInput, PrintWriter outputToClient) throws IOException
  3. When you find that class and method, create a Span and Scope when you enter the method (i.e., run the following each time on entering the handleRequest() method)

    Span span = tracer.spanBuilder(request).startSpan();
    Scope scope = span.makeCurrent();

  4. After the handleRequest() method finishes running, on exiting the method, capture any exception, then end the span and close the scope (the scope in the code fragment in the previous section is closed as part of exiting the try(resource) block). Record the error before calling span.end(), because changes made to a span after it has ended are ignored.

    if (thrown != null) {
        span.setStatus(StatusCode.ERROR);
        span.recordException(thrown);
    }
    span.end();
    scope.close();

(The implementation in the example class is wrapped by try blocks to be more defensive.)

The full plugin class implementation

There are some additional niceties — for example, for better indexing, keep the cardinality of the span names low. To do that, just use the request path rather than the full request (e.g., both “/nothing” and “/nothing#something” requests would become just “/nothing” in the span name). 
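
As a rough illustration of that idea, here is a small JDK-only sketch that cuts a request down to its path. It is not the code from the repo: the class and method names are made up for this example, it assumes the input is already the request target (such as “/nothing?x=1#something”) rather than a full HTTP request line, and the "unknown" fallback name is my own choice.

```java
final class SpanNames {

    private SpanNames() {
    }

    /**
     * Reduces a request target such as "/nothing?x=1#something" to its path
     * ("/nothing") so that many different requests share one span name.
     */
    static String lowCardinalityName(String request) {
        if (request == null || request.isEmpty()) {
            return "unknown"; // fallback chosen for this sketch only
        }
        int end = request.length();
        int query = request.indexOf('?');
        if (query >= 0) {
            end = Math.min(end, query);
        }
        int fragment = request.indexOf('#');
        if (fragment >= 0) {
            end = Math.min(end, fragment);
        }
        String path = request.substring(0, end);
        return path.isEmpty() ? "/" : path;
    }

    public static void main(String[] args) {
        System.out.println(lowCardinalityName("/nothing"));           // /nothing
        System.out.println(lowCardinalityName("/nothing#something")); // /nothing
    }
}
```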

Essentially, we now have all the implementation ready from the specification in the last section. Now we can construct the actual plugin. There is only one additional item we need to know: the actual implementation needs to extend the ElasticApmInstrumentation abstract class from co.elastic.apm:apm-agent-plugin-sdk. (So, our plugin has three dependencies: the OpenTelemetry API (in compile scope), the Elastic APM Java Agent Plugin API (in provided scope) and, of course, the classes that are being instrumented.)

The full implementation is in the ExampleHttpServerInstrumentation class in the repo. Let’s have a look at some interesting bits we haven’t already covered.

The ElasticApmInstrumentation abstract methods

First, we have three methods that we must implement because they are abstract in the ElasticApmInstrumentation superclass:

  • getTypeMatcher() - match the classes that we want to instrument (step 1 in the instrumentation specification section above)
  • getMethodMatcher() - match the method signatures (in the classes matched in getTypeMatcher()) we want to instrument (step 2 in the instrumentation specification section above)
  • getInstrumentationGroupNames() - names that we can use to disable the instrumentation if we don’t want them applied automatically

The first two methods return Byte Buddy matcher objects. The Elastic APM Java Agent uses Byte Buddy for the low-level byte-code instrumentation (as do the majority of Java APM agents, because Byte Buddy is well designed for agent transformations).

Byte Buddy is a transitive dependency of the Elastic APM Java Agent Plugin API. Using Byte Buddy ElementMatchers is fairly straightforward and very flexible: you can match in several different ways. There are many examples on the web for Byte Buddy ElementMatchers, and it’s also quite easy to test the matching. The matching is easily understandable, too:

    ElementMatchers.named(
        "co.elastic.apm.example.webserver.ExampleBasicHttpServer")

clearly matches the ExampleBasicHttpServer class, and

    ElementMatchers
        .named("handleRequest")
        .and(takesArguments(3))
        .and(takesArgument(0, named("java.lang.String")))

clearly matches any method named “handleRequest”, which has three parameters and where the first parameter is a String.
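
To make the three conditions concrete, here is a plain-reflection analogue that applies the same checks (name, parameter count, type of the first parameter) to a class. This is not Byte Buddy and not how the agent matches; FakeServer is a hypothetical stand-in that only mimics the method signatures, and the second handleRequest overload exists purely to show a non-match.

```java
import java.io.BufferedReader;
import java.io.PrintWriter;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class MatcherAnalogue {

    /** Hypothetical stand-in for the server class; only the signatures matter here. */
    static final class FakeServer {
        private void handleRequest(String request, BufferedReader in, PrintWriter out) { }

        private void handleRequest(int code) { }

        void start() { }
    }

    /** Applies the same three conditions as the matcher above, using reflection. */
    static List<Method> matchingMethods(Class<?> type) {
        List<Method> matches = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            Class<?>[] params = method.getParameterTypes();
            if (method.getName().equals("handleRequest")
                    && params.length == 3
                    && params[0] == String.class) {
                matches.add(method);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Only the three-argument handleRequest(String, ...) overload is selected
        for (Method method : matchingMethods(FakeServer.class)) {
            System.out.println(method);
        }
    }
}
```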

Note that some matchers are expensive, like ones that inspect the class hierarchy. Using heavy matchers can considerably affect startup time and startup heap consumption. Whenever there is no way to avoid heavy matchers, plugin authors should get familiar with other (non-abstract) filtering matchers offered by ElasticApmInstrumentation.

The ElasticApmInstrumentation Advice class

The last overridden method in ExampleHttpServerInstrumentation is getAdviceClassName(). This is not abstract in the superclass, so it has a default implementation. The default implementation would produce exactly the same string that the overridden method does here (we included it to draw attention to the flexibility it provides). The classname it returns can be any class accessible within the plugin jar, though using a static inner class is common because it keeps the code that does the instrumentation close to the code that does the matching that would apply the instrumentation.

The Inner Advice class

We’ve done steps 1 and 2 in the instrumentation specification section above, with the matchers implemented to find the method to instrument. Now we need to do steps 3 and 4 — the steps where we actually instrument the code to alter ExampleBasicHttpServer.handleRequest() so that it is automatically monitored by the agent.

We will again use Byte Buddy features for these two steps. We’ll use the most common pattern of instrumentation, where a method is hooked in to execute code on entry and on exit. The code we want to run is what we defined in those two steps earlier: step 3 (on entering) and step 4 (on exiting).

The Advice On-Enter method

The first step is to tell Byte Buddy that when we enter the matched method (ExampleBasicHttpServer.handleRequest()), we want to run code (before the ExampleBasicHttpServer.handleRequest() body is run). We use the Byte Buddy annotation @Advice.OnMethodEnter to annotate a new “onEnterHandle” method.

    @Advice.OnMethodEnter(suppress = Throwable.class, inline = false)
    public static Object onEnterHandle(
                        @Advice.Argument(0) String requestLine) { ...

Note: 

  • The method being annotated must have a “public static” signature.
  • The method can return an Object, which will be made available to the On-Exit advice by Byte Buddy (see below), or be void.
  • The method name (here “onEnterHandle”) can be any valid method name.
  • The Byte Buddy annotation on the first parameter @Advice.Argument(0) String requestLine says that we want to use the first String parameter of ExampleBasicHttpServer.handleRequest(), in our onEnterHandle() method body, calling that first parameter “requestLine.”
  • The “suppress” parameter used in the annotation means that if any Throwable is thrown while the advice method (onEnterHandle()) runs, it will be suppressed: it will neither be thrown by ExampleBasicHttpServer.handleRequest() nor make that method exit early.
  • The “inline” annotation parameter being “false” means that the code in onEnterHandle() will not be inlined into ExampleBasicHttpServer.handleRequest(); instead, ExampleHttpServerInstrumentation$AdviceClass.onEnterHandle() will be called on entry of ExampleBasicHttpServer.handleRequest().

We feel that calling out instead of inlining the code makes the instrumentation more flexible. For a much more detailed explanation, check out this article by my Elastic Java Agent colleague Felix and by Rafael, the author of Byte Buddy.

The onEnterHandle() method body implementation is straightforward Java. We strip the request down to keep the span name cardinality low, and we use the recommended way to access the OpenTelemetry tracer in the Java Agent OpenTelemetry bridge, which is:

    tracer = GlobalOpenTelemetry.get().getTracer(SOME_STRING);

Last of all, we return the Scope object from the method — this is so that we can close it on exiting the instrumented method (see step 4 above and the next section).

The Advice On-Exit method

The onExitHandle method covers much of the same ground as the onEnterHandle. Again, we use Byte Buddy annotations to say we want to add a call to the onExitHandle() method when the ExampleBasicHttpServer.handleRequest() method exits. 

The additional “onThrowable” parameter in the annotation tells Byte Buddy that we want this onExitHandle() method to be called even if a Throwable is thrown by ExampleBasicHttpServer.handleRequest(). The parameters to the onExitHandle() method let us obtain the thrown Throwable (@Advice.Thrown Throwable thrown, where “thrown” will be null if nothing was thrown) and the Scope object we returned from onEnterHandle() (@Advice.Enter Object scopeObject). The actual method implementation is straightforward Java similar to step 4 above, but more defensive.
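
If it helps to picture how the two advice methods fit around the instrumented method, here is a conceptual model in plain Java. It is only a sketch of the control flow described above (enter value handed to the exit advice, advice failures suppressed, the original exception still propagating). It is not how Byte Buddy rewrites the bytecode, and every name in it is made up for illustration.

```java
import java.util.function.BiConsumer;
import java.util.function.Supplier;

final class AdviceFlowModel {

    private AdviceFlowModel() {
    }

    /**
     * Models an instrumented call: onEnter stands for the on-enter advice,
     * originalBody for the instrumented method, onExit for the on-exit advice.
     */
    static void callInstrumented(Supplier<Object> onEnter,
                                 Runnable originalBody,
                                 BiConsumer<Object, Throwable> onExit) {
        Object enterValue = null;
        try {
            enterValue = onEnter.get();
        } catch (Throwable suppressed) {
            // Models suppress = Throwable.class: a failing advice must not break the application
        }

        Throwable thrown = null;
        try {
            originalBody.run();
        } catch (Throwable t) {
            thrown = t;
            throw t; // the application still sees its own exception
        } finally {
            try {
                // Models onThrowable: the exit advice runs whether or not the body failed
                onExit.accept(enterValue, thrown);
            } catch (Throwable suppressed) {
                // A failing exit advice is suppressed as well
            }
        }
    }

    public static void main(String[] args) {
        callInstrumented(
                () -> "scope",
                () -> System.out.println("handling request"),
                (enter, thrown) -> System.out.println("exit saw " + enter + " and " + thrown));
    }
}
```

In this model the exit advice receives null for the Throwable on a normal return, which mirrors the null check in step 4.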

The plugin jar

We're almost done! We just have a couple of final steps to include our plugin with our agent. The last of these is to build a jar: the agent needs the plugin to be in a separate jar, placed in the directory specified by the plugins_dir configuration option.

Before we build the jar, we need to add a couple of things because of how the plugin works. The agent isolates each plugin so that plugins can't interfere with each other. It does so by loading the plugin in its own classloader. This means the plugin will need all of its dependencies (not including the instrumented library, since that is loaded by the application, nor the agent plugin SDK which is provided by the agent) packaged in the plugin jar. 

Additionally, as there are potentially many classes in that jar, the agent needs to know which classes are the ones that specify instrumentations to apply. This is specified by listing the full instrumentation class name in a file called "co.elastic.apm.agent.sdk.ElasticApmInstrumentation" which is held in the directory META-INF/services/ within the jar.
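
One way to double-check a built jar is to read that entry back and print the class names it lists. The sketch below does this with the JDK's JarFile API. The class name PluginJarCheck is hypothetical, and treating blank lines and “#” comments the way standard Java service files do is an assumption on my part, not something taken from the agent's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class PluginJarCheck {

    static final String SERVICE_ENTRY =
            "META-INF/services/co.elastic.apm.agent.sdk.ElasticApmInstrumentation";

    private PluginJarCheck() {
    }

    /** Keeps non-blank lines and strips '#' comments (assumed service-file conventions). */
    static List<String> parseServiceLines(List<String> lines) {
        List<String> classNames = new ArrayList<>();
        for (String line : lines) {
            int comment = line.indexOf('#');
            String name = (comment >= 0 ? line.substring(0, comment) : line).trim();
            if (!name.isEmpty()) {
                classNames.add(name);
            }
        }
        return classNames;
    }

    /** Returns the instrumentation class names a jar declares (empty if the entry is missing). */
    static List<String> declaredInstrumentations(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(SERVICE_ENTRY);
            if (entry == null) {
                return new ArrayList<>();
            }
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return parseServiceLines(lines);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PluginJarCheck path/to/your-plugin.jar
        for (String name : declaredInstrumentations(args[0])) {
            System.out.println(name);
        }
    }
}
```

An empty result means the jar has no such entry, in which case the agent would have no way of knowing which classes in the jar are instrumentations.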

Now we have everything. The repo implements exactly the example described in this article, including a regression integration test that loads the agent and runs the resulting plugin jar.

Try it out!

To run the agent with the plugin, just include the jar as described above in the directory specified by the plugins_dir configuration option. You may additionally need to set enable_experimental_instrumentations depending on the version of the Elastic APM Java Agent you are using.
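
As a sketch, the options mentioned in this article could be set in the agent's elasticapm.properties file along these lines. The directory path is a placeholder to replace with your own, and whether you need the experimental flag depends on your agent version.

```properties
# Directory containing the plugin jar (placeholder path)
plugins_dir=/path/to/plugins
# May be needed depending on the agent version
enable_experimental_instrumentations=true
# Useful while checking that the plugin is loaded and applied
log_level=DEBUG
```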

The best place to get started with Elastic APM is in the cloud. Begin your 14-day free trial of Elastic Cloud today!

Troubleshooting

There are some common problems you might run into when creating your plugin:

Still experimental?

The OpenTelemetry bridge was added in Elastic APM Java Agent version 1.30.0 (so that is the earliest version you can use this plugin mechanism with), and it was initially added as experimental technology. Depending on which version you are using, you may need to have enable_experimental_instrumentations set. You can check this is on by looking for the following log output from the Elastic APM Java Agent:

INFO co.elastic.apm.agent.configuration.StartupInfo - enable_experimental_instrumentations: 'true' 

Correct plugins dir?

If you are using the plugins_dir option, is it being picked up correctly and do you have the jar you created present in that directory? Look for the following log output from the Elastic APM Java Agent:

INFO co.elastic.apm.agent.configuration.StartupInfo - plugins_dir: '... 

Jar being loaded?

Check to ensure the plugin jar is being loaded. Look for the following log output from the Elastic APM Java Agent:

INFO  co.elastic.apm.agent.bci.ElasticApmAgent - Loading plugin ... 

Classes compiled to a more recent version than the JDK?

If you have compiled the classes for a Java version that’s newer than the JVM you are running, you’ll likely get loading errors. After turning the log level to DEBUG, look out for lines like:

DEBUG co.elastic.apm.agent.util.DependencyInjectingServiceLoader - Skipping …  because it only applies to more recent Java versions and the JVM running this app is too old to load it.
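
If you are unsure which Java version a plugin class was compiled for, you can read it from the class file header. This is a minimal JDK-only sketch (the class name ClassVersionCheck is made up for this example). It relies on the class file layout of a four-byte magic number, then a two-byte minor version and a two-byte major version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ClassVersionCheck {

    private ClassVersionCheck() {
    }

    /** Reads the major version from a class file header: magic (4 bytes), minor (2), major (2). */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8
                || (classBytes[0] & 0xFF) != 0xCA || (classBytes[1] & 0xFF) != 0xFE
                || (classBytes[2] & 0xFF) != 0xBA || (classBytes[3] & 0xFF) != 0xBE) {
            throw new IllegalArgumentException("Not a class file");
        }
        return ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
    }

    /** Major version 52 is Java 8, and each later release adds one (55 is Java 11, 61 is Java 17). */
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassVersionCheck path/to/SomeInstrumentation.class
        int major = majorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("major version " + major + " (Java " + javaRelease(major) + ")");
    }
}
```

If the reported release is higher than the JVM running your application, recompiling the plugin for the older target should avoid the skipped-class message above.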

Class loading errors or dependencies missing

Remember that you need all the dependencies in the plugin jar — if some are missing, there will be errors. Make sure you have turned the log level to DEBUG, and search through the output for errors. For example, missing the OpenTelemetry dependency produces this output:

java.lang.Error: Unresolved compilation problems:  
	Tracer cannot be resolved to a type 
	GlobalOpenTelemetry cannot be resolved 
	Span cannot be resolved to a type 

Other errors vs. what success looks like

Make sure you have turned the log level to DEBUG. The output when the plugin is successfully instrumenting should include lines like:

DEBUG co.elastic.apm.agent.bci.ElasticApmAgent - Applying instrumentation YOUR_PLUGIN_INSTRUMENTATION_CLASS 
DEBUG co.elastic.apm.agent.bci.ElasticApmAgent - Type match for instrumentation  YOUR_PLUGIN_INSTRUMENTATION_CLASS: … matches … 
DEBUG co.elastic.apm.agent.bci.ElasticApmAgent - Method match for instrumentation  YOUR_PLUGIN_INSTRUMENTATION_CLASS: … 

And in particular, you should see the instrumentation working, with output for the tracing you’ve added like:

DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - increment references to " TRACE_ID (OBJECT_ID) (REFERENCE_COUNT) 
DEBUG co.elastic.apm.agent.impl.ElasticApmTracer - startTransaction " TRACE_ID (OBJECT_ID) (REFERENCE_COUNT)
DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - increment references to 'SPAN NAME' TRACE_ID (OBJECT_ID) (REFERENCE_COUNT)
DEBUG co.elastic.apm.agent.impl.ElasticApmTracer - endTransaction 'SPAN NAME' TRACE_ID (OBJECT_ID) (REFERENCE_COUNT)
DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'SPAN NAME' TRACE_ID (OBJECT_ID) (REFERENCE_COUNT)