
Performance Tuning of the Elastic APM Java Agent

In case you missed the announcement, the Elastic APM Java agent is one of the newest additions to the collection of free and open source APM agents we provide. You can use our agents to get insight into the performance of your application and to analyze the root causes of errors. Elastic APM also supports distributed tracing, which is very useful for service-oriented architectures.

In this blog post, I want to show how much the agent affects the performance of your applications and which configuration settings influence that impact.

Agents typically impose overhead in several main areas. And, by the way, don't believe anyone who tells you that their agent has zero overhead. That's a myth, unless they have found a way to bend the laws of physics. Recording and reporting data always imposes some overhead. The question is whether that overhead is small enough to be negligible for your particular application. Where was I? Right, agents impose overhead in these areas:

Latency

In order to collect traces, some code has to run on the critical path of your application. Traces capture metadata about requests and responses, such as duration, URL, and status code. However, we take great care to keep the code on the critical path as lightweight as possible. For example, the actual reporting of events is done on a background thread.
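To make this concrete, here is a minimal sketch (not the agent's actual code) of what keeping the critical path lightweight can look like: the request thread only takes two System.nanoTime() readings and hands a small event to a background thread, which does the slower reporting work. RecordedTransaction and TracingSketch are hypothetical names, and a plain LinkedBlockingQueue stands in for the specialized queues the agent actually uses.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.IntSupplier;

// Hypothetical event holding only the metadata captured on the request thread.
final class RecordedTransaction {
    final String url;
    final int status;
    final long durationNanos;

    RecordedTransaction(String url, int status, long durationNanos) {
        this.url = url;
        this.status = status;
        this.durationNanos = durationNanos;
    }
}

final class TracingSketch {
    private final BlockingQueue<RecordedTransaction> queue = new LinkedBlockingQueue<>(1024);
    // Filled by the reporter thread; stands in for sending data to the APM Server.
    final List<String> reported = new CopyOnWriteArrayList<>();

    TracingSketch() {
        Thread reporter = new Thread(this::reportLoop, "apm-reporter-sketch");
        reporter.setDaemon(true); // do not keep the JVM alive just for reporting
        reporter.start();
    }

    // Critical path: two timestamps, one small object, one non-blocking hand-off.
    int handle(String url, IntSupplier handler) {
        long start = System.nanoTime();
        int status = handler.getAsInt(); // the application's real work
        // offer() never blocks; if the queue is full, the event is simply not recorded.
        queue.offer(new RecordedTransaction(url, status, System.nanoTime() - start));
        return status;
    }

    // Background thread: the slower formatting/reporting work happens here.
    private void reportLoop() {
        try {
            while (true) {
                RecordedTransaction t = queue.take();
                reported.add(t.url + " " + t.status + " " + t.durationNanos + "ns");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly when interrupted
        }
    }
}
```

What matters in this design is what stays off the request thread: no string formatting, no I/O, and no waiting for the reporter.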

Not only is it important that the average latency is low, but there should also be no significant outliers where, for example, one out of 100 requests experiences very poor performance while the others are relatively fast. That's why it's important to look not only at the average latency but also at the higher latency percentiles.
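A small worked example shows how the mean can hide outliers. The sketch below uses the nearest-rank definition of a percentile and hypothetical latencies: 98 requests at 1 ms and 2 requests at 500 ms. LatencyStats is an illustrative helper, not part of the agent.

```java
import java.util.Arrays;

final class LatencyStats {
    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElseThrow(IllegalArgumentException::new);
    }
}
```

With these numbers, the median is 1 ms and the mean is about 11 ms, while the 99th percentile is 500 ms, the only one of the three figures that reveals the slow requests.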

The main sources of latency spikes in the higher percentiles are garbage collection pauses and contended locks.

We take great care to minimize memory allocations in the Java agent. For example, we reuse the objects needed to record transactions and spans: instead of allocating new objects, we take them from an object pool and return them to the pool once they are no longer in use. See this README for more details. When it comes to reporting the recorded events, we serialize them directly into the output stream of the request to the APM Server, relying only on reusable buffers. This way, we can report events without allocating any objects. We do all that so as not to add work for the GC, which is already busy cleaning up the memory your application allocates.
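The pooling idea can be sketched in a few lines. This is a minimal, single-threaded illustration under our own assumptions (Recyclable, SpanData and ObjectPool are hypothetical names); a production pool shared between application threads and the reporter thread would need to be thread-safe.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical contract: reset() clears all state so the instance can be reused.
interface Recyclable {
    void reset();
}

final class SpanData implements Recyclable {
    String name;
    long durationNanos;
    final StringBuilder dbStatement = new StringBuilder(); // reused, never reallocated

    @Override
    public void reset() {
        name = null;
        durationNanos = 0;
        dbStatement.setLength(0); // keeps the backing array for the next use
    }
}

// Minimal single-threaded pool (illustrative only).
final class ObjectPool<T extends Recyclable> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;

    ObjectPool(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    T acquire() {
        T obj = free.pollFirst();
        return obj != null ? obj : factory.get(); // allocate only when the pool is empty
    }

    void release(T obj) {
        obj.reset();
        if (free.size() < maxSize) {
            free.offerFirst(obj);
        } // otherwise let the GC reclaim it
    }
}
```

Once the pool is warm, acquire() returns recycled instances, so steady-state recording does not allocate, and the maxSize bound keeps the pool's own footprint fixed.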

The Java agent also uses specialized data structures (the LMAX Disruptor and queues from JCTools) when transferring events across threads, for example from the application threads that record transactions to the background reporter thread. This avoids problems like lock contention and false sharing that you would get from standard JDK data structures like ArrayBlockingQueue.
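As a rough illustration of how such structures avoid locks, here is a minimal single-producer, single-consumer ring buffer in the spirit of, but far simpler than, the JCTools queues. Each side writes only its own index, so neither side ever needs a lock. SpscRing is a hypothetical name, and the sketch omits the cache-line padding that real implementations add around the indices to prevent false sharing.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

// Minimal single-producer/single-consumer bounded queue (illustrative only).
final class SpscRing<E> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next index to read; written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // next index to write; written only by the producer

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacity];
        mask = capacity - 1;
    }

    // Producer side: never blocks, returns false when full.
    boolean offer(E e) {
        Objects.requireNonNull(e); // null is reserved to signal "empty" in poll()
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false; // full
        }
        slots[(int) (t & mask)] = e;
        tail.lazySet(t + 1); // ordered store publishes the element to the consumer
        return true;
    }

    // Consumer side: returns null when empty.
    @SuppressWarnings("unchecked")
    E poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null; // empty
        }
        int i = (int) (h & mask);
        E e = (E) slots[i];
        slots[i] = null; // drop the reference so the slot does not retain garbage
        head.lazySet(h + 1); // hands the slot back to the producer
        return e;
    }
}
```

The single-producer restriction is what keeps this so simple; with several application threads producing events, you would need a multi-producer queue instead.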

In single-threaded benchmarks, our Java agent imposes an overhead on the order of single-digit microseconds (µs) up to the 99.99th percentile. The benchmarks were run on Oracle JDK 10 on a Linux machine with an i7-7700 (3.60GHz). We are currently working on multi-threaded benchmarks. With header recording disabled, the agent allocates less than one byte for recording an HTTP request and one JDBC (SQL) query, including reporting those events to the APM Server in the background.
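If you want to check allocation figures like this for your own code, one option on HotSpot-based JDKs is the com.sun.management.ThreadMXBean extension, which reports the bytes allocated by a thread. The sketch below is a rough helper under that assumption; it is not the harness behind the numbers above, and for serious measurements a benchmark framework such as JMH (with its GC profiler) is the more reliable choice. AllocationMeter is a hypothetical name.

```java
import java.lang.management.ManagementFactory;

// Rough per-operation allocation meter for HotSpot-based JVMs (illustrative only).
final class AllocationMeter {
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Storing results here makes them escape, so the JIT cannot optimize the allocation away.
    static volatile Object sink;

    // Average bytes allocated by the current thread per invocation of op.
    static double bytesPerOp(Runnable op, int iterations) {
        if (!THREADS.isThreadAllocatedMemorySupported()) {
            throw new UnsupportedOperationException("per-thread allocation tracking not supported");
        }
        THREADS.setThreadAllocatedMemoryEnabled(true);
        long threadId = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(threadId);
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        long after = THREADS.getThreadAllocatedBytes(threadId);
        return (after - before) / (double) iterations;
    }
}
```

A result well below one byte per operation means the operation itself does not allocate in steady state; the small remainder is noise from the measurement calls.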

CPU

Even though we do the bulk of the work (serializing and compressing the events and sending them to the APM Server) in the background, this still adds a bit of CPU overhead. If your application is not CPU bound, this shouldn’t matter much. Your application is probably not CPU bound if it does (blocking) network I/O, such as communicating with databases or external services.
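For a sense of what that background work involves, this sketch writes pre-serialized JSON events as newline-delimited JSON and gzips them with the JDK's GZIPOutputStream. It buffers the whole batch in memory for simplicity, whereas the agent streams events directly into the request to the APM Server, and BatchEncoder is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

final class BatchEncoder {
    // Writes each pre-serialized JSON event on its own line (ND-JSON) and gzips the result.
    static byte[] encode(List<String> jsonEvents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream gzip = new GZIPOutputStream(bytes)) {
            for (String event : jsonEvents) {
                gzip.write(event.getBytes(StandardCharsets.UTF_8)); // allocates; a tuned reporter reuses buffers
                gzip.write('\n');
            }
        } catch (IOException e) {
            // ByteArrayOutputStream does no real I/O, so this is not expected here.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // complete only after close() has written the gzip trailer
    }
}
```

Running this on a reporter thread keeps it off the request path, but the CPU cycles for serialization and compression are still spent.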

Note that if the APM Server can’t keep up with all the events, the agent drops data rather than risk crashing your application. In that case, it also skips serializing and gzipping the dropped events.
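The drop-instead-of-block behavior can be sketched with a bounded queue whose offer() returns false when it is full. Events that never enter the queue are never serialized or compressed, which is why dropping also saves CPU time. DroppingReporter is hypothetical, and the ArrayBlockingQueue is only a stand-in for the lock-free queues the agent uses.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class DroppingReporter<E> {
    private final BlockingQueue<E> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingReporter(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called on the application thread: never blocks and never serializes.
    boolean report(E event) {
        if (queue.offer(event)) {
            return true;
        }
        dropped.incrementAndGet(); // backlog is full: discard instead of waiting
        return false;
    }

    long droppedCount() {
        return dropped.get();
    }

    // Called by the background thread; only queued events ever get serialized.
    E nextToSerialize() {
        return queue.poll();
    }
}
```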

Memory

Unless you have a really small heap, you usually don't have to increase the heap size for the Java agent. Its memory overhead is fairly small and static, on the order of a couple of megabytes for the object pools and some small buffers.

Network

The agent requires some network bandwidth, as it needs to send the recorded events to the APM Server. This is where it really comes down to how many requests your application handles and how many of those you want to record and store. You can adjust this with the so-called "sample rate". More on that later.

Tuning the Performance with Configuration Options

The Java agent offers a variety of configuration options, some of which can have a significant impact on performance.

Sample Rate

The sample rate is the percentage of requests that should be recorded and sent to the APM Server, which processes them and forwards them to Elasticsearch for storage.

There is no one-size-fits-all answer to an ideal sample rate. Sampling comes down to your preferences and your application. The more you want to sample, the more network bandwidth and disk space you’ll need.
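A back-of-the-envelope estimate helps when picking a rate. The sketch below simply multiplies the request rate, the sample rate, and the average size of a sampled transaction; all three inputs are assumptions you would measure for your own application, and SamplingBudget is a hypothetical helper.

```java
// Back-of-the-envelope bandwidth estimate; every input is an assumption to measure yourself.
final class SamplingBudget {
    static double bytesPerSecond(double requestsPerSecond, double sampleRate, double bytesPerSampledTransaction) {
        if (sampleRate < 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be in [0, 1]");
        }
        return requestsPerSecond * sampleRate * bytesPerSampledTransaction;
    }

    // Converts a sustained byte rate into decimal gigabytes per day.
    static double gigabytesPerDay(double bytesPerSecond) {
        return bytesPerSecond * 86_400 / 1e9;
    }
}
```

For example, with made-up inputs of 500 requests per second, a sample rate of 0.1 and 4,000 bytes per sampled transaction, bytesPerSecond(500, 0.1, 4000) gives 200,000 bytes per second, or about 17.28 GB per day. What ends up on disk in Elasticsearch will differ from what goes over the wire, so treat the daily figure as a rough indicator only.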

It’s important to note that the agent won’t affect your application’s latency much, even if you sample at 100%. However, the more you sample, the more work the background reporter thread has serializing and gzipping events.

The sample rate can be changed via the transaction_sample_rate configuration option, which takes a value between 0.0 and 1.0.
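Conceptually, a sample rate boils down to a random decision per transaction. Here is a minimal sketch of such a decision, assuming a rate between 0.0 and 1.0 as used by transaction_sample_rate; RateSampler is hypothetical, and the agent's own sampler may make the decision differently.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sampler: decides once, when a transaction starts, whether to record it.
final class RateSampler {
    private final double sampleRate;

    RateSampler(double sampleRate) {
        if (sampleRate < 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sample rate must be in [0, 1]");
        }
        this.sampleRate = sampleRate;
    }

    boolean isSampled() {
        // nextDouble() is uniform in [0, 1), so this is true with probability sampleRate.
        return ThreadLocalRandom.current().nextDouble() < sampleRate;
    }
}
```

With a rate of 0.25, roughly one in four calls to isSampled() returns true; a rate of 0.0 never samples and 1.0 always does.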

Collection of Stack Traces

If a span (for example, a captured JDBC query) takes longer than 5ms, we capture its stack trace so that you can find out which code path led to the query. Stack traces can be quite long, so they take up bandwidth and disk space, and capturing them requires object allocations. But because we process the stack trace asynchronously, it adds very little latency. Increasing the span_frames_min_duration setting or disabling stack trace collection altogether can gain you a bit of performance if needed.
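The threshold logic can be sketched as follows, assuming a hypothetical SpanSketch class: only the relatively cheap capture of raw StackTraceElement frames happens on the calling thread, and only for spans slower than the threshold, while turning those frames into strings and serializing them could be deferred to the background. The constructor argument plays the role of span_frames_min_duration here, but this is not the agent's implementation.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical span that captures a stack trace only when it is slower than a threshold.
final class SpanSketch {
    private final long minDurationNanos; // stand-in for a setting like span_frames_min_duration
    private final long startNanos = System.nanoTime();
    private StackTraceElement[] frames; // raw frames; formatting them can happen later, off-thread

    SpanSketch(long minDurationMillis) {
        this.minDurationNanos = TimeUnit.MILLISECONDS.toNanos(minDurationMillis);
    }

    void end() {
        endWithDuration(System.nanoTime() - startNanos);
    }

    // Split out so the threshold logic can be exercised with a fixed duration.
    void endWithDuration(long durationNanos) {
        if (durationNanos > minDurationNanos) {
            // Must run on the calling thread, since that is where the interesting code path lives.
            frames = new Throwable().getStackTrace();
        }
    }

    boolean hasStackTrace() {
        return frames != null;
    }
}
```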