Overhead and Performance tuning

Agent overhead

Any APM Agent will impose overhead. Here are a few different areas where that overhead might be seen.

Latency

Great care is taken to keep code on critical paths as lightweight as possible. For example, the actual reporting of events is done on a background thread.

It is very important that both the average latency and the higher percentiles of latency are low, because a low average latency means little if 1% of your requests experience very poor performance. The main sources of latency spikes at the higher percentiles are garbage collection pauses and contended locks.

We take great care to minimize the memory allocations the Java agent makes. For example, instead of allocating new objects, we take them from an object pool and return them to the pool when they are no longer in use. More details on this process can be found here. When it comes to reporting the recorded events, we serialize them directly into the output stream of the request to the APM Server, relying only on reusable buffers. This way we can report events without allocating any objects. We do all of this to avoid adding work for the GC, which is already busy cleaning up the memory your application allocates.
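To illustrate the pooling idea described above, here is a minimal sketch of an object pool. It is not the agent's implementation: the names SimplePool and SpanEvent are hypothetical, and the sketch is single-threaded for brevity, whereas a real pool shared across application threads would need a thread-safe structure.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical event type; reset() clears state so an instance can be reused. */
final class SpanEvent {
    String name;
    long durationMicros;

    void reset() {
        name = null;
        durationMicros = 0;
    }
}

/** Minimal single-threaded object pool (illustration only, not the agent's code). */
final class SimplePool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;

    SimplePool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }

    /** Reuses a pooled instance if one is available, otherwise allocates a new one. */
    T acquire() {
        T pooled = free.poll();
        return pooled != null ? pooled : factory.get();
    }

    /** Returns an instance to the pool; surplus instances are simply left to the GC. */
    void release(T instance) {
        if (free.size() < maxSize) {
            free.push(instance);
        }
    }
}

final class PoolDemo {
    public static void main(String[] args) {
        SimplePool<SpanEvent> pool = new SimplePool<>(16, SpanEvent::new);

        SpanEvent first = pool.acquire();   // pool is empty, so this allocates
        first.name = "SELECT";
        first.reset();                      // clear state before handing it back
        pool.release(first);

        SpanEvent second = pool.acquire();  // served from the pool, no allocation
        System.out.println("reused same instance: " + (first == second));
    }
}
```

Once the pool is warm, recording an event reuses an existing instance, so steady-state recording creates no new garbage for the collector.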

The Java agent also uses specialized data structures (LMAX Disruptor and queues from JCTools) to transfer events across threads, for example from the application threads that record transactions to the background reporter thread. This avoids problems such as lock contention and false sharing, which you would get from standard JDK data structures like ArrayBlockingQueue.

In single-threaded benchmarks, our Java agent imposes an overhead in the order of single-digit microseconds (µs) up to the 99.99th percentile. The benchmarks were run on a Linux machine with an i7-7700 (3.60GHz) on Oracle JDK 10. We are currently working on multi-threaded benchmarks. When disabling header recording, the agent allocates less than one byte for recording an HTTP request and one JDBC (SQL) query, including reporting those events in the background to the APM Server.

CPU

Even though the Agent does most of its work in the background, serializing, compressing, and sending events to the APM Server still adds a bit of CPU overhead. If your application is not CPU bound, this shouldn’t matter much. Your application is probably not CPU bound if it does (blocking) network I/O, like communicating with databases or external services.

In a scenario where APM Server can’t handle all of the events, the Agent will drop data so as to not crash your application.
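The drop-instead-of-block behaviour can be sketched with a bounded queue and a non-blocking offer. This is only an illustration: the class DroppingReporter and its methods are hypothetical, and it uses the JDK's ArrayBlockingQueue for brevity rather than the specialized queues the agent relies on.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative bounded hand-off that drops events when the consumer falls behind. */
final class DroppingReporter {
    private final ArrayBlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingReporter(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Called on application threads: never blocks, drops the event if the queue is full. */
    boolean report(String event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Called by the background thread that ships events; returns null when idle. */
    String nextEventOrNull() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}

final class DroppingReporterDemo {
    public static void main(String[] args) {
        DroppingReporter reporter = new DroppingReporter(2);
        reporter.report("transaction-1");
        reporter.report("transaction-2");
        // The queue is full and nothing is consuming, so this event is dropped.
        boolean accepted = reporter.report("transaction-3");
        System.out.println("accepted=" + accepted + ", dropped=" + reporter.droppedCount());
    }
}
```

The key design choice is that the application thread pays at most the cost of a failed offer, so a slow or unavailable server results in lost monitoring data rather than stalled requests or unbounded memory growth.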

Memory

Unless you have really small heaps, you usually don’t have to increase the heap size for the Java Agent. It has a fairly small and static memory overhead for the object pools, and some small buffers in the order of a couple of megabytes.

Network

The Agent requires some network bandwidth, as it needs to send recorded events to the APM Server. How much depends on how many requests your application handles and how many of those you want to record and store. You can adjust this with the sample rate.

Tuning the Agent

The Java agent offers a variety of configuration options, some of which can have a significant impact on performance. To make it easy to determine which options impact performance, we’ve tagged certain configuration options in the documentation with (performance).

Sample rate

Sample rate is the percentage of requests which should be recorded and sent to the APM Server. What is an ideal sample rate? Unfortunately, there’s no one-size-fits-all answer to that question. Sampling comes down to your preferences and your application. The more you want to sample, the more network bandwidth and disk space you’ll need.

It’s important to note that the latency of an application won’t be affected much by the agent, even if you sample at 100%. However, the background reporter thread has some work to do when serializing and gzipping events.

The sample rate can be changed with the transaction_sample_rate (performance) configuration option.
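To make the idea of a sample rate concrete, here is a minimal sketch of a probabilistic sampling decision. It assumes the rate is expressed as a fraction between 0.0 and 1.0; the class RateSampler is hypothetical and is not how the agent itself decides which requests to record.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.DoubleSupplier;

/** Illustrative sampler: records roughly sampleRate of all requests. */
final class RateSampler {
    private final double sampleRate;
    private final DoubleSupplier random;

    /**
     * @param sampleRate fraction of requests to record, between 0.0 and 1.0
     * @param random     source of values in [0.0, 1.0); injectable for testing
     */
    RateSampler(double sampleRate, DoubleSupplier random) {
        if (sampleRate < 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be between 0.0 and 1.0");
        }
        this.sampleRate = sampleRate;
        this.random = random;
    }

    /** Decides once per request whether it should be recorded in full. */
    boolean isSampled() {
        return random.getAsDouble() < sampleRate;
    }
}

final class RateSamplerDemo {
    public static void main(String[] args) {
        RateSampler sampler =
                new RateSampler(0.2, () -> ThreadLocalRandom.current().nextDouble());
        int sampled = 0;
        int total = 100_000;
        for (int i = 0; i < total; i++) {
            if (sampler.isSampled()) {
                sampled++;
            }
        }
        // With a rate of 0.2, roughly one in five requests is recorded.
        System.out.println("sampled " + sampled + " of " + total + " requests");
    }
}
```

Because the decision is a single comparison per request, lowering the rate mainly saves serialization work, network bandwidth, and storage rather than request latency.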

Stack trace collection

If a span, e.g., a captured JDBC query, takes longer than 5ms, we capture the stack trace so that you can easily find the code path which led to the query. Stack traces can be quite long, taking up bandwidth and disk space, and also requiring object allocations. But because we process the stack trace asynchronously, it adds very little latency. Increasing span_frames_min_duration (performance) or disabling stack trace collection altogether can gain you a bit of performance if needed.

Recording headers and cookies

By default, the Java agent records all request and response headers, including cookies. Disabling capture_headers (performance) can save allocations, network bandwidth, and disk space.

Circuit Breaker (experimental)

When enabled, the agent periodically polls stress monitors to detect whether the system, process, or JVM is under stress. If ANY of the monitors detects a stress indication, the agent becomes inactive, as if the recording configuration option had been set to false, thus reducing resource consumption to a minimum. While inactive, the agent continues polling the same monitors to detect whether the stress state has been relieved. If ALL monitors report that the system, process, or JVM is no longer under stress, the agent resumes and becomes fully functional. For fine-grained Circuit Breaker configuration, please refer to the Circuit-Breaker configuration options.
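The any-to-pause, all-to-resume rule described above can be sketched as a small state machine. This is an illustration under assumed names (StressMonitor, FlagMonitor, and CircuitBreakerSketch are all hypothetical), not the agent's actual circuit breaker.

```java
import java.util.List;

/** Hypothetical monitor contract: one check for entering stress, one for leaving it. */
interface StressMonitor {
    boolean isUnderStress();
    boolean isStressRelieved();
}

/** Trivial monitor backed by a flag, used here only to drive the example. */
final class FlagMonitor implements StressMonitor {
    boolean stressed;

    @Override
    public boolean isUnderStress() {
        return stressed;
    }

    @Override
    public boolean isStressRelieved() {
        return !stressed;
    }
}

/** Pauses when ANY monitor reports stress; resumes only when ALL report relief. */
final class CircuitBreakerSketch {
    private final List<? extends StressMonitor> monitors;
    private boolean active = true;

    CircuitBreakerSketch(List<? extends StressMonitor> monitors) {
        this.monitors = monitors;
    }

    /** Intended to be called periodically, for example from a scheduled task. */
    void poll() {
        if (active) {
            for (StressMonitor monitor : monitors) {
                if (monitor.isUnderStress()) {
                    active = false; // a single stressed monitor is enough to pause
                    return;
                }
            }
        } else {
            for (StressMonitor monitor : monitors) {
                if (!monitor.isStressRelieved()) {
                    return; // at least one monitor is still stressed: stay paused
                }
            }
            active = true; // every monitor reports relief: resume
        }
    }

    boolean isActive() {
        return active;
    }
}

final class CircuitBreakerDemo {
    public static void main(String[] args) {
        FlagMonitor cpu = new FlagMonitor();
        FlagMonitor gc = new FlagMonitor();
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(List.of(cpu, gc));

        cpu.stressed = true;
        breaker.poll();
        System.out.println("active after CPU stress: " + breaker.isActive());

        cpu.stressed = false;
        breaker.poll();
        System.out.println("active after relief: " + breaker.isActive());
    }
}
```

The asymmetry is deliberate: pausing on any single signal reacts quickly to trouble, while requiring agreement from all monitors before resuming avoids flapping between states.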