Metrics

The Java agent tracks certain system and application metrics. Some of them have built-in visualizations and some can only be visualized with custom Kibana dashboards.

The agent sends these metrics to APM Server at a regular interval, and APM Server writes them to Elasticsearch. You can adjust the interval with the metrics_interval setting.

The metrics will be stored in the apm-* index and have the processor.event property set to metric.

Dedicated JVM metrics views are available since Elastic Stack version 7.2. Starting in 7.5, metrics are aggregated separately for each JVM, based on the ID of the underlying system: the container ID (where applicable) or the hostname. Starting in Java agent version 1.11.0, you can manually configure a unique name for each service node/JVM through service_node_name. When multiple JVMs running on the same host report data for the same service, this configuration is required to view metrics at the JVM level.

System metrics

Host metrics. As of version 6.6, these metrics are visualized in the APM app.

For more system metrics, consider installing Metricbeat on your hosts.

system.cpu.total.norm.pct

type: scaled_float

format: percent

The percentage of CPU time in states other than Idle and IOWait, normalized by the number of cores.
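
To illustrate what "states other than Idle and IOWait" means, here is a minimal sketch that computes the busy fraction from two readings of the aggregate cpu line of Linux /proc/stat. It assumes the first eight counters of that line (user, nice, system, idle, iowait, irq, softirq, steal) have already been parsed into arrays; the class and method names are hypothetical, and this is not the agent's implementation. Because the aggregate line sums all cores, the ratio is already normalized by the number of cores.

```java
/**
 * Sketch: busy CPU fraction from two samples of the aggregate "cpu" line in /proc/stat.
 * Assumed layout of each array: user, nice, system, idle, iowait, irq, softirq, steal.
 */
final class SystemCpuSketch {
    private static final int IDLE = 3;
    private static final int IOWAIT = 4;

    /** Returns a value between 0.0 and 1.0 (multiply by 100 for a percentage). */
    static double busyFraction(long[] previous, long[] current) {
        long totalDelta = 0;
        for (int i = 0; i < current.length; i++) {
            totalDelta += current[i] - previous[i];
        }
        if (totalDelta <= 0) {
            return 0.0; // no CPU time elapsed between the two samples
        }
        // Time spent idle or waiting for I/O does not count as busy
        long idleDelta = (current[IDLE] - previous[IDLE]) + (current[IOWAIT] - previous[IOWAIT]);
        return (double) (totalDelta - idleDelta) / totalDelta;
    }
}
```

For example, if 100 ticks elapsed between the samples and 60 of them were idle or iowait, the busy fraction is 0.4, that is 40%.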

system.process.cpu.total.norm.pct

type: scaled_float

format: percent

The percentage of CPU time spent by the process since the last event. This value is normalized by the number of CPU cores and it ranges from 0 to 100%.
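
To make "normalized by the number of CPU cores" concrete: a process that consumes 2 s of CPU time during 1 s of wall-clock time on a 4-core machine is at 2 / (1 × 4) = 50%. The sketch below only encodes that arithmetic; where the CPU-time and wall-time deltas come from is left open, and the names are hypothetical rather than taken from the agent.

```java
final class ProcessCpuSketch {
    /**
     * Normalized process CPU usage between two samples:
     * CPU time consumed by the process divided by (elapsed wall-clock time x number of cores).
     */
    static double normalizedCpu(long cpuTimeDeltaNanos, long wallTimeDeltaNanos, int cores) {
        if (wallTimeDeltaNanos <= 0 || cores <= 0) {
            return 0.0;
        }
        double fraction = (double) cpuTimeDeltaNanos / ((double) wallTimeDeltaNanos * cores);
        // Clamp to the documented 0-100% range to absorb sampling jitter
        return Math.min(1.0, Math.max(0.0, fraction));
    }

    /** Number of cores visible to this JVM, usable as the normalization factor. */
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }
}
```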

system.memory.total

type: long

format: bytes

Total system memory in bytes.

system.memory.actual.free

type: long

format: bytes

Actual free memory in bytes. How it is calculated depends on the OS: on Linux, it is the free memory plus caches and buffers; on OSX, it is the sum of free memory and inactive memory; on Windows, this value does not include memory consumed by system caches and buffers.
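
The Linux rule above can be sketched as follows. This assumes the standard /proc/meminfo layout (one "Key: value kB" entry per line) and follows the description literally (MemFree + Buffers + Cached); the agent's actual computation may differ, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class MemInfoSketch {
    /** Parses lines such as "MemFree:  123456 kB" into a map of key -> bytes. */
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> values = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].endsWith(":")) {
                long value = Long.parseLong(parts[1]);
                boolean inKb = parts.length >= 3 && parts[2].equals("kB");
                String key = parts[0].substring(0, parts[0].length() - 1);
                values.put(key, inKb ? value * 1024 : value);
            }
        }
        return values;
    }

    /** Free memory plus buffers and page cache, in bytes. */
    static long actualFreeBytes(String meminfo) {
        Map<String, Long> v = parse(meminfo);
        return v.getOrDefault("MemFree", 0L) + v.getOrDefault("Buffers", 0L) + v.getOrDefault("Cached", 0L);
    }
}
```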

system.process.memory.size

type: long

format: bytes

The total virtual memory the process has.

cgroup metrics (added in 1.18.0)

Linux’s cgroup metrics.

system.process.cgroup.memory.mem.limit.bytes

type: long

format: bytes

Memory limit for the current cgroup slice.

system.process.cgroup.memory.mem.usage.bytes

type: long

format: bytes

Memory usage in the current cgroup slice.
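
As a sketch of where such values can come from, the code below parses cgroup v2 memory files. It assumes the unified cgroup v2 hierarchy, where memory.max holds the limit (or the literal max when unlimited) and memory.current holds the usage; cgroup v1 uses memory.limit_in_bytes and memory.usage_in_bytes instead. How the agent locates the cgroup of the current process is not shown, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

final class CgroupMemorySketch {
    /** Parses the content of a cgroup v2 memory.max or memory.current file; "max" means no limit. */
    static OptionalLong parse(String content) {
        String value = content.trim();
        if (value.isEmpty() || value.equals("max")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(value));
    }

    /** Reads a file such as /sys/fs/cgroup/memory.max; empty if it is missing or unparseable. */
    static OptionalLong read(Path file) {
        try {
            return parse(Files.readString(file));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }
}
```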

JVM Metrics

JVM-specific metrics.

jvm.memory.heap.used

type: long

format: bytes

The amount of used heap memory in bytes.

jvm.memory.heap.committed

type: long

format: bytes

The amount of heap memory in bytes that is committed for the Java virtual machine to use. This amount of memory is guaranteed for the Java virtual machine to use.

jvm.memory.heap.max

type: long

format: bytes

The maximum amount of heap memory in bytes that can be used for memory management. If the maximum memory size is undefined, the value is -1.

jvm.memory.non_heap.used

type: long

format: bytes

The amount of used non-heap memory in bytes.

jvm.memory.non_heap.committed

type: long

format: bytes

The amount of non-heap memory in bytes that is committed for the Java virtual machine to use. This amount of memory is guaranteed for the Java virtual machine to use.

jvm.memory.non_heap.max

type: long

format: bytes

The maximum amount of non-heap memory in bytes that can be used for memory management. If the maximum memory size is undefined, the value is -1.
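
The heap and non-heap descriptions above follow the Javadoc of java.lang.management.MemoryUsage. As a sketch (not the agent's own code), the same six values can be read from the platform MemoryMXBean and keyed by the metric names; the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmMemorySketch {
    /** Reads heap and non-heap usage once and keys the values by the jvm.memory.* names. */
    static Map<String, Long> snapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        Map<String, Long> metrics = new LinkedHashMap<>();
        put(metrics, "jvm.memory.heap", memory.getHeapMemoryUsage());
        put(metrics, "jvm.memory.non_heap", memory.getNonHeapMemoryUsage());
        return metrics;
    }

    private static void put(Map<String, Long> metrics, String prefix, MemoryUsage usage) {
        metrics.put(prefix + ".used", usage.getUsed());
        metrics.put(prefix + ".committed", usage.getCommitted());
        // getMax() returns -1 if the maximum memory size is undefined
        metrics.put(prefix + ".max", usage.getMax());
    }
}
```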

jvm.thread.count

type: int

The current number of live threads in the JVM, including both daemon and non-daemon threads.
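
This is the quantity reported by ThreadMXBean.getThreadCount() in java.lang.management. A minimal sketch reading it, together with the daemon subset for comparison; the class name is hypothetical and this is not the agent's code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class JvmThreadSketch {
    /** Live threads, daemon and non-daemon, as reported by the platform ThreadMXBean. */
    static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /** Only the daemon threads, which are included in liveThreadCount(). */
    static int daemonThreadCount() {
        return ManagementFactory.getThreadMXBean().getDaemonThreadCount();
    }
}
```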

jvm.gc.count

type: long

labels

  • name: The name representing this memory manager (for example G1 Young Generation, G1 Old Generation)

The total number of collections that have occurred.

jvm.gc.time

type: long

format: ms

labels

  • name: The name representing this memory manager (for example G1 Young Generation, G1 Old Generation)

The approximate accumulated collection elapsed time in milliseconds.
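
The count and time descriptions match GarbageCollectorMXBean.getCollectionCount() and getCollectionTime() in java.lang.management. The sketch below reads one entry per collector, keyed by the collector name that appears as the name label; it only illustrates the standard API and is not taken from the agent. The collector names depend on the garbage collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class JvmGcSketch {
    /** Maps each collector name to {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> byCollector() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 if the value is undefined for this collector
            result.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }
}
```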

jvm.gc.alloc

type: long

format: bytes

An approximation of the total amount of memory, in bytes, allocated in heap memory.

Built-in application metrics

To power the Time spent by span type graph, the agent collects summarized metrics about the timings of spans and transactions, broken down by span type.

transaction.duration

type: simple timer

This timer tracks the duration of transactions and allows for the creation of graphs displaying a weighted average.

Fields:

  • sum.us: The sum of all transaction durations in microseconds since the last report (the delta)
  • count: The count of all transactions since the last report (the delta)

You can filter and group by these dimensions:

  • transaction.name: The name of the transaction
  • transaction.type: The type of the transaction, for example request
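
Because each document carries both a sum and a count, a weighted average over several documents is the total of sum.us divided by the total of count, not the mean of per-document averages. A minimal sketch of that arithmetic, with hypothetical names:

```java
final class WeightedAverageSketch {
    /**
     * Average transaction duration in microseconds across several metric documents,
     * each carrying a sum.us delta and a count delta.
     */
    static double averageMicros(long[] sumUs, long[] counts) {
        long totalSum = 0;
        long totalCount = 0;
        for (int i = 0; i < sumUs.length; i++) {
            totalSum += sumUs[i];
            totalCount += counts[i];
        }
        return totalCount == 0 ? 0.0 : (double) totalSum / totalCount;
    }
}
```

With documents (sum.us = 9000, count = 3) and (sum.us = 1000, count = 1), the weighted average is 10000 / 4 = 2500 µs, whereas averaging the per-document averages (3000 and 1000) would give 2000 µs and over-weight the document with fewer transactions.
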

transaction.breakdown.count

type: long

format: count (delta)

The number of transactions for which breakdown metrics (span.self_time) have been created. As the Java agent tracks the breakdown for both sampled and non-sampled transactions, this metric is equivalent to transaction.duration.count.

You can filter and group by these dimensions:

  • transaction.name: The name of the transaction
  • transaction.type: The type of the transaction, for example request

span.self_time

type: simple timer

This timer tracks the span self-times and is the basis of the transaction breakdown visualization.

Fields:

  • sum.us: The sum of all span self-times in microseconds since the last report (the delta)
  • count: The count of all span self-times since the last report (the delta)

You can filter and group by these dimensions:

  • transaction.name: The name of the transaction
  • transaction.type: The type of the transaction, for example request
  • span.type: The type of the span, for example app, template or db
  • span.subtype: The sub-type of the span, for example mysql (optional)
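
The sketch below assumes that the self time of a span is its duration minus the time during which at least one of its direct children was active, with overlapping children counted once. That definition is an assumption for illustration; the agent's exact bookkeeping may differ, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SelfTimeSketch {
    /**
     * Self time of a span running from start to end (microseconds), given child
     * intervals as {childStart, childEnd}. Overlapping children are counted once.
     */
    static long selfTime(long start, long end, List<long[]> children) {
        List<long[]> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(c -> c[0]));
        long covered = 0;
        long cursor = start; // end of the child coverage counted so far
        for (long[] child : sorted) {
            long childStart = Math.max(child[0], cursor);
            long childEnd = Math.min(child[1], end);
            if (childEnd > childStart) {
                covered += childEnd - childStart;
                cursor = childEnd;
            }
        }
        return (end - start) - covered;
    }
}
```

For a span from 0 to 100 µs with children [10, 30], [20, 40], and [60, 70], the children cover 40 µs, so the self time is 60 µs.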

Custom metrics using Micrometer

The Elastic APM Java agent lets you use the popular metrics collection framework Micrometer to track custom application metrics.

Some use cases for tracking custom metrics from your application include monitoring performance-related things like cache statistics, thread pools, or page hits. You can also track business-related metrics such as revenue and correlate them with performance metrics. Metrics registered to a Micrometer MeterRegistry are aggregated in memory and reported every metrics_interval. Based on the service metadata and the timestamp, you can correlate metrics with traces. The advantages are that metrics are not affected by the sampling rate and usually take up less space, because not every event is stored individually.

The limitation of tracking metrics is that you won’t be able to attribute a value to a specific transaction. If you’d like to do that, add labels to your transactions instead of tracking the metric with Micrometer. The tradeoff is that you either have to sample 100% of transactions or account for the missing events: if you set your sampling rate to 10%, for example, only one out of 10 requests is stored, and the labels you set on non-sampled transactions are lost.
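
One simple way to account for the missing events is to extrapolate: divide the sum of the label values seen on sampled transactions by the sampling rate. This is a sketch assuming uniform random sampling; it yields an estimate, not the true total, and its variance grows as traffic or the sampling rate shrinks. The names are hypothetical.

```java
final class SamplingSketch {
    /**
     * Estimates the total of a label value (for example revenue) over all transactions
     * when only sampled transactions were stored, assuming uniform random sampling.
     */
    static double estimateTotal(double sumOfSampledValues, double sampleRate) {
        if (sampleRate <= 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be in (0, 1]");
        }
        return sumOfSampledValues / sampleRate;
    }
}
```

At a 10% sampling rate, sampled transactions labeled with a total revenue of 150 suggest roughly 1500 across all transactions.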

Get started with existing Micrometer setup

You only have to attach the agent. The agent automatically detects all MeterRegistry instances and reports all metrics to APM Server in addition to wherever they are originally reported. When the agent is attached after the application has already started, it detects a MeterRegistry as soon as any public method is called on it. If you use multiple registries within a CompoundMeterRegistry, the agent makes sure to report the metrics only once.

Get started from scratch

Declare a dependency on Micrometer:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>${micrometer.version}</version>
</dependency>

Create a Micrometer MeterRegistry.

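import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.CountingMode;
import io.micrometer.core.instrument.simple.SimpleConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.time.Duration;

MeterRegistry registry = new SimpleMeterRegistry(new SimpleConfig() {

        @Override
        public CountingMode mode() {
            // to report the delta since the last report
            // this makes building dashboards a bit easier
            return CountingMode.STEP;
        }

        @Override
        public Duration step() {
            // the duration should match metrics_interval, which defaults to 30s
            return Duration.ofSeconds(30);
        }

        @Override
        public String get(String key) {
            // no external configuration source; the overrides above apply
            return null;
        }
    }, Clock.SYSTEM);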
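
To see why STEP mode simplifies dashboards: in CUMULATIVE mode a counter reports an ever-growing total, so a dashboard has to difference consecutive documents; in STEP mode each report already holds the delta for one interval, so summing documents over a time range gives the count for that range. The sketch below only models that relationship with plain arrays; it does not use Micrometer, and the names are hypothetical.

```java
final class StepModeSketch {
    /**
     * Converts successive cumulative counter readings (one per reporting interval)
     * into the per-interval deltas that STEP mode reports.
     */
    static long[] toDeltas(long[] cumulative) {
        long[] deltas = new long[cumulative.length];
        long previous = 0;
        for (int i = 0; i < cumulative.length; i++) {
            deltas[i] = cumulative[i] - previous;
            previous = cumulative[i];
        }
        return deltas;
    }
}
```

Cumulative readings of 3, 5, 5, and 9 correspond to step deltas of 3, 2, 0, and 4.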

When using Spring Boot, you can configure the registry in application.properties with the management.metrics.export.simple prefix. As above, the step should match metrics_interval (30s by default):

management.metrics.export.simple.enabled=true
management.metrics.export.simple.step=30s
management.metrics.export.simple.mode=STEP

Supported Meters

This section lists all supported Micrometer Meters and describes how they are mapped to Elasticsearch documents.

Micrometer tags are nested under labels. Example:

"labels": {
  "tagKey1": "tagLabel1",
  "tagKey2": "tagLabel2"
}

Labels are useful for breaking down metrics by different dimensions. Although there is no upper limit, a high number of distinct values per label (also known as high cardinality) may lead to higher memory usage, larger indices, and slower queries. Also, make sure the number of distinct tag keys is limited to avoid mapping explosions.
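
A quick way to gauge the cost of a set of tags is the upper bound on label combinations: the product of the number of distinct values of each tag key. For example, a meter tagged with a hypothetical method key (4 values), status key (5 values), and endpoint key (100 values) can produce up to 4 × 5 × 100 = 2,000 combinations. A minimal sketch of that estimate, with hypothetical names:

```java
import java.util.Map;

final class LabelCardinalitySketch {
    /**
     * Upper bound on the number of distinct label combinations a meter can produce:
     * the product of the number of distinct values per tag key.
     */
    static long maxCombinations(Map<String, Integer> distinctValuesPerKey) {
        long combinations = 1;
        for (int distinct : distinctValuesPerKey.values()) {
            // multiplyExact throws instead of silently overflowing
            combinations = Math.multiplyExact(combinations, distinct);
        }
        return combinations;
    }
}
```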

Timer

Fields:

  • ${name}.sum.us: The total time of recorded events (the delta when using CountingMode.STEP). This is equivalent to timer.totalTime(TimeUnit.MICROSECONDS).
  • ${name}.count: The number of times that stop has been called on this timer (the delta when using CountingMode.STEP). This is equivalent to timer.count().
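
The sketch below shows how the two Timer fields relate to the events recorded in one reporting interval. It models the mapping with java.time.Duration rather than calling Micrometer, truncates the total to whole microseconds for simplicity, and the meter name checkout used in the check is hypothetical.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TimerFieldsSketch {
    /** Builds ${name}.sum.us (total time in microseconds) and ${name}.count for one interval. */
    static Map<String, Long> fields(String name, List<Duration> recorded) {
        long totalNanos = 0;
        for (Duration d : recorded) {
            totalNanos += d.toNanos();
        }
        Map<String, Long> fields = new LinkedHashMap<>();
        fields.put(name + ".sum.us", totalNanos / 1_000); // nanoseconds to microseconds, truncated
        fields.put(name + ".count", (long) recorded.size());
        return fields;
    }
}
```

Two events of 1.5 ms and 0.5 ms yield checkout.sum.us = 2000 and checkout.count = 2.
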

FunctionTimer

Fields:

  • ${name}.sum.us: The total time of all occurrences of the timed event (the delta when using CountingMode.STEP). This is equivalent to functionTimer.totalTime(TimeUnit.MICROSECONDS).
  • ${name}.count: The total number of occurrences of the timed event (the delta when using CountingMode.STEP). This is equivalent to functionTimer.count().

LongTaskTimer

Fields:

  • ${name}.sum.us: The cumulative duration of all current tasks (the delta when using CountingMode.STEP). This is equivalent to longTaskTimer.totalTime(TimeUnit.MICROSECONDS).
  • ${name}.count: The current number of tasks being executed (the delta when using CountingMode.STEP). This is equivalent to longTaskTimer.activeTasks().
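
Unlike a Timer, a LongTaskTimer describes tasks that are still in flight at report time. A minimal sketch of how the two fields relate to those tasks, assuming start times and the current time are given in microseconds and ignoring the STEP delta; the names, including the meter name batch used in the check, are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LongTaskTimerSketch {
    /**
     * For tasks still running at report time, returns ${name}.sum.us (the time each task
     * has been running so far, summed) and ${name}.count (the number of active tasks).
     */
    static Map<String, Long> fields(String name, List<Long> activeTaskStartsUs, long nowUs) {
        long sum = 0;
        for (long start : activeTaskStartsUs) {
            sum += nowUs - start;
        }
        Map<String, Long> fields = new LinkedHashMap<>();
        fields.put(name + ".sum.us", sum);
        fields.put(name + ".count", (long) activeTaskStartsUs.size());
        return fields;
    }
}
```

Two tasks started at 0 µs and 500 µs and still running at 1000 µs give batch.sum.us = 1500 and batch.count = 2.
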

DistributionSummary

Fields:

  • ${name}.sum: The total amount of all recorded events (the delta when using CountingMode.STEP). This is equivalent to distributionSummary.totalAmount().
  • ${name}.count: The number of times that record has been called (the delta when using CountingMode.STEP). This is equivalent to distributionSummary.count().

Gauge

Fields:

  • ${name}: The value of gauge.value().

Counter

Fields:

  • ${name}: The value of counter.count() (the delta when using CountingMode.STEP).

FunctionCounter

Fields:

  • ${name}: The value of functionCounter.count() (the delta when using CountingMode.STEP).