Metrics
Metrics measure the state of a system by gathering information at regular intervals. There are two types of APM metrics:
- System metrics: Basic infrastructure and application metrics.
- Calculated metrics: Aggregated trace event metrics used to power visualizations in the APM app.
System metrics
APM agents automatically pick up basic host-level metrics, including system and process-level CPU and memory metrics. Agent-specific metrics are also available, such as JVM metrics in the Java Agent and Go runtime metrics in the Go Agent.
Infrastructure and application metrics are important sources of information when debugging production systems, which is why we’ve made it easy to filter metrics for specific hosts or containers in the Kibana metrics overview.
Metrics have the processor.event property set to metric.
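As a rough illustration of filtering metric documents for one host or container, the sketch below builds an Elasticsearch query body that combines the processor.event: metric marker with a single dimension. The helper name metric_filter_query and the shortened container ID are hypothetical, and sending the query to a cluster is left out.

```python
import json


def metric_filter_query(field, value):
    """Build a query body matching metric documents with one dimension value."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Metric documents carry processor.event: metric.
                    {"term": {"processor.event": "metric"}},
                    # Narrow to one host or container; the field is the caller's choice.
                    {"term": {field: value}},
                ]
            }
        }
    }


# The container ID below is a made-up, shortened sample value.
query = metric_filter_query("container.id", "a47ed147c6ee")
print(json.dumps(query, indent=2))
```

Both clauses sit in a filter context because they are exact matches and do not need relevance scoring.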
Most agents limit keyword fields (e.g. processor.event) to 1024 characters, and non-keyword fields (e.g. system.memory.total) to 10,000 characters.
Metrics are stored in metric indices.
For a full list of tracked metrics, see the relevant agent documentation.
Calculated metrics
APM agents and APM Server calculate metrics from trace events to power visualizations in the APM app. These metrics are described below.
Breakdown metrics
To power the Time spent by span type graph, agents collect summarized metrics about the timings of spans and transactions, broken down by span type.
- span.self_time.count and span.self_time.sum.us
  These metrics measure the "self-time" for a span type, and optional subtype, within a transaction group. Together these metrics can be used to calculate the average duration and percentage of time spent on each type of operation within a transaction group.
  These metric documents can be identified by searching for metricset.name: span_breakdown.
  You can filter and group by these dimensions:
  - transaction.name: The name of the enclosing transaction group, for example GET /
  - transaction.type: The type of the enclosing transaction, for example request
  - span.type: The type of the span, for example app, template or db
  - span.subtype: The sub-type of the span, for example mysql (optional)
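The average duration and percentage of time spent per span type can be derived from the two self-time metrics. The following is a minimal sketch, assuming the span_breakdown documents for one transaction group have already been fetched and flattened into dictionaries with dotted keys; the helper name breakdown and the sample values are hypothetical.

```python
from collections import defaultdict


def breakdown(docs):
    """Return {(span.type, span.subtype): (avg_self_time_us, pct_of_total)}."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for doc in docs:
        # span.subtype is optional, so fall back to None when it is absent.
        key = (doc["span.type"], doc.get("span.subtype"))
        counts[key] += doc["span.self_time.count"]
        sums[key] += doc["span.self_time.sum.us"]
    total = sum(sums.values())
    result = {}
    for key in sums:
        avg = sums[key] / counts[key] if counts[key] else 0.0
        pct = 100.0 * sums[key] / total if total else 0.0
        result[key] = (avg, pct)
    return result


# Made-up sample documents, all assumed to belong to one transaction group.
docs = [
    {"span.type": "app", "span.self_time.count": 10, "span.self_time.sum.us": 30000},
    {"span.type": "db", "span.subtype": "mysql",
     "span.self_time.count": 20, "span.self_time.sum.us": 60000},
    {"span.type": "db", "span.subtype": "mysql",
     "span.self_time.count": 5, "span.self_time.sum.us": 10000},
]
stats = breakdown(docs)
print(stats)
```

Sums and counts are accumulated separately before dividing, so the average is weighted by the number of spans rather than being an average of averages.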
Transaction metrics
To power APM app visualizations, APM Server aggregates transaction events into latency distribution metrics.
- transaction.duration.histogram
  This metric measures the latency distribution of transaction groups, used to power visualizations and analytics in Elastic APM.
  These metric documents can be identified by searching for metricset.name: transaction.
  You can filter and group by these dimensions (some of which are optional, for example container.id):
  - transaction.name: The name of the transaction, for example GET /
  - transaction.type: The type of the transaction, for example request
  - transaction.result: The result of the transaction, for example HTTP 2xx
  - transaction.root: A boolean flag indicating whether the transaction is the root of a trace
  - metricset.interval: A string with the aggregation interval the metricset represents
  - event.outcome: The outcome of the transaction, for example success
  - agent.name: The name of the APM agent that instrumented the transaction, for example java
  - service.name: The name of the service that served the transaction
  - service.version: The version of the service that served the transaction
  - service.node.name: The name of the service instance that served the transaction
  - service.environment: The environment of the service that served the transaction
  - service.language.name: The language name of the service that served the transaction, for example Go
  - service.language.version: The language version of the service that served the transaction
  - service.runtime.name: The runtime name of the service that served the transaction, for example jRuby
  - service.runtime.version: The runtime version that served the transaction
  - host.hostname: The hostname of the service that served the transaction
  - host.os.platform: The platform name of the service that served the transaction, for example linux
  - container.id: The container ID of the service that served the transaction
  - kubernetes.pod.name: The name of the Kubernetes pod running the service that served the transaction
  - cloud.provider: The cloud provider hosting the service instance that served the transaction
  - cloud.region: The cloud region hosting the service instance that served the transaction
  - cloud.availability_zone: The cloud availability zone hosting the service instance that served the transaction
  - cloud.account.id: The cloud account ID of the service that served the transaction
  - cloud.account.name: The cloud account name of the service that served the transaction
  - cloud.machine.type: The cloud machine type or instance type of the service that served the transaction
  - cloud.project.id: The cloud project identifier of the service that served the transaction
  - cloud.project.name: The cloud project name of the service that served the transaction
  - cloud.service.name: The cloud service name of the service that served the transaction
  - faas.coldstart: Whether the serverless service that served the transaction had a cold start
  - faas.trigger.type: The trigger type that invoked the lambda function that served the transaction
  - faas.id: The unique identifier of the invoked serverless function
  - faas.name: The name of the lambda function
  - faas.version: The version of the lambda function
  - labels: Key-value object containing string labels set globally by the APM agents
  - numeric_labels: Key-value object containing numeric labels set globally by the APM agents
  The @timestamp field of these documents holds the start of the aggregation interval.
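To show how a latency distribution can be read back from these documents, the sketch below merges several histograms and picks a percentile. It assumes each transaction.duration.histogram value has the shape of an Elasticsearch histogram field (parallel values and counts arrays) and that the values are durations in microseconds; both are assumptions here, and the helper names and sample numbers are hypothetical.

```python
def merge_histograms(histograms):
    """Combine histograms into a sorted list of (value, count) buckets."""
    merged = {}
    for hist in histograms:
        for value, count in zip(hist["values"], hist["counts"]):
            merged[value] = merged.get(value, 0) + count
    return sorted(merged.items())


def percentile(buckets, p):
    """Return the smallest bucket value whose cumulative count covers p percent."""
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("empty histogram")
    threshold = total * p / 100.0
    seen = 0
    for value, count in buckets:
        seen += count
        if seen >= threshold:
            return value
    return buckets[-1][0]


# Two made-up histograms, for example from two aggregation intervals.
histograms = [
    {"values": [1000, 5000, 20000], "counts": [8, 1, 1]},
    {"values": [1000, 50000], "counts": [9, 1]},
]
buckets = merge_histograms(histograms)
p50 = percentile(buckets, 50)
p95 = percentile(buckets, 95)
print(p50, p95)
```

The result is only as precise as the bucket boundaries. In practice a percentiles aggregation run inside Elasticsearch would usually replace this client-side arithmetic.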
Service-destination metrics
To power APM app visualizations, APM Server aggregates span events into service-destination metrics.
- span.destination.service.response_time.count and span.destination.service.response_time.sum.us
  These metrics measure the count and total duration of requests from one service to another service. These are used to calculate the throughput and latency of requests to backend services such as databases in Service maps.
  These metric documents can be identified by searching for metricset.name: service_destination.
  You can filter and group by these dimensions:
  - span.destination.service.resource: The destination service resource, for example mysql
  - span.name: The name of the operation, for example SELECT FROM table_name
  - event.outcome: The outcome of the operation, for example success
  - agent.name: The name of the APM agent that instrumented the operation, for example java
  - service.name: The name of the service that made the request
  - service.environment: The environment of the service that made the request
  - service.target.name: The target service name, for example customer_db
  - service.target.type: The target service type, for example mysql
  - metricset.interval: A string with the aggregation interval the metricset represents
  - labels: Key-value object containing string labels set globally by the APM agents
  - numeric_labels: Key-value object containing numeric labels set globally by the APM agents
  The @timestamp field of these documents holds the start of the aggregation interval.
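Throughput and average latency follow directly from the count and sum metrics. This is a minimal sketch, assuming the service_destination documents for one destination have been fetched and flattened into dictionaries with dotted keys; the helper name destination_stats, the window length, and the sample values are hypothetical.

```python
COUNT = "span.destination.service.response_time.count"
SUM_US = "span.destination.service.response_time.sum.us"


def destination_stats(docs, window_minutes):
    """Compute requests per minute and mean latency (ms) over a time window."""
    count = sum(doc[COUNT] for doc in docs)
    total_us = sum(doc[SUM_US] for doc in docs)
    return {
        "throughput_per_min": count / window_minutes,
        # sum.us is in microseconds; divide by 1000 to report milliseconds.
        "avg_latency_ms": (total_us / count) / 1000.0 if count else 0.0,
    }


# Made-up documents covering a five-minute window for one destination.
docs = [
    {COUNT: 120, SUM_US: 600000},
    {COUNT: 180, SUM_US: 1200000},
]
stats = destination_stats(docs, window_minutes=5)
print(stats)
```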
Data streams
Metrics are stored in the following data streams:
- APM internal metrics: metrics-apm.internal-<namespace>
- APM transaction metrics: metrics-apm.transaction.<metricset.interval>-<namespace>
- APM service destination metrics: metrics-apm.service_destination.<metricset.interval>-<namespace>
- APM service transaction metrics: metrics-apm.service_transaction.<metricset.interval>-<namespace>
- APM service summary metrics: metrics-apm.service_summary.<metricset.interval>-<namespace>
- Application metrics: metrics-apm.app.<service.name>-<namespace>
See Data streams to learn more.
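The naming patterns above can be expressed as a small helper. This sketch is illustrative only: the function name metrics_data_stream is hypothetical, the namespace default and the 1m interval are assumed sample values, and any sanitization that may be applied to service names is not modeled.

```python
def metrics_data_stream(dataset, namespace="default", qualifier=None):
    """Build a name of the form metrics-apm.<dataset>[.<qualifier>]-<namespace>.

    qualifier is the metricset interval or the service name, depending on
    the dataset; leave it as None for datasets that have neither.
    """
    name = f"metrics-apm.{dataset}"
    if qualifier:
        name += f".{qualifier}"
    return f"{name}-{namespace}"


internal = metrics_data_stream("internal")
transaction = metrics_data_stream("transaction", qualifier="1m")
app = metrics_data_stream("app", namespace="production", qualifier="checkout")
print(internal, transaction, app, sep="\n")
```

A name built this way can be used as an index pattern target when a search should cover only one kind of metric.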
Example metric document
This example shows what metric documents can look like when indexed in Elasticsearch.
This example contains JVM metrics produced by the Elastic APM Java agent. It contains two related metrics: jvm.gc.time and jvm.gc.count. These are accompanied by various fields describing the environment in which the metrics were captured: service name, host name, Kubernetes pod UID, container ID, process ID, and more.
These fields make it possible to search and aggregate across various dimensions, such as by service, host, and Kubernetes pod.
{
  "container": {
    "id": "a47ed147c6ee269400f7ea4e296b3d01ec7398471bb2951907e4ea12f028bc69"
  },
  "kubernetes": {
    "pod": {
      "uid": "b0cb3baa-4619-4b82-bef5-84cc87b5f853",
      "name": "opbeans-java-7c68f48dc6-n6mzc"
    }
  },
  "process": {
    "pid": 8,
    "title": "/opt/java/openjdk/bin/java",
    "parent": { "pid": 1 }
  },
  "agent": {
    "name": "java",
    "ephemeral_id": "29a27947-ed3a-4d87-b2e6-28f7a940ec2d",
    "version": "1.25.1-SNAPSHOT.UNKNOWN"
  },
  "jvm.gc.time": 11511,
  "processor": { "name": "metric", "event": "metric" },
  "labels": { "name": "Copy" },
  "metricset.name": "app",
  "observer": {
    "hostname": "3c5ac040e8f9",
    "name": "instance-0000000002",
    "type": "apm-server",
    "version": "7.15.0"
  },
  "@timestamp": "2021-09-14T09:52:49.454Z",
  "ecs": { "version": "1.11.0" },
  "service": {
    "node": {
      "name": "a47ed147c6ee269400f7ea4e296b3d01ec7398471bb2951907e4ea12f028bc69"
    },
    "environment": "production",
    "name": "opbeans-java",
    "runtime": { "name": "Java", "version": "11.0.11" },
    "language": { "name": "Java", "version": "11.0.11" },
    "version": "2021-09-08 03:55:06"
  },
  "jvm.gc.count": 2224,
  "host": {
    "os": { "platform": "Linux" },
    "ip": ["35.240.52.17"],
    "architecture": "amd64"
  },
  "event": { "ingested": "2021-09-14T09:53:00.834276431Z" }
}
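As a sketch of aggregating across a dimension such as service name, the code below groups trimmed-down metric documents by a dotted field path and keeps the highest value of a metric per group. Taking the maximum assumes jvm.gc.count behaves as a cumulative counter, which is an assumption here; the helper names, the second service name, and the sample values are hypothetical.

```python
def get_field(doc, path):
    """Look up a dotted path, trying a literal dotted key before descending."""
    if path in doc:
        return doc[path]
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def max_by(docs, dimension, metric):
    """Return {dimension value: highest metric value seen for it}."""
    result = {}
    for doc in docs:
        key = get_field(doc, dimension)
        value = get_field(doc, metric)
        if key is None or value is None:
            continue  # Skip documents missing the dimension or the metric.
        result[key] = max(result.get(key, value), value)
    return result


# Trimmed-down, partly made-up documents in the same shape as the example.
docs = [
    {"service": {"name": "opbeans-java"}, "jvm.gc.count": 2224},
    {"service": {"name": "opbeans-java"}, "jvm.gc.count": 2230},
    {"service": {"name": "checkout"}, "jvm.gc.count": 3},
    {"service": {"name": "checkout"}},
]
latest = max_by(docs, "service.name", "jvm.gc.count")
print(latest)
```

The lookup handles both nested objects (service.name) and literal dotted keys (jvm.gc.count), since the example document mixes the two.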