Common problems

This section describes common problems you might encounter with APM Server.

No data is indexed

If no data shows up in Elasticsearch, first check that the APM components are properly connected.

To verify that the APM Server configuration is valid and that the server can connect to the configured output (Elasticsearch by default), run the following commands:

apm-server test config
apm-server test output

To see if the agent can connect to the APM Server, send requests to the instrumented service and look for lines containing [request] in the APM Server logs.

If no requests are logged, SSL might be misconfigured or the host might be wrong. In particular, if you are using Docker, make sure that APM Server binds to the right interface (for example, set apm-server.host = 0.0.0.0:8200 to match any IP) and set the SERVER_URL setting in the agent accordingly.

If you see requests reaching the APM Server but they are not accepted (response code other than 202), use the response code to narrow down the possible causes (see sections below).

Another reason for data not showing up is that the agent is not auto-instrumenting something you expected it to. Check the agent documentation for details on what is automatically instrumented.
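The response codes covered in the sections below can be summarized in a small lookup helper. This is only a sketch: explain_status is a hypothetical function name, and the messages paraphrase this page rather than reproduce APM Server output.

```shell
#!/bin/sh
# Hypothetical helper: map an intake response code to the likely cause
# described in this troubleshooting section.
explain_status() {
  case "$1" in
    202) echo "accepted" ;;
    400) echo "decoding or validation error, or event too large" ;;
    401) echo "invalid secret token" ;;
    403) echo "forbidden: RUM disabled or origin not allowed" ;;
    413) echo "request body too large" ;;
    503) echo "queue is full or request timed out" ;;
    *)   echo "not covered in this section" ;;
  esac
}

# Example: explain a code taken from the APM Server logs.
explain_status 401
```

The status code itself would typically be read from the APM Server request logs or from the agent logs.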

HTTP 400: Data decoding error / Data validation error

The most likely cause for this is that you are using incompatible versions of agent and APM Server. For instance, APM Server 6.2.0 changed the Intake API spec and requires a minimum version of each agent.

View the agent/server compatibility matrix for more information.

HTTP 400: Event too large

Note

Version 6.5 of the APM Server introduced a new intake API (v2). This error only applies to users using v2 of the intake API. You can learn more about this change in the intake API changes documentation.

APM agents communicate with the APM Server by sending events in an HTTP request. Each event is sent as its own line in the HTTP request body. If events are too large, consider increasing the max_event_size setting in the APM Server and adjusting the relevant settings in the agent.
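Because each event occupies one line of the request body, the size of the largest event in a captured payload can be measured per line. The following is a minimal sketch, assuming you have saved a request body to a file. The function name largest_event_bytes and the sample payload are illustrative only, not a complete intake request.

```shell
#!/bin/sh
# Print the size in bytes of the longest line (that is, the largest event)
# in a newline-delimited payload file.
largest_event_bytes() {
  # LC_ALL=C makes awk's length() count bytes rather than characters.
  LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# Illustrative payload: two made-up events, one per line.
payload=$(mktemp)
printf '%s\n' '{"metadata":{}}' '{"transaction":{"name":"GET /"}}' > "$payload"
largest_event_bytes "$payload"
rm -f "$payload"
```

Compare the printed value with the max_event_size configured in your APM Server to judge how much headroom an agent's events have.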

HTTP 401: Invalid token

The secret token in the request header doesn’t match the one configured in the APM Server.

HTTP 403: Forbidden request

Either you are sending requests to a RUM endpoint without RUM enabled, or a request is coming from an origin not whitelisted in apm-server.rum.allow_origins. See the RUM configuration.

HTTP 413: Request body too large

Note

Version 6.5 of the APM Server introduced a new intake API (v2). This error only applies to users using v1 of the intake API. You can learn more about this change in the intake API changes documentation.

The agent is collecting too much data and sending it all at once. Consider increasing the apm-server.max_unzipped_size setting in the APM Server, and adjusting relevant settings in the agent.

HTTP 503: Queue is full

APM Server has an internal queue that helps to:

  • Buffer data temporarily if Elasticsearch is intermittently unavailable
  • Handle sudden large spikes of data
  • Send documents to Elasticsearch in bulk, instead of individually

When the queue has reached the maximum size, APM Server returns an HTTP 503 status with the message "Queue is full".

In v1 of the intake API, a full queue generally means that the agents collect more data than APM Server is able to process. This might happen when APM Server is not configured properly for the size of your Elasticsearch cluster, or because your Elasticsearch cluster is underpowered or not configured properly for the given workload.

The queue can also fill up if Elasticsearch runs out of disk space.

If the APM Server only returns 503 responses, it indicates that an Elasticsearch disk might be full. If the APM Server returns interleaved 503 and 202 responses, it indicates that the APM Server can’t process that much data.
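The rule of thumb above can be applied to a list of observed response codes. This is a rough sketch: diagnose_503 is a hypothetical helper that expects one status code per line on standard input, and how you collect those codes (for example, from agent or server logs) is left to you.

```shell
#!/bin/sh
# Hypothetical helper: read one HTTP status code per line and apply the
# "only 503" versus "interleaved 503 and 202" rule of thumb.
diagnose_503() {
  awk '
    $1 == 503 { full++ }
    $1 == 202 { ok++ }
    END {
      if (NR > 0 && full == NR)
        print "only 503: an Elasticsearch disk might be full"
      else if (full > 0 && ok > 0)
        print "503 and 202 interleaved: APM Server cannot process that much data"
      else
        print "no clear 503 pattern"
    }'
}

# Example with a made-up sequence of response codes.
printf '%s\n' 202 503 202 503 | diagnose_503
```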

You have a few options to solve this problem:

HTTP 503: Request timed out waiting to be processed

This happens when APM Server exceeds the maximum number of requests that it can process concurrently. This limit is determined by the apm-server.concurrent_requests configuration parameter (deprecated in 6.5).

To alleviate this problem, you can try to:

SSL client fails to connect

The target host might be unreachable, or the certificate might not be valid. To resolve the issue:

  • Make sure that the server process on the target host is running and that you can connect to it. First, try to ping the target host to verify that you can reach it from the host running APM Server. Then use either nc or telnet to make sure that the port is available. For example:

    ping <hostname or IP>
    telnet <hostname or IP> 5044
  • Verify that the certificate is valid and that the hostname and IP match.
  • Use OpenSSL to test connectivity to the target server and diagnose problems. See the OpenSSL documentation for more info.

Common SSL-Related Errors and Resolutions

Here are some common errors and ways to fix them:

x509: cannot validate certificate for <IP address> because it doesn’t contain any IP SANs

This happens because your certificate is only valid for the hostname present in the Subject field.

To resolve this problem, try one of these solutions:

  • Create a DNS entry for the hostname mapping it to the server’s IP.
  • Create an entry in /etc/hosts for the hostname. Or on Windows add an entry to C:\Windows\System32\drivers\etc\hosts.
  • Re-create the server certificate and add a SubjectAltName (SAN) for the IP address of the server. This makes the server’s certificate valid for both the hostname and the IP address.

getsockopt: no route to host

This is not an SSL problem. It’s a networking problem. Make sure the two hosts can communicate.

getsockopt: connection refused

This is not an SSL problem. Make sure that Logstash is running and that there is no firewall blocking the traffic.

No connection could be made because the target machine actively refused it

A firewall is refusing the connection. Check if a firewall is blocking the traffic on the client, the network, or the destination host.

Field limit exceeded

When adding too many distinct tag keys on a transaction or span, you risk creating a mapping explosion.

For example, avoid using user-specified data, like URL parameters, as a tag key. Likewise, using the current timestamp or a user ID as a tag key is not a good idea. However, tag values with a high cardinality are not a problem. Just try to keep the number of distinct tag keys to a minimum.

The symptom of a mapping explosion is that transactions and spans are not indexed anymore after a certain time. Usually, on the next day, the spans and transactions will be indexed again because a new index is created each day. But as soon as the field limit is reached, indexing stops again.

In the agent logs, you won’t see any sign of failure, because the APM Server sends the data it receives from the agents to Elasticsearch asynchronously. However, the APM Server and Elasticsearch log a warning like this:

{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in index [apm-7.0.0-transaction-2017.05.30] has been exceeded\"}
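To find out which indices have hit the limit, warnings of this shape can be filtered out of a log stream. The sketch below assumes the message format shown above. field_limit_hits is a hypothetical helper name, and the location of your log file is not assumed here.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on standard input and print
# "<index> <limit>" for each field-limit warning found.
field_limit_hits() {
  sed -n 's/.*Limit of total fields \[\([0-9][0-9]*\)\] in index \[\([^]]*\)\].*/\2 \1/p'
}

# Example using the warning shown above.
printf '%s\n' '{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in index [apm-7.0.0-transaction-2017.05.30] has been exceeded\"}' \
  | field_limit_hits
# prints: apm-7.0.0-transaction-2017.05.30 1000
```

Lines that do not contain the warning produce no output, so the helper can be fed a whole log file.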

I/O Timeout

I/O Timeouts can occur when your timeout settings across the stack are not configured correctly, especially when using a load balancer.

You may see an error like the one below in the agent logs, and/or a similar error on the APM Server side:

[ElasticAPM] APM Server responded with an error:
"read tcp 123.34.22.313:8200->123.34.22.40:41602: i/o timeout"

To fix this, ensure that timeouts increase at each hop, from the APM agent, through your load balancer, to the APM Server.

By default, the agent timeouts are set at 10 seconds, and the server timeout is set at 30 seconds. Your load balancer should be set somewhere between these numbers.

For example:

APM Agent --> Load Balancer  --> APM Server
   10s            15s               30s
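The ordering in this example can be expressed as a quick sanity check. This is a minimal sketch: check_timeouts is a hypothetical helper that takes the three timeouts in whole seconds, and reading the real values from your agent, load balancer, and server configuration is up to you.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if agent < load balancer < server.
# Arguments are timeouts in whole seconds.
check_timeouts() {
  agent=$1
  lb=$2
  server=$3
  if [ "$agent" -lt "$lb" ] && [ "$lb" -lt "$server" ]; then
    echo "ok"
  else
    echo "misordered"
    return 1
  fi
}

# Example with the values from the diagram above.
check_timeouts 10 15 30
```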