Common problems

This documentation refers to the standalone (legacy) method of running APM Server. This method will be deprecated and removed in a future release. Consider upgrading to the Elastic APM integration.

This section describes common problems you might encounter with APM Server.

No data is indexed

If no data shows up in Elasticsearch, first check that the APM components are properly connected.

To verify that the APM Server configuration is valid and that the server can connect to the configured output (Elasticsearch by default), run the following commands:

apm-server test config
apm-server test output

To see if the agent can connect to the APM Server, send requests to the instrumented service and look for lines containing [request] in the APM Server logs.
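
For example, if APM Server logs to a file, you can search the logs with grep (the path below assumes a Linux package install; it varies by installation method and logging configuration):

grep "\[request\]" /var/log/apm-server/apm-server*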

If no requests are logged, SSL might be misconfigured or the host might be wrong. In particular, if you are using Docker, make sure APM Server binds to an interface that is reachable from outside the container (for example, set apm-server.host: 0.0.0.0:8200 to match any IP) and set the agent’s SERVER_URL setting accordingly.
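
A minimal sketch of the two sides, assuming an agent that reads the ELASTIC_APM_SERVER_URL environment variable (most agents do, but check your agent’s configuration reference). In apm-server.yml:

apm-server:
  host: "0.0.0.0:8200"

And on the agent side:

export ELASTIC_APM_SERVER_URL=http://<apm-server-host>:8200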

If you see requests arriving at the APM Server but they are not accepted (the response code is anything other than 202), use the response code to narrow down the possible causes (see the sections below).

Another reason for data not showing up is that the agent is not auto-instrumenting something you expected it to. Check the agent documentation for details on what is automatically instrumented.

APM Server currently relies on Elasticsearch to create indices that do not exist. As a result, Elasticsearch must be configured to allow automatic index creation for APM indices.
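
If your cluster restricts automatic index creation with the action.auto_create_index setting, the APM pattern must be on the allow-list. A sketch for elasticsearch.yml (merge this with any patterns you already allow):

action.auto_create_index: "apm-*"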

Data is indexed but doesn’t appear in the APM UI

The APM app relies on index mappings to query and display data. If your APM data isn’t showing up in the APM app, but is elsewhere in Kibana, like the Discover app, you may have a missing index mapping.

You can determine whether a field was mapped correctly with the _mapping API. For example, run the following command in the Kibana console to display the data type of the service.name field:

GET apm-*/_mapping/field/service.name

If mapping.name.type is "text", your APM indices were not set up correctly:

"mappings" : {
   "service.name" : {
      "full_name" : "service.name",
      "mapping" : {
         "name" : {
            "type" : "text", 
            "fields" : {
               "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
               }
            }
         }
      }
   }
}

The service.name mapping.name.type would be "keyword" if this field had been set up correctly.
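
For comparison, a correctly set up index returns a response shaped like this (an abbreviated sketch based on the expected keyword type):

"mappings" : {
   "service.name" : {
      "full_name" : "service.name",
      "mapping" : {
         "name" : {
            "type" : "keyword"
         }
      }
   }
}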

To fix this problem, you must delete and recreate your APM indices, as index templates cannot be applied retroactively:

  1. Stop your APM Server(s) so they are not writing any new documents.
  2. Delete your existing apm-* indices. In the Kibana console, run:

    DELETE apm-*

    Alternatively, you can use the Index Management page in Kibana. Select all apm-* indices and navigate to Manage Indices > Delete Indices.

  3. Starting in version 8.0.0, Fleet uses the APM integration to set up and manage APM index templates. Install the APM integration by following these steps:

    An internet connection is required to install the APM integration. If your environment has network traffic restrictions, there are ways to work around this requirement. See Air-gapped environments for more information.

    1. Open Kibana and select Add integrations > Elastic APM.
    2. Click APM integration.
    3. Click Add Elastic APM.
    4. Click Save and continue.
    5. Click Add Elastic Agent later. You do not need to run an Elastic Agent to complete the setup.
  4. Start APM Server.
  5. Verify the correct index templates were installed. In the Kibana console, run:

    GET _template/apm-*

    Alternatively, you can use the Index Management page in Kibana. On the Index Templates tab, search for apm under Legacy Index Templates.

HTTP 400: Data decoding error / Data validation error

The most likely cause for this error is using incompatible versions of APM agent and APM Server. See the agent/server compatibility matrix for more information.

HTTP 400: Event too large

APM agents communicate with the APM Server by sending events in an HTTP request, with each event on its own line in the request body (newline-delimited JSON). If events are too large, consider increasing the max_event_size setting in the APM Server and adjusting the relevant settings in the agent.
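
For example, to raise the server-side limit in apm-server.yml (614400 bytes is an arbitrary illustrative value; check your version’s reference for the default and for the matching agent settings):

apm-server:
  max_event_size: 614400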

HTTP 401: Invalid token

The secret token in the request header doesn’t match the token configured in the APM Server.
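
The token must be identical on both sides. A sketch, assuming a 7.14+ server (older releases use apm-server.secret_token at the top level) and an agent that reads ELASTIC_APM_SECRET_TOKEN. In apm-server.yml:

apm-server:
  auth:
    secret_token: "<your-token>"

And on the agent side:

export ELASTIC_APM_SECRET_TOKEN=<your-token>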

HTTP 403: Forbidden request

Either you are sending requests to a RUM endpoint without RUM enabled, or a request is coming from an origin not specified in apm-server.rum.allow_origins. See the RUM configuration.
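
For example, in apm-server.yml (a sketch; list only the origins your frontend is actually served from):

apm-server:
  rum:
    enabled: true
    allow_origins: ["https://my-frontend.example.com"]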

HTTP 503: Request timed out waiting to be processed

This happens when APM Server exceeds the maximum number of requests that it can process concurrently.

To alleviate this problem, you can try to:

SSL client fails to connect

The target host running APM Server might be unreachable, or the certificate may not be valid. To resolve the issue:

  • Make sure that the APM Server process on the target host is running and that you can connect to it. First, try to ping the target host to verify that you can reach it from the client host. Then use either nc or telnet to make sure that the port is available. For example:

    ping <hostname or IP>
    telnet <hostname or IP> 8200
  • Verify that the certificate is valid and that the hostname and IP match.
  • Use OpenSSL to test connectivity to the target server and diagnose problems. See the OpenSSL documentation for more info.
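
    For example, the following checks the TLS handshake and prints the certificate chain the server presents (a sketch; 8200 is APM Server’s default port, so replace the host, port, and server name with your values):

    openssl s_client -connect <hostname or IP>:8200 -servername <hostname> -showcerts
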
Common SSL-Related Errors and Resolutions

Here are some common errors and ways to fix them:

x509: cannot validate certificate for <IP address> because it doesn’t contain any IP SANs

This happens because your certificate is only valid for the hostname present in the Subject field.

To resolve this problem, try one of these solutions:

  • Create a DNS entry for the hostname, mapping it to the server’s IP.
  • Create an entry in /etc/hosts for the hostname. Or, on Windows, add an entry to C:\Windows\System32\drivers\etc\hosts.
  • Re-create the server certificate and add a Subject Alternative Name (SAN) for the IP address of the server. This makes the server’s certificate valid for both the hostname and the IP address.
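
    For example, with OpenSSL 1.1.1 or later (required for -addext), you can generate a self-signed certificate that is valid for both the hostname and the IP address (a sketch; adjust the key size, lifetime, and subject to your policy):

    openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
      -keyout server.key -out server.crt \
      -subj "/CN=<hostname>" \
      -addext "subjectAltName=DNS:<hostname>,IP:<IP address>"
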
getsockopt: no route to host

This is not an SSL problem. It’s a networking problem. Make sure the two hosts can communicate.

getsockopt: connection refused

This is not an SSL problem. Make sure that APM Server is running and that there is no firewall blocking the traffic.

No connection could be made because the target machine actively refused it

A firewall is refusing the connection. Check if a firewall is blocking the traffic on the client, the network, or the destination host.

Field limit exceeded

When adding too many distinct tag keys on a transaction or span, you risk creating a mapping explosion.

For example, avoid using user-specified data, like URL parameters, as tag keys. Likewise, using the current timestamp or a user ID as a tag key is not a good idea. Tag values with high cardinality, however, are not a problem. Just keep the number of distinct tag keys to a minimum.

The symptom of a mapping explosion is that transactions and spans are not indexed anymore after a certain time. Usually, on the next day, the spans and transactions will be indexed again because a new index is created each day. But as soon as the field limit is reached, indexing stops again.

In the agent logs, you won’t see any sign of failure, because the APM Server forwards the data it receives from agents to Elasticsearch asynchronously. However, the APM Server and Elasticsearch log a warning like this:

{"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [apm-7.0.0-transaction-2017.05.30] has been exceeded"}
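
The error names the relevant setting, index.mapping.total_fields.limit (1000 by default). You can check what an index is using in the Kibana console; include_defaults=true makes the response show the default when the setting was never changed explicitly:

GET apm-*/_settings/index.mapping.total_fields.limit?include_defaults=true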

I/O Timeout

I/O Timeouts can occur when your timeout settings across the stack are not configured correctly, especially when using a load balancer.

You may see an error like the one below in the agent logs, and/or a similar error on the APM Server side:

[ElasticAPM] APM Server responded with an error:
"read tcp 123.34.22.313:8200->123.34.22.40:41602: i/o timeout"

To fix this, ensure that timeouts increase at each hop, from the APM agent, through your load balancer, to the APM Server.

By default, agent timeouts are set at 10 seconds and the server timeout is set at 30 seconds. Your load balancer’s timeout should fall somewhere between these numbers.

For example:

APM agent --> Load Balancer  --> APM Server
   10s            15s               30s
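
Concretely, that is one setting per hop. The sketch below uses ELASTIC_APM_SERVER_TIMEOUT, which several (not all) agents read, and the server’s read_timeout and write_timeout options; the load balancer’s idle-timeout setting depends on your product, so check its documentation:

# APM agent (environment variable; the name varies by agent)
ELASTIC_APM_SERVER_TIMEOUT=10s

# apm-server.yml
apm-server:
  read_timeout: 30s
  write_timeout: 30s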

What happens when APM Server or Elasticsearch is down?

If Elasticsearch is down

APM Server does not have an internal queue to buffer requests, but instead leverages an HTTP request timeout to act as back-pressure. If Elasticsearch goes down, the APM Server will eventually deny incoming requests. Both the APM Server and APM agent(s) will issue logs accordingly.

If APM Server is down

Some agents have internal queues or buffers that will temporarily store data if the APM Server goes down. As a general rule of thumb, queues fill up quickly. Assume data will be lost if APM Server goes down. Adjusting these queues/buffers can increase the agent’s overhead, so use caution when updating default values.

  • Go agent - Circular buffer with configurable size: ELASTIC_APM_BUFFER_SIZE.
  • Java agent - Internal buffer with configurable size: max_queue_size.
  • Node.js agent - No internal queue. Data is lost.
  • PHP agent - No internal queue. Data is lost.
  • Python agent - Internal Transaction queue with configurable size and time between flushes.
  • Ruby agent - Internal queue with configurable size: api_buffer_size.
  • RUM agent - No internal queue. Data is lost.
  • .NET agent - No internal queue. Data is lost.

/api/apm/settings/agent-configuration/search errors

If you’re instrumenting and starting a lot of services at the same time or using a very large number of service or environment names, you may see the following APM Server logs related to APM agent central configuration:

  • .../api/apm/settings/agent-configuration/search: context canceled
  • .../api/apm/settings/agent-configuration/search: net/http: TLS handshake timeout

There are two possible causes:

  1. Kibana is overwhelmed by the number of requests coming from APM Server.
  2. Elasticsearch can’t reply quickly enough to Kibana.

For cause #1, try one or more of the following:

For cause #2, investigate why Elasticsearch is not responding in a timely manner. Kibana’s queries to Elasticsearch are simple, so it may just be that Elasticsearch is unhealthy. If that’s not the problem, you may need to use Search Slow Log to investigate your Elasticsearch logs.
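
If you do need the slow log, you can enable it on the index that backs central configuration (.apm-agent-configuration in 7.x; verify the index name in your deployment). In the Kibana console:

PUT .apm-agent-configuration/_settings
{
  "index.search.slowlog.threshold.query.warn": "1s"
}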

To avoid this problem entirely, we recommend upgrading to the Elastic APM integration.