Troubleshoot endpoints

This topic covers common troubleshooting issues when using Elastic Security endpoint management tools.

Endpoints

In some cases, an Unhealthy Elastic Agent status may be caused by a failure in the Elastic Defend integration policy. In this situation, the integration and any failing features are flagged on the agent details page in Fleet. Expand each section and subsection to display individual responses from the agent.

Tip

Integration policy response information is also available from the Endpoints page in the Elastic Security app (Assets → Endpoints, then click the link in the Policy status column).

Common causes of failure in the Elastic Defend integration policy include missing prerequisites or unexpected system configuration. Consult the topics below to resolve a specific error.

Tip

If the Elastic Defend integration policy is not the cause of the Unhealthy agent status, refer to Fleet troubleshooting for help with the Elastic Agent.

If Elastic Agent shows an Unhealthy status with the message Disabled due to potential system deadlock, malware protection was disabled on the Elastic Defend integration policy because of errors while monitoring a Linux host.

You can resolve the issue by configuring the policy's advanced settings related to fanotify, a Linux feature that monitors file system events. By default, Elastic Defend works with fanotify to monitor specific file system types that Elastic has tested for compatibility, and ignores other unknown file system types.

If your network includes nonstandard, proprietary, or otherwise unrecognized Linux file systems that cause errors while being monitored, you can configure Elastic Defend to ignore those file systems. This allows Elastic Defend to resume monitoring and protecting the hosts on the integration policy.

Caution

Ignoring file systems can create gaps in your security coverage. Use additional security layers for any file systems ignored by Elastic Defend.

To resolve the potential system deadlock error:

  1. Go to Assets → Policies, then click a policy's name.

  2. Scroll to the bottom of the policy and click Show advanced settings.

  3. In the setting linux.advanced.fanotify.ignored_filesystems, enter a comma-separated list of file system names to ignore, as they appear in /proc/filesystems (for example: ext4,tmpfs). Refer to Find file system names for more on determining the file system names.

  4. Click Save.

    Once you save the policy, malware protection is re-enabled.
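As a rough aid for step 3, the following sketch reads text in the /proc/filesystems format and builds the comma-separated value. The function names parse_filesystem_names and build_ignore_list are hypothetical helpers, not part of Elastic Defend, and the sketch assumes each line holds an optional nodev flag followed by the file system name.

```python
from pathlib import Path


def parse_filesystem_names(proc_filesystems_text: str) -> list:
    """Return the file system names listed in /proc/filesystems-style text.

    Each non-empty line holds an optional "nodev" flag followed by the
    file system name, so the name is the last whitespace-separated field.
    """
    names = []
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields:
            names.append(fields[-1])
    return names


def build_ignore_list(candidates: list, available: list) -> str:
    """Join the candidates that the kernel actually reports, comma-separated."""
    known = set(available)
    return ",".join(name for name in candidates if name in known)


if __name__ == "__main__":
    proc_path = Path("/proc/filesystems")  # present on Linux hosts only
    if proc_path.exists():
        available = parse_filesystem_names(proc_path.read_text())
        # "ext4" and "tmpfs" mirror the example in step 3; substitute your own.
        print(build_ignore_list(["ext4", "tmpfs"], available))
```

Filtering the candidates against what the kernel reports guards against typos, since a misspelled name entered in the advanced setting would not match any file system.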

If you encounter a “Required transform failed” notice on the Endpoints page, you can usually resolve the issue by restarting the transform. Refer to Transforming data for more information about transforms.

To restart a transform that’s not running:

  1. Go to Project settings → Management → Transforms.

  2. Enter endpoint.metadata in the search box to find the transforms for Elastic Defend.

  3. Click the Actions menu and do one of the following for each transform, depending on the value in the Status column:

    • stopped: Select Start to restart the transform.
    • failed: Select Stop to first stop the transform, and then select Start to restart it.

  4. On the confirmation message that displays, click Start to restart the transform.

  5. The transform’s status changes to started. If it doesn't change, refresh the page.
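The status-dependent logic in step 3 can be sketched as a small function that maps a transform's status to the calls needed to restart it. This is an illustration, not the documented procedure: it assumes the Elasticsearch start and stop transform APIs (POST _transform/<id>/_start and POST _transform/<id>/_stop, with force=true for a failed transform) are available in your deployment, and restart_plan and the transform ID shown are hypothetical.

```python
def restart_plan(transform_id: str, status: str) -> list:
    """Return the (HTTP method, path) calls that mirror the UI steps."""
    start = ("POST", f"_transform/{transform_id}/_start")
    if status == "stopped":
        return [start]
    if status == "failed":
        # A failed transform is stopped first, then started again.
        stop = ("POST", f"_transform/{transform_id}/_stop?force=true")
        return [stop, start]
    return []  # any other status needs no restart


if __name__ == "__main__":
    # Hypothetical transform ID, used only to show the output shape.
    for method, path in restart_plan("example-endpoint-metadata-transform", "failed"):
        print(method, path)
```

The function only builds the plan and sends nothing, so you can review the calls before running them against a cluster.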

After Elastic Agent installs Endpoint, Endpoint connects to Elastic Agent over a local relay connection to report its health status and to receive policy updates and response action requests. If that connection cannot be established, the Elastic Defend integration puts Elastic Agent in an Unhealthy status, and Endpoint won't operate properly.

Identify if the issue is happening

You can identify if this issue is happening in the following ways:

  • Run Elastic Agent's status command:

    • sudo /opt/Elastic/Agent/elastic-agent status (Linux)
    • sudo /Library/Elastic/Agent/elastic-agent status (macOS)
    • c:\Program Files\Elastic\Agent\elastic-agent.exe status (Windows)

    If the status result for endpoint-security reports that Endpoint has missed check-ins or that localhost:6788 cannot be bound, this problem might be occurring.

  • If the problem starts happening right after installing Endpoint, check the value of fleet.agent.id in the following file:

    • /opt/Elastic/Endpoint/elastic-endpoint.yaml (Linux)
    • /Library/Elastic/Endpoint/elastic-endpoint.yaml (macOS)
    • c:\Program Files\Elastic\Endpoint\elastic-endpoint.yaml (Windows)

    If the value of fleet.agent.id is 00000000-0000-0000-0000-000000000000, this indicates this problem is occurring.

    Note

    If this problem starts after Endpoint has already been installed and working properly, fleet.agent.id will already contain a real agent ID, so a nonzero value does not rule the problem out.
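The fleet.agent.id check can be scripted with a simple text search. The sketch below is a heuristic under stated assumptions: it treats elastic-endpoint.yaml as plain text and only looks for the all-zero ID anywhere in it, without parsing the YAML structure, and has_placeholder_agent_id is a hypothetical helper name.

```python
PLACEHOLDER_AGENT_ID = "00000000-0000-0000-0000-000000000000"


def has_placeholder_agent_id(config_text: str) -> bool:
    """True if the all-zero agent ID appears anywhere in the config text.

    This does not confirm which key holds the value; open the file and
    check fleet.agent.id by hand before drawing conclusions.
    """
    return PLACEHOLDER_AGENT_ID in config_text
```

To use it, read the file for your platform (for example /opt/Elastic/Endpoint/elastic-endpoint.yaml on Linux, which may require elevated privileges) and pass its contents to the function. A False result does not rule the problem out, for the reason given in the note above.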

Examine Endpoint logs

If you've confirmed that the issue is happening, you can look at Endpoint log messages to identify the cause:

  • “Failed to find connection to validate. Is Agent listening on 127.0.0.1:6788?” or “Failed to validate connection. Is Agent running as root/admin?” means that Endpoint is not able to create an initial connection to Elastic Agent over port 6788.
  • “Unable to make GRPC connection in deadline(60s). Fetching connection info again” means that Endpoint's original connection to Elastic Agent over port 6788 worked, but the connection over port 6789 is failing.
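When scanning a large log, the mapping from message to failing port can be sketched as below. The matched strings are the messages quoted in this section. The function name failing_port is hypothetical, and the sketch assumes each message appears on a single log line.

```python
from typing import Optional

# Messages that indicate the initial connection over port 6788 failed.
INITIAL_CONNECTION_MESSAGES = (
    "Failed to find connection to validate",
    "Failed to validate connection",
)
# Message that indicates the follow-up connection over port 6789 failed.
GRPC_MESSAGE = "Unable to make GRPC connection in deadline"


def failing_port(log_line: str) -> Optional[int]:
    """Return the port the log line points to, or None if it is unrelated."""
    if any(message in log_line for message in INITIAL_CONNECTION_MESSAGES):
        return 6788
    if GRPC_MESSAGE in log_line:
        return 6789
    return None
```

Applying the function to each line of an Endpoint log and collecting the results that are not None gives a quick view of which of the two connections is failing.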

Resolve the issue

To debug and resolve the issue, follow these steps:

  1. Examine the Endpoint diagnostics file named analysis.txt, which contains information about what may cause this issue. Elastic Agent diagnostics automatically include Endpoint diagnostics.

  2. Make sure nothing else on your device is listening on ports 6788 or 6789 by running:

    • sudo netstat -anp --tcp (Linux)
    • sudo netstat -an -f inet (macOS)
    • netstat -an (Windows)

  3. Make sure localhost can be resolved to 127.0.0.1 by running:

    • ping -4 -c 1 localhost (Linux)
    • ping -c 1 localhost (macOS)
    • ping -4 localhost (Windows)
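Steps 2 and 3 can also be approximated with a cross-platform script. The following is a minimal sketch using only Python's standard socket module. The helper names resolves_to_loopback and is_listening are hypothetical, and the port check only tells you whether something accepts TCP connections on the port, not which process owns it (use the netstat commands above for that).

```python
import socket


def resolves_to_loopback(hostname: str = "localhost") -> bool:
    """True if any IPv4 address returned for hostname is 127.0.0.1."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    except socket.gaierror:
        return False  # the name could not be resolved at all
    return any(info[4][0] == "127.0.0.1" for info in infos)


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("localhost resolves to 127.0.0.1:", resolves_to_loopback())
    for port in (6788, 6789):
        print(f"port {port} accepting connections:", is_listening(port))
```

Because Elastic Agent itself is expected to use these ports while running, a True result for a port is only a starting point. Confirm with netstat that the listener is Elastic Agent and not another program.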
