Troubleshooting EDOT Cloud Forwarder for Azure
This page helps you diagnose and resolve issues with EDOT Cloud Forwarder for Azure when data is not being forwarded to Elasticsearch as expected. Troubleshooting relies on native Azure platform features, specifically Azure Monitor and Application Insights.
The following metrics are crucial for diagnosing issues, as they focus on identifying failures or possible performance issues in the processing pipeline:
| Indicator | Source | Why it matters |
|---|---|---|
| IncomingMessages and OutgoingMessages | Event Hub | A significant gap between the number of incoming and outgoing messages indicates the Function App is not consuming messages at the expected rate. |
| Function Execution Failures | Application Insights > Metrics | Non-zero values indicate the Function App is failing to process messages. |
| AverageMemoryWorkingSet | Function App | High usage precedes OOM errors, signaling resource limits or memory leaks before they cause failures. |
| Failures view | Application Insights > Investigate | A rising error rate points to application-level defects that need immediate attention. |
| Error container blobs | Storage account | Presence of logs-error-container or metrics-error-container containers confirms that deliveries to Elasticsearch are failing. |
As a first step, review the logs and metrics provided by the Azure platform. The following EDOT Cloud Forwarder components provide useful information about data processing.
Go to your EDOT Cloud Forwarder resource group and find the Event Hubs Namespace resource to verify the health of the entry point:
- Throughput: Compare the number of Incoming Messages with Outgoing Messages. A significant gap suggests the Function App is not consuming messages at the expected rate.
- Throttling: Monitor Throttled Requests. If you exceed your provisioned Throughput Units (TUs), the Event Hub rejects incoming telemetry from your sources. A sample metrics query follows this list.
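
If you route Event Hub namespace metrics to a Log Analytics workspace through a diagnostic setting (an optional configuration that the default deployment does not create), a query like the following sketch compares incoming and outgoing message counts over time. The metric names and the 15-minute bin size are assumptions to adjust for your environment:

```kql
// Sketch: compare Event Hub message flow over time.
// Assumes namespace metrics are exported to a Log Analytics workspace
// through a diagnostic setting; adjust metric names and bin size as needed.
AzureMetrics
| where MetricName in ("IncomingMessages", "OutgoingMessages")
| summarize total = sum(Total) by MetricName, bin(TimeGenerated, 15m)
| order by TimeGenerated asc
```

A gap between the two series that widens over consecutive intervals points to the Function App falling behind.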
Navigate to your EDOT Cloud Forwarder resource group and find the Function App resource.
Invocations: Go to Functions > [Name] > Invocations to verify if the trigger is firing. Here you can see overall counters of successful and unsuccessful invocations. To access data about the function execution over time, check the Metrics tab.
Note: This view can have a reporting delay. Use it for auditing past trends rather than real-time troubleshooting. For immediate feedback, use Live Metrics.
Live Metrics: Located under Monitoring > Application Insights > Investigate > Live Metrics. Use this for real-time debugging during a suspected outage. It shows instantaneous CPU/Memory spikes, dependency failures, incoming request rate, duration, and number of failed requests.
Failures: Located under Investigate > Failures in Application Insights. It provides a list of all failed operations with the option to drill into each one.
Logs: Located under Logs in Application Insights. This provides a searchable history of all traces and exceptions across every instance of your Function App.
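
As a starting point, the following sketch summarizes recent exceptions across all instances, assuming the default Application Insights tables; the 24-hour window is an arbitrary choice:

```kql
// Sketch: summarize recent exceptions across all Function App instances.
// Assumes the default Application Insights schema; adjust the time range as needed.
exceptions
| where timestamp > ago(24h)
| summarize occurrences = count() by type, outerMessage, cloud_RoleInstance
| order by occurrences desc
```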
Navigate to your EDOT Cloud Forwarder resource group and find the Storage account resource.
If EDOT Cloud Forwarder is unable to process or deliver signals to Elasticsearch, it persists the batch as a blob in logs-error-container or metrics-error-container containers. These containers are created only upon the first failure.
EDOT Cloud Forwarder does not yet provide built-in tools to reprocess these messages automatically.
Although the Blob Count metric exists, its utility is limited because it aggregates hourly.
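
If you need finer-grained visibility than the hourly Blob Count metric, one option is to enable blob resource logs on the storage account through a diagnostic setting sent to a Log Analytics workspace (the default deployment does not configure this). With that in place, a sketch like the following surfaces new error blobs as they are written:

```kql
// Sketch: detect new blobs written to the error containers.
// Assumes blob resource logs (StorageBlobLogs) are enabled through a diagnostic setting.
StorageBlobLogs
| where OperationName == "PutBlob"
| where Uri has "logs-error-container" or Uri has "metrics-error-container"
| summarize newErrorBlobs = count() by bin(TimeGenerated, 15m)
| order by TimeGenerated asc
```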
This section helps you diagnose and resolve issues when EDOT Cloud Forwarder for Azure is deployed but telemetry data is not appearing in Elasticsearch. Symptoms include the following:
- Logs or metrics are not appearing in Elasticsearch or Kibana dashboards.
- The storage account contains unprocessed messages in error containers (`logs-error-container` or `metrics-error-container`).
- Function App execution logs in Application Insights show errors.
- Function App Invocations > Error count is non-zero or increasing.
- The Event Hub metrics show messages are being received but not processed.
Using the key indicators and general troubleshooting guidance above, check the following to identify the root cause:
- Verify Event Hub is receiving data: Check the Incoming Messages metric. If no messages are arriving, the issue is with the diagnostic setting or the Data Collection Rule configuration.
- Compare Incoming and Outgoing Messages: A significant gap between the number of incoming and outgoing messages confirms the Function App is not consuming messages at the expected rate.
- Check Function App logs: Review execution logs in Application Insights for errors related to authentication, network connectivity, or data processing.
- Inspect error containers: Check if failed messages are accumulating in `logs-error-container` or `metrics-error-container`.
- Verify the OTLP endpoint configuration: Confirm that the `ELASTICSEARCH_OTLP_ENDPOINT` environment variable is correct and that the endpoint is accessible from the Function App.
  - In the Azure portal, navigate to the Function App.
  - Go to Settings → Environment variables.
  - Verify the OTLP endpoint value matches your Elastic Cloud Serverless or Elastic Cloud Hosted endpoint.
- Verify the API key: Confirm that the `ELASTICSEARCH_API_KEY` is valid and not expired.
  - Check that the API key value was trimmed correctly (only `MYKEYVALUE...`, not `Authorization=ApiKey MYKEYVALUE...`).
  - Verify the API key has not expired in your Elastic Cloud deployment or Serverless project.
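
  Authentication problems usually surface in the collector logs. As a rough heuristic (the exact message text varies between versions), you can search the traces for common authentication failure markers:

  ```kql
  // Sketch: surface likely authentication failures in the Function App logs.
  // The search terms are heuristics; exact log wording may differ.
  traces
  | where message has_any ("401", "403", "Unauthorized", "Forbidden")
  | project timestamp, severityLevel, message
  | order by timestamp desc
  ```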
- Verify the diagnostic setting or Data Collection Rule: Confirm the telemetry source is correctly configured to stream to the Event Hub.
  - For Activity logs, verify the diagnostic setting is active and streaming to the correct Event Hub namespace and `logs` event hub.
  - Ensure the Event Hub namespace is in the same region as the resources generating the telemetry.
- Verify Event Hub metrics: Confirm the Event Hub is successfully passing data to the Function App and not showing throughput or throttling issues.
  - Compare the number of Incoming Messages and Outgoing Messages. They should be the same or very close. If there is a significant gap, the Function App is not consuming messages at the expected rate. Continue to the next steps.
  - Check for Throttled Requests that may indicate the Event Hub is rejecting incoming telemetry.
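
  If you already route namespace metrics to a Log Analytics workspace (see the earlier sketch), a minimal throttling check could look like the following; the metric name and bin size are assumptions:

  ```kql
  // Sketch: check for throttled requests on the Event Hub namespace.
  // Assumes namespace metrics are exported to a Log Analytics workspace.
  AzureMetrics
  | where MetricName == "ThrottledRequests"
  | summarize throttled = sum(Total) by bin(TimeGenerated, 15m), Resource
  | where throttled > 0
  | order by TimeGenerated asc
  ```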
- Check Function App invocations and metrics: Confirm there are no failing invocations.
  - In the Azure portal, navigate to the Function App.
  - Open the `logs` function and go to Invocations.
  - Check the Error counter and review invocation details for recent failures.

  Note: This view might have a significant reporting delay. If you are currently deploying changes or investigating an active incident, proceed to the next step for real-time data.
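
  For auditing past invocation failures from the same telemetry, a sketch like the following summarizes failed executions over time, assuming the default Application Insights requests table that Azure Functions writes to:

  ```kql
  // Sketch: count failed Function App invocations per 15-minute window.
  // Assumes the default Application Insights schema for Azure Functions.
  requests
  | where success == false
  | summarize failedInvocations = count() by operation_Name, bin(timestamp, 15m)
  | order by timestamp desc
  ```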
- Check Application Insights failures and Live Metrics: Confirm there are no errors or exceptions, and monitor real-time health.
  - In the Azure portal, navigate to the Function App > Monitoring > Application Insights > Investigate.
  - Go to Live Metrics to see incoming request rate, duration, and number of failed requests in real time. Watch the memory consumption, as high usage may precede OOM errors.
  - Go to Failures to confirm that no errors or exceptions are occurring.
- Review Function App logs: Check Application Insights for specific error messages.
  - In the Azure portal, navigate to the Application Insights resource created by the deployment.
  - Go to Logs and query for Function App logs.
  - Look for authentication failures, network errors, or data processing issues.
The following KQL query helps structure logs for easier analysis:
```kql
traces
| extend parsed = parse_json(message)
| extend logMessage = coalesce(tostring(parsed.message), message)
| extend caller = parsed.caller
| extend componentId = parsed.["otelcol.component.id"]
| extend componentKind = parsed.["otelcol.component.kind"]
| extend stackTrace = parsed.stacktrace
| project timestamp, severityLevel, logMessage, caller, componentId, componentKind, stackTrace
```

To quickly identify errors and warnings, use this query:

```kql
traces
| where severityLevel != 1
| project timestamp, severityLevel, message
| order by timestamp asc
```

If you need to be more specific, you can filter for particular symptoms. For example, use the following query to check for 503 Service Unavailable errors:

```kql
traces
| where severityLevel != 1 and message has "503"
| project timestamp, severityLevel, message
| order by timestamp asc
```
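
To see whether errors are trending up or down over time, you can also aggregate error-level traces by hour. This sketch assumes the default Application Insights severity mapping, in which severityLevel values of 3 and above indicate errors:

```kql
// Sketch: hourly error volume, assuming severityLevel >= 3 means error or critical.
traces
| where timestamp > ago(24h) and severityLevel >= 3
| summarize errors = count() by bin(timestamp, 1h)
| order by timestamp asc
```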