Troubleshooting EDOT Cloud Forwarder for Azure
This page helps you diagnose and resolve issues with EDOT Cloud Forwarder for Azure when data is not being forwarded to Elasticsearch as expected. Troubleshooting relies on native Azure platform features, specifically Azure Monitor and Application Insights.
The following metrics are crucial for diagnosing issues, as they focus on identifying failures or possible performance issues in the processing pipeline:
| Indicator | Source | Why it matters |
|---|---|---|
| IncomingMessages and OutgoingMessages | Event Hub | A significant gap between the number of incoming and outgoing messages indicates the Function App is not consuming messages at the expected rate. |
| Function Execution Failures | Application Insights > Metrics | Non-zero values indicate the Function App is failing to process messages. |
| AverageMemoryWorkingSet | Function App | High usage precedes OOM errors, signaling resource limits or memory leaks before they cause failures. |
| Failures view | Application Insights > Investigate | A rising error rate points to application-level defects that need immediate attention. |
| Error container blobs | Storage account | Presence of logs-error-container or metrics-error-container containers confirms that deliveries to Elasticsearch are failing. |
As a first step, review the logs and metrics provided by the Azure platform. The following EDOT Cloud Forwarder components provide useful information about data processing.
Go to your EDOT Cloud Forwarder resource group and find the Event Hubs Namespace resource to verify the health of the entry point:
- Throughput: Compare the number of Incoming Messages with Outgoing Messages. A significant gap suggests the Function App is not consuming messages at the expected rate.
- Throttling: Monitor Throttled Requests. If you exceed your provisioned Throughput Units (TUs), the Event Hub rejects incoming telemetry from your sources. A sample metrics query follows this list.
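
If you route Event Hub namespace metrics to a Log Analytics workspace through a diagnostic setting (an optional configuration that the default deployment does not create), a query like the following sketch compares incoming and outgoing message counts over time. The metric names and the 15-minute bin size are assumptions to adjust for your environment:

```kql
// Sketch: compare Event Hub message flow over time.
// Assumes namespace metrics are exported to a Log Analytics workspace
// through a diagnostic setting; adjust metric names and bin size as needed.
AzureMetrics
| where MetricName in ("IncomingMessages", "OutgoingMessages")
| summarize total = sum(Total) by MetricName, bin(TimeGenerated, 15m)
| order by TimeGenerated asc
```

A gap between the two series that widens over consecutive intervals points to the Function App falling behind.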
Navigate to your EDOT Cloud Forwarder resource group and find the Function App resource.
Invocations: Go to Functions > [Name] > Invocations to verify if the trigger is firing. Here you can see overall counters of successful and unsuccessful invocations. To access data about the function execution over time, check the Metrics tab.
Note: This view can have a reporting delay. Use it for auditing past trends rather than real-time troubleshooting. For immediate feedback, use Live Metrics.
Live Metrics: Located under Monitoring > Application Insights > Investigate > Live Metrics. Use this for real-time debugging during a suspected outage. It shows instantaneous CPU/Memory spikes, dependency failures, incoming request rate, duration, and number of failed requests.
Failures: Located under Investigate > Failures in Application Insights. It provides a list of all failed operations with the option to drill into each one.
Logs: Located under Logs in Application Insights. This provides a searchable history of all traces and exceptions across every instance of your Function App.
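
As a starting point, the following sketch summarizes recent exceptions across all instances, assuming the default Application Insights tables; the 24-hour window is an arbitrary choice:

```kql
// Sketch: summarize recent exceptions across all Function App instances.
// Assumes the default Application Insights schema; adjust the time range as needed.
exceptions
| where timestamp > ago(24h)
| summarize occurrences = count() by type, outerMessage, cloud_RoleInstance
| order by occurrences desc
```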
Navigate to your EDOT Cloud Forwarder resource group and find the Storage account resource.
If EDOT Cloud Forwarder is unable to process or deliver signals to Elasticsearch, it persists the batch as a blob in logs-error-container or metrics-error-container containers. These containers are created only upon the first failure.
EDOT Cloud Forwarder does not yet provide built-in tools to reprocess these messages automatically.
Although the Blob Count metric exists, its utility is limited because it aggregates hourly.
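
If you need finer-grained visibility than the hourly Blob Count metric, one option is to enable blob resource logs on the storage account through a diagnostic setting sent to a Log Analytics workspace (the default deployment does not configure this). With that in place, a sketch like the following surfaces new error blobs as they are written:

```kql
// Sketch: detect new blobs written to the error containers.
// Assumes blob resource logs (StorageBlobLogs) are enabled through a diagnostic setting.
StorageBlobLogs
| where OperationName == "PutBlob"
| where Uri has "logs-error-container" or Uri has "metrics-error-container"
| summarize newErrorBlobs = count() by bin(TimeGenerated, 15m)
| order by TimeGenerated asc
```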
This section helps you diagnose and resolve issues when EDOT Cloud Forwarder for Azure is deployed but telemetry data is not appearing in Elasticsearch. Symptoms include the following:
- Logs or metrics are not appearing in Elasticsearch or Kibana dashboards.
- The storage account contains unprocessed messages in error containers (`logs-error-container` or `metrics-error-container`).
- Function App execution logs in Application Insights show errors.
- Function App Invocations > Error count is non-zero or increasing.
- The Event Hub metrics show messages are being received but not processed.
Using the key indicators and general troubleshooting guidance above, check the following to identify the root cause:
- Verify Event Hub is receiving data: Check the Incoming Messages metric. If no messages are arriving, the issue is with the diagnostic setting or the Data Collection Rule configuration.
- Compare Incoming and Outgoing Messages: A significant gap between the number of incoming and outgoing messages confirms the Function App is not consuming messages at the expected rate.
- Check Function App logs: Review execution logs in Application Insights for errors related to authentication, network connectivity, or data processing.
- Inspect error containers: Check if failed messages are accumulating in `logs-error-container` or `metrics-error-container`.
- Verify the OTLP endpoint configuration: Confirm that the `ELASTICSEARCH_OTLP_ENDPOINT` environment variable is correct and that the endpoint is accessible from the Function App.
  - In the Azure portal, navigate to the Function App.
  - Go to Settings → Environment variables.
  - Verify the OTLP endpoint value matches your Elastic Cloud Serverless or Elastic Cloud Hosted endpoint.
- Verify the API key: Confirm that the `ELASTICSEARCH_API_KEY` is valid and not expired.
  - Check that the API key value was trimmed correctly (only `MYKEYVALUE...`, not `Authorization=ApiKey MYKEYVALUE...`).
  - Verify the API key has not expired in your Elastic Cloud deployment or Serverless project.
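
  Authentication problems usually surface in the collector logs. As a rough heuristic (the exact message text varies between versions), you can search the traces for common authentication failure markers:

  ```kql
  // Sketch: surface likely authentication failures in the Function App logs.
  // The search terms are heuristics; exact log wording may differ.
  traces
  | where message has_any ("401", "403", "Unauthorized", "Forbidden")
  | project timestamp, severityLevel, message
  | order by timestamp desc
  ```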
- Verify the diagnostic setting or Data Collection Rule: Confirm the telemetry source is correctly configured to stream to the Event Hub.
  - For Activity logs, verify the diagnostic setting is active and streaming to the correct Event Hub namespace and `logs` event hub.
  - Ensure the Event Hub namespace is in the same region as the resources generating the telemetry.
- Verify Event Hub metrics: Confirm the Event Hub is successfully passing data to the Function App and not showing throughput or throttling issues.
  - Compare the number of Incoming Messages and Outgoing Messages. They should be the same or very close. If there is a significant gap, the Function App is not consuming messages at the expected rate. Continue to the next steps.
  - Check for Throttled Requests that may indicate the Event Hub is rejecting incoming telemetry.
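
  If you already route namespace metrics to a Log Analytics workspace (see the earlier sketch), a minimal throttling check could look like the following; the metric name and bin size are assumptions:

  ```kql
  // Sketch: check for throttled requests on the Event Hub namespace.
  // Assumes namespace metrics are exported to a Log Analytics workspace.
  AzureMetrics
  | where MetricName == "ThrottledRequests"
  | summarize throttled = sum(Total) by bin(TimeGenerated, 15m), Resource
  | where throttled > 0
  | order by TimeGenerated asc
  ```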
- Check Function App invocations and metrics: Confirm there are no failing invocations.
  - In the Azure portal, navigate to the Function App.
  - Open the `logs` function and go to Invocations.
  - Check the Error counter and review invocation details for recent failures.

  Note: This view might have a significant reporting delay. If you are currently deploying changes or investigating an active incident, proceed to the next step for real-time data.
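
  For auditing past invocation failures from the same telemetry, a sketch like the following summarizes failed executions over time, assuming the default Application Insights requests table that Azure Functions writes to:

  ```kql
  // Sketch: count failed Function App invocations per 15-minute window.
  // Assumes the default Application Insights schema for Azure Functions.
  requests
  | where success == false
  | summarize failedInvocations = count() by operation_Name, bin(timestamp, 15m)
  | order by timestamp desc
  ```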
- Check Application Insights failures and Live Metrics: Confirm there are no errors or exceptions, and monitor real-time health.
  - In the Azure portal, navigate to the Function App > Monitoring > Application Insights > Investigate.
  - Go to Live Metrics to see incoming request rate, duration, and number of failed requests in real time. Watch the memory consumption, as high usage may precede OOM errors.
  - Go to Failures to confirm that no errors or exceptions are occurring.
- Review Function App logs: Check Application Insights for specific error messages.
  - In the Azure portal, navigate to the Application Insights resource created by the deployment.
  - Go to Logs and query for Function App logs.
  - Look for authentication failures, network errors, or data processing issues.
The following KQL query helps structure logs for easier analysis:
```kql
traces
| extend parsed = parse_json(message)
| extend logMessage = coalesce(tostring(parsed.message), message)
| extend caller = parsed.caller
| extend componentId = parsed.["otelcol.component.id"]
| extend componentKind = parsed.["otelcol.component.kind"]
| extend stackTrace = parsed.stacktrace
| project timestamp, severityLevel, logMessage, caller, componentId, componentKind, stackTrace
```

To quickly identify errors and warnings, use this query:

```kql
traces
| where severityLevel != 1
| project timestamp, severityLevel, message
| order by timestamp asc
```

If you need to be more specific, you can filter for particular symptoms. For example, use the following query to check for 503 Service Unavailable errors:

```kql
traces
| where severityLevel != 1 and message has "503"
| project timestamp, severityLevel, message
| order by timestamp asc
```
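
To see whether errors are trending up or down over time, you can also aggregate error-level traces by hour. This sketch assumes the default Application Insights severity mapping, in which severityLevel values of 3 and above indicate errors:

```kql
// Sketch: hourly error volume, assuming severityLevel >= 3 means error or critical.
traces
| where timestamp > ago(24h) and severityLevel >= 3
| summarize errors = count() by bin(timestamp, 1h)
| order by timestamp asc
```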