Elastic Observability Labs - Azure

Bringing observability insights from Elastic AI Assistant to the world of GitHub Copilot

Thu, 23 May 2024 00:00:00 GMT

GitHub announced GitHub Copilot Extensions this week at Microsoft Build. We are working with the GitHub team in the Limited Beta Program to explore bringing observability insights from Elastic AI Assistant to GitHub Copilot users.

Elastic’s GitHub Copilot Extension aims to combine the capabilities of GitHub Copilot and Elastic AI Assistant for Observability. This could enable developers to access critical insights from Elastic AI Assistant from GitHub Copilot Chat on GitHub.com, Visual Studio, and VS Code - places where they write their code.

Developers will be able to ask questions such as:

What errors are active?
What’s the latest stacktrace for my application?
What caused a slowdown in the application after the last push to the dev environment?
How do I write an ES|QL query that my app will send to Elasticsearch?
What runbook from GitHub has been loaded into Elasticsearch and is related to the issue I’m investigating?

And many more!

Watch Jeff's PoC Demo@Microsoft Build 2024

Elastic AI Assistant surfaced in GitHub Copilot Chat from our Extension (Proof of Concept)

What is the Elastic AI Assistant for Observability

The Elastic AI Assistant for Observability, a user-centric tool, is a game-changer in providing contextual insights and streamlining troubleshooting within the Elastic Observability environment. By harnessing generative AI capabilities, the assistant offers open prompts that decipher error messages and propose remediation actions. It adopts a Retrieval-Augmented Generation (RAG) approach to fetch the most pertinent internal information, such as APM traces, log messages, SLOs, GitHub issues, runbooks, and more. This contextual assistance is a huge leap forward for Site Reliability Engineers (SREs) and operations teams, offering immediate, relevant solutions to issues based on existing documentation and resources, boosting developer productivity.

For more information on setting up and using the AI Assistant for Observability check out the blog Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI. Additionally, learn how Elastic Observability AI Assistant uses RAG to help analyze application issues with GitHub issues.

One unique feature of the AI Assistant is its API support. This allows you to take advantage of all the capabilities provided by the Elastic AI Assistant, and integrate them right into your workflow.

What is a GitHub Copilot Extension

GitHub Copilot Extensions, a new addition to GitHub Copilot, revolutionize the developer experience by integrating a diverse array of tools and services directly into the developer's workflow. These extensions, crafted by partners, enable developers to interact with various services and tools using natural language within their Integrated Development Environment (IDE) or GitHub.com. This integration eliminates the need for context-switching, allowing developers to maintain their flow state, troubleshoot issues, and deploy solutions with unparalleled efficiency. These extensions will be accessible through GitHub Copilot Chat in the GitHub Marketplace, with options for organizations to create private extensions tailored to their internal tooling.

What’s next

We are participating in the GitHub Limited Beta Program as a partner and exploring the possibility of bringing the Elastic GitHub Copilot Extension to the GitHub Marketplace. We are excited to bring insights from Elastic Observability to GitHub Copilot users, side by side with the code behind those services. Stay tuned!


Debugging Azure Networking for Elastic Cloud Serverless

Thu, 05 Jun 2025 00:00:00 GMT

Summary of Findings

Elastic's Site Reliability Engineering (SRE) team observed unstable throughput and packet loss in Elastic Cloud Serverless running on Azure Kubernetes Service (AKS). After investigation, we identified the primary contributing factors to be RX ring buffer overflows and kernel input queue saturation on SR-IOV interfaces. To address this, we increased RX buffer sizes and adjusted the netdev backlog, which significantly improved network stability.

Setting the Scene

Elastic Cloud Serverless is a fully managed solution that allows you to deploy and use Elastic for your use cases without managing the underlying infrastructure. Built on Kubernetes, it represents a shift in how you interact with Elasticsearch. Instead of managing clusters, nodes, data tiers, and scaling, you create serverless projects that are fully managed and automatically scaled by Elastic. This abstraction of infrastructure decisions allows you to focus solely on gaining value and insight from your data.

Elastic Cloud Serverless is generally available (GA) on AWS and GCP, and is currently in Technical Preview on Azure. As part of preparing Elastic Cloud Serverless for GA on Azure, we have been conducting extensive performance and scalability tests to ensure that our users get a consistent and reliable experience.

In this post, we’ll take you behind the scenes of a deep technical investigation into a surprising performance issue that affected Serverless Elasticsearch in our Azure Kubernetes clusters. At first, the network seemed like the least likely place to look, especially with a high-speed 100 Gb/s interface on the host backing it. But as we dug deeper, with help from the Microsoft Azure team, that’s exactly where the problem led us.

Unexpected Results!

While the high-level architectures and system design patterns of the major cloud providers’ systems are often similar, the implementations are different, and these differences can have dramatic impacts on a system’s performance characteristics.

One of the most significant differences between the different cloud providers is that the underlying hypervisor software and server hardware of the Virtual Machines can vary significantly, even between instance families of the same provider.

There is no way to fully abstract the hardware away from an application like Elasticsearch. Fundamentally, its performance is dictated by the CPU, memory, disks, and network interfaces on the physical server. In preparation for the Elastic Cloud Serverless GA on Azure, our Elasticsearch Performance team kicked off large-scale load testing against Serverless Elasticsearch projects running on Azure Kubernetes Service (AKS), using ARM-based VMs (we’re big fans!). Throughout this process, we relied heavily on Elastic tools to analyse system behaviour, identify bottlenecks, and validate performance under load.

To perform these scale and load tests, the Elasticsearch Performance team use Rally, an open-source benchmarking tool designed to measure the performance of Elasticsearch clusters. The workload (or in Rally nomenclature, ‘Track’) used for these tests was the GitHub Archive Track. Rally collects and sends test telemetry using the official Python client to a separate Elasticsearch cluster running Elastic Observability, which allows for monitoring and analysis during these scale and load tests in real time via Kibana.

When we looked at the results, we observed that the indexing rate (the number of docs/s) for the Serverless projects was not only much lower than we had expected for the given hardware, but the throughput was also quite unstable. There were peaks and valleys, interspersed with frequent errors, whereas we were instead expecting a stable indexing rate for the duration of the test.

These tests are designed to push the system to its limits, and in doing so, they surfaced unexpected behavior in the form of unstable indexing throughput and intermittent errors. This was precisely the kind of problem we'd hoped to uncover prior to going GA — giving us the opportunity to work closely with Azure.

![Indexing Rate with Packet Loss](/assets/images/debugging-aks-packet-loss/indexing-rate-before.png) _A Kibana visualisation of Rally telemetry, showing fluctuating Elasticsearch indexing rates alongside spikes in 5xx and 4xx HTTP error responses._

Debugging!

Debugging performance issues can feel a little bit like trying to find a ‘Butterfly in a Hurricane’, so it’s crucial that you take a methodical approach to analysing application and system performance.

Using methodologies helps you to be more consistent and thorough in your debugging, and avoids missing things. We started with the Utilisation, Saturation, and Errors (USE) Method, looking at both the client and server side to identify any obvious bottlenecks in the system.

Elastic's Site Reliability Engineers (SREs) maintain a suite of custom Elastic Observability dashboards designed to visualise data collected from various Elastic Integrations. These dashboards provide deep visibility into the health and performance of Elastic Cloud infrastructure and systems.

For this investigation, we leveraged a custom dashboard built using metrics and log data from the System and Linux Integrations:

![Node Overview Dashboard](/assets/images/debugging-aks-packet-loss/overview-dashboard.png) _One of many Elastic Observability dashboards built and maintained by the SRE team._

Following the USE Method, these dashboards highlight resource utilisation, saturation, and errors across our systems. With their help, we quickly identified that the AKS nodes hosting the Elasticsearch pods under test were dropping thousands of packets per second.

![Node Packet Loss Before Tuning](/assets/images/debugging-aks-packet-loss/packet-loss-before.png) _A Kibana visualisation of [Elastic Agent's System Integration](https://www.elastic.co/docs/reference/integrations/system), showing the rate of packet drops per second for AKS nodes._

Dropping packets forces reliable protocols, such as TCP, to retransmit any missing packets. These retransmissions can introduce significant delays, which kills the throughput of any system where client requests are only triggered upon the previous request completion (known as a Closed System).
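To make the closed-system effect concrete, here is a rough back-of-the-envelope sketch. It assumes throughput is simply the number of concurrent clients divided by mean request latency, and that a fraction of requests each pay one extra retransmission delay. The client count, base latency, loss fraction, and the 200 ms delay are illustrative assumptions, not measurements from our tests.

```python
def closed_system_throughput(clients: int, base_latency_s: float,
                             delayed_fraction: float,
                             retransmit_delay_s: float) -> float:
    """Requests per second when each client waits for its previous request.

    Assumes every delayed request pays exactly one extra retransmit delay.
    """
    mean_latency_s = base_latency_s + delayed_fraction * retransmit_delay_s
    return clients / mean_latency_s


# Hypothetical workload: 64 clients, 10 ms per request when nothing is lost.
healthy = closed_system_throughput(64, 0.010, 0.0, 0.200)
lossy = closed_system_throughput(64, 0.010, 0.01, 0.200)

print(f"no retransmissions:      {healthy:.0f} req/s")
print(f"1% of requests delayed:  {lossy:.0f} req/s")
```

Under these assumptions, delaying just 1% of requests removes roughly a sixth of the throughput, because every stalled client stops issuing new requests while it waits.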

To investigate further, we jumped onto one of the AKS nodes exhibiting the packet loss to check the basics. First off, we wanted to identify what type of packet drops or errors we were seeing, and whether they affected specific pods or the host as a whole.

root@aks-k8s-node-1:~# ip -s link show
2: eth0:  mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 7c:1e:52:be:ce:5e brd ff:ff:ff:ff:ff:ff
    RX:    bytes   packets errors dropped  missed   mcast
    373507935420 134292481      0       0       0      15
    TX:    bytes   packets errors dropped carrier collsns
    644247778936 303191014      0       0       0       0
3: enP42266s1:  mtu 1500 qdisc mq master eth0 state UP mode DEFAULT group default qlen 1000
    link/ether 7c:1e:52:be:ce:5e brd ff:ff:ff:ff:ff:ff
    RX:    bytes   packets errors dropped  missed   mcast
    386782548951 307000571      0       0 5321081       0
    TX:    bytes   packets errors dropped carrier collsns
    655758630548 477594747      0       0       0       0
    altname enP42266p0s2
15: lxc0ca0ec41ecd2@if14:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f6:f5:5e:c9:4e:fb brd ff:ff:ff:ff:ff:ff link-netns cni-3f90ab53-df66-cac5-bd19-9cea4a68c29b
    RX:    bytes   packets errors dropped  missed   mcast
    627954576078  54297550      0    1600       0       0
    TX:    bytes   packets errors dropped carrier collsns
    372155326349 133538064      0    3927       0       0

In this output you can see the enP42266s1 interface is showing a significant number of packets in the missed column. That’s interesting, sure, but what does missed actually represent? And what is enP42266s1?
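When there are many interfaces on a node, scanning this output by eye gets tedious. The following is a minimal sketch of how the counters could be extracted programmatically. The sample text is an abridged copy of the output above, and the `parse_ip_stats` helper is a hypothetical name introduced for illustration, not a tool we used.

```python
import re

# Abridged copy of the `ip -s link show` output above (link/ether lines omitted).
SAMPLE = """\
2: eth0:  mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    RX:    bytes   packets errors dropped  missed   mcast
    373507935420 134292481      0       0       0      15
    TX:    bytes   packets errors dropped carrier collsns
    644247778936 303191014      0       0       0       0
3: enP42266s1:  mtu 1500 qdisc mq master eth0 state UP mode DEFAULT group default qlen 1000
    RX:    bytes   packets errors dropped  missed   mcast
    386782548951 307000571      0       0 5321081       0
    TX:    bytes   packets errors dropped carrier collsns
    655758630548 477594747      0       0       0       0
15: lxc0ca0ec41ecd2@if14:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    RX:    bytes   packets errors dropped  missed   mcast
    627954576078  54297550      0    1600       0       0
    TX:    bytes   packets errors dropped carrier collsns
    372155326349 133538064      0    3927       0       0
"""


def parse_ip_stats(text: str) -> dict:
    """Map interface name -> {"rx_missed": ..., "tx_dropped": ..., ...}."""
    stats, iface, header = {}, None, None
    for line in text.splitlines():
        match = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if match:  # e.g. "3: enP42266s1: ..." starts a new interface
            iface = match.group(1)
            stats[iface] = {}
            continue
        fields = line.split()
        if not fields or iface is None:
            continue
        if fields[0] in ("RX:", "TX:"):  # column names for the next line
            header = (fields[0].rstrip(":").lower(), fields[1:])
        elif header and all(f.isdigit() for f in fields):
            direction, names = header
            for name, value in zip(names, fields):
                stats[iface][f"{direction}_{name}"] = int(value)
            header = None
    return stats


parsed = parse_ip_stats(SAMPLE)
for name, counters in parsed.items():
    lost = {k: v for k, v in counters.items()
            if v and k.endswith(("dropped", "missed"))}
    if lost:
        print(name, lost)
```

Only the interfaces with non-zero `dropped` or `missed` counters are printed, which here are enP42266s1 and the lxc interface.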

To understand, let’s look at roughly what happens when a packet arrives at the NIC:

1. A packet arrives at the NIC from the network.
2. The NIC uses DMA (Direct Memory Access) to place the packet into a receive ring buffer allocated in memory by the kernel, mapped for use by the NIC. Since our NICs support multiple hardware queues, each queue has its own dedicated ring buffer, IRQ, and NAPI context.
3. The NIC raises a hardware interrupt (IRQ) to notify the CPU that a packet is ready.
4. The CPU runs the NIC driver’s IRQ handler. The driver schedules a NAPI (New API) poll to defer packet processing to a softirq context, a mechanism in the Linux kernel that defers work to be processed outside of the hard IRQ context for better batching, CPU efficiency, and scalability.
5. The NAPI poll function is executed in a softirq context (NET_RX_SOFTIRQ) and retrieves packets from the ring buffer. This polling continues either until the driver’s packet budget is exhausted (net.core.netdev_budget) or the time limit is hit (net.core.netdev_budget_usecs).
6. Each packet is wrapped in an sk_buff (socket buffer) structure, which includes metadata such as protocol headers, timestamps, and interface identifiers.
7. If the networking stack is slower than the rate at which NAPI fetches packets, excess packets are queued in a per-CPU backlog queue (via enqueue_to_backlog). The maximum size of this backlog is controlled by the net.core.netdev_max_backlog sysctl.
8. Packets are then handed off to the kernel’s networking stack for routing, filtering, and protocol-specific processing (e.g. TCP, UDP).
9. Finally, packets reach the appropriate socket receive buffer, where they are available for consumption by the user-space application.

Visualised, it looks something like this:

![Linux Packet Flow Diagram](/assets/images/debugging-aks-packet-loss/packet-flow.png) _Image © 2018 Leandro Moreira. Used under the [BSD 3-Clause License](https://opensource.org/licenses/BSD-3-Clause). Source: [GitHub repository](https://github.com/leandromoreira/linux-network-performance-parameters)._

The missed counter is incremented whenever the NIC tries to DMA a packet into a fully occupied ring buffer. The NIC essentially "misses" the chance to deliver the packet to the VM’s memory. However, what’s most interesting is that this counter seldom increments for VMs. This is because Virtual NICs are usually implemented as software via the hypervisor, which typically has much more flexible memory management compared to the physical NICs and can reduce the chance of ring buffer overflow.
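The overflow behaviour can be illustrated with a toy model. This is a minimal sketch, not how a real driver works: time advances in abstract ticks, the ring holds a fixed number of packets, and the kernel drains a fixed number per tick. The ring sizes, burst pattern, and drain rate are all made-up values chosen for illustration.

```python
def simulate_ring(ring_size: int, arrivals_per_tick: list,
                  drained_per_tick: int) -> tuple:
    """Return (delivered, missed) for a fixed-size RX ring."""
    occupancy = delivered = missed = 0
    for arriving in arrivals_per_tick:
        # The NIC can only place packets into free ring slots.
        accepted = min(arriving, ring_size - occupancy)
        missed += arriving - accepted  # ring was full: counted as "missed"
        occupancy += accepted
        # The kernel then drains up to its per-tick budget.
        drained = min(drained_per_tick, occupancy)
        occupancy -= drained
        delivered += drained
    return delivered, missed


# Bursty traffic: 3000 packets in one tick, then five quiet ticks, repeated.
# The average arrival rate (500/tick) is below the drain rate (600/tick).
traffic = [3000, 0, 0, 0, 0, 0] * 5

small_ring = simulate_ring(1024, traffic, drained_per_tick=600)
large_ring = simulate_ring(8192, traffic, drained_per_tick=600)
print("small ring (delivered, missed):", small_ring)
print("large ring (delivered, missed):", large_ring)
```

Even though the consumer keeps up with the average rate in this model, the small ring loses most of every burst, while the larger ring absorbs each burst and drains it during the quiet ticks.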

We mentioned earlier that we’re building Azure Elasticsearch Serverless on top of Azure’s AKS service, which is important to note because all of our AKS nodes use an Azure feature called Accelerated Networking. In this setup, network traffic is delivered directly to the VM’s network interface, bypassing the hypervisor. This is enabled by single root I/O virtualization (SR-IOV), which offers much lower latency and higher throughput than traditional VM networking. Each node is physically connected to a 100 Gb/s network interface, although the SR-IOV Virtual Function (VF) exposed to the VM typically provides only a fraction of that total bandwidth.

Despite the VM only having a fraction of the 100 Gb/s bandwidth, microbursts are still very possible. These physical interfaces are so fast that they can transmit and receive multiple packets in just nanoseconds, far faster than most buffers or processing queues can absorb. At these timescales, even a short-lived burst of traffic can overwhelm the receiver, leading to dropped packets and unpredictable latency.

Direct access to the SR-IOV interface means that our VMs are responsible for handling the hardware interrupts triggered by the NIC in a timely manner. If there's any delay in handling the hardware interrupt (e.g. waiting to be scheduled onto CPU by the hypervisor), then network packets can be missed!

Firstly - NIC-level Tuning

Since we'd confirmed that our VMs were using SR-IOV, we established that the enP42266s1 and eth0 interfaces were a bonded pair and acted as a single interface. Knowing this, we reasoned that we should be able to adjust the ring buffer values directly using ethtool.

root@aks-k8s-node-1:~# ethtool -g enP42266s1
Ring parameters for enP42266s1:
Pre-set maximums:
RX:		8192
RX Mini:	n/a
RX Jumbo:	n/a
TX:		8192
Current hardware settings:
RX:		1024
RX Mini:	n/a
RX Jumbo:	n/a
TX:		1024

In the output above, we were using only 1/8th of the available ring buffer descriptors. These values were set by the OS defaults, which generally aim to balance performance and resource usage. Set too low, they risk packet drops under load; set too high, they can lead to unnecessary memory consumption. We knew that the VMs were backed by a virtual function carved out of the directly attached 100 Gb/s network interface, which is fast enough to deliver microbursts that could easily overwhelm small buffers. To better absorb those short, high-intensity bursts of traffic, we increased the NIC’s RX ring buffer size from 1024 to 8192. Using a privileged DaemonSet, we rolled out the change across all of our AKS nodes by installing a udev rule to automatically increase the buffer size:

# Match Mellanox ConnectX network cards and run ethtool to update the ring buffer settings
ENV{INTERFACE}=="en*", ENV{ID_NET_DRIVER}=="mlx5_core", RUN+="/sbin/ethtool -G %k rx ${CONFIG_AZURE_MLX_RING_BUFFER_SIZE} tx ${CONFIG_AZURE_MLX_RING_BUFFER_SIZE}"
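To put the two ring sizes in perspective, here is a rough calculation of how long each could absorb a burst if nothing were draining it. It assumes one descriptor per packet, 1500-byte packets, no framing overhead, and a burst arriving at the full 100 Gb/s line rate; the virtual function may never see that rate, so treat the numbers as a worst-case illustration rather than a measurement.

```python
def ring_fill_time_us(descriptors: int, frame_bytes: int = 1500,
                      line_rate_bps: float = 100e9) -> float:
    """Microseconds until an undrained ring of `descriptors` slots is full."""
    seconds_per_frame = frame_bytes * 8 / line_rate_bps
    return descriptors * seconds_per_frame * 1e6


for size in (1024, 8192):
    print(f"{size} descriptors fill in about {ring_fill_time_us(size):.1f} us")
```

Under these assumptions the default ring fills in roughly 123 microseconds, and the larger ring buys about eight times as long (just under a millisecond) for the guest to be scheduled and service the interrupt.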

![AKS Node Packet Loss after RX ring buffer change](/assets/images/debugging-aks-packet-loss/packet-loss-after.png) _A Kibana visualisation of [Elastic Agent's System Integration](https://www.elastic.co/docs/reference/integrations/system), showing packet loss reduced by ~99% after increasing the NIC's RX ring buffer values._

As soon as the change had been applied to all AKS nodes we stopped ‘missing’ RX packets! Fantastic! As a result of this simple change we observed a significant improvement in our indexing throughput and stability.

![Indexing rate after RX ring buffer change](/assets/images/debugging-aks-packet-loss/indexing-rate-after.png) _A Kibana visualisation of Rally telemetry, showing stable and improved Elasticsearch indexing rates after increasing the RX ring buffer size._

Job done, right? Not quite...

Further improvements - Kernel-level Tuning

Eagle-eyed readers may have noticed two things:

In the previous screenshot, despite adjusting the physical RX ring buffer values, we still observed a small number of dropped packets on the TX side.
In the original ip -s link show output, one of the ‘logical’ interfaces used by the Elasticsearch pod was showing dropped packets on both the TX and RX sides.

15: lxc0ca0ec41ecd2@if14:  mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f6:f5:5e:c9:4e:fb brd ff:ff:ff:ff:ff:ff link-netns cni-3f90ab53-df66-cac5-bd19-9cea4a68c29b
    RX:    bytes   packets errors dropped  missed   mcast
    627954576078  54297550      0    1600       0       0
    TX:    bytes   packets errors dropped carrier collsns
    372155326349 133538064      0    3927       0       0

So, we continued to dig. We’d eliminated ~99% of the packet loss, and the remaining loss rate wasn’t as significant as what we’d started with, but we still wanted to understand why it was occurring even after adjusting the RX ring buffer size of the NIC.

So what does dropped represent, and what is this lxc0ca0ec41ecd2 interface? dropped is similar to missed, but only occurs when packets are deliberately dropped by the kernel or network interface. Crucially though, it doesn’t tell you why a packet was dropped. As for the lxc0ca0ec41ecd2 interface, we use the Azure CNI Powered by Cilium to provide the network functionality to our AKS clusters. Any pod spun up on an AKS node gets a ‘logical’ interface, which is a virtual ethernet (veth) pair that connects the pod’s network namespace with the host’s network namespace. It was here that we were dropping packets.

In our experience, packet drops at this layer are unusual, so we started digging deeper into the cause of the drops. There are numerous ways you can debug why a packet is being dropped, but one of the easiest is to use perf to attach to the skb:kfree_skb tracepoint. The "socket buffer" (skb) is the primary data structure used to represent network packets in the Linux kernel. When a packet is dropped, its corresponding socket buffer is usually freed, triggering the kfree_skb tracepoint. Using perf to attach to this event allowed us to capture stack traces to analyze the cause of the drops.

```
# perf record -g -a -e skb:kfree_skb
```

We left this to run for ~10 minutes or so to capture as many drops as possible, and then, ‘heavily inspired’ by this GitHub Gist by Ivan Babrou, we converted the stack traces into ‘easier’ to read Flamegraphs:

# perf script | sed -e 's/skb:kfree_skb:.*reason:\(.*\)/\n\tfffff \1 (unknown)/' -e 's/^\(\w\+\)\s\+/kernel /' > stacks.txt
cat stacks.txt | stackcollapse-perf.pl --all | perl -pe 's/.*?;//' | sed -e 's/.*irq_exit_rcu_\[k\];/irq_exit_rcu_[k];/' | flamegraph.pl --colors=java --hash --title=aks-k8s-node-1 --width=1440 --minwidth=0.005 > aks-k8s-node-1.svg

![AKS Node Packet Loss Flamegraph](/assets/images/debugging-aks-packet-loss/aks-packet-loss-flamegraph.png) _A Flamegraph showing the various stack trace ancestry of packet loss._

The flamegraph here shows how often different functions appeared in stack traces for packet drops. Each box represents a function call, and wider boxes mean the function appears more frequently in the traces. The stack's ancestry builds upward, with earlier calls at the bottom and later calls at the top.
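The idea behind the box widths can be sketched in a few lines. This assumes each captured stack is a list of function names ordered from earliest to latest call, and folds identical stacks into the semicolon-separated "stack count" lines that flamegraph tooling consumes. The function names below are hypothetical placeholders, not real kernel symbols or data from our capture.

```python
from collections import Counter

# Hypothetical captured stacks, earliest call first (placeholder names).
stacks = [
    ["entry_point", "send_path", "queue_packet"],
    ["entry_point", "send_path", "queue_packet"],
    ["entry_point", "send_path", "queue_packet"],
    ["entry_point", "receive_path", "validate_packet"],
]

# Fold identical stacks into "frame;frame;frame count" lines.
folded = Counter(";".join(stack) for stack in stacks)
for line, count in folded.most_common():
    print(line, count)

# Counting the last frame of each stack shows where traces most often end.
leaves = Counter(stack[-1] for stack in stacks)
top_leaf, hits = leaves.most_common(1)[0]
print(f"most common final frame: {top_leaf} ({hits} of {len(stacks)} stacks)")
```

A frame shared by many stacks accumulates a large count and is drawn as a wide box, which is why a common final frame stands out visually.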

Firstly, we quickly discovered that, unfortunately, the skb_drop_reason enum was only added in Kernel 5.17 (Azure’s Node Image at the time was using 5.15). This meant that there was no single human-readable message that told us why the packets were being dropped; instead, all we got was NOT_SPECIFIED. To work out why packets were being dropped, we needed to do a little sleuthing through the stack traces to see which code paths were being taken when a packet was dropped.

In the flamegraph above you can see that many of the stack traces include veth driver function calls (e.g. veth_xmit), and many end abruptly with a call to the enqueue_to_backlog function. When many stacks end at the same function (like enqueue_to_backlog) it suggests that function is a common point where packets are being dropped. If you go back to the earlier explanation of what happens when a packet arrives at the NIC, you’ll notice that in step 7 we explained:

7. If the networking stack is slower than the rate at which NAPI fetches packets, excess packets are queued in a per-CPU backlog queue (via enqueue_to_backlog). The maximum size of this backlog is controlled by the net.core.netdev_max_backlog sysctl.

Using the same privileged DaemonSet method as for the RX ring buffer adjustment, we increased the net.core.netdev_max_backlog kernel parameter from 1000 to 32768:

/usr/sbin/sysctl -w net.core.netdev_max_backlog=32768

This value was based on the fact we knew the hosts were using a 100 Gb/s SR-IOV NIC, even if the VM was allowed only a fraction of the total bandwidth. We acknowledge that it’s worth revisiting this value in the future to see if it can be better optimised to not waste extraneous memory, but at the time “perfect was the enemy of good”.
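One way to check whether the backlog is still overflowing is /proc/net/softnet_stat, where each row holds hexadecimal per-CPU counters: the first column counts processed packets, the second counts packets dropped because the backlog queue was full, and the third counts "time squeeze" events where the softirq ran out of budget. The sketch below parses a made-up sample rather than a real node's file, and assumes row order maps to CPU index; it is an illustration, not part of the investigation described here.

```python
# Made-up sample in the format of /proc/net/softnet_stat (first three columns).
SAMPLE_SOFTNET_STAT = """\
0000a3f1 00000000 00000002
0001c2d4 000003e8 00000010
"""


def parse_softnet_stat(text: str) -> list:
    """Return one dict per row with processed/dropped/time_squeeze counts."""
    rows = []
    for cpu, line in enumerate(text.splitlines()):
        columns = line.split()
        if len(columns) < 3:
            continue
        processed, dropped, time_squeeze = (int(c, 16) for c in columns[:3])
        rows.append({"cpu": cpu, "processed": processed,
                     "dropped": dropped, "time_squeeze": time_squeeze})
    return rows


rows = parse_softnet_stat(SAMPLE_SOFTNET_STAT)
for row in rows:
    if row["dropped"]:
        print(f"CPU {row['cpu']}: {row['dropped']} packets dropped at the backlog")

# On a live Linux host the same parser could read the real file:
# with open("/proc/net/softnet_stat") as f:
#     rows = parse_softnet_stat(f.read())
```

A `dropped` value that keeps growing after the sysctl change would suggest the backlog is still too small for the workload.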

We re-ran the load tests and compared the three sets of results we’d collected thus far.

![Final Indexing Rate Results](/assets/images/debugging-aks-packet-loss/indexing-rate-final.png) _A Kibana visualisation of Rally results, comparing impact to median throughput after each configuration change._

| Tuning Step | Packet Loss | Median indexing throughput |
| --- | --- | --- |
| Baseline | High | ~18,000 docs/s |
| +RX Buffer | Reduced by ~99% | ~26,000 docs/s (+ ~40% from baseline) |
| +Backlog & +RX Buffer | Near zero | ~29,000 docs/s (+ ~60% from baseline) |

Here you can see the P50 of throughput in docs/s over the course of the hours-long load tests. Compared to the baseline, we saw a roughly 40% increase in throughput by only adjusting the RX ring buffer values, and a ~50-60% increase with both the RX ring buffer and backlog changes! Hooray!
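As a quick sanity check, the relative gains can be recomputed from the rounded medians in the table. Because those medians are approximate, the results only roughly match the quoted percentages.

```python
def pct_gain(new: float, old: float) -> float:
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100


# Rounded median throughput values (docs/s) taken from the table above.
baseline, rx_only, rx_and_backlog = 18_000, 26_000, 29_000

print(f"+RX buffer:           {pct_gain(rx_only, baseline):.0f}%")
print(f"+RX buffer + backlog: {pct_gain(rx_and_backlog, baseline):.0f}%")
```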

A great result and one more step on our journey towards better Serverless Elasticsearch performance.

Working with Azure

It’s great that we were able to quickly identify and mitigate the majority of our packet loss issues, but since we were using AKS with AKS node images, it made sense to engage with Azure to understand why the defaults weren’t working for our workload.

We walked Azure through our investigation, mitigations and results, and asked for some additional validation of our mitigations. Azure Engineering confirmed that the host NICs were not discarding packets, which meant that everything arriving at the host level was passed through to the hypervisor on the host. Further investigation confirmed that no loss or discards were occurring in the Azure network fabric or inside the hypervisor, which shifted focus from the host to the guest OS and why the guest OS kernel was slow when reading packets off of the enP* SR-IOV interfaces.

Given the complexity of our load testing scenario, which involved configuring multiple systems and tools, including Elastic Observability, we also developed a simplified reproduction of the packet loss issue using iperf3. This simplified test was created specifically to share with Azure for targeted analysis, and complemented the broader monitoring and analysis enabled by Elastic Observability and Rally.

With this reproduction, Azure was able to confirm the increasing missed and dropped packet counters we had observed, and confirmed that increasing the RX ring buffer size and netdev_max_backlog were the recommended mitigations.

Conclusion

While cloud providers offer various abstractions to manage your resources, the underlying hardware ultimately determines your application's performance and stability. High-performance hardware often requires tuning at the operating system level, well beyond the default settings most environments ship with. In managed platforms like AKS, where Azure controls both the node images and infrastructure, it is easy to overlook the impact of low-level configurations such as network device ring buffer sizes or sysctls like net.core.netdev_max_backlog.

Our experience shows that even with the convenience of a managed Kubernetes service, performance issues can still emerge if these low-level parameters are not tuned appropriately. It was tempting to assume that high-speed 100 Gb/s network interfaces, directly attached to the VM using SR-IOV, would eliminate any chance of network-related bottlenecks. In reality, that assumption didn’t hold up.

Engaging early with Azure was essential, as they provided deeper visibility into the underlying infrastructure and worked with us to tune low-level, performance-critical settings. Combined with thorough load and scale testing and robust observability using tools like Elastic Observability, this collaboration helped us detect and rectify the issue early in order to deliver a consistent, reliable, and high-performing experience for our users.

Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI

Wed, 03 Apr 2024 00:00:00 GMT

Recently, Elastic announced that the AI Assistant for Observability is now generally available for all Elastic users. The AI Assistant is a new tool for Elastic Observability that provides large language model (LLM) connected chat and contextual insights to explain errors and suggest remediation. Similar to how Microsoft Copilot is an AI companion that introduces new capabilities and increases productivity for developers, the Elastic AI Assistant is an AI companion that can help you quickly gain additional value from your observability data.

This blog post presents a step-by-step guide on how to set up the AI Assistant for Observability with Azure OpenAI as the backing LLM. Then once you’ve got the AI Assistant set up, this post will show you how to add documents to the AI Assistant’s knowledge base along with demonstrating how the AI Assistant uses its knowledge base to improve its responses to address specific questions.

Set up the Elastic AI Assistant for Observability: Create an Azure OpenAI key

Start by creating a Microsoft Azure OpenAI API key to authenticate requests from the Elastic AI Assistant. Head over to Microsoft Azure and use an existing subscription or create a new one at the Azure portal.

Currently, access to the Azure OpenAI service is granted by applying for access. See the official Microsoft documentation for the current prerequisites.

In the Azure portal, select Azure OpenAI.

In the Azure OpenAI service, click the Create button.

Enter an instance Name and click Next.

Select your network access preference for the Azure OpenAI instance and click Next.

Add optional Tags and click Next.

Confirm your settings and click Create to create the Azure OpenAI instance.

Once the instance creation is complete, click the Go to resource button.

Click the Manage keys link to access the instance’s API key.

Copy your Azure OpenAI API Key and the Endpoint and save them both in a safe place for use in a later step.

Next, click Model deployments to create a deployment within the Azure OpenAI instance you just created.

Click the Manage deployments button to open Azure OpenAI Studio.

Click the Create new deployment button.

Select the model type you want to use and enter a Deployment name. Note the Deployment name for use in a later step. Click the Create button to deploy the model.

Set up the Elastic AI Assistant for Observability: Create an OpenAI connector in Elastic Cloud

The remainder of the instructions in this post will take place within Elastic Cloud. You can use an existing deployment or you can create a new Elastic Cloud deployment as a free trial if you’re trying Elastic Cloud for the first time. Another option to get started is to create an Elastic deployment from the Microsoft Azure Marketplace.

The next step is to create an Azure OpenAI connector in Elastic Cloud. In the Elastic Cloud console for your deployment, select the top-level menu and then select Stack Management.

Select Connectors on the Stack Management page.

Select Create connector.

Select the connector for Azure OpenAI.

Enter a Name of your choice for the connector. Select Azure OpenAI as the OpenAI provider.

Enter the Endpoint URL using the following format:

https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}

Replace {your-resource-name} with the name of the Azure OpenAI instance that you created within the Azure portal in a previous step.
Replace {deployment-id} with the Deployment name that you specified when you created a model deployment within the Azure portal in a previous step.
Replace {api-version} with one of the valid Supported versions listed in the Completions section of the Azure OpenAI reference page.

Your completed Endpoint URL should look something like this:

https://example-openai-instance.openai.azure.com/openai/deployments/gpt-4-turbo/chat/completions?api-version=2024-02-01
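If you manage several connectors, it can be handy to assemble the URL programmatically rather than by hand. The following is a small sketch that simply fills in the format shown above; the `azure_openai_endpoint` helper is a hypothetical name introduced for illustration and is not part of any Elastic or Azure SDK.

```python
from urllib.parse import quote, urlencode


def azure_openai_endpoint(resource_name: str, deployment_id: str,
                          api_version: str) -> str:
    """Fill in the chat completions endpoint format described in this post."""
    return (
        f"https://{resource_name}.openai.azure.com/openai/deployments/"
        f"{quote(deployment_id, safe='')}/chat/completions?"
        + urlencode({"api-version": api_version})
    )


# Reproduces the example URL from this post.
endpoint = azure_openai_endpoint(
    "example-openai-instance", "gpt-4-turbo", "2024-02-01"
)
print(endpoint)
```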

Enter the API Key that you copied in a previous step. Then click the Save & test button.

Within the Edit Connector flyout window, click the Run button to confirm that the connector configuration is valid and can successfully connect to your Azure OpenAI instance.

A successful connector test should look something like this:

Add an example logs record

Now that you have your Elastic Cloud deployment set up with an AI Assistant connector, let’s add an example logs record to demonstrate how the AI Assistant can help you to better understand logs data.

We’ll use the Elastic Dev Tools to add a single logs record. Click the top-level menu and select Dev Tools.

Within the Console area of Dev Tools, enter the following POST statement:

POST /logs-elastic_agent-default/_doc
{
  "message": "Status(StatusCode=\"FailedPrecondition\", Detail=\"Can't access cart storage. \nSystem.ApplicationException: Wasn't able to connect to redis \n  at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /usr/src/app/src/cartstore/RedisCartStore.cs:line 104 \n  at cartservice.cartstore.RedisCartStore.EmptyCartAsync(String userId) in /usr/src/app/src/cartstore/RedisCartStore.cs:line 168\").",
  "@timestamp": "2024-02-22T11:34:00.884Z",
  "log": {
    "level": "error"
  },
  "service": {
    "name": "cartService"
  },
  "host": {
    "name": "appserver-1"
  }
}

Then run the POST command by clicking the green Run button.

You should see a 201 response confirming that the example logs record was successfully created.
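The same record could also be indexed from a script instead of Dev Tools. The sketch below only builds the request and does not send it. The buildIndexRequest helper, the base URL, and the API key value are hypothetical placeholders, and the message is abbreviated from the full example above.

```typescript
// Shape of the example log document; field names follow the POST body above.
interface LogRecord {
  message: string;
  "@timestamp": string;
  log: { level: string };
  service: { name: string };
  host: { name: string };
}

interface IndexRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds (but does not send) the index request.
// baseUrl and apiKey stand in for your own deployment's endpoint and API key.
function buildIndexRequest(baseUrl: string, apiKey: string, logRecord: LogRecord): IndexRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/logs-elastic_agent-default/_doc`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `ApiKey ${apiKey}`,
    },
    // JSON.stringify escapes the embedded quotes and newlines in the message.
    body: JSON.stringify(logRecord),
  };
}

const exampleRecord: LogRecord = {
  // Abbreviated version of the message used in the Dev Tools example.
  message: "Status(StatusCode=\"FailedPrecondition\", Detail=\"Can't access cart storage. \nSystem.ApplicationException: Wasn't able to connect to redis\").",
  "@timestamp": "2024-02-22T11:34:00.884Z",
  log: { level: "error" },
  service: { name: "cartService" },
  host: { name: "appserver-1" },
};

const indexRequest = buildIndexRequest("https://my-deployment.es.example.com/", "placeholder-api-key", exampleRecord);
console.log(indexRequest.url);
```

Passing indexRequest.url and indexRequest to fetch would send it; building the request separately keeps the escaping easy to inspect first.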

Use the Elastic AI Assistant

Now that you have a log record to work with, let’s jump over to the Observability Logs Explorer to see how the AI Assistant interacts with logs data. Click the top-level menu and select Observability.

Select Logs Explorer to explore the logs data.

In the Logs Explorer search box, enter the text “redis” and press the Enter key to perform the search.

Click the View all matches button to include all search results.

You should see the one log record that you previously inserted via Dev Tools. Click the expand icon to see the log record’s details.

You should see the expanded view of the logs record. Instead of trying to understand its contents ourselves, we'll use the AI Assistant to summarize it. Click on the What's this message? button.

We get a fairly generic answer back. Depending on the exception or error we're trying to analyze, this can still be really useful, but we can make this better by adding additional documentation to the AI Assistant knowledge base.

Let’s see how we can use the AI Assistant’s knowledge base to improve its understanding of this specific logs message.

Create an Elastic AI Assistant knowledge base

Select Overview from the Observability menu.

Click the AI Assistant button at the top right of the window.

Click the Install Knowledge base button.

Click the top-level menu and select Stack Management.

Then select AI Assistants.

Click Elastic AI Assistant for Observability.

Select the Knowledge base tab.

Click the New entry button and select Single entry.

Give it the Name “cartservice” and enter the following text as the Contents:

Link: [Cartservice Intermittent connection issue](https://github.com/elastic/observability-examples/issues/25)
I have the following GitHub issue. Store this information in your knowledge base and always return the link to it if relevant.
GitHub Issue, return if relevant

Link: https://github.com/elastic/observability-examples/issues/25

Title: Cartservice Intermittent connection issue

Body:
The cartservice occasionally encounters storage errors due to an unreliable network connection.

The errors typically indicate a failure to connect to Redis, as seen in the error message:

Status(StatusCode="FailedPrecondition", Detail="Can't access cart storage.
System.ApplicationException: Wasn't able to connect to redis
at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /usr/src/app/src/cartstore/RedisCartStore.cs:line 104
at cartservice.cartstore.RedisCartStore.EmptyCartAsync(String userId) in /usr/src/app/src/cartstore/RedisCartStore.cs:line 168')'.
I just talked to the SRE team in Slack, they have plans to implement retries as a quick fix and address the network issue later.

Click Save to save the new knowledge base entry.

Now let’s go back to the Observability Logs Explorer. Click the top-level menu and select Observability.

Then select Explorer under Logs.

Expand the same logs entry as you did previously and click the What’s this message? button.

The response you get now should be much more relevant.

Try out the Elastic AI Assistant with a knowledge base filled with your own data

Now that you’ve seen how easy it is to set up the Elastic AI Assistant for Observability, give it a try for yourself. Sign up for a free 14-day trial. You can spin up an Elastic Cloud deployment in minutes and have your own search-powered AI knowledge base to help you get your most important work done.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.

Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.

Elastic Observability monitors metrics for Microsoft Azure in just minutes

Mon, 29 Jan 2024 00:00:00 GMT

Developers and SREs choose Microsoft Azure to run their applications because it is a trustworthy world-class cloud platform. It has also proven itself over the years as an extremely powerful and reliable infrastructure for hosting business-critical applications.

Elastic Observability offers over 25 out-of-the-box integrations for Microsoft Azure services with more on the way. A full list of Azure integrations can be found in our online documentation.

Elastic Observability aggregates not only logs but also metrics for Azure services and the applications running on Azure compute services (Virtual Machines, Functions, Kubernetes Service, etc.). All this data can be analyzed visually and more intuitively using Elastic®’s advanced machine learning (ML) capabilities, which help detect performance issues and surface root causes before end users are affected.

For more details on how Elastic Observability provides application performance monitoring (APM) capabilities such as service maps, tracing, dependencies, and ML-based metrics correlations, read APM correlations in Elastic Observability: Automatically identifying probable causes of slow or failed transactions.

That’s right, Elastic offers capabilities to collect, aggregate, and analyze metrics for Microsoft Azure services and applications running on Azure. Elastic Observability is for more than just capturing logs — it offers a unified observability solution for Microsoft Azure workloads.

In this blog, we’ll review how Elastic Observability can monitor metrics for a three-tier web application running on Microsoft Azure and leveraging:

Microsoft Azure Virtual Machines
Microsoft Azure SQL database
Microsoft Azure Virtual Network

As you will see, once the integration is installed, metrics will arrive instantly and you can immediately start deriving insights from metrics.

Prerequisites and config

Here are some of the components and details we used to set up this demonstration:

Ensure you have a Microsoft Azure account and an Azure service principal with permission to read monitoring data from Microsoft Azure (see details in our documentation).
This post does not cover application monitoring; instead, we will focus on how Microsoft Azure services can be easily monitored. If you want to get started with examples of application monitoring, see our Hello World observability code samples.
To see metrics, you will need to put some load on the application. We’ve also created a Playwright script to drive traffic to the application.

Three-tier application overview

Before we dive into the Elastic deployment setup and configuration, let's review what we are monitoring. If you follow the Microsoft Learn N-tier example app instructions for deploying the "What's for Lunch?" app, you will have the following deployed.

What’s deployed:

Microsoft Azure VM presentation tier that renders an HTML client in the user's browser and enables user requests to be sent to the “What’s for Lunch?” app
Microsoft Azure VM application tier that communicates with the presentation and the database tier
Microsoft Azure SQL instance in the database tier, handling requests from the application tier to store and serve data

At the end of the blog, we will also provide a Playwright script that can be run to send requests to this app in order to load it with example data and exercise its functionality. This will help drive metrics to “light up” the dashboards.

Setting it all up

Let’s walk through the details of how to deploy the example three-tier application, install the Azure integration on Elastic, and visualize what gets ingested in Elastic’s Kibana® dashboards.

Step 0: Get an account on Elastic Cloud

Follow the instructions to get started on Elastic Cloud.

Step 1: Deploy the Microsoft Azure three-tier application

From the Azure portal, click the Cloud Shell icon at the top of the portal to open Cloud Shell…

… and when the Cloud Shell first opens, select Bash as the shell type to use.

If you’re prompted that “You have no storage mounted,” then click the Create storage button to create a file store to be used for saving and editing files from Cloud Shell.

You should now see the open Cloud Shell terminal.

Run the following command in Cloud Shell to define the environment variables that we’ll be using in the Cloud Shell commands required to deploy and view the sample application.

Be sure to specify a valid RESOURCE_GROUP from your available Resource Groups listed in the Azure portal. Also specify a new password to replace the SpecifyNewPasswordHere placeholder text before running the command. See the Microsoft password policy documentation for password requirements.

RESOURCE_GROUP="test"
APP_PASSWORD="SpecifyNewPasswordHere"

Run the following az deployment group create command, which will deploy the example three-tier web app in around five minutes.

az deployment group create --resource-group "$RESOURCE_GROUP" --template-uri https://raw.githubusercontent.com/MicrosoftDocs/mslearn-n-tier-architecture/master/Deployment/azuredeploy.json --parameters password="$APP_PASSWORD"

After the deployment has completed, run the following command, which returns the URL for the app.

az deployment group show --output table --resource-group $RESOURCE_GROUP --name azuredeploy --query properties.outputs.webSiteUrl

Copy the web app URL and paste it into a browser to view the example “What’s for Lunch?” web app.

Step 2: Create an Azure service principal and grant access permission

Go to the Microsoft Azure Portal. Search for active directory and select Microsoft Entra ID.

Copy the Tenant ID for use in a later step in this blog post. This ID is required to configure Elastic Agent to connect to your Azure account.

In the navigation pane, select App registrations.

Then click New registration.

Type the name of your application (this tutorial uses three-tier-app-azure) and click Register (accept the default values for other settings).

Copy the Application (client) ID and save it for later. This ID is required to configure Elastic Agent to connect to your Azure account.

In the navigation pane, select Certificates & secrets, and then click New client secret to create a new security key.

Type a description of the secret and select an expiration. Click Add to create the client secret. Under Value, copy the secret value and save it (along with your client ID) for later.

After creating the Azure service principal, you need to grant it the correct permissions. In the Azure Portal, search for and select Subscriptions.

In the Subscriptions page, click the name of your subscription. On the subscription details page, copy your Subscription ID and save it for a later step.

In the navigation pane, select Access control (IAM).

Click Add and select Add role assignment.

On the Role tab, select the Monitoring Reader role and then click Next.

On the Members tab, select the option to assign access to User, group, or service principal. Click Select members, and then search for and select the principal you created earlier. For the description, enter the name of your service principal. Click Next to review the role assignment.

Click Review + assign to grant the service principal access to your subscription.

Step 3: Create an Azure VM instance

In the Azure Portal, search for and select Virtual machines.

On the Virtual machines page, click + Create and select Azure virtual machine.

On the Virtual machine creation page, enter a name like “metrics-vm” for the virtual machine name and select VM Size to be “Standard_D2s_v3 - 2 vcpus, 8 GiB memory.” Click the Next: Disks button.

On the Disks page, keep the default settings and click the Next: Networking button.

On the Networking page, demo-vnet should be selected for Virtual network and demo-biz-subnet should be selected for Subnet. These resources are created as part of the three-tier example app’s deployment that was done in Step 1.

Click the Review + create button.

On the Review page, click the Create button.

Step 4: Install the Azure Resource Metrics integration

In your Elastic Cloud deployment, navigate to the Elastic Azure integrations by selecting Integrations from the top-level menu. Search for azure resource and click the Azure Resource Metrics tile.

Click Add Azure Resource Metrics.

Click Add integration only (skip agent installation).

Enter the values that you saved previously for Client ID, Client Secret, Tenant ID, and Subscription ID.
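Because four similar-looking values are copied from different Azure portal pages, a quick sanity check before pasting them in can catch mix-ups. This is only a sketch: findCredentialProblems is a hypothetical helper, and it relies on the fact that Azure tenant, client, and subscription IDs are GUIDs. The check on the secret is a heuristic for the common mistake of copying a secret's ID instead of its Value.

```typescript
interface AzureCredentials {
  clientId: string;
  clientSecret: string;
  tenantId: string;
  subscriptionId: string;
}

// Matches the 8-4-4-4-12 hexadecimal GUID layout.
const GUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns a list of human-readable problems (empty if none).
function findCredentialProblems(credentials: AzureCredentials): string[] {
  const problems: string[] = [];
  const guidFields: Array<[string, string]> = [
    ["Client ID", credentials.clientId],
    ["Tenant ID", credentials.tenantId],
    ["Subscription ID", credentials.subscriptionId],
  ];
  for (const [label, value] of guidFields) {
    if (!GUID_PATTERN.test(value.trim())) {
      problems.push(`${label} does not look like a GUID`);
    }
  }
  const secret = credentials.clientSecret.trim();
  if (secret === "") {
    problems.push("Client Secret is empty");
  } else if (GUID_PATTERN.test(secret)) {
    // Heuristic: a GUID-shaped secret suggests the Secret ID was copied
    // instead of the secret's Value.
    problems.push("Client Secret looks like a GUID; check that you copied the Value");
  }
  return problems;
}

// Made-up values for illustration only.
const problems = findCredentialProblems({
  clientId: "11111111-2222-3333-4444-555555555555",
  clientSecret: "not-a-real-secret",
  tenantId: "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  subscriptionId: "1234",
});
console.log(problems);
```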

As you can see, the Azure Resource Metrics integration will collect a significant amount of data from eight Azure services. Click Save and continue.

You’ll be presented with a confirmation dialog window. Click Add Elastic Agent to your hosts.

This will display the instructions required to install Elastic Agent. Copy the command under the Linux Tar tab.

Next you will need to use SSH to log in to the Azure VM instance and run the commands copied from the Linux Tar tab. Go to Azure Virtual Machines in the Azure portal. Then click the name of the VM instance that you created in Step 3.

Click the Select button in the SSH Using Azure CLI section.

Select the “I understand …” checkbox and then click the Configure + connect button.

Once you are connected to the VM instance over SSH, use the terminal window to run the commands copied previously from the Linux Tar tab in the Install Elastic Agent on your host instructions. When the installation completes, you’ll see a confirmation message in the Install Elastic Agent on your host form.

Super! Elastic Agent is sending data to Elastic Cloud. Now let’s observe some metrics.

Step 5: Run traffic against the application

Getting the application running is fairly easy, but there is nothing to monitor or observe with Elastic until you put some load on the application.

Here is a simple script you can run using Playwright to add traffic and exercise the functionality of the Azure three-tier application:

import { test, expect } from "@playwright/test";

test("homepage for Microsoft Azure three tier app", async ({ page }) => {
  // Load web app (replace this address with the web app URL returned in Step 1)
  await page.goto("http://20.172.198.231/");
  // Add lunch suggestions
  await page.fill("id=txtAdd", "tacos");
  await page.keyboard.press("Enter");
  await page.waitForTimeout(1000);
  await page.fill("id=txtAdd", "sushi");
  await page.keyboard.press("Enter");
  await page.waitForTimeout(1000);
  await page.fill("id=txtAdd", "pizza");
  await page.keyboard.press("Enter");
  await page.waitForTimeout(1000);
  await page.fill("id=txtAdd", "burgers");
  await page.keyboard.press("Enter");
  await page.waitForTimeout(1000);
  await page.fill("id=txtAdd", "salad");
  await page.keyboard.press("Enter");
  await page.waitForTimeout(1000);
  await page.fill("id=txtAdd", "sandwiches");
  await page.keyboard.press("Enter");
  await page.waitForTimeout(1000);
  // Click vote buttons
  await page.getByRole("button").nth(1).click();
  await page.getByRole("button").nth(3).click();
  await page.getByRole("button").nth(5).click();
  await page.getByRole("button").nth(7).click();
  await page.getByRole("button").nth(9).click();
  await page.getByRole("button").nth(11).click();
  // Click remove buttons
  await page.getByRole("button").nth(12).click();
  await page.getByRole("button").nth(10).click();
  await page.getByRole("button").nth(8).click();
  await page.getByRole("button").nth(6).click();
  await page.getByRole("button").nth(4).click();
  await page.getByRole("button").nth(2).click();
});
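The button indices in the script follow a pattern that is easy to miss. Here is a small sketch of that pattern, assuming (this is inferred from the indices, not confirmed by the app's source) that button 0 belongs to the add form and that each suggestion row contributes a vote button followed by a remove button. The helper names are hypothetical.

```typescript
// Vote buttons sit at the odd positions: 1, 3, 5, ...
function voteButtonIndices(suggestionCount: number): number[] {
  return Array.from({ length: suggestionCount }, (_, i) => 2 * i + 1);
}

// Remove buttons sit at the even positions. Listing them from the last row
// to the first means removing a row never shifts the index of a row that
// still has to be removed.
function removeButtonIndices(suggestionCount: number): number[] {
  return Array.from({ length: suggestionCount }, (_, i) => 2 * (suggestionCount - i));
}

// Six suggestions, as in the script above.
const voteIndices = voteButtonIndices(6);
const removeIndices = removeButtonIndices(6);
console.log(voteIndices, removeIndices);
```

For six suggestions this reproduces the hard-coded indices in the script, so the same helpers could drive a loop if you add more suggestions.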

Step 6: View Azure dashboards in Elastic

With Elastic Agent running, you can go to Elastic Dashboards to view what’s being ingested. Simply search for “dashboard” in Elastic and choose Dashboard.

This will open the Elastic Dashboards page. In the Dashboards search box, search for azure vm and click the [Azure Metrics] Compute VMs Overview dashboard, one of the many out-of-the-box dashboards available.

You will see a Dashboard populated with your deployed application’s VM metrics.

The Azure Compute VM dashboard shows a sampling of the many available metrics, including:

CPU utilization
Available memory
Network sent and received bytes
Disk writes and reads metrics

For metrics not covered by out-of-the-box dashboards, custom dashboards can be easily created to visualize metrics that are important to you.

Congratulations, you have now started monitoring metrics from Microsoft Azure services for your application!

Analyze your data with Elastic AI Assistant

Once metrics and logs (or either one) are in Elastic, start analyzing your data with context-aware insights using the Elastic AI Assistant for Observability.

Conclusion: Monitoring Microsoft Azure service metrics with Elastic Observability is easy!

We hope you’ve gotten an appreciation for how Elastic Observability can help you monitor Azure service metrics. Here’s a quick recap of what you learned:

Elastic Observability supports ingest and analysis of Azure service metrics.
It’s easy to set up ingest from Azure services via the Elastic Agent.
Elastic Observability has multiple out-of-the-box Azure service dashboards you can use to preliminarily review information and then modify for your needs.

Try it out for yourself by signing up via Microsoft Azure Marketplace and quickly spin up a deployment in minutes on any of the Elastic Cloud regions on Microsoft Azure around the world. Your Azure Marketplace purchase of Elastic will be included in your monthly consolidated billing statement and will draw against your committed spend with Microsoft Azure.

How to deploy a Hello World web app with Elastic Observability on Azure Container Apps

Mon, 23 Oct 2023 00:00:00 GMT

Elastic Observability is the optimal tool to provide visibility into your running web apps. Microsoft Azure Container Apps is a fully managed environment that enables you to run containerized applications on a serverless platform so that your applications scale up and down. This allows you to accomplish the dual objective of serving every customer’s need for availability while meeting your needs to do so as efficiently as possible.

Using Elastic Observability and Azure Container Apps is a perfect combination for developers to deploy web apps that are auto-scaled with fully observable operations.

This blog post will show you how to deploy a simple Hello World web app to Azure Container Apps and then walk you through the steps to instrument the Hello World web app to enable observation of the application’s operations with Elastic Cloud.

Elastic Observability setup

We’ll start with setting up an Elastic Cloud deployment, which is where observability will take place for the web app we’ll be deploying.

From the Elastic Cloud console, select Create deployment.

Enter a deployment name and click Create deployment. It takes a few minutes for your deployment to be created. While waiting, you are prompted to save the admin credentials for your deployment, which provides you with superuser access to your Elastic® deployment. Keep these credentials safe as they are shown only once.

Elastic Observability requires an APM Server URL and an APM Secret token for an app to send observability data to Elastic Cloud. Once the deployment is created, we’ll copy the Elastic Observability server URL and secret token and store them somewhere safe for adding to our web app code in a later step.

To copy the APM Server URL and the APM Secret Token, go to Elastic Cloud. Then go to the Deployments page, which lists all of the deployments you have created. Select the deployment you want to use, which will open the deployment details page. In the Kibana row of links, click on Open to open Kibana® for your deployment.

Select Integrations from the top-level menu. Then click the APM tile.

On the APM Agents page, copy the secretToken and the serverUrl values and save them for use in a later step.

Now that we’ve completed the Elastic Cloud setup, the next step is to set up our account in Azure for deploying apps to the Container Apps service.

Azure Container Apps setup

First we’ll need an Azure account, so let’s create one by going to the Microsoft Azure portal and creating a new project. Click the Start free button and follow the steps to sign in or create a new account.

Deploy a Hello World web app to Container Apps

We’ll perform the process of deploying a C# Hello World web app to Container Apps using the handy Azure tool called Cloud Shell. To deploy the Hello World app, we’ll perform the following 12 steps:

From the Azure portal, click the Cloud Shell icon at the top of the portal to open Cloud Shell…

… and when the Cloud Shell first opens, select Bash as the shell type to use.

If you’re prompted that “You have no storage mounted,” then click the Create storage button to create a file store to be used for saving and editing files from Cloud Shell.

In Cloud Shell, clone a C# Hello World sample app repo from GitHub by entering the following command.

git clone https://github.com/elastic/observability-examples

Change directory to the location of the Hello World web app code.

cd observability-examples/azure/container-apps/helloworld

Define the environment variables that we’ll be using in the commands throughout this blog post.

RESOURCE_GROUP="helloworld-containerapps"
LOCATION="centralus"
ENVIRONMENT="env-helloworld-containerapps"
APP_NAME="elastic-helloworld"

Define a registry container name that is unique by running the following command.

ACR_NAME="helloworld"$RANDOM

Create an Azure resource group by running the following command.

az group create --name $RESOURCE_GROUP --location "$LOCATION"

Run the following command to create a registry container in Azure Container Registry.

az acr create --resource-group $RESOURCE_GROUP \
--name $ACR_NAME --sku Basic --admin-enabled true

Build the app image and push it to Azure Container Registry by running the following command.

az acr build --registry $ACR_NAME --image $APP_NAME .

Register the Microsoft.OperationalInsights resource provider by running the following command.

az provider register -n Microsoft.OperationalInsights --wait

Run the following command to create a Container App environment for deploying your app into.

az containerapp env create --name $ENVIRONMENT \
--resource-group $RESOURCE_GROUP --location "$LOCATION"

Create a new Container App by deploying the Hello World app’s image to Container Apps, using the following command.

az containerapp create \
  --name $APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --environment $ENVIRONMENT \
  --image $ACR_NAME.azurecr.io/$APP_NAME \
  --target-port 3500 \
  --ingress 'external' \
  --registry-server $ACR_NAME.azurecr.io \
  --query properties.configuration.ingress.fqdn

This command will output the deployed Hello World app's fully qualified domain name (FQDN). Copy and paste the FQDN into a browser to see your running Hello World app.

Instrument the Hello World web app with Elastic Observability

With a web app successfully running in Container Apps, we’re now ready to add the minimal code necessary to enable observability for the Hello World app in Elastic Cloud. We’ll perform the following eight steps:

In Azure Cloud Shell, create a new file named Telemetry.cs by typing the following command.

touch Telemetry.cs

Open the Azure Cloud Shell file editor by typing the following command in Cloud Shell.

code .

In the Azure Cloud Shell editor, open the Telemetry.cs file and paste in the following code. Save the edited file in Cloud Shell by pressing the [Ctrl] + [s] keys on your keyboard (or if you’re on a macOS computer, use the [⌘] + [s] keys). This class file is used to create a tracer ActivitySource, which can generate trace Activity spans for observability.

using System.Diagnostics;

public static class Telemetry
{
	public static readonly ActivitySource activitySource = new("Helloworld");
}

In the Azure Cloud Shell editor, edit the file named Dockerfile to add the following Elastic OpenTelemetry environment variables. Replace the ELASTIC_APM_SERVER_URL text and the ELASTIC_APM_SECRET_TOKEN text with the APM Server URL and the APM Secret Token values that you copied and saved in an earlier step.

Save the edited file in Cloud Shell by pressing the [Ctrl] + [s] keys on your keyboard (or if you’re on a macOS computer, use the [⌘] + [s] keys).

The updated Dockerfile should look something like this:

FROM ${ARCH}mcr.microsoft.com/dotnet/aspnet:7.0 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:8.0-preview AS build
ARG TARGETPLATFORM

WORKDIR /src
COPY ["helloworld.csproj", "./"]
RUN dotnet restore "./helloworld.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "helloworld.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "helloworld.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
EXPOSE 3500
ENV ASPNETCORE_URLS=http://+:3500

ENV OTEL_EXPORTER_OTLP_ENDPOINT='https://******.apm.us-east-2.aws.elastic-cloud.com:443'
ENV OTEL_EXPORTER_OTLP_HEADERS='Authorization=Bearer ***********'
ENV OTEL_LOG_LEVEL=info
ENV OTEL_METRICS_EXPORTER=otlp
ENV OTEL_RESOURCE_ATTRIBUTES=service.version=1.0,deployment.environment=production
ENV OTEL_SERVICE_NAME=helloworld
ENV OTEL_TRACES_EXPORTER=otlp

ENTRYPOINT ["dotnet", "helloworld.dll"]
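The OTEL_RESOURCE_ATTRIBUTES value is a comma-separated list of key=value pairs. To make that format concrete, here is a simplified sketch of how such a string breaks down. The parseResourceAttributes helper is hypothetical, it is not how the OpenTelemetry SDK is implemented, and it ignores percent-encoding.

```typescript
// Hypothetical helper: splits "k1=v1,k2=v2" into an object of attributes.
function parseResourceAttributes(raw: string): Record<string, string> {
  const attributes: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const separator = pair.indexOf("=");
    // Skip empty entries and entries without a key before the "=".
    if (separator <= 0) continue;
    const key = pair.slice(0, separator).trim();
    const value = pair.slice(separator + 1).trim();
    attributes[key] = value;
  }
  return attributes;
}

// The value used in the Dockerfile above.
const resourceAttributes = parseResourceAttributes(
  "service.version=1.0,deployment.environment=production"
);
console.log(resourceAttributes);
```

Each key becomes a resource attribute attached to the telemetry the app emits, which lets you tell versions and environments apart.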

In the Azure Cloud Shell editor, edit the helloworld.csproj file to add the Elastic APM and OpenTelemetry dependencies. The updated helloworld.csproj file should look something like this:




  
	net7.0
	enable
	enable

In the Azure Cloud Shell editor, edit the Program.cs:

Add a using statement at the top of the file to import System.Diagnostics, which is used to create Activities that are equivalent to “spans” in OpenTelemetry. Also import the OpenTelemetry.Resources and OpenTelemetry.Trace packages.

using System.Diagnostics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

Update the “builder” initialization code block to include configuration to enable Elastic OpenTelemetry observability.

builder.Services.AddOpenTelemetry().WithTracing(builder => builder
    .AddSource("helloworld")
    .AddAspNetCoreInstrumentation()
    // Register the OTLP exporter once; a second call would add a second exporter.
    .AddOtlpExporter()
    .ConfigureResource(resource =>
        resource.AddService(
            serviceName: "helloworld"))
);
builder.Services.AddControllers();

Replace the “Hello World!” HTML output string…

Hello World!

...with the “Hello Elastic Observability” HTML output string.


  
    Hello Elastic Observability - Azure Container Apps - C#

Add a telemetry trace span around the output response utilizing the Telemetry class’ ActivitySource.

using (Activity activity = Telemetry.activitySource.StartActivity("HelloSpan")!)
   	{
   		Console.Write("hello");
   		await context.Response.WriteAsync(output);
   	}

The updated Program.cs file should look something like this:

using System.Diagnostics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetry().WithTracing(builder => builder
    .AddSource("helloworld")
    .AddAspNetCoreInstrumentation()
    // Register the OTLP exporter once; a second call would add a second exporter.
    .AddOtlpExporter()
    .ConfigureResource(resource =>
        resource.AddService(
            serviceName: "helloworld"))
);
builder.Services.AddControllers();
var app = builder.Build();

string output =
"""


Hello Elastic Observability - Azure Container Apps - C#



""";

app.MapGet("/", async context =>
	{
    	using (Activity activity = Telemetry.activitySource.StartActivity("HelloSpan")!)
    		{
        		Console.Write("hello");
        		await context.Response.WriteAsync(output);
    		}
	}
);
app.Run();

Rebuild the Hello World app image and push the image to the Azure Container Registry by running the following command.

az acr build --registry $ACR_NAME --image $APP_NAME .

Redeploy the updated Hello World app to Azure Container Apps, using the following command.

az containerapp create \
  --name $APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --environment $ENVIRONMENT \
  --image $ACR_NAME.azurecr.io/$APP_NAME \
  --target-port 3500 \
  --ingress 'external' \
  --registry-server $ACR_NAME.azurecr.io \
  --query properties.configuration.ingress.fqdn

This command will output the deployed Hello World app's fully qualified domain name (FQDN). Copy and paste the FQDN into a browser to see the updated Hello World app running in Azure Container Apps.

Observe the Hello World web app

Now that we’ve instrumented the web app to send observability data to Elastic Observability, we can now use Elastic Cloud to monitor the web app’s operations.

In Elastic Cloud, select the Observability Services menu item.
Click the helloworld service.
Click the Transactions tab.
Scroll down and click the GET / transaction.
Scroll down to the Trace Sample section to see the GET /, HelloSpan trace sample.

Observability made to scale

You’ve seen the entire process of deploying a web app to Azure Container Apps that is instrumented with Elastic Observability. This web app is now fully available on the web running on a platform that will auto-scale to serve visitors worldwide. And it’s instrumented for Elastic Observability APM using OpenTelemetry to ingest data into Elastic Cloud’s Kibana dashboards.

Now that you’ve seen how to deploy a Hello World web app with a basic observability setup, visit Elastic Observability to learn more about expanding to a full scale observability coverage solution for your apps. Or visit Getting started with Elastic on Microsoft Azure for more examples of how you can drive the data insights you need by combining Microsoft Azure’s cloud computing services with Elastic’s search-powered platform.

Gain insights into Kubernetes errors with Elastic Observability logs and OpenAI

Thu, 18 May 2023 00:00:00 GMT

As we’ve shown in previous blogs, Elastic® provides a way to ingest and manage telemetry from the Kubernetes cluster and the application running on it. Elastic provides out-of-the-box dashboards to help with tracking metrics, log management and analytics, APM functionality (which also supports native OpenTelemetry), and the ability to analyze everything with AIOps features and machine learning (ML). While you can use pre-existing ML models in Elastic, out-of-the-box AIOps features, or your own ML models, there is still a need to dig deeper into the root cause of an issue.

Elastic helps reduce the operational work to support more efficient operations, but users still need a way to investigate and understand everything from the cause of an issue to the meaning of specific error messages. As an operations user, if you haven’t run into a particular error before or it isn’t covered by a runbook, you will likely go to Google and start searching for information.

OpenAI’s ChatGPT is becoming an interesting generative AI tool that helps provide more information using the models behind it. What if you could use OpenAI to obtain deeper insights (even simple semantics) for an error in your production or development environment? You can easily tie Elastic to OpenAI’s API to achieve this.

Kubernetes, a mainstay in most deployments (on-prem or in a cloud service provider), requires a significant amount of expertise, even if that expertise is only needed to manage a service like GKE, EKS, or AKS.

In this blog, I will cover how you can use Elastic’s watcher capability to connect Elastic to OpenAI and ask it for more information about the error logs Elastic is ingesting from a Kubernetes cluster(s). More specifically, we will use Azure’s OpenAI Service. Azure OpenAI is a partnership between Microsoft and OpenAI, so the same models from OpenAI are available in the Microsoft version.

While this blog goes over a specific example, it can be modified for other types of errors Elastic receives in logs. Whether it's from AWS, the application, databases, etc., the configuration and script described in this blog can be modified easily.

Prerequisites and config

If you plan on following this blog, here are some of the components and details we used to set up the configuration:

Ensure you have an account on Elastic Cloud and a deployed stack (see instructions here).
We used a GCP GKE Kubernetes cluster, but you can use any Kubernetes cluster service (on-prem or cloud based) of your choice.
We’re also running with a version of the OpenTelemetry Demo. Directions for using Elastic with OpenTelemetry Demo are here.
We also have an Azure account and Azure OpenAI service configured. You will need to get the appropriate tokens from Azure and the proper URL endpoint from Azure’s OpenAI service.
We will use Elastic’s dev tools, the console to be specific, to load up and run the script, which is an Elastic watcher.
We will also add a new index to store the results from the OpenAI query.

Here is the configuration we will set up in this blog:

As we walk through the setup, we’ll also provide the alternative setup with OpenAI versus Azure OpenAI Service.

Setting it all up

Over the next few steps, I’ll walk through:

Getting an account on Elastic Cloud and setting up your K8S cluster and application
Gaining Azure OpenAI authorization (alternative option with OpenAI)
Identifying Kubernetes error logs
Configuring the watcher with the right script
Comparing the output from Azure OpenAI/OpenAI versus ChatGPT UI

Step 0: Create an account on Elastic Cloud

Follow the instructions to get started on Elastic Cloud.

Once you have the Elastic Cloud login, set up your Kubernetes cluster and application. A complete step-by-step instructions blog is available here. This also provides an overview of how to see Kubernetes cluster metrics in Elastic and how to monitor them with dashboards.

Step 1: Azure OpenAI Service and authorization

When you log in to your Azure subscription and set up an instance of Azure OpenAI Service, you will be able to get your keys under Manage Keys.

There are two keys for your OpenAI instance, but you only need KEY 1.

Additionally, you will need to get the service URL. See the image above with our service URL blanked out to understand where to get the KEY 1 and URL.

If you are using the standard OpenAI service rather than Azure OpenAI Service, then you can get your keys at:

https://platform.openai.com/account/api-keys

You will need to create a key and save it. Once you have the key, you can go to Step 2.

Step 2: Identifying Kubernetes errors in Elastic logs

As your Kubernetes cluster is running, Elastic’s Kubernetes integration running on the Elastic agent daemon set on your cluster is sending logs and metrics to Elastic. The telemetry is ingested, processed, and indexed. Kubernetes logs are stored in an index called .ds-logs-kubernetes.container_logs-default-* (* is for the date), and an automatic data stream logs-kubernetes.container_logs is also pre-loaded. So while you can use some of the out-of-the-box dashboards to investigate the metrics, you can also look at all the logs in Elastic Discover.

While any error from Kubernetes can be daunting, the more nuanced issues occur with errors from the pods running in the kube-system namespace. Take the pod konnectivity agent, which is essentially a network proxy agent running on the node to help establish tunnels and is a vital component in Kubernetes. Any error will cause the cluster to have connectivity issues and lead to a cascade of issues, so it’s important to understand and troubleshoot these errors.

When we filter for error logs from the konnectivity agent, we see a good number of errors.

But unfortunately, we still can’t understand what these errors mean.

Enter OpenAI to help us understand the issue better. Generally, you would take the error message from Discover and paste it with a question in ChatGPT (or run a Google search on the message).

One error in particular that we’ve run into but do not understand is:

E0510 02:51:47.138292       1 client.go:388] could not read stream err=rpc error: code = Unavailable desc = error reading from server: read tcp 10.120.0.8:46156->35.230.74.219:8132: read: connection timed out serverID=632d489f-9306-4851-b96b-9204b48f5587 agentID=e305f823-5b03-47d3-a898-70031d9f4768

The OpenAI output is as follows:

ChatGPT has given us a fairly nice set of ideas on why this rpc error is occurring against our konnectivity-agent.

So how can we get this output automatically for any error when those errors occur?

Step 3: Configuring the watcher with the right script

What is an Elastic watcher? Watcher is an Elasticsearch feature that you can use to create actions based on conditions, which are periodically evaluated using queries on your data. Watchers are helpful for analyzing mission-critical and business-critical streaming data. For example, you might watch application logs for errors causing larger operational issues.

Once a watcher is configured, it can be:

Manually triggered
Run periodically
Created using a UI or a script

In this scenario, we will use a script, as we can modify it easily and run it as needed.

We’re using the DevTools Console to enter the script and test it out:

The script is listed at the end of the blog in the appendix. It can also be downloaded here.

The script does the following:

It runs continuously every five minutes.
It will search the logs for errors from the container konnectivity-agent.
It will take the first error’s message, transform it (re-format and clean up), and place it into a variable first_hit.

"script": "return ['first_hit': ctx.payload.first.hits.hits.0._source.message.replace('\"', \"\")]"

The error message is sent into OpenAI with a query:

What are the potential reasons for the following kubernetes error:
  {{ctx.payload.second.first_hit}}

If the search yielded an error, the watcher then places the error message, pod.name (which is konnectivity-agent-6676d5695b-ccsmx in our setup), and the OpenAI output into a new index called chatgpt_k8s_analyzed, creating the index if it does not exist yet.
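For readers who want to trace the data flow outside of Watcher, the steps above can be sketched in Python. This is only an illustration, not the watcher itself: the function names and the sample search response are hypothetical, and the quote stripping mirrors the transform script, presumably there so the message can be embedded safely in the JSON request body.

```python
PROMPT_PREFIX = "What are the potential reasons for the following kubernetes error: "


def first_hit_message(search_response):
    """Return the first hit's message with double quotes removed, or None if no hits."""
    hits = search_response["hits"]["hits"]
    if not hits:
        return None
    return hits[0]["_source"]["message"].replace('"', "")


def build_prompt(first_hit):
    """Build the question sent to the model for one error message."""
    return PROMPT_PREFIX + first_hit


def build_index_doc(search_response, first_hit, analysis):
    """Assemble the document written to the results index (timestamp omitted here)."""
    source = search_response["hits"]["hits"][0]["_source"]
    return {
        "pod_name": source["kubernetes"]["pod"]["name"],
        "error_message": first_hit,
        "chatgpt_analysis": analysis,
    }


# Hypothetical search response, shaped like an Elasticsearch hit list.
sample = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_source": {
                    "message": 'could not read stream err="rpc error"',
                    "kubernetes": {"pod": {"name": "konnectivity-agent-6676d5695b-ccsmx"}},
                }
            }
        ],
    }
}

first_hit = first_hit_message(sample)
prompt = build_prompt(first_hit)
doc = build_index_doc(sample, first_hit, "placeholder analysis text")
print(prompt)
```

In the watcher, the same three stages correspond to the chained inputs named first, second, and third, followed by the index action.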

To see the results, we created a new data view called chatgpt_k8_analyzed against the newly created index:

In Discover, the output on the data view provides us with the analysis of the errors.

For every error the script sees in the five minute interval, it will get an analysis of the error. We could alternatively also use a range as needed to analyze during a specific time frame. The script would just need to be modified accordingly.
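As a rough sketch of the range-based variant, the Python below builds a search body with the same two match clauses as the watcher plus a time range filter. It assumes the logs carry the usual @timestamp field and uses Elasticsearch date math such as now-1h; the function name and default values are hypothetical.

```python
import json


def build_error_query(container_name, gte="now-5m", lte="now", size=1):
    """Build a search body for error logs from one container within a time range."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"kubernetes.container.name": container_name}},
                    {"match": {"message": "error"}},
                ],
                # Assumption: documents are timestamped in the @timestamp field.
                "filter": [
                    {"range": {"@timestamp": {"gte": gte, "lte": lte}}}
                ],
            }
        },
        "size": size,
    }


body = build_error_query("konnectivity-agent", gte="now-1h", lte="now")
print(json.dumps(body, indent=2))
```

The printed JSON could replace the "body" of the watcher's search request; the filter clause narrows the hits without affecting scoring.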

Step 4: Output from Azure OpenAI/OpenAI vs. ChatGPT UI

As you noticed above, we got roughly the same result from the Azure OpenAI API call as we did by testing out our query in the ChatGPT UI. This is because we configured the API call to run the same or a similar model to the one selected in the UI.

For the API call, we used the following parameters:

"request": {
    "method": "POST",
    "url": "https://XXX.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview",
    "headers": {
        "api-key": "XXXXXXX",
        "content-type": "application/json"
    },
    "body": "{ \"messages\": [ { \"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, { \"role\": \"user\", \"content\": \"What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\"}], \"temperature\": 0.5, \"max_tokens\": 2048}",
    "connection_timeout": "60s",
    "read_timeout": "60s"
}

By setting the system role to "You are a helpful assistant." and pointing the URL at a gpt-35-turbo deployment, we are essentially asking the API to use GPT-3.5 Turbo, the same model family the ChatGPT UI used by default at the time of writing.

Additionally, for Azure OpenAI Service, you will need to set the URL to something similar to the following:

https://YOURSERVICENAME.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview
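To make the URL structure and request body easier to adapt, here is a small Python sketch that composes both. The function names are hypothetical, the api-version default is simply the one shown above, and json.dumps is used for escaping instead of the manual quote removal done in the watcher.

```python
import json

PROMPT_PREFIX = "What are the potential reasons for the following kubernetes error: "


def azure_chat_url(service_name, deployment, api_version="2023-03-15-preview"):
    """Compose the chat completions URL for an Azure OpenAI deployment."""
    return (
        f"https://{service_name}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )


def azure_chat_body(error_message, temperature=0.5, max_tokens=2048):
    """Serialize the request body; json.dumps takes care of quoting the message."""
    return json.dumps(
        {
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": PROMPT_PREFIX + error_message},
            ],
            "temperature": temperature,
            "max_tokens": max_tokens,
        }
    )


url = azure_chat_url("YOURSERVICENAME", "pme-gpt-35-turbo")
body = azure_chat_body('read tcp: "connection timed out"')
print(url)
```

The deployment segment of the URL is the name you gave your model deployment in Azure, so it will differ from pme-gpt-35-turbo in your own setup.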

If you use OpenAI (versus Azure OpenAI Service), the request call (against https://api.openai.com/v1/completions) would look like this:

"request": {
            "scheme": "https",
            "host": "api.openai.com",
            "port": 443,
            "method": "post",
            "path": "\/v1\/completions",
            "params": {},
            "headers": {
               "content-type": "application\/json",
               "authorization": "Bearer YOUR_ACCESS_TOKEN"
                        },
            "body": "{ \"model\": \"text-davinci-003\",  \"prompt\": \"What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\",  \"temperature\": 1,  \"max_tokens\": 512,     \"top_p\": 1.0,      \"frequency_penalty\": 0.0,   \"presence_penalty\": 0.0 }",
            "connection_timeout_in_millis": 60000,
            "read_timeout_millis": 60000
          }
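One practical difference between the two request styles is where the generated text appears in the response. The chat completions call returns it under choices[0].message.content, which is what the watcher's index action reads, while the /v1/completions call returns it under choices[0].text. The Python sketch below handles both shapes; the function name and sample responses are hypothetical.

```python
def extract_analysis(response):
    """Return the generated text from either response shape."""
    choice = response["choices"][0]
    if "message" in choice:
        # Chat completions shape: text lives under message.content.
        return choice["message"]["content"]
    # Completions shape: text lives directly under text.
    return choice["text"]


# Hypothetical, trimmed-down responses for illustration.
chat_response = {
    "choices": [{"message": {"role": "assistant", "content": "chat answer"}}]
}
completion_response = {"choices": [{"text": "completion answer"}]}

print(extract_analysis(chat_response))
print(extract_analysis(completion_response))
```

If you switch the watcher to the completions endpoint, the transform in the index action would need the equivalent change.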

If you are interested in creating a more OpenAI-based version, you can download an alternative script and look at another blog from an Elastic community member.

Gaining other insights beyond Kubernetes logs

Now that the script is up and running, you can modify it using different:

Inputs
Conditions
Actions
Transforms

Learn more on how to modify it here. Some examples of modifications could include:

Look for error logs from application components (e.g., cartService, frontEnd, from the OTel demo), cloud service providers (e.g., AWS/Azure/GCP logs), and even logs from components such as Kafka, databases, etc.
Vary the time frame from running continuously to running over a specific range.
Look for specific errors in the logs.
Query for analysis on a set of errors at once versus just one, which we demonstrated.
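As one possible take on the last modification, the Python sketch below combines several error messages into a single question. It is an illustration under assumptions: the function name, the limit of five messages, and the numbered layout are all hypothetical choices, and the quote removal simply follows the watcher's transform.

```python
def build_batch_prompt(messages, limit=5):
    """Combine up to `limit` distinct error messages into one numbered question."""
    seen = []
    for message in messages:
        cleaned = message.replace('"', "").strip()
        # Skip empty strings and exact duplicates, keeping first-seen order.
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    selected = seen[:limit]
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(selected, start=1))
    return "What are the potential reasons for the following kubernetes errors:\n" + numbered


errors = [
    "connection timed out",
    "connection timed out",
    'could not read stream err="rpc error"',
]
prompt = build_batch_prompt(errors)
print(prompt)
```

Deduplicating before sending keeps the prompt short when the same error repeats many times within one interval.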

The modifications are endless, and of course you can run this with OpenAI rather than Azure OpenAI Service.

Conclusion

I hope you’ve gotten an appreciation for how Elastic Observability can help you connect to OpenAI services (Azure OpenAI, as we showed, or even OpenAI) to better analyze an error log message instead of having to run several Google searches and hunt for possible insights.

Here’s a quick recap of what we covered:

Developing an Elastic watcher script that can be used to find and send Kubernetes errors into OpenAI and insert them into a new index
Configuring Azure OpenAI Service or OpenAI with the right authorization and request parameters

Ready to get started? Sign up for Elastic Cloud and try out the features and capabilities I’ve outlined above to get the most value and visibility out of your OpenTelemetry data.

Appendix

Watcher script

PUT _watcher/watch/chatgpt_analysis
{
    "trigger": {
      "schedule": {
        "interval": "5m"
      }
    },
    "input": {
      "chain": {
          "inputs": [
              {
                  "first": {
                      "search": {
                          "request": {
                              "search_type": "query_then_fetch",
                              "indices": [
                                "logs-kubernetes*"
                              ],
                              "rest_total_hits_as_int": true,
                              "body": {
                                "query": {
                                  "bool": {
                                    "must": [
                                      {
                                        "match": {
                                          "kubernetes.container.name": "konnectivity-agent"
                                        }
                                      },
                                      {
                                        "match" : {
                                          "message":"error"
                                        }
                                      }
                                    ]
                                  }
                                },
                                "size": "1"
                              }
                            }
                        }
                    }
                },
                {
                    "second": {
                        "transform": {
                            "script": "return ['first_hit': ctx.payload.first.hits.hits.0._source.message.replace('\"', \"\")]"
                        }
                    }
                },
                {
                    "third": {
                        "http": {
                            "request": {
                                "method" : "POST",
                                "url": "https://XXX.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview",
                                "headers": {
                                    "api-key" : "XXX",
                                    "content-type" : "application/json"
                                },
                                "body" : "{ \"messages\": [ { \"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, { \"role\": \"user\", \"content\": \"What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\"}], \"temperature\": 0.5, \"max_tokens\": 2048}" ,
                                "connection_timeout": "60s",
                                "read_timeout": "60s"
                            }
                        }
                    }
                }
            ]
        }
    },
    "condition": {
      "compare": {
        "ctx.payload.first.hits.total": {
          "gt": 0
        }
      }
    },
    "actions": {
        "index_payload" : {
            "transform": {
                "script": {
                    "source": """
                        def payload = [:];
                        payload.timestamp = new Date();
                        payload.pod_name = ctx.payload.first.hits.hits[0]._source.kubernetes.pod.name;
                        payload.error_message = ctx.payload.second.first_hit;
                        payload.chatgpt_analysis = ctx.payload.third.choices[0].message.content;
                        return payload;
                    """
                }
            },
            "index" : {
                "index" : "chatgpt_k8s_analyzed"
            }
        }
    }
}

In this blog post, we may have used third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.

Elastic, Elasticsearch and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.

Screenshots of Microsoft products used with permission from Microsoft.

LLM Observability with Elastic’s Azure AI Foundry Integration

Fri, 25 Jul 2025 00:00:00 GMT

Introduction

As organizations increasingly adopt LLMs for AI-powered applications such as content creation, Retrieval-Augmented Generation (RAG), and data analysis, SREs and developers face new challenges. Tasks like monitoring workflows, analyzing input and output, managing query latency, and controlling costs become critical. LLM Observability helps address these issues by providing clear insights into how these models perform, allowing teams to quickly identify bottlenecks, optimize configurations, and improve reliability. With better observability, SREs can confidently scale LLM applications, especially on platforms like Azure AI Foundry, while minimizing downtime and keeping costs in check.

Elastic is expanding support for LLM Observability with Elastic Observability's new Azure AI Foundry integration, which is now available as a tech preview on Elastic Cloud. This integration provides you with comprehensive visibility into the performance and usage of foundational models, such as GPT-4, Mistral, Llama, and thousands of others from leading AI companies and from Azure, available through Azure AI Foundry. It offers an out-of-the-box experience by simplifying the collection of metrics and logs, making it easier to gain actionable insights and effectively manage your models. The integration is simple to set up and comes with pre-built dashboards. With real-time insights, SREs can now monitor, optimize, and troubleshoot LLM applications that are using Azure AI Foundry.

This blog will walk through the features available to SREs, such as monitoring invocations, errors, and latency information across various models, along with the usage and performance of LLM requests. Additionally, the blog will show how easy it is to set up and what insights you can gain from Elastic for LLM Observability.

Prerequisites

To get started with the Azure AI Foundry integration, you will need:

An account on Elastic Cloud and a deployed stack in Azure (see instructions here). Ensure you are using version 9.0.0 or higher.
An Azure account with permissions to pull the necessary data from Azure and Azure AI Foundry. See details in our documentation.

Configuring Azure AI Foundry Integration

To collect logs and metrics from Azure AI Foundry, ensure you properly configure Azure logs and metrics using the following links:

Configure to receive Azure Metrics - This integration specifically collects Azure AI Foundry metrics from the service. Ensure you have the client id, subscription id, and tenant id from Azure AI Foundry to collect metrics.
Configure to receive Azure Logs - Ensure that you configure an Azure event hub so that Elastic can ingest logs. You will need the Azure event hub information to configure the logs section of the Azure AI Foundry integration.

Maximize Visibility with Out-of-the-box dashboards

Azure AI Foundry integration offers rich out-of-the-box visibility into the performance and usage information of models in Azure AI Foundry, including text and image models. There are several dashboards currently available. More will be coming as the integration goes to GA.

Azure AI Foundry Overview dashboard - provides a summarized view of the invocations, errors, and latency information across various models.
Azure AI Foundry Billing dashboard - provides total costs and daily usage costs from Azure cognitive services.
Azure AI Foundry Advanced Monitoring dashboard - focuses on logs generated by the Azure AI Foundry service when connected through the API Management Service. It provides request rate, error rate, model usage, latency, LLM prompt input, and response completion.

Each dashboard provides specific insights important to SREs. Here is a quick overview of some of these insights:

Model Usage and Token Trends – Visualize token consumption and completion counts by model, endpoint, and time window.
Latency Metrics – Monitor average and percentile latency per prompt, per endpoint, and correlate with prompt types or user IDs.
Cost Estimation – Estimate API usage cost based on token consumption and model pricing.
Prompt/Completion Logging – View prompt-response pairs for debugging and quality monitoring.
Content Filtering and Guardrails – See which prompts or completions are being filtered, and why.

You can drill into specific users or sessions, slice by model type or region, and export reports for usage reviews or compliance.
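To illustrate how a cost estimate can be derived from token consumption and model pricing, here is a minimal Python sketch. The model names and per-1,000-token prices are hypothetical placeholders, not actual Azure pricing, and the dashboards may compute cost differently.

```python
# Hypothetical prices in USD per 1,000 tokens; replace with your actual rates.
PRICE_PER_1K = {
    "example-model-a": {"prompt": 0.001, "completion": 0.002},
    "example-model-b": {"prompt": 0.01, "completion": 0.03},
}


def estimate_cost(model, prompt_tokens, completion_tokens, prices=PRICE_PER_1K):
    """Estimate the cost of one request from its token counts."""
    rates = prices[model]
    return (
        prompt_tokens / 1000 * rates["prompt"]
        + completion_tokens / 1000 * rates["completion"]
    )


cost = estimate_cost("example-model-a", prompt_tokens=2000, completion_tokens=500)
print(f"Estimated cost: ${cost:.4f}")
```

Prompt and completion tokens are priced separately here because many models charge different rates for input and output.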

Try it out today

The Azure AI Foundry integration is currently available in Elastic Cloud (both serverless and hosted options). Sign up for a 7-day trial on Elastic Cloud directly or through Azure Marketplace. Alternatively, you can deploy a cluster on our Elasticsearch Service, download the Elasticsearch stack, or run Elastic from Azure Marketplace. Then spin up the new technical preview of the Azure AI Foundry integration, open the curated dashboards in Kibana, and start monitoring your Azure AI Foundry service!

Optimizing Spend and Content Moderation on Azure OpenAI with Elastic

Tue, 13 May 2025 00:00:00 GMT

In a previous blog we showed you how to set up observability for your models hosted on Azure OpenAI using Elastic’s integration. We’ve expanded the integration to also include Azure OpenAI content filtering, and cost analysis for Azure OpenAI. If you previously onboarded the Azure OpenAI integration, just upgrade it and you will automatically get all new features we discuss in this blog. The enhanced integration now provides multiple dashboards including a general Azure OpenAI Overview, Azure Provisioned Throughput Unit dashboard, Azure Content filtering, and a dashboard for Azure OpenAI billing.

In this blog we will cover how to use Azure OpenAI Content Filtering and how to track Azure OpenAI usage costs. Let’s first review what these two capabilities from Azure OpenAI enable you to do:

Azure OpenAI Content Filtering: Enhancing AI Safety

Content filtering for Azure OpenAI plays a critical role in addressing AI safety challenges by helping to mitigate the risks associated with harmful or inappropriate content generated by AI models. By implementing robust content filtering mechanisms, organizations can proactively identify and filter out potentially harmful content, such as hate speech, misinformation, or violent imagery, before it is disseminated to users. This helps prevent the spread of harmful content and reduces the potential negative impact on individuals and communities.

Monitoring Azure OpenAI content filtering is essential for staying proactive in addressing emerging content moderation challenges. By closely monitoring the system, businesses can quickly detect any new types of harmful content or patterns of misuse that may arise. This enables organizations to stay ahead of potential content moderation issues and take timely action to protect their users and uphold their brand reputation.

Tracking Azure OpenAI Usage Costs

Monitoring Azure OpenAI model usage costs is crucial for managing budget and resource allocation effectively. By keeping track of usage costs, organizations can optimize their operations to avoid unnecessary expenses and ensure that they are getting the best value from their investment in AI technologies. Additionally, it helps in forecasting future expenses and aids in scaling resources according to the demand without compromising performance or incurring excessive costs. Effective monitoring also allows for transparency and accountability, enabling better decision-making in terms of AI deployment and utilization within Azure environments.

As we walk through this blog, we will provide you with prerequisites to set up and use the pre-configured dashboards for both of these capabilities, which are part of the Azure OpenAI integration.

Prerequisites

In order to follow along in this blog, you will have to:

Set up and install the Azure billing integration to monitor the usage costs. Once the integration is installed, you can track the usage in the enhanced Azure OpenAI Billing dashboard.
Additionally, make sure you have enabled the Azure API Management service to access the Azure OpenAI models.

How to Use Azure API Management with Azure OpenAI:

Provision an Azure OpenAI resource: Create an Azure OpenAI resource and select a model for your application.
Create an API Management instance: Establish an Azure API Management instance to manage the Azure OpenAI APIs.
Import the Azure OpenAI API: Import the Azure OpenAI API into your API Management instance using its OpenAPI specification.
Configure Policies: Implement policies in API Management to manage request authentication, rate limiting, traffic shaping, and more.

Steps to create a content filter for Azure OpenAI

Before you set up observability for content filtering, ensure that you have configured Azure content filtering for your model. Follow the steps below to create an Azure OpenAI content filter:

Access the Azure OpenAI service console:
- Sign in to the Azure Console with the appropriate permissions and navigate to the Azure OpenAI service console.
Navigate to Safety + security:
- From the left-hand menu, select Safety + security.
Create a New Content filter:
- Select Create content filter.
- Configure various content filter policies including the following
  - Set input filter: Content will be annotated by category and blocked according to the threshold you set for prompts.
  - Set output filter: Content will be annotated by category and blocked according to the threshold you set for response output.
  - Blocklists: Define specific words or phrases to block.
  - Deployments: Apply filters to model deployments.
Review and Create:
- Review your settings and select Create to finalize the content filter configurations.

Customers can also configure content filters and create custom safety policies that are tailored to their use case requirements. The configurability feature allows customers to adjust the settings, separately for prompts and completions, to filter content for each content category at different severity levels.

Content filter types

Content filtering categories:
- hate, sexual, violence, self-harm
- Other optional classification models aimed at detecting jailbreak risk and known content for text and code.
Severity levels within each content filter category:
- low, medium, high
- Content detected at the 'safe' severity level is labeled in annotations but isn't subject to filtering and isn't configurable.

Understanding the pre-configured dashboard for Azure OpenAI Content Filtering

Now that you have set up the filter, you can see what is being filtered in Elastic through the Azure OpenAI content filtering dashboard.

Navigate to the Dashboard Menu – Select the Dashboard menu option in Elastic and search for [Azure OpenAI] Content Filtering Overview to open the dashboard.
Navigate to the Integrations Menu – Open the Integrations menu in Elastic, select Azure OpenAI, go to the Assets tab, and choose [Azure OpenAI] Content Filtering Overview from the dashboard assets.

The Azure OpenAI Content Filtering Overview dashboard in the Elastic integration provides insights into blocked requests, API latency, and error rates. This dashboard also provides a detailed breakdown of the content being filtered by the content filtering policy.

Content Filter overview

When the content filtering system detects harmful content, you receive either an error on the API call if the prompt was deemed inappropriate, or the finish_reason on the response will be content_filter to signify that some of the completion was filtered.

This can be summarized as follows:

Prompt filters: Prompt content that is classified at a filtered category and severity level returns an HTTP 400 error.
Non-streaming completion: When the content is filtered, non-streaming completions calls won't return any content. In rare cases with longer responses, a partial result can be returned. In these cases, the finish_reason is updated.
Streaming completion: For streaming completions calls, segments are returned to the user as they're completed. The service continues streaming until it reaches a stop token or the length limit, or until content classified at a filtered category and severity level is detected.
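The behavior summarized above can be sketched as a small Python classifier that an application might apply to each call. The function name and return labels are hypothetical, and the check for an error code of content_filter in the HTTP 400 body is an assumption about the payload shape that you should verify against your own responses.

```python
def classify_filter_outcome(status_code, payload):
    """Describe how content filtering affected one chat completions call."""
    if status_code == 400:
        # Assumption: a filtered prompt carries error.code == "content_filter".
        error_code = payload.get("error", {}).get("code")
        if error_code == "content_filter":
            return "prompt_filtered"
        return "bad_request"
    choices = payload.get("choices", [])
    if any(choice.get("finish_reason") == "content_filter" for choice in choices):
        return "completion_filtered"
    return "not_filtered"


# Hypothetical payloads for illustration.
blocked_prompt = {"error": {"code": "content_filter", "message": "filtered"}}
blocked_completion = {"choices": [{"finish_reason": "content_filter"}]}
normal = {"choices": [{"finish_reason": "stop"}]}

print(classify_filter_outcome(400, blocked_prompt))
print(classify_filter_outcome(200, blocked_completion))
print(classify_filter_outcome(200, normal))
```

Separating the prompt and completion cases matters because only the former fails the whole request; the latter still returns a response that may contain partial content.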

Prompt and response where content has been blocked

This dashboard section displays the original LLM prompt, inputs from various sources (API calls, applications, or chat interfaces), and the corresponding completion response. The panel below gives a view on the responses after applying content filtering policy for prompts and completions.

You can use the following code snippet to start integrating your current prompt and settings into your application to test the content filter:

chat_prompt = [
   {
       "role": "user",
       "content": "How to kill a mocking bird?"
   }
]

After running the code, you should see the content filtered under the violence category at the medium severity level.

Content filtered by content source (Input & Output)

The content filtering system helps monitor and moderate different categories of content based on severity levels. The categories typically include things like adult content, offensive language, hate speech, violence, and more. The severity levels indicate the degree of sensitivity or potential harm associated with the content. This panel helps the user to effectively monitor and filter out inappropriate or harmful content to maintain a safe environment.

These metrics can be categorized into the following groups:

Blocked requests by category: Provides insights into the total blocked requests by category.
Severity distribution by categories: Monitors the blocked requests by categories and severity distribution. The severity distribution may be either low, medium or high.
Content filtered categories: Provides insights into the content filtered categories over time.
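To show the kind of aggregation behind these panels, the Python sketch below counts blocked requests by category and tallies severities per category. It assumes each record maps a category name to an object with filtered and severity fields; the function name and sample records are hypothetical, and the dashboard itself is built on Elastic aggregations rather than this code.

```python
from collections import Counter


def summarize_filter_results(results):
    """Count blocked requests per category and occurrences per (category, severity)."""
    blocked_by_category = Counter()
    severity_by_category = Counter()
    for result in results:
        for category, detail in result.items():
            severity_by_category[(category, detail["severity"])] += 1
            if detail["filtered"]:
                blocked_by_category[category] += 1
    return blocked_by_category, severity_by_category


# Hypothetical content filter results for two requests.
sample_results = [
    {
        "violence": {"filtered": True, "severity": "medium"},
        "hate": {"filtered": False, "severity": "safe"},
    },
    {
        "violence": {"filtered": True, "severity": "high"},
        "hate": {"filtered": False, "severity": "safe"},
    },
]

blocked, severities = summarize_filter_results(sample_results)
print(dict(blocked))
print(dict(severities))
```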

Reviewing the Azure OpenAI Billing dashboard

You can now look at what you are spending on Azure OpenAI.

Here is what you see on this dashboard:

Total costs: This measures the total usage cost across all the model deployments.
Overall Usage by model: This tracks the total usage costs broken down by model.
Daily usage: Monitors usage costs on a daily basis.
Daily usage costs by model: Monitors daily usage costs broken down by model deployments.

Conclusion

The Azure OpenAI integration makes it easy for you to collect a curated set of metrics and logs for your LLM-powered applications using Azure OpenAI along with content filtered responses. It comes with an out-of-the-box dashboard which you can further customize for your specific needs.

Deploy a cluster on our Elasticsearch Service or download the stack, spin up the new Azure OpenAI integration, open the curated dashboards in Kibana and start monitoring your Azure OpenAI service!

LLM Observability with Elastic: Azure OpenAI Part 2

Fri, 23 Aug 2024 00:00:00 GMT

We recently announced GA of the Azure OpenAI integration. You can find details in our previous blog LLM Observability: Azure OpenAI.

Since then, we have added further capabilities to the Azure OpenAI GA package, which now offers prompt and response monitoring, PTU deployment performance tracking, and billing insights. Read on to learn more!

Advanced Logging and Monitoring

The initial GA release of the integration focused mainly on the native logs, to track the telemetry of the service by using cognitive services logging. This version of the Azure OpenAI integration allows you to process the advanced logs which gives a more holistic view of OpenAI resource usage.

To achieve this, you have to set up API Management services in Azure. The API Management service is a central place to put all your OpenAI service endpoints so you can manage them end-to-end. Enable the API Management services and configure the Azure Event Hub to stream the logs.

To learn more about setting up the API Management service to access Azure OpenAI, please refer to the Azure documentation.

By using advanced logging, you can collect the following log data:

Request input text
Response output text
Content filter results
Usage Information
- Input prompt tokens
- Output completion tokens
- Total tokens
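
To make the usage fields above more concrete, here is a minimal Node.js sketch that pulls the three token counts out of a logged response body. It assumes the body is a JSON string in the chat completions format, with a usage object containing prompt_tokens, completion_tokens, and total_tokens. The function name and the idea of handing it the raw body string are ours for illustration only; the integration's own ingest pipeline does this work for you and may do it differently.

```javascript
// Sketch: extract token usage from a logged chat completion response body.
// Assumption: the gateway log has already been parsed and the raw response
// body is available as a JSON string. extractTokenUsage is a hypothetical
// helper, not part of the Azure OpenAI integration.
function extractTokenUsage(responseBody) {
  let parsed;
  try {
    parsed = JSON.parse(responseBody);
  } catch (err) {
    return null; // Body was truncated or is not JSON.
  }
  const usage = parsed && parsed.usage;
  if (!usage) return null;

  const promptTokens = Number(usage.prompt_tokens) || 0;
  const completionTokens = Number(usage.completion_tokens) || 0;
  return {
    promptTokens,
    completionTokens,
    // Fall back to the sum when total_tokens is missing.
    totalTokens: Number(usage.total_tokens) || promptTokens + completionTokens,
  };
}

const sampleBody = JSON.stringify({
  usage: { prompt_tokens: 12, completion_tokens: 30, total_tokens: 42 },
});
console.log(extractTokenUsage(sampleBody));
```

Returning null for an unparseable body keeps a truncated log entry from breaking whatever aggregation runs downstream.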

The Azure OpenAI integration now collects the API Management Gateway logs. When a user's question passes through API Management, the gateway logs both the question and the response from the GPT model.

Here’s what a sample log looks like,

Content filtered results

Azure OpenAI’s content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. With Azure OpenAI model deployments, you can use the default content filter or create your own content filter.

The integration now collects the content filtered result logs. In this example, let's create a custom filter in Azure OpenAI Studio that generates an error log.

By leveraging the Azure Content Filters, you can create your own custom lists of terms or phrases to block or flag.

The document ingested into Elastic then provides insights into the content filtered request.

PTU Deployment Monitoring

Provisioned throughput units (PTU) are units of model processing capacity that you can reserve and deploy for processing prompts and generating completions.

The curated dashboard for PTU Deployment gives comprehensive visibility into metrics such as request latency, active token usage, PTU utilization, and fine-tuning activities, offering a quick snapshot of your deployment's health and performance.

Here are the essential PTU metrics captured by default:

Time to Response: Time taken for the first response to appear after a user sends a prompt.
Active Tokens: Use this metric to understand your TPS or TPM based utilization for PTUs and compare to the benchmarks for target TPS or TPM scenarios.
Provisioned-managed Utilization V2: Provides insights into utilization percentages, helping prevent overuse and ensuring efficient resource allocation.
Prompt Token Cache Match Rate: The prompt token cache hit ratio expressed as a percentage.
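
As a rough illustration of how two of these metrics can be read, the sketch below computes a cache match rate as a percentage and converts an active token count over a window into tokens per minute (TPM). The function names and input values are hypothetical. The integration reports these metrics directly from Azure Monitor, so this only shows the arithmetic behind them.

```javascript
// Sketch: the arithmetic behind two PTU metrics. All names and numbers
// here are illustrative assumptions, not values from the integration.

// Percentage of prompt tokens that were served from the prompt cache.
function cacheMatchRate(cachedPromptTokens, totalPromptTokens) {
  if (!(totalPromptTokens > 0)) return 0; // Avoid dividing by zero.
  return (cachedPromptTokens / totalPromptTokens) * 100;
}

// Average tokens per minute over an observation window.
function tokensPerMinute(activeTokens, windowSeconds) {
  if (!(windowSeconds > 0)) {
    throw new RangeError("windowSeconds must be positive");
  }
  return (activeTokens / windowSeconds) * 60;
}

console.log(cacheMatchRate(750, 3000)); // 25
console.log(tokensPerMinute(90000, 300)); // 18000
```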

Using Billing for cost

Using the curated overview dashboard you can now monitor the actual usage cost for the AI applications. You are one step away from processing the billing information.

You need to install and configure the Azure billing metrics integration. Once the installation is complete, the usage cost for Cognitive Services is visualized in the Azure OpenAI overview dashboard.

Try it out today

Deploy a cluster on our Elasticsearch Service or download the stack, spin up the new Azure OpenAI integration, open the curated dashboards in Kibana and start monitoring your Azure OpenAI service!

LLM Observability: Azure OpenAI

Mon, 24 Jun 2024 00:00:00 GMT

We are excited to announce the general availability of the Azure OpenAI Integration that provides comprehensive Observability into the performance and usage of the Azure OpenAI Service! Also look at Part 2 of this blog.

While we have offered visibility into LLM environments for a while now, the addition of our Azure OpenAI integration enables richer out-of-the-box visibility into the performance and usage of your Azure OpenAI based applications, further enhancing LLM Observability.

The Azure OpenAI integration leverages Elastic Agent’s Azure integration capabilities to collect both logs (using Azure Event Hubs) and metrics (using Azure Monitor) to provide deep visibility on the usage of the Azure OpenAI Service.

The integration includes an out-of-the-box dashboard that summarizes the most relevant aspects of the service usage, including request and error rates, token usage and chat completion latency.

Creating Alerts and SLOs to monitor Azure OpenAI

As with every other Elastic integration, all the logs and metrics information is fully available to leverage in every capability in Elastic Observability, including SLOs, alerting, custom dashboards, in-depth logs exploration, etc.

To create an alert to monitor token usage, for example, start with the Custom Threshold rule on the Azure OpenAI datastream and set an aggregation condition to track and report violations of token usage past a certain threshold.
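
Conceptually, a threshold condition like this sums a field over a time window and compares the result with a limit. The sketch below shows that logic in plain Node.js. It is not how Kibana implements the Custom Threshold rule, and the sample shape (timestamp in milliseconds, totalTokens) is an assumption made for illustration.

```javascript
// Sketch: evaluate a "sum of tokens over a window exceeds a threshold"
// condition. The sample shape { timestamp, totalTokens } is hypothetical.
function evaluateTokenThreshold(samples, windowMs, threshold, now) {
  const windowStart = now - windowMs;
  const total = samples
    .filter((s) => s.timestamp > windowStart && s.timestamp <= now)
    .reduce((sum, s) => sum + s.totalTokens, 0);
  return { total, violated: total > threshold };
}

const samples = [
  { timestamp: 4000, totalTokens: 500 }, // Outside the window below.
  { timestamp: 6000, totalTokens: 300 },
  { timestamp: 9000, totalTokens: 400 },
];
console.log(evaluateTokenThreshold(samples, 5000, 600, 10000));
```

In the rule itself, you express the same idea by choosing the aggregation, the field, the comparator, and the look-back window.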

When a violation occurs, the Alert Details view linked in the alert notification for that alert provides rich context surrounding the violation, such as when the violation started, its current status, and any previous history of such violations, enabling quick triaging, investigation and root cause analysis.

Similarly, to create an SLO to monitor error rates in Azure OpenAI calls, start with the custom query SLI definition, defining good events as any result signature below 400 over a total value that includes all responses. Then, by setting an appropriate SLO target such as 99%, start monitoring your Azure OpenAI error rate SLO over a period of 7, 30, or 90 days to track degradation and take action before it becomes a pervasive problem.
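
To show the arithmetic behind such an SLO, here is a small sketch that treats responses with a status code below 400 as good events, divides them by the total, and reports how much of the error budget is left. The function and its return shape are hypothetical. Elastic computes these values for you once the SLO is defined.

```javascript
// Sketch: SLI and error budget for an error-rate SLO.
// Assumption: a response counts as "good" when its status code is below 400.
function evaluateSlo(statusCodes, target) {
  const total = statusCodes.length;
  if (total === 0) return { sli: null, met: true, budgetRemaining: 1 };

  const good = statusCodes.filter((code) => code < 400).length;
  const bad = total - good;
  const sli = good / total;

  // Number of bad events the target tolerates for this many requests.
  const allowedBad = (1 - target) * total;
  const budgetRemaining =
    allowedBad > 0 ? 1 - bad / allowedBad : bad === 0 ? 1 : 0;

  return { sli, met: sli >= target, budgetRemaining };
}

// 1000 requests, 5 of them rate limited (429), against a 99% target.
const codes = Array(995).fill(200).concat(Array(5).fill(429));
console.log(evaluateSlo(codes, 0.99));
```

With a 99% target, 1000 requests allow roughly 10 failures, so 5 failures consume about half of the error budget. A negative budgetRemaining means the budget is exhausted.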

Please refer to the User Guide to learn more and to get started!

Trace your Azure Function application with Elastic Observability

Tue, 16 May 2023 00:00:00 GMT

Adoption of Azure Functions in cloud-native applications on Microsoft Azure has been increasing exponentially over the last few years. Serverless functions, such as the Azure Functions, provide a high level of abstraction from the underlying infrastructure and orchestration, given these tasks are managed by the cloud provider. Software development teams can then focus on the implementation of business and application logic. Some additional benefits include billing for serverless functions based on the actual compute and memory resources consumed, along with automatic on-demand scaling.

While the benefits of using serverless functions are manifold, it is also necessary to make them observable in the wider end-to-end microservices architecture context.

Elastic Observability (APM) for Azure Functions: The architecture

Elastic Observability 8.7 introduced distributed tracing for Microsoft Azure Functions — available for the Elastic APM Agents for .NET, Node.js, and Python. Auto-instrumentation of HTTP requests is supported out-of-the-box, enabling the detection of performance bottlenecks and sources of errors.

The key components of the solution for observing Azure Functions are:

The Elastic APM Agent for the relevant language
Elastic Observability

The APM server validates and processes incoming events from individual APM Agents and transforms them into Elasticsearch documents. The APM Agent provides auto-instrumentation capabilities for the application being observed. The Node.js APM Agent can trace function invocations in an Azure Functions app.

Setting up Elastic APM for Azure Functions

To demonstrate the setup and usage of Elastic APM, we will use a sample Node.js application.

Application overview

The Node.js application has two HTTP-triggered functions named "Hello" and "Goodbye." Once deployed, they can be called as follows, and tracing data will be sent to the configured Elastic Observability deployment.

curl -i https://.azurewebsites.net/api/hello
curl -i https://.azurewebsites.net/api/goodbye
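
For orientation, an HTTP-triggered function in the classic Node.js programming model (a handler that receives context and req) looks roughly like the sketch below. This is our own minimal illustration, not the source of the sample app, whose handlers may be structured differently.

```javascript
// Sketch of an HTTP-triggered Azure Functions handler (classic Node.js
// programming model). Illustrative only; not taken from the sample repo.
async function hello(context, req) {
  // The Functions host sends whatever is assigned to context.res.
  context.res = {
    status: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: "Hello." }),
  };
}

// In a function app, the handler is exported from the function's index.js.
if (typeof module !== "undefined") {
  module.exports = hello;
}
```

Because the Elastic APM agent instruments the Functions runtime, a handler like this needs no tracing code of its own.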

Setup

Step 0. Prerequisites

To run the sample application, you will need:

An installation of Node.js (v14 or later)
Access to an Azure subscription with an appropriate role to create resources
The Azure CLI (az) logged into an Azure subscription
1. Use az login to log in
2. See the output of az account show
The Azure Functions Core Tools (func) (func --version should show a 4.x version)
An Elastic Observability deployment to which monitoring data will be sent
1. The simplest way to get started with Elastic APM on Microsoft Azure is through Elastic Cloud. Get started with Elastic Cloud on Azure Marketplace or sign up for a trial on Elastic Cloud.
The APM server URL (serverUrl) and secret token (secretToken) from your Elastic stack deployment for configuration below
1. See the documentation on how to get the serverUrl and secretToken

Step 1. Clone the sample application repo and install dependencies

git clone https://github.com/elastic/azure-functions-apm-nodejs-sample-app.git
cd azure-functions-apm-nodejs-sample-app
npm install

Step 2. Deploy the Azure Function App
Caution: Deploying a function app to Azure can incur costs. The following setup uses the free tier of Azure Functions. Step 5 covers the clean-up of resources.

Step 2.1
To avoid name collisions with others that have independently run this demo, we need a short unique identifier for some resource names that need to be globally unique. We'll call it the DEMO_ID. You can run the following to generate one and save it to DEMO_ID and the "demo-id" file.

if [[ ! -f demo-id ]]; then node -e 'console.log(require("crypto").randomBytes(3).toString("hex"))' >demo-id; fi
export DEMO_ID=$(cat demo-id)
echo $DEMO_ID

Step 2.2
Before you can deploy to Azure, you will need to create some Azure resources: a Resource Group, Storage Account, and the Function App. For this demo, you can use the following commands. (See this Azure docs section for more details.)

REGION=westus2   # Or use another region listed in 'az account list-locations'.
az group create --name "AzureFnElasticApmNodeSample-rg" --location "$REGION"
az storage account create --name "eapmdemostor${DEMO_ID}" --location "$REGION" \
    --resource-group "AzureFnElasticApmNodeSample-rg" --sku Standard_LRS
az functionapp create --name "azure-functions-apm-nodejs-sample-app-${DEMO_ID}" \
    --resource-group "AzureFnElasticApmNodeSample-rg" \
    --consumption-plan-location "$REGION" --runtime node --runtime-version 18 \
    --functions-version 4 --storage-account "eapmdemostor${DEMO_ID}"

Step 2.3
Next, configure your Function App with the APM server URL and secret token for your Elastic deployment. This can be done in the Azure Portal or with the az CLI.

In the Azure portal, browse to your Function App, then its Application Settings (Azure user guide). You'll need to add two settings: ELASTIC_APM_SERVER_URL and ELASTIC_APM_SECRET_TOKEN.

Alternatively, with the az CLI, first set your APM URL and token in your shell:

export ELASTIC_APM_SERVER_URL=""
export ELASTIC_APM_SECRET_TOKEN=""

Then use the az functionapp config appsettings set ... CLI command as follows:

az functionapp config appsettings set \
  -g "AzureFnElasticApmNodeSample-rg" -n "azure-functions-apm-nodejs-sample-app-${DEMO_ID}" \
  --settings "ELASTIC_APM_SERVER_URL=${ELASTIC_APM_SERVER_URL}"
az functionapp config appsettings set \
  -g "AzureFnElasticApmNodeSample-rg" -n "azure-functions-apm-nodejs-sample-app-${DEMO_ID}" \
  --settings "ELASTIC_APM_SECRET_TOKEN=${ELASTIC_APM_SECRET_TOKEN}"

The ELASTIC_APM_SERVER_URL and ELASTIC_APM_SECRET_TOKEN values are stored in the Function App's application settings and read by the Elastic APM Agent. The agent itself is started by the initapm.js file with:

require("elastic-apm-node").start();
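
Calling start() with no arguments works because the agent reads its configuration from the ELASTIC_APM_* environment variables. If you would rather fail fast when a setting is missing, one option is to check the environment first and pass the values explicitly. The sketch below assumes a hypothetical helper, readApmSettings; serverUrl and secretToken are the agent's configuration option names.

```javascript
// Sketch: validate the APM settings before starting the agent.
// readApmSettings is a hypothetical helper, not part of elastic-apm-node.
function readApmSettings(env) {
  const serverUrl = env.ELASTIC_APM_SERVER_URL;
  const secretToken = env.ELASTIC_APM_SECRET_TOKEN;

  const missing = [];
  if (!serverUrl) missing.push("ELASTIC_APM_SERVER_URL");
  if (!secretToken) missing.push("ELASTIC_APM_SECRET_TOKEN");
  if (missing.length > 0) {
    throw new Error("Missing app settings: " + missing.join(", "));
  }

  new URL(serverUrl); // Throws a TypeError if the URL is malformed.
  return { serverUrl, secretToken };
}

// In initapm.js this could then be used as:
//   require("elastic-apm-node").start(readApmSettings(process.env));
```

The trade-off is that a misconfigured app fails at startup instead of running without tracing, which may or may not be what you want in production.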

When you log in to Azure and look at the function’s configuration, you will see them set:

Step 2.4
Now you can publish your app. (Re-run this command every time you make a code change.)

func azure functionapp publish "azure-functions-apm-nodejs-sample-app-${DEMO_ID}"

Log in to the Azure portal to confirm that the function is running.

Step 3. Try it out

% curl https://azure-functions-apm-nodejs-sample-app-${DEMO_ID}.azurewebsites.net/api/Hello
{"message":"Hello."}
% curl https://azure-functions-apm-nodejs-sample-app-${DEMO_ID}.azurewebsites.net/api/Goodbye
{"message":"Goodbye."}

In a few moments, the APM app in your Elastic deployment will show tracing data for your Azure Function app.

Step 4. Apply some load to your app
To get some more interesting data, you can run the following to generate some load on your deployed function app:

npm run loadgen

This uses the autocannon node package to generate some light load (2 concurrent users, each sending 5 requests/s for 60s) on the "Goodbye" function.
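
As a back-of-the-envelope check on how much traffic that is, the sketch below builds an options object matching the numbers above and derives the upper bound on requests sent. The option names (connections, connectionRate, duration) follow autocannon's programmatic API as we understand it. The sample app's actual loadgen script may be configured differently, and the URL is a placeholder.

```javascript
// Sketch: a load profile matching "2 users, 5 requests/s each, 60s".
// The real loadgen script in the sample repo may differ from this.
function loadProfile(url) {
  const options = {
    url,
    connections: 2, // Concurrent users.
    connectionRate: 5, // Requests per second, per connection.
    duration: 60, // Seconds.
  };
  const maxRequests =
    options.connections * options.connectionRate * options.duration;
  return { options, maxRequests };
}

// Placeholder URL; substitute your own function app's endpoint.
const profile = loadProfile("https://example.invalid/api/Goodbye");
console.log(profile.maxRequests); // 600
```

Since the text says "Goodbye" calls "Hello" before returning, you should expect up to twice that many function invocations in the APM data.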

Step 5. Clean up resources
If you deployed to Azure, you should make sure to delete any resources so you don't incur any costs.

az group delete --name "AzureFnElasticApmNodeSample-rg"

Analyzing Azure Function APM data in Elastic

Once you have successfully set up the sample application and started generating load, you should see APM data appearing in the Elastic Observability APM Services capability.

Service map

With the default setup, you will see two services in the APM Service map.

The main function: azure-functions-apm-nodejs-sample-app

And the endpoint where your function is accessible: azure-functions-apm-nodejs-sample-app-ec7d4c.azurewebsites.net

You will see that there is a connection between the two as your application is taking requests and answering through the endpoint.

From the APM Service map you can further investigate the function, analyze traces, look at logs, and more.

Service details

When we dive into the details, we can see several items.

Latency for the recent load we ran against the application
Transactions (Goodbye and Hello)
Average throughput
And more

Transaction details

We can see transaction details.

An individual trace shows us that the "Goodbye" function calls the "Hello" function in the same function app before returning:

Machine learning based latency correlation

As we’ve mentioned in other blogs, we can also correlate issues such as higher than normal latency. Since we see a spike at 1s, we run the embedded latency correlation, which uses machine learning to analyze logs, metrics, and traces and help identify the component most likely responsible.

The correlation indicates a potential cause (25%): the host sending the load (my machine).

Cold start detection

Also, we can see the impact a cold start can have on the latency of a request:
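
A cold start is the first invocation handled by a freshly started worker process. A common way to detect one in Node.js is a module-level flag that is true only until the first invocation. The sketch below illustrates the idea. It is an assumption about the general technique, not a description of how the Elastic APM agent is implemented.

```javascript
// Sketch: detect a cold start with a module-level flag. Module scope is
// initialized once per worker process, so the flag is true only for the
// first invocation that process handles.
let coldStart = true;

function consumeColdStartFlag() {
  const wasCold = coldStart;
  coldStart = false;
  return wasCold;
}

console.log(consumeColdStartFlag()); // true  (first invocation)
console.log(consumeColdStartFlag()); // false (warm invocation)
```

Labelling each invocation this way is what makes it possible to compare cold and warm latency side by side.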

Summary

Elastic Observability provides real-time monitoring of Azure Functions in your production environment for a broad range of use cases. Curated dashboards assist DevOps teams in performing root cause analysis for performance bottlenecks and errors. SRE teams can quickly view upstream and downstream dependencies, as well as perform analyses in the context of distributed microservices architecture.

Learn more

To learn how to add the Elastic APM Agent to an existing Node.js Azure Function app, read Monitoring Node.js Azure Functions. Additional resources include: