<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Azure</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted security news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Mon, 18 May 2026 18:13:10 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Azure</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Bringing observability insights from Elastic AI Assistant to the world of GitHub Copilot]]></title>
            <link>https://www.elastic.co/observability-labs/blog/ai-assistant-to-github-copilot</link>
            <guid isPermaLink="false">ai-assistant-to-github-copilot</guid>
            <pubDate>Thu, 23 May 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[GitHub announced GitHub Copilot Extensions this week at Microsoft Build. We are working with the GitHub team to bring observability insights from Elastic AI Assistant to GitHub Copilot users.]]></description>
            <content:encoded><![CDATA[<p>GitHub <a href="https://github.blog/2024-05-21-introducing-github-copilot-extensions/">announced</a> GitHub Copilot Extensions this week at Microsoft Build. We are working with the GitHub team in the Limited Beta Program to explore bringing observability insights from Elastic AI Assistant to GitHub Copilot users.</p>
<p>Elastic’s GitHub Copilot Extension aims to combine the capabilities of GitHub Copilot and Elastic AI Assistant for Observability. This could enable developers to access critical insights from Elastic AI Assistant from GitHub Copilot Chat on GitHub.com, Visual Studio, GitHub.com, Visual Studio, and VS Code - places where they write their code.</p>
<p>Developers will be able ask questions such as</p>
<ul>
<li>What errors are active?</li>
<li>What’s the latest stacktrace for my application?</li>
<li>What caused a slowdown in the application after the last push to the dev environment?</li>
<li>How to write an ES|QL for query that my app will send to Elasticsearch?</li>
<li>What runbook from Github has been loaded into Elasticsearch and is related to the issue I’m investigating
And many more!</li>
</ul>
<p><a href="https://build.microsoft.com/en-US/sessions/acc48a7a-b412-4b4f-88a6-53ef4b2cb2bc?source=/schedule">Watch Jeff's PoC Demo@Microsoft Build 2024</a></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ai-assistant-to-github-copilot/elastic-copilot-vscode.png" alt="Elastic's Copilot Extension in VSCode" /></p>
<p><em>Elastic AI Assistant surfaced in GitHub Copilot Chat from our Extension (Proof of Concept)</em></p>
<h2>What is the Elastic AI Assistant for Observability</h2>
<p>The Elastic Observability AI Assistant for Observability, a user-centric tool, is a game-changer in providing contextual insights and streamlining troubleshooting within the Elastic Observability environment. By harnessing generative AI capabilities, the assistant offers open prompts that decipher error messages and propose remediation actions. It adopts a Retrieval-Augmented Generation (RAG) approach to fetch the most pertinent internal information, such as APM traces, log messages, SLOs, GitHub issues, runbooks, and more. This contextual assistance is a huge leap forward for Site Reliability Engineers (SREs) and operations teams, offering immediate, relevant solutions to issues based on existing documentation and resources, boosting developer productivity.</p>
<p>For more information on setting up and using the AI Assistant for Observability check out the blog <a href="https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-microsoft-azure-openai">Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI</a>. Additionally, learn how <a href="https://www.elastic.co/observability-labs/blog/elastic-rag-ai-assistant-application-issues-llm-github">Elastic Observability AI Assistant uses RAG to help analyze application issues with GitHub issues</a>.</p>
<p>One unique feature of the AI Assistant is its API support. This allows you to take advantage of all the capabilities provided by the Elastic AI Assistant, and integrate them right into your workflow.</p>
<h2>What is a GitHub Copilot Extension</h2>
<p>GitHub Copilot Extensions, a new addition to GitHub Copilot, revolutionizes the developer experience by integrating a diverse array of tools and services directly into the developer's workflow. These unique extensions, crafted by partners, enable developers to interact with various services and tools using natural language within their Integrated Development Environment (IDE) or GitHub.com. This integration eliminates the need for context-switching, allowing developers to maintain their flow state, troubleshoot issues, and deploy solutions with unparalleled efficiency. These extensions will be accessible through GitHub Copilot Chat in the GitHub Marketplace, with options for organizations to create private extensions tailored to their internal tooling.</p>
<h2>What’s next</h2>
<p>We are participating in the Github Limited Beta Program as a partner and exploring the possibility of bringing Elastic GitHub Copilot Extension to the GitHub Marketplace. We are excited to unlock insights from Elastic Observability to GitHub Copilot users side by side to the code behind those services. Stay tuned!</p>
<p>Resources:</p>
<ul>
<li><a href="https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-microsoft-azure-openai">Getting Started with Elastic AI Assistant for Observability with Azure OpenAI</a></li>
<li><a href="https://ela.st/assistant-escapes">The Elastic AI Assistant for Observability escapes Kibana!</a></li>
<li><a href="https://www.elastic.co/observability-labs/blog/elastic-rag-ai-assistant-application-issues-llm-github">Elastic Observability AI Assistant uses RAG to help analyze application issues with GitHub issues</a></li>
<li><a href="https://www.elastic.co/observability-labs/blog/sre-troubleshooting-ai-assistant-observability-runbooks">Troubleshooting with Elastic AI Assistant using your organization's runbooks</a></li>
<li><a href="https://www.elastic.co/guide/en/observability/current/obs-ai-assistant.html">The AI Assistant Observability documentation</a></li>
<li><a href="https://github.blog/2024-05-21-introducing-github-copilot-extensions/">GitHub Copilot Extensions Blog Announcement</a></li>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/esql.html">ES|QL documentation</a></li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/ai-assistant-to-github-copilot/githubcopilot-aiassistant-C-2x.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Debugging Azure Networking for Elastic Cloud Serverless]]></title>
            <link>https://www.elastic.co/observability-labs/blog/debugging-aks-packet-loss</link>
            <guid isPermaLink="false">debugging-aks-packet-loss</guid>
            <pubDate>Thu, 05 Jun 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how Elastic SREs uncovered and resolved unexpected packet loss in Azure Kubernetes Service (AKS), impacting Elastic Cloud Serverless performance.]]></description>
            <content:encoded><![CDATA[&lt;h2&gt; Summary of Findings &lt;/h2&gt; 
<p>Elastic's Site Reliability Engineering team (SRE) observed unstable throughput and packet loss in Elastic Cloud Serverless running on Azure Kubernetes Service (AKS). After investigation, we identified the primary contributing factors to be RX ring buffer overflows and kernel input queue saturation on SR-IOV interfaces. To address this, we increased RX buffer sizes and adjusted the netdev backlog, which significantly improved network stability.</p>
&lt;h2&gt; Setting the Scene &lt;/h2&gt;
<p><a href="https://www.elastic.co/cloud/serverless">Elastic Cloud Serverless</a> is a fully managed solution that allows you to deploy and use Elastic for your use cases without managing the underlying infrastructure. Built on Kubernetes, it represents a shift in how you interact with Elasticsearch. Instead of managing clusters, nodes, data tiers, and scaling, you create serverless projects that are fully managed and automatically scaled by Elastic. This abstraction of infrastructure decisions allows you to focus solely on gaining value and insight from your data.</p>
<p>Elastic Cloud Serverless is generally available (GA) on AWS, GCP and currently in <a href="https://www.elastic.co/guide/en/serverless/current/regions.html">Technical Preview on Azure</a>. As part of preparing Elastic Cloud Serverless GA on Azure, we have been conducting extensive performance and scalability tests to ensure that our users get a consistent and reliable user experience.</p>
<p>In this post, we’ll take you behind the scenes of a deep technical investigation into a surprising performance issue that affected Serverless Elasticsearch in our Azure Kubernetes clusters. At first, the network seemed like the least likely place to look, especially with a high-speed 100 Gb/s interface on the host backing it. But as we dug deeper, with help from the Microsoft Azure team, that’s exactly where the problem led us.</p>
&lt;h2&gt; Unexpected Results! &lt;/h2&gt;
<p>While the high-level architectures and system design patterns of the major cloud provider’s systems are often similar, the implementations are different, and these differences can have dramatic impacts on a system’s performance characteristics.</p>
<p>One of the most significant differences between the different cloud providers is that the underlying hypervisor software and server hardware of the Virtual Machines can vary significantly, even between instance families of the same provider.</p>
<p>There is no way to fully abstract the hardware away from an application like Elasticsearch. Fundamentally, its performance is dictated by the CPU, memory, disks, and network interfaces on the physical server. In preparation for the Elastic Cloud Serverless GA on Azure, our Elasticsearch Performance team kicked off large-scale load testing against Serverless Elasticsearch projects running on <a href="https://docs.azure.cn/en-us/aks/what-is-aks">Azure Kubernetes Service (AKS)</a>, using <a href="https://azure.microsoft.com/en-us/blog/azure-cobalt-100-based-virtual-machines-are-now-generally-available/">ARM-based VMs</a> (we’re big fans!). Throughout this process, we relied heavily on Elastic tools to analyse system behaviour, identify bottlenecks, and validate performance under load.</p>
<p>To perform these scale and load tests, the Elasticsearch Performance team use <a href="https://github.com/elastic/rally">Rally</a>, an open-source benchmarking tool designed to measure the performance of Elasticsearch clusters. The workload (or in Rally nomenclature, ‘Track’) used for these tests was the <a href="https://github.com/elastic/rally-tracks/tree/master/github_archive">GitHub Archive Track</a>. Rally collects and sends test telemetry using the <a href="https://www.elastic.co/docs/reference/elasticsearch/clients/python">official Python client</a> to a separate Elasticsearch cluster running <a href="https://www.elastic.co/observability">Elastic Observability</a>, which allows for monitoring and analysis during these scale and load tests in real time via <a href="https://www.elastic.co/docs/explore-analyze">Kibana</a>.</p>
<p>When we looked at the results, we observed that the indexing rate (the number of docs/s) for the Serverless projects was not only much lower than we had expected for the given hardware, but the throughput was also quite unstable. There were peaks and valleys, interspersed with frequent errors, whereas we were instead expecting a stable indexing rate for the duration of the test.</p>
<p>These tests are designed to push the system to its limits, and in doing so, they surfaced unexpected behavior in the form of unstable indexing throughput and intermittent errors. This was precisely the kind of problem we'd hoped to uncover prior to going GA — giving us the opportunity to work closely with Azure.</p>
&lt;div align=&quot;center&quot;&gt;
![Indexing Rate with Packet Loss](/assets/images/debugging-aks-packet-loss/indexing-rate-before.png)
_A Kibana visualisation of Rally telemetry, showing fluctuating Elasticsearch indexing rates alongside spikes in 5xx and 4xx HTTP error responses._
&lt;/div&gt;
&lt;h2&gt; Debugging! &lt;/h2&gt;
<p>Debugging performance issues can feel a little bit like trying to find a <a href="https://www.youtube.com/watch?v=7AO4wz6gI3Q">‘Butterfly in a Hurricane’</a>, so it’s crucial that you take a methodological approach to analysing application and system performance.</p>
<p>Using methodologies helps you to be more consistent and thorough in your debugging, and avoids missing things. We started with the <a href="https://www.brendangregg.com/usemethod.html">Utilisation Saturation and Errors (USE) Method</a>, looking at both the client and server side to identify any obvious bottlenecks in the system.</p>
<p>Elastic's Site Reliability Engineers (SREs) maintain a suite of custom <a href="https://www.elastic.co/docs/solutions/observability/get-started/what-is-elastic-observability">Elastic Observability</a> dashboards designed to visualise data collected from various <a href="https://www.elastic.co/docs/extend/integrations/what-is-an-integration">Elastic Integrations</a>. These dashboards provide deep visibility into the health and performance of Elastic Cloud infrastructure and systems.</p>
<p>For this investigation, we leveraged a custom dashboard built using metrics and log data from the <a href="https://www.elastic.co/docs/reference/integrations/system">System</a> and <a href="https://www.elastic.co/docs/reference/integrations/linux">Linux</a> Integrations:</p>
&lt;div align=&quot;center&quot;&gt;
  ![Node Overview Dashboard](/assets/images/debugging-aks-packet-loss/overview-dashboard.png)
  _One of many Elastic Observability dashboards built and maintained by the SRE team._
&lt;/div&gt;
<p>Following the USE Method, these dashboards highlight resource utilisation, saturation, and errors across our systems. With their help, we quickly identified that the AKS nodes hosting the Elasticsearch pods under test were dropping thousands of packets per second.</p>
&lt;div align=&quot;center&quot;&gt;
![Node Packet Loss Before Tuning](/assets/images/debugging-aks-packet-loss/packet-loss-before.png)
_A Kibana visualisation of [Elastic Agent's System Integration](https://www.elastic.co/docs/reference/integrations/system), showing the rate of packet drops per second for AKS nodes._
&lt;/div&gt;
<p>Dropping packets forces reliable protocols, such as TCP, to retransmit any missing packets. These retransmissions can introduce significant delays, which kills the throughput of any system where client requests are only triggered upon the previous request completion (known as a <a href="https://www.usenix.org/legacy/event/nsdi06/tech/full_papers/schroeder/schroeder.pdf">Closed System</a>).</p>
<p>To investigate further, we jumped onto one of the AKS nodes exhibiting the packet loss to check the basics. First off, we wanted to identify what type of packet drops or errors we’re seeing; is it for specific pods, or the host as a whole?</p>
<pre><code>root@aks-k8s-node-1:~# ip -s link show
2: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 7c:1e:52:be:ce:5e brd ff:ff:ff:ff:ff:ff
    RX:    bytes   packets errors dropped  missed   mcast
    373507935420 134292481      0       0       0      15
    TX:    bytes   packets errors dropped carrier collsns
    644247778936 303191014      0       0       0       0
3: enP42266s1: &lt;BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP&gt; mtu 1500 qdisc mq master eth0 state UP mode DEFAULT group default qlen 1000
    link/ether 7c:1e:52:be:ce:5e brd ff:ff:ff:ff:ff:ff
    RX:    bytes   packets errors dropped  missed   mcast
    386782548951 307000571      0       0 5321081       0
    TX:    bytes   packets errors dropped carrier collsns
    655758630548 477594747      0       0       0       0
    altname enP42266p0s2
15: lxc0ca0ec41ecd2@if14: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f6:f5:5e:c9:4e:fb brd ff:ff:ff:ff:ff:ff link-netns cni-3f90ab53-df66-cac5-bd19-9cea4a68c29b
    RX:    bytes   packets errors dropped  missed   mcast
    627954576078  54297550      0    1600       0       0
    TX:    bytes   packets errors dropped carrier collsns
    372155326349 133538064      0    3927       0       0
</code></pre>
<p>In this output you can see the <code>enP42266s1</code> interface is showing a significant number of packets in the <code>missed</code> column. That’s interesting, sure, but what does missed actually represent? And what is <code>enP42266s1</code>?</p>
<p>To understand, let’s look at roughly what happens when a packet arrives at the NIC:</p>
<ol>
<li>A packet arrives at the NIC from the network.</li>
<li>The NIC uses DMA (Direct Memory Access) to place the packet into a receive ring buffer allocated in memory by the kernel, mapped for use by the NIC. Since our NICs supports multiple hardware queues, each queue has its own dedicated ring buffer, IRQ, and NAPI context.</li>
<li>The NIC raises a hardware interrupt (IRQ) to notify the CPU that a packet is ready.</li>
<li>The CPU runs the NIC driver’s IRQ handler. The driver schedules a NAPI (New API) poll to defer packet processing to a softirq context. A mechanism in the Linux kernel that defers work to be processed outside of the hard IRQ context, for better batching and CPU efficiency, enabling improved scalability.</li>
<li>The NAPI poll function is executed in a softirq context (<code>NET_RX_SOFTIRQ</code>) and retrieves packets from the ring buffer. This polling continues either until the driver’s packet budget is exhausted (<code>net.core.netdev_budget</code>) or the time limit is hit (<code>net.core.netdev_budget_usecs</code>).</li>
<li>Each packet is wrapped in an <code>sk_buff</code> (socket buffer) structure, which includes metadata such as protocol headers, timestamps, and interface identifiers.</li>
<li>If the networking stack is slower than the rate at which NAPI fetches packets, excess packets are queued in a per-CPU backlog queue (via <code>enqueue_to_backlog</code>). The maximum size of this backlog is controlled by the <code>net.core.netdev_max_backlog</code> sysctl.</li>
<li>Packets are then handed off to the kernel’s networking stack for routing, filtering, and protocol-specific processing (e.g. TCP, UDP).</li>
<li>Finally, packets reach the appropriate socket receive buffer, where they are available for consumption by the user-space application.</li>
</ol>
<p>Visualised, it looks something like this:</p>
&lt;div align=&quot;center&quot;&gt;
![Linux Packet Flow Diagram](/assets/images/debugging-aks-packet-loss/packet-flow.png)
_Image © 2018 Leandro Moreira. Used under the [BSD 3-Clause License](https://opensource.org/licenses/BSD-3-Clause). Source: [GitHub repository](https://github.com/leandromoreira/linux-network-performance-parameters)._
&lt;/div&gt;
<p>The <code>missed</code> counter is incremented whenever the NIC tries to DMA a packet into a fully occupied <a href="https://en.wikipedia.org/wiki/Circular_buffer">ring buffer</a>. The NIC essentially &quot;misses&quot; the chance to deliver the packet to the VM’s memory. However, what’s most interesting is that this counter seldom increments for VMs. This is because Virtual NICs are usually implemented as software via the hypervisor, which typically has much more flexible memory management compared to the physical NICs and can reduce the chance of ring buffer overflow.</p>
<p>We mentioned earlier that we’re building Azure Elasticsearch Serverless on top of Azure’s AKS service, which is important to note because all of our AKS nodes use an Azure feature called <a href="https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview">Accelerated Networking</a>. In this setup, network traffic is delivered directly to the VM’s network interface, bypassing the hypervisor. This is enabled by <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov-">single root I/O virtualization (SR-IOV)</a>, which offers much lower latency and higher throughput than traditional VM networking. Each node is physically connected to a 100 Gb/s network interface, although the SR-IOV Virtual Function (VF) exposed to the VM typically provides only a fraction of that total bandwidth.</p>
<p>Despite the VM only having a fraction of the 100 Gb/s bandwidth, microbursts are still very possible. These physical interfaces are so fast that they can transmit and receive multiple packets in just nanoseconds, far faster than most buffers or processing queues can absorb. At these timescales, even a short-lived burst of traffic can overwhelm the receiver, leading to dropped packets and unpredictable latency.</p>
<p>Direct access to the SR-IOV interface means that our VMs are responsible for handling the hardware interrupts triggered by the NIC in a timely manner, if there's any delay in handling the hardware interrupt (e.g. waiting to be scheduled onto CPU by the hypervisor) then network packets can be missed!</p>
&lt;h2&gt; Firstly - NIC-level Tuning &lt;/h2&gt;
<p>Since we'd confirmed that our VMs were using SR-IOV, we established that the <code>enP42266s1</code> and <code>eth0</code> interfaces <a href="https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-how-it-works">were a bonded pair and acted as a single interface</a>. Knowing this, then we reasoned that we should be able to adjust the ring buffer values directly using <code>ethtool</code>.</p>
<pre><code>root@aks-k8s-node-1:~# ethtool -g enP42266s1
Ring parameters for enP42266s1:
Pre-set maximums:
RX:		8192
RX Mini:	n/a
RX Jumbo:	n/a
TX:		8192
Current hardware settings:
RX:		1024
RX Mini:	n/a
RX Jumbo:	n/a
TX:		1024
</code></pre>
<p>In the output above, we were using only 1/8th of the available ring buffer descriptors. These values were set by the OS defaults, which generally aim to balance performance and resource usage. Set too low, they risk packet drops under load; set too high, they can lead to unnecessary memory consumption. We knew that the VMs were backed by a virtual function carved out of the directly attached 100 Gb/s network interface, which is fast enough to deliver microbursts that could easily overwhelm small buffers. To better absorb those short, high-intensity bursts of traffic, we increased the NIC’s RX ring buffer size from 1024 to 8192. Using a privileged DaemonSet, we rolled out the change across all of our AKS nodes by installing <a href="https://en.wikipedia.org/wiki/Udev">a <code>udev</code> rule</a> to automatically increase the buffer size:</p>
<pre><code># Match Mellanox ConnectX network cards and run ethtool to update the ring buffer settings
ENV{INTERFACE}==&quot;en*&quot;, ENV{ID_NET_DRIVER}==&quot;mlx5_core&quot;, RUN+=&quot;/sbin/ethtool -G %k rx ${CONFIG_AZURE_MLX_RING_BUFFER_SIZE} tx ${CONFIG_AZURE_MLX_RING_BUFFER_SIZE}&quot;
</code></pre>
&lt;div align=&quot;center&quot;&gt;
![AKS Node Packet Loss after RX ring buffer change](/assets/images/debugging-aks-packet-loss/packet-loss-after.png)
_A Kibana visualisation of [Elastic Agent's System Integration](https://www.elastic.co/docs/reference/integrations/system), showing packet loss reduced by ~99% after increasing the NIC's RX ring buffer values._
&lt;/div&gt;
<p>As soon as the change had been applied to all AKS nodes we stopped ‘missing’ RX packets! Fantastic! As a result of this simple change we observed a significant improvement in our indexing throughput and stability.</p>
&lt;div align=&quot;center&quot;&gt;
![Indexing rate after RX ring buffer change](/assets/images/debugging-aks-packet-loss/indexing-rate-after.png)
_A Kibana visualisation of Rally telemetry, showing stable and improved Elasticsearch indexing rates after increasing the RX ring buffer size._
&lt;/div&gt;
<p>Job done, right? Not quite..</p>
&lt;h2&gt; Further improvements - Kernel-level Tuning &lt;/h2&gt;
<p>Eagle eyed readers may have noticed two things:</p>
<ol>
<li>In the previous screenshot, despite adjusting the physical RX ring buffer values, we still observed a small number of <code>dropped</code> packets on the TX side.</li>
<li>In the original <code>ip link -s show</code> output, one of the ‘logical’ interfaces used by the Elasticsearch pod was showing <code>dropped</code> packets on both the TX and RX sides.</li>
</ol>
<pre><code>15: lxc0ca0ec41ecd2@if14: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f6:f5:5e:c9:4e:fb brd ff:ff:ff:ff:ff:ff link-netns cni-3f90ab53-df66-cac5-bd19-9cea4a68c29b
    RX:    bytes   packets errors dropped  missed   mcast
    627954576078  54297550      0    1600       0       0
    TX:    bytes   packets errors dropped carrier collsns
    372155326349 133538064      0    3927       0       0
</code></pre>
<p>So, we continued to dig. We’d eliminated ~99% of the packet loss, and the remaining loss rate wasn’t as significant as what we’d started with, but we still wanted to understand why it was occurring even after adjusting the RX ring buffer size of the NIC.</p>
<p>So what does <code>dropped</code> represent, and what is this <code>lxc0ca0ec41ecd2</code> interface? <code>dropped</code> is similar to <code>missed</code>, but only occurs when packets are deliberately dropped by the kernel or network interface. Crucially though, it doesn’t tell you why a packet was dropped. As for the <code>lxc0ca0ec41ecd2</code> interface, we use the <a href="https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium">Azure CNI Powered by Cilium</a> to provide the network functionality to our AKS clusters. Any pod spun up on an AKS node gets a ‘logical’ interface, which is a virtual ethernet (<code>veth</code>) pair that connects the pod’s network namespace with the host’s network namespace. It was here that we were dropping packets.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/debugging-aks-packet-loss/aks-node-network-topology.png" alt="AKS Node Networking Diragram" /></p>
<p>In our experience, packet drops at this layer are unusual, so we started digging deeper into the cause of the drops. There are numerous ways you can debug why a packet is being dropped, but one of the easiest is <a href="https://perfwiki.github.io/main/">to use <code>perf</code></a> attach to the <code>skb:kfree_skb</code> tracepoint. The &quot;socket buffer&quot; (<code>skb</code>) is the primary data structure used to represent network packets in the Linux kernel. When a packet is dropped, its corresponding socket buffer is usually freed, triggering the <code>kfree_skb</code> tracepoint. Using <code>perf</code> to attach to this event allowed us to capture stack traces to analyze the cause of the drops.</p>
&lt;div align=&quot;center&quot;&gt;
```
# perf record -g -a -e skb:kfree_skb
```
&lt;/div&gt;
<p>We left this to run for ~10 minutes or so to capture as many drops as possible, and then ‘heavily inspired’ by <a href="https://gist.github.com/bobrik/0e57671c732d9b13ac49fed85a2b2290">this GitHub Gist by Ivan Babrou</a>, we converted the stack traces into an ‘easier’ to read <a href="https://github.com/brendangregg/FlameGraph">Flamegraphs</a>:</p>
<pre><code># perf script | sed -e 's/skb:kfree_skb:.*reason:\(.*\)/\n\tfffff \1 (unknown)/' -e 's/^\(\w\+\)\s\+/kernel /' &gt; stacks.txt
cat stacks.txt | stackcollapse-perf.pl --all | perl -pe 's/.*?;//' | sed -e 's/.*irq_exit_rcu_\[k\];/irq_exit_rcu_[k];/' | flamegraph.pl --colors=java --hash --title=aks-k8s-node-1 --width=1440 --minwidth=0.005 &gt; aks-k8s-node-1.svg
</code></pre>
&lt;div align=&quot;center&quot;&gt;
![AKS Node Packet Loss Flamegraph](/assets/images/debugging-aks-packet-loss/aks-packet-loss-flamegraph.png)
_A Flamegraph showing the various stack trace ancestry of packet loss._
&lt;/div&gt;
<p>The flamegraph here shows how often different functions appeared in stack traces for packets drops. Each box represents a function call and wider boxes mean the function appears more frequently in the traces. The stack's ancestry builds upward from the bottom with earlier calls, to the top with later calls.</p>
<p>Firstly, we quickly discovered that unfortunately the <code>skb_drop_reason</code> enum <a href="https://github.com/torvalds/linux/commit/c504e5c2f9648a1e5c2be01e8c3f59d394192bd3">was only added in Kernel 5.17</a> (Azure’s Node Image at the time was using 5.15). This meant that there was no single human readable message that told us why the packets were being dropped, instead all we got was <code>NOT_SPECIFIED</code>. To work out why packets were being dropped we needed to do a little sleuthing through the stack traces to work out what code paths were being taken when a packet was dropped.</p>
<p>In the flamegraph above you can see that many of the stack traces include <code>veth</code> driver function calls (e.g. <code>veth_xmit</code>), and many end abruptly with a call to the <code>enqueue_to_backlog</code> function. When many stacks end at the same function (like <code>enqueue_to_backlog</code>) it suggests that function is a common point where packets are being dropped. If you go back to the earlier explanation of what happens when a packet arrives at the NIC, you’ll notice that in step 7 we explained:</p>
<blockquote>
<p><em>7. If the networking stack is slower than the rate at which NAPI fetches packets, excess packets are queued in a per-CPU backlog queue (via <code>enqueue_to_backlog</code>). The maximum size of this backlog is controlled by the <code>net.core.netdev_max_backlog</code> sysctl.</em></p>
</blockquote>
<p>Using the same privileged DaemonSet method for the RX ring buffer adjustment, we set the value of the <code>net.core.netdev_max_backlog</code> adjustable kernel parameter from 1000 to 32768:</p>
<pre><code>/usr/sbin/sysctl -w net.core.netdev_max_backlog=32768
</code></pre>
<p>This value was based on the fact we knew the hosts were using a 100 Gb/s SR-IOV NIC, even if the VM was allowed only a fraction of the total bandwidth. We acknowledge that it’s worth revisiting this value in the future to see if it can be better optimised to not waste extraneous memory, but at the time “perfect was the enemy of good”.</p>
<p>We re-ran the load tests and compared the three sets of results we’d collected thus far.</p>
&lt;div align=&quot;center&quot;&gt;
![Final Indexing Rate Results](/assets/images/debugging-aks-packet-loss/indexing-rate-final.png)
_A Kibana visualisation of Rally results, comparing impact to median throughput after each configuration change._
&lt;/div&gt;
<table>
<thead>
<tr>
<th>Tuning Step</th>
<th>Packet Loss</th>
<th>Median indexing throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>Baseline</td>
<td>High</td>
<td>~18,000 docs/s</td>
</tr>
<tr>
<td>+RX Buffer</td>
<td>~99% drop ↓</td>
<td>~26,000 (+ ~40% from baseline)</td>
</tr>
<tr>
<td>+Backlog &amp; +RX Buffer</td>
<td>Near zero</td>
<td>~29,000 (+ ~60% from baseline)</td>
</tr>
</tbody>
</table>
<p>Here you can see the P50 of throughput in docs/s over the course of the hours-long load tests. Compared to the baseline, we saw a roughly <strong>~40%</strong> increase in throughput by only adjusting the RX ring buffer values, and a <strong>~50-60%</strong> increase with both the RX ring buffer and backlog changes! Hooray!</p>
<p>A great result and one more step on our journey towards better Serverless Elasticsearch performance.</p>
&lt;h2&gt; Working with Azure &lt;/h2&gt;
<p>It’s great that we were able to quickly identify and mitigate the majority of our packet loss issues, but since we were using AKS with AKS node images, it made sense to engage with Azure to understand why the defaults weren’t working for our workload.</p>
<p>We walked Azure through our investigation, mitigations and results, and asked for some additional validation of our mitigations. Azure Engineering confirmed that the host NICs were not discarding packets, which confirmed that everything arriving at the host level was passed through to the hypervisor on the host. Further investigation confirmed that no loss or discards were occurring to Azure network fabric, or internal to the hypervisor – which shifted focus from the host to the guest OS and why the guest OS kernel was slow when reading packets off of the <code>enP*</code> SR-IOV interfaces.</p>
<p>Given the complexity of our load testing scenario — which involved configuring multiple systems and tools, including <a href="https://www.elastic.co/observability">Elastic Observability</a>, we also developed a simplified reproduction of the packet loss issue using <a href="https://github.com/esnet/iperf"><code>iperf3</code></a>. This simplified test was created specifically to share with Azure for targeted analysis, and added to the broader monitoring and analysis enabled by Elastic Observability and Rally.</p>
<p>With this reproduction Azure was able to confirm the increasing <code>missed</code> and <code>dropped</code> packet counters we had observed, and confirmed the increased RX ring buffer and <code>netdev_max_backlog</code> increase as the recommended mitigations.</p>
&lt;h2&gt; Conclusion &lt;/h2&gt;
<p>While cloud providers offer various abstractions to manage your resources, the underlying hardware ultimately determines your application's performance and stability. High-performance hardware often requires tuning at the operating system level, well beyond the default settings most environments ship with. In managed platforms like AKS, where Azure controls both the node images and infrastructure, it is easy to overlook the impact of low-level configurations such as network device ring buffer sizes or sysctls like <code>net.core.netdev_max_backlog</code>.</p>
<p>Our experience shows that even with the convenience of a managed Kubernetes service, performance issues can still emerge if these hardware parameters are not tuned appropriately. It was tempting to assume that high-speed 100 Gb/s network interfaces, directly attached to the VM using SR-IOV would eliminate any chance of network-related bottlenecks. In reality, that assumption didn’t hold up.</p>
<p>Engaging early with Azure was essential, as they provided deeper visibility into the underlying infrastructure and worked with us to tune low-level, performance-critical settings. Combined with thorough load and scale testing and robust observability using tools like Elastic Observability, this collaboration helped us detect and rectify the issue early in order to deliver a consistent, reliable, and high-performing experience for our users.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/debugging-aks-packet-loss/debugging-aks-packet-loss.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-microsoft-azure-openai</link>
            <guid isPermaLink="false">elastic-ai-assistant-observability-microsoft-azure-openai</guid>
            <pubDate>Wed, 03 Apr 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Follow this step-by-step process to get started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI.]]></description>
            <content:encoded><![CDATA[<p>Recently, Elastic <a href="https://www.elastic.co/blog/whats-new-elastic-observability-8-12-0">announced</a> the AI Assistant for Observability is now generally available for all Elastic users. The AI Assistant enables a new tool for Elastic Observability providing large language model (LLM) connected chat and contextual insights to explain errors and suggest remediation. Similar to how Microsoft Copilot is an AI companion that introduces new capabilities and increases productivity for developers, the Elastic AI Assistant is an AI companion that can help you quickly gain additional value from your observability data.</p>
<p>This blog post presents a step-by-step guide on how to set up the AI Assistant for Observability with Azure OpenAI as the backing LLM. Then once you’ve got the AI Assistant set up, this post will show you how to add documents to the AI Assistant’s knowledge base along with demonstrating how the AI Assistant uses its knowledge base to improve its responses to address specific questions.</p>
<h2>Set up the Elastic AI Assistant for Observability: Create an Azure OpenAI key</h2>
<p>Start by creating a Microsoft Azure OpenAI API key to authenticate requests from the Elastic AI Assistant. Head over to <a href="https://azure.microsoft.com/">Microsoft Azure and use an existing subscription or create a new one at the Azure portal</a>.</p>
<p>Currently, access to the Azure OpenAI service is granted by applying for access. See the <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&amp;pivots=programming-language-studio#prerequisites">official Microsoft documentation for the current prerequisites</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/1.png" alt="Watch what your data can do" /></p>
<p>In the Azure portal, select <strong>Azure OpenAI</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/2.png" alt="Azure OpenAI" /></p>
<p>In the Azure OpenAI service, click the <strong>Create</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/3.png" alt="+Create" /></p>
<p>Enter an instance <strong>Name</strong> and click <strong>Next</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/4.png" alt="Basics Next" /></p>
<p>Select your network access preference for the Azure OpenAI instance and click <strong>Next</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/5.png" alt="Network Next" /></p>
<p>Add optional <strong>Tags</strong> and click <strong>Next</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/6.png" alt="Tags Next" /></p>
<p>Confirm your settings and click <strong>Create</strong> to create the Azure OpenAI instance.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/7.png" alt="Review + submit Create" /></p>
<p>Once the instance creation is complete, click the <strong>Go to resource</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/8.png" alt="go to resource" /></p>
<p>Click the <strong>Manage keys</strong> link to access the instance’s API key.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/9.png" alt="manage keys" /></p>
<p>Copy your Azure OpenAI <strong>API Key</strong> and the <strong>Endpoint</strong> and save them both in a safe place for use in a later step.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/10.png" alt="copy to clipboard" /></p>
<p>Next, click <strong>Model deployments</strong> to create a deployment within the Azure OpenAI instance you just created.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/11.png" alt="model deployments" /></p>
<p>Click the <strong>Manage deployments</strong> button to open Azure OpenAI Studio.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/12.png" alt="manage deployments" /></p>
<p>Click the <strong>Create new deployment</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/13.png" alt="+ Create new deployment" /></p>
<p>Select the model type you want to use and enter a Deployment name. Note the Deployment name for use in a later step. Click the <strong>Create</strong> button to deploy the model.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/14.png" alt="deploy model" /></p>
<h2>Set up the Elastic AI Assistant for Observability: Create an OpenAI connector in Elastic Cloud</h2>
<p>The remainder of the instructions in this post will take place within <a href="https://cloud.elastic.co/registration">Elastic Cloud</a>. You can use an existing deployment or you can create a new Elastic Cloud deployment as a free trial if you’re trying Elastic Cloud for the first time. Another option to get started is to create an <a href="https://azuremarketplace.microsoft.com/en-us/marketplace/apps/elastic.ec-azure-observability?tab=Overview">Elastic deployment from the Microsoft Azure Marketplace</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/15.png" alt="sign up trial" /></p>
<p>The next step is to create an Azure OpenAI connector in Elastic Cloud. In the <a href="https://cloud.elastic.co/home">Elastic Cloud console</a> for your deployment, select the top-level menu and then select <strong>Stack Management</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/16.png" alt="stack management" /></p>
<p>Select <strong>Connectors</strong> on the Stack Management page.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/17.png" alt="connectors" /></p>
<p>Select <strong>Create connector</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/18.png" alt="create connector" /></p>
<p>Select the connector for Azure OpenAI.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/19.png" alt="openai" /></p>
<p>Enter a <strong>Name</strong> of your choice for the connector. Select <strong>Azure OpenAI</strong> as the OpenAI provider.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/20.png" alt="openai connector" /></p>
<p>Enter the Endpoint URL using the following format:</p>
<ul>
<li>
<p>Replace <code>{your-resource-name}</code> with the <strong>name of the Azure Open AI instance</strong> that you created within the Azure portal in a previous step.</p>
</li>
<li>
<p>Replace <code>deployment-id</code> with the <strong>Deployment name</strong> that you specified when you created a model deployment within the Azure portal in a previous step.</p>
</li>
<li>
<p>Replace <code>{api-version}</code> with one of the valid <strong>Supported versions</strong> listed in the <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/reference">Completions section of the Azure OpenAI reference page</a>.</p>
</li>
</ul>
<pre><code class="language-bash">https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}
</code></pre>
<p>Your completed Endpoint URL should look something like this:</p>
<pre><code class="language-bash">https://example-openai-instance.openai.azure.com/openai/deployments/gpt-4-turbo/chat/completions?api-version=2024-02-01
</code></pre>
<p>Enter the API Key that you copied in a previous step. Then click the <strong>Save &amp; test</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/21.png" alt="save &amp; test" /></p>
<p>Within the <strong>Edit Connector</strong> flyout window, click the <strong>Run</strong> button to confirm that the connector configuration is valid and can successfully connect to your Azure OpenAI instance.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/22.png" alt="" /></p>
<p>A successful connector test should look something like this:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/23.png" alt="results" /></p>
<h2>Add an example logs record</h2>
<p>Now that you have your Elastic Cloud deployment set up with an AI Assistant connector, let’s add an example logs record to demonstrate how the AI Assistant can help you to better understand logs data.</p>
<p>We’ll use the Elastic Dev Tools to add a single logs record. Click the top-level menu and select <strong>Dev Tools</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/24.png" alt="dev tools" /></p>
<p>Within the Console area of Dev Tools, enter the following POST statement:</p>
<pre><code class="language-bash">POST /logs-elastic_agent-default/_doc
{
	&quot;message&quot;: &quot;Status(StatusCode=\&quot;FailedPrecondition\&quot;, Detail=\&quot;Can't access cart storage. \nSystem.ApplicationException: Wasn't able to connect to redis \n  at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /usr/src/app/src/cartstore/RedisCartStore.cs:line 104 \n  at cartservice.cartstore.RedisCartStore.EmptyCartAsync(String userId) in /usr/src/app/src/cartstore/RedisCartStore.cs:line 168\&quot;).&quot;,
	&quot;@timestamp&quot;: &quot;2024-02-22T11:34:00.884Z&quot;,
	&quot;log&quot;: {
    	&quot;level&quot;: &quot;error&quot;
	},
	&quot;service&quot;: {
    	&quot;name&quot;: &quot;cartService&quot;
	},
	&quot;host&quot;: {
    	&quot;name&quot;: &quot;appserver-1&quot;
	}
}
</code></pre>
<p>Then run the POST command by clicking the green <strong>Run</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/25.png" alt="click to send request" /></p>
<p>You should see a 201 response confirming that the example logs record was successfully created.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/26.png" alt="201 response" /></p>
<h2>Use the Elastic AI Assistant</h2>
<p>Now that you have a log record to work with, let’s jump over to the Observability Logs Explorer to see how the AI Assistant interacts with logs data. Click the top-level menu and select <strong>Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/27.png" alt="observability" /></p>
<p>Select <strong>Logs Explorer</strong> to explore the logs data.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/28.png" alt="explorer" /></p>
<p>In the Logs Explorer search box, enter the text “redis” and press the <strong>Enter</strong> key to perform the search.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/29.png" alt="redis" /></p>
<p>Click the <strong>View all matches</strong> button to include all search results.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/30.png" alt="view all matches" /></p>
<p>You should see the one log record that you previously inserted via Dev Tools. Click the expand icon to see the log record’s details.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/31.png" alt="expand icon" /></p>
<p>You should see the expanded view of the logs record. Instead of trying to understand its contents ourselves, we'll use the AI Assistant to summarize it. Click on the <strong>What's this message?</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/32.png" alt="What's this message?" /></p>
<p>We get a fairly generic answer back. Depending on the exception or error we're trying to analyze, this can still be really useful, but we can make this better by adding additional documentation to the AI Assistant knowledge base.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/33.png" alt="log details" /></p>
<p>Let’s see how we can use the AI Assistant’s knowledge base to improve its understanding of this specific logs message.</p>
<h2>Create an Elastic AI Assistant knowledge base</h2>
<p>Select <strong>Overview</strong> from the <strong>Observability</strong> menu.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/34.png" alt="Select Overview from the Observability menu." /></p>
<p>Click the <strong>AI Assistant</strong> button at the top right of the window.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/35.png" alt="AI Assistant" /></p>
<p>Click the <strong>Install Knowledge base</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/36.png" alt="Install Knowledge base" /></p>
<p>Click the top-level menu and select <strong>Stack Management</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/37.png" alt="Stack Management" /></p>
<p>Then select <strong>AI Assistants</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/38.png" alt="AI Assistants" /></p>
<p>Click <strong>Elastic AI Assistant for Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/39.png" alt="Elastic AI Assistant for Observability" /></p>
<p>Select the <strong>Knowledge base</strong> tab.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/40.png" alt="Knowledge base" /></p>
<p>Click the <strong>New entry</strong> button and select <strong>Single entry</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/41.png" alt="new entry" /></p>
<p>Give it the <strong>Name</strong> “cartservice” and enter the following text as the <strong>Contents</strong> :</p>
<pre><code class="language-markdown">Link: [Cartservice Intermittent connection issue](https://github.com/elastic/observability-examples/issues/25)
I have the following GitHub issue. Store this information in your knowledge base and always return the link to it if relevant.
GitHub Issue, return if relevant

Link: https://github.com/elastic/observability-examples/issues/25

Title: Cartservice Intermittent connection issue

Body:
The cartservice occasionally encounters storage errors due to an unreliable network connection.

The errors typically indicate a failure to connect to Redis, as seen in the error message:

Status(StatusCode=&quot;FailedPrecondition&quot;, Detail=&quot;Can't access cart storage.
System.ApplicationException: Wasn't able to connect to redis
at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /usr/src/app/src/cartstore/RedisCartStore.cs:line 104
at cartservice.cartstore.RedisCartStore.EmptyCartAsync(String userId) in /usr/src/app/src/cartstore/RedisCartStore.cs:line 168')'.
I just talked to the SRE team in Slack, they have plans to implement retries as a quick fix and address the network issue later.
</code></pre>
<p>Click <strong>Save</strong> to save the new knowledge base entry.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/42.png" alt="save" /></p>
<p>Now let’s go back to the Observability Logs Explorer. Click the top-level menu and select <strong>Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/43.png" alt="settings" /></p>
<p>Then select <strong>Explorer</strong> under <strong>Logs</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/44.png" alt="explorer" /></p>
<p>Expand the same logs entry as you did previously and click the <strong>What’s this message?</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/45.png" alt="What’s this message? button" /></p>
<p>The response you get now should be much more relevant.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/46.png" alt="log details" /></p>
<h2>Try out the Elastic AI Assistant with a knowledge base filled with your own data</h2>
<p>Now that you’ve seen how easy it is to set up the Elastic AI Assistant for Observability, go ahead and give it a try for yourself. Sign up for a <a href="https://cloud.elastic.co/registration">free 14-day trial</a>. You can quickly spin up an Elastic Cloud deployment in minutes and have your own search powered AI knowledge base to help you with getting your most important work done.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/AI_hand.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Elastic Observability monitors metrics for Microsoft Azure in just minutes]]></title>
            <link>https://www.elastic.co/observability-labs/blog/observability-monitors-metrics-microsoft-azure</link>
            <guid isPermaLink="false">observability-monitors-metrics-microsoft-azure</guid>
            <pubDate>Mon, 29 Jan 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Follow this step-by-step process to enable Elastic Observability for Microsoft Azure metrics.]]></description>
            <content:encoded><![CDATA[<p>Developers and SREs choose Microsoft Azure to run their applications because it is a trustworthy world-class cloud platform. It has also proven itself over the years as an extremely powerful and reliable infrastructure for hosting business-critical applications.</p>
<p>Elastic Observability offers over 25 out-of-the-box integrations for Microsoft Azure services with more on the way. A full list of Azure integrations can be found in <a href="https://docs.elastic.co/integrations/azure">our online documentation</a>.</p>
<p>Elastic Observability aggregates not only logs but also metrics for Azure services and the applications running on Azure compute services (Virtual Machines, Functions, Kubernetes Service, etc.). All this data can be analyzed visually and more intuitively using Elastic®’s advanced machine learning (ML) capabilities, which help detect performance issues and surface root causes before end users are affected.</p>
<p>For more details on how Elastic Observability provides application performance monitoring (APM) capabilities such as service maps, tracing, dependencies, and ML-based metrics correlations, read <a href="https://www.elastic.co/blog/apm-correlations-elastic-observability-root-cause-transactions">APM correlations in Elastic Observability: Automatically identifying probable causes of slow or failed transactions</a>.</p>
<p>That’s right, Elastic offers capabilities to collect, aggregate, and analyze metrics for Microsoft Azure services and applications running on Azure. Elastic Observability is for more than just capturing logs — it offers a unified observability solution for Microsoft Azure workloads.</p>
<p>In this blog, we’ll review how Elastic Observability can monitor metrics for a three-tier web application running on Microsoft Azure and leveraging:</p>
<ul>
<li>Microsoft Azure Virtual Machines</li>
<li>Microsoft Azure SQL database</li>
<li>Microsoft Azure Virtual Network</li>
</ul>
<p>As you will see, once the integration is installed, metrics will arrive instantly and you can immediately start deriving insights from metrics.</p>
<h2>Prerequisites and config</h2>
<p>Here are some of the components and details we used to set up this demonstration:</p>
<ul>
<li>Ensure you have a Microsoft Azure account and an Azure service principal with permission to read monitoring data from Microsoft Azure (<a href="https://docs.elastic.co/integrations/azure_metrics/monitor#integration-specific-configuration-notes">see details in our documentation</a>).</li>
<li>This post does <em>not</em> cover application monitoring; instead, we will focus on how Microsoft Azure services can be easily monitored. If you want to get started with examples of application monitoring, see our <a href="https://github.com/elastic/observability-examples/tree/main/azure/container-apps">Hello World observability code samples</a>.</li>
<li>In order to see metrics, you will need to load the application. We’ve also created a Playwright script to drive traffic to the application.</li>
</ul>
<h2>Three-tier application overview</h2>
<p>Before we dive into the Elastic deployment setup and configuration, let's review what we are monitoring. If you follow the <a href="https://learn.microsoft.com/en-us/training/modules/n-tier-architecture/">Microsoft Learn N-tier example app</a> instructions for deploying the &quot;What's for Lunch?&quot; app, you will have the following deployed.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-three-tier-application-overview.png" alt="three tier application overview" /></p>
<p>What’s deployed:</p>
<ul>
<li>Microsoft Azure VM presentation tier that renders an HTML client in the user's browser and enables user requests to be sent to the “What’s for Lunch?” app</li>
<li>Microsoft Azure VM application tier that communicates with the presentation and the database tier</li>
<li>Microsoft Azure SQL instance in the database tier, handling requests from the application tier to store and serve data</li>
</ul>
<p>At the end of the blog, we will also provide a Playwright script that can be run to send requests to this app in order to load it with example data and exercise its functionality. This will help drive metrics to “light up” the dashboards.</p>
<h2>Setting it all up</h2>
<p>Let’s walk through the details of how to deploy the example three-tier application, Azure integration on Elastic and visualize what gets ingested in Elastic’s Kibana® dashboards.</p>
<h3>Step 0: Get an account on Elastic Cloud</h3>
<p>Follow the instructions to <a href="https://cloud.elastic.co/registration">get started on Elastic Cloud</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-free-trial.png" alt="elastic cloud free trial sign up" /></p>
<h3>Step 1: Deploy the Microsoft Azure three-tier application</h3>
<p>From the <a href="https://portal.azure.com/">Azure portal</a>, click the Cloud Shell icon at the top of the portal to open Cloud Shell…</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-open-cloud-shell.png" alt="open cloud shell" /></p>
<p>… and when the Cloud Shell first opens, select <strong>Bash</strong> as the shell type to use.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-cloud-shell-bash.png" alt="cloud shell bash" /></p>
<p>If you’re prompted that “You have no storage mounted,” then click the <strong>Create storage</strong> button to create a file store to be used for saving and editing files from Cloud Shell.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-create-storage.png" alt="cloud shell create storage" /></p>
<p>You should now see the open Cloud Shell terminal.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-cloud-shell-terminal.png" alt="cloud shell terminal" /></p>
<p>Run the following command in Cloud Shell to define the environment variables that we’ll be using in the Cloud Shell commands required to deploy and view the sample application.</p>
<p>Be sure to specify a valid RESOURCE_GROUP from your available <a href="https://portal.azure.com/#view/HubsExtension/BrowseResourceGroups">Resource Groups listed in the Azure portal</a>. Also specify a new password to replace the SpecifyNewPasswordHere placeholder text before running the command. See the Microsoft <a href="https://learn.microsoft.com/en-us/sql/relational-databases/security/password-policy?view=sql-server-ver16#password-complexity">password policy documentation</a> for password requirements.</p>
<pre><code class="language-bash">RESOURCE_GROUP=&quot;test&quot;
APP_PASSWORD=&quot;SpecifyNewPasswordHere&quot;
</code></pre>
<p>Run the following az deployment group create command, which will deploy the example three-tier web app in around five minutes.</p>
<pre><code class="language-bash">az deployment group create --resource-group $RESOURCE_GROUP --template-uri https://raw.githubusercontent.com/MicrosoftDocs/mslearn-n-tier-architecture/master/Deployment/azuredeploy.json --parameters password=$APP_PASSWORD
</code></pre>
<p>After the deployment has completed, run the following command, which returns the URL for the app.</p>
<pre><code class="language-bash">az deployment group show --output table --resource-group $RESOURCE_GROUP --name azuredeploy --query properties.outputs.webSiteUrl
</code></pre>
<p>Copy the web app URL and paste it into a browser to view the example “What’s for Lunch?” web app.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-whats-for-lunch.png" alt="whats for lunch app" /></p>
<h3>Step 2: Create an Azure service principal and grant access permission</h3>
<p>Go to the <a href="https://portal.azure.com/">Microsoft Azure Portal</a>. Search for active directory and select <strong>Microsoft Entra ID</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-active-directory.png" alt="search active directory" /></p>
<p>Copy the <strong>Tenant ID</strong> for use in a later step in this blog post. This ID is required to configure Elastic Agent to connect to your Azure account.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-your-organization-overview.png" alt="your organization overview" /></p>
<p>In the navigation pane, select <strong>App registrations</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-your-organization-overview-app-registrations.png" alt="your organization overview app registrations" /></p>
<p>Then click <strong>New registration</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-your-organization-new-registration.png" alt="your organization new registrations" /></p>
<p>Type the name of your application (this tutorial uses three-tier-app-azure) and click <strong>Register</strong> (accept the default values for other settings).</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-register_an_application.png" alt="register an application" /></p>
<p>Copy the <strong>Application (client) ID</strong> and save it for later. This ID is required to configure Elastic Agent to connect to your Azure account.</p>
<p>In the navigation pane, select <strong>Certificates &amp; secrets</strong> , and then click <strong>New client secret</strong> to create a new security key.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-three-tier-app-new-client-secret.png" alt="three tier app new client secret" /></p>
<p>Type a description of the secret and select an expiration. Click <strong>Add</strong> to create the client secret. Under <strong>Value</strong> , copy the secret value and save it (along with your client ID) for later.</p>
<p>After creating the Azure service principal, you need to grant it the correct permissions. In the Azure Portal, search for and select <strong>Subscriptions</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-three-tier-subscriptions.png" alt="three tier subscriptions" /></p>
<p>In the Subscriptions page, click the name of your subscription. On the subscription details page, copy your <strong>Subscription ID</strong> and save it for a later step.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-subscription-essentials-copy.png" alt="subscription essentials copy" /></p>
<p>In the navigation pane, select <strong>Access control (IAM)</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-subscription-access-control.png" alt="subscription access control" /></p>
<p>Click <strong>Add</strong> and select <strong>Add role assignment</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-subscription-access-control-add-role-assignment.png" alt="subscription access control add role assignment" /></p>
<p>On the <strong>Role</strong> tab, select the <strong>Monitoring Reader</strong> role and then click <strong>Next</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-add-role-assignment-monitoring-readers.png" alt="add role assignment monitoring reader" /></p>
<p>On the <strong>Members</strong> tab, select the option to assign access to <strong>User, group, or service principal</strong>. Click <strong>Select members</strong> , and then search for and select the principal you created earlier. For the description, enter the name of your service principal. Click <strong>Next</strong> to review the role assignment.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-add-role-assignment-description.png" alt="add role assignment description" /></p>
<p>Click <strong>Review + assign</strong> to grant the service principal access to your subscription.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-add-role-assignment-review-assign.png" alt="add role assignment review assign" /></p>
<h3>Step 3: Create an Azure VM instance</h3>
<p>In the Azure Portal, search for and select <strong>Virtual machines</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-search-virtual-machines.png" alt="search virtual machines" /></p>
<p>On the <strong>Virtual machines</strong> page, click <strong>+ Create</strong> and select <strong>Azure virtual machine</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-azure-virtual-machine.png" alt="azure virtual machine" /></p>
<p>On the Virtual machine creation page, enter a name like “metrics-vm” for the virtual machine name and select VM Size to be “Standard_D2s_v3 - 2 vcpus, 8 GiB memory.” Click the <strong>Next : Disks</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-create-virtual-macine-next-disks.png" alt="create a virtual machine next disks" /></p>
<p>On the <strong>Disks</strong> page, keep the default settings and click the <strong>Next : Networking</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-create-virtual-machine-next-networking.png" alt="create a virtual machine next networking" /></p>
<p>On the <strong>Networking</strong> page, demo-vnet should be selected for <strong>Virtual network</strong> and demo-biz-subnet should be selected for <strong>Subnet</strong>. These resources are created as part of the three-tier example app’s deployment that was done in Step 1.</p>
<p>Click the <strong>Review + create</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-create-virtual-machine-review-create.png" alt="create virtual machine review create" /></p>
<p>On the <strong>Review</strong> page, click the <strong>Create</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-create-virtual-machine-validation-passed.png" alt="create virtual machine validation passed" /></p>
<h3>Step 4: Install the Azure Resource Metrics integration</h3>
<p>In your <a href="https://cloud.elastic.co/home">Elastic Cloud</a> deployment, navigate to the Elastic Azure integrations by selecting <strong>Integrations</strong> from the top-level menu. Search for azure resource and click the <strong>Azure Resource Metrics</strong> tile.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-integrations-azure-resource-metrics.png" alt="integrations azure resource metrics" /></p>
<p>Click <strong>Add Azure Resource Metrics.</strong></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-azure-resource-metrics.png" alt="azure resource metrics" /></p>
<p>Click <strong>Add integration only (skip agent installation)</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-add-integration-only.png" alt="add integration only" /></p>
<p>Enter the values that you saved previously for Client ID, Client Secret, Tenant ID, and Subscription ID.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-add-azure-resource-metrics-integration.png" alt="add azure resource metrics integration" /></p>
<p>As you can see, the Azure Resource Metrics integration will collect a significant amount of data from eight Azure services. Click <strong>Save and continue</strong>.</p>
<p>You’ll be presented with a confirmation dialog window. Click <strong>Add Elastic Agent to your hosts</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-azure-resource-metrics-integration-added.png" alt="azure resource metrics integration added" /></p>
<p>This will display the instructions required to install the Elastic agent. Copy the command under the <strong>Linux Tar</strong> tab.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-add-agent.png" alt="add agent linux tar" /></p>
<p>Next you will need to use SSH to log in to the Azure VM instance and run the commands copied from <strong>Linux Tar</strong> tab. Go to <a href="https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Compute/VirtualMachines">Azure Virtual Machines</a> in the Azure portal. Then click the name of the VM instance that you created in Step 3.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-metrics-vm.png" alt="metrics vm" /></p>
<p>Click the <strong>Select</strong> button in the <strong>SSH Using Azure CLI</strong> section.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-metrics-vm-connect.png" alt="metrics vm connect" /></p>
<p>Select the “I understand …” checkbox and then click the <strong>Configure + connect</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-ssh-using-azure-cli.png" alt="ssh using azure cli" /></p>
<p>Once you are SSH’d inside the VM instance terminal window, run the commands copied previously from <strong>Linux Tar tab</strong> in the <strong>Install Elastic Agent on your host</strong> instructions. When the installation completes, you’ll see a confirmation message in the Install Elastic Agent on your host form.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-add-agent-confirmed.png" alt="add agent confirmed" /></p>
<p>Super! The Elastic agent is sending data to Elastic Cloud. Now let’s observe some metrics.</p>
<h3>Step 5: Run traffic against the application</h3>
<p>While getting the application running is fairly easy, there is nothing to monitor or observe with Elastic unless you add a load on the application.</p>
<p>Here is a simple script you can also run using <a href="https://playwright.dev/">Playwright</a> to add traffic and exercise the functionality of the Azure three-tier application:</p>
<pre><code class="language-javascript">import { test, expect } from &quot;@playwright/test&quot;;

test(&quot;homepage for Microsoft Azure three tier app&quot;, async ({ page }) =&gt; {
  // Load web app
  await page.goto(&quot;http://20.172.198.231/&quot;);
  // Add lunch suggestions
  await page.fill(&quot;id=txtAdd&quot;, &quot;tacos&quot;);
  await page.keyboard.press(&quot;Enter&quot;);
  await page.waitForTimeout(1000);
  await page.fill(&quot;id=txtAdd&quot;, &quot;sushi&quot;);
  await page.keyboard.press(&quot;Enter&quot;);
  await page.waitForTimeout(1000);
  await page.fill(&quot;id=txtAdd&quot;, &quot;pizza&quot;);
  await page.keyboard.press(&quot;Enter&quot;);
  await page.waitForTimeout(1000);
  await page.fill(&quot;id=txtAdd&quot;, &quot;burgers&quot;);
  await page.keyboard.press(&quot;Enter&quot;);
  await page.waitForTimeout(1000);
  await page.fill(&quot;id=txtAdd&quot;, &quot;salad&quot;);
  await page.keyboard.press(&quot;Enter&quot;);
  await page.waitForTimeout(1000);
  await page.fill(&quot;id=txtAdd&quot;, &quot;sandwiches&quot;);
  await page.keyboard.press(&quot;Enter&quot;);
  await page.waitForTimeout(1000);
  // Click vote buttons
  await page.getByRole(&quot;button&quot;).nth(1).click();
  await page.getByRole(&quot;button&quot;).nth(3).click();
  await page.getByRole(&quot;button&quot;).nth(5).click();
  await page.getByRole(&quot;button&quot;).nth(7).click();
  await page.getByRole(&quot;button&quot;).nth(9).click();
  await page.getByRole(&quot;button&quot;).nth(11).click();
  // Click remove buttons
  await page.getByRole(&quot;button&quot;).nth(12).click();
  await page.getByRole(&quot;button&quot;).nth(10).click();
  await page.getByRole(&quot;button&quot;).nth(8).click();
  await page.getByRole(&quot;button&quot;).nth(6).click();
  await page.getByRole(&quot;button&quot;).nth(4).click();
  await page.getByRole(&quot;button&quot;).nth(2).click();
});
</code></pre>
<h3>Step 6: View Azure dashboards in Elastic</h3>
<p>With Elastic Agent running, you can go to Elastic Dashboards to view what’s being ingested. Simply search for “dashboard” in Elastic and choose <strong>Dashboard</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-dashboard.png" alt="dashboard" /></p>
<p>This will open the Elastic Dashboards page. In the Dashboards search box, search for azure vm and click the <strong>[Azure Metrics] Compute VMs Overview</strong> dashboard, one of the many out-of-the-box dashboards available.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-dashboards-create.png" alt="dashboards create" /></p>
<p>You will see a Dashboard populated with your deployed application’s VM metrics.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/blog-elastic-azure-compute-vm.png" alt="azure compute vm" /></p>
<p>On the Azure Compute VM dashboard, we can see the following sampling of some of the many available metrics:</p>
<ul>
<li>CPU utilization</li>
<li>Available memory</li>
<li>Network sent and received bytes</li>
<li>Disk writes and reads metrics</li>
</ul>
<p>For metrics not covered by out-of-the-box dashboards, custom dashboards can be easily created to visualize metrics that are important to you.</p>
<p><strong>Congratulations, you have now started monitoring metrics from Microsoft Azure services for your application!</strong></p>
<h2>Analyze your data with Elastic AI Assistant</h2>
<p>Once metrics and logs (or either one) are in Elastic, start analyzing your data with <a href="https://www.elastic.co/blog/context-aware-insights-elastic-ai-assistant-observability">context-aware insights using the Elastic AI Assistant for Observability</a>.</p>
<h2>Conclusion: Monitoring Microsoft Azure service metrics with Elastic Observability is easy!</h2>
<p>We hope you’ve gotten an appreciation for how Elastic Observability can help you monitor Azure service metrics. Here’s a quick recap of what you learned:</p>
<ul>
<li>Elastic Observability supports ingest and analysis of Azure service metrics.</li>
<li>It’s easy to set up ingest from Azure services via the Elastic Agent.</li>
<li>Elastic Observability has multiple out-of-the-box Azure service dashboards you can use to preliminarily review information and then modify for your needs.</li>
</ul>
<p>Try it out for yourself by signing up via <a href="https://portal.azure.com/#view/Microsoft_Azure_Marketplace/GalleryItemDetailsBladeNopdl/id/elastic.ec-azure-pp">Microsoft Azure Marketplace</a> and quickly spin up a deployment in minutes on any of the <a href="https://www.elastic.co/guide/en/cloud/current/ec-reference-regions.html#ec_azure_regions">Elastic Cloud regions on Microsoft Azure</a> around the world. Your Azure Marketplace purchase of Elastic will be included in your monthly consolidated billing statement and will draw against your committed spend with Microsoft Azure.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/observability-monitors-metrics-microsoft-azure/Azure_Dark_(1).png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[How to deploy a Hello World web app with Elastic Observability on Azure Container Apps]]></title>
            <link>https://www.elastic.co/observability-labs/blog/deploy-app-observability-azure-container-apps</link>
            <guid isPermaLink="false">deploy-app-observability-azure-container-apps</guid>
            <pubDate>Mon, 23 Oct 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Follow the step-by-step process of instrumenting Elastic Observability for a Hello World web app running on Azure Container Apps.]]></description>
            <content:encoded><![CDATA[<p>Elastic Observability is the optimal tool to provide visibility into your running web apps. Microsoft Azure Container Apps is a fully managed environment that enables you to run containerized applications on a serverless platform so that your applications scale up and down. This allows you to accomplish the dual objective of serving every customer’s need for availability while meeting your needs to do so as efficiently as possible.</p>
<p>Using Elastic Observability and Azure Container Apps is a perfect combination for developers to deploy <a href="https://www.elastic.co/blog/observability-powerful-flexible-efficient">web apps that are auto-scaled with fully observable operations</a>.</p>
<p>This blog post will show you how to deploy a simple Hello World web app to Azure Container Apps and then walk you through the steps to instrument the Hello World web app to enable observation of the application’s operations with Elastic Cloud.</p>
<h2>Elastic Observability setup</h2>
<p>We’ll start with setting up an Elastic Cloud deployment, which is where observability will take place for the web app we’ll be deploying.</p>
<p>From the <a href="https://cloud.elastic.co">Elastic Cloud console</a>, select <strong>Create deployment</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-1-create-deployment.png" alt="create deployment" /></p>
<p>Enter a deployment name and click <strong>Create deployment</strong>. It takes a few minutes for your deployment to be created. While waiting, you are prompted to save the admin credentials for your deployment, which provides you with superuser access to your Elastic® deployment. Keep these credentials safe as they are shown only once.</p>
<p>Elastic Observability requires an APM Server URL and an APM Secret token for an app to send observability data to Elastic Cloud. Once the deployment is created, we’ll copy the Elastic Observability server URL and secret token and store them somewhere safely for adding to our web app code in a later step.</p>
<p>To copy the APM Server URL and the APM Secret Token, go to <a href="https://cloud.elastic.co/home">Elastic Cloud</a> . Then go to the <a href="https://cloud.elastic.co/deployments">Deployments</a> page, which lists all of the deployments you have created. Select the deployment you want to use, which will open the deployment details page. In the <strong>Kibana</strong> row of links, click on <strong>Open</strong> to open Kibana® for your deployment.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-2-my-deployment.png" alt="my deployment" /></p>
<p>Select <strong>Integrations</strong> from the top-level menu. Then click the <strong>APM</strong> tile.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-3-apm.png" alt="apm" /></p>
<p>On the APM Agents page, copy the secretToken and the serverUrl values and save them for use in a later step.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-4-apm-agents.png" alt="apm agents" /></p>
<p>Now that we’ve completed the Elastic Cloud setup, the next step is to set up our account in Azure for deploying apps to the Container Apps service.</p>
<h2>Azure Container Apps setup</h2>
<p>First we’ll need an Azure account, so let’s create one by going to the <a href="https://azure.microsoft.com">Microsoft Azure portal</a> and creating a new project. Click the <strong>Start free</strong> button and follow the steps to sign in or create a new account.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-5-azure-start-free.png" alt="azure start free" /></p>
<h2>Deploy a Hello World web app to Container Apps</h2>
<p>We’ll perform the process of deploying a C# Hello World web app to Container Apps using the handy Azure tool called <a href="https://azure.microsoft.com/en-us/get-started/azure-portal/cloud-shell">Cloud Shell</a>. To deploy the Hello World app, we’ll perform the following 12 steps:</p>
<ol>
<li>From the <a href="https://portal.azure.com/">Azure portal</a>, click the Cloud Shell icon at the top of the portal to open Cloud Shell…</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-6-cloud-shell.png" alt="cloud shell" /></p>
<p>… and when the Cloud Shell first opens, select <strong>Bash</strong> as the shell type to use.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-7-bash.png" alt="bash" /></p>
<ol start="2">
<li>If you’re prompted that “You have no storage mounted,” then click the <strong>Create storage</strong> button to create a file store to be used for saving and editing files from Cloud Shell.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-8-create-storage.png" alt="create storage" /></p>
<ol start="3">
<li>In Cloud Shell, clone a <a href="https://github.com/elastic/observability-examples/tree/main/azure/container-apps/helloworld">C# Hello World sample app</a> repo from GitHub by entering the following command.</li>
</ol>
<pre><code class="language-bash">git clone https://github.com/elastic/observability-examples
</code></pre>
<ol start="4">
<li>Change directory to the location of the Hello World web app code.</li>
</ol>
<pre><code class="language-bash">cd observability-examples/azure/container-apps/helloworld
</code></pre>
<ol start="5">
<li>Define the environment variables that we’ll be using in the commands throughout this blog post.</li>
</ol>
<pre><code class="language-bash">RESOURCE_GROUP=&quot;helloworld-containerapps&quot;
LOCATION=&quot;centralus&quot;
ENVIRONMENT=&quot;env-helloworld-containerapps&quot;
APP_NAME=&quot;elastic-helloworld&quot;
</code></pre>
<ol start="6">
<li>Define a registry container name that is unique by running the following command.</li>
</ol>
<pre><code class="language-bash">ACR_NAME=&quot;helloworld&quot;$RANDOM
</code></pre>
<ol start="7">
<li>Create an Azure resource group by running the following command.</li>
</ol>
<pre><code class="language-bash">az group create --name $RESOURCE_GROUP --location &quot;$LOCATION&quot;
</code></pre>
<ol start="8">
<li>Run the following command to create a registry container in Azure Container Registry.</li>
</ol>
<pre><code class="language-bash">az acr create --resource-group $RESOURCE_GROUP \
--name $ACR_NAME --sku Basic --admin-enable true
</code></pre>
<ol start="9">
<li>Build the app image and push it to Azure Container Registry by running the following command.</li>
</ol>
<pre><code class="language-bash">az acr build --registry $ACR_NAME --image $APP_NAME .
</code></pre>
<ol start="10">
<li>Register the Microsoft.OperationalInsights namespace as a provider by running the following command.</li>
</ol>
<pre><code class="language-bash">az provider register -n Microsoft.OperationalInsights --wait
</code></pre>
<ol start="11">
<li>Run the following command to create a Container App environment for deploying your app into.</li>
</ol>
<pre><code class="language-bash">az containerapp env create --name $ENVIRONMENT \
--resource-group $RESOURCE_GROUP --location &quot;$LOCATION&quot;
</code></pre>
<ol start="12">
<li>Create a new Container App by deploying the Hello World app’s image to Container Apps, using the following command.</li>
</ol>
<pre><code class="language-bash">az containerapp create \
  --name $APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --environment $ENVIRONMENT \
  --image $ACR_NAME.azurecr.io/$APP_NAME \
  --target-port 3500 \
  --ingress 'external' \
  --registry-server $ACR_NAME.azurecr.io \
  --query properties.configuration.ingress.fqdn
</code></pre>
<p>This command will output the deployed Hello World app's fully qualified domain name (FQDN). Copy and paste the FQDN into a browser to see your running Hello World app.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-9-hello-world.png" alt="hello world" /></p>
<h2>Instrument the Hello World web app with Elastic Observability</h2>
<p>With a web app successfully running in Container Apps, we’re now ready to add the minimal code necessary to enable observability for the Hello World app in Elastic Cloud. We’ll perform the following eight steps:</p>
<ol>
<li>In Azure Cloud Shell, create a new file named Telemetry.cs by typing the following command.</li>
</ol>
<pre><code class="language-bash">touch Telemetry.cs
</code></pre>
<ol start="2">
<li>Open the Azure Cloud Shell file editor by typing the following command in Cloud Shell.</li>
</ol>
<pre><code class="language-bash">code .
</code></pre>
<ol start="3">
<li>In the Azure Cloud Shell editor, open the Telemetry.cs file and paste in the following code. Save the edited file in Cloud Shell by pressing the [Ctrl] + [s] keys on your keyboard (or if you’re on a macOS computer, use the [⌘] + [s] keys). This class file is used to create a tracer ActivitySource, which can generate trace Activity spans for observability.</li>
</ol>
<pre><code class="language-csharp">using System.Diagnostics;

public static class Telemetry
{
	public static readonly ActivitySource activitySource = new(&quot;Helloworld&quot;);
}
</code></pre>
<ol start="4">
<li>In the Azure Cloud Shell editor, edit the file named Dockerfile to add the following Elastic OpenTelemetry environment variables. Replace the ELASTIC_APM_SERVER_URL text and the ELASTIC_APM_SECRET_TOKEN text with the APM Server URL and the APM Secret Token values that you copied and saved in an earlier step.</li>
</ol>
<p>Save the edited file in Cloud Shell by pressing the [Ctrl] + [s] keys on your keyboard (or if you’re on a macOS computer, use the [⌘] + [s] keys).</p>
<p>The updated Dockerfile should look something like this:</p>
<pre><code class="language-dockerfile">FROM ${ARCH}mcr.microsoft.com/dotnet/aspnet:7.0. AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:8.0-preview AS build
ARG TARGETPLATFORM

WORKDIR /src
COPY [&quot;helloworld.csproj&quot;, &quot;./&quot;]
RUN dotnet restore &quot;./helloworld.csproj&quot;
COPY . .
WORKDIR &quot;/src/.&quot;
RUN dotnet build &quot;helloworld.csproj&quot; -c Release -o /app/build

FROM build AS publish
RUN dotnet publish &quot;helloworld.csproj&quot; -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
EXPOSE 3500
ENV ASPNETCORE_URLS=http://+:3500

ENV OTEL_EXPORTER_OTLP_ENDPOINT='https://******.apm.us-east-2.aws.elastic-cloud.com:443'
ENV OTEL_EXPORTER_OTLP_HEADERS='Authorization=Bearer ***********'
ENV OTEL_LOG_LEVEL=info
ENV OTEL_METRICS_EXPORTER=otlp
ENV OTEL_RESOURCE_ATTRIBUTES=service.version=1.0,deployment.environment=production
ENV OTEL_SERVICE_NAME=helloworld
ENV OTEL_TRACES_EXPORTER=otlp

ENTRYPOINT [&quot;dotnet&quot;, &quot;helloworld.dll&quot;]
</code></pre>
<ol start="5">
<li>In the Azure Cloud Shell editor, edit the helloworld.csproj file to add the Elastic APM and OpenTelemetry dependencies. The updated helloworld.csproj file should look something like this:</li>
</ol>
<pre><code class="language-xml">
&lt;Project Sdk=&quot;Microsoft.NET.Sdk.Web&quot;&gt;

  &lt;PropertyGroup&gt;
	&lt;TargetFramework&gt;net7.0&lt;/TargetFramework&gt;
	&lt;Nullable&gt;enable&lt;/Nullable&gt;
	&lt;ImplicitUsings&gt;enable&lt;/ImplicitUsings&gt;
  &lt;/PropertyGroup&gt;
  &lt;ItemGroup&gt;
	&lt;PackageReference Include=&quot;Elastic.Apm&quot; Version=&quot;1.24.0&quot; /&gt;
	&lt;PackageReference Include=&quot;Elastic.Apm.NetCoreAll&quot; Version=&quot;1.24.0&quot; /&gt;
	&lt;PackageReference Include=&quot;OpenTelemetry&quot; Version=&quot;1.6.0&quot; /&gt;
	&lt;PackageReference Include=&quot;OpenTelemetry.Exporter.Console&quot; Version=&quot;1.6.0&quot; /&gt;
	&lt;PackageReference Include=&quot;OpenTelemetry.Exporter.OpenTelemetryProtocol&quot; Version=&quot;1.6.0&quot; /&gt;
	&lt;PackageReference Include=&quot;OpenTelemetry.Extensions.Hosting&quot; Version=&quot;1.6.0&quot; /&gt;
	&lt;PackageReference Include=&quot;OpenTelemetry.Instrumentation.AspNetCore&quot; Version=&quot;1.5.0-beta.1&quot; /&gt;
  &lt;/ItemGroup&gt;

&lt;/Project&gt;
</code></pre>
<ol start="6">
<li>In the Azure Cloud Shell editor, edit the Program.cs:</li>
</ol>
<ul>
<li>Add a using statement at the top of the file to import System.Diagnostics, which is used to create Activities that are equivalent to “spans” in OpenTelemetry. Also import the OpenTelemetry.Resources and OpenTelemetry.Trace packages.</li>
</ul>
<pre><code class="language-csharp">using System.Diagnostics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
</code></pre>
<ul>
<li>Update the “builder” initialization code block to include configuration to enable Elastic OpenTelemetry observability.</li>
</ul>
<pre><code class="language-csharp">builder.Services.AddOpenTelemetry().WithTracing(builder =&gt; builder.AddOtlpExporter()
                	.AddSource(&quot;helloworld&quot;)
                	.AddAspNetCoreInstrumentation()
                	.AddOtlpExporter()
    	.ConfigureResource(resource =&gt;
        	resource.AddService(
            	serviceName: &quot;helloworld&quot;))
);
builder.Services.AddControllers();
</code></pre>
<ul>
<li>Replace the “Hello World!” HTML output string…</li>
</ul>
<pre><code class="language-html">&lt;h1&gt;Hello World!&lt;/h1&gt;
</code></pre>
<ul>
<li>...with the “Hello Elastic Observability” HTML output string.</li>
</ul>
<pre><code class="language-html">&lt;div style=&quot;text-align: center;&quot;&gt;
  &lt;h1 style=&quot;color: #005A9E; font-family:'Verdana'&quot;&gt;
    Hello Elastic Observability - Azure Container Apps - C#
  &lt;/h1&gt;
  &lt;img
    src=&quot;https://elastichelloworld.blob.core.windows.net/elastic-helloworld/elastic-logo.png&quot;
  /&gt;
&lt;/div&gt;
</code></pre>
<ul>
<li>Add a telemetry trace span around the output response utilizing the Telemetry class’ ActivitySource.</li>
</ul>
<pre><code class="language-csharp">using (Activity activity = Telemetry.activitySource.StartActivity(&quot;HelloSpan&quot;)!)
   	{
   		Console.Write(&quot;hello&quot;);
   		await context.Response.WriteAsync(output);
   	}
</code></pre>
<p>The updated Program.cs file should look something like this:</p>
<pre><code class="language-csharp">using System.Diagnostics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetry().WithTracing(builder =&gt; builder.AddOtlpExporter()
                	.AddSource(&quot;helloworld&quot;)
                	.AddAspNetCoreInstrumentation()
                	.AddOtlpExporter()
    	.ConfigureResource(resource =&gt;
        	resource.AddService(
            	serviceName: &quot;helloworld&quot;))
);
builder.Services.AddControllers();
var app = builder.Build();

string output =
&quot;&quot;&quot;
&lt;div style=&quot;text-align: center;&quot;&gt;
&lt;h1 style=&quot;color: #005A9E; font-family:'Verdana'&quot;&gt;
Hello Elastic Observability - Azure Container Apps - C#
&lt;/h1&gt;
&lt;img src=&quot;https://elastichelloworld.blob.core.windows.net/elastic-helloworld/elastic-logo.png&quot;&gt;
&lt;/div&gt;
&quot;&quot;&quot;;

app.MapGet(&quot;/&quot;, async context =&gt;
	{
    	using (Activity activity = Telemetry.activitySource.StartActivity(&quot;HelloSpan&quot;)!)
    		{
        		Console.Write(&quot;hello&quot;);
        		await context.Response.WriteAsync(output);
    		}
	}
);
app.Run();
</code></pre>
<ol start="7">
<li>Rebuild the Hello World app image and push the image to the Azure Container Registry by running the following command.</li>
</ol>
<pre><code class="language-bash">az acr build --registry $ACR_NAME --image $APP_NAME .
</code></pre>
<ol start="8">
<li>Redeploy the updated Hello World app to Azure Container Apps, using the following command.</li>
</ol>
<pre><code class="language-bash">az containerapp create \
  --name $APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --environment $ENVIRONMENT \
  --image $ACR_NAME.azurecr.io/$APP_NAME \
  --target-port 3500 \
  --ingress 'external' \
  --registry-server $ACR_NAME.azurecr.io \
  --query properties.configuration.ingress.fqdn
</code></pre>
<p>This command will output the deployed Hello World app's fully qualified domain name (FQDN). Copy and paste the FQDN into a browser to see the updated Hello World app running in Azure Container Apps.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-10-elastic-hello-observability.png" alt="hello observability" /></p>
<h2>Observe the Hello World web app</h2>
<p>Now that we’ve instrumented the web app to send observability data to Elastic Observability, we can now use Elastic Cloud to monitor the web app’s operations.</p>
<ol>
<li>
<p>In Elastic Cloud, select the Observability <strong>Services</strong> menu item.</p>
</li>
<li>
<p>Click the <strong>helloworld</strong> service.</p>
</li>
<li>
<p>Click the <strong>Transactions</strong> tab.</p>
</li>
<li>
<p>Scroll down and click the <strong>GET /</strong> transaction.Scroll down to the <strong>Trace Sample</strong> section to see the <strong>GET /</strong> , <strong>HelloSpan</strong> trace sample.</p>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/elastic-blog-12-latency-distribution.png" alt="latency-distribution" /></p>
<h2>Observability made to scale</h2>
<p>You’ve seen the entire process of deploying a web app to Azure Container Apps that is instrumented with Elastic Observability. This web app is now fully available on the web running on a platform that will auto-scale to serve visitors worldwide. And it’s instrumented for Elastic Observability APM using OpenTelemetry to ingest data into Elastic Cloud’s Kibana dashboards.</p>
<p>Now that you’ve seen how to deploy a Hello World web app with a basic observability setup, visit <a href="https://www.elastic.co/observability">Elastic Observability</a> to learn more about expanding to a full scale observability coverage solution for your apps. Or visit <a href="https://www.elastic.co/getting-started/microsoft-azure">Getting started with Elastic on Microsoft Azure</a> for more examples of how you can drive the data insights you need by combining Microsoft Azure’s cloud computing services with Elastic’s search-powered platform.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/deploy-app-observability-azure-container-apps/library-branding-elastic-observability-midnight-1680x980.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Gain insights into Kubernetes errors with Elastic Observability logs and OpenAI]]></title>
            <link>https://www.elastic.co/observability-labs/blog/kubernetes-errors-observability-logs-openai</link>
            <guid isPermaLink="false">kubernetes-errors-observability-logs-openai</guid>
            <pubDate>Thu, 18 May 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[This blog post provides an example of how one can analyze error messages in Elasticsearch with ChatGPT using the OpenAI API via Elasticsearch.]]></description>
            <content:encoded><![CDATA[<p>As we’ve shown in previous blogs, Elastic&lt;sup&gt;®&lt;/sup&gt; provides a way to ingest and manage telemetry from the <a href="https://www.elastic.co/blog/kubernetes-cluster-metrics-logs-monitoring">Kubernetes cluster</a> and the <a href="https://www.elastic.co/blog/opentelemetry-observability">application</a> running on it. Elastic provides out-of-the-box dashboards to help with tracking metrics, <a href="https://www.elastic.co/blog/log-management-observability-operations">log management and analytics</a>, <a href="https://www.elastic.co/blog/adding-free-and-open-elastic-apm-as-part-of-your-elastic-observability-deployment">APM functionality</a> (which also supports <a href="https://www.elastic.co/blog/opentelemetry-observability">native OpenTelemetry</a>), and the ability to analyze everything with <a href="https://www.elastic.co/blog/observability-logs-machine-learning-aiops">AIOps features</a> and <a href="https://www.elastic.co/what-is/elasticsearch-machine-learning?elektra=home">machine learning</a> (ML). While you can use pre-existing <a href="https://www.elastic.co/blog/improving-information-retrieval-elastic-stack-search-relevance">ML models in Elastic</a>, <a href="https://www.elastic.co/blog/aiops-automation-analytics-elastic-observability-use-cases">out-of-the-box AIOps features</a>, or your own ML models, there is a need to dig deeper into the root cause of an issue.</p>
<p>Elastic helps reduce the operational work to support more efficient operations, but users still need a way to investigate and understand everything from the cause of an issue to the meaning of specific error messages. As an operations user, if you haven’t run into a particular error before or it's part of some runbook, you will likely go to Google and start searching for information.</p>
<p>OpenAI’s ChatGPT is becoming an interesting generative AI tool that helps provide more information using the models behind it. What if you could use OpenAI to obtain deeper insights (even simple semantics) for an error in your production or development environment? You can easily tie Elastic to OpenAI’s API to achieve this.</p>
<p>Kubernetes, a mainstay in most deployments (on-prem or in a cloud service provider) requires a significant amount of expertise — even if that expertise is to manage a service like GKE, EKS, or AKS.</p>
<p>In this blog, I will cover how you can use <a href="https://www.elastic.co/guide/en/kibana/current/watcher-ui.html">Elastic’s watcher</a> capability to connect Elastic to OpenAI and ask it for more information about the error logs Elastic is ingesting from a Kubernetes cluster(s). More specifically, we will use <a href="https://azure.microsoft.com/en-us/products/cognitive-services/openai-service">Azure’s OpenAI Service</a>. Azure OpenAI is a partnership between Microsoft and OpenAI, so the same models from OpenAI are available in the Microsoft version.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-azure-openai.png" alt="elastic azure openai" /></p>
<p>While this blog goes over a specific example, it can be modified for other types of errors Elastic receives in logs. Whether it's from AWS, the application, databases, etc., the configuration and script described in this blog can be modified easily.</p>
<h2>Prerequisites and config</h2>
<p>If you plan on following this blog, here are some of the components and details we used to set up the configuration:</p>
<ul>
<li>Ensure you have an account on <a href="http://cloud.elastic.co">Elastic Cloud</a> and a deployed stack (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>).</li>
<li>We used a GCP GKE Kubernetes cluster, but you can use any Kubernetes cluster service (on-prem or cloud based) of your choice.</li>
<li>We’re also running with a version of the OpenTelemetry Demo. Directions for using Elastic with OpenTelemetry Demo are <a href="https://github.com/elastic/opentelemetry-demo">here</a>.</li>
<li>We also have an Azure account and <a href="https://azure.microsoft.com/en-us/products/cognitive-services/openai-service">Azure OpenAI service configured</a>. You will need to get the appropriate tokens from Azure and the proper URL endpoint from Azure’s OpenAI service.</li>
<li>We will use <a href="https://www.elastic.co/guide/en/kibana/current/devtools-kibana.html">Elastic’s dev tools</a>, the console to be specific, to load up and run the script, which is an <a href="https://www.elastic.co/guide/en/kibana/current/watcher-ui.html">Elastic watcher</a>.</li>
<li>We will also add a new index to store the results from the OpenAI query.</li>
</ul>
<p>Here is the configuration we will set up in this blog:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-configuration.png" alt="Configuration to analyze Kubernetes cluster errors" /></p>
<p>As we walk through the setup, we’ll also provide the alternative setup with OpenAI versus Azure OpenAI Service.</p>
<h2>Setting it all up</h2>
<p>Over the next few steps, I’ll walk through:</p>
<ul>
<li>Getting an account on Elastic Cloud and setting up your K8S cluster and application</li>
<li>Gaining Azure OpenAI authorization (alternative option with OpenAI)</li>
<li>Identifying Kubernetes error logs</li>
<li>Configuring the watcher with the right script</li>
<li>Comparing the output from Azure OpenAI/OpenAI versus ChatGPT UI</li>
</ul>
<h3>Step 0: Create an account on Elastic Cloud</h3>
<p>Follow the instructions to <a href="https://cloud.elastic.co/registration?fromURI=/home">get started on Elastic Cloud</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-start-cloud-trial.png" alt="elastic start cloud trial" /></p>
<p>Once you have the Elastic Cloud login, set up your Kubernetes cluster and application. A complete step-by-step instructions blog is available <a href="https://www.elastic.co/blog/kubernetes-cluster-metrics-logs-monitoring">here</a>. This also provides an overview of how to see Kubernetes cluster metrics in Elastic and how to monitor them with dashboards.</p>
<h3>Step 1: Azure OpenAI Service and authorization</h3>
<p>When you log in to your Azure subscription and set up an instance of Azure OpenAI Service, you will be able to get your keys under Manage Keys.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-microsoft-azure-manage-keys.png" alt="microsoft azure manage keys" /></p>
<p>There are two keys for your OpenAI instance, but you only need KEY 1 .</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-pme-openai-keys-and-endpoint.png" alt="Used with permission from Microsoft." /></p>
<p>Additionally, you will need to get the service URL. See the image above with our service URL blanked out to understand where to get the KEY 1 and URL.</p>
<p>If you are not using Azure OpenAI Service and the standard OpenAI service, then you can get your keys at:</p>
<pre><code class="language-bash">**https** ://platform.openai.com/account/api-keys
</code></pre>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-api-keys.png" alt="api keys" /></p>
<p>You will need to create a key and save it. Once you have the key, you can go to Step 2.</p>
<h3>Step 2: Identifying Kubernetes errors in Elastic logs</h3>
<p>As your Kubernetes cluster is running, <a href="https://docs.elastic.co/en/integrations/kubernetes">Elastic’s Kubernetes integration</a> running on the Elastic agent daemon set on your cluster is sending logs and metrics to Elastic. <a href="https://www.elastic.co/blog/log-monitoring-management-enterprise">The telemetry is ingested, processed, and indexed</a>. Kubernetes logs are stored in an index called .ds-logs-kubernetes.container_logs-default-* (* is for the date), and an automatic data stream logs-kubernetes.container_logs is also pre-loaded. So while you can use some of the out-of-the-box dashboards to investigate the metrics, you can also look at all the logs in Elastic Discover.</p>
<p>While any error from Kubernetes can be daunting, the more nuanced issues occur with errors from the pods running in the kube-system namespace. Take the pod konnectivity agent, which is essentially a network proxy agent running on the node to help establish tunnels and is a vital component in Kubernetes. Any error will cause the cluster to have connectivity issues and lead to a cascade of issues, so it’s important to understand and troubleshoot these errors.</p>
<p>When we filter out for error logs from the konnectivity agent, we see a good number of errors.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-expanded-document.png" alt="expanded document" /></p>
<p>But unfortunately, we still can’t understand what these errors mean.</p>
<p>Enter OpenAI to help us understand the issue better. Generally, you would take the error message from Discover and paste it with a question in ChatGPT (or run a Google search on the message).</p>
<p>One error in particular that we’ve run into but do not understand is:</p>
<pre><code class="language-bash">E0510 02:51:47.138292       1 client.go:388] could not read stream err=rpc error: code = Unavailable desc = error reading from server: read tcp 10.120.0.8:46156-&gt;35.230.74.219:8132: read: connection timed out serverID=632d489f-9306-4851-b96b-9204b48f5587 agentID=e305f823-5b03-47d3-a898-70031d9f4768
</code></pre>
<p>The OpenAI output is as follows:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-openai-output.png" alt="openai output" /></p>
<p>ChatGPT has given us a fairly nice set of ideas on why this rpc error is occurring against our konnectivity-agent.</p>
<p>So how can we get this output automatically for any error when those errors occur?</p>
<h3>Step 3: Configuring the watcher with the right script</h3>
<p><a href="https://www.elastic.co/guide/en/kibana/current/watcher-ui.html">What is an Elastic watcher?</a> Watcher is an Elasticsearch feature that you can use to create actions based on conditions, which are periodically evaluated using queries on your data. Watchers are helpful for analyzing mission-critical and business-critical streaming data. For example, you might watch application logs for errors causing larger operational issues.</p>
<p>Once a watcher is configured, it can be:</p>
<ol>
<li>Manually triggered</li>
<li>Run periodically</li>
<li>Created using a UI or a script</li>
</ol>
<p>In this scenario, we will use a script, as we can modify it easily and run it as needed.</p>
<p>We’re using the DevTools Console to enter the script and test it out:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-test-script.png" alt="test script" /></p>
<p>The script is listed at the end of the blog in the <strong>appendix</strong>. It can also be downloaded <a href="https://github.com/elastic/chatgpt-error-analysis"><strong>here</strong></a> <strong>.</strong></p>
<p>The script does the following:</p>
<ol>
<li>It runs continuously every five minutes.</li>
<li>It will search the logs for errors from the container konnectivity-agent.</li>
<li>It will take the first error’s message, transform it (re-format and clean up), and place it into a variable first_hit.</li>
</ol>
<pre><code class="language-json">&quot;script&quot;: &quot;return ['first_hit': ctx.payload.first.hits.hits.0._source.message.replace('\&quot;', \&quot;\&quot;)]&quot;
</code></pre>
<ol start="4">
<li>The error message is sent into OpenAI with a query:</li>
</ol>
<pre><code class="language-yaml">What are the potential reasons for the following kubernetes error:
  { { ctx.payload.second.first_hit } }
</code></pre>
<ol start="5">
<li>If the search yielded an error, it will proceed to then create an index and place the error message, pod.name (which is konnectivity-agent-6676d5695b-ccsmx in our setup), and OpenAI output into a new index called chatgpt_k8_analyzed.</li>
</ol>
<p>To see the results, we created a new data view called chatgpt_k8_analyzed against the newly created index:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-edit-data-view.png" alt="edit data view" /></p>
<p>In Discover, the output on the data view provides us with the analysis of the errors.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-analysis-of-errors.png" alt="analysis of errors" /></p>
<p>For every error the script sees in the five minute interval, it will get an analysis of the error. We could alternatively also use a range as needed to analyze during a specific time frame. The script would just need to be modified accordingly.</p>
<h3>Step 4. Output from Azure OpenAI/OpenAI vs. ChatGPT UI</h3>
<p>As you noticed above, we got relatively the same result from the Azure OpenAI API call as we did by testing out our query in the ChatGPT UI. This is because we configured the API call to run the same/similar model as what was selected in the UI.</p>
<p>For the API call, we used the following parameters:</p>
<pre><code class="language-json">&quot;request&quot;: {
             &quot;method&quot; : &quot;POST&quot;,
             &quot;Url&quot;: &quot;https://XXX.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview&quot;,
             &quot;headers&quot;: {&quot;api-key&quot; : &quot;XXXXXXX&quot;,
                         &quot;content-type&quot; : &quot;application/json&quot;
                        },
             &quot;body&quot; : &quot;{ \&quot;messages\&quot;: [ { \&quot;role\&quot;: \&quot;system\&quot;, \&quot;content\&quot;: \&quot;You are a helpful assistant.\&quot;}, { \&quot;role\&quot;: \&quot;user\&quot;, \&quot;content\&quot;: \&quot;What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\&quot;}], \&quot;temperature\&quot;: 0.5, \&quot;max_tokens\&quot;: 2048}&quot; ,
              &quot;connection_timeout&quot;: &quot;60s&quot;,
               &quot;read_timeout&quot;: &quot;60s&quot;
                            }
</code></pre>
<p>By setting the role: system with You are a helpful assistant and using the gpt-35-turbo url portion, we are essentially setting the API to use the davinci model, which is the same as the ChatGPT UI model set by default.</p>
<p>Additionally, for Azure OpenAI Service, you will need to set the URL to something similar the following:</p>
<pre><code class="language-bash">https://YOURSERVICENAME.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview
</code></pre>
<p>If you use OpenAI (versus Azure OpenAI Service), the request call (against <a href="https://api.openai.com/v1/completions">https://api.openai.com/v1/completions</a>) would be as such:</p>
<pre><code class="language-json">&quot;request&quot;: {
            &quot;scheme&quot;: &quot;https&quot;,
            &quot;host&quot;: &quot;api.openai.com&quot;,
            &quot;port&quot;: 443,
            &quot;method&quot;: &quot;post&quot;,
            &quot;path&quot;: &quot;\/v1\/completions&quot;,
            &quot;params&quot;: {},
            &quot;headers&quot;: {
               &quot;content-type&quot;: &quot;application\/json&quot;,
               &quot;authorization&quot;: &quot;Bearer YOUR_ACCESS_TOKEN&quot;
                        },
            &quot;body&quot;: &quot;{ \&quot;model\&quot;: \&quot;text-davinci-003\&quot;,  \&quot;prompt\&quot;: \&quot;What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\&quot;,  \&quot;temperature\&quot;: 1,  \&quot;max_tokens\&quot;: 512,     \&quot;top_p\&quot;: 1.0,      \&quot;frequency_penalty\&quot;: 0.0,   \&quot;presence_penalty\&quot;: 0.0 }&quot;,
            &quot;connection_timeout_in_millis&quot;: 60000,
            &quot;read_timeout_millis&quot;: 60000
          }
</code></pre>
<p>If you are interested in creating a more OpenAI-based version, you can <a href="https://elastic-content-share.eu/downloads/watcher-job-to-integrate-chatgpt-in-elasticsearch/">download an alternative script</a> and look at <a href="https://mar1.hashnode.dev/unlocking-the-power-of-aiops-with-chatgpt-and-elasticsearch">another blog from an Elastic community member</a>.</p>
<h2>Gaining other insights beyond Kubernetes logs</h2>
<p>Now that the script is up and running, you can modify it using different:</p>
<ul>
<li>Inputs</li>
<li>Conditions</li>
<li>Actions</li>
<li>Transforms</li>
</ul>
<p>Learn more on how to modify it <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-alerting.html">here</a>. Some examples of modifications could include:</p>
<ol>
<li>Look for error logs from application components (e.g., cartService, frontEnd, from the OTel demo), cloud service providers (e.g., AWS/Azure/GCP logs), and even logs from components such as Kafka, databases, etc.</li>
<li>Vary the time frame from running continuously to running over a specific <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html">range</a>.</li>
<li>Look for specific errors in the logs.</li>
<li>Query for analysis on a set of errors at once versus just one, which we demonstrated.</li>
</ol>
<p>The modifications are endless, and of course you can run this with OpenAI rather than Azure OpenAI Service.</p>
<h2>Conclusion</h2>
<p>I hope you’ve gotten an appreciation for how Elastic Observability can help you connect to OpenAI services (Azure OpenAI, as we showed, or even OpenAI) to better analyze an error log message instead of having to run several Google searches and hunt for possible insights.</p>
<p>Here’s a quick recap of what we covered:</p>
<ul>
<li>Developing an Elastic watcher script that can be used to find and send Kubernetes errors into OpenAI and insert them into a new index</li>
<li>Configuring Azure OpenAI Service or OpenAI with the right authorization and request parameters</li>
</ul>
<p>Ready to get started? Sign up <a href="https://cloud.elastic.co/registration">for Elastic Cloud</a> and try out the features and capabilities I’ve outlined above to get the most value and visibility out of your OpenTelemetry data.</p>
<h2>Appendix</h2>
<p>Watcher script</p>
<pre><code class="language-bash">PUT _watcher/watch/chatgpt_analysis
{
    &quot;trigger&quot;: {
      &quot;schedule&quot;: {
        &quot;interval&quot;: &quot;5m&quot;
      }
    },
    &quot;input&quot;: {
      &quot;chain&quot;: {
          &quot;inputs&quot;: [
              {
                  &quot;first&quot;: {
                      &quot;search&quot;: {
                          &quot;request&quot;: {
                              &quot;search_type&quot;: &quot;query_then_fetch&quot;,
                              &quot;indices&quot;: [
                                &quot;logs-kubernetes*&quot;
                              ],
                              &quot;rest_total_hits_as_int&quot;: true,
                              &quot;body&quot;: {
                                &quot;query&quot;: {
                                  &quot;bool&quot;: {
                                    &quot;must&quot;: [
                                      {
                                        &quot;match&quot;: {
                                          &quot;kubernetes.container.name&quot;: &quot;konnectivity-agent&quot;
                                        }
                                      },
                                      {
                                        &quot;match&quot; : {
                                          &quot;message&quot;:&quot;error&quot;
                                        }
                                      }
                                    ]
                                  }
                                },
                                &quot;size&quot;: &quot;1&quot;
                              }
                            }
                        }
                    }
                },
                {
                    &quot;second&quot;: {
                        &quot;transform&quot;: {
                            &quot;script&quot;: &quot;return ['first_hit': ctx.payload.first.hits.hits.0._source.message.replace('\&quot;', \&quot;\&quot;)]&quot;
                        }
                    }
                },
                {
                    &quot;third&quot;: {
                        &quot;http&quot;: {
                            &quot;request&quot;: {
                                &quot;method&quot; : &quot;POST&quot;,
                                &quot;url&quot;: &quot;https://XXX.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview&quot;,
                                &quot;headers&quot;: {
                                    &quot;api-key&quot; : &quot;XXX&quot;,
                                    &quot;content-type&quot; : &quot;application/json&quot;
                                },
                                &quot;body&quot; : &quot;{ \&quot;messages\&quot;: [ { \&quot;role\&quot;: \&quot;system\&quot;, \&quot;content\&quot;: \&quot;You are a helpful assistant.\&quot;}, { \&quot;role\&quot;: \&quot;user\&quot;, \&quot;content\&quot;: \&quot;What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\&quot;}], \&quot;temperature\&quot;: 0.5, \&quot;max_tokens\&quot;: 2048}&quot; ,
                                &quot;connection_timeout&quot;: &quot;60s&quot;,
                                &quot;read_timeout&quot;: &quot;60s&quot;
                            }
                        }
                    }
                }
            ]
        }
    },
    &quot;condition&quot;: {
      &quot;compare&quot;: {
        &quot;ctx.payload.first.hits.total&quot;: {
          &quot;gt&quot;: 0
        }
      }
    },
    &quot;actions&quot;: {
        &quot;index_payload&quot; : {
            &quot;transform&quot;: {
                &quot;script&quot;: {
                    &quot;source&quot;: &quot;&quot;&quot;
                        def payload = [:];
                        payload.timestamp = new Date();
                        payload.pod_name = ctx.payload.first.hits.hits[0]._source.kubernetes.pod.name;
                        payload.error_message = ctx.payload.second.first_hit;
                        payload.chatgpt_analysis = ctx.payload.third.choices[0].message.content;
                        return payload;
                    &quot;&quot;&quot;
                }
            },
            &quot;index&quot; : {
                &quot;index&quot; : &quot;chatgpt_k8s_analyzed&quot;
            }
        }
    }
}
</code></pre>
<h3>Additional logging resources:</h3>
<ul>
<li><a href="https://www.elastic.co/getting-started/observability/collect-and-analyze-logs">Getting started with logging on Elastic (quickstart)</a></li>
<li><a href="https://www.elastic.co/guide/en/observability/current/logs-metrics-get-started.html">Ingesting common known logs via integrations (compute node example)</a></li>
<li><a href="https://docs.elastic.co/integrations">List of integrations</a></li>
<li><a href="https://www.elastic.co/blog/log-monitoring-management-enterprise">Ingesting custom application logs into Elastic</a></li>
<li><a href="https://www.elastic.co/blog/observability-logs-parsing-schema-read-write">Enriching logs in Elastic</a></li>
<li>Analyzing Logs with <a href="https://www.elastic.co/blog/reduce-mttd-ml-machine-learning-observability">Anomaly Detection (ML)</a> and <a href="https://www.elastic.co/blog/observability-logs-machine-learning-aiops">AIOps</a></li>
</ul>
<h3>Common use case examples with logs:</h3>
<ul>
<li><a href="https://youtu.be/ax04ZFWqVCg">Nginx log management</a></li>
<li><a href="https://www.elastic.co/blog/vpc-flow-logs-monitoring-analytics-observability">AWS VPC Flow log management</a></li>
<li><a href="https://www.elastic.co/blog/kubernetes-errors-observability-logs-openai">Using OpenAI to analyze Kubernetes errors</a></li>
<li><a href="https://youtu.be/Li5TJAWbz8Q">PostgreSQL issue analysis with AIOps</a></li>
</ul>
<p><em>In this blog post, we may have used third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
<p><em>Screenshots of Microsoft products used with permission from Microsoft.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-configuration.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability with Elastic’s Azure AI Foundry Integration]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-azure-ai-foundry</link>
            <guid isPermaLink="false">llm-observability-azure-ai-foundry</guid>
            <pubDate>Fri, 25 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Gain comprehensive visibility into your generative AI workloads on Azure AI Foundry. Monitor token usage, latency, and cost, while leveraging built-in content filters to ensure safe and compliant application behavior—all with out-of-the-box observability powered by Elastic.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>As organizations increasingly adopt LLMs for AI-powered applications such as content creation, Retrieval-Augmented Generation (RAG), and data analysis, SREs and developers face new challenges. Tasks like monitoring workflows, analyzing input and output, managing query latency, and controlling costs become critical. LLM Observability helps address these issues by providing clear insights into how these models perform, allowing teams to quickly identify bottlenecks, optimize configurations, and improve reliability. With better observability, SREs can confidently scale LLM applications, especially on platforms like Azure AI Foundry, while minimizing downtime and keeping costs in check.</p>
<p>Elastic is expanding support for LLM Observability with Elastic Observability's new Azure AI Foundry integration. This is now available as a tech preview on Elastic Cloud. This new observability integration provides you with comprehensive visibility into the performance and usage of foundational models, such as <strong>GPT-4, Mistral, Llama</strong>, and thousands of others from leading AI companies and from Azure available through Azure AI Foundry. The new Azure AI Foundry Integration in Elastic Observability integration offers an out-of-the-box experience by simplifying the collection of metrics and logs, making it easier to gain actionable insights and effectively manage your models. The integration is simple to set up and comes with pre-built, out-of-the-box dashboards. With real-time insights, SREs can now monitor, optimize and troubleshoot LLM applications that are using Azure AI Foundry.</p>
<p>This blog will walk through the features available to SREs, such as monitoring invocations, errors, and latency information across various models, along with the usage and performance of LLM requests. Additionally, the blog will show how easy it is to set up and what insights you can gain from Elastic for LLM Observability.</p>
<h2>Prerequisites</h2>
<p>To get started with the Azure AI Foundry integration, you will need:</p>
<ul>
<li>An account on Elastic Cloud and a deployed stack in Azure (<a href="https://azuremarketplace.microsoft.com/en-us/marketplace/apps/elastic.ec-azure-pp?ocid=Elastic-Microsoft-Partner-Page-Get-Started">see instructions here</a>). Ensure you are using version 9.0.0 or higher.</li>
<li>An Azure account with permissions to pull the necessary data from Azure and Azure AI Foundry. See details in our <a href="https://www.elastic.co/docs/reference/integrations/azure_ai_foundry">documentation</a>.</li>
</ul>
<h2>Configuring Azure AI Foundry Integration</h2>
<p>To collect logs and metrics from Azure AI Foundry ensure you properly configure Azure logs and metrics from the following links:</p>
<ul>
<li>
<p><a href="https://www.elastic.co/docs/reference/integrations/azure_metrics#setup">Configure to receive Azure Metrics</a> - This integration specifically collects Azure AI Foundry metrics which will come from the service, and ensure you have the client id, subscription id, and tenant id from Azure AI Foundry to collect metrics.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_metrics.png" alt="Azure AI Foundry metrics" /></p>
</li>
<li>
<p><a href="https://www.elastic.co/docs/reference/integrations/azure">Configure to receive Azure Logs</a> and more specifically ensure that you <a href="https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-create">configure Azure event hub</a> to properly allow Elastic to ingest logs. Once you have the Azure event hub information, you will need it to configure the logs section of the Azure AI Foundry Integration.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_logs.png" alt="Azure AI Foundry logs" /></p>
</li>
</ul>
<h2>Maximize Visibility with Out-of-the-box dashboards</h2>
<p>Azure AI Foundry integration offers rich out-of-the-box visibility into the performance and usage information of models in Azure AI Foundry, including text and image models. There are several dashboards currently available. More will be coming as the integration goes to GA.</p>
<ul>
<li>Azure AI Foundry Overview dashboard provides a summarized view of the invocations, errors and latency information across various models.</li>
<li>Azure AI Foundry Billing dashboard - which provides total costs and daily usage costs from Azure cognitive services.</li>
<li>Azure AI Foundry Advanced Monitoring - which focuses on logs generated by the Azure AI Foundry service when connected through the API Management Service. Provides request rate, error rate, model usage, latency, LLM prompt input, response completion.</li>
</ul>
<p>Each dashboard provides specific insights important to SREs. Here is a quick overview of some of these insights:</p>
<ul>
<li>
<p><strong>Model Usage and Token Trends</strong> – Visualize token consumption and completion counts by model, endpoint, and time window.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_tokens.png" alt="Azure AI Foundry token usage metrics" /></p>
</li>
<li>
<p><strong>Latency Metrics</strong> – Monitor average and percentile latency per prompt, per endpoint, and correlate with prompt types or user IDs.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_model_latency.png" alt="Azure AI Foundry latency metrics" /></p>
</li>
<li>
<p><strong>Cost Estimation</strong> – Estimate API usage cost based on token consumption and model pricing.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_billing.png" alt="Azure AI Foundry cost estimation metrics" /></p>
</li>
<li>
<p><strong>Prompt/Completion Logging</strong> – View prompt-response pairs for debugging and quality monitoring.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_prompt_response.png" alt="Azure AI Foundry prompt/completions metrics" /></p>
</li>
<li>
<p><strong>Content Filtering and Guardrails</strong> – See which prompts or completions are being filtered, and why.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_guardrails.png" alt="Azure AI Foundry guardrails metrics" />
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_prompt_filtered.png" alt="Azure AI Foundry guardrails prompt filtered" /></p>
</li>
</ul>
<p>You can drill into specific users or sessions, slice by model type or region, and export reports for usage reviews or compliance.</p>
<hr />
<h2>Try it out today</h2>
<p>The Azure AI Foundry Integration is currently available in Elastic Cloud (both serverless and hosted options). Sign up for a 7 day trial by signing up to Elastic Cloud directly or through Azure Marketplace.
Alternatively you can also deploy a cluster on our Elasticsearch Service, download the Elasticsearch stack, or run Elastic from Azure Marketplace then spin up the new technical preview of Azure AI Foundry integration, open the curated dashboards in Kibana and start monitoring your Azure AI Foundry service!</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/LLM-observability.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Optimizing Spend and Content Moderation on Azure OpenAI with Elastic]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai-content-filter</link>
            <guid isPermaLink="false">llm-observability-azure-openai-content-filter</guid>
            <pubDate>Tue, 13 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[We have added further capabilities to the Azure OpenAI GA package, which now offer content filter monitoring and enhancements to the billing insights!]]></description>
            <content:encoded><![CDATA[<p>In a previous blog we showed you how to set up observability for your models hosted on Azure OpenAI using Elastic’s integration. We’ve expanded the integration to also include Azure OpenAI content filtering, and cost analysis for Azure OpenAI. If you previously onboarded the Azure OpenAI integration, just upgrade it and you will automatically get all new features we discuss in this blog. The enhanced integration now provides multiple dashboards including a general Azure OpenAI Overview, Azure Provisioned Throughput Unit dashboard, Azure Content filtering, and a dashboard for Azure OpenAI billing.</p>
<p>In this blog we will cover how to use Azure OpenAI Content Filtering and tracking Azure OpenAI usage costs. Let’s first review what these two capabilities from Azure OpenAI enable you to do:</p>
<h2>Azure OpenAI Content Filtering: Enhancing AI Safety</h2>
<p>Content filtering for Azure OpenAI plays a critical role in addressing AI safety challenges by helping to mitigate the risks associated with harmful or inappropriate content generated by AI models. By implementing robust content filtering mechanisms, organizations can proactively identify and filter out potentially harmful content, such as hate speech, misinformation, or violent imagery, before it is disseminated to users. This helps prevent the spread of harmful content and reduces the potential negative impact on individuals and communities.</p>
<p>Monitoring Azure OpenAI content filtering is essential for staying proactive in addressing emerging content moderation challenges. By closely monitoring the system, businesses can quickly detect any new types of harmful content or patterns of misuse that may arise. This enables organizations to stay ahead of potential content moderation issues and take timely action to protect their users and uphold their brand reputation.</p>
<h2>Tracking Azure OpenAI Usage Costs</h2>
<p>Monitoring Azure OpenAI model usage costs is crucial for managing budget and resource allocation effectively. By keeping track of usage costs, organizations can optimize their operations to avoid unnecessary expenses and ensure that they are getting the best value from their investment in AI technologies. Additionally, it helps in forecasting future expenses and aids in scaling resources according to the demand without compromising performance or incurring excessive costs. Effective monitoring also allows for transparency and accountability, enabling better decision-making in terms of AI deployment and utilization within Azure environments.</p>
<p>As we walk through this blog, we will provide you with prerequisites to set up and use the pre-configured dashboards for both of these capabilities, which are part of the Azure OpenAI integration.</p>
<h2>Prerequisites</h2>
<p>In order to follow along in this blog you will have to</p>
<ol>
<li>
<p>Set up and install the Azure billing integration to monitor the usage costs. Once the integration is installed, you can track the usage in the enhanced Azure OpenAI Billing dashboard.</p>
</li>
<li>
<p>Additionally, make sure you have enabled the Azure API Management service to access the Azure OpenAI models.</p>
</li>
</ol>
<h3>How to Use Azure API Management with Azure OpenAI:</h3>
<ul>
<li><strong>Provision an Azure OpenAI resource:</strong> Create an Azure OpenAI resource and select a model for your application.</li>
<li><strong>Create an API Management instance:</strong> Establish an Azure API Management instance to manage the Azure OpenAI APIs.</li>
<li><strong>Import the Azure OpenAI API:</strong> Import the Azure OpenAI API into your API Management instance using its OpenAPI specification.</li>
<li><strong>Configure Policies:</strong> Implement policies in API Management to manage request authentication, rate limiting, traffic shaping, and more.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_create_APM.png" alt="LLM Observability: Azure OpenAI Create API Management Service" /></p>
<h2>Steps to create a content filter for Azure OpenAI</h2>
<p>Before you set up observability for the content filtering, ensure that you have configured the Azure content filtering for your model. Follow the steps below to create an Azure OpenAI content filtering,</p>
<ol>
<li><strong>Access the Azure OpenAI service console:</strong>
<ul>
<li>Sign in to the Azure Console with the appropriate permissions and navigate to the Azure OpenAI service console.</li>
</ul>
</li>
<li><strong>Navigate to Safety + security:</strong>
<ul>
<li>From the left-hand menu, select <strong>Safety + security</strong>.</li>
</ul>
</li>
<li><strong>Create a New Content filter:</strong>
<ul>
<li>Select <strong>Create content filter</strong>.</li>
<li>Configure various content filter policies including the following
<ul>
<li><strong>Set input filter:</strong> Content will be annotated by category and blocked according to the threshold you set for prompts.</li>
<li><strong>Set output filter:</strong> Content will be annotated by category and blocked according to the threshold you set for response output.</li>
<li><strong>Blocklists:</strong> Define specific words or phrases to block.</li>
<li><strong>Deployments:</strong> Apply filters to model deployments.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Review and Create:</strong>
<ul>
<li>Review your settings and select Create to finalize the content filter configurations.</li>
</ul>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_create_content_filter.png" alt="LLM Observability: Azure OpenAI Create Content Filter" /></p>
<p>Customers can also configure content filters and create custom safety policies that are tailored to their use case requirements. The configurability feature allows customers to adjust the settings, separately for prompts and completions, to filter content for each content category at different severity levels.</p>
<h2>Content filter types</h2>
<ul>
<li>The content filtering categories,
<ul>
<li>(hate, sexual, violence, self-harm)</li>
<li>Other optional classification models aimed at detecting jailbreak risk and known content for text and code.</li>
</ul>
</li>
<li>Severity level within each content filter category,
<ul>
<li>(low, medium, high)</li>
<li>Content detected at the 'safe' severity level is labeled in annotations but isn't subject to filtering and isn't configurable.</li>
</ul>
</li>
</ul>
<h2>Understanding the pre-configured dashboard for Azure OpenAI Content Filtering</h2>
<p>Now that you have set up the filter, you can see what is being filtered in Elastic through the Azure OpenAI content filtering dashboard.</p>
<ol>
<li>Navigate to the Dashboard Menu – Select the <strong>Dashboard</strong> menu option in Elastic and search for <strong>[Azure OpenAI] Content Filtering Overview</strong> to open the dashboard.</li>
<li>Navigate to the Integrations Menu – Open the <strong>Integrations</strong> menu in Elastic, select <strong>Azure OpenAI</strong>, go to the <strong>Assets</strong> tab, and choose <strong>[Azure OpenAI] Content Filtering Overview</strong> from the dashboard assets.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_content_filter_overview.png" alt="LLM Observability: Azure OpenAI Content Filtering Overview" /></p>
<p>The Azure OpenAI Content Filtering Overview dashboard in the Elastic integration provides insights into blocked requests, API latency, error rates. This dashboard also provides detailed breakdown of content being filtered by the content filtering policy.</p>
<h2>Content Filter overview</h2>
<p>When the content filtering system detects harmful content, you receive either an error on the API call if the prompt was deemed inappropriate, or the finish_reason on the response will be content_filter to signify that some of the completion was filtered.</p>
<p>This can be summarized as,</p>
<ul>
<li>
<p><strong>Prompt filters:</strong> The prompt content that is classified in the filtered category will return HTTP 400 error.</p>
</li>
<li>
<p><strong>Non-streaming completion:</strong> When the content is filtered, non-streaming completions calls won't return any content. In rare cases with longer responses, a partial result can be returned. In these cases, the finish_reason is updated.</p>
</li>
<li>
<p><strong>Streaming completion:</strong> For streaming completions calls, segments are returned back to the user as they're completed. The service continues streaming until either reaching a stop token, length, or when content that is classified at a filtered category and severity level is detected.</p>
</li>
</ul>
<h2>Prompt and response where content has been blocked</h2>
<p>This dashboard section displays the original LLM prompt, inputs from various sources (API calls, applications, or chat interfaces), and the corresponding completion response. The panel below gives a view on the responses after applying content filtering policy for prompts and completions.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_content_filter_logs.png" alt="LLM Observability: Azure OpenAI Content Filtered Logs" /></p>
<p>You can use the following code snippet  to start integrating your current prompt and settings into your application to test the content filter:</p>
<pre><code>chat_prompt = [
   {
       &quot;role&quot;: &quot;user&quot;,
       &quot;content&quot;: &quot;How to kill a mocking bird?&quot;
   }
]
</code></pre>
<p>After running the code, you can find the content being filtered by violence category with the severity level medium.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_content-filter_response.png" alt="LLM Observability: Azure OpenAI Content Filtered Response" /></p>
<h2>Content filtered by content source (Input &amp; Output)</h2>
<p>The content filtering system helps monitor and moderate different categories of content based on severity levels. The categories typically include things like adult content, offensive language, hate speech, violence, and more. The severity levels indicate the degree of sensitivity or potential harm associated with the content. This panel helps the user to effectively monitor and filter out inappropriate or harmful content to maintain a safe environment.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_content_filter_category_serverity.png" alt="LLM Observability: Azure OpenAI Content Filter Category &amp; Severity Level" /></p>
<p>These metrics can be categorized into the following groups:</p>
<ul>
<li><strong>Blocked requests by category:</strong> Provides insights into the total blocked requests by category.</li>
<li><strong>Severity distribution by categories:</strong> Monitors the blocked requests by categories and severity distribution. The severity distribution may be either low, medium or high.</li>
<li><strong>Content filtered categories:</strong> Provides insights into the content filtered categories over time.</li>
</ul>
<h2>Reviewing the Azure OpenAI Billing dashboard</h2>
<p>You can now look at what you are spending on Azure OpenAI.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_billing.png" alt="LLM Observability: Azure OpenAI Billing" /></p>
<p>Here is what you see on this dashboard:</p>
<ul>
<li><strong>Total costs:</strong> This measures the total usage cost across all the model deployments.</li>
<li><strong>Overall Usage by model:</strong> This tracks the total usage costs broken down by model.</li>
<li><strong>Daily usage:</strong> Monitors usage costs on a daily basis.</li>
<li><strong>Daily usage costs by model:</strong> Monitors daily usage costs broken down by model deployments.</li>
</ul>
<h2>Conclusion</h2>
<p>The Azure OpenAI integration makes it easy for you to collect a curated set of metrics and logs for your LLM-powered applications using Azure OpenAI along with content filtered responses. It comes with an out-of-the-box dashboard which you can further customize for your specific needs.</p>
<p>Deploy a cluster on our Elasticsearch Service or download the stack, spin up the new Azure OpenAI integration, open the curated dashboards in Kibana and start monitoring your Azure OpenAI service!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/LLM-observability.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability with Elastic: Azure OpenAI Part 2]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai-v2</link>
            <guid isPermaLink="false">llm-observability-azure-openai-v2</guid>
            <pubDate>Fri, 23 Aug 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[We have added further capabilities to the Azure OpenAI GA package, which now offer prompt and response monitoring, PTU deployment performance tracking, and billing insights!]]></description>
            <content:encoded><![CDATA[<p>We recently announced GA of the Azure OpenAI integration. You can find details in our previous blog <a href="https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai">LLM Observability: Azure OpenAI</a>.</p>
<p>Since then, we have added further capabilities to the Azure OpenAI GA package, which now offer prompt and response monitoring, PTU deployment performance tracking, and billing insights. Read on to learn more!</p>
<h2>Advanced Logging and Monitoring</h2>
<p>The initial GA release of the integration focused mainly on the native logs, to track the telemetry of the service by using <strong>cognitive services logging</strong>. This version of the Azure OpenAI integration allows you to process the advanced logs which gives a more holistic view of OpenAI resource usage.</p>
<p>To achieve this, you have to setup API Management services in Azure. The API Management service is a centralized place where you can put all OpenAI services endpoints to manage all of them end-to-end. Enable the API Management services and configure the Azure event hub to stream the logs.</p>
<p>To learn more about setting up the API Management service to access Azure OpenAI, please refer to the <a href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/openai/architecture/log-monitor-azure-openai">Azure documentation</a>.</p>
<p>By using advanced logging, you can collect the following log data:</p>
<ul>
<li>Request input text</li>
<li>Response output text</li>
<li>Content filter results</li>
<li>Usage Information
<ul>
<li>Input prompt tokens</li>
<li>Output completion tokens</li>
<li>Total tokens</li>
</ul>
</li>
</ul>
<p>Azure OpenAI integration now collects the API Management Gateway logs. When a question from the user goes to the API Management, it logs the questions and the responses from the GPT models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-azure-openai-log-categories.png" alt="LLM Observability: Azure OpenAI Logs Overview" /></p>
<p>Here’s what a sample log looks like,
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-advance-log-monitoring.png" alt="LLM Observability: Azure OpenAI Advanced Logs" /></p>
<h3>Content filtered results</h3>
<p>Azure OpenAI’s content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. With Azure OpenAI model deployments, you can use the default content filter or create your own content filter.</p>
<p>Now, The integration collects the content filtered result logs. In this example let's create a custom filter in the Azure OpenAI Studio that generates an error log.</p>
<p>By leveraging the <strong>Azure Content Filters</strong>, you can create your own custom lists of terms or phrases to block or flag.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-azure-content-filters.png" alt="LLM Observability: Azure OpenAI Set Content Filter" /></p>
<p>And the document ingested in Elastic would look like this:
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-content-filter-logs.png" alt="LLM Observability: Azure OpenAI Content Filter Logs" />
This screenshot provides insights into the content filtered request.</p>
<h2>PTU Deployment Monitoring</h2>
<p><a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput">Provisioned throughput units (PTU)</a> are units of model processing capacity that you can reserve and deploy for processing prompts and generating completions.</p>
<p>The curated dashboard for PTU Deployment gives comprehensive visibility into metrics such as request latency, active token usage, PTU utilization, and fine-tuning activities, offering a quick snapshot of your deployment's health and performance.</p>
<p>Here are the essential PTU metrics captured by default:</p>
<ul>
<li><strong>Time to Response:</strong> Time taken for the first response to appear after a user send a prompt.</li>
<li><strong>Active Tokens:</strong> Use this metric to understand your TPS or TPM based utilization for PTUs and compare to the benchmarks for target TPS or TPM scenarios.</li>
<li><strong>Provision-managed Utilization V2:</strong> Provides insights into utilization percentages, helping prevent overuse and ensuring efficient resource allocation.</li>
<li><strong>Prompt Token Cache Match Rate:</strong> The prompt token cache hit ratio expressed as a percentage.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-azure_open_ai_ptu_deployment.png" alt="LLM Observability: Azure OpenAI PTU Deployment Metrics Monitoring" /></p>
<h2>Using Billing for cost</h2>
<p>Using the curated overview dashboard you can now monitor the actual usage cost for the AI applications. You are one step away from processing the billing information.</p>
<p>You need to configure and install the <a href="https://www.elastic.co/docs/current/integrations/azure_billing">Azure billing metrics integration</a>. Once the installation is complete the usage cost is visualized for the cognitive services in the Azure OpenAI overview dashboard.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-azure_openai_billing_overview.png" alt="LLM Observability: Azure OpenAI Usage Cost Monitoring" /></p>
<h2>Try it out today</h2>
<p>Deploy a cluster on our <a href="https://www.elastic.co/cloud/elasticsearch-service">Elasticsearch Service</a> or <a href="https://www.elastic.co/downloads/">download</a> the stack, spin up the new Azure OpenAI integration, open the curated dashboards in Kibana and start monitoring your Azure OpenAI service!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/LLM-observability.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability: Azure OpenAI]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai</link>
            <guid isPermaLink="false">llm-observability-azure-openai</guid>
            <pubDate>Mon, 24 Jun 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[We are excited to announce the general availability of the Azure OpenAI Integration that provides comprehensive Observability into the performance and usage of the Azure OpenAI Service!]]></description>
            <content:encoded><![CDATA[<p>We are excited to announce the general availability of the <a href="https://www.elastic.co/integrations/data-integrations?solution=all-solutions&amp;category=azure">Azure OpenAI Integration</a> that provides comprehensive Observability into the performance and usage of the <a href="https://azure.microsoft.com/en-us/products/ai-services/openai-service">Azure OpenAI Service</a>! Also look at <a href="https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai-v2">Part 2 of this blog</a></p>
<p>While we have offered <a href="https://www.elastic.co/observability-labs/blog/monitor-openai-api-gpt-models-opentelemetry">visibility into LLM environments</a> for a while now, the addition of our Azure OpenAI integration enables richer out-of-the-box visibility into the performance and usage of your Azure OpenAI based applications, further enhancing LLM Observability.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/llm-observability-azure-openai-monitoring.png" alt="LLM Observability: Azure OpenAI Monitoring" /></p>
<p>The Azure OpenAI integration leverages <a href="https://www.elastic.co/elastic-agent">Elastic Agent</a>’s Azure integration capabilities to collect both logs (using <a href="https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/stream-monitoring-data-event-hubs">Azure EventHub</a>) and metrics (using <a href="https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/metrics-index">Azure Monitor</a>) to provide deep visibility on the usage of the <a href="https://azure.microsoft.com/en-us/products/ai-services/openai-service">Azure OpenAI Service</a>.</p>
<p>The integration includes an out-of-the-box dashboard that summarizes the most relevant aspects of the service usage, including request and error rates, token usage and chat completion latency.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/llm-observability-azure-openai-monitoring-overview.png" alt="LLM Observability: Azure OpenAI Monitoring Overview" /></p>
<h2>Creating Alerts and SLOs to monitor Azure OpenAI</h2>
<p>As with every other Elastic integration, all the <a href="https://www.elastic.co/docs/current/integrations/azure_openai#logs">logs</a> and <a href="https://www.elastic.co/docs/current/integrations/azure_openai#metrics">metrics</a> information is fully available to leverage in every capability in <a href="https://www.elastic.co/observability">Elastic Observability</a>, including <a href="https://www.elastic.co/guide/en/observability/current/slo.html">SLOs</a>, <a href="https://www.elastic.co/guide/en/observability/current/create-alerts.html">alerting</a>, custom <a href="https://www.elastic.co/guide/en/kibana/current/dashboard.html">dashboards</a>, in-depth <a href="https://www.elastic.co/guide/en/observability/current/monitor-logs.html">logs exploration</a>, etc.</p>
<p>To create an alert to monitor token usage, for example, start with the Custom Threshold rule on the Azure OpenAI datastream and set an aggregation condition to track and report violations of token usage past a certain threshold.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/llm-observability-azure-openai-create-alert.png" alt="LLM Observability: Azure OpenAI Monitoring Alert Creation" /></p>
<p>When a violation occurs, the Alert Details view linked in the alert notification for that alert provides rich context surrounding the violation, such as when the violation started, its current status, and any previous history of such violations, enabling quick triaging, investigation and root cause analysis.</p>
<p>Similarly, to create an SLO to monitor error rates in Azure OpenAI calls, start with the custom query SLI definition adding in the good events to be any result signature at or above 400 over a total value that includes all responses. Then, by setting an appropriate SLO target such as 99%, start monitoring your Azure OpenAI error rate SLO over a period of 7, 30, or 90 days to track degradation and take action before it becomes a pervasive problem.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/llm-observability-azure-openai-create-slo.png" alt="LLM Observability: Azure OpenAI Monitoring SLO Creation" /></p>
<p>Please refer to the <a href="https://www.elastic.co/guide/en/observability/current/monitor-azure-openai.html">User Guide</a> to learn more and to get started!</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/AI_fingertip_touching_human_fingertip.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Migrate Logstash Pipelines from Azure Event Hubs to Kafka Input Plugin]]></title>
            <link>https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin</link>
            <guid isPermaLink="false">migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin</guid>
            <pubDate>Mon, 06 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Step-by-step guide to migrating Logstash pipelines from the Azure Event Hubs plugin to the Kafka input plugin to eliminate offset storage costs and improve performance.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>Azure Event Hubs natively supports the Apache Kafka protocol, which means you no longer need the <code>logstash-input-azure_event_hubs</code> plugin or an external Blob Storage account for offset checkpointing. Switching to <code>logstash-input-kafka</code> removes that storage dependency, reduces costs, and delivers up to 2.5x higher throughput.</p>
<p>This guide walks you through the migration: why it matters, how to convert your existing configuration, parameter mapping between the two plugins, and how to adapt proxy setups.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin/amqp-vs-kafka.png" alt="AMQP vs Kafka protocol path comparison for Logstash with Azure Event Hubs" /></p>
<h2>Why migrate?</h2>
<p>The migration from the Azure Event Hubs plugin to the Kafka input plugin is motivated by several factors:</p>
<ol>
<li>
<p><strong>Azure Event Hubs already speaks Kafka natively.</strong> Event Hubs exposes a <a href="https://learn.microsoft.com/en-us/azure/event-hubs/azure-event-hubs-kafka-overview">built-in Apache Kafka endpoint</a> on Standard, Premium, and Dedicated tiers. This means the <code>logstash-input-azure_event_hubs</code> plugin is no longer necessary. The standard <code>logstash-integration-kafka</code> (input) plugin connects directly to the same service with no extra Azure-side configuration.</p>
</li>
<li>
<p><strong>No more Blob Storage for offset checkpointing.</strong> The AMQP-based plugin requires an <a href="https://learn.microsoft.com/en-us/azure/event-hubs/event-processor-balance-partition-load#checkpoint">external Azure Blob Storage account</a> to track consumer offsets. This means provisioning and maintaining a storage account, plus paying for every checkpoint write. With the Kafka protocol, <a href="https://learn.microsoft.com/en-us/azure/event-hubs/apache-kafka-frequently-asked-questions#event-hubs-consumer-group-vs--kafka-consumer-group">offset tracking is handled internally by Azure Event Hubs at no extra cost</a>, removing the need for external storage.</p>
</li>
<li>
<p><strong>GPv1 storage retirement is coming, and GPv2 costs more.</strong> Microsoft will <a href="https://learn.microsoft.com/en-us/azure/storage/common/general-purpose-version-1-account-migration-overview">retire general-purpose v1 storage accounts in October 2026</a>. Accounts not manually <a href="https://learn.microsoft.com/en-us/azure/storage/common/storage-account-upgrade">upgraded to GPv2</a> by then will be migrated automatically. The <code>logstash-input-azure_event_hubs</code> plugin works correctly with GPv2, so existing pipelines will not break. However, GPv2 can bring <a href="https://learn.microsoft.com/en-us/azure/storage/common/storage-account-upgrade#billing-impact-of-upgrading">higher transactional costs</a>, especially for checkpoint-heavy workloads. By switching to the Kafka input plugin, this concern is eliminated: no storage account means nothing to upgrade and nothing to pay for.</p>
<p><strong>Not ready to migrate yet? Reducing GPv2 costs in the meantime is possible.</strong> GPv2 transaction pricing is significantly more expensive than GPv1's flat rate. Increasing the <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-checkpoint_interval"><code>checkpoint_interval</code></a> setting above its default of 5 seconds reduces write operations and lowers the cost impact. The cost difference can be estimated using the <a href="https://azure.microsoft.com/en-us/pricing/calculator/">Azure Pricing Calculator</a>.</p>
<p>Example for East US and Local Retention Storage. Write operation cost comparison (per 10,000 write operations):</p>
<ul>
<li>
<p><strong>GPv1 (flat):</strong> $0.00036</p>
</li>
<li>
<p><strong>GPv2 (Hot tier):</strong> $0.050</p>
</li>
</ul>
<p>That's roughly a 140x increase in write operation costs.</p>
</li>
<li>
<p><strong>Broader community and active maintenance.</strong> The Kafka input plugin is more widely used across Logstash deployments and receives regular updates aligned with the Kafka ecosystem. Moving to it reduces long-term operational risk and keeps your pipeline on a well-supported path.</p>
</li>
<li>
<p><strong>Better throughput.</strong> The Kafka input plugin consistently outperforms the Azure Event Hubs plugin when consuming from the same namespace. See the <a href="#performance-comparison">Performance Comparison</a> section for measured results.</p>
</li>
</ol>
<h2>Requirements to enable the Kafka interface</h2>
<p>The Kafka interface is built into Azure Event Hubs. You don't need to enable or configure anything in the Azure portal.</p>
<p>The only requirement is that your Event Hubs namespace is on the <strong>Standard</strong>, <strong>Premium</strong>, or <strong>Dedicated</strong> tier. The Basic tier does not support the Kafka protocol.</p>
<p>See the <a href="https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-quotas#basic-vs-standard-vs-premium-vs-dedicated-tiers">Tiers comparison table</a> for details.</p>
<h2>Converting your configuration</h2>
<p>This section walks through converting an existing <code>logstash-input-azure_event_hubs</code> configuration to <code>logstash-input-kafka</code>, starting with the simplest single-hub scenario and building up to multi-hub and advanced use cases.</p>
<h3>Key behavior changes</h3>
<p>Before changing any configuration, be aware of two important differences:</p>
<ol>
<li>
<p><strong>No more Blob Storage for offsets.</strong> The Kafka input plugin tracks offsets internally through the Azure Event Hubs service at no extra cost. The <code>storage_connection</code> and <code>storage_container</code> parameters have no equivalent. There is nothing to provision, maintain, or pay for.</p>
</li>
<li>
<p><strong>Consumer offsets don't carry over.</strong> AMQP consumer groups and Kafka consumer groups are completely separate, even if they share the same name. When the Kafka input plugin connects for the first time, Azure auto-creates the Kafka consumer group specified in <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-group_id"><code>group_id</code></a> (default: <code>logstash</code>). <strong>It will not read the old Blob Storage checkpoints or resume from where the legacy plugin left off.</strong> It starts fresh.</p>
</li>
</ol>
<table>
<thead>
<tr>
<th></th>
<th>Event Hubs (AMQP) consumer groups</th>
<th>Kafka consumer groups</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Protocol</strong></td>
<td>AMQP</td>
<td>Kafka</td>
</tr>
<tr>
<td><strong>Offset storage</strong></td>
<td>External Azure Blob Storage</td>
<td>Internal to the Event Hubs service</td>
</tr>
<tr>
<td><strong>Creation</strong></td>
<td>Must be created via portal, SDK, or ARM</td>
<td>Auto-created on first connection</td>
</tr>
<tr>
<td><strong>Namespace scope</strong></td>
<td>Scoped to a single Event Hub</td>
<td>Span the entire namespace</td>
</tr>
</tbody>
</table>
<p><strong>Limit:</strong> A maximum of 1,000 simultaneous Kafka consumer groups per namespace is allowed. See the <a href="https://learn.microsoft.com/en-us/azure/event-hubs/apache-kafka-frequently-asked-questions#event-hubs-consumer-group-vs--kafka-consumer-group">Event Hubs vs. Kafka Consumer Groups FAQ</a>.</p>
<h3>Authentication</h3>
<p>The <code>logstash-input-azure_event_hubs</code> plugin only supports <strong>SAS (Shared Access Signature)</strong> authentication via connection strings. The same SAS credentials work with the Kafka plugin through SASL PLAIN, as shown in the <a href="#single-event-hub-basic-migration">Single Event Hub (basic migration)</a> example below.</p>
<h3>Single Event Hub (basic migration)</h3>
<p>Most pipelines start with a single Event Hub, SAS authentication, and Blob Storage checkpointing. The following example shows the baseline <code>azure_event_hubs</code> configuration and its direct Kafka equivalent.</p>
<p><strong>Before</strong> (legacy Azure Event Hubs input):</p>
<pre><code class="language-ruby">input {
  azure_event_hubs {
    event_hub_connections =&gt; [&quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;ACCESS_KEY_NAME&gt;;SharedAccessKey=&lt;ACCESS_KEY&gt;;EntityPath=&lt;EVENT_HUB_NAME&gt;&quot;]
    storage_connection =&gt; &quot;DefaultEndpointsProtocol=https;AccountName=&lt;STORAGE_ACCOUNT_NAME&gt;;AccountKey=&lt;STORAGE_ACCOUNT_KEY&gt;;EndpointSuffix=core.windows.net&quot;
    consumer_group =&gt; &quot;&lt;CONSUMER_GROUP_NAME&gt;&quot;
    storage_container =&gt; &quot;&lt;STORAGE_NAME&gt;&quot;
  }
}
</code></pre>
<p><strong>After</strong> (Kafka input):</p>
<pre><code class="language-ruby">input {
  kafka {
    # The Namespace name and the mandatory Kafka SSL port
    bootstrap_servers =&gt; &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    
    topics =&gt; [&quot;&lt;EVENT_HUB_NAME&gt;&quot;]
    group_id =&gt; &quot;&lt;KAFKA_CONSUMER_GROUP_NAME&gt;&quot;
    security_protocol =&gt; &quot;SASL_SSL&quot;
    sasl_mechanism =&gt; &quot;PLAIN&quot;
    
    # Need to create a 'jaas.conf' file storing Username and Password (username is always '$ConnectionString')
    jaas_path =&gt; &quot;path/to/jaas.conf&quot;
  }
}
</code></pre>
<pre><code class="language-text">KafkaClient {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username=&quot;$ConnectionString&quot; 
    password=&quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;ACCESS_KEY_NAME&gt;;SharedAccessKey=&lt;ACCESS_KEY&gt;&quot;;
};
</code></pre>
<pre><code class="language-ruby"># Inline JAAS configuration (substitutes jaas_path)
    sasl_jaas_config =&gt; &quot;org.apache.kafka.common.security.plain.PlainLoginModule required username='$ConnectionString' password='Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;ACCESS_KEY_NAME&gt;;SharedAccessKey=&lt;ACCESS_KEY&gt;';&quot;
</code></pre>
<h3>Multiple Event Hubs with a single Kafka input</h3>
<p>If your SAS policy has <strong>namespace-level read rights</strong> (not just a single Event Hub), you can consume from multiple Event Hubs with a single <code>kafka</code> input by listing multiple topics:</p>
<pre><code class="language-ruby">input {
  kafka {
    bootstrap_servers =&gt; &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    topics =&gt; [&quot;&lt;EVENT_HUB_1&gt;&quot;, &quot;&lt;EVENT_HUB_2&gt;&quot;, &quot;&lt;EVENT_HUB_3&gt;&quot;]
    group_id =&gt; &quot;&lt;KAFKA_CONSUMER_GROUP_NAME&gt;&quot;
    security_protocol =&gt; &quot;SASL_SSL&quot;
    sasl_mechanism =&gt; &quot;PLAIN&quot;
    jaas_path =&gt; &quot;path/to/jaas.conf&quot;
  }
}
</code></pre>
<h2>Configuration parameters mapping</h2>
<p>The following section maps each <code>logstash-input-azure_event_hubs</code> parameter to its <code>logstash-input-kafka</code> equivalent, with usage notes and example configurations.</p>
<ol>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-config_mode"><code>config_mode</code></a>: No direct equivalent. Kafka doesn't have &quot;basic&quot; vs &quot;advanced&quot; modes. To consume from multiple hubs with different settings, define multiple <code>kafka {}</code> input blocks or list multiple topics. The basic mode conversion is covered in <a href="#single-event-hub-basic-migration">Single Event Hub (basic migration)</a>.</p>
<p>Here is an advanced-mode example with two Event Hubs in the same namespace:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        config_mode =&gt; &quot;advanced&quot;
        storage_connection =&gt; &quot;DefaultEndpointsProtocol=https;AccountName=&lt;STORAGE_ACCOUNT&gt;;...&quot;
        event_hubs =&gt; [
            {&quot;&lt;EVENT_HUB_1&gt;&quot; =&gt; {
                event_hub_connection =&gt; &quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;KEY_1&gt;;SharedAccessKey=&lt;ACCESS_KEY&gt;;EntityPath=&lt;EVENT_HUB_1&gt;&quot;
                consumer_group =&gt; &quot;&lt;CONSUMER_GROUP_1&gt;&quot;
            }},
            {&quot;&lt;EVENT_HUB_2&gt;&quot; =&gt; {
                event_hub_connection =&gt; &quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;KEY_2&gt;;SharedAccessKey=&lt;ACCESS_KEY&gt;;EntityPath=&lt;EVENT_HUB_2&gt;&quot;
                consumer_group =&gt; &quot;&lt;CONSUMER_GROUP_2&gt;&quot;
            }}
        ]
    }
}
</code></pre>
<pre><code class="language-ruby">input {
    kafka {
        bootstrap_servers =&gt; &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
        topics =&gt; [&quot;&lt;EVENT_HUB_1&gt;&quot;]
        group_id =&gt; &quot;&lt;KAFKA_CONSUMER_GROUP_1&gt;&quot;
        security_protocol =&gt; &quot;SASL_SSL&quot;
        sasl_mechanism =&gt; &quot;PLAIN&quot;
        sasl_jaas_config =&gt; &quot;...&lt;KEY_1&gt;...&quot;
    }
    kafka {
        bootstrap_servers =&gt; &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
        topics =&gt; [&quot;&lt;EVENT_HUB_2&gt;&quot;]
        group_id =&gt; &quot;&lt;KAFKA_CONSUMER_GROUP_2&gt;&quot;
        security_protocol =&gt; &quot;SASL_SSL&quot;
        sasl_mechanism =&gt; &quot;PLAIN&quot;
        sasl_jaas_config =&gt; &quot;...&lt;KEY_2&gt;...&quot;
    }
}
</code></pre>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-checkpoint_interval"><code>checkpoint_interval</code></a>: This corresponds to <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-auto_commit_interval_ms"><code>auto_commit_interval_ms</code></a>.</p>
<p>In the Azure plugin, this controls how often a write operation hits the Blob Storage container to save the reading offset. In the Kafka plugin, it controls how often the consumer commits its offset to the Event Hubs service.</p>
<p><strong>Note</strong> Keep <code>enable_auto_commit</code> set to <code>true</code> (default) while configuring <code>auto_commit_interval_ms</code> parameter.</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        # ... other params ...
        checkpoint_interval =&gt; 10 # in seconds
    }
}
</code></pre>
<p>Kafka equivalent:</p>
<pre><code class="language-ruby">input {
    kafka {
        # ... other params ...
        auto_commit_interval_ms =&gt; 10000 # in milliseconds 
    }
}
</code></pre>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-decorate_events"><code>decorate_events</code></a>: This parameter exists in both plugins with the same name and behavior.</p>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-initial_position"><code>initial_position</code></a>: This corresponds to <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-auto_offset_reset"><code>auto_offset_reset</code></a>.</p>
<p>Both parameters control where to start reading when no prior offset is found at checkpoint storage. Options differ slightly:</p>
<ul>
<li>
<p>Azure: <code>beginning</code>, <code>end</code>, <code>look_back</code></p>
</li>
<li>
<p>Kafka: <code>earliest</code>, <code>latest</code>, <code>by_duration:&lt;duration&gt;</code>, <code>none</code></p>
</li>
</ul>
<p>The difference between beginning-end and earliest-latest is purely terminology.</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        initial_position =&gt; &quot;beginning&quot;
    }
}
</code></pre>
<p>Kafka equivalent:</p>
<pre><code class="language-ruby">input {
    kafka {
        auto_offset_reset =&gt; &quot;earliest&quot;
    }
}
</code></pre>
<table>
<thead>
<tr>
<th>Azure Value</th>
<th>Kafka Value</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>beginning</code></td>
<td><code>earliest</code></td>
<td></td>
</tr>
<tr>
<td><code>end</code></td>
<td><code>latest</code></td>
<td></td>
</tr>
<tr>
<td><code>look_back</code></td>
<td><code>by_duration:&lt;duration&gt;</code></td>
<td>Duration in ISO 8601 format (e.g., <code>by_duration:PT1H</code> for 1 hour). Requires <code>logstash-integration-kafka</code> 12.1.0+.</td>
</tr>
</tbody>
</table>
<p>The <code>by_duration</code> option was introduced in Apache Kafka client 4.0.0 and is available in <code>logstash-integration-kafka</code> version 12.1.0 and later. The version bundled with the latest Logstash release is older than 12.1.0, so a manual gem update is needed:</p>
<pre><code class="language-bash">&lt;LOGSTASH_HOME&gt;/bin/logstash-plugin install --version 12.1.0 logstash-integration-kafka
</code></pre>
<p>Replace <code>&lt;LOGSTASH_HOME&gt;</code> with the Logstash installation directory (e.g., <code>/usr/share/logstash</code> for DEB/RPM packages).</p>
<p><strong>Note:</strong> Since Kafka can't read the old Blob Storage checkpoints, it treats the migration as a first-time connection. To avoid reprocessing data the legacy plugin already handled, set <code>auto_offset_reset =&gt; &quot;latest&quot;</code> for the initial deployment.</p>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-max_batch_size"><code>max_batch_size</code></a>: This corresponds to <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-max_poll_records"><code>max_poll_records</code></a>.</p>
<p>Both parameters define the maximum number of messages to fetch in a single poll/batch operation.</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        max_batch_size =&gt; 125
    }
}
</code></pre>
<p>Kafka equivalent:</p>
<pre><code class="language-ruby">input {
    kafka {
        max_poll_records =&gt; &quot;125&quot;
    }
}
</code></pre>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-threads"><code>threads</code></a>: This corresponds to <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-consumer_threads"><code>consumer_threads</code></a>.</p>
<p>Both parameters control the number of threads used to consume messages concurrently. In Azure, the minimum is 2 (with 1 Event Hub + 1), while in Kafka the default is 1 thread.</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        threads =&gt; 8
    }
}
</code></pre>
<p>Kafka equivalent:</p>
<pre><code class="language-ruby">input {
    kafka {
        consumer_threads =&gt; 8
    }
}
</code></pre>
</li>
</ol>
<h2>Performance Comparison</h2>
<p>We tested both plugins under identical conditions: same Logstash instance, same Event Hub namespace, same number of partitions, and same batch/thread configuration. The absolute numbers are environment-specific, but the relative difference is what matters.</p>
<table>
<thead>
<tr>
<th><strong>Plugin</strong></th>
<th><strong>Payload</strong></th>
<th><strong>Throughput (events/s)</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td><code>azure_event_hubs</code></td>
<td>100B</td>
<td>~5700</td>
</tr>
<tr>
<td><code>kafka</code></td>
<td>100B</td>
<td>~14500</td>
</tr>
<tr>
<td><code>azure_event_hubs</code></td>
<td>1KB</td>
<td>~1500</td>
</tr>
<tr>
<td><code>kafka</code></td>
<td>1KB</td>
<td>~3200</td>
</tr>
<tr>
<td><code>azure_event_hubs</code></td>
<td>10KB</td>
<td>~170</td>
</tr>
<tr>
<td><code>kafka</code></td>
<td>10KB</td>
<td>~290</td>
</tr>
</tbody>
</table>
<p>Across all payload sizes, the Kafka input plugin delivers 1.7x to 2.5x higher throughput. The gain is most noticeable with small payloads, where protocol overhead dominates. Beyond the infrastructure simplification (no Blob Storage, no GPv2 concerns), you also get a clear performance win.</p>
<h2>Proxy connection configuration</h2>
<blockquote>
<p>If the Logstash instance connects directly to Azure Event Hubs without a proxy, this section can be skipped.</p>
</blockquote>
<p>Proxy setups require special attention during this migration because the two plugins use fundamentally different protocols.</p>
<h3>Azure Event Hubs plugin setup (reference)</h3>
<p>The <code>logstash-input-azure_event_hubs</code> plugin supports HTTPS proxies. The setup involves:</p>
<ol>
<li>
<p>Set the proxy environment variable:</p>
<pre><code class="language-bash">export https_proxy=&quot;https://my_proxy:8080&quot;
</code></pre>
</li>
<li>
<p>Add the WebSockets transport flag to the Event Hubs connection string:</p>
<pre><code class="language-text">;TransportType=AmqpWebSockets
</code></pre>
</li>
<li>
<p>Add the following JVM options (Logstash <code>jvm.options</code>):</p>
<pre><code class="language-text">-Dhttp.proxyHost=my_proxy
-Dhttp.proxyPort=8080
-Dhttps.proxyHost=my_proxy
-Dhttps.proxyPort=8443
-Dhttp.nonProxyHosts=localhost|127.0.0.1
</code></pre>
</li>
</ol>
<h3>Migrating to a TCP (Layer 4) proxy</h3>
<p>The proxy setup from the Azure plugin is not compatible with the Kafka client. The Azure plugin communicates over AMQP/WebSockets (HTTP layer), which is why JVM proxy settings and <code>TransportType=AmqpWebSockets</code> work. The Kafka plugin opens a raw TCP socket to the broker. It never makes an HTTP request, so <strong>JVM HTTP proxy settings are ignored entirely</strong>. If the environment requires a proxy, the HTTP proxy needs to be replaced with a TCP (Layer 4) proxy.</p>
<h4>Step 1: Configure <code>/etc/hosts</code></h4>
<p>The Kafka client verifies that the TLS certificate matches the hostname in <code>bootstrap_servers</code>. Since the certificate is issued for <code>*.servicebus.windows.net</code>, <code>bootstrap_servers</code> must use the real Event Hubs FQDN, not the proxy address. A DNS override routes the FQDN to the proxy IP:</p>
<pre><code class="language-text"># /etc/hosts
&lt;PROXY_HOST_IP&gt;  &lt;NAMESPACE&gt;.servicebus.windows.net
</code></pre>
<h4>Step 2: Logstash configuration</h4>
<p>The Logstash configuration is identical to a non-proxied setup. The <code>/etc/hosts</code> override transparently routes traffic through the proxy, so <code>bootstrap_servers</code> still uses the Event Hubs FQDN:</p>
<pre><code class="language-ruby">input {
  kafka {
    bootstrap_servers =&gt; &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    topics =&gt; [&quot;&lt;EVENT_HUB_NAME&gt;&quot;]
    security_protocol =&gt; &quot;SASL_SSL&quot;
    sasl_mechanism =&gt; &quot;PLAIN&quot;
    group_id =&gt; &quot;&lt;GROUP_ID&gt;&quot;
    jaas_path =&gt; &quot;&lt;PATH_TO_JAAS_FILE&gt;&quot;
  }
}
</code></pre>
<p>If the TCP proxy runs on the same host as Logstash or within a trusted network segment, the DNS override is not needed. Instead, point <code>bootstrap_servers</code> directly to the proxy IP (e.g., <code>&lt;PROXY_HOST_IP&gt;:9093</code>) and change <code>security_protocol</code> to <code>SASL_PLAINTEXT</code>. This delegates the TLS handshake to the proxy, while the link between Logstash and the proxy stays unencrypted. Only use this configuration when the Logstash-to-proxy path is secure.</p>
<pre><code class="language-ruby">input {
  kafka {
    bootstrap_servers =&gt; &quot;&lt;PROXY_HOST_IP&gt;:9093&quot;
    security_protocol =&gt; &quot;SASL_PLAINTEXT&quot;
  }
}
</code></pre>
<h2>Frequently asked questions</h2>
<p><strong>Are events lost when switching from the Azure Event Hubs plugin to the Kafka plugin?</strong></p>
<p>No. Events remain available within the configured retention period regardless of which protocol reads them. What changes is where the consumer starts reading. Since the Kafka plugin cannot access the AMQP plugin's Blob Storage checkpoints, it starts from scratch. Set <code>auto_offset_reset =&gt; &quot;earliest&quot;</code> to reprocess all retained events, or <code>auto_offset_reset =&gt; &quot;latest&quot;</code> to consume only new ones from the switchover point. See the <a href="#configuration-parameters-mapping"><code>initial_position</code> mapping</a> for details.</p>
<p><strong>What happens to the Azure Blob Storage account after migration?</strong></p>
<p>It is no longer needed for offset checkpointing. Once the Kafka plugin is confirmed to be consuming correctly and the <code>azure_event_hubs</code> input has been decommissioned, the storage account (or at least the checkpoint container) can be safely deleted. If the storage account is used for other purposes, only remove the specific container referenced in <code>storage_container</code>.</p>
<p><strong>Can the same consumer group name be reused?</strong></p>
<p>Yes, but it has no practical effect. AMQP and Kafka consumer groups are completely independent even if they share the same name. They use different protocols, different offset storage, and different scoping rules. Reusing the name will not cause the Kafka plugin to resume from the AMQP plugin's last checkpoint.</p>
<p><strong>Are other authentication methods supported?</strong></p>
<p>The <code>logstash-input-azure_event_hubs</code> plugin only supports SAS connection strings, so SAS is the only credential that needs to be carried over. There is no Entra ID, OAUTHBEARER, or managed identity configuration to migrate. The <code>logstash-input-kafka</code> plugin does support SASL OAUTHBEARER, so adopting token-based authentication becomes possible after migration.</p>
<p><strong>What if the proxy only allows traffic on port 443?</strong></p>
<p>The Kafka endpoint on Azure Event Hubs requires port 9093. If the TCP proxy only forwards port 443, it must be reconfigured to also allow port 9093 for the Event Hubs FQDN (<code>*.servicebus.windows.net</code>). Azure Event Hubs does not expose a Kafka listener on port 443.</p>
<h2>Next steps</h2>
<p>With the GPv1 retirement deadline (October 2026) approaching, starting this migration sooner reduces the time spent managing storage infrastructure that is no longer needed.</p>
<p>If any issues arise during migration:</p>
<ul>
<li>
<p><strong>Usage questions or help with configuration</strong>: Post on the <a href="https://discuss.elastic.co/c/logstash/">Elastic Discuss forum</a>.</p>
</li>
<li>
<p><strong>Bugs or unexpected behavior in the Kafka plugin</strong>: Open an issue in the <a href="https://github.com/logstash-plugins/logstash-integration-kafka/issues">logstash-integration-kafka</a>.</p>
</li>
</ul>
<h2>Related resources</h2>
<ul>
<li><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html">Kafka input plugin documentation</a>: Full reference for all <code>logstash-input-kafka</code> configuration parameters.</li>
<li><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html">Azure Event Hubs input plugin documentation</a>: Full reference for the legacy plugin being replaced.</li>
<li><a href="https://learn.microsoft.com/en-us/azure/event-hubs/azure-event-hubs-kafka-overview">Azure Event Hubs for Apache Kafka overview</a>: Microsoft's documentation on the built-in Kafka endpoint in Event Hubs.</li>
<li><a href="https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-quotas#basic-vs-standard-vs-premium-vs-dedicated-tiers">Event Hubs quotas and tier comparison</a>: Tier requirements for Kafka protocol support.</li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin/elastic-blog-logstash-kafka.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Migrate Logstash Pipelines from Azure Event Hubs to OTel Collector Kafka Receiver]]></title>
            <link>https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-otel-collector-kafka-receiver</link>
            <guid isPermaLink="false">migrate-logstash-pipelines-from-azure-event-hubs-to-otel-collector-kafka-receiver</guid>
            <pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Step-by-step guide to migrating Logstash pipelines from the Azure Event Hubs plugin to the OpenTelemetry Collector Kafka receiver.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>This article is a companion guide to the <a href="https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin">Logstash Azure Event Hubs to Kafka input plugin migration</a>, covering an alternative path: replacing <code>logstash-input-azure_event_hubs</code> with the OpenTelemetry Collector <code>kafka</code> receiver to consume from the Azure Event Hubs Kafka endpoint. For the reasons to migrate, authentication considerations, and key behavior changes such as offset handling, refer to the original article.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-logstash-pipelines-from-azure-event-hubs-to-otel-collector-kafka-receiver/amqp-vs-kafka_OTel.png" alt="AMQP vs Kafka protocol path comparison in Otel Collector connected to Azure Event Hubs" /></p>
<blockquote>
<p><strong>Reference</strong>: For detailed OTel Kafka receiver configuration options or parameter default values, see the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kafkareceiver">Kafka Receiver README</a>.</p>
</blockquote>
<h2>Converting your configuration</h2>
<h3>TLS configuration</h3>
<p>Azure Event Hubs requires TLS for all Kafka connections on port 9093. The <code>tls: {}</code> block enables TLS with default settings (system CA certificates, no client certificate), which is sufficient for Azure Event Hubs. Omitting this block will cause the connection to fail because the broker expects a TLS handshake.</p>
<h3>Encoding</h3>
<p>The <code>encoding</code> field controls how the receiver interprets each Kafka message payload. For events consumed from Azure Event Hubs, the most common options are:</p>
<ul>
<li><code>text</code>: decodes the payload as text and inserts it as the body of a log record. Uses UTF-8 by default; use <code>text_&lt;ENCODING&gt;</code> (e.g., <code>text_shift_jis</code>) for other character sets.</li>
<li><code>raw</code>: inserts the payload bytes as-is into the log record body.</li>
<li><code>json</code>: decodes the payload as JSON and inserts it as the log record body.</li>
<li><code>azure_resource_logs</code>: converts Azure Resource Logs format to OpenTelemetry format.</li>
</ul>
<p>Additional encodings such as <code>otlp_proto</code>, <code>otlp_json</code>, and trace-specific formats (<code>jaeger_proto</code>, <code>zipkin_json</code>, etc.) are also available. See the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kafkareceiver">Kafka Receiver README</a> for the full list.</p>
<h3>Basic configuration</h3>
<p>Minimal configuration to consume logs from one Event Hub with SASL/PLAIN.</p>
<pre><code class="language-yaml">receivers:
  kafka:
    brokers:
      - &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    group_id: &quot;&lt;CONSUMER_GROUP_NAME&gt;&quot;
    auth:
      sasl:
        username: &quot;$ConnectionString&quot;
        password: &quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;ACCESS_KEY_NAME&gt;;SharedAccessKey=&lt;ACCESS_KEY&gt;&quot;
        mechanism: &quot;PLAIN&quot;
    tls: {}
    logs:
      topics:
        - &quot;&lt;EVENT_HUB_NAME&gt;&quot;
      encoding: text
</code></pre>
<h3>Advanced configuration</h3>
<p>Example with multiple Event Hubs.</p>
<pre><code class="language-yaml">receivers:
  kafka/eh1:
    brokers:
      - &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    group_id: &quot;&lt;CONSUMER_GROUP_1&gt;&quot;
    auth:
      sasl:
        username: &quot;$ConnectionString&quot;
        password: &quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;KEY_1&gt;;SharedAccessKey=&lt;ACCESS_KEY_1&gt;&quot;
        mechanism: &quot;PLAIN&quot;
    tls: {}
    logs:
      topics:
        - &quot;&lt;EVENT_HUB_1&gt;&quot;
      encoding: text

  kafka/eh2:
    brokers:
      - &quot;&lt;NAMESPACE&gt;.servicebus.windows.net:9093&quot;
    group_id: &quot;&lt;CONSUMER_GROUP_2&gt;&quot;
    auth:
      sasl:
        username: &quot;$ConnectionString&quot;
        password: &quot;Endpoint=sb://&lt;NAMESPACE&gt;.servicebus.windows.net/;SharedAccessKeyName=&lt;KEY_2&gt;;SharedAccessKey=&lt;ACCESS_KEY_2&gt;&quot;
        mechanism: &quot;PLAIN&quot;
    tls: {}
    logs:
      topics:
        - &quot;&lt;EVENT_HUB_2&gt;&quot;
      encoding: text
</code></pre>
<h2>Configuration parameters mapping</h2>
<p>The following section maps each <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html"><code>logstash-input-azure_event_hubs</code></a> parameter to its OpenTelemetry Collector <code>kafka</code> receiver equivalent.</p>
<ol>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-checkpoint_interval"><code>checkpoint_interval</code></a>: Direct mapping to <code>autocommit.interval</code>.</p>
<p><strong>Units</strong>: Azure <code>checkpoint_interval</code> is in <strong>seconds</strong>. OTel <code>autocommit.interval</code> requires a duration string (e.g., <code>10s</code>, <code>500ms</code>).</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        # ... other params ...
        checkpoint_interval =&gt; 10 # Default 5
    }
}
</code></pre>
<p>OTel receiver equivalent:</p>
<pre><code class="language-yaml">receivers:
  kafka:
    # ... other params ...
    autocommit:
      interval: 10s # Default 1s
</code></pre>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-initial_position"><code>initial_position</code></a>: Maps to <code>initial_offset</code>.</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        initial_position =&gt; &quot;end&quot;
    }
}
</code></pre>
<p>OTel receiver equivalent:</p>
<pre><code class="language-yaml">receivers:
  kafka:
    initial_offset: latest
</code></pre>
<p>Value mapping:</p>
<table>
<thead>
<tr>
<th>Azure value</th>
<th>OTel value</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>beginning</code></td>
<td><code>earliest</code></td>
</tr>
<tr>
<td><code>end</code></td>
<td><code>latest</code> (default)</td>
</tr>
<tr>
<td><code>look_back</code></td>
<td>Not directly supported</td>
</tr>
</tbody>
</table>
<p><strong>Note:</strong> Since the Kafka receiver can't read the old Blob Storage checkpoints, it treats the migration as a first-time connection. To avoid reprocessing data the legacy plugin already handled, set <code>initial_offset: latest</code> for the initial deployment.</p>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-max_batch_size"><code>max_batch_size</code></a>: No direct 1:1 mapping.</p>
<p>In OTel, the maximum batch of events processed cannot be directly controlled by the receiver. The receiver only controls how much data is read per fetch request using <code>min_fetch_size</code>, <code>max_fetch_size</code>, and <code>max_fetch_wait</code>.</p>
<p>The actual event batching happens at the processing layer via the <a href="https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md"><code>batch processor</code></a>, which groups telemetry at the configured pipeline stage.</p>
<p><strong>Units</strong>: <code>min_fetch_size</code> and <code>max_fetch_size</code> are in <strong>bytes</strong>. <code>max_fetch_wait</code> uses duration strings (e.g., <code>250ms</code>). <code>send_batch_size</code> is the <strong>number of records</strong>. <code>timeout</code> uses duration strings (e.g., <code>5s</code>).</p>
<p>Azure config:</p>
<pre><code class="language-ruby">input {
    azure_event_hubs {
        max_batch_size =&gt; 125
    }
}
</code></pre>
<p>OTel receiver example:</p>
<pre><code class="language-yaml">receivers:
  kafka:
    max_fetch_size: 2097152  # bytes (2 MiB)
    max_fetch_wait: 250ms

processors:
  batch:
    send_batch_size: 125  # number of log records
</code></pre>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-threads"><code>threads</code></a>: No direct mapping.</p>
<p>Event Hubs distribute work by partition. A single Collector Kafka client can read from multiple partitions in parallel because the underlying Kafka client (<a href="https://pkg.go.dev/github.com/twmb/franz-go">franz-go</a>) uses internal goroutines to fetch and process partition data concurrently. This concurrency is handled internally and is not configurable via a user-facing <code>threads</code> setting.</p>
</li>
<li>
<p><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html#plugins-inputs-azure_event_hubs-decorate_events"><code>decorate_events</code></a>: Not supported by Kafka receiver.</p>
</li>
</ol>
<h2>Performance comparison</h2>
<p>These results use the same test environment described in the <a href="https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin">companion article</a>: same Event Hub namespace, same number of partitions, and same batch/thread configuration. The absolute numbers are environment-specific, but the relative difference is what matters.</p>
<table>
<thead>
<tr>
<th><strong>Component</strong></th>
<th><strong>Payload</strong></th>
<th><strong>Throughput (events/s)</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Logstash <code>azure_event_hubs</code> plugin</td>
<td>100B</td>
<td>~5700</td>
</tr>
<tr>
<td>OTel Collector <code>kafka</code> receiver</td>
<td>100B</td>
<td>~10900</td>
</tr>
<tr>
<td>Logstash <code>azure_event_hubs</code> plugin</td>
<td>1KB</td>
<td>~1500</td>
</tr>
<tr>
<td>OTel Collector <code>kafka</code> receiver</td>
<td>1KB</td>
<td>~1900</td>
</tr>
<tr>
<td>Logstash <code>azure_event_hubs</code> plugin</td>
<td>10KB</td>
<td>~170</td>
</tr>
<tr>
<td>OTel Collector <code>kafka</code> receiver</td>
<td>10KB</td>
<td>~190</td>
</tr>
</tbody>
</table>
<p>Across all payload sizes, the OTel Collector <code>kafka</code> receiver outperforms the Logstash <code>azure_event_hubs</code> plugin, with the largest gain at small payloads (~1.9x at 100B) where protocol overhead dominates, narrowing at larger sizes (~1.3x at 1KB, ~1.1x at 10KB). It does not reach the throughput of the Logstash <code>kafka</code> plugin from the <a href="https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin">companion article</a>, but it improves on the legacy plugin across all tested payload sizes. Combined with the removal of the Blob Storage and GPv2 dependencies, the OTel Collector path removes two pieces of infrastructure that need to be provisioned, secured, and monitored.</p>
<h2>Conclusions</h2>
<p>Both migration paths eliminate the Blob Storage checkpoint dependency and improve throughput over the legacy <code>azure_event_hubs</code> plugin. The Logstash <code>kafka</code> plugin is the lower-friction option: the configuration change is minimal, the offset model carries over, and it delivers the highest throughput of the options tested. The OTel Collector <code>kafka</code> receiver is the better fit if you want to remove Logstash from the pipeline entirely and align with OpenTelemetry. It trades a lower peak throughput and no <code>decorate_events</code> equivalent for a vendor-neutral ingestion layer that can run alongside other OTel Collector pipelines in the same Collector.</p>
<h2>Next steps</h2>
<p>With the GPv1 retirement deadline (October 2026) approaching, starting this migration sooner reduces the time spent managing storage infrastructure that is no longer needed.</p>
<p>If any issues arise during migration:</p>
<ul>
<li>
<p><strong>Usage questions or help with configuration</strong>: Post on the <a href="https://github.com/open-telemetry/opentelemetry-collector/discussions">OpenTelemetry Collector GitHub Discussions</a> or the <a href="https://discuss.elastic.co/c/observability/">Elastic Discuss forum</a>.</p>
</li>
<li>
<p><strong>Bugs or unexpected behavior in the Kafka receiver</strong>: Open an issue in the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/issues">opentelemetry-collector-contrib</a> repository.</p>
</li>
</ul>
<h2>Related resources</h2>
<ul>
<li><a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kafkareceiver">Kafka receiver documentation</a>: Full reference for all OTel Collector <code>kafka</code> receiver configuration parameters.</li>
<li><a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-azure_event_hubs.html">Azure Event Hubs input plugin documentation</a>: Full reference for the legacy plugin being replaced.</li>
<li><a href="https://www.elastic.co/observability-labs/blog/migrate-logstash-pipelines-from-azure-event-hubs-to-kafka-plugin">Logstash Azure Event Hubs to Kafka input plugin migration</a>: Companion guide covering the alternative migration path to the <code>logstash-input-kafka</code> plugin.</li>
<li><a href="https://learn.microsoft.com/en-us/azure/event-hubs/azure-event-hubs-kafka-overview">Azure Event Hubs for Apache Kafka overview</a>: Microsoft's documentation on the built-in Kafka endpoint in Event Hubs.</li>
<li><a href="https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-quotas#basic-vs-standard-vs-premium-vs-dedicated-tiers">Event Hubs quotas and tier comparison</a>: Tier requirements for Kafka protocol support.</li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/migrate-logstash-pipelines-from-azure-event-hubs-to-otel-collector-kafka-receiver/elastic-blog-otel-kafka.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Trace your Azure Function application with Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/trace-azure-function-application-observability</link>
            <guid isPermaLink="false">trace-azure-function-application-observability</guid>
            <pubDate>Tue, 16 May 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Serverless applications deployed on Azure Functions are growing in usage. This blog shows how to deploy a serverless application on Azure functions with Elastic Agent and use Elastic's APM capability to manage and troubleshoot issues.]]></description>
            <content:encoded><![CDATA[<p>Adoption of Azure Functions in cloud-native applications on Microsoft Azure has been increasing exponentially over the last few years. Serverless functions, such as the Azure Functions, provide a high level of abstraction from the underlying infrastructure and orchestration, given these tasks are managed by the cloud provider. Software development teams can then focus on the implementation of business and application logic. Some additional benefits include billing for serverless functions based on the actual compute and memory resources consumed, along with automatic on-demand scaling.</p>
<p>While the benefits of using serverless functions are manifold, it is also necessary to make them observable in the wider end-to-end microservices architecture context.</p>
<h2>Elastic Observability (APM) for Azure Functions: The architecture</h2>
<p><a href="https://www.elastic.co/blog/whats-new-elastic-observability-8-7-0">Elastic Observability 8.7</a> introduced distributed tracing for Microsoft Azure Functions — available for the Elastic APM Agents for .NET, Node.js, and Python. Auto-instrumentation of HTTP requests is supported out-of-the-box, enabling the detection of performance bottlenecks and sources of errors.</p>
<p>The key components of the solution for observing Azure Functions are:</p>
<ol>
<li>The Elastic APM Agent for the relevant language</li>
<li>Elastic Observability</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-azure-function.png" alt="azure function" /></p>
<p>The APM server validates and processes incoming events from individual APM Agents and transforms them into Elasticsearch documents. The APM Agent provides auto-instrumentation capabilities for the application being observed. The Node.js APM Agent can trace function invocations in an Azure Functions app.</p>
<h2>Setting up Elastic APM for Azure Functions</h2>
<p>To demonstrate the setup and usage of Elastic APM, we will use a <a href="https://github.com/elastic/azure-functions-apm-nodejs-sample-app">sample Node.js application</a>.</p>
<h3>Application overview</h3>
<p>The Node.js application has two <a href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook">HTTP-triggered</a> functions named &quot;<a href="https://github.com/elastic/azure-functions-apm-nodejs-sample-app/blob/main/Hello/index.js">Hello</a>&quot; and &quot;<a href="https://github.com/elastic/azure-functions-apm-nodejs-sample-app/blob/main/Goodbye/index.js">Goodbye</a>.&quot; Once deployed, they can be called as follows, and tracing data will be sent to the configured Elastic Observability deployment.</p>
<pre><code class="language-bash">curl -i https://&lt;APP_NAME&gt;.azurewebsites.net/api/hello
curl -i https://&lt;APP_NAME&gt;.azurewebsites.net/api/goodbye
</code></pre>
<h3>Setup</h3>
<p><strong>Step 0. Prerequisites</strong></p>
<p>To run the sample application, you will need:</p>
<ul>
<li>
<p>An installation of <a href="https://nodejs.org/">Node.js</a> (v14 or later)</p>
</li>
<li>
<p>Access to an Azure subscription with an appropriate role to create resources</p>
</li>
<li>
<p>The <a href="https://learn.microsoft.com/en-us/cli/azure/install-azure-cli">Azure CLI (az)</a> logged into an Azure subscription</p>
<ol>
<li>Use az login to login</li>
<li>See the output of az account show</li>
</ol>
</li>
<li>
<p>The <a href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-run-local?tabs=v4%2Cwindows%2Ccsharp%2Cportal%2Cbash#install-the-azure-functions-core-tools">Azure Functions Core Tools (func)</a> (func --version should show a 4.x version)</p>
</li>
<li>
<p>An Elastic Observability deployment to which monitoring data will be sent</p>
<ol>
<li>The simplest way to get started with Elastic APM Microsoft Azure is through Elastic Cloud. <a href="https://www.elastic.co/guide/en/elastic-stack-deploy/current/azure-marketplace-getting-started.html">Get started with Elastic Cloud on Azure Marketplace</a> or <a href="https://www.elastic.co/cloud/elasticsearch-service/signup">sign up for a trial on Elastic Cloud</a>.</li>
</ol>
</li>
<li>
<p>The APM server URL (serverUrl) and secret token (secretToken) from your Elastic stack deployment for configuration below</p>
<ol>
<li><a href="https://www.elastic.co/guide/en/apm/guide/8.7/install-and-run.html">How to get the serverUrl and secretToken documentation</a></li>
</ol>
</li>
</ul>
<p><strong>Step 1. Clone the sample application repo and install dependencies</strong></p>
<pre><code class="language-bash">git clone https://github.com/elastic/azure-functions-apm-nodejs-sample-app.git
cd azure-functions-apm-nodejs-sample-app
npm install
</code></pre>
<p><strong>Step 2. Deploy the Azure Function App</strong><br />
Caution icon! Deploying a function app to Azure can incur <a href="https://azure.microsoft.com/en-us/pricing/details/functions/">costs</a>. The following setup uses the free tier of Azure Functions. Step 5 covers the clean-up of resources.</p>
<p><strong>Step 2.1</strong><br />
To avoid name collisions with others that have independently run this demo, we need a short unique identifier for some resource names that need to be globally unique. We'll call it the DEMO_ID. You can run the following to generate one and save it to DEMO_ID and the &quot;demo-id&quot; file.</p>
<pre><code class="language-bash">if [[ ! -f demo-id ]]; then node -e 'console.log(crypto.randomBytes(3).toString(&quot;hex&quot;))' &gt;demo-id; fi
export DEMO_ID=$(cat demo-id)
echo $DEMO_ID
</code></pre>
<p><strong>Step 2.2</strong><br />
Before you can deploy to Azure, you will need to create some Azure resources: a Resource Group, Storage Account, and the Function App. For this demo, you can use the following commands. (See <a href="https://learn.microsoft.com/en-us/azure/azure-functions/create-first-function-cli-node#create-supporting-azure-resources-for-your-function">this Azure docs section</a> for more details.)</p>
<pre><code class="language-bash">REGION=westus2   # Or use another region listed in 'az account list-locations'.
az group create --name &quot;AzureFnElasticApmNodeSample-rg&quot; --location &quot;$REGION&quot;
az storage account create --name &quot;eapmdemostor${DEMO_ID}&quot; --location &quot;$REGION&quot; \
    --resource-group &quot;AzureFnElasticApmNodeSample-rg&quot; --sku Standard_LRS
az functionapp create --name &quot;azure-functions-apm-nodejs-sample-app-${DEMO_ID}&quot; \
    --resource-group &quot;AzureFnElasticApmNodeSample-rg&quot; \
    --consumption-plan-location &quot;$REGION&quot; --runtime node --runtime-version 18 \
    --functions-version 4 --storage-account &quot;eapmdemostor${DEMO_ID}&quot;
</code></pre>
<p><strong>Step 2.3</strong><br />
Next, configure your Function App with the APM server URL and secret token for your Elastic deployment. This can be done in the <a href="https://portal.azure.com/">Azure Portal</a> or with the az CLI.</p>
<p>In the Azure portal, browse to your Function App, then its Application Settings (<a href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-how-to-use-azure-function-app-settings?tabs=portal#settings">Azure user guide</a>). You'll need to add two settings:</p>
<p>First set your APM URL and token.</p>
<pre><code class="language-bash">export ELASTIC_APM_SERVER_URL=&quot;&lt;your serverUrl&gt;&quot;
export ELASTIC_APM_SECRET_TOKEN=&quot;&lt;your secretToken&gt;&quot;
</code></pre>
<p>Or you can use the az functionapp config appsettings set ... CLI command as follows:</p>
<pre><code class="language-bash">az functionapp config appsettings set \
  -g &quot;AzureFnElasticApmNodeSample-rg&quot; -n &quot;azure-functions-apm-nodejs-sample-app-${DEMO_ID}&quot; \
  --settings &quot;ELASTIC_APM_SERVER_URL=${ELASTIC_APM_SERVER_URL}&quot;
az functionapp config appsettings set \
  -g &quot;AzureFnElasticApmNodeSample-rg&quot; -n &quot;azure-functions-apm-nodejs-sample-app-${DEMO_ID}&quot; \
  --settings &quot;ELASTIC_APM_SECRET_TOKEN=${ELASTIC_APM_SECRET_TOKEN}&quot;
</code></pre>
<p>The ELASTIC_APM_SERVER_URL and ELASTIC_APM_SECRET_TOKEN are set in Azure function’s settings for the app and used by the Elastic APM Agent. This is initiated by the initapm.js file, which starts the Elastic APM agent with:</p>
<pre><code class="language-javascript">require(&quot;elastic-apm-node&quot;).start();
</code></pre>
<p>When you log in to Azure and look at the function’s configuration, you will see them set:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-azure-functions-application-settings.png" alt="azure functions application settings" /></p>
<p><strong>Step 2.4</strong><br />
Now you can publish your app. (Re-run this command every time you make a code change.)</p>
<pre><code class="language-bash">func azure functionapp publish &quot;azure-functions-apm-nodejs-sample-app-${DEMO_ID}&quot;
</code></pre>
<p>You should log in to Azure to see the function running.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-azure-function-app.png" alt="azure function app" /></p>
<p><strong>Step 3. Try it out</strong></p>
<pre><code class="language-bash">% curl https://azure-functions-apm-nodejs-sample-app-${DEMO_ID}.azurewebsites.net/api/Hello
{&quot;message&quot;:&quot;Hello.&quot;}
% curl https://azure-functions-apm-nodejs-sample-app-${DEMO_ID}.azurewebsites.net/api/Goodbye
{&quot;message&quot;:&quot;Goodbye.&quot;}
</code></pre>
<p>In a few moments, the APM app in your Elastic deployment will show tracing data for your Azure Function app.</p>
<p><strong>Step 4. Apply some load to your app</strong><br />
To get some more interesting data, you can run the following to generate some load on your deployed function app:</p>
<pre><code class="language-bash">npm run loadgen
</code></pre>
<p>This uses the <a href="https://github.com/mcollina/autocannon">autocannon</a> node package to generate some light load (2 concurrent users, each calling at 5 requests/s for 60s) on the &quot;Goodbye&quot; function.</p>
<p><strong>Step 5. Clean up resources</strong><br />
If you deployed to Azure, you should make sure to delete any resources so you don't incur any costs.</p>
<pre><code class="language-bash">az group delete --name &quot;AzureFnElasticApmNodeSample-rg&quot;
</code></pre>
<h2>Analyzing Azure Function APM data in Elastic</h2>
<p>Once you have successfully set up the sample application and started generating load, you should see APM data appearing in the Elastic Observability APM Services capability.</p>
<h2>Service map</h2>
<p>With the default setup, you will see two services in the APM Service map.</p>
<p>The main function: azure-functions-apm-nodejs-sample-app</p>
<p>And the end point where your function is accessible: azure-functions-apm-nodejs-sample-app-ec7d4c.azurewebsites.net</p>
<p>You will see that there is a connection between the two as your application is taking requests and answering through the endpoint.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-observability-services.png" alt="observability services" /></p>
<p>From the <a href="https://www.elastic.co/observability/application-performance-monitoring">APM Service</a> map you can further investigate the function, analyze traces, look at logs, and more.</p>
<h3>Service details</h3>
<p>When we dive into the details, we can see several items.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-observability-azure-functions-apm.png" alt="observability azure functions apm" /></p>
<ul>
<li>Latency for the recent load we ran against the application</li>
<li>Transactions (Goodbye and Hello)</li>
<li>Average throughput</li>
<li>And more</li>
</ul>
<h3>Transaction details</h3>
<p>We can see transaction details.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-observability-get-api-goodbye.png" alt="observability get api goodbye" /></p>
<p>An individual trace shows us that the &quot;Goodbye&quot; function <a href="https://github.com/elastic/azure-functions-apm-nodejs-sample-app/blob/main/Goodbye/index.js#L6-L10">calls the &quot;Hello&quot; function</a> in the same function app before returning:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-latency-distribution-trace-sample.png" alt="latency distribution trace sample" /></p>
<h3>Machine learning based latency correlation</h3>
<p>As we’ve mentioned in other blogs, we can also correlate issues such as higher than normal latency. Since we see a spike at 1s, we run the embedded latency correlation, which uses machine learning to help analyze the potential impacting component by analyzing logs, metrics, and traces.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-latency-distribution-correlations.png" alt="latency distribution correlations" /></p>
<p>The correlation indicated there is a potential cause (25%) due to the host sending the load (my machine).</p>
<h3>Cold start detection</h3>
<p>Also, we can see the impact a <a href="https://azure.microsoft.com/en-ca/blog/understanding-serverless-cold-start/">cold start</a> can have on the latency of a request:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/blog-elastic-trace-sample.png" alt="trace sample" /></p>
<h2>Summary</h2>
<p>Elastic Observability provides real-time monitoring of Azure Functions in your production environment for a broad range of use cases. Curated dashboards assist DevOps teams in performing root cause analysis for performance bottlenecks and errors. SRE teams can quickly view upstream and downstream dependencies, as well as perform analyses in the context of distributed microservices architecture.</p>
<h2>Learn more</h2>
<p>To learn how to add the Elastic APM Agent to an existing Node.js Azure Function app, read <a href="https://www.elastic.co/guide/en/apm/agent/nodejs/master/azure-functions.html">Monitoring Node.js Azure Functions</a>. Additional resources include:</p>
<ul>
<li><a href="https://www.elastic.co/blog/getting-started-with-the-azure-integration-enhancement">How to deploy and manage Elastic Observability on Microsoft Azure</a></li>
<li><a href="https://www.elastic.co/guide/en/apm/guide/current/apm-quick-start.html">Elastic APM Quickstart</a></li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/trace-azure-function-application-observability/09-road.jpeg" length="0" type="image/jpeg"/>
        </item>
    </channel>
</rss>