<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Articles by Aleksandar Panov</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted security news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Wed, 17 Jun 2026 16:50:13 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Articles by Aleksandar Panov</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[From alert to root cause in seconds: AI-powered observability with Elastic Agent Builder and Workflows]]></title>
            <link>https://www.elastic.co/observability-labs/blog/beyond-the-dashboard-the-move-toward-agentic-troubleshooting</link>
            <guid isPermaLink="false">beyond-the-dashboard-the-move-toward-agentic-troubleshooting</guid>
            <pubDate>Wed, 17 Jun 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic Agent Builder and Workflows replace dashboard hunting: one question surfaces the root cause, correlates metrics across weeks, and calculates business impact; then the workflow files the ticket.]]></description>
            <content:encoded><![CDATA[<p>Elastic Agent Builder and Workflows turn observability from dashboard hunting into agentic troubleshooting. In a single conversation, the agent writes and runs ES|QL queries, correlates trip volume, delivery duration, and dispute rate across a 3-week window, and surfaces an estimated $36,669 in undelivered revenue, without a human navigating a single panel. This post walks through what that looks like end-to-end: from a FastFreight Co volume alert to a scoped operations agent that knows what to check, and a workflow that opens the Jira ticket automatically.</p>
<h2>Pre-AI approach: dashboards, thresholds, and manual correlation</h2>
<p>In the pre-AI era, you would configure <a href="https://www.elastic.co/docs/explore-analyze/alerting">Kibana Alerts or Watcher</a> with threshold rules, such as &quot;alert if the error rate exceeds 5% in the last 10 minutes&quot;. When an alert fires, you open the related dashboards and views, correlate logs, traces, and metrics, and look for the root cause. The first thing that gets your attention is the alert itself: a threshold has been crossed, in this case a <strong>collapse in the trip volume</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/01-broken-dashboard.jpg" alt="Trip volume dashboard panel showing a sharp drop below the daily baseline" /></p>
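<p>As a rough illustration of what a threshold rule such as the error-rate example above evaluates, here is a minimal Python sketch. The function name, the counts, and the handling of an empty window are assumptions made for illustration; this is not how Kibana Alerts or Watcher are implemented.</p>

```python
def error_rate_breached(error_count: int, total_count: int, threshold: float = 0.05) -> bool:
    """Return True when the error rate in the window exceeds the threshold."""
    if total_count == 0:
        # No traffic in the window: nothing to evaluate, so do not alert.
        return False
    return error_count / total_count > threshold


# Hypothetical counts for one 10-minute window.
print(error_rate_breached(error_count=12, total_count=200))  # 6% error rate
print(error_rate_breached(error_count=5, total_count=200))   # 2.5% error rate
```

<p>The rule only says that a line was crossed. It carries no information about why, which is the gap the rest of this post is about.</p>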
<p>With the right dashboards, you can spot that something is wrong. What dashboards cannot do is explain why when you are unfamiliar with the system, correlate issues across multiple panels, or, most importantly, calculate business impact.</p>
<p>Identifying the &quot;why&quot; and answering questions about impact requires a high level of expertise. For example, the first three panels show issues related to one vendor, while the last one is unrelated and shows issues with another vendor.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/02-multi-vendor-panels.jpg" alt="Four observability panels covering two different vendors, where the first three correlate and the fourth is unrelated" /></p>
<p>An experienced dashboard operator would recognize the pattern:</p>
<table>
<thead>
<tr>
<th>Signal</th>
<th>Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Volume down, duration up, disputes up</td>
<td>Vendor's operational failure</td>
</tr>
<tr>
<td>All metrics normal, but costs spiking</td>
<td>Billing anomaly</td>
</tr>
</tbody>
</table>
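<p>The two patterns in the table can be written down as explicit rules. The sketch below is one hypothetical way to encode them in Python; the <code>VendorSignals</code> type, its field names, and the <code>Unclassified</code> fallback are assumptions, not part of any Elastic API.</p>

```python
from dataclasses import dataclass


@dataclass
class VendorSignals:
    volume_down: bool
    duration_up: bool
    disputes_up: bool
    costs_spiking: bool


def interpret(signals: VendorSignals) -> str:
    """Map a combination of signals to the interpretation from the table."""
    if signals.volume_down and signals.duration_up and signals.disputes_up:
        return "Vendor's operational failure"
    all_metrics_normal = not (signals.volume_down or signals.duration_up or signals.disputes_up)
    if all_metrics_normal and signals.costs_spiking:
        return "Billing anomaly"
    # Any other combination is outside the two patterns in the table.
    return "Unclassified"


print(interpret(VendorSignals(True, True, True, False)))
print(interpret(VendorSignals(False, False, False, True)))
```

<p>Hardcoding rules like this covers only the patterns someone already knows, which is the limitation described next.</p>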
<p>Learning what each dashboard and view indicates takes considerable effort, and understanding the nature and causes of the issues you deal with takes time. Usually, that knowledge lives in someone's head or, eventually, in a wiki. The main disadvantage is that the system can detect and display, but it cannot reason about the behavior of the underlying data. Agents can.</p>
<h2>Elastic Agent Builder: ask questions, get answers</h2>
<p>AI agents are an effort to turn dashboards and metrics that are hardcoded to specific issues, along with knowledge that only some people in the organization hold, into flexible agents that anyone in the company can use to explore the data and generate insights on the fly.</p>
<p><a href="https://www.elastic.co/docs/explore-analyze/ai-features/elastic-agent-builder">Agent Builder</a> is Elastic's AI conversational platform for interacting with your data using natural language. We are going to troubleshoot some vendor transaction logs based on the trip collapse alert described previously.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/03-ai-assistant-launch.jpg" alt="Agent Builder chat panel showing the results from a question" /></p>
<p>To use Agent Builder, you can use <a href="https://www.elastic.co/docs/explore-analyze/ai-features/agent-builder/models">the built-in models</a> or <a href="https://www.elastic.co/docs/explore-analyze/ai-features/llm-guides/llm-connectors">connectors</a> to plug in other model providers, including local LLMs running in your environment. In this example, we used GPT-4o through a connector, but any supported model will work.</p>
<p>After you choose an LLM, you can navigate to Agent Builder and ask the agent to analyze data and provide you with instant queries, tables, and charts.</p>
<p>Let's see what happens when you stop navigating panels and just ask.</p>
<blockquote>
<p>We have an active alert: FastFreight Co (vendor_id=1) trip volume has dropped below 100 in the last 24 hours. Our baseline is ~229 trips per day. Can you confirm the current daily trip volume for FastFreight and show me how it has changed over the past 3 weeks?</p>
</blockquote>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/04-trip-volume-question.jpg" alt="Agent response explaining data availability and a table of daily trip counts showing a sustained decline over three weeks" /></p>
<p>One question was enough. The agent pulled the data, and the table confirms that trips are dropping consistently.</p>
<p>Now we need to find when the degradation started. The agent generates and runs ES|QL queries to pinpoint the first date where daily trips fell below the baseline average.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/05-degradation-timeline.jpg" alt="Agent processing steps showing ES|QL query generation and execution to find when the volume drop started" /></p>
<p>The screenshot above shows what happens behind the scenes. The agent writes ES|QL, runs it, and finds that the degradation started around March 1st. From March 11th, the drop becomes severe.</p>
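<p>The logic behind that step, finding the first day whose trip count falls below the baseline, can be sketched in plain Python. This is not the ES|QL the agent generated; the function name, the year, and the sample counts are hypothetical and chosen only to mirror the March 1st result.</p>

```python
from datetime import date
from typing import Optional


def first_day_below_baseline(daily_trips: dict[date, int], baseline: float) -> Optional[date]:
    """Return the earliest date whose trip count is under the baseline, or None."""
    for day in sorted(daily_trips):
        if baseline > daily_trips[day]:
            return day
    return None


# Hypothetical daily counts, not the values from the screenshot.
sample = {
    date(2026, 2, 27): 231,
    date(2026, 2, 28): 229,
    date(2026, 3, 1): 210,
    date(2026, 3, 2): 195,
}
print(first_day_below_baseline(sample, baseline=228.0))
```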
<p>From there, one follow-up question reveals two more red flags: average delivery duration jumped from 12.6 to 46 minutes, and disputes rose from 0.3% to 20%. The agent correlates across metrics in a single answer.</p>
<p>A few more iterations take you from diagnosis to business impact.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/06-business-impact-summary.jpg" alt="Agent response computing expected vs actual trips, missing trips, and estimated lost revenue" /></p>
<p>The agent computes the following summary:</p>
<ul>
<li>Expected trips: March 1 to 19 at ~228/day = ~4,328.</li>
<li>Actual trips: 1,884.</li>
<li>Missing trips: ~2,444.</li>
<li>Lost revenue: missing trips × baseline average <code>total_amount</code> (~$15.00).</li>
</ul>
<p>Estimated loss: ~$36,669 in undelivered revenue.</p>
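<p>The arithmetic behind the summary can be reproduced in a few lines of Python. The trip figures come from the agent's summary; the average amount is the rounded $15.00, so the result lands slightly below the agent's ~$36,669. That the agent worked with an unrounded average is an assumption.</p>

```python
# Figures taken from the agent's summary above.
expected_trips = 4328      # March 1 to 19 at roughly 228 trips per day
actual_trips = 1884
avg_total_amount = 15.00   # rounded baseline average (assumed: the agent used an unrounded value)

missing_trips = expected_trips - actual_trips
estimated_loss = missing_trips * avg_total_amount

print(missing_trips)    # 2444
print(estimated_loss)   # 36660.0
```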
<p>The business impact estimate comes out of the same conversation. In the pre-AI era, that would not have been possible without external tools.</p>
<p>You can connect the agent to your private data and get RAG-based answers. This allows it to use information your organization owns and return precise answers instead of generic LLM responses.</p>
<h2>From generic agent to scoped analyst: custom agents with domain memory</h2>
<p>Observability has shifted from knowing which graphs to look at to describing what to check, while the agent navigates the data and reports back in the form of auto-generated graphs, charts, tables, and reports. You can ask whether there is &quot;anything unusual&quot; and the agent will query your indices for outliers.</p>
<p>That said, every conversation with a generic agent starts from scratch. You have to re-explain data, thresholds, and what to check. Customization is also limited. You can't define specific prompts or tools, or consume your agent outside Kibana. Agent Builder solves exactly this by letting you create domain-based specialized analysts (agents) that different teams can use, each from its own perspective.</p>
<p>Here is what it looks like when you select a custom agent and start a conversation.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/07-agent-builder-overview.jpg" alt="Agent Chat interface with the OpsWatch agent selected and a trip volume question ready to submit" /></p>
<p>There are several ways to use agents, from the built-in agents and tools to creating <a href="https://www.elastic.co/docs/explore-analyze/ai-features/agent-builder/agent-builder-agents">your own agents and tools</a>. You can also connect them to external tools through an MCP server, or use them as external resources. Custom agents are built using custom instructions (like &quot;You are the Senior Fleet Operations Analyst…&quot;), together with ES|QL references that provide granular control over accuracy and security.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/08-opswatch-agent.jpg" alt="OpsWatch agent definition in Agent Builder showing its scope and configured tools" /></p>
<p>By using a specific agent, you can discover issues faster, drill into related data, diagnose problems, and act more quickly.</p>
<p>In our example, &quot;OpsWatch&quot; is the operations team's agent. It knows trip volumes, delivery durations, and staffing levels. It doesn't know about fees or SLAs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/09-opswatch-investigation.jpg" alt="OpsWatch returning an operational assessment with supporting data pulled from the operations domain" /></p>
<p>When asked for an operational assessment, it provides conclusions and recommendations grounded in real data. What would take hours with other approaches is done in seconds.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/10-opswatch-recommendations.jpg" alt="OpsWatch recommendations grounded in the queries it ran" /></p>
<p>Finally, a few words about scope boundaries. When asked about costs, OpsWatch declines to answer because it understands that the question is outside its scope, and it suggests asking another custom-built agent, CostGuard. This is a design feature of scoped agents.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/11-scope-redirect.jpg" alt="OpsWatch declining a cost question and redirecting the user to the CostGuard agent" /></p>
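<p>To make the scope boundary concrete, here is a deliberately simplified Python sketch of the decline-and-redirect behavior. Real scoped agents rely on their instructions and tools rather than keyword matching; the topic lists and the <code>route</code> function are hypothetical.</p>

```python
# Hypothetical topic lists; real agents are scoped through their instructions and tools.
AGENT_SCOPES = {
    "OpsWatch": {"trip volume", "delivery duration", "staffing"},
    "CostGuard": {"cost"},
}


def route(question: str, current_agent: str) -> str:
    """Answer in scope, or name the agent whose scope matches the question."""
    text = question.lower()
    for agent, topics in AGENT_SCOPES.items():
        if any(topic in text for topic in topics):
            if agent == current_agent:
                return f"{current_agent} answers"
            return f"{current_agent} declines and suggests {agent}"
    return f"{current_agent} declines: out of scope"


print(route("Why did trip volume drop?", "OpsWatch"))
print(route("What did this incident cost us?", "OpsWatch"))
```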
<h2>Closing the loop: Elastic Workflows from diagnosis to action</h2>
<p>What we walked through here shows where observability is heading: away from staring at dashboards and toward asking questions. Dashboards will remain useful, but investigations will start from the agent.</p>
<p>What was previously business knowledge in the heads of a few people can now be part of an agent's instructions. When a new employee asks an open question, the agent knows which queries to run to answer it, and may even surface new insights.</p>
<p>Having seen how agents diagnose incidents, the next question is whether they can open tickets or notify teams. On their own, agents recommend but don't act. <a href="https://www.elastic.co/elasticsearch/workflows">Elastic Workflows</a> covers the gap between conversational AI and automated response.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/12-workflow-architecture.png" alt="Architecture diagram of alert to workflow to agent diagnosis to follow-up action across Jira and Slack" /></p>
<p>A workflow is a rule-triggered automation that can call external APIs (Jira, Slack, Teams, etc.) or execute follow-up Elasticsearch queries. For instance, it can create a Jira ticket with the incident details or post a summary to a specific Slack channel, like #ops-alerts.</p>
<p>When an alert fires, a Workflow triggers and calls the Agent Builder agent. The agent runs ES|QL queries, correlates the relevant metrics, and returns a diagnosis. The Workflow then executes the follow-up action (creating a Jira ticket, posting to Slack, or both) without any manual intervention.</p>
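<p>The sequence above (alert, agent diagnosis, follow-up actions) can be sketched as a small Python pipeline. This shows only the control flow and is not the Elastic Workflows definition format; <code>run_workflow</code> and the stub functions standing in for the agent, Jira, and Slack are all hypothetical.</p>

```python
from typing import Callable


def run_workflow(
    alert: dict,
    diagnose: Callable[[dict], str],
    actions: list[Callable[[str], str]],
) -> list[str]:
    """Alert in, agent diagnosis, then each follow-up action with no manual step."""
    diagnosis = diagnose(alert)
    return [action(diagnosis) for action in actions]


# Stubs standing in for the Agent Builder call and the Jira and Slack connectors.
def fake_agent(alert: dict) -> str:
    return f"Diagnosis for {alert['vendor']}: volume down, duration up, disputes up"


def fake_jira(summary: str) -> str:
    return f"JIRA ticket created: {summary}"


def fake_slack(summary: str) -> str:
    return f"Posted to #ops-alerts: {summary}"


results = run_workflow({"vendor": "FastFreight Co"}, fake_agent, [fake_jira, fake_slack])
for line in results:
    print(line)
```

<p>Passing the actions in as a list keeps the diagnosis step independent of where the result is sent, so a Teams notification could be added without touching the rest.</p>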
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/beyond-the-dashboard-the-move-toward-agentic-troubleshooting/cover.jpg" length="0" type="image/jpeg"/>
        </item>
    </channel>
</rss>