<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - GenAI</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Thu, 12 Mar 2026 21:45:40 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - GenAI</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Bringing observability insights from Elastic AI Assistant to the world of GitHub Copilot]]></title>
            <link>https://www.elastic.co/observability-labs/blog/ai-assistant-to-github-copilot</link>
            <guid isPermaLink="false">ai-assistant-to-github-copilot</guid>
            <pubDate>Thu, 23 May 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[GitHub announced GitHub Copilot Extensions this week at Microsoft Build. We are working with the GitHub team to bring observability insights from Elastic AI Assistant to GitHub Copilot users.]]></description>
            <content:encoded><![CDATA[<p>GitHub <a href="https://github.blog/2024-05-21-introducing-github-copilot-extensions/">announced</a> GitHub Copilot Extensions this week at Microsoft Build. We are working with the GitHub team in the Limited Beta Program to explore bringing observability insights from Elastic AI Assistant to GitHub Copilot users.</p>
<p>Elastic’s GitHub Copilot Extension aims to combine the capabilities of GitHub Copilot and Elastic AI Assistant for Observability. This could enable developers to access critical insights from Elastic AI Assistant through GitHub Copilot Chat on GitHub.com, in Visual Studio, and in VS Code - the places where they write their code.</p>
<p>Developers will be able to ask questions such as:</p>
<ul>
<li>What errors are active?</li>
<li>What’s the latest stacktrace for my application?</li>
<li>What caused a slowdown in the application after the last push to the dev environment?</li>
<li>How do I write an ES|QL query that my app will send to Elasticsearch?</li>
<li>What runbook from GitHub has been loaded into Elasticsearch and is related to the issue I’m investigating?</li>
</ul>
<p>And many more!</p>
<p><a href="https://build.microsoft.com/en-US/sessions/acc48a7a-b412-4b4f-88a6-53ef4b2cb2bc?source=/schedule">Watch Jeff's PoC Demo@Microsoft Build 2024</a></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/ai-assistant-to-github-copilot/elastic-copilot-vscode.png" alt="Elastic's Copilot Extension in VSCode" /></p>
<p><em>Elastic AI Assistant surfaced in GitHub Copilot Chat from our Extension (Proof of Concept)</em></p>
<h2>What is the Elastic AI Assistant for Observability</h2>
<p>The Elastic AI Assistant for Observability provides contextual insights and streamlines troubleshooting within the Elastic Observability environment. By harnessing generative AI capabilities, the assistant offers open prompts that decipher error messages and propose remediation actions. It adopts a Retrieval-Augmented Generation (RAG) approach to fetch the most pertinent internal information, such as APM traces, log messages, SLOs, GitHub issues, runbooks, and more. This contextual assistance is a huge leap forward for Site Reliability Engineers (SREs) and operations teams, offering immediate, relevant solutions to issues based on existing documentation and resources and boosting developer productivity.</p>
<p>For more information on setting up and using the AI Assistant for Observability, check out the blog <a href="https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-microsoft-azure-openai">Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI</a>. Additionally, learn how the <a href="https://www.elastic.co/observability-labs/blog/elastic-rag-ai-assistant-application-issues-llm-github">Elastic Observability AI Assistant uses RAG to help analyze application issues with GitHub issues</a>.</p>
<p>One unique feature of the AI Assistant is its API support. This allows you to take advantage of all the capabilities provided by the Elastic AI Assistant, and integrate them right into your workflow.</p>
<h2>What is a GitHub Copilot Extension</h2>
<p>GitHub Copilot Extensions, a new addition to GitHub Copilot, revolutionizes the developer experience by integrating a diverse array of tools and services directly into the developer's workflow. These unique extensions, crafted by partners, enable developers to interact with various services and tools using natural language within their Integrated Development Environment (IDE) or GitHub.com. This integration eliminates the need for context-switching, allowing developers to maintain their flow state, troubleshoot issues, and deploy solutions with unparalleled efficiency. These extensions will be accessible through GitHub Copilot Chat in the GitHub Marketplace, with options for organizations to create private extensions tailored to their internal tooling.</p>
<h2>What’s next</h2>
<p>We are participating in the GitHub Limited Beta Program as a partner and exploring the possibility of bringing the Elastic GitHub Copilot Extension to the GitHub Marketplace. We are excited to bring insights from Elastic Observability to GitHub Copilot users, side by side with the code behind those services. Stay tuned!</p>
<p>Resources:</p>
<ul>
<li><a href="https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-microsoft-azure-openai">Getting Started with Elastic AI Assistant for Observability with Azure OpenAI</a></li>
<li><a href="https://ela.st/assistant-escapes">The Elastic AI Assistant for Observability escapes Kibana!</a></li>
<li><a href="https://www.elastic.co/observability-labs/blog/elastic-rag-ai-assistant-application-issues-llm-github">Elastic Observability AI Assistant uses RAG to help analyze application issues with GitHub issues</a></li>
<li><a href="https://www.elastic.co/observability-labs/blog/sre-troubleshooting-ai-assistant-observability-runbooks">Troubleshooting with Elastic AI Assistant using your organization's runbooks</a></li>
<li><a href="https://www.elastic.co/guide/en/observability/current/obs-ai-assistant.html">The AI Assistant for Observability documentation</a></li>
<li><a href="https://github.blog/2024-05-21-introducing-github-copilot-extensions/">GitHub Copilot Extensions Blog Announcement</a></li>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/esql.html">ES|QL documentation</a></li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/ai-assistant-to-github-copilot/githubcopilot-aiassistant-C-2x.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Analyzing OpenTelemetry apps with Elastic AI Assistant and APM]]></title>
            <link>https://www.elastic.co/observability-labs/blog/analyzing-opentelemetry-apps-elastic-ai-assistant-apm</link>
            <guid isPermaLink="false">analyzing-opentelemetry-apps-elastic-ai-assistant-apm</guid>
            <pubDate>Tue, 12 Mar 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic Observability provides native OpenTelemetry support, but analyzing application logs, metrics, and traces can be daunting. Elastic Observability not only provides AIOps features but also an AI Assistant (co-pilot) to help you reduce MTTR.]]></description>
            <content:encoded><![CDATA[<p>OpenTelemetry is rapidly becoming the most expansive project within the Cloud Native Computing Foundation (CNCF), boasting as many commits as Kubernetes and garnering widespread support from customers. Numerous companies are adopting OpenTelemetry and integrating it into their applications. Elastic® offers detailed <a href="https://www.elastic.co/blog/getting-started-opentelemetry-instrumentation-sample-app">guides</a> on implementing OpenTelemetry for applications. However, as with many applications, pinpointing and resolving issues can be time-consuming.</p>
<p>The <a href="https://www.elastic.co/blog/context-aware-insights-elastic-ai-assistant-observability">Elastic AI Assistant</a> significantly speeds up both identifying and resolving issues. It is complemented by Elastic’s new Service Level Objective (SLO) capability, allowing you to streamline your entire site reliability engineering (SRE) process, from detecting potential issues to enhancing the overall customer experience.</p>
<p>In this blog, we will demonstrate how you, as an SRE, can detect issues in a service equipped with OpenTelemetry. We will explore problem identification using Elastic APM, Elastic’s AIOps capabilities, and the Elastic AI Assistant.</p>
<p>We will illustrate this using the <a href="https://github.com/elastic/opentelemetry-demo">OpenTelemetry demo</a>, with a <a href="https://opentelemetry.io/docs/demo/feature-flags/">feature flag (cartService)</a> that is activated.</p>
<p>Our walkthrough will encompass two scenarios:</p>
<ol>
<li>
<p>When the SLO for cart service becomes noncompliant, we will analyze the error through Elastic APM. The Elastic AI Assistant will assist by providing a runbook and a GitHub issue to facilitate issue analysis.</p>
</li>
<li>
<p>Should the SLO for the cart service be noncompliant, we will examine the trace that indicates a high failure rate. We will employ AIOps for failure correlation and the AI Assistant to analyze logs and Kubernetes metrics directly from the Assistant.</p>
</li>
</ol>
<h2>Prerequisites and config</h2>
<p>If you plan on following this blog, here are some of the components and details we used to set up the configuration:</p>
<ul>
<li>
<p>Ensure you have an account on <a href="http://cloud.elastic.co/">Elastic Cloud</a> and a deployed stack (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>).</p>
</li>
<li>
<p>We used the OpenTelemetry Demo. Directions for using Elastic with OpenTelemetry Demo are <a href="https://github.com/elastic/opentelemetry-demo">here</a>.</p>
</li>
<li>
<p>Additionally, you will need to connect your AI Assistant to your favorite LLM. We used Azure OpenAI GPT-4.</p>
</li>
<li>
<p>We also ran the OpenTelemetry Demo on Kubernetes, specifically on GKE.</p>
</li>
</ul>
<h2>SLO noncompliance</h2>
<p>Elastic APM recently released the SLO (Service Level Objective) feature in <a href="https://www.elastic.co/guide/en/observability/8.12/slo.html">8.12</a>. This feature enables setting measurable performance targets for services, such as <a href="https://sre.google/sre-book/monitoring-distributed-systems/">availability, latency, traffic, errors, and saturation</a>, or lets you define your own. Key components include:</p>
<ul>
<li>
<p>Defining and monitoring SLIs (Service Level Indicators)</p>
</li>
<li>
<p>Monitoring error budgets indicating permissible performance shortfalls</p>
</li>
<li>
<p>Alerting on burn rates showing error budget consumption</p>
</li>
</ul>
<p>We set up two SLOs for cart service:</p>
<ul>
<li>
<p><strong>Availability SLO</strong> , which monitors its availability by ensuring that transactions succeed. We set up the feature flag in the OpenTelemetry application, which generates an error for EmptyCart transactions 10% of the time.</p>
</li>
<li>
<p><strong>Latency SLO</strong> to ensure transaction latency does not rise above a specific threshold, which would degrade the customer experience.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/analyzing-opentelemetry-apps-elastic-ai-assistant-apm/image1.png" alt="1 - SLOs" /></p>
<p>Because of the OTel cartservice feature flag, the availability SLO is triggered, and within the SLO details, we see that over a seven-day period the availability is 95.5%, well below our target of 99.9%. Additionally, all the available error budget has been exhausted.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/analyzing-opentelemetry-apps-elastic-ai-assistant-apm/image2.png" alt="2 - cart service otel" /></p>
<p>With SLOs, you can easily identify when issues with customer experience occur, and catch potential service issues before they get worse.</p>
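<p>As a back-of-the-envelope check, the burn implied by the numbers above can be computed directly (a minimal sketch of the standard error-budget math, not Elastic’s SLO implementation):</p>

```python
def error_budget_consumed(target: float, measured: float) -> float:
    """Fraction of the error budget consumed (1.0 means fully exhausted)."""
    allowed_error = 1.0 - target    # e.g. 0.1% of requests may fail at a 99.9% target
    actual_error = 1.0 - measured   # e.g. 4.5% actually failed at 95.5% availability
    return actual_error / allowed_error

# 95.5% measured availability against a 99.9% target:
print(f"{error_budget_consumed(0.999, 0.955):.0f}x the error budget")  # 45x
```

<p>Consuming the budget 45 times over is why burn-rate alerting matters: the alert fires long before the window closes.</p>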
<h2>Scenario 1: Analyzing APM trace and logs with AI Assistant</h2>
<p>Once the SLO is found as non-compliant, we can dive into cart service to investigate in Elastic APM. The following walks through the set of steps you can take in Elastic APM and how to use the AI Assistant to analyze the issue:</p>
&lt;Video vidyardUuid=&quot;FSpw53JN9Xu32V1kLQCE8z&quot; /&gt;
<p>From the video, we can see that once in APM, we took the following steps.</p>
<ol>
<li>
<p>Investigated the trace EmptyCart, which was experiencing larger than normal failure rates.</p>
</li>
<li>
<p>The trace showed a significant number of failures, which also resulted in slightly larger latency.</p>
</li>
<li>
<p>We used AIOps failure correlation to identify the potential component causing the failure, which correlated to a field value of FailedPrecondition.</p>
</li>
<li>
<p>While filtering on that value and reviewing the logs, we still couldn’t understand what this meant.</p>
</li>
<li>
<p>This is where you can use Elastic’s AI Assistant to further your understanding of the issue.</p>
</li>
</ol>
<p>AI Assistant helped us analyze the following:</p>
<ol>
<li>
<p>It helped us understand what the log message meant and that it was related to the Redis connection failure issue.</p>
</li>
<li>
<p>Because we couldn’t connect to Redis, we asked the AI Assistant to give us the metrics for the Redis Kubernetes pods.</p>
</li>
<li>
<p>We learned there were two pods for Redis from the logs over the last two hours.</p>
</li>
<li>
<p>However, we also learned that the memory of one seems to be increasing.</p>
</li>
<li>
<p>It seems that Redis restarted (hence the second pod), and with this information we could dive deeper into what happened to Redis.</p>
</li>
</ol>
<p>You can see how quickly we could correlate a significant amount of information, logs, metrics, and traces through the AI Assistant and Elastic’s APM capabilities. We didn’t have to go through multiple screens to hunt down information.</p>
<h2>Scenario 2: Analyzing APM error with AI Assistant</h2>
<p>Once the SLO is found as noncompliant, we can dive into cart service to investigate in Elastic APM. The following walks through the set of steps you can take in Elastic APM and how to use the AI Assistant to analyze the issue:</p>
&lt;Video vidyardUuid=&quot;dVScqDxPJWCPCeGu8WMoCw&quot; /&gt;
<p>From the video, we can see that once in APM, we took the following steps:</p>
<ol>
<li>
<p>We noticed a specific error for the APM service.</p>
</li>
<li>
<p>We investigated this in the error tab, and while we see it’s an issue with connection to Redis, we still need more information.</p>
</li>
<li>
<p>The AI Assistant helps us understand the stacktrace and provides some potential causes for the error and ways to diagnose and resolve it.</p>
</li>
<li>
<p>We also asked it for a runbook, created by our SRE team, which gives us steps to work through this particular issue.</p>
</li>
</ol>
<p>As you can see, the AI Assistant provides us not only with information about the error message but also with ways to diagnose it and potentially resolve it with an internal runbook.</p>
<h2>Achieving operational excellence, optimal performance, and reliability</h2>
<p>We’ve shown how an OpenTelemetry instrumented application (OTel demo) can be analyzed using Elastic’s features, especially the AI Assistant coupled with Elastic APM, AIOps, and the latest SLO features. Elastic significantly streamlines the process of identifying and resolving issues within your applications.</p>
<p>Through our detailed walkthrough of two distinct scenarios, we have seen how Elastic APM and the AI Assistant can efficiently analyze and address noncompliance with SLOs in a cart service. The ability to quickly correlate information, logs, metrics, and traces through these tools not only saves time but also enhances the overall effectiveness of the troubleshooting process.</p>
<p>The use of Elastic's AI Assistant in these scenarios underscores the value of integrating advanced AI capabilities into operational workflows. It goes beyond simple error analysis, offering insights into potential causes and providing actionable solutions, sometimes even with customized runbooks. This integration of technology fundamentally changes how SREs approach problem-solving, making the process more efficient and less reliant on manual investigation.</p>
<p>Overall, the advancements in Elastic’s APM, AIOps capabilities, and the AI Assistant, particularly in handling OpenTelemetry data, represent a significant step forward in operational excellence. These tools enable SREs to not only react swiftly to emerging issues but also proactively manage and optimize the performance and reliability of their services, thereby ensuring an enhanced customer experience.</p>
<h2>Try it out</h2>
<p>Existing Elastic Cloud customers can access many of these features directly from the <a href="https://cloud.elastic.co/">Elastic Cloud console</a>. Not taking advantage of Elastic on cloud? <a href="https://www.elastic.co/cloud/cloud-trial-overview">Start a free trial</a>.</p>
<blockquote>
<ul>
<li><a href="https://www.elastic.co/blog/service-level-objectives-slos-logs-metrics">Build better Service Level Objectives (SLOs) from logs and metrics</a></li>
<li><a href="https://www.elastic.co/blog/whats-new-elastic-observability-8-12-0">Elastic Observability 8.12: GA for AI Assistant, SLO, and Mobile APM support</a></li>
<li><a href="https://www.elastic.co/blog/native-opentelemetry-support-in-elastic-observability">Native OpenTelemetry support in Elastic Observability</a></li>
<li><a href="https://www.elastic.co/blog/context-aware-insights-elastic-ai-assistant-observability">Context-aware insights using the Elastic AI Assistant for Observability</a></li>
</ul>
</blockquote>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/analyzing-opentelemetry-apps-elastic-ai-assistant-apm/ecs-otel-announcement-3.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Automated log parsing in Streams with ML]]></title>
            <link>https://www.elastic.co/observability-labs/blog/automated-log-parsing-ml-streams</link>
            <guid isPermaLink="false">automated-log-parsing-ml-streams</guid>
            <pubDate>Tue, 10 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how a hybrid ML approach achieved 94% log parsing and 91% log partitioning accuracy through automation experiments with log format fingerprinting in Streams.]]></description>
            <content:encoded><![CDATA[<p>In modern observability stacks, ingesting unstructured logs from diverse data providers into platforms like Elasticsearch remains a challenge. Reliance on manually crafted parsing rules creates brittle pipelines, where even minor upstream code updates lead to parsing failures and unindexed data. This fragility is compounded by the scalability challenge: in dynamic microservices environments, the continuous addition of new services turns manual rule maintenance into an operational nightmare.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/automated-log-parsing-ml-streams/automated-log-parsing.png" alt="" /></p>
<p>Our goal was to transition to an automated, adaptive approach capable of handling both log parsing (field extraction) and log partitioning (source identification). We hypothesized that Large Language Models <a href="https://www.elastic.co/what-is/large-language-models">(LLMs)</a>, with their inherent understanding of code syntax and semantic patterns, could automate these tasks with minimal human intervention.</p>
<p>We are happy to announce that this feature is already available in <a href="https://www.elastic.co/elasticsearch/streams">Streams</a>!</p>
<h2>Dataset Description</h2>
<p>We chose the <a href="https://github.com/logpai/loghub">Loghub</a> collection of logs for our proof of concept. For our investigation, we selected representative samples from the following key areas:</p>
<ul>
<li>Distributed systems: We used the HDFS (Hadoop Distributed File System) and Spark datasets. These contain a mix of info, debug, and error messages typical of big data platforms.</li>
<li>Server &amp; web applications: Logs from Apache web servers and OpenSSH provided a valuable source of access, error, and security-relevant events. These are critical for monitoring web traffic and detecting potential threats.</li>
<li>Operating systems: We included logs from Linux and Windows. These datasets represent the common, semi-structured system-level events that operations teams encounter daily.</li>
<li>Mobile systems: To ensure our model could handle logs from mobile environments, we included the Android dataset. These logs are often verbose and capture a wide range of application and system-level activities on mobile devices.</li>
<li>Supercomputers: To test performance on high-performance computing (HPC) environments, we incorporated the BGL (Blue Gene/L) dataset, which features highly structured logs with specific domain terminology.</li>
</ul>
<p>A key advantage of the Loghub collection is that the logs are largely unsanitized and unlabeled, mirroring a noisy live production environment with microservice architecture.</p>
<p>Log examples:</p>
<pre><code class="language-text">[Sun Dec 04 20:34:21 2005] [notice] jk2_init() Found child 2008 in scoreboard slot 6
[Sun Dec 04 20:34:25 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 11:06:51 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
17/06/09 20:10:58 INFO output.FileOutputCommitter: Saved output of task 'attempt_201706092018_0024_m_000083_1138' to hdfs://10.10.34.11:9000/pjhe/test/1/_temporary/0/task_201706092018_0024_m_000083
17/06/09 20:10:58 INFO mapred.SparkHadoopMapRedUtil: attempt_201706092018_0024_m_000083_1138: Committed
</code></pre>
<p>In addition, we created a Kubernetes cluster with a typical web application + database setup to mine extra logs in this most common domain.</p>
<p>Example of common log fields: timestamp, log level (INFO, WARN, ERROR), source, message.</p>
<h2>Few-Shot Log Parsing with an LLM</h2>
<p>Our first set of experiments focused on a fundamental question: <strong>Can an LLM reliably identify key fields and generate consistent parsing rules to extract them?</strong></p>
<p>We asked a model to analyze raw log samples and generate log parsing rules in regular expression (regex) and <a href="https://www.elastic.co/docs/explore-analyze/scripting/grok">Grok</a> formats. Our results showed that this approach has a lot of potential, but also significant implementation challenges.</p>
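<p>For reference, a parsing rule for the Apache-style sample lines shown earlier might look like the following (an illustrative, hand-written regex in the same spirit, not actual model output):</p>

```python
import re

# Illustrative rule for the Apache-style sample lines; hand-written, not model output.
APACHE_RULE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] \[(?P<log_level>\w+)\] (?P<message>.*)$"
)

line = "[Sun Dec 04 20:34:21 2005] [notice] jk2_init() Found child 2008 in scoreboard slot 6"
fields = APACHE_RULE.match(line).groupdict()
print(fields["timestamp"], fields["log_level"])  # Sun Dec 04 20:34:21 2005 notice
```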
<h3>High Confidence &amp; Context Awareness</h3>
<p>Initial results were promising. The LLM demonstrated a strong ability to generate parsing rules that matched the provided few-shot examples with high confidence. Beyond simple pattern matching, the model showed a capacity for log understanding: it could correctly identify and name the log source (e.g., health tracking app, Nginx web app, Mongo database).</p>
<h3>The &quot;Goldilocks&quot; Dilemma of Input Samples</h3>
<p>Our experiments quickly surfaced a significant lack of robustness caused by extreme <strong>sensitivity to the input sample</strong>. The model's performance fluctuates wildly based on the specific log examples included in the prompt. We observed a log similarity problem: the sample needs to contain logs that are just diverse enough:</p>
<ul>
<li>Too homogeneous (overfitting): If the input logs are too similar, the LLM tends to overspecify. It treats variable data—such as specific Java class names in a stack trace—as static parts of the template. This results in brittle rules that cover a tiny ratio of logs and extract unusable fields.</li>
<li>Too heterogeneous (confusion): Conversely, if the sample contains significant formatting variance—or worse, &quot;trash logs&quot; like progress bars, memory tables, or ASCII art—the model struggles to find a common denominator. It often resorts to generating complex, broken regexes or lazily over-generalizing the entire line into a single message blob field.</li>
</ul>
<h3>The Context Window Constraint</h3>
<p>We also encountered a context window bottleneck. When input logs were long, heterogeneous, or rich in extractable fields, the model's output often deteriorated, becoming &quot;messy&quot; or too long to fit into the output context window. Naturally, chunking helps in this case. By splitting logs using character-based and entity-based delimiters, we could help the model focus on extracting the main fields without being overwhelmed by noise.</p>
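<p>The chunking pass described above can be sketched as a character-bounded splitter (a simplified sketch; the chunk size and line-based delimiting are illustrative):</p>

```python
def chunk_logs(lines, max_chars=2000):
    """Group raw log lines into chunks that fit a model's context budget.

    Lines are kept whole; a new chunk starts whenever adding the next
    line would exceed max_chars. The 2000-char budget is illustrative.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the joining newline
    if current:
        chunks.append("\n".join(current))
    return chunks
```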
<h3>The consistency &amp; standardization gap</h3>
<p>Even when the model successfully generated rules, we noted slight inconsistencies:</p>
<ul>
<li>Service naming variations: The model proposes different names for the same entity (e.g., labeling the source as &quot;Spark,&quot; &quot;Apache Spark,&quot; and &quot;Spark Log Analytics&quot; in different runs).</li>
<li>Field naming variations: Field names lacked standardization (e.g., <code>id</code> vs. <code>service.id</code> vs. <code>device.id</code>). We normalized names using a standardized <a href="https://www.elastic.co/docs/reference/ecs/ecs-field-reference">Elastic field naming</a>.</li>
<li>Resolution variance: The resolution of the field extraction varied depending on how similar the input logs were to one another.</li>
</ul>
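<p>Normalization of that kind can be as simple as an alias table mapped onto ECS-style names (the specific aliases below are hypothetical, chosen for illustration):</p>

```python
# Hypothetical normalization table: LLM-proposed field names on the left,
# ECS-style names on the right. The specific mappings are illustrative.
FIELD_ALIASES = {
    "id": "service.id",
    "device.id": "service.id",
    "loglevel": "log.level",
    "severity": "log.level",
    "ts": "@timestamp",
    "time": "@timestamp",
}

def normalize_fields(extracted: dict) -> dict:
    """Rename LLM-extracted fields to a single canonical ECS-style name."""
    return {FIELD_ALIASES.get(k.lower(), k): v for k, v in extracted.items()}

print(normalize_fields({"ts": "17/06/09 20:10:58", "severity": "INFO"}))
# {'@timestamp': '17/06/09 20:10:58', 'log.level': 'INFO'}
```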
<h2>Log Format Fingerprint</h2>
<p>To address the challenge of log similarity, we introduce a high-performance heuristic: <strong>log format fingerprint (LFF)</strong>.</p>
<p>Instead of feeding raw, noisy logs directly into an LLM, we first apply a deterministic transformation to reveal the underlying structure of each message. This pre-processing step abstracts away variable data, generating a simplified &quot;fingerprint&quot; that allows us to group related logs.</p>
<p>The mapping logic is simple to ensure speed and consistency:</p>
<ol>
<li>Digit abstraction: Any sequence of digits (0-9) is replaced by a single ‘0’.</li>
<li>Text abstraction: Any sequence of alphabetical characters with whitespace is replaced by a single ‘a’.</li>
<li>Whitespace normalization: All sequences of whitespace (spaces, tabs, newlines) are collapsed into a single space.</li>
<li>Symbol preservation: Punctuation and special characters (e.g., :, [, ], /) are preserved, as they are often the strongest indicators of log structure.</li>
</ol>
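<p>The four rules can be mirrored in a few lines (a minimal sketch; the final step merges adjacent text tokens, matching the <code>a( a)+</code> collapse used in the ES|QL implementation below):</p>

```python
import re

def log_fingerprint(line: str) -> str:
    """Deterministic log format fingerprint (LFF) following the four rules above."""
    line = re.sub(r"\s+", " ", line)        # 3. collapse whitespace runs
    line = re.sub(r"[A-Za-z]+", "a", line)  # 2a. letter runs -> 'a'
    line = re.sub(r"[0-9]+", "0", line)     # 1. digit runs -> '0'
    return re.sub(r"a( a)+", "a", line)     # 2b. merge adjacent text tokens;
                                            # 4. symbols pass through untouched

print(log_fingerprint("ERROR 404 at 10:23"))  # a 0 a 0:0
```

<p>Two structurally identical lines with different values yield the same fingerprint, which is what makes the bucketing below possible.</p>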
<p>Let’s look at an example of how this mapping transforms the logs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/automated-log-parsing-ml-streams/transform-logs.png" alt="" /></p>
<p>As a result, we obtain the following log masks:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/automated-log-parsing-ml-streams/log-masks.png" alt="" /></p>
<p>Notice the fingerprints of the first two logs. Despite different timestamps, source classes, and message content, their prefixes (<code>0/0/0 0:0:0 a a.a:</code>) are identical. This structural alignment allows us to automatically bucket these logs into the same cluster.</p>
<p>The third log, however, produces a completely divergent fingerprint (<code>0-0-0...</code>). This allows us to algorithmically separate it from the first group before we ever invoke an LLM.</p>
<h2>Bonus Part: Instant Implementation with ES|QL</h2>
<p>It’s as easy as passing this query in Discover.</p>
<pre><code class="language-esql">FROM loghub |
EVAL pattern = REPLACE(REPLACE(REPLACE(REPLACE(raw_message, &quot;[ \t\n]+&quot;, &quot; &quot;), &quot;[A-Za-z]+&quot;, &quot;a&quot;), &quot;[0-9]+&quot;, &quot;0&quot;), &quot;a( a)+&quot;, &quot;a&quot;) |
STATS total_count = COUNT(), ratio = COUNT() / 2000.0, datasources=VALUES(filename), example=TOP(raw_message, 3, &quot;desc&quot;) BY SUBSTRING(pattern, 0, 15) |
SORT total_count DESC |
LIMIT 100
</code></pre>
<p><strong>Query breakdown:</strong></p>
<p><strong>FROM</strong> loghub: Targets our index containing the raw log data.</p>
<p><strong>EVAL</strong> pattern = …: The core mapping logic. We chain REPLACE functions to perform the abstraction (e.g., digits to '0', text to 'a', etc.) and save the result in a “pattern” field.</p>
<p><strong>STATS</strong> [column1 =] expression1, … <strong>BY</strong> SUBSTRING(pattern, 0, 15):</p>
<p>This is a clustering step. We group logs that share the first 15 characters of their pattern and create aggregated fields such as total log count per group, list of log datasources, pattern prefix, 3 log examples</p>
<p><strong>SORT</strong> total_count DESC | <strong>LIMIT</strong> 100: Surfaces the top 100 most frequent log patterns.</p>
<p>The query results on LogHub are displayed below:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/automated-log-parsing-ml-streams/query-results.png" alt="" />
<img src="https://www.elastic.co/observability-labs/assets/images/automated-log-parsing-ml-streams/results.png" alt="" /></p>
<p>As demonstrated in the visualization, this “LLM-free” approach partitions logs with high accuracy. It completely clustered 10 out of 16 data sources (based on LogHub labels) at &gt;90% and achieved majority clustering (&gt;60%) in 13 out of 16 sources, all without requiring additional cleaning, preprocessing, or fine-tuning.</p>
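<p>Outside of Elasticsearch, the same prefix clustering can be reproduced in a few lines of Python (an illustrative sketch over the Apache and Spark sample lines from earlier):</p>

```python
import re
from collections import defaultdict

def fingerprint(line: str) -> str:
    """Log format fingerprint: digits -> '0', letter runs -> 'a', symbols kept."""
    line = re.sub(r"\s+", " ", line)
    line = re.sub(r"[A-Za-z]+", "a", line)
    line = re.sub(r"[0-9]+", "0", line)
    return re.sub(r"a( a)+", "a", line)

logs = [
    "[Sun Dec 04 20:34:21 2005] [notice] jk2_init() Found child 2008 in scoreboard slot 6",
    "[Mon Dec 05 11:06:51 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "17/06/09 20:10:58 INFO mapred.SparkHadoopMapRedUtil: attempt_201706092018_0024_m_000083_1138: Committed",
]

# Group on the 15-character pattern prefix, like STATS ... BY SUBSTRING(pattern, 0, 15)
clusters = defaultdict(list)
for line in logs:
    clusters[fingerprint(line)[:15]].append(line)

for prefix, members in sorted(clusters.items(), key=lambda kv: -len(kv[1])):
    print(f"{len(members):>3}  {prefix!r}")
```

<p>The two Apache lines land in one bucket and the Spark line in another, with no model call involved.</p>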
<p>Log format fingerprinting (LFF) offers a pragmatic, high-impact alternative and complement to sophisticated ML solutions like <a href="https://www.elastic.co/docs/reference/aggregations/search-aggregations-bucket-categorize-text-aggregation">log pattern analysis</a>. It provides immediate insights into log relationships and effectively manages large log clusters.</p>
<ul>
<li>Versatility as a primitive</li>
</ul>
<p>Thanks to its <a href="https://www.elastic.co/blog/getting-started-elasticsearch-query-language">ES|QL</a> implementation, LFF serves both as a standalone tool for fast data diagnostics and visualizations, and as a building block in log analysis pipelines for high-volume use cases.</p>
<ul>
<li>Flexibility</li>
</ul>
<p>LFF is easy to customize and extend to capture specific patterns, e.g., hexadecimal numbers and IP addresses.</p>
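<p>As a sketch of such an extension (the placeholder characters <code>#</code> and <code>@</code> are arbitrary choices that deliberately avoid the letter and digit classes so the later generic rules don’t overwrite them):</p>
<pre><code class="language-esql">FROM loghub
| EVAL pattern = REPLACE(raw_message, &quot;[ \t\n]+&quot;, &quot; &quot;)
// Collapse hex literals (e.g. 0x7f3a) before the generic digit rule
| EVAL pattern = REPLACE(pattern, &quot;0x[0-9a-fA-F]+&quot;, &quot;#&quot;)
// Collapse IPv4 addresses into a single token
| EVAL pattern = REPLACE(pattern, &quot;[0-9]{1,3}(\\.[0-9]{1,3}){3}&quot;, &quot;@&quot;)
// Then apply the generic letter/digit abstraction as before
| EVAL pattern = REPLACE(REPLACE(REPLACE(pattern, &quot;[A-Za-z]+&quot;, &quot;a&quot;), &quot;[0-9]+&quot;, &quot;0&quot;), &quot;a( a)+&quot;, &quot;a&quot;)
</code></pre>
<p>The ordering matters: the hex and IP rules must run before the generic digit rule, otherwise the digits would already have been collapsed to &quot;0&quot;.</p>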
<ul>
<li>Deterministic stability</li>
</ul>
<p>Unlike ML-based clustering algorithms, LFF logic is straightforward and deterministic. New incoming logs do not retroactively affect existing log clusters.</p>
<ul>
<li>Performance and Memory</li>
</ul>
<p>It requires minimal memory and no training or GPU, making it ideal for real-time, high-throughput environments.</p>
<h2>Combining Log Format Fingerprint with an LLM</h2>
<p>To validate the proposed hybrid architecture, we ran each experiment on a random 20% subset of the logs from each data source. This constraint simulates a real-world production environment where logs are processed in batches rather than as a monolithic historical dump.</p>
<p>The objective was to demonstrate that LFF acts as an effective compression layer. We aimed to prove that high-coverage parsing rules could be generated from small, curated samples and successfully generalized to the entire dataset.</p>
<h2>Execution Pipeline</h2>
<p>We implemented a multi-stage pipeline that filters, clusters, and applies stratified sampling to the data before it reaches the LLM.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/automated-log-parsing-ml-streams/automating-log-parsing-ai-excecusion-pipeline.png" alt="" /></p>
<ol>
<li>Two-stage hierarchical clustering</li>
</ol>
<ul>
<li>Subclasses (exact match): Logs are aggregated by identical fingerprints. Every log in one subclass shares the exact same format structure.</li>
<li>Outlier cleaning: We discard any subclasses that represent less than 5% of the total log volume. This ensures the LLM focuses on the dominant signal and won’t be sidetracked by noise or malformed logs.</li>
<li>Metaclasses (prefix match): Remaining subclasses are grouped into metaclasses by matching the first N characters of the format fingerprint. This grouping strategy effectively brings lexically similar formats under a single umbrella. We chose N=5 for log parsing and N=15 for log partitioning when data sources are unknown.</li>
</ul>
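<p>Assuming the same <code>loghub</code> index and fingerprint logic from the earlier query, the two clustering stages can be sketched in a single ES|QL query (the 5% outlier threshold is expressed here as an absolute count for a 2,000-log sample):</p>
<pre><code class="language-esql">FROM loghub
// Recompute the format fingerprint
| EVAL pattern = REPLACE(REPLACE(REPLACE(REPLACE(raw_message, &quot;[ \t\n]+&quot;, &quot; &quot;), &quot;[A-Za-z]+&quot;, &quot;a&quot;), &quot;[0-9]+&quot;, &quot;0&quot;), &quot;a( a)+&quot;, &quot;a&quot;)
// Stage 1 - subclasses: exact fingerprint match
| STATS subclass_count = COUNT() BY pattern
// Outlier cleaning: drop subclasses below 5% of a 2,000-log sample
| WHERE subclass_count &gt;= 100
// Stage 2 - metaclasses: prefix match on the first N=5 characters
| STATS metaclass_count = SUM(subclass_count), subclasses = COUNT() BY metaclass = SUBSTRING(pattern, 0, 5)
| SORT metaclass_count DESC
</code></pre>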
<ol start="2">
<li>Stratified sampling. Once the hierarchical tree is built, we construct the log sample for the LLM. The strategic goal is to maximize variance coverage while minimizing token usage.</li>
</ol>
<ul>
<li>We select representative logs from each valid subclass within the broader metaclass.</li>
<li>To handle the edge case of too many subclasses, we apply random down-sampling to fit the target window size.</li>
</ul>
<ol start="3">
<li>Rule generation. Finally, we prompt the LLM to generate a regex parsing rule that fits all logs in the provided sample for each metaclass. For our PoC, we used the GPT-4o mini model.</li>
</ol>
<h2>Experimental Results &amp; Observations</h2>
<p>We achieved 94% parsing accuracy and 91% partitioning accuracy on the LogHub dataset.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/automated-log-parsing-ml-streams/automating-log-parsing-results.png" alt="" /></p>
<p>The confusion matrix above illustrates log partitioning results. The vertical axis represents the actual data sources, and the horizontal axis represents the predicted data sources. The heatmap intensity corresponds to log volume, with lighter tiles indicating a higher count. The diagonal alignment demonstrates the model's high fidelity in source attribution, with minimal scattering.</p>
<h2>Our Performance Benchmark Insights</h2>
<ul>
<li>Optimal baseline: a context window of 30–40 log samples per category proved to be the &quot;sweet spot,&quot; consistently producing robust parsing with both Regex and Grok patterns.</li>
<li>Input minimisation: we pushed the input size down to 10 logs per category for Regex patterns and observed only a 2% drop in parsing performance, confirming that diversity-based sampling is more critical than raw volume.</li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/automated-log-parsing-ml-streams/cover.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[AWS VPC Flow log analysis with GenAI in Elastic]]></title>
            <link>https://www.elastic.co/observability-labs/blog/aws-vpc-flow-log-analysis-with-genai-elastic</link>
            <guid isPermaLink="false">aws-vpc-flow-log-analysis-with-genai-elastic</guid>
            <pubDate>Fri, 07 Jun 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic has a set of embedded capabilities such as a GenAI RAG-based AI Assistant and a machine learning platform as part of the product baseline. These make analyzing the vast number of logs you get from AWS VPC Flows easier.]]></description>
            <content:encoded><![CDATA[<p>Elastic Observability provides a full observability solution by supporting metrics, traces, and logs for applications and infrastructure. In managing AWS deployments, VPC Flow logs are critical for performance, network visibility, security, compliance, and overall management of your AWS environment. Several examples of what they provide:</p>
<ol>
<li>
<p>Where traffic is coming in from and going out to from the deployment, and within the deployment. This helps identify unusual or unauthorized communications</p>
</li>
<li>
<p>Traffic volumes detecting spikes or drops which could indicate service issues in production or an increase in customer traffic</p>
</li>
<li>
<p>Latency and performance bottlenecks - with VPC Flow logs, you can look at latency for a flow (inbound and outbound) and understand patterns</p>
</li>
<li>
<p>Accepted and rejected traffic helps determine where potential security threats and misconfigurations lie. </p>
</li>
</ol>
<p>AWS VPC Flow logs are a great example of how valuable logs can be. Logging is an important part of observability, alongside metrics and tracing. The volume of logs that an application and its underlying infrastructure produce can be daunting with VPC Flow logs; however, they also provide a significant amount of insight.</p>
<p>Before we proceed, it is important to understand what Elastic provides in managing AWS and VPC Flow logs:</p>
<ol>
<li>
<p>A full set of integrations to manage VPC Flows and the <a href="https://www.elastic.co/observability-labs/blog/aws-service-metrics-monitor-observability-easy">entire end-to-end deployment on AWS</a>. </p>
</li>
<li>
<p>Elastic has a simple-to-use <a href="https://www.elastic.co/observability-labs/blog/aws-kinesis-data-firehose-observability-analytics">AWS Firehose integration</a>. </p>
</li>
<li>
<p>Elastic’s tools such as <a href="https://www.elastic.co/observability-labs/blog/vpc-flow-logs-monitoring-analytics-observability">Discover, spike analysis,  and anomaly detection help provide you with better insights and analysis</a>.</p>
</li>
<li>
<p>And a set of simple <a href="https://www.elastic.co/guide/en/observability/current/monitor-amazon-vpc-flow-logs.html#aws-firehose-dashboard">Out-of-the-box dashboards</a></p>
</li>
</ol>
<p>In today’s blog, we’ll cover how Elastic’s other features can make analyzing VPC Flow logs and performing root cause analysis even easier. Specifically, we will focus on managing the number of rejects, as this helps ensure there weren’t any unauthorized or unusual activities:</p>
<ol>
<li>
<p>Set up an easy-to-use SLO (newly released) to detect when things are potentially degrading</p>
</li>
<li>
<p>Create an ML job to analyze different fields of the VPC Flow log</p>
</li>
<li>
<p>Use our newly released RAG-based AI Assistant to help analyze the logs without needing to know Elastic’s query language or even how to create graphs in Elastic</p>
</li>
<li>
<p>Use ES|QL to help understand and analyze latency patterns.</p>
</li>
</ol>
<p>In subsequent blogs, we will use the AI Assistant and ES|QL to show how to get other insights beyond just REJECT/ACCEPT from VPC Flow logs.</p>
<h2>Prerequisites and config</h2>
<p>If you plan on following this blog, here are some of the components and details we used to set up this demonstration:</p>
<ul>
<li>
<p>Ensure you have an account on <a href="http://cloud.elastic.co">Elastic Cloud</a> and a deployed stack (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>).</p>
</li>
<li>
<p>Follow the steps in the following blog to get <a href="https://github.com/aws-samples/aws-three-tier-web-architecture-workshop">AWS’s three-tier app</a> installed as instructed in the repository, and bring in the <a href="https://www.elastic.co/observability-labs/blog/aws-kinesis-data-firehose-observability-analytics">AWS VPC Flow logs</a>.</p>
</li>
<li>
<p>Ensure you have an <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-settings.html">ML node configured</a> in your Elastic stack</p>
</li>
<li>
<p>To use the AI Assistant, you will need a trial or an upgrade to Platinum.</p>
</li>
</ul>
<h2>SLO with VPC Flow Logs</h2>
<p>Elastic’s SLO capability is based directly on the Google SRE Handbook, and all the definitions and semantics are used as described there. Users can do the following with SLOs in Elastic:</p>
<ul>
<li>Define an SLO on logs, not just metrics - users can use a KQL (log-based) query, service availability, service latency, a custom metric, a histogram metric, or a timeslice metric.</li>
<li>Define SLOs, SLIs, error budgets, and burn rates. Users can also choose occurrence-based versus timeslice-based budgeting.</li>
<li>Manage, with dashboards, all the SLOs in a singular location.</li>
<li>Trigger alerts from the defined SLO, whether the SLI target is missed, the error budget is exhausted, or the burn rate exceeds a threshold.</li>
</ul>
<p>Setting up an SLO for VPC Flow logs is easy. You simply create a query for the events you want to measure. In our case, we look for all the good events where <em>aws.vpcflow.action=ACCEPT</em> and define the target at 85%.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/aws-vpc-flow-log-analysis-with-genai-elastic/VPCFlowSLOsetup.png" alt="Setting up SLO for VPC FLow log" /></p>
<p>As the following example shows, over the last 7 days, we have exceeded our error budget by 43% and have been out of compliance for the entire period.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/aws-vpc-flow-log-analysis-with-genai-elastic/VPCFlowSLOMiss.png" alt="VPC Flow Reject SLO" /></p>
<h2>Analyzing the SLO with AI Assistant</h2>
<p>Now that we see there is an issue with the VPC Flows, we can immediately work with the AI Assistant to start analyzing the SLO. Because it's a chat interface, we simply open the AI Assistant and work through some simple analysis (see the animated demo below):</p>
<h3>AI Assistant analysis:</h3>
<ul>
<li>
<p><strong>What were the top 3 source.address values that had <em>aws.vpcflow.action=REJECT</em> over the last 7 days, which is causing this SLO issue?</strong> - We simply wanted to see what could be causing the loss in error budget. Were there any particular source.addresses causing a heavy reject rate?</p>
<ul>
<li>
<p>The answer: A table with the highest count = 42670 and <em>source.address = 79.110.62.185</em></p>
</li>
<li>
<p>There is one singular <em>source.address</em> that is causing the loss in SLO. </p>
</li>
</ul>
</li>
<li>
<p><strong>What is the largest number of <em>aws.vpcflow.action=REJECT</em> in a 30 min time frame for the last 3 days where the <em>source.address=79.110.62.185</em>?</strong> - After understanding that a specific source.address is causing the loss in SLO, we want to understand the peak volumes.</p>
<ul>
<li>
<p><strong>The answer:</strong> “The largest number of <em>aws.vpcflow.action=REJECT</em> in a 30-minute time frame for the last 3 days where the <em>source.address</em> is 79.110.62.185 is 229. This occurred on 2024-06-01T04:00:00.000Z.”</p>
</li>
<li>
<p>This suggests a low but fairly consistent REJECT rate, rather than a spiky one, over the last 7 days.</p>
</li>
</ul>
</li>
<li>
<p><strong>for the logs with <em>source.address</em>=&quot;79.110.62.185&quot; was there any country code of <em>source.geo.country_iso_code</em> field present. If yes what is the value</strong> - Having established a consistent reject rate, we next wanted to know where this traffic originates.</p>
<ul>
<li><strong>The answer:</strong> Yes, there is a country code present in the <em>source.geo.country_iso_code</em> field for logs with <em>source.address</em>=&quot;79.110.62.185&quot;. The value is BG (Bulgaria).</li>
</ul>
</li>
<li>
<p><strong>Is there a specific destination.address where <em>source.address=79.110.62.185</em> is getting a <em>aws.vpcflow.action=REJECT</em>. Give me both the destination.address and the number of REJECTs for that destination.address?</strong></p>
<ul>
<li><strong>The answer:</strong> destination.address of 10.0.0.27 is giving a reject number of 53433 in this time frame.</li>
</ul>
</li>
<li>
<p><strong>Graph the number of REJECT vs ACCEPT for <em>source.address</em>=&quot;79.110.62.185&quot; over the last 7 days. The graph is on a daily basis in a singular graph</strong> - We asked this question to see what the comparison is between ACCEPT and REJECT. </p>
<ul>
<li><strong>The answer:</strong> See the animated GIF to see that the generated graph is fairly stable</li>
</ul>
</li>
<li>
<p><strong>Were there any source.address values that had a spike (high reject rate) in a 30 min period over the last 30 days?</strong> - We wanted to see if there were any other spikes.</p>
<ul>
<li><strong>The answer</strong> - Yes, there was a source.address that had a spike in high reject rates in a 30-minute period over the last 30 days. <em>source.address</em>: 185.244.212.67, Reject Count: 8975, Time Period: 2024-05-22T03:00:00.000Z</li>
</ul>
</li>
</ul>
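<p>For readers who prefer to query directly, the first question roughly translates into ES|QL like this (the index pattern <code>logs-aws.vpcflow-*</code> is an assumption and depends on how your VPC Flow logs are ingested):</p>
<pre><code class="language-esql">FROM logs-aws.vpcflow-*
| WHERE aws.vpcflow.action == &quot;REJECT&quot; AND @timestamp &gt; NOW() - 7 days
| STATS reject_count = COUNT(*) BY source.address
| SORT reject_count DESC
| LIMIT 3
</code></pre>
<p>The AI Assistant generates and runs a query equivalent to this on your behalf, which is exactly why no query-language knowledge is required.</p>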
<hr />
<h3>Watch the flow</h3>
&lt;Video vidyardUuid=&quot;1jvEpzfkci9j6AoL42XWA3&quot; /&gt;
<h3>Potential issue:</h3>
<p>The server handling requests from source <strong><em>79.110.62.185</em></strong> is potentially having an issue.</p>
<p>Again using logs, we essentially asked the AI Assistant to give the <em>eni</em> IDs where the internal IP address was 10.0.0.27.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/aws-vpc-flow-log-analysis-with-genai-elastic/VPCFlow-findingwebserver.png" alt="Finding the issue - webserver" /></p>
<p>From our AWS console, we know that this is the webserver. After further analysis in Elastic, and with the developers, we realized that a recently installed new version was causing a problem with connections.</p>
<h2>Locating anomalies with ML</h2>
<p>While using the AI Assistant is great for analyzing information, another important aspect of VPC flow management is to ensure you can manage log spikes and anomalies. Elastic has a machine learning platform that allows you to develop jobs to analyze specific metrics or multiple metrics to look for anomalies.</p>
<p>VPC Flow logs come with a large amount of information. The full set of fields is listed in <a href="https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html#flow-logs-basics">AWS docs</a>. We will use a specific subset to help detect anomalies.</p>
<p>We set up anomaly detection for aws.vpcflow.action=REJECT, which requires us to use multi-metric anomaly detection in Elastic.</p>
<p>The config we used utilizes:</p>
<p>Detectors:</p>
<ul>
<li>
<p>destination.address</p>
</li>
<li>
<p>destination.port</p>
</li>
</ul>
<p>Influencers:</p>
<ul>
<li>
<p>source.address</p>
</li>
<li>
<p>aws.vpcflow.action</p>
</li>
<li>
<p>destination.geo.region_iso_code</p>
</li>
</ul>
<p>The way we set this up will help us understand if there is a large spike in REJECT/ACCEPT against <em>destination.address</em> values from a specific <em>source.address</em> and/or <em>destination.geo.region_iso_code</em> location.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/aws-vpc-flow-log-analysis-with-genai-elastic/VPCFlowanomalysetup.png" alt="Anomaly detection job config" /></p>
<p>Once run, the job reveals something interesting:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/aws-vpc-flow-log-analysis-with-genai-elastic/VPCFlowAnomalyDetection.png" alt="Anomaly detected" /></p>
<p>Notice that <em>source.address</em> 185.244.212.67 has had a high REJECT rate in the last 30 days. </p>
<p>Notice where we found this before? In the AI Assistant!</p>
<p>While we can use the AI Assistant to find this sort of anomaly, the ML job can be set up to run continuously and alert us on such spikes. This will help us understand if there are any issues with the webserver, like the one we found above, or even potential security attacks.</p>
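<p>A quick, ad-hoc version of what the anomaly job hunts for can also be expressed in ES|QL - counting REJECTs per source in 30-minute windows (again assuming a <code>logs-aws.vpcflow-*</code> index pattern):</p>
<pre><code class="language-esql">FROM logs-aws.vpcflow-*
| WHERE aws.vpcflow.action == &quot;REJECT&quot; AND @timestamp &gt; NOW() - 30 days
| STATS rejects = COUNT(*) BY window = BUCKET(@timestamp, 30 minutes), source.address
| SORT rejects DESC
| LIMIT 10
</code></pre>
<p>Unlike the ML job, this spot-check has no model of what is &quot;normal&quot; for each source, which is why continuous anomaly detection remains the better fit for alerting.</p>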
<h2>Conclusion:</h2>
<p>You’ve now seen how easily Elastic’s RAG-based AI Assistant can help analyze VPC Flows without needing to know the query syntax, where the data is, or even the fields. Additionally, you’ve seen how we can alert you when there is a potential issue or a degradation in service (SLO). Check out our other blogs on AWS VPC Flow analysis in Elastic:</p>
<ol>
<li>
<p>A full set of integrations to manage VPC Flows and the <a href="https://www.elastic.co/observability-labs/blog/aws-service-metrics-monitor-observability-easy">entire end-to-end deployment on AWS</a>. </p>
</li>
<li>
<p>Elastic has a simple-to-use <a href="https://www.elastic.co/observability-labs/blog/aws-kinesis-data-firehose-observability-analytics">AWS Firehose integration</a>. </p>
</li>
<li>
<p>Elastic’s tools such as <a href="https://www.elastic.co/observability-labs/blog/vpc-flow-logs-monitoring-analytics-observability">Discover, spike analysis,  and anomaly detection help provide you with better insights and analysis</a>.</p>
</li>
<li>
<p>And a set of simple <a href="https://www.elastic.co/guide/en/observability/current/monitor-amazon-vpc-flow-logs.html#aws-firehose-dashboard">Out-of-the-box dashboards</a></p>
</li>
</ol>
<h2>Try it out</h2>
<p>Existing Elastic Cloud customers can access many of these features directly from the <a href="https://cloud.elastic.co/">Elastic Cloud console</a>. Not taking advantage of Elastic on the cloud? <a href="https://www.elastic.co/cloud/cloud-trial-overview">Start a free trial</a>.</p>
<p>All of this is also possible in your environment. <a href="https://www.elastic.co/observability/universal-profiling">Learn how to get started today</a>.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/aws-vpc-flow-log-analysis-with-genai-elastic/21-cubes.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Elastic Observability: Streams Data Quality and Failure Store Insights]]></title>
            <link>https://www.elastic.co/observability-labs/blog/data-quality-and-failure-store-in-streams</link>
            <guid isPermaLink="false">data-quality-and-failure-store-in-streams</guid>
            <pubDate>Tue, 18 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Discover how Streams, a new AI-driven Elastic Observability feature, helps manage data quality with a failure store so you can monitor, troubleshoot, and retain high-quality data.]]></description>
            <content:encoded><![CDATA[<p>When working with observability and logging data, not all documents make it into Elasticsearch in pristine condition. Some may be dropped due to processing failures in ingest pipelines or mapping errors, while others may be partially ingested with ignored fields if a field’s value is incompatible with the defined mappings. These issues can impact downstream analysis and dashboards. Streams data quality makes it easier than ever to monitor the health of your ingested data, identify potential issues, and take corrective action right from the UI. With data quality, you can now see exactly how well your Stream is performing and quickly understand whether your data has <strong>Good</strong>, <strong>Degraded</strong>, or <strong>Poor</strong> quality.</p>
<h2>What's in data quality</h2>
<p><img src="https://www.elastic.co/observability-labs/assets/images/data-quality-and-failure-store-in-streams/data-quality-tab.png" alt="Data quality tab" /></p>
<h3>At-a-glance summary</h3>
<p>The summary card shows:</p>
<ul>
<li><strong>Degraded documents</strong> - Documents that contain the <code>_ignored</code> field - see <a href="https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-ignored-field">this</a> for more info.</li>
<li><strong>Failed documents</strong> - Documents that were rejected at ingestion due to mapping conflicts or pipeline failures.</li>
</ul>
<p>The overall <strong>quality score</strong> (Good, Degraded, Poor) is automatically calculated based on the percentage of degraded and failed documents.</p>
<h3>Trends over time</h3>
<p>The tab includes a time-series chart so you can track how degraded and failed documents are accumulating over time. Use the <strong>date picker</strong> to zoom into a specific range and understand when problems are spiking.</p>
<h3>Quality issues table</h3>
<p>A detailed table lists the types of issues affecting your stream. For each issue, you can:</p>
<ul>
<li>See which fields are causing problems.</li>
<li>Review counts of affected documents.</li>
<li>Filter by issues that have not been solved yet (Current issues only).</li>
<li>Open a <strong>flyout</strong> to dive deeper into the cause of the issue and learn how to fix it.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/data-quality-and-failure-store-in-streams/quality-issue-flyout.png" alt="Data quality issue flyout" /></p>
<h2>Monitoring degraded documents</h2>
<p>A degraded document is one that contains the <code>_ignored</code> field, which means one or more of its fields were ignored during indexing. One of the reasons could be that their values didn’t match the expected mappings. While the rest of the document is still indexed, a high number of degraded documents can affect query results, dashboards, and overall observability accuracy.</p>
<p>To help keep these issues under control, the Data quality tab provides visibility into the percentage of degraded documents in your stream.</p>
<h3>Set up a rule to stay ahead of issues</h3>
<p>You can use the <strong>Create rule</strong> button above the Degraded docs chart to define an alert that notifies you when the percentage of degraded documents crosses a certain threshold. This makes it easy to proactively monitor for mapping mismatches and ensure your data continues to meet quality expectations.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/data-quality-and-failure-store-in-streams/create-rule-button.png" alt="Create rule button" /></p>
<p>For more information on how to configure this rule, see <a href="https://www.elastic.co/docs/solutions/observability/incident-management/create-a-degraded-docs-rule#degraded-docs-rule-conditions">Degraded docs rule conditions</a>.</p>
<h2>Handling failed documents with the failure store</h2>
<p><a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/failure-store"><strong>Failure store</strong></a> is a special index that captures documents rejected during ingestion. Instead of losing this data, the failure store retains it in a dedicated <code>::failures</code> index, allowing you to inspect the problematic documents, understand what went wrong, and fix the underlying issues.</p>
<p>In the Data quality tab, failed documents are only visible if your stream has a failure store enabled; to view failure store documents, you need at least <code>read_failure_store</code> privileges. If the failure store is <strong>not enabled</strong>, you’ll see an <strong>“Enable failure store”</strong> link that opens a modal to configure it and set the retention period. To enable the failure store, you need <code>manage_failure_store</code> privileges on the specific data stream. For further information about failure store security, refer to <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/failure-store#use-failure-store-searching">Searching failures</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/data-quality-and-failure-store-in-streams/enable-fs-link.png" alt="Enable failure store link" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/data-quality-and-failure-store-in-streams/failure-store-modal.png" alt="Failure store configuration modal" /></p>
<p>Once enabled, you can <strong>edit the failure store configuration</strong> or disable it at any time using the <strong>Edit</strong> button above the failed docs chart.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/data-quality-and-failure-store-in-streams/edit-fs-button.png" alt="Edit failure store button" /></p>
<p>The failure store can also be configured in the Streams Retention tab - see <a href="https://www.elastic.co/observability-labs/blog/simplifying-retention-management-with-streams.mdx">this article</a> for more information.</p>
<h2>Technical implementation</h2>
<p>Under the hood, the <strong>Data quality</strong> tab builds on the existing <strong>Dataset quality</strong> plugin - the same one that powers the <a href="https://www.elastic.co/docs/solutions/observability/data-set-quality-monitoring"><strong>Dataset quality page</strong></a> in <strong>Stack Management</strong>. However, instead of working in the context of datasets following the Data stream naming scheme, it’s now tailored specifically for <strong>streams</strong>.</p>
<p>To determine the quality of a stream, the UI sends three <strong>ES|QL</strong> queries to the server:</p>
<ol>
<li><strong>All documents (including failures):</strong></li>
</ol>
<pre><code class="language-esql">FROM myStream, myStream::failures | STATS doc_count = COUNT(*)
</code></pre>
<ol start="2">
<li><strong>Failed documents only:</strong></li>
</ol>
<pre><code class="language-esql">FROM myStream::failures | STATS failed_doc_count = COUNT(*)
</code></pre>
<ol start="3">
<li><strong>Degraded documents:</strong></li>
</ol>
<pre><code class="language-esql">FROM myStream METADATA _ignored | WHERE _ignored IS NOT NULL | STATS degraded_doc_count = COUNT(*)
</code></pre>
<p>The results of these queries are then used to calculate the <strong>percentages</strong> of failed and degraded documents. The overall data quality is determined using simple thresholds:</p>
<ul>
<li><strong>Good:</strong> Both percentages are 0%</li>
<li><strong>Degraded:</strong> Any percentage is greater than 0% but less than 3%</li>
<li><strong>Poor:</strong> Any percentage is above 3%</li>
</ul>
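<p>On ES|QL versions that support per-aggregation <code>WHERE</code> filters, the degraded percentage can be sketched in a single query (a simplification that leaves the failure store out of the denominator):</p>
<pre><code class="language-esql">FROM myStream METADATA _ignored
| STATS degraded_doc_count = COUNT(*) WHERE _ignored IS NOT NULL, doc_count = COUNT(*)
| EVAL degraded_pct = 100.0 * degraded_doc_count / doc_count
</code></pre>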
<p>For managing the <strong>failure store</strong>, Streams uses the <a href="https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-stream-options">Update data stream options API</a> with the <code>failure_store</code> parameter to configure and update the failure store settings, including enabling the store and setting the retention period.</p>
<h2>Why you’ll love this</h2>
<p>The new <strong>Data quality</strong> tab gives you:</p>
<ul>
<li>Visibility into ingestion problems without digging into logs</li>
<li>A clear breakdown of degraded vs. failed documents</li>
<li>Insights into which fields are ignored and why</li>
<li>Tools to capture and troubleshoot failed docs with the failure store</li>
</ul>
<p>By surfacing data quality issues directly in the Streams UI, we’re making it easier to keep your data flowing reliably and to ensure your analytics are built on a strong foundation.</p>
<h2><strong>Try it out today</strong></h2>
<p>The <strong>data quality</strong> feature is available in <strong>Elastic Observability on Serverless</strong>, and coming soon for self-managed and Elastic Cloud users.</p>
<p>Sign up for an Elastic trial at <a href="http://cloud.elastic.co">cloud.elastic.co</a> and try Elastic's Serverless offering, which will allow you to play with all of the Streams functionality.</p>
<p>For more information on Streams:</p>
<p><em>Read about</em> <a href="https://www.elastic.co/observability-labs/blog/reimagine-observability-elastic-streams"><em>Reimagining streams</em></a></p>
<p><em>Look at the</em> <a href="http://elastic.co/elasticsearch/streams"><em>Streams website</em></a></p>
<p><em>Read the</em> <a href="https://www.elastic.co/docs/solutions/observability/streams/streams"><em>Streams documentation</em></a></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/data-quality-and-failure-store-in-streams/article.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Getting started with the Elastic AI Assistant for Observability and Amazon Bedrock]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-amazon-bedrock</link>
            <guid isPermaLink="false">elastic-ai-assistant-observability-amazon-bedrock</guid>
            <pubDate>Fri, 03 May 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Follow this step-by-step process to get started with the Elastic AI Assistant for Observability and Amazon Bedrock.]]></description>
            <content:encoded><![CDATA[<p>Elastic recently released version <a href="https://www.elastic.co/blog/whats-new-elastic-8-13-0">8.13, which includes the general availability of Amazon Bedrock integration for the Elastic AI Assistant for Observability</a>. This blog post will walk through the step-by-step process of setting up the Elastic AI Assistant with Amazon Bedrock. Then, we’ll show you how to add content to the AI Assistant’s knowledge base to demonstrate how the power of Elasticsearch combined with Amazon Bedrock can supercharge the answers Elastic AI Assistant provides so that they are uniquely specific to your needs.</p>
<p>Managing applications and the infrastructure they run on requires advanced observability into the diverse types of data involved, like logs, traces, profiles, and metrics. General-purpose generative AI large language models (LLMs) offer a new capability: human-readable guidance for your observability questions. However, they have limitations. Specifically, when it comes to providing answers about your application’s distinct observability data, like real-time metrics, LLMs require additional context to produce answers that will actually help resolve issues. This is a limitation that the Elastic AI Assistant for Observability can uniquely solve.</p>
<p>Elastic Observability, serving as a central datastore of all the observability data flowing from your application, combined with the Elastic AI Assistant gives you the ability to generate a context window that can inform an LLM’s responses and vastly improve the answers it provides. For example, when you ask the Elastic AI Assistant a question about a specific issue happening in your application, it gathers up all the relevant details — current errors captured from logs or a related runbook that your team has stored in the Elastic AI Assistant’s knowledge base. Then, it sends that information to the Amazon Bedrock LLM as a context window from which it can better answer your observability questions.</p>
<p>Read on to follow the steps for setting up the Elastic AI Assistant for yourself.</p>
<h2>Set up the Elastic AI Assistant for Observability: Create an Amazon Bedrock connector in Elastic Cloud</h2>
<p>Start by creating an Elastic Cloud 8.13 deployment via the <a href="https://aws.amazon.com/marketplace/pp/prodview-voru33wi6xs7k">AWS marketplace</a>. If you’re a new user of Elastic Cloud, you can create a new deployment with a 7-day free trial.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/1.png" alt="1" /></p>
<p>Sign in to the Elastic Cloud deployment you’ve created. From the top level menu, select <strong>Stack Management</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/2.png" alt="2" /></p>
<p>Select <strong>Connectors</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/3.png" alt="3" /></p>
<p>Click the <strong>Create connector</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/4.png" alt="4" /></p>
<h2>Enable Amazon Bedrock model access</h2>
<p>To populate the required connector settings, first enable Amazon Bedrock model access in the AWS console using the following steps.</p>
<p>In a new browser tab, open <a href="https://console.aws.amazon.com/bedrock/">Amazon Bedrock</a> and click the <strong>Get started</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/5.png" alt="5" /></p>
<p>Currently, access to the Amazon Bedrock foundation models is granted by requesting access using the Bedrock <strong>Model access</strong> section in the AWS console.</p>
<p>Select <strong>Model access</strong> from the navigation menu.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/6.png" alt="6" /></p>
<p>To request access, select the foundation models that you want to access and click the <strong>Save Changes</strong> button. For this blog post, we will choose the Anthropic Claude models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/7.png" alt="7" /></p>
<p>Once your request is approved, the <strong>Manage model access</strong> page will indicate that access has been granted.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/8.png" alt="8" /></p>
<h3>Create AWS IAM User</h3>
<p>Create an <a href="https://aws.amazon.com/iam/">IAM</a> user, assign it a role with <a href="https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonBedrockFullAccess.html">Amazon Bedrock full access</a>, and <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html">generate an IAM access key and secret key</a> in the console. Copy the keys; you’ll enter them in the Elastic connector shortly. If you already have an IAM user with a generated access key and secret key, you can use those existing credentials to access Amazon Bedrock.</p>
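<p>Before entering the keys into the Elastic connector, you can sanity-check that they can actually reach Bedrock. The following is a minimal sketch (not part of Elastic's setup flow, and all of it my own naming) that assumes boto3 is installed, that you were granted access to the <code>anthropic.claude-v2</code> model, and that the region matches where you enabled model access:</p>

```python
# A quick sanity check (a sketch, not part of Elastic's setup flow): verify the
# new IAM keys can invoke a Claude model on Amazon Bedrock before pasting them
# into the Elastic connector. Model ID and region are assumptions; adjust both.
import json


def claude_request_body(prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON body for Anthropic Claude's text-completion format."""
    return json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    })


def check_bedrock_access(access_key: str, secret_key: str, region: str = "us-east-1") -> str:
    import boto3  # lazy import so the payload helper is usable without AWS deps

    client = boto3.client(
        "bedrock-runtime",
        region_name=region,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    response = client.invoke_model(
        modelId="anthropic.claude-v2",
        body=claude_request_body("Reply with the single word OK."),
    )
    # The response body is a stream of JSON; "completion" holds the model's text.
    return json.loads(response["body"].read())["completion"]
```

<p>The request-body helper is kept separate from the network call so the payload format can be checked without any AWS credentials at hand.</p>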
<h3>Configure Elastic connector to use Amazon Bedrock</h3>
<p>Back in the Elastic Cloud deployment’s <strong>Create connector</strong> flyout, select the Amazon Bedrock connector.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/9.png" alt="9" /></p>
<p>Enter a <strong>Name</strong> of your choice for the connector. Also, enter the <strong>Access Key</strong> and <strong>Key Secret</strong> that you generated in the previous step. Click the <strong>Save &amp; test</strong> button to create the connector.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/10.png" alt="10" /></p>
<p>Within the <strong>Edit Connector</strong> flyout window, click the <strong>Run</strong> button to confirm that the connector configuration is valid and can successfully connect to your Amazon Bedrock instance.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/11.png" alt="11" /></p>
<p>You should see confirmation that the connector test was successful.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/12.png" alt="12" /></p>
<h3>Add an example logs record</h3>
<p>Now that the connector is configured, let's add a logs record to demonstrate how the Elastic AI Assistant can help you to better understand the diverse types of information contained within logs.</p>
<p>Use the Elastic Dev Tools to add a single logs record. Click the top-level menu and select <strong>Dev Tools</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/13.png" alt="13" /></p>
<p>Within the console area of Dev Tools, enter the following POST statement:</p>
<pre><code class="language-json">POST /logs-elastic_agent-default/_doc
{
  &quot;message&quot;: &quot;Status(StatusCode=\&quot;BadGateway\&quot;, Detail=\&quot;Error: The server encountered a temporary error and could not complete your request\&quot;).&quot;,
  &quot;@timestamp&quot;: &quot;2024-04-21T10:33:00.884Z&quot;,
  &quot;log&quot;: {
    &quot;level&quot;: &quot;error&quot;
  },
  &quot;service&quot;: {
    &quot;name&quot;: &quot;proxyService&quot;
  },
  &quot;host&quot;: {
    &quot;name&quot;: &quot;appserver-2&quot;
  }
}
</code></pre>
<p>Then run the POST command by clicking the green <strong>Run</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/14.png" alt="14" /></p>
<p>You should see a 201 response confirming that the example logs record was successfully created.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/15.png" alt="15" /></p>
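<p>If you'd rather script this step than use Dev Tools, the same record can be indexed with the official Elasticsearch Python client. This is a sketch in which the Cloud ID and API key are placeholders you would substitute with your deployment's values:</p>

```python
# A sketch of indexing the same example record with the official Elasticsearch
# Python client; "YOUR_CLOUD_ID" and "YOUR_API_KEY" are placeholders, not real
# values.
def example_log_record() -> dict:
    """Build the demo error-log document shown in the Dev Tools step."""
    return {
        "message": (
            'Status(StatusCode="BadGateway", Detail="Error: The server '
            'encountered a temporary error and could not complete your request").'
        ),
        "@timestamp": "2024-04-21T10:33:00.884Z",
        "log": {"level": "error"},
        "service": {"name": "proxyService"},
        "host": {"name": "appserver-2"},
    }


def index_example(cloud_id: str = "YOUR_CLOUD_ID", api_key: str = "YOUR_API_KEY") -> str:
    # Lazy import so the document builder above works without the client installed.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(cloud_id=cloud_id, api_key=api_key)
    resp = es.index(index="logs-elastic_agent-default", document=example_log_record())
    return resp["result"]  # "created" when the document is accepted (HTTP 201)
```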
<h3>Use the Elastic AI Assistant</h3>
<p>Now that you have a log entry, let’s use the AI Assistant to see how it interacts with logs data. Click the top-level menu and select <strong>Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/16.png" alt="16" /></p>
<p>Select <strong>Logs Explorer</strong> under Observability.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/17.png" alt="17" /></p>
<p>In the Logs Explorer search box, enter the text “badgateway” and press the <strong>Enter</strong> key to perform the search.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/18.png" alt="18" /></p>
<p>Click the <strong>View all matches</strong> button to include all search results.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/19.png" alt="19" /></p>
<p>You should see the one log record that you previously inserted via Dev Tools. Click the expand icon in the <strong>actions</strong> column to see the log record’s details.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/20.png" alt="20" /></p>
<p>You should see the expanded view of the logs record. Let’s use the AI Assistant to summarize it. Click on the <strong>What's this message?</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/21.png" alt="21" /></p>
<p>We get a fairly generic answer back. Depending on the exception or error we're trying to analyze, this can still be really useful, but we can improve this response by adding additional documentation to the AI Assistant knowledge base.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/22.png" alt="22" /></p>
<p>Let’s add an entry in AI Assistant’s knowledge base to improve its understanding of this specific logs message.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/23.png" alt="23" /></p>
<p>Click the <strong>AI Assistant</strong> button at the top right of the window.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/24.png" alt="24" /></p>
<p>Click the <strong>Install Knowledge base</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/25.png" alt="25" /></p>
<p>Click the top-level menu and select <strong>Stack Management</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/26.png" alt="26" /></p>
<p>Then select <strong>AI Assistants</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/27.png" alt="27" /></p>
<p>Click <strong>Elastic AI Assistant for Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/28.png" alt="28" /></p>
<p>Select the <strong>Knowledge base</strong> tab.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/29.png" alt="29" /></p>
<p>Click the <strong>New entry</strong> button and select <strong>Single entry</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/30.png" alt="30" /></p>
<p>Give it the <strong>Name</strong> “proxyservice” and enter the following text as the <strong>Contents</strong>:</p>
<pre><code class="language-markdown">
I have the following runbook located on Github. Store this information in your knowledge base and always include the link to the runbook in your response if the topic is related to a bad gateway error.

Runbook Link: https://github.com/elastic/observability-aiops/blob/main/ai_assistant/runbooks/slos/502-errors.md

Runbook Title: Handling 502 Bad Gateway Errors

Summary: This is likely an issue with Nginx proxy configuration

Body: This runbook provides instructions for diagnosing and resolving 502 Bad Gateway errors in your system.
</code></pre>
<p>Click <strong>Save</strong> to save the new knowledge base entry.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/31.png" alt="31" /></p>
<p>Now let’s go back to the Observability Logs Explorer. Click the top-level menu and select <strong>Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/32.png" alt="32" /></p>
<p>Then select <strong>Explorer</strong> under <strong>Logs</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/33.png" alt="33" /></p>
<p>Expand the same logs entry as you did previously and click the <strong>What’s this message?</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/34.png" alt="34" /></p>
<p>The response you get now should be much more relevant.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/35.png" alt="35" /></p>
<h2>Try out the Elastic AI Assistant with a knowledge base filled with your own data</h2>
<p>Now you’ve seen the complete process of connecting the Elastic AI Assistant to Amazon Bedrock. You’ve also seen how to use the AI Assistant’s knowledge base to store custom remediation documentation like runbooks that the AI Assistant can leverage to generate more helpful responses. Steps like this can help you remediate issues more quickly when they happen. Try out the Elastic AI Assistant with your own logs and custom knowledge base.</p>
<p>Start a <a href="https://aws.amazon.com/marketplace/pp/prodview-voru33wi6xs7k?trk=5fbc596b-6d2a-433a-8333-0bd1f28e84da%E2%89%BBchannel=el">7-day free trial</a> by signing up via <a href="https://aws.amazon.com/marketplace/pp/prodview-voru33wi6xs7k">AWS Marketplace</a> and quickly spin up a deployment in minutes on any of the <a href="https://www.elastic.co/guide/en/cloud/current/ec-reference-regions.html#ec_amazon_web_services_aws_regions">Elastic Cloud regions on AWS</a> around the world.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-amazon-bedrock/AI_hand.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[The Elastic AI Assistant for Observability escapes Kibana!]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-escapes-kibana</link>
            <guid isPermaLink="false">elastic-ai-assistant-observability-escapes-kibana</guid>
            <pubDate>Mon, 08 Apr 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Harness the Elastic AI Assistant API to seamlessly blend Elastic's Observability capabilities into your daily workflow, from Slack to the command line, boosting efficiency and decision-making. Work smarter, not harder.]]></description>
            <content:encoded><![CDATA[<p><em>Note: The API described below is currently under development and undocumented, and thus it is not supported. Consider this a forward-looking blog. Features are not guaranteed to be released.</em></p>
<p>Elastic, time-saving assistants, generative models, APIs, Python, and the potential to show a new way of working with our technology? Of course, I would move this to the top of my project list!</p>
<p>If 2023 was the year of figuring out generative AI and retrieval augmented generation (RAG), then 2024 will be the year of productionizing generative AI RAG applications. Companies are beginning to publish references and architectures, and businesses are integrating generative applications into their lines of business.</p>
<p>Elastic is following suit by integrating not one but two AI Assistants into Kibana: one in <a href="https://www.elastic.co/guide/en/observability/current/obs-ai-assistant.html">Observability</a> and one in <a href="https://www.elastic.co/guide/en/security/current/security-assistant.html">Security</a>. Today, we will be working with the former.</p>
<h2>The Elastic AI Assistant for Observability</h2>
<p>What is the Observability AI Assistant? Allow me to <a href="https://www.elastic.co/guide/en/observability/current/obs-ai-assistant.html">quote the documentation</a>:</p>
<p><em>The AI Assistant uses generative AI to provide:</em></p>
<ul>
<li>
<p><em><strong>Contextual insights:</strong> Open prompts throughout Observability that explain errors and messages and suggest remediation. This includes your own GitHub issues, runbooks, architectural images, etc. Essentially, anything internal that is useful for the SRE and stored in Elastic can be used to suggest a resolution.</em> <a href="https://www.elastic.co/blog/sre-troubleshooting-ai-assistant-observability-runbooks"><em>Elastic AI Assistant for Observability uses RAG to get the most relevant internal information</em></a><em>.</em></p>
</li>
<li>
<p><em><strong>Chat:</strong> Have conversations with the AI Assistant. Chat uses function calling to request, analyze, and visualize your data.</em></p>
</li>
</ul>
<p>In other words, it's a chatbot built into the Observability section of Kibana, allowing SREs and operations people to perform their work faster and more efficiently. In the theme of integrating generative AI into lines of business, these AI Assistants are integrated seamlessly into Kibana.</p>
<h2>Why “escape” Kibana?</h2>
<p>Kibana is a powerful tool, offering many functions and uses. The Observability section has rich UIs for logs, metrics, APM, and more. As much as I believe people in operations, SREs, and the like can get the majority of their work done in Kibana (given Elastic is collecting the relevant data), having worked in the real world, I know just about everyone has multiple tools they work with.</p>
<p>We want to integrate with people’s workflows as much as we want them to integrate with Elastic. As such, providing API access to the AI Assistants allows Elastic to meet you where you spend most of your time. Be it Slack, Teams, or any other app that can integrate with an API.</p>
<h2>API overview</h2>
<p>Enter the AI Assistant API. The API provides most of the functionality and efficiencies the AI Assistant brings in Kibana. Since the API handles most of the functionality, it’s like having a team of developers working to improve and develop new features for you.</p>
<p>The API provides access to ask questions in natural language via <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html">ELSER</a> and a group of functions the large language model (LLM) can use to gather additional information from Elasticsearch, all out of the box.</p>
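<p>Because the API is undocumented, any client code against it is speculative. Purely to illustrate the general shape of driving a chat turn over HTTP, here is a sketch in which the endpoint path and payload fields are assumptions of mine, not the real contract:</p>

```python
# Hypothetical sketch only: the AI Assistant API is undocumented, so the
# endpoint path and payload shape below are illustrative assumptions.
import json
from typing import Optional


def build_chat_payload(question: str, conversation_id: Optional[str] = None) -> dict:
    """Assemble one chat turn: a single user message, plus an optional
    conversation id so follow-up questions keep their context."""
    payload = {"messages": [{"role": "user", "content": question}]}
    if conversation_id:
        payload["conversationId"] = conversation_id
    return payload


def ask_assistant(kibana_url: str, api_key: str, question: str) -> str:
    import requests  # lazy import; only needed for the actual HTTP call

    resp = requests.post(
        f"{kibana_url}/internal/observability_ai_assistant/chat",  # hypothetical path
        headers={
            "Authorization": f"ApiKey {api_key}",
            "kbn-xsrf": "true",  # Kibana requires this header on POST requests
            "Content-Type": "application/json",
        },
        data=json.dumps(build_chat_payload(question)),
    )
    resp.raise_for_status()
    return resp.text
```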
<h2>Command line</h2>
<p>Enough talk; let’s look at some examples!</p>
<p>The first example of using the AI Assistant outside of Kibana is on the command line. This command-line script lets you ask questions and get responses; essentially, it uses the Elastic API to bring AI Assistant interactions to your CLI, outside of Kibana. Credit for this script goes to Almudena Sanz Olivé, a senior software engineer on the Observability team. Of course, I also want to credit the rest of the development team for creating the assistant! NOTE: The AI Assistant API is not yet public, but Elastic is working on potentially releasing it. Stay tuned.</p>
<p>The script prints API information on a new line each time the LLM calls a function or Kibana runs a function to provide additional information about what is happening behind the scenes. The generated answer will also be written on a new line.</p>
<p>There are many ways to start a conversation with the AI Assistant. Let’s imagine I work for an ecommerce company and just checked some code into GitHub. I realize I need to check whether there are any active alerts that need to be worked on. Since I’m already on the command line, I can run the AI Assistant CLI and ask it to check for me.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/1.png" alt="Asking the AI Assistant to list all active alerts." /></p>
<p>There are nine active alerts. It's not the worst count I’ve seen by a long shot, but they should still be addressed. There are many ways to start here, but the one that caught my attention first was related to the SLO burn rate on the service-otel cart. This service handles our customers' checkout procedures.</p>
<p>I could ask the AI Assistant to investigate this more for me, but first, let me check if there are any runbooks our SRE team has loaded into the AI Assistant’s knowledge base.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/2.png" alt="Ask the AI Assistant to check if there are runbooks to handle issues with a service." /></p>
<p>Fantastic! I can call my co-worker Luca Wintergerst and have him fix it. While I prefer tea these days, I’ll follow step two and grab a cup of coffee.</p>
<p>With that handled, let’s go have some fun with SlackBots.</p>
<h2>Slackbots</h2>
<p>Before coming to Elastic, I worked at E*Trade, where I was on a team responsible for managing several large Elasticsearch clusters. I spent a decent amount of time working in Kibana; however, as we worked on other technologies, I spent much more time outside of Kibana. One app I usually had open was Slack. Long story short, <a href="https://www.elastic.co/elasticon/tour/2018/chicago/elastic-at-etrade">I wrote a Slackbot</a> (skip to the 05:22 mark to see a brief demo of it) that could perform many operations with Elasticsearch.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/3.png" alt="Slackbot circa 2018 reporting on Elastic ML Anomalies for trade transactions by stock symbol" /></p>
<p>This worked really well. The only problem was writing all the code, including implementing basic natural language processing (NLP). All the searches were hard-coded, and the list of tasks was static.</p>
<h3>Creating an AI Slackbot today</h3>
<p>Implementing a Slackbot with the AI Assistant's API is far more straightforward today. The interaction with the bot is the same as we saw with the command-line interface, except that we are in Slack.</p>
<p>To start things off, I created a new Slackbot and named it <em>obsBurger</em>. I’m a Bob’s Burgers fan, and observability can be considered a stack of data. The Observability Burger, obsBurger for short, was born. This is the bot that connects directly to the AI Assistant API and performs all the same functions that can be performed within Kibana.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/4.png" alt="Just like in Kibana, I can ask ObsBurger (the AI Assistant) for a list of active alerts" /></p>
<h3>More bots!</h3>
<p>Connecting my Slackbot to the AI Assistant's API was so easy that I started brainstorming ideas to entertain myself.</p>
<p>Various personas will benefit from using the AI Assistant, especially Level One (L1) operations analysts. These people are generally new to observability and would typically need a lot of mentoring by a more senior employee to ramp up quickly. We could pretend to be an L1, test the Slackbot, or have fun with LLMs and prompt engineering!</p>
<p>I created a new Slackbot called <em>opsHuman</em>. This bot connects directly to Azure OpenAI using the same model the AI Assistant is configured to use. This virtual L1 uses the system prompt instructing it to behave as such.</p>
<p>You are OpsHuman, styled as a Level 1 operations expert with limited expertise in observability.<br />
Your primary role is to simulate a beginner's interaction with Elasticsearch Observability.</p>
<p>The full prompt is much longer and instructs how the LLM should behave when interacting with our AI Assistant.</p>
<h3>Let’s see it in action!</h3>
<p>To kick off the bot’s conversation, we “@” mention opsHuman, with the trigger command shiftstart, followed by the question we want our L1 to ask the AI Assistant.</p>
<p>@OpsHuman shiftstart are there any active alerts?</p>
<p>From there, OpsHuman will take our question and start a conversation with obsBurger, the AI Assistant.</p>
<p>@ObsBurger are there any active alerts?</p>
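<p>Under the hood, the bot has to split a raw mention into the trigger command and the question before relaying it. A minimal parsing sketch (my own naming, not code from the demo repo), assuming Slack delivers the mention text with the raw user-ID form like <code>&lt;@U123ABC&gt; shiftstart ...</code>:</p>

```python
# A minimal sketch (my own naming, not the demo repo's code) of pulling the
# trigger command and question out of a Slack app_mention event's text.
import re
from typing import Optional, Tuple

# Matches "<@U123ABC> shiftstart are there any active alerts?"
MENTION = re.compile(r"^<@[^>]+>\s+(\w+)\s+(.*)$")


def parse_trigger(text: str) -> Optional[Tuple[str, str]]:
    """Return (command, question) from a mention, or None if it doesn't match."""
    match = MENTION.match(text.strip())
    if not match:
        return None
    return match.group(1), match.group(2)
```

<p>For the example above, <code>parse_trigger</code> yields the command <code>shiftstart</code> and the question to forward to the AI Assistant bot.</p>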
<p>From there, we sit back and let one of history's most advanced generative AI language models converse with itself!</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/5.png" alt="Triggering the start of a two-bot conversation." /></p>
<p>It’s fascinating to watch this conversation unfold. This is the same generative model, GPT-4-turbo, responding to two sets of API calls, with only different prompt instructions guiding the style and sophistication of the responses. When I first set this up, I watched the interaction several times, using a variety of initial questions to start the conversation. Most of the time, the L1 will spend several rounds asking questions about what the alerts mean, what a type of APM service does, and how to investigate and ultimately remediate any issue.</p>
<p>Because I initially didn’t have a way to actually stop the conversation, the two sides would agree they were happy with the conversation and investigation and get into a loop thanking the other.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/6.png" alt="Neither Slackbot wants to be the one to hang up first" /></p>
<h3>Iterating</h3>
<p>To give a little more structure to this currently open-ended demo, I set up a scenario where L1 is asked to perform an investigation, is given three rounds of interactions with obsBurger to collect information, and finally generates a summary report of the situation, which could be passed to Level 2 (note there is no L2 bot at this point in time, but you could program one!).</p>
<p>Once again, we start by having opsHuman investigate if there are any active alerts.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/7.png" alt="Starting the investigation" /></p>
<p>Several rounds of investigation are performed until our limit has been reached. At that time, it will generate a summary of the situation.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/8.png" alt="Level One, OpsHuman, summarizing the investigation" /></p>
<h2>How about something with a real-world application?</h2>
<p>As fun as watching two Slackbots talk to each other is, having an L1 speak to an AI Assistant isn’t very useful beyond a demo. So, I decided to see if I could modify opsHuman to be more beneficial for real-world applications.</p>
<p>The two main changes for this experiment were:</p>
<ol>
<li>
<p>Flip the profile of the bot from an entry-level personality to an expert.</p>
</li>
<li>
<p>Allow the number of interactions to expand, but encourage the bot to use as few as possible.</p>
</li>
</ol>
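<p>That round-limited exchange can be sketched as a simple loop, with two stubbed callables standing in for the LLM-backed bots (all names here are mine, not the demo's):</p>

```python
# A sketch of the bounded two-bot investigation: a persona bot turns each
# answer into the next question, an assistant answers, and the exchange stops
# after a fixed number of rounds. Stub callables stand in for the real bots.
from typing import Callable, List


def run_investigation(
    ask_persona: Callable[[str], str],    # turns the last answer into the next question
    ask_assistant: Callable[[str], str],  # AI Assistant: answers each question
    opening_question: str,
    max_rounds: int = 3,
) -> List[str]:
    """Alternate questions and answers for at most max_rounds, then stop."""
    transcript = []
    question = opening_question
    for _ in range(max_rounds):
        answer = ask_assistant(question)
        transcript.append(f"Q: {question}")
        transcript.append(f"A: {answer}")
        question = ask_persona(answer)
    transcript.append("-- round limit reached; ask the persona bot for a summary --")
    return transcript
```

<p>Capping the loop this way is what prevents the endless mutual thank-you spiral shown earlier: the summary step is forced once the budget of rounds runs out.</p>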
<p>With those points in mind, I cloned opsHuman into opsExpert and modified the prompt to be an expert in all things Elastic and observability.</p>
<p>You are OpsMaster, recognized as a senior operations and observability expert with extensive expertise in Elasticsearch, APM (Application Performance Monitoring), logs, metrics, synthetics, alerting, monitoring, OpenTelemetry, and infrastructure management.</p>
<p>I started with the same command: Are there any active alerts? After getting the list of alerts, OpsExpert dove into data collection for its investigation.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/9.png" alt="9 - opsexpert" /></p>
<p>After the opsBurger (the AI Assistant) provided the requested information, OpsExpert investigated two services that appeared to be the root of the alerts.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/10.png" alt="10- opsexpert standby" /></p>
<p>After several more back-and-forth requests for and deliveries of relevant information, OpsExpert reached a conclusion for the active alerts related to the checkout service and wrote up a summary report.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/11.png" alt="11 - paymentservice" /></p>
<h2>Looking forward</h2>
<p>This is just one example of what you can accomplish by bringing the AI Assistant to where you operate. You could take this one step further and have it actually open an issue on GitHub:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/12.png" alt="12. -github issue created" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/13.png" alt="13 - jeffvestal commented" /></p>
<p>Or integrate it into any other tracking platform you use!</p>
<p>The team is focused on building functionality into the Kibana integration, so this is just the beginning of the API. As time progresses, new functionality will be added. Even at a preview stage, I hope this starts you thinking about how having a fully developed Observability AI Assistant accessible by a standard API can make your work life even easier. It could get us closer to my dream of sitting on a beach handling incidents from my phone!</p>
<h2>Try it yourself!</h2>
<p>You can explore the API yourself if you’re running Elasticsearch version 8.13 or later. The demo code I used for the above examples is <a href="https://github.com/jeffvestal/obsburger">available on GitHub</a>.</p>
<p>As a reminder: as of Elastic 8.13, when this blog was written, the API is pre-beta and unsupported. Use it with care, and do not use it in production yet.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-escapes-kibana/Running_away.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-microsoft-azure-openai</link>
            <guid isPermaLink="false">elastic-ai-assistant-observability-microsoft-azure-openai</guid>
            <pubDate>Wed, 03 Apr 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Follow this step-by-step process to get started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI.]]></description>
<content:encoded><![CDATA[<p>Recently, Elastic <a href="https://www.elastic.co/blog/whats-new-elastic-observability-8-12-0">announced</a> that the AI Assistant for Observability is now generally available for all Elastic users. The AI Assistant brings a new tool to Elastic Observability, providing large language model (LLM) powered chat and contextual insights that explain errors and suggest remediation. Similar to how Microsoft Copilot is an AI companion that introduces new capabilities and increases productivity for developers, the Elastic AI Assistant is an AI companion that can help you quickly gain additional value from your observability data.</p>
<p>This blog post presents a step-by-step guide on how to set up the AI Assistant for Observability with Azure OpenAI as the backing LLM. Once the AI Assistant is set up, we’ll show you how to add documents to its knowledge base and demonstrate how the Assistant uses that knowledge base to improve its responses to specific questions.</p>
<h2>Set up the Elastic AI Assistant for Observability: Create an Azure OpenAI key</h2>
<p>Start by creating a Microsoft Azure OpenAI API key to authenticate requests from the Elastic AI Assistant. Head over to <a href="https://azure.microsoft.com/">Microsoft Azure and use an existing subscription or create a new one at the Azure portal</a>.</p>
<p>Currently, access to the Azure OpenAI service is granted by applying for access. See the <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&amp;pivots=programming-language-studio#prerequisites">official Microsoft documentation for the current prerequisites</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/1.png" alt="Watch what your data can do" /></p>
<p>In the Azure portal, select <strong>Azure OpenAI</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/2.png" alt="Azure OpenAI" /></p>
<p>In the Azure OpenAI service, click the <strong>Create</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/3.png" alt="+Create" /></p>
<p>Enter an instance <strong>Name</strong> and click <strong>Next</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/4.png" alt="Basics Next" /></p>
<p>Select your network access preference for the Azure OpenAI instance and click <strong>Next</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/5.png" alt="Network Next" /></p>
<p>Add optional <strong>Tags</strong> and click <strong>Next</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/6.png" alt="Tags Next" /></p>
<p>Confirm your settings and click <strong>Create</strong> to create the Azure OpenAI instance.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/7.png" alt="Review + submit Create" /></p>
<p>Once the instance creation is complete, click the <strong>Go to resource</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/8.png" alt="go to resource" /></p>
<p>Click the <strong>Manage keys</strong> link to access the instance’s API key.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/9.png" alt="manage keys" /></p>
<p>Copy your Azure OpenAI <strong>API Key</strong> and the <strong>Endpoint</strong> and save them both in a safe place for use in a later step.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/10.png" alt="copy to clipboard" /></p>
<p>Next, click <strong>Model deployments</strong> to create a deployment within the Azure OpenAI instance you just created.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/11.png" alt="model deployments" /></p>
<p>Click the <strong>Manage deployments</strong> button to open Azure OpenAI Studio.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/12.png" alt="manage deployments" /></p>
<p>Click the <strong>Create new deployment</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/13.png" alt="+ Create new deployment" /></p>
<p>Select the model type you want to use and enter a Deployment name. Note the Deployment name for use in a later step. Click the <strong>Create</strong> button to deploy the model.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/14.png" alt="deploy model" /></p>
<h2>Set up the Elastic AI Assistant for Observability: Create an OpenAI connector in Elastic Cloud</h2>
<p>The remainder of the instructions in this post will take place within <a href="https://cloud.elastic.co/registration">Elastic Cloud</a>. You can use an existing deployment or you can create a new Elastic Cloud deployment as a free trial if you’re trying Elastic Cloud for the first time. Another option to get started is to create an <a href="https://azuremarketplace.microsoft.com/en-us/marketplace/apps/elastic.ec-azure-observability?tab=Overview">Elastic deployment from the Microsoft Azure Marketplace</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/15.png" alt="sign up trial" /></p>
<p>The next step is to create an Azure OpenAI connector in Elastic Cloud. In the <a href="https://cloud.elastic.co/home">Elastic Cloud console</a> for your deployment, select the top-level menu and then select <strong>Stack Management</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/16.png" alt="stack management" /></p>
<p>Select <strong>Connectors</strong> on the Stack Management page.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/17.png" alt="connectors" /></p>
<p>Select <strong>Create connector</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/18.png" alt="create connector" /></p>
<p>Select the connector for Azure OpenAI.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/19.png" alt="openai" /></p>
<p>Enter a <strong>Name</strong> of your choice for the connector. Select <strong>Azure OpenAI</strong> as the OpenAI provider.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/20.png" alt="openai connector" /></p>
<p>Enter the Endpoint URL using the following format:</p>
<ul>
<li>
<p>Replace <code>{your-resource-name}</code> with the <strong>name of the Azure OpenAI instance</strong> that you created within the Azure portal in a previous step.</p>
</li>
<li>
<p>Replace <code>{deployment-id}</code> with the <strong>Deployment name</strong> that you specified when you created a model deployment within the Azure portal in a previous step.</p>
</li>
<li>
<p>Replace <code>{api-version}</code> with one of the valid <strong>Supported versions</strong> listed in the <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/reference">Completions section of the Azure OpenAI reference page</a>.</p>
</li>
</ul>
<pre><code class="language-bash">https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}
</code></pre>
<p>Your completed Endpoint URL should look something like this:</p>
<pre><code class="language-bash">https://example-openai-instance.openai.azure.com/openai/deployments/gpt-4-turbo/chat/completions?api-version=2024-02-01
</code></pre>
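<p>If you want to sanity-check your substitutions, the following small Python sketch (an illustration, not part of the setup) assembles the Endpoint URL from the three values:</p>
<pre><code class="language-python">def azure_openai_chat_url(resource_name, deployment_id, api_version):
    # Mirrors the template above: instance name, deployment name, API version.
    return ('https://' + resource_name + '.openai.azure.com/openai/deployments/'
            + deployment_id + '/chat/completions?api-version=' + api_version)

print(azure_openai_chat_url('example-openai-instance', 'gpt-4-turbo', '2024-02-01'))
# https://example-openai-instance.openai.azure.com/openai/deployments/gpt-4-turbo/chat/completions?api-version=2024-02-01
</code></pre>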
<p>Enter the API Key that you copied in a previous step. Then click the <strong>Save &amp; test</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/21.png" alt="save &amp; test" /></p>
<p>Within the <strong>Edit Connector</strong> flyout window, click the <strong>Run</strong> button to confirm that the connector configuration is valid and can successfully connect to your Azure OpenAI instance.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/22.png" alt="" /></p>
<p>A successful connector test should look something like this:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/23.png" alt="results" /></p>
<h2>Add an example logs record</h2>
<p>Now that you have your Elastic Cloud deployment set up with an AI Assistant connector, let’s add an example logs record to demonstrate how the AI Assistant can help you to better understand logs data.</p>
<p>We’ll use the Elastic Dev Tools to add a single logs record. Click the top-level menu and select <strong>Dev Tools</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/24.png" alt="dev tools" /></p>
<p>Within the Console area of Dev Tools, enter the following POST statement:</p>
<pre><code class="language-bash">POST /logs-elastic_agent-default/_doc
{
	&quot;message&quot;: &quot;Status(StatusCode=\&quot;FailedPrecondition\&quot;, Detail=\&quot;Can't access cart storage. \nSystem.ApplicationException: Wasn't able to connect to redis \n  at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /usr/src/app/src/cartstore/RedisCartStore.cs:line 104 \n  at cartservice.cartstore.RedisCartStore.EmptyCartAsync(String userId) in /usr/src/app/src/cartstore/RedisCartStore.cs:line 168\&quot;).&quot;,
	&quot;@timestamp&quot;: &quot;2024-02-22T11:34:00.884Z&quot;,
	&quot;log&quot;: {
    	&quot;level&quot;: &quot;error&quot;
	},
	&quot;service&quot;: {
    	&quot;name&quot;: &quot;cartService&quot;
	},
	&quot;host&quot;: {
    	&quot;name&quot;: &quot;appserver-1&quot;
	}
}
</code></pre>
<p>Then run the POST command by clicking the green <strong>Run</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/25.png" alt="click to send request" /></p>
<p>You should see a 201 response confirming that the example logs record was successfully created.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/26.png" alt="201 response" /></p>
<h2>Use the Elastic AI Assistant</h2>
<p>Now that you have a log record to work with, let’s jump over to the Observability Logs Explorer to see how the AI Assistant interacts with logs data. Click the top-level menu and select <strong>Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/27.png" alt="observability" /></p>
<p>Select <strong>Logs Explorer</strong> to explore the logs data.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/28.png" alt="explorer" /></p>
<p>In the Logs Explorer search box, enter the text “redis” and press the <strong>Enter</strong> key to perform the search.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/29.png" alt="redis" /></p>
<p>Click the <strong>View all matches</strong> button to include all search results.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/30.png" alt="view all matches" /></p>
<p>You should see the one log record that you previously inserted via Dev Tools. Click the expand icon to see the log record’s details.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/31.png" alt="expand icon" /></p>
<p>You should see the expanded view of the logs record. Instead of trying to understand its contents ourselves, we'll use the AI Assistant to summarize it. Click on the <strong>What's this message?</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/32.png" alt="What's this message?" /></p>
<p>We get a fairly generic answer back. Depending on the exception or error we're trying to analyze, this can still be really useful, but we can make this better by adding additional documentation to the AI Assistant knowledge base.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/33.png" alt="log details" /></p>
<p>Let’s see how we can use the AI Assistant’s knowledge base to improve its understanding of this specific logs message.</p>
<h2>Create an Elastic AI Assistant knowledge base</h2>
<p>Select <strong>Overview</strong> from the <strong>Observability</strong> menu.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/34.png" alt="Select Overview from the Observability menu." /></p>
<p>Click the <strong>AI Assistant</strong> button at the top right of the window.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/35.png" alt="AI Assistant" /></p>
<p>Click the <strong>Install Knowledge base</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/36.png" alt="Install Knowledge base" /></p>
<p>Click the top-level menu and select <strong>Stack Management</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/37.png" alt="Stack Management" /></p>
<p>Then select <strong>AI Assistants</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/38.png" alt="AI Assistants" /></p>
<p>Click <strong>Elastic AI Assistant for Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/39.png" alt="Elastic AI Assistant for Observability" /></p>
<p>Select the <strong>Knowledge base</strong> tab.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/40.png" alt="Knowledge base" /></p>
<p>Click the <strong>New entry</strong> button and select <strong>Single entry</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/41.png" alt="new entry" /></p>
<p>Give it the <strong>Name</strong> “cartservice” and enter the following text as the <strong>Contents</strong> :</p>
<pre><code class="language-markdown">Link: [Cartservice Intermittent connection issue](https://github.com/elastic/observability-examples/issues/25)
I have the following GitHub issue. Store this information in your knowledge base and always return the link to it if relevant.
GitHub Issue, return if relevant

Link: https://github.com/elastic/observability-examples/issues/25

Title: Cartservice Intermittent connection issue

Body:
The cartservice occasionally encounters storage errors due to an unreliable network connection.

The errors typically indicate a failure to connect to Redis, as seen in the error message:

Status(StatusCode=&quot;FailedPrecondition&quot;, Detail=&quot;Can't access cart storage.
System.ApplicationException: Wasn't able to connect to redis
at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /usr/src/app/src/cartstore/RedisCartStore.cs:line 104
at cartservice.cartstore.RedisCartStore.EmptyCartAsync(String userId) in /usr/src/app/src/cartstore/RedisCartStore.cs:line 168&quot;).
I just talked to the SRE team in Slack, they have plans to implement retries as a quick fix and address the network issue later.
</code></pre>
<p>Click <strong>Save</strong> to save the new knowledge base entry.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/42.png" alt="save" /></p>
<p>Now let’s go back to the Observability Logs Explorer. Click the top-level menu and select <strong>Observability</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/43.png" alt="settings" /></p>
<p>Then select <strong>Explorer</strong> under <strong>Logs</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/44.png" alt="explorer" /></p>
<p>Expand the same logs entry as you did previously and click the <strong>What’s this message?</strong> button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/45.png" alt="What’s this message? button" /></p>
<p>The response you get now should be much more relevant.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/46.png" alt="log details" /></p>
<h2>Try out the Elastic AI Assistant with a knowledge base filled with your own data</h2>
<p>Now that you’ve seen how easy it is to set up the Elastic AI Assistant for Observability, go ahead and give it a try for yourself. Sign up for a <a href="https://cloud.elastic.co/registration">free 14-day trial</a>. You can quickly spin up an Elastic Cloud deployment in minutes and have your own search powered AI knowledge base to help you with getting your most important work done.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-ai-assistant-observability-microsoft-azure-openai/AI_hand.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Instrumenting your OpenAI-powered Python, Node.js, and Java Applications with EDOT]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-opentelemetry-openai</link>
            <guid isPermaLink="false">elastic-opentelemetry-openai</guid>
            <pubDate>Thu, 23 Jan 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic is proud to introduce OpenAI support in our Python, Node.js and Java EDOT SDKs. These add logs, metrics and tracing to applications that use OpenAI compatible services without any code change.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>Last year, <a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">we announced Elastic Distribution of OpenTelemetry</a> (a.k.a. EDOT) language SDKs, which collect logs, traces and metrics from applications. At the time, we didn’t yet support Large Language Model (LLM) providers such as OpenAI, which limited the insight developers had into Generative AI (GenAI) applications.</p>
<p>In a <a href="https://www.elastic.co/observability-labs/blog/elastic-opentelemetry-langchain-tracing-langtrace">prior post</a>, we reviewed LLM observability focus areas such as token usage, chat latency and knowing which tools (like DuckDuckGo) your application uses. With the right logs, traces and metrics, developers can answer questions like &quot;Which version of a model generated this response?&quot; or &quot;What was the exact chat prompt created by my RAG application?&quot;</p>
<p>Over the last six months, Elastic invested a lot of energy alongside others in the OpenTelemetry community towards shared specifications in these areas, including code to collect LLM-related logs, metrics and traces. Our goal was to extend the zero-code (agent) approach EDOT brings to GenAI use cases.</p>
<p>Today, we announce our first GenAI instrumentation capability in the EDOT language SDKs: OpenAI. Below, you’ll see how to observe GenAI applications using our Python, Node.js and Java EDOT SDKs.</p>
<h2>Example application</h2>
<p>Many of us are familiar with <a href="https://chatgpt.com/">ChatGPT</a>, which is a frontend for OpenAI’s GPT model family. Using it, you can ask a question, and the assistant might reply correctly depending on what you ask and the text the LLM was trained on.</p>
<p>Here’s an example of an esoteric question answered by ChatGPT:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-opentelemetry-openai/chatgpt-screenshot.png" alt="ChatGPT answer" /></p>
<p>Our example application will simply ask this predefined question and print the result. We’ll write it in three languages: Python, JavaScript and Java.</p>
<p>We’ll execute each with a &quot;zero code&quot; (agent) approach, so that logs, metrics and traces are captured and visible in an Elastic Stack configured with Kibana and APM server. If you don’t have a stack running, use <a href="https://github.com/elastic/elasticsearch-labs/tree/main/docker">instructions from Elasticsearch Labs</a> to set one up.</p>
<p>Regardless of programming language, three variables are needed: the OpenAI API key, the location of your Elastic APM server, and the service name of the application. You’ll write these to a file named <code>.env</code>.</p>
<pre><code>OPENAI_API_KEY=sk-YOUR_API_KEY
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:8200
OTEL_SERVICE_NAME=openai-example
</code></pre>
<p>By default, the instrumentations do not capture the content sent to the OpenAI API in the GenAI events they log. If you want to capture it, add the following:</p>
<pre><code>OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
</code></pre>
<p>Each time the application is run, it sends logs, traces and metrics to the APM server. You can find them in Kibana by querying for the application &quot;openai-example&quot;, for example:</p>
<p><a href="http://localhost:5601/app/apm/services/openai-example/transactions">http://localhost:5601/app/apm/services/openai-example/transactions</a></p>
<p>When you choose a trace, you’ll see the LLM request made by the OpenAI SDK, and HTTP traffic caused by it:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-opentelemetry-openai/kibana-transaction-timeline.png" alt="Kibana transaction timeline" /></p>
<p>Select the logs tab to see the exact request and response to OpenAI. This data is critical for Q/A and evaluation use cases.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-opentelemetry-openai/kibana-transaction-logs.png" alt="Kibana transaction logs" /></p>
<p>You can also go to the Metrics Explorer and make a graph of &quot;gen_ai.client.token.usage&quot; or &quot;gen_ai.client.operation.duration&quot; over all the times you ran the application:</p>
<p><a href="http://localhost:5601/app/metrics/explorer">http://localhost:5601/app/metrics/explorer</a></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-opentelemetry-openai/kibana-metrics-explorer.png" alt="Kibana Metrics Explorer" /></p>
<p>Read on to see exactly how this application is written and run in Python, Java and Node.js. Those already using our EDOT language SDKs will be familiar with how this works.</p>
<h2>Python</h2>
<p>Assuming you have Python installed, the first step is to set up a virtual environment and install the required packages: the OpenAI client, a helper tool to read the <code>.env</code> file, and our <a href="https://github.com/elastic/elastic-otel-python">EDOT Python</a> package:</p>
<pre><code class="language-bash">python3 -m venv .venv
source .venv/bin/activate
pip install openai &quot;python-dotenv[cli]&quot; elastic-opentelemetry
</code></pre>
<p>Next, run <code>edot-bootstrap</code> which analyzes the code to install any relevant instrumentation available:</p>
<pre><code class="language-bash">edot-bootstrap --action=install
</code></pre>
<p>Now, create your <code>.env</code> file, as described earlier in this article, and save the below source code as <code>chat.py</code>:</p>
<pre><code class="language-python">import os

import openai

CHAT_MODEL = os.environ.get(&quot;CHAT_MODEL&quot;, &quot;gpt-4o-mini&quot;)


def main():
  client = openai.Client()

  messages = [
    {
      &quot;role&quot;: &quot;user&quot;,
        &quot;content&quot;: &quot;Answer in up to 3 words: Which ocean contains Bouvet Island?&quot;,
    }
  ]

  chat_completion = client.chat.completions.create(model=CHAT_MODEL, messages=messages)
  print(chat_completion.choices[0].message.content)

if __name__ == &quot;__main__&quot;:
  main()
</code></pre>
<p>Now you can run everything with:</p>
<pre><code class="language-bash">dotenv run -- opentelemetry-instrument python chat.py
</code></pre>
<p>Finally, look for a trace for the service named &quot;openai-example&quot; in Kibana. You should see a transaction named &quot;chat gpt-4o-mini&quot;.</p>
<p>Rather than copy/pasting above, you can find a working copy of this example (along with the instructions) in the Python EDOT repository <a href="https://github.com/elastic/elastic-otel-python/tree/main/examples/openai">here</a>.</p>
<p>Finally, if you would like to try a more comprehensive example, take a look at <a href="https://github.com/elastic/elasticsearch-labs/tree/main/example-apps/chatbot-rag-app">chatbot-rag-app</a> which uses OpenAI with Elasticsearch’s <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html">Elser</a> retrieval model.</p>
<h2>Java</h2>
<p>There are multiple popular ways to initialize a Java project. Since we are using OpenAI, the first step is to configure the dependency <a href="https://central.sonatype.com/artifact/com.openai/openai-java"><code>com.openai:openai-java</code></a> and write the below source as <code>Chat.java</code>:</p>
<pre><code class="language-java">package openai.example;

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.*;


final class Chat {

  public static void main(String[] args) {
    String chatModel = System.getenv().getOrDefault(&quot;CHAT_MODEL&quot;, &quot;gpt-4o-mini&quot;);

    OpenAIClient client = OpenAIOkHttpClient.fromEnv();

    String message = &quot;Answer in up to 3 words: Which ocean contains Bouvet Island?&quot;;
    ChatCompletionCreateParams params = ChatCompletionCreateParams.builder()
        .addMessage(ChatCompletionUserMessageParam.builder()
          .content(message)
          .build())
        .model(chatModel)
        .build();

    ChatCompletion chatCompletion = client.chat().completions().create(params);
    System.out.println(chatCompletion.choices().get(0).message().content().get());
  }
}
</code></pre>
<p>Build the project such that all dependencies are in a single jar. For example, if using Gradle, you would use the <code>com.gradleup.shadow</code> plugin.</p>
<p>Next, create your <code>.env</code> file, as described earlier, and download <code>shdotenv</code>, which we’ll use to load it:</p>
<pre><code class="language-bash">curl -O -L https://github.com/ko1nksm/shdotenv/releases/download/v0.14.0/shdotenv
chmod +x ./shdotenv
</code></pre>
<p>At this point, you have a jar and configuration you can use to run the OpenAI example. The next step is to download the EDOT Java javaagent binary. This is the part that records and exports logs, metrics and traces.</p>
<pre><code class="language-bash">curl -o elastic-otel-javaagent.jar -L 'https://oss.sonatype.org/service/local/artifact/maven/redirect?r=snapshots&amp;g=co.elastic.otel&amp;a=elastic-otel-javaagent&amp;v=LATEST'
</code></pre>
<p>Assuming you assembled a file named <code>openai-example-all.jar</code>, run it with EDOT like this:</p>
<pre><code class="language-bash">./shdotenv java -javaagent:elastic-otel-javaagent.jar -jar openai-example-all.jar
</code></pre>
<p>Finally, look for a trace for the service named &quot;openai-example&quot; in Kibana. You should see a transaction named &quot;chat gpt-4o-mini&quot;.</p>
<p>Rather than copy/pasting the above, you can find a working copy of this example in the EDOT Java source repository <a href="https://github.com/elastic/elastic-otel-java/tree/main/examples/openai">here</a>.</p>
<h2>Node.js</h2>
<p>Assuming you already have npm installed and configured, run the following commands to initialize a project for the example. This includes the <a href="https://www.npmjs.com/package/openai">openai</a> package and <a href="https://www.npmjs.com/package/@elastic/opentelemetry-node"><code>@elastic/opentelemetry-node</code></a> (EDOT Node.js).</p>
<pre><code class="language-bash">npm init -y
npm install openai @elastic/opentelemetry-node
</code></pre>
<p>Next, create your <code>.env</code> file, as described earlier in this article, and save the below source code as <code>index.js</code>.</p>
<pre><code class="language-javascript">const {OpenAI} = require('openai');

let chatModel = process.env.CHAT_MODEL ?? 'gpt-4o-mini';

async function main() {
  const client = new OpenAI();
  const completion = await client.chat.completions.create({
    model: chatModel,
    messages: [
      {
        role: 'user',
        content: 'Answer in up to 3 words: Which ocean contains Bouvet Island?',
      },
    ],
  });
  console.log(completion.choices[0].message.content);
}

main();
</code></pre>
<p>With this in place, run the above source with EDOT like this:</p>
<pre><code class="language-bash">node --env-file .env --require @elastic/opentelemetry-node index.js
</code></pre>
<p>Finally, look for a trace for the service named &quot;openai-example&quot; in Kibana. You should see a transaction named &quot;chat gpt-4o-mini&quot;.</p>
<p>Rather than copy/pasting the above, you can find a working copy of this example in the EDOT Node.js source repository <a href="https://github.com/elastic/elastic-otel-node/tree/main/examples/openai">here</a>.</p>
<p>Finally, if you would like to try a more comprehensive example, take a look at <a href="https://github.com/elastic/elasticsearch-labs/tree/main/example-apps/openai-embeddings">openai-embeddings</a> which uses OpenAI with Elasticsearch as a vector database!</p>
<h2>Closing Notes</h2>
<p>Above you’ve seen how to observe the official OpenAI SDK in three different languages, using Elastic Distribution of OpenTelemetry (EDOT).</p>
<p>Note that parts of the OpenAI SDKs, as well as the OpenTelemetry specifications around generative AI, are still experimental. If this helps you, or if you find glitches, please join our Slack and let us know.</p>
<p>Several LLM platforms accept requests from the OpenAI client SDK by setting <code>OPENAI_BASE_URL</code> and choosing relevant models. During development, we tested against OpenAI Platform and Azure OpenAI Service. We also ran integration tests against Ollama, contributing improvements to its OpenAI support, released in v0.5.12. Whatever your choice of OpenAI-compatible platform, we hope this new tooling helps you understand your LLM usage.</p>
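<p>As a concrete illustration, pointing the same examples at a local Ollama server typically only requires changing the <code>.env</code> file. The values below are assumptions for a default local Ollama install; adjust the base URL and model to your own platform:</p>
<pre><code class="language-bash"># Hypothetical .env for a local Ollama server (OpenAI-compatible endpoint).
# OPENAI_API_KEY must be set for the SDK to start, but Ollama ignores its value.
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=unused
CHAT_MODEL=qwen2.5:0.5b
</code></pre>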
<p>Finally, while the first Generative AI SDK instrumented with EDOT is OpenAI, you’ll see more soon. We are already working on Bedrock, and collaborating with others in the OpenTelemetry community for other platforms. Keep watching this blog for exciting updates.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-opentelemetry-openai/elastic-opentelemetry-openai.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Elastic's RAG-based AI Assistant: Analyze application issues with LLMs and private GitHub issues]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-rag-ai-assistant-application-issues-llm-github</link>
            <guid isPermaLink="false">elastic-rag-ai-assistant-application-issues-llm-github</guid>
            <pubDate>Wed, 08 May 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[In this blog, we review how GitHub issues and other GitHub documents from internal and external GitHub repositories can be used in root cause analysis with Elastic’s RAG-based AI Assistant.]]></description>
<content:encoded><![CDATA[<p>As an SRE, analyzing applications is more complex than ever. Not only do you have to ensure the application is running optimally to deliver great customer experiences, but you must also understand its inner workings in some cases to help troubleshoot. Analyzing issues in a production service is a team sport: it takes SRE, DevOps, development, and support to get to the root cause and potentially remediate. If the issue is customer-impacting, it's even worse, because there is a race against time. Regardless of the situation, there is a ton of information that needs to be consumed and processed, including not only what the customer is experiencing but also internal data to help provide the most appropriate resolution.</p>
<p>Elastic’s AI Assistant helps improve analysis for SREs, DevOps, Devs, and others. In a single window, using natural language questions, you can analyze not only general information but also combine it with things like:</p>
<ul>
<li>
<p>Issues from internal GitHub repos, Jira, etc.</p>
</li>
<li>
<p>Documents from internal wiki sites from Confluence, etc.</p>
</li>
<li>
<p>Customer issues from your support service</p>
</li>
<li>
<p>And more</p>
</li>
</ul>
<p>In this blog, we will walk you through how to:</p>
<ol>
<li>
<p>Ingest an external GitHub repository (<a href="https://github.com/open-telemetry/opentelemetry-demo">OpenTelemetry demo repo</a>) with code and issues into Elastic. Apply Elastic Learned Sparse EncodeR (ELSER) and store it in a specific index for the AI Assistant.</p>
</li>
<li>
<p>Ingest internal GitHub repository with runbook information into Elastic. Apply ELSER and store the processed data in a specific index for the AI Assistant.</p>
</li>
<li>
<p>Use these two indices when analyzing issues for the OpenTelemetry demo in Elastic using the AI Assistant.</p>
</li>
</ol>
<h2>3 simple questions using GitHub data with AI Assistant</h2>
<p>Before we walk through the steps for setting up data from GitHub, let’s review what an SRE can do with the AI Assistant and GitHub repos.</p>
<p>We initially connect to GitHub using an Elastic GitHub connector and ingest and process two repos: the OpenTelemetry demo repo (public) and an internal runbook repo (Elastic internal).</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/1.png" alt="1 - elasticsearch connectors" /></p>
<p>With these two loaded and parsed by ELSER, we ask the AI Assistant some simple questions generally asked during analysis.</p>
<h3>How many issues are open for the OpenTelemetry demo?</h3>
<p>Since we ingested the entire repo (as of April 26, 2024) with a doc count of 1,529, we ask a simple question about the total number of open issues. We specifically tell the AI Assistant to search our internal index so the LLM knows to ask Elastic for the total number of open issues.</p>
&lt;Video vidyardUuid=&quot;XyKWeYz21mdDkMfop7absQ&quot; loop={true} /&gt;
<h3>Are there any issues for the Rust based shippingservice?</h3>
<p>Elastic’s AI Assistant uses ELSER to traverse the loaded GitHub repo and finds the open issue against the shippingservice (which is the following <a href="https://github.com/open-telemetry/opentelemetry-demo/issues/346">issue</a> at the time of writing this post).</p>
&lt;Video vidyardUuid=&quot;TF1qgy3WH3cuLQdBvdX66A&quot; loop={true} /&gt;
<h3>Is there a runbook for the Cartservice?</h3>
<p>Since we loaded an internal GitHub repo with a few sample runbooks, the Elastic AI Assistant properly finds the runbook.</p>
&lt;Video vidyardUuid=&quot;kSukiZ6zYZDQDycs616ji8&quot; loop={true} /&gt;
<p>As we go through this blog, we will talk about how the AI Assistant finds these issues using ELSER and how you can configure it to use your own GitHub repos.</p>
<h2>Retrieval augmented generation (RAG) with Elastic AI Assistant</h2>
<p>Elastic has the most advanced RAG-based AI Assistant for both Observability and Security. It can help you analyze your data using:</p>
<ul>
<li>
<p>Your favorite LLM (OpenAI, Azure OpenAI, AWS Bedrock, etc.)</p>
</li>
<li>
<p>Any internal information (GitHub, Confluence, customer issues, etc.) you can either connect to or bring into Elastic’s indices</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/2.png" alt="Elastic AI Assistant — connecting internal and external information" /></p>
<p>Elastic’s AI Assistant can do this because it supports RAG, which retrieves internal information to combine with LLM-based knowledge.</p>
<p>Adding relevant internal information for an SRE into Elastic:</p>
<ul>
<li>
<p>As data comes in, such as in your GitHub repository, ELSER is applied to the data, and embeddings (weights and tokens into a sparse vector field) are added to capture semantic meaning and context of the data.</p>
</li>
<li>
<p>This data (GitHub, Confluence, etc.) is processed with embeddings and is stored in an index that can be searched by the AI Assistant.</p>
</li>
</ul>
<p>When you query the AI Assistant for information:</p>
<ul>
<li>
<p>The query goes through the same inference process as the ingested data using ELSER. The input query generates a “sparse vector,” which is used to find the most relevant highly ranked information in the ingested data (GitHub, Confluence, etc.).</p>
</li>
<li>
<p>The retrieved data is then combined with the query and also sent over to the LLM, which will then add its own knowledge base information (if there is anything to add), or it might ask Elastic (via function calls) to analyze, chart, or even search further. If a function call is made to Elastic and a response is provided, it will be added by the LLM to its response.</p>
</li>
<li>
<p>The result is the most contextual answer, combining the LLM’s knowledge with anything relevant from your internal data.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/3.png" alt="3 - elastic's RAG flowchart" /></p>
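<p>To make the retrieval step concrete, here is a toy sketch of how a sparse-vector query can be scored against ingested documents. The tokens, weights, and document names are invented for illustration; the real ELSER model produces far richer vectors, and Elasticsearch performs this ranking natively:</p>
<pre><code class="language-javascript">// Relevance is the dot product of token weights shared by query and document.
function score(query, doc) {
  let s = 0;
  for (const [token, weight] of Object.entries(query)) {
    if (token in doc) s += weight * doc[token];
  }
  return s;
}

// Rank all documents by score and keep the top k.
function topK(query, docs, k) {
  return Object.entries(docs)
    .map(([id, vec]) => ({ id, score: score(query, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Invented sparse vectors for two ingested documents.
const docs = {
  'runbook-cartservice': { runbook: 1.2, cartservice: 1.5, restart: 0.7 },
  'issue-shippingservice': { shippingservice: 1.4, rust: 0.9, issue: 0.8 },
};
// Invented sparse vector inferred from the user's question.
const query = { runbook: 1.0, cartservice: 1.1 };

console.log(topK(query, docs, 1)[0].id); // 'runbook-cartservice'
</code></pre>
<p>The sketch only shows why a document whose high-weight tokens overlap the query's tokens surfaces first; the highest-ranked documents are what gets handed to the LLM along with the question.</p>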
<h2>Application, prerequisites, and config</h2>
<p>If you want to try the steps in this blog, here are some prerequisites:</p>
<ul>
<li>
<p>An Elastic Cloud account — <a href="https://cloud.elastic.co/">sign up now</a></p>
</li>
<li>
<p><a href="https://github.com/open-telemetry/opentelemetry-demo">OpenTelemetry demo</a> running and connected to Elastic (<a href="https://www.elastic.co/guide/en/observability/current/apm-open-telemetry-direct.html#apm-instrument-apps-otel">APM documentation</a>)</p>
</li>
<li>
<p>Whatever internal GitHub repo you want to use with some information that is useful for analysis (in our walkthrough, we will use a GitHub repo that houses runbooks for different scenarios used in Elastic demos)</p>
</li>
<li>
<p>Account with your favorite or approved LLM (OpenAI, Azure OpenAI, AWS Bedrock)</p>
</li>
</ul>
<h2>Adding the GitHub repos to Elastic</h2>
<p>The first step is to set up the GitHub connector and connect to your GitHub repo. Elastic has several connectors, including GitHub, Confluence, Google Drive, Jira, AWS S3, Microsoft Teams, Slack, and more. So while we cover the GitHub connector in this blog, don’t forget about the other connectors.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/4.png" alt="4 - select a connector" /></p>
<p>Once you select the GitHub connector and give it a name, you need to add two items:</p>
<ul>
<li>
<p>GitHub token</p>
</li>
<li>
<p>The URL open-telemetry/opentelemetry-demo</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/5.png" alt="5 - configuration" /></p>
<p>Next, add it to an index in the wizard.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/6.png" alt="6 - attach an index" /></p>
<h2>Create a pipeline and process the data with ELSER</h2>
<p>In order to add the embeddings we discussed in the section above, we need to add the following to the connector:</p>
<ul>
<li>
<p>Create a pipeline in the configuration wizard.</p>
</li>
<li>
<p>Create a custom pipeline.</p>
</li>
<li>
<p>Add the ML inference pipeline.</p>
</li>
<li>
<p>Select ELSERv2 ML Model to add the embeddings.</p>
</li>
<li>
<p>Select the fields that need to be evaluated as part of the inference pipeline.</p>
</li>
<li>
<p>Test and save the inference pipeline and the overall pipeline.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/7.png" alt="7" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/8.png" alt="8" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/9.png" alt="9" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/10.png" alt="10" /></p>
<h2>Sync the data</h2>
<p>Now that the pipeline is created, you need to sync the GitHub repo. As documents from the GitHub repo come in, they will go through the pipeline, and embeddings will be added.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/11.png" alt="11" /></p>
<h2>Embeddings</h2>
<p>Once the pipeline is set up, sync the data in the connector. As the GitHub repository comes in, the inference pipeline will process the data as follows:</p>
<ul>
<li>
<p>As data comes in from your GitHub repository, ELSER is applied to the data, and embeddings (weights and tokens into a sparse vector field) are added to capture semantic meaning and context of the data.</p>
</li>
<li>
<p>This data is processed with embeddings and is stored in an index that can be searched by the AI Assistant.</p>
</li>
</ul>
<p>When you look at the OpenTelemetry GitHub documents that were ingested, you will see how the weights and tokens are added to the predicted_value field in the index.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/12.png" alt="12" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/13.png" alt="13" /></p>
<p>These embeddings will now be used to find the most contextually relevant documents when the user asks the AI Assistant a question.</p>
<h2>Check if AI Assistant can use the index</h2>
<p>Elastic’s AI Assistant uses ELSER to traverse the loaded GitHub repo and finds the open issue against the shippingservice (the following <a href="https://github.com/open-telemetry/opentelemetry-demo/issues/346">issue</a> at the time of writing this post).</p>
&lt;Video vidyardUuid=&quot;TF1qgy3WH3cuLQdBvdX66A&quot; loop={true} /&gt;
<p>Based on the response, we can see that the AI Assistant can now use the index to find the issue and use it for further analysis.</p>
<h2>Conclusion</h2>
<p>You’ve now seen how easy Elastic’s RAG-based AI Assistant is to set up. You can bring in documents from multiple locations (GitHub, Confluence, Slack, etc.). We’ve shown the setup for GitHub and OpenTelemetry. This internal information can be useful in managing issues, accelerating resolution, and improving customer experiences. Check out our other blogs on how the AI Assistant can help SREs do better analysis, lower MTTR, and improve operations overall:</p>
<ul>
<li>
<p><a href="https://www.elastic.co/blog/analyzing-opentelemetry-apps-elastic-ai-assistant-apm">Analyzing OpenTelemetry apps with Elastic AI Assistant and APM</a></p>
</li>
<li>
<p><a href="https://www.elastic.co/blog/elastic-ai-assistant-observability-escapes-kibana">The Elastic AI Assistant for Observability escapes Kibana!</a></p>
</li>
<li>
<p><a href="https://www.elastic.co/blog/elastic-ai-assistant-observability-microsoft-azure-openai">Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI</a></p>
</li>
<li>
<p><a href="https://www.elastic.co/blog/whats-new-elastic-8-13-0">Elastic 8.13: GA of Amazon Bedrock in the Elastic AI Assistant for Observability</a></p>
</li>
<li>
<p><a href="https://www.elastic.co/blog/sre-troubleshooting-ai-assistant-observability-runbooks">Enhancing SRE troubleshooting with the AI Assistant for Observability and your organization's runbooks</a></p>
</li>
<li>
<p><a href="https://www.elastic.co/blog/context-aware-insights-elastic-ai-assistant-observability">Context-aware insights using the Elastic AI Assistant for Observability</a></p>
</li>
<li>
<p><a href="https://www.elastic.co/blog/elastic-ai-assistant-observability-amazon-bedrock">Getting started with the Elastic AI Assistant for Observability and Amazon Bedrock</a></p>
</li>
</ul>
<h2>Try it out</h2>
<p>Existing Elastic Cloud customers can access many of these features directly from the <a href="https://cloud.elastic.co/">Elastic Cloud console</a>. Not taking advantage of Elastic on cloud? <a href="https://www.elastic.co/cloud/cloud-trial-overview">Start a free trial</a>.</p>
<p>All of this is also possible in your environments. <a href="https://www.elastic.co/observability/universal-profiling">Learn how to get started today</a>.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-rag-ai-assistant-application-issues-llm-github/AI_fingertip_touching_human_fingertip.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Streams Processing: Stop Fighting with Grok. Parse Your Logs in Streams.]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-streams-processing</link>
            <guid isPermaLink="false">elastic-streams-processing</guid>
            <pubDate>Thu, 11 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how Streams Processing works under the hood and how to use it to build, test, and deploy parsing logic on live data quickly.]]></description>
<content:encoded><![CDATA[<p>With Streams, Elastic's new AI capability in 9.2, we make parsing your logs so simple it's no longer a concern. In general, your logs are messy: lots of fields, some understood, some unknown. You have to constantly keep up with the semantics and pattern-match to parse them properly. In some cases, even fields you know have different values or semantics; for instance, <code>timestamp</code> is the ingest time, not the event time. Or you can't even filter by <code>log.level</code> or <code>user.id</code> because they're buried inside the <code>message</code> field. As a result, your dashboards are flat and not useful.</p>
<p>Fixing this used to mean leaving Kibana, learning Grok syntax, manually editing ingest pipeline JSON or a complicated Logstash config, and hoping you didn't break parsing for everything else.</p>
<p>We built Streams to fix this, and much more. It's your one place for data processing, built right into Kibana, that lets you build, test, and deploy parsing logic on live data in seconds. It turns a high-risk backend task into a fast, predictable, interactive UI workflow. You can use AI to generate Grok rules automatically from a sample of logs, or build them easily in the UI. Let's walk through an example.</p>
<h2>A Quick Walkthrough</h2>
<p>Let's fix a common &quot;unstructured&quot; log right now.</p>
<ol>
<li><strong>Start in Discover</strong>. You find a log that isn't structured. The <code>@timestamp</code> is wrong, and fields like <code>log.level</code> aren't being extracted, so your histograms are just a single-color bar.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/start-in-discover.png" alt="start in discover" /></p>
<ol start="2">
<li><strong>Inspect the log</strong>. Open the document flyout (the &quot;Inspect a single log event&quot; view). You'll see a button: <strong>&quot;Parse content in Streams&quot;</strong> (or &quot;Edit processing in Streams&quot;). Click it.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/inspect-the-log.png" alt="inspect the log" /></p>
<ol start="3">
<li><strong>Go to Processing</strong>. This takes you directly to the Streams processing tab, pre-loaded with sample documents from that data stream. Click <strong>&quot;Create your first step.&quot;</strong></li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/go-to-processing.png" alt="go to streams processing" /></p>
<ol start="4">
<li><strong>Generate a Pattern</strong>. The processor defaults to Grok. You don't have to write any. Just click the <strong>&quot;Generate Pattern&quot;</strong> button. Streams analyzes 100 sample documents from your stream and suggests a Grok pattern for you. By default, this uses the Elastic Managed LLM, but you can configure your own.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/generate-pattern.png" alt="generate the pattern" /></p>
<ol start="5">
<li><strong>Accept and Simulate</strong>. Click &quot;Accept.&quot; Instantly, the UI runs a simulation across all 100 sample documents. You can make changes to the pattern or adjust field names, and the simulation re-runs with every keystroke.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/accept-and-simulate.png" alt="simulate and accept" /></p>
<p>When you're happy, you save it. Your new logs will now be parsed correctly.</p>
<h2>Powerful Features for Messy, Real-World Logs</h2>
<p>That's the simple case. But real-world data is rarely that clean. Here are the features built to handle the complexity.</p>
<h3>The Interactive Grok UI</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/the-interactive-grok-ui.png" alt="interactive grok" /></p>
<p>When you use the Grok processor, the UI gives you a <strong>visual indication</strong> of what your pattern is extracting. You can see which parts of the <code>message</code> field are being mapped to which new field names. This immediate feedback means you're not just guessing. Autocompletion of Grok patterns and instant pattern validation are also built in.</p>
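<p>Conceptually, a Grok pattern is a regular expression assembled from a dictionary of named sub-patterns. The sketch below is a simplified, hypothetical re-implementation with three toy patterns, not Elasticsearch's actual Grok processor:</p>
<pre><code class="language-javascript">// A tiny, invented subset of a Grok pattern dictionary.
const PATTERNS = {
  LOGLEVEL: '[A-Z]+',
  TIMESTAMP_ISO8601: '\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}',
  GREEDYDATA: '.*',
};

// Expand each %{NAME:field} reference into a named capture group.
function grokToRegex(grok) {
  const source = grok.replace(
    /%\{(\w+):(\w+)\}/g,
    (_, name, field) => `(?<${field}>${PATTERNS[name]})`
  );
  return new RegExp(source);
}

const grok = '%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}';
const line = '2025-12-11T10:15:30 ERROR payment failed for user 42';
const fields = line.match(grokToRegex(grok)).groups;
console.log(fields.level); // 'ERROR'
</code></pre>
<p>This is why the UI can highlight, per character, which slice of the <code>message</code> each named group consumed: the match positions fall directly out of the compiled regex.</p>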
<h3>The Diff Viewer</h3>
<p>How do you know what exactly changed? Expand any row in the simulation table. You'll get a diff view showing precisely which fields were added, removed, or modified for that specific document. No more guesswork.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/the-diff-viewer.png" alt="the diff viewer" /></p>
<h3>End to End Simulation and Detecting Failures</h3>
<p>This is the most critical part. Streams doesn't just simulate the processor; it simulates the entire indexing process.
If you try to map a non-timestamp string (like the <code>message</code> field) directly to the <code>@timestamp</code> field, the simulation will show a failure. It detects the mapping conflict before you save it and before it can create a data-mapping conflict in your cluster. This safety net is what lets you move fast.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/end-to-end-simulation.png" alt="end to end simulation" /></p>
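<p>The timestamp check can be pictured as a small validation step that runs before anything is indexed. This is an illustrative sketch of the idea, with invented names, not the actual simulation code:</p>
<pre><code class="language-javascript">// Reject a value destined for a date-mapped field if it does not parse as a date.
function simulateMapping(doc, field, type) {
  if (type !== 'date') return { ok: true };
  if (Number.isNaN(Date.parse(doc[field]))) {
    return { ok: false, reason: field + ': value is not a valid date' };
  }
  return { ok: true };
}

console.log(simulateMapping({ '@timestamp': '2025-12-11T10:15:30Z' }, '@timestamp', 'date').ok); // true
console.log(simulateMapping({ '@timestamp': 'ERROR payment failed' }, '@timestamp', 'date').ok); // false
</code></pre>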
<h3>Conditional Processing</h3>
<p>What if one data stream contains a large variety of logs? You can't use one Grok pattern for all.</p>
<p>Streams has conditional processing built for this. The UI lets you build &quot;if-then&quot; logic. The UI shows you exactly what percentage of your sample documents are skipped or processed by your conditions. Right now, the UI supports up to 3 levels of nesting, and we plan to add a YAML mode in the future for more complex logic.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/conditional-processing.png" alt="conditional processing" /></p>
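<p>The if-then logic can be sketched as a filter over the sample documents that also reports the share that was skipped, like the percentages the UI displays. Field names and conditions here are invented:</p>
<pre><code class="language-javascript">// Apply a processing step only to documents matching a condition, and
// report what share of the sample the condition skipped.
function applyIf(docs, condition, process) {
  let processed = 0;
  const out = docs.map((d) => {
    if (!condition(d)) return d;
    processed += 1;
    return process(d);
  });
  return { out, skippedPct: (100 * (docs.length - processed)) / docs.length };
}

// Invented sample documents from a mixed stream.
const sample = [
  { message: 'ERROR checkout failed', service: 'cart' },
  { message: 'GET /healthz 200', service: 'nginx' },
  { message: 'ERROR oom killed', service: 'cart' },
  { message: 'GET / 200', service: 'nginx' },
];

const { out, skippedPct } = applyIf(
  sample,
  (d) => d.service === 'cart',                      // the 'if' condition
  (d) => ({ ...d, level: d.message.split(' ')[0] }) // the 'then' processor
);
console.log(skippedPct); // 50
</code></pre>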
<h3>Changing Your Test Data (Document Samples)</h3>
<p>A random 100-document sample isn't always helpful, especially in a massive, mixed stream from Kubernetes or a central message broker.</p>
<p>You can change the document sample to test your changes on a more specific set of logs. You can either provide documents manually (copy-paste) or, more powerfully, specify a KQL query to fetch 100 specific documents. For example: <code>service.name : &quot;data_processing&quot;</code>, to fetch 100 additional sample documents to be used in the simulation. Now you can build and test a processor on the exact logs you care about.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/changing-your-test-data.png" alt="changing your test data" /></p>
<h2>How Processing Works Under the Hood</h2>
<p>There’s no magic. In simple terms, it's a UI that makes our existing best practices more accessible. As of version 9.2, Streams runs exclusively on <strong>Elasticsearch ingest pipelines</strong>. (We plan to offer more than that; stay tuned.)</p>
<p>When you save your changes, Streams appends processing steps by:</p>
<ol>
<li>Locating the most specific <code>@custom</code> ingest pipeline for your data stream.</li>
<li>Adding a single <code>pipeline</code> processor to it.</li>
<li>This processor calls a new, dedicated pipeline named <code>&lt;stream-name&gt;@stream.processing</code>, which contains the Grok, conditional, and other logic you built in the UI.</li>
</ol>
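<p>For illustration, the single processor appended to the <code>@custom</code> pipeline might look like the following, assuming a hypothetical stream named <code>logs-myapp-default</code>:</p>
<pre><code class="language-json">{
  &quot;processors&quot;: [
    {
      &quot;pipeline&quot;: {
        &quot;name&quot;: &quot;logs-myapp-default@stream.processing&quot;
      }
    }
  ]
}
</code></pre>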
<p>You can even see this for yourself by going to the <strong>Advanced tab</strong> in your Stream and clicking the pipeline name.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/how-processing-works.png" alt="how processing works" /></p>
<h2>Processing in OTel, Elastic Agent, Logstash, or Streams? What to Use?</h2>
<p>This is a fair question. You have lots of ways to parse data.</p>
<ul>
<li><strong>Best: Structured logging at the Source</strong>. If you control the app writing the logs, make it log JSON in the format of your choice. This remains the best way to do logging, but it's not always possible.</li>
<li><strong>Good, but not all the time: Elastic Agent + Integrations:</strong> If there is an existing integration for collecting and parsing your data, Streams won't do it any better. Use it!</li>
<li><strong>Good for tech savvy users: OTel at the Edge</strong>. Use OTel (with OTTL) to set yourself up for the future.</li>
<li><strong>The easy Catch-All: In Streams</strong>. Especially when an integration primarily just ships the data into Elastic, Streams can add a lot of value. The Kubernetes Logs integration is a good example: an integration is used, but most logs aren't parsed automatically because they may come from a wide variety of pods.</li>
</ul>
<p>Think of Streams as your universal &quot;catch-all&quot; for everything that arrives unstructured. It's perfect for data from sources you don't control, for legacy systems, or for when you just need to fix a parsing error right now without a full application redeploy.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/processing-in-otel.png" alt="processing in otel" /></p>
<p>A quick note on schemas: Streams can handle both ECS (Elastic Common Schema) and OTel (OpenTelemetry) data. By default, it assumes your target schema is ECS. However, Streams will automatically detect and adapt to the OTel schema if your Stream's name contains the word “otel”, or if you're using the special Logs Stream (currently in tech preview). You get the same visual parsing workflow regardless of the schema.</p>
<p>All processing changes can also be made using a Kibana API. Note that the API is still in tech preview while we mature some of the functionality.</p>
<h2>Summary</h2>
<p>Parsing logs shouldn't be a tedious, high-stakes, backend-only task. Streams moves the entire workflow from a complex, error-prone approach to an interactive UI right where you already are. You can now build, test, and deploy parsing logic with instant, safe feedback. This means you can stop fighting your logs and finally start using them. The next time you see a messy log, don't ignore it. Click &quot;Parse in Streams&quot; and fix it in 60 seconds.</p>
<p>Check out more log analytics articles in <a href="https://www.elastic.co/observability-labs/blog/tag/log-analytics">Elastic Observability Labs</a>.</p>
<p>Try out Elastic. Sign up for a trial at <a href="https://cloud.elastic.co/registration?fromURI=%2Fhome">Elastic Cloud</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-streams-processing/cover.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability for Google Cloud’s Vertex AI platform - understand performance, cost and reliability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elevate-llm-observability-with-gcp-vertex-ai-integration</link>
            <guid isPermaLink="false">elevate-llm-observability-with-gcp-vertex-ai-integration</guid>
            <pubDate>Wed, 09 Apr 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Enhance LLM observability with Elastic's GCP Vertex AI Integration — gain actionable insights into model performance, resource efficiency, and operational reliability.]]></description>
            <content:encoded><![CDATA[<p>As organizations increasingly adopt large language models (LLMs) for AI-powered applications such as content creation, Retrieval-Augmented Generation (RAG), and data analysis, SREs and developers face new challenges. Tasks like monitoring workflows, analyzing input and output, managing query latency, and controlling costs become critical. LLM observability helps address these issues by providing clear insights into how these models perform, allowing teams to quickly identify bottlenecks, optimize configurations, and improve reliability. With better observability, SREs can confidently scale LLM applications, especially on platforms like <a href="https://cloud.google.com/vertex-ai">Google Cloud’s Vertex AI</a>.</p>
<h3>New Elastic Observability LLM integration with Google Cloud’s Vertex AI platform</h3>
<p>We are thrilled to announce the general availability of monitoring for LLMs hosted in Google Cloud through the <a href="https://www.elastic.co/docs/current/integrations/gcp_vertexai">Elastic integration with Vertex AI</a>. This integration delivers enhanced LLM observability by providing deep insights into the usage, cost, and operational performance of models on Vertex AI, including latency, errors, token usage, frequency of model invocations, and the resources utilized by models. By leveraging this data, organizations can optimize resource usage, identify and resolve performance bottlenecks, and enhance model efficiency and accuracy.</p>
<h3>Observability needs for AI-powered applications using the Vertex AI platform</h3>
<p>Leveraging AI models creates unique needs around the observability and monitoring of AI-powered applications. Some of the challenges that come with using LLMs relate to the high cost of calling LLMs, the quality and safety of LLM responses, and the performance, reliability, and availability of the LLMs.</p>
<p>Lack of visibility into LLM observability data can make it harder for SREs and DevOps teams to ensure their AI-powered applications meet their service level objectives for reliability, performance, cost, and quality of AI-generated content, and to have enough telemetry data to troubleshoot related issues. Thus, robust LLM observability, including real-time detection of anomalies in the performance of models hosted on Google Cloud’s Vertex AI platform, is critical for the success of AI-powered applications.</p>
<p>Depending on the needs of their LLM applications, customers can choose from a growing list of models hosted on the Vertex AI platform, such as Gemini 2.0 Pro, Gemini 2.0 Flash, and Imagen for image generation. Each model excels in specific areas and generates content in one or more modalities, including language, audio, vision, and code. No two models are the same; each has specific performance characteristics. So, it is important that service operators are able to track the individual performance, behavior, and cost of each model.</p>
<h3>Unlocking Insights with Vertex AI Metrics</h3>
<p>The Elastic integration with Google Cloud’s Vertex AI platform collects a wide range of metrics from models hosted on Vertex AI, enabling users to monitor, analyze, and optimize their AI deployments effectively.</p>
<p>Once the integration is set up, you can review all of these metrics in the Vertex AI dashboard:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/Overview.png" alt="Overview Dashboard" /></p>
<p>These metrics can be categorized into the following groups:</p>
<h4>1. Prediction Metrics</h4>
<p>Prediction metrics provide critical insights into model usage, performance bottlenecks, and reliability. These metrics help ensure smooth operations, optimize response times, and maintain robust, accurate predictions.</p>
<ul>
<li>
<p><strong>Prediction Count by Endpoint</strong>: Measures the total number of predictions across different endpoints.</p>
</li>
<li>
<p><strong>Prediction Latency</strong>: Provides insights into the time taken to generate predictions, allowing users to identify bottlenecks in performance.</p>
</li>
<li>
<p><strong>Prediction Errors</strong>: Monitors the count of failed predictions across endpoints.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/Prediction.png" alt="Prediction Metrics" /></p>
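<p>Beyond the dashboard, the documents shipped by the integration can also be queried directly from Elasticsearch. The following is a minimal sketch of building such a query in Python; the field names (<code>gcp.vertexai.endpoint_id</code>, <code>gcp.vertexai.prediction.latency.avg</code>) are hypothetical placeholders, so check the integration documentation for the actual mappings:</p>

```python
# Illustrative sketch: build an Elasticsearch aggregation body that
# averages prediction latency per endpoint over a recent window.
# NOTE: the field names below are hypothetical placeholders, not the
# integration's documented mappings.

def latency_by_endpoint_query(hours=24):
    """Return a query body averaging prediction latency per endpoint."""
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
        "aggs": {
            "by_endpoint": {
                "terms": {"field": "gcp.vertexai.endpoint_id"},
                "aggs": {
                    "avg_latency_ms": {
                        "avg": {"field": "gcp.vertexai.prediction.latency.avg"}
                    }
                },
            }
        },
    }

body = latency_by_endpoint_query(hours=6)
```

<p>The resulting body could then be passed to the search API of any Elasticsearch client, targeting the data stream the integration writes to.</p>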
<h4>2. Model Performance Metrics</h4>
<p>Model performance metrics provide crucial insights into deployment efficiency and responsiveness. These metrics help optimize model performance and ensure reliable operations.</p>
<ul>
<li>
<p><strong>Model Usage</strong>: Tracks the usage distribution among different model deployments.</p>
</li>
<li>
<p><strong>Token Usage</strong>: Tracks the number of tokens consumed by each model deployment, which is critical for understanding model efficiency.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/token_model_usage.png" alt="Token model usage" /></p>
<ul>
<li>
<p><strong>Invocation Rates</strong>: Tracks the frequency of invocations made by each model deployment.</p>
</li>
<li>
<p><strong>Model Invocation Latency</strong>: Measures the time taken to invoke a model, helping in diagnosing performance issues.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/Invocation_Vertex.png" alt="Model Invocation Metrics" /></p>
<h4>3. Resource Utilization Metrics</h4>
<p>Resource utilization metrics are vital for monitoring resource efficiency and workload performance. They help optimize infrastructure, prevent bottlenecks, and ensure smooth operation of AI deployments.</p>
<ul>
<li>
<p><strong>CPU Utilization</strong>: Monitors CPU usage to ensure optimal resource allocation for AI workloads.</p>
</li>
<li>
<p><strong>Memory Usage</strong>: Tracks the memory consumed across all model deployments.</p>
</li>
<li>
<p><strong>Network Usage</strong>: Measures bytes sent and received, providing insights into data transfer during model interactions.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/Resource_Utilization.png" alt="Resource Utilization Metrics" /></p>
<h4>4. Overview Metrics</h4>
<p>These metrics give an overview of the models deployed in Google Cloud’s Vertex AI platform. They are essential for tracking overall performance, optimizing efficiency, and identifying potential issues across deployments.</p>
<ul>
<li>
<p><strong>Total Invocations</strong>: The overall count of prediction invocations across all models and endpoints, providing a comprehensive view of activity.</p>
</li>
<li>
<p><strong>Total Tokens</strong>: The total number of tokens processed across all model interactions, offering insights into resource utilization and efficiency.</p>
</li>
<li>
<p><strong>Total Errors</strong>: The total count of errors encountered across all models and endpoints, helping identify reliability issues.</p>
</li>
</ul>
<p>All metrics can be filtered by <strong>region</strong>, offering localized insights for better analysis.</p>
<p>Note: The Elastic integration with Vertex AI provides comprehensive visibility into both deployment models: provisioned throughput, where capacity is pre-allocated, and pay-as-you-go, where resources are consumed on demand.</p>
<h3>Conclusion</h3>
<p>This <a href="https://www.elastic.co/docs/current/integrations/gcp_vertexai">integration with Vertex AI</a> represents a significant step forward in enhancing LLM observability for users of Google Cloud’s Vertex AI platform. By unlocking a wealth of actionable data, organizations can assess the health, performance, and cost of LLMs and troubleshoot operational issues, ensuring scalability and accuracy in AI-driven applications.</p>
<p>Now that you know how the Vertex AI integration enhances LLM observability, it’s your turn to try it out. Spin up an Elastic Cloud deployment and start monitoring your LLM applications hosted on Google Cloud’s Vertex AI platform.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/vertexai-title.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[2025 observability trends: Maturing beyond the hype]]></title>
            <link>https://www.elastic.co/observability-labs/blog/emerging-trends-in-observability-2025</link>
            <guid isPermaLink="false">emerging-trends-in-observability-2025</guid>
            <pubDate>Thu, 27 Feb 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Discover what 500+ decision-makers revealed about OpenTelemetry adoption, GenAI integration, and LLM monitoring—insights that separate innovators from followers in Elastic's 2025 observability survey.]]></description>
            <content:encoded><![CDATA[<h1>2025 observability trends: Maturing beyond the hype</h1>
<p>Our latest survey of over 500 observability decision-makers reveals how dramatically the landscape has evolved as we move through 2025. What strikes me most is how observability has moved beyond its technical roots to become a true business imperative. Let’s dive into what we're seeing in the industry.</p>
<h2>The investment paradox of observability in 2025</h2>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image5.png" alt="" /></p>
<p>Here's something fascinating: 96% of executives in our survey expect observability to remain a key investment area. Yet almost all of them (97%) are hitting roadblocks in realizing full value. And surprisingly, the primary hurdles for observability are not technical or complicated in nature. Can you guess what they might be?</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image10.png" alt="" /></p>
<p>For 2025, IT leaders are contending with financial hurdles in their observability programs. I'm seeing this tension play out constantly in conversations with leaders - they know they need to invest, but they're grappling with budget constraints, licensing costs, and proving ROI for their organizations. This creates an interesting dynamic where organizations must carefully balance increasing investment with rigorous cost optimization and business metrics.</p>
<p>What's particularly interesting is how this paradox is forcing organizations to become more strategic about their investments. Leaders are no longer just throwing money at the problem - they're thinking carefully about how to maximize value from every dollar spent.</p>
<h2>Why observability maturity is making all the difference</h2>
<p>The data really jumps out at me here. The gap between observability experts and newcomers tells a compelling story that I wasn't expecting to see. Expert organizations are significantly outperforming their peers across every key metric:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image9.png" alt="" /></p>
<ul>
<li>91% of expert organizations are deploying applications and infrastructure faster (compared to just 34% of those in early stages)</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image11.png" alt="" /></p>
<ul>
<li>82% are successfully reducing operational costs (versus 56% of early-stage organizations)</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image4.png" alt="" /></p>
<ul>
<li>71% achieve better MTTR for incidents (while only 40% of early-stage organizations do)</li>
</ul>
<p>What I find particularly fascinating is how some benefits transcend maturity levels: about 80% of organizations report better customer issue response times regardless of their maturity stage. This tells me that even basic observability delivers immediate customer-facing value. This is crucial information for organizations just starting their observability journey - they can expect to see tangible benefits right from the start. But the overarching story may be that observability maturity leads teams from reactive to proactive and allows them to focus on higher-level, value-add activities.</p>
<h2>Cost management: the new imperative</h2>
<p>The numbers around cost management paint a clear picture of where the industry is heading - 97% of IT decision-makers are actively managing observability costs, and 86% feel personally responsible for business outcomes.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image2.png" alt="" /></p>
<p>I'm seeing a clear trend where leaders are taking concrete steps in their day to day work:</p>
<ul>
<li>Consolidating their observability toolset while maintaining capabilities - they don’t want to lose anything</li>
<li>Implementing usage-based pricing models</li>
<li>Establishing clear ROI metrics</li>
<li>Creating cross-functional teams to optimize spending</li>
</ul>
<p>This isn't just about cutting costs - it's about being smarter with resources. Organizations are learning that more tools don't necessarily mean better observability.</p>
<h2>Two technologies reshaping the observability landscape</h2>
<h3>AI's growing impact</h3>
<p>The enthusiasm for AI is remarkable - 94% of respondents see its tremendous potential. What fascinates me is how concerns about Generative AI reliability have actually decreased from 64% to 55% over the past year.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image7.png" alt="" /></p>
<p>Leaders are particularly excited about:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image1.png" alt="" /></p>
<ul>
<li>Automated correlation of logs, metrics, and traces (72% of respondents)</li>
<li>Predictive analytics for preventing outages</li>
<li>Natural language interfaces for querying observability data</li>
<li>Automated root cause analysis</li>
</ul>
<p>The key shift I'm seeing for the upcoming year is the move from AI as a buzzword to AI as a practical tool delivering real value in observability workflows.</p>
<p>Generative AI capabilities paired with retrieval augmented generation (RAG) capabilities allow organizations to leverage the power of LLMs and private data (e.g., runbooks, alerts, business data) to deliver relevant and meaningful results and identify and solve problems faster while reducing noise.</p>
<h3>OpenTelemetry's continued momentum</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image3.png" alt="" /></p>
<p>Looking at expert organizations, 80% are either experimenting with OpenTelemetry or have already deployed it. This isn't just about technology adoption - it's about building for the future with open standards. The correlation between OpenTelemetry adoption and overall observability maturity is unmistakable.</p>
<p>What's particularly interesting is how OpenTelemetry is changing the vendor landscape. Organizations are increasingly demanding OpenTelemetry support from their vendors, seeing it as a way to future-proof their observability investments and avoid vendor lock-in. Thinking back to how Linux shifted the server landscape, can we expect to see the same in the observability domain?</p>
<h2>Business integration and insights deepen</h2>
<hr />
<p><img src="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/image8.png" alt="" /></p>
<p>Here's what I find most compelling: 64% of expert organizations are frequently correlating operational data with business outcomes, while only 9% of early-stage organizations do the same. This represents a fundamental shift from technical monitoring to business observability.</p>
<p>This isn't just about uptime anymore - organizations are increasingly using observability data to:</p>
<ul>
<li>Make informed business decisions</li>
<li>Improve customer experience</li>
<li>Optimize resource allocation</li>
<li>Drive innovation</li>
</ul>
<h2>Looking ahead</h2>
<p>As we continue through 2025, I'm seeing observability mature beyond its initial promise. Organizations are focusing less on basic implementation and more on delivering real business value through:</p>
<ul>
<li>Deeper business integration, like mapping system performance directly to revenue metrics</li>
<li>Optimized cost management through new data lake technology, efficient storage and intelligent retention</li>
<li>AI-enhanced capabilities powered by LLMs and Agentic AI</li>
<li>Standardized instrumentation through OpenTelemetry, reducing vendor lock-in</li>
</ul>
<p>The path to success in 2025 isn't just about having the right tools - it's about building mature practices that deliver measurable business value while managing costs effectively. The organizations that can balance these competing demands while maintaining focus on business outcomes are the ones pulling ahead.</p>
<p>What are you seeing in your organization's observability journey? Are these trends aligning with your experience?</p>
<p>If you would like to dig in deeper on emerging observability trends, download <a href="https://www.elastic.co/resources/observability/report/landscape-observability-report">our full report</a> or watch the on-demand webinar, <a href="https://www.elastic.co/virtual-events/observability-trends-2025">2025 Observability trends: Maturing beyond the hype and delivering results</a>!</p>
<p>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/emerging-trends-in-observability-2025/trends.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Reconciliation in Elastic Streams: A Robust Architecture Deep Dive]]></title>
            <link>https://www.elastic.co/observability-labs/blog/from-tangled-to-streamlined-how-we-made-streams-robust-by-using-reconciliation</link>
            <guid isPermaLink="false">from-tangled-to-streamlined-how-we-made-streams-robust-by-using-reconciliation</guid>
            <pubDate>Tue, 04 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how Elastic's engineering team refactored Streams using a reconciliation model inspired by Kubernetes & React to build a robust, extensible, and debuggable system.]]></description>
            <content:encoded><![CDATA[<p>Streams is a new, unified approach to data management in the Elastic Stack. It wraps a set of existing Elasticsearch building blocks—data streams, index templates, ingest pipelines, retention policies—into a single, coherent primitive: the Stream. Instead of configuring these parts individually and in the right order, users can now rely on Streams to orchestrate them safely and automatically. With a unified UI in Kibana and a simplified API, Streams reduces cognitive load, lowers the risk of misconfiguration, and supports more flexible workflows like late binding—where users can ingest data first and decide how to process and route it later.</p>
<p>But behind that clean user experience lies a fast-moving, evolving codebase. In this post, we’ll explore how we rethought its architecture to keep up with product demands—while laying the groundwork for future flexibility and scale.</p>
<p>Rapid experimentation often leads to messy code—but before shipping to customers, we have to ask: If this succeeds, can we continue evolving it?
That question puts code health front and center. To move fast in the long term, we need a foundation that supports iteration.</p>
<p>When I joined the Streams team about six months ago, the project was moving fast through uncharted territory amid high uncertainty. This combination of speed and uncertainty created the perfect conditions for, well, spaghetti code—crafted by some of our most senior engineers, doing their best with a recipe missing a few ingredients.</p>
<p>The code was pragmatic and effective: it did exactly what it needed to do. But it was becoming increasingly difficult to understand and extend. Related logic was scattered across many files, with little separation of concerns, making it difficult to safely identify where and how to introduce changes. And the project still had a long road ahead.</p>
<p>Recently, we undertook a refactor of the underlying architecture—not just to bring greater clarity and structure to the codebase, but to establish clear phases that make it easier to debug and evolve. Our primary goal was to build a foundation that would let us continue moving quickly and confidently.
As a secondary goal, we aimed to enable new capabilities like bulk updates, dry runs, and system diagnostics.</p>
<p>In this post, we’ll briefly explore the challenges that prompted a new approach, share the architectural patterns that inspired us, explain how the new design works under the hood, and highlight what it enables for the future.</p>
<h2>The Challenges We Faced</h2>
<p>Streams aims to be a declarative model for data management. Users describe how data should flow: where it should go, what processing should happen along the way, and which mappings should apply. Behind the scenes, each API request results in one or more Elasticsearch resources being changed.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/from-tangled-to-streamlined-how-we-made-streams-robust-by-using-reconciliation/mess.png" alt="An image evoking a tangled mess" /></p>
<p>Before the refactor, the underlying code was increasingly difficult to reason about. There was no clear lifecycle that each request followed. Data was loaded only when it happened to be needed, validation was scattered across different functions, and cascading changes—like child streams reacting to parent updates—were applied recursively and implicitly. Elasticsearch requests could happen at any point during a request.</p>
<p>This led to several key challenges:</p>
<ul>
<li>
<p><strong>No clear place for validation</strong><br />
Without a single, centralized validation step, engineers weren’t sure where to add new checks—or whether existing ones would even run reliably. Some validations happened early, others late.</p>
</li>
<li>
<p><strong>No clear picture of the overall system state</strong><br />
Because there was no way to manage the system state as a whole, it was hard to reason about or validate the state. We couldn’t easily check whether a change was valid in the context of all other existing streams or dependencies.</p>
</li>
<li>
<p><strong>Unpredictable side effects</strong><br />
Since Elasticsearch operations could occur at different points in the flow, failures were harder to handle or roll back. We didn’t have a clear “commit point” where the changes were executed.</p>
</li>
<li>
<p><strong>Tangled stream logic</strong><br />
Logic for different types of streams was mixed together in shared code paths, often guarded by conditionals. This made it hard to isolate behavior, test individual types, or add new ones without risking unintended consequences.</p>
</li>
</ul>
<p>These challenges made it clear: we needed a more structured foundation, one capable of supporting both the current complexity and future growth.</p>
<h2>What We Needed to Move Forward</h2>
<p>To move faster yet with confidence, we needed a foundation that could evolve gracefully, make behavior easier to reason about, and reduce the likelihood of unexpected side effects.</p>
<p>We aligned around a few key goals:</p>
<ul>
<li>
<p><strong>A clear request lifecycle</strong><br />
Each request should move through clear, well-defined phases: loading the current state, applying changes, validating the resulting state, determining the Elasticsearch actions, and executing the actions. This structure would help engineers understand where things happen—and why.</p>
</li>
<li>
<p><strong>A unified state model</strong><br />
We wanted a clear model of desired vs. current state—a single place to reason about the outcome of a change. This would enable safer validation, more efficient updates, and easier debugging by allowing us to compute the difference between the two states.</p>
</li>
<li>
<p><strong>A single commit point</strong><br />
All Elasticsearch changes should happen in one place, after everything’s validated and we know exactly what needs to change. This would reduce side effects, make failures easier to manage, and unlock support for dry runs.</p>
</li>
<li>
<p><strong>Isolated stream logic</strong><br />
We needed clearer separation between stream types so each could be developed and tested in isolation. This would simplify adding new types, reduce unintended side effects, and clarify whether changes belong to a stream type or the state management layer.</p>
</li>
<li>
<p><strong>Bulk operations and system introspection</strong><br />
Finally, we wanted to support features like bulk updates, dry runs, and health diagnostics—capabilities that were difficult or impossible with the old design. A more explicit and inspectable model of system state would make this possible.</p>
</li>
</ul>
<p>These goals became our north star as we explored new architectural patterns to get there, with a strong focus on comparing the current state with the desired state.</p>
<h2>Where We Drew Inspiration From</h2>
<p>Our new design drew inspiration from two well-known open source projects: <a href="https://kubernetes.io/">Kubernetes</a> and <a href="https://react.dev/">React</a>. Though very different, both share a central concept: reconciliation.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/from-tangled-to-streamlined-how-we-made-streams-robust-by-using-reconciliation/reconciliation.png" alt="An image showing a flow chart for reconciliation" /></p>
<p>Reconciliation means comparing two states, calculating their differences, and taking the necessary actions to move the system from its current state to its desired state.</p>
<ul>
<li>
<p>In <a href="https://kubernetes.io/docs/concepts/architecture/controller/">Kubernetes</a>, you declare the desired state of your resources, and the controller continuously works to align the cluster with that state.</p>
</li>
<li>
<p>In <a href="https://legacy.reactjs.org/docs/faq-internals.html">React</a>, each component defines how it should render, and the virtual DOM updates the real DOM efficiently to match that.</p>
</li>
</ul>
<p>We were also inspired by the <a href="https://mmapped.blog/posts/29-plan-execute">Plan/Execute</a> pattern, which aims to separate decision making from execution. This sounded like exactly what we needed in order to perform all validations before committing to any actions—ensuring we could reason about and inspect the system's intent ahead of time.</p>
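<p>To make the combination of reconciliation and Plan/Execute concrete, here is a minimal Python sketch (the names and data shapes are invented for illustration; the actual Streams code is TypeScript inside Kibana and far richer). We diff a current and a desired state into a list of actions, and only a separate execute step mutates anything:</p>

```python
# Minimal reconciliation sketch (illustrative, not the actual Streams
# implementation): states are dicts of stream name -> definition.

def plan(current, desired):
    """Compute the actions needed to move `current` to `desired`."""
    actions = []
    for name, definition in desired.items():
        if name not in current:
            actions.append(("create", name, definition))
        elif current[name] != definition:
            actions.append(("update", name, definition))
    for name in current:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

def execute(state, actions):
    """The single commit point: apply the planned actions."""
    for op, name, definition in actions:
        if op == "delete":
            state.pop(name)
        else:
            state[name] = definition
    return state

current = {"logs": {"retention": "7d"}}
desired = {"logs": {"retention": "30d"}, "logs.errors": {"retention": "90d"}}
steps = plan(current, desired)  # decide first...
execute(current, steps)         # ...then act
```

<p>Because planning is pure, a dry run falls out for free: return <code>steps</code> without ever calling <code>execute</code>.</p>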
<p>These concepts resonated with what we needed. They made clear that we required two key pieces:</p>
<ol>
<li>
<p>A model representing system state, responsible for comparing states and driving the overall workflow (like the Kubernetes controller loop).</p>
</li>
<li>
<p>A representation of individual streams that make up that state, handling the specific logic for each stream type (like React components).</p>
</li>
</ol>
<p>Each Stream is defined and stored in Elasticsearch. We recognized a disconnect between data management and state changes in our existing code, so we designed each stream to manage both. This fits naturally with the <a href="https://www.martinfowler.com/eaaCatalog/activeRecord.html">Active Record pattern</a>, where a class encapsulates both domain logic and persistence.</p>
<p>To make the system easier to extend and the state model’s interface simpler, we implemented an abstract Active Record class using the <a href="https://refactoring.guru/design-patterns/template-method">Template Method pattern</a>, clearly defining the interface new stream types must follow.</p>
<p>We did have some concerns that adopting these more advanced patterns—like reconciliation, the Active Record, and Template Method—might make it harder for new or less experienced engineers to get up to speed. While the code would be cleaner and more straightforward for those familiar with the patterns, we worried it could create a barrier for juniors or newcomers unfamiliar with these concepts.</p>
<p>In practice, however, we found the opposite: the code became easier to follow because the patterns provided a clear, consistent structure. More importantly, the architectural choices helped keep the focus on the domain itself, rather than on complex implementation details, making it more approachable for the whole team. The patterns are there, but the code doesn't talk about them; it talks about the domain.</p>
<h2>How We Structured the System</h2>
<p>When a request hits one of our API endpoints in Kibana, the handler performs basic request validation, then passes the request to the Streams Client. The client’s job is to translate the request into one or more Change objects. Each Change represents the creation, modification, or deletion of a Stream.</p>
<p>These Change objects are then passed to a central class we introduced called <code>State</code>, which plays two key roles:</p>
<ul>
<li>
<p>It holds the set of Stream instances that make up the current version of the system.</p>
</li>
<li>
<p>It orchestrates the pipeline that applies changes and transitions from one state to another.</p>
</li>
</ul>
<p>Let’s walk through the key phases the State class manages when applying a change.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/from-tangled-to-streamlined-how-we-made-streams-robust-by-using-reconciliation/flow.png" alt="Flowchart of the phases" /></p>
<h3>Loading the Starting State</h3>
<p>First, the State class loads the current system state by reading the stored Stream definitions from Elasticsearch. This becomes our reference point for all subsequent comparisons—used during validation, diffing, and action planning.</p>
<h3>Applying Changes</h3>
<p>We begin by cloning the starting state. Each Stream is responsible for cloning itself.
Then we process each incoming Change:</p>
<ul>
<li>
<p>The change is presented to all Streams in the current state (creating a new one if needed).</p>
</li>
<li>
<p>Each Stream can react by updating itself and optionally emitting cascading changes—additional changes that ripple through related Streams.</p>
</li>
<li>
<p>Cascading changes are processed in a loop until no more are generated (or until we hit a safety threshold).</p>
</li>
</ul>
<p>We then move to the next requested Change.<br />
If any requested or cascading Change cannot be applied safely, the system aborts the entire request to prevent partial updates.</p>
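<p>The change-application loop described above can be sketched in a few lines of Python (a hedged illustration only; class names, the example cascade, and the threshold value are all invented, and the real implementation is TypeScript in Kibana):</p>

```python
# Sketch of the cascading-change loop: each change is offered to every
# stream, streams may emit follow-up changes, and a safety threshold
# guards against ripples that never settle. All names are illustrative.

MAX_CASCADES = 100  # illustrative safety threshold

class Stream:
    def __init__(self, name):
        self.name = name
        self.seen = []

    def apply(self, change):
        """React to a change; return any cascading changes it triggers."""
        self.seen.append(change)
        if change == "parent-routing-updated" and self.name == "parent":
            return ["child-mapping-updated"]  # example cascade
        return []

def apply_changes(streams, requested_changes):
    queue = list(requested_changes)
    applied = 0
    while queue:
        if applied >= MAX_CASCADES:
            raise RuntimeError("cascading changes did not settle")
        change = queue.pop(0)
        applied += 1
        for stream in streams:
            queue.extend(stream.apply(change))
    return streams

parent, child = Stream("parent"), Stream("child")
apply_changes([parent, child], ["parent-routing-updated"])
```

<p>In this toy run, the parent stream reacts to the requested routing change by emitting one cascading change, which is then presented to every stream before the loop settles.</p>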
<h3>Validating the Desired State</h3>
<p>Once we’ve applied all Changes and cascading effects, we run validations to ensure the resulting configuration is safe and consistent.</p>
<p>Each Stream is asked to validate itself in the context of the full desired state and the original starting state. This allows for both localized checks (within a Stream) and broader coordination (between related Streams). If any validation fails, we abort the request.</p>
<h3>Determining Actions</h3>
<p>Next, each Stream is asked to determine what Elasticsearch actions are needed to move from the starting state to the desired state. This is the first point where the system needs to consider which Elasticsearch resources back an individual Stream.</p>
<p>If the request is a dry run, we stop here and return a summary of what would happen. If it’s meant to be executed, we move to the next phase.</p>
<h3>Planning and Execution</h3>
<p>The list of Elasticsearch actions is handed off to a dedicated class called <code>ExecutionPlan</code>. This class handles:</p>
<ul>
<li>
<p>Resolving cross-stream dependencies that individual Streams cannot address alone.</p>
</li>
<li>
<p>Organizing the actions into the correct order to ensure safe application (e.g. to avoid data loss when routing rules change).</p>
</li>
<li>
<p>Maximizing parallelism wherever possible within those ordering constraints.</p>
</li>
</ul>
<p>If the plan executes successfully, we return a success response from the API.</p>
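<p>To illustrate the ordering-plus-parallelism idea (this is not the actual <code>ExecutionPlan</code> code, just a sketch of the concept), actions can be grouped into dependency layers, where everything in a layer may run in parallel once the previous layer has finished:</p>

```python
def plan_layers(actions, deps):
    """Group actions into ordered layers that respect dependencies.

    actions: list of action ids
    deps: dict mapping an action id to the set of action ids it depends on
    """
    remaining = set(actions)
    done = set()
    layers = []
    while remaining:
        # Everything whose prerequisites are already done can run in parallel.
        ready = {a for a in remaining if deps.get(a, set()).issubset(done)}
        if not ready:
            raise ValueError("circular dependency between actions")
        layers.append(sorted(ready))
        done |= ready
        remaining -= ready
    return layers
```

<p>For instance, creating a new backing index must finish before routing rules point at it, and only then can the old index be deleted, yielding three layers; unrelated actions simply join the earliest layer their dependencies allow.</p>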
<h3>Handling Failures</h3>
<p>If the plan fails during execution, the <code>State</code> class attempts a rollback—it computes a new plan that should return the system to its starting state (by planning from the desired state back to the starting state instead) and tries to execute it.</p>
<p>If the rollback also fails, we have a fallback mechanism: a “reset” operation that re-applies the known-good state stored in Elasticsearch, skipping diffing entirely.</p>
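<p>The recovery strategy amounts to a three-tier fallback, sketched here in illustrative pseudocode (<code>plan</code>, <code>execute</code>, and <code>reset</code> stand in for the real components):</p>

```python
def execute_with_recovery(starting, desired, plan, execute, reset):
    """Forward plan, then rollback on failure, then reset as a last resort."""
    try:
        execute(plan(starting, desired))  # forward: starting -> desired
    except Exception:
        try:
            # Rollback: compute a plan in the opposite direction.
            execute(plan(desired, starting))
        except Exception:
            # Last resort: re-apply the stored known-good state, no diffing.
            reset(starting)
```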
<h3>A Closer Look at the Stream Active Record Classes</h3>
<p>All Streams in the State are subclasses of an abstract class called <code>StreamActiveRecord</code>. This class is responsible for:</p>
<ul>
<li>
<p>Tracking the change status of the Stream</p>
</li>
<li>
<p>Routing change application, validation, and action determination to specialized template method hooks implemented by its concrete subclasses based on the change status.</p>
</li>
</ul>
<p>These hooks are as follows:</p>
<ul>
<li>
<p>Apply upsert / Apply deletion</p>
</li>
<li>
<p>Validate upsert / Validate deletion</p>
</li>
<li>
<p>Determine actions for creation / change / deletion</p>
</li>
</ul>
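<p>In rough Python-flavored pseudocode (the real implementation lives in Kibana and differs in detail; the class and hook names below are illustrative), the template-method pattern looks like this:</p>

```python
from abc import ABC, abstractmethod

class StreamActiveRecord(ABC):
    """Tracks change status and routes work to subclass hooks."""

    def __init__(self, definition):
        self.definition = definition
        self.status = "unchanged"  # becomes "upserted" or "deleted"

    def apply_change(self, change):
        # Route based on the kind of change, recording the status.
        if change["type"] == "upsert":
            self.status = "upserted"
            return self.apply_upsert(change)
        self.status = "deleted"
        return self.apply_deletion(change)

    def validate(self, desired, starting):
        if self.status == "deleted":
            return self.validate_deletion(desired, starting)
        return self.validate_upsert(desired, starting)

    # Hooks implemented by concrete stream types:
    @abstractmethod
    def apply_upsert(self, change): ...

    @abstractmethod
    def apply_deletion(self, change): ...

    @abstractmethod
    def validate_upsert(self, desired, starting): ...

    @abstractmethod
    def validate_deletion(self, desired, starting): ...

class WiredStream(StreamActiveRecord):
    """A concrete stream type only fills in the hooks."""

    def apply_upsert(self, change):
        return []  # may emit cascading changes here

    def apply_deletion(self, change):
        return []

    def validate_upsert(self, desired, starting):
        return []  # stream-specific checks go here

    def validate_deletion(self, desired, starting):
        return []
```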
<p>With this architecture in place, we’ve created a clear, phased, and declarative flow from input to action—one that’s modular, testable, and resilient to failure. It cleanly separates generic stream lifecycle logic (like change tracking and orchestration) from stream-specific behaviors (such as what “upsert” means for a given Stream type), enabling a highly extensible system. This structure allows us to isolate side effects, validate with confidence, and reason more clearly about system-wide behavior—all while supporting dry runs and bulk operations.</p>
<p>Now that we’ve covered how it works, let’s explore what this unlocks—the capabilities, safety guarantees, and new workflows this design makes possible.</p>
<h2>What This Unlocks</h2>
<p>The reconciliation-based design we landed on isn’t just easier to reason about—it directly addresses many of the core limitations we faced in the earlier version of the system.</p>
<p><strong>Bulk operations and dry runs, by design</strong></p>
<p>One of our key goals was to support bulk configuration changes across many Streams in a single request. The previous codebase made this difficult because the side effects were interleaved with decision-making logic, making it risky to apply multiple changes at once.</p>
<p>Now, bulk changes are the default. The <code>State</code> class handles any number of changes, tracks cascading effects automatically, and validates the end result as a whole. Whether you're updating one Stream or fifty, the pipeline handles it consistently.</p>
<p>Dry runs were another desired feature. Because actions are now computed in a side-effect-free step—before anything is sent to Elasticsearch—we can generate a full preview of what would happen. This includes both which Streams would change and what specific Elasticsearch operations would be performed. That visibility helps users and developers make confident, informed decisions.</p>
<p><strong>Easier debugging, better diagnostics</strong></p>
<p>In the old system, debugging required reconstructing the execution context and piecing together side effects. Now, every phase of the pipeline is explicit and can be tested in isolation.</p>
<p>Because validation and Elasticsearch actions are now tied directly to the Stream definition and lifecycle, any inconsistencies or errors are easier to trace to their source.</p>
<p><strong>Validated planning before execution</strong></p>
<p>Because we now validate and plan <em>before</em> making any changes, the risk of leaving the system in an inconsistent or partially-updated state has been greatly reduced. All actions are determined in advance, and only executed once we’re confident the entire set of changes is valid and coherent.</p>
<p>And if something does go wrong during execution, we can lean on the fact that both the starting and desired states are fully modeled in memory. This allows us to generate a rollback plan automatically, and when that’s not possible, fall back to a complete reset from the stored state. In short: safety is now built in, not bolted on.</p>
<p><strong>Extensible by default</strong></p>
<p>Adding a new type of Stream used to mean editing logic scattered across multiple files.
Now, it’s a focused, well-defined task. You subclass <code>StreamActiveRecord</code> and implement the handful of lifecycle hooks.</p>
<p>That’s it. The orchestration, tracking, and dependency handling are already wired up. That also means it’s easier to onboard new developers or experiment with new Stream types without fear of breaking unrelated parts of the system.</p>
<p><strong>Easier to test</strong></p>
<p>Because each Stream is now encapsulated and has clear, isolated responsibilities, testing is much simpler. You can test individual Stream classes by simulating specific inputs and asserting the resulting cascading changes, validation results, or Elasticsearch actions. There's no need to spin up a full end-to-end environment just to test a single validation.</p>
<h2>What’s Next</h2>
<p>At Elastic, we live by our Source Code, which states “Progress, SIMPLE Perfection”—a reminder to favor steady, incremental improvement over chasing perfection.</p>
<p>This new system is a solid foundation—but it’s only the beginning. Our focus so far has been on clarity, safety, and extensibility, and while we’ve addressed some long-standing pain points, there’s still plenty of room to evolve.</p>
<h3>Continuous improvement ahead</h3>
<p>We intentionally shipped this work with a sharp scope and have already identified several enhancements that we will be adding in the coming weeks:</p>
<ul>
<li>
<p><strong>Introduce a locking layer</strong><br />
To safely handle concurrent updates, we plan to introduce a locking mechanism that prevents race conditions during parallel modifications.</p>
</li>
<li>
<p><strong>Expose bulk and dry-run features via our APIs</strong><br />
The <code>State</code> class already supports them—now it’s time to make those capabilities available to users.</p>
</li>
<li>
<p><strong>Improve debugging output</strong><br />
Now that state transitions are modeled explicitly, we can expose clearer diagnostics to help both users and developers reason about changes.</p>
</li>
<li>
<p><strong>Avoid redundant Elasticsearch requests</strong><br />
Currently we make multiple redundant requests during validation. Introducing a lightweight in-memory cache would let us avoid reloading the same resource more than once.</p>
</li>
<li>
<p><strong>Improve access controls</strong><br />
Currently, we rely on Elasticsearch to enforce access control. Because a single change can touch many different resources, it’s difficult to determine up front which privileges are required. We plan to extend our action definitions with privilege metadata, enabling us to validate the full set of required permissions before executing any actions. This will let us detect and report missing privileges early—before the plan runs.</p>
</li>
<li>
<p><strong>Add APM instrumentation</strong><br />
With the system structured in distinct, well-defined phases, we’re now in a great position to add performance instrumentation. This will help us identify bottlenecks and improve responsiveness over time.</p>
</li>
</ul>
<h3>Revisiting responsibilities</h3>
<p>As our orchestration becomes more robust, we’re also re-evaluating where it should live. Large-scale bulk operations, for example, might eventually be better handled closer to Elasticsearch itself, where we can benefit from greater atomicity and tighter performance guarantees. That kind of deep integration would have been premature earlier on—when we were still figuring out the right abstractions and phases for the system. But now that the design has stabilized, we’re in a much better position to start that conversation.</p>
<h3>Built to evolve</h3>
<p>We designed this system with adaptability in mind. Whether improvements come in the form of internal refactors, better developer experience, or deeper collaboration with Elasticsearch, we’re in a strong position to keep evolving. The architecture is modular by design—and that gives us both the stability to rely on and the flexibility to grow.</p>
<h2>Wrapping Up</h2>
<p>Building robust, maintainable systems is never just about code — it’s about aligning architecture with the evolving needs and direction of the product. Our journey refactoring Streams reaffirmed that a thoughtful, phased approach not only improves technical clarity but also empowers teams to move faster and innovate more confidently.</p>
<p>If you’re working on complex systems facing similar challenges—whether tangled logic, unpredictable side effects, or the need for extensibility—you’re not alone. We hope our story offers some useful insights and inspiration as you shape your own path forward.</p>
<p>We welcome feedback and collaboration from the community—whether it’s in the form of questions, ideas, or code.</p>
<p>To learn more about Streams, explore:</p>
<p><em>Read about</em> <a href="https://www.elastic.co/observability-labs/blog/reimagine-observability-elastic-streams"><em>Reimagining streams</em></a></p>
<p><em>Look at the</em> <a href="http://elastic.co/elasticsearch/streams"><em>Streams website</em></a></p>
<p><em>Read the</em> <a href="https://www.elastic.co/docs/solutions/observability/streams/streams"><em>Streams documentation</em></a></p>
<p><em>Check out the</em> <a href="https://github.com/elastic/kibana/pull/211696"><em>pull request on GitHub</em></a> to dive into the code or join the conversation.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/from-tangled-to-streamlined-how-we-made-streams-robust-by-using-reconciliation/article.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[How to remove PII from your Elastic data in 3 easy steps]]></title>
            <link>https://www.elastic.co/observability-labs/blog/remove-pii-data</link>
            <guid isPermaLink="false">remove-pii-data</guid>
            <pubDate>Tue, 20 Jun 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Personally Identifiable Information compliance is an ever increasing challenge for any organization. With Elastic's intuitive ML interface and parsing capabilities, sensitive data may be easily redacted from unstructured data with ease.]]></description>
            <content:encoded><![CDATA[<p>Personally identifiable information (PII) compliance is an ever-increasing challenge for any organization. Whether you’re in ecommerce, banking, healthcare, or other fields where data is sensitive, PII may inadvertently be captured and stored. Structured logs enable quick identification, removal, and protection of sensitive data fields; but what about unstructured messages? Or perhaps call center transcriptions?</p>
<p>Elasticsearch, with its long experience in <a href="https://www.elastic.co/what-is/elasticsearch-machine-learning">machine learning</a>, offers several options: you can bring in custom models, such as large language models (LLMs), or use the models Elastic provides. These models will help implement PII redaction.</p>
<p>If you would like to learn more about natural language processing, machine learning, and Elastic, please be sure to check out these related articles:</p>
<ul>
<li><a href="https://www.elastic.co/blog/introduction-to-nlp-with-pytorch-models">Introduction to modern natural language processing with PyTorch in Elasticsearch</a></li>
<li><a href="https://www.elastic.co/blog/how-to-deploy-natural-language-processing-nlp-getting-started">How to deploy natural language processing (NLP): Getting started</a></li>
<li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/redact-processor.html">Elastic Redact Processor Documentation</a></li>
<li><a href="https://www.elastic.co/blog/may-2023-launch-sparse-encoder-ai-model">Introducing Elastic Learned Sparse Encoder: Elastic’s AI model for semantic search</a></li>
<li><a href="https://www.elastic.co/blog/may-2023-launch-machine-learning-models">Accessing machine learning models in Elastic</a></li>
</ul>
<p>In this blog, we will show you how to set up PII redaction through the use of Elasticsearch’s ability to load a trained model within machine learning and the flexibility of Elastic’s ingest pipelines.</p>
<p>Specifically, we’ll walk through setting up a <a href="https://www.elastic.co/blog/how-to-deploy-nlp-named-entity-recognition-ner-example">named entity recognition (NER)</a> model for person and location identification, as well as deploying the redact processor for custom data identification and removal. All of this will then be combined with an ingest pipeline where we can use Elastic machine learning and data transformations capabilities to remove sensitive information from your data.</p>
<h2>Loading the trained model</h2>
<p>Before we begin, we must load our NER model into our Elasticsearch cluster. This may be easily accomplished with Docker and the Elastic Eland client. From a command line, let’s clone the Eland client repository with git:</p>
<pre><code class="language-bash">git clone https://github.com/elastic/eland.git
</code></pre>
<p>Navigate into the recently downloaded client:</p>
<pre><code class="language-bash">cd eland/
</code></pre>
<p>Now let’s build the client:</p>
<pre><code class="language-bash">docker build -t elastic/eland .
</code></pre>
<p>From here, you’re ready to deploy the trained model to an Elastic machine learning node! Be sure to replace your username, password, es-cluster-hostname, and esport.</p>
<p>If you’re using the Elastic Cloud or have signed certificates, simply run this command:</p>
<pre><code class="language-bash">docker run -it --rm --network host elastic/eland eland_import_hub_model --url https://&lt;username&gt;:&lt;password&gt;@&lt;es-cluster-hostname&gt;:&lt;esport&gt;/ --hub-model-id dslim/bert-base-NER --task-type ner --start
</code></pre>
<p>If you’re using self-signed certificates, run this command:</p>
<pre><code class="language-bash">docker run -it --rm --network host elastic/eland eland_import_hub_model --url https://&lt;username&gt;:&lt;password&gt;@&lt;es-cluster-hostname&gt;:&lt;esport&gt;/ --insecure --hub-model-id dslim/bert-base-NER --task-type ner --start
</code></pre>
<p>From here you’ll witness the Eland client in action downloading the trained model from <a href="https://huggingface.co/dslim/bert-base-NER">HuggingFace</a> and automatically deploying it into your cluster!</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/blog-elastic-huggingface.png" alt="huggingface code" /></p>
<p>Synchronize your newly loaded trained model by clicking the blue “Synchronize your jobs and trained models” hyperlink in the Machine Learning Overview UI.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/blog-elastic-Machine-Learning-Overview-UI.png" alt="Machine Learning Overview UI" /></p>
<p>Now click the Synchronize button.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/blog-elastic-Synchronize-button.png" alt="Synchronize button" /></p>
<p>That’s it! Congratulations, you just loaded your first trained model into Elastic!</p>
<h2>Create the redact processor and ingest pipeline</h2>
<p>From DevTools, let’s configure the redact processor along with our inference processor to take advantage of Elastic’s trained model we just loaded. This will create an ingest pipeline named “redact” that we can then use to remove sensitive data from any field we wish. In this example, I’ll be focusing on the “message” field. Note: at the time of this writing, the redact processor is experimental and must be created via DevTools.</p>
<pre><code class="language-bash">PUT _ingest/pipeline/redact
{
  &quot;processors&quot;: [
    {
      &quot;set&quot;: {
        &quot;field&quot;: &quot;redacted&quot;,
        &quot;value&quot;: &quot;{{{message}}}&quot;
      }
    },
    {
      &quot;inference&quot;: {
        &quot;model_id&quot;: &quot;dslim__bert-base-ner&quot;,
        &quot;field_map&quot;: {
          &quot;message&quot;: &quot;text_field&quot;
        }
      }
    },
    {
      &quot;script&quot;: {
        &quot;lang&quot;: &quot;painless&quot;,
        &quot;source&quot;: &quot;String msg = ctx['message'];\r\n                for (item in ctx['ml']['inference']['entities']) {\r\n                msg = msg.replace(item['entity'], '&lt;' + item['class_name'] + '&gt;')\r\n                }\r\n                ctx['redacted']=msg&quot;
      }
    },
    {
      &quot;redact&quot;: {
        &quot;field&quot;: &quot;redacted&quot;,
        &quot;patterns&quot;: [
          &quot;%{EMAILADDRESS:EMAIL}&quot;,
          &quot;%{IP:IP_ADDRESS}&quot;,
          &quot;%{CREDIT_CARD:CREDIT_CARD}&quot;,
          &quot;%{SSN:SSN}&quot;,
          &quot;%{PHONE:PHONE}&quot;
        ],
        &quot;pattern_definitions&quot;: {
&quot;CREDIT_CARD&quot;: &quot;\\d{4}[ -]\\d{4}[ -]\\d{4}[ -]\\d{4}&quot;,
          &quot;SSN&quot;: &quot;\\d{3}-\\d{2}-\\d{4}&quot;,
          &quot;PHONE&quot;: &quot;\\d{3}-\\d{3}-\\d{4}&quot;
        }
      }
    },
    {
      &quot;remove&quot;: {
        &quot;field&quot;: [
          &quot;ml&quot;
        ],
        &quot;ignore_missing&quot;: true,
        &quot;ignore_failure&quot;: true
      }
    }
  ],
  &quot;on_failure&quot;: [
    {
      &quot;set&quot;: {
        &quot;field&quot;: &quot;failure&quot;,
        &quot;value&quot;: &quot;pii_script-redact&quot;
      }
    }
  ]
}
</code></pre>
<p>OK, but what does each processor really do? Let’s walk through each processor in detail here:</p>
<ol>
<li>
<p>The SET processor creates the field “redacted,” which is copied over from the message field and used later on in the pipeline.</p>
</li>
<li>
<p>The INFERENCE processor calls the NER model we loaded to be used on the message field for identifying names, locations, and organizations.</p>
</li>
<li>
<p>The SCRIPT processor then replaces, in the redacted field, each entity detected by the NER model with its entity class name.</p>
</li>
<li>
<p>Our REDACT processor uses Grok patterns to identify any custom set of data we wish to remove from the redacted field (which was copied over from the message field).</p>
</li>
<li>
<p>The REMOVE processor deletes the extraneous ml.* fields from being indexed; note we’ll add “message” to this processor once we validate data is being redacted properly.</p>
</li>
<li>
<p>The ON_FAILURE / SET processor captures any errors just in case we have them.</p>
</li>
</ol>
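<p>To build intuition for what the custom Grok patterns in step 4 match, here is a rough Python equivalent (illustrative only: the real redaction happens in the ingest pipeline, and Grok’s built-in EMAILADDRESS and IP patterns are more thorough than the simplified regexes below):</p>

```python
import re

# Simplified stand-ins for the Grok patterns used by the redact processor.
PATTERNS = {
    "CREDIT_CARD": r"\d{4}[ -]\d{4}[ -]\d{4}[ -]\d{4}",
    "SSN": r"\d{3}-\d{2}-\d{4}",
    "PHONE": r"\d{3}-\d{3}-\d{4}",
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "IP_ADDRESS": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}

def redact(text):
    # Apply more specific patterns first so a credit card number is not
    # partially consumed by the looser phone or SSN patterns.
    for name in ("CREDIT_CARD", "SSN", "EMAIL", "IP_ADDRESS", "PHONE"):
        text = re.sub(PATTERNS[name], "<" + name + ">", text)
    return text
```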
<h2>Slice your PII</h2>
<p>Now that your ingest pipeline with all the necessary steps has been configured, let’s start testing how well we can remove sensitive data from documents. Navigate over to Stack Management, select Ingest Pipelines and search for “redact”, and then click on the result.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/blog-elastic-Ingest-Pipelines.png" alt="Ingest Pipelines" /></p>
<p>Click on the Manage button, and then click Edit.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/elastic-blog-Manage-button.png" alt="Manage button" /></p>
<p>Here we are going to test our pipeline by adding some documents. Below is a sample you can copy and paste to make sure everything is working correctly.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/elastic-blog-test-pipeline.png" alt="test pipeline" /></p>
<pre><code class="language-json">{
  &quot;_source&quot;: {
    &quot;message&quot;: &quot;John Smith lives at 123 Main St. Highland Park, CO. His email address is jsmith123@email.com and his phone number is 412-189-9043.  I found his social security number, it is 942-00-1243. Oh btw, his credit card is 1324-8374-0978-2819 and his gateway IP is 192.168.1.2&quot;
  }
}
</code></pre>
<p>Simply press the Run the pipeline button, and you will then see the following output:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/elastic-blog-pii-output-2.png" alt="pii output code" /></p>
<h2>What’s next?</h2>
<p>After you’ve added this ingest pipeline to a data set you’re indexing and validated that it is meeting expectations, you can add the message field to be removed so that no PII data is indexed. Simply update your REMOVE processor to include the message field and simulate again to only see the redacted field.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/elastic-blog-manage-processor.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/elastic-blog-pii-output.png" alt="pii output code 2" /></p>
<h2>Conclusion</h2>
<p>With this step-by-step approach, you are now ready and able to detect and redact any sensitive data throughout your indices.</p>
<p>Here’s a quick recap of what we covered:</p>
<ul>
<li>Loading a pre-trained named entity recognition model into an Elastic cluster</li>
<li>Configuring the Redact processor, along with the inference processor, to use the trained model during data ingestion</li>
<li>Testing sample data and modifying the ingest pipeline to safely remove personally identifiable information</li>
</ul>
<p>Ready to get started? Sign up <a href="https://cloud.elastic.co/registration">for Elastic Cloud</a> and try out the features and capabilities I’ve outlined above to detect and redact sensitive data across your indices.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/remove-pii-data/blog-post4-ai-search-B.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Gain insights into Kubernetes errors with Elastic Observability logs and OpenAI]]></title>
            <link>https://www.elastic.co/observability-labs/blog/kubernetes-errors-observability-logs-openai</link>
            <guid isPermaLink="false">kubernetes-errors-observability-logs-openai</guid>
            <pubDate>Thu, 18 May 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[This blog post provides an example of how one can analyze error messages in Elasticsearch with ChatGPT using the OpenAI API via Elasticsearch.]]></description>
            <content:encoded><![CDATA[<p>As we’ve shown in previous blogs, Elastic<sup>®</sup> provides a way to ingest and manage telemetry from the <a href="https://www.elastic.co/blog/kubernetes-cluster-metrics-logs-monitoring">Kubernetes cluster</a> and the <a href="https://www.elastic.co/blog/opentelemetry-observability">application</a> running on it. Elastic provides out-of-the-box dashboards to help with tracking metrics, <a href="https://www.elastic.co/blog/log-management-observability-operations">log management and analytics</a>, <a href="https://www.elastic.co/blog/adding-free-and-open-elastic-apm-as-part-of-your-elastic-observability-deployment">APM functionality</a> (which also supports <a href="https://www.elastic.co/blog/opentelemetry-observability">native OpenTelemetry</a>), and the ability to analyze everything with <a href="https://www.elastic.co/blog/observability-logs-machine-learning-aiops">AIOps features</a> and <a href="https://www.elastic.co/what-is/elasticsearch-machine-learning?elektra=home">machine learning</a> (ML). While you can use pre-existing <a href="https://www.elastic.co/blog/improving-information-retrieval-elastic-stack-search-relevance">ML models in Elastic</a>, <a href="https://www.elastic.co/blog/aiops-automation-analytics-elastic-observability-use-cases">out-of-the-box AIOps features</a>, or your own ML models, there is a need to dig deeper into the root cause of an issue.</p>
<p>Elastic helps reduce the operational work to support more efficient operations, but users still need a way to investigate and understand everything from the cause of an issue to the meaning of specific error messages. As an operations user, if you haven’t run into a particular error before or it's part of some runbook, you will likely go to Google and start searching for information.</p>
<p>OpenAI’s ChatGPT is becoming an interesting generative AI tool that helps provide more information using the models behind it. What if you could use OpenAI to obtain deeper insights (even simple semantics) for an error in your production or development environment? You can easily tie Elastic to OpenAI’s API to achieve this.</p>
<p>Kubernetes, a mainstay in most deployments (on-prem or in a cloud service provider), requires a significant amount of expertise — even if that expertise is to manage a service like GKE, EKS, or AKS.</p>
<p>In this blog, I will cover how you can use <a href="https://www.elastic.co/guide/en/kibana/current/watcher-ui.html">Elastic’s watcher</a> capability to connect Elastic to OpenAI and ask it for more information about the error logs Elastic is ingesting from a Kubernetes cluster(s). More specifically, we will use <a href="https://azure.microsoft.com/en-us/products/cognitive-services/openai-service">Azure’s OpenAI Service</a>. Azure OpenAI is a partnership between Microsoft and OpenAI, so the same models from OpenAI are available in the Microsoft version.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-azure-openai.png" alt="elastic azure openai" /></p>
<p>While this blog goes over a specific example, it can be modified for other types of errors Elastic receives in logs. Whether it's from AWS, the application, databases, etc., the configuration and script described in this blog can be modified easily.</p>
<h2>Prerequisites and config</h2>
<p>If you plan on following this blog, here are some of the components and details we used to set up the configuration:</p>
<ul>
<li>Ensure you have an account on <a href="http://cloud.elastic.co">Elastic Cloud</a> and a deployed stack (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>).</li>
<li>We used a GCP GKE Kubernetes cluster, but you can use any Kubernetes cluster service (on-prem or cloud based) of your choice.</li>
<li>We’re also running with a version of the OpenTelemetry Demo. Directions for using Elastic with OpenTelemetry Demo are <a href="https://github.com/elastic/opentelemetry-demo">here</a>.</li>
<li>We also have an Azure account and <a href="https://azure.microsoft.com/en-us/products/cognitive-services/openai-service">Azure OpenAI service configured</a>. You will need to get the appropriate tokens from Azure and the proper URL endpoint from Azure’s OpenAI service.</li>
<li>We will use <a href="https://www.elastic.co/guide/en/kibana/current/devtools-kibana.html">Elastic’s dev tools</a>, the console to be specific, to load up and run the script, which is an <a href="https://www.elastic.co/guide/en/kibana/current/watcher-ui.html">Elastic watcher</a>.</li>
<li>We will also add a new index to store the results from the OpenAI query.</li>
</ul>
<p>Here is the configuration we will set up in this blog:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-configuration.png" alt="Configuration to analyze Kubernetes cluster errors" /></p>
<p>As we walk through the setup, we’ll also provide the alternative setup with OpenAI versus Azure OpenAI Service.</p>
<h2>Setting it all up</h2>
<p>Over the next few steps, I’ll walk through:</p>
<ul>
<li>Getting an account on Elastic Cloud and setting up your K8S cluster and application</li>
<li>Gaining Azure OpenAI authorization (alternative option with OpenAI)</li>
<li>Identifying Kubernetes error logs</li>
<li>Configuring the watcher with the right script</li>
<li>Comparing the output from Azure OpenAI/OpenAI versus ChatGPT UI</li>
</ul>
<h3>Step 0: Create an account on Elastic Cloud</h3>
<p>Follow the instructions to <a href="https://cloud.elastic.co/registration?fromURI=/home">get started on Elastic Cloud</a>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-start-cloud-trial.png" alt="elastic start cloud trial" /></p>
<p>Once you have the Elastic Cloud login, set up your Kubernetes cluster and application. A complete step-by-step instructions blog is available <a href="https://www.elastic.co/blog/kubernetes-cluster-metrics-logs-monitoring">here</a>. This also provides an overview of how to see Kubernetes cluster metrics in Elastic and how to monitor them with dashboards.</p>
<h3>Step 1: Azure OpenAI Service and authorization</h3>
<p>When you log in to your Azure subscription and set up an instance of Azure OpenAI Service, you will be able to get your keys under Manage Keys.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-microsoft-azure-manage-keys.png" alt="microsoft azure manage keys" /></p>
<p>There are two keys for your OpenAI instance, but you only need KEY 1.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-pme-openai-keys-and-endpoint.png" alt="Used with permission from Microsoft." /></p>
<p>Additionally, you will need to get the service URL. See the image above with our service URL blanked out to understand where to get the KEY 1 and URL.</p>
<p>If you are using the standard OpenAI service rather than Azure OpenAI Service, you can get your keys at:</p>
<pre><code class="language-bash">https://platform.openai.com/account/api-keys
</code></pre>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-api-keys.png" alt="api keys" /></p>
<p>You will need to create a key and save it. Once you have the key, you can go to Step 2.</p>
<h3>Step 2: Identifying Kubernetes errors in Elastic logs</h3>
<p>As your Kubernetes cluster is running, <a href="https://docs.elastic.co/en/integrations/kubernetes">Elastic’s Kubernetes integration</a> running on the Elastic agent daemon set on your cluster is sending logs and metrics to Elastic. <a href="https://www.elastic.co/blog/log-monitoring-management-enterprise">The telemetry is ingested, processed, and indexed</a>. Kubernetes logs are stored in an index called .ds-logs-kubernetes.container_logs-default-* (* is for the date), and an automatic data stream logs-kubernetes.container_logs is also pre-loaded. So while you can use some of the out-of-the-box dashboards to investigate the metrics, you can also look at all the logs in Elastic Discover.</p>
<p>While any error from Kubernetes can be daunting, the more nuanced issues come from pods running in the kube-system namespace. Take the konnectivity-agent pod, a network proxy agent running on the node to help establish tunnels and a vital component of Kubernetes. Any error here causes cluster connectivity issues and can lead to a cascade of problems, so it’s important to understand and troubleshoot these errors.</p>
<p>When we filter for error logs from the konnectivity-agent, we see a good number of errors.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-expanded-document.png" alt="expanded document" /></p>
<p>But unfortunately, we still can’t understand what these errors mean.</p>
<p>Enter OpenAI to help us understand the issue better. Generally, you would take the error message from Discover and paste it with a question in ChatGPT (or run a Google search on the message).</p>
<p>One error in particular that we’ve run into but do not understand is:</p>
<pre><code class="language-bash">E0510 02:51:47.138292       1 client.go:388] could not read stream err=rpc error: code = Unavailable desc = error reading from server: read tcp 10.120.0.8:46156-&gt;35.230.74.219:8132: read: connection timed out serverID=632d489f-9306-4851-b96b-9204b48f5587 agentID=e305f823-5b03-47d3-a898-70031d9f4768
</code></pre>
<p>The OpenAI output is as follows:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-openai-output.png" alt="openai output" /></p>
<p>ChatGPT has given us a fairly nice set of ideas on why this rpc error is occurring against our konnectivity-agent.</p>
<p>So how can we get this output automatically for any error when those errors occur?</p>
<h3>Step 3: Configuring the watcher with the right script</h3>
<p><a href="https://www.elastic.co/guide/en/kibana/current/watcher-ui.html">What is an Elastic watcher?</a> Watcher is an Elasticsearch feature that you can use to create actions based on conditions, which are periodically evaluated using queries on your data. Watchers are helpful for analyzing mission-critical and business-critical streaming data. For example, you might watch application logs for errors causing larger operational issues.</p>
<p>Once a watcher is configured, it can be:</p>
<ol>
<li>Manually triggered</li>
<li>Run periodically</li>
<li>Created using a UI or a script</li>
</ol>
<p>In this scenario, we will use a script, as we can modify it easily and run it as needed.</p>
<p>We’re using the DevTools Console to enter the script and test it out:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-test-script.png" alt="test script" /></p>
<p>The script is listed at the end of the blog in the <strong>appendix</strong>. It can also be downloaded <a href="https://github.com/elastic/chatgpt-error-analysis"><strong>here</strong></a>.</p>
<p>The script does the following:</p>
<ol>
<li>It runs on a recurring five-minute schedule.</li>
<li>It will search the logs for errors from the container konnectivity-agent.</li>
<li>It will take the first error’s message, transform it (re-format and clean up), and place it into a variable first_hit.</li>
</ol>
<pre><code class="language-json">&quot;script&quot;: &quot;return ['first_hit': ctx.payload.first.hits.hits.0._source.message.replace('\&quot;', \&quot;\&quot;)]&quot;
</code></pre>
<ol start="4">
<li>The error message is sent to OpenAI with a query:</li>
</ol>
<pre><code class="language-yaml">What are the potential reasons for the following kubernetes error:
  {{ctx.payload.second.first_hit}}
</code></pre>
<ol start="5">
<li>If the search returned an error, the watcher creates a new index called chatgpt_k8s_analyzed and places the error message, the pod name (konnectivity-agent-6676d5695b-ccsmx in our setup), and the OpenAI output into it.</li>
</ol>
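<p>As a side note, the Painless transform in step 3 does nothing more than strip embedded double quotes from the log message so it can be inlined safely into the JSON body of the OpenAI request. A minimal Python equivalent of that transform:</p>
<pre><code class="language-python">def sanitize_message(message):
    """Mimic the watcher's Painless transform: remove embedded double quotes
    so the message can sit inside a JSON string without escaping issues."""
    return message.replace('"', '')

raw = 'could not read stream err=rpc error: desc = "connection timed out"'
clean = sanitize_message(raw)  # the embedded quotes are removed
</code></pre>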
<p>To see the results, we created a new data view called chatgpt_k8s_analyzed against the newly created index:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-edit-data-view.png" alt="edit data view" /></p>
<p>In Discover, the output on the data view provides us with the analysis of the errors.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-analysis-of-errors.png" alt="analysis of errors" /></p>
<p>For every error the script sees in the five-minute interval, it retrieves an analysis of the error. Alternatively, we could use a range query to analyze a specific time frame; the script would just need to be modified accordingly.</p>
<h3>Step 4. Output from Azure OpenAI/OpenAI vs. ChatGPT UI</h3>
<p>As you noticed above, we got essentially the same result from the Azure OpenAI API call as we did by testing our query in the ChatGPT UI. This is because we configured the API call to use the same model as the one selected in the UI.</p>
<p>For the API call, we used the following parameters:</p>
<pre><code class="language-json">&quot;request&quot;: {
             &quot;method&quot; : &quot;POST&quot;,
             &quot;url&quot;: &quot;https://XXX.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview&quot;,
             &quot;headers&quot;: {&quot;api-key&quot; : &quot;XXXXXXX&quot;,
                         &quot;content-type&quot; : &quot;application/json&quot;
                        },
             &quot;body&quot; : &quot;{ \&quot;messages\&quot;: [ { \&quot;role\&quot;: \&quot;system\&quot;, \&quot;content\&quot;: \&quot;You are a helpful assistant.\&quot;}, { \&quot;role\&quot;: \&quot;user\&quot;, \&quot;content\&quot;: \&quot;What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\&quot;}], \&quot;temperature\&quot;: 0.5, \&quot;max_tokens\&quot;: 2048}&quot; ,
              &quot;connection_timeout&quot;: &quot;60s&quot;,
               &quot;read_timeout&quot;: &quot;60s&quot;
                            }
</code></pre>
<p>By setting the system role to You are a helpful assistant and pointing the URL at a gpt-35-turbo deployment, we are calling the same gpt-3.5-turbo model that the ChatGPT UI uses by default.</p>
<p>Additionally, for Azure OpenAI Service, you will need to set the URL to something similar to the following:</p>
<pre><code class="language-bash">https://YOURSERVICENAME.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview
</code></pre>
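<p>To sanity-check your deployment outside the watcher, you can issue the same chat completion request from a few lines of Python. This is a sketch only: the service name and API key are placeholders you must replace with your own values.</p>
<pre><code class="language-python">import json

AZURE_URL = ("https://YOUR_SERVICE.openai.azure.com/openai/deployments/"
             "pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview")

def build_chat_body(error_message, temperature=0.5, max_tokens=2048):
    """Build the same chat payload the watcher sends: a system role plus a
    user question wrapping the Kubernetes error message."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user",
             "content": "What are the potential reasons for the following "
                        "kubernetes error: " + error_message},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    })

# With the requests library (pip install requests):
# import requests
# resp = requests.post(AZURE_URL, data=build_chat_body("connection timed out"),
#                      headers={"api-key": "YOUR_KEY",
#                               "content-type": "application/json"},
#                      timeout=60)
# print(resp.json()["choices"][0]["message"]["content"])
</code></pre>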
<p>If you use OpenAI (versus Azure OpenAI Service), the request call (against <a href="https://api.openai.com/v1/completions">https://api.openai.com/v1/completions</a>) would look like the following:</p>
<pre><code class="language-json">&quot;request&quot;: {
            &quot;scheme&quot;: &quot;https&quot;,
            &quot;host&quot;: &quot;api.openai.com&quot;,
            &quot;port&quot;: 443,
            &quot;method&quot;: &quot;post&quot;,
            &quot;path&quot;: &quot;\/v1\/completions&quot;,
            &quot;params&quot;: {},
            &quot;headers&quot;: {
               &quot;content-type&quot;: &quot;application\/json&quot;,
               &quot;authorization&quot;: &quot;Bearer YOUR_ACCESS_TOKEN&quot;
                        },
            &quot;body&quot;: &quot;{ \&quot;model\&quot;: \&quot;text-davinci-003\&quot;,  \&quot;prompt\&quot;: \&quot;What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\&quot;,  \&quot;temperature\&quot;: 1,  \&quot;max_tokens\&quot;: 512,     \&quot;top_p\&quot;: 1.0,      \&quot;frequency_penalty\&quot;: 0.0,   \&quot;presence_penalty\&quot;: 0.0 }&quot;,
            &quot;connection_timeout_in_millis&quot;: 60000,
            &quot;read_timeout_millis&quot;: 60000
          }
</code></pre>
<p>If you are interested in creating a more OpenAI-based version, you can <a href="https://elastic-content-share.eu/downloads/watcher-job-to-integrate-chatgpt-in-elasticsearch/">download an alternative script</a> and look at <a href="https://mar1.hashnode.dev/unlocking-the-power-of-aiops-with-chatgpt-and-elasticsearch">another blog from an Elastic community member</a>.</p>
<h2>Gaining other insights beyond Kubernetes logs</h2>
<p>Now that the script is up and running, you can modify it using different:</p>
<ul>
<li>Inputs</li>
<li>Conditions</li>
<li>Actions</li>
<li>Transforms</li>
</ul>
<p>Learn more on how to modify it <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-alerting.html">here</a>. Some examples of modifications could include:</p>
<ol>
<li>Look for error logs from application components (e.g., cartService, frontEnd, from the OTel demo), cloud service providers (e.g., AWS/Azure/GCP logs), and even logs from components such as Kafka, databases, etc.</li>
<li>Vary the time frame from running continuously to running over a specific <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html">range</a>.</li>
<li>Look for specific errors in the logs.</li>
<li>Query for analysis of a set of errors at once, rather than just the single error we demonstrated.</li>
</ol>
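<p>As an illustration of the second modification, restricting the watcher’s search to a fixed window is just a matter of adding a range filter on @timestamp to the bool query. The window below is an arbitrary example:</p>
<pre><code class="language-python">def add_time_range(query, start, end):
    """Add a @timestamp range filter so the watcher analyzes a specific
    window instead of whatever arrived in the last five minutes."""
    query["query"]["bool"].setdefault("filter", []).append(
        {"range": {"@timestamp": {"gte": start, "lte": end}}}
    )
    return query

base = {"query": {"bool": {"must": [{"match": {"message": "error"}}]}}}
scoped = add_time_range(base, "2023-05-10T00:00:00Z", "2023-05-10T06:00:00Z")
</code></pre>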
<p>The modifications are endless, and of course you can run this with OpenAI rather than Azure OpenAI Service.</p>
<h2>Conclusion</h2>
<p>I hope you’ve gained an appreciation for how Elastic Observability can connect to OpenAI services (Azure OpenAI Service, as we showed, or OpenAI directly) to analyze an error log message, instead of running several Google searches and hunting for possible insights.</p>
<p>Here’s a quick recap of what we covered:</p>
<ul>
<li>Developing an Elastic watcher script that can be used to find and send Kubernetes errors into OpenAI and insert them into a new index</li>
<li>Configuring Azure OpenAI Service or OpenAI with the right authorization and request parameters</li>
</ul>
<p>Ready to get started? Sign up <a href="https://cloud.elastic.co/registration">for Elastic Cloud</a> and try out the features and capabilities I’ve outlined above to get the most value and visibility out of your log data.</p>
<h2>Appendix</h2>
<p>Watcher script</p>
<pre><code class="language-bash">PUT _watcher/watch/chatgpt_analysis
{
    &quot;trigger&quot;: {
      &quot;schedule&quot;: {
        &quot;interval&quot;: &quot;5m&quot;
      }
    },
    &quot;input&quot;: {
      &quot;chain&quot;: {
          &quot;inputs&quot;: [
              {
                  &quot;first&quot;: {
                      &quot;search&quot;: {
                          &quot;request&quot;: {
                              &quot;search_type&quot;: &quot;query_then_fetch&quot;,
                              &quot;indices&quot;: [
                                &quot;logs-kubernetes*&quot;
                              ],
                              &quot;rest_total_hits_as_int&quot;: true,
                              &quot;body&quot;: {
                                &quot;query&quot;: {
                                  &quot;bool&quot;: {
                                    &quot;must&quot;: [
                                      {
                                        &quot;match&quot;: {
                                          &quot;kubernetes.container.name&quot;: &quot;konnectivity-agent&quot;
                                        }
                                      },
                                      {
                                        &quot;match&quot; : {
                                          &quot;message&quot;:&quot;error&quot;
                                        }
                                      }
                                    ]
                                  }
                                },
                                &quot;size&quot;: &quot;1&quot;
                              }
                            }
                        }
                    }
                },
                {
                    &quot;second&quot;: {
                        &quot;transform&quot;: {
                            &quot;script&quot;: &quot;return ['first_hit': ctx.payload.first.hits.hits.0._source.message.replace('\&quot;', \&quot;\&quot;)]&quot;
                        }
                    }
                },
                {
                    &quot;third&quot;: {
                        &quot;http&quot;: {
                            &quot;request&quot;: {
                                &quot;method&quot; : &quot;POST&quot;,
                                &quot;url&quot;: &quot;https://XXX.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview&quot;,
                                &quot;headers&quot;: {
                                    &quot;api-key&quot; : &quot;XXX&quot;,
                                    &quot;content-type&quot; : &quot;application/json&quot;
                                },
                                &quot;body&quot; : &quot;{ \&quot;messages\&quot;: [ { \&quot;role\&quot;: \&quot;system\&quot;, \&quot;content\&quot;: \&quot;You are a helpful assistant.\&quot;}, { \&quot;role\&quot;: \&quot;user\&quot;, \&quot;content\&quot;: \&quot;What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\&quot;}], \&quot;temperature\&quot;: 0.5, \&quot;max_tokens\&quot;: 2048}&quot; ,
                                &quot;connection_timeout&quot;: &quot;60s&quot;,
                                &quot;read_timeout&quot;: &quot;60s&quot;
                            }
                        }
                    }
                }
            ]
        }
    },
    &quot;condition&quot;: {
      &quot;compare&quot;: {
        &quot;ctx.payload.first.hits.total&quot;: {
          &quot;gt&quot;: 0
        }
      }
    },
    &quot;actions&quot;: {
        &quot;index_payload&quot; : {
            &quot;transform&quot;: {
                &quot;script&quot;: {
                    &quot;source&quot;: &quot;&quot;&quot;
                        def payload = [:];
                        payload.timestamp = new Date();
                        payload.pod_name = ctx.payload.first.hits.hits[0]._source.kubernetes.pod.name;
                        payload.error_message = ctx.payload.second.first_hit;
                        payload.chatgpt_analysis = ctx.payload.third.choices[0].message.content;
                        return payload;
                    &quot;&quot;&quot;
                }
            },
            &quot;index&quot; : {
                &quot;index&quot; : &quot;chatgpt_k8s_analyzed&quot;
            }
        }
    }
}
</code></pre>
<h3>Additional logging resources:</h3>
<ul>
<li><a href="https://www.elastic.co/getting-started/observability/collect-and-analyze-logs">Getting started with logging on Elastic (quickstart)</a></li>
<li><a href="https://www.elastic.co/guide/en/observability/current/logs-metrics-get-started.html">Ingesting common known logs via integrations (compute node example)</a></li>
<li><a href="https://docs.elastic.co/integrations">List of integrations</a></li>
<li><a href="https://www.elastic.co/blog/log-monitoring-management-enterprise">Ingesting custom application logs into Elastic</a></li>
<li><a href="https://www.elastic.co/blog/observability-logs-parsing-schema-read-write">Enriching logs in Elastic</a></li>
<li>Analyzing Logs with <a href="https://www.elastic.co/blog/reduce-mttd-ml-machine-learning-observability">Anomaly Detection (ML)</a> and <a href="https://www.elastic.co/blog/observability-logs-machine-learning-aiops">AIOps</a></li>
</ul>
<h3>Common use case examples with logs:</h3>
<ul>
<li><a href="https://youtu.be/ax04ZFWqVCg">Nginx log management</a></li>
<li><a href="https://www.elastic.co/blog/vpc-flow-logs-monitoring-analytics-observability">AWS VPC Flow log management</a></li>
<li><a href="https://www.elastic.co/blog/kubernetes-errors-observability-logs-openai">Using OpenAI to analyze Kubernetes errors</a></li>
<li><a href="https://youtu.be/Li5TJAWbz8Q">PostgreSQL issue analysis with AIOps</a></li>
</ul>
<p><em>In this blog post, we may have used third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
<p><em>Screenshots of Microsoft products used with permission from Microsoft.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/kubernetes-errors-observability-logs-openai/blog-elastic-configuration.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Troubleshooting your Agents and Amazon Bedrock AgentCore with Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-agentic-ai-observability-amazon-bedrock-agentcore</link>
            <guid isPermaLink="false">llm-agentic-ai-observability-amazon-bedrock-agentcore</guid>
            <pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Discover how to achieve end-to-end observability for Amazon Bedrock AgentCore: from tracking service health and token costs to debugging complex reasoning loops with distributed tracing.]]></description>
            <content:encoded><![CDATA[<h2>Troubleshooting your Agents and Amazon Bedrock AgentCore with Elastic Observability</h2>
<h3>Introduction</h3>
<p>We're excited to introduce Elastic Observability’s Amazon Bedrock AgentCore integration, which allows users to observe <a href="https://aws.amazon.com/bedrock/agentcore/">Amazon Bedrock AgentCore</a> and the agents' LLM interactions end-to-end. Agentic AI represents a fundamental shift in how we build applications. </p>
<p>Unlike standard LLM chatbots that simply generate text, agents can reason, plan, and execute multi-step workflows to complete complex tasks autonomously. Often these agents run on a platform such as Amazon Bedrock AgentCore, which provides the secure, scalable, and modular infrastructure services (like agent runtime, memory, and identity) that developers need to build, deploy, and operate highly capable AI agents built with any framework or model.</p>
<p>Using a platform such as Amazon Bedrock AgentCore is easy, but troubleshooting an agent is far more complex than debugging a standard microservice. Key challenges include:</p>
<ul>
<li>
<p><strong>Non-Deterministic Behavior:</strong> Agents may choose different tools or reasoning paths for the same prompt, making it difficult to reproduce bugs.</p>
</li>
<li>
<p><strong>&quot;Black Box&quot; Execution:</strong> When an agent fails or provides a hallucinated answer, it is often unclear if the issue lies in the LLM's reasoning, the context provided, or a failed tool execution.</p>
</li>
<li>
<p><strong>Cost &amp; Latency Blind Spots:</strong> A single user query can trigger recursive loops or expensive multi-step tool calls, leading to unexpected spikes in token usage and latency.</p>
</li>
</ul>
<p>To effectively observe these systems, you need to correlate signals from two distinct layers:</p>
<ol>
<li>
<p><strong>The Platform Layer (Amazon Bedrock AgentCore):</strong> You need to understand the overall health of the managed service. This includes high-level metrics like invocation counts, latency, throttling, and platform-level errors that affect all agents running in AgentCore.</p>
</li>
<li>
<p><strong>The Application Layer (Your Agentic Logic):</strong> You want to understand the granular &quot;why&quot; behind the behavior. This includes distributed traces, usually with OpenTelemetry, that visualize the full request lifecycle (e.g. waterfall view), identifying exactly which step in the reasoning chain failed or took too long.</p>
</li>
</ol>
<p><strong>Agentic AI Observability in Elastic</strong> provides a unified, end-to-end view of your agentic deployment by combining platform-level insights from Amazon Bedrock AgentCore, through the new <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock_agentcore">Amazon Bedrock AgentCore integration</a>, with deep application-level visibility from OpenTelemetry (OTel) traces, logs, and metrics from the agent. This unified view allows you to observe, troubleshoot, and optimize your agentic applications end to end without switching tools. Additionally, Elastic provides Agent Builder, which allows you to create agents to analyze any of the data from Amazon Bedrock AgentCore and the agents running on it.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-dashboard-runtime-gateway-traces.jpg" alt="Amazon Bedrock AgentCore Dashboards for Runtime and Gateway and APM Tracing" /></p>
<h2>Agentic AI Observability in Elastic</h2>
<p>As mentioned above there are two main parts to end-to-end Agentic AI Observability in Elastic.</p>
<ul>
<li>
<p><strong>Amazon Bedrock AgentCore Platform Observability -</strong> Using platform logs and metrics, Elastic provides comprehensive visibility into the high-level health of the AgentCore service by ingesting AWS vended logs and metrics across four critical components:</p>
<ul>
<li>
<p><strong>Runtime:</strong> Monitor core performance indicators such as agent errors, overall latency, throttle counts, and invocation rates for each endpoint.</p>
</li>
<li>
<p><strong>Gateway:</strong> Gain specific insights into gateway and tool call performance, including invocations, error rates, and latency.</p>
</li>
<li>
<p><strong>Memory:</strong> Track short-term and long-term memory operations, including event creation, retrieval, and listing, alongside performance analysis, errors, and latency metrics.</p>
</li>
<li>
<p><strong>Identity:</strong> Audit security and access health with logs on successful and failed access attempts.</p>
</li>
</ul>
</li>
</ul>
<ul>
<li><strong>Agent Observability with APM, logs and metrics -</strong> To understand <em>how</em> your agent is behaving, Elastic ingests OTel-native traces, metrics and logs from your application running within AgentCore. This allows you to visualize the full execution path, including LLM reasoning steps and tool calls, in a detailed waterfall diagram. </li>
</ul>
<ul>
<li><strong>Agentic AI Analysis</strong> - All of the data from Amazon Bedrock AgentCore and the agent running on it, can be analyzed with <strong>Elastic’s AI driven capabilities</strong>. These include:</li>
</ul>
<ul>
<li>
<p><strong>Elastic AgentCore SRE Agent built on Elastic Agent Builder</strong> - We don't just monitor agents; we provide you with one to assist your team. The <strong>AgentCore SRE Agent</strong> is a specialized assistant built using <strong>Elastic Agent Builder</strong>. It possesses specialized knowledge of AgentCore applications observed in Elastic.</p>
<ul>
<li>
<p><strong>How it helps:</strong> You can ask specific questions regarding your AgentCore environment, such as how to interpret a complex error log or why a specific trace shows latency.</p>
</li>
<li>
<p><strong>Get the Agent:</strong> You can deploy this agent yourself from our <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/elastic_agentcore_sre_agent">GitHub repository</a>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Elastic Observability AI Assistant</strong> - Use natural language anywhere in Elastic’s UI to pinpoint issues, analyze something specific, or simply understand a problem through the LLM and its knowledge base. Additionally, SREs can use it to interpret log messages, errors, and metric patterns, optimize code, write reports, and even identify and execute a runbook or find a related GitHub issue.</p>
</li>
<li>
<p><strong>Streams -</strong> AI-Driven Log Analysis - When you send AgentCore logs from your instrumented application into Elastic, you can parse and analyze them. Additionally, Streams finds <strong>Significant Events</strong> within your log stream, allowing you to focus immediately on what matters most.</p>
</li>
<li>
<p><strong>Dashboards and ES|QL -</strong> Data is only useful if you can act on it. Elastic provides out-of-the-box (OOTB) assets to accelerate your mean time to resolution (MTTR), and ES|QL helps you perform ad-hoc analysis on any signal.</p>
<ul>
<li>
<p><strong>OOTB Dashboards:</strong> Pre-built visualizations based on AgentCore service signals. These dashboards provide an immediate, high-level overview of the usage, health, and performance of your AgentCore runtime, gateway, memory, and identity components.</p>
</li>
<li>
<p><strong>OOTB Alert Templates:</strong> Pre-configured alerts for common agentic issues (e.g., high error rates, latency spikes, or unusual token consumption), allowing you to move from reactive to proactive troubleshooting immediately.</p>
</li>
</ul>
</li>
</ul>
<h2>Onboarding Amazon Bedrock AgentCore signals into Elastic</h2>
<h3>Amazon Bedrock AgentCore Integration</h3>
<p>To get started with platform-level visibility, you need to enable the <strong>Amazon Bedrock AgentCore</strong> integration in Elastic. This integration automatically collects metrics and logs from your AgentCore runtime, gateway, memory, and identity components via Amazon CloudWatch.</p>
<p><strong>Setup Steps:</strong></p>
<ol>
<li>
<p><strong>Prepare AWS Environment:</strong> Ensure your AgentCore agents are deployed and running and that you have enabled logging on your AgentCore resources in the AWS console.</p>
</li>
<li>
<p><strong>Add the Integration:</strong></p>
<ul>
<li>
<p>In Elastic (Kibana), navigate to <strong>Integrations</strong>.</p>
</li>
<li>
<p>Search for <strong>&quot;Amazon Bedrock AgentCore&quot;</strong>. Select <strong>Add Amazon Bedrock AgentCore</strong>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Configure &amp; Deploy:</strong></p>
<p>Configure Elastic's Amazon Bedrock AgentCore integration to collect CloudWatch metrics from your chosen AWS region at the specified collection interval. Logs will be added soon after the publication of this blog.</p>
</li>
</ol>
<h3>Onboard the Agent with OTel Instrumentation</h3>
<p>The next step is observing the application logic itself. The beauty of Amazon Bedrock AgentCore is that the application runtime often comes pre-instrumented. You simply need to tell it where to send the telemetry data.</p>
<p>For this example, we will use the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/travel_assistant"><strong>Travel Assistant</strong></a> from the Elastic Observability examples.</p>
<p>To instrument this agent, you do not need to modify the source code. Instead, when you invoke the agent using the <code>agentcore</code> CLI, you simply pass your Elastic connection details as environment variables. This redirects the OTel signals (traces, metrics, and logs) directly to the Elastic EDOT collector.</p>
<p><strong>Example Invoke Command:</strong> Run the following command to launch the agent and start streaming telemetry to Elastic:</p>
<pre><code class="language-bash">    agentcore launch \
    --env BEDROCK_MODEL_ID=&quot;us.anthropic.claude-3-5-sonnet-20240620-v1:0&quot; \
    --env OTEL_EXPORTER_OTLP_ENDPOINT=&quot;https://&lt;REPLACE_WITH_ELASTIC_ENDPOINT&gt;.region.cloud.elastic.co:443&quot; \
    --env OTEL_EXPORTER_OTLP_HEADERS=&quot;Authorization=ApiKey &lt;REPLACE_WITH_YOUR_API_KEY&gt;&quot; \
    --env OTEL_EXPORTER_OTLP_PROTOCOL=&quot;http/protobuf&quot; \
    --env OTEL_METRICS_EXPORTER=&quot;otlp&quot; \
    --env OTEL_TRACES_EXPORTER=&quot;otlp&quot; \
    --env OTEL_LOGS_EXPORTER=&quot;otlp&quot; \
    --env OTEL_RESOURCE_ATTRIBUTES=&quot;service.name=travel_assistant,service.version=1.0.0&quot; \
    --env AGENT_OBSERVABILITY_ENABLED=&quot;true&quot; \
    --env DISABLE_ADOT_OBSERVABILITY=&quot;true&quot; \
    --env TAVILY_API_KEY=&quot;&lt;REPLACE_WITH_YOUR_TAVILY_KEY&gt;&quot;
</code></pre>
<p><strong>Key Configuration Parameters:</strong></p>
<ul>
<li>
<p><code>OTEL_EXPORTER_OTLP_ENDPOINT</code>: Your Elastic OTLP endpoint (ensure port 443 is specified).</p>
</li>
<li>
<p><code>OTEL_EXPORTER_OTLP_HEADERS</code>: The Authorization header containing your Elastic API Key.</p>
</li>
<li>
<p><code>DISABLE_ADOT_OBSERVABILITY=true</code>: This ensures the native AgentCore signals are routed exclusively to your defined endpoint (Elastic) rather than default AWS paths.</p>
</li>
</ul>
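<p>Because a mistyped environment variable is simply ignored at launch, a small pre-flight check can save a debugging round trip. The sketch below validates the exporter variables from the command above before you invoke the agentcore CLI; the check itself is our own addition, not part of the AgentCore tooling.</p>
<pre><code class="language-python">import os

REQUIRED_OTEL_VARS = [
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_HEADERS",
    "OTEL_EXPORTER_OTLP_PROTOCOL",
]

def missing_otel_vars(env=None):
    """Return the OTLP exporter variables that are unset or empty, so a
    launch script can fail fast before calling the agentcore CLI."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_OTEL_VARS if not env.get(name)]

# Example: an environment that only sets the endpoint still lacks two vars.
partial = {"OTEL_EXPORTER_OTLP_ENDPOINT": "https://example.cloud.elastic.co:443"}
print(missing_otel_vars(partial))
</code></pre>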
<h2>Analyzing Agentic Data in Elastic Observability</h2>
<p>As we walk through the analysis features below, we will use the Travel Assistant agent we instrumented earlier, along with any other apps you may be running on AgentCore. As a second agent for this example, we will use the <a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/02-use-cases/customer-support-assistant"><strong>Customer Support Assistant</strong></a> from the AWS Labs AgentCore samples.</p>
<h3>Out-of-the-Box (OOTB) Dashboards</h3>
<p>Elastic populates a set of comprehensive dashboards based on Amazon Bedrock AgentCore service logs and metrics. These appear as a unified view with tabs, providing a &quot;single pane of glass&quot; into the operational health of your platform.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-integration-dashboards-runtime-gateway-memory-identity.gif" alt="Amazon Bedrock AgentCore out-of-the-box Dashboards for Runtime, Gateway, Memory and Identity" /></p>
<p>This view is divided into four key zones, each addressing a specific component of AgentCore: Runtime, Gateway, Memory, and Identity. Note that not all agentic applications use all four components. In our example, only the Customer Support Assistant uses all four, whereas the Travel Assistant uses only Runtime.</p>
<p><strong>Runtime Health</strong></p>
<hr />
<p>Visualize agent invocations, session metrics, error trends (system vs. user), and performance stats like latency and throttling, split per endpoint. This dashboard helps you answer questions like:</p>
<ul>
<li>&quot;How are my Travel Assistant agent and Customer Support agent performing in terms of overall traffic and latency, and are there any spikes in errors or throttling?&quot;</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-runtime-dashboard.jpg" alt="Amazon Bedrock AgentCore out-of-the-box Dashboard for AgentCore Runtime" /></p>
<p><strong>Gateway Performance</strong></p>
<hr />
<p>Analyze invocations across Lambda and MCP (Model Context Protocol), with detailed breakdowns for tool vs. non-tool calls. The dashboard highlights throttling detection, target execution times, and separates system errors from user errors.</p>
<ul>
<li><em>Question answered:</em> &quot;Are my external integrations (Lambda, MCP) performing efficiently, or are specific tool calls experiencing high latency, throttling, or system-level errors?&quot;</li>
</ul>
<p><strong>Memory Operations</strong></p>
<hr />
<p>Track core operations like event creation, retrieval, and listing, alongside deep dives into long-term memory processing. This includes extraction and consolidation metrics broken down by strategy type, as well as specific monitoring for throttling and system vs. user errors.</p>
<ul>
<li><em>Question answered:</em> &quot;Are failures in memory consolidation strategies or high retrieval latency preventing the agent from effectively recalling user context?&quot;</li>
</ul>
<p><strong>Identity &amp; Access</strong></p>
<hr />
<p>Monitor identity token fetch operations (workload, OAuth, API keys) and real-time authentication success/failure rates. The dashboard breaks down activity by provider and highlights throttling or capacity bottlenecks.</p>
<ul>
<li><em>Question answered:</em> &quot;Are authentication failures or token fetch bottlenecks from specific providers preventing agents from accessing required resources?&quot;</li>
</ul>
<h3>Out-of-the-Box (OOTB) Alert Templates</h3>
<p>Observability isn't just about looking at dashboards; it's about knowing when to act. To move from reactive checking to proactive monitoring, Elastic provides <strong>OOTB Alert Rule Templates</strong> (starting with Elastic version 9.2.1).</p>
<p>These templates eliminate guesswork by pre-selecting the optimal metrics to monitor and applying sensible thresholds. This configuration focuses on high-fidelity alerts for genuine anomalies, helping you catch critical issues early while minimizing alert fatigue.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/elastic-alert-rule-templates-for-amazon-bedrock-agentcore.jpg" alt="Amazon Bedrock AgentCore out-of-the-box Alert rule templates for AgentCore" /></p>
<p><strong>Suggested OOTB Alerts:</strong></p>
<ul>
<li>
<p><strong>Agent Runtime System Errors:</strong> Detects server-side errors (500 Internal Server Error) during agent runtime invocations, indicating infrastructure or service issues with AWS Bedrock AgentCore.</p>
</li>
<li>
<p><strong>Agent Runtime User Errors:</strong> Flags client-side errors (4xx) during agent runtime invocations, including validation failures (400), resource not found (404), access denied (403), and resource conflicts (409). This helps catch misconfigured permissions, invalid input, or missing resources early.</p>
</li>
<li>
<p><strong>Agent Runtime High Latency:</strong> Triggers when the average latency for agent runtime invocations exceeds 10 seconds (10,000ms). Latency measures the time elapsed between receiving a request and sending the final response token.</p>
</li>
</ul>
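<p>To make the trigger logic concrete, the three conditions above can be sketched as a simple check over aggregated runtime metrics. This is only an illustrative sketch: the function and its inputs are hypothetical, not the actual Elastic rule definitions.</p>

```python
# Illustrative sketch of the three OOTB alert conditions described above.
# The thresholds mirror the templates; the inputs are hypothetical
# aggregates over an evaluation window.

def evaluate_runtime_alerts(system_errors, user_errors, avg_latency_ms):
    """Return the names of the alert templates that would fire."""
    fired = []
    if system_errors > 0:        # 5xx server-side errors
        fired.append("Agent Runtime System Errors")
    if user_errors > 0:          # 4xx client-side errors (400/403/404/409)
        fired.append("Agent Runtime User Errors")
    if avg_latency_ms > 10_000:  # average latency above 10 seconds
        fired.append("Agent Runtime High Latency")
    return fired

print(evaluate_runtime_alerts(system_errors=0, user_errors=3, avg_latency_ms=12_500))
```

<p>In this hypothetical window, the user-error and high-latency templates would both fire while the system-error template stays quiet.</p>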
<h3>APM Tracing</h3>
<p>While logs and metrics tell you <em>that</em> an issue exists, <strong>APM Tracing</strong> tells you exactly <em>where</em> and <em>why</em> it is happening. By ingesting the OpenTelemetry signals from your instrumented agent, Elastic generates a detailed distributed trace (a waterfall view) for every interaction. For further LLM details such as prompts, responses, and token usage, you can explore the APM logs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/otel-native-strands-agent-traces-in-elastic-apm.jpg" alt="Amazon Bedrock AgentCore OTel-native distributed tracing waterfall diagram in Elastic APM" /></p>
<p>This allows you to peer inside the &quot;black box&quot; of the agent's execution flow:</p>
<ul>
<li>
<p><strong>Visualize the Chain of Thought:</strong> See the full sequence of events, from the user's initial prompt to the final response, including all intermediate reasoning steps.</p>
</li>
<li>
<p><strong>Pinpoint Tool Failures:</strong> Identify exactly which external tool (e.g., a Lambda function for flight booking or a knowledge base query) failed or timed out.</p>
</li>
<li>
<p><strong>Analyze Latency Contributors:</strong> Distinguish between latency caused by the LLM's generation time versus latency caused by slow downstream API calls.</p>
</li>
<li>
<p><strong>Debug with Context:</strong> Drill down into individual spans to see specific error messages, attributes, and metadata that explain why a particular step failed.</p>
</li>
</ul>
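<p>As a small illustration of latency attribution, span durations from a trace can be grouped by span kind. The records below are hypothetical stand-ins for the spans you would see in the waterfall view:</p>

```python
from collections import defaultdict

# Hypothetical (name, kind, duration_ms) records standing in for the
# spans of a single agent interaction.
spans = [
    ("invoke_agent",        "agent", 120.0),   # orchestration overhead
    ("chat bedrock-model",  "llm",   2300.0),  # first reasoning step
    ("tool search_flights", "tool",  850.0),   # downstream API call
    ("chat bedrock-model",  "llm",   1900.0),  # final answer generation
]

def latency_by_kind(spans):
    """Sum durations per span kind to see what dominates overall latency."""
    totals = defaultdict(float)
    for _name, kind, duration_ms in spans:
        totals[kind] += duration_ms
    return dict(totals)

print(latency_by_kind(spans))
```

<p>In this example, LLM generation accounts for 4,200 ms against 850 ms of tool calls, which is exactly the kind of distinction the trace view makes visible at a glance.</p>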
<h3>Conclusion</h3>
<p>As organizations move from experimental chatbots to complex, autonomous agents in production, the need for robust observability has never been greater. Agentic applications introduce new layers of complexity—non-deterministic behaviors, multi-step reasoning loops, and cost implications—that standard monitoring tools simply cannot see.</p>
<p>Elastic Agentic AI Observability for Amazon Bedrock AgentCore bridges this gap. By unifying platform-level health metrics from AgentCore with deep, transaction-level distributed tracing from OpenTelemetry, Elastic gives SREs and developers the complete picture. Whether you are debugging a failed tool call, optimizing latency, or controlling token costs, you have the visibility needed to run agentic AI with confidence.</p>
<p><strong>Complete Visibility: AgentCore + Amazon Bedrock:</strong> For the most comprehensive view, we recommend onboarding Elastic’s <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock"><strong>Amazon Bedrock</strong> integration</a> alongside AgentCore. While the AgentCore integration focuses on the orchestration layer—monitoring agent errors, tool latency, and invocations—the Bedrock integration provides deep visibility into the underlying foundation models themselves. This includes tracking model-specific latency, token usage, full prompts and responses, and even <strong>Guardrails</strong> usage and effectiveness. By combining both, you ensure complete coverage from the high-level agent workflow down to the raw model inference.</p>
<ul>
<li>
<p><strong>Read more:</strong><a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock"> Monitor Amazon Bedrock with Elastic</a></p>
</li>
<li>
<p><strong>Read more:</strong><a href="https://www.elastic.co/observability-labs/blog/llm-observability-amazon-bedrock-guardrails"> Amazon Bedrock Guardrails Observability</a></p>
</li>
</ul>
<p><strong>Get Started Today:</strong> Ready to see your agents in action?</p>
<ul>
<li>
<p><strong>Try it out:</strong> Log in to <a href="https://cloud.elastic.co/login">Elastic Cloud</a> and add the Amazon Bedrock AgentCore integration, or use <a href="https://aws.amazon.com/marketplace/seller-profile?id=d8f59038-c24c-4a9d-a66d-6711d35d7305">Elastic from AWS Marketplace</a>.</p>
</li>
<li>
<p><strong>Explore the Code:</strong> Check out our GitHub repository for the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/travel_assistant">Travel Assistant</a> which you saw in this blog, as well as the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/elastic_agentcore_sre_agent">AgentCore SRE Agent</a>.</p>
</li>
<li>
<p><strong>Learn More:</strong> Read the <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock_agentcore">full documentation</a> on setting up the integration for Agentic AI Observability for Amazon Bedrock AgentCore.</p>
</li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/agentcore-blog.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[LLM observability with Elastic: Taming the LLM with Guardrails for Amazon Bedrock]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-amazon-bedrock-guardrails</link>
            <guid isPermaLink="false">llm-observability-amazon-bedrock-guardrails</guid>
            <pubDate>Sun, 02 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic’s enhanced Amazon Bedrock integration for Observability now includes Guardrails monitoring, offering real-time visibility into AI safety mechanisms. Track guardrail performance, usage, and policy interventions with pre-built dashboards. Learn how to set up observability for Guardrails and monitor key signals to strengthen safeguards against hallucinations, harmful content, and policy violations.]]></description>
            <content:encoded><![CDATA[<p>In a previous <a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock">blog</a> we showed you how to set up observability for your models hosted on Amazon Bedrock using Elastic’s integration. You can now effortlessly enable observability for your Amazon Bedrock guardrails using the enhanced <a href="https://www.elastic.co/guide/en/integrations/current/aws_bedrock.html">Elastic Amazon Bedrock integration</a>. If you previously onboarded the Amazon Bedrock integration, just upgrade it and you will automatically get all guardrails-related updates. The enhanced integration provides a single-pane-of-glass dashboard with two panels: one focusing on overall Bedrock visualizations and a separate panel dedicated to Guardrails. You can now ingest and visualize metrics and logs specific to Guardrails, such as guardrail invocation count, invocation latency, text unit utilization, and the guardrail policy types associated with interventions.</p>
<p>In this blog, we will show you how to set up observability for Amazon Bedrock Guardrails, how to make use of the enhanced dashboards, and which key signals to alert on for effective observability coverage of your Bedrock guardrails.</p>
<h2>Prerequisites</h2>
<p>To follow along with this blog, please make sure you have:</p>
<ul>
<li>An account on <a href="http://cloud.elastic.co/">Elastic Cloud</a> and a deployed stack in AWS (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>). Ensure you are using version 8.16.2 or higher. Alternatively, you can use <a href="https://www.elastic.co/cloud/serverless">Elastic Cloud Serverless</a>, a fully managed solution that eliminates infrastructure management, automatically scales based on usage, and lets you focus entirely on extracting value from your data.</li>
<li>An AWS account with permissions to pull the necessary data from AWS. See <a href="https://docs.elastic.co/en/integrations/aws#aws-permissions">details in our documentation</a>.</li>
</ul>
<h2>Steps to create a guardrail for Amazon Bedrock</h2>
<p>Before you set up observability for the guardrails, ensure that you have configured guardrails for your model. Follow the steps below to create an Amazon Bedrock Guardrail:</p>
<ol>
<li><strong>Access the Amazon Bedrock Console</strong>
<ul>
<li>Sign in to the AWS Management Console with appropriate permissions and navigate to the Amazon Bedrock console.</li>
</ul>
</li>
<li><strong>Navigate to Guardrails</strong>
<ul>
<li>From the left-hand menu, select <strong>Guardrails</strong>.</li>
</ul>
</li>
<li><strong>Create a New Guardrail</strong>
<ul>
<li>Select <strong>Create guardrail</strong>.</li>
<li>Provide a descriptive name, an optional brief description, and specify a message to display when the guardrail blocks the user prompt.
<ul>
<li>Example: <em>Sorry, I am not configured to answer such questions. Kindly ask a different question.</em></li>
</ul>
</li>
</ul>
</li>
<li><strong>Configure Guardrail Policies</strong>
<ul>
<li><strong>Content Filters</strong>: Adjust settings to block harmful content and prompt attacks.</li>
<li><strong>Denied Topics</strong>: Specify topics to block.</li>
<li><strong>Word Filters</strong>: Define specific words or phrases to block.</li>
<li><strong>Sensitive Information Filters</strong>: Set up filters to detect and remove sensitive information.</li>
<li><strong>Contextual Grounding</strong>:
<ul>
<li>Configure the <strong>Grounding Threshold</strong> to set the minimum confidence level for factual accuracy.</li>
<li>Set the <strong>Relevance Threshold</strong> to ensure responses align with user queries.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Review and Create</strong>
<ul>
<li>Review your settings and select <strong>Create</strong> to finalize the guardrail.</li>
</ul>
</li>
<li><strong>Create a Guardrail Version</strong>
<ul>
<li>In the <strong>Version</strong> section, select <strong>Create</strong>.</li>
<li>Optionally add a description, then select <strong>Create Version</strong>.</li>
</ul>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/guardrails-policy-configuration.png" alt="Amazon Bedrock Guardrails Policy Configurations" /></p>
<p>After creating a version of your guardrail, it's important to note down the <strong>Guardrail ID</strong> and the <strong>Guardrail Version Name</strong>. These identifiers are essential when integrating the guardrail into your application, as you'll need to specify them during guardrail invocation.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/guardrails-creation-confirmations.png" alt="Amazon Bedrock Guardrails Policy Version" /></p>
<h2>Example code to integrate with Amazon Bedrock guardrails</h2>
<p>Integrating Amazon Bedrock into your Python application enables advanced language model interactions with customizable safety measures. By configuring guardrails, you can ensure that the model adheres to predefined policies, preventing it from generating inappropriate or sensitive content.</p>
<p>The following code demonstrates how to integrate Amazon Bedrock with guardrails to enforce contextual grounding in AI-generated responses. It sets up a Bedrock Runtime client using AWS credentials, defines a reference grounding statement, and uses the Converse API to process user queries with contextual constraints. The <strong>converse_with_guardrails</strong> function sends a user query alongside a predefined grounding reference, ensuring that responses align with the provided knowledge source.</p>
<h3>Setting Up Environment Variables</h3>
<p>Before running the script, configure the required <strong>AWS credentials</strong>, <strong>guardrail settings</strong>, and <strong>model ID</strong> as environment variables. These variables allow the script to authenticate with Amazon Bedrock and apply the necessary guardrails for safe and controlled AI interactions.</p>
<p>Create a <strong>.env</strong> file in the same directory as your script and add:</p>
<pre><code class="language-bash">AWS_ACCESS_KEY=&quot;your-access-key&quot; 
AWS_SECRET_KEY=&quot;your-secret-key&quot; 
AWS_REGION=&quot;your-aws-region&quot; 
GUARDRAIL_ID=&quot;your-guardrail-id&quot; 
GUARDRAIL_VERSION=&quot;your-guardrail-version&quot;
CHAT_MODEL=&quot;your-model-id&quot;
</code></pre>
<h3>Create a Python script and run</h3>
<p>Create a Python script using the code below and execute it to interact with the Amazon Bedrock Guardrails you set up.</p>
<pre><code class="language-python">import os
import boto3
from dotenv import load_dotenv
import json
from botocore.exceptions import ClientError

# Load environment variables
load_dotenv()

# Function to check for hallucinations using contextual grounding
def check_hallucination(response):
   output_assessments = response.get(&quot;trace&quot;, {}).get(&quot;guardrail&quot;, {}).get(&quot;outputAssessments&quot;, {})

   # Default scores and thresholds, in case no grounding filters are present
   grounding = relevance = 0.0
   grounding_threshold = relevance_threshold = 0.0

   # Iterate over all assessments
   for key, assessments in output_assessments.items():
       for assessment in assessments:
           contextual_policy = assessment.get(&quot;contextualGroundingPolicy&quot;, {})

           for filter_result in contextual_policy.get(&quot;filters&quot;, []):
               filter_type = filter_result.get(&quot;type&quot;)
               if filter_type == &quot;RELEVANCE&quot;:
                   relevance = filter_result.get(&quot;score&quot;, 0)
                   relevance_threshold = filter_result.get(&quot;threshold&quot;, 0)
               elif filter_type == &quot;GROUNDING&quot;:
                   grounding = filter_result.get(&quot;score&quot;, 0)
                   grounding_threshold = filter_result.get(&quot;threshold&quot;, 0)

           if relevance &lt; relevance_threshold or grounding &lt; grounding_threshold:
               return True, relevance, grounding, relevance_threshold, grounding_threshold  # Hallucination detected

   return False, relevance, grounding, relevance_threshold, grounding_threshold  # No hallucination detected

def converse_with_guardrails(bedrock_client, messages, grounding_reference):
   message = [
       {
           &quot;role&quot;: &quot;user&quot;,
           &quot;content&quot;: [
               {
                   &quot;guardContent&quot;: {
                       &quot;text&quot;: {
                           &quot;text&quot;: grounding_reference,
                           &quot;qualifiers&quot;: [&quot;grounding_source&quot;],
                       }
                   }
               },
               {
                   &quot;guardContent&quot;: {
                       &quot;text&quot;: {
                           &quot;text&quot;: messages,
                           &quot;qualifiers&quot;: [&quot;query&quot;],
                       }
                   }
               },
           ],
       }
   ]
   converse_config = {
       &quot;modelId&quot;: os.getenv('CHAT_MODEL'),
       &quot;messages&quot;: message,
       &quot;guardrailConfig&quot;: {
           &quot;guardrailIdentifier&quot;: os.getenv(&quot;GUARDRAIL_ID&quot;),
           &quot;guardrailVersion&quot;: os.getenv(&quot;GUARDRAIL_VERSION&quot;),
           &quot;trace&quot;: &quot;enabled&quot;
       },
       &quot;inferenceConfig&quot;: {
           &quot;temperature&quot;: 0.5       
       },
   }
   try:
       response = bedrock_client.converse(**converse_config)
       return response
   except ClientError as e:
       error_message = e.response['Error']['Message']
       print(f&quot;An error occurred: {error_message}&quot;)
       print(&quot;Converse config:&quot;)
       print(json.dumps(converse_config, indent=2))
       return None
  
def pretty_print_response(response, is_hallucination, relevance, relevance_threshold, grounding, grounding_threshold):
   print(&quot;\n&quot; + &quot;=&quot;*60)
   print(&quot; Guardrail Assessment&quot;)
   print(&quot;=&quot;*60)
   # Extract response message safely
   response_text = response.get(&quot;output&quot;, {}).get(&quot;message&quot;, {}).get(&quot;content&quot;, [{}])[0].get(&quot;text&quot;, &quot;N/A&quot;)
   print(&quot;\n **Model Response:**&quot;)
   print(f&quot;   {response_text}&quot;)
   print(&quot;\n **Guardrail Assessment:**&quot;)
   print(f&quot;   Is Hallucination : {is_hallucination}&quot;)
   print(&quot;\n **Contextual Grounding Policy Scores:**&quot;)
   print(f&quot;   - Relevance Score : {relevance:.2f} (Threshold: {relevance_threshold:.2f})&quot;)
   print(f&quot;   - Grounding Score : {grounding:.2f} (Threshold: {grounding_threshold:.2f})&quot;)
   print(&quot;\n&quot; + &quot;=&quot;*60 + &quot;\n&quot;)
  
def main():
   bs = boto3.Session(
       aws_access_key_id=os.getenv('AWS_ACCESS_KEY'),
       aws_secret_access_key=os.getenv('AWS_SECRET_KEY'),
       region_name=os.getenv('AWS_REGION')
   )

   # Initialize Bedrock client
   bedrock_client = bs.client(&quot;bedrock-runtime&quot;)

   # Grounding reference
   grounding_reference = &quot;The Wright brothers made the first powered aircraft flight on December 17, 1903.&quot;

   # User query
   user_query = &quot;Who were the first to fly an airplane?&quot;
  
   # Get model response
   response = converse_with_guardrails(bedrock_client, user_query, grounding_reference)
   if response is None:
       return  # Error details were already printed by converse_with_guardrails

   # Check for hallucinations
   is_hallucination, relevance, grounding, relevance_threshold, grounding_threshold = check_hallucination(response)

   # Print the results
   pretty_print_response(response, is_hallucination, relevance, relevance_threshold, grounding, grounding_threshold)


if __name__ == &quot;__main__&quot;:
   main()
</code></pre>
<h3>Identifying Hallucinations with Contextual Grounding</h3>
<p>The contextual grounding feature proved effective in identifying potential hallucinations by comparing model responses against reference information. Relevance and grounding scores provided quantitative measures to assess the accuracy of model outputs.</p>
<p>The script output below demonstrates how the <strong>Grounding Score</strong> helps detect hallucinations:</p>
<pre><code>============================================================
 Guardrail Assessment
============================================================

 **Model Response:**
   Sorry, I am not configured to answer such questions. Kindly ask a different question.

 **Guardrail Assessment:**
   Is Hallucination : True

 **Contextual Grounding Policy Scores:**
   - Relevance Score : 1.00 (Threshold: 0.99)
   - Grounding Score : 0.03 (Threshold: 0.99)

============================================================
</code></pre>
<p>Here, the <strong>Grounding Score</strong> of <strong>0.03</strong> is significantly lower than the configured threshold of <strong>0.99</strong>, indicating that the response lacks factual accuracy. Since the score falls below the threshold, the system flags the response as a hallucination, highlighting the need to monitor guardrail outputs to ensure AI safety.</p>
<h2>Configuring Amazon Bedrock Guardrails Metrics &amp; Logs Collection</h2>
<p>Elastic makes it easy to collect both logs and metrics from Amazon Bedrock Guardrails using the Amazon Bedrock integration. By default, Elastic provides a curated set of logs and metrics, but you can customize the configuration based on your needs. The integration supports Amazon S3 and Amazon CloudWatch Logs for log collection, along with metrics collection from your chosen AWS region at a specified interval.</p>
<p>Follow these steps to enable the collection of metrics and logs:</p>
<ol>
<li>
<p><strong>Navigate to Amazon Bedrock Settings</strong> - In the AWS Console, go to <strong>Amazon Bedrock</strong> and open the <strong>Settings</strong> section.</p>
</li>
<li>
<p><strong>Choose Logging Destination</strong> - Select whether to send logs to <strong>Amazon S3</strong> or <strong>Amazon CloudWatch Logs</strong>.</p>
</li>
<li>
<p><strong>Provide Required Details</strong></p>
<ul>
<li><strong>If using Amazon S3</strong>, logs can be collected from objects referenced in <strong>S3 notification events</strong> (read from an SQS queue) or by <strong>direct polling</strong> from an S3 bucket.</li>
<li><strong>If using CloudWatch Logs</strong>, you need to create a <strong>CloudWatch log group</strong> and note its <strong>ARN</strong>, as this will be required when configuring both <strong>Amazon Bedrock</strong> and the <strong>Elastic Amazon Bedrock integration</strong>.</li>
</ul>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-settings-configuraiton.png" alt="Amazon Bedrock settings" /></p>
<ol start="4">
<li><strong>Configure Elastic's Amazon Bedrock integration</strong> - In <strong>Elastic</strong>, set up the <strong>Amazon Bedrock integration</strong>, ensuring the logging destination matches the one configured in <strong>Amazon Bedrock</strong>. Logs from your selected source and metrics from your AWS region will be collected automatically.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-integrations-logs-config.png" alt="Amazon Bedrock integration logs configuration" /></p>
<ol start="5">
<li><strong>Accept Defaults or Customize Settings</strong> - Elastic provides a default configuration for logs and metrics collection. You can accept these defaults or adjust settings such as collection intervals to better fit your needs.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-integrations-metrics-config.png" alt="Amazon Bedrock integration guardrails metrics configuration" /></p>
<h2>Understanding the pre-configured dashboard for Amazon Bedrock Guardrails</h2>
<p>You can access the Amazon Bedrock Guardrails dashboard using either of the following methods:</p>
<ol>
<li>
<p><strong>Navigate to the Dashboard Menu</strong>  - Select the <strong>Dashboard</strong> menu option in <strong>Elastic</strong> and search for <strong>[Amazon Bedrock] Guardrails</strong> to open the dashboard.</p>
</li>
<li>
<p><strong>Navigate to the Integrations Menu</strong>  - Open the <strong>Integrations</strong> menu in <strong>Elastic</strong>, select <strong>Amazon Bedrock</strong>, go to the <strong>Assets</strong> tab, and choose <strong>[Amazon Bedrock] Guardrails</strong> from the dashboard assets.</p>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-guardrails-overview-dashboard.png" alt="Amazon Bedrock Guardrails overview dashboard" /></p>
<p>The Amazon Bedrock Guardrails dashboard in the Elastic integration provides insights into guardrail performance, tracking total invocations, API latency, text unit usage, and intervention rates. It analyzes policy-based interventions, highlighting trends, text consumption, and frequently triggered policies. The dashboard also showcases instances where guardrails modified or blocked responses and offers a detailed breakdown of invocations by policy and content source.</p>
<h3>Guardrail invocation overview</h3>
<p>This dashboard section provides a comprehensive summary of key metrics related to guardrail performance and usage:</p>
<ul>
<li><strong>Total guardrails API invocations</strong>: Displays the overall count of times guardrails were invoked.</li>
<li><strong>Average Guardrails API invocation latency</strong>: Shows the average response time for guardrail API calls, offering insights into system performance.</li>
<li><strong>Total text unit utilization</strong>: Indicates the volume of text processed during guardrail invocations. For text unit pricing, refer to the Amazon Bedrock pricing page.</li>
<li><strong>Invocations - with and without guardrail interventions</strong>: A pie chart representation showing the distribution of LLM invocations based on guardrail activity. It displays the count of invocations where no guardrail interventions occurred, those where guardrails intervened and detected policy violations, and those where guardrails intervened but found no violations.</li>
</ul>
<p>These metrics help users evaluate guardrail effectiveness, track intervention patterns, and optimize configurations to ensure policy enforcement while maintaining system performance.</p>
<h3>Guardrail policy types for interventions</h3>
<p>This section provides a comprehensive view of guardrail policy interventions and their impact:</p>
<ul>
<li><strong>Interventions by Policy Type</strong>: Bar charts display the number of interventions applied to user inputs and model outputs, categorized by policy type (e.g., Contextual Grounding Policy, Word Policy, Content Policy, Sensitive Information Policy, Topic Policy).</li>
<li><strong>Text Unit Utilization by Policy Type</strong>: Panels highlight the text units consumed by various policy interventions, separately for user inputs and model outputs.</li>
<li><strong>Policy Usage Trends</strong>: A word cloud visualization reveals the most frequently applied policy types, offering insights into intervention patterns.</li>
</ul>
<p>By analyzing intervention counts, text unit usage, and policy trends, users can identify frequently triggered policies, optimize guardrail settings, and ensure LLM interactions align with compliance and safety requirements.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-guardrails-overview.png" alt="Amazon Bedrock Guardrails dashboard overview and policy types sections" /></p>
<h3>Prompt and response where guardrails intervened</h3>
<p>This dashboard section displays the original LLM prompt, inputs from various sources (API calls, applications, or chat interfaces), and the corresponding guardrail response. The text panel presents the prompt alongside the model's response after applying guardrail interventions. These interventions occur when input evaluation or model responses violate configured policies, leading to blocked or masked outputs.</p>
<p>The section also includes additional details to enhance visibility into how guardrails operate. It indicates whether a violation was detected, along with the violation type (e.g., <strong>GROUNDING</strong>, <strong>RELEVANCE</strong>) and the action taken (<strong>BLOCKED</strong>, <strong>NONE</strong>). For contextual grounding, the dashboard also shows the filter threshold, which defines the minimum confidence level required for a response to be considered valid, and the <strong>confidence score</strong>, which reflects how well the response aligns with the expected criteria.</p>
<p>By analyzing violations, actions taken, and confidence scores, users can adjust guardrail thresholds to balance blocking unsafe responses and allowing valid ones, ensuring optimal accuracy and compliance. This process is particularly crucial for detecting and mitigating hallucinations—instances where models generate information not grounded in source data. Implementing contextual grounding checks enables the identification of such ungrounded or irrelevant content, enhancing the reliability of applications like retrieval-augmented generation (RAG).</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/guardrails-intervened-logs.png" alt="Amazon Bedrock Guardrails logs where guardrails intervened" /></p>
<h3>Guardrail invocation by guardrail policy</h3>
<p>This section offers insights into the number of Guardrails API invocations, the overall latency, and the total text units, categorized by guardrail policy (identified by guardrail ARN) and policy version.</p>
<h3>Guardrail invocation by content source (Input &amp; Output)</h3>
<p>This section provides a detailed overview of critical metrics related to guardrail performance and usage. It includes the total number of guardrail invocations, the count of intervention invocations where policies were applied, the volume of text units consumed during these interventions for both user inputs and model outputs and the average guardrail API invocation latency.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/guardrails-invocationby-policy-contentsource.png" alt="Amazon Bedrock Guardrails invocation by policy and content source" /></p>
<p>These insights help users understand how guardrails operate across different policies and content sources. By analyzing invocation counts, latency, and text unit consumption, users can assess policy effectiveness, track intervention patterns, and optimize configurations. Evaluating how guardrails interact with user inputs and model outputs ensures consistent enforcement, helping refine thresholds and improve compliance strategies.</p>
<h2>Configure SLOs and Alerts</h2>
<p>To create an SLO for monitoring <strong>contextual grounding accuracy</strong>, define a custom query SLI where <strong>good events</strong> are model responses that meet contextual grounding criteria, ensuring factual accuracy and alignment with the provided reference.</p>
<p>A suitable query for tracking good events is:</p>
<pre><code>gen_ai.prompt : &quot;*qualifiers[\\\&quot;grounding_source\\\&quot;]*&quot; and 
(gen_ai.compliance.violation_detected : false or 
not gen_ai.compliance.violation_detected : *)
</code></pre>
<p>The total query, which considers all relevant interactions that include a contextual grounding check, is:</p>
<pre><code>gen_ai.prompt : &quot;*qualifiers[\\\&quot;grounding_source\\\&quot;]*&quot;
</code></pre>
<p>Set an <strong>SLO target of 99.5%</strong>, ensuring that the vast majority of responses remain factually grounded. This helps detect hallucinations and misaligned outputs in real-time. By continuously monitoring contextual grounding accuracy, you can proactively address inconsistencies, retrain models, or refine RAG pipelines before inaccuracies impact end users.</p>
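<p>The error-budget arithmetic behind that target can be sketched as follows. The Elastic SLO UI computes this for you automatically; this is only to make the math concrete:</p>

```python
# Minimal sketch of the SLI and error-budget arithmetic for a 99.5% target.
# The Elastic SLO UI computes this automatically; this only shows the math.

def slo_status(good_events, total_events, target=0.995):
    sli = good_events / total_events                  # fraction of good events
    allowed_bad = (1 - target) * total_events         # error budget in events
    actual_bad = total_events - good_events
    budget_remaining = 1 - actual_bad / allowed_bad   # fraction of budget left
    return sli, budget_remaining

# 30 ungrounded responses out of 10,000, against an allowance of 50:
sli, budget = slo_status(good_events=9970, total_events=10000)
```
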
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/slo-configurations.png" alt="SLO settings for Guardrails metrics" /></p>
<p>Elastic's alerting capabilities enable proactive monitoring of key performance metrics. For instance, by setting up an alert on the <strong>average aws_bedrock.guardrails.invocation_latency</strong> with a <strong>500ms</strong> threshold, you can promptly identify and address performance bottlenecks, ensuring that policy enforcement remains efficient without causing unexpected delays.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/alert-configurations.png" alt="Alert settings for Guardrails metrics" /></p>
<h2>Conclusion</h2>
<p>The Elastic Amazon Bedrock integration makes it easy for you to collect a curated set of metrics and logs for your LLM-powered applications using Amazon Bedrock, including Guardrails. It comes with an out-of-the-box dashboard, which you can further customize for your specific needs.</p>
<p>If you haven’t already done so, read our previous <a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock">blog</a> on what you can do with the Amazon Bedrock integration, set up guardrails for your Bedrock models, and enable the <a href="https://www.elastic.co/guide/en/integrations/current/aws_bedrock.html">Bedrock integration</a> to start observing your Bedrock models and guardrails today!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/llm-observability-aws-bedrock-illustration.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability with the new Amazon Bedrock Integration in Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock</link>
            <guid isPermaLink="false">llm-observability-aws-bedrock</guid>
            <pubDate>Mon, 25 Nov 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic's new Amazon Bedrock integration for Observability provides comprehensive insights into Amazon Bedrock LLM performance and usage. Learn about how LLM based metric and log collection in real-time with pre-built dashboards can effectively monitor and resolve LLM invocation errors and performance challenges.]]></description>
            <content:encoded><![CDATA[<p>As organizations increasingly adopt LLMs for AI-powered applications such as content creation, Retrieval-Augmented Generation (RAG), and data analysis, SREs and developers face new challenges. Tasks like monitoring workflows, analyzing input and output, managing query latency, and controlling costs become critical. LLM observability helps address these issues by providing clear insights into how these models perform, allowing teams to quickly identify bottlenecks, optimize configurations, and improve reliability. With better observability, SREs can confidently scale LLM applications, especially on platforms like <a href="https://aws.amazon.com/bedrock/">Amazon Bedrock</a>, while minimizing downtime and keeping costs in check.</p>
<p>Elastic is expanding support for LLM Observability with Elastic Observability's new <a href="https://www.elastic.co/docs/current/integrations/aws_bedrock">Amazon Bedrock integration</a>. This new observability integration provides you with comprehensive visibility into the performance and usage of foundational models from leading AI companies and from Amazon available through Amazon Bedrock. The new Amazon Bedrock Observability integration offers an out-of-the-box experience by simplifying the collection of Amazon Bedrock metrics and logs, making it easier to gain actionable insights and effectively manage your models. The integration is simple to set up and comes with pre-built, out-of-the-box dashboards. With real-time insights, SREs can now monitor, optimize and troubleshoot LLM applications that are using Amazon Bedrock.</p>
<p>This blog will walk through the features available to SREs, such as monitoring invocations, errors, and latency information across various models, along with the usage and performance of LLM requests. Additionally, the blog will show how easy it is to set up and what insights you can gain from Elastic for LLM Observability.</p>
<h2>Prerequisites</h2>
<p>To follow along with this blog, please make sure you have:</p>
<ul>
<li>An account on <a href="http://cloud.elastic.co">Elastic Cloud</a> and a deployed stack in AWS (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>). Ensure you are using version 8.13 or higher.</li>
<li>An AWS account with permissions to pull the necessary data from AWS. <a href="https://docs.elastic.co/en/integrations/aws#aws-permissions">See details in our documentation</a>.</li>
</ul>
<h2>Configuring Amazon Bedrock Logs Collection</h2>
<p>To collect Amazon Bedrock logs, you can choose from the following options:</p>
<ol>
<li>Amazon Simple Storage Service (Amazon S3) bucket</li>
<li>Amazon CloudWatch logs</li>
</ol>
<p><strong>S3 Bucket Logs Collection</strong>: When collecting logs from the Amazon S3 bucket, you can retrieve logs from Amazon S3 objects pointed to by Amazon S3 notification events, which are read from an SQS queue, or by directly polling a list of Amazon S3 objects in an Amazon S3 bucket. Refer to Elastic’s <a href="https://www.elastic.co/docs/current/integrations/aws_logs">Custom AWS Logs</a> integration for more details.</p>
<p><strong>CloudWatch Logs Collection</strong>: In this option, you will need to create a <a href="https://console.aws.amazon.com/cloudwatch/">CloudWatch log group</a>. After creating the log group, be sure to note down the ARN of the newly created log group, as you will need it for the Amazon Bedrock settings configuration and Amazon Bedrock integration configuration for logs.</p>
<p>Configure the Amazon Bedrock CloudWatch logs with the Log group ARN to start collecting CloudWatch logs.</p>
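<p>As a reference point, model invocation logging can also be enabled programmatically through Amazon Bedrock's PutModelInvocationLoggingConfiguration API. The sketch below builds a CloudWatch-destination configuration; the log group name and IAM role ARN are placeholders you would replace with your own resources:</p>

```python
# Sketch: a CloudWatch-destination logging configuration for Amazon
# Bedrock model invocation logging. The log group name and IAM role
# ARN below are placeholders, not real resources.
logging_config = {
    "cloudWatchConfig": {
        "logGroupName": "/my/bedrock/invocation-logs",            # placeholder
        "roleArn": "arn:aws:iam::123456789012:role/BedrockLogs",  # placeholder
    },
    "textDataDeliveryEnabled": True,   # capture prompts/responses for text models
    "imageDataDeliveryEnabled": True,
    "embeddingDataDeliveryEnabled": True,
}

# With boto3 installed and AWS credentials configured, this would apply it:
# import boto3
# boto3.client("bedrock").put_model_invocation_logging_configuration(
#     loggingConfig=logging_config)
```
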
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/cloudwatch-logs-configuration.png" alt="" /></p>
<p>Please visit the <a href="https://aws.amazon.com/console/">AWS Console</a> and navigate to the &quot;Settings&quot; section under <a href="https://aws.amazon.com/bedrock/">Amazon Bedrock</a> and select your preferred method of collecting logs. Based on the value you select from the Logging Destination in the Amazon Bedrock settings, you will need to enter either the Amazon S3 location or the CloudWatch log group ARN.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-logs-configuration.png" alt="" /></p>
<h2>Configuring Amazon Bedrock Metrics Collection</h2>
<p>Configure Elastic's Amazon Bedrock integration to collect Amazon Bedrock metrics from your chosen AWS region at the specified collection interval.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/cloudwatch-metrics-configuration.png" alt="" /></p>
<h2>Maximize Visibility with Out-of-the-Box Dashboards</h2>
<p>The Amazon Bedrock integration offers rich out-of-the-box visibility into the performance and usage information of models in Amazon Bedrock, including text and image models. The <strong>Amazon Bedrock Overview</strong> dashboard provides a summarized view of the invocations, errors, and latency information across various models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-metric-summary.png" alt="" /></p>
<p>The <strong>Text / Chat metrics</strong> section in the <strong>Amazon Bedrock Overview</strong> dashboard provides insights into token usage for Text models in Amazon Bedrock. This includes use cases such as text content generation, summarization, translation, code generation, question answering, and sentiment analysis.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-text-metrics.png" alt="" /></p>
<p>The <strong>Image metrics</strong> section in the <strong>Amazon Bedrock Overview</strong> dashboard offers valuable insights into the usage of Image models in Amazon Bedrock.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-image-metrics.png" alt="" /></p>
<p>The <strong>Logs</strong> section of the <strong>Amazon Bedrock Overview</strong> dashboard in Elastic provides detailed insights into the usage and performance of LLM requests. It enables you to monitor key details such as model name, version, LLM prompt and response, usage tokens, request size, completion tokens, response size, and any error codes tied to specific LLM requests.</p>
<p>The detailed logs provide full visibility into raw model interactions, capturing both the inputs (prompts) and the outputs (responses) generated by the models. This transparency enables you to analyze and optimize how your LLM handles different requests, allowing for more precise fine-tuning of both the prompt structure and the resulting model responses. By closely monitoring these interactions, you can refine prompt strategies and enhance the quality and reliability of model outputs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-logs-details.png" alt="" /></p>
<p>The <strong>Amazon Bedrock Overview</strong> dashboard provides a comprehensive view of the initial and final response times. It includes a percentage comparison graph that highlights the performance differences between these response stages, enabling you to quickly identify efficiency improvements or potential bottlenecks in your LLM interactions.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-performance.png" alt="" /></p>
<h2>Creating Alerts and SLOs to Monitor Amazon Bedrock</h2>
<p>As with any Elastic integration, Amazon Bedrock <a href="https://www.elastic.co/docs/current/integrations/aws_bedrock#collecting-bedrock-model-invocation-logs-from-s3-bucket">logs</a> and <a href="https://www.elastic.co/docs/current/integrations/aws_bedrock#metrics">metrics</a> are fully integrated into Elastic Observability, allowing you to leverage features like SLOs, alerting, custom dashboards, and detailed logs exploration.</p>
<p>To create an alert, for example to monitor LLM invocation latency in Amazon Bedrock, you can apply a Custom Threshold rule on the Amazon Bedrock datastream. Set the rule to trigger an alert when the LLM invocation latency exceeds a defined threshold. This ensures proactive monitoring of model performance, allowing you to detect and address latency issues before they impact the user experience.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-alert-invocation-latency.png" alt="" /></p>
<p>When a violation occurs, the Alert Details view linked in the notification provides detailed context, including when the issue began, its current status, and any history of similar violations. This rich information enables rapid triaging, investigation, and root cause analysis to resolve issues efficiently.</p>
<p>Similarly, to create an SLO for monitoring Amazon Bedrock invocation performance, you can define a custom query SLI where good events are those Amazon Bedrock invocations that do not result in client or server errors and have latency of less than 10 seconds. Set an appropriate SLO target, such as 99%. This will help you identify errors and latency issues in applications using LLMs, allowing you to take timely corrective actions before they affect the overall user experience.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-slo-configuration.png" alt="" /></p>
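<p>A minimal sketch of that good-event classification, assuming illustrative field names rather than the integration's exact schema:</p>

```python
# Sketch of the "good event" rule described above: an invocation with no
# client or server errors and latency under 10 seconds. Field names are
# illustrative assumptions, not the integration's exact schema.

def is_good_event(invocation, max_latency_ms=10_000):
    return (invocation.get("client_errors", 0) == 0
            and invocation.get("server_errors", 0) == 0
            and invocation["latency_ms"] < max_latency_ms)

def sli(invocations):
    """Fraction of invocations that count as good events."""
    return sum(map(is_good_event, invocations)) / len(invocations)
```
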
<p>The image below highlights the SLOs, SLIs, and the remaining error budget for Amazon Bedrock models. The observed violations are a result of deliberately crafted long text generation prompts, which led to extended response times. This example demonstrates how the system tracks performance against defined targets, helping you quickly identify latency issues and performance bottlenecks. By monitoring these metrics, you gain valuable insights for proactive issue triaging, allowing for timely corrective actions and improved user experience of applications using LLM.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-slo-rundata.png" alt="" /></p>
<h2>Try it out today</h2>
<p>The Amazon Bedrock playgrounds provide a console environment to experiment with running inference on different models and configurations before deciding to use them in an application. Start your own 7-day free trial by signing up via AWS Marketplace and quickly spin up a deployment in minutes on any of the Elastic Cloud regions on AWS around the world.</p>
<p>Deploy a cluster on our <a href="https://www.elastic.co/cloud/elasticsearch-service">Elasticsearch Service</a>, <a href="https://www.elastic.co/downloads/">download</a> the Elasticsearch stack, or run <a href="https://aws.amazon.com/marketplace/seller-profile?id=d8f59038-c24c-4a9d-a66d-6711d35d7305">Elastic from AWS Marketplace</a> then spin up the new technical preview of Amazon Bedrock integration, open the curated dashboards in Kibana and start monitoring your Amazon Bedrock service!</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/LLM-observability-AWS-Bedrock.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability with Elastic’s Azure AI Foundry Integration]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-azure-ai-foundry</link>
            <guid isPermaLink="false">llm-observability-azure-ai-foundry</guid>
            <pubDate>Fri, 25 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Gain comprehensive visibility into your generative AI workloads on Azure AI Foundry. Monitor token usage, latency, and cost, while leveraging built-in content filters to ensure safe and compliant application behavior—all with out-of-the-box observability powered by Elastic.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>As organizations increasingly adopt LLMs for AI-powered applications such as content creation, Retrieval-Augmented Generation (RAG), and data analysis, SREs and developers face new challenges. Tasks like monitoring workflows, analyzing input and output, managing query latency, and controlling costs become critical. LLM Observability helps address these issues by providing clear insights into how these models perform, allowing teams to quickly identify bottlenecks, optimize configurations, and improve reliability. With better observability, SREs can confidently scale LLM applications, especially on platforms like Azure AI Foundry, while minimizing downtime and keeping costs in check.</p>
<p>Elastic is expanding support for LLM Observability with Elastic Observability's new Azure AI Foundry integration, now available as a tech preview on Elastic Cloud. This new observability integration provides you with comprehensive visibility into the performance and usage of foundational models such as <strong>GPT-4, Mistral, Llama</strong>, and thousands of others from leading AI companies and from Azure, all available through Azure AI Foundry. The integration offers an out-of-the-box experience by simplifying the collection of metrics and logs, making it easier to gain actionable insights and effectively manage your models. It is simple to set up and comes with pre-built, out-of-the-box dashboards. With real-time insights, SREs can now monitor, optimize, and troubleshoot LLM applications that are using Azure AI Foundry.</p>
<p>This blog will walk through the features available to SREs, such as monitoring invocations, errors, and latency information across various models, along with the usage and performance of LLM requests. Additionally, the blog will show how easy it is to set up and what insights you can gain from Elastic for LLM Observability.</p>
<h2>Prerequisites</h2>
<p>To get started with the Azure AI Foundry integration, you will need:</p>
<ul>
<li>An account on Elastic Cloud and a deployed stack in Azure (<a href="https://azuremarketplace.microsoft.com/en-us/marketplace/apps/elastic.ec-azure-pp?ocid=Elastic-Microsoft-Partner-Page-Get-Started">see instructions here</a>). Ensure you are using version 9.0.0 or higher.</li>
<li>An Azure account with permissions to pull the necessary data from Azure and Azure AI Foundry. See details in our <a href="https://www.elastic.co/docs/reference/integrations/azure_ai_foundry">documentation</a>.</li>
</ul>
<h2>Configuring Azure AI Foundry Integration</h2>
<p>To collect logs and metrics from Azure AI Foundry, ensure you properly configure Azure logs and metrics using the following links:</p>
<ul>
<li>
<p><a href="https://www.elastic.co/docs/reference/integrations/azure_metrics#setup">Configure to receive Azure Metrics</a> - This integration collects Azure AI Foundry metrics from the service; ensure you have the client ID, subscription ID, and tenant ID from Azure AI Foundry in order to collect metrics.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_metrics.png" alt="Azure AI Foundry metrics" /></p>
</li>
<li>
<p><a href="https://www.elastic.co/docs/reference/integrations/azure">Configure to receive Azure Logs</a>; in particular, ensure that you <a href="https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-create">configure an Azure event hub</a> so that Elastic can properly ingest logs. You will need the Azure event hub information to configure the logs section of the Azure AI Foundry integration.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_logs.png" alt="Azure AI Foundry logs" /></p>
</li>
</ul>
<h2>Maximize Visibility with Out-of-the-box dashboards</h2>
<p>The Azure AI Foundry integration offers rich out-of-the-box visibility into the performance and usage information of models in Azure AI Foundry, including text and image models. Several dashboards are currently available, with more coming as the integration moves to GA.</p>
<ul>
<li>The <strong>Azure AI Foundry Overview</strong> dashboard provides a summarized view of the invocations, errors, and latency information across various models.</li>
<li>The <strong>Azure AI Foundry Billing</strong> dashboard provides total costs and daily usage costs from Azure Cognitive Services.</li>
<li>The <strong>Azure AI Foundry Advanced Monitoring</strong> dashboard focuses on logs generated by the Azure AI Foundry service when connected through the API Management Service, providing request rate, error rate, model usage, latency, LLM prompt input, and response completion.</li>
</ul>
<p>Each dashboard provides specific insights important to SREs. Here is a quick overview of some of these insights:</p>
<ul>
<li>
<p><strong>Model Usage and Token Trends</strong> – Visualize token consumption and completion counts by model, endpoint, and time window.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_tokens.png" alt="Azure AI Foundry token usage metrics" /></p>
</li>
<li>
<p><strong>Latency Metrics</strong> – Monitor average and percentile latency per prompt, per endpoint, and correlate with prompt types or user IDs.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_model_latency.png" alt="Azure AI Foundry latency metrics" /></p>
</li>
<li>
<p><strong>Cost Estimation</strong> – Estimate API usage cost based on token consumption and model pricing.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_billing.png" alt="Azure AI Foundry cost estimation metrics" /></p>
</li>
<li>
<p><strong>Prompt/Completion Logging</strong> – View prompt-response pairs for debugging and quality monitoring.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_prompt_response.png" alt="Azure AI Foundry prompt/completions metrics" /></p>
</li>
<li>
<p><strong>Content Filtering and Guardrails</strong> – See which prompts or completions are being filtered, and why.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_guardrails.png" alt="Azure AI Foundry guardrails metrics" />
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/azure_ai_foundry_prompt_filtered.png" alt="Azure AI Foundry guardrails prompt filtered" /></p>
</li>
</ul>
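<p>The cost-estimation approach above can be sketched as simple token arithmetic. The per-1,000-token prices below are placeholders, not actual Azure pricing; substitute your model's real rates:</p>

```python
# Back-of-the-envelope cost estimation from token counts, mirroring the
# dashboard's approach. The per-1K-token prices are placeholders only;
# substitute your model's actual Azure pricing.

PRICES_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},  # placeholder USD rates
}

def estimate_cost(model, input_tokens, output_tokens):
    p = PRICES_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# e.g. 50K prompt tokens and 10K completion tokens at the placeholder rates
cost = estimate_cost("gpt-4", 50_000, 10_000)
```
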
<p>You can drill into specific users or sessions, slice by model type or region, and export reports for usage reviews or compliance.</p>
<hr />
<h2>Try it out today</h2>
<p>The Azure AI Foundry integration is currently available in Elastic Cloud (both serverless and hosted options). Sign up for a 7-day trial on Elastic Cloud directly or through Azure Marketplace.
Alternatively, you can deploy a cluster on our Elasticsearch Service, download the Elasticsearch stack, or run Elastic from Azure Marketplace, then spin up the new technical preview of the Azure AI Foundry integration, open the curated dashboards in Kibana, and start monitoring your Azure AI Foundry service!</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-ai-foundry/LLM-observability.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Optimizing Spend and Content Moderation on Azure OpenAI with Elastic]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai-content-filter</link>
            <guid isPermaLink="false">llm-observability-azure-openai-content-filter</guid>
            <pubDate>Tue, 13 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[We have added further capabilities to the Azure OpenAI GA package, which now offer content filter monitoring and enhancements to the billing insights!]]></description>
            <content:encoded><![CDATA[<p>In a previous blog we showed you how to set up observability for your models hosted on Azure OpenAI using Elastic’s integration. We’ve expanded the integration to also include Azure OpenAI content filtering, and cost analysis for Azure OpenAI. If you previously onboarded the Azure OpenAI integration, just upgrade it and you will automatically get all new features we discuss in this blog. The enhanced integration now provides multiple dashboards including a general Azure OpenAI Overview, Azure Provisioned Throughput Unit dashboard, Azure Content filtering, and a dashboard for Azure OpenAI billing.</p>
<p>In this blog we will cover how to use Azure OpenAI Content Filtering and tracking Azure OpenAI usage costs. Let’s first review what these two capabilities from Azure OpenAI enable you to do:</p>
<h2>Azure OpenAI Content Filtering: Enhancing AI Safety</h2>
<p>Content filtering for Azure OpenAI plays a critical role in addressing AI safety challenges by helping to mitigate the risks associated with harmful or inappropriate content generated by AI models. By implementing robust content filtering mechanisms, organizations can proactively identify and filter out potentially harmful content, such as hate speech, misinformation, or violent imagery, before it is disseminated to users. This helps prevent the spread of harmful content and reduces the potential negative impact on individuals and communities.</p>
<p>Monitoring Azure OpenAI content filtering is essential for staying proactive in addressing emerging content moderation challenges. By closely monitoring the system, businesses can quickly detect any new types of harmful content or patterns of misuse that may arise. This enables organizations to stay ahead of potential content moderation issues and take timely action to protect their users and uphold their brand reputation.</p>
<h2>Tracking Azure OpenAI Usage Costs</h2>
<p>Monitoring Azure OpenAI model usage costs is crucial for managing budget and resource allocation effectively. By keeping track of usage costs, organizations can optimize their operations to avoid unnecessary expenses and ensure that they are getting the best value from their investment in AI technologies. Additionally, it helps in forecasting future expenses and aids in scaling resources according to the demand without compromising performance or incurring excessive costs. Effective monitoring also allows for transparency and accountability, enabling better decision-making in terms of AI deployment and utilization within Azure environments.</p>
<p>As we walk through this blog, we will provide you with prerequisites to set up and use the pre-configured dashboards for both of these capabilities, which are part of the Azure OpenAI integration.</p>
<h2>Prerequisites</h2>
<p>To follow along with this blog, you will need to:</p>
<ol>
<li>
<p>Set up and install the Azure billing integration to monitor the usage costs. Once the integration is installed, you can track the usage in the enhanced Azure OpenAI Billing dashboard.</p>
</li>
<li>
<p>Additionally, make sure you have enabled the Azure API Management service to access the Azure OpenAI models.</p>
</li>
</ol>
<h3>How to Use Azure API Management with Azure OpenAI:</h3>
<ul>
<li><strong>Provision an Azure OpenAI resource:</strong> Create an Azure OpenAI resource and select a model for your application.</li>
<li><strong>Create an API Management instance:</strong> Establish an Azure API Management instance to manage the Azure OpenAI APIs.</li>
<li><strong>Import the Azure OpenAI API:</strong> Import the Azure OpenAI API into your API Management instance using its OpenAPI specification.</li>
<li><strong>Configure Policies:</strong> Implement policies in API Management to manage request authentication, rate limiting, traffic shaping, and more.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_create_APM.png" alt="LLM Observability: Azure OpenAI Create API Management Service" /></p>
<h2>Steps to create a content filter for Azure OpenAI</h2>
<p>Before you set up observability for content filtering, ensure that you have configured Azure content filtering for your model. Follow the steps below to create an Azure OpenAI content filter:</p>
<ol>
<li><strong>Access the Azure OpenAI service console:</strong>
<ul>
<li>Sign in to the Azure Console with the appropriate permissions and navigate to the Azure OpenAI service console.</li>
</ul>
</li>
<li><strong>Navigate to Safety + security:</strong>
<ul>
<li>From the left-hand menu, select <strong>Safety + security</strong>.</li>
</ul>
</li>
<li><strong>Create a New Content filter:</strong>
<ul>
<li>Select <strong>Create content filter</strong>.</li>
<li>Configure the various content filter policies, including the following:
<ul>
<li><strong>Set input filter:</strong> Content will be annotated by category and blocked according to the threshold you set for prompts.</li>
<li><strong>Set output filter:</strong> Content will be annotated by category and blocked according to the threshold you set for response output.</li>
<li><strong>Blocklists:</strong> Define specific words or phrases to block.</li>
<li><strong>Deployments:</strong> Apply filters to model deployments.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Review and Create:</strong>
<ul>
<li>Review your settings and select Create to finalize the content filter configurations.</li>
</ul>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_create_content_filter.png" alt="LLM Observability: Azure OpenAI Create Content Filter" /></p>
<p>Customers can also configure content filters and create custom safety policies that are tailored to their use case requirements. The configurability feature allows customers to adjust the settings, separately for prompts and completions, to filter content for each content category at different severity levels.</p>
<h2>Content filter types</h2>
<ul>
<li>Content filtering categories: hate, sexual, violence, and self-harm.
<ul>
<li>Other optional classification models are aimed at detecting jailbreak risk and known content for text and code.</li>
</ul>
</li>
<li>Severity levels within each content filter category: low, medium, and high.
<ul>
<li>Content detected at the 'safe' severity level is labeled in annotations but isn't subject to filtering and isn't configurable.</li>
</ul>
</li>
</ul>
<h2>Understanding the pre-configured dashboard for Azure OpenAI Content Filtering</h2>
<p>Now that you have set up the filter, you can see what is being filtered in Elastic through the Azure OpenAI content filtering dashboard.</p>
<ol>
<li>Navigate to the Dashboard Menu – Select the <strong>Dashboard</strong> menu option in Elastic and search for <strong>[Azure OpenAI] Content Filtering Overview</strong> to open the dashboard.</li>
<li>Navigate to the Integrations Menu – Open the <strong>Integrations</strong> menu in Elastic, select <strong>Azure OpenAI</strong>, go to the <strong>Assets</strong> tab, and choose <strong>[Azure OpenAI] Content Filtering Overview</strong> from the dashboard assets.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_content_filter_overview.png" alt="LLM Observability: Azure OpenAI Content Filtering Overview" /></p>
<p>The Azure OpenAI Content Filtering Overview dashboard in the Elastic integration provides insights into blocked requests, API latency, error rates. This dashboard also provides detailed breakdown of content being filtered by the content filtering policy.</p>
<h2>Content Filter overview</h2>
<p>When the content filtering system detects harmful content, you either receive an error on the API call (if the prompt was deemed inappropriate) or the finish_reason on the response is set to content_filter to signify that some of the completion was filtered.</p>
<p>This can be summarized as follows:</p>
<ul>
<li>
<p><strong>Prompt filters:</strong> A prompt whose content is classified into a filtered category returns an HTTP 400 error.</p>
</li>
<li>
<p><strong>Non-streaming completion:</strong> When the content is filtered, non-streaming completion calls won't return any content. In rare cases with longer responses, a partial result can be returned; in these cases, the finish_reason is updated.</p>
</li>
<li>
<p><strong>Streaming completion:</strong> For streaming completion calls, segments are returned to the user as they're completed. The service continues streaming until it reaches a stop token or the length limit, or until content classified at a filtered category and severity level is detected.</p>
</li>
</ul>
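<p>The three outcomes above can be sketched in application code. This is a minimal illustration rather than real SDK output: the response payloads below are hand-written stand-ins shaped like Azure OpenAI chat completion responses.</p>
<pre><code>def classify_filter_outcome(status_code, response):
    """Classify a chat completion result with respect to content filtering."""
    # An HTTP 400 means the prompt itself was blocked by the filter.
    if status_code == 400:
        return "prompt_blocked"
    # Otherwise inspect the finish_reason of the first choice.
    if response["choices"][0]["finish_reason"] == "content_filter":
        return "completion_filtered"
    return "ok"

print(classify_filter_outcome(400, {}))  # prompt_blocked
print(classify_filter_outcome(200, {"choices": [{"finish_reason": "content_filter"}]}))  # completion_filtered
print(classify_filter_outcome(200, {"choices": [{"finish_reason": "stop"}]}))  # ok
</code></pre>
<p>In a real application the status code and payload would come from the Azure OpenAI API call itself; the classification logic is the part that carries over.</p>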
<h2>Prompt and response where content has been blocked</h2>
<p>This dashboard section displays the original LLM prompt, inputs from various sources (API calls, applications, or chat interfaces), and the corresponding completion response. The panel below shows the responses after the content filtering policy has been applied to prompts and completions.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_content_filter_logs.png" alt="LLM Observability: Azure OpenAI Content Filtered Logs" /></p>
<p>You can use the following code snippet to start integrating your prompt and settings into your application and test the content filter:</p>
<pre><code>chat_prompt = [
   {
       &quot;role&quot;: &quot;user&quot;,
       &quot;content&quot;: &quot;How to kill a mocking bird?&quot;
   }
]
</code></pre>
<p>After running the code, you can see that the content was filtered under the violence category at the medium severity level.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_content-filter_response.png" alt="LLM Observability: Azure OpenAI Content Filtered Response" /></p>
<h2>Content filtered by content source (Input &amp; Output)</h2>
<p>The content filtering system helps monitor and moderate different categories of content based on severity levels. Categories typically include adult content, offensive language, hate speech, violence, and more, while severity levels indicate the degree of sensitivity or potential harm associated with the content. This panel helps you monitor and filter out inappropriate or harmful content to maintain a safe environment.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_content_filter_category_serverity.png" alt="LLM Observability: Azure OpenAI Content Filter Category &amp; Severity Level" /></p>
<p>These metrics can be categorized into the following groups:</p>
<ul>
<li><strong>Blocked requests by category:</strong> Provides insights into the total blocked requests by category.</li>
<li><strong>Severity distribution by categories:</strong> Monitors blocked requests by category and severity distribution (low, medium, or high).</li>
<li><strong>Content filtered categories:</strong> Provides insights into the content filtered categories over time.</li>
</ul>
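<p>The same groupings can be reproduced directly over the ingested documents. Here is a small sketch, assuming hypothetical documents with plain category and severity fields; the exact field names in the integration's data stream may differ.</p>
<pre><code>from collections import Counter

# Hand-written stand-ins for content-filter log documents.
blocked_docs = [
    {"category": "violence", "severity": "medium"},
    {"category": "hate", "severity": "low"},
    {"category": "violence", "severity": "high"},
]

# Blocked requests by category.
blocked_by_category = Counter(doc["category"] for doc in blocked_docs)

# Severity distribution within each category.
severity_distribution = Counter((doc["category"], doc["severity"]) for doc in blocked_docs)

print(blocked_by_category["violence"])  # 2
print(severity_distribution[("hate", "low")])  # 1
</code></pre>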
<h2>Reviewing the Azure OpenAI Billing dashboard</h2>
<p>You can now look at what you are spending on Azure OpenAI.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/llm-observability-azure_openai_billing.png" alt="LLM Observability: Azure OpenAI Billing" /></p>
<p>Here is what you see on this dashboard:</p>
<ul>
<li><strong>Total costs:</strong> This measures the total usage cost across all the model deployments.</li>
<li><strong>Overall Usage by model:</strong> This tracks the total usage costs broken down by model.</li>
<li><strong>Daily usage:</strong> Monitors usage costs on a daily basis.</li>
<li><strong>Daily usage costs by model:</strong> Monitors daily usage costs broken down by model deployments.</li>
</ul>
<h2>Conclusion</h2>
<p>The Azure OpenAI integration makes it easy for you to collect a curated set of metrics and logs for your LLM-powered applications using Azure OpenAI along with content filtered responses. It comes with an out-of-the-box dashboard which you can further customize for your specific needs.</p>
<p>Deploy a cluster on our Elasticsearch Service or download the stack, spin up the new Azure OpenAI integration, open the curated dashboards in Kibana and start monitoring your Azure OpenAI service!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-content-filter/LLM-observability.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability with Elastic: Azure OpenAI Part 2]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai-v2</link>
            <guid isPermaLink="false">llm-observability-azure-openai-v2</guid>
            <pubDate>Fri, 23 Aug 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[We have added further capabilities to the Azure OpenAI GA package, which now offer prompt and response monitoring, PTU deployment performance tracking, and billing insights!]]></description>
            <content:encoded><![CDATA[<p>We recently announced GA of the Azure OpenAI integration. You can find details in our previous blog <a href="https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai">LLM Observability: Azure OpenAI</a>.</p>
<p>Since then, we have added further capabilities to the Azure OpenAI GA package, which now offer prompt and response monitoring, PTU deployment performance tracking, and billing insights. Read on to learn more!</p>
<h2>Advanced Logging and Monitoring</h2>
<p>The initial GA release of the integration focused mainly on native logs, tracking the telemetry of the service through <strong>Cognitive Services logging</strong>. This version of the Azure OpenAI integration also processes advanced logs, which give a more holistic view of OpenAI resource usage.</p>
<p>To achieve this, you have to set up an API Management service in Azure. The API Management service is a centralized place where you can put all your OpenAI service endpoints and manage them end to end. Enable the API Management service and configure an Azure event hub to stream the logs.</p>
<p>To learn more about setting up the API Management service to access Azure OpenAI, please refer to the <a href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/openai/architecture/log-monitor-azure-openai">Azure documentation</a>.</p>
<p>By using advanced logging, you can collect the following log data:</p>
<ul>
<li>Request input text</li>
<li>Response output text</li>
<li>Content filter results</li>
<li>Usage Information
<ul>
<li>Input prompt tokens</li>
<li>Output completion tokens</li>
<li>Total tokens</li>
</ul>
</li>
</ul>
<p>The Azure OpenAI integration now collects the API Management gateway logs. When a user's question goes through API Management, the gateway logs both the question and the response from the GPT models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-azure-openai-log-categories.png" alt="LLM Observability: Azure OpenAI Logs Overview" /></p>
<p>Here’s what a sample log looks like:
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-advance-log-monitoring.png" alt="LLM Observability: Azure OpenAI Advanced Logs" /></p>
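<p>Once such a document is ingested, pulling the interesting fields out is straightforward. The document below is a hand-written stand-in for illustration; the actual field names produced by the integration may differ.</p>
<pre><code># Hypothetical shape of an ingested API Management gateway log.
doc = {
    "request": {"messages": [{"role": "user", "content": "Summarize this incident"}]},
    "response": {"choices": [{"message": {"content": "The incident began when..."}}]},
    "usage": {"prompt_tokens": 12, "completion_tokens": 48, "total_tokens": 60},
}

prompt = doc["request"]["messages"][-1]["content"]
completion = doc["response"]["choices"][0]["message"]["content"]
usage = doc["usage"]

# Total tokens should be the sum of prompt and completion tokens.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(prompt, "->", usage["total_tokens"], "tokens")
</code></pre>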
<h3>Content filtered results</h3>
<p>Azure OpenAI’s content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. With Azure OpenAI model deployments, you can use the default content filter or create your own content filter.</p>
<p>The integration now also collects content filtered result logs. In this example, let's create a custom filter in Azure OpenAI Studio that generates an error log.</p>
<p>By leveraging the <strong>Azure Content Filters</strong>, you can create your own custom lists of terms or phrases to block or flag.
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-azure-content-filters.png" alt="LLM Observability: Azure OpenAI Set Content Filter" /></p>
<p>And the document ingested in Elastic would look like this:
<img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-content-filter-logs.png" alt="LLM Observability: Azure OpenAI Content Filter Logs" />
This screenshot provides insights into the content filtered request.</p>
<h2>PTU Deployment Monitoring</h2>
<p><a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput">Provisioned throughput units (PTU)</a> are units of model processing capacity that you can reserve and deploy for processing prompts and generating completions.</p>
<p>The curated dashboard for PTU Deployment gives comprehensive visibility into metrics such as request latency, active token usage, PTU utilization, and fine-tuning activities, offering a quick snapshot of your deployment's health and performance.</p>
<p>Here are the essential PTU metrics captured by default:</p>
<ul>
<li><strong>Time to Response:</strong> Time taken for the first response to appear after a user sends a prompt.</li>
<li><strong>Active Tokens:</strong> Use this metric to understand your TPS- or TPM-based utilization for PTUs and compare it to the benchmarks for your target TPS or TPM scenarios.</li>
<li><strong>Provision-managed Utilization V2:</strong> Provides insights into utilization percentages, helping prevent overuse and ensuring efficient resource allocation.</li>
<li><strong>Prompt Token Cache Match Rate:</strong> The prompt token cache hit ratio expressed as a percentage.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-azure_open_ai_ptu_deployment.png" alt="LLM Observability: Azure OpenAI PTU Deployment Metrics Monitoring" /></p>
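<p>As back-of-the-envelope arithmetic for reading these panels, a simplified utilization calculation follows. The real Provision-managed Utilization V2 metric is computed by the service itself; the figures below are assumed purely for illustration.</p>
<pre><code># Assumed figures for a PTU deployment over one minute.
provisioned_tpm = 50_000   # tokens per minute the deployment is provisioned for
observed_tpm = 42_500      # tokens actually processed in that minute
cached_prompt_tokens = 1_800
total_prompt_tokens = 12_000

utilization_pct = 100 * observed_tpm / provisioned_tpm
cache_match_rate = 100 * cached_prompt_tokens / total_prompt_tokens

print(f"utilization: {utilization_pct:.1f}%")        # utilization: 85.0%
print(f"cache match rate: {cache_match_rate:.1f}%")  # cache match rate: 15.0%
</code></pre>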
<h2>Using Billing for cost</h2>
<p>Using the curated overview dashboard, you can now monitor the actual usage cost of your AI applications. You are just one step away from processing the billing information.</p>
<p>You need to configure and install the <a href="https://www.elastic.co/docs/current/integrations/azure_billing">Azure billing metrics integration</a>. Once the installation is complete, the usage cost for the cognitive services is visualized in the Azure OpenAI overview dashboard.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/llm-observability-azure_openai_billing_overview.png" alt="LLM Observability: Azure OpenAI Usage Cost Monitoring" /></p>
<h2>Try it out today</h2>
<p>Deploy a cluster on our <a href="https://www.elastic.co/cloud/elasticsearch-service">Elasticsearch Service</a> or <a href="https://www.elastic.co/downloads/">download</a> the stack, spin up the new Azure OpenAI integration, open the curated dashboards in Kibana and start monitoring your Azure OpenAI service!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai-v2/LLM-observability.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability: Azure OpenAI]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai</link>
            <guid isPermaLink="false">llm-observability-azure-openai</guid>
            <pubDate>Mon, 24 Jun 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[We are excited to announce the general availability of the Azure OpenAI Integration that provides comprehensive Observability into the performance and usage of the Azure OpenAI Service!]]></description>
            <content:encoded><![CDATA[<p>We are excited to announce the general availability of the <a href="https://www.elastic.co/integrations/data-integrations?solution=all-solutions&amp;category=azure">Azure OpenAI Integration</a> that provides comprehensive Observability into the performance and usage of the <a href="https://azure.microsoft.com/en-us/products/ai-services/openai-service">Azure OpenAI Service</a>! Also take a look at <a href="https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai-v2">Part 2 of this blog</a>.</p>
<p>While we have offered <a href="https://www.elastic.co/observability-labs/blog/monitor-openai-api-gpt-models-opentelemetry">visibility into LLM environments</a> for a while now, the addition of our Azure OpenAI integration enables richer out-of-the-box visibility into the performance and usage of your Azure OpenAI based applications, further enhancing LLM Observability.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/llm-observability-azure-openai-monitoring.png" alt="LLM Observability: Azure OpenAI Monitoring" /></p>
<p>The Azure OpenAI integration leverages <a href="https://www.elastic.co/elastic-agent">Elastic Agent</a>’s Azure integration capabilities to collect both logs (using <a href="https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/stream-monitoring-data-event-hubs">Azure EventHub</a>) and metrics (using <a href="https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/metrics-index">Azure Monitor</a>) to provide deep visibility on the usage of the <a href="https://azure.microsoft.com/en-us/products/ai-services/openai-service">Azure OpenAI Service</a>.</p>
<p>The integration includes an out-of-the-box dashboard that summarizes the most relevant aspects of the service usage, including request and error rates, token usage and chat completion latency.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/llm-observability-azure-openai-monitoring-overview.png" alt="LLM Observability: Azure OpenAI Monitoring Overview" /></p>
<h2>Creating Alerts and SLOs to monitor Azure OpenAI</h2>
<p>As with every other Elastic integration, all the <a href="https://www.elastic.co/docs/current/integrations/azure_openai#logs">logs</a> and <a href="https://www.elastic.co/docs/current/integrations/azure_openai#metrics">metrics</a> information is fully available to leverage in every capability in <a href="https://www.elastic.co/observability">Elastic Observability</a>, including <a href="https://www.elastic.co/guide/en/observability/current/slo.html">SLOs</a>, <a href="https://www.elastic.co/guide/en/observability/current/create-alerts.html">alerting</a>, custom <a href="https://www.elastic.co/guide/en/kibana/current/dashboard.html">dashboards</a>, in-depth <a href="https://www.elastic.co/guide/en/observability/current/monitor-logs.html">logs exploration</a>, etc.</p>
<p>To create an alert to monitor token usage, for example, start with the Custom Threshold rule on the Azure OpenAI datastream and set an aggregation condition to track and report violations of token usage past a certain threshold.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/llm-observability-azure-openai-create-alert.png" alt="LLM Observability: Azure OpenAI Monitoring Alert Creation" /></p>
<p>When a violation occurs, the Alert Details view linked in the alert notification for that alert provides rich context surrounding the violation, such as when the violation started, its current status, and any previous history of such violations, enabling quick triaging, investigation and root cause analysis.</p>
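<p>Before wiring the condition into a rule, you can prototype it as a plain Elasticsearch aggregation. The sketch below only builds the request body; the token-count field name is an assumption, so check the integration's exported fields for the exact name.</p>
<pre><code># Sum token usage over the last five minutes, for comparison against a threshold.
token_usage_query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-5m"}}},
    "aggs": {
        "total_tokens": {"sum": {"field": "azure.open_ai.usage.total_tokens"}}  # assumed field name
    },
}

TOKEN_THRESHOLD = 100_000  # alert when five-minute usage exceeds this

print(token_usage_query["aggs"]["total_tokens"]["sum"]["field"])
</code></pre>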
<p>Similarly, to create an SLO to monitor error rates in Azure OpenAI calls, start with the custom query SLI definition, defining the good events as any response with a status code below 400 over a total that includes all responses. Then, by setting an appropriate SLO target such as 99%, start monitoring your Azure OpenAI error rate SLO over a period of 7, 30, or 90 days to track degradation and take action before it becomes a pervasive problem.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/llm-observability-azure-openai-create-slo.png" alt="LLM Observability: Azure OpenAI Monitoring SLO Creation" /></p>
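<p>The arithmetic behind such an SLO is easy to sanity-check. A sketch with made-up counts, treating non-error responses (status codes below 400) as good events:</p>
<pre><code>total_requests = 120_000
error_responses = 900  # responses with status code 400 and above
good_events = total_requests - error_responses

sli = good_events / total_requests  # observed success ratio
target = 0.99
error_budget_remaining = (sli - target) / (1 - target)

print(f"SLI: {sli:.4%}")                                        # SLI: 99.2500%
print(f"error budget remaining: {error_budget_remaining:.0%}")  # error budget remaining: 25%
</code></pre>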
<p>Please refer to the <a href="https://www.elastic.co/guide/en/observability/current/monitor-azure-openai.html">User Guide</a> to learn more and to get started!</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-azure-openai/AI_fingertip_touching_human_fingertip.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[End to end LLM observability with Elastic: seeing into the opaque world of generative AI applications]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-elastic</link>
            <guid isPermaLink="false">llm-observability-elastic</guid>
            <pubDate>Wed, 02 Apr 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic’s LLM Observability delivers end-to-end visibility into the performance, reliability, cost, and compliance of LLMs across Amazon Bedrock, Azure OpenAI, Google Vertex AI, and OpenAI, empowering SREs to optimize and troubleshoot AI-powered applications.]]></description>
            <content:encoded><![CDATA[<p>In the ever-evolving landscape of artificial intelligence, Large Language Models (LLMs) stand as beacons of innovation, offering unprecedented capabilities across industries. From generating human-like text and translating languages to providing personalized customer interactions, the possibilities with LLMs are vast and increasingly indispensable. Enterprises are deploying these models for everything, from automating customer support systems to enhancing creative writing processes. Imagine a virtual assistant not only answering questions but also drafting business proposals or a customer service bot that understands and responds with empathy—all powered by LLMs. However, with great power comes the need for great oversight.</p>
<p>Despite the transformative potential, LLMs introduce complex challenges that necessitate a new level of observability as LLMs are notoriously opaque. Enter LLM observability: a crucial component in the lifecycle management of LLMs. This aspect becomes vital for Service Reliability Engineers (SREs) and other key stakeholders tasked with ensuring seamless, error-free operations, cost control, and minimizing the risks associated with the unpredictable nature of LLM generated responses. SREs need insights into performance metrics, error frequencies, latency issues, the cost implications of running these sophisticated models, and the prompt and response exchange with the model. Traditional monitoring tools fall short in this high-stakes environment; what’s needed is a nuanced approach to address the unique observability demands that LLMs introduce.</p>
<h3>Elastic's LLM Observability Capabilities Address These Challenges</h3>
<p>With Elastic’s end-to-end LLM observability you can cover a wide range of use cases. To achieve this, you can onboard two types of integrations: API-based logs and metrics, and APM instrumentation. Depending on your use case, you can choose to use either or both of the LLM integrations.</p>
<ol>
<li>
<p><strong>High level overview</strong>: via API-based logs and metrics. Monitoring LLM services from providers by ingesting a curated set of service metrics and logs like latency, invocation frequency, tokens, errors, and prompts and responses. Each LLM integration comes with out-of-the-box dashboards.</p>
</li>
<li>
<p><strong>Troubleshooting applications</strong>: via APM instrumentation. Fully OTel-native tracing and auto-instrumentation for LLM-based applications through Elastic Distributions of OpenTelemetry (EDOT). Additionally, you can use third-party libraries (Langtrace, OpenLIT, OpenLLMetry) together with Elastic to extend the coverage to additional LLM-related technologies.</p>
</li>
</ol>
<h4>High level overview: LLM Observability for Leading Providers</h4>
<p>Elastic offers tailored API-based integrations for four major LLM hosting providers:</p>
<ul>
<li>
<p>Azure OpenAI</p>
</li>
<li>
<p>OpenAI</p>
</li>
<li>
<p>Amazon Bedrock</p>
</li>
<li>
<p>Google Vertex AI</p>
</li>
</ul>
<p>These integrations bring a curated set of logs and metrics collection tailored to each provider. What this means for SREs is straightforward access to pre-configured dashboards that highlight the prompts and responses, usage patterns, performance metrics, and cost details across different models and providers.</p>
<p>For instance, SREs keen on identifying which LLM generates the most errors or insights about the models in terms of latency, cost, or usage frequency can leverage these integrations. Imagine having the capability to instantly visualize which LLM is slowing down processes or incurring high costs, thus enabling data-driven decisions to optimize operations.</p>
<h4>Troubleshooting applications: Tracing and Auto-Instrumentation of OpenAI, Amazon Bedrock and Google Vertex AI models</h4>
<p>Elastic supports OTLP tracing capabilities in EDOT for applications using OpenAI models and models hosted on Amazon Bedrock and Google Vertex AI. In addition, Elastic also supports LLM tracing from third party libraries (Langtrace, OpenLIT, OpenLLMetry). </p>
<p>Tracing offers a comprehensive map of an application's request flow, pinpointing granular details about each call within the system. For each transaction and span of a request, tracing shows critical information such as the specific models utilized, request duration, errors encountered, tokens used per request, and the prompts and responses exchanged with the LLM.</p>
<p>Tracing helps SREs troubleshoot performance issues with applications developed in languages like Python, Node.js, and Java. If an SRE needs to investigate latency or error issues, LLM tracing provides a zoomed-in view into the request lifecycle and allows for profound insights into whether a delay is application-specific, model-specific, or systemic across deployments.</p>
<h3>Use Cases: Bringing Elastic's Observability Features to Life</h3>
<p>Let’s explore some practical scenarios where Elastic’s observability tools shine:</p>
<h4>1. Understanding LLM Performance and Reliability</h4>
<p>An SRE team looking to optimize a customer support system powered by Azure OpenAI can utilize Elastic’s <a href="https://www.elastic.co/guide/en/integrations/current/azure_openai.html">Azure OpenAI integration</a> to quickly ascertain which model variants incur higher latency or error rates. This enhances decision-making regarding model deployment or even switching providers based on performance metrics.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-elastic/Azure-OpenAI.png" alt="Azure OpenAI" /></p>
<p>Similarly SREs can also use in parallel integrations for <a href="https://www.elastic.co/guide/en/integrations/current/gcp_vertexai.html">Google Vertex AI</a>, <a href="https://www.elastic.co/guide/en/integrations/current/aws_bedrock.html">Amazon Bedrock</a>, and <a href="https://www.elastic.co/guide/en/integrations/current/openai.html">OpenAI</a> for other applications using models hosted on these providers.</p>
<h4>2. Troubleshooting OpenAI-Powered Applications</h4>
<p>Consider an enterprise utilizing an OpenAI model for real-time user interactions. Encountering unexplained delays, an SRE can use OpenAI tracing to dissect the transaction pathway, identifying if one specific API call or model invocation is the bottleneck. The SRE can also check the out-of-the-box OpenAI integration dashboard to verify if the latency is only affecting this application or all model invocations across the organization.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-elastic/OpenAI-tracing.png" alt="OpenAI Tracing" /></p>
<p>An engineer troubleshooting the LLM-based application can also check to see what were the prompt and response exchanges with the LLM during this request so they can rule out possible impact on performance due to the input.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-elastic/OpenAI-trace.png" alt="OpenAI Trace sample with logs " /></p>
<h4>3. Addressing Cost and Usage Concerns</h4>
<p>SREs are generally acutely aware of which LLM configurations are less cost-effective than required. Elastic’s integration dashboards, pre-configured to display model usage patterns, help mitigate unnecessary spending effectively. You can find out-of-the box dashboards for Azure OpenAI, OpenAI, Amazon Bedrock, and Google VertexAI models. These dashboards show key cost and usage information such as total invocations and tokens, as well as time series breakdown by model and endpoint. In addition, some integrations show more advanced usage information such as provisioned throughput units (PTU) as well as billing cost.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-elastic/GCP-Vertex-AI.png" alt="GCP Vertex AI" /></p>
<h4>4. Understanding LLM Compliance </h4>
<p>With the Elastic Amazon Bedrock integration for Guardrails, and Azure OpenAI integration for content filtering, SREs can swiftly address security concerns, like verifying if certain user interactions prompt policy violations. Elastic's observability logs clarify whether guardrails rightly blocked potentially harmful responses, bolstering compliance assurance.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-elastic/Bedrock-Guardrails.png" alt="Bedrock-Guardrails.png" /></p>
<h3>Conclusion</h3>
<p>As LLMs continue to revolutionize the capabilities of modern applications, the role of observability becomes increasingly paramount. Elastic’s comprehensive observability framework empowers enterprises to harness the full potential of LLMs while maintaining robust operational insight and control. The integration with prominent LLM hosting providers and advanced tracing for OpenAI, Amazon Bedrock and Google Vertex AI models, equips SREs with the necessary arsenal to navigate the complex landscape of LLM-driven applications, ensuring they remain safe, reliable, efficient, and cost-effective.</p>
<p>In this new era of AI, balancing innovation with observability isn't just beneficial—it's essential. Whether optimizing performance, troubleshooting intricacies, or managing costs and compliance, Elastic stands at the forefront, ensuring your LLM journey is as seamless as it is groundbreaking.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-elastic/llm-e2e.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[LLM observability: track usage and manage costs with Elastic's OpenAI integration]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-openai</link>
            <guid isPermaLink="false">llm-observability-openai</guid>
            <pubDate>Tue, 11 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic's new OpenAI integration for Observability provides comprehensive insights into OpenAI model usage. With our pre-built dashboards and metrics, you can effectively track and monitor OpenAI model usage including GPT-4o and DALL·E.]]></description>
            <content:encoded><![CDATA[<p>In an era where AI-driven applications are becoming ubiquitous, understanding and managing the usage of language models is crucial. OpenAI has been at the forefront of developing advanced language models that power a multitude of applications, from chatbots to code generation. However, as applications grow in complexity and scale, observing crucial metrics that ensure optimal performance and cost-effectiveness becomes essential. Specific needs arise in areas such as performance and reliability monitoring, and cost management, which are pivotal for maximizing the potential of language models.</p>
<p>As organizations adopt OpenAI's diverse AI models, including language models like GPT-4o and GPT-3.5 Turbo, image models like DALL·E, and audio models like Whisper, comprehensive usage monitoring is crucial to track and optimize performance, reliability, usage and cost of each model.</p>
<p>Elastic's new <a href="https://www.elastic.co/guide/en/integrations/current/openai.html">OpenAI integration</a> offers a solution to the challenges faced by developers and businesses using these models. It is designed to provide a unified view of your OpenAI usage across all model types.</p>
<h3>Key benefits of the OpenAI integration</h3>
<p>OpenAI's usage-based pricing model applies across all these services, making it essential to track consumption and identify which models are being used to control costs and optimize deployments. The new OpenAI integration by Elastic utilizes the <a href="https://platform.openai.com/docs/api-reference/usage">OpenAI Usage API</a> to track consumption and identify specific models being used. It offers an out-of-the-box experience with pre-built dashboards, simplifying the process of monitoring your usage patterns.</p>
<p>Continue reading to learn about what you will get with the integration. We'll also show you the setup process, how to leverage the pre-built dashboards, and what insights you can gain from Elastic for LLM Observability.</p>
<h2>Setting up the OpenAI Integration</h2>
<h3>Prerequisites</h3>
<p>To follow along with this blog, you will need:</p>
<ul>
<li>An Elastic cloud account (version 8.16.3 or higher). Alternatively, you can use <a href="https://www.elastic.co/cloud/serverless">Elastic Cloud Serverless</a>, a fully managed solution that eliminates infrastructure management, automatically scales based on usage, and lets you focus entirely on extracting value from your data.</li>
<li>An OpenAI account with an <a href="https://platform.openai.com/docs/api-reference/admin-api-keys">Admin API key</a>.</li>
<li>Applications that use the OpenAI APIs.</li>
</ul>
<h3>Generating sample OpenAI usage data</h3>
<p>If you're new to OpenAI and eager to try this integration, you can quickly set it up and populate your dashboards with sample data. You'll just need to generate some usage by interacting with the OpenAI API. If you don't have an OpenAI API key, you can create one <a href="https://platform.openai.com/api-keys">here</a>. For more information on authentication, refer to the OpenAI <a href="https://platform.openai.com/docs/api-reference/authentication">documentation</a>.</p>
<p>The OpenAI documentation provides detailed examples for each of their API endpoints. Here are direct links to the relevant sections for generating sample usage data:</p>
<ul>
<li>Language models (completions): Use the Chat Completions API to generate text. See the examples <a href="https://platform.openai.com/docs/api-reference/chat/create">here</a>.</li>
<li>Audio models (text-to-speech): Generate audio from text using the Speech API. See the examples <a href="https://platform.openai.com/docs/api-reference/audio/createSpeech">here</a>.</li>
<li>Audio models (speech-to-text): Transcribe audio to text using the Transcriptions API. See the examples <a href="https://platform.openai.com/docs/api-reference/audio/createTranscription">here</a>.</li>
<li>Embeddings: Generate vector representations of text using the Embeddings API. See the examples <a href="https://platform.openai.com/docs/api-reference/embeddings">here</a>.</li>
<li>Image models: Create images from text prompts using the Image Generation API. See the examples <a href="https://platform.openai.com/docs/api-reference/images/create">here</a>.</li>
<li>Moderation: Check content with the Moderation API. See the examples <a href="https://platform.openai.com/docs/api-reference/moderations">here</a>.</li>
</ul>
<p>There are more endpoints that you can explore to generate sample usage data.</p>
<p>After running these examples (using your API key), remember that the OpenAI <a href="https://platform.openai.com/docs/api-reference/usage">Usage API</a> has a delay. It may take some time (usually a few minutes) for the usage data to appear in your dashboard.</p>
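<p>If you want to confirm that usage data has landed before the integration polls it, you can query the Usage API yourself. Below is a minimal sketch, assuming the <code>requests</code> package and a hypothetical <code>OPENAI_ADMIN_KEY</code> environment variable; the endpoint path and <code>start_time</code>/<code>bucket_width</code> parameters follow the Usage API documentation:</p>

```python
import os
import time
import urllib.parse

USAGE_BASE = "https://api.openai.com/v1/organization/usage"

def build_usage_url(endpoint: str = "completions", hours: int = 24,
                    bucket_width: str = "1h") -> str:
    """Build a Usage API URL covering the last `hours` hours."""
    params = {
        "start_time": int(time.time()) - hours * 3600,
        "bucket_width": bucket_width,
    }
    return f"{USAGE_BASE}/{endpoint}?" + urllib.parse.urlencode(params)

if __name__ == "__main__":
    # An Admin API key is required; the call is skipped when none is set.
    admin_key = os.environ.get("OPENAI_ADMIN_KEY")  # hypothetical variable name
    if admin_key:
        import requests
        resp = requests.get(build_usage_url(),
                            headers={"Authorization": f"Bearer {admin_key}"})
        resp.raise_for_status()
        for bucket in resp.json().get("data", []):
            print(bucket.get("start_time"), bucket.get("results"))
```

<p>If the response buckets come back empty, wait a few minutes and retry; the same reporting delay applies to the integration's dashboards.</p>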
<h3>Configuration</h3>
<p>To connect the OpenAI integration to your OpenAI account, you'll need an OpenAI <a href="https://platform.openai.com/settings/organization/admin-keys">Admin API key</a>. The integration uses this key to periodically retrieve usage data from the OpenAI <a href="https://platform.openai.com/docs/api-reference/usage">Usage API</a>.</p>
<p>The integration supports eight distinct <a href="https://www.elastic.co/guide/en/integrations/current/openai.html#openai-data-streams">data streams</a>, corresponding to different categories of OpenAI API usage:</p>
<ul>
<li>Audio speeches (text-to-speech)</li>
<li>Audio transcriptions (speech-to-text)</li>
<li>Code interpreter sessions</li>
<li>Completions (language models)</li>
<li>Embeddings</li>
<li>Images</li>
<li>Moderations</li>
<li>Vector stores</li>
</ul>
<p>By default, all data streams are enabled. However, you can disable any data streams that are not relevant to your usage. All enabled data streams are visualized in a single, comprehensive dashboard, providing a unified view of your usage.</p>
<p>For advanced users, the integration offers additional configuration options, including setting the bucket width and initial interval. These options are documented in detail in the official integration <a href="https://www.elastic.co/guide/en/integrations/current/openai.html#openai-collection-behavior">documentation</a>.</p>
<h2>Maximize visibility with the out-of-the-box dashboard</h2>
<p>You can access the OpenAI dashboard in two ways:</p>
<ol>
<li>Navigate to the Dashboards menu in the left side panel and search for &quot;OpenAI&quot;. In the search results, select <strong>[Metrics OpenAI] OpenAI Usage Overview</strong> to open the dashboard.</li>
<li>Alternatively, navigate to the Integrations Menu — Open the <strong>Integrations</strong> menu under the <strong>Management</strong> section in Elastic, select <strong>OpenAI</strong>, go to the <strong>Assets</strong> tab, and choose <strong>[Metrics OpenAI] OpenAI Usage Overview</strong> from the dashboards assets.</li>
</ol>
<h3>Understanding the pre-configured dashboard for OpenAI</h3>
<p>The pre-built dashboard provides a structured view of OpenAI's API consumption, displaying key metrics such as token usage, API call distribution, and model-wise invocation counts. It highlights top-performing projects, users, and API keys, along with breakdowns of image generation, audio transcription, and text-to-speech usage. By analyzing these insights, users can track usage patterns and optimize AI-driven applications.</p>
<h3>OpenAI usage metrics overview</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-overview.png" alt="LLM Observability: OpenAI usage metrics overview" /></p>
<p>This dashboard section shows key usage metrics from OpenAI, including invocation rates, token usage, and the top-performing models. It also highlights the total number of invocations and tokens and the invocation count by object type. Understanding these insights can help users optimize model usage, reduce costs, and enhance efficiency when integrating AI models into their applications.</p>
<h3>Top performing Project, User, and API Key IDs</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-top-tables.png" alt="LLM Observability: Top performing Project, User, and API Key IDs" /></p>
<p>Here, you can analyze the top Project IDs, User IDs, and API Key IDs based on invocation counts. This data provides valuable insights to help organizations track usage patterns across different projects and applications.</p>
<h3>Token metrics</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-token-metrics.png" alt="LLM Observability: Token metrics" /></p>
<p>In this dashboard section you can see token usage trends across various models. This can help you analyze trends across input types (e.g., audio, embeddings, moderations), output types (e.g., audio), and input cached tokens. This information can help developers fine-tune their prompts and optimize token consumption.</p>
<h3>Image generation metrics</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-image-metrics.png" alt="LLM Observability: Image generation metrics" /></p>
<p>AI-generated images are becoming increasingly popular across industries. This section provides an overview of image generation metrics, including invocation rates by model and the most <a href="https://platform.openai.com/docs/guides/images#size-and-quality-options">common output dimensions</a>. These insights help assess invocation costs and analyze image generation usage.</p>
<h3>Audio transcription metrics</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-audio-transcription-metrics.png" alt="LLM Observability: Audio transcription metrics" /></p>
<p>OpenAI's AI-powered transcription services make speech-to-text conversion easier than ever. This section tracks audio transcription metrics, including invocation rates and total transcribed seconds per model. Understanding these trends can help businesses optimize costs when building audio transcription-based applications.</p>
<h3>Audio speech metrics</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-audio-speech.png" alt="LLM Observability: Audio speech metrics" /></p>
<p>OpenAI's text-to-speech (TTS) models deliver realistic voice synthesis for applications such as accessibility tools and virtual assistants. This section explores TTS invocation rates and the number of characters synthesized per model, offering insights into the adoption of AI-driven voice synthesis.</p>
<h2>Creating Alerts and SLOs to monitor OpenAI</h2>
<p>As with every other Elastic integration, all the logs and metrics information is fully available to leverage in every capability in <a href="https://www.elastic.co/observability">Elastic Observability</a>, including <a href="https://www.elastic.co/guide/en/observability/current/slo.html">SLOs</a>, <a href="https://www.elastic.co/guide/en/observability/current/create-alerts.html">alerting</a>, custom <a href="https://www.elastic.co/guide/en/kibana/current/dashboard.html">dashboards</a>, in-depth <a href="https://www.elastic.co/guide/en/observability/current/monitor-logs.html">logs exploration</a>, etc.</p>
<p>To proactively manage your OpenAI token usage and avoid unexpected costs, <a href="https://www.elastic.co/guide/en/observability/current/create-alerts-rules.html">create</a> a custom threshold rule in Observability Alerts.</p>
<p><em>Example</em>: Target the relevant data stream, and configure the rule to sum the related tokens field (along with other token-related fields, if applicable). Set a threshold representing your desired usage limit, and the alert will notify you if this limit is exceeded within a specified timeframe, such as daily or hourly.</p>
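<p>Under the hood, such a rule amounts to a sum aggregation over a token field in the relevant data stream. Here is a minimal sketch of that aggregation expressed as a Python dict; the index pattern and field name are illustrative assumptions, so check the integration's exported fields for the exact names in your deployment:</p>

```python
def token_threshold_query(index: str, token_field: str, threshold: int,
                          window: str = "now-1d/d") -> dict:
    """Build the sum-over-tokens aggregation a threshold rule evaluates."""
    return {
        "index": index,  # illustrative index pattern, not the exact stream name
        "body": {
            "size": 0,
            "query": {"range": {"@timestamp": {"gte": window}}},
            "aggs": {"total_tokens": {"sum": {"field": token_field}}},
        },
        "threshold": threshold,  # alert fires when the sum exceeds this value
    }

query = token_threshold_query(
    index="metrics-openai.completions-*",
    token_field="openai.completions.input_tokens",
    threshold=1_000_000,
)
```

<p>The custom threshold rule evaluates this kind of aggregation on a schedule and notifies you when the summed value crosses the limit.</p>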
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-create-alert.png" alt="LLM Observability: Alert creation" /></p>
<p>When an alert condition is met, the Alert Details view linked in the notification provides detailed insights into the violation, such as when it started, its current status, and any history of similar violations, enabling proactive issue resolution and improving system resilience.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-alert-overview.png" alt="LLM Observability: Alert overview" /></p>
<p><em>Example</em>: To create an SLO that monitors model distribution in OpenAI, start by defining a custom metric SLI definition, adding good events where <code>openai.base.model</code> contains <code>gpt-3.5*</code> and total events encompassing all OpenAI requests, grouped by <code>openai.base.project_id</code> and <code>openai.base.user_id</code>. Then, set an appropriate SLO target such as 80% and monitor this over a 7-day rolling window to identify projects and users that may be overusing more expensive models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-create-slo.png" alt="LLM Observability: SLO creation" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/dashboard-slo-overview.png" alt="LLM Observability: SLO overview" /></p>
<p>You can now track the distribution of requests across different OpenAI models by project and user. This example demonstrates how Elastic's OpenAI integration helps you optimize costs. By monitoring the percentage of requests handled by cost-efficient GPT-3.5 models — the SLI — against the 80% target (part of the SLO), you can quickly identify which specific projects or users are driving up costs through excessive usage of models like GPT-4-turbo, GPT-4o, etc. This visibility enables targeted optimization strategies, ensuring your AI initiatives remain cost-effective while still leveraging advanced capabilities.</p>
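<p>Conceptually, the SLI here is just the ratio of good events to total events per group. A minimal, dependency-free sketch of that computation (the event dicts and field names below are illustrative, not the integration's exact schema):</p>

```python
from collections import defaultdict

def sli_by_group(events, is_good, group_key):
    """Compute the good/total ratio per group, mirroring a custom-metric SLI."""
    good = defaultdict(int)
    total = defaultdict(int)
    for e in events:
        key = e[group_key]
        total[key] += 1
        if is_good(e):
            good[key] += 1
    return {k: good[k] / total[k] for k in total}

events = [
    {"project_id": "p1", "model": "gpt-3.5-turbo"},
    {"project_id": "p1", "model": "gpt-4o"},
    {"project_id": "p2", "model": "gpt-3.5-turbo"},
]
ratios = sli_by_group(events, lambda e: e["model"].startswith("gpt-3.5"), "project_id")
# any ratio below the 0.8 target flags a group leaning on more expensive models
```

<p>In Elastic, the SLO engine performs this grouping and ratio calculation for you over the rolling window you configure.</p>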
<h2>Conclusion, next steps and further reading</h2>
<p>You've now seen how Elastic's OpenAI integration provides an essential tool for anyone relying on OpenAI's models to power their applications. By offering a comprehensive and customizable dashboard, this integration empowers SREs and developers to effectively monitor performance, manage costs, and optimize their AI systems. Now it's your turn: onboard the integration following the instructions in this blog and start monitoring your OpenAI usage! We'd love to hear how you get on, and we always welcome ideas for enhancements.</p>
<p>To learn how to set up Application Performance Monitoring (APM) tracing of OpenAI-powered applications, read this <a href="https://www.elastic.co/observability-labs/blog/elastic-opentelemetry-openai">blog</a>. For further reading and more LLM observability use cases, explore Elastic's observability lab blogs <a href="https://www.elastic.co/observability-labs/blog/tag/llmobs">here</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-openai/llm-observability-openai.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Automating User Journeys for Synthetic Monitoring with MCP in Elastic]]></title>
            <link>https://www.elastic.co/observability-labs/blog/mcp-elastic-synthetics</link>
            <guid isPermaLink="false">mcp-elastic-synthetics</guid>
            <pubDate>Wed, 17 Sep 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[This post explores how you can automatically create user journeys with Synthetic Monitoring in Elastic Observability, TypeScript, and FastMCP, and walks through the app and its workflow.]]></description>
            <content:encoded><![CDATA[<p><a href="https://www.elastic.co/docs/solutions/observability/synthetics">Synthetic Monitoring in Elastic Observability</a> enables you to track user pathways using a global testing infrastructure, emulating the full user path to measure the impact of web applications. It also provides comprehensive insight into your website's performance, functionality, and availability from development to production, allowing you to identify and resolve issues before they affect your customers.</p>
<p>One of the main components of Elastic's Synthetic Monitoring is the ability to create user journeys, which can be done with or without code. There is a <a href="https://github.com/elastic/synthetics">Synthetics agent</a>, a CLI tool that guides you through the process of creating both heartbeat monitors and user journeys and deploying your code to Elastic Observability. If you are using code to create user journeys, you are using <a href="https://playwright.dev/">Playwright</a> under the hood with some additional configuration to make it easier to work with Elastic Observability.</p>
<p>To create user journeys automatically, you can generate Playwright tests in TypeScript from a prompt using <a href="https://www.warp.dev">Warp</a>, an AI-assisted terminal, <a href="https://deepmind.google/models/gemini/pro/">Gemini 2.5 Pro</a>, and <a href="https://modelcontextprotocol.io/docs/getting-started/intro">MCP</a>. The application was built with Python and <a href="https://gofastmcp.com/getting-started/welcome">FastMCP</a>, and it wraps the Synthetics agent to deploy browser tests to Elastic automatically. This blog post will guide you through how the application works, how to use it, and its development process. You can find the complete code on <a href="https://github.com/JessicaGarson/MCP-Elastic-Synthetics">GitHub</a>.</p>
<h2>Solution overview</h2>
<p><img src="https://www.elastic.co/observability-labs/assets/images/mcp-elastic-synthetics/01-diagram.jpg" alt="diagram" /></p>
<p>Currently, this solution is set up to run inside Warp as an <a href="https://docs.warp.dev/knowledge-and-collaboration/mcp">MCP server</a>; however, you can also use another client, such as <a href="https://claude.ai/download">Claude Desktop</a> or <a href="https://cursormcp.com/en">Cursor</a>. From there, you create a Python script using <a href="https://gofastmcp.com/getting-started/welcome">FastMCP</a>, which lets you define functions that are callable by an LLM. Within Warp, you create a JSON configuration file that points to your Python script and passes in the environment variables you are working with. From there, you'll want to toggle agent mode and either ask a question about creating synthetic tests or call an MCP function directly. There are many options for which LLM you can select; be sure to check out <a href="https://docs.warp.dev/agents/using-agents">Warp's documentation</a> to learn more.</p>
<p>The following three functions are available:</p>
<ul>
<li>
<p><code>diagnose_warp_mcp_config</code>
Used for debugging environment variable issues that may arise. This function likely won't be needed unless there is an issue with your configuration.</p>
</li>
<li>
<p><code>create_and_deploy_browser_test</code>
Will automatically create Playwright tests if given the test name, the URL you want to test, and a schedule. This approach uses a template-based method, rather than a machine learning-based method, and all the tests it outputs will appear similar.</p>
</li>
<li>
<p><code>llm_create_and_deploy_test_from_prompt</code>
Similar to <code>create_and_deploy_browser_test</code>, but it uses an LLM to create tests based on a prompt you give it. The tests should reflect the prompt you provided. To run this function, you'll provide a test name, URL, prompt, and schedule.</p>
</li>
</ul>
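<p>Under the hood, each of these functions is a plain Python function registered as an MCP tool. Here is a dependency-free sketch of that registration pattern; a stand-in registry is used instead of the real <code>FastMCP</code> class so the example runs without installing anything, and the function body is illustrative:</p>

```python
TOOLS = {}

def tool(fn):
    """Stand-in for FastMCP's @mcp.tool() decorator: register fn by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def create_and_deploy_browser_test(test_name: str, url: str, schedule: int) -> dict:
    # The real implementation renders a test template and pushes it to Elastic;
    # this body is only illustrative.
    return {"test_name": test_name, "url": url, "schedule_minutes": schedule}

# An MCP client resolves a tool by name and calls it with the provided arguments.
result = TOOLS["create_and_deploy_browser_test"]("home-page", "https://example.com", 10)
```

<p>With FastMCP, the registry, schema generation, and client protocol are handled for you; the decorator is all you write.</p>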
<h2>Why create this solution as an MCP server?</h2>
<p>The reason this was developed as an MCP server, as opposed to just a standalone script or a standard CLI, is that it can be structured and interacted with in a more conversational manner. It enables an LLM to generate dynamic Playwright testing while maintaining consistent arguments, environment variables, and responses to ensure accuracy and reliability. Thus, it becomes a reliable workflow that other agents or developers can compose with additional tools. In other words, the MCP layer turns your LLM-based test authoring into a standardized, reusable capability instead of a one-off script. To learn more about the direction of MCP, be sure to check out our article on the <a href="https://www.elastic.co/search-labs/blog/mcp-current-state">topic</a>.</p>
<h2>Implementation considerations</h2>
<p>When building a solution like this, be mindful of token usage. An early version took approximately twenty minutes to create synthetic tests and ultimately triggered severe rate limiting.</p>
<p>Another challenge was striking a balance between a template that makes Playwright scripts easy to generate and an LLM that writes scripts from prompts without feeling cookie-cutter. With a purely LLM-driven approach, the scripts often didn't work or referenced parameters that didn't exist; a purely templated approach was more reliable but felt repetitive. The final version balances the two by reusing elements of the template while adjusting the LLM's temperature parameter, which controls the randomness or creativity of a large language model's output.</p>
<p>While testing this solution, a failing test also emerged that required navigating past a pop-up. In more complex cases, this may serve as a building block that requires additional domain knowledge to create a complete passing Playwright test.</p>
<h2>How to get started</h2>
<h3>Prerequisites</h3>
<ul>
<li>This application uses Python 3.12.1, but any Python version higher than 3.10 will work.</li>
<li>This application uses Elastic Observability version 9.1.2, but you can use any version of Elastic Observability higher than 8.10. You can also use <a href="https://www.elastic.co/cloud/serverless">Elastic Cloud Serverless</a>.</li>
<li>You will also need an OpenAI API key for the LLM capabilities of this application. Configure it as an environment variable; you can create a key on the API keys page in <a href="https://platform.openai.com/api-keys">OpenAI's developer portal</a>.</li>
</ul>
<h3>Step 1: Install the packages and clone the repository</h3>
<p>To run this MCP server locally, you will need to install the following packages:</p>
<pre><code class="language-shell">pip install fastmcp openai
npm install -g playwright @elastic/synthetics
</code></pre>
<p>You will use <a href="https://gofastmcp.com/getting-started/welcome">FastMCP 2.0</a> to create the MCP server, and <a href="https://github.com/openai/openai-python">OpenAI</a> to generate tests based on prompts that you provide. Additionally, you will want to clone the repository to obtain a local copy of the server.</p>
<h3>Step 2: Set up a configuration file in Warp</h3>
<p>Inside Warp, open the side panel, go to the MCP servers section, and click “add”.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/mcp-elastic-synthetics/02-add-mcp.jpg" alt="Add MCP Server" /></p>
<p>After that, you will be prompted to add a JSON configuration file that should resemble the following. Be sure to add your own Kibana URL, update the correct path, and include your own keys and tokens.</p>
<pre><code class="language-json">{
 &quot;elastic-synthetics&quot;: {
   &quot;command&quot;: &quot;python&quot;,
   &quot;args&quot;: [&quot;elastic_synthetics_server.py&quot;],
   &quot;env&quot;: {
     &quot;PYTHONPATH&quot;: &quot;.&quot;,
     &quot;ELASTIC_KIBANA_URL&quot;: &quot;https://your-kibana-url.elastic-cloud.com&quot;,
     &quot;ELASTIC_API_KEY&quot;: &quot;your-api-key-here&quot;,
     &quot;ELASTIC_PROJECT_ID&quot;: &quot;mcp-synthetics-demo&quot;,
     &quot;ELASTIC_SPACE&quot;: &quot;default&quot;,
     &quot;ELASTIC_AUTO_PUSH&quot;: &quot;true&quot;,
     &quot;ELASTIC_USE_JAVASCRIPT&quot;: &quot;false&quot;,
     &quot;ELASTIC_INSTALL_DEPENDENCIES&quot;: &quot;true&quot;,
     &quot;OPENAI_API_KEY&quot;: &quot;sk-your-openai-key&quot;,
     &quot;LLM_MODEL&quot;: &quot;gpt-4o&quot;
   },
   &quot;working_directory&quot;: &quot;/path/to/your/file&quot;,
   &quot;start_on_launch&quot;: true 
   }
}
</code></pre>
<h3>Step 3: Ask a question or call the tools directly</h3>
<p>Now that you've set up the server locally, toggle agent mode and select the LLM you wish to use. Gemini 2.5 Pro was chosen for this blog post because it gave a straightforward answer, while the other LLMs tested returned very lengthy responses.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/mcp-elastic-synthetics/03-agent-mode.jpg" alt="Agent mode" /></p>
<p>To start using the MCP tools, ask a question that contains the test name, URL, prompt, and schedule.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/mcp-elastic-synthetics/04-full-question-answer.jpg" alt="Full question and answer" /></p>
<p>You can also call the tool directly by typing <code>llm_create_and_deploy_test_from_prompt()</code>, and the program will prompt you for the relevant details:<br />
<img src="https://www.elastic.co/observability-labs/assets/images/mcp-elastic-synthetics/05-call-mcp-tool.jpg" alt="Call MCP Tool" /></p>
<p>Inside Kibana, you should see your monitor listed if you navigate to Applications, then Synthetics, and select Monitors. You can also find a link to your monitor in the response of your MCP tool.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/mcp-elastic-synthetics/06-kibana-monitors.jpg" alt="Monitors in Kibana" /></p>
<h2>What's going on inside</h2>
<p>This code sample consists of three primary functions, MCP tools that you can call from your MCP client: <code>diagnose_warp_mcp_config</code>, <code>create_and_deploy_browser_test</code>, and <code>llm_create_and_deploy_test_from_prompt</code>.</p>
<h3>Debugging environment issues</h3>
<p>Various issues around environment variable loading came up while creating this application, so an MCP tool was needed that could be called to diagnose errors when they appear.</p>
<p>The tool <code>diagnose_warp_mcp_config</code> starts with the <code>@mcp.tool()</code> decorator, which registers it in the list of available tools. It is designed to help debug issues with Elastic-specific environment variables. First, it loads the environment variables and looks for the Elastic-specific ones; it then masks sensitive values such as API keys in the output, showing only the first eight characters followed by &quot;...&quot;. Finally, it determines whether the minimum required credentials (Kibana URL and API key) are present for deployment and returns a report flagging any issues that need to be addressed.</p>
<pre><code class="language-py">@mcp.tool()
def diagnose_warp_mcp_config() -&gt; Dict[str, Any]:
    &quot;&quot;&quot;Diagnose Warp MCP environment configuration for Elastic Synthetics&quot;&quot;&quot;
    try:
        env_vars = load_env_from_warp_mcp()

        # Check for required variables
        kibana_url = env_vars.get('ELASTIC_KIBANA_URL') or env_vars.get('KIBANA_URL')
        api_key = env_vars.get('ELASTIC_API_KEY') or env_vars.get('API_KEY')
        project_id = env_vars.get('ELASTIC_PROJECT_ID') or env_vars.get('PROJECT_ID')
        space = env_vars.get('ELASTIC_SPACE') or env_vars.get('SPACE', 'default')

        # Mask sensitive values for display
        masked_vars = {}
        for key, value in env_vars.items():
            if 'API_KEY' in key or 'TOKEN' in key:
                masked_vars[key] = f&quot;{value[:8]}...&quot; if value and len(value) &gt; 8 else &quot;***&quot;
            else:
                masked_vars[key] = value

        deployment_ready = bool(kibana_url and api_key)

        return safe_json_response({
            &quot;status&quot;: &quot;success&quot;,
            &quot;environment_variables&quot;: masked_vars,
            &quot;required_check&quot;: {
                &quot;kibana_url&quot;: bool(kibana_url),
                &quot;api_key&quot;: bool(api_key),
                &quot;project_id&quot;: bool(project_id),
                &quot;space&quot;: bool(space)
            },
            &quot;deployment_ready&quot;: deployment_ready,
            &quot;recommendations&quot;: [
                &quot;Environment variables detected&quot; if env_vars else &quot;No environment variables found&quot;,
                &quot;Kibana URL configured&quot; if kibana_url else &quot;Missing ELASTIC_KIBANA_URL or KIBANA_URL&quot;,
                &quot;API Key configured&quot; if api_key else &quot;Missing ELASTIC_API_KEY or API_KEY&quot;,
                &quot;Ready for deployment&quot; if deployment_ready else &quot;Missing required credentials&quot;
            ]
        })

    except Exception as e:
        return safe_json_response({
            &quot;status&quot;: &quot;error&quot;,
            &quot;error&quot;: str(e),
            &quot;error_type&quot;: type(e).__name__
        })
</code></pre>
<h3>Creating synthetic tests based on a template</h3>
<p>Developing this solution to generate tests from a prompt wasn't always smooth. Early versions suffered from accuracy issues, hallucinations, and loops. To make progress, a logical next step was a version that relied on a test template to verify the mechanics of the solution, such as whether a test could pass and be deployed to Elastic correctly.</p>
<p>This solution automates the entire process of creating a synthetic browser test that will regularly check if a website is working correctly, then deploys it to Elastic Observability Synthetics. Similar to <code>diagnose_warp_mcp_config</code>, the MCP tool <code>create_and_deploy_browser_test</code> starts with the decorator <code>@mcp.tool()</code> and checks to make sure that the proper environment variables are loaded.</p>
<p>From there, it creates a TypeScript test file that is based on templates and generates dynamic test steps based on the target website's characteristics, including navigating to the website, verifying the page title exists, checking page load performance, taking a screenshot, verifying page content is visible, and finally saves the test file in a <code>synthetic_tests</code> directory.</p>
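<p>A template-based generator like this can be sketched in a few lines: fill a TypeScript journey skeleton with the test parameters. The journey/step/monitor API below comes from <code>@elastic/synthetics</code>, but the template itself is an illustrative simplification, not the repository's exact one:</p>

```python
# Doubled braces produce literal { } in the rendered TypeScript.
JOURNEY_TEMPLATE = """\
import {{ journey, step, monitor, expect }} from '@elastic/synthetics';

journey('{name}', ({{ page }}) => {{
  monitor.use({{ schedule: {schedule} }});
  step('Go to {url}', async () => {{
    await page.goto('{url}');
  }});
  step('Page has a title', async () => {{
    expect(await page.title()).toBeTruthy();
  }});
}});
"""

def render_journey(name: str, url: str, schedule: int) -> str:
    """Fill the TypeScript journey template with test parameters."""
    return JOURNEY_TEMPLATE.format(name=name, url=url, schedule=schedule)

ts_source = render_journey("elastic-home", "https://www.elastic.co", 10)
# The real tool writes this string to a file in the synthetic_tests directory.
```
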
<p>Finally, it wraps Elastic's CLI tool <code>@elastic/synthetics</code> to push the test to Kibana, allowing you to set which geographic locations to run tests from, how often to run the test, and the project and workspace settings.</p>
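<p>The deployment step can be sketched as assembling and running that CLI invocation from Python. The flag names below (<code>--url</code>, <code>--auth</code>, <code>--id</code>, <code>--yes</code>) follow the <code>@elastic/synthetics push</code> CLI, but verify them against your installed version:</p>

```python
def build_push_command(kibana_url: str, api_key: str, project_id: str) -> list:
    """Assemble the CLI invocation that uploads journeys to Kibana."""
    return [
        "npx", "@elastic/synthetics", "push",
        "--url", kibana_url,   # Kibana endpoint to push monitors to
        "--auth", api_key,     # Elastic API key
        "--id", project_id,    # Synthetics project identifier
        "--yes",               # non-interactive; skip confirmation prompts
    ]

cmd = build_push_command("https://your-kibana.example.com", "your-api-key",
                         "mcp-synthetics-demo")
# subprocess.run(cmd, cwd="synthetic_tests", check=True) would execute it
```
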
<p>You can check out the full code for this MCP tool <a href="https://github.com/JessicaGarson/MCP-Elastic-Synthetics/blob/main/elastic_synthetics_server.py#L943">here</a>.</p>
<h3>Creating synthetic tests based on a prompt</h3>
<p>While creating browser tests with a templated approach is a good starting point, the results felt generic and cookie-cutter. It did, however, provide a helpful structure to build an LLM-based function on top of.</p>
<p>The MCP tool <code>llm_create_and_deploy_test_from_prompt</code> begins by ensuring that basic parameters, including locations, schedule, and directories, are set. It also gathers information about the target website to inform the model, then initializes the OpenAI client with the GPT-4o model.</p>
<p>After setting up the LLM, it converts natural language requests into actual Playwright test code, then cleans and validates the AI-generated code to prevent issues like injection attacks or malformed syntax. It draws inspiration from the templated approach, wrapping AI-generated steps within a proven, reliable test framework template. Finally, it deploys the test to Elastic in a similar manner to the previous tool.</p>
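<p>The cleaning step matters because LLMs often wrap generated code in markdown fences or include unsafe calls. Here is a minimal sketch of that kind of post-processing; the fence-stripping regex and the denylist are illustrative, not the repository's exact validation:</p>

```python
import re

# Illustrative denylist of tokens we never want in generated test steps.
FORBIDDEN = ("require('child_process')", "process.exit", "eval(")

def clean_llm_code(raw: str) -> str:
    """Strip markdown fences and reject obviously unsafe generated steps."""
    # Remove any opening/closing code fences the model added around the code.
    code = re.sub(r"^```[a-zA-Z]*\s*|```\s*$", "", raw.strip(), flags=re.MULTILINE)
    for token in FORBIDDEN:
        if token in code:
            raise ValueError(f"Generated code contains disallowed token: {token}")
    return code.strip()

cleaned = clean_llm_code("```ts\nawait page.goto('https://example.com');\n```")
```
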
<p>You can find the code for this tool <a href="https://github.com/JessicaGarson/MCP-Elastic-Synthetics/blob/main/elastic_synthetics_server.py#L1559">here</a>.</p>
<h2>Conclusion and next steps</h2>
<p>Synthetic monitoring in Elastic Observability makes it easy to test complete user journeys and keep your site reliable, with simple setup and a Playwright integration. A tool like this can provide a starting point for tests that you can iterate on afterward.</p>
<p>A solution like this is just the start of an MCP implementation that automatically generates Playwright tests for you and can be expanded in the future to include heartbeat monitors, utilize the <a href="https://github.com/microsoft/playwright-mcp">Playwright MCP server</a>, or consider experimenting with <a href="https://www.anthropic.com/news/claude-for-chrome">Claude for Chrome</a> to create synthetic testing.</p>
<p>Check out more articles on Synthetic Monitoring on <a href="https://www.elastic.co/observability-labs/blog/tag/synthetics">Observability Labs</a>.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/mcp-elastic-synthetics/retro.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Monitor dbt pipelines with Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/monitor-dbt-pipelines-with-elastic-observability</link>
            <guid isPermaLink="false">monitor-dbt-pipelines-with-elastic-observability</guid>
            <pubDate>Fri, 26 Jul 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to set up a dbt monitoring system with Elastic that proactively alerts on data processing cost spikes, anomalies in rows per table, and data quality test failures]]></description>
            <content:encoded><![CDATA[<p>In the Data Analytics team within the Observability organization in Elastic, we use <a href="https://www.getdbt.com/product/what-is-dbt">dbt (dbt™, data build tool)</a> to execute our SQL data transformation pipelines. dbt is a SQL-first transformation workflow that lets teams quickly and collaboratively deploy analytics code. In particular, we use <a href="https://docs.getdbt.com/docs/core/installation-overview">dbt core</a>, the <a href="https://github.com/dbt-labs/dbt-core">open-source project</a>, where you can develop from the command line and run your dbt project.</p>
<p>Our data transformation pipelines run daily and process the data that feed our internal dashboards, reports, analyses, and Machine Learning (ML) models.</p>
<p>There have been incidents in the past when pipelines failed, source tables contained wrong data, or a change in our SQL code caused data quality issues, and we only realized it once a weekly report showed an anomalous number of records. That's why we built a monitoring system that proactively alerts us about these types of incidents as soon as they happen and supports us with visualizations and analyses to understand their root cause, saving several hours or days of manual investigation.</p>
<p>We have leveraged our own Observability Solution to help solve this challenge, monitoring the entire lifecycle of our dbt implementation. This setup enables us to track the behavior of our models and conduct data quality testing on the final tables. We export dbt process logs from run jobs and tests into Elasticsearch and utilize Kibana to create dashboards, set up alerts, and configure Machine Learning jobs to monitor and assess issues.</p>
<p>The following diagram shows our complete architecture. In a follow-up article, we’ll also cover how we observe our Python data processing and ML model processes using OpenTelemetry and Elastic, so stay tuned.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/architecture.png" alt="1 - architecture" /></p>
<h2>Why monitor dbt pipelines with Elastic?</h2>
<p>With every invocation, dbt generates and saves one or more JSON files called <a href="https://docs.getdbt.com/reference/artifacts/dbt-artifacts">artifacts</a> containing log data on the invocation results. <code>dbt run</code> and <code>dbt test</code> invocation logs are <a href="https://docs.getdbt.com/reference/artifacts/run-results-json">stored in the file <code>run_results.json</code></a>, as per the dbt documentation:</p>
<blockquote>
<p>This file contains information about a completed invocation of dbt, including timing and status info for each node (model, test, etc) that was executed. In aggregate, many <code>run_results.json</code> can be combined to calculate average model runtime, test failure rates, the number of record changes captured by snapshots, etc.</p>
</blockquote>
<p>Monitoring <code>dbt run</code> invocation logs can help solve several issues, including tracking and alerting about table volumes, detecting excessive slot time from resource-intensive models, identifying cost spikes due to slot time or volume, and pinpointing slow execution times that may indicate scheduling issues. This system was crucial when we merged a PR with a change in our code that had an issue, producing a sudden drop in the number of daily rows in upstream Table A. By ingesting the <code>dbt run</code> logs into Elastic, our anomaly detection job quickly identified anomalies in the daily row counts for Table A and its downstream tables, B, C, and D. The Data Analytics team received an alert notification about the issue, allowing us to promptly troubleshoot, fix and backfill the tables before it affected the weekly dashboards and downstream ML models.</p>
<p>Monitoring <code>dbt test</code> invocation logs can also address several issues, such as identifying duplicates in tables, detecting unnoticed alterations in allowed values for specific fields through validation of all enum fields, and resolving various other data processing and quality concerns. With dashboards and alerts on data quality tests, we proactively identify issues like duplicate keys, unexpected category values, and increased nulls, ensuring data integrity. In our team, we had an issue where a change in one of our raw lookup tables produced duplicated rows in our user table, doubling the number of users reported. By ingesting the <code>dbt test</code> logs into Elastic, our rules detected that some duplicate tests had failed. The team received an alert notification about the issue, allowing us to troubleshoot it right away by finding the upstream table that was the root cause. These duplicates meant that downstream tables had to process 2x the amount of data, creating a spike in the bytes processed and slot time. The anomaly detection and alerts on the <code>dbt run</code> logs also helped us spot these spikes for individual tables and allowed us to quantify the impact on our billing.</p>
<p>Processing our dbt logs with Elastic and Kibana allows us to obtain real-time insights, helps us quickly troubleshoot potential issues, and keeps our data transformation processes running smoothly. We set up anomaly detection jobs and alerts in Kibana to monitor the number of rows processed by dbt, the slot time, and the results of the tests. This lets us catch real-time incidents, and by promptly identifying and fixing these issues, Elastic makes our data pipeline more resilient and our models more cost-effective, helping us stay on top of cost spikes or data quality issues.</p>
<p>We can also correlate this information with other events ingested into Elastic. For example, using the <a href="https://www.elastic.co/guide/en/enterprise-search/current/connectors-github.html">Elastic GitHub connector</a>, we can correlate data quality test failures or other anomalies with code changes to identify the commit or PR that caused the issue. By ingesting application logs into Elastic, we can also use APM to analyze whether issues in our pipelines have affected downstream applications by increasing latency or error rates or reducing throughput. By ingesting billing, revenue, or web traffic data, we could also see the impact on business metrics.</p>
<h2>How to export dbt invocation logs to Elasticsearch</h2>
<p>We use the <a href="https://elasticsearch-py.readthedocs.io/en">Python Elasticsearch client</a> to send the dbt invocation logs to Elastic after we run our <code>dbt run</code> and <code>dbt test</code> processes daily in production. The setup just requires you to install the <a href="https://elasticsearch-py.readthedocs.io/en/v8.14.0/quickstart.html#installation">Elasticsearch Python client</a> and obtain your Elastic Cloud ID (go to <a href="https://cloud.elastic.co/deployments/">https://cloud.elastic.co/deployments/</a>, select your deployment, and find the <code>Cloud ID</code>) and an Elastic Cloud API key <a href="https://elasticsearch-py.readthedocs.io/en/v8.14.0/quickstart.html#connecting">(following this guide)</a>.</p>
<p>This Python helper function will index the results from your <code>run_results.json</code> file into the specified indices. You just need to export the following variables to the environment:</p>
<ul>
<li><code>RESULTS_FILE</code>: path to your <code>run_results.json</code> file</li>
<li><code>DBT_RUN_LOGS_INDEX</code>: the name you want to give to the dbt run logs index in Elastic, e.g. <code>dbt_run_logs</code></li>
<li><code>DBT_TEST_LOGS_INDEX</code>: the name you want to give to the dbt test logs index in Elastic, e.g. <code>dbt_test_logs</code></li>
<li><code>ES_CLUSTER_CLOUD_ID</code></li>
<li><code>ES_CLUSTER_API_KEY</code></li>
</ul>
<p>Then call the function <code>log_dbt_es</code> from your python code or save this code as a python script and run it after executing your <code>dbt run</code> or <code>dbt test</code> commands:</p>
<pre><code>from elasticsearch import Elasticsearch, helpers
import os
import sys
import json

def log_dbt_es():
   RESULTS_FILE = os.environ[&quot;RESULTS_FILE&quot;]
   DBT_RUN_LOGS_INDEX = os.environ[&quot;DBT_RUN_LOGS_INDEX&quot;]
   DBT_TEST_LOGS_INDEX = os.environ[&quot;DBT_TEST_LOGS_INDEX&quot;]
   es_cluster_cloud_id = os.environ[&quot;ES_CLUSTER_CLOUD_ID&quot;]
   es_cluster_api_key = os.environ[&quot;ES_CLUSTER_API_KEY&quot;]


   es_client = Elasticsearch(
       cloud_id=es_cluster_cloud_id,
       api_key=es_cluster_api_key,
       request_timeout=120,
   )


   if not os.path.exists(RESULTS_FILE):
       print(f&quot;ERROR: {RESULTS_FILE} not found. No dbt run results to index.&quot;)
       sys.exit(1)


   with open(RESULTS_FILE, &quot;r&quot;) as json_file:
       results = json.load(json_file)
       timestamp = results[&quot;metadata&quot;][&quot;generated_at&quot;]
       metadata = results[&quot;metadata&quot;]
       elapsed_time = results[&quot;elapsed_time&quot;]
       args = results[&quot;args&quot;]
       docs = []
       for result in results[&quot;results&quot;]:
           if result[&quot;unique_id&quot;].split(&quot;.&quot;)[0] == &quot;test&quot;:
               result[&quot;_index&quot;] = DBT_TEST_LOGS_INDEX
           else:
               result[&quot;_index&quot;] = DBT_RUN_LOGS_INDEX
           result[&quot;@timestamp&quot;] = timestamp
           result[&quot;metadata&quot;] = metadata
           result[&quot;elapsed_time&quot;] = elapsed_time
           result[&quot;args&quot;] = args
           docs.append(result)
       _ = helpers.bulk(es_client, docs)
   return &quot;Done&quot;

# Call the function
log_dbt_es()
</code></pre>
<p>If you want to add/remove any other fields from <code>run_results.json</code>, you can modify the above function to do it.</p>
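<p>For instance, a small hypothetical helper could prune fields from each result before it is appended to <code>docs</code>; the default field names below are only examples:</p>
<pre><code class="language-python">def prune_result(result, fields_to_drop=('compiled_code', 'relation_name')):
    # Return a copy of a dbt result dict without the listed fields.
    # The default field names are illustrative; pick whichever fields
    # from run_results.json you actually want to exclude.
    return {k: v for k, v in result.items() if k not in fields_to_drop}

doc = {'unique_id': 'model.my_project.table_a', 'status': 'success', 'compiled_code': 'SELECT 1'}
print(prune_result(doc))  # {'unique_id': 'model.my_project.table_a', 'status': 'success'}
</code></pre>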
<p>Once the results are indexed, you can use Kibana to create Data Views for both indexes and start exploring them in Discover.</p>
<p>Go to Discover, click on the data view selector on the top left and “Create a data view”.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/discover-create-dataview.png" alt="2 - discover create a data view" /></p>
<p>Now you can create a data view with your preferred name. Do this for both dbt run (<code>DBT_RUN_LOGS_INDEX</code> in your code) and dbt test (<code>DBT_TEST_LOGS_INDEX</code> in your code) indices:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/create-dataview.png" alt="3 - create a data view" /></p>
<p>Going back to Discover, you’ll be able to select the Data Views and explore the data.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/discover-logs-explorer.png" alt="4 - discover logs explorer" /></p>
<h2>dbt run alerts, dashboards and ML jobs</h2>
<p>The invocation of <a href="https://docs.getdbt.com/reference/commands/run"><code>dbt run</code></a> executes compiled SQL model files against the current database. <code>dbt run</code> invocation logs contain the <a href="https://docs.getdbt.com/reference/artifacts/run-results-json">following fields</a>:</p>
<ul>
<li><code>unique_id</code>: Unique model identifier</li>
<li><code>execution_time</code>: Total time spent executing this model run</li>
</ul>
<p>The logs also contain the following metrics about the job execution from the adapter:</p>
<ul>
<li><code>adapter_response.bytes_processed</code></li>
<li><code>adapter_response.bytes_billed</code></li>
<li><code>adapter_response.slot_ms</code></li>
<li><code>adapter_response.rows_affected</code></li>
</ul>
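<p>As a concrete sketch of how these metrics can be queried, the following builds an aggregation body that sums slot time per model. The <code>unique_id.keyword</code> subfield assumes Elasticsearch's default dynamic mapping for the indexed logs, so adapt the names to your setup:</p>
<pre><code class="language-python">def slot_time_per_model_query(days=7):
    # Total slot milliseconds per model over the last N days.
    # 'unique_id.keyword' assumes the default dynamic mapping created
    # when the logs were indexed; adjust if you use an explicit mapping.
    return {
        'size': 0,
        'query': {'range': {'@timestamp': {'gte': f'now-{days}d'}}},
        'aggs': {
            'per_model': {
                'terms': {'field': 'unique_id.keyword', 'size': 50},
                'aggs': {
                    'total_slot_ms': {'sum': {'field': 'adapter_response.slot_ms'}}
                },
            }
        },
    }

# Usage sketch with the client from the indexing script:
# es_client.search(index='dbt_run_logs', body=slot_time_per_model_query())
</code></pre>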
<p>We have used Kibana to set up <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-ad-run-jobs.html">Anomaly Detection jobs</a> on the above-mentioned metrics. You can configure a <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-anomaly-detection-job-types.html#multi-metric-jobs">multi-metric job</a> split by <code>unique_id</code> to be alerted when the sum of rows affected, slot time consumed, or bytes billed is anomalous per table. You can track one job per metric. If you have built a dashboard of the metrics per table, you can use <a href="https://www.elastic.co/guide/en/machine-learning/8.14/ml-jobs-from-lens.html">this shortcut</a> to create the Anomaly Detection job directly from the visualization. After the jobs are created and are running on incoming data, you can <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-ad-view-results.html">view the jobs</a> and add them to a dashboard using the three dots button in the anomaly timeline:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/ml-job-add-to-dashboard.png" alt="5 - add ML job to dashboard" /></p>
<p>We have used the <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-configuring-alerts.html">ML job to set up alerts</a> that send us emails/slack messages when anomalies are detected. Alerts can be created directly from the Jobs (Machine Learning &gt; Anomaly Detection Jobs) page, by clicking on the three dots at the end of the ML job row:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/ml-job-create-alert.png" alt="6 - create alert from ML job" /></p>
<p>We also use <a href="https://www.elastic.co/guide/en/kibana/current/dashboard.html">Kibana dashboards</a> to visualize the anomaly detection job results and related metrics per table, to identify which tables consume most of our resources, to have visibility on their temporal evolution, and to measure aggregated metrics that can help us understand month over month changes.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/ml-job-dashboard.png" alt="7 - ML job in dashboard" />
<img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/dashboard-slot-time.png" alt="8 - dashboard slot time chart" />
<img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/dashboard-aggregated-metrics.png" alt="9 - dashboard aggregated metrics" /></p>
<h2>dbt test alerts and dashboards</h2>
<p>You may already be familiar with <a href="https://docs.getdbt.com/docs/build/data-tests">tests in dbt</a>, but if you’re not, dbt data tests are assertions you make about your models. Using the command <a href="https://docs.getdbt.com/reference/commands/test"><code>dbt test</code></a>, dbt will tell you if each test in your project passes or fails. <a href="https://docs.getdbt.com/docs/build/data-tests#example">Here is an example of how to set them up</a>. In our team, we use out-of-the-box dbt tests (<code>unique</code>, <code>not_null</code>, <code>accepted_values</code>, and <code>relationships</code>) and the packages <a href="https://hub.getdbt.com/dbt-labs/dbt_utils/latest/">dbt_utils</a> and <a href="https://hub.getdbt.com/calogica/dbt_expectations/latest/">dbt_expectations</a> for some extra tests. When the command <code>dbt test</code> is run, it generates logs that are stored in <code>run_results.json</code>.</p>
<p>dbt test logs contain the <a href="https://docs.getdbt.com/reference/artifacts/run-results-json">following fields</a>:</p>
<ul>
<li><code>unique_id</code>: Unique test identifier; tests contain the “test” prefix in their unique identifier</li>
<li><code>status</code>: Result of the test, <code>pass</code> or <code>fail</code></li>
<li><code>execution_time</code>: Total time spent executing this test</li>
<li><code>failures</code>: 0 if the test passes and 1 if it fails</li>
<li><code>message</code>: If the test fails, the reason why it failed</li>
</ul>
<p>The logs also contain the metrics about the job execution from the adapter.</p>
<p>We have set up alerts on document count (see <a href="https://www.elastic.co/guide/en/observability/8.14/custom-threshold-alert.html">guide</a>) that send us an email or Slack message when any test fails. The rule is set up on the dbt test Data View we created earlier, with a query filtering on <code>status:fail</code> to select the logs of failed tests, and a condition of document count greater than 0.
Whenever a test fails in production, we get an alert with links to the alert details and dashboards so we can troubleshoot it:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/email-alert.png" alt="10 - alert" /></p>
<p>We have also built a dashboard to visualize the tests run, tests failed, and their execution time and slot time to have a historical view of the test run:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/dashboard-tests.png" alt="11 - dashboard dbt tests" /></p>
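<p>The condition the alert rule checks can also be expressed as an ad hoc query from Python. A minimal hypothetical sketch, assuming the <code>dbt_test_logs</code> index name suggested earlier:</p>
<pre><code class="language-python">def failed_tests_query(days=1):
    # Failed dbt tests in the lookback window; a document count above
    # zero is the same condition the Kibana rule alerts on.
    return {
        'bool': {
            'filter': [
                {'match': {'status': 'fail'}},
                {'range': {'@timestamp': {'gte': f'now-{days}d'}}},
            ]
        }
    }

# Usage sketch with the client from the indexing script:
# es_client.count(index='dbt_test_logs', query=failed_tests_query())['count']
</code></pre>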
<h2>Finding Root Causes with the AI Assistant</h2>
<p>The most effective way for us to analyze these multiple sources of information is using the AI Assistant to help us troubleshoot incidents. In one case, we got an alert about a test failure and used the AI Assistant to give us context on what happened. We then asked whether there were any downstream consequences, and the AI Assistant interpreted the results of the Anomaly Detection job, which indicated a spike in slot time for one of our downstream tables, quantifying the increase versus the baseline. Finally, we asked for the root cause, and the AI Assistant found and linked a PR from our GitHub changelog that matched the start of the incident and was the most probable cause.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/ai-assistant.png" alt="12 - ai assistant troubleshoot" /></p>
<h2>Conclusion</h2>
<p>As a Data Analytics team, we are responsible for guaranteeing that the tables, charts, models, reports, and dashboards we provide to stakeholders are accurate and contain the right sources of information. As teams grow, the number of models we own becomes larger and more interconnected, and it isn’t easy to guarantee that everything is running smoothly and providing accurate results. Having a monitoring system that proactively alerts us on cost spikes, anomalies in row counts, or data quality test failures is like having a trusted companion that will alert you in advance if something goes wrong and help you get to the root cause of the issue.</p>
<p>dbt invocation logs are a crucial source of information about the status of our data pipelines, and Elastic is the perfect tool to extract the maximum potential out of them. Use this blog post as a starting point for utilizing your dbt logs to help your team achieve greater reliability and peace of mind, allowing them to focus on more strategic tasks rather than worrying about potential data issues.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/monitor-dbt-pipelines-with-elastic-observability/monitoring-dbt-with-elastic.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Monitor OpenAI API and GPT models with OpenTelemetry and Elastic]]></title>
            <link>https://www.elastic.co/observability-labs/blog/monitor-openai-api-gpt-models-opentelemetry</link>
            <guid isPermaLink="false">monitor-openai-api-gpt-models-opentelemetry</guid>
            <pubDate>Tue, 04 Apr 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Get ready to be blown away by this game-changing approach to monitoring cutting-edge ChatGPT applications! As the ChatGPT phenomenon takes the world by storm, it's time to supercharge your monitoring game with OpenTelemetry and Elastic Observability.]]></description>
            <content:encoded><![CDATA[<p>ChatGPT is so hot right now, it broke the internet. As an avid user of ChatGPT and a developer of ChatGPT applications, I am incredibly excited by the possibilities of this technology. What I see happening is that there will be exponential growth of ChatGPT-based solutions, and people are going to need to monitor those solutions.</p>
<p>Since this is a pretty new technology, we wouldn’t want to burden our shiny new code with proprietary technology, would we? No, we would not, and that is why we are going to use OpenTelemetry to monitor our ChatGPT code in this blog. This is particularly relevant for me as I recently created a service to generate meeting notes from Zoom calls. If I am to release this into the wild, how much is it going to cost me and how do I make sure it is available?</p>
<h2>OpenAI APIs to the rescue</h2>
<p>The OpenAI API is pretty awesome; there is no doubt about that. It also gives us the information shown below in each API response, which can help us understand what we are being charged. Using the token counts, the model, and the pricing that OpenAI has published on its website, we can calculate the cost. The question is, how do we get this information into our monitoring tools?</p>
<pre><code class="language-json">{
  &quot;choices&quot;: [
    {
      &quot;finish_reason&quot;: &quot;length&quot;,
      &quot;index&quot;: 0,
      &quot;logprobs&quot;: null,
      &quot;text&quot;: &quot;\n\nElastic is an amazing observability tool because it provides a comprehensive set of features for monitoring&quot;
    }
  ],
  &quot;created&quot;: 1680281710,
  &quot;id&quot;: &quot;cmpl-70CJq07gibupTcSM8xOWekOTV5FRF&quot;,
  &quot;model&quot;: &quot;text-davinci-003&quot;,
  &quot;object&quot;: &quot;text_completion&quot;,
  &quot;usage&quot;: {
    &quot;completion_tokens&quot;: 20,
    &quot;prompt_tokens&quot;: 9,
    &quot;total_tokens&quot;: 29
  }
}
</code></pre>
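<p>To make the arithmetic concrete, the usage block above prices out as follows (the $0.02 per 1,000 tokens rate for text-davinci-003 reflects OpenAI's published pricing at the time of writing; check the current pricing page before relying on it):</p>
<pre><code class="language-python"># Cost of the sample call above: total tokens times the per-1K-token rate.
total_tokens = 29        # usage.total_tokens from the response
price_per_1k = 0.02      # assumed text-davinci-003 rate, USD per 1,000 tokens
cost = total_tokens * price_per_1k / 1000
print(f'${cost:.5f}')  # $0.00058
</code></pre>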
<h2>OpenTelemetry to the rescue</h2>
<p><a href="https://www.elastic.co/blog/opentelemetry-observability">OpenTelemetry</a> is truly a fantastic piece of work. It has had so much adoption and work committed to it over the years, and it seems to really be getting to the point where we can call it the Linux of Observability. We can use it to record logs, metrics, and traces and get those in a vendor neutral way into our favorite observability tool — in this case, Elastic Observability.</p>
<p>With the latest and greatest OTel libraries in Python, we can auto-instrument external calls, which will help us understand how OpenAI calls are performing. Let's take a sneak peek at our sample Python application, which implements Flask and the ChatGPT API and also has OpenTelemetry. If you want to try this yourself, take a look at the GitHub link at the end of this blog and follow these steps.</p>
<h3>Set up an Elastic Cloud account (if you don’t already have one)</h3>
<ol>
<li>Sign up for a two-week free trial at <a href="https://www.elastic.co/cloud/elasticsearch-service/signup">https://www.elastic.co/cloud/elasticsearch-service/signup</a>.</li>
<li>Create a deployment.</li>
</ol>
<p>Once you are logged in, click <strong>Add integrations</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-cloud-deployment-add-integrations.png" alt="elastic cloud deployment add integrations" /></p>
<p>Click on <strong>APM Integration</strong>.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-apm-integration.png" alt="elastic apm integration" /></p>
<p>Then scroll down to get the details you need for this blog:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-opentelemetry-download.png" alt="elastic opentelemetry download" /></p>
<p>Be sure to set the following environment variables, replacing the values with the data you got from Elastic above and your OpenAI API key from <a href="https://platform.openai.com/account/api-keys">here</a>, then run these export commands on the command line.</p>
<pre><code class="language-bash">export OPEN_AI_KEY=sk-abcdefgh5ijk2l173mnop3qrstuvwxyzab2cde47fP2g9jij
export OTEL_EXPORTER_OTLP_AUTH_HEADER=abc9ldeofghij3klmn
export OTEL_EXPORTER_OTLP_ENDPOINT=https://123456abcdef.apm.us-west2.gcp.elastic-cloud.com:443
</code></pre>
<p>And install the following Python libraries:</p>
<pre><code class="language-bash">pip3 install opentelemetry-api
pip3 install opentelemetry-sdk
pip3 install opentelemetry-exporter-otlp
pip3 install opentelemetry-instrumentation
pip3 install opentelemetry-instrumentation-requests
pip3 install openai
pip3 install flask
</code></pre>
<p>Here is a look at the code we are using for the example application. In the real world, this would be your own code. All this does is call OpenAI APIs with the following message: “Why is Elastic an amazing observability tool?”</p>
<pre><code class="language-python">import openai
from flask import Flask
import monitor  # Import the module
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
import urllib
import os
from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# OpenTelemetry setup code here; feel free to replace the “your-service-name” attribute.
resource = Resource(attributes={
    SERVICE_NAME: &quot;your-service-name&quot;
})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT'),
        headers=&quot;Authorization=Bearer%20&quot;+os.getenv('OTEL_EXPORTER_OTLP_AUTH_HEADER')))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
RequestsInstrumentor().instrument()



# Initialize Flask app and instrument it

app = Flask(__name__)
# Set OpenAI API key
openai.api_key = os.getenv('OPEN_AI_KEY')


@app.route(&quot;/completion&quot;)
@tracer.start_as_current_span(&quot;do_work&quot;)
def completion():
    response = openai.Completion.create(
        model=&quot;text-davinci-003&quot;,
        prompt=&quot;Why is Elastic an amazing observability tool?&quot;,
        max_tokens=20,
        temperature=0
    )
    return response.choices[0].text.strip()

if __name__ == &quot;__main__&quot;:
    app.run()
</code></pre>
<p>This code should be fairly familiar to anyone who has implemented OpenTelemetry with Python here — there is no specific magic. The magic happens inside the “monitor” code that you can use freely to instrument your own OpenAI applications.</p>
<h2>Monkeying around</h2>
<p>Inside the monitor.py code, you will see we do something called “monkey patching.” Monkey patching is a technique in Python where you dynamically modify the behavior of a class or module at runtime by changing its attributes or methods. It lets you alter the functionality of a class or module without touching its source code, which is useful when you need to change the behavior of code you don't control or cannot modify directly.</p>
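<p>As a toy illustration of the technique, separate from the OpenAI case, here is a function whose calls are counted by rebinding its name to a wrapped version at runtime:</p>
<pre><code class="language-python">from functools import wraps

call_count = {'greet': 0}

def greet(name):
    return f'Hello, {name}!'

def counted(func):
    # Wrap func so each call is tallied before delegating to the original.
    @wraps(func)
    def wrapper(*args, **kwargs):
        call_count['greet'] += 1
        return func(*args, **kwargs)
    return wrapper

# Monkey-patch: rebind the name to the wrapped version at runtime.
greet = counted(greet)

greet('Elastic')
print(call_count['greet'])  # 1
</code></pre>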
<p>What we want to do here is modify the behavior of the “Completion” call so we can steal the response metrics and add them to our OpenTelemetry spans. You can see how we do that below:</p>
<pre><code class="language-python">def count_completion_requests_and_tokens(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        counters['completion_count'] += 1
        response = func(*args, **kwargs)
        token_count = response.usage.total_tokens
        prompt_tokens = response.usage.prompt_tokens
        completion_tokens = response.usage.completion_tokens
        cost = calculate_cost(response)
        strResponse = json.dumps(response)
        # Set OpenTelemetry attributes
        span = trace.get_current_span()
        if span:
            span.set_attribute(&quot;completion_count&quot;, counters['completion_count'])
            span.set_attribute(&quot;token_count&quot;, token_count)
            span.set_attribute(&quot;prompt_tokens&quot;, prompt_tokens)
            span.set_attribute(&quot;completion_tokens&quot;, completion_tokens)
            span.set_attribute(&quot;model&quot;, response.model)
            span.set_attribute(&quot;cost&quot;, cost)
            span.set_attribute(&quot;response&quot;, strResponse)
        return response
    return wrapper
# Monkey-patch the openai.Completion.create function
openai.Completion.create = count_completion_requests_and_tokens(openai.Completion.create)
</code></pre>
<p>By adding all this data to our Span, we can actually send it to our OpenTelemetry OTLP endpoint (in this case it will be Elastic). The benefit of doing this is that you can easily use the data for search or to build dashboards and visualizations. In the final step, we also want to calculate the cost. We do this by implementing the following function, which will calculate the cost of a single request to the OpenAI APIs.</p>
<pre><code class="language-python">def calculate_cost(response):
    if response.model in ['gpt-4', 'gpt-4-0314']:
        cost = (response.usage.prompt_tokens * 0.03 + response.usage.completion_tokens * 0.06) / 1000
    elif response.model in ['gpt-4-32k', 'gpt-4-32k-0314']:
        cost = (response.usage.prompt_tokens * 0.06 + response.usage.completion_tokens * 0.12) / 1000
    elif 'gpt-3.5-turbo' in response.model:
        cost = response.usage.total_tokens * 0.002 / 1000
    elif 'davinci' in response.model:
        cost = response.usage.total_tokens * 0.02 / 1000
    elif 'curie' in response.model:
        cost = response.usage.total_tokens * 0.002 / 1000
    elif 'babbage' in response.model:
        cost = response.usage.total_tokens * 0.0005 / 1000
    elif 'ada' in response.model:
        cost = response.usage.total_tokens * 0.0004 / 1000
    else:
        cost = 0
    return cost
</code></pre>
<h2>Elastic to the rescue</h2>
<p>Once we are capturing all this data, it’s time to have some fun with it in Elastic. In Discover, we can see all the data points we sent over using the OpenTelemetry library:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-discover-apm.png" alt="elastic discover apm" /></p>
<p>With these labels in place, it is very easy to build a dashboard. Take a look at this one I built earlier (<a href="https://github.com/davidgeorgehope/ChatGPTMonitoringWithOtel/blob/main/chatGPTDashboard.ndjson">which is also checked into my GitHub Repository</a>):</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-labels-dashboard.png" alt="elastic labels dashboard" /></p>
<p>We can also see transactions, the latency of the OpenAI service, and all the spans related to our ChatGPT service calls.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-observability-service-name.png" alt="observability service name" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-your-service-name.png" alt="elastic your service name" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-api-openai.png" alt="elastic api openai" /></p>
<p>In the transaction view, we can also see how long specific OpenAI calls have taken:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/blog-elastic-latency-distribution.png" alt="elastic latency distribution" /></p>
<p>Some requests to OpenAI here have taken over 3 seconds. ChatGPT can be very slow, so it’s important for us to understand how slow this is and if users are becoming frustrated.</p>
<h2>Summary</h2>
<p>We looked at monitoring ChatGPT with OpenTelemetry and Elastic. ChatGPT is a worldwide phenomenon that will no doubt keep growing, and pretty soon everyone will be using it. Because it can be slow to return responses, it is critical that people are able to understand the performance of any code that uses this service.</p>
<p>There is also the issue of cost, since it’s incredibly important to understand if this service is eating into your margins and if what you are asking for is profitable for your business. With the current economic environment, we have to keep an eye on profitability.</p>
<p>Take a look at the code for this solution <a href="https://github.com/davidgeorgehope/ChatGPTMonitoringWithOtel">here</a>. And please feel free to use the “monitor” library to instrument your own OpenAI code.</p>
<p>Interested in learning more about Elastic Observability? Check out the following resources:</p>
<ul>
<li><a href="https://www.elastic.co/virtual-events/intro-to-elastic-observability">An Introduction to Elastic Observability</a></li>
<li><a href="https://www.elastic.co/training/observability-fundamentals">Observability Fundamentals Training</a></li>
<li><a href="https://www.elastic.co/observability/demo">Watch an Elastic Observability demo</a></li>
<li><a href="https://www.elastic.co/blog/observability-predictions-trends-2023">Observability Predictions and Trends for 2023</a></li>
</ul>
<p>And sign up for our <a href="https://www.elastic.co/virtual-events/emerging-trends-in-observability">Elastic Observability Trends Webinar</a> featuring AWS and Forrester, not to be missed!</p>
<p><em>In this blog post, we may have used third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/monitor-openai-api-gpt-models-opentelemetry/opentelemetry-graphic-ad-2-1920x1080.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Network monitoring with Elastic: Unifying network observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/network-monitoring-with-elastic-unifying-network-observability</link>
            <guid isPermaLink="false">network-monitoring-with-elastic-unifying-network-observability</guid>
            <pubDate>Mon, 16 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to unify network monitoring using Elastic observability and AI. We'll showcase how to correlate network data, identify root causes and fix issues.]]></description>
            <content:encoded><![CDATA[<h2>Introduction: The Network Monitoring Fragmentation Problem</h2>
<p>In five years working with Enterprise accounts at Elastic, I have heard the same challenge again and again:</p>
<p><strong>&quot;We have several network monitoring tools, and we would love to correlate all of them into one platform.&quot;</strong></p>
<p>For many organizations, the barrier to true correlation isn't a lack of data, but where that data lives. Frequently, we see SNMP metrics, flow data, and logs isolated in purpose-built silos or dashboards. Without a unified data store and a proper correlation engine, piecing together the full narrative — from a topology change to a performance degradation — becomes a manual, time-consuming puzzle.</p>
<p>When an incident happens, engineers become <strong>human correlation engines</strong> — manually jumping between systems, copying timestamps, cross-referencing device names, and trying to piece together what actually happened. A simple question like &quot;Did this interface failure impact application performance?&quot; requires querying multiple tools and mentally correlating the results.</p>
<p>The real cost isn't the tool licenses — it's the time lost during critical incidents.</p>
<p>This lab is my answer to a fundamental question: <strong>Can Elastic become the unified foundation that actually correlates network data?</strong></p>
<p>More importantly, it demonstrates that Elastic is fully ready for network operations — capable of ingesting diverse telemetry and using AI to correlate relationships, identify root causes, and resolve issues in seconds instead of hours.</p>
<h2>The Problem: Network Observability is Broken</h2>
<p>Let me paint a typical scenario I encounter with enterprise network teams:</p>
<p><strong>The Fragmented Reality:</strong></p>
<ul>
<li>No single source of truth</li>
<li>Manual correlation during incidents (15-30 minutes per event)</li>
<li>Fragmented teams (network vs. platform engineers)</li>
<li>Limited automation capabilities</li>
<li>No AI-powered analysis</li>
</ul>
<p><strong>When a link goes down at 2 AM:</strong></p>
<ul>
<li>Notice the alert - 2 minutes</li>
<li>Log into monitoring tool to see the metric - 3 minutes</li>
<li>Switch to traffic analyzer to check impact - 5 minutes</li>
<li>Open log management to search for related messages - 10 minutes</li>
<li>Manually correlate timestamps across systems - 8 minutes</li>
<li>Create a ticket and copy context from multiple tools - 8 minutes</li>
</ul>
<p><strong>Time to initial diagnosis: 36 minutes</strong></p>
<p>This workflow is expensive, error-prone, and doesn't scale.</p>
<h2>The Vision: Elastic as a Unified Network Observability Platform</h2>
<p>What if you could:</p>
<ul>
<li>Collect SNMP metrics, NetFlow, traps, and topology data in <strong>one platform</strong></li>
<li>Correlate network events with application performance <strong>automatically</strong></li>
<li>Generate executive dashboards without separate BI tools</li>
<li>Use <strong>AI to analyze incidents in seconds</strong>, not hours</li>
<li>Trigger alerting from network events</li>
</ul>
<p>This is what this lab aims to demonstrate.</p>
<h2>What I Built: A Production-Grade Network Simulation</h2>
<p>To demonstrate how Elastic unifies network data, I needed a realistic environment that generates real-world telemetry. Enter <strong>Containerlab</strong> — a Docker-based tool that lets us build and run a realistic network simulation.</p>
<h3>Lab Architecture</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/network-monitoring-with-elastic-unifying-network-observability/lab-topology.jpg" alt="Lab Topology" /></p>
<p>I simulated a Service Provider core network with:</p>
<ul>
<li><strong>7 FRR routers</strong> forming an OSPF Area 0 mesh</li>
<li><strong>2 Ubuntu hosts</strong> for additional use cases</li>
<li><strong>2 Layer 2 switches</strong> for access layer segmentation</li>
<li><strong>3 telemetry collectors</strong> feeding Elastic Cloud</li>
</ul>
<p><strong>Total containers:</strong> 14</p>
<p><strong>Deployment time:</strong> 12-15 minutes (fully automated)</p>
<p><strong>Full deployment instructions and topology details are available in the <a href="https://github.com/DeBaker1974/Containerlab-OSPF">GitHub repository README</a>.</strong></p>
<h2>The Three Telemetry Pipelines: Proving Multi-Source Correlation</h2>
<p>What makes this lab production-ready is its <strong>hybrid observability approach</strong> — proving that Elastic can unify disparate network data sources.</p>
<table>
<thead>
<tr>
<th align="left">Pipeline</th>
<th align="left">Data Type</th>
<th align="left">Collection Method</th>
<th align="left">Collector</th>
<th align="left">Use Case</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>SNMP Metrics</strong></td>
<td align="left">Interface stats, system health, LLDP topology</td>
<td align="left">Active polling</td>
<td align="left">OTEL Collector</td>
<td align="left">Capacity planning, trend analysis</td>
</tr>
<tr>
<td align="left"><strong>NetFlow</strong></td>
<td align="left">Traffic flows</td>
<td align="left">Push-based export</td>
<td align="left">Elastic Agent</td>
<td align="left">Top talkers, security investigation</td>
</tr>
<tr>
<td align="left"><strong>SNMP Traps</strong></td>
<td align="left">Interface up/down events</td>
<td align="left">Event-driven</td>
<td align="left">Logstash</td>
<td align="left">Real-time incident detection</td>
</tr>
</tbody>
</table>
<p>This unified architecture proves Elastic can replace multiple specialized network monitoring tools with a single platform.</p>
<h2>The Power of Correlation: One Platform, One Query</h2>
<p>When a network incident occurs, you need to answer questions like:</p>
<ul>
<li>Which interface failed? <em>(SNMP metrics)</em></li>
<li>What traffic was affected? <em>(NetFlow)</em></li>
<li>What was the sequence of events? <em>(SNMP traps)</em></li>
<li>Which devices are downstream? <em>(LLDP topology)</em></li>
</ul>
<p><strong>The Problem:</strong> Modern tools offer separate modules glued together, forcing users to navigate different spaces for different sets of data.</p>
<p><strong>The Reality:</strong> You still have to pivot. You see a spike in the Metrics module, but to see why, you have to open the Logs module and manually align the time picker. The data lives in different tables or backends, making true correlation impossible without human intervention.</p>
<p><strong>The Elastic Difference:</strong> One Store, One Language, One AI</p>
<p>Elastic makes it simple. Whether it's an SNMP counter (metric), a NetFlow record (flow), or a Syslog message (log), it is all stored in a unified datastore powered by the Elasticsearch engine. This allows users to easily search across multiple datasets in a single query.</p>
<pre><code class="language-bash">FROM logs-*
| WHERE host.name == &quot;csr23&quot; AND interface.name == &quot;eth1&quot;
</code></pre>
<p><strong>Time required: 3 seconds</strong></p>
<p>Furthermore, as you will see later, the exact location of the data becomes agnostic to the user when leveraging the AI Assistant.</p>
<h2>Data Transformation: From Cryptic OIDs to Actionable Intelligence</h2>
<p>Raw SNMP traps are notoriously difficult to interpret at a glance. In our current lab setup, the data arrives looking like this:</p>
<pre><code class="language-bash">OID: 1.3.6.1.6.3.1.1.5.3
ifIndex: 2
ifDescr: eth1
</code></pre>
<p>While traditional Network Management Platforms (NMPs) handle OID translation natively, bringing that clarity into Elastic requires a specific configuration.</p>
<p>In this initial lab, we are intentionally working with this raw data to demonstrate how AI assistants can interpret these events even without pre-existing context.</p>
<p>However, the strategy for the next phase of this project is to implement Elasticsearch Ingest Pipelines. This will allow us to map raw OIDs to human-readable names. This step is crucial for bridging the gap between Network tools and Application Observability platforms, allowing network events to be instantly correlated with application errors and infrastructure logs.</p>
<p><strong>The Target State</strong></p>
<p>Once the pipeline is implemented in the next lab, we will transform that raw trap into searchable, meaningful data:</p>
<pre><code class="language-bash">{
  &quot;event.action&quot;: &quot;interface-down&quot;,
  &quot;host.name&quot;: &quot;csr23&quot;,
  &quot;interface.name&quot;: &quot;eth1&quot;,
  &quot;interface.oper_status_text&quot;: &quot;Link Down&quot;
}
</code></pre>
<p><strong>The result:</strong></p>
<ul>
<li>Human-readable fields</li>
<li>Searchable dimensions for filtering</li>
<li>Context for automation rules and dashboards</li>
<li>Correlation keys for joining with metrics and flows</li>
</ul>
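<p>As a rough preview of what such a pipeline could look like (the pipeline name is hypothetical; the OID and field names come from the trap example above), conditional <code>set</code> processors can perform the mapping:</p>
<pre><code class="language-bash">PUT _ingest/pipeline/snmp-trap-enrich
{
  &quot;description&quot;: &quot;Sketch: translate the linkDown trap OID into readable fields&quot;,
  &quot;processors&quot;: [
    {
      &quot;set&quot;: {
        &quot;if&quot;: &quot;ctx.snmp?.trap_oid == '1.3.6.1.6.3.1.1.5.3'&quot;,
        &quot;field&quot;: &quot;event.action&quot;,
        &quot;value&quot;: &quot;interface-down&quot;
      }
    },
    {
      &quot;set&quot;: {
        &quot;if&quot;: &quot;ctx.snmp?.trap_oid == '1.3.6.1.6.3.1.1.5.3'&quot;,
        &quot;field&quot;: &quot;interface.oper_status_text&quot;,
        &quot;value&quot;: &quot;Link Down&quot;
      }
    }
  ]
}
</code></pre>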
<p>In our next blog post, we will walk through building the ingest pipeline that performs this transformation — step by step.</p>
<h2>Intelligent Alerting: From Noise to Actionable Intelligence</h2>
<p>Traditional network monitoring relies on simple threshold alerts — &quot;interface down,&quot; &quot;high CPU.&quot; These alerts flood your inbox but provide <strong>zero context</strong> about root cause, impact, or remediation.</p>
<h3>The Lab's Approach: ES|QL + AI Assistant</h3>
<p><strong>1. Semantic Detection with ES|QL</strong></p>
<p>Instead of generic threshold alerts, the lab uses ES|QL to detect specific event patterns:</p>
<pre><code class="language-bash">FROM logs-snmp.trap-prod
| WHERE snmp.trap_oid == &quot;1.3.6.1.6.3.1.1.5.3&quot;
| KEEP @timestamp, host.name, interface.name, message
</code></pre>
<p><strong>2. Automatic AI-Powered Investigation</strong></p>
<p>When the alert triggers, it invokes the <strong>Observability AI Assistant</strong> with a structured investigation prompt that:</p>
<ul>
<li>Performs immediate triage (which device, which interface, when)</li>
<li>Assesses OSPF impact and traffic rerouting</li>
<li>Correlates with other recent failures</li>
<li>Generates severity assessment and recommended actions</li>
</ul>
<h3>The Transformation</h3>
<table>
<thead>
<tr>
<th align="center">Traditional Alerting</th>
<th align="center">Intelligent Alerting (Elastic)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><strong>Email: &quot;Interface down on csr23&quot;</strong></td>
<td align="center">Structured analysis with device context</td>
</tr>
<tr>
<td align="center"><strong>Manual investigation: 20-30 min</strong></td>
<td align="center">AI-automated investigation: 90 seconds</td>
</tr>
<tr>
<td align="center"><strong>Engineer correlates across tools</strong></td>
<td align="center">Automatic cross-source correlation</td>
</tr>
<tr>
<td align="center"><strong>No business impact assessment</strong></td>
<td align="center">Severity + recommended actions included</td>
</tr>
</tbody>
</table>
<h2>Accelerating Incident Response with the Elastic AI Assistant</h2>
<p>This is where the Elastic AI Assistant demonstrates its operational value — moving beyond passive data collection to actively interpreting and explaining network events in real time.</p>
<p>When an engineer views a trap document in Discover and asks:</p>
<p><em><strong>&quot;Explain this log message&quot;</strong></em></p>
<p>The AI Assistant provides comprehensive analysis including:</p>
<ul>
<li><strong>What happened:</strong> Plain-language explanation of the SNMP trap</li>
<li><strong>Device context:</strong> Router role, interface purpose, network position</li>
<li><strong>Impact analysis:</strong> OSPF neighbor status, traffic rerouting assessment</li>
<li><strong>Root cause possibilities:</strong> Physical layer, link layer, administrative causes</li>
<li><strong>Recommended actions:</strong> Immediate steps, investigation queries, validation checks</li>
<li><strong>Severity assessment:</strong> Business and technical impact rating</li>
</ul>
<h3>Manual Triage vs. AI-Assisted Investigation</h3>
<table>
<thead>
<tr>
<th align="left">Before</th>
<th align="left">After (Elastic AI)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Google the OID → 5 min</strong></td>
<td align="left">Click &quot;Explain this log&quot; → 20 seconds</td>
</tr>
<tr>
<td align="left"><strong>Open network diagram → 3 min</strong></td>
<td align="left">Topology context auto-provided</td>
</tr>
<tr>
<td align="left"><strong>Query multiple tools → 10 min</strong></td>
<td align="left">Cross-source correlation instant</td>
</tr>
<tr>
<td align="left"><strong>Assess business impact → 5 min</strong></td>
<td align="left">Impact analysis auto-generated</td>
</tr>
<tr>
<td align="left"><strong>Total: ~28 minutes</strong></td>
<td align="left"><strong>Total: ~20 seconds</strong></td>
</tr>
</tbody>
</table>
<h2>The Value Proposition: One Platform, One Data Model, One AI</h2>
<h3>What This Lab Demonstrates</h3>
<p>Elastic provides:</p>
<ul>
<li><strong>One unified platform</strong> for metrics, logs, flows</li>
<li><strong>One data model</strong> (SemConv) for consistent correlation</li>
<li><strong>One search interface</strong> (Kibana) for all network data</li>
<li><strong>One AI assistant</strong> that understands all your network telemetry</li>
<li><strong>AI-powered alerting</strong> with automated investigation</li>
</ul>
<h3>Business Impact</h3>
<p><strong>Efficiency Gains:</strong></p>
<ul>
<li><strong>85% reduction in MTTR</strong> (36 min → 5 min for initial diagnosis)</li>
<li><strong>90% reduction</strong> in manual correlation time</li>
<li>Junior engineers gain access to <strong>AI-powered expert analysis</strong></li>
</ul>
<p><strong>Operational Benefits:</strong></p>
<ul>
<li>Network engineers focus on <strong>strategy, not tool-switching</strong></li>
<li><strong>Cross-functional collaboration</strong> in one platform</li>
<li><strong>Reduced tool sprawl</strong> and management overhead</li>
</ul>
<h2>Lessons Learned</h2>
<p>After building this lab, several key insights emerged regarding how network data fits into the broader observability ecosystem:</p>
<p><strong>1. Extending Observability to the Network</strong></p>
<p>Elastic is already the gold standard for high-volume logs and application traces. This lab demonstrates that the same engine seamlessly handles network telemetry without needing a separate, siloed tool.</p>
<ul>
<li>Scale: The same architecture that ingests petabytes of application logs easily handles millions of interface counters.</li>
<li>Structure: Native support for complex nested documents allows for rich SNMP trap data (variable bindings) without flattening or losing context.</li>
<li>Speed: Real-time search applies equally to network events, enabling sub-second troubleshooting.</li>
</ul>
<p><strong>2. OpenTelemetry Semantic Conventions (SemConv) as the Universal Translator</strong></p>
<p>The power isn't just in storing the data, but in standardizing it. By mapping SNMP and NetFlow to the <strong>OpenTelemetry Semantic Conventions (SemConv)</strong>, network data finally speaks the same language as the rest of the stack.</p>
<ul>
<li><strong>Unified Search:</strong> Query across firewall logs, server metrics, and switch telemetry in a single search bar.</li>
<li><strong>Instant Visualization:</strong> Pre-built dashboards work immediately because the field names are standardized.</li>
<li><strong>Cross-Domain Correlation</strong>: Easily correlates a spike in application latency with a specific interface saturation event.</li>
</ul>
<p><strong>3. AI Assistants Thrive on Context</strong></p>
<p>While the AI in this lab was powerful on its own, the experiment highlighted a critical realization: an AI Assistant becomes exponentially more effective when coupled with a specific Knowledge Base.</p>
<p><strong>Context is King:</strong> The AI delivers better root cause analysis when provided with rich metadata, such as device roles and topology maps. Without it, the advice remains generic.</p>
<p><strong>Pro Tip (and What’s Next):</strong></p>
<p>To get organization-specific advice rather than generic suggestions, you need to feed the AI your documentation.</p>
<ul>
<li><strong>The Goal:</strong> Create a Knowledge Base containing device roles, network topology diagrams, and troubleshooting procedures.</li>
<li><strong>The Next Step:</strong> In my next blog post, I will demonstrate exactly how to do this — connecting a Knowledge Base to the AI Assistant to enable fully context-aware troubleshooting.</li>
</ul>
<h2>Conclusion: Completing the Observability Picture</h2>
<p>Elastic is already widely recognized as the standard for Application and Security observability. The goal of this lab wasn't to ask if Elastic can handle networking, but to demonstrate the immense value of bringing network data into that existing ecosystem.</p>
<p>The verdict is clear: Elastic acts as that unified foundation. It effectively breaks down the silo between Network Engineering and the rest of IT.</p>
<p>This isn't just about consolidating dashboards or replacing legacy tools. It is about establishing the Elasticsearch AI Platform as the single source of truth where network telemetry sits right alongside application and infrastructure data.</p>
<p>By treating network data as a first-class citizen in the observability stack, we unlock automated correlation, AI-assisted investigation, and the speed required to resolve incidents before they impact the business. The capabilities are in place, and the foundation is solid — Elastic is ready to unify your network with the rest of your digital business.</p>
<h2>Ready to Try It Yourself?</h2>
<p>Check out <a href="https://github.com/DeBaker1974/Containerlab-OSPF">github.com/DeBaker1974/Containerlab-OSPF</a></p>
<p>The repository includes:</p>
<ul>
<li>Complete deployment scripts (12-15 minute automated setup)</li>
<li>Pre-configured telemetry pipelines</li>
<li>Kibana dashboards</li>
<li>Alert rules with AI Assistant integration</li>
<li>Detailed README</li>
</ul>
<p><strong>Not ready to build? Try Elastic Serverless:</strong> <a href="https://cloud.elastic.co/registration">Start a free 14-day trial</a> and explore AI-powered observability with your own data.</p>
<p><strong>Special thanks to the Containerlab and FRRouting communities for their incredible open-source tools, and to Sheriff Lawal (CCIE, CISSP), Sr. Manager, Solutions Architecture at Elastic, for mentoring on this project.</strong></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/network-monitoring-with-elastic-unifying-network-observability/article-image.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[NGINX log analytics with GenAI in Elastic]]></title>
            <link>https://www.elastic.co/observability-labs/blog/nginx-log-analytics-with-genai-elastic</link>
            <guid isPermaLink="false">nginx-log-analytics-with-genai-elastic</guid>
            <pubDate>Fri, 05 Jul 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic has a set of embedded capabilities such as a GenAI RAG-based AI Assistant and a machine learning platform as part of the product baseline. These make analyzing the vast number of logs you get from NGINX easier.]]></description>
            <content:encoded><![CDATA[<p>Elastic Observability provides a full observability solution, supporting metrics, traces, and logs for applications and infrastructure. NGINX, which is highly used for web serving, load balancing, http caching, and reverse proxy, is the key to many applications and outputs a large volume of logs. NGINX’s access logs, which detail all requests made to the NGINX server, and error logs which record server-related issues and problems are key to managing and analyzing NGINX issues along with understanding what is happening to your application. </p>
<p>In managing NGINX, Elastic provides several capabilities:</p>
<ol>
<li>
<p>Easy ingest, parsing, and out-of-the-box dashboards. Check out the simple how-to in our <a href="https://www.elastic.co/guide/en/fleet/current/example-standalone-monitor-nginx.html">docs</a>. Based on logs, these dashboards show several items over time, response codes, errors, top pages, data volume, browsers used, active connections, drop rates, and much more.</p>
</li>
<li>
<p>Out-of-the-box ML-based anomaly detection jobs for your NGINX logs. These jobs help pinpoint anomalies against request rates, IP address request rates, URL access, status codes, and visitor rate anomalies.</p>
</li>
<li>
<p>ES|QL which helps work through logs and build out charts during analysis.</p>
</li>
<li>
<p>Elastic’s GenAI Assistant provides a simple natural language interface that helps analyze all the logs and can pull out issues from ML jobs and even create dashboards. The Elastic AI Assistant also automatically uses ES|QL.</p>
</li>
<li>
<p>NGINX SLOs - Finally Elastic provides the ability to define and monitor SLOs for your NGINX logs. While most SLOs are metrics-based, Elastic allows you to create logs-based SLOs. We detailed this in a previous <a href="https://www.elastic.co/observability-labs/blog/service-level-objectives-slos-logs-metrics">blog</a>.</p>
</li>
</ol>
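<p>As a small, hypothetical taste of the ES|QL capability mentioned above (the index pattern and field name assume the NGINX integration’s ECS defaults and may differ in your environment), a query summarizing failed requests could look like:</p>
<pre><code class="language-bash">FROM logs-nginx.access-*
| WHERE http.response.status_code &gt;= 400
| STATS failures = COUNT(*) BY http.response.status_code
| SORT failures DESC
</code></pre>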
<p>NGINX logs are another example of why logs are great. Logging is an important part of observability, though we generally think first of metrics and tracing. However, the volume of logs that an application and its underlying infrastructure output can be daunting, and NGINX logs are usually the starting point for most analyses.</p>
<p>In today’s blog, we’ll cover how the out-of-the-box ML-based anomaly detection jobs can help RCA, and how Elastic’s GenAI Assistant helps easily work through logs to pinpoint issues in minutes. </p>
<h2>Prerequisites and config<a id="prerequisites-and-config"></a></h2>
<p>If you plan on following this blog, here are some of the components and details we used to set up this demonstration:</p>
<ul>
<li>
<p>Ensure you have an account on <a href="http://cloud.elastic.co">Elastic Cloud</a> and a deployed stack (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>).</p>
</li>
<li>
<p>Bring up an <a href="https://docs.nginx.com/nginx/admin-guide/web-server/">NGINX server</a> on a host. OR run an application with NGINX as a front end and drive traffic.</p>
</li>
<li>
<p>Install the NGINX integration and assets and review the dashboards as noted in the <a href="https://www.elastic.co/guide/en/fleet/current/example-standalone-monitor-nginx.html">docs</a>.</p>
</li>
<li>
<p>Ensure you have an <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-settings.html">ML node configured</a> in your Elastic stack</p>
</li>
<li>
<p>To use the AI Assistant you will need a trial or upgrade to Platinum.</p>
</li>
</ul>
<p>In our scenario, we use 3 months of data from our Elastic environment to help highlight the features. Hence, you might need to run your application with traffic for a specific time frame to follow along.</p>
<h2>Analyzing the issues with AI Assistant<a id="analyzing-the-issues-with-ai-assistant"></a></h2>
<p>As detailed in a previous <a href="https://www.elastic.co/observability-labs/blog/service-level-objectives-slos-logs-metrics">blog</a>, you can get alerted on issues via SLO monitoring against NGINX logs. Let’s assume you have an SLO based on status codes, as we outlined in the previous <a href="https://www.elastic.co/observability-labs/blog/service-level-objectives-slos-logs-metrics">blog</a>. You can immediately analyze the issue via the AI Assistant. Because it’s a chat interface, we simply open the AI Assistant and work through some simple analysis (see the demo below):</p>
<h3>AI Assistant analysis:<a id="ai-assistant-analysis"></a></h3>
<ul>
<li>
<p><strong><em>Using lens graph all http response status codes &lt; 400 and &gt;= 400 from filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer</em></strong> <em>-</em> We wanted to understand the number of requests resulting in a status code &gt;= 400 and graph the results. We see that 15% of the requests were not successful, hence the SLO alert being triggered.</p>
</li>
<li>
<p><strong>Which ip address (field source.address) has the highest number of http.response.status.code &gt;= 400 from filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer</strong>  - We were curious if there was a specific IP address with a high number of unsuccessful requests. 72.57.0.53, with a count of 25,227 occurrences, is fairly high, but it does not account for all of the failed requests.</p>
</li>
<li>
<p><strong><em>What country (source.geo.country_iso_code) is source.address=72.57.0.53 coming from. Use filebeat-nginx-elasticco-anon-2017.</em></strong> - Again, we were curious if this came from a specific country. The IP address 72.57.0.53 comes from the country with the ISO code IN, which corresponds to India. Nothing out of the ordinary.</p>
</li>
<li>
<p><strong><em>Did source.address=72.57.0.53 have any (http.response.status.code &lt; 400) from filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer -</em></strong>  Oddly, the IP address in question also had 4,000+ successful responses, meaning it’s not malicious and that something else is at play.</p>
</li>
<li>
<p><strong><em>What are the different status codes (http.response.status.code&gt;=400), from source.address=72.57.0.53. Use filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer. Provide counts for each status code -</em></strong> We were curious whether we would see any 502s; there were none, and most of the failures were 404s.</p>
</li>
<li>
<p><strong><em>What are the different status codes (http.response.status.code&gt;=400). Use filebeat-nginx-elasticco-anon-2017. http.response.status.code is not an integer. Provide counts for each status code</em></strong> - Regardless of any specific address, we wanted to see which status codes &gt;= 400 occurred most often. This also points to 404.</p>
</li>
<li>
<p><strong><em>What does a high 404 count from a specific IP address mean from NGINX logs?</em></strong> - With this question, we wanted to understand the potential causes in our application. From the answers, we can rule out security probing and web scraping, as we validated that the specific address 72.57.0.53 also made successful requests. It also rules out user error. Hence, this potentially points to broken links or missing resources.</p>
</li>
</ul>
<h3>Watch the flow:<a id="watch-the-flow"></a></h3>
&lt;Video vidyardUuid=&quot;ak9xDdhcL3SxpqU7CRsD68&quot; /&gt;
<h3>Potential issue:</h3>
<p>It seems that we potentially have an issue with the backend serving specific content, or with resources (a database, or broken links). This is causing the higher-than-normal count of non-successful status codes &gt;= 400.</p>
<h3>Key highlights from AI Assistant:</h3>
<p>As you watch this video, you will notice a few things:</p>
<ol>
<li>
<p>We analyzed millions of logs in a matter of minutes using a set of simple natural language queries. </p>
</li>
<li>
<p>We didn’t need to know any special query language. The AI Assistant used Elastic’s ES|QL but can similarly use KQL.</p>
</li>
<li>
<p>The AI Assistant easily builds out graphs</p>
</li>
<li>
<p>The AI Assistant accesses and uses internal information stored in Elastic’s indices, rather than behaving like a simple web-search-based AI Assistant. This is enabled through RAG, and the AI Assistant can also bring up known issues in GitHub, runbooks, and other useful internal information.</p>
</li>
</ol>
<p>Check out the following <a href="https://www.elastic.co/observability-labs/blog/elastic-rag-ai-assistant-application-issues-llm-github">blog</a> on how the AI Assistant uses RAG to retrieve internal information, specifically from GitHub issues and runbooks.</p>
<h2>Locating anomalies with ML</h2>
<p>While the AI Assistant is great for analyzing information, another important aspect of NGINX log management is handling log spikes and anomalies. Elastic has a machine learning platform that allows you to build jobs that analyze one or more metrics for anomalies. When using NGINX, there are several <a href="https://www.elastic.co/guide/en/machine-learning/current/ootb-ml-jobs-nginx.html">out-of-the-box anomaly detection jobs</a>. These work specifically on NGINX access logs.</p>
<ul>
<li>
<p>Low_request_rate_nginx - Detect low request rates</p>
</li>
<li>
<p>Source_ip_request_rate_nginx - Detect unusual source IPs - high request rates</p>
</li>
<li>
<p>Source_ip_url_count_nginx - Detect unusual source IPs - high distinct count of URLs</p>
</li>
<li>
<p>Status_code_rate_nginx - Detect unusual status code rates</p>
</li>
<li>
<p>Visitor_rate_nginx - Detect unusual visitor rates</p>
</li>
</ul>
<p>Since these jobs work right out of the box, let’s look at the Status_code_rate_nginx job, which is related to our previous analysis.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/nginx-log-analytics-with-genai-elastic/nginx-ml-log-analytics.png" alt="NGINX ML Log Analytics" /></p>
<p>With a few simple clicks, we immediately get an analysis showing a specific IP address, 72.57.0.53, with a higher-than-normal rate of non-successful requests. Interestingly, we had also found this using the AI Assistant.</p>
<p>We can take this further through conversations with the AI Assistant, by looking at the logs, or by exploring the other ML anomaly jobs.</p>
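<p>For example, to drill into the suspect IP’s failing requests directly, an ES|QL query along these lines could work (a sketch assuming the NGINX integration’s ECS fields, such as <code>source.ip</code> and <code>url.path</code>; adjust the index pattern and field names to your setup):</p>
<pre><code class="language-sql">FROM logs-nginx.access-*
| WHERE source.ip == &quot;72.57.0.53&quot; AND http.response.status_code &gt;= 400
| STATS error_count = COUNT(*) BY http.response.status_code, url.path
| SORT error_count DESC
</code></pre>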
<h2>Conclusion<a id="conclusion"></a></h2>
<p>You’ve now seen how easily Elastic’s RAG-based AI Assistant can help analyze NGINX logs without needing to know query syntax, where the data lives, or even the fields. Additionally, you’ve seen how we can alert you when there is a potential issue or a degradation in a service-level objective (SLO).</p>
<p>Check out other resources on NGINX logs:</p>
<p><a href="https://www.elastic.co/guide/en/machine-learning/current/ootb-ml-jobs-nginx.html">Out-of-the-box anomaly detection jobs for NGINX</a></p>
<p><a href="https://www.elastic.co/guide/en/fleet/current/example-standalone-monitor-nginx.html">Using the NGINX integration to ingest and analyze NGINX Logs</a></p>
<p><a href="https://www.elastic.co/observability-labs/blog/service-level-objectives-slos-logs-metrics">NGINX Logs based SLOs in Elastic</a></p>
<p><a href="https://www.elastic.co/observability-labs/blog/elastic-rag-ai-assistant-application-issues-llm-github">Using GitHub issues, runbooks, and other internal information for RCAs with Elastic’s RAG based AI Assistant</a></p>
<h2>Try it out<a id="try-it-out"></a></h2>
<p>Existing Elastic Cloud customers can access many of these features directly from the <a href="https://cloud.elastic.co/">Elastic Cloud console</a>. Not taking advantage of Elastic on the cloud? <a href="https://www.elastic.co/cloud/cloud-trial-overview">Start a free trial</a>.</p>
<p>All of this is also possible in your environment. <a href="https://www.elastic.co/observability/universal-profiling">Learn how to get started today</a>.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/nginx-log-analytics-with-genai-elastic/blog-thumb-observability-pattern-color.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Optimizing Observability with ES|QL: Streamlining SRE operations and issue resolution for Kubernetes and OTel]]></title>
            <link>https://www.elastic.co/observability-labs/blog/opentelemetry-kubernetes-esql</link>
            <guid isPermaLink="false">opentelemetry-kubernetes-esql</guid>
            <pubDate>Wed, 01 Nov 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[ES|QL enhances operational efficiency, data analysis, and issue resolution for SREs. This blog covers the advantages of ES|QL in Elastic Observability and how it can apply to managing issues instrumented with OpenTelemetry and running on Kubernetes.]]></description>
            <content:encoded><![CDATA[<p>As an operations engineer (SRE, IT Operations, DevOps), managing technology and data sprawl is an ongoing challenge. Simply managing the large volumes of high dimensionality and high cardinality data is overwhelming.</p>
<p>As a single platform, Elastic® helps SREs unify and correlate limitless telemetry data, including metrics, logs, traces, and profiling, into a single datastore — Elasticsearch®. By then applying the power of Elastic’s advanced machine learning (ML), AIOps, AI Assistant, and analytics, you can break down silos and turn data into insights. As a full-stack observability solution, everything from infrastructure monitoring to log monitoring and application performance monitoring (APM) can be found in a single, unified experience.</p>
<p>In Elastic 8.11, a technical preview is now available of <a href="https://www.elastic.co/blog/esql-elasticsearch-piped-query-language">Elastic’s new piped query language, ES|QL (Elasticsearch Query Language)</a>, which transforms, enriches, and simplifies data investigations. Powered by a new query engine, ES|QL delivers advanced search capabilities with concurrent processing, improving speed and efficiency, irrespective of data source and structure. Accelerate resolution by creating aggregations and visualizations from one screen, delivering an iterative, uninterrupted workflow.</p>
<h2>Advantages of ES|QL for SREs</h2>
<p>SREs using Elastic Observability can leverage ES|QL to analyze logs, metrics, traces, and profiling data, enabling them to pinpoint performance bottlenecks and system issues with a single query. SREs gain the following advantages when managing high dimensionality and high cardinality data with ES|QL in Elastic Observability:</p>
<ul>
<li><strong>Improved operational efficiency:</strong> By using ES|QL, SREs can create more actionable notifications with aggregated values as thresholds from a single query, which can also be managed through the Elastic API and integrated into DevOps processes.</li>
<li><strong>Enhanced analysis with insights:</strong> ES|QL can process diverse observability data, including application, infrastructure, business data, and more, regardless of the source and structure. ES|QL can easily enrich the data with additional fields and context, allowing the creation of visualizations for dashboards or issue analysis with a single query.</li>
<li><strong>Reduced mean time to resolution:</strong> ES|QL, when combined with Elastic Observability's AIOps and AI Assistant, enhances detection accuracy by identifying trends, isolating incidents, and reducing false positives. This improvement in context facilitates troubleshooting and the quick pinpointing and resolution of issues.</li>
</ul>
<p>ES|QL in Elastic Observability not only enhances an SRE's ability to manage the customer experience, an organization's revenue, and SLOs more effectively but also facilitates collaboration with developers and DevOps by providing contextualized aggregated data.</p>
<p>In this blog, we will cover some of the key use cases SREs can leverage with ES|QL:</p>
<ul>
<li>ES|QL integrated with the Elastic AI Assistant, which uses public LLMs and private data, enhances the analysis experience anywhere in Elastic Observability.</li>
<li>SREs can, in a single ES|QL query, break down, analyze, and visualize observability data from multiple sources and across any time frame.</li>
<li>Actionable alerts can be easily created from a single ES|QL query, enhancing operations.</li>
</ul>
<p>I will work through these use cases by showcasing how an SRE can solve a problem in an application instrumented with OpenTelemetry and running on Kubernetes. The OpenTelemetry (OTel) demo is on an Amazon EKS cluster, with Elastic Cloud 8.11 configured.</p>
<p>You can also check out our <a href="https://www.youtube.com/watch?v=vm0pBWI2l9c">Elastic Observability ES|QL Demo</a>, which walks through ES|QL functionality for Observability.</p>
<h2>ES|QL with AI Assistant</h2>
<p>As an SRE, you are monitoring your OTel instrumented application with Elastic Observability, and while in Elastic APM, you notice some issues highlighted in the service map.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/opentelemetry-kubernetes-esql/elastic-blog-1-services.png" alt="1 - services" /></p>
<p>Using Elastic AI Assistant, you can easily ask for analysis, and in particular, we check on what the overall latency is across the application services.</p>
<pre><code class="language-plaintext">My APM data is in traces-apm*. What's the average latency per service over the last hour? Use ESQL, the data is mapped to ECS
</code></pre>
<p>The Elastic AI Assistant generates an ES|QL query, which we run in the AI Assistant to get a list of the average latencies across all the application services. We can easily see the top four are:</p>
<ul>
<li>load generator</li>
<li>front-end proxy</li>
<li>frontendservice</li>
<li>checkoutservice</li>
</ul>
<p>From a simple natural language question in the AI Assistant, we got a single generated ES|QL query that listed the latencies across the services.</p>
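<p>The exact query the Assistant generates will vary from run to run, but it might look something like this sketch (assuming APM transaction documents mapped to ECS, with durations stored in microseconds in <code>transaction.duration.us</code>):</p>
<pre><code class="language-sql">FROM traces-apm*
| WHERE @timestamp &gt;= NOW() - 1 hour
| STATS avg_duration_us = AVG(transaction.duration.us) BY service.name
| EVAL avg_latency_ms = avg_duration_us / 1000
| SORT avg_latency_ms DESC
</code></pre>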
<p>Noticing that there is an issue with several services, we decide to start with the frontend proxy. As we work through the details, we see significant failures, and through <strong>Elastic APM failure correlation</strong>, it becomes apparent that the frontend proxy is not properly completing its calls to downstream services.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/opentelemetry-kubernetes-esql/elastic-blog-2-failed-transaction.png" alt="2 - failed transaction" /></p>
<h2>ES|QL insightful and contextual analysis in Discover</h2>
<p>Knowing that the application is running on Kubernetes, we investigate if there are issues in Kubernetes. In particular, we want to see if there are any services having issues.</p>
<p>We use the following query in ES|QL in Elastic Discover:</p>
<pre><code class="language-sql">from metrics-*
| where kubernetes.container.status.last_terminated_reason != &quot;&quot; and kubernetes.namespace == &quot;default&quot;
| stats reason_count=count(kubernetes.container.status.last_terminated_reason) by kubernetes.container.name, kubernetes.container.status.last_terminated_reason
| where reason_count &gt; 0
</code></pre>
<p><img src="https://www.elastic.co/observability-labs/assets/images/opentelemetry-kubernetes-esql/elastic-blog-3-two-horizontal-bar-graphs.png" alt="3 - horizontal graph" /></p>
<p>ES|QL helps analyze thousands to tens of thousands of metric events from Kubernetes and highlights two services that are restarting due to OOMKilled.</p>
<p>The Elastic AI Assistant, when asked about OOMKilled, indicates that a container in a pod was killed due to an out-of-memory condition.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/opentelemetry-kubernetes-esql/elastic-blog-4-understanding-oomkilled.png" alt="4 - understanding oomkilled" /></p>
<p>We run another ES|QL query to understand the memory usage for emailservice and productcatalogservice.</p>
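<p>That query isn’t shown above, but here is a sketch of what it could look like, using the Kubernetes integration’s pod memory utilization field (<code>kubernetes.pod.memory.usage.limit.pct</code>); the service name filter is illustrative:</p>
<pre><code class="language-sql">FROM metrics-*
| WHERE kubernetes.deployment.name IN (&quot;emailservice&quot;, &quot;productcatalogservice&quot;)
| STATS avg_memory_usage = AVG(kubernetes.pod.memory.usage.limit.pct) BY kubernetes.deployment.name
</code></pre>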
<p><img src="https://www.elastic.co/observability-labs/assets/images/opentelemetry-kubernetes-esql/elastic-blog-5-split-bar-graphs.png" alt="5 - split bar graphs" /></p>
<p>ES|QL easily shows that the average memory usage for both services is fairly high.</p>
<p>We can now further investigate both of these services’ logs, metrics, and Kubernetes-related data. However, before we continue, we create an alert to track heavy memory usage.</p>
<h2>Actionable alerts with ES|QL</h2>
<p>Suspecting a specific issue that might recur, we create an alert based on the ES|QL query we just ran, tracking any service that exceeds 50% memory utilization.</p>
<p>We modify the last query to find any service with high memory usage:</p>
<pre><code class="language-sql">FROM metrics*
| WHERE @timestamp &gt;= NOW() - 1 hours
| STATS avg_memory_usage = AVG(kubernetes.pod.memory.usage.limit.pct) BY kubernetes.deployment.name
| WHERE avg_memory_usage &gt; .5
</code></pre>
<p>With that query, we create a simple alert. Notice how the ES|QL query is brought into the alert. We simply connect this to PagerDuty, but we could choose from multiple connectors, such as ServiceNow, Opsgenie, or email.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/opentelemetry-kubernetes-esql/elastic-blog-6-create-rule.png" alt="6 - create rule" /></p>
<p>With this alert, we can now easily monitor for any services that exceed 50% memory utilization in their pods.</p>
<h2>Make the most of your data with ES|QL</h2>
<p>In this post, we demonstrated the power ES|QL brings to analysis, operations, and reducing MTTR. In summary, the three use cases with ES|QL in Elastic Observability are as follows:</p>
<ul>
<li>ES|QL integrated with the Elastic AI Assistant, which uses public LLMs and private data, enhances the analysis experience anywhere in Elastic Observability.</li>
<li>SREs can, in a single ES|QL query, break down, analyze, and visualize observability data from multiple sources and across any time frame.</li>
<li>Actionable alerts can be easily created from a single ES|QL query, enhancing operations.</li>
</ul>
<p>Elastic invites SREs and developers to experience this transformative language firsthand and unlock new horizons in their data tasks. Try it today, now in technical preview, at <a href="https://ela.st/free-trial">https://ela.st/free-trial</a>.</p>
<blockquote>
<ul>
<li><a href="https://www.elastic.co/demo-gallery/observability">Elastic Observability Tour</a></li>
<li><a href="https://www.elastic.co/blog/log-management-observability-operations">The power of effective log management</a></li>
<li><a href="https://www.elastic.co/blog/context-aware-insights-elastic-ai-assistant-observability">Transforming Observability with the AI Assistant</a></li>
<li><a href="https://www.elastic.co/blog/esql-elasticsearch-piped-query-language">ES|QL announcement blog</a></li>
</ul>
</blockquote>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/opentelemetry-kubernetes-esql/ES_QL_blog-720x420-05.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Live logs and prosper: fixing a fundamental flaw in observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/reimagine-observability-elastic-streams</link>
            <guid isPermaLink="false">reimagine-observability-elastic-streams</guid>
            <pubDate>Mon, 27 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Stop chasing symptoms. Learn how Streams in Elastic Observability fixes the fundamental flaw in observability, using AI to proactively find the 'why' in your logs for faster resolution.]]></description>
            <content:encoded><![CDATA[<p>SREs are often overwhelmed by dashboards and alerts that show what and where things are broken, but fail to reveal why. This industry-wide focus on visualizing symptoms forces engineers to manually hunt for answers. The crucial &quot;why&quot; is buried in information-rich logs, but their massive volume and unstructured nature has led the industry to throw them aside or treat them like a second-class citizen. As a result, SREs are forced to turn every investigation into a high-stress, time-consuming hunt for clues. We can solve this problem with logs, but unlocking their potential requires us to reimagine how we work with them and improve the overall investigations journey.</p>
<h2>Observability, the broken promise</h2>
<p>To see why the current model fails, let’s look at the all-too-familiar challenge every SRE dreads: knowing a problem exists but needing to spend valuable time just trying to find where to even start the investigation.</p>
<p>Imagine you get a Slack message from the support team: &quot;a few high-value customers are reporting their payments are failing.&quot; You have no shortage of alerts, but most are just flagging symptoms. You don’t know where to start. You decide to check the logs to see if there is anything obvious, starting with the systems that have the high CPU alert.</p>
<p>You spend a few minutes searching and <code>grep</code>-ing through terabytes of logs for affected customer IDs, trying to piece together the problem. Nothing. You worry that you aren’t getting all the logs to reveal the problem, so you turn on more logging in the application. Now you’re knee-deep in data, desperately trying to find patterns, errors, or other &quot;hints&quot; that will give you a clue as to the <em>why</em>.</p>
<p>Finally, one of the broader log queries hits on an error code associated with an impacted customer ID. This is the first real clue. You pivot your search to this new error code and after an hour of digging, you finally uncover the error message. You've finally found the <em>why</em>, but it was a stressful, manual hunt that took far too much time and impacted dozens more customers.</p>
<p>This incident perfectly illustrates the broken promise of modern observability: The complete failure of the investigation process. Investigations are a manual, reactive process that SREs are forced into every day. At Elastic, we believe metrics, traces, and logs are all essential, but their roles, and the workflow between them, must be fundamentally re-imagined for effective investigations.</p>
<p>Observability is about having the clearest understanding possible of the <em>what</em>, <em>where</em>, and <em>why</em>. Metrics are essential for understanding the <em>what</em>. They are the heartbeat of your system, powering the dashboards and alerts that tell you when a threshold has been breached, like high CPU utilization or error rates. But they are aggregates; they show the symptom, rarely the root cause. Traces are good at identifying the <em>where</em>. They map the journey of a request through a distributed system, pinpointing the specific microservice or function where latency spikes or an error originates. Yet, their effectiveness hinges on complete and consistent code instrumentation, a constant dependency on development teams that can leave you with critical visibility gaps. Logs tell you the <em>why</em>. They contain all the rich, contextual, and unfiltered truth of an event. If we can more proactively and efficiently extract information from logs, we can greatly improve our overall understanding of our environments.</p>
<h2>Challenges of logs in modern environments</h2>
<p>While logs are in the standard toolbox, they have been neglected. SREs using today’s solutions deal with several major problems:</p>
<ul>
<li>First, due to their unstructured nature, it’s very difficult to parse and manage logs so that they’re useful. As a result, many SRE teams spend a lot of time building and maintaining complex pipelines to help manage this process.</li>
<li>Second, logs can get expensive at high volume, which leads teams to drop them on the floor to control costs, throwing away valuable information in the process. Consequently, when an incident occurs, you waste precious time hunting for the right logs and manually correlating across services.</li>
<li>Finally, nobody has built a log solution that proactively works to find the important signals in logs and to surface those critical <em>whys</em> to you when you need them. As a result, log-based investigations are too painful and slow.</li>
</ul>
<p>Why are we here? As applications became more complex, log volume became unmanageable. Instead of solving this with automation, the industry took a shortcut: it gave up on getting the most out of logs and prioritized more manageable but less informative signals.</p>
<p>This decision is the origin of the broken, reactive model. It forced observability into a manual loop of 'observing' alerts, rather than building automation that could help us truly understand our systems to improve how we root cause and resolve issues. This has transformed SREs from investigators into full-time data wranglers, wrestling with Grok patterns and fragile ETL scripts instead of solving outages. </p>
<h2>Introducing Streams to rethink how you use logs for investigations</h2>
<p>Streams is an agentic AI solution that simplifies working with logs to help SRE teams rapidly understand the <em>why</em> behind an issue for faster resolution. The combination of Elasticsearch and AI is turning manual management of noisy logs into automated workflows that identify patterns, context, and meaning, marking a fundamental shift in observability.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/reimagine-observability-elastic-streams/streams-manifesto-01.png" alt="Streams" /></p>
<h4>Log everything in any format</h4>
<p>By applying the Elasticsearch platform for context engineering, bringing together retrieval and AI-driven parsing to keep up with schema changes, we are reimagining the entire log pipeline.</p>
<p>Streams ingests raw logs from all your sources to a single destination. It then uses AI to partition incoming logs into their logical components and parses them to extract relevant fields for an SRE to validate, approve, or modify. Imagine a world where you simply point your logs to a single endpoint and everything just works: less wrestling with Grok patterns, configuring processors, and hunting for the right plugin, all of which significantly reduces complexity. Streams is a big step towards realizing that vision.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/reimagine-observability-elastic-streams/streams-manifesto-02.png" alt="Streams" /></p>
<p>As a result, SREs are freed from managing complex ingestion pipelines, allowing them to spend less time on data wrangling and more time preventing service disruptions.</p>
<h4>Solve incidents faster with Significant Events </h4>
<p>Significant Events, a capability within Streams, uses AI to automatically surface major errors and anomalies, enabling you to be proactive in your investigations. So, instead of just combing through endless noise, you can focus on the events that truly matter, such as startup and shutdown messages, out-of-memory errors, internal server failures, and other significant signals of change. These events act as actionable markers, giving SREs early warning and clear focus to begin an investigation before service impact.</p>
<p>With this new foundation, logs will become your primary signal for investigation. The panicked, manual search for a needle in a digital haystack is about to be over. Significant Events acts like a smart metal detector that sifts through the chaos and only beeps when it finds issues, helping you to easily ignore all that hay and find the &quot;needle&quot; faster. </p>
<p>Now imagine the same scenario we started with. Instead of starting a frantic, time-consuming grep through terabytes of logs, you find that Streams has already done the heavy lifting. Its AI-driven analysis has detected a new, anomalous pattern that began before your support team even knew about it and automatically surfaced it as a significant event. Rather than you hunting for a clue, the clue finds you.</p>
<p>With a single click, you have the <em>why</em>: a Java out-of-memory error in a specific service component. This is your starting point. You find the root cause in under two minutes and begin remediation. The customer impact is stopped, the dev team gets the specific error, and the problem is contained before it can escalate. In this case, metrics and traces were unhelpful in finding the <em>why</em>. The answer was waiting in the logs all along.</p>
<p>This ideal outcome is possible because you can both afford to keep every log and instantly find the signal within them. Elastic's cost-efficient architecture with powerful compression, searchable snapshots, and data tiering makes full retention a reality. From there, Streams automatically surfaces the significant event, ensuring that the answer is never lost in the noise.</p>
<p>Elastic is the only company that provides an AI-driven log-first approach to elevate your observability signals and make it dramatically faster and easier to get to <em>why</em>. This is built on our decades of leadership in search, relevance, and powerful analytics that provides the foundation for understanding logs at a deep, semantic level.</p>
<h2>The vision for Streams </h2>
<p>The partitioning, parsing, and Significant Events you see today is just the starting point. The next step in our vision is to use the Significant Events to automatically generate critical SRE artifacts. Imagine Streams creating intelligent alerts, on-the-fly investigation dashboards, and even data-driven SLOs based <em>only</em> on the events that actually impact service health. From there, the goal is to use AI to drive automated Root Cause Analysis (RCA) directly from log patterns and generate remediation runbooks, turning a multi-hour hunt into an instant resolution recommendation.</p>
<p>Once this AI-driven log foundation is in place, our vision for Streams expands to become a unified intelligence layer that operates across all your telemetry data. It’s not just about making each signal better in isolation, but about understanding the context and relationships between them to solve complex problems.</p>
<p>For metrics, Streams won’t just alert you to a single metric spike; it will detect correlated anomalies across multiple, seemingly unrelated metrics, e.g., p99 latency for a specific service, a rise in garbage collection time, and the transaction success rate.</p>
<p>Similarly, for traces, Streams can identify when a new, unexpected service call (e.g., to a new database or an external API) appears in a critical transaction path after a deployment, or when a specific span is suddenly responsible for a majority of errors across all traces, even if the overall error rate hasn't breached a threshold.</p>
<p>The goal is not to have separate streams for logs, metrics, and traces, but to weave them into a single narrative that automatically correlates all three signals. Ultimately, Streams is about fundamentally changing the goal from a human-led data-gathering exercise to proactive, AI-driven resolution.</p>
<p><em>For more on Streams:</em></p>
<p><em>Read the</em> <a href="https://www.elastic.co/observability-labs/blog/elastic-observability-streams-ai-logs-investigations"><em>Streams launch blog</em></a></p>
<p><em>Look at the</em> <a href="http://elastic.co/elasticsearch/streams"><em>Streams website</em></a></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/reimagine-observability-elastic-streams/streams-manifesto.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[How Streams in Elastic Observability Simplifies Retention Management]]></title>
            <link>https://www.elastic.co/observability-labs/blog/simplifying-retention-management-with-streams</link>
            <guid isPermaLink="false">simplifying-retention-management-with-streams</guid>
            <pubDate>Thu, 30 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how Streams simplifies retention management in Elasticsearch with a unified view to monitor, visualize, and control data lifecycles using DSL or ILM.]]></description>
            <content:encoded><![CDATA[<p>Managing retention in Elasticsearch can get complicated fast. Between <a href="https://www.elastic.co/docs/manage-data/lifecycle/data-stream">Data stream lifecycle (DSL)</a>, <a href="https://www.elastic.co/docs/manage-data/lifecycle/index-lifecycle-management">Index lifecycle management (ILM)</a>, templates, and individual index settings, keeping policies consistent across data streams often takes more effort than it should.</p>
<p><strong>Streams</strong> changes that. It introduces a clear, unified way to manage how long your data lives, whether you’re using DSL or ILM. You can visualize ingestion, understand where data sits across tiers, and adjust retention with confidence, applying updates to a single stream without worrying about unintended changes elsewhere, all from a single view.</p>
<h3>Walkthrough: Exploring the Retention Tab</h3>
<p><img src="https://www.elastic.co/observability-labs/assets/images/simplifying-retention-management-with-streams/retention_view.png" alt="Retention view of a stream" /></p>
<p>Retention management lives in the <strong>Retention</strong> tab of each stream. This is your control panel for understanding how much data you’re storing, how quickly it’s growing, and how your lifecycle policies are applied. It’s also where you can monitor and configure the <a href="https://www.elastic.co/docs/manage-data/data-store/data-streams/failure-store">Failure store</a>, which tracks and retains documents that failed to be ingested.</p>
<h4>Metrics at a glance</h4>
<p>At the top of the view, you’ll find an overview of key metrics:</p>
<ul>
<li>Storage size: the total data volume currently held by the stream.</li>
<li>Ingestion averages: calculated from the selected time range, Streams extrapolates both daily and monthly averages to give you a sense of long-term trends.</li>
</ul>
<p>This combination of near-real-time and projected values helps you quickly spot when ingestion is ramping up and whether your retention policy aligns with it.</p>
<h4>Ingestion over time</h4>
<p>Below the metrics, a graph shows ingestion volume over time. This information is approximated based on the number of documents over time, multiplied by the average document size in the backing index. </p>
<h4>Visualizing lifecycle phases</h4>
<p>When an ILM policy is in effect, the retention view becomes more visual. Streams displays a phase breakdown (hot, warm, cold, frozen) showing the data volume stored in each phase. This gives you a clear sense of how your data is distributed across the storage tiers and whether your lifecycle is doing what you expect.</p>
<h4>Failure store</h4>
<p>A failure store is a secondary set of indices inside a data stream, dedicated to storing documents that failed to be ingested. Within the Retention tab, you can toggle the Failure store on or off, and configure its own retention period.
We’ll cover Failure store and Data quality in more detail in <a href="https://www.elastic.co/observability-labs/blog/data-quality-and-failure-store-in-streams">this article</a>.</p>
<h3>Updating Retention</h3>
<p>Beyond visualizing your retention, Streams makes it easy to change how it’s managed.</p>
<h4>Switching between DSL and ILM</h4>
<p>You can freely switch a stream between DSL and ILM management, or update a DSL retention period with just a few clicks. Streams takes care of updating the lifecycle settings at the data stream level, ensuring consistent retention across all existing backing indices, not just new ones.</p>
<p>Whether you prefer the simplicity of DSL or the fine-grained tiering of ILM, you can move between the two seamlessly.</p>
<p><em>Clicking “Edit data retention” opens a modal that allows you to update the stream’s configuration. From there you can update the ILM policy or set a custom retention period via DSL.</em>
<img src="https://www.elastic.co/observability-labs/assets/images/simplifying-retention-management-with-streams/edit_ilm.png" alt="Modal view to set a lifecycle policy" /></p>
<p><em>You can set a custom period, or pick an Indefinite retention for your data.</em>
<img src="https://www.elastic.co/observability-labs/assets/images/simplifying-retention-management-with-streams/edit_dsl.png" alt="Modal view to set a custom retention period" /></p>
<p><em>You can also update streams’ lifecycle via the <a href="https://www.elastic.co/docs/api/doc/kibana/operation/operation-put-streams-name">Upsert stream</a> or the <a href="https://www.elastic.co/docs/api/doc/kibana/operation/operation-put-streams-name-ingest">Update ingest stream settings</a> Kibana APIs.</em></p>
<h4>Inherit or defer: different strategies for different stream types</h4>
<p><strong>Classic streams</strong></p>
<p>For classic streams, you can default to the existing index template’s retention. Retention isn’t managed by Streams in this case; it follows the lifecycle configuration defined in the template, just as it normally would.</p>
<p>This option is useful if you’re onboarding existing data streams and want to keep their lifecycle behavior intact while still benefiting from Streams’ visibility and monitoring features.</p>
<p><strong>Wired streams</strong></p>
<p>Wired streams live in a tree structure, and that hierarchy allows an inheritance model.</p>
<p>A child stream can inherit the lifecycle of its nearest ancestor that has a concrete policy (ILM or DSL). This keeps your configuration lean and consistent since you can set a single lifecycle at a higher level in the tree and let Streams automatically apply it to all relevant descendants.</p>
<p>If that ancestor’s lifecycle is later updated, Streams cascades the change down to all children that inherit it, so everything stays in sync.</p>
<p><em>In the figure below, we set a different retention for</em> <strong><em>logs.prod</em></strong> <em>and</em> <strong><em>logs.staging</em></strong> <em>environments. The child partitions of these environments automatically inherit the configuration.</em>
<img src="https://www.elastic.co/observability-labs/assets/images/simplifying-retention-management-with-streams/streams_tree.png" alt="A streams tree that shows inheritance" /></p>
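<p>The inheritance rule can be sketched as a walk up the stream tree to the nearest ancestor with a concrete lifecycle. The code below is an illustrative model, not the actual Streams implementation:</p>

```python
# Illustrative sketch of wired-stream lifecycle inheritance: a stream without a
# concrete lifecycle takes the policy of its nearest ancestor that has one.
# Stream names are dot-separated, so "logs.prod.web" has ancestors
# "logs.prod" and "logs".

def effective_lifecycle(name, lifecycles):
    """Return the lifecycle in effect for `name`, or None if no ancestor has one."""
    parts = name.split(".")
    for depth in range(len(parts), 0, -1):  # the stream itself first, then each ancestor
        ancestor = ".".join(parts[:depth])
        if lifecycles.get(ancestor) is not None:
            return lifecycles[ancestor]
    return None

# Matching the figure: logs.prod and logs.staging have different retention,
# and their child partitions inherit it automatically.
lifecycles = {
    "logs.prod": {"dsl": {"retention": "90d"}},
    "logs.staging": {"dsl": {"retention": "7d"}},
}
```

<p>In this model, updating the lifecycle on <em>logs.prod</em> changes the result for every descendant without a concrete policy of its own, which is the cascading behavior described above.</p>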
<h4>How it works under the hood</h4>
<p>When you apply or update a lifecycle, <strong>Streams</strong> calls Elasticsearch’s <a href="https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-stream-settings">/_data_stream/_settings</a> API, which we added in 8.19 / 9.1 for this purpose.</p>
<p>This API is key to keeping retention consistent:</p>
<ol>
<li>It applies the lifecycle directly at the data stream level, overriding any configuration from cluster settings or index templates.</li>
<li>It propagates the retention update to all existing backing indices, not just new ones, so retention remains uniform across your historical and future data.</li>
</ol>
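<p>For example, applying an ILM policy at the data stream level could look like the following Dev Tools request. The stream and policy names here are placeholders, and the exact set of supported settings is documented in the API reference:</p>

```bash
PUT _data_stream/logs-myapp-default/_settings
{
  "index.lifecycle.name": "my-retention-policy"
}
```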
<p>By centralizing lifecycle management at the data stream level and applying a consistent configuration across the backing indices, we remove the ambiguity that used to exist between template-level and index-level configurations. You always know which retention policy is actually in effect, and you can see it directly in the UI.</p>
<h3>Wrapping Up</h3>
<p>With Streams, retention management becomes clear and consistent. You can visualize ingestion, switch between DSL and ILM, or inherit policies across streams, all without diving into templates or manual index settings.</p>
<p>By unifying retention into a single view, Streams turns lifecycle management into something simple, predictable, and transparent.</p>
<p>Sign up for an Elastic trial at <a href="http://cloud.elastic.co">cloud.elastic.co</a> and try Elastic's Serverless offering, which lets you explore all of the Streams functionality.</p>
<p>Additionally, check out:</p>
<p><em>Read about</em> <a href="https://www.elastic.co/observability-labs/blog/reimagine-observability-elastic-streams"><em>Reimagining streams</em></a></p>
<p><em>Look at the</em> <a href="http://elastic.co/elasticsearch/streams"><em>Streams website</em></a></p>
<p><em>Read the</em> <a href="https://www.elastic.co/docs/solutions/observability/streams/streams"><em>Streams documentation</em></a></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/simplifying-retention-management-with-streams/article.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Enhancing SRE troubleshooting with the AI Assistant for Observability and your organization's runbooks]]></title>
            <link>https://www.elastic.co/observability-labs/blog/sre-troubleshooting-ai-assistant-observability-runbooks</link>
            <guid isPermaLink="false">sre-troubleshooting-ai-assistant-observability-runbooks</guid>
            <pubDate>Wed, 08 Nov 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Empower your SRE team with this guide to enriching Elastic's AI Assistant Knowledge Base with your organization's internal observability information for enhanced alert remediation and incident management.]]></description>
            <content:encoded><![CDATA[<p>The <a href="https://www.elastic.co/blog/context-aware-insights-elastic-ai-assistant-observability">Observability AI Assistant</a> helps users explore and analyze observability data using a natural language interface, by leveraging automatic function calling to request, analyze, and visualize your data to transform it into actionable observability. The Assistant can also set up a Knowledge Base, powered by <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html">Elastic Learned Sparse EncodeR</a> (ELSER) to provide additional context and recommendations from private data, alongside the large language models (LLMs) using RAG (Retrieval Augmented Generation). Elastic’s Stack — as a vector database with out-of-the-box semantic search and connectors to LLM integrations and the Observability solution — is the perfect toolkit to extract the maximum value of combining your company's unique observability knowledge with generative AI.</p>
<h2>Enhanced troubleshooting for SREs</h2>
<p>Site reliability engineers (SRE) in large organizations often face challenges in locating necessary information for troubleshooting alerts, monitoring systems, or deriving insights due to scattered and potentially outdated resources. This issue is particularly significant for less experienced SREs who may require assistance even with the presence of a runbook. Recurring incidents pose another problem, as the on-call individual may lack knowledge about previous resolutions and subsequent steps. Mature SRE teams often invest considerable time in system improvements to minimize &quot;fire-fighting,&quot; utilizing extensive automation and documentation to support on-call personnel.</p>
<p>Elastic® addresses these challenges by combining generative AI models with relevant search results from your internal data using RAG. The <a href="https://www.elastic.co/guide/en/observability/current/obs-ai-assistant.html">Observability AI Assistant's internal Knowledge Base</a>, powered by our semantic search retrieval model <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html">ELSER</a>, can recall information at any point during a conversation, providing RAG responses based on internal knowledge.</p>
<p>This Knowledge Base can be enriched with your organization's information, such as runbooks, GitHub issues, internal documentation, and Slack messages, allowing the AI Assistant to provide specific assistance. The Assistant can also document and store specific information from an ongoing conversation with an SRE while troubleshooting issues, effectively creating runbooks for future reference. Furthermore, the Assistant can generate summaries of incidents, system status, runbooks, post-mortems, or public announcements.</p>
<p>This ability to retrieve, summarize, and present contextually relevant information is a game-changer for SRE teams, transforming the work from chasing documents and data to an intuitive, contextually sensitive user experience. The Knowledge Base (see <a href="https://www.elastic.co/guide/en/observability/current/obs-ai-assistant.html#obs-ai-requirements">requirements</a>) serves as a central repository of Observability knowledge, breaking documentation silos and integrating tribal knowledge, making this information accessible to SREs, enhanced with the power of LLMs.</p>
<p>Your LLM provider may collect query telemetry when using the AI Assistant. If your data is confidential or has sensitive details, we recommend you verify the data treatment policy of the LLM connector you provided to the AI Assistant.</p>
<p>In this blog post, we will cover different ways to enrich your Knowledge Base (KB) with internal information. We will focus on a specific alert, indicating that there was an increase in logs with “502 Bad Gateway” errors that has surpassed the alert’s threshold.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-1.png" alt="1 - threshold breached" /></p>
<h2>How to troubleshoot an alert with the Knowledge Base</h2>
<p>Before the KB has been enriched with internal information, when the SRE asks the AI Assistant about how to troubleshoot an alert, the response from the LLM will be based on the data it learned during training; however, the LLM is not able to answer questions related to private, recent, or emerging knowledge. In this case, when asking for the steps to troubleshoot the alert, the response will be based on generic information.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-2.png" alt="2 - troubleshooting steps" /></p>
<p>However, once the KB has been enriched with your runbooks, when your team receives a new alert on “502 Bad Gateway” Errors, they can use AI Assistant to access the internal knowledge to troubleshoot it, using semantic search to find the appropriate runbook in the Knowledge Base.</p>
<p>In this blog, we will cover different ways to add internal information on how to troubleshoot an alert to the Knowledge Base:</p>
<ol>
<li>
<p>Ask the assistant to remember the content of an existing runbook.</p>
</li>
<li>
<p>Ask the Assistant to summarize and store in the Knowledge Base the steps taken during a conversation and store it as a runbook.</p>
</li>
<li>
<p>Import your runbooks from GitHub or another external source to the Knowledge Base using our Connector and APIs.</p>
</li>
</ol>
<p>After the runbooks have been added to the KB, the AI Assistant is now able to recall the internal and specific information in the runbooks. By leveraging the retrieved information, the LLM could provide more accurate and relevant recommendations for troubleshooting the alert. This could include suggesting potential causes for the alert, steps to resolve the issue, preventative measures for future incidents, or asking the assistant to help execute the steps mentioned in the runbook using functions. With more accurate and relevant information at hand, the SRE could potentially resolve the alert more quickly, reducing downtime and improving service reliability.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/Screenshot_2023-11-10_at_9.52.38_AM.png" alt="3 - troubleshooting 502 Bad gateway" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-4.png" alt="4 - (5) test the backend directly" /></p>
<p>Your Knowledge Base documents will be stored in the indices <em>.kibana-observability-ai-assistant-kb-</em>*. Keep in mind that LLMs have restrictions on the amount of information the model can read and write at once, called the token limit. Imagine you're reading a book, but you can only remember a certain number of words at a time. Once you've reached that limit, you start to forget the earlier words you've read. That's similar to how a token limit works in an LLM.</p>
<p>To keep runbooks within the token limit for Retrieval Augmented Generation (RAG) models, ensure the information is concise and relevant. Use bullet points for clarity, avoid repetition, and use links for additional information. Regularly review and update the runbooks to remove outdated or irrelevant information. The goal is to provide clear, concise, and effective troubleshooting information without compromising the quality due to token limit constraints. LLMs are great for summarization, so you could ask the AI Assistant to help you make the runbooks more concise.</p>
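<p>As a quick sanity check, you can estimate a runbook's size against a token budget before storing it. The four-characters-per-token ratio below is a common heuristic for English prose, not an exact tokenizer, and the actual budget depends on the model behind your connector:</p>

```python
# Heuristic token estimate for a runbook before adding it to the Knowledge
# Base. Real token counts depend on the model's tokenizer; ~4 characters per
# token is a rough rule of thumb for English text.

CHARS_PER_TOKEN = 4

def estimated_tokens(text):
    """Rough token count for English prose."""
    return len(text) // CHARS_PER_TOKEN

def fits_budget(text, budget_tokens):
    """True if the text is likely to fit within the given token budget."""
    return estimated_tokens(text) <= budget_tokens

runbook = (
    "1. Check upstream service health\n"
    "2. Inspect nginx error logs for 502s\n"
    "3. Restart the backend if unresponsive\n"
)
# A concise runbook like this easily fits a few-thousand-token budget.
```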
<h2>Ask the assistant to remember the content of an existing runbook</h2>
<p>The easiest way to store a runbook into the Knowledge Base is to just ask the AI Assistant to do it! Open a new conversation and ask “Can you store this runbook in the KB for future reference?” followed by pasting the content of the runbook in plain text.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-5.png" alt="5 - new conversation - let's work on this together" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-6.png" alt="6 - new converastion" /></p>
<p>The AI Assistant will then store it in the Knowledge Base for you automatically, as simple as that.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-7.png" alt="7 - storing a runbook" /></p>
<h2>Ask the Assistant to summarize and store the steps taken during a conversation in the Knowledge Base</h2>
<p>You can also ask the AI Assistant to remember something while having a conversation — for example, after you have troubleshot an alert using the AI Assistant, you could ask to &quot;remember how to troubleshoot this alert for next time.&quot; The AI Assistant will create a summary of the steps taken to troubleshoot the alert and add it to the Knowledge Base, effectively creating runbooks for future reference. Next time you are faced with a similar situation, the AI Assistant will recall this information and use it to assist you.</p>
<p>In the following demo, the user asks the Assistant to remember the steps that have been followed to troubleshoot the root cause of an alert, and also to ping the Slack channel when this happens again. In a later conversation with the Assistant, the user asks what can be done about a similar problem, and the AI Assistant is able to remember the steps and also reminds the user to ping the Slack channel.</p>
<p>After receiving the alert, you can open the AI Assistant chat and troubleshoot it. Once you have investigated the alert, ask the AI Assistant to summarize the analysis and the steps taken to find the root cause, so it can remember them the next time a similar alert fires, and add any extra instructions, such as warning the Slack channel.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-8.png" alt="8. -teal box" /></p>
<p>The Assistant will use the built-in functions to summarize the steps and store them into your Knowledge Base, so they can be recalled in future conversations.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/Screenshot_2023-11-08_at_11.34.08_AM.png" alt="9 - Elastic assistant chat (CROP)" /></p>
<p>Open a new conversation, and ask what are the steps to take when troubleshooting a similar alert to the one we just investigated. The Assistant will be able to recall the information stored in the KB that is related to the specific alert, using semantic search based on <a href="https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html">ELSER</a>, and provide a summary of the steps taken to troubleshoot it, including the last indication of informing the Slack channel.</p>
&lt;Video vidyardUuid=&quot;p14Ss8soJDkW8YoCtKPrQF&quot; loop={true} /&gt;
<h2>Import your runbooks stored in GitHub to the Knowledge Base using APIs or our GitHub Connector</h2>
<p>You can also add proprietary data into the Knowledge Base programmatically by ingesting it (e.g., GitHub Issues, Markdown files, Jira tickets, text files) into Elastic.</p>
<p>If your organization has created runbooks that are stored in Markdown documents in GitHub, follow the steps in the next section of this blog post to index the runbook documents into your Knowledge Base.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-10.png" alt="10 - github handling 502" /></p>
<p>The steps to ingest documents into the Knowledge Base are the following:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-11.png" alt="11 - using internal knowledge" /></p>
<h3>Ingest your organization’s knowledge into Elasticsearch</h3>
<p><strong>Option 1: Use the <a href="https://www.elastic.co/guide/en/enterprise-search/current/crawler.html">Elastic web crawler</a>.</strong> Use the web crawler to programmatically discover, extract, and index searchable content from websites and knowledge bases. When you ingest data with the web crawler, a search-optimized <a href="https://www.elastic.co/blog/what-is-an-elasticsearch-index">Elasticsearch® index</a> is created to hold and sync webpage content.</p>
<p><strong>Option 2: Use Elasticsearch's <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html">Index API</a>.</strong> <a href="https://www.elastic.co/guide/en/cloud/current/ec-ingest-guides.html">Watch tutorials</a> that demonstrate how you can use the Elasticsearch language clients to ingest data from an application.</p>
<p><strong>Option 3: Build your own connector.</strong> Follow the steps described in this blog: <a href="https://www.elastic.co/search-labs/how-to-create-customized-connectors-for-elasticsearch">How to create customized connectors for Elasticsearch</a>.</p>
<p><strong>Option 4: Use Elasticsearch <a href="https://www.elastic.co/guide/en/workplace-search/current/workplace-search-content-sources.html">Workplace Search connectors</a>.</strong> For example, the <a href="https://www.elastic.co/guide/en/workplace-search/current/workplace-search-github-connector.html">GitHub connector</a> can automatically capture, sync, and index issues, Markdown files, pull requests, and repos.</p>
<ul>
<li>Follow the steps to <a href="https://www.elastic.co/guide/en/workplace-search/current/workplace-search-github-connector.html#github-configuration">configure the GitHub Connector in GitHub</a> to create an OAuth App from the GitHub platform.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-12.png" alt="12 - elastic workplace search" /></p>
<ul>
<li>Now you can connect a GitHub instance to your organization. Head to your organization’s <strong>Search &gt; Workplace Search</strong> administrative dashboard, and locate the Sources tab.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/Screenshot_2023-11-08_at_10.19.19_AM.png" alt="13 - screenshot" /></p>
<ul>
<li>Select <strong>GitHub</strong> (or GitHub Enterprise) in the Configured Sources list, and follow the GitHub authentication flow as presented. Upon successful authentication, you will be redirected to Workplace Search and prompted to select the organization you would like to synchronize.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-14.png" alt="14 - configure and connect" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-15.png" alt="15 - how to add github" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-16.png" alt="16 - github" /></p>
<ul>
<li>After configuring the connector and selecting the organization, the content should be synchronized and you will be able to see it in Sources. If you don’t need to index all the available content, you can specify the indexing rules via the API. This will help shorten indexing times and limit the size of the index. See <a href="https://www.elastic.co/guide/en/workplace-search/current/workplace-search-customizing-indexing-rules.html">Customizing indexing</a>.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-17.png" alt="17 - source overview" /></p>
<ul>
<li>The source has created an index in Elastic with the content (Issues, Markdown Files…) from your organization. You can find the index name by navigating to <strong>Stack Management &gt; Index Management</strong>, activating the <strong>Include hidden Indices</strong> button on the right, and searching for “GitHub.”</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-18.png" alt="18 - index mgmt" /></p>
<ul>
<li>You can explore the documents you have indexed by creating a Data View and exploring it in Discover. Go to <strong>Stack Management &gt; Kibana &gt; Data Views &gt; Create data view</strong> and introduce the data view Name, Index pattern (make sure you activate “Allow hidden and system indices” in advanced options), and Timestamp field:</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-19.png" alt="19 - create data view" /></p>
<ul>
<li>You can now explore the documents in Discover using the data view:</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-20.png" alt="20 - data view" /></p>
<h3>Reindex your internal runbooks into the AI Assistant’s Knowledge Base index, using its semantic search pipeline</h3>
<p>Your Knowledge Base documents are stored in the indices <em>.kibana-observability-ai-assistant-kb-*</em>. To add your internal runbooks imported from GitHub to the KB, you just need to reindex the documents from the index you created in the previous step to the KB’s index. To add the semantic search capabilities to the documents in the KB, the reindex should also use the ELSER pipeline preconfigured for the KB, <em>.kibana-observability-ai-assistant-kb-ingest-pipeline</em>.</p>
<p>By creating a Data View with the KB index, you can explore the content in Discover.</p>
<p>Execute the query below in <strong>Management &gt; Dev Tools</strong>, making sure to replace the following, both in “_source” and “inline”:</p>
<ul>
<li>InternalDocsIndex: name of the index where your internal docs are stored</li>
<li>text_field: name of the field with the text of your internal docs</li>
<li>timestamp: name of the timestamp field in your internal docs</li>
<li>public (true or false): if true, makes the document available to all users in the defined <a href="https://www.elastic.co/guide/en/kibana/current/xpack-spaces.html">Kibana Space</a> (if one is defined) or in all spaces (if none is defined); if false, the document is restricted to the user indicated in user.name</li>
<li>(optional) space: if defined, restricts the internal document to a specific <a href="https://www.elastic.co/guide/en/kibana/current/xpack-spaces.html">Kibana Space</a></li>
<li>(optional) user.name: if defined, restricts the internal document to a specific user</li>
<li>(optional) a &quot;query&quot; filter to index only certain docs (see below)</li>
</ul>
<pre><code class="language-bash">POST _reindex
{
    &quot;source&quot;: {
        &quot;index&quot;: &quot;&lt;InternalDocsIndex&gt;&quot;,
        &quot;_source&quot;: [
            &quot;&lt;text_field&gt;&quot;,
            &quot;&lt;timestamp&gt;&quot;,
            &quot;namespace&quot;,
            &quot;is_correction&quot;,
            &quot;public&quot;,
            &quot;confidence&quot;
        ]
    },
    &quot;dest&quot;: {
        &quot;index&quot;: &quot;.kibana-observability-ai-assistant-kb-000001&quot;,
        &quot;pipeline&quot;: &quot;.kibana-observability-ai-assistant-kb-ingest-pipeline&quot;
    },
    &quot;script&quot;: {
        &quot;inline&quot;: &quot;ctx._source.text=ctx._source.remove(\&quot;&lt;text_field&gt;\&quot;);ctx._source.namespace=\&quot;&lt;space&gt;\&quot;;ctx._source.is_correction=false;ctx._source.public=&lt;public&gt;;ctx._source.confidence=\&quot;high\&quot;;ctx._source['@timestamp']=ctx._source.remove(\&quot;&lt;timestamp&gt;\&quot;);ctx._source['user.name'] = \&quot;&lt;user.name&gt;\&quot;&quot;
    }
}
</code></pre>
<p>You may want to restrict which documents you reindex into the KB; for example, you may only want to reindex Markdown documents (like runbooks). To do so, add a “query” filter on the source documents. In the case of GitHub, runbooks are identified by the “type” field containing the string “file,” which you can add to the reindex query as indicated below. To also include GitHub Issues, add the string “issues” to the “type” terms:</p>
<pre><code class="language-json">&quot;source&quot;: {
        &quot;index&quot;: &quot;&lt;InternalDocsIndex&gt;&quot;,
        &quot;_source&quot;: [
            &quot;&lt;text_field&gt;&quot;,
            &quot;&lt;timestamp&gt;&quot;,
            &quot;namespace&quot;,
            &quot;is_correction&quot;,
            &quot;public&quot;,
            &quot;confidence&quot;
        ],
    &quot;query&quot;: {
      &quot;terms&quot;: {
        &quot;type&quot;: [&quot;file&quot;]
      }
    }
}
</code></pre>
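<p>If you ingest runbooks with your own client code instead of reindexing, the document shape produced by the script above can be sketched as follows. The field names come from the reindex script; the source field names, space, and user values are placeholders:</p>

```python
# Sketch of the Knowledge Base document shape produced by the reindex script:
# the runbook text moves to `text`, the timestamp to `@timestamp`, plus the KB
# metadata fields (is_correction, confidence, public, namespace, user.name).

def to_kb_document(source_doc, text_field, timestamp_field,
                   public=True, space=None, user=None):
    """Map a source document to the KB document shape used by the reindex script."""
    doc = {
        "text": source_doc[text_field],
        "@timestamp": source_doc[timestamp_field],
        "is_correction": False,
        "confidence": "high",
        "public": public,
    }
    if space is not None:
        doc["namespace"] = space  # restrict to one Kibana Space
    if user is not None:
        doc["user.name"] = user   # restrict to one user
    return doc

# Placeholder source document and field names for illustration.
runbook = {"body": "Steps for 502 Bad Gateway...", "updated_at": "2023-11-08T10:00:00Z"}
kb_doc = to_kb_document(runbook, "body", "updated_at", public=False, user="sre-team")
```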
<p>Great! Now that the data is stored in your Knowledge Base, you can ask the Observability AI Assistant any questions about it:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/elastic-blog-21.png" alt="21 - new conversation" /></p>
&lt;Video vidyardUuid=&quot;zRxsp1EYjmR4FW4yRtSxcr&quot; loop={true} /&gt;
&lt;Video vidyardUuid=&quot;vV5md3mVtY8KxUVjSvtT7V&quot; loop={true} /&gt;
<h2>Conclusion</h2>
<p>In conclusion, leveraging internal Observability knowledge and adding it to the Elastic Knowledge Base can greatly enhance the capabilities of the AI Assistant. By manually inputting information or programmatically ingesting documents, SREs can create a central repository of knowledge accessible through the power of Elastic and LLMs. The AI Assistant can recall this information, assist with incidents, and provide insights tailored to specific contexts using Retrieval Augmented Generation. By following the steps outlined in this article, organizations can unlock the full potential of their Elastic AI Assistant.</p>
<p><a href="https://www.elastic.co/generative-ai/ai-assistant">Start enriching your Knowledge Base with the Elastic AI Assistant today</a> and empower your SRE team with the tools they need to excel. Follow the steps outlined in this article and take your incident management and alert remediation processes to the next level. Your journey toward a more efficient and effective SRE operation begins now.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
<p><em>In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.</em></p>
<p><em>Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/sre-troubleshooting-ai-assistant-observability-runbooks/11-hand.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Better RCAs with multi-agent AI Architecture]]></title>
            <link>https://www.elastic.co/observability-labs/blog/super-agent-architecture</link>
            <guid isPermaLink="false">super-agent-architecture</guid>
            <pubDate>Fri, 31 May 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Discover how specialized LLM agents collaborate to tackle complex tasks with unparalleled efficiency]]></description>
            <content:encoded><![CDATA[<h2>What’s a multi agent architecture?</h2>
<p>You might have heard the term agent pop up recently in different open source projects or from vendors focusing their go-to-market on GenAI. Indeed, while most GenAI applications today are RAG applications, there is increasing interest in isolating tasks that could be handled by a more specialized model into what is called an agent.</p>
<p>To be clear, an agent is given a task, which could be a prompt, and executes it by leveraging other models, data sources, and a knowledge base. Depending on the field of application, the result might be generated text, images, charts, or audio.</p>
<p>A multi-agent architecture, then, is the practice of coordinating multiple agents around a given task by:</p>
<ul>
<li>Orchestrating oversight of complex systems with multiple agents</li>
<li>Analyzing and strategizing in real time with strategic reasoning</li>
<li>Specializing agents: tasks are decomposed into smaller, focused units handled by experts</li>
<li>Sharing insights to form cohesive action plans, creating collaborative dynamics</li>
</ul>
<p>In a nutshell, the superpower of a multi-agent architecture is tackling intricate challenges at beyond-human speed and solving complex problems. It enables a few things:</p>
<ul>
<li>Scaling intelligence as data and complexity grow: tasks are decomposed into smaller work units, and the expert network grows accordingly</li>
<li>Coordinating simultaneous actions across systems, scaling collaboration</li>
<li>Evolving with new data, allowing continuous adaptation for cutting-edge decision-making</li>
<li>Scalability, high performance, and resilience</li>
</ul>
<h2>Single Agent Vs Multi-Agent Architecture</h2>
<p>Before double-clicking on the multi-agent architecture, let’s talk about the single-agent architecture. The single-agent architecture is designed for straightforward tasks and a late feedback loop from the end user. There are multiple single-agent frameworks, such as ReAct (Reason + Act), RAISE (ReAct + short/long-term memory), Reflexion, AutoGPT+P, and LATS (Language Agent Tree Search). The general process these architectures enable is as follows:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/single.png" alt="alt_text" /></p>
<p>The agent takes an action, observes the result, and decides for itself whether the task looks complete: it ends the process if finished, or resubmits the new result as the next input action, and the loop keeps going.</p>
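<p>As an illustration, the loop above can be sketched in a few lines of Python. The <code>call_llm</code> stub below is a hypothetical stand-in for a real LLM backend, and the step cap is one simple guard against the endless-execution-loop failure mode:</p>

```python
# Sketch of a single-agent loop: act, observe, self-evaluate, and either
# stop or feed the new result back in as the next input.

def call_llm(prompt: str) -> str:
    # Stub for illustration only; a real implementation would call an LLM API.
    if "error rate" in prompt:
        return "DONE: error rate spike traced to service restart"
    return "OBSERVE: checking error rate"

def run_single_agent(task: str, max_steps: int = 5) -> str:
    """Loop until the agent declares the task complete, or until max_steps
    is hit (guarding against an agent that is never satisfied and reiterates)."""
    observation = task
    for _ in range(max_steps):
        result = call_llm(observation)
        if result.startswith("DONE:"):
            return result.removeprefix("DONE:").strip()
        observation = result  # resubmit the new result as the next input action
    return "gave up: step budget exhausted"
```

<p>Without the step budget, a self-evaluating agent of this shape has nothing to stop it from looping forever, which is exactly the first limitation listed below.</p>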
<p>While this type of agent is fine for simple tasks, such as a RAG application where a user asks a question and the agent returns an answer based on the LLM and a knowledge base, it has a couple of limitations:</p>
<ul>
<li>Endless execution loop: the agent is never satisfied with the output and reiterates.</li>
<li>Hallucinations</li>
<li>Lack of feedback loop or enough data to build a feedback loop</li>
<li>Lack of planning</li>
</ul>
<p>For these reasons, there is a rising need for a better self-evaluation loop, an externalized observation phase, and a division of labor, which is exactly what a multi-agent architecture provides.</p>
<p>Multi-agent architecture relies on taking a complex task, breaking it down into multiple smaller tasks, planning the resolution of these tasks, executing, evaluating, sharing insights, and delivering an outcome. For this, there is more than one agent; in fact, the minimum value for the network size N is N=2 with:</p>
<ul>
<li>A Manager</li>
<li>An Expert</li>
</ul>
<p>When N=2, the source task is simple enough to need only one expert agent, as the task cannot be broken down into multiple subtasks. Now, when the task is more complex, this is what the architecture can look like:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/multi-vertical.png" alt="alt_text" /></p>
<p>With the help of an LLM, the Manager decomposes the task and delegates the resolutions to multiple agents. The above architecture is called vertical since the agents send their results directly to the Manager. In a horizontal architecture, agents work in groups and share insights with one another, volunteering to complete tasks without the need for a leader, as shown below:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/multi-horizontal.png" alt="alt_text" /></p>
<p>A very good paper covering these two architectures with more insights can be found here: <a href="https://arxiv.org/abs/2404.11584">https://arxiv.org/abs/2404.11584</a></p>
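<p>To make the vertical pattern concrete, here is a minimal Python sketch with canned stand-ins for the LLM-backed Manager and Experts. The layer names and functions are illustrative only, not part of any Elastic API:</p>

```python
# Minimal sketch of a vertical multi-agent setup: a Manager decomposes a
# task by data layer and delegates to Experts, who report their results
# straight back to the Manager.

def manager_decompose(task: str) -> list[str]:
    # A real Manager would use an LLM to break the task down; here the
    # decomposition is hard-coded by data layer for illustration.
    return [f"investigate {layer} for: {task}"
            for layer in ("application", "network", "datastore")]

def expert(subtask: str) -> str:
    # Each Expert resolves its own focused subtask.
    return f"finding for '{subtask}'"

def run_vertical(task: str) -> list[str]:
    subtasks = manager_decompose(task)
    # Vertical architecture: every Expert reports directly to the Manager,
    # which aggregates the findings into a single outcome.
    return [expert(s) for s in subtasks]
```

<p>A horizontal variant would instead have the experts exchange findings among themselves rather than funneling everything through the manager.</p>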
<h2>Applying Vertical Multi-Agent Architecture to Observability</h2>
<p>A vertical multi-agent architecture can have a manager, experts, and a communicator. The communicator is particularly important when the architecture exposes the task's result to an end user.</p>
<p>In the case of Observability, what we envision in this blog post is the scenario of an SRE running through a Root Cause Analysis (RCA) process. The high-level logic will look like this:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/maar-observability.png" alt="alt_text" /></p>
<ul>
<li>Communicator:
<ul>
<li>Read the initial command from the Human</li>
<li>Pass command to Manager</li>
<li>Provide status updates to Human</li>
<li>Provide a recommended resolution plan to the Human</li>
<li>Relay follow-up commands from Human to Manager</li>
</ul>
</li>
<li>Manager:
<ul>
<li>Read the initial command from the Communicator </li>
<li>Create working group </li>
<li>Assign Experts to group </li>
<li>Evaluate signals and recommendations from Experts </li>
<li>Generate recommended resolution plan </li>
<li>Execute plan (optional)</li>
</ul>
</li>
<li>Expert:
<ul>
<li>Each Expert is tasked with a singular expertise tied to an Elastic integration</li>
<li>Use o11y AI Assistant to triage and troubleshoot data related to their expertise </li>
<li>Work with other Experts as needed to correlate issues </li>
<li>Provide recommended root cause analysis for their expertise (if applicable) </li>
<li>Provide recommended resolution plan for their expertise (if applicable)</li>
</ul>
</li>
</ul>
<p>We believe that breaking down the experts by integration provides enough granularity in the case of observability and allows them to focus on a specific data source. Doing this also gives the manager a breakdown key when receiving a complex incident involving multiple data layers (application, network, datastores, infrastructures).</p>
<p>For example, a complex task initiated by an alert in an e-commerce application could be “Revenue dropped by 30% in the last hour.” This task would be submitted to the manager, who will look at all services, applications, datastores, network components, and infrastructure involved and decompose these into investigation tasks. Each expert would investigate within their specific scope and provide observations to the manager. The manager will be responsible for correlating and providing observations on what caused the problem.</p>
<h3>Core Architecture</h3>
<p>In the above example, we have decided to deploy the architecture on the below software architecture:</p>
<ul>
<li>The agent manager and expert agent are deployed on GCP or your favorite cloud provider</li>
<li>Most of the components are written in Python</li>
<li>A task management layer is necessary to queue the task to the expert</li>
<li>Expert agents are specifically deployed by integration/data source and converse with the Elastic AI Assistant deployed in Kibana.</li>
<li>The AI Assistant can access a real-time context to help the expert resolve their task.</li>
<li>Elasticsearch is used as the AI Assistant context and as the expert memory to build its experience.</li>
<li>The backend LLM here is GPT-4, now GPT-4o, running on Azure.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/core-architecture.png" alt="alt_text" /></p>
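<p>The task-management layer from the list above can be illustrated with the standard-library <code>queue</code> module: the manager enqueues per-integration subtasks, and each expert worker drains its own queue. The integration names below are hypothetical:</p>

```python
# Sketch of a task-management layer that queues subtasks to experts,
# one queue per integration/data source.
import queue

def dispatch(subtasks: dict[str, str]) -> dict[str, "queue.Queue[str]"]:
    """Route each subtask onto the queue of the expert that owns
    that integration."""
    queues: dict[str, "queue.Queue[str]"] = {}
    for integration, task in subtasks.items():
        queues.setdefault(integration, queue.Queue()).put(task)
    return queues

def drain(q: "queue.Queue[str]") -> list[str]:
    """What an expert worker does: pull tasks until its queue is empty."""
    done = []
    while not q.empty():
        done.append(q.get())
    return done
```

<p>Using <code>queue.Queue</code> keeps the hand-off thread-safe if the experts run concurrently; in a distributed deployment the same role would be played by a message broker.</p>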
<h3>Agent Experience</h3>
<p>Agent experience is built from previous events stored in Elasticsearch, which the expert can search semantically for similar events. When it finds one, it retrieves the execution path stored in memory and executes it.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/agent-experience.png" alt="alt_text" /></p>
<p>The beauty of using the Elasticsearch Vector Database for this is the semantic query the agent can execute against its memory, and the way that memory itself can be managed. Indeed, there is a notion of short- and long-term memory that could be very interesting in the case of observability: events that happen often are worth storing in short-term memory because they are queried more often, while less-queried but important events can be stored in longer-term memory on more cost-effective hardware.</p>
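<p>As a sketch of what such a memory lookup could look like, the function below builds an Elasticsearch-style kNN search request for a given memory tier. The index and field names (<code>agent-memory-short</code>, <code>event_embedding</code>, <code>execution_path</code>) are assumptions for illustration, not a fixed schema:</p>

```python
# Build a semantic (kNN) search request against a tiered agent-memory index.
# Hot, frequently queried events live in the "short" tier; rarely queried
# but important events in the cheaper "long" tier.

def memory_query(embedding: list[float], tier: str = "short") -> dict:
    """Return an Elasticsearch-style request for the given memory tier."""
    index = f"agent-memory-{tier}"
    body = {
        "knn": {
            "field": "event_embedding",   # dense_vector field (assumed name)
            "query_vector": embedding,    # embedding of the current event
            "k": 5,
            "num_candidates": 50,
        },
        "_source": ["event", "execution_path"],
    }
    return {"index": index, "body": body}
```

<p>An expert would embed the incoming event, run this query against the short-term tier first, and fall back to the long-term tier if nothing sufficiently similar is found.</p>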
<p>The other aspect of the Agent Experience is the semantic <a href="https://www.elastic.co/search-labs/blog/semantic-reranking-with-retrievers">reranking</a> feature with Elasticsearch. When the agent executes a task, reranking is used to surface the best outcome compared to past experience:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/agent-experience-build.png" alt="alt_text" /></p>
<p>If you are looking for a working example of the above, <a href="https://www.elastic.co/observability-labs/blog/elastic-ai-assistant-observability-escapes-kibana">check this blog post</a>, where two agents work together with the Elastic Observability AI Assistant on an RCA:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/ops-burger.png" alt="alt_text" /></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/super-agent-architecture/githubcopilot-aiassistant.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Transforming Industries and the Critical Role of LLM Observability: How to use Elastic's LLM integrations in real-world scenarios]]></title>
            <link>https://www.elastic.co/observability-labs/blog/transforming-industries-and-the-critical-role-of-llm-observability</link>
            <guid isPermaLink="false">transforming-industries-and-the-critical-role-of-llm-observability</guid>
            <pubDate>Thu, 08 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[This blog explores four industry specific use cases that use Large Language Models (LLMs) and highlights how Elastic's LLM observability integrations provide insights into the cost, performance, reliability and the prompts and response exchange with the LLM.]]></description>
            <content:encoded><![CDATA[<p>In today's tech-centric world, Large Language Models (LLMs) are transforming sectors from finance and healthcare to research. LLMs are starting to underpin products and services across the spectrum. Take for example recent <a href="https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#advanced-coding">advanced coding</a> developments in Google's Gemini 2.5 which enable it to use its reasoning capabilities to create a video game by producing the executable code from a short prompt.  Or <a href="https://www.aboutamazon.com/news/devices/new-alexa-generative-artificial-intelligence">new ways</a> to interact with Amazon's Alexa - for example, you could send a picture of a live music schedule, and have Alexa add the details to your calendar. And let's not forget Microsoft's <a href="https://blogs.microsoft.com/blog/2025/04/04/your-ai-companion/">personalization of Copilot</a> which remembers what you talk about, so it learns your likes and dislikes and details about your life; the name of your dog, that tricky project at work, what keeps you motivated to stick to your new workout routine.</p>
<p>Despite the widespread utility of LLMs, deploying these sophisticated tools in real-world scenarios poses distinct challenges, especially in managing their complex behaviors. For users such as Site Reliability Engineers (SREs), DevOps teams, and AI/ML engineers, ensuring the reliability, performance, and compliance of these models introduces an additional layer of complexity. This is where the concept of LLM Observability becomes essential. It offers crucial insights into the performance of these models, ensuring that these advanced AI systems operate both effectively and ethically.</p>
<h3>Why LLM Observability Matters and How Elastic Makes It Easy</h3>
<p>LLMs are not just another piece of software; they are sophisticated systems capable of human-like capabilities such as text generation, comprehension, and even coding. But with great power comes greater need for oversight. The opaque nature of these models can obscure how decisions are made and content generated. This makes it even more critical to implement robust observability to monitor and troubleshoot issues such as hallucinations, inappropriate content, cost overruns, errors and performance degradation. By monitoring these models closely, we can safeguard against unexpected outcomes and maintain user trust.</p>
<h3>Real-World Scenarios</h3>
<p>Let's explore real-world scenarios where companies leverage LLM-powered applications to enhance productivity and user experience, and how Elastic's LLM observability solutions monitor critical aspects of these models.</p>
<h4>1. Generative AI for Customer Support</h4>
<p>Companies are increasingly leveraging LLMs and generative AI to enhance customer support, using platforms like Google Vertex AI for hosting these models efficiently. With the introduction of advanced AI models such as Google's Gemini, which is integrated into Vertex AI, businesses can deploy sophisticated chatbots that manage customer inquiries, from basic questions to complex issues, in real time. These AI systems understand and respond with natural language, offering instant support for issues such as product troubleshooting or managing orders, thus reducing wait times. They also learn from each interaction to improve accuracy continuously. This boosts customer satisfaction and allows human agents to focus on complex tasks, enhancing overall efficiency. AI tools can further empower customer care agents with real-time analytics, sentiment detection, and conversation summarization.</p>
<p>To support use cases like the AI-powered customer support described above, Elastic recently launched LLM observability integrations including support for <a href="https://www.elastic.co/guide/en/integrations/current/gcp_vertexai.html">LLMs hosted on GCP Vertex AI</a>. Customers who wish to monitor foundation models such as Gemini and Imagen hosted on Google Vertex AI can benefit from Elastic’s Vertex AI integration to get a deeper understanding of model behavior and performance, and ensure that the AI-driven tools are not only effective but also reliable. Customers get out-of-the-box experience ingesting a curated set of metrics from Vertex AI as well as a pre-configured dashboard.</p>
<p>By continuously tracking these metrics, customers can proactively manage their AI resources, optimize operations, and ultimately enhance the overall customer experience.</p>
<p>Let's look at some of the metrics you get from the Google Vertex AI integration which are helpful in the context of using generative AI for customer support.</p>
<ol>
<li><strong>Prediction Latency</strong>: Measures the time taken to complete predictions, critical for real-time customer interactions.</li>
<li><strong>Error Rate</strong>: Tracks errors in predictions, which is vital for maintaining the accuracy and reliability of AI-driven customer support.</li>
<li><strong>Prediction Count</strong>: Counts the number of predictions made, helping assess the scale of AI usage in customer interactions.</li>
<li><strong>Model Usage</strong>: Tracks how frequently the AI models are accessed by both virtual assistants and customer support tools.</li>
<li><strong>Total Invocations</strong>: Measures the total number of times the AI services are used, providing insights into user engagement and dependency on these tools.</li>
<li><strong>CPU and Memory Utilization</strong>: By observing CPU and memory usage, users can optimize resource allocation, ensuring that the AI tools are running efficiently without overloading the system.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/vertex-overview.png" alt="Vertex Overview" /></p>
<p>To learn more about how Elastic's Google Vertex AI integration can augment your LLM observability, have a quick read of this <a href="https://www.elastic.co/observability-labs/blog/elevate-llm-observability-with-gcp-vertex-ai-integration">blog</a>.</p>
<h4>2. Transforming Healthcare with Generative AI</h4>
<p>The healthcare industry is embracing generative AI to enhance patient interactions and streamline operational workflows. By leveraging platforms like Amazon Bedrock, healthcare organizations deploy advanced large language models (LLMs) to power tools that convert doctor-patient conversations into structured medical notes, reducing administrative overhead and allowing clinicians to prioritize diagnosis and treatment. These AI-driven solutions provide real-time insights, enabling informed decision-making and improving patient outcomes. Additionally, patient-facing applications powered by LLMs offer secure access to health records, empowering individuals to manage their care proactively.</p>
<p>Robust observability is essential to maintain the reliability and performance of these generative AI applications in healthcare. Elastic’s <a href="https://www.elastic.co/guide/en/integrations/current/aws_bedrock.html">Amazon Bedrock integration</a> equips providers with tools to monitor LLM behavior, capturing critical metrics like invocation latency, error rates, token usage and guardrail invocation. Pre-configured dashboards provide visibility into prompt and completion text, enabling teams to verify the accuracy of AI-generated outputs, such as medical notes, and detect issues like hallucinations.</p>
<p>Additionally, customers who configure Guardrails for Amazon Bedrock to filter harmful content like hate speech, personal insults, and other inappropriate topics, can use the Bedrock Integration to observe the prompts and responses that caused the guardrail to filter them out. This helps application developers take proactive actions to maintain a safe and positive user experience.</p>
<p>Some of the logs and metrics that can be helpful for customers using LLMs hosted on Amazon Bedrock are the following</p>
<ol>
<li><strong>Invocation Details</strong>: This Integration records the Invocation latency, count, throttles. These metrics are critical for ensuring that generative AI models respond quickly and accurately to patient queries or appointment scheduling tasks, maintaining a seamless user experience.</li>
<li><strong>Error Rates</strong>:  Tracking error rates ensures that AI tools, such as patient query assistants or appointment systems, consistently deliver accurate and reliable results. By identifying and addressing issues early, healthcare providers can maintain trust in AI systems and prevent disruptions in critical patient interactions.</li>
<li><strong>Token Usage</strong>: In healthcare, tracking token usage helps identify resource-intensive queries, such as detailed patient record summaries or complex symptom analyses, ensuring efficient model operation. By monitoring token usage, healthcare providers can optimize costs for AI-powered tools while maintaining scalability to handle growing patient interactions.</li>
<li><strong>Prompt and Completion Text</strong>: Capturing prompt and completion text allows healthcare providers to analyze how AI models respond to specific patient queries or administrative tasks, ensuring meaningful and contextually accurate interactions. This insight helps refine prompts to improve the AI's understanding and ensures that generated responses, such as appointment details or treatment explanations, meet the quality standards expected in healthcare.</li>
<li><strong>Prompt and response where guardrails intervened</strong>: Being able to track requests and responses that were deemed inappropriate by guardrails helps healthcare providers monitor what information patients are asking for. With this information users can make continuous adjustments to the LLMs to ensure appropriate responses, balancing flexibility and rich communication on the one hand, and on the other, privacy protection, hallucination prevention, and harmful content filtering.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/aws-bedrock-overview.png" alt="Bedrock Overview" /></p>
<p>The Amazon Bedrock Guardrails OOTB dashboard:
<img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/amazon-bedrock-gaurdrails.png" alt="Bedrock Guardrails Overview" /></p>
<p>To learn about the Amazon Bedrock Integration, read this <a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock">blog</a>. To dive deeper into how the integration can help with observability of Guardrails for Amazon Bedrock, take a look at this <a href="https://www.elastic.co/observability-labs/blog/llm-observability-amazon-bedrock-guardrails">blog</a>.</p>
<h4>3. Enhancing Telco Efficiency with GenAI</h4>
<p>The telecommunication industry can leverage services like Azure OpenAI to transform customer interactions, optimize operations, and enhance service delivery. By integrating advanced generative AI models, telcos can offer highly personalized and responsive customer experiences across multiple channels. AI-powered virtual assistants streamline customer support by automating routine queries and providing accurate, context-aware responses, reducing the workload on human agents and enabling them to focus on complex issues while improving efficiency and satisfaction. Additionally, AI-driven insights help telcos understand customer preferences, anticipate needs, and deliver tailored offerings that boost customer loyalty. Operationally, LLMs such as Azure OpenAI enhance internal processes by enabling smarter knowledge management and faster access to critical information.</p>
<p>Elastic's LLM observability integrations like the <a href="https://www.elastic.co/guide/en/integrations/current/azure_openai.html">Azure OpenAI integration</a> can provide visibility into AI performance and costs, empowering telecom providers to make data-driven decisions and enhance customer engagement. It can help optimize resource allocation by analyzing call patterns, predicting service demands, and identifying trends, enabling telcos to scale their AI operations efficiently while maintaining high service quality.</p>
<p>Some of the key metrics and logs from Azure OpenAI that can provide insights are:</p>
<ol>
<li><strong>Error Counts</strong>: It provides critical insights into failed requests and incomplete transactions, enabling telecom providers to proactively identify and resolve issues in AI-powered applications.</li>
<li><strong>Prompt Input and Completion Text</strong>: This captures the input queries provided to AI systems and the corresponding AI-generated outputs. These fields allow telecom providers to analyze customer queries, monitor response quality, and refine AI training datasets to improve relevance and accuracy.</li>
<li><strong>Response Latency</strong>: It measures the time taken by AI models to generate responses, ensuring that virtual assistants and automated systems deliver quick and efficient replies to customer queries.</li>
<li><strong>Token Usage</strong>: It tracks the number of input and output tokens processed by the AI model, offering insights into resource consumption and cost efficiency. This data helps telecom providers monitor AI usage patterns, optimize configurations, and scale resources effectively.</li>
<li><strong>Content Filter Results</strong>: In Azure OpenAI, this plays a crucial role in handling sensitive inputs provided by customers, ensuring compliance, safety, and responsible AI usage. This feature identifies and flags potentially inappropriate or harmful queries and responses in real time, enabling telecom providers to address sensitive topics with care and accuracy.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/azure-openai-overview.png" alt="Azureopenai Overview" /></p>
<p>The Azure OpenAI content filtering OOTB dashboard
<img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/azure-openai-contentfiltering.png" alt="Azureopenai Overview1" /></p>
<p>You can learn more about Elastic's Azure OpenAI integration from these two blogs - <a href="https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai">Part 1</a> and <a href="https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai-v2">Part 2</a>.</p>
<h4>4. OpenAI Integration for Generative AI Applications</h4>
<p>As AI-powered solutions become integral to modern workflows, OpenAI's sophisticated models, including language models like GPT-4o and GPT-3.5 Turbo, image generation models like DALL·E, and audio processing models like Whisper, drive innovation across applications such as virtual assistants, content creation, and speech-to-text systems. With growing complexity and scale, ensuring these models perform reliably, remain cost-efficient, and adhere to ethical guidelines is paramount. Elastic's <a href="https://www.elastic.co/docs/reference/integrations/openai">OpenAI integration</a> provides a robust solution, offering deep visibility into model behaviour to support seamless and responsible AI deployments.</p>
<p>By tapping into the OpenAI Usage API, Elastic's integration delivers actionable insights through intuitive, pre-configured dashboards, enabling Site Reliability Engineers (SREs) and DevOps teams to monitor performance and optimize resource usage across OpenAI's diverse model portfolio. This unified observability approach empowers organizations to track critical metrics, identify inefficiencies, and maintain high-quality AI-driven experiences. The following key metrics from Elastic's OpenAI integration help organizations achieve effective oversight:</p>
<ol>
<li><strong>Request Latency</strong>: Measures the time taken for OpenAI models to process requests, ensuring responsive performance for real-time applications like chatbots or transcription services.</li>
<li><strong>Invocation Rates</strong>: Tracks the frequency of API calls across models, providing insights into usage patterns and helping identify high-demand workloads.</li>
<li><strong>Token Usage</strong>: Monitors input and output tokens (e.g., prompt, completion, cached tokens) to optimize costs and fine-tune prompts for efficient resource consumption.</li>
<li><strong>Error Counts</strong>: Captures failed requests or incomplete transactions, enabling proactive issue resolution to maintain application reliability.</li>
<li><strong>Image Generation Metrics</strong>: Tracks invocation rates and output dimensions for models like DALL·E, helping assess costs and usage trends in image-based applications.</li>
<li><strong>Audio Transcription Metrics</strong>: Monitors invocation rates and transcribed seconds for audio models like Whisper, supporting cost optimization in speech-to-text workflows.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/openai-overview.png" alt="Openai Overview" /></p>
<p>To learn more about Elastic's OpenAI integration, read this <a href="https://www.elastic.co/observability-labs/blog/llm-observability-openai">blog</a>.</p>
<h4>Actionable LLM Observability</h4>
<p>Elastic's LLM observability integrations empower users to take proactive control of their AI operations through actionable insights and real-time alerts. For instance, by setting a predefined threshold for token count, Elastic can trigger automated alerts when usage exceeds this limit, notifying Site Reliability Engineers (SREs) or DevOps teams via email, Slack, or other preferred channels. This ensures prompt awareness of potential cost overruns or resource-intensive queries, enabling teams to adjust model configurations or scale resources swiftly to maintain operational efficiency.</p>
<p>In the example below, the rule is set to alert the user if token_count crosses a threshold of 500.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/slo-1.png" alt="SLO Overview" /></p>
<p>The alert is triggered when the token count exceeds the threshold, as seen below:
<img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/slo-2.png" alt="SLO Overview1" /></p>
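<p>The threshold logic behind such a rule can be sketched in a few lines of Python. This is an illustration of the check itself, not Elastic's alerting implementation, and the notification channel is stubbed out:</p>

```python
# Sketch of a token-count threshold rule: flag any observed sample that
# crosses the configured limit (500 in the example above) so an alert can
# be sent to email, Slack, or another channel.

TOKEN_THRESHOLD = 500

def check_token_counts(samples: list[int],
                       threshold: int = TOKEN_THRESHOLD) -> list[int]:
    """Return the samples that breach the threshold and should fire an alert."""
    return [s for s in samples if s > threshold]
```

<p>In practice the samples would come from the ingested integration metrics, and each breach would trigger the configured connector rather than just being returned.</p>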
<p>Another example is tracking invocation spikes, such as when the number of predictions or API calls surpasses a defined Service Level Objective (SLO). For example, if a Bedrock AI-hosted model experiences a sudden surge in invocations due to increased customer interactions, Elastic can alert teams to investigate potential anomalies or scale infrastructure accordingly. These proactive measures help maintain the reliability and cost-effectiveness of LLM-powered applications.</p>
<p>By providing pre-configured dashboards and customizable alerts, Elastic ensures that organizations can respond to critical events in real time, keeping their AI systems aligned with cost and performance goals as well as standards for content safety and reliability.</p>
<h4>Conclusion</h4>
<p>LLMs are transforming industries, but their complexity requires effective observability to ensure their reliability and safe use. Elastic's LLM observability integrations provide a comprehensive solution, empowering businesses to monitor performance, manage resources, and address challenges like hallucinations and content safety. As LLMs become increasingly integral to various sectors, robust observability tools like those offered by Elastic ensure that these AI-driven innovations remain dependable, cost-effective, and aligned with ethical and safety standards.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/llmobs2.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Windows Event Log Monitoring with OpenTelemetry & Elastic Streams]]></title>
            <link>https://www.elastic.co/observability-labs/blog/windows-event-monitoring-with-opentelemetry-and-elastic-streams</link>
            <guid isPermaLink="false">windows-event-monitoring-with-opentelemetry-and-elastic-streams</guid>
            <pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to enhance Windows Event Log monitoring with OpenTelemetry for standardized ingestion and Elastic Streams for smart partitioning and analysis.]]></description>
            <content:encoded><![CDATA[<p>For system administrators and SREs, Windows Event Logs are both a goldmine and a graveyard. They contain the critical data needed to diagnose the root cause of a server crash or a security breach, but they are often buried under gigabytes of noise. Traditionally, extracting value from these logs required brittle regex parsers, manual rule creation, and a significant amount of human intuition.</p>
<p>However, the landscape of log management is shifting. By combining the industry-standard ingestion of OpenTelemetry (OTel) with the AI-driven capabilities of Elastic Streams, we can change how we monitor Windows infrastructure. This approach isn't just moving data. We are also using Large Language Models (LLMs) to understand it.</p>
<h2>The Challenge with Traditional Windows Logging</h2>
<p>Windows generates a massive variety of logs: System, Security, Application, Setup, and Forwarded Events. Within those categories, you have thousands of Event IDs. Historically, getting this data into an observability platform involved installing proprietary agents and configuring complex pipelines to strip out the XML headers and format the messages.</p>
<p>Once the data was ingested, you still had to figure out what &quot;bad&quot; looked like. You had to know in advance that Event ID 7031 indicated a service crash, and then write a specific alert for it. If you missed a specific Event ID or if the format changed, your monitoring went dark.</p>
<h2>Step 1: Ingestion via OpenTelemetry</h2>
<p>The first step in modernizing this workflow is adopting OpenTelemetry. The OTel collector has matured significantly and now offers robust support for Windows environments. By installing the collector directly on Windows servers, you can configure receivers to tap into the event log subsystems.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/otel-config.png" alt="OTel collector configuration for Windows Event Logs" /></p>
<p>The beauty of this approach is standardization. You aren't locked into a vendor-specific shipping agent. The OTel collector acts as a universal router, grabbing the logs and sending them to your observability backend: in this case, the Elastic <code>logs</code> index, which is designed to handle high-throughput streams.</p>
<p>The key thing to pay attention to in this configuration is the transform statement we add:</p>
<pre><code class="language-yaml">transform/logs-streams:
  log_statements:
    - context: resource
      statements:
        - set(attributes[&quot;elasticsearch.index&quot;], &quot;logs&quot;)
</code></pre>
<p>This works with the vanilla OpenTelemetry Collector. When the data arrives in Elastic, the <code>elasticsearch.index</code> attribute tells Elastic to use the new wired streams feature, which enables all the downstream AI features we discuss in later steps.</p>
<p>Check out my example configuration <a href="https://github.com/davidgeorgehope/otel-collector-windows/blob/main/config.yaml">here</a>.</p>
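<p>Putting the pieces together, a minimal end-to-end configuration might wire a channel receiver, the transform processor, and the Elasticsearch exporter as follows. This is a sketch, not a drop-in config: the endpoint and API key are placeholders, and the receiver and exporter names assume the contrib distribution of the collector:</p>
<pre><code class="language-yaml">receivers:
  windowseventlog/system:
    channel: system
  windowseventlog/application:
    channel: application

processors:
  # Route incoming logs into the wired streams "logs" index
  transform/logs-streams:
    log_statements:
      - context: resource
        statements:
          - set(attributes[&quot;elasticsearch.index&quot;], &quot;logs&quot;)

exporters:
  elasticsearch:
    endpoints: [&quot;https://my-deployment.es.example.com:443&quot;]  # placeholder endpoint
    api_key: ${env:ELASTIC_API_KEY}  # placeholder credential

service:
  pipelines:
    logs:
      receivers: [windowseventlog/system, windowseventlog/application]
      processors: [transform/logs-streams]
      exporters: [elasticsearch]
</code></pre>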
<h2>Step 2: AI-Driven Partitioning</h2>
<p>Once the data arrives, the next challenge is organization. Dumping all Windows logs into a single <code>logs-*</code> index is a recipe for slow queries and confusion. In the past, we split indices based on hardcoded fields. Now, we can use AI to &quot;fingerprint&quot; the data.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/ai-partitioning.png" alt="AI-driven partitioning of Windows logs" /></p>
<p>This process involves analyzing the incoming stream to identify patterns. The system looks at the structure and content of the logs to determine their origin. For example, it can distinguish between a <code>Windows Security Audit</code> log and a <code>Service Control Manager</code> log purely based on the data shape.</p>
<p>The result is automatic partitioning. The system creates separate, optimized &quot;buckets&quot; or streams for each data type. You get a clean separation of concerns: Security logs go to one stream and File Manager logs to another, without your having to write a single conditional routing rule. This partitioning is crucial for performance and for the next phase of the process: analysis.</p>
<h2>Step 3: Significant Events and LLM Analysis</h2>
<p>Once your data is partitioned (e.g., into a dedicated <code>Service Control Manager</code> stream), you can apply GenAI models to analyze the semantic meaning of that stream.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/llm-analysis.png" alt="LLM analysis of log streams" /></p>
<p>In a traditional setup, the system sees text strings. In an AI-driven setup, the system understands context. When an LLM analyzes the <code>Service Control Manager</code> stream, it identifies what that system is responsible for. It knows that this specific component manages the starting and stopping of system services.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/significant-events-suggestions.png" alt="Significant events suggestions from AI" /></p>
<p>Because the model understands the <em>purpose</em> of the log stream, it can generate suggestions for what constitutes a &quot;Significant Event.&quot; It doesn't need you to tell it to look for crashes; it knows that for a Service Manager, a crash is a critical failure.</p>
<h3>From Passive Storage to Proactive Suggestions</h3>
<p>The workflow effectively automates the creation of detection rules. The LLM scans the logs and generates a list of potential problems relevant to that specific dataset, such as:</p>
<ul>
<li><strong>Service Crashes:</strong> High-severity anomalies where background processes terminate unexpectedly.</li>
<li><strong>Startup/Boot Failures:</strong> Critical errors preventing the OS from reaching a stable state.</li>
<li><strong>Permission Denials:</strong> Security-relevant events regarding service interactions.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/significant-events-list.png" alt="List of significant events detected" /></p>
<p>It bubbles these up as suggested observations. You can review a list of potential issues, see the severity the AI has assigned to them (e.g., Critical, Warning), and with a single click, generate the query required to find those logs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/query-generation.png" alt="Auto-generated query for significant events" /></p>
<h2>Conclusion</h2>
<p>The combination of OpenTelemetry for standardized ingestion and AI-driven Streams for analysis turns the chaotic flood of Windows logs into a structured, actionable intelligence source. We are moving away from the era of &quot;log everything, look at nothing&quot; to an era where our tools understand our infrastructure as well as we do.</p>
<p>The barrier to effective monitoring is no longer technical complexity. Whether you are tracking security audits or debugging boot loops, leveraging LLMs to partition and analyze your streams is the new standard for observability.</p>
<p><a href="https://cloud.elastic.co/serverless-registration?onboarding_token=observability">Try Streams today</a></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/windows-event-monitoring-with-opentelemetry-and-elastic-streams/ai-partitioning.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>