The Elastic AI Assistant for Observability escapes Kibana!

Bringing AI-powered observability to your daily tools with the Elastic AI Assistant for Observability API.

Running_away.jpg

Note: The API described below is currently under development and undocumented, and thus it is not supported. Consider this a forward-looking blog. Features are not guaranteed to be released.

Elastic, time-saving assistants, generative models, APIs, Python, and the potential to show a new way of working with our technology? Of course, I would move this to the top of my project list!

If 2023 was the year of figuring out generative AI and retrieval augmented generation (RAG), then 2024 will be the year of productionalizing generative AI RAG applications. Companies are beginning to publish references and architectures, and businesses are integrating generative applications into their lines of business. 

Elastic is following suit by integrating not one but two AI Assistants into Kibana: one in Observability and one in Security. Today, we will be working with the former.

The Elastic AI Assistant for Observability

What is the Observability AI Assistant? Allow me to quote the documentation:

The AI Assistant uses generative AI to provide:

  • Contextual insights: Open prompts throughout Observability that explain errors and messages and suggest remediation. This includes your own GitHub issues, runbooks, architectural images, etc. Essentially, anything internally that is useful for the SRE and stored in Elastic can be used to suggest resolution. Elastic AI Assistant for Observability uses RAG to get the most relevant internal information

  • Chat: Have conversations with the AI Assistant. Chat uses function calling to request, analyze, and visualize your data.

In other words, it's a chatbot built into the Observability section of Kibana, allowing SREs and operations people to perform their work faster and more efficiently. In the theme of integrating generative AI into lines of business, these AI Assistants are integrated seamlessly into Kibana.

Why “escape” Kibana?

Kibana is a powerful tool, offering many functions and uses. The Observability section has rich UIs for logs, metrics, APM, and more. As much as I believe people in operations, SREs, and the like can get the majority of their work done in Kibana (given Elastic is collecting the relevant data), having worked in the real world, I know just about everyone has multiple tools they work with.

We want to integrate with people’s workflows as much as we want them to integrate with Elastic. As such, providing API access to the AI Assistants allows Elastic to meet you where you spend most of your time. Be it Slack, Teams, or any other app that can integrate with an API. 

API overview

Enter the AI Assistant API. The API provides most of the functionality and efficiencies the AI Assistant brings in Kibana. Since the API handles most of the functionality, it’s like having a team of developers working to improve and develop new features for you.

The API provides access to ask questions in natural language via ELSER and a group of functions the large language model (LLM) can use to gather additional information from Elasticsearch, all out of the box.

Command line

Enough talk; let’s look at some examples!

The first example of using the AI Assistant outside of Kibana is on the command-line. This command-line script allows you to ask questions and get responses. Essentially, the script uses the Elastic API to enable you to have AI Assistant interactions on your CLI (outside of Kibana) Credit for this script goes to Almudena Sanz Olivé, senior software engineer on the Observability team. Of course, I want to also credit the rest of the development team for creating the assistant! NOTE: The AI Assistant API is not yet public but Elastic is working on potentially releasing this. Stay tuned.

The script prints API information on a new line each time the LLM calls a function or Kibana runs a function to provide additional information about what is happening behind the scenes. The generated answer will also be written on a new line. 

There are many ways to start a conversation with the AI Assistant. Let’s imagine I work for an ecommerce company and just checked in some code to GitHub. I realize I need to check if there are any active alerts that need to be worked on. Since I’m already on the commandline, I can run the AI Assistant CLI and ask it to check for me.

1 - Asking the AI Assistant to list all active alerts.
Asking the AI Assistant to list all active alerts.

There are nine active alerts. It's not the worst count I’ve seen by a long shot, but they should still be addressed. There are many ways to start here, but the one that caught my attention first was related to the SLO burn rate on the service-otel cart. This service handles our customers' checkout procedures. 

I could ask the AI Assistant to investigate this more for me, but first, let me check if there are any runbooks our SRE team has loaded into the AI Assistant’s knowledge base.

2 - Ask the AI Assistant to check if there are runbooks to handle issues with a service.
Ask the AI Assistant to check if there are runbooks to handle issues with a service.

Fantastic! I can call my fantastic co-worker Luca Wintergerst and have him fix it. While I prefer tea these days, I’ll follow step two and grab a cup of coffee.

With that handled, let’s go have some fun with SlackBots.

Slackbots

Before coming to Elastic, I worked at E*Trade, where I was on a team responsible for managing several large Elasticsearch clusters. I spent a decent amount of time working in Kibana; however, as we worked on other technologies, I spent much more time outside of Kibana. One app I usually had open was Slack. Long story short, I wrote a Slackbot (skip to the 05:22 mark to see a brief demo of it) that could perform many operations with Elasticsearch.

3 - Slackbot circa 2018 reporting on Elastic ML Anomalies for trade transactions by sock symbol
Slackbot circa 2018 reporting on Elastic ML Anomalies for trade transactions by sock symbol

This worked really well. The only problem was writing all the code, including implementing basic natural language processing (NLP). All the searches were hard-coded, and the list of tasks was static. 

Creating an AI Slackbot today

Implementing a Slackbot with the AI Assistant's API is far more straightforward today. The interaction with the bot is the same as we saw with the command-line interface, except that we are in Slack. 

To start things off, I created a new slackBot and named it obsBurger. I’m a Bob’s Burgers fan, and observability can be considered a stack of data. The Observability Burger, obsBurger for short, was born. This would be the bot that will directly connect to the AI Assistant API and perform all the same functions that can be performed within Kibana. 

4 - Just like in Kibana, I can as ObsBurger (the AI Assistant) for a list of active alerts
Just like in Kibana, I can as ObsBurger (the AI Assistant) for a list of active alerts

More bots!

Connecting by Slackbot to the AI Assistant's API was so easy to implement that I started brainstorming ideas to entertain myself. 

Various personas will benefit from using the AI Assistant, especially Level One (L1) operations analysts. These people are generally new to observability and would typically need a lot of mentoring by a more senior employee to ramp up quickly. We could pretend to be an L1, test the Slackbot, or have fun with LLMs and prompt engineering! 

I created a new Slackbot called opsHuman. This bot connects directly to Azure OpenAI using the same model the AI Assistant is configured to use. This virtual L1 uses the system prompt instructing it to behave as such.

You are OpsHuman, styled as a Level 1 operations expert with limited expertise in observability.
Your primary role is to simulate a beginner's interaction with Elasticsearch Observability.

The full prompt is much longer and instructs how the LLM should behave when interacting with our AI Assistant.

Let’s see it in action!

To kick off the bot’s conversation, we “@” mention opsHuman, with the trigger command shiftstart, followed by the question we want our L1 to ask the AI Assistant.

@OpsHuman shiftstart are there any active alerts?

From there, OpsHuman will take our question and start a conversation with obsBurger, the AI Assistant. 

@ObsBurger are there any active alerts?

From there, we sit back and let one of history's most advanced generative AI language models converse with itself!

5 - Triggering the start of a two-bot conversation.
Triggering the start of a two-bot conversation.

It’s fascinating to watch this conversation unfold. This is the same generative model, GPT-4-turbo, responding to two sets of API calls, with only different prompt instructions guiding the style and sophistication of the responses. When I first set this up, I watched the interaction several times, using a variety of initial questions to start the conversation. Most of the time, the L1 will spend several rounds asking questions about what the alerts mean, what a type of APM service does, and how to investigate and ultimately remediate any issue. 

Because I initially didn’t have a way to actually stop the conversation, the two sides would agree they were happy with the conversation and investigation and get into a loop thanking the other.

6 - Neither Slackbot wants to be the one to hang up first
Neither Slackbot wants to be the one to hang up first

Iterating

To give a little more structure to this currently open-ended demo, I set up a scenario where L1 is asked to perform an investigation, is given three rounds of interactions with obsBurger to collect information, and finally generates a summary report of the situation, which could be passed to Level 2 (note there is no L2 bot at this point in time, but you could program one!). 

Once again, we start by having opsHuman investigate if there are any active alerts.

7 - Starting the investigation
Starting the investigation

Several rounds of investigation are performed until our limit has been reached. At that time, it will generate a summary of the situation.

8 - Level One, OpsHuman, summarizing the investigation
Level One, OpsHuman, summarizing the investigation

How about something with a real-world application

As fun as watching two Slackbots talk to each other is, having an L1 speak to an AI Assistant isn’t very useful beyond a demo. So, I decided to see if I could modify opsHuman to be more beneficial for real-world applications. 

The two main changes for this experiment were:

  1. Flip the profile of the bot from an entry-level personality to an expert.

  2. Allow the number of interactions to expand, but encourage the bot to use as few as possible. 

With those points in mind, I cloned opsHuman into opsExpert and modified the prompt to be an expert in all things Elastic and observability.

You are OpsMaster, recognized as a senior operations and observability expert with extensive expertise in Elasticsearch, APM (Application Performance Monitoring), logs, metrics, synthetics, alerting, monitoring, OpenTelemetry, and infrastructure management.

I started with the same command: Are there any active alerts? After getting the list of alerts, OpsExpert dove into data collection for its investigation.

9 - opsexpert

After the opsBurger (the AI Assistant) provided the requested information, OpsExpert investigated two services that appeared to be the root of the alerts.

10- opsexpert standby

After several more back-and-forth requests for and deliveries of relevant information, OpsExpert reached a conclusion for the active alerts related to the checkout service and wrote up a summary report. 

11 - paymentservice

Looking forward

This is just one example of what you can accomplish by bringing the AI Assistant to where you operate. You could take this one step further and have it actually open an issue on GitHub:

12. -github issue created
13 - jeffvestal commented

Or integrate it into any other tracking platform you use!

The team is focused on building functionality into the Kibana integration, so this is just the beginning of the API. As time progresses, new functionality will be added. Even at a preview stage, I hope this starts you thinking about how having a fully developed Observability AI Assistant accessible by a standard API can make your work life even easier. It could get us closer to my dream of sitting on a beach handling incidents from my phone!

Try it yourself!

You can explore the API yourself if running Elasticsearch version 8.13 or later. The demo code I used for the above examples is available on GitHub.

As a reminder, as of Elastic version 8.13, when this blog was written, the API is not supported as it is pre-beta. Care should be taken using it, and it should not yet be used in production. 

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use. 

Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.