Baha Azarmi, Jeff Vestal

Better RCAs with multi-agent AI Architecture

Discover how specialized LLM agents collaborate to tackle complex tasks with unparalleled efficiency


What’s a multi-agent architecture?

You might have heard the term Agent pop up recently in different open source projects, or from vendors focusing their go-to-market on GenAI. Indeed, while most GenAI applications today are RAG applications, there is increasing interest in isolating tasks that could be achieved with a more specialized model into what is called an Agent.

To be clear, an agent is given a task, which could be a prompt, and executes that task by leveraging other models, data sources, and a knowledge base. Depending on the field of application, the result ultimately looks like generated text, pictures, charts, or sounds.

Multi-agent architecture, then, is the process of leveraging multiple agents around a given task by:

  • Orchestrating complex system oversight with multiple agents
  • Analyzing and strategizing in real time with strategic reasoning
  • Specializing agents: tasks are decomposed into smaller, focused elements handled by experts
  • Sharing insights for cohesive action plans, creating collaborative dynamics

In a nutshell, multi-agent architecture's superpower is tackling intricate challenges at beyond-human speed and solving complex problems. It enables a few things:

  • Scaling intelligence as data and complexity grow: tasks are decomposed into smaller work units, and the expert network grows accordingly
  • Coordinating simultaneous actions across systems, scaling collaboration
  • Evolving with data: continuous adaptation to new data enables up-to-date decision-making
  • Scalability, high performance, and resilience

Single-Agent vs. Multi-Agent Architecture

Before double-clicking on multi-agent architecture, let’s talk about single-agent architecture. Single-agent architecture is designed for straightforward tasks with a late feedback loop from the end user. There are multiple single-agent frameworks, such as ReAct (Reason + Act), RAISE (ReAct + short/long-term memory), Reflexion, AutoGPT+P, and LATS (Language Agent Tree Search). The general process these architectures enable is as follows:

The agent takes an action, observes the result, and decides for itself whether the task looks complete: it ends the process if finished, or resubmits the new results as the next input action, and the loop keeps going.
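To make this loop concrete, here is a minimal sketch of a ReAct-style single-agent loop. `call_llm` and `run_tool` are hypothetical stand-ins for a real LLM client and tool executor; they are assumptions, not any specific framework's API.

```python
# A minimal, illustrative single-agent loop in the ReAct style.
# call_llm and run_tool are hypothetical stand-ins for a real LLM
# client and a tool executor; they are assumptions, not a real API.

def single_agent(task: str, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):  # guard against an endless execution loop
        # Ask the model to reason about the next action, or to finish.
        decision = call_llm(
            "Reason about the next action, or reply 'FINISH: <answer>':\n"
            + "\n".join(history)
        )
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()
        # Act, observe, and feed the observation back as the next input.
        observation = run_tool(decision)
        history.append(f"Action: {decision}")
        history.append(f"Observation: {observation}")
    return "Stopped: step budget exhausted before the agent was satisfied."
```

Note the hard step budget: without it, nothing stops the endless execution loop described in the limitations below.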

While this type of agent handles simple tasks well, such as a RAG application where a user asks a question and the agent returns an answer based on the LLM and a knowledge base, it has a few limitations:

  • Endless execution loop: the agent is never satisfied with the output and reiterates.
  • Hallucinations
  • Lack of feedback loop or enough data to build a feedback loop
  • Lack of planning

For these reasons, the need for a better self-evaluation loop, an externalized observation phase, and a division of labor is growing, creating the need for a multi-agent architecture.

Multi-agent architecture relies on taking a complex task, breaking it down into multiple smaller tasks, planning the resolution of these tasks, executing, evaluating, sharing insights, and delivering an outcome. For this, there is more than one agent; in fact, the minimum network size N is N=2, with:

  • A Manager
  • An Expert

When N=2, the source task is simple enough to need only one expert agent, as the task cannot be broken down into multiple tasks. Now, when the task is more complex, this is what the architecture can look like:

With the help of an LLM, the Manager decomposes the task and delegates resolution to multiple agents. The above architecture is called vertical, since the agents send their results directly to the Manager. In a horizontal architecture, agents work in groups and share insights with each other, completing tasks through a volunteer-based system; they do not need a leader, as shown below:

A very good paper covering these two architectures with more insights can be found here: https://arxiv.org/abs/2404.11584
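Going back to the vertical pattern, here is a minimal sketch of a Manager decomposing a task and delegating to experts. `call_llm` is the same hypothetical LLM-client stand-in as above, and the expert functions are placeholders.

```python
# Illustrative vertical flow: the Manager decomposes the task with an
# LLM, delegates each subtask to the matching Expert, and aggregates
# the results. call_llm is a hypothetical LLM-client stand-in.

def manager(task: str, experts: dict) -> str:
    # 1. Decompose the complex task, one subtask per expert specialty.
    plan = call_llm(
        f"Break this task into subtasks, one per line, each prefixed "
        f"with one of {list(experts)} and a colon:\n{task}"
    )
    # 2. Delegate each subtask vertically to the matching Expert.
    findings = []
    for line in plan.splitlines():
        specialty, _, subtask = line.partition(":")
        if specialty.strip() in experts:
            findings.append(experts[specialty.strip()](subtask.strip()))
    # 3. Experts report directly back to the Manager, which synthesizes.
    return call_llm("Combine these findings into one outcome:\n" + "\n".join(findings))
```

A horizontal variant would instead let the expert functions exchange findings with each other, with no Manager doing the synthesis.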

Applying Vertical Multi-Agent Architecture to Observability

A vertical multi-agent architecture can have a manager, experts, and a communicator. The communicator is particularly important when these architectures expose the task's result to an end user.

In the case of Observability, what we envision in this blog post is the scenario of an SRE running through a Root Cause Analysis (RCA) process. The high-level logic will look like this:

  • Communicator:
    • Reads the initial command from the Human
    • Passes the command to the Manager
    • Provides status updates to the Human
    • Provides a recommended resolution plan to the Human
    • Relays follow-up commands from the Human to the Manager
  • Manager:
    • Reads the initial command from the Communicator
    • Creates a working group
    • Assigns Experts to the group
    • Evaluates signals and recommendations from the Experts
    • Generates a recommended resolution plan
    • Executes the plan (optional)
  • Expert:
    • Each Expert is tasked with a singular expertise tied to an Elastic integration
    • Uses the Observability (o11y) AI Assistant to triage and troubleshoot data related to its expertise
    • Works with other Experts as needed to correlate issues
    • Provides a recommended root cause analysis for its expertise (if applicable)
    • Provides a recommended resolution plan for its expertise (if applicable)
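To make the division of responsibilities concrete, here is a skeletal sketch of the three roles, showing only who talks to whom; the triage logic and AI Assistant calls are elided, and all class and method names are illustrative.

```python
# Skeleton of the three roles above; only the message flow is shown.
# All names are illustrative, and the LLM/AI Assistant calls are elided.

class Expert:
    def __init__(self, integration: str):
        self.integration = integration  # e.g. "nginx", "postgresql"

    def investigate(self, command: str) -> str:
        # Would triage via the Observability AI Assistant here.
        return f"[{self.integration}] nothing anomalous found for: {command}"

class Manager:
    def __init__(self, experts: list[Expert]):
        self.experts = experts

    def resolve(self, command: str) -> str:
        # Create a working group and collect each Expert's signals.
        signals = [expert.investigate(command) for expert in self.experts]
        # Evaluating signals and generating the plan (LLM call) is elided.
        return "\n".join(signals)

class Communicator:
    def __init__(self, manager: Manager):
        self.manager = manager

    def handle(self, human_command: str) -> str:
        # Relay the Human's command to the Manager and report back.
        return "Recommended resolution plan:\n" + self.manager.resolve(human_command)
```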

We believe that breaking down the experts by integration provides enough granularity in the case of observability and allows them to focus on a specific data source. Doing this also gives the manager a breakdown key when receiving a complex incident involving multiple data layers (application, network, datastores, infrastructure).

For example, a complex task initiated by an alert in an e-commerce application could be “Revenue dropped by 30% in the last hour.” This task would be submitted to the manager, who will look at all services, applications, datastores, network components, and infrastructure involved and decompose these into investigation tasks. Each expert would investigate within their specific scope and report observations to the manager. The manager is then responsible for correlating those observations and determining what caused the problem.
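For illustration, here is the kind of decomposition the manager might produce for that alert; the integration names and subtasks are hypothetical, not output from a real system.

```python
# Hypothetical decomposition of "Revenue dropped by 30% in the last
# hour" into per-integration investigation tasks. Integration names
# and subtasks are illustrative only.
investigation_tasks = {
    "apm":        "Check checkout and payment services for latency or error spikes",
    "nginx":      "Look for 5xx responses or a traffic drop at the edge in the last hour",
    "postgresql": "Inspect the orders database for slow queries, locks, or failovers",
    "kubernetes": "Check for pod restarts, evictions, or node pressure on the services involved",
}
# Each entry is handed to the expert owning that integration; the
# observations flow back to the manager for correlation.
```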

Core Architecture

In the above example, we have decided to deploy the architecture on the following software stack:

  • The agent manager and expert agents are deployed on GCP, or your favorite cloud provider
  • Most of the components are written in Python
  • A task management layer is necessary to queue tasks to the experts (a minimal sketch follows this list)
  • Expert agents are deployed per integration/data source and converse with the Elastic AI Assistant deployed in Kibana
  • The AI Assistant can access real-time context to help the expert resolve its task
  • Elasticsearch is used as the AI Assistant's context store and as the expert's memory to build its experience
  • The backend LLM here is GPT-4, now GPT-4o, running on Azure
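Here is a minimal sketch of the task-management layer using Python's standard library; a real deployment would more likely use a managed broker, and that choice, like all names below, is an assumption rather than the post's prescription.

```python
import queue
import threading

# Minimal sketch of the task-management layer: one queue per expert
# agent, fed by the manager. A production system would likely use a
# managed broker instead; this and all names here are assumptions.

expert_queues = {name: queue.Queue() for name in ("apm", "nginx", "postgresql")}

def expert_worker(integration: str, tasks: queue.Queue) -> None:
    while True:
        subtask = tasks.get()
        print(f"[{integration}] investigating: {subtask}")
        # ...converse with the Elastic AI Assistant deployed in Kibana...
        tasks.task_done()

for name, q in expert_queues.items():
    threading.Thread(target=expert_worker, args=(name, q), daemon=True).start()

# The manager routes a decomposed subtask to the expert that owns it.
expert_queues["apm"].put("Check checkout service error rate for the last hour")
expert_queues["apm"].join()  # wait until the expert reports done
```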

Agent Experience

Agent experience is built from previous events stored in Elasticsearch, which the expert can search semantically for similar events. When it finds one, it retrieves the execution path stored in memory and executes it.
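A minimal sketch of that semantic lookup with the Elasticsearch Python client, assuming a hypothetical `agent-memory` index that stores an `embedding` dense-vector field alongside the saved `execution_path`:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust for your deployment

def recall_similar_event(event_embedding: list[float]) -> str | None:
    """Search past events semantically; return the closest execution path."""
    resp = es.search(
        index="agent-memory",          # hypothetical memory index
        knn={
            "field": "embedding",      # dense_vector field for past events
            "query_vector": event_embedding,
            "k": 1,
            "num_candidates": 50,
        },
        source=["event_summary", "execution_path"],
    )
    hits = resp["hits"]["hits"]
    return hits[0]["_source"]["execution_path"] if hits else None
```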

The beauty of using the Elasticsearch vector database for this is the semantic query the agent can execute against its memory, and how the memory itself can be managed. Indeed, there is a notion of short- and long-term memory that could be very interesting in the case of observability: some events happen often and are probably worth storing in short-term memory because they are queried more often, while less-queried but important events can be stored in longer-term memory on more cost-effective hardware.
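One way to realize that tiering, as an assumption on our part rather than something prescribed here, is an index lifecycle management (ILM) policy that rolls older memory onto cheaper hardware, reusing the `es` client from the sketch above:

```python
# Short- vs. long-term memory via ILM: hot nodes keep frequently
# queried events; after 30 days, events move to cheaper cold nodes.
# Policy name, timings, and the "data: cold" attribute are illustrative.
es.ilm.put_lifecycle(
    name="agent-memory-policy",
    policy={
        "phases": {
            "hot": {"actions": {"rollover": {"max_age": "7d"}}},
            "cold": {
                "min_age": "30d",
                "actions": {"allocate": {"require": {"data": "cold"}}},
            },
        }
    },
)
```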

The other aspect of the agent experience is Elasticsearch's semantic reranking feature. When the agent executes a task, reranking is used to surface the best outcome compared to past experience:
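As a hedged sketch, recent Elasticsearch versions expose a `text_similarity_reranker` retriever that could implement this step; the `inference_id` must point at a reranking model configured in the cluster, and the index and field names below are illustrative assumptions.

```python
# Retrieve candidate memories lexically, then rerank them semantically
# so the best past experience surfaces first. The inference_id must
# reference a rerank model configured in the cluster; all names here
# are illustrative.
task_text = "Revenue dropped by 30% in the last hour"
resp = es.search(
    index="agent-memory",
    retriever={
        "text_similarity_reranker": {
            "retriever": {
                "standard": {"query": {"match": {"event_summary": task_text}}}
            },
            "field": "event_summary",
            "inference_id": "my-rerank-model",
            "inference_text": task_text,
            "rank_window_size": 50,
        }
    },
)
best = resp["hits"]["hits"][0]["_source"]  # closest past experience
```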

If you are looking for a working example of the above, check this blog post, where two agents work together with the Elastic Observability AI Assistant on an RCA: