<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Articles by Agi K Thomas</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Fri, 06 Mar 2026 16:24:31 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Articles by Agi K Thomas</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Observability for Amazon MQ with Elastic: Demystifying Messaging Flows with Real-Time Insights]]></title>
            <link>https://www.elastic.co/observability-labs/blog/amazonmq-observability-rabbitmq-integration</link>
            <guid isPermaLink="false">amazonmq-observability-rabbitmq-integration</guid>
            <pubDate>Fri, 02 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[RabbitMQ, managed by Amazon MQ, enables asynchronous communication in distributed architectures but introduces operational risks such as retries, processing delays, and queue backlogs. Elastic’s Amazon MQ integration for RabbitMQ delivers deep observability into broker health, queue performance, message flow, and resource usage through Amazon CloudWatch metrics and logs. This blog outlines key operational risks associated with RabbitMQ and explains how Elastic observability helps maintain system reliability and optimize message delivery at scale.]]></description>
            <content:encoded><![CDATA[<h1>Observability for Amazon MQ with Elastic: Demystifying Messaging Flows with Real-Time Insights</h1>
<h2>Managing the Hidden Complexity of Message-Driven Architectures</h2>
<p>Amazon MQ is a managed message broker service for <a href="http://activemq.apache.org/">Apache ActiveMQ</a> Classic and <a href="https://www.rabbitmq.com/">RabbitMQ</a> that manages the setup, operation, and maintenance of message brokers. Messaging systems like RabbitMQ, managed by <a href="https://aws.amazon.com/amazon-mq/">Amazon MQ</a>, are pivotal in modern decoupled, event-driven applications. By serving as an intermediary between services, RabbitMQ facilitates asynchronous communication through message queuing, routing, and reliable delivery, making it an ideal fit for microservices, real-time pipelines, and event-driven architectures. However, this flexibility introduces operational challenges, such as retries, processing delays, consumer failures, and queue backlogs, which can gradually impact downstream performance and system reliability.</p>
<p>With Elastic’s <a href="https://www.elastic.co/docs/reference/integrations/aws_mq">Amazon MQ integration</a>, users gain deep visibility into message flow patterns, queue performance, and consumer health. This integration allows for the proactive detection of bottlenecks, helps optimize system behaviour, and ensures reliable message delivery at scale.</p>
<p>In this blog, we'll dive into the operational challenges of RabbitMQ in modern architectures, while also examining the common gaps and strategies for overcoming them.</p>
<h2>Why Observability for RabbitMQ on Amazon MQ Matters</h2>
<p>RabbitMQ brokers are integral to distributed systems, handling tasks ranging from order processing to payment workflows and notification delivery. Any disruption can cascade into significant downstream issues. Observability into RabbitMQ helps answer critical operational questions like:</p>
<ul>
<li>Is CPU and memory utilization increasing over time?</li>
<li>What are the trends in the message publish and confirmation rates?</li>
<li>Are consumers failing to acknowledge messages?</li>
<li>Which queues are experiencing abnormal growth?</li>
<li>Is an increasing number of messages being dead-lettered over time?</li>
</ul>
<h2>Enhanced Observability with Amazon MQ Integration</h2>
<p>Elastic provides a dedicated <a href="https://www.elastic.co/docs/reference/integrations/aws_mq">Amazon MQ integration</a> for RabbitMQ that utilizes Amazon CloudWatch metrics and logs to deliver comprehensive observability data. This integration enables the ingestion of metrics related to connections, nodes, queues, exchanges, and system logs.</p>
<p>By deploying <a href="https://www.elastic.co/elastic-agent">Elastic Agent</a> with this integration, users can monitor:</p>
<ul>
<li><strong>Queue performance and dead-letter queue (DLQ) metrics</strong>, including total message count (<code>MessageCount.max</code>), messages ready for delivery (<code>MessageReadyCount.max</code>), and unacknowledged messages (<code>MessageUnacknowledgedCount.max</code>). The <code>MessageCount.max</code> metric tracks the total number of messages in a queue, including those that have been dead-lettered; monitoring it over time can reveal trends in message accumulation that may point to issues leading to dead-lettering.</li>
<li><strong>Consumer behaviour</strong> through metrics like consumer count (<code>ConsumerCount.max</code>) and acknowledgement rate (<code>AckRate.max</code>), which help identify underperforming consumers or potential backlogs.</li>
<li><strong>Messaging throughput</strong> by tracking publish (<code>PublishRate.max</code>), confirm (<code>ConfirmRate.max</code>), and acknowledgement rates in real time. These are crucial for understanding application messaging patterns and flow.</li>
<li><strong>Broker and node-level health,</strong> including memory usage (<code>RabbitMQMemUsed.max</code>), CPU utilization (<code>SystemCpuUtilization.max</code>), disk availability (<code>RabbitMQDiskFree.min</code>), and file descriptor usage (<code>RabbitMQFdUsed.max</code>). These indicators are essential for diagnosing resource saturation and avoiding service disruption.</li>
</ul>
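<p>To make the metrics above concrete, here is an illustrative sketch (the Elastic integration handles this for you) of the request that would fetch one of them from CloudWatch. The broker and queue names are hypothetical; the dict matches the parameters of boto3's <code>cloudwatch.get_metric_statistics</code>, and Amazon MQ publishes RabbitMQ queue metrics under the <code>AWS/AmazonMQ</code> namespace.</p>
<pre><code class="language-python">from datetime import datetime, timedelta, timezone

# Hypothetical identifiers; replace with your own broker and queue names.
BROKER = &quot;my-rabbitmq-broker&quot;
QUEUE = &quot;myQueue&quot;

def queue_metric_request(metric_name, stat=&quot;Maximum&quot;, minutes=60, period=300):
    &quot;&quot;&quot;Build the parameter dict for cloudwatch.get_metric_statistics().

    RabbitMQ queue metrics are dimensioned by Broker, VirtualHost, and Queue.
    &quot;&quot;&quot;
    now = datetime.now(timezone.utc)
    return {
        &quot;Namespace&quot;: &quot;AWS/AmazonMQ&quot;,
        &quot;MetricName&quot;: metric_name,
        &quot;Dimensions&quot;: [
            {&quot;Name&quot;: &quot;Broker&quot;, &quot;Value&quot;: BROKER},
            {&quot;Name&quot;: &quot;VirtualHost&quot;, &quot;Value&quot;: &quot;/&quot;},
            {&quot;Name&quot;: &quot;Queue&quot;, &quot;Value&quot;: QUEUE},
        ],
        &quot;StartTime&quot;: now - timedelta(minutes=minutes),
        &quot;EndTime&quot;: now,
        &quot;Period&quot;: period,
        &quot;Statistics&quot;: [stat],
    }

# Usage, assuming boto3 and AWS credentials are available:
# cloudwatch = boto3.client(&quot;cloudwatch&quot;)
# data = cloudwatch.get_metric_statistics(**queue_metric_request(&quot;MessageCount&quot;))
</code></pre>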
<p><img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-rabbitmq-dashboard-overview.png" alt="" /></p>
<h2>Integrating Amazon MQ Metrics into Elastic Observability</h2>
<p>Elastic's Amazon MQ integration facilitates the ingestion of CloudWatch metrics and logs into Elastic Observability, delivering near real-time insights into RabbitMQ. The prebuilt Amazon MQ dashboard visualizes this data, providing a centralized view of broker health, messaging activity, and resource usage, helping users quickly detect and resolve issues. Elastic's <a href="https://www.elastic.co/docs/solutions/observability/incident-management/alerting">alerting</a> for Observability enables proactive notifications based on custom conditions, while its <a href="https://www.elastic.co/docs/solutions/observability/incident-management/service-level-objectives-slos">SLO</a> capabilities allow users to define and track key performance targets, strengthening system reliability and service commitments. </p>
<p>Elastic brings together logs and metrics from Amazon MQ alongside data from a wide range of other services and applications, whether running in AWS, on-premises, or across multi-cloud environments, offering unified observability from a single platform.</p>
<h3>Prerequisites</h3>
<p>To follow along, ensure you have:</p>
<ul>
<li>An account on <a href="http://cloud.elastic.co/">Elastic Cloud</a> and a deployed stack in AWS (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>). Ensure you are using version 8.16.5 or higher. Alternatively, you can use <a href="https://www.elastic.co/cloud/serverless">Elastic Cloud Serverless</a>, a fully managed solution that eliminates infrastructure management, automatically scales based on usage, and lets you focus entirely on extracting value from your data.</li>
<li>An AWS account with permissions to pull the necessary data from AWS. <a href="https://docs.elastic.co/en/integrations/aws#aws-permissions">See details in our documentation</a>.</li>
</ul>
<h2>Architecture</h2>
<p><img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/rabbitmq_lambda_messageflow.png" alt="" /></p>
<h2>Tracing Audit Flows from RabbitMQ to AWS Lambda</h2>
<p>Consider a financial audit trail use case, where every user action, such as a funds transfer, is published to RabbitMQ. A Python-based AWS Lambda function consumes these messages, deduplicates them using the <strong>id</strong> field, and logs structured audit events for downstream analysis.</p>
<p>Sample payload sent through RabbitMQ:</p>
<pre><code class="language-json">{
  &quot;id&quot;: &quot;txn-849302&quot;,
  &quot;type&quot;: &quot;audit&quot;,
  &quot;payload&quot;: {
    &quot;user_id&quot;: &quot;u-10245&quot;,
    &quot;event&quot;: &quot;funds.transfer&quot;,
    &quot;amount&quot;: 1200.75,
    &quot;currency&quot;: &quot;USD&quot;,
    &quot;timestamp&quot;: &quot;T14:20:15Z&quot;,
    &quot;ip&quot;: &quot;192.168.0.8&quot;,
    &quot;location&quot;: &quot;New York, USA&quot;
  }
}
</code></pre>
<p>You can now correlate message publishing activity from RabbitMQ with AWS Lambda invocation logs, track processing latency, and configure alerts for conditions like drops in consumer throughput or an unexpected surge in RabbitMQ queue depth.</p>
<h3>AWS Lambda Function: Processing RabbitMQ Messages</h3>
<p>This Python-based AWS Lambda function processes audit events received from RabbitMQ. It deduplicates messages based on the <strong>id</strong> field and logs structured event data for downstream analysis or compliance. Save the code below in a file named <strong>app.py</strong>.</p>
<pre><code class="language-python">import json
import logging
import base64
# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# In-memory set to track processed message IDs for deduplication.
# Note: this set is per execution environment and resets on cold starts;
# a durable store would be needed for reliable deduplication in production.
processed_ids = set()
def lambda_handler(event, context):
    logger.info(&quot;Lambda triggered by RabbitMQ event&quot;)
    if 'rmqMessagesByQueue' not in event:
        logger.warning(&quot;Invalid event: missing 'rmqMessagesByQueue'&quot;)
        return {'statusCode': 400, 'body': 'Invalid RabbitMQ event'}
    for queue_name, messages in event['rmqMessagesByQueue'].items():
        logger.info(f&quot;Processing queue: {queue_name}, Messages count: {len(messages)}&quot;)
        for msg in messages:
            try:
                raw_data = msg['data']
                decoded_json = base64.b64decode(raw_data).decode('utf-8')
                message = json.loads(decoded_json)
                logger.info(f&quot;Decoded message: {json.dumps(message)}&quot;)
                message_id = message.get('id')
                if not message_id:
                    logger.warning(&quot;Message missing 'id', skipping.&quot;)
                    continue
                if message_id in processed_ids:
                    logger.warning(f&quot;Duplicate message detected: {message_id}&quot;)
                    continue
                payload = message.get('payload', {})
                logger.info(f&quot;Processing message ID: {message_id}&quot;)
                logger.info(f&quot;Event Type: {message.get('type')}&quot;)
                logger.info(f&quot;User ID: {payload.get('user_id')}&quot;)
                logger.info(f&quot;Event: {payload.get('event')}&quot;)
                logger.info(f&quot;Amount: {payload.get('amount')} {payload.get('currency')}&quot;)
                logger.info(f&quot;Timestamp: {payload.get('timestamp')}&quot;)
                logger.info(f&quot;IP Address: {payload.get('ip')}&quot;)
                logger.info(f&quot;Location: {payload.get('location')}&quot;)
                processed_ids.add(message_id)
            except Exception as e:
                logger.error(f&quot;Error processing message: {str(e)}&quot;)
    return {'statusCode': 200, 'body': 'Messages processed successfully'}

</code></pre>
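<p>To exercise this handler locally, you can build a payload in the same shape Amazon MQ delivers to Lambda: messages grouped per queue under <code>rmqMessagesByQueue</code>, each with a base64-encoded <code>data</code> field. This is a minimal sketch limited to the fields the handler above actually reads; real events also carry metadata such as basic properties, and the queue key typically includes the virtual host (e.g. <code>myQueue::/</code>).</p>
<pre><code class="language-python">import base64
import json

# The audit event used earlier in this post.
audit_event = {
    &quot;id&quot;: &quot;txn-849302&quot;,
    &quot;type&quot;: &quot;audit&quot;,
    &quot;payload&quot;: {&quot;user_id&quot;: &quot;u-10245&quot;, &quot;event&quot;: &quot;funds.transfer&quot;},
}

def make_mq_event(queue, *messages):
    &quot;&quot;&quot;Wrap messages the way Amazon MQ hands them to Lambda:
    grouped by queue name, each body base64-encoded under 'data'.&quot;&quot;&quot;
    return {
        &quot;rmqMessagesByQueue&quot;: {
            queue: [
                {&quot;data&quot;: base64.b64encode(json.dumps(m).encode(&quot;utf-8&quot;)).decode(&quot;ascii&quot;)}
                for m in messages
            ]
        }
    }

event = make_mq_event(&quot;myQueue&quot;, audit_event)

# Round-trip check: decode the first message back out, as the handler does.
decoded = json.loads(
    base64.b64decode(event[&quot;rmqMessagesByQueue&quot;][&quot;myQueue&quot;][0][&quot;data&quot;]).decode(&quot;utf-8&quot;)
)
print(decoded[&quot;id&quot;])  # txn-849302
</code></pre>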
<h3>Setting up AWS Secrets Manager</h3>
<p>To securely store and manage your RabbitMQ credentials, use AWS Secrets Manager.</p>
<ol>
<li>
<p><strong>Create a New Secret:</strong></p>
<ul>
<li>Navigate to the<a href="https://console.aws.amazon.com/secretsmanager/"> AWS Secrets Manager console</a>.</li>
<li>Choose <strong>Store a new secret</strong>.</li>
<li>Select <strong>Other type of secret</strong>.</li>
<li>Enter the following key-value pairs:
<ul>
<li><code>username</code>: Your RabbitMQ username</li>
<li><code>password</code>: Your RabbitMQ password</li>
</ul>
</li>
</ul>
</li>
<li>
<p><strong>Configure the Secret:</strong></p>
<ul>
<li>Provide a meaningful name, such as <code>RabbitMQAccess</code>.</li>
<li>Optionally, add tags and set rotation if needed.</li>
</ul>
</li>
<li>
<p><strong>Store the Secret:</strong></p>
<ul>
<li>Review the settings and store the secret. Note the ARN of the secret you have created.
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/aws-secret-manager-configuration.png" alt="" /></li>
</ul>
</li>
</ol>
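<p>The stored secret ends up as a single JSON document. For the Lambda Amazon MQ event source with <code>BASIC_AUTH</code> (used later in this post), the secret needs exactly these two keys; the values below are placeholders:</p>
<pre><code class="language-json">{
  &quot;username&quot;: &quot;&lt;your RabbitMQ username&gt;&quot;,
  &quot;password&quot;: &quot;&lt;your RabbitMQ password&gt;&quot;
}
</code></pre>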
<h3>Setting up Amazon MQ for RabbitMQ</h3>
<p>To get started with RabbitMQ on Amazon MQ, follow these steps to set up your broker.</p>
<ul>
<li>Open the <a href="https://console.aws.amazon.com/amazonmq/">Amazon MQ console</a>.</li>
<li>Create a new broker with the <strong>RabbitMQ</strong> engine.</li>
<li>Choose your preferred deployment option: <strong>single-instance</strong> or <strong>clustered</strong>.</li>
<li>Use the same <strong>username</strong> and <strong>password</strong> that you previously stored in <strong>AWS Secrets Manager</strong>.</li>
<li>Under <strong>Additional settings</strong>, enable <strong>CloudWatch Logs</strong> for observability.
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-cloudwatch-enable.png" alt="" /></li>
<li>Configure access and security settings, ensuring that the broker is accessible to your AWS Lambda function.</li>
</ul>
<ul>
<li>
<p>After the broker is created, note the following important details:</p>
<ul>
<li>ARN of the RabbitMQ broker.</li>
<li>RabbitMQ web console URL.
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-rabbitmq-configuration-summary.png" alt="" /></li>
</ul>
</li>
<li>
<p>You’ll need the RabbitMQ log group ARN to set up Elastic’s Amazon MQ integration for RabbitMQ. Follow these steps to locate it:</p>
<ul>
<li>Go to the <strong>General – Enabled Logs</strong> section of the broker. </li>
<li>Copy the <strong>CloudWatch log group ARN</strong>.
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-rabbitmq-loggroup-arn.png" alt="" /></li>
</ul>
</li>
</ul>
<h3>Create a RabbitMQ Queue</h3>
<p>Now that the RabbitMQ broker is configured, use the management console to create a queue where messages will be published.</p>
<ul>
<li>Access the RabbitMQ management console using the web console URL.</li>
<li>Create a new queue (example: <strong>myQueue</strong>) to receive messages.
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/rabbitmq-create-queue.png" alt="" /></li>
</ul>
<h3>Build and deploy the AWS Lambda function</h3>
<p>In this section, we'll set up the Lambda function using AWS SAM, add the message processing logic, and deploy it to AWS. This Lambda function will be responsible for consuming messages from RabbitMQ and logging audit events.</p>
<p>Before continuing, make sure you have completed the following prerequisites.</p>
<ul>
<li>
<p><a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/prerequisites.html">AWS SAM prerequisites</a></p>
</li>
<li>
<p><a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html">Install the AWS SAM CLI</a></p>
</li>
</ul>
<p>Next, follow the steps outlined below to continue with the setup.</p>
<ol>
<li>In your command line, run the command <code>sam init</code> from a directory of your choice.</li>
<li>The AWS SAM CLI will walk you through the setup.
<ul>
<li>Select <strong>AWS Quick Start Templates</strong>.</li>
<li>Choose the <strong>Hello World Example</strong>.</li>
<li>Use the <strong>Python</strong> runtime and <strong>zip</strong> package type.</li>
<li>Proceed with the default options.</li>
<li>Name your application as <strong>sample-rabbitmq-app</strong>.</li>
<li>The AWS SAM CLI downloads your starting template and creates the application project directory structure.</li>
</ul>
</li>
<li>From your command line, move to the newly created sample-rabbitmq-app directory.
<ul>
<li>Replace the content of the <strong>hello_world/app.py</strong> file with the Lambda function code for RabbitMQ message processing shown earlier.</li>
<li>In the <strong>template.yaml</strong> file, use the values mentioned below to update the file content.
<pre><code class="language-yaml">Resources:
  SampleRabbitMQApp:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello_world/
      Description: A starter AWS Lambda function.
      MemorySize: 128
      Timeout: 3
      Handler: app.lambda_handler
      Runtime: python3.10
      PackageType: Zip
      Policies:
        - Statement:
            - Effect: Allow
              Resource: '*'
              Action:
                - mq:DescribeBroker
                - secretsmanager:GetSecretValue
                - ec2:CreateNetworkInterface
                - ec2:DescribeNetworkInterfaces
                - ec2:DescribeVpcs
                - ec2:DeleteNetworkInterface
                - ec2:DescribeSubnets
                - ec2:DescribeSecurityGroups
      Events:
        MQEvent:
          Type: MQ
          Properties:
            Broker: &lt;ARN of the Broker&gt;
            Queues:
              - myQueue
            SourceAccessConfigurations:
              - Type: BASIC_AUTH
                URI: &lt;ARN of the secret&gt;
</code></pre></li>
</ul>
</li>
<li>Run the command <code>sam deploy --guided</code> and wait for the confirmation message. This deploys all of the resources.</li>
</ol>
<h3>Sending Audit Events to RabbitMQ and Triggering Lambda</h3>
<p>To test the end-to-end setup, simulate the flow by publishing audit event data into RabbitMQ using its web UI. Once the message is sent, it triggers the Lambda function. </p>
<ol>
<li>
<p>Navigate to the <a href="https://console.aws.amazon.com/amazon-mq/home">Amazon MQ console</a> and select your newly created broker.</p>
</li>
<li>
<p>Locate and open the RabbitMQ web console URL.<br />
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-rabbitmq-webconsole-details.png" alt="" /></p>
</li>
<li>
<p>Under the <strong>Queues and Streams</strong> tab, select the target queue (example: <strong>myQueue</strong>).</p>
</li>
<li>
<p>Enter the message payload, and click <strong>Publish message</strong> to send it to the queue.<br />
Here’s a sample payload published via RabbitMQ:</p>
<pre><code class="language-json">{
  &quot;id&quot;: &quot;txn-849302&quot;,
  &quot;type&quot;: &quot;audit&quot;,
  &quot;payload&quot;: {
    &quot;user_id&quot;: &quot;u-10245&quot;,
    &quot;event&quot;: &quot;funds.transfer&quot;,
    &quot;amount&quot;: 1200.75,
    &quot;currency&quot;: &quot;USD&quot;,
    &quot;timestamp&quot;: &quot;T14:20:15Z&quot;,
    &quot;ip&quot;: &quot;192.168.0.8&quot;,
    &quot;location&quot;: &quot;New York, USA&quot;
  }
}
</code></pre>
</li>
<li>
<p>Navigate to the AWS Lambda function created earlier.</p>
</li>
<li>
<p>Under the <strong>Monitor</strong> tab, click <strong>View CloudWatch logs</strong>.</p>
</li>
<li>
<p>Check the latest log stream to confirm that the Lambda was triggered by Amazon MQ and that the message was processed successfully.
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-lambda-logstream.png" alt="" /></p>
</li>
</ol>
<h1>Configuring Amazon MQ integration for Metrics and Logs collection</h1>
<p>Elastic’s <a href="https://www.elastic.co/docs/reference/integrations/aws_mq">Amazon MQ integration</a> simplifies the collection of logs and metrics from RabbitMQ brokers managed by Amazon MQ. Logs are ingested via <strong>Amazon CloudWatch Logs</strong>, while metrics are fetched from the specified AWS region at a defined interval.</p>
<p>Elastic provides a default configuration for metrics collection. You can accept these defaults or adjust settings such as the <strong>Collection Period</strong> to better fit your needs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-metrics-configuration.png" alt="" /></p>
<p>To enable the collection of logs:</p>
<ol>
<li>Navigate to the <a href="https://console.aws.amazon.com/amazon-mq/home">Amazon MQ console</a> and select the newly created broker.</li>
<li>Click the <strong>Logs</strong> hyperlink under the <strong>General – Enabled Logs</strong> section to open the detailed log settings page.</li>
<li>From this page, copy the <strong>CloudWatch log group ARN</strong>.
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-rabbitmq-loggroup-arn.png" alt="" /></li>
<li>In <strong>Elastic</strong>, set up the <strong>Amazon MQ integration</strong> and paste the CloudWatch log group ARN.
<img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-logs-configuration.png" alt="" /></li>
<li><strong>Accept Defaults or Customize Settings</strong> – Elastic provides a <strong>default configuration</strong> for logs collection. You can accept these defaults or adjust settings such as <strong>collection intervals</strong> to better fit your needs.</li>
</ol>
<h3>Visualizing RabbitMQ Workloads with the Pre-Built Amazon MQ Dashboard</h3>
<p>You can access the RabbitMQ dashboard in either of two ways:</p>
<ol>
<li>
<p>Navigate to the Dashboard Menu – Select the Dashboard menu option in Elastic and search for <strong>[Amazon MQ] RabbitMQ Overview</strong> to open the dashboard.</p>
</li>
<li>
<p>Navigate to the Integrations Menu – Open the <strong>Integrations</strong> menu in Elastic, select <strong>Amazon MQ</strong>, go to the <strong>Assets</strong> tab, and choose <strong>[Amazon MQ] RabbitMQ Overview</strong> from the dashboard assets.</p>
</li>
</ol>
<p>The Amazon MQ RabbitMQ dashboard in the Elastic integration delivers a comprehensive overview of broker health and messaging activity. It provides real-time insights into broker resource utilization, queue and topic performance, connection trends, and messaging throughput. The dashboard helps users track system behaviour, detect performance bottlenecks, and ensure reliable message delivery across distributed applications.</p>
<h4>Broker Metrics</h4>
<p>This section provides a centralized view of the overall health and performance of the RabbitMQ broker on Amazon MQ. The visualizations highlight the number of configured exchanges and queues, active broker connections, producers, consumers, and total messages in flight. System-level metrics such as CPU utilization, memory consumption, and free disk space help assess whether the broker has sufficient resources to handle current workloads.</p>
<p>Message flow metrics such as publish rate, confirmation rate, and acknowledgement rate are displayed to provide visibility into how messages are processed through the broker. Monitoring trends in these values helps detect message delivery issues, throughput degradation, or potential saturation of the broker under load.</p>
<h4>Node Metrics</h4>
<p>Node-level visibility helps identify resource imbalances across nodes in clustered RabbitMQ setups. This section includes per-node CPU usage, memory consumption, and available disk space, offering insight into the underlying infrastructure's ability to support broker operations.</p>
<h4>Queue Metrics</h4>
<p>Queue-specific insights are critical for understanding message delivery patterns and backlog conditions. This section details total messages, ready messages, and unacknowledged messages, segmented by broker, virtual host, and queue.</p>
<p>By observing how these counts change over time, users can identify slow consumers, message build-ups, or delivery issues that may affect application performance or lead to dropped messages under pressure.</p>
<h4>Logs</h4>
<p>This section displays log level, process ID, and raw message content. These logs provide immediate visibility into events such as connection failures, resource thresholds being hit, or unexpected queue behaviors.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-rabbitmq-dashboard.png" alt="" /></p>
<h3>Detecting Queue Backlogs with Alerting Rules</h3>
<p>Elastic’s <a href="https://www.elastic.co/docs/solutions/observability/incident-management/alerting">alert</a> framework allows you to define rules that monitor critical RabbitMQ metrics and automatically trigger actions when specific thresholds are breached. </p>
<h4>Alert: Queue Backlog (Message Ready or Unacknowledged Messages)</h4>
<p>This alert helps detect queue backlog in Amazon MQ by evaluating two metrics:</p>
<ul>
<li><code>MessageUnacknowledgedCount.max</code></li>
<li><code>MessageReadyCount.max</code></li>
</ul>
<p>The alert is triggered if either condition persists for more than <strong>10 minutes</strong>:</p>
<ul>
<li><code>MessageUnacknowledgedCount.max</code> exceeds <strong>5,000</strong></li>
<li><code>MessageReadyCount.max</code> exceeds <strong>7,000</strong></li>
</ul>
<p>These thresholds should be adjusted based on typical message volume and consumer throughput. Sustained high values can indicate that consumers are not keeping up or that message delivery pipelines are congested, potentially causing processing delays or dropped messages if not addressed.</p>
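<p>Conceptually, the rule evaluates a condition like the following over per-minute metric samples. This is an illustrative sketch of the alert logic, not Elastic's implementation; the thresholds match those above.</p>
<pre><code class="language-python">UNACKED_LIMIT = 5000   # MessageUnacknowledgedCount.max threshold
READY_LIMIT = 7000     # MessageReadyCount.max threshold
WINDOW_MINUTES = 10    # the condition must persist this long

def backlog_alert(samples):
    &quot;&quot;&quot;samples: list of (unacked_max, ready_max) tuples, one per minute,
    oldest first. Fires only if either metric stays above its threshold
    for the entire 10-minute window.&quot;&quot;&quot;
    if len(samples) &lt; WINDOW_MINUTES:
        return False
    window = samples[-WINDOW_MINUTES:]
    unacked_breach = all(u &gt; UNACKED_LIMIT for u, _ in window)
    ready_breach = all(r &gt; READY_LIMIT for _, r in window)
    return unacked_breach or ready_breach
</code></pre>
<p>For example, ten consecutive minutes of 6,000 unacknowledged messages would fire the alert, while a brief spike would not.</p>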
<p><img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-alert-configuration.png" alt="" /></p>
<h3>Tracking Resource Utilization to Maintain RabbitMQ Performance</h3>
<p>Elastic’s <a href="https://www.elastic.co/docs/solutions/observability/incident-management/service-level-objectives-slos">Service-level objectives (SLOs)</a> capabilities allow you to define and monitor performance targets using key indicators like latency, availability, and error rates. Once configured, Elastic continuously evaluates these SLOs in real time, offering intuitive dashboards, alerts for threshold violations, and insights into error budget consumption. This enables teams to stay ahead of issues, ensuring service reliability and consistent performance.</p>
<h4>SLO: Node Resource Health (CPU, Memory, Disk)</h4>
<p>This SLO focuses on ensuring RabbitMQ brokers and nodes have sufficient resources to process messages without performance degradation. It tracks CPU, memory, and disk usage across RabbitMQ brokers and nodes to prevent resource exhaustion that could lead to service interruptions.</p>
<p><strong>Target thresholds:</strong></p>
<ul>
<li><code>SystemCpuUtilization.max</code> remains below <strong>85%</strong> for <strong>99%</strong> of the time.</li>
<li><code>RabbitMQMemUsed.max</code> remains below <strong>80%</strong> of <code>RabbitMQMemLimit.max</code> for <strong>99%</strong> of the time.</li>
<li><code>RabbitMQDiskFree.min</code> remains above <strong>25%</strong> of <code>RabbitMQDiskFreeLimit.max</code> for <strong>99%</strong> of the time.</li>
</ul>
<p>Sustained high values in CPU or memory usage can signal resource contention, which may result in slower message processing or downtime. Low disk availability may cause the broker to stop accepting messages, risking message loss. These thresholds are designed to catch early signs of resource saturation and ensure smooth, uninterrupted message flow across RabbitMQ deployments.</p>
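<p>Each of these SLOs boils down to the fraction of observation intervals that meet the threshold, compared against the 99% target. A minimal sketch of that computation over hypothetical samples (not Elastic's rollup logic):</p>
<pre><code class="language-python">def slo_compliance(samples, good):
    &quot;&quot;&quot;Fraction of samples for which the predicate `good` holds.&quot;&quot;&quot;
    return sum(1 for s in samples if good(s)) / len(samples)

# Example: CPU utilization samples (percent); the SLO asks for
# SystemCpuUtilization.max below 85% at least 99% of the time.
cpu_samples = [40] * 198 + [90] * 2   # 200 samples, 2 breaches
ratio = slo_compliance(cpu_samples, lambda c: c &lt; 85)
print(ratio &gt;= 0.99)  # True: exactly at the 99% target
</code></pre>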
<p><img src="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/amazonmq-slo-configuration.png" alt="" /></p>
<h2>Conclusion</h2>
<p>As RabbitMQ-based messaging architectures scale and become more complex, the need for in-depth visibility into system performance and potential issues deepens. Elastic’s <a href="https://www.elastic.co/docs/reference/integrations/aws_mq">Amazon MQ integration</a> brings that visibility front and center—helping you go beyond basic health checks to understand real-time messaging throughput, queue backlog trends, and resource saturation across your brokers and consumers.</p>
<p>By leveraging the prebuilt dashboards and configuring alerts and SLOs, you can proactively detect anomalies, fine-tune consumer performance, and ensure reliable delivery across your event-driven applications.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/amazonmq-observability-rabbitmq-integration/AmazonMQ-observability-RabbitMQ.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Troubleshooting your Agents and Amazon Bedrock AgentCore with Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-agentic-ai-observability-amazon-bedrock-agentcore</link>
            <guid isPermaLink="false">llm-agentic-ai-observability-amazon-bedrock-agentcore</guid>
            <pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Discover how to achieve end-to-end observability for Amazon Bedrock AgentCore: from tracking service health and token costs to debugging complex reasoning loops with distributed tracing.]]></description>
            <content:encoded><![CDATA[<h2>Troubleshooting your Agents and Amazon Bedrock AgentCore with Elastic Observability</h2>
<h3>Introduction</h3>
<p>We're excited to introduce Elastic Observability’s Amazon Bedrock AgentCore integration, which allows users to observe <a href="https://aws.amazon.com/bedrock/agentcore/">Amazon Bedrock AgentCore</a> and the agents' LLM interactions end-to-end. Agentic AI represents a fundamental shift in how we build applications. </p>
<p>Unlike standard LLM chatbots that simply generate text, agents can reason, plan, and execute multi-step workflows to complete complex tasks autonomously. These agents often run on a platform such as Amazon Bedrock AgentCore, which helps developers build, deploy, and scale them. Amazon Bedrock AgentCore provides the secure, scalable, and modular infrastructure services (such as agent runtime, memory, and identity) that developers need to deploy and operate highly capable AI agents built with any framework or model.</p>
<p>Using a platform such as Amazon Bedrock AgentCore is easy, but troubleshooting an agent is far more complex than debugging a standard microservice. Key challenges include:</p>
<ul>
<li>
<p><strong>Non-Deterministic Behavior:</strong> Agents may choose different tools or reasoning paths for the same prompt, making it difficult to reproduce bugs.</p>
</li>
<li>
<p><strong>&quot;Black Box&quot; Execution:</strong> When an agent fails or provides a hallucinated answer, it is often unclear if the issue lies in the LLM's reasoning, the context provided, or a failed tool execution.</p>
</li>
<li>
<p><strong>Cost &amp; Latency Blind Spots:</strong> A single user query can trigger recursive loops or expensive multi-step tool calls, leading to unexpected spikes in token usage and latency.</p>
</li>
</ul>
<p>To effectively observe these systems, you need to correlate signals from two distinct layers:</p>
<ol>
<li>
<p><strong>The Platform Layer (Amazon Bedrock AgentCore):</strong> You need to understand the overall health of the managed service. This includes high-level metrics like invocation counts, latency, throttling, and platform-level errors that affect all agents running in AgentCore.</p>
</li>
<li>
<p><strong>The Application Layer (Your Agentic Logic):</strong> You want to understand the granular &quot;why&quot; behind the behavior. This includes distributed traces, usually with OpenTelemetry, that visualize the full request lifecycle (e.g. waterfall view), identifying exactly which step in the reasoning chain failed or took too long.</p>
</li>
</ol>
<p><strong>Agentic AI Observability in Elastic</strong> provides a unified, end-to-end view of your agentic deployment by combining platform-level insights from Amazon Bedrock AgentCore, through the new <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock_agentcore">Amazon Bedrock AgentCore integration</a>, with deep application-level visibility from OpenTelemetry (OTel) traces, logs, and metrics from the agent. This unified view in Elastic allows you to observe, troubleshoot, and optimize your agentic applications from end to end without switching tools. Additionally, Elastic provides Agent Builder, which allows you to create agents to analyze any of the data from Amazon Bedrock AgentCore and the agents running on it.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-dashboard-runtime-gateway-traces.jpg" alt="Amazon Bedrock AgentCore Dashboards for Runtime and Gateway and APM Tracing" /></p>
<h2>Agentic AI Observability in Elastic</h2>
<p>End-to-end Agentic AI Observability in Elastic has three main parts.</p>
<ul>
<li>
<p><strong>Amazon Bedrock AgentCore Platform Observability -</strong> using platform logs and metrics,  Elastic provides comprehensive visibility into the high-level health of the AgentCore service by ingesting AWS vended logs and metrics across four critical components:</p>
<ul>
<li>
<p><strong>Runtime:</strong> Monitor core performance indicators such as agent errors, overall latency, throttle counts, and invocation rates for each endpoint.</p>
</li>
<li>
<p><strong>Gateway:</strong> Gain specific insights into gateway and tool-call performance, including invocations, error rates, and latency.</p>
</li>
<li>
<p><strong>Memory:</strong> Track short-term and long-term memory operations, including event creation, retrieval, and listing, alongside performance analysis, errors, and latency metrics.</p>
</li>
<li>
<p><strong>Identity:</strong> Audit security and access health with logs on successful and failed access attempts.</p>
</li>
</ul>
</li>
</ul>
<ul>
<li><strong>Agent Observability with APM, logs and metrics -</strong> To understand <em>how</em> your agent is behaving, Elastic ingests OTel-native traces, metrics and logs from your application running within AgentCore. This allows you to visualize the full execution path, including LLM reasoning steps and tool calls, in a detailed waterfall diagram. </li>
</ul>
<ul>
<li><strong>Agentic AI Analysis</strong> - All of the data from Amazon Bedrock AgentCore and the agent running on it, can be analyzed with <strong>Elastic’s AI driven capabilities</strong>. These include:</li>
</ul>
<ul>
<li>
<p><strong>Elastic AgentCore SRE Agent built on Elastic Agent Builder</strong> - We don't just monitor agents; we provide you with one to assist your team. The <strong>AgentCore SRE Agent</strong> is a specialized assistant built using <strong>Elastic Agent Builder</strong>. It possesses specialized knowledge of AgentCore applications observed in Elastic.</p>
<ul>
<li>
<p><strong>How it helps:</strong> You can ask specific questions regarding your AgentCore environment, such as how to interpret a complex error log or why a specific trace shows latency.</p>
</li>
<li>
<p><strong>Get the Agent:</strong> You can deploy this agent yourself from our <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/elastic_agentcore_sre_agent">GitHub repository</a>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Elastic Observability AI Assistant</strong> - Use natural language anywhere in Elastic’s UI to pinpoint issues, analyze something specific, or simply understand a problem using the LLM's knowledge base. Additionally, SREs can interpret log messages, errors, and metric patterns, optimize code, write reports, and even identify and execute a runbook or find a related GitHub issue.</p>
</li>
<li>
<p><strong>Streams - AI-Driven Log Analysis:</strong> When you send AgentCore logs from your instrumented application into Elastic, you can parse and analyze them. Additionally, Streams surfaces <strong>Significant Events</strong> within your log stream, allowing you to focus immediately on what matters most.</p>
</li>
<li>
<p><strong>Dashboards and ES|QL -</strong> Data is only useful if you can act on it. Elastic provides out-of-the-box (OOTB) assets to accelerate your mean time to resolution (MTTR), and ES|QL lets you perform ad-hoc analysis on any signal.</p>
<ul>
<li>
<p><strong>OOTB Dashboards:</strong> Pre-built visualizations based on AgentCore service signals. These dashboards provide an immediate, high-level overview of the usage, health, and performance of your AgentCore runtime, gateway, memory, and identity components.</p>
</li>
<li>
<p><strong>OOTB Alert Templates:</strong> Pre-configured alerts for common agentic issues (e.g., high error rates, latency spikes, or unusual token consumption), allowing you to move from reactive to proactive troubleshooting immediately.</p>
</li>
</ul>
</li>
</ul>
<h2>Onboarding Amazon Bedrock AgentCore signals into Elastic</h2>
<h3>Amazon Bedrock AgentCore Integration</h3>
<p>To get started with platform-level visibility, you need to enable the <strong>Amazon Bedrock AgentCore</strong> integration in Elastic. This integration automatically collects metrics and logs from your AgentCore runtime, gateway, memory, and identity components via Amazon CloudWatch.</p>
<p><strong>Setup Steps:</strong></p>
<ol>
<li>
<p><strong>Prepare AWS Environment:</strong> Ensure your AgentCore agents are deployed and running and that you have enabled logging on your AgentCore resources in the AWS console.</p>
</li>
<li>
<p><strong>Add the Integration:</strong></p>
<ul>
<li>
<p>In Elastic (Kibana), navigate to <strong>Integrations</strong>.</p>
</li>
<li>
<p>Search for <strong>&quot;Amazon Bedrock AgentCore&quot;</strong>. Select <strong>Add Amazon Bedrock AgentCore</strong>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Configure &amp; Deploy:</strong></p>
<p>Configure Elastic's Amazon Bedrock AgentCore integration to collect CloudWatch metrics from your chosen AWS region at the specified collection interval. Logs will be added soon after the publication of this blog.</p>
</li>
</ol>
<h3>Onboard the Agent with OTel Instrumentation</h3>
<p>The next step is observing the application logic itself. The beauty of Amazon Bedrock AgentCore is that the application runtime often comes pre-instrumented. You simply need to tell it where to send the telemetry data.</p>
<p>For this example, we will use the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/travel_assistant"><strong>Travel Assistant</strong></a> from the Elastic Observability examples.</p>
<p>To instrument this agent, you do not need to modify the source code. Instead, when you invoke the agent using the <code>agentcore</code> CLI, you simply pass your Elastic connection details as environment variables. This redirects the OTel signals (traces, metrics, and logs) directly to the Elastic EDOT collector.</p>
<p><strong>Example Invoke Command:</strong> Run the following command to launch the agent and start streaming telemetry to Elastic:</p>
<pre><code class="language-bash">    agentcore launch \
    --env BEDROCK_MODEL_ID=&quot;us.anthropic.claude-3-5-sonnet-20240620-v1:0&quot; \
    --env OTEL_EXPORTER_OTLP_ENDPOINT=&quot;https://&lt;REPLACE_WITH_ELASTIC_ENDPOINT&gt;.region.cloud.elastic.co:443&quot; \
    --env OTEL_EXPORTER_OTLP_HEADERS=&quot;Authorization=ApiKey &lt;REPLACE_WITH_YOUR_API_KEY&gt;&quot; \
    --env OTEL_EXPORTER_OTLP_PROTOCOL=&quot;http/protobuf&quot; \
    --env OTEL_METRICS_EXPORTER=&quot;otlp&quot; \
    --env OTEL_TRACES_EXPORTER=&quot;otlp&quot; \
    --env OTEL_LOGS_EXPORTER=&quot;otlp&quot; \
    --env OTEL_RESOURCE_ATTRIBUTES=&quot;service.name=travel_assistant,service.version=1.0.0&quot; \
    --env AGENT_OBSERVABILITY_ENABLED=&quot;true&quot; \
    --env DISABLE_ADOT_OBSERVABILITY=&quot;true&quot; \
    --env TAVILY_API_KEY=&quot;&lt;REPLACE_WITH_YOUR_TAVILY_KEY&gt;&quot;
</code></pre>
<p><strong>Key Configuration Parameters:</strong></p>
<ul>
<li>
<p><code>OTEL_EXPORTER_OTLP_ENDPOINT</code>: Your Elastic OTLP endpoint (ensure port 443 is specified).</p>
</li>
<li>
<p><code>OTEL_EXPORTER_OTLP_HEADERS</code>: The Authorization header containing your Elastic API Key.</p>
</li>
<li>
<p><code>DISABLE_ADOT_OBSERVABILITY=true</code>: This ensures the native AgentCore signals are routed exclusively to your defined endpoint (Elastic) rather than default AWS paths.</p>
</li>
</ul>
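<p>As a quick sanity check before running <code>agentcore launch</code>, you can validate these two values programmatically. The sketch below encodes the guidance above (an https endpoint with port 443, and an <code>ApiKey</code> authorization header); the helper name and checks are illustrative assumptions, not rules enforced by the CLI:</p>
<pre><code class="language-python"># Minimal sketch: sanity-check the OTLP-related values before launching.
# The rules below mirror the guidance in this blog (https + port 443, ApiKey
# header); they are illustrative assumptions, not CLI requirements.
def validate_otlp_env(env):
    problems = []
    endpoint = env.get('OTEL_EXPORTER_OTLP_ENDPOINT', '')
    headers = env.get('OTEL_EXPORTER_OTLP_HEADERS', '')
    if not endpoint.startswith('https://') or not endpoint.endswith(':443'):
        problems.append('endpoint should use https and specify port 443')
    if not headers.startswith('Authorization=ApiKey '):
        problems.append('headers should carry an ApiKey authorization value')
    return problems
</code></pre>
<p>An empty list means both values look plausible; anything else is worth fixing before telemetry silently fails to arrive.</p>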
<h2>Analyzing Agentic Data in Elastic Observability</h2>
<p>As we walk through the analysis features below, we will use the Travel Assistant agent we instrumented earlier; the same approach applies to any other apps you may be running on AgentCore. For this example, as a second agent, we will use the <a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/02-use-cases/customer-support-assistant"><strong>Customer Support Assistant</strong></a> from the AWS Labs AgentCore samples.</p>
<h3>Out-of-the-Box (OOTB) Dashboards</h3>
<p>Elastic populates a set of comprehensive dashboards based on Amazon Bedrock AgentCore service logs and metrics. These appear as a unified view with tabs, providing a &quot;single pane of glass&quot; into the operational health of your platform.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-integration-dashboards-runtime-gateway-memory-identity.gif" alt="Amazon Bedrock AgentCore out-of-the-box Dashboards for Runtime, Gateway, Memory and Identity" /></p>
<p>This view is divided into four key zones, each addressing a specific AgentCore component: Runtime, Gateway, Memory, and Identity. Note that not all agentic applications use all four components. In our example, only the Customer Support Assistant uses all four, whereas the Travel Assistant uses only Runtime.</p>
<p><strong>Runtime Health</strong></p>
<hr />
<p>Visualize agent invocations, session metrics, error trends (system vs. user), and performance stats like latency and throttling, split per endpoint. This dashboard helps you answer questions like:</p>
<ul>
<li>&quot;How are my Travel Assistant agent and Customer Support agent performing in terms of overall traffic and latency, and are there any spikes in errors or throttling?&quot;</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-runtime-dashboard.jpg" alt="Amazon Bedrock AgentCore out-of-the-box Dashboard for AgentCore Runtime" /></p>
<p><strong>Gateway Performance</strong></p>
<hr />
<p>Analyze invocations across Lambda and MCP (Model Context Protocol), with detailed breakdowns for tool vs. non-tool calls. The dashboard highlights throttling detection, target execution times, and separates system errors from user errors.</p>
<ul>
<li><em>Question answered:</em> &quot;Are my external integrations (Lambda, MCP) performing efficiently, or are specific tool calls experiencing high latency, throttling, or system-level errors?&quot;</li>
</ul>
<p><strong>Memory Operations</strong></p>
<hr />
<p>Track core operations like event creation, retrieval, and listing, alongside deep dives into long-term memory processing. This includes extraction and consolidation metrics broken down by strategy type, as well as specific monitoring for throttling and system vs. user errors.</p>
<ul>
<li><em>Question answered:</em> &quot;Are failures in memory consolidation strategies or high retrieval latency preventing the agent from effectively recalling user context?&quot;</li>
</ul>
<p><strong>Identity &amp; Access</strong></p>
<hr />
<p>Monitor identity token fetch operations (workload, OAuth, API keys) and real-time authentication success/failure rates. The dashboard breaks down activity by provider and highlights throttling or capacity bottlenecks.</p>
<ul>
<li><em>Question answered:</em> &quot;Are authentication failures or token fetch bottlenecks from specific providers preventing agents from accessing required resources?&quot;</li>
</ul>
<h3>Out-of-the-Box (OOTB) Alert Templates</h3>
<p>Observability isn't just about looking at dashboards; it's about knowing when to act. To move from reactive checking to proactive monitoring, Elastic provides <strong>OOTB Alert Rule Templates</strong> (starting with Elastic version 9.2.1).</p>
<p>These templates eliminate guesswork by pre-selecting the optimal metrics to monitor and applying sensible thresholds. This configuration focuses on high-fidelity alerts for genuine anomalies, helping you catch critical issues early while minimizing alert fatigue.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/elastic-alert-rule-templates-for-amazon-bedrock-agentcore.jpg" alt="Amazon Bedrock AgentCore out-of-the-box Alert rule templates for AgentCore" /></p>
<p><strong>Suggested OOTB Alerts:</strong></p>
<ul>
<li>
<p><strong>Agent Runtime System Errors:</strong> Detects server-side errors (500 Internal Server Error) during agent runtime invocations, indicating infrastructure or service issues with AWS Bedrock AgentCore.</p>
</li>
<li>
<p><strong>Agent Runtime User Errors:</strong> Flags client-side errors (4xx) during agent runtime invocations, including validation failures (400), resource not found (404), access denied (403), and resource conflicts (409). This helps catch misconfigured permissions, invalid input, or missing resources early.</p>
</li>
<li>
<p><strong>Agent Runtime High Latency:</strong> Triggers when the average latency for agent runtime invocations exceeds 10 seconds (10,000ms). Latency measures the time elapsed between receiving a request and sending the final response token.</p>
</li>
</ul>
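<p>Conceptually, the latency template's trigger condition is simple: compare the average invocation latency over the evaluation window against the 10,000 ms threshold. A minimal sketch (the function and window handling are our illustration, not the actual rule engine):</p>
<pre><code class="language-python"># Sketch of the high-latency rule logic: fire when the average invocation
# latency in the evaluation window exceeds 10,000 ms. Illustrative only;
# the real rule is evaluated by Elastic against CloudWatch-derived metrics.
def high_latency_alert(latencies_ms, threshold_ms=10000):
    if not latencies_ms:
        return False
    return sum(latencies_ms) / len(latencies_ms) > threshold_ms
</code></pre>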
<h3>APM Tracing</h3>
<p>While logs and metrics tell you <em>that</em> an issue exists, <strong>APM Tracing</strong> tells you exactly <em>where</em> and <em>why</em> it is happening. By ingesting the OpenTelemetry signals from your instrumented agent, Elastic generates a detailed distributed trace (e.g. waterfall view) for every interaction. For further LLM details such as prompts, responses, and token usage, you can explore the APM logs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/otel-native-strands-agent-traces-in-elastic-apm.jpg" alt="Amazon Bedrock AgentCore OTel-native distributed tracing waterfall diagram in Elastic APM" /></p>
<p>This allows you to peer inside the &quot;black box&quot; of the agent's execution flow:</p>
<ul>
<li>
<p><strong>Visualize the Chain of Thought:</strong> See the full sequence of events, from the user's initial prompt to the final response, including all intermediate reasoning steps.</p>
</li>
<li>
<p><strong>Pinpoint Tool Failures:</strong> Identify exactly which external tool (e.g., a Lambda function for flight booking or a knowledge base query) failed or timed out.</p>
</li>
<li>
<p><strong>Analyze Latency Contributors:</strong> Distinguish between latency caused by the LLM's generation time versus latency caused by slow downstream API calls.</p>
</li>
<li>
<p><strong>Debug with Context:</strong> Drill down into individual spans to see specific error messages, attributes, and metadata that explain why a particular step failed.</p>
</li>
</ul>
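<p>The latency-contributor analysis above amounts to aggregating span durations by kind in the trace waterfall. A minimal sketch using hypothetical span data (the kinds and durations are illustrative, not Elastic APM's internal model):</p>
<pre><code class="language-python"># Minimal sketch: group span durations by kind to see whether LLM generation
# or downstream tool calls dominate a trace. Span kinds and durations here are
# hypothetical illustrations, not Elastic APM's internal representation.
def latency_breakdown(spans):
    totals = {}
    for kind, duration_ms in spans:
        totals[kind] = totals.get(kind, 0) + duration_ms
    return totals
</code></pre>
<p>For a trace like <code>[('llm', 1200), ('tool', 300), ('llm', 800)]</code>, this shows the LLM accounting for 2,000 ms versus 300 ms of tool time.</p>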
<h2>Conclusion</h2>
<p>As organizations move from experimental chatbots to complex, autonomous agents in production, the need for robust observability has never been greater. Agentic applications introduce new layers of complexity—non-deterministic behaviors, multi-step reasoning loops, and cost implications—that standard monitoring tools simply cannot see.</p>
<p>Elastic Agentic AI Observability for Amazon Bedrock AgentCore bridges this gap. By unifying platform-level health metrics from AgentCore with deep, transaction-level distributed tracing from OpenTelemetry, Elastic gives SREs and developers the complete picture. Whether you are debugging a failed tool call, optimizing latency, or controlling token costs, you have the visibility needed to run agentic AI with confidence.</p>
<p><strong>Complete Visibility: AgentCore + Amazon Bedrock:</strong> For the most comprehensive view, we recommend onboarding Elastic’s <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock"><strong>Amazon Bedrock</strong> integration</a> alongside AgentCore. While the AgentCore integration focuses on the orchestration layer—monitoring agent errors, tool latency, and invocations—the Bedrock integration provides deep visibility into the underlying foundation models themselves. This includes tracking model-specific latency, token usage, full prompts and responses, and even <strong>Guardrails</strong> usage and effectiveness. By combining both, you ensure complete coverage from the high-level agent workflow down to the raw model inference.</p>
<ul>
<li>
<p><strong>Read more:</strong><a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock"> Monitor Amazon Bedrock with Elastic</a></p>
</li>
<li>
<p><strong>Read more:</strong><a href="https://www.elastic.co/observability-labs/blog/llm-observability-amazon-bedrock-guardrails"> Amazon Bedrock Guardrails Observability</a></p>
</li>
</ul>
<p><strong>Get Started Today</strong> - Ready to see your agents in action?</p>
<ul>
<li>
<p><strong>Try it out:</strong> Log in to <a href="https://cloud.elastic.co/login">Elastic Cloud</a> and add the Amazon Bedrock AgentCore integration, or use <a href="https://aws.amazon.com/marketplace/seller-profile?id=d8f59038-c24c-4a9d-a66d-6711d35d7305">Elastic from the AWS Marketplace</a>.</p>
</li>
<li>
<p><strong>Explore the Code:</strong> Check out our GitHub repository for the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/travel_assistant">Travel assistant</a> which you saw in this blog, as well as the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/elastic_agentcore_sre_agent">AgentCore SRE Agent</a>.</p>
</li>
<li>
<p><strong>Learn More:</strong> Read the <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock_agentcore">full documentation</a> on setting up the Agentic AI Observability integration for Amazon Bedrock AgentCore.</p>
</li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/agentcore-blog.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[LLM observability with Elastic: Taming the LLM with Guardrails for Amazon Bedrock]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-amazon-bedrock-guardrails</link>
            <guid isPermaLink="false">llm-observability-amazon-bedrock-guardrails</guid>
            <pubDate>Sun, 02 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic’s enhanced Amazon Bedrock integration for Observability now includes Guardrails monitoring, offering real-time visibility into AI safety mechanisms. Track guardrail performance, usage, and policy interventions with pre-built dashboards. Learn how to set up observability for Guardrails and monitor key signals to strengthen safeguards against hallucinations, harmful content, and policy violations.]]></description>
            <content:encoded><![CDATA[<p>In a previous <a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock">blog</a> we showed you how to set up observability for your models hosted on Amazon Bedrock using Elastic’s integration. You can now effortlessly enable observability for your Amazon Bedrock guardrails using the enhanced <a href="https://www.elastic.co/guide/en/integrations/current/aws_bedrock.html">Elastic Amazon Bedrock integration</a>. If you previously onboarded the Amazon Bedrock integration, just upgrade it and you will automatically get all guardrails-related updates. The enhanced integration provides a single pane of glass dashboard with two panels: one focusing on overall Bedrock visualizations and a separate panel dedicated to Guardrails. You can now ingest and visualize metrics and logs specific to Guardrails, such as guardrail invocation count, invocation latency, text unit utilization, guardrail policy types associated with interventions, and many more.</p>
<p>In this blog we will show you how to set up observability for Amazon Bedrock Guardrails, how to make use of the enhanced dashboards, and what key signals to alert on for effective observability coverage of your Bedrock guardrails.</p>
<h2>Prerequisites</h2>
<p>To follow along with this blog, please make sure you have:</p>
<ul>
<li>An account on <a href="http://cloud.elastic.co/">Elastic Cloud</a> and a deployed stack in AWS (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>). Ensure you are using version 8.16.2 or higher. Alternatively, you can use <a href="https://www.elastic.co/cloud/serverless">Elastic Cloud Serverless</a>, a fully managed solution that eliminates infrastructure management, automatically scales based on usage, and lets you focus entirely on extracting value from your data.</li>
<li>An AWS account with permissions to pull the necessary data from AWS. See <a href="https://docs.elastic.co/en/integrations/aws#aws-permissions">details in our documentation</a>.</li>
</ul>
<h2>Steps to create a guardrail for Amazon Bedrock</h2>
<p>Before you set up observability for the guardrails, ensure that you have configured guardrails for your model. Follow the steps below to create an Amazon Bedrock Guardrail:</p>
<ol>
<li><strong>Access the Amazon Bedrock Console</strong>
<ul>
<li>Sign in to the AWS Management Console with appropriate permissions and navigate to the Amazon Bedrock console.</li>
</ul>
</li>
<li><strong>Navigate to Guardrails</strong>
<ul>
<li>From the left-hand menu, select <strong>Guardrails</strong>.</li>
</ul>
</li>
<li><strong>Create a New Guardrail</strong>
<ul>
<li>Select <strong>Create guardrail</strong>.</li>
<li>Provide a descriptive name, an optional brief description, and specify a message to display when the guardrail blocks the user prompt.
<ul>
<li>Example: <em>Sorry, I am not configured to answer such questions. Kindly ask a different question.</em></li>
</ul>
</li>
</ul>
</li>
<li><strong>Configure Guardrail Policies</strong>
<ul>
<li><strong>Content Filters</strong>: Adjust settings to block harmful content and prompt attacks.</li>
<li><strong>Denied Topics</strong>: Specify topics to block.</li>
<li><strong>Word Filters</strong>: Define specific words or phrases to block.</li>
<li><strong>Sensitive Information Filters</strong>: Set up filters to detect and remove sensitive information.</li>
<li><strong>Contextual Grounding</strong>:
<ul>
<li>Configure the <strong>Grounding Threshold</strong> to set the minimum confidence level for factual accuracy.</li>
<li>Set the <strong>Relevance Threshold</strong> to ensure responses align with user queries.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Review and Create</strong>
<ul>
<li>Review your settings and select <strong>Create</strong> to finalize the guardrail.</li>
</ul>
</li>
<li><strong>Create a Guardrail Version</strong>
<ul>
<li>In the <strong>Version</strong> section, select <strong>Create</strong>.</li>
<li>Optionally add a description, then select <strong>Create Version</strong>.</li>
</ul>
</li>
</ol>
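<p>The console steps above can also be scripted with the boto3 <code>bedrock</code> control-plane client's <code>create_guardrail</code> API. The sketch below builds only the request payload for the contextual grounding step; the helper name and default thresholds are our assumptions, and the full API supports many more policy configuration blocks:</p>
<pre><code class="language-python"># Sketch: payload for boto3's bedrock.create_guardrail covering the contextual
# grounding step above. Helper name and default thresholds are illustrative;
# see the boto3 documentation for the remaining policy configuration blocks.
def grounding_guardrail_request(name, blocked_message, grounding=0.99, relevance=0.99):
    return {
        'name': name,
        'blockedInputMessaging': blocked_message,
        'blockedOutputsMessaging': blocked_message,
        'contextualGroundingPolicyConfig': {
            'filtersConfig': [
                {'type': 'GROUNDING', 'threshold': grounding},
                {'type': 'RELEVANCE', 'threshold': relevance},
            ]
        },
    }
</code></pre>
<p>You would pass this payload to <code>boto3.client('bedrock').create_guardrail(**request)</code> and then create a version, mirroring the console flow.</p>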
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/guardrails-policy-configuration.png" alt="Amazon Bedrock Guardrails Policy Configurations" /></p>
<p>After creating a version of your guardrail, it's important to note down the <strong>Guardrail ID</strong> and the <strong>Guardrail Version Name</strong>. These identifiers are essential when integrating the guardrail into your application, as you'll need to specify them during guardrail invocation.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/guardrails-creation-confirmations.png" alt="Amazon Bedrock Guardrails Policy Version" /></p>
<h2>Example code to integrate with Amazon Bedrock guardrails</h2>
<p>Integrating Amazon Bedrock's ChatBedrock into your Python application enables advanced language model interactions with customizable safety measures. By configuring guardrails, you can ensure that the model adheres to predefined policies, preventing it from generating inappropriate or sensitive content.</p>
<p>The following code demonstrates how to integrate Amazon Bedrock with guardrails to enforce contextual grounding in AI-generated responses. It sets up a Bedrock client using AWS credentials, defines a reference grounding statement, and uses the ChatBedrock API to process user queries with contextual constraints. The <strong>converse_with_guardrails</strong> function sends a user query alongside a predefined grounding reference, ensuring that responses align with the provided knowledge source.</p>
<h3>Setting Up Environment Variables</h3>
<p>Before running the script, configure the required <strong>AWS credentials</strong> and <strong>guardrail settings</strong> as environment variables. These variables allow the script to authenticate with Amazon Bedrock and apply the necessary guardrails for safe and controlled AI interactions.</p>
<p>Create a <strong>.env</strong> file in the same directory as your script and add:</p>
<pre><code class="language-bash">AWS_ACCESS_KEY=&quot;your-access-key&quot; 
AWS_SECRET_KEY=&quot;your-secret-key&quot; 
AWS_REGION=&quot;your-aws-region&quot; 
GUARDRAIL_ID=&quot;your-guardrail-id&quot; 
GUARDRAIL_VERSION=&quot;your-guardrail-version&quot;
CHAT_MODEL=&quot;your-model-id&quot;
</code></pre>
<h3>Create a Python script and run</h3>
<p>Create a Python script using the code below and execute it to interact with the Amazon Bedrock Guardrails you set up.</p>
<pre><code class="language-python">import os
import boto3
from dotenv import load_dotenv
from langchain_aws import ChatBedrock
import json
from botocore.exceptions import ClientError

# Load environment variables
load_dotenv()

# Function to check for hallucinations using contextual grounding
def check_hallucination(response):
    output_assessments = response.get(&quot;trace&quot;, {}).get(&quot;guardrail&quot;, {}).get(&quot;outputAssessments&quot;, {})

    # Default scores and thresholds so the checks below are safe even when
    # no contextual grounding assessment is present in the response
    grounding = relevance = 0.0
    grounding_threshold = relevance_threshold = 0.0

    # Iterate over all assessments
    for assessments in output_assessments.values():
        for assessment in assessments:
            contextual_policy = assessment.get(&quot;contextualGroundingPolicy&quot;, {})

            for filter_result in contextual_policy.get(&quot;filters&quot;, []):
                filter_type = filter_result.get(&quot;type&quot;)
                if filter_type == &quot;RELEVANCE&quot;:
                    relevance = filter_result.get(&quot;score&quot;, 0)
                    relevance_threshold = filter_result.get(&quot;threshold&quot;, 0)
                elif filter_type == &quot;GROUNDING&quot;:
                    grounding = filter_result.get(&quot;score&quot;, 0)
                    grounding_threshold = filter_result.get(&quot;threshold&quot;, 0)

            if relevance &lt; relevance_threshold or grounding &lt; grounding_threshold:
                return True, relevance, grounding, relevance_threshold, grounding_threshold  # Hallucination detected

    return False, relevance, grounding, relevance_threshold, grounding_threshold  # No hallucination detected

def converse_with_guardrails(bedrock_client, messages, grounding_reference):
   message = [
       {
           &quot;role&quot;: &quot;user&quot;,
           &quot;content&quot;: [
               {
                   &quot;guardContent&quot;: {
                       &quot;text&quot;: {
                           &quot;text&quot;: grounding_reference,
                           &quot;qualifiers&quot;: [&quot;grounding_source&quot;],
                       }
                   }
               },
               {
                   &quot;guardContent&quot;: {
                       &quot;text&quot;: {
                           &quot;text&quot;: messages,
                           &quot;qualifiers&quot;: [&quot;query&quot;],
                       }
                   }
               },
           ],
       }
   ]
   converse_config = {
       &quot;modelId&quot;: os.getenv('CHAT_MODEL'),
       &quot;messages&quot;: message,
       &quot;guardrailConfig&quot;: {
           &quot;guardrailIdentifier&quot;: os.getenv(&quot;GUARDRAIL_ID&quot;),
           &quot;guardrailVersion&quot;: os.getenv(&quot;GUARDRAIL_VERSION&quot;),
           &quot;trace&quot;: &quot;enabled&quot;
       },
       &quot;inferenceConfig&quot;: {
           &quot;temperature&quot;: 0.5       
       },
   }
   try:
       response = bedrock_client.converse(**converse_config)
       return response
   except ClientError as e:
       error_message = e.response['Error']['Message']
       print(f&quot;An error occurred: {error_message}&quot;)
       print(&quot;Converse config:&quot;)
       print(json.dumps(converse_config, indent=2))
       return None
  
def pretty_print_response(response, is_hallucination, relevance, relevance_threshold, grounding, grounding_threshold):
   print(&quot;\n&quot; + &quot;=&quot;*60)
   print(&quot; Guardrail Assessment&quot;)
   print(&quot;=&quot;*60)
   # Extract response message safely
   response_text = response.get(&quot;output&quot;, {}).get(&quot;message&quot;, {}).get(&quot;content&quot;, [{}])[0].get(&quot;text&quot;, &quot;N/A&quot;)
   print(&quot;\n **Model Response:**&quot;)
   print(f&quot;   {response_text}&quot;)
   print(&quot;\n **Guardrail Assessment:**&quot;)
   print(f&quot;   Is Hallucination : {is_hallucination}&quot;)
   print(&quot;\n **Contextual Grounding Policy Scores:**&quot;)
   print(f&quot;   - Relevance Score : {relevance:.2f} (Threshold: {relevance_threshold:.2f})&quot;)
   print(f&quot;   - Grounding Score : {grounding:.2f} (Threshold: {grounding_threshold:.2f})&quot;)
   print(&quot;\n&quot; + &quot;=&quot;*60 + &quot;\n&quot;)
  
def main():
   bs = boto3.Session(
       aws_access_key_id=os.getenv('AWS_ACCESS_KEY'),
       aws_secret_access_key=os.getenv('AWS_SECRET_KEY'),
       region_name=os.getenv('AWS_REGION')
   )

   # Initialize Bedrock client
   bedrock_client = bs.client(&quot;bedrock-runtime&quot;)

   # Grounding reference
   grounding_reference = &quot;The Wright brothers made the first powered aircraft flight on December 17, 1903.&quot;

   # User query
   user_query = &quot;Who were the first to fly an airplane?&quot;
  
   # Get model response
   response = converse_with_guardrails(bedrock_client, user_query, grounding_reference)

   # Check for hallucinations
   is_hallucination, relevance, grounding, relevance_threshold, grounding_threshold = check_hallucination(response)

   # Print the results
   pretty_print_response(response, is_hallucination, relevance, relevance_threshold, grounding, grounding_threshold)


if __name__ == &quot;__main__&quot;:
   main()
</code></pre>
<h3>Identifying Hallucinations with Contextual Grounding</h3>
<p>The contextual grounding feature proved effective in identifying potential hallucinations by comparing model responses against reference information. Relevance and grounding scores provided quantitative measures to assess the accuracy of model outputs.</p>
<p>The output below, from running the Python script, demonstrates how the <strong>Grounding Score</strong> helps detect hallucinations:</p>
<pre><code>============================================================
 Guardrail Assessment
============================================================

 **Model Response:**
   Sorry, I am not configured to answer such questions. Kindly ask a different question.

 **Guardrail Assessment:**
   Is Hallucination : True

 **Contextual Grounding Policy Scores:**
   - Relevance Score : 1.00 (Threshold: 0.99)
   - Grounding Score : 0.03 (Threshold: 0.99)

============================================================
</code></pre>
<p>Here, the <strong>Grounding Score</strong> of <strong>0.03</strong> is significantly lower than the configured threshold of <strong>0.99</strong>, indicating that the response is not grounded in the provided reference information. Since the score falls below the threshold, the system flags the response as a hallucination, highlighting the need to monitor guardrail outputs to ensure AI safety.</p>
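The flagging logic in this run boils down to comparing each contextual grounding filter's score against its threshold. A minimal sketch of that comparison is below; the filter dictionaries mirror the GROUNDING/RELEVANCE entries surfaced in the guardrail trace, but this is illustrative, not the exact `check_hallucination` helper from the script:

```python
def assess_grounding(filters):
    """Given contextual grounding policy filters from a guardrail trace,
    return (is_hallucination, scores): a response is treated as a
    hallucination when any filter score falls below its threshold."""
    scores = {f["type"]: (f["score"], f["threshold"]) for f in filters}
    is_hallucination = any(score < threshold for score, threshold in scores.values())
    return is_hallucination, scores

# Values taken from the example run shown above.
filters = [
    {"type": "RELEVANCE", "score": 1.00, "threshold": 0.99},
    {"type": "GROUNDING", "score": 0.03, "threshold": 0.99},
]
is_hallucination, scores = assess_grounding(filters)
print(is_hallucination)  # True: grounding 0.03 < threshold 0.99
```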
<h2>Configuring Amazon Bedrock Guardrails Metrics &amp; Logs Collection</h2>
<p>Elastic makes it easy to collect both logs and metrics from Amazon Bedrock Guardrails using the Amazon Bedrock integration. By default, Elastic provides a curated set of logs and metrics, but you can customize the configuration based on your needs. The integration supports Amazon S3 and Amazon CloudWatch Logs for log collection, along with metrics collection from your chosen AWS region at a specified interval.</p>
<p>Follow these steps to enable the collection of metrics and logs:</p>
<ol>
<li>
<p><strong>Navigate to Amazon Bedrock Settings</strong> - In the AWS Console, go to <strong>Amazon Bedrock</strong> and open the <strong>Settings</strong> section.</p>
</li>
<li>
<p><strong>Choose Logging Destination</strong> - Select whether to send logs to <strong>Amazon S3</strong> or <strong>Amazon CloudWatch Logs</strong>.</p>
</li>
<li>
<p><strong>Provide Required Details</strong></p>
<ul>
<li><strong>If using Amazon S3</strong>: logs can be collected from objects referenced in <strong>S3 notification events</strong> (read from an SQS queue) or by <strong>direct polling</strong> of an S3 bucket.</li>
<li><strong>If using CloudWatch Logs</strong>: create a <strong>CloudWatch log group</strong> and note its <strong>ARN</strong>, as it is required when configuring both <strong>Amazon Bedrock</strong> and <strong>Elastic's Amazon Bedrock integration</strong>.</li>
</ul>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-settings-configuraiton.png" alt="Amazon Bedrock settings" /></p>
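Steps 1–3 can also be applied programmatically. The sketch below is hedged: the `loggingConfig` field names follow the Bedrock `PutModelInvocationLoggingConfiguration` API as exposed by boto3, and the log group name, role ARN, and region are placeholders you would replace:

```python
def build_cloudwatch_logging_config(log_group_name, role_arn):
    """Build the loggingConfig payload for CloudWatch Logs delivery.
    Verify field names against the current boto3 docs before relying on this."""
    return {
        "cloudWatchConfig": {
            "logGroupName": log_group_name,
            "roleArn": role_arn,
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
    }

def enable_bedrock_logging(log_group_name, role_arn, region):
    import boto3  # requires boto3, credentials, and Bedrock permissions
    # "bedrock" is the control-plane client (not "bedrock-runtime").
    client = boto3.client("bedrock", region_name=region)
    client.put_model_invocation_logging_configuration(
        loggingConfig=build_cloudwatch_logging_config(log_group_name, role_arn)
    )

# Example (placeholder resources; do not run without a real log group and role):
# enable_bedrock_logging("/bedrock/invocation-logs",
#                        "arn:aws:iam::123456789012:role/BedrockLoggingRole",
#                        "us-east-1")
```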
<ol start="4">
<li><strong>Configure Elastic's Amazon Bedrock integration</strong> - In <strong>Elastic</strong>, set up the <strong>Amazon Bedrock integration</strong>, ensuring the logging destination matches the one configured in <strong>Amazon Bedrock</strong>. Logs from your selected source and metrics from your AWS region will be collected automatically.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-integrations-logs-config.png" alt="Amazon Bedrock integration logs configuration" /></p>
<ol start="5">
<li><strong>Accept Defaults or Customize Settings</strong> - Elastic provides a default configuration for logs and metrics collection. You can accept these defaults or adjust settings such as collection intervals to better fit your needs.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-integrations-metrics-config.png" alt="Amazon Bedrock integration guardrails metrics configuration" /></p>
<h2>Understanding the pre-configured dashboard for Amazon Bedrock Guardrails</h2>
<p>You can access the Amazon Bedrock Guardrails dashboard using either of the following methods:</p>
<ol>
<li>
<p><strong>Navigate to the Dashboard Menu</strong>  - Select the <strong>Dashboard</strong> menu option in <strong>Elastic</strong> and search for <strong>[Amazon Bedrock] Guardrails</strong> to open the dashboard.</p>
</li>
<li>
<p><strong>Navigate to the Integrations Menu</strong>  - Open the <strong>Integrations</strong> menu in <strong>Elastic</strong>, select <strong>Amazon Bedrock</strong>, go to the <strong>Assets</strong> tab, and choose <strong>[Amazon Bedrock] Guardrails</strong> from the dashboard assets.</p>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-guardrails-overview-dashboard.png" alt="Amazon Bedrock settings" /></p>
<p>The Amazon Bedrock Guardrails dashboard in the Elastic integration provides insights into guardrail performance, tracking total invocations, API latency, text unit usage, and intervention rates. It analyzes policy-based interventions, highlighting trends, text consumption, and frequently triggered policies. The dashboard also showcases instances where guardrails modified or blocked responses and offers a detailed breakdown of invocations by policy and content source.</p>
<h3>Guardrail invocation overview</h3>
<p>This dashboard section provides a comprehensive summary of key metrics related to guardrail performance and usage:</p>
<ul>
<li><strong>Total guardrails API invocations</strong>: Displays the overall count of times guardrails were invoked.</li>
<li><strong>Average Guardrails API invocation latency</strong>: Shows the average response time for guardrail API calls, offering insights into system performance.</li>
<li><strong>Total text unit utilization</strong>: Indicates the volume of text processed during guardrail invocations. For text unit pricing, refer to the Amazon Bedrock pricing page.</li>
<li><strong>Invocations - with and without guardrail interventions</strong>: A pie chart representation showing the distribution of LLM invocations based on guardrail activity. It displays the count of invocations where no guardrail interventions occurred, those where guardrails intervened and detected policy violations, and those where guardrails intervened but found no violations.</li>
</ul>
<p>These metrics help users evaluate guardrail effectiveness, track intervention patterns, and optimize configurations to ensure policy enforcement while maintaining system performance.</p>
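As a quick sanity check on the pie chart, the intervention rate can be recomputed from the three raw invocation counts it displays; a minimal sketch:

```python
def intervention_rate(no_intervention, intervened_violation, intervened_no_violation):
    """Fraction of all invocations where guardrails intervened,
    whether or not a policy violation was ultimately detected."""
    total = no_intervention + intervened_violation + intervened_no_violation
    if total == 0:
        return 0.0
    return (intervened_violation + intervened_no_violation) / total

# e.g. 800 clean invocations, 150 with violations, 50 intervened without violations
print(f"{intervention_rate(800, 150, 50):.1%}")  # 20.0%
```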
<h3>Guardrail policy types for interventions</h3>
<p>This section provides a comprehensive view of guardrail policy interventions and their impact:</p>
<ul>
<li><strong>Interventions by Policy Type</strong>: Bar charts display the number of interventions applied to user inputs and model outputs, categorized by policy type (e.g., Contextual Grounding Policy, Word Policy, Content Policy, Sensitive Information Policy, Topic Policy).</li>
<li><strong>Text Unit Utilization by Policy Type</strong>: Panels highlight the text units consumed by various policy interventions, separately for user inputs and model outputs.</li>
<li><strong>Policy Usage Trends</strong>: A word cloud visualization reveals the most frequently applied policy types, offering insights into intervention patterns.</li>
</ul>
<p>By analyzing intervention counts, text unit usage, and policy trends, users can identify frequently triggered policies, optimize guardrail settings, and ensure LLM interactions align with compliance and safety requirements.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/amazon-bedrock-guardrails-overview.png" alt="Amazon Bedrock Guardrails dashboard overview and policy types sections" /></p>
<h3>Prompt and response where guardrails intervened</h3>
<p>This dashboard section displays the original LLM prompt, inputs from various sources (API calls, applications, or chat interfaces), and the corresponding guardrail response. The text panel presents the prompt alongside the model's response after applying guardrail interventions. These interventions occur when input evaluation or model responses violate configured policies, leading to blocked or masked outputs.</p>
<p>The section also includes additional details to enhance visibility into how guardrails operate. It indicates whether a violation was detected, along with the violation type (e.g., <strong>GROUNDING</strong>, <strong>RELEVANCE</strong>) and the action taken (<strong>BLOCKED</strong>, <strong>NONE</strong>). For contextual grounding, the dashboard also shows the filter threshold, which defines the minimum confidence level required for a response to be considered valid, and the <strong>confidence score</strong>, which reflects how well the response aligns with the expected criteria.</p>
<p>By analyzing violations, actions taken, and confidence scores, users can adjust guardrail thresholds to balance blocking unsafe responses and allowing valid ones, ensuring optimal accuracy and compliance. This process is particularly crucial for detecting and mitigating hallucinations—instances where models generate information not grounded in source data. Implementing contextual grounding checks enables the identification of such ungrounded or irrelevant content, enhancing the reliability of applications like retrieval-augmented generation (RAG).</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/guardrails-intervened-logs.png" alt="Amazon Bedrock Guardrails logs where guardrails intervened" /></p>
<h3>Guardrail invocation by guardrail policy</h3>
<p>This section offers insights into the number of Guardrails API invocations, the overall latency, and the total text units, categorized by guardrail policy (identified by guardrail ARN) and policy version.</p>
<h3>Guardrail invocation by content source (Input &amp; Output)</h3>
<p>This section provides a detailed overview of critical metrics related to guardrail performance and usage. It includes the total number of guardrail invocations, the count of intervention invocations where policies were applied, the volume of text units consumed during these interventions for both user inputs and model outputs, and the average guardrail API invocation latency.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/guardrails-invocationby-policy-contentsource.png" alt="Amazon Bedrock Guardrails invocation by policy and content source" /></p>
<p>These insights help users understand how guardrails operate across different policies and content sources. By analyzing invocation counts, latency, and text unit consumption, users can assess policy effectiveness, track intervention patterns, and optimize configurations. Evaluating how guardrails interact with user inputs and model outputs ensures consistent enforcement, helping refine thresholds and improve compliance strategies.</p>
<h2>Configure SLOs and Alerts</h2>
<p>To create an SLO for monitoring <strong>contextual grounding accuracy</strong>, define a custom query SLI where <strong>good events</strong> are model responses that meet contextual grounding criteria, ensuring factual accuracy and alignment with the provided reference.</p>
<p>A suitable query for tracking good events is:</p>
<pre><code>gen_ai.prompt : &quot;*qualifiers[\\\&quot;grounding_source\\\&quot;]*&quot; and 
(gen_ai.compliance.violation_detected : false or 
not gen_ai.compliance.violation_detected : *)
</code></pre>
<p>The total query, which considers all relevant interactions that include a contextual grounding check, is:</p>
<pre><code>gen_ai.prompt : &quot;*qualifiers[\\\&quot;grounding_source\\\&quot;]*&quot;
</code></pre>
<p>Set an <strong>SLO target of 99.5%</strong>, ensuring that the vast majority of responses remain factually grounded. This helps detect hallucinations and misaligned outputs in real-time. By continuously monitoring contextual grounding accuracy, you can proactively address inconsistencies, retrain models, or refine RAG pipelines before inaccuracies impact end users.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/slo-configurations.png" alt="SLO settings for Guardrails metrics" /></p>
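The 99.5% target implies an error budget of 0.5% of interactions. A minimal sketch of the occurrence-based arithmetic behind such an SLO (illustrative, not Elastic's internal SLO engine):

```python
def sli_and_budget(good_events, total_events, target=0.995):
    """Return the observed SLI and the fraction of error budget remaining."""
    sli = good_events / total_events
    budget = 1.0 - target                  # allowed fraction of bad events
    burned = (total_events - good_events) / total_events
    remaining = 1.0 - burned / budget      # < 0 means the budget is exhausted
    return sli, remaining

sli, remaining = sli_and_budget(good_events=9970, total_events=10000)
print(f"SLI={sli:.2%}, error budget remaining={remaining:.0%}")
# SLI=99.70%, error budget remaining=40%
```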
<p>Elastic's alerting capabilities enable proactive monitoring of key performance metrics. For instance, by setting up an alert on the <strong>average aws_bedrock.guardrails.invocation_latency</strong> with a <strong>500ms</strong> threshold, you can promptly identify and address performance bottlenecks, ensuring that policy enforcement remains efficient without causing unexpected delays.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/alert-configurations.png" alt="Alert settings for Guardrails metrics" /></p>
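The rule's condition reduces to averaging the latency samples in the evaluation window and comparing against 500 ms; a minimal sketch (illustrative, not Elastic's rule engine):

```python
def latency_alert(samples_ms, threshold_ms=500.0):
    """Fire when the average invocation latency in the window exceeds the threshold."""
    if not samples_ms:
        return False  # no data in the window, nothing to alert on
    avg = sum(samples_ms) / len(samples_ms)
    return avg > threshold_ms

print(latency_alert([320, 480, 910]))  # True: average is 570 ms, above 500 ms
```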
<h2>Conclusion</h2>
<p>The Elastic Amazon Bedrock integration makes it easy to collect a curated set of metrics and logs for LLM-powered applications built on Amazon Bedrock, including Guardrails. It comes with an out-of-the-box dashboard, which you can further customize for your specific needs.</p>
<p>If you haven’t already done so, read our previous <a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock">blog</a> on what you can do with the Amazon Bedrock integration, set up guardrails for your Bedrock models, and enable the <a href="https://www.elastic.co/guide/en/integrations/current/aws_bedrock.html">Bedrock integration</a> to start observing your Bedrock models and guardrails today!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-amazon-bedrock-guardrails/llm-observability-aws-bedrock-illustration.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability with the new Amazon Bedrock Integration in Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock</link>
            <guid isPermaLink="false">llm-observability-aws-bedrock</guid>
            <pubDate>Mon, 25 Nov 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic's new Amazon Bedrock integration for Observability provides comprehensive insights into Amazon Bedrock LLM performance and usage. Learn about how LLM based metric and log collection in real-time with pre-built dashboards can effectively monitor and resolve LLM invocation errors and performance challenges.]]></description>
            <content:encoded><![CDATA[<p>As organizations increasingly adopt LLMs for AI-powered applications such as content creation, Retrieval-Augmented Generation (RAG), and data analysis, SREs and developers face new challenges. Tasks like monitoring workflows, analyzing input and output, managing query latency, and controlling costs become critical. LLM observability helps address these issues by providing clear insights into how these models perform, allowing teams to quickly identify bottlenecks, optimize configurations, and improve reliability. With better observability, SREs can confidently scale LLM applications, especially on platforms like <a href="https://aws.amazon.com/bedrock/">Amazon Bedrock</a>, while minimizing downtime and keeping costs in check.</p>
<p>Elastic is expanding support for LLM Observability with Elastic Observability's new <a href="https://www.elastic.co/docs/current/integrations/aws_bedrock">Amazon Bedrock integration</a>. This new observability integration provides you with comprehensive visibility into the performance and usage of foundational models from leading AI companies and from Amazon available through Amazon Bedrock. The new Amazon Bedrock Observability integration offers an out-of-the-box experience by simplifying the collection of Amazon Bedrock metrics and logs, making it easier to gain actionable insights and effectively manage your models. The integration is simple to set up and comes with pre-built, out-of-the-box dashboards. With real-time insights, SREs can now monitor, optimize and troubleshoot LLM applications that are using Amazon Bedrock.</p>
<p>This blog will walk through the features available to SREs, such as monitoring invocations, errors, and latency information across various models, along with the usage and performance of LLM requests. Additionally, the blog will show how easy it is to set up and what insights you can gain from Elastic for LLM Observability.</p>
<h2>Prerequisites</h2>
<p>To follow along with this blog, please make sure you have:</p>
<ul>
<li>An account on <a href="http://cloud.elastic.co">Elastic Cloud</a> and a deployed stack in AWS (<a href="https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html">see instructions here</a>). Ensure you are using version 8.13 or higher.</li>
<li>An AWS account with permissions to pull the necessary data from AWS. <a href="https://docs.elastic.co/en/integrations/aws#aws-permissions">See details in our documentation</a>.</li>
</ul>
<h2>Configuring Amazon Bedrock Logs Collection</h2>
<p>To collect Amazon Bedrock logs, you can choose from the following options:</p>
<ol>
<li>Amazon Simple Storage Service (Amazon S3)  bucket</li>
<li>Amazon CloudWatch logs</li>
</ol>
<p><strong>S3 Bucket Logs Collection</strong>: When collecting logs from the Amazon S3 bucket, you can retrieve logs from Amazon S3 objects pointed to by Amazon S3 notification events, which are read from an SQS queue, or by directly polling a list of Amazon S3 objects in an Amazon S3 bucket. Refer to Elastic’s <a href="https://www.elastic.co/docs/current/integrations/aws_logs">Custom AWS Logs</a> integration for more details.</p>
<p><strong>CloudWatch Logs Collection</strong>: In this option, you will need to create a <a href="https://console.aws.amazon.com/cloudwatch/">CloudWatch log group</a>. After creating the log group, be sure to note down the ARN of the newly created log group, as you will need it for the Amazon Bedrock settings configuration and Amazon Bedrock integration configuration for logs.</p>
<p>Configure the Amazon Bedrock CloudWatch logs with the Log group ARN to start collecting CloudWatch logs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/cloudwatch-logs-configuration.png" alt="" /></p>
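If you script the log group creation, the ARN you need for the configuration follows the standard CloudWatch Logs format. A hedged sketch is below; the boto3 call requires credentials and `logs:CreateLogGroup` permission, and the names are placeholders:

```python
def log_group_arn(region, account_id, log_group_name):
    """Standard ARN format for a CloudWatch Logs log group."""
    return f"arn:aws:logs:{region}:{account_id}:log-group:{log_group_name}:*"

def create_log_group(name, region):
    import boto3  # requires boto3 and logs:CreateLogGroup permission
    boto3.client("logs", region_name=region).create_log_group(logGroupName=name)

print(log_group_arn("us-east-1", "123456789012", "/bedrock/invocation-logs"))
# arn:aws:logs:us-east-1:123456789012:log-group:/bedrock/invocation-logs:*
```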
<p>Please visit the <a href="https://aws.amazon.com/console/">AWS Console</a> and navigate to the &quot;Settings&quot; section under <a href="https://aws.amazon.com/bedrock/">Amazon Bedrock</a> and select your preferred method of collecting logs. Based on the value you select from the Logging Destination in the Amazon Bedrock settings, you will need to enter either the Amazon S3 location or the CloudWatch log group ARN.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-logs-configuration.png" alt="" /></p>
<h2>Configuring Amazon Bedrock Metrics Collection</h2>
<p>Configure Elastic's Amazon Bedrock integration to collect Amazon Bedrock metrics from your chosen AWS region at the specified collection interval.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/cloudwatch-metrics-configuration.png" alt="" /></p>
<h2>Maximize Visibility with Out-of-the-Box Dashboards</h2>
<p>The Amazon Bedrock integration offers rich out-of-the-box visibility into the performance and usage information of models in Amazon Bedrock, including text and image models. The <strong>Amazon Bedrock Overview</strong> dashboard provides a summarized view of the invocations, errors and latency information across various models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-metric-summary.png" alt="" /></p>
<p>The <strong>Text / Chat metrics</strong> section in the <strong>Amazon Bedrock Overview</strong> dashboard provides insights into token usage for Text models in Amazon Bedrock. This includes use cases such as text content generation, summarization, translation, code generation, question answering, and sentiment analysis.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-text-metrics.png" alt="" /></p>
<p>The <strong>Image metrics</strong> section in the <strong>Amazon Bedrock Overview</strong> dashboard offers valuable insights into the usage of Image models in Amazon Bedrock.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-image-metrics.png" alt="" /></p>
<p>The <strong>Logs</strong> section of the <strong>Amazon Bedrock Overview</strong> dashboard in Elastic provides detailed insights into the usage and performance of LLM requests. It enables you to monitor key details such as model name, version, LLM prompt and response, usage tokens, request size, completion tokens, response size, and any error codes tied to specific LLM requests.</p>
<p>The detailed logs provide full visibility into raw model interactions, capturing both the inputs (prompts) and the outputs (responses) generated by the models. This transparency enables you to analyze and optimize how your LLM handles different requests, allowing for more precise fine-tuning of both the prompt structure and the resulting model responses. By closely monitoring these interactions, you can refine prompt strategies and enhance the quality and reliability of model outputs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-logs-details.png" alt="" /></p>
<p>The <strong>Amazon Bedrock Overview</strong> dashboard provides a comprehensive view of the initial and final response times. It includes a percentage comparison graph that highlights the performance differences between these response stages, enabling you to quickly identify efficiency improvements or potential bottlenecks in your LLM interactions.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-dashboard-performance.png" alt="" /></p>
<h2>Creating Alerts and SLOs to Monitor Amazon Bedrock</h2>
<p>As with any Elastic integration, Amazon Bedrock <a href="https://www.elastic.co/docs/current/integrations/aws_bedrock#collecting-bedrock-model-invocation-logs-from-s3-bucket">logs</a> and <a href="https://www.elastic.co/docs/current/integrations/aws_bedrock#metrics">metrics</a> are fully integrated into Elastic Observability, allowing you to leverage features like SLOs, alerting, custom dashboards, and detailed logs exploration.</p>
<p>To create an alert, for example to monitor LLM invocation latency in Amazon Bedrock, you can apply a Custom Threshold rule on the Amazon Bedrock datastream. Set the rule to trigger an alert when the LLM invocation latency exceeds a defined threshold. This ensures proactive monitoring of model performance, allowing you to detect and address latency issues before they impact the user experience.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-alert-invocation-latency.png" alt="" /></p>
<p>When a violation occurs, the Alert Details view linked in the notification provides detailed context, including when the issue began, its current status, and any history of similar violations. This rich information enables rapid triaging, investigation, and root cause analysis to resolve issues efficiently.</p>
<p>Similarly, to create an SLO for monitoring Amazon Bedrock invocation performance for instance, you can define a custom query SLI where good events are those Amazon Bedrock invocations that do not result in client errors or server errors and have latency less than 10 seconds. Set an appropriate SLO target, such as 99%. This will help you identify errors and latency issues in applications using LLMs, allowing you to take timely corrective actions before they affect the overall user experience.</p>
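That good-event definition can be expressed as a predicate over each invocation record; a minimal sketch with illustrative field names (not the integration's exact schema):

```python
def is_good_event(invocation, max_latency_ms=10_000):
    """Good = no client or server error, and latency under 10 seconds."""
    return (
        not invocation.get("client_error", False)
        and not invocation.get("server_error", False)
        and invocation.get("latency_ms", 0) < max_latency_ms
    )

events = [
    {"latency_ms": 1200},                          # good
    {"latency_ms": 15000},                         # too slow
    {"latency_ms": 800, "server_error": True},     # errored
]
good = sum(is_good_event(e) for e in events)
print(f"SLI: {good}/{len(events)}")  # SLI: 1/3
```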
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-slo-configuration.png" alt="" /></p>
<p>The image below highlights the SLOs, SLIs, and the remaining error budget for Amazon Bedrock models. The observed violations are a result of deliberately crafted long text generation prompts, which led to extended response times. This example demonstrates how the system tracks performance against defined targets, helping you quickly identify latency issues and performance bottlenecks. By monitoring these metrics, you gain valuable insights for proactive issue triaging, allowing for timely corrective actions and improved user experience of applications using LLMs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/aws-bedrock-slo-rundata.png" alt="" /></p>
<h2>Try it out today</h2>
<p>The Amazon Bedrock playgrounds provide a console environment to experiment with running inference on different models and configurations before deciding to use them in an application. Start your own 7-day free trial by signing up via AWS Marketplace and spin up a deployment in minutes in any of the Elastic Cloud regions on AWS around the world.</p>
<p>Deploy a cluster on our <a href="https://www.elastic.co/cloud/elasticsearch-service">Elasticsearch Service</a>, <a href="https://www.elastic.co/downloads/">download</a> the Elasticsearch stack, or run <a href="https://aws.amazon.com/marketplace/seller-profile?id=d8f59038-c24c-4a9d-a66d-6711d35d7305">Elastic from AWS Marketplace</a> then spin up the new technical preview of Amazon Bedrock integration, open the curated dashboards in Kibana and start monitoring your Amazon Bedrock service!</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-observability-aws-bedrock/LLM-observability-AWS-Bedrock.jpg" length="0" type="image/jpg"/>
        </item>
    </channel>
</rss>