<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Articles by Ishleen Kaur</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Fri, 06 Mar 2026 16:24:31 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Articles by Ishleen Kaur</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Customize your data ingestion with Elastic input packages]]></title>
            <link>https://www.elastic.co/observability-labs/blog/customize-data-ingestion-input-packages</link>
            <guid isPermaLink="false">customize-data-ingestion-input-packages</guid>
            <pubDate>Tue, 26 Sep 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[In this post, learn about input packages and how they can provide a flexible solution to advanced users for customizing their ingestion experience in Elastic.]]></description>
            <content:encoded><![CDATA[<p>Elastic<sup>®</sup> has enabled the collection, transformation, and analysis of data flowing between external data sources and the Elastic Observability solution through <a href="https://www.elastic.co/integrations/">integrations</a>. Integration packages achieve this by encapsulating several components, including <a href="https://www.elastic.co/guide/en/fleet/current/create-standalone-agent-policy.html">agent configuration</a>, inputs for data collection, and assets like <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html">ingest pipelines</a>, <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html">data streams</a>, <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html">index templates</a>, and <a href="https://www.elastic.co/guide/en/kibana/current/dashboard.html">visualizations</a>. The breadth of the assets supported in the Elastic Stack grows day by day.</p>
<p>This blog dives into how input packages provide a highly generic and flexible solution that lets advanced users customize their ingestion experience in Elastic.</p>
<h2>What are input packages?</h2>
<p>An <a href="https://github.com/elastic/elastic-package">Elastic Package</a> is an artifact that contains a collection of assets that extend the Elastic Stack, providing new capabilities to accomplish a specific task like integration with an external data source. The first use of Elastic packages is <a href="https://github.com/elastic/integrations">integration packages</a>, which provide an end-to-end experience — from configuring Elastic Agent, to collecting signals from the data source, to ingesting them correctly and using the data once ingested.</p>
<p>However, advanced users may need to customize data collection, either because an integration does not exist for a specific data source, or even if it does, they want to collect additional signals or in a different way. Input packages are another type of <a href="https://github.com/elastic/elastic-package">Elastic package</a> that provides the capability to configure Elastic Agent to use the provided inputs in a custom way.</p>
<h2>Let’s look at an example</h2>
<p>Say hello to Julia, who works as an engineer at Ascio Innovation firm. She is currently working with Oracle Weblogic server and wants to get a set of metrics for monitoring it. She goes ahead and installs Elastic <a href="https://docs.elastic.co/integrations/oracle_weblogic">Oracle Weblogic Integration</a>, which uses Jolokia in the backend to fetch the metrics.</p>
<p>Now, her team wants to take its monitoring further and has the following requirements:</p>
<ol>
<li>
<p>We should be able to extract metrics beyond the default set supported by the Oracle Weblogic Integration.</p>
</li>
<li>
<p>We want to have our own bespoke pipelines, visualizations, and experience.</p>
</li>
<li>
<p>We should be able to identify the metrics coming in from two different instances of Weblogic Servers by having data mapped to separate <a href="https://www.elastic.co/blog/what-is-an-elasticsearch-index">indices</a>.</p>
</li>
</ol>
<p>All the above requirements can be met by using the <a href="https://docs.elastic.co/integrations/jolokia">Jolokia input package</a> to get a customized experience. Let's see how.</p>
<p>Julia can add the configuration of the Jolokia input package as shown below, fulfilling the <em>first requirement.</em> She provides the hostname, the JMX mappings for the fields she wants to fetch from the JVM application, and the <a href="https://www.elastic.co/guide/en/ecs/master/ecs-data_stream.html#field-data-stream-dataset">data set</a> name to which the response fields will be mapped.</p>
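<p>As a rough sketch, a Jolokia-style input configuration with custom JMX mappings could look like the following. The host, MBean, field names, and data set value here are illustrative placeholders rather than values from an actual WebLogic setup:</p>
<pre><code class="language-yaml"># Illustrative sketch of a Jolokia input configuration (placeholder values)
hosts:
  - &quot;http://localhost:8778&quot;
path: &quot;/jolokia/?ignoreErrors=true&amp;canonicalNaming=false&quot;
jmx.mappings:
  # Fetch JVM heap usage from the standard Memory MBean
  - mbean: &quot;java.lang:type=Memory&quot;
    attributes:
      - attr: HeapMemoryUsage
        field: memory.heap_usage
# Documents produced by this input are routed to this data set
data_stream.dataset: jolokia_first_dataset
</code></pre>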
<p><img src="https://www.elastic.co/observability-labs/assets/images/customize-data-ingestion-input-packages/elastic-blog-1-config-parameters.png" alt="Configuration Parameters for Jolokia Input package" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/customize-data-ingestion-input-packages/elastic-blog-2-expanded-doc.png" alt="Metrics getting mapped to the index created by the ‘jolokia_first_dataset’" /></p>
<p>Julia can customize her data by writing her own <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html">ingest pipelines</a> and providing her customized <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html">mappings</a>. Also, she can then build her own bespoke dashboards, hence meeting her <em>second requirement.</em></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/customize-data-ingestion-input-packages/elastic-blog-3-ingest-pipelines.png" alt="Customization of Ingest Pipelines and Mappings" /></p>
<p>Let’s say now Julia wants to use another instance of Oracle Weblogic and get a different set of metrics.</p>
<p>This can be achieved by adding another instance of the Jolokia input package and specifying a new <a href="https://www.elastic.co/guide/en/ecs/master/ecs-data_stream.html#field-data-stream-dataset">data set</a> name, as shown in the screenshot below. The resulting metrics will be mapped to a different <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html">index</a>/data set, thus fulfilling her <em>third requirement</em> and allowing Julia to differentiate between metrics coming in from the two instances of Oracle Weblogic.</p>
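<p>Conceptually, the two input instances differ only in their endpoints and data set names, roughly as in this sketch (all hosts and names are illustrative placeholders):</p>
<pre><code class="language-yaml"># Instance 1: metrics land in jolokia_first_dataset
- hosts: [&quot;http://weblogic-1:8778&quot;]
  data_stream.dataset: jolokia_first_dataset
# Instance 2: metrics land in jolokia_second_dataset
- hosts: [&quot;http://weblogic-2:8778&quot;]
  data_stream.dataset: jolokia_second_dataset
</code></pre>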
<p><img src="https://www.elastic.co/observability-labs/assets/images/customize-data-ingestion-input-packages/elastic-blog-4-jolokia.png" alt="jolokia metrics" /></p>
<p>The resultant metrics of the query will be indexed to the new data set, <code>jolokia_second_dataset</code> in the example below.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/customize-data-ingestion-input-packages/elastic-blog-5-dataset.png" alt="dataset" /></p>
<p>As we can see above, the Jolokia input package provides the flexibility to collect new metrics by specifying different JMX mappings, which is not possible with the default Oracle Weblogic integration (where the user gets metrics from a predetermined set of JMX mappings).</p>
<p>The Jolokia input package can also be used to monitor any Java-based application that exposes its metrics through JMX, so a single input package can collect metrics from multiple Java applications and services.</p>
<h2>Elastic input packages</h2>
<p>Elastic has supported input packages since the 8.8.0 release. Several input packages are now available in beta and will mature gradually:</p>
<ol>
<li>
<p><a href="https://docs.elastic.co/integrations/sql">SQL input package</a>: The SQL input package allows you to execute queries against any supported SQL database and store the results in Elasticsearch<sup>®</sup>.</p>
</li>
<li>
<p><a href="https://docs.elastic.co/integrations/prometheus_input">Prometheus input package</a>: This input package can collect metrics from <a href="https://prometheus.io/docs/instrumenting/exporters/">Prometheus Exporters (Collectors)</a>. It can be used by any service exporting its metrics to a Prometheus endpoint.</p>
</li>
<li>
<p><a href="https://docs.elastic.co/integrations/jolokia">Jolokia input package</a>: This input package collects metrics from <a href="https://jolokia.org/agent.html">Jolokia agents</a> running on a target JMX server or dedicated proxy server. It can be used to monitor any Java-based application that exposes its metrics through JMX.</p>
</li>
<li>
<p><a href="https://docs.elastic.co/integrations/statsd_input">Statsd input package</a>: The statsd input package spawns a UDP server and listens for metrics in StatsD compatible format. This input can be used to collect metrics from services that send data over the StatsD protocol.</p>
</li>
<li>
<p><a href="https://docs.elastic.co/integrations/gcp_metrics">GCP Metrics input package</a>: The GCP Metrics input package can collect custom metrics for any GCP service.</p>
</li>
</ol>
<h2>Try it out!</h2>
<p>Now that you know more about input packages, try building your own customized integration for your service through input packages, and get started with an <a href="https://cloud.elastic.co/registration?fromURI=/home">Elastic Cloud</a> free trial.</p>
<p>We would love to hear from you about your experience with input packages on the Elastic <a href="https://discuss.elastic.co/">Discuss</a> forum or in <a href="https://github.com/elastic/integrations">the Elastic Integrations repository</a>.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/customize-data-ingestion-input-packages/customize-observability-input-720x420.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Elastic MongoDB Atlas Integration: Complete Database Monitoring and Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-mongodb-atlas-integration</link>
            <guid isPermaLink="false">elastic-mongodb-atlas-integration</guid>
            <pubDate>Thu, 24 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Comprehensive MongoDB Atlas monitoring with Elastic's integration - track performance, security, and operations through real-time alerts, audit logs, and actionable insights.]]></description>
            <content:encoded><![CDATA[<p>In today's data-driven landscape, <a href="https://www.mongodb.com/products/platform/atlas-database">MongoDB Atlas</a> has emerged as the leading multi-cloud developer data platform, enabling organizations to work seamlessly with document-based data models while ensuring flexible schema design and easy scalability. However, as your Atlas deployments grow in complexity and criticality, comprehensive observability becomes essential for maintaining optimal performance, security, and reliability.</p>
<p>The Elastic <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> transforms how you monitor and troubleshoot your Atlas infrastructure by providing deep insights into every aspect of your deployment—from real-time alerts and audit trails to detailed performance metrics and organizational activities. This integration empowers teams to minimize Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) while gaining actionable insights for capacity planning and performance optimization.</p>
<h2>Why MongoDB Atlas Observability Matters</h2>
<p>MongoDB Atlas abstracts much of the operational complexity of running MongoDB, but this doesn't eliminate the need for monitoring. Modern applications demand:</p>
<ul>
<li><strong>Proactive Issue Detection</strong>: Identify performance bottlenecks, resource constraints, and security threats before they impact users</li>
<li><strong>Comprehensive Audit Trails</strong>: Track database operations, user activities, and configuration changes for compliance and security</li>
<li><strong>Performance Optimization</strong>: Monitor query performance, resource utilization, and capacity trends to optimize costs and user experience</li>
<li><strong>Operational Insights</strong>: Understand organizational activities, project changes, and infrastructure events across your multi-cloud deployments</li>
</ul>
<p>The Elastic <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> addresses these needs by collecting comprehensive telemetry data and presenting it through powerful visualizations and alerting capabilities.</p>
<h2>Integration Architecture and Data Streams</h2>
<p>The <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> leverages the <a href="https://www.mongodb.com/docs/atlas/reference/api-resources-spec/v2/">Atlas Administration API</a> to collect eight distinct data streams, each providing specific insights into different aspects of your Atlas deployment:</p>
<h3>Log Data Streams</h3>
<p><strong>Alert Logs</strong>: Capture real-time alerts generated by your Atlas instances, covering resource utilization thresholds (CPU, memory, disk space), database operations, security issues, and configuration changes. These alerts provide immediate visibility into critical events that require attention.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/alert_logs.png" alt="Alert Datastream" /></p>
<p><strong>Database Logs</strong>: Collect comprehensive operational logs from MongoDB instances, including incoming connections, executed commands, performance diagnostics, and issues encountered. These logs are invaluable for troubleshooting performance problems and understanding database behavior.</p>
<p><strong>MongoDB Audit Logs</strong>: Enable administrators to track system activity across deployments with multiple users and applications. These logs capture detailed events related to database operations including insertions, updates, deletions, user authentication, and access patterns—essential for security compliance and forensic analysis.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/audit_logs.png" alt="Audit Datastream" /></p>
<p><strong>Organization Logs</strong>: Provide enterprise-level visibility into organizational activities, enabling tracking of significant actions involving database operations, billing changes, security modifications, host management, encryption settings, and user access management across teams.</p>
<p><strong>Project Logs</strong>: Offer project-specific event tracking, capturing detailed records of configuration modifications, user access changes, and general project activities. These logs are crucial for project-level auditing and change management.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/project_logs.png" alt="Project Datastream" /></p>
<h3>Metrics Data Streams</h3>
<p><strong>Hardware Metrics</strong>: Collect comprehensive hardware performance data including CPU usage, memory consumption, JVM memory utilization, and overall system resource metrics for each process in your Atlas groups.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/hardware_metrics.png" alt="Hardware Datastream" /></p>
<p><strong>Disk Metrics</strong>: Monitor storage performance with detailed insights into I/O operations, read/write latency, and space utilization across all disk partitions used by MongoDB Atlas. These metrics help identify storage bottlenecks and plan capacity expansion.</p>
<p><strong>Process Metrics</strong>: Gather host-level metrics per MongoDB process, including detailed CPU usage patterns, I/O operation counts, memory utilization, and database-specific performance indicators like connection counts, operation rates, and cache utilization.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/process_metrics.png" alt="Process Datastream" /></p>
<h2>Implementation Guide</h2>
<h3>Setting Up the Integration</h3>
<p>Getting started with MongoDB Atlas observability requires establishing API access and configuring the integration in Kibana:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/setup.png" alt="Setup" /></p>
<ol>
<li>
<p><strong>Generate Atlas API Keys</strong>: Create <a href="https://www.mongodb.com/docs/atlas/configure-api-access/#grant-programmatic-access-to-an-organization">programmatic API keys</a> with Organization Owner permissions in the Atlas console, then invite these keys to your target projects with appropriate roles (Project Read Only for alerts/metrics, Project Data Access Read Only for audit logs).</p>
</li>
<li>
<p><strong>Enable Prerequisites</strong>: Enable database auditing in Atlas for projects where you want to collect audit and database logs. Gather your <a href="https://www.mongodb.com/docs/atlas/app-services/apps/metadata/#find-a-project-id">Project ID</a> and Organization ID from the Atlas UI.</p>
</li>
<li>
<p><strong>Configure in Kibana</strong>: Navigate to Management &gt; Integrations, search for &quot;MongoDB Atlas,&quot; and add the integration using your API credentials.</p>
</li>
</ol>
<p>The integration supports different permission levels for each data stream, ensuring you can collect operational metrics with minimal privileges while protecting sensitive audit data with elevated permissions.</p>
<h3>Considerations and Limitations</h3>
<ul>
<li><strong>Cluster Support</strong>: Log collection doesn't support M0 free clusters, M2/M5 shared clusters, or serverless instances</li>
<li><strong>Historical Data</strong>: Most log streams collect the previous 30 minutes of historical data</li>
<li><strong>Performance Impact</strong>: Large time spans may cause request timeouts; adjust HTTP Client Timeout accordingly</li>
</ul>
<h2>Real-World Use Cases and Benefits</h2>
<h3>Security and Compliance Monitoring</h3>
<p><strong>Audit Trail Management</strong>: Organizations in regulated industries leverage the audit logs to maintain comprehensive records of database access and modifications. The integration automatically parses and indexes audit events, making it easy to search for specific user activities, failed authentication attempts, or unauthorized access patterns.</p>
<p><strong>Security Incident Response</strong>: When security events occur, teams can quickly correlate alert logs with audit trails to understand the scope and timeline of incidents.</p>
<h3>Performance Optimization and Capacity Planning</h3>
<p><strong>Proactive Resource Management</strong>: By monitoring disk, hardware, and process metrics, teams can identify resource constraints before they impact application performance. For example, tracking disk I/O latency trends helps predict when storage upgrades are needed.</p>
<p><strong>Query Performance Analysis</strong>: Database logs combined with process metrics provide insights into slow queries, connection patterns, and resource utilization that enable database performance tuning.</p>
<h3>Operational Excellence</h3>
<p><strong>Multi-Environment Monitoring</strong>: Organizations running Atlas across development, staging, and production environments can standardize monitoring across all environments while maintaining environment-specific alerting thresholds.</p>
<p><strong>Change Management</strong>: Project and organization logs provide complete audit trails for infrastructure changes, enabling teams to correlate application issues with recent configuration modifications.</p>
<h2>Let's Try It!</h2>
<p>The MongoDB Atlas integration delivers comprehensive database observability that enables proactive management and optimization of your Atlas deployments. With pre-built dashboards and alerting capabilities, teams can gain immediate value while leveraging rich data streams for advanced analytics and custom monitoring solutions.</p>
<p>Deploy a cluster on <a href="https://www.elastic.co/cloud/">Elastic Cloud</a> or <a href="https://www.elastic.co/cloud/serverless">Elastic Serverless</a>, or download the Elasticsearch stack, then spin up the MongoDB Atlas Integration, open the curated dashboards in Kibana and start monitoring your service!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/title.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Elastic SQL inputs: A generic solution for database metrics observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/sql-inputs-database-metrics-observability</link>
            <guid isPermaLink="false">sql-inputs-database-metrics-observability</guid>
            <pubDate>Mon, 11 Sep 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[This blog dives into the functionality of generic SQL and provides various use cases for advanced users to ingest custom metrics to Elastic for database observability. We also introduce the fetch from all database new capability released in 8.10.]]></description>
            <content:encoded><![CDATA[<p>Elastic<sup>®</sup> SQL inputs (<a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-sql.html">metricbeat</a> module and <a href="https://docs.elastic.co/integrations/sql">input package</a>) allow the user to execute <a href="https://en.wikipedia.org/wiki/SQL">SQL</a> queries against many supported databases in a flexible way and ingest the resulting metrics into Elasticsearch<sup>®</sup>. This blog dives into the functionality of generic SQL and provides various use cases for <em>advanced users</em> to ingest custom metrics into Elastic<sup>®</sup> for database observability. The blog also introduces the new fetch-from-all-databases capability, released in 8.10.</p>
<h2>Why “Generic SQL”?</h2>
<p>Elastic already has metricbeat and integration packages targeted for specific databases. One example is <a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-mysql.html">metricbeat</a> for MySQL — and the corresponding integration <a href="https://docs.elastic.co/en/integrations/mysql">package</a>. These beats modules and integrations are customized for a specific database, and the metrics are extracted using pre-defined queries from the specific database. The queries used in these integrations and the corresponding metrics are <em>not</em> available for modification.</p>
<p>The <em>Generic SQL inputs</em> (<a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-sql.html">metricbeat</a> or <a href="https://docs.elastic.co/integrations/sql">input package</a>), by contrast, can be used to scrape metrics from any supported database using the user's own SQL queries, chosen according to the specific metrics to be extracted. This enables a much more powerful mechanism for metrics ingestion: users choose a specific driver, provide the relevant SQL queries, and the results get mapped to one or more Elasticsearch documents using a structured mapping process (the table/variable formats explained later).</p>
<p>Generic SQL inputs can be used in conjunction with the existing integration packages, which already extract specific database metrics, to extract additional custom metrics dynamically, making this input very powerful. In this blog, <em>Generic SQL input</em> and <em>Generic SQL</em> are used interchangeably.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sql-inputs-database-metrics-observability/elastic-blog-1-genericSQL.png" alt="Generic SQL database metrics collection" /></p>
<h2>Functionalities details</h2>
<p>This section covers some of the features that would help with the metrics extraction. We provide a brief description of the response format configuration. Then we dive into the merge_results functionality, which is used to combine results from multiple SQL queries into a single document.</p>
<p>The next key functionality users may be interested in is collecting metrics from all custom databases, which is now possible with the fetch_from_all_databases feature.</p>
<p>Now let's dive into the specific functionalities:</p>
<h3>Different drivers supported</h3>
<p>Generic SQL can fetch metrics from different databases. The current version supports the following drivers: MySQL, PostgreSQL, Oracle, and Microsoft SQL Server (MSSQL).</p>
<h3>Response format</h3>
<p>The response format in generic SQL is used to manipulate the data in either table or in variable format. Here’s an overview of the formats and syntax for creating and using the table and variables.</p>
<p>Syntax: <code>response_format: table</code> or <code>response_format: variables</code></p>
<p><strong>Response format table</strong><br />
This mode generates a single event for each row of the response. The table format places no restriction on the number of columns a query can return.</p>
<p>Example:</p>
<pre><code class="language-sql">driver: &quot;mssql&quot;
sql_queries:
 - query: &quot;SELECT counter_name, cntr_value FROM sys.dm_os_performance_counters WHERE counter_name= 'User Connections'&quot;
   response_format: table
</code></pre>
<p>This query returns a response similar to this:</p>
<pre><code class="language-json">&quot;sql&quot;:{
      &quot;metrics&quot;:{
         &quot;counter_name&quot;:&quot;User Connections &quot;,
         &quot;cntr_value&quot;:7
      },
      &quot;driver&quot;:&quot;mssql&quot;
}
</code></pre>
<p>The response generated above includes <code>counter_name</code> as a key in the document.</p>
<p><strong>Response format variables</strong><br />
The variables format produces key:value pairs and expects the query to return exactly two columns.</p>
<p>Example:</p>
<pre><code class="language-sql">driver: &quot;mssql&quot;
sql_queries:
 - query: &quot;SELECT counter_name, cntr_value FROM sys.dm_os_performance_counters WHERE counter_name= 'User Connections'&quot;
   response_format: variables
</code></pre>
<p>The variable format takes the first variable in the query above as the key:</p>
<pre><code class="language-json">&quot;sql&quot;:{
      &quot;metrics&quot;:{
         &quot;user connections &quot;:7
      },
      &quot;driver&quot;:&quot;mssql&quot;
}
</code></pre>
<p>In the above response, you can see the value of counter_name is used to generate the key in variable format.</p>
<h3>Response optimization: merge_results</h3>
<p>We now support merging multiple query responses into a single event. By enabling <strong>merge_results</strong>, users can significantly optimize the storage space of the metrics ingested into Elasticsearch. This mode enables an efficient compaction of the generated document: instead of generating multiple documents, a single merged document is generated wherever applicable. Metrics of a similar kind, generated from multiple queries, are combined into a single event.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sql-inputs-database-metrics-observability/elastic-blog-2-output-merge-results.png" alt="Output of Merge results" /></p>
<p>Syntax: <code>merge_results: true</code> or <code>merge_results: false</code></p>
<p>The example below shows how the data is loaded into Elasticsearch when merge_results is disabled.</p>
<p>Example:</p>
<p>In this example, we are using two different queries to fetch metrics from the performance counter.</p>
<pre><code class="language-yaml">merge_results: false
driver: &quot;mssql&quot;
sql_queries:
  - query: &quot;SELECT cntr_value As 'user_connections' FROM sys.dm_os_performance_counters WHERE counter_name= 'User Connections'&quot;
    response_format: table
  - query: &quot;SELECT cntr_value As 'buffer_cache_hit_ratio' FROM sys.dm_os_performance_counters WHERE counter_name = 'Buffer cache hit ratio' AND object_name like '%Buffer Manager%'&quot;
    response_format: table
</code></pre>
<p>As you can see, the response for the above example generates a single document for each query.</p>
<p>The resulting document from the first query:</p>
<pre><code class="language-json">&quot;sql&quot;:{
      &quot;metrics&quot;:{
         &quot;user_connections&quot;:7
      },
      &quot;driver&quot;:&quot;mssql&quot;
}
</code></pre>
<p>And resulting document from the second query:</p>
<pre><code class="language-json">&quot;sql&quot;:{
      &quot;metrics&quot;:{
         &quot;buffer_cache_hit_ratio&quot;:87
      },
      &quot;driver&quot;:&quot;mssql&quot;
}
</code></pre>
<p>When we enable the merge_results flag in the query, both the above metrics are combined together and the data gets loaded in a single document.</p>
<p>You can see the merged document in the below example:</p>
<pre><code class="language-json">&quot;sql&quot;:{
      &quot;metrics&quot;:{
         &quot;user connections &quot;:7,
         “buffer_cache_hit_ratio”:87
      },
      &quot;driver&quot;:&quot;mssql&quot;
}
</code></pre>
<p><em>However, table-format queries can be merged only if each query produces a single row. There is no such restriction on merging variables-format queries.</em></p>
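<p>For instance, two variables-format queries can be merged even when each returns multiple key:value rows. The sketch below is illustrative, reusing the performance-counter table from the earlier examples; the second counter name is a placeholder:</p>
<pre><code class="language-yaml"># Illustrative: merging variables-format queries has no single-row restriction
merge_results: true
driver: &quot;mssql&quot;
sql_queries:
  - query: &quot;SELECT counter_name, cntr_value FROM sys.dm_os_performance_counters WHERE counter_name = 'User Connections'&quot;
    response_format: variables
  - query: &quot;SELECT counter_name, cntr_value FROM sys.dm_os_performance_counters WHERE counter_name = 'Memory Grants Pending'&quot;
    response_format: variables
</code></pre>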
<h3>Introducing a new capability: fetch_from_all_databases</h3>
<p>This is <a href="https://github.com/elastic/beats/pull/35688">new functionality</a> that automatically fetches metrics from the system and user databases of a Microsoft SQL Server instance when the fetch_from_all_databases flag is enabled.</p>
<p>Keep an eye out for the <a href="https://www.elastic.co/guide/en/beats/metricbeat/8.10/metricbeat-module-sql.html#_example_execute_given_queries_for_all_databases_present_in_a_server">8.10 release</a>, where you can start using the fetch-from-all-databases feature. Prior to 8.10, users had to provide database names manually to fetch metrics from custom/user databases.</p>
<p>Syntax: <code>fetch_from_all_databases: true</code> or <code>fetch_from_all_databases: false</code></p>
<p>Below is a sample query with the fetch_from_all_databases flag disabled:</p>
<pre><code class="language-yaml">fetch_from_all_databases: false
driver: &quot;mssql&quot;
sql_queries:
  - query: &quot;SELECT @@servername AS server_name, @@servicename AS instance_name, name As 'database_name', database_id FROM sys.databases WHERE name='master';&quot;
</code></pre>
<p>The above query fetches metrics only for the provided database name. Here the input database is <code>master</code>, so metrics are fetched only for the master database.</p>
<p>Below is a sample query with the fetch_from_all_databases flag enabled:</p>
<pre><code class="language-yaml">fetch_from_all_databases: true
driver: &quot;mssql&quot;
sql_queries:
  - query: SELECT @@servername AS server_name, @@servicename AS instance_name, DB_NAME() AS 'database_name', DB_ID() AS database_id;
    response_format: table
</code></pre>
<p>The above query fetches metrics from all available databases, which is useful when the user wants data from every database without listing each one.</p>
<p>Please note: currently this feature is supported only for Microsoft SQL Server and is used internally by the Microsoft SQL Server integration to support extracting metrics for <a href="https://github.com/elastic/integrations/issues/4108">all user DBs</a> by default.</p>
<h2>Using generic SQL: Metricbeat</h2>
<p>The generic <a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-sql.html">SQL Metricbeat module</a> provides the flexibility to execute queries against different database drivers. The Metricbeat input is available as GA for production usage. <a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-sql.html">Here</a>, you can find more information on configuring <a href="https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-sql.html">the generic SQL module</a> for different drivers, with various examples.</p>
<h2>Using generic SQL: Input package</h2>
<p>The input package provides a flexible solution for advanced users to customize their ingestion experience in Elastic. Generic SQL is now also available as an SQL <a href="https://docs.elastic.co/integrations/sql">input package</a>. The input package is currently available to early users as a <strong>beta release</strong>. Let's walk through how users can use generic SQL via the input package.</p>
<h3>Configurations of generic SQL input package:</h3>
<p>The configuration options for the generic SQL input package are as below:</p>
<ul>
<li><strong>Driver:</strong> The SQL database driver you want to use with the package. In this case, we will take mysql as an example.</li>
<li><strong>Hosts:</strong> The connection string used to connect to the database. It varies depending on which database/driver is being used. Refer <a href="https://docs.elastic.co/integrations/sql#hosts">here</a> for examples.</li>
<li><strong>SQL Queries:</strong> The SQL queries the user wants to run, along with the response_format for each.</li>
<li><strong>Data set:</strong> The <a href="https://www.elastic.co/guide/en/ecs/master/ecs-data_stream.html#_data_stream_field_details">data set</a> name to which the response fields get mapped.</li>
<li><strong>Merge results:</strong> An advanced setting used to merge the results of multiple queries into a single event.</li>
</ul>
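<p>Tying these options together, a configuration sketch might look like the following. The exact field labels in the Fleet UI may differ, and the connection string and query here are illustrative; <code>sql_first_dataset</code> is the example data set name used in the screenshots below.</p>
<pre><code class="language-yaml">driver: &quot;mysql&quot;
hosts: [&quot;root:password@tcp(localhost:3306)/&quot;]
sql_queries:
  - query: &quot;SHOW GLOBAL STATUS LIKE 'Innodb%';&quot;
    response_format: variables
data_stream.dataset: sql_first_dataset
</code></pre>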
<p><img src="https://www.elastic.co/observability-labs/assets/images/sql-inputs-database-metrics-observability/elastic-blog-3-SQL-metrics-inputpackage.png" alt="Configuration parameters for SQL input package" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sql-inputs-database-metrics-observability/elastic-blog-4-expanded-document.png" alt="Metrics getting mapped to the index created by the ‘sql_first_dataset’" /></p>
<h3>Metrics extensibility with customized SQL queries</h3>
<p>Let's say a user is using the <a href="https://docs.elastic.co/integrations/mysql">MySQL integration</a>, which provides a fixed set of metrics. Their requirement now extends to retrieving more metrics from the MySQL database by running new, customized SQL queries.</p>
<p>This can be achieved by adding an instance of the SQL input package, writing the customized queries, and specifying a new <a href="https://www.elastic.co/guide/en/ecs/master/ecs-data_stream.html#field-data-stream-dataset">data set</a> name, as shown in the screenshot below.</p>
<p>This way, users can obtain any metrics by executing the corresponding queries. The resulting metrics will be indexed to the new data set, sql_second_dataset.</p>
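<p>For example, a new SQL input package instance along these lines (the queries are illustrative; any valid query for the chosen driver works) would index its results into <code>sql_second_dataset</code>:</p>
<pre><code class="language-yaml">driver: &quot;mysql&quot;
sql_queries:
  - query: &quot;SHOW GLOBAL STATUS LIKE 'Threads_connected';&quot;
    response_format: variables
  - query: &quot;SELECT table_schema, COUNT(*) AS table_count FROM information_schema.tables GROUP BY table_schema;&quot;
    response_format: table
data_stream.dataset: sql_second_dataset
</code></pre>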
<p><img src="https://www.elastic.co/observability-labs/assets/images/sql-inputs-database-metrics-observability/elastic-blog-5-driver.png" alt="Customization of Ingest Pipelines and Mappings" /></p>
<p>When there are multiple queries, users can combine the results into a single event by enabling the Merge Results toggle.</p>
<h3>Customizing user experience</h3>
<p>Users can customize their data by writing their own <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html">ingest pipelines</a> and providing their customized <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html">mappings</a>. Users can also build their own bespoke dashboards.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/sql-inputs-database-metrics-observability/elastic-blog-6-ingest-pipeline.png" alt="Customization of Ingest Pipelines and Mappings" /></p>
<p>As we can see above, the SQL input package provides the flexibility to obtain new metrics by running new queries that are not supported in the default MySQL integration (where the user gets metrics from a predetermined set of queries).</p>
<p>The SQL input package also supports multiple drivers: mssql, postgresql, and oracle. A single input package can therefore cater to all of these databases.</p>
<p>Note: The fetch_from_all_databases feature is not supported in the SQL input package yet.</p>
<h2>Try it out!</h2>
<p>Now that you know about the various use cases and features of generic SQL, get started with <a href="https://cloud.elastic.co/registration?fromURI=/home">Elastic Cloud</a> and try the <a href="https://docs.elastic.co/integrations/sql">SQL input package</a> with your SQL database for a customized experience and metrics. If you are looking for new metrics for some of our existing SQL-based integrations, like <a href="https://docs.elastic.co/en/integrations/microsoft_sqlserver">Microsoft SQL Server</a>, <a href="https://docs.elastic.co/integrations/oracle">Oracle</a>, and more, go ahead and give the SQL input package a whirl.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/sql-inputs-database-metrics-observability/patterns-midnight-background-no-logo-observability.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[LLM Observability for Google Cloud’s Vertex AI platform - understand performance, cost and reliability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elevate-llm-observability-with-gcp-vertex-ai-integration</link>
            <guid isPermaLink="false">elevate-llm-observability-with-gcp-vertex-ai-integration</guid>
            <pubDate>Wed, 09 Apr 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Enhance LLM observability with Elastic's GCP Vertex AI Integration — gain actionable insights into model performance, resource efficiency, and operational reliability.]]></description>
            <content:encoded><![CDATA[<p>As organizations increasingly adopt large language models (LLMs) for AI-powered applications such as content creation, Retrieval-Augmented Generation (RAG), and data analysis, SREs and developers face new challenges. Tasks like monitoring workflows, analyzing input and output, managing query latency, and controlling costs become critical. LLM observability helps address these issues by providing clear insights into how these models perform, allowing teams to quickly identify bottlenecks, optimize configurations, and improve reliability. With better observability, SREs can confidently scale LLM applications, especially on platforms like <a href="https://cloud.google.com/vertex-ai">Google Cloud’s Vertex AI</a>.</p>
<h3>New Elastic Observability LLM integration with Google Cloud’s Vertex AI platform</h3>
<p>We are thrilled to announce general availability of monitoring LLMs hosted in Google Cloud through the <a href="https://www.elastic.co/docs/current/integrations/gcp_vertexai">Elastic integration with Vertex AI</a>. This integration enables users to experience enhanced LLM Observability by providing deep insights into the usage, cost and operational performance of models on Vertex AI, including latency, errors, token usage, frequency of model invocations as well as resources utilized by models. By leveraging this data, organizations can optimize resource usage, identify and resolve performance bottlenecks, and enhance the model efficiency and accuracy.</p>
<h3>Observability needs for AI-powered applications using the Vertex AI platform</h3>
<p>Leveraging AI models creates unique needs around the observability and monitoring of AI-powered applications. Some of the challenges that come with using LLMs relate to the high cost of calling the LLMs, the quality and safety of LLM responses, and the performance, reliability, and availability of the LLMs.</p>
<p>Lack of visibility into LLM observability data can make it harder for SREs and DevOps teams to ensure their AI-powered applications meet their service level objectives for reliability, performance, cost, and quality of AI-generated content, and to have enough telemetry data to troubleshoot related issues. Thus, robust LLM observability, including real-time detection of anomalies in the performance of models hosted on Google Cloud’s Vertex AI platform, is critical for the success of AI-powered applications.</p>
<p>Depending on the needs of their LLM applications, customers can make use of a growing list of models hosted on the Vertex AI platform, such as Gemini 2.0 Pro, Gemini 2.0 Flash, and Imagen for image generation. Each model excels in specific areas and generates content across modalities such as language, audio, vision, and code. No two models are the same; each has specific performance characteristics. So, it is important that service operators are able to track the individual performance, behavior, and cost of each model.</p>
<h3>Unlocking Insights with Vertex AI Metrics</h3>
<p>The Elastic integration with Google Cloud’s Vertex AI platform collects a wide range of metrics from models hosted on Vertex AI, enabling users to monitor, analyze, and optimize their AI deployments effectively.</p>
<p>Once you use the integration, you can review all the metrics in the Vertex AI dashboard.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/Overview.png" alt="Overview Dashboard" /></p>
<p>These metrics can be categorized into the following groups:</p>
<h4>1. Prediction Metrics</h4>
<p>Prediction metrics provide critical insights into model usage, performance bottlenecks, and reliability. These metrics help ensure smooth operations, optimize response times, and maintain robust, accurate predictions.</p>
<ul>
<li>
<p><strong>Prediction Count by Endpoint</strong>: Measures the total number of predictions across different endpoints.</p>
</li>
<li>
<p><strong>Prediction Latency</strong>: Provides insights into the time taken to generate predictions, allowing users to identify bottlenecks in performance.</p>
</li>
<li>
<p><strong>Prediction Errors</strong>: Monitors the count of failed predictions across endpoints.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/Prediction.png" alt="Prediction Metrics" /></p>
<h4>2. Model Performance Metrics</h4>
<p>Model performance metrics provide crucial insights into deployment efficiency and responsiveness. These metrics help optimize model performance and ensure reliable operations.</p>
<ul>
<li>
<p><strong>Model Usage</strong>: Tracks the usage distribution among different model deployments.</p>
</li>
<li>
<p><strong>Token Usage</strong>: Tracks the number of tokens consumed by each model deployment, which is critical for understanding model efficiency.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/token_model_usage.png" alt="Token model usage" /></p>
<ul>
<li>
<p><strong>Invocation Rates</strong>: Tracks the frequency of invocations made by each model deployment.</p>
</li>
<li>
<p><strong>Model Invocation Latency</strong>: Measures the time taken to invoke a model, helping in diagnosing performance issues.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/Invocation_Vertex.png" alt="Model Invocation Metrics" /></p>
<h4>3. Resource Utilization Metrics</h4>
<p>Resource utilization metrics are vital for monitoring resource efficiency and workload performance. They help optimize infrastructure, prevent bottlenecks, and ensure smooth operation of AI deployments.</p>
<ul>
<li>
<p><strong>CPU Utilization</strong>: Monitors CPU usage to ensure optimal resource allocation for AI workloads.</p>
</li>
<li>
<p><strong>Memory Usage</strong>: Tracks the memory consumed across all model deployments.</p>
</li>
<li>
<p><strong>Network Usage</strong>: Measures bytes sent and received, providing insights into data transfer during model interactions.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/Resource_Utilization.png" alt="Resource Utilization Metrics" /></p>
<h4>4. Overview Metrics</h4>
<p>These metrics give an overview of the models deployed in Google Cloud’s Vertex AI platform. They are essential for tracking overall performance, optimizing efficiency, and identifying potential issues across deployments.</p>
<ul>
<li>
<p><strong>Total Invocations</strong>: The overall count of prediction invocations across all models and endpoints, providing a comprehensive view of activity.</p>
</li>
<li>
<p><strong>Total Tokens</strong>: The total number of tokens processed across all model interactions, offering insights into resource utilization and efficiency.</p>
</li>
<li>
<p><strong>Total Errors</strong>: The total count of errors encountered across all models and endpoints, helping identify reliability issues.</p>
</li>
</ul>
<p>All metrics can be filtered by <strong>region</strong>, offering localized insights for better analysis.</p>
<p>Note: The Elastic integration with Vertex AI provides comprehensive visibility into both deployment models: provisioned throughput, where capacity is pre-allocated, and pay-as-you-go, where resources are consumed on demand.</p>
<h3>Conclusion</h3>
<p>This <a href="https://www.elastic.co/docs/current/integrations/gcp_vertexai">integration with Vertex AI</a> represents a significant step forward in enhancing LLM Observability for users of Google Cloud’s Vertex AI platform. By unlocking a wealth of actionable data, organizations can assess the health, performance, and cost of LLMs, troubleshoot operational issues, and ensure scalability and accuracy in AI-driven applications.</p>
<p>Now that you know how the Vertex AI integration enhances LLM Observability, it’s your turn to try it out. Spin up an Elastic Cloud deployment and start monitoring your LLM applications hosted on Google Cloud’s Vertex AI platform.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elevate-llm-observability-with-gcp-vertex-ai-integration/vertexai-title.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Infrastructure monitoring with OpenTelemetry in Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/infrastructure-monitoring-with-opentelemetry-in-elastic-observability</link>
            <guid isPermaLink="false">infrastructure-monitoring-with-opentelemetry-in-elastic-observability</guid>
            <pubDate>Wed, 24 Jul 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Integrating OpenTelemetry with Elastic Observability for Application and Infrastructure Monitoring Solutions.]]></description>
            <content:encoded><![CDATA[<p>At Elastic, we recently made a decision to fully embrace OpenTelemetry as the premier data collection framework. As an Observability engineer, I firmly believe that vendor agnosticism is essential for delivering the greatest value to our customers. By committing to OpenTelemetry, we are not only staying current with technological advancements but also driving them forward. This investment positions us at the forefront of the industry, championing a more open and flexible approach to observability.</p>
<p>Elastic donated the <a href="https://www.elastic.co/guide/en/ecs/current/index.html">Elastic Common Schema (ECS)</a> to OpenTelemetry and is actively working to <a href="https://opentelemetry.io/blog/2023/ecs-otel-semconv-convergence/">converge</a> it with OTel semantic conventions. In the meantime, we are dedicated to supporting our users by ensuring they don’t have to navigate different standards. Our goal is to provide a seamless end-to-end experience while using OpenTelemetry with our application and infrastructure monitoring solutions. This commitment allows users to benefit from the best of both worlds without any friction.</p>
<p>In this blog, we explore how to use the OpenTelemetry (OTel) collector to capture core system metrics from various sources such as AWS EC2, Google Compute, Kubernetes clusters, and individual systems running Linux or MacOS.</p>
<h2>Powering Infrastructure UIs with Two Ingest Paths</h2>
<p>Elastic users who wish to have OpenTelemetry as their data collection mechanism can now monitor the health of the hosts where the OpenTelemetry collector is deployed using the Hosts and Inventory UIs available in Elastic Observability.</p>
<p>Elastic offers two distinct ingest paths to power Infrastructure UIs: the ElasticsearchExporter Ingest Path and the OTLP Exporter Ingest Path.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/infrastructure-monitoring-with-opentelemetry-in-elastic-observability/IngestPath.png" alt="IngestPath" /></p>
<h3>ElasticsearchExporter Ingest Path:</h3>
<p>The <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md#host-metrics-receiver">hostmetrics receiver</a> in OpenTelemetry collects system-level metrics such as CPU, memory, and disk usage from the host machine in the OTel schema. The ElasticsearchExporter ingest path leverages this receiver, and we've developed the <a href="https://github.com/elastic/opentelemetry-collector-components/tree/main/processor/elasticinframetricsprocessor#elastic-infra-metrics-processor">ElasticInfraMetricsProcessor</a>, which utilizes the <a href="https://github.com/elastic/opentelemetry-lib/tree/main?tab=readme-ov-file#opentelemetry-lib">opentelemetry-lib</a> to convert these metrics into a format that Elastic UIs understand.</p>
<p>For example, the <code>system.network.io</code> OTel metric includes a <code>direction</code> attribute with values <code>receive</code> or <code>transmit</code>. These correspond to <code>system.network.in.bytes</code> and <code>system.network.out.bytes</code>, respectively, within Elastic.</p>
<p>The <a href="https://github.com/elastic/opentelemetry-collector-components/tree/main/processor/elasticinframetricsprocessor#elastic-infra-metrics-processor">processor</a> then forwards these metrics to the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/elasticsearchexporter#elasticsearch-exporter">Elasticsearch Exporter</a>, now enhanced to support exporting metrics in ECS mode. The exporter sends the metrics to an Elasticsearch endpoint, lighting up the Infrastructure UIs with insightful data.</p>
<p>To utilize this path, you can deploy the collector from the Elastic Collector Distro, available <a href="https://github.com/elastic/elastic-agent/blob/main/internal/pkg/otel/README.md">here</a>.</p>
<p>An example collector config for this Ingest Path:</p>
<pre><code class="language-yaml">receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
          system.cpu.logical.count:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      process:
        metrics:
          process.open_file_descriptors:
            enabled: true
          process.memory.utilization:
            enabled: true
          process.disk.operations:
            enabled: true
      network:
      processes:
      load:
      disk:
      filesystem:

processors:
  resourcedetection/system:
    detectors: [&quot;system&quot;, &quot;ec2&quot;]
  elasticinframetrics:

exporters:  
  logging:
    verbosity: detailed
  elasticsearch/metrics: 
    endpoints: &lt;elasticsearch_endpoint&gt;
    api_key: &lt;api_key&gt;
    mapping:
      mode: ecs

service:
  pipelines:
    metrics/host:
      receivers: [hostmetrics]
      processors: [resourcedetection/system, elasticinframetrics]
      exporters: [logging, elasticsearch/metrics]

</code></pre>
<p>The Elastic exporter path is ideal for users who would prefer using the custom Elastic Collector <a href="https://github.com/elastic/elastic-agent/blob/main/internal/pkg/otel/README.md">Distro</a>. This path includes the ElasticInfraMetricsProcessor, which sends data to Elasticsearch via Elasticsearch exporter.</p>
<h3>OTLP Exporter Ingest Path:</h3>
<p>In the OTLP Exporter Ingest path, the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md#host-metrics-receiver">hostmetrics receiver</a> collects system-level metrics such as CPU, memory, and disk usage from the host machine in OTel Schema. These metrics are sent to the <a href="https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter#otlp-grpc-exporter">OTLP Exporter</a>, which forwards them to the <a href="https://www.elastic.co/guide/en/observability/current/apm-open-telemetry-direct.html#apm-connect-open-telemetry-collector">APM Server endpoint</a>. The APM Server, using the same <a href="https://github.com/elastic/opentelemetry-lib/tree/main?tab=readme-ov-file#opentelemetry-lib">opentelemetry-lib</a>, converts these metrics into a format compatible with Elastic UIs. Subsequently, the APM Server pushes the metrics to Elasticsearch, powering the Infrastructure UIs.</p>
<p>An example collector configuration for the APM Ingest Path</p>
<pre><code class="language-yaml">receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
          system.cpu.logical.count:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      process:
        metrics:
          process.open_file_descriptors:
            enabled: true
          process.memory.utilization:
            enabled: true
          process.disk.operations:
            enabled: true
      network:
      processes:
      load:
      disk:
      filesystem:

processors:
  resourcedetection/system:
    detectors: [&quot;system&quot;]
    system:
      hostname_sources: [&quot;os&quot;]

exporters:
  otlphttp:
    endpoint: &lt;mis_endpoint&gt;
    tls:
      insecure: false
    headers:
      Authorization: &lt;api_key&gt;
  logging:
    verbosity: detailed

service:
  pipelines:
    metrics/host:
      receivers: [hostmetrics]
      processors: [resourcedetection/system]
      exporters: [logging, otlphttp]


</code></pre>
<p>The OTLP Exporter Ingest path can help existing users who are already using Elastic APM and want to see the Infrastructure UIs populated as well. These users can use the default <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib?tab=readme-ov-file#opentelemetry-collector-contrib">OpenTelemetry Collector</a>.</p>
<h2>A glimpse of the Infrastructure UIs</h2>
<p>The Infrastructure UIs showcase both Host and Kubernetes level views. Below are some glimpses of these UIs.</p>
<p>The Hosts Overview UI</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/infrastructure-monitoring-with-opentelemetry-in-elastic-observability/HostUI.png" alt="HostUI" /></p>
<p>The Hosts Inventory UI
<img src="https://www.elastic.co/observability-labs/assets/images/infrastructure-monitoring-with-opentelemetry-in-elastic-observability/Inventory.png" alt="InventoryUI" /></p>
<p>The Process-related Details of the Host</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/infrastructure-monitoring-with-opentelemetry-in-elastic-observability/Processes.png" alt="Processes" /></p>
<p>The Kubernetes Inventory UI</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/infrastructure-monitoring-with-opentelemetry-in-elastic-observability/K8s.png" alt="K8s" /></p>
<p>Pod level Metrics</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/infrastructure-monitoring-with-opentelemetry-in-elastic-observability/Pod_Metrics.png" alt="Pod Metrics" /></p>
<p>Our next step is to create Infrastructure UIs powered by native OTel data, with dedicated OTel dashboards that run on this native data.</p>
<h2>Conclusion</h2>
<p>Elastic's integration with OpenTelemetry simplifies the observability landscape. While we are diligently working to align ECS with OpenTelemetry’s semantic conventions, our immediate priority is to support our users by simplifying their experience. With this added support, we aim to deliver a seamless, end-to-end experience for those using OpenTelemetry with our application and infrastructure monitoring solutions. We are excited to see how our users will leverage these capabilities to gain deeper insights into their systems.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/infrastructure-monitoring-with-opentelemetry-in-elastic-observability/Monitoring-infra-with-Otel.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Troubleshooting your Agents and Amazon Bedrock AgentCore with Elastic Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/llm-agentic-ai-observability-amazon-bedrock-agentcore</link>
            <guid isPermaLink="false">llm-agentic-ai-observability-amazon-bedrock-agentcore</guid>
            <pubDate>Mon, 01 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Discover how to achieve end-to-end observability for Amazon Bedrock AgentCore: from tracking service health and token costs to debugging complex reasoning loops with distributed tracing.]]></description>
            <content:encoded><![CDATA[<h2>Troubleshooting your Agents and Amazon Bedrock AgentCore with Elastic Observability</h2>
<h3>Introduction</h3>
<p>We're excited to introduce Elastic Observability’s Amazon Bedrock AgentCore integration, which allows users to observe <a href="https://aws.amazon.com/bedrock/agentcore/">Amazon Bedrock AgentCore</a> and the agents' LLM interactions end-to-end. Agentic AI represents a fundamental shift in how we build applications. </p>
<p>Unlike standard LLM chatbots that simply generate text, agents can reason, plan, and execute multi-step workflows to complete complex tasks autonomously. These agents often run on a platform such as Amazon Bedrock AgentCore, which helps developers build, deploy, and scale agents by providing the secure, scalable, and modular infrastructure services (like agent runtime, memory, and identity) necessary to deploy and operate highly capable AI agents built with any framework or model.</p>
<p>Using a platform such as Amazon Bedrock AgentCore is easy, but troubleshooting an agent is far more complex than debugging a standard microservice. Key challenges include:</p>
<ul>
<li>
<p><strong>Non-Deterministic Behavior:</strong> Agents may choose different tools or reasoning paths for the same prompt, making it difficult to reproduce bugs.</p>
</li>
<li>
<p><strong>&quot;Black Box&quot; Execution:</strong> When an agent fails or provides a hallucinated answer, it is often unclear if the issue lies in the LLM's reasoning, the context provided, or a failed tool execution.</p>
</li>
<li>
<p><strong>Cost &amp; Latency Blind Spots:</strong> A single user query can trigger recursive loops or expensive multi-step tool calls, leading to unexpected spikes in token usage and latency.</p>
</li>
</ul>
<p>To effectively observe these systems, you need to correlate signals from two distinct layers:</p>
<ol>
<li>
<p><strong>The Platform Layer (Amazon Bedrock AgentCore):</strong> You need to understand the overall health of the managed service. This includes high-level metrics like invocation counts, latency, throttling, and platform-level errors that affect all agents running in AgentCore.</p>
</li>
<li>
<p><strong>The Application Layer (Your Agentic Logic):</strong> You want to understand the granular &quot;why&quot; behind the behavior. This includes distributed traces, usually with OpenTelemetry, that visualize the full request lifecycle (e.g. waterfall view), identifying exactly which step in the reasoning chain failed or took too long.</p>
</li>
</ol>
<p><strong>Agentic AI Observability in Elastic</strong> provides a unified, end-to-end view of your agentic deployment by combining platform-level insights from Amazon Bedrock AgentCore, through the new <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock_agentcore">Amazon Bedrock AgentCore integration</a>, with deep application-level visibility from OpenTelemetry (OTel) traces, logs, and metrics from the agent. This unified view in Elastic allows you to observe, troubleshoot, and optimize your agentic applications from end to end without switching tools. Additionally, Elastic provides Agent Builder, which allows you to create agents to analyze any of the data from Amazon Bedrock AgentCore and the agents running on it.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-dashboard-runtime-gateway-traces.jpg" alt="Amazon Bedrock AgentCore Dashboards for Runtime and Gateway and APM Tracing" /></p>
<h2>Agentic AI Observability in Elastic</h2>
<p>As mentioned above there are two main parts to end-to-end Agentic AI Observability in Elastic.</p>
<ul>
<li>
<p><strong>Amazon Bedrock AgentCore Platform Observability -</strong> Using platform logs and metrics, Elastic provides comprehensive visibility into the high-level health of the AgentCore service by ingesting AWS vended logs and metrics across four critical components:</p>
<ul>
<li>
<p><strong>Runtime:</strong> Monitor core performance indicators such as agent errors, overall latency, throttle counts, and invocation rates for each endpoint.</p>
</li>
<li>
<p><strong>Gateway:</strong> Gain specific insights into gateway and tool call performance, including invocations, error rates, and latency.</p>
</li>
<li>
<p><strong>Memory:</strong> Track short-term and long-term memory operations, including event creation, retrieval, and listing, alongside performance analysis, errors, and latency metrics.</p>
</li>
<li>
<p><strong>Identity:</strong> Audit security and access health with logs on successful and failed access attempts.</p>
</li>
</ul>
</li>
</ul>
<ul>
<li><strong>Agent Observability with APM, logs, and metrics -</strong> To understand <em>how</em> your agent is behaving, Elastic ingests OTel-native traces, metrics, and logs from your application running within AgentCore. This allows you to visualize the full execution path, including LLM reasoning steps and tool calls, in a detailed waterfall diagram.</li>
</ul>
<ul>
<li><strong>Agentic AI Analysis</strong> - All of the data from Amazon Bedrock AgentCore and the agents running on it can be analyzed with <strong>Elastic’s AI-driven capabilities</strong>. These include:</li>
</ul>
<ul>
<li>
<p><strong>Elastic AgentCore SRE Agent built on Elastic Agent Builder</strong> - We don't just monitor agents; we provide you with one to assist your team. The <strong>AgentCore SRE Agent</strong> is a specialized assistant built using <strong>Elastic Agent Builder</strong>. It possesses specialized knowledge of AgentCore applications observed in Elastic.</p>
<ul>
<li>
<p><strong>How it helps:</strong> You can ask specific questions regarding your AgentCore environment, such as how to interpret a complex error log or why a specific trace shows latency.</p>
</li>
<li>
<p><strong>Get the Agent:</strong> You can deploy this agent yourself from our <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/elastic_agentcore_sre_agent">GitHub repository</a>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Elastic Observability AI Assistant</strong> - Use natural language anywhere in Elastic’s UI to pinpoint issues, analyze something specific, or simply understand a problem through the LLM's knowledge base. Additionally, SREs can interpret log messages, errors, and metrics patterns; optimize code; write reports; and even identify and execute a runbook or find a related GitHub issue.</p>
</li>
<li>
<p><strong>Streams - AI-driven log analysis:</strong> When you send AgentCore logs from your instrumented application into Elastic, you can parse and analyze them. Additionally, Streams surfaces <strong>Significant Events</strong> within your log stream, allowing you to focus immediately on what matters most.</p>
</li>
<li>
<p><strong>Dashboards and ES|QL</strong> - Data is only useful if you can act on it. Elastic provides out-of-the-box (OOTB) assets to accelerate your mean time to resolution (MTTR), and ES|QL to help you perform ad-hoc analysis on any signal.</p>
<ul>
<li>
<p><strong>OOTB Dashboards:</strong> Pre-built visualizations based on AgentCore service signals. These dashboards provide an immediate, high-level overview of the usage, health, and performance of your AgentCore runtime, gateway, memory, and identity components.</p>
</li>
<li>
<p><strong>OOTB Alert Templates:</strong> Pre-configured alerts for common agentic issues (e.g., high error rates, latency spikes, or unusual token consumption), allowing you to move from reactive to proactive troubleshooting immediately.</p>
</li>
</ul>
</li>
</ul>
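<p>As a sketch of the kind of ad-hoc analysis ES|QL enables, the script below builds a request for Elasticsearch's ES|QL <code>_query</code> API. The index pattern and field names are hypothetical placeholders, and the endpoint and API key must be replaced with your own values:</p>

```bash
# Sketch: ad-hoc ES|QL analysis via the Elasticsearch _query API.
# The index pattern (metrics-*) and field names (latency_ms, endpoint)
# are hypothetical placeholders; adjust to your own data streams.
ES_URL="${ES_URL:-https://localhost:9200}"
ES_API_KEY="${ES_API_KEY:-REPLACE_WITH_YOUR_API_KEY}"

# Illustrative query: average latency per endpoint over the last hour
ESQL='FROM metrics-* | WHERE @timestamp > NOW() - 1 hour | STATS avg_latency = AVG(latency_ms) BY endpoint | SORT avg_latency DESC'
PAYLOAD=$(printf '{"query": "%s"}' "$ESQL")

# Uncomment to run against your cluster:
# curl -s -X POST "$ES_URL/_query" \
#   -H "Authorization: ApiKey $ES_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
echo "$PAYLOAD"
```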
<h2>Onboarding Amazon Bedrock AgentCore signals into Elastic</h2>
<h3>Amazon Bedrock AgentCore Integration</h3>
<p>To get started with platform-level visibility, you need to enable the <strong>Amazon Bedrock AgentCore</strong> integration in Elastic. This integration automatically collects metrics and logs from your AgentCore runtime, gateway, memory, and identity components via Amazon CloudWatch.</p>
<p><strong>Setup Steps:</strong></p>
<ol>
<li>
<p><strong>Prepare AWS Environment:</strong> Ensure your AgentCore agents are deployed and running and that you have enabled logging on your AgentCore resources in the AWS console.</p>
</li>
<li>
<p><strong>Add the Integration:</strong></p>
<ul>
<li>
<p>In Elastic (Kibana), navigate to <strong>Integrations</strong>.</p>
</li>
<li>
<p>Search for <strong>&quot;Amazon Bedrock AgentCore&quot;</strong>. Select <strong>Add Amazon Bedrock AgentCore</strong>.</p>
</li>
</ul>
</li>
<li>
<p><strong>Configure &amp; Deploy:</strong></p>
<p>Configure Elastic's Amazon Bedrock AgentCore integration to collect CloudWatch metrics from your chosen AWS region at the specified collection interval. Logs will be added soon after the publication of this blog.</p>
</li>
</ol>
<h3>Onboard the Agent with OTel Instrumentation</h3>
<p>The next step is observing the application logic itself. The beauty of Amazon Bedrock AgentCore is that the application runtime often comes pre-instrumented. You simply need to tell it where to send the telemetry data.</p>
<p>For this example, we will use the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/travel_assistant"><strong>Travel Assistant</strong></a> from the Elastic Observability examples.</p>
<p>To instrument this agent, you do not need to modify the source code. Instead, when you invoke the agent using the <code>agentcore</code> CLI, you simply pass your Elastic connection details as environment variables. This redirects the OTel signals (traces, metrics, and logs) directly to the Elastic EDOT collector.</p>
<p><strong>Example Invoke Command:</strong> Run the following command to launch the agent and start streaming telemetry to Elastic:</p>
<pre><code class="language-bash">    agentcore launch \
    --env BEDROCK_MODEL_ID=&quot;us.anthropic.claude-3-5-sonnet-20240620-v1:0&quot; \
    --env OTEL_EXPORTER_OTLP_ENDPOINT=&quot;https://&lt;REPLACE_WITH_ELASTIC_ENDPOINT&gt;.region.cloud.elastic.co:443&quot; \
    --env OTEL_EXPORTER_OTLP_HEADERS=&quot;Authorization=ApiKey &lt;REPLACE_WITH_YOUR_API_KEY&gt;&quot; \
    --env OTEL_EXPORTER_OTLP_PROTOCOL=&quot;http/protobuf&quot; \
    --env OTEL_METRICS_EXPORTER=&quot;otlp&quot; \
    --env OTEL_TRACES_EXPORTER=&quot;otlp&quot; \
    --env OTEL_LOGS_EXPORTER=&quot;otlp&quot; \
    --env OTEL_RESOURCE_ATTRIBUTES=&quot;service.name=travel_assistant,service.version=1.0.0&quot; \
    --env AGENT_OBSERVABILITY_ENABLED=&quot;true&quot; \
    --env DISABLE_ADOT_OBSERVABILITY=&quot;true&quot; \
    --env TAVILY_API_KEY=&quot;&lt;REPLACE_WITH_YOUR_TAVILY_KEY&gt;&quot;
</code></pre>
<p><strong>Key Configuration Parameters:</strong></p>
<ul>
<li>
<p><code>OTEL_EXPORTER_OTLP_ENDPOINT</code>: Your Elastic OTLP endpoint (ensure port 443 is specified).</p>
</li>
<li>
<p><code>OTEL_EXPORTER_OTLP_HEADERS</code>: The Authorization header containing your Elastic API Key.</p>
</li>
<li>
<p><code>DISABLE_ADOT_OBSERVABILITY=true</code>: This ensures the native AgentCore signals are routed exclusively to your defined endpoint (Elastic) rather than default AWS paths.</p>
</li>
</ul>
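<p>Before launching, it can help to sanity-check that the required OTel variables are set and that the endpoint specifies port 443. A minimal bash sketch follows; the endpoint and key values are hypothetical placeholders:</p>

```bash
# Sketch: validate Elastic OTLP settings before running `agentcore launch`.
# The endpoint and API key below are hypothetical placeholders.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://my-deployment.ingest.us-east-1.cloud.elastic.co:443"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=ApiKey REPLACE_WITH_YOUR_API_KEY"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"

ok=1
for var in OTEL_EXPORTER_OTLP_ENDPOINT OTEL_EXPORTER_OTLP_HEADERS OTEL_EXPORTER_OTLP_PROTOCOL; do
  val=$(eval echo "\$$var")   # indirect lookup of each variable's value
  if [ -z "$val" ]; then
    echo "missing: $var"
    ok=0
  fi
done

# Port 443 must be explicit on the endpoint, as noted above
case "$OTEL_EXPORTER_OTLP_ENDPOINT" in
  *:443) ;;
  *) echo "warning: endpoint should specify port 443"; ok=0 ;;
esac

[ "$ok" -eq 1 ] && echo "OTLP configuration looks complete"
```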
<h2>Analyzing Agentic Data in Elastic Observability</h2>
<p>As we walk through the analysis features below, we will use the Travel Assistant agent we instrumented earlier, along with any other apps you may be running on AgentCore. As a second agent for this example, we will use the <a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/02-use-cases/customer-support-assistant"><strong>Customer Support Assistant</strong></a> from the AWS Labs AgentCore samples.</p>
<h3>Out-of-the-Box (OOTB) Dashboards</h3>
<p>Elastic populates a set of comprehensive dashboards based on Amazon Bedrock AgentCore service logs and metrics. These appear as a unified view with tabs, providing a &quot;single pane of glass&quot; into the operational health of your platform.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-integration-dashboards-runtime-gateway-memory-identity.gif" alt="Amazon Bedrock AgentCore out-of-the-box Dashboards for Runtime, Gateway, Memory and Identity" /></p>
<p>This view is divided into four key zones, each addressing a specific component of AgentCore: Runtime, Gateway, Memory, and Identity. Note that not all agentic applications use all four components. In our example, only the Customer Support Assistant uses all four, whereas the Travel Assistant uses only Runtime.</p>
<p><strong>Runtime Health</strong></p>
<hr />
<p>Visualize agent invocations, session metrics, error trends (system vs. user), and performance stats like latency and throttling, split per endpoint. This dashboard helps you answer questions like:</p>
<ul>
<li>&quot;How are my Travel Assistant agent and Customer Support agent performing in terms of overall traffic and latency, and are there any spikes in errors or throttling?&quot;</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/amazon-bedrock-agentcore-runtime-dashboard.jpg" alt="Amazon Bedrock AgentCore out-of-the-box Dashboard for AgentCore Runtime" /></p>
<p><strong>Gateway Performance</strong></p>
<hr />
<p>Analyze invocations across Lambda and MCP (Model Context Protocol), with detailed breakdowns for tool vs. non-tool calls. The dashboard highlights throttling detection, target execution times, and separates system errors from user errors.</p>
<ul>
<li><em>Question answered:</em> &quot;Are my external integrations (Lambda, MCP) performing efficiently, or are specific tool calls experiencing high latency, throttling, or system-level errors?&quot;</li>
</ul>
<p><strong>Memory Operations</strong></p>
<hr />
<p>Track core operations like event creation, retrieval, and listing, alongside deep dives into long-term memory processing. This includes extraction and consolidation metrics broken down by strategy type, as well as specific monitoring for throttling and system vs. user errors.</p>
<ul>
<li><em>Question answered:</em> &quot;Are failures in memory consolidation strategies or high retrieval latency preventing the agent from effectively recalling user context?&quot;</li>
</ul>
<p><strong>Identity &amp; Access</strong></p>
<hr />
<p>Monitor identity token fetch operations (workload, OAuth, API keys) and real-time authentication success/failure rates. The dashboard breaks down activity by provider and highlights throttling or capacity bottlenecks.</p>
<ul>
<li><em>Question answered:</em> &quot;Are authentication failures or token fetch bottlenecks from specific providers preventing agents from accessing required resources?&quot;</li>
</ul>
<h3>Out-of-the-Box (OOTB) Alert Templates</h3>
<p>Observability isn't just about looking at dashboards; it's about knowing when to act. To move from reactive checking to proactive monitoring, Elastic provides <strong>OOTB Alert Rule Templates</strong> (starting with Elastic version 9.2.1).</p>
<p>These templates eliminate guesswork by pre-selecting the optimal metrics to monitor and applying sensible thresholds. This configuration focuses on high-fidelity alerts for genuine anomalies, helping you catch critical issues early while minimizing alert fatigue.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/elastic-alert-rule-templates-for-amazon-bedrock-agentcore.jpg" alt="Amazon Bedrock AgentCore out-of-the-box Alert rule templates for AgentCore" /></p>
<p><strong>Suggested OOTB Alerts:</strong></p>
<ul>
<li>
<p><strong>Agent Runtime System Errors:</strong> Detects server-side errors (500 Internal Server Error) during agent runtime invocations, indicating infrastructure or service issues with AWS Bedrock AgentCore.</p>
</li>
<li>
<p><strong>Agent Runtime User Errors:</strong> Flags client-side errors (4xx) during agent runtime invocations, including validation failures (400), resource not found (404), access denied (403), and resource conflicts (409). This helps catch misconfigured permissions, invalid input, or missing resources early.</p>
</li>
<li>
<p><strong>Agent Runtime High Latency:</strong> Triggers when the average latency for agent runtime invocations exceeds 10 seconds (10,000ms). Latency measures the time elapsed between receiving a request and sending the final response token.</p>
</li>
</ul>
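<p>To make the high-latency threshold concrete, the sketch below averages a set of sample invocation latencies and compares the result against the 10,000 ms threshold from the template. The sample values are illustrative, not real AgentCore output:</p>

```bash
# Sketch of the "Agent Runtime High Latency" rule logic: alert when the
# average invocation latency exceeds 10,000 ms. Sample values are illustrative.
THRESHOLD_MS=10000
SAMPLES="8200 9400 12600 15100"   # per-invocation latencies in ms

total=0
count=0
for s in $SAMPLES; do
  total=$((total + s))
  count=$((count + 1))
done
avg=$((total / count))            # integer average: 45300 / 4 = 11325

if [ "$avg" -gt "$THRESHOLD_MS" ]; then
  STATUS="ALERT: avg latency ${avg}ms exceeds ${THRESHOLD_MS}ms"
else
  STATUS="OK: avg latency ${avg}ms"
fi
echo "$STATUS"
```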
<h3>APM Tracing</h3>
<p>While logs and metrics tell you <em>that</em> an issue exists, <strong>APM Tracing</strong> tells you exactly <em>where</em> and <em>why</em> it is happening. By ingesting the OpenTelemetry signals from your instrumented agent, Elastic generates a detailed distributed trace (e.g., a waterfall view) for every interaction. For further details on LLM information such as prompts, responses, and token usage, you can explore the APM logs.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/otel-native-strands-agent-traces-in-elastic-apm.jpg" alt="Amazon Bedrock AgentCore OTel-native distributed tracing waterfall diagram in Elastic APM" /></p>
<p>This allows you to peer inside the &quot;black box&quot; of the agent's execution flow:</p>
<ul>
<li>
<p><strong>Visualize the Chain of Thought:</strong> See the full sequence of events, from the user's initial prompt to the final response, including all intermediate reasoning steps.</p>
</li>
<li>
<p><strong>Pinpoint Tool Failures:</strong> Identify exactly which external tool (e.g., a Lambda function for flight booking or a knowledge base query) failed or timed out.</p>
</li>
<li>
<p><strong>Analyze Latency Contributors:</strong> Distinguish between latency caused by the LLM's generation time versus latency caused by slow downstream API calls.</p>
</li>
<li>
<p><strong>Debug with Context:</strong> Drill down into individual spans to see specific error messages, attributes, and metadata that explain why a particular step failed.</p>
</li>
</ul>
<h3>Conclusion</h3>
<p>As organizations move from experimental chatbots to complex, autonomous agents in production, the need for robust observability has never been greater. Agentic applications introduce new layers of complexity—non-deterministic behaviors, multi-step reasoning loops, and cost implications—that standard monitoring tools simply cannot see.</p>
<p>Elastic Agentic AI Observability for Amazon Bedrock AgentCore bridges this gap. By unifying platform-level health metrics from AgentCore with deep, transaction-level distributed tracing from OpenTelemetry, Elastic gives SREs and developers the complete picture. Whether you are debugging a failed tool call, optimizing latency, or controlling token costs, you have the visibility needed to run agentic AI with confidence.</p>
<p><strong>Complete Visibility: AgentCore + Amazon Bedrock:</strong> For the most comprehensive view, we recommend onboarding Elastic’s <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock"><strong>Amazon Bedrock</strong> integration</a> alongside AgentCore. While the AgentCore integration focuses on the orchestration layer—monitoring agent errors, tool latency, and invocations—the Bedrock integration provides deep visibility into the underlying foundation models themselves. This includes tracking model-specific latency, token usage, full prompts and responses, and even <strong>Guardrails</strong> usage and effectiveness. By combining both, you ensure complete coverage from the high-level agent workflow down to the raw model inference.</p>
<ul>
<li>
<p><strong>Read more:</strong> <a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock">Monitor Amazon Bedrock with Elastic</a></p>
</li>
<li>
<p><strong>Read more:</strong> <a href="https://www.elastic.co/observability-labs/blog/llm-observability-amazon-bedrock-guardrails">Amazon Bedrock Guardrails Observability</a></p>
</li>
</ul>
<p><strong>Get Started Today</strong> - Ready to see your agents in action?</p>
<ul>
<li>
<p><strong>Try it out:</strong> Log in to <a href="https://cloud.elastic.co/login">Elastic Cloud</a> and add the Amazon Bedrock AgentCore integration, or use <a href="https://aws.amazon.com/marketplace/seller-profile?id=d8f59038-c24c-4a9d-a66d-6711d35d7305">Elastic from AWS Marketplace</a>.</p>
</li>
<li>
<p><strong>Explore the Code:</strong> Check out our GitHub repository for the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/travel_assistant">Travel assistant</a> which you saw in this blog, as well as the <a href="https://github.com/elastic/observability-examples/tree/main/aws/amazon-bedrock-agentcore/elastic_agentcore_sre_agent">AgentCore SRE Agent</a>.</p>
</li>
<li>
<p><strong>Learn More:</strong> Read the <a href="https://www.elastic.co/docs/reference/integrations/aws_bedrock_agentcore">full documentation</a> on setting up the integration for Agentic AI Observability for Amazon Bedrock AgentCore.</p>
</li>
</ul>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/llm-agentic-ai-observability-amazon-bedrock-agentcore/agentcore-blog.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Supercharge Your vSphere Monitoring with Enhanced vSphere Integration]]></title>
            <link>https://www.elastic.co/observability-labs/blog/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration</link>
            <guid isPermaLink="false">supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration</guid>
            <pubDate>Wed, 11 Dec 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Supercharge Your vSphere Monitoring with Enhanced vSphere Integration]]></description>
            <content:encoded><![CDATA[<p><a href="https://www.vmware.com/products/cloud-infrastructure/vsphere">vSphere</a> is VMware's cloud computing virtualization platform that provides a powerful suite for managing virtualized resources. It allows organizations to create, manage, and optimize virtual environments, providing advanced capabilities such as high availability, load balancing, and simplified resource allocation. vSphere enables efficient utilization of hardware resources, reducing costs while increasing the flexibility and scalability of IT infrastructure.</p>
<p>With the release of an upgraded <a href="https://www.elastic.co/docs/current/integrations/vsphere">vSphere integration</a> we now support an enhanced set of metrics and datastreams. Package version 1.15.0 onwards introduces new datastreams that significantly improve the collection of performance metrics, providing deeper insights into your vSphere environment.</p>
<p>We have expanded the performance metrics to encompass a broader range of insights across all datastreams, while also introducing new datastreams for clusters, resource pools, and networks. This enhanced integration version now includes a total of seven datastreams, featuring critical new metrics such as disk performance, memory utilization, and network status. Additionally, these datastreams now offer detailed visibility into associated resources like hosts, clusters, and resource pools.</p>
<p>Each datastream also includes detailed alarm information, such as the alarm name, description, status (e.g. critical or warning), and the affected entity's name. To make the most of these insights, we’ve also introduced prebuilt dashboards, helping teams monitor and troubleshoot their vSphere environments with ease and precision.</p>
<h2>Overview of the Datastreams</h2>
<ul>
<li><strong>Host Datastream:</strong> This datastream monitors the disk performance of the host, including metrics such as disk latency, average read/write bytes, uptime, and status. It also captures network metrics, such as packet information, network bandwidth, and utilization, as well as CPU and memory usage of the host. Additionally, it lists associated datastores, virtual machines, and networks within vSphere.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/hosts.png" alt="Host Datastream" /></p>
<ul>
<li><strong>Virtual Machine Datastream:</strong> This datastream tracks the used and available CPU and memory resources of virtual machines, along with the uptime and status of each VM. It includes information about the host on which the VM is running, as well as detailed snapshot metrics like the number of snapshots, creation dates, and descriptions. Additionally, it provides insights into associated hosts and datastores.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/virtualmachine.png" alt="Virtual Machine Datastream" /></p>
<ul>
<li>
<p><strong>Datastore Datastream:</strong> This datastream provides information on the total, used, and available capacity of datastores, along with their overall status. It also captures metrics such as the average read/write rate and lists the hosts and virtual machines connected to each datastore.</p>
</li>
<li>
<p><strong>Datastore Cluster:</strong> A datastore cluster in vSphere is a collection of datastores grouped together for efficient storage management. This datastream provides details on the total capacity and free space in the storage pod, along with the list of datastores within the cluster.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/datastore.png" alt="Datastore Datastream" /></p>
<ul>
<li>
<p><strong>Resource Pool:</strong> Resource pools in vSphere serve as logical abstractions that allow flexible allocation of CPU and memory resources. This datastream captures memory metrics, including swapped, ballooned, and shared memory, as well as CPU metrics like distributed and static CPU entitlement. It also lists the virtual machines associated with each resource pool.</p>
</li>
<li>
<p><strong>Network Datastream:</strong> This datastream captures the overall configuration and status of the network, including network types (e.g., vSS, vDS). It also lists the hosts and virtual machines connected to each network.</p>
</li>
<li>
<p><strong>Cluster Datastream:</strong> A Cluster in vSphere is a collection of ESXi hosts and their associated virtual machines that function as a unified resource pool. Clustering in vSphere allows administrators to manage multiple hosts and resources centrally, providing high availability, load balancing, and scalability to the virtual environment. This datastream includes metrics indicating whether HA or admission control is enabled and lists the hosts, networks, and datastores associated with the cluster.</p>
</li>
</ul>
<h2>Alarms support in vSphere Integration</h2>
<p>Alarms are a vital part of the vSphere integration, providing real-time insights into critical events across your virtual environment. In Elastic's updated vSphere integration, alarms are now reported for all entities. They include detailed information such as the alarm name, description, severity (e.g., critical or warning), affected entity, and triggered time. These alarms are seamlessly integrated into datastreams, helping administrators and SREs quickly identify and resolve issues like resource shortages or performance bottlenecks.</p>
<h4>Example Alarm</h4>
<pre><code class="language-yaml">&quot;triggered_alarms&quot;: [
  {
    &quot;description&quot;: &quot;Default alarm to monitor host memory usage&quot;,
    &quot;entity_name&quot;: &quot;host_us&quot;,
    &quot;id&quot;: &quot;alarm-4.host-12&quot;,
    &quot;name&quot;: &quot;Host memory usage&quot;,
    &quot;status&quot;: &quot;red&quot;,
    &quot;triggered_time&quot;: &quot;2024-08-28T10:31:26.621Z&quot;
  }
]
</code></pre>
<p>This example highlights a triggered alarm for monitoring host memory usage, indicating a critical status (red) for the host &quot;host_us.&quot; Such alarms empower teams to act swiftly and maintain the stability of their vSphere environment.</p>
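<p>Because triggered alarms are embedded in the documents themselves, you can also filter for critical ones programmatically. The sketch below extracts red-status alarms from a document shaped like the example above; note that in your own data streams these fields may be nested under an integration-specific prefix:</p>

```bash
# Sketch: extract critical (status "red") triggered alarms from a vSphere
# document shaped like the example above. Real documents may nest these
# fields under an integration-specific prefix.
DOC='{"triggered_alarms":[{"description":"Default alarm to monitor host memory usage","entity_name":"host_us","id":"alarm-4.host-12","name":"Host memory usage","status":"red","triggered_time":"2024-08-28T10:31:26.621Z"}]}'

# Use python3 for robust JSON parsing inside the shell pipeline
CRITICAL=$(printf '%s' "$DOC" | python3 -c '
import json, sys
doc = json.load(sys.stdin)
for alarm in doc.get("triggered_alarms", []):
    if alarm.get("status") == "red":
        print(alarm["entity_name"] + ": " + alarm["name"])
')
echo "$CRITICAL"
```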
<h2>Let's Try It Out!</h2>
<p>The new <a href="https://www.elastic.co/docs/current/integrations/vsphere">vSphere integration</a> in Elastic Cloud is more than just a monitoring tool; it’s a comprehensive solution that empowers you to manage and optimize your virtual environments effectively. With deeper insights and enhanced data granularity, you can ensure high availability, improved load balancing, and smarter resource allocation. Spin up an Elastic Cloud deployment and start monitoring your vSphere infrastructure.</p>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/title.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Transforming Industries and the Critical Role of LLM Observability: How to use Elastic's LLM integrations in real-world scenarios]]></title>
            <link>https://www.elastic.co/observability-labs/blog/transforming-industries-and-the-critical-role-of-llm-observability</link>
            <guid isPermaLink="false">transforming-industries-and-the-critical-role-of-llm-observability</guid>
            <pubDate>Thu, 08 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[This blog explores four industry specific use cases that use Large Language Models (LLMs) and highlights how Elastic's LLM observability integrations provide insights into the cost, performance, reliability and the prompts and response exchange with the LLM.]]></description>
            <content:encoded><![CDATA[<p>In today's tech-centric world, Large Language Models (LLMs) are transforming sectors from finance and healthcare to research. LLMs are starting to underpin products and services across the spectrum. Take for example recent <a href="https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#advanced-coding">advanced coding</a> developments in Google's Gemini 2.5 which enable it to use its reasoning capabilities to create a video game by producing the executable code from a short prompt.  Or <a href="https://www.aboutamazon.com/news/devices/new-alexa-generative-artificial-intelligence">new ways</a> to interact with Amazon's Alexa - for example, you could send a picture of a live music schedule, and have Alexa add the details to your calendar. And let's not forget Microsoft's <a href="https://blogs.microsoft.com/blog/2025/04/04/your-ai-companion/">personalization of Copilot</a> which remembers what you talk about, so it learns your likes and dislikes and details about your life; the name of your dog, that tricky project at work, what keeps you motivated to stick to your new workout routine.</p>
<p>Despite the widespread utility of LLMs, deploying these sophisticated tools in real-world scenarios poses distinct challenges, especially in managing their complex behaviors. For users such as Site Reliability Engineers (SREs), DevOps teams, and AI/ML engineers, ensuring the reliability, performance, and compliance of these models introduces an additional layer of complexity. This is where the concept of LLM Observability becomes essential. It offers crucial insights into the performance of these models, ensuring that these advanced AI systems operate both effectively and ethically.</p>
<h3>Why LLM Observability Matters and How Elastic Makes It Easy</h3>
<p>LLMs are not just another piece of software; they are sophisticated systems capable of human-like capabilities such as text generation, comprehension, and even coding. But with great power comes greater need for oversight. The opaque nature of these models can obscure how decisions are made and content generated. This makes it even more critical to implement robust observability to monitor and troubleshoot issues such as hallucinations, inappropriate content, cost overruns, errors and performance degradation. By monitoring these models closely, we can safeguard against unexpected outcomes and maintain user trust.</p>
<h3>Real-World Scenarios</h3>
<p>Let's explore real-world scenarios where companies leverage LLM-powered applications to enhance productivity and user experience, and how Elastic's LLM observability solutions monitor critical aspects of these models.</p>
<h4>1. Generative AI for Customer Support</h4>
<p>Companies are increasingly leveraging LLMs and generative AI to enhance customer support, using platforms like Google Vertex AI for hosting these models efficiently. With the introduction of advanced AI models such as Google's Gemini, which is integrated into Vertex AI, businesses can deploy sophisticated chatbots that manage customer inquiries, from basic questions to complex issues, in real time. These AI systems understand and respond with natural language, offering instant support for issues such as product troubleshooting or managing orders, thus reducing wait times. They also learn from each interaction to improve accuracy continuously. This boosts customer satisfaction and allows human agents to focus on complex tasks, enhancing overall efficiency. AI tools can further empower customer care agents with real-time analytics, sentiment detection, and conversation summarization.</p>
<p>To support use cases like the AI-powered customer support described above, Elastic recently launched LLM observability integrations including support for <a href="https://www.elastic.co/guide/en/integrations/current/gcp_vertexai.html">LLMs hosted on GCP Vertex AI</a>. Customers who wish to monitor foundation models such as Gemini and Imagen hosted on Google Vertex AI can benefit from Elastic’s Vertex AI integration to get a deeper understanding of model behavior and performance, and ensure that their AI-driven tools are not only effective but also reliable. Customers get an out-of-the-box experience ingesting a curated set of metrics from Vertex AI, as well as a pre-configured dashboard.</p>
<p>By continuously tracking these metrics, customers can proactively manage their AI resources, optimize operations, and ultimately enhance the overall customer experience.</p>
<p>Let's look at some of the metrics you get from the Google Vertex AI integration which are helpful in the context of using generative AI for customer support.</p>
<ol>
<li><strong>Prediction Latency</strong>: Measures the time taken to complete predictions, critical for real-time customer interactions.</li>
<li><strong>Error Rate</strong>: Tracks errors in predictions, which is vital for maintaining the accuracy and reliability of AI-driven customer support.</li>
<li><strong>Prediction Count</strong>: Counts the number of predictions made, helping assess the scale of AI usage in customer interactions.</li>
<li><strong>Model Usage</strong>: Tracks how frequently the AI models are accessed by both virtual assistants and customer support tools.</li>
<li><strong>Total Invocations</strong>: Measures the total number of times the AI services are used, providing insights into user engagement and dependency on these tools.</li>
<li><strong>CPU and Memory Utilization</strong>: By observing CPU and memory usage, users can optimize resource allocation, ensuring that the AI tools are running efficiently without overloading the system.</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/vertex-overview.png" alt="Vertex Overview" /></p>
<p>To learn more about how Elastic's Google Vertex AI integration can augment your LLM observability, have a quick read of this <a href="https://www.elastic.co/observability-labs/blog/elevate-llm-observability-with-gcp-vertex-ai-integration">blog</a>.</p>
<h4>2. Transforming Healthcare with Generative AI</h4>
<p>The healthcare industry is embracing generative AI to enhance patient interactions and streamline operational workflows. By leveraging platforms like Amazon Bedrock, healthcare organizations deploy advanced large language models (LLMs) to power tools that convert doctor-patient conversations into structured medical notes, reducing administrative overhead and allowing clinicians to prioritize diagnosis and treatment. These AI-driven solutions provide real-time insights, enabling informed decision-making and improving patient outcomes. Additionally, patient-facing applications powered by LLMs offer secure access to health records, empowering individuals to manage their care proactively.</p>
<p>Robust observability is essential to maintain the reliability and performance of these generative AI applications in healthcare. Elastic’s <a href="https://www.elastic.co/guide/en/integrations/current/aws_bedrock.html">Amazon Bedrock integration</a> equips providers with tools to monitor LLM behavior, capturing critical metrics like invocation latency, error rates, token usage and guardrail invocation. Pre-configured dashboards provide visibility into prompt and completion text, enabling teams to verify the accuracy of AI-generated outputs, such as medical notes, and detect issues like hallucinations.</p>
<p>Additionally, customers who configure Guardrails for Amazon Bedrock to filter harmful content like hate speech, personal insults, and other inappropriate topics, can use the Bedrock Integration to observe the prompts and responses that caused the guardrail to filter them out. This helps application developers take proactive actions to maintain a safe and positive user experience.</p>
<p>Some of the logs and metrics that can be helpful for customers using LLMs hosted on Amazon Bedrock include the following:</p>
<ol>
<li><strong>Invocation Details</strong>: The integration records invocation latency, count, and throttles. These metrics are critical for ensuring that generative AI models respond quickly and accurately to patient queries or appointment scheduling tasks, maintaining a seamless user experience.</li>
<li><strong>Error Rates</strong>:  Tracking error rates ensures that AI tools, such as patient query assistants or appointment systems, consistently deliver accurate and reliable results. By identifying and addressing issues early, healthcare providers can maintain trust in AI systems and prevent disruptions in critical patient interactions.</li>
<li><strong>Token Usage</strong>: In healthcare, tracking token usage helps identify resource-intensive queries, such as detailed patient record summaries or complex symptom analyses, ensuring efficient model operation. By monitoring token usage, healthcare providers can optimize costs for AI-powered tools while maintaining scalability to handle growing patient interactions.</li>
<li><strong>Prompt and Completion Text</strong>: Capturing prompt and completion text allows healthcare providers to analyze how AI models respond to specific patient queries or administrative tasks, ensuring meaningful and contextually accurate interactions. This insight helps refine prompts to improve the AI's understanding and ensures that generated responses, such as appointment details or treatment explanations, meet the quality standards expected in healthcare.</li>
<li><strong>Prompt and response where guardrails intervened</strong>: Being able to track requests and responses that were deemed inappropriate by guardrails helps healthcare providers monitor what information patients are asking for. With this information users can make continuous adjustments to the LLMs to ensure appropriate responses, balancing flexibility and rich communication on the one hand, and on the other, privacy protection, hallucination prevention, and harmful content filtering.</li>
</ol>
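<p>To make the token-usage and guardrail items concrete, here is a minimal Python sketch that separates invocation records where a guardrail intervened and sums token consumption. The record shape (<code>guardrail_intervened</code>, <code>input_tokens</code>, <code>output_tokens</code>, <code>prompt</code>) is an assumption for illustration, not the Bedrock integration's actual log schema.</p>

```python
# Hypothetical Bedrock invocation records; the integration's real
# schema differs, but the analysis pattern carries over.
invocations = [
    {"prompt": "Summarize my visit notes", "input_tokens": 42,
     "output_tokens": 180, "guardrail_intervened": False},
    {"prompt": "Share another patient's records", "input_tokens": 12,
     "output_tokens": 0, "guardrail_intervened": True},
    {"prompt": "Explain my prescription", "input_tokens": 18,
     "output_tokens": 95, "guardrail_intervened": False},
]

# Prompts the guardrail blocked, for review by the application team.
blocked = [r["prompt"] for r in invocations if r["guardrail_intervened"]]

# Total tokens consumed across all invocations, for cost tracking.
total_tokens = sum(r["input_tokens"] + r["output_tokens"] for r in invocations)

print(f"guardrail interventions: {len(blocked)}")
print(f"total tokens consumed: {total_tokens}")
```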
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/aws-bedrock-overview.png" alt="Bedrock Overview" /></p>
<p>The Amazon Bedrock Guardrails OOTB dashboard:
<img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/amazon-bedrock-gaurdrails.png" alt="Bedrock Guardrails Overview" /></p>
<p>To learn about the Amazon Bedrock Integration, read this <a href="https://www.elastic.co/observability-labs/blog/llm-observability-aws-bedrock">blog</a>. To dive deeper into how the integration can help with observability of Guardrails for Amazon Bedrock, take a look at this <a href="https://www.elastic.co/observability-labs/blog/llm-observability-amazon-bedrock-guardrails">blog</a>.</p>
<h4>3.  Enhancing Telco Efficiency with GenAI</h4>
<p>The telecommunication industry can leverage services like Azure OpenAI to transform customer interactions, optimize operations, and enhance service delivery. By integrating advanced generative AI models, telcos can offer highly personalized and responsive customer experiences across multiple channels. AI-powered virtual assistants streamline customer support by automating routine queries and providing accurate, context-aware responses, reducing the workload on human agents and enabling them to focus on complex issues while improving efficiency and satisfaction. Additionally, AI-driven insights help telcos understand customer preferences, anticipate needs, and deliver tailored offerings that boost customer loyalty. Operationally, LLMs such as Azure OpenAI enhance internal processes by enabling smarter knowledge management and faster access to critical information.</p>
<p>Elastic's LLM observability integrations like the <a href="https://www.elastic.co/guide/en/integrations/current/azure_openai.html">Azure OpenAI integration</a> can provide visibility into AI performance and costs, empowering telecom providers to make data-driven decisions and enhance customer engagement. It can help optimize resource allocation by analyzing call patterns, predicting service demands, and identifying trends, enabling telcos to scale their AI operations efficiently while maintaining high service quality.</p>
<p>Some of the key metrics and logs from Azure OpenAI that can provide insights are:</p>
<ol>
<li><strong>Error Counts</strong>: It provides critical insights into failed requests and incomplete transactions, enabling telecom providers to proactively identify and resolve issues in AI-powered applications.</li>
<li><strong>Prompt Input and Completion Text</strong>: This captures the input queries provided to AI systems and the corresponding AI-generated outputs. These fields allow telecom providers to analyze customer queries, monitor response quality, and refine AI training datasets to improve relevance and accuracy.</li>
<li><strong>Response Latency</strong>: It measures the time taken by AI models to generate responses, ensuring that virtual assistants and automated systems deliver quick and efficient replies to customer queries.</li>
<li><strong>Token Usage</strong>: It tracks the number of input and output tokens processed by the AI model, offering insights into resource consumption and cost efficiency. This data helps telecom providers monitor AI usage patterns, optimize configurations, and scale resources effectively.</li>
<li><strong>Content Filter Results</strong>: In Azure OpenAI, this plays a crucial role in handling sensitive inputs provided by customers, ensuring compliance, safety, and responsible AI usage. This feature identifies and flags potentially inappropriate or harmful queries and responses in real time, enabling telecom providers to address sensitive topics with care and accuracy.</li>
</ol>
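<p>A simple way to act on content filter results is to tally which categories are tripping the filter. The sketch below does this over hypothetical per-request annotations; Azure OpenAI's actual <code>content_filter_results</code> structure is richer than this, so treat the shape here as an illustrative assumption.</p>

```python
from collections import Counter

# Hypothetical content-filter annotations per request.
requests = [
    {"id": "r1", "filter_results": {"hate": "safe", "violence": "safe"}},
    {"id": "r2", "filter_results": {"hate": "medium", "violence": "safe"}},
    {"id": "r3", "filter_results": {"hate": "safe", "violence": "low"}},
]

# Count requests flagged at any non-"safe" severity, per category.
flagged = Counter()
for req in requests:
    for category, severity in req["filter_results"].items():
        if severity != "safe":
            flagged[category] += 1

print(dict(flagged))  # which categories trip the filter most often
```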
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/azure-openai-overview.png" alt="Azureopenai Overview" /></p>
<p>The Azure OpenAI content filtering OOTB dashboard:
<img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/azure-openai-contentfiltering.png" alt="Azureopenai Overview1" /></p>
<p>You can learn more about Elastic's Azure OpenAI integration from these two blogs - <a href="https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai">Part 1</a> and <a href="https://www.elastic.co/observability-labs/blog/llm-observability-azure-openai-v2">Part 2</a>.</p>
<h4>4. OpenAI Integration for Generative AI Applications</h4>
<p>As AI-powered solutions become integral to modern workflows, OpenAI's sophisticated models, including language models like GPT-4o and GPT-3.5 Turbo, image generation models like DALL·E, and audio processing models like Whisper, drive innovation across applications such as virtual assistants, content creation, and speech-to-text systems. With growing complexity and scale, ensuring these models perform reliably, remain cost-efficient, and adhere to ethical guidelines is paramount. Elastic's <a href="https://www.elastic.co/docs/reference/integrations/openai">OpenAI integration</a> provides a robust solution, offering deep visibility into model behaviour to support seamless and responsible AI deployments.</p>
<p>By tapping into the OpenAI Usage API, Elastic's integration delivers actionable insights through intuitive, pre-configured dashboards, enabling Site Reliability Engineers (SREs) and DevOps teams to monitor performance and optimize resource usage across OpenAI's diverse model portfolio. This unified observability approach empowers organizations to track critical metrics, identify inefficiencies, and maintain high-quality AI-driven experiences. The following key metrics from Elastic's OpenAI integration help organizations achieve effective oversight:</p>
<ol>
<li><strong>Request Latency</strong>: Measures the time taken for OpenAI models to process requests, ensuring responsive performance for real-time applications like chatbots or transcription services.</li>
<li><strong>Invocation Rates</strong>: Tracks the frequency of API calls across models, providing insights into usage patterns and helping identify high-demand workloads.</li>
<li><strong>Token Usage</strong>: Monitors input and output tokens (e.g., prompt, completion, cached tokens) to optimize costs and fine-tune prompts for efficient resource consumption.</li>
<li><strong>Error Counts</strong>: Captures failed requests or incomplete transactions, enabling proactive issue resolution to maintain application reliability.</li>
<li><strong>Image Generation Metrics</strong>: Tracks invocation rates and output dimensions for models like DALL·E, helping assess costs and usage trends in image-based applications.</li>
<li><strong>Audio Transcription Metrics</strong>: Monitors invocation rates and transcribed seconds for audio models like Whisper, supporting cost optimization in speech-to-text workflows.</li>
</ol>
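<p>One common use of the token-usage metrics is estimating spend per model. The Python sketch below does this with illustrative placeholder rates; the per-1K-token prices and the record shape are assumptions for the example, not real OpenAI pricing, so substitute current pricing for your models.</p>

```python
# Illustrative per-1K-token rates -- NOT real OpenAI prices;
# substitute the current pricing for the models you use.
RATES = {"gpt-4o": {"input": 0.005, "output": 0.015}}

# Hypothetical usage records aggregated from the Usage API.
usage = [
    {"model": "gpt-4o", "input_tokens": 1200, "output_tokens": 800},
    {"model": "gpt-4o", "input_tokens": 600, "output_tokens": 300},
]

# Cost = tokens / 1000 * rate, summed over input and output sides.
cost = sum(
    u["input_tokens"] / 1000 * RATES[u["model"]]["input"]
    + u["output_tokens"] / 1000 * RATES[u["model"]]["output"]
    for u in usage
)
print(f"estimated spend: ${cost:.4f}")
```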
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/openai-overview.png" alt="Openai Overview" /></p>
<p>To learn more about Elastic's OpenAI integration, read this <a href="https://www.elastic.co/observability-labs/blog/llm-observability-openai">blog</a>.</p>
<h4>Actionable LLM Observability</h4>
<p>Elastic's LLM observability integrations empower users to take proactive control of their AI operations through actionable insights and real-time alerts. For instance, by setting a predefined threshold for token count, Elastic can trigger automated alerts when usage exceeds this limit, notifying Site Reliability Engineers (SREs) or DevOps teams via email, Slack, or other preferred channels. This ensures prompt awareness of potential cost overruns or resource-intensive queries, enabling teams to adjust model configurations or scale resources swiftly to maintain operational efficiency.</p>
<p>In the example below, the rule is set to alert the user if token_count crosses a threshold of 500.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/slo-1.png" alt="SLO Overview" /></p>
<p>The alert is triggered when the token count exceeds the threshold, as seen below:
<img src="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/slo-2.png" alt="SLO Overview1" /></p>
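<p>The evaluation behind such a rule amounts to a simple threshold check over the documents in the rule's time window. In the Python sketch below the per-request token counts are made up, and a <code>print</code> stands in for the Kibana connector action (email, Slack, and so on).</p>

```python
TOKEN_THRESHOLD = 500  # same threshold as the rule described above

# Hypothetical per-request token counts observed in the rule's window.
window = [120, 480, 512, 530]

# Collect every request that crossed the threshold.
breaches = [count for count in window if count > TOKEN_THRESHOLD]

if breaches:
    # In Kibana this would fire a connector action; here we just print.
    print(f"ALERT: {len(breaches)} request(s) exceeded "
          f"{TOKEN_THRESHOLD} tokens: {breaches}")
```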
<p>Another example is tracking invocation spikes, such as when the number of predictions or API calls surpasses a defined Service Level Objective (SLO). For example, if an Amazon Bedrock-hosted model experiences a sudden surge in invocations due to increased customer interactions, Elastic can alert teams to investigate potential anomalies or scale infrastructure accordingly. These proactive measures help maintain the reliability and cost-effectiveness of LLM-powered applications.</p>
<p>By providing pre-configured dashboards and customizable alerts, Elastic ensures that organizations can respond to critical events in real time, keeping their AI systems aligned with cost and performance goals as well as standards for content safety and reliability.</p>
<h4>Conclusion</h4>
<p>LLMs are transforming industries, but their complexity requires effective observability to ensure their reliability and safe use. Elastic's LLM observability integrations provide a comprehensive solution, empowering businesses to monitor performance, manage resources, and address challenges like hallucinations and content safety. As LLMs become increasingly integral to various sectors, robust observability tools like those offered by Elastic ensure that these AI-driven innovations remain dependable, cost-effective, and aligned with ethical and safety standards.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/transforming-industries-and-the-critical-role-of-llm-observability/llmobs2.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>