<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Observability</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Tue, 28 Apr 2026 16:44:07 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Observability</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Elastic MongoDB Atlas Integration: Complete Database Monitoring and Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-mongodb-atlas-integration</link>
            <guid isPermaLink="false">elastic-mongodb-atlas-integration</guid>
            <pubDate>Thu, 24 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Comprehensive MongoDB Atlas monitoring with Elastic's integration - track performance, security, and operations through real-time alerts, audit logs, and actionable insights.]]></description>
            <content:encoded><![CDATA[<p>In today's data-driven landscape, <a href="https://www.mongodb.com/products/platform/atlas-database">MongoDB Atlas</a> has emerged as the leading multi-cloud developer data platform, enabling organizations to work seamlessly with document-based data models while ensuring flexible schema design and easy scalability. However, as your Atlas deployments grow in complexity and criticality, comprehensive observability becomes essential for maintaining optimal performance, security, and reliability.</p>
<p>The Elastic <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> transforms how you monitor and troubleshoot your Atlas infrastructure by providing deep insights into every aspect of your deployment—from real-time alerts and audit trails to detailed performance metrics and organizational activities. This integration empowers teams to minimize Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) while gaining actionable insights for capacity planning and performance optimization.</p>
<h2>Why MongoDB Atlas Observability Matters</h2>
<p>MongoDB Atlas abstracts much of the operational complexity of running MongoDB, but this doesn't eliminate the need for monitoring. Modern applications demand:</p>
<ul>
<li><strong>Proactive Issue Detection</strong>: Identify performance bottlenecks, resource constraints, and security threats before they impact users</li>
<li><strong>Comprehensive Audit Trails</strong>: Track database operations, user activities, and configuration changes for compliance and security</li>
<li><strong>Performance Optimization</strong>: Monitor query performance, resource utilization, and capacity trends to optimize costs and user experience</li>
<li><strong>Operational Insights</strong>: Understand organizational activities, project changes, and infrastructure events across your multi-cloud deployments</li>
</ul>
<p>The Elastic <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> addresses these needs by collecting comprehensive telemetry data and presenting it through powerful visualizations and alerting capabilities.</p>
<h2>Integration Architecture and Data Streams</h2>
<p>The <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> leverages the <a href="https://www.mongodb.com/docs/atlas/reference/api-resources-spec/v2/">Atlas Administration API</a> to collect eight distinct data streams, each providing specific insights into different aspects of your Atlas deployment:</p>
<h3>Log Data Streams</h3>
<p><strong>Alert Logs</strong>: Capture real-time alerts generated by your Atlas instances, covering resource utilization thresholds (CPU, memory, disk space), database operations, security issues, and configuration changes. These alerts provide immediate visibility into critical events that require attention.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/alert_logs.png" alt="Alert Datastream" /></p>
<p><strong>Database Logs</strong>: Collect comprehensive operational logs from MongoDB instances, including incoming connections, executed commands, performance diagnostics, and issues encountered. These logs are invaluable for troubleshooting performance problems and understanding database behavior.</p>
<p><strong>MongoDB Audit Logs</strong>: Enable administrators to track system activity across deployments with multiple users and applications. These logs capture detailed events related to database operations including insertions, updates, deletions, user authentication, and access patterns—essential for security compliance and forensic analysis.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/audit_logs.png" alt="Audit Datastream" /></p>
<p><strong>Organization Logs</strong>: Provide enterprise-level visibility into organizational activities, enabling tracking of significant actions involving database operations, billing changes, security modifications, host management, encryption settings, and user access management across teams.</p>
<p><strong>Project Logs</strong>: Offer project-specific event tracking, capturing detailed records of configuration modifications, user access changes, and general project activities. These logs are crucial for project-level auditing and change management.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/project_logs.png" alt="Project Datastream" /></p>
<h3>Metrics Data Streams</h3>
<p><strong>Hardware Metrics</strong>: Collect comprehensive hardware performance data including CPU usage, memory consumption, JVM memory utilization, and overall system resource metrics for each process in your Atlas groups.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/hardware_metrics.png" alt="Hardware Datastream" /></p>
<p><strong>Disk Metrics</strong>: Monitor storage performance with detailed insights into I/O operations, read/write latency, and space utilization across all disk partitions used by MongoDB Atlas. These metrics help identify storage bottlenecks and plan capacity expansion.</p>
<p><strong>Process Metrics</strong>: Gather host-level metrics per MongoDB process, including detailed CPU usage patterns, I/O operation counts, memory utilization, and database-specific performance indicators like connection counts, operation rates, and cache utilization.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/process_metrics.png" alt="Process Datastream" /></p>
<h2>Implementation Guide</h2>
<h3>Setting Up the Integration</h3>
<p>Getting started with MongoDB Atlas observability requires establishing API access and configuring the integration in Kibana:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/setup.png" alt="Setup" /></p>
<ol>
<li>
<p><strong>Generate Atlas API Keys</strong>: Create <a href="https://www.mongodb.com/docs/atlas/configure-api-access/#grant-programmatic-access-to-an-organization">programmatic API keys</a> with Organization Owner permissions in the Atlas console, then invite these keys to your target projects with appropriate roles (Project Read Only for alerts/metrics, Project Data Access Read Only for audit logs).</p>
</li>
<li>
<p><strong>Enable Prerequisites</strong>: Enable database auditing in Atlas for projects where you want to collect audit and database logs. Gather your <a href="https://www.mongodb.com/docs/atlas/app-services/apps/metadata/#find-a-project-id">Project ID</a> and Organization ID from the Atlas UI.</p>
</li>
<li>
<p><strong>Configure in Kibana</strong>: Navigate to Management &gt; Integrations, search for &quot;MongoDB Atlas,&quot; and add the integration using your API credentials.</p>
</li>
</ol>
<p>The integration supports different permission levels for each data stream, ensuring you can collect operational metrics with minimal privileges while protecting sensitive audit data with elevated permissions.</p>
<h3>Considerations and Limitations</h3>
<ul>
<li><strong>Cluster Support</strong>: Log collection doesn't support M0 free clusters, M2/M5 shared clusters, or serverless instances</li>
<li><strong>Historical Data</strong>: Most log streams collect the previous 30 minutes of historical data</li>
<li><strong>Performance Impact</strong>: Large time spans may cause request timeouts; adjust HTTP Client Timeout accordingly</li>
</ul>
<h2>Real-World Use Cases and Benefits</h2>
<h3>Security and Compliance Monitoring</h3>
<p><strong>Audit Trail Management</strong>: Organizations in regulated industries leverage the audit logs to maintain comprehensive records of database access and modifications. The integration automatically parses and indexes audit events, making it easy to search for specific user activities, failed authentication attempts, or unauthorized access patterns.</p>
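<p>As a rough illustration of that kind of search, here is an ES|QL sketch for surfacing failed authentication attempts from the audit data stream. The data stream and field names below are assumptions for illustration only; check the integration&apos;s exported fields in your deployment before relying on them.</p>
<pre><code class="language-esql">FROM logs-mongodb_atlas.mongodb_audit-*
// Hypothetical field names -- adjust to the fields the integration actually maps
| WHERE mongodb_atlas.mongodb_audit.atype == &quot;authenticate&quot; AND mongodb_atlas.mongodb_audit.result != 0
| STATS failed_attempts = COUNT(*) BY mongodb_atlas.mongodb_audit.users.user
| SORT failed_attempts DESC
</code></pre>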
<p><strong>Security Incident Response</strong>: When security events occur, teams can quickly correlate alert logs with audit trails to understand the scope and timeline of incidents.</p>
<h3>Performance Optimization and Capacity Planning</h3>
<p><strong>Proactive Resource Management</strong>: By monitoring disk, hardware, and process metrics, teams can identify resource constraints before they impact application performance. For example, tracking disk I/O latency trends helps predict when storage upgrades are needed.</p>
<p><strong>Query Performance Analysis</strong>: Database logs combined with process metrics provide insights into slow queries, connection patterns, and resource utilization that enable database performance tuning.</p>
<h3>Operational Excellence</h3>
<p><strong>Multi-Environment Monitoring</strong>: Organizations running Atlas across development, staging, and production environments can standardize monitoring across all environments while maintaining environment-specific alerting thresholds.</p>
<p><strong>Change Management</strong>: Project and organization logs provide complete audit trails for infrastructure changes, enabling teams to correlate application issues with recent configuration modifications.</p>
<h2>Let's Try It!</h2>
<p>The MongoDB Atlas integration delivers comprehensive database observability that enables proactive management and optimization of your Atlas deployments. With pre-built dashboards and alerting capabilities, teams can gain immediate value while leveraging rich data streams for advanced analytics and custom monitoring solutions.</p>
<p>Deploy a cluster on <a href="https://www.elastic.co/cloud/">Elastic Cloud</a> or <a href="https://www.elastic.co/cloud/serverless">Elastic Serverless</a>, or download the Elastic Stack, then spin up the MongoDB Atlas integration, open the curated dashboards in Kibana, and start monitoring your service!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/title.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Elastic Ramen: A CLI harness for SRE investigation and remediation]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-ramen-agent-builder-cli</link>
            <guid isPermaLink="false">elastic-ramen-agent-builder-cli</guid>
            <pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Exploring Elastic Ramen, a CLI harness that brings Agent Builder conversations, skills, and tools into the terminal so engineers can move from investigation to remediation in a single thread.]]></description>
            <content:encoded><![CDATA[<p>Observability tools tell you what went wrong.
They rarely help you fix it.
When responding to an incident, engineers split their time across Kibana, Slack, and the terminal.
At each step, the AI assistant stays behind in the previous surface, and the investigation starts over from scratch.</p>
<p><strong>Elastic Ramen</strong> (<strong>R</strong>oot-cause <strong>A</strong>nalysis &amp; <strong>M</strong>onitoring <strong>En</strong>gine) bridges that gap.
It is a local CLI agent that connects directly to <a href="https://www.elastic.co/search-labs/blog/elastic-ai-agent-builder-context-engineering-introduction">Elastic Agent Builder</a>, carrying the same conversation, skills, and Elastic context into the terminal.
Ramen operates directly in the environment where fixes actually happen. No handoff. No re-auth. No translation layer.
Ramen is open source and available at <a href="https://github.com/elastic/elastic-ramen">elastic/elastic-ramen</a>.</p>
<p><em>Video: Starting an investigation in Kibana, resuming in the terminal with Ramen, and using local tools to mitigate the issue.</em></p>
<h2>Why the terminal matters</h2>
<p>Agent Builder gives engineers a strong environment for querying observability data.
Ramen takes that same capability to the two workflows that need it most.</p>
<p><strong>Onboarding.</strong>
Configuring collectors, managing credentials, and validating data flow all happen in the shell.
A local agent can guide that work right where the credentials and tools already live.</p>
<p><strong>Mitigation.</strong>
The actual fix, whether restarting pods, scaling deployments, or rolling back releases, requires <code>kubectl</code>, <code>gcloud</code>, <code>git</code>, or internal scripts.
A CLI agent runs on hardware the team already trusts, using the credentials already present on the engineer's machine.</p>
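<p>For a concrete sense of what that looks like, here is the kind of mitigation an engineer might run locally during an incident. The namespace and deployment names are hypothetical; the point is that these commands rely on credentials and tooling that already live on the engineer&apos;s machine.</p>
<pre><code class="language-bash"># Hypothetical resource names, shown only to illustrate typical mitigation steps
kubectl -n checkout rollout restart deployment/checkout-api     # restart misbehaving pods
kubectl -n checkout scale deployment/checkout-api --replicas=6  # scale out under load
kubectl -n checkout rollout undo deployment/checkout-api        # roll back a bad release
</code></pre>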
<h2>How Ramen works</h2>
<p>Ramen is a CLI client for Agent Builder.
It is not a separate assistant with its own memory.
It connects your local environment to the same conversations, skills, and tools you already use in Kibana through a simple authentication flow.</p>
<p>On first launch, Ramen connects to your Elastic deployment and gives you everything out of the box:</p>
<ul>
<li>LLM inference through the Kibana gateway, using your existing AI connector</li>
<li>Native Kibana tools for managing workflows and agents</li>
<li>The Agent Builder MCP server for ES|QL queries and documentation search</li>
<li>An embedded <code>elastic</code> CLI for cluster health, data streams, and SLOs</li>
<li>Built-in skills for root cause analysis and SLO management</li>
</ul>
<p>The agent carries your investigation history across surfaces, so you never re-explain the incident when moving from the UI to the CLI.
Terminal interactions sync back to Elastic automatically, building a searchable record of operational knowledge for the team.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ramen-agent-builder-cli/architecture-flow.jpg" alt="Diagram showing the Ramen CLI connecting to Agent Builder, which accesses Elastic Stack data, with conversations syncing back." /></p>
<h2>Get started</h2>
<p>You need an Elastic Observability Serverless project.
In Kibana, open <strong>Stack Management</strong>, then <strong>Advanced Settings</strong>, or go directly to <code>https://&lt;your-kibana-url&gt;/app/management/kibana/settings?query=ramen</code>.
Enable <strong><code>elasticRamen:enabled</code></strong>, then install the CLI:</p>
<pre><code class="language-bash"># install with npm
npm i -g @elastic/ramen
# or install with Bun
bun add -g @elastic/ramen
</code></pre>
<p>You can also use the install script or download a pre-built binary from <a href="https://github.com/elastic/elastic-ramen/releases">GitHub Releases</a>:</p>
<pre><code class="language-bash">curl -fsSL https://raw.githubusercontent.com/elastic/elastic-ramen/dev/install | bash
</code></pre>
<p>Once installed, connect to your deployment:</p>
<pre><code class="language-bash">elastic-ramen --kibana-base=https://&lt;your-kibana-url&gt;
</code></pre>
<p>Ramen opens a browser auth flow, generates credentials, and stores them locally.
After that, it reconnects automatically.
Start a conversation in Agent Builder and resume it in the terminal with <code>/kibana-conversations</code>.</p>
<h2>What is next</h2>
<p>Ramen is the first surface of a multi-surface agent system.
The same architecture extends to every surface engineers already use:</p>
<ul>
<li><strong>Space-scoped collaboration</strong> for shared agent context during outages</li>
<li><strong>Slack, Teams, Jira, PagerDuty</strong> integration: start from an alert, collaborate in chat, mitigate in the terminal, one thread</li>
<li><strong>Shared memory</strong>: progressively distill conversations into durable operational context that improves future investigations</li>
</ul>
<p>Beyond incident response, the same model applies to deployment risk analysis, production debugging, CI/CD policy checks, and cost anomaly investigation.</p>
<h2>Summary</h2>
<p>Ramen connects signal to action: Elastic data and Agent Builder context, plus the ability to act with local tools, in one continuous thread.
Elastic as the persistent context layer, every surface you use as the interface.</p>
<p>Try it out on <a href="https://github.com/elastic/elastic-ramen">GitHub</a> and let us know what you think.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-ramen-agent-builder-cli/cover.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Getting more from your logs with OpenTelemetry]]></title>
            <link>https://www.elastic.co/observability-labs/blog/getting-more-from-your-logs-with-opentelemetry</link>
            <guid isPermaLink="false">getting-more-from-your-logs-with-opentelemetry</guid>
            <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to evolve beyond basic log ingest by leveraging OpenTelemetry for ingestion, structured logging, geographic enrichment, and ES|QL analytics. Transform raw log data into actionable intelligence with practical examples and proactive observability strategies.]]></description>
            <content:encoded><![CDATA[<h1>Getting more from your logs with OpenTelemetry</h1>
<p>Most teams still use their logging tools the way we have for decades: as a simple search lake, essentially grepping for logs from a centralized platform. There’s nothing wrong with this; a centralized logging platform delivers plenty of value. But how do you evolve beyond this basic log-and-search use case? Where can you be more effective with your incident investigations? In this blog, we start from where most of our customers are today and offer some practical tips for moving a little beyond the simple logging use case.</p>
<h2>Ingestion</h2>
<p>Let's start at the beginning: ingest. Many teams still rely on older tools for ingestion today. If you want to be more forward-looking here, it’s time to introduce you to OpenTelemetry. OpenTelemetry’s logging support was once immature, but things have changed significantly, and Elastic has been working particularly hard to improve the log capabilities in OpenTelemetry. So let's start by exploring how to bring logs into Elastic via the OpenTelemetry collector.</p>
<p>First, if you want to follow along, simply create a host to run the log generator and the OpenTelemetry collector.</p>
<p>Follow the instructions here to get the log generator running:</p>
<p><a href="https://github.com/davidgeorgehope/log-generator-bin/">https://github.com/davidgeorgehope/log-generator-bin/</a></p>
<p>To get the OpenTelemetry collector up and running in <a href="https://cloud.elastic.co/serverless-registration?onboarding_token=observability">Elastic Serverless</a>, you can click Add Data in the bottom left, then 'host', and finally 'opentelemetry'.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image14.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image7.png" alt="" /></p>
<p>Follow the instructions but don’t start the collector just yet.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image16.png" alt="" /></p>
<p>Our host here is running a three-tier application with an NGINX frontend, a backend service, and a MySQL database. So let's start by bringing the logs into Elastic.</p>
<p>First we’ll install the Elastic Distribution of OpenTelemetry, but before starting it, we will make a small change to the OpenTelemetry configuration file to expand the directories it searches for logs. Edit otel.yml with vi or your favorite editor:</p>
<pre><code class="language-bash">vi otel.yml
</code></pre>
<p>Instead of simply <code>/var/log/*.log</code>, we will use <code>/var/log/**/*.log</code> to bring in all our log files.</p>
<pre><code class="language-yaml">receivers:
  # Receiver for platform specific log files
  filelog/platformlogs:
    include: [ /var/log/**/*.log ]
    retry_on_failure:
      enabled: true
    start_at: end
    storage: file_storage
</code></pre>
<p>Start the Otel collector:</p>
<pre><code class="language-bash">sudo ./otelcol --config otel.yml
</code></pre>
<p>And we can see these logs being brought in, in Discover:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image8.png" alt="" /></p>
<p>One thing that is immediately noticeable is that, without changing anything, we automatically get a bunch of useful additional information, such as the OS name and CPU details.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image12.png" alt="" /></p>
<p>The OpenTelemetry collector has automatically started to enrich our logs, making them more useful for additional processing, though we could do significantly better!</p>
<p>To start with, we want to give our logs some structure. Let's edit that otel.yml file and add some OTTL to extract some key data from our NGINX logs.</p>
<pre><code class="language-yaml">  transform/parse_nginx:
    trace_statements: []
    metric_statements: []
    log_statements:
      - context: log
        conditions:
          - 'attributes[&quot;log.file.name&quot;] != nil and IsMatch(attributes[&quot;log.file.name&quot;], &quot;access.log&quot;)'
        statements:
          - merge_maps(attributes, ExtractPatterns(body, &quot;^(?P&lt;client_ip&gt;\\S+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;^\\S+ - (?P&lt;user&gt;\\S+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\\[(?P&lt;timestamp_raw&gt;[^\\]]+)\\]&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot;(?P&lt;method&gt;\\S+) &quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot;\\S+ (?P&lt;path&gt;\\S+)\\?&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;req_id=(?P&lt;req_id&gt;[^ ]+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot; (?P&lt;status&gt;\\d+) &quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot; \\d+ (?P&lt;size&gt;\\d+)&quot;), &quot;upsert&quot;)
# ...

service:
  pipelines:
    logs/platformlogs:
      receivers: [filelog/platformlogs]
      processors: [transform/parse_nginx, resourcedetection]
      exporters: [elasticsearch/otel]
</code></pre>
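<p>To make the extraction concrete, here is an illustrative access log line and the attributes those ExtractPatterns statements would pull out of it. The exact format produced by the log generator may differ slightly, so treat this as a sketch:</p>
<pre><code class="language-text">203.0.113.7 - alice [11/Sep/2025:09:17:32 +0000] &quot;GET /api/cart?item=42 HTTP/1.1&quot; 500 1423 req_id=abc123

# Extracted attributes (illustrative):
#   client_ip=203.0.113.7   user=alice   timestamp_raw=11/Sep/2025:09:17:32 +0000
#   method=GET   path=/api/cart   status=500   size=1423   req_id=abc123
</code></pre>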
<p>Now when we start the Otel collector with this new configuration</p>
<pre><code class="language-bash">sudo ./otelcol --config otel.yml
</code></pre>
<p>We will see that we now have structured logs!</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image17.png" alt="" /></p>
<h2>Store and Optimize</h2>
<p>To ensure you aren’t blowing out your budget with all this additional structured data, there are a few things you can do to maximize storage efficiency.</p>
<p>For example, you can use the filter processor in the Otel collector to granularly filter and drop irrelevant records, controlling the volume of data leaving the collector.</p>
<pre><code class="language-yaml">processors:
  filter/drop_logs_without_user_attributes:
    logs:
      log_record:
        - 'attributes[&quot;user&quot;] == nil'
  filter/drop_200_logs:
    logs:
      log_record:
        - 'attributes[&quot;status&quot;] == &quot;200&quot;'

service:
  pipelines:
    logs/platformlogs:
      receivers: [filelog/platformlogs]
      processors: [transform/parse_nginx, filter/drop_logs_without_user_attributes, filter/drop_200_logs, resourcedetection]
      exporters: [elasticsearch/otel]
</code></pre>
<p>The filter processor also helps reduce noise, for example if you want to drop debug logs or logs from a noisy service. These are great ways to keep a lid on your observability spend.</p>
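<p>As a sketch of what that might look like, the following filter definitions follow the same pattern as above to drop debug-severity records and health-check noise. The severity and path values are illustrative; adjust them to match your own logs and add the processors to the pipeline just like the earlier examples:</p>
<pre><code class="language-yaml">processors:
  filter/drop_debug_logs:
    logs:
      log_record:
        - 'severity_text == &quot;DEBUG&quot;'
  filter/drop_health_checks:
    logs:
      log_record:
        - 'attributes[&quot;path&quot;] == &quot;/health&quot;'
</code></pre>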
<p>Additionally, for your most critical flows and logs where you don’t want to drop any data, Elastic has you covered. In version 9.x of Elastic, you now have LogsDB switched on by default.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image15.png" alt="" /></p>
<p>With LogsDB, Elastic has reduced the storage footprint of log data in Elasticsearch by up to 65%, allowing you to store more observability and security data without exceeding your budget, while keeping all data accessible and searchable.</p>
<p>LogsDB achieves this by leveraging advanced compression techniques like ZSTD, delta encoding, and run-length encoding, and by reconstructing the _source field on demand rather than retaining the original JSON document, which saves roughly 40% more storage. This synthetic _source approach builds on Elasticsearch’s columnar storage of field values.</p>
<h2>Analytics</h2>
<p>So we have our data in Elastic: it’s structured, and it conforms to the idea of a wide-event log, since it carries lots of good context (user IDs, request IDs) and is captured at the start of a request. Next we’re going to look at the analytics part of this. First, let's take a stab at counting the number of errors for each user in our application.</p>
<pre><code class="language-esql">FROM logs-generic.otel-default
| WHERE log.file.name == &quot;access.log&quot;
| WHERE attributes.status &gt;= &quot;400&quot;
| STATS error_count = COUNT(*) BY attributes.user
| SORT error_count DESC
</code></pre>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image9.png" alt="" /></p>
<p>It’s pretty easy now to save this and put it on a dashboard; we just click the save button:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image1.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image5.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image6.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image3.png" alt="" /></p>
<p>Next, let's put something together to show the global impact. First, we will update our collector config to enrich our log data with geo location.</p>
<p>Update the OTTL configuration with this new line:</p>
<pre><code class="language-yaml">   log_statements:
      - context: log
        conditions:
          - 'attributes[&quot;log.file.name&quot;] != nil and IsMatch(attributes[&quot;log.file.name&quot;], &quot;access.log&quot;)'
        statements:
          - merge_maps(attributes, ExtractPatterns(body, &quot;^(?P&lt;client_ip&gt;\\S+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;^\\S+ - (?P&lt;user&gt;\\S+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\\[(?P&lt;timestamp_raw&gt;[^\\]]+)\\]&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot;(?P&lt;method&gt;\\S+) &quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot;\\S+ (?P&lt;path&gt;\\S+)\\?&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;req_id=(?P&lt;req_id&gt;[^ ]+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot; (?P&lt;status&gt;\\d+) &quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot; \\d+ (?P&lt;size&gt;\\d+)&quot;), &quot;upsert&quot;)
          - set(attributes[&quot;source.address&quot;], attributes[&quot;client_ip&quot;]) where attributes[&quot;client_ip&quot;] != nil
</code></pre>
<p>Next add a new processor (you will need to download the GeoIP database from MaxMind)</p>
<pre><code class="language-yaml">geoip:
  context: record
  source:
    from: attributes
  providers:
    maxmind:
      database_path: /opt/geoip/GeoLite2-City.mmdb
</code></pre>
<p>And add this to the log pipeline after the parse_nginx</p>
<pre><code class="language-yaml">service:
  pipelines:
    logs/platformlogs:
      receivers: [filelog/platformlogs]
      processors: [transform/parse_nginx, geoip, resourcedetection]
      exporters: [elasticsearch/otel]
</code></pre>
<p>Restart the Otel collector:</p>
<pre><code class="language-bash">sudo ./otelcol --config otel.yml
</code></pre>
<p>Once the data starts flowing we can add a map visualization:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image2.png" alt="" /></p>
<p>Add a layer:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image4.png" alt="" /></p>
<p>Use ES|QL</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image10.png" alt="" /></p>
<p>Use the following ES|QL</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image13.png" alt="" /></p>
<p>And this should give you a map showing the locations of all your NGINX server requests!</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image11.png" alt="" /></p>
<p>As you can see, analytics is a breeze with your new Otel data collection pipeline.</p>
<h2>Conclusion: Beyond log aggregation to operational intelligence</h2>
<p>The journey from basic log aggregation to structured, enriched observability represents more than a technical upgrade; it's a shift in how organizations approach system understanding and incident response. By adopting OpenTelemetry for ingestion, implementing intelligent filtering to manage costs, and leveraging LogsDB's storage optimizations, you're not just modernizing your ELK stack; you're building the foundation for proactive system management.</p>
<p>The structured logs, geographic enrichment, and analytical capabilities demonstrated here transform raw log data into actionable intelligence with ES|QL. Instead of reactive grepping through logs during incidents, you now have the infrastructure to identify patterns, track user journeys, and correlate issues across your entire stack before they become critical problems.</p>
<p>But here's the key question: are you prepared to act on these insights? Having rich, structured data is only valuable if your organization can shift from a reactive &quot;find and fix&quot; mentality to a proactive &quot;predict and prevent&quot; approach. The real evolution isn't in your logging stack; it's in your operational culture.</p>
<p>Get started with this today in <a href="https://cloud.elastic.co/serverless-registration?onboarding_token=observability">Elastic Serverless</a>.</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/getting-more-from-your-logs-with-opentelemetry.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Migrating Datadog and Grafana dashboards and alerts to Kibana with the Observability Migration Platform]]></title>
            <link>https://www.elastic.co/observability-labs/blog/migrate-datadog-grafana-dashboards-alerts-to-kibana</link>
            <guid isPermaLink="false">migrate-datadog-grafana-dashboards-alerts-to-kibana</guid>
            <pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to migrate supported Datadog and Grafana dashboards and alerts to Kibana with the Observability Migration Platform.]]></description>
            <content:encoded><![CDATA[<p>The Observability Migration Platform is a CLI-driven workflow that translates supported Grafana and Datadog assets into Kibana-native outputs and produces the evidence needed to review the result. It changes migration from a manual rebuild into a translation-and-verification workflow that gets teams into <a href="https://www.elastic.co/docs/solutions/observability">Elastic Observability</a> faster.</p>
<h2>Migrations covered by the Observability Migration Platform</h2>
<p>The current scope covers Datadog and Grafana. The platform can work from exported assets or live APIs, and it focuses on dashboards and alerting content on the paths it currently covers.</p>
<p>Support is not identical across the two sources. Datadog has end-to-end extraction, validation, compilation, upload, smoke-test, and verification workflows, but it currently covers a narrower slice of widgets and monitors. Grafana coverage is broader. In both cases, the platform provides a practical translation pipeline for the supported paths.</p>
<p>The screenshots below show examples of dashboards after migration.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/migrated-dashboard-1.jpg" alt="Migrated Node Exporter Full dashboard in Kibana, top of page showing CPU, memory, network, and disk panels" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/migrated-dashboard-2.jpg" alt="Migrated Node Exporter Full dashboard in Kibana, scrolled to the Memory Meminfo section showing detailed memory panels" /></p>
<h2>How the Observability Migration Platform works</h2>
<p>At a high level, the workflow has two halves: source-aware translation on the way in and target-aware validation and delivery on the way out. That split matters because Grafana and Datadog differ not only in JSON shape, but also in query languages, panel types, controls, and alerting models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/overview.png" alt="End-to-end flow of the Observability Migration Platform: extract from Grafana or Datadog, normalize and plan, translate queries, panels, and alerts, emit Kibana-native output, validate against an Elastic target, then compile and upload to Kibana while producing verification and review artifacts" /></p>
<p>A run starts with exported assets or live source APIs. From there, the workflow normalizes source-specific objects, chooses a translation path for each supported dashboard, panel, and alerting artifact, and emits Kibana-native output. This is where most of the source-specific logic lives: translating queries or Datadog formulas, mapping panel semantics, carrying forward controls and links where possible, and deciding when an exact translation is not the right answer.</p>
<p>The second half is target-aware. The emitted output can be validated against an Elastic target, compiled, and uploaded to Kibana through the shared runtime. In the happy path, that yields a working translated dashboard. In rougher cases, validation may show that a panel cannot run safely as emitted. When that happens, the workflow is designed to fail conservatively: it can mark the panel for manual review or replace it with an upload-safe placeholder instead of shipping a broken runtime panel.</p>
<p>Just as important, the outcome is not simply &quot;a dashboard showed up in Kibana.&quot; The workflow also produces reviewer-facing evidence such as a migration report, manifest, verification packets, and rollout plan, so you can see what translated cleanly, what was downgraded or routed to manual review, and what still needs human judgment. Those artifacts are what make the process operationally credible: they give teams something concrete to inspect, compare, and act on.</p>
<h2>Running the migration</h2>
<p>The platform is CLI-driven, and a good fit for migration work that needs to be repeatable, reviewable, and easy to automate. Users can start with a representative slice of dashboards and alerting content from Grafana or Datadog, point the workflow at an Elastic target, and use that first run to understand translation quality, validation results, and how much follow-up review is required.</p>
<p>To run the full path against Elastic, create an <a href="https://www.elastic.co/docs/solutions/observability/get-started">Elastic Observability Serverless</a> project, generate a <a href="https://www.elastic.co/docs/deploy-manage/api-keys/serverless-project-api-keys">Serverless project API key</a>, and point the CLI at your Elasticsearch and Kibana endpoints:</p>
<pre><code class="language-shell">obs-migrate migrate \
  --source grafana \
  --input-mode files \
  --input-dir ./grafana_exports \
  --output-dir ./migration_output \
  --assets all \
  --native-promql \
  --data-view &quot;metrics-*&quot; \
  --validate \
  --es-url &quot;$ELASTICSEARCH_ENDPOINT&quot; \
  --es-api-key &quot;$KEY&quot; \
  --kibana-url &quot;$KIBANA_ENDPOINT&quot; \
  --kibana-api-key &quot;$KEY&quot; \
  --upload
</code></pre>
<p>The run validates the emitted queries against Elastic, compiles the generated dashboards, uploads them to Kibana, and produces the standard migration artifacts for review.</p>
<p>A typical run looks like this:</p>
<ol>
<li>Start with exported assets or live source APIs from Grafana or Datadog.</li>
<li>Choose the asset scope with <code>--assets dashboards</code>, <code>--assets alerts</code>, or <code>--assets all</code>.</li>
<li>Translate the supported dashboards, queries, controls, and alerting artifacts into Kibana-native output.</li>
<li>Validate the emitted content against an Elastic target (if configured), then compile and upload the translated dashboards for dashboard-capable runs.</li>
<li>Review the migration evidence, including <code>migration_report.json</code>, <code>verification_packets.json</code>, <code>run_summary.json</code>, etc., to understand what translated cleanly, where semantic gaps remain, and which dashboards, panels, or alert rules still require human review.</li>
<li>If alert rule creation is enabled, review the migrated rules (which are disabled by default) in Kibana before deciding which ones to enable or redesign.</li>
</ol>
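<p>For example, a narrower first pass might translate and validate dashboards only, skipping the Kibana upload until the output has been reviewed. This is a sketch that simply reuses the flags shown above; exact flag combinations may vary by release:</p>
<pre><code class="language-shell">obs-migrate migrate \
  --source grafana \
  --input-mode files \
  --input-dir ./grafana_exports \
  --output-dir ./dry_run_output \
  --assets dashboards \
  --validate \
  --es-url &quot;$ELASTICSEARCH_ENDPOINT&quot; \
  --es-api-key &quot;$KEY&quot;
</code></pre>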
<h2>What's next</h2>
<p>The platform is still evolving, and will continue to gain depth and self-service capabilities. The biggest open areas are stronger measured source-to-target semantic verification, further coverage for Datadog, deeper coverage for harder query families and non-dashboard surfaces, and cleaner shared runtime contracts across the workflow.</p>
<p>It is also built to grow over time. The source and target boundaries are explicit by design, which gives the platform room to expand coverage and support additional source paths in the future.</p>
<h2>In conclusion</h2>
<p>If you are planning a move into Elastic, a good starting point is to create an <a href="https://www.elastic.co/docs/solutions/observability/get-started">Elastic Observability Serverless</a> project. That gives you the target environment where translated dashboards and alerting content can be validated and reviewed.</p>
<p>To learn more about the migration workflow, talk to your Elastic representative about current access, supported coverage, and how it can help with your migration needs.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/header.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry for PHP: EDOT PHP joins the OpenTelemetry project]]></title>
            <link>https://www.elastic.co/observability-labs/blog/opentelemetry-accepts-elastics-donation-of-edot</link>
            <guid isPermaLink="false">opentelemetry-accepts-elastics-donation-of-edot</guid>
            <pubDate>Mon, 10 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Explore Elastic’s donation of its EDOT PHP to the OpenTelemetry community and discover how it makes OpenTelemetry for PHP simpler and more accessible.]]></description>
            <content:encoded><![CDATA[<p>The OpenTelemetry community has officially accepted Elastic's proposal to contribute the <strong>Elastic Distribution of OpenTelemetry for PHP (EDOT PHP)</strong> — marking an important milestone in bringing first-class observability to one of the web's most widely used languages.</p>
<p>For decades, PHP has powered everything from small business websites to large-scale SaaS platforms. Yet observability in PHP has often required manual setup, compiling custom extensions, or changes to application code — challenges that limited adoption in production environments.
This upcoming donation aims to change that, by making OpenTelemetry for PHP <strong>as easy to deploy as any other runtime</strong>.</p>
<h2>What's coming</h2>
<p>Once the contribution process is complete, EDOT PHP will become part of the OpenTelemetry project — providing a <strong>complete, production-ready distribution</strong> that's optimized for performance, simplicity, and scalability.</p>
<p>EDOT PHP introduces a new approach to PHP observability:</p>
<ul>
<li><strong>Simple installation</strong> - installing OpenTelemetry for PHP will be as straightforward as installing a standard system package. From that point, the agent automatically detects and instruments PHP applications — no code changes, no manual setup.</li>
<li><strong>Automatic agent loading</strong> - works transparently in cloud and container environments without modifying application deployments.</li>
<li><strong>Zero configuration</strong> - ships as a single, self-contained binary; no need to install or compile any external extensions.</li>
<li><strong>Native C++ performance</strong> - a built-in serializer written in C++ reduces telemetry overhead by up to <strong>5×</strong>.</li>
<li><strong>Automatic instrumentation</strong> - instruments popular frameworks and libraries out of the box.</li>
<li><strong>Inferred spans</strong> - reveals the behavior of even uninstrumented code paths, providing full trace coverage.</li>
<li><strong>Automatic root spans</strong> - ensures complete traces, even in legacy or partially instrumented applications.</li>
<li><strong>OpAMP readiness</strong> - while the OpenTelemetry community continues to standardize configuration schemas and management workflows, the implementation in EDOT PHP is fully prepared to support these upcoming specifications — ensuring seamless adoption once the OpAMP ecosystem matures.</li>
<li><strong>Asynchronous backend communication</strong> - telemetry data is exported to the OpenTelemetry Collector or backend <strong>asynchronously</strong>, without blocking the instrumented application.
This ensures that span and metric exports do not add latency to user requests or impact response times, even under heavy load.</li>
</ul>
<p>Together, these features make EDOT PHP the first truly <strong>zero-effort observability solution for PHP</strong> — from local testing to cloud-scale production systems.</p>
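<p>As a rough sketch of the &quot;simple installation&quot; experience on a Debian-based host, installation boils down to grabbing the EDOT PHP package and installing it with the system package manager. The file name below is a placeholder; check the EDOT PHP release artifacts and documentation for the current package names:</p>
<pre><code class="language-bash"># Placeholder package file name -- use the artifact from the EDOT PHP releases page
sudo dpkg -i elastic-otel-php_&lt;version&gt;_amd64.deb
# Restart PHP-FPM (or your web server) so the agent is loaded into new workers
sudo systemctl restart php-fpm
</code></pre>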
<p><img src="https://www.elastic.co/observability-labs/assets/images/opentelemetry-accepts-elastics-donation-of-edot/performance.png" alt="Performance comparision" /></p>
<blockquote>
<p>The native C++ serializer and asynchronous export pipeline in EDOT PHP reduce average request time from <strong>49 ms</strong> to <strong>23 ms</strong>, more than <strong>2× faster</strong> than the pure PHP implementation.</p>
</blockquote>
<h2>Building on the existing foundation</h2>
<p>EDOT PHP doesn't replace the existing OpenTelemetry PHP SDK — it <strong>extends and strengthens it</strong>.
It packages the SDK, automatic instrumentation, and native extension into a single, unified agent package that works seamlessly with existing OpenTelemetry specifications and APIs.</p>
<p>By contributing this work, Elastic helps the OpenTelemetry community accelerate PHP adoption, align implementations across languages, and make distributed tracing truly universal.</p>
<blockquote>
<p>“This isn't a hand-off — it's a collaboration.
We're contributing years of development to help OpenTelemetry for PHP evolve faster, run more efficiently, and reach more users in every environment.”</p>
<ul>
<li><em>Elastic Observability team</em></li>
</ul>
</blockquote>
<h2>Ongoing improvements</h2>
<p>Elastic continues to invest in advancing EDOT PHP ahead of its integration into OpenTelemetry.
The team is currently focused on <strong>reducing resource usage and memory footprint</strong>, particularly in <strong>multi-worker server environments</strong> such as PHP-FPM or Apache prefork.
These optimizations aim to make the agent more predictable and efficient under heavy load — ensuring that telemetry remains lightweight even in large-scale production deployments.</p>
<p>Beyond that, we're exploring further improvements that can enhance both performance and interoperability.
Areas under investigation include smarter coordination in high-concurrency scenarios, better sharing of telemetry resources across workers, and future alignment with additional OpenTelemetry signals such as metrics and logs.</p>
<p>Together, these efforts will help make EDOT PHP not only faster, but also more adaptable and seamlessly integrated into diverse runtime architectures.</p>
<h2>Why it matters</h2>
<p>This contribution is about more than performance — it's about <strong>removing barriers</strong>.
By making OpenTelemetry for PHP installable as a simple system package and automatically loaded into running applications, the project opens observability to every PHP developer, operator, and platform provider.</p>
<p>For the OpenTelemetry ecosystem, it fills one of the last major language gaps, extending visibility to a vast portion of the internet — all under open governance and community collaboration.</p>
<h2>Looking ahead</h2>
<p>In the months ahead, Elastic and the OpenTelemetry PHP SIG will work closely on the technical integration, documentation, and community onboarding process.
Once the transition is complete, developers will gain a fully open, community-driven, and production-ready OpenTelemetry agent that “just works” — without friction, configuration, or code changes.</p>
<p>Together, we're building a future where <strong>observability just works — for every language, every framework, and every environment</strong>.</p>
<p>For more information:</p>
<p><a href="https://www.elastic.co/docs/reference/opentelemetry">EDOT documentation</a>&lt;br /&gt;
<a href="https://www.elastic.co/observability-labs/blog/elastic-managed-otlp-endpoint-for-opentelemetry">Learn about</a> OTLP Endpoint</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/opentelemetry-accepts-elastics-donation-of-edot/otel-php.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Supercharge Your vSphere Monitoring with Enhanced vSphere Integration]]></title>
            <link>https://www.elastic.co/observability-labs/blog/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration</link>
            <guid isPermaLink="false">supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration</guid>
            <pubDate>Wed, 11 Dec 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Supercharge Your vSphere Monitoring with Enhanced vSphere Integration]]></description>
            <content:encoded><![CDATA[<p><a href="https://www.vmware.com/products/cloud-infrastructure/vsphere">vSphere</a> is VMware's cloud computing virtualization platform that provides a powerful suite for managing virtualized resources. It allows organizations to create, manage, and optimize virtual environments, providing advanced capabilities such as high availability, load balancing, and simplified resource allocation. vSphere enables efficient utilization of hardware resources, reducing costs while increasing the flexibility and scalability of IT infrastructure.</p>
<p>With the release of an upgraded <a href="https://www.elastic.co/docs/current/integrations/vsphere">vSphere integration</a> we now support an enhanced set of metrics and datastreams. Package version 1.15.0 onwards introduces new datastreams that significantly improve the collection of performance metrics, providing deeper insights into your vSphere environment.</p>
<p>We have expanded the performance metrics to encompass a broader range of insights across all datastreams, while also introducing new datastreams for clusters, resource pools, and networks. This enhanced integration now includes a total of seven datastreams, featuring critical new metrics such as disk performance, memory utilization, and network status, along with detailed visibility into associated resources like hosts, clusters, and resource pools.</p>
<p>Each datastream also includes detailed alarm information, such as the alarm name, description, status (e.g., critical or warning), and the affected entity’s name. To make the most of these insights, we’ve also introduced prebuilt dashboards, helping teams monitor and troubleshoot their vSphere environments with ease and precision.</p>
<h2>Overview of the Datastreams</h2>
<ul>
<li><strong>Host Datastream:</strong> This datastream monitors the disk performance of the host, including metrics such as disk latency, average read/write bytes, uptime, and status. It also captures network metrics, such as packet information, network bandwidth, and utilization, as well as CPU and memory usage of the host. Additionally, it lists associated datastores, virtual machines, and networks within vSphere.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/hosts.png" alt="Host Datastream" /></p>
<ul>
<li><strong>Virtual Machine Datastream:</strong> This datastream tracks the used and available CPU and memory resources of virtual machines, along with the uptime and status of each VM. It includes information about the host on which the VM is running, as well as detailed snapshot metrics like the number of snapshots, creation dates, and descriptions. Additionally, it provides insights into associated hosts and datastores.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/virtualmachine.png" alt="Virtual Machine Datastream" /></p>
<ul>
<li>
<p><strong>Datastore Datastream:</strong> This datastream provides information on the total, used, and available capacity of datastores, along with their overall status. It also captures metrics such as the average read/write rate and lists the hosts and virtual machines connected to each datastore.</p>
</li>
<li>
<p><strong>Datastore Cluster:</strong> A datastore cluster in vSphere is a collection of datastores grouped together for efficient storage management. This datastream provides details on the total capacity and free space in the storage pod, along with the list of datastores within the cluster.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/datastore.png" alt="Datastore Datastream" /></p>
<ul>
<li>
<p><strong>Resource Pool:</strong> Resource pools in vSphere serve as logical abstractions that allow flexible allocation of CPU and memory resources. This datastream captures memory metrics, including swapped, ballooned, and shared memory, as well as CPU metrics like distributed and static CPU entitlement. It also lists the virtual machines associated with each resource pool.</p>
</li>
<li>
<p><strong>Network Datastream:</strong> This datastream captures the overall configuration and status of the network, including network types (e.g., vSS, vDS). It also lists the hosts and virtual machines connected to each network.</p>
</li>
<li>
<p><strong>Cluster Datastream:</strong> A Cluster in vSphere is a collection of ESXi hosts and their associated virtual machines that function as a unified resource pool. Clustering in vSphere allows administrators to manage multiple hosts and resources centrally, providing high availability, load balancing, and scalability to the virtual environment. This datastream includes metrics indicating whether HA or admission control is enabled and lists the hosts, networks, and datastores associated with the cluster.</p>
</li>
</ul>
<h2>Alarms support in vSphere Integration</h2>
<p>Alarms are a vital part of the vSphere integration, providing real-time insights into critical events across your virtual environment. In Elastic’s updated vSphere integration, alarms are now reported for all entities. They include detailed information such as the alarm name, description, severity (e.g., critical or warning), affected entity, and triggered time. These alarms are seamlessly integrated into datastreams, helping administrators and SREs quickly identify and resolve issues like resource shortages or performance bottlenecks.</p>
<h4>Example Alarm</h4>
<pre><code class="language-yaml">&quot;triggered_alarms&quot;: [
  {
    &quot;description&quot;: &quot;Default alarm to monitor host memory usage&quot;,
    &quot;entity_name&quot;: &quot;host_us&quot;,
    &quot;id&quot;: &quot;alarm-4.host-12&quot;,
    &quot;name&quot;: &quot;Host memory usage&quot;,
    &quot;status&quot;: &quot;red&quot;,
    &quot;triggered_time&quot;: &quot;2024-08-28T10:31:26.621Z&quot;
  }
]
</code></pre>
<p>This example highlights a triggered alarm for monitoring host memory usage, indicating a critical status (red) for the host &quot;host_us.&quot; Such alarms empower teams to act swiftly and maintain the stability of their vSphere environment.</p>
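<p>To act on these alarms at scale, you can query them directly. The ES|QL sketch below assumes the host data stream name and flattened alarm field names shown in the example above; verify the exact field names against the integration&apos;s exported fields before relying on it:</p>
<pre><code class="language-esql">FROM metrics-vsphere.host-default
// Field names are assumptions based on the alarm example above
| WHERE vsphere.host.triggered_alarms.status == &quot;red&quot;
| KEEP @timestamp, vsphere.host.triggered_alarms.name, vsphere.host.triggered_alarms.entity_name
| SORT @timestamp DESC
</code></pre>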
<h2>Let’s Try It Out!</h2>
<p>The new <a href="https://www.elastic.co/docs/current/integrations/vsphere">vSphere integration</a> in Elastic Cloud is more than just a monitoring tool; it’s a comprehensive solution that empowers you to manage and optimize your virtual environments effectively. With deeper insights and enhanced data granularity, you can ensure high availability, improved load balancing, and smarter resource allocation. Spin up an Elastic Cloud deployment and start monitoring your vSphere infrastructure.</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/title.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[The next evolution of observability: unifying data with OpenTelemetry and generative AI]]></title>
            <link>https://www.elastic.co/observability-labs/blog/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai</link>
            <guid isPermaLink="false">the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai</guid>
            <pubDate>Wed, 11 Jun 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Generative AI and machine learning are revolutionizing observability, but siloed data hinders their true potential. This article explores how to break down data silos by unifying logs, metrics, and traces with OpenTelemetry, unlocking the full power of GenAI for natural language investigations, automated root cause analysis, and proactive issue resolution.]]></description>
            <content:encoded><![CDATA[<p>The Observability industry today stands at a critical juncture. While our applications generate more telemetry data than ever before, this wealth of information typically exists in siloed tools, separate systems for logs, metrics, and traces. Meanwhile, Generative AI is hurtling toward us like an asteroid about to make a tremendous impact on our industry.</p>
<p>As SREs, we've grown accustomed to jumping between dashboards, log aggregators, and trace visualizers when troubleshooting issues. But what if there was a better way? What if AI could analyze all your observability data holistically, answering complex questions in natural language, and identifying root causes automatically?</p>
<p>This is the next evolution of observability. But to harness this power, we need to rethink how we collect, store, and analyze our telemetry data.</p>
<h2>The problem: siloed data limits AI effectiveness</h2>
<p>Traditional observability setups separate data into distinct types:</p>
<ul>
<li>Metrics: Numeric measurements over time (CPU, memory, request rates)</li>
<li>Logs: Detailed event records with timestamps and context</li>
<li>Traces: Request journeys through distributed systems</li>
<li>Profiles: Code-level execution patterns showing resource consumption and performance bottlenecks at the function/line level</li>
</ul>
<p>This separation made sense historically due to the way the industry evolved. Different data types have traditionally had different cardinality, structure, access patterns and volume characteristics. However, this approach creates significant challenges for AI-powered analysis:</p>
<pre><code class="language-text">Metrics (Prometheus) → &quot;CPU spiked at 09:17:00&quot;
Logs (ELK) → &quot;Exception in checkout service at 09:17:32&quot; 
Traces (Jaeger) → &quot;Slow DB queries in order-service at 09:17:28&quot;
Profiles (Pyroscope) → &quot;calculate_discount() is taking 75% of CPU time&quot;
</code></pre>
<p>When these data sources live in separate systems, AI tools must either:</p>
<ol>
<li>Work with an incomplete picture (seeing only metrics but not the related logs)</li>
<li>Rely on complex, brittle integrations that often introduce timing skew</li>
<li>Force developers to manually correlate information across tools</li>
</ol>
<p>Imagine asking an AI, &quot;Why did checkout latency spike at 09:17?&quot; To answer comprehensively, it needs access to logs (to see the stack trace), traces (to understand the service path), and metrics (to identify resource strain). With siloed tools, the AI either sees only fragments of the story or requires complex ETL jobs that are slower than the incident itself.</p>
<h2>Why traditional machine learning (ML) falls short</h2>
<p>Traditional machine learning for observability typically focuses on anomaly detection within a single data dimension. It can tell you when metrics deviate from normal patterns, but struggles to provide context or root cause.</p>
<p>ML models trained on metrics alone might flag a latency spike, but can't connect it to a recent deployment (found in logs) or identify that it only affects requests to a specific database endpoint (found in traces). They behave like humans with extreme tunnel vision: they see only a fraction of the relevant information, and only through the opinionated view that a specific vendor has chosen to give them.</p>
<p>This limitation becomes particularly problematic in modern microservice architectures where problems frequently cascade across services. Without a unified view, traditional ML can detect symptoms but struggles to identify the underlying cause.</p>
<h2>The solution: unified data with enriched logs</h2>
<p>The solution is conceptually simple but transformative: unify metrics, logs, and traces into a single data store, ideally with enriched logs that contain all signals about a request in a single JSON document. We're about to see a merging of signals.</p>
<p>Think of traditional logs as simple text lines:</p>
<pre><code class="language-text">[2025-05-19 09:17:32] ERROR OrderService - Failed to process checkout for user 12345
</code></pre>
<p>Now imagine an enriched log that contains not just the error message, but also:</p>
<ul>
<li>The complete distributed trace context</li>
<li>Related metrics at that moment</li>
<li>System environment details</li>
<li>Business context (user ID, cart value, etc.)</li>
</ul>
<p>This approach creates a holistic view where every signal about the same event sits side-by-side, perfect for AI analysis.</p>
<h2>How generative AI changes things</h2>
<p>Generative AI differs fundamentally from traditional ML in its ability to:</p>
<ol>
<li>Process unstructured data: Understanding free-form log messages and error text</li>
<li>Maintain context: Connecting related events across time and services</li>
<li>Answer natural language queries: Translating human questions into complex data analysis</li>
<li>Generate explanations: Providing reasoning alongside conclusions</li>
<li>Surface hidden patterns: Discovering correlations and anomalies in log data that would be impractical to find through manual analysis or traditional querying</li>
</ol>
<p>With access to unified observability data, GenAI can analyze complete system behavior patterns and correlate across previously disconnected signals.</p>
<p>For example, when asked &quot;Why is our checkout service slow?&quot; a GenAI model with access to unified data can:</p>
<ul>
<li>Analyze unified enriched logs to identify which specific operations are slow and to find errors or warnings in those components</li>
<li>Check attached metrics to understand resource utilization</li>
<li>Correlate all these signals with deployment events or configuration changes</li>
<li>Present a coherent explanation in natural language with supporting graphs and visualizations</li>
</ul>
<h2>Implementing unified observability with OpenTelemetry</h2>
<p>OpenTelemetry provides the perfect foundation for unified observability with its consistent schema across metrics, logs, and traces. Here's how to implement enriched logs in a Java application:</p>
<pre><code class="language-java">import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class OrderProcessor {
    private static final Logger logger = LoggerFactory.getLogger(OrderProcessor.class);
    private final Tracer tracer;
    private final DoubleHistogram cpuUsageHistogram;
    private final OperatingSystemMXBean osBean;

    public OrderProcessor(OpenTelemetry openTelemetry) {
        this.tracer = openTelemetry.getTracer(&quot;order-processor&quot;);
        Meter meter = openTelemetry.getMeter(&quot;order-processor&quot;);
        this.cpuUsageHistogram = meter.histogramBuilder(&quot;system.cpu.load&quot;)
                                      .setDescription(&quot;System CPU load&quot;)
                                      .setUnit(&quot;1&quot;)
                                      .build();
        this.osBean = ManagementFactory.getOperatingSystemMXBean();
    }

    public void processOrder(String orderId, double amount, String userId) {
        Span span = tracer.spanBuilder(&quot;processOrder&quot;).startSpan();
        try (Scope scope = span.makeCurrent()) {
            // Add attributes to the span
            span.setAttribute(&quot;order.id&quot;, orderId);
            span.setAttribute(&quot;order.amount&quot;, amount);
            span.setAttribute(&quot;user.id&quot;, userId);
            // Populate MDC for structured logging
            MDC.put(&quot;trace_id&quot;, span.getSpanContext().getTraceId());
            MDC.put(&quot;span_id&quot;, span.getSpanContext().getSpanId());
            MDC.put(&quot;order_id&quot;, orderId);
            MDC.put(&quot;order_amount&quot;, String.valueOf(amount));
            MDC.put(&quot;user_id&quot;, userId);
            // Record CPU usage metric associated with the current trace context
            double cpuLoad = osBean.getSystemLoadAverage();
            if (cpuLoad &gt;= 0) {
                cpuUsageHistogram.record(cpuLoad);
                MDC.put(&quot;cpu_load&quot;, String.valueOf(cpuLoad));
            }
            // Log a structured message
            logger.info(&quot;Processing order&quot;);
            // Simulate business logic
            // ...
            span.setAttribute(&quot;order.status&quot;, &quot;completed&quot;);
            logger.info(&quot;Order processed successfully&quot;);
        } catch (Exception e) {
            span.recordException(e);
            span.setAttribute(&quot;order.status&quot;, &quot;failed&quot;);
            logger.error(&quot;Order processing failed&quot;, e);
        } finally {
            MDC.clear();
            span.end();
        }
    }
}
</code></pre>
<p>This code demonstrates how to:</p>
<ol>
<li>Create a span for the operation</li>
<li>Add business attributes</li>
<li>Add current CPU usage</li>
<li>Link everything with consistent IDs</li>
<li>Record exceptions and outcomes in the backend system</li>
</ol>
<p>When configured with an appropriate exporter, this creates enriched logs that contain both application events and their complete context.</p>
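<p>To make this concrete, here is a rough sketch of what one such enriched log document might look like once the span attributes and MDC fields from the example above are serialized by a structured log appender and exported. The field names and values below are illustrative only; the exact shape depends on your appender and exporter configuration.</p>
<pre><code class="language-yaml"># Illustrative only: one document carrying the log message, trace context,
# business attributes, and a point-in-time metric side by side.
timestamp: 2025-05-19T09:17:32.481Z
severity: ERROR
message: Order processing failed
trace_id: 4bf92f3577b34da6a3ce929d0e0e4736
span_id: 00f067aa0ba902b7
order_id: ORD-12345
order_amount: 149.99
user_id: 12345
cpu_load: 3.72
service:
  name: order-processor
</code></pre>
<p>Every signal needed to answer &quot;why did this order fail?&quot; now sits in a single document instead of three separate systems.</p>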
<h2>Powerful queries across previously separate data</h2>
<p>If your data has not yet been enriched, there is still hope. First, with GenAI-powered ingestion it is possible to extract key fields that help correlate data, such as session IDs. This enriches your logs so they get the structure they need to behave like other signals. Below we can see Elastic's Auto Import mechanism, which automatically generates ingest pipelines and pulls unstructured information from logs into a structured format that is well suited to analytics.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/image4.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/image2.png" alt="" /></p>
<p>Once you have this data in the same data store, you can perform powerful join queries that were previously impossible. For example, finding slow database queries that affected specific API endpoints:</p>
<pre><code class="language-sql">FROM logs-nginx.access-default 
| LOOKUP JOIN .ds-logs-mysql.slowlog-default-2025.05.01-000002 ON request_id 
| KEEP request_id, mysql.slowlog.query, url.query 
| WHERE mysql.slowlog.query IS NOT NULL
</code></pre>
<p>This query joins web server logs with database slow query logs, allowing you to directly correlate user-facing performance with database operations.</p>
<p>For GenAI interfaces, these complex queries can be generated automatically from natural language questions:</p>
<p>&quot;Show me all checkout failures that coincided with slow database queries&quot;</p>
<p>The AI translates this into appropriate queries across your unified data store, correlating application errors with database performance.</p>
<h2>Real-world applications and use cases</h2>
<h3>Natural language investigation</h3>
<p>Imagine asking your observability system:</p>
<p>&quot;Why did checkout latency spike at 09:17 yesterday?&quot;</p>
<p>A GenAI-powered system with unified data could respond:</p>
<p>&quot;Checkout latency increased by 230% at 09:17:32 following deployment v2.4.1 at 09:15. The root cause appears to be increased MySQL query times in the inventory-service. Specifically, queries to the 'product_availability' table are taking an average of 2300ms compared to the normal 95ms. This coincides with a CPU spike on database host db-03 and 24 'Lock wait timeout' errors in the inventory service logs.&quot;</p>
<p>Here's an example of Claude Desktop connected to <a href="https://github.com/elastic/mcp-server-elasticsearch">Elastic's MCP (Model Context Protocol) Server</a>, which demonstrates how powerful natural language investigations can be. We ask Claude to &quot;analyze my web traffic patterns&quot; and, as you can see, it correctly identifies that the traffic is from our demo environment.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/image3.png" alt="" /></p>
<h3>Unknown problem detection</h3>
<p>GenAI can identify subtle patterns by correlating signals that would be missed in siloed systems. For example, it might notice that a specific customer ID appears in error logs only when a particular network path is taken through your microservices—indicating a data corruption issue affecting only certain user flows.</p>
<h3>Predictive maintenance</h3>
<p>By analyzing the unified historical patterns leading up to previous incidents, GenAI can identify emerging problems before they cause outages:</p>
<p>&quot;Warning: Current load pattern on authentication-service combined with increasing error rates in user-profile-service matches 87% of the signature that preceded the April 3rd outage. Recommend scaling user-profile-service pods immediately.&quot;</p>
<h2>The future: agentic AI for observability</h2>
<p>The next frontier is agentic AI, systems that not only analyze but take action automatically.</p>
<p>These AI agents could:</p>
<ol>
<li>Continuously monitor all observability signals</li>
<li>Autonomously investigate anomalies</li>
<li>Implement fixes for known patterns</li>
<li>Learn from the effectiveness of previous interventions</li>
</ol>
<p>For example, an observability agent might:</p>
<ul>
<li>Detect increased error rates in a service</li>
<li>Analyze logs and traces to identify a memory leak</li>
<li>Correlate with recent code changes</li>
<li>Increase the memory limit temporarily</li>
<li>Create a detailed ticket with the root cause analysis</li>
<li>Monitor the fix effectiveness</li>
</ul>
<p>This is about creating systems that understand your application's behavior patterns deeply enough to maintain them proactively. You can see how this works in Elastic Observability in the screenshot below: at the end of the root cause analysis (RCA), we send an email summary, but this could trigger any action.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/image1.png" alt="" /></p>
<h2>Business outcomes</h2>
<p>Unifying observability data for GenAI analysis delivers concrete benefits:</p>
<ul>
<li>Faster resolution times: Problems that previously required hours of manual correlation can be diagnosed in seconds</li>
<li>Fewer escalations: Junior engineers can leverage AI to investigate complex issues before involving specialists</li>
<li>Improved system reliability: Earlier detection and resolution of emerging issues</li>
<li>Better developer experience: Less time spent context-switching between tools</li>
<li>Enhanced capacity planning: More accurate prediction of resource needs</li>
</ul>
<h2>Implementation steps</h2>
<p>Ready to start your observability transformation? Here's a practical roadmap:</p>
<ol>
<li>Adopt OpenTelemetry: Standardize on OpenTelemetry for all telemetry data collection and use it to generate enriched logs.</li>
<li>Choose a unified storage solution: Select a platform that can efficiently store and query metrics, logs, traces, and enriched logs together (a minimal collector sketch follows this list)</li>
<li>Enrich your telemetry: Update application instrumentation to include relevant context</li>
<li>Create correlation IDs: Ensure every request carries identifiers (such as trace and session IDs) that tie its logs, metrics, and traces together</li>
<li>Implement semantic conventions: Follow consistent naming patterns across your telemetry data</li>
<li>Start with focused use cases: Begin with high-value scenarios like checkout flows or critical APIs</li>
<li>Leverage GenAI tools: Integrate tools that can analyze your unified data and respond to natural language queries</li>
</ol>
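<p>As a rough illustration of steps 1 and 2, the sketch below shows an OpenTelemetry Collector configuration that receives logs, metrics, and traces over OTLP and sends all three signals to a single Elasticsearch backend using the OTel-native mapping. Treat it as a starting point rather than a complete, production-ready configuration: the endpoint and API key are placeholders, and your receivers and processors will differ.</p>
<pre><code class="language-yaml"># Minimal sketch: one collector, one backend for every signal (placeholders marked).
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  elasticsearch/otel:
    endpoint: &lt;ES_ENDPOINT&gt;   # placeholder
    api_key: &lt;ES_API_KEY&gt;     # placeholder
    mapping:
      mode: otel               # keep data in OpenTelemetry-native form

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [elasticsearch/otel]
    metrics:
      receivers: [otlp]
      exporters: [elasticsearch/otel]
    logs:
      receivers: [otlp]
      exporters: [elasticsearch/otel]
</code></pre>
<p>With every pipeline landing in the same store, the join-style queries and natural language investigations described above become possible without any cross-system stitching.</p>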
<p>Remember, AI can only be as smart as the data you feed it. The quality and completeness of your telemetry data will determine the effectiveness of your AI-powered observability.</p>
<h2>Generative AI: an evolutionary catalyst for observability</h2>
<p>The unification of observability data for GenAI analysis represents an evolutionary leap forward comparable to the transition from Internet 1.0 to 2.0. Early adopters will gain a significant competitive advantage through faster problem resolution, improved system reliability, and more efficient operations. GenAI is a huge step toward increasing observability maturity and moving your team to a more proactive stance.</p>
<p>Think of traditional observability as a doctor trying to diagnose a patient while only able to see their heart rate. Unified observability with GenAI is like giving that doctor a complete health picture: vital signs, lab results, medical history, and genetic data, all accessible through natural conversation.</p>
<p>As SREs, we stand at the threshold of a new era in system observability. The asteroid of GenAI isn't a threat to be feared; it's an opportunity to evolve our practices and tools to build more reliable, understandable systems. The question isn't whether this transformation will happen, but who will lead it.</p>
<p>Will you?</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/title.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[From Uptime to Synthetics in Elastic: Your Migration Playbook]]></title>
            <link>https://www.elastic.co/observability-labs/blog/uptime-to-synthetics-guide</link>
            <guid isPermaLink="false">uptime-to-synthetics-guide</guid>
            <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Effortlessly migrate your existing Uptime TCP, ICMP, and HTTP monitors to Elastic Synthetics with this comprehensive guide, leveraging Private Locations and Synthetics Projects for efficient, future-proof monitoring.]]></description>
            <content:encoded><![CDATA[<p>Have you seen the warning that Uptime is deprecated and want to know how to easily migrate to Synthetics? Then you are in the right place.
Starting with version 8.15.0, uptime checks have been deprecated in favor of synthetic monitoring.</p>
<p>Many users may have a large number of TCP, ICMP, and HTTP monitors and need to migrate them to Synthetics. In this guide, we will explain how to perform this migration easily while ensuring that the result is future-proof and leaves room for more advanced checks such as <a href="https://www.elastic.co/docs/solutions/observability/synthetics/#monitoring-synthetics">Browser monitors</a>.</p>
<p>First, we must consider the number of monitors to migrate; if the number is small, the easiest way would be to do it manually through the <a href="https://www.elastic.co/docs/solutions/observability/synthetics/create-monitors-ui">Synthetics UI</a>. However, in this guide we will assume that we have dozens or hundreds of monitors to migrate, and doing it manually in the Synthetics UI is not an option.</p>
<h1>Private Location</h1>
<p>Traditionally, uptime monitors required a <a href="https://www.elastic.co/docs/reference/beats/heartbeat/">Heartbeat</a> to be deployed in your infrastructure, which indirectly allowed you to monitor endpoints or hosts on your private network. If this is still a requirement, you will need to either configure <a href="https://www.elastic.co/docs/solutions/observability/synthetics/monitor-resources-on-private-networks">Private Location</a> or allow Elastic’s global managed infrastructure to <a href="https://www.elastic.co/docs/solutions/observability/synthetics/monitor-resources-on-private-networks#monitor-via-access-control">access your private endpoints</a> (only on <a href="https://www.elastic.co/docs/deploy-manage/deploy/elastic-cloud/cloud-hosted">ECH</a> &amp; <a href="https://www.elastic.co/docs/deploy-manage/deploy/elastic-cloud/serverless">Serverless</a>).</p>
<p>In this guide, we will use Private Locations, which will allow you to monitor both internal and external resources. More details can be found here: <a href="https://www.elastic.co/docs/solutions/observability/synthetics/monitor-resources-on-private-networks#monitor-via-private-agent">Monitor resources on private networks</a></p>
<h2>Step 1: Set up Fleet Server and Elastic Agent</h2>
<p>Private Locations are simply Elastic Agents enrolled in Fleet and managed through an agent policy. </p>
<p>If you don't have a Fleet Server yet, start <a href="https://www.elastic.co/docs/reference/fleet/fleet-server">setting up a Fleet Server</a>. This step is not necessary if you use ECH, as it comes by default.</p>
<p>Next, you will need to create an Agent Policy. Go to <strong>Observability → Monitors (Synthetics) → Settings (top right) → Private Location → + Create Location</strong></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/uptime-to-synthetics-guide/create-private-location.png" alt="Create Private Location" /></p>
<p>Fill in the fields and create a new policy for this Private Location. It is important to know that a Private Location should be set up against an agent policy that runs on a <strong>single</strong> Elastic Agent.</p>
<h2>Step 2: Deploy the Elastic Agent</h2>
<p>Now we need to deploy the Elastic Agent that will be responsible for running all the monitors. We can use the same host we were using for Heartbeat. There is only one requirement: we must be able to run Docker containers, since to take advantage of all the features of Synthetics, we must use the <code>elastic-agent-complete</code> Docker Image.</p>
<ol>
<li>
<p>Go to <strong>Fleet → Enrollment tokens</strong> and note the enrollment token relevant to the policy you just created for the Private Location. Now go to <strong>Settings</strong> and note the default Fleet server host URL.</p>
</li>
<li>
<p>On the host, run the following commands. For more information on running Elastic Agent with Docker, refer to Run Elastic Agent in a container.</p>
</li>
</ol>
<pre><code class="language-sh">docker run \
  --env FLEET_ENROLL=1 \
  --env FLEET_URL={fleet_server_host_url} \
  --env FLEET_ENROLLMENT_TOKEN={enrollment_token} \
  --cap-add=NET_RAW \
  --cap-add=SETUID \
  --rm docker.elastic.co/elastic-agent/elastic-agent-complete:9.3.2
</code></pre>
<h1>Synthetics Project</h1>
<p>At this point, we already have the location from which our Synthetic monitors will run. Now we need to load our Uptime monitors as Synthetics.</p>
<p>As we mentioned earlier, there are two ways to do this: either manually through the Synthetics UI or through a Synthetics Project.
In our case, since we have so many monitors to migrate and don't want to do it manually, we will use <a href="https://www.elastic.co/docs/solutions/observability/synthetics/create-monitors-with-projects">Synthetics Projects</a>. </p>
<p>The great thing about a Synthetics project is that it offers some backward compatibility with the monitor definitions in <code>heartbeat.yml</code>, and we will be leveraging that.</p>
<h2>What's a Synthetics project?</h2>
<p>A Synthetics project is the most powerful and flexible way to manage synthetic monitors in Elastic, based on the Infrastructure as Code principle and compatible with GitOps flows. Instead of configuring monitors from the interface, you define them as code: .yml files for lightweight monitors and JavaScript or TypeScript scripts for browser-type monitors (journeys).</p>
<p>This approach allows you to structure your monitors in a repository, version them with Git, validate them, and deploy them automatically using CI/CD flows, providing traceability, reviews, and consistent deployments.</p>
<h2>Step 3: Initialize your Synthetics project</h2>
<p>You will no longer need to connect to the hosts where you deployed the Elastic Agent, as the remaining steps can be performed locally as long as you have connectivity to Kibana!</p>
<p>Since Synthetics projects are based on Node.js, make sure you have it <a href="https://nodejs.org/en/download">installed</a>.</p>
<ol>
<li>Install the package:</li>
</ol>
<pre><code class="language-sh">npm install -g @elastic/synthetics
</code></pre>
<ol start="2">
<li>Confirm your system is set up correctly:</li>
</ol>
<pre><code class="language-sh">npx @elastic/synthetics -h
</code></pre>
<ol start="3">
<li>Start by creating your first Synthetics project. Run the command below to create a new Synthetics project named <code>synthetic-project-test</code> in the current directory.</li>
</ol>
<pre><code class="language-sh">npx @elastic/synthetics init synthetic-project-test
</code></pre>
<ol start="4">
<li>
<p>Follow the prompt instructions to configure the default variables for your Synthetics project. Make sure to at least <strong>select your Private Location.</strong> Once that’s done, set the <code>SYNTHETICS_API_KEY</code> environment variable in your terminal, which allows the project to authenticate with Kibana.</p>
<ol>
<li>
<p>To generate an API key, go to Synthetics in Kibana.</p>
</li>
<li>
<p>Click <strong>Settings</strong>.</p>
</li>
<li>
<p>Switch to the <strong>Project API Keys</strong> tab.</p>
</li>
<li>
<p>Click <strong>Generate Project API key</strong>.</p>
</li>
</ol>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/uptime-to-synthetics-guide/generate-api-key.png" alt="Generate API Key" /></p>
<p>More details for all the steps can be found here: <a href="https://www.elastic.co/docs/solutions/observability/synthetics/create-monitors-with-projects#synthetics-get-started-project-create-a-synthetics-project">Create monitors with a Synthetics project</a></p>
<h2>Step 4: Add your <code>heartbeat.yml</code> files</h2>
<p>Once the project is initialized, access the folder it has created and take a look at the project structure:</p>
<ul>
<li>
<p><code>journeys</code> is where you’ll add .ts and .js files defining your browser monitors. It currently contains files defining sample monitors.</p>
</li>
<li>
<p><code>lightweight</code> is where we’ll add our heartbeat.yml files defining our lightweight monitors. It currently contains a file defining sample monitors.</p>
</li>
</ul>
<p>Therefore, all we have to do is copy our <code>heartbeat.yml</code> files into this <code>lightweight</code> folder. Before copying <code>heartbeat.yml</code>, keep in mind that we don't need all of its content; we are only interested in the <code>heartbeat.monitors</code> section.<br />
We recommend splitting the file into logical groups. Instead of maintaining a single large YAML file, you could create multiple smaller YAML files, with each file representing either a single check or a group of related checks. This approach simplifies management and improves compatibility with GitOps workflows.<br />
Each YAML file should look like this:</p>
<pre><code>carles@synthetics-migration:synthetic-project-test/lightweight# cat heartbeat.yml

heartbeat.monitors:
- type: icmp
  schedule: '@every 10s'
  hosts: [&quot;localhost&quot;]
  id: my-icmp-service-synth
  name: My ICMP Service - Synthetic
- type: tcp
  schedule: '@every 10s'
  hosts: [&quot;myremotehost:8123&quot;]
  mode: any
  id: my-tcp-service-synth
  name: My TCP Service Synthetic
- type: http
  schedule: '@every 10s'
  urls: [&quot;http://elastic.co&quot;]
  id: my-http-service-synth
  name: My HTTP Service Synthetic
</code></pre>
<p>What we just did is define different ICMP, TCP, and HTTP checks as code.</p>
<p>Now we need to ask the Synthetics project to create the monitors in Kibana based on what we have defined in our YAML files:</p>
<pre><code class="language-sh">npx @elastic/synthetics push --auth $SYNTHETICS_API_KEY --url &lt;kibana-url&gt;
</code></pre>
<p>Unfortunately, we do not support a 1-to-1 mapping of the Heartbeat schema to the lightweight schema, so you may encounter some errors when running this command. One example is the definition of <code>schedule</code>: Heartbeat supports crontab expressions, but Synthetics projects require the <code>@every</code> syntax, as illustrated below.</p>
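<p>For example, a monitor that Heartbeat scheduled with a crontab expression has to have its schedule rewritten before the push will succeed. A rough before-and-after sketch (the monitor id, name, and intervals are illustrative):</p>
<pre><code class="language-yaml"># Heartbeat definition with a crontab expression (rejected by the Synthetics project push):
# - type: http
#   schedule: '*/5 * * * *'
#   urls: [&quot;http://elastic.co&quot;]

# Equivalent Synthetics project definition using the @every syntax:
- type: http
  schedule: '@every 5m'
  urls: [&quot;http://elastic.co&quot;]
  id: my-cron-http-check
  name: My Cron HTTP Check
</code></pre>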
<p>If no syntax errors were found, the command output will show that the monitors have been successfully created in Kibana!</p>
<p>Then, go to <strong>Synthetics</strong> in Kibana. You should see your newly pushed monitors running. You can also go to the Management tab to see the monitors' configuration settings.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/uptime-to-synthetics-guide/blog-header.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Visualizing OpenTelemetry Data in Elastic with OpenTelemetry Content Packages]]></title>
            <link>https://www.elastic.co/observability-labs/blog/visualizing-opentelemetry-data-elastic-content-packages</link>
            <guid isPermaLink="false">visualizing-opentelemetry-data-elastic-content-packages</guid>
            <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn and explore how OpenTelemetry Content Packages in Elastic provide instant dashboards, alerts, and SLOs for your telemetry data.]]></description>
            <content:encoded><![CDATA[<p>If you've been in the observability space for the last couple of years, you've seen OpenTelemetry go from &quot;promising standard&quot; to the default choice for collecting metrics, logs, and traces. Elastic has been in that journey from early on — which is why we built the <a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">Elastic Distributions of OpenTelemetry (EDOT)</a>: a hardened, production-ready suite of OTel components including the EDOT Collector and language SDKs, tuned for infrastructure and application monitoring without the typical setup overhead.</p>
<p>EDOT is now generally available. The collector, the SDKs, the whole stack — production-ready, enterprise-supported, no asterisks.</p>
<p>But here's the thing: getting your data into Elastic is only half the job. The harder half, in practice, is what happens after. Someone still has to build the dashboards, write the alert rules, and figure out which SLOs are worth tracking — before any of it is useful.</p>
<p>That gap is what OpenTelemetry Content Packages are designed to close.</p>
<hr />
<h2>What Are OpenTelemetry Content Packages?</h2>
<p>Elastic's traditional Beats-based integrations always bundled data collection and visualizations together — you got curated dashboards and alerts the moment you turned something on. As Elastic moves to an OpenTelemetry-first world, that same philosophy carries over, but the model is cleaner.</p>
<p>OpenTelemetry Content Packs are purely about the observability assets for a given service. No data collection config is bundled in, because in an OTel world, the collector handles that. Each package contains:</p>
<ul>
<li><strong>Dashboards</strong> — curated, pre-built Kibana visualizations tailored to the service being monitored</li>
<li><strong>Alert rules</strong> — pre-configured alerting rules that fire on meaningful thresholds, helping teams minimize Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR)</li>
<li><strong>SLO templates</strong> — ready-made Service Level Objective definitions you can apply immediately to track reliability targets, error budgets, and burn rates</li>
</ul>
<p>More asset types are planned for future packages as the content pack model continues to evolve.</p>
<hr />
<h2>How Does It Work?</h2>
<p>The core idea is simple: as soon as data arrives in Elastic, the right dashboards, alert rules, and SLO templates are ready to use. The content package activates based on the incoming data, regardless of how that data was collected.</p>
<p>One of the most powerful aspects of this system is <strong>automatic installation</strong>. When Elastic detects that data for a particular service has started arriving in Elasticsearch, the corresponding content pack is installed automatically — no manual steps, no hunting through the integrations catalog. By the time you open Kibana, your dashboards are already there waiting for you, your alert rules are ready to be enabled, and your SLO templates are pre-loaded.</p>
<p>To get the data flowing in the first place, we need to configure the collector — a YAML file that defines the building blocks of your telemetry pipeline:</p>
<ul>
<li><strong>Receivers</strong> — define what data to collect and from where. Each service has its own receiver (for example, the MySQL receiver scrapes metrics directly from the database).</li>
<li><strong>Exporters</strong> — define where the collected data is sent. In our case, we use the Elasticsearch exporter, which ships the telemetry data directly into Elasticsearch in OpenTelemetry native format.</li>
<li><strong>Pipelines</strong> — wire the receivers and exporters together, defining the flow of data through the collector.</li>
</ul>
<p>Once this configuration is in place and the collector is running, data starts flowing into Elasticsearch — and the content pack takes it from there.</p>
<h4>Data Sources</h4>
<p>OpenTelemetry data can reach Elastic through any of the following:</p>
<ul>
<li><strong><a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">EDOT Collector</a></strong> — the Elastic Distribution of the OpenTelemetry Collector, embedded in or used alongside the Elastic Agent</li>
<li><strong><a href="https://github.com/open-telemetry/opentelemetry-collector-contrib">Upstream OTel Collector</a></strong> — the standard community OpenTelemetry Collector (Contrib or custom builds)</li>
<li><strong><a href="https://www.elastic.co/docs/reference/opentelemetry/edot-cloud-forwarder">EDOT Cloud Forwarder (ECF)</a></strong> — a serverless OTel Collector that collects telemetry from AWS, GCP, and Azure (VPC Flow Logs, CloudTrail, CloudWatch, and more) and forwards it directly to Elastic Observability, with no infrastructure to manage</li>
</ul>
<p>The content pack doesn't care how the data arrived — only that it's there.</p>
<hr />
<h2>Seeing It in Practice: MySQL Monitoring</h2>
<p>Take a team running MySQL who wants to track query throughput, connection counts, buffer pool utilization, and slow query rates — and get alerted before small problems turn into 2am incidents. Historically, that means hours of dashboard building, custom alert queries, and a lot of guesswork about which metrics actually matter.</p>
<p>With the <strong><a href="https://www.elastic.co/docs/reference/integrations/mysql_otel">MySQL OpenTelemetry Assets Package</a></strong>, that work is already done. Here's how the whole thing comes together.</p>
<h3>Step 1: Get the Data In</h3>
<p>The data pipeline is driven by a collector configuration that defines receivers (where to scrape data from), processors (how to enrich or transform it), and exporters (where to send it — in this case, Elasticsearch).</p>
<p>Regardless of whether you use the <a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">EDOT Collector</a> or the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib">Upstream OTel Collector</a>, the fundamental configuration structure is the same. The configuration below uses separate receivers for the primary and replica instances, because replication metrics are only available on replicas. Replace the placeholders with your actual endpoints, credentials, and Elasticsearch details.</p>
<pre><code class="language-yaml">receivers:
  mysql/primary:
    endpoint: &lt;MYSQL_PRIMARY_ENDPOINT&gt;
    username: &lt;MYSQL_USER&gt;
    password: &lt;MYSQL_PASSWORD&gt;
    collection_interval: 10s
    statement_events:
      digest_text_limit: 120
      limit: 250
    query_sample_collection:
      max_rows_per_query: 100
    events:
      db.server.query_sample:
        enabled: true
      db.server.top_query:
        enabled: true
    metrics:
      mysql.client.network.io:
        enabled: true
      mysql.connection.errors:
        enabled: true
      mysql.max_used_connections:
        enabled: true
      mysql.query.client.count:
        enabled: true
      mysql.query.count:
        enabled: true
      mysql.query.slow.count:
        enabled: true
      mysql.table.rows:
        enabled: true
      mysql.table.size:
        enabled: true
  # Replica receiver: replication metrics are only reported by replica instances
  mysql/replica:
    endpoint: &lt;MYSQL_REPLICA_ENDPOINT&gt;
    username: &lt;MYSQL_USER&gt;
    password: &lt;MYSQL_PASSWORD&gt;
    collection_interval: 10s

processors:
  resourcedetection:
    detectors: [system, env]

exporters:
  elasticsearch/otel:
    endpoint: &lt;ES_ENDPOINT&gt;
    api_key: &lt;ES_API_KEY&gt;
    mapping:
      mode: otel

service:
  pipelines:
    metrics:
      receivers: [mysql/primary, mysql/replica]
      processors: [resourcedetection]
      exporters: [elasticsearch/otel]
</code></pre>
<p>The <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/mysqlreceiver/README.md#mysql-receiver">MySQL receiver</a> scrapes metrics and events from the database at the configured interval and emits them as OpenTelemetry metrics. These flow through the pipeline and land in Elasticsearch, ready to be visualized.</p>
<h3>Step 2: Open Kibana — Everything's Already There</h3>
<h4>Dashboards</h4>
<p>As soon as the MySQL metrics and events arrive in Elasticsearch, the <a href="https://www.elastic.co/docs/reference/integrations/mysql_otel">MySQL OpenTelemetry Assets Package</a> is automatically installed in the background. By the time you navigate to Kibana, the <a href="https://www.elastic.co/docs/reference/integrations/mysql_otel#screenshots">dashboards</a> are already populated and waiting.</p>
<p>Users immediately get visibility into:</p>
<ul>
<li>Active and max connections</li>
<li>Query throughput — statements executed per second</li>
<li>InnoDB buffer pool hit rate and memory usage</li>
<li>Slow query count and trends</li>
<li>Table lock waits and contention</li>
<li>Bytes sent and received over time</li>
<li>Replication lag (for replicated setups)</li>
</ul>
<p>No manual field mapping. No dashboard building from scratch. Just data in, insights out.</p>
<p>Below are some screenshots of the MySQL OpenTelemetry dashboard in Kibana, showing the out-of-the-box visualizations that are automatically available as soon as your data starts flowing in.</p>
<p>Overview Dashboard
<img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/overview.png" alt="" /></p>
<p>Queries Dashboard
<img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/queries.png" alt="" /></p>
<p>Availability Dashboard
<img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/availability.png" alt="" /></p>
<h4>Alert Rules, Ready to Enable</h4>
<p>The package includes six pre-built <a href="https://www.elastic.co/docs/reference/integrations/mysql_otel#alert-rules">alert rules</a> — covering high connection error rates, slow query spikes, thread saturation, replication lag, buffer pool dirty page ratio, and row lock contention — each with recommended thresholds and severity levels. These are available immediately on install and can be enabled, tuned, and extended directly in Kibana without any custom query authoring. Below is an example of one of the alerts.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/alert1.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/alert2.png" alt="" /></p>
<h4>SLO Templates, Pre-Loaded</h4>
<p>Four <a href="https://www.elastic.co/docs/reference/integrations/mysql_otel#slo-templates">SLO templates</a> are included out of the box, tracking replication lag, connection exhaustion errors, slow query rate, and connected thread count — each with a pre-configured target and 30-day rolling window. Teams can adopt them as-is or tune the thresholds to match their own reliability requirements.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/slo.png" alt="" /></p>
<hr />
<h2>What's Available Today</h2>
<p>The MySQL OpenTelemetry Assets Package is just one example from a growing library of OpenTelemetry Content Packages that Elastic has already built out. Content packs are available for a range of services — and we have also started extending this to the cloud, with initial support for Cloud Service Provider integrations that use the <a href="https://www.elastic.co/docs/reference/opentelemetry/edot-cloud-forwarder">EDOT Cloud Forwarder (ECF)</a> to bring AWS, GCP, and Azure telemetry into Elastic with ready-made dashboards.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/contentpacks.png" alt="" /></p>
<p>The same pattern holds across all of them — data in, and a complete observability package (dashboards, alert rules, SLO templates) instantly ready — whether you're monitoring a self-managed database or cloud-native services from your preferred cloud service provider.</p>
<h2>Where This Is Going</h2>
<p>The next step worth watching is <strong>OTel Integration Packages</strong>, which will let you push collector configurations directly from the Kibana UI — making the entire setup experience point-and-click, from data collection through to visualization, with no YAML editing required.</p>
<hr />
<h2>Get Started</h2>
<p>Ready to try it? Start with the <a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">EDOT Collector documentation</a> and explore the growing library of OpenTelemetry content packages in Kibana's Integrations page.</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/otelcp.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>