<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Observability Labs - Observability</title>
        <link>https://www.elastic.co/observability-labs</link>
        <description>Trusted observability news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Tue, 28 Apr 2026 16:44:07 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Observability Labs - Observability</title>
            <url>https://www.elastic.co/observability-labs/assets/observability-labs-thumbnail.png</url>
            <link>https://www.elastic.co/observability-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Elastic MongoDB Atlas Integration: Complete Database Monitoring and Observability]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-mongodb-atlas-integration</link>
            <guid isPermaLink="false">elastic-mongodb-atlas-integration</guid>
            <pubDate>Thu, 24 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Comprehensive MongoDB Atlas monitoring with Elastic's integration - track performance, security, and operations through real-time alerts, audit logs, and actionable insights.]]></description>
            <content:encoded><![CDATA[<p>In today's data-driven landscape, <a href="https://www.mongodb.com/products/platform/atlas-database">MongoDB Atlas</a> has emerged as the leading multi-cloud developer data platform, enabling organizations to work seamlessly with document-based data models while ensuring flexible schema design and easy scalability. However, as your Atlas deployments grow in complexity and criticality, comprehensive observability becomes essential for maintaining optimal performance, security, and reliability.</p>
<p>The Elastic <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> transforms how you monitor and troubleshoot your Atlas infrastructure by providing deep insights into every aspect of your deployment—from real-time alerts and audit trails to detailed performance metrics and organizational activities. This integration empowers teams to minimize Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) while gaining actionable insights for capacity planning and performance optimization.</p>
<h2>Why MongoDB Atlas Observability Matters</h2>
<p>MongoDB Atlas abstracts much of the operational complexity of running MongoDB, but this doesn't eliminate the need for monitoring. Modern applications demand:</p>
<ul>
<li><strong>Proactive Issue Detection</strong>: Identify performance bottlenecks, resource constraints, and security threats before they impact users</li>
<li><strong>Comprehensive Audit Trails</strong>: Track database operations, user activities, and configuration changes for compliance and security</li>
<li><strong>Performance Optimization</strong>: Monitor query performance, resource utilization, and capacity trends to optimize costs and user experience</li>
<li><strong>Operational Insights</strong>: Understand organizational activities, project changes, and infrastructure events across your multi-cloud deployments</li>
</ul>
<p>The Elastic <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> addresses these needs by collecting comprehensive telemetry data and presenting it through powerful visualizations and alerting capabilities.</p>
<h2>Integration Architecture and Data Streams</h2>
<p>The <a href="https://www.elastic.co/docs/reference/integrations/mongodb_atlas">MongoDB Atlas integration</a> leverages the <a href="https://www.mongodb.com/docs/atlas/reference/api-resources-spec/v2/">Atlas Administration API</a> to collect eight distinct data streams, each providing specific insights into different aspects of your Atlas deployment:</p>
<h3>Log Data Streams</h3>
<p><strong>Alert Logs</strong>: Capture real-time alerts generated by your Atlas instances, covering resource utilization thresholds (CPU, memory, disk space), database operations, security issues, and configuration changes. These alerts provide immediate visibility into critical events that require attention.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/alert_logs.png" alt="Alert Datastream" /></p>
<p><strong>Database Logs</strong>: Collect comprehensive operational logs from MongoDB instances, including incoming connections, executed commands, performance diagnostics, and issues encountered. These logs are invaluable for troubleshooting performance problems and understanding database behavior.</p>
<p><strong>MongoDB Audit Logs</strong>: Enable administrators to track system activity across deployments with multiple users and applications. These logs capture detailed events related to database operations including insertions, updates, deletions, user authentication, and access patterns—essential for security compliance and forensic analysis.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/audit_logs.png" alt="Audit Datastream" /></p>
<p><strong>Organization Logs</strong>: Provide enterprise-level visibility into organizational activities, enabling tracking of significant actions involving database operations, billing changes, security modifications, host management, encryption settings, and user access management across teams.</p>
<p><strong>Project Logs</strong>: Offer project-specific event tracking, capturing detailed records of configuration modifications, user access changes, and general project activities. These logs are crucial for project-level auditing and change management.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/project_logs.png" alt="Project Datastream" /></p>
<h3>Metrics Data Streams</h3>
<p><strong>Hardware Metrics</strong>: Collect comprehensive hardware performance data including CPU usage, memory consumption, JVM memory utilization, and overall system resource metrics for each process in your Atlas groups.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/hardware_metrics.png" alt="Hardware Datastream" /></p>
<p><strong>Disk Metrics</strong>: Monitor storage performance with detailed insights into I/O operations, read/write latency, and space utilization across all disk partitions used by MongoDB Atlas. These metrics help identify storage bottlenecks and plan capacity expansion.</p>
<p><strong>Process Metrics</strong>: Gather host-level metrics per MongoDB process, including detailed CPU usage patterns, I/O operation counts, memory utilization, and database-specific performance indicators like connection counts, operation rates, and cache utilization.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/process_metrics.png" alt="Process Datastream" /></p>
<h2>Implementation Guide</h2>
<h3>Setting Up the Integration</h3>
<p>Getting started with MongoDB Atlas observability requires establishing API access and configuring the integration in Kibana:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/setup.png" alt="Setup" /></p>
<ol>
<li>
<p><strong>Generate Atlas API Keys</strong>: Create <a href="https://www.mongodb.com/docs/atlas/configure-api-access/#grant-programmatic-access-to-an-organization">programmatic API keys</a> with Organization Owner permissions in the Atlas console, then invite these keys to your target projects with appropriate roles (Project Read Only for alerts/metrics, Project Data Access Read Only for audit logs).</p>
</li>
<li>
<p><strong>Enable Prerequisites</strong>: Enable database auditing in Atlas for projects where you want to collect audit and database logs. Gather your <a href="https://www.mongodb.com/docs/atlas/app-services/apps/metadata/#find-a-project-id">Project ID</a> and Organization ID from the Atlas UI.</p>
</li>
<li>
<p><strong>Configure in Kibana</strong>: Navigate to Management &gt; Integrations, search for &quot;MongoDB Atlas,&quot; and add the integration using your API credentials.</p>
</li>
</ol>
<p>The integration supports different permission levels for each data stream, ensuring you can collect operational metrics with minimal privileges while protecting sensitive audit data with elevated permissions.</p>
<h3>Considerations and Limitations</h3>
<ul>
<li><strong>Cluster Support</strong>: Log collection doesn't support M0 free clusters, M2/M5 shared clusters, or serverless instances</li>
<li><strong>Historical Data</strong>: Most log streams collect the previous 30 minutes of historical data</li>
<li><strong>Performance Impact</strong>: Large time spans may cause request timeouts; adjust HTTP Client Timeout accordingly</li>
</ul>
<h2>Real-World Use Cases and Benefits</h2>
<h3>Security and Compliance Monitoring</h3>
<p><strong>Audit Trail Management</strong>: Organizations in regulated industries leverage the audit logs to maintain comprehensive records of database access and modifications. The integration automatically parses and indexes audit events, making it easy to search for specific user activities, failed authentication attempts, or unauthorized access patterns.</p>
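<p>As a rough illustration of that kind of search, here is an ES|QL sketch for surfacing failed authentication attempts from the audit data stream. The data stream and field names below are assumptions for illustration only; check the integration&apos;s exported fields in your deployment before relying on them.</p>
<pre><code class="language-esql">FROM logs-mongodb_atlas.mongodb_audit-*
// Hypothetical field names -- adjust to the fields the integration actually maps
| WHERE mongodb_atlas.mongodb_audit.atype == &quot;authenticate&quot; AND mongodb_atlas.mongodb_audit.result != 0
| STATS failed_attempts = COUNT(*) BY mongodb_atlas.mongodb_audit.users.user
| SORT failed_attempts DESC
</code></pre>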
<p><strong>Security Incident Response</strong>: When security events occur, teams can quickly correlate alert logs with audit trails to understand the scope and timeline of incidents.</p>
<h3>Performance Optimization and Capacity Planning</h3>
<p><strong>Proactive Resource Management</strong>: By monitoring disk, hardware, and process metrics, teams can identify resource constraints before they impact application performance. For example, tracking disk I/O latency trends helps predict when storage upgrades are needed.</p>
<p><strong>Query Performance Analysis</strong>: Database logs combined with process metrics provide insights into slow queries, connection patterns, and resource utilization that enable database performance tuning.</p>
<h3>Operational Excellence</h3>
<p><strong>Multi-Environment Monitoring</strong>: Organizations running Atlas across development, staging, and production environments can standardize monitoring across all environments while maintaining environment-specific alerting thresholds.</p>
<p><strong>Change Management</strong>: Project and organization logs provide complete audit trails for infrastructure changes, enabling teams to correlate application issues with recent configuration modifications.</p>
<h2>Let's Try It!</h2>
<p>The MongoDB Atlas integration delivers comprehensive database observability that enables proactive management and optimization of your Atlas deployments. With pre-built dashboards and alerting capabilities, teams can gain immediate value while leveraging rich data streams for advanced analytics and custom monitoring solutions.</p>
<p>Deploy a cluster on <a href="https://www.elastic.co/cloud/">Elastic Cloud</a> or <a href="https://www.elastic.co/cloud/serverless">Elastic Serverless</a>, or download the Elastic Stack, then spin up the MongoDB Atlas integration, open the curated dashboards in Kibana, and start monitoring your service!</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-mongodb-atlas-integration/title.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Elastic Ramen: A CLI harness for SRE investigation and remediation]]></title>
            <link>https://www.elastic.co/observability-labs/blog/elastic-ramen-agent-builder-cli</link>
            <guid isPermaLink="false">elastic-ramen-agent-builder-cli</guid>
            <pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Exploring Elastic Ramen, a CLI harness that brings Agent Builder conversations, skills, and tools into the terminal so engineers can move from investigation to remediation in a single thread.]]></description>
            <content:encoded><![CDATA[<p>Observability tools tell you what went wrong.
They rarely help you fix it.
When responding to an incident, engineers split their time across Kibana, Slack, and the terminal.
At each step, the AI assistant stays behind in the previous surface, and the investigation starts over from scratch.</p>
<p><strong>Elastic Ramen</strong> (<strong>R</strong>oot-cause <strong>A</strong>nalysis &amp; <strong>M</strong>onitoring <strong>En</strong>gine) bridges that gap.
It is a local CLI agent that connects directly to <a href="https://www.elastic.co/search-labs/blog/elastic-ai-agent-builder-context-engineering-introduction">Elastic Agent Builder</a>, carrying the same conversation, skills, and Elastic context into the terminal.
Ramen operates directly in the environment where fixes actually happen. No handoff. No re-auth. No translation layer.
Ramen is open source and available at <a href="https://github.com/elastic/elastic-ramen">elastic/elastic-ramen</a>.</p>
<p><em>Video: Starting an investigation in Kibana, resuming in the terminal with Ramen, and using local tools to mitigate the issue.</em></p>
<h2>Why the terminal matters</h2>
<p>Agent Builder gives engineers a strong environment for querying observability data.
Ramen takes that same capability to the two workflows that need it most.</p>
<p><strong>Onboarding.</strong>
Configuring collectors, managing credentials, and validating data flow all happen in the shell.
A local agent can guide that work right where the credentials and tools already live.</p>
<p><strong>Mitigation.</strong>
The actual fix, whether restarting pods, scaling deployments, or rolling back releases, requires <code>kubectl</code>, <code>gcloud</code>, <code>git</code>, or internal scripts.
A CLI agent runs on hardware the team already trusts, using the credentials already present on the engineer's machine.</p>
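<p>For a concrete sense of what that looks like, here is the kind of mitigation an engineer might run locally during an incident. The namespace and deployment names are hypothetical; the point is that these commands rely on credentials and tooling that already live on the engineer&apos;s machine.</p>
<pre><code class="language-bash"># Hypothetical resource names, shown only to illustrate typical mitigation steps
kubectl -n checkout rollout restart deployment/checkout-api     # restart misbehaving pods
kubectl -n checkout scale deployment/checkout-api --replicas=6  # scale out under load
kubectl -n checkout rollout undo deployment/checkout-api        # roll back a bad release
</code></pre>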
<h2>How Ramen works</h2>
<p>Ramen is a CLI client for Agent Builder.
It is not a separate assistant with its own memory.
It connects your local environment to the same conversations, skills, and tools you already use in Kibana through a simple authentication flow.</p>
<p>On first launch, Ramen connects to your Elastic deployment and gives you everything out of the box:</p>
<ul>
<li>LLM inference through the Kibana gateway, using your existing AI connector</li>
<li>Native Kibana tools for managing workflows and agents</li>
<li>The Agent Builder MCP server for ES|QL queries and documentation search</li>
<li>An embedded <code>elastic</code> CLI for cluster health, data streams, and SLOs</li>
<li>Built-in skills for root cause analysis and SLO management</li>
</ul>
<p>The agent carries your investigation history across surfaces, so you never re-explain the incident when moving from the UI to the CLI.
Terminal interactions sync back to Elastic automatically, building a searchable record of operational knowledge for the team.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/elastic-ramen-agent-builder-cli/architecture-flow.jpg" alt="Diagram showing the Ramen CLI connecting to Agent Builder, which accesses Elastic Stack data, with conversations syncing back." /></p>
<h2>Get started</h2>
<p>You need an Elastic Observability Serverless project.
In Kibana, open <strong>Stack Management</strong>, then <strong>Advanced Settings</strong>, or go directly to <code>https://&lt;your-kibana-url&gt;/app/management/kibana/settings?query=ramen</code>.
Enable <strong><code>elasticRamen:enabled</code></strong>, then install the CLI:</p>
<pre><code class="language-bash"># install with npm
npm i -g @elastic/ramen
# or install with Bun
bun add -g @elastic/ramen
</code></pre>
<p>You can also use the install script or download a pre-built binary from <a href="https://github.com/elastic/elastic-ramen/releases">GitHub Releases</a>:</p>
<pre><code class="language-bash">curl -fsSL https://raw.githubusercontent.com/elastic/elastic-ramen/dev/install | bash
</code></pre>
<p>Once installed, connect to your deployment:</p>
<pre><code class="language-bash">elastic-ramen --kibana-base=https://&lt;your-kibana-url&gt;
</code></pre>
<p>Ramen opens a browser auth flow, generates credentials, and stores them locally.
After that, it reconnects automatically.
Start a conversation in Agent Builder and resume it in the terminal with <code>/kibana-conversations</code>.</p>
<h2>What is next</h2>
<p>Ramen is the first surface of a multi-surface agent system.
The same architecture extends to every surface engineers already use:</p>
<ul>
<li><strong>Space-scoped collaboration</strong> for shared agent context during outages</li>
<li><strong>Slack, Teams, Jira, PagerDuty</strong> integration: start from an alert, collaborate in chat, mitigate in the terminal, one thread</li>
<li><strong>Shared memory</strong>: progressively distill conversations into durable operational context that improves future investigations</li>
</ul>
<p>Beyond incident response, the same model applies to deployment risk analysis, production debugging, CI/CD policy checks, and cost anomaly investigation.</p>
<h2>Summary</h2>
<p>Ramen connects signal to action: Elastic data and Agent Builder context, plus the ability to act with local tools, in one continuous thread.
Elastic as the persistent context layer, every surface you use as the interface.</p>
<p>Try it out on <a href="https://github.com/elastic/elastic-ramen">GitHub</a> and let us know what you think.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/elastic-ramen-agent-builder-cli/cover.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Getting more from your logs with OpenTelemetry]]></title>
            <link>https://www.elastic.co/observability-labs/blog/getting-more-from-your-logs-with-opentelemetry</link>
            <guid isPermaLink="false">getting-more-from-your-logs-with-opentelemetry</guid>
            <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to evolve beyond basic log ingest by leveraging OpenTelemetry for ingestion, structured logging, geographic enrichment, and ES|QL analytics. Transform raw log data into actionable intelligence with practical examples and proactive observability strategies.]]></description>
            <content:encoded><![CDATA[<h1>Getting more from your logs with OpenTelemetry</h1>
<p>Most teams still use their logging tools the way we have for decades: as a simple search lake, essentially grepping for logs from a centralized platform. There’s nothing wrong with this; a centralized logging platform delivers plenty of value. But how do you evolve beyond this basic log-and-search use case? Where can you be more effective with your incident investigations? In this blog, we start from where most of our customers are today and offer some practical tips for moving a little beyond the simple logging use case.</p>
<h2>Ingestion</h2>
<p>Let's start at the beginning: ingest. Many teams still rely on older tools for ingestion today. If you want to be more forward-looking here, it’s time to introduce you to OpenTelemetry. OpenTelemetry’s logging support was once immature, but things have changed significantly, and Elastic has been working particularly hard to improve the log capabilities in OpenTelemetry. So let's start by exploring how to bring logs into Elastic via the OpenTelemetry collector.</p>
<p>First, if you want to follow along, simply create a host to run the log generator and the OpenTelemetry collector.</p>
<p>Follow the instructions here to get the log generator running:</p>
<p><a href="https://github.com/davidgeorgehope/log-generator-bin/">https://github.com/davidgeorgehope/log-generator-bin/</a></p>
<p>To get the OpenTelemetry collector up and running in <a href="https://cloud.elastic.co/serverless-registration?onboarding_token=observability">Elastic Serverless</a>, you can click Add Data in the bottom left, then 'host', and finally 'opentelemetry'.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image14.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image7.png" alt="" /></p>
<p>Follow the instructions but don’t start the collector just yet.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image16.png" alt="" /></p>
<p>Our host here is running a three-tier application with an NGINX frontend, a backend service, and a MySQL database. So let's start by bringing the logs into Elastic.</p>
<p>First we’ll install the Elastic Distribution of OpenTelemetry, but before starting it, we will make a small change to the OpenTelemetry configuration file to expand the directories it searches for logs. Edit otel.yml with vi or your favorite editor:</p>
<pre><code class="language-bash">vi otel.yml
</code></pre>
<p>Instead of simply <code>/var/log/*.log</code>, we will use <code>/var/log/**/*.log</code> to bring in all our log files.</p>
<pre><code class="language-yaml">receivers:
  # Receiver for platform specific log files
  filelog/platformlogs:
    include: [ /var/log/**/*.log ]
    retry_on_failure:
      enabled: true
    start_at: end
    storage: file_storage
</code></pre>
<p>Start the Otel collector:</p>
<pre><code class="language-bash">sudo ./otelcol --config otel.yml
</code></pre>
<p>And we can see these logs being brought in, in Discover:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image8.png" alt="" /></p>
<p>One thing that is immediately noticeable is that, without changing anything, we automatically get a bunch of useful additional information, such as the OS name and CPU details.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image12.png" alt="" /></p>
<p>The OpenTelemetry collector has automatically started to enrich our logs, making them more useful for additional processing, though we could do significantly better!</p>
<p>To start with, we want to give our logs some structure. Let's edit that otel.yml file and add some OTTL to extract some key data from our NGINX logs.</p>
<pre><code class="language-yaml">  transform/parse_nginx:
    trace_statements: []
    metric_statements: []
    log_statements:
      - context: log
        conditions:
          - 'attributes[&quot;log.file.name&quot;] != nil and IsMatch(attributes[&quot;log.file.name&quot;], &quot;access.log&quot;)'
        statements:
          - merge_maps(attributes, ExtractPatterns(body, &quot;^(?P&lt;client_ip&gt;\\S+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;^\\S+ - (?P&lt;user&gt;\\S+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\\[(?P&lt;timestamp_raw&gt;[^\\]]+)\\]&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot;(?P&lt;method&gt;\\S+) &quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot;\\S+ (?P&lt;path&gt;\\S+)\\?&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;req_id=(?P&lt;req_id&gt;[^ ]+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot; (?P&lt;status&gt;\\d+) &quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot; \\d+ (?P&lt;size&gt;\\d+)&quot;), &quot;upsert&quot;)
# ...

service:
  pipelines:
    logs/platformlogs:
      receivers: [filelog/platformlogs]
      processors: [transform/parse_nginx, resourcedetection]
      exporters: [elasticsearch/otel]
</code></pre>
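<p>To make the extraction concrete, here is an illustrative access log line and the attributes those ExtractPatterns statements would pull out of it. The exact format produced by the log generator may differ slightly, so treat this as a sketch:</p>
<pre><code class="language-text">203.0.113.7 - alice [11/Sep/2025:09:17:32 +0000] &quot;GET /api/cart?item=42 HTTP/1.1&quot; 500 1423 req_id=abc123

# Extracted attributes (illustrative):
#   client_ip=203.0.113.7   user=alice   timestamp_raw=11/Sep/2025:09:17:32 +0000
#   method=GET   path=/api/cart   status=500   size=1423   req_id=abc123
</code></pre>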
<p>Now when we start the Otel collector with this new configuration</p>
<pre><code class="language-bash">sudo ./otelcol --config otel.yml
</code></pre>
<p>We will see that we now have structured logs!</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image17.png" alt="" /></p>
<h2>Store and Optimize</h2>
<p>To ensure you aren’t blowing out your budget with all this additional structured data, there are a few things you can do to maximize storage efficiency.</p>
<p>For example, you can use the filter processor in the Otel collector to granularly filter and drop irrelevant records, controlling the volume of data leaving the collector.</p>
<pre><code class="language-yaml">processors:
  filter/drop_logs_without_user_attributes:
    logs:
      log_record:
        - 'attributes[&quot;user&quot;] == nil'
  filter/drop_200_logs:
    logs:
      log_record:
        - 'attributes[&quot;status&quot;] == &quot;200&quot;'

service:
  pipelines:
    logs/platformlogs:
      receivers: [filelog/platformlogs]
      processors: [transform/parse_nginx, filter/drop_logs_without_user_attributes, filter/drop_200_logs, resourcedetection]
      exporters: [elasticsearch/otel]
</code></pre>
<p>The filter processor also helps reduce noise, for example if you want to drop debug logs or logs from a noisy service. These are great ways to keep a lid on your observability spend.</p>
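<p>As a sketch of what that might look like, the following filter definitions follow the same pattern as above to drop debug-severity records and health-check noise. The severity and path values are illustrative; adjust them to match your own logs and add the processors to the pipeline just like the earlier examples:</p>
<pre><code class="language-yaml">processors:
  filter/drop_debug_logs:
    logs:
      log_record:
        - 'severity_text == &quot;DEBUG&quot;'
  filter/drop_health_checks:
    logs:
      log_record:
        - 'attributes[&quot;path&quot;] == &quot;/health&quot;'
</code></pre>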
<p>Additionally, for your most critical flows and logs where you don’t want to drop any data, Elastic has you covered. In version 9.x of Elastic, you now have LogsDB switched on by default.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image15.png" alt="" /></p>
<p>With LogsDB, Elastic has reduced the storage footprint of log data in Elasticsearch by up to 65%, allowing you to store more observability and security data without exceeding your budget, while keeping all data accessible and searchable.</p>
<p>LogsDB achieves this by leveraging advanced compression techniques like ZSTD, delta encoding, and run-length encoding, and by reconstructing the _source field on demand rather than retaining the original JSON document, which saves roughly 40% more storage. This synthetic _source approach builds on Elasticsearch’s columnar storage of field values.</p>
<h2>Analytics</h2>
<p>So we have our data in Elastic: it’s structured, and it conforms to the idea of a wide-event log, since it carries lots of good context (user IDs, request IDs) and is captured at the start of a request. Next we’re going to look at the analytics part of this. First, let's take a stab at counting the number of errors for each user in our application.</p>
<pre><code class="language-esql">FROM logs-generic.otel-default
| WHERE log.file.name == &quot;access.log&quot;
| WHERE attributes.status &gt;= &quot;400&quot;
| STATS error_count = COUNT(*) BY attributes.user
| SORT error_count DESC
</code></pre>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image9.png" alt="" /></p>
<p>It’s pretty easy now to save this and put it on a dashboard; we just click the save button:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image1.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image5.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image6.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image3.png" alt="" /></p>
<p>Next, let's put something together to show the global impact. First, we will update our collector config to enrich our log data with geo location.</p>
<p>Update the OTTL configuration with this new line:</p>
<pre><code class="language-yaml">   log_statements:
      - context: log
        conditions:
          - 'attributes[&quot;log.file.name&quot;] != nil and IsMatch(attributes[&quot;log.file.name&quot;], &quot;access.log&quot;)'
        statements:
          - merge_maps(attributes, ExtractPatterns(body, &quot;^(?P&lt;client_ip&gt;\\S+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;^\\S+ - (?P&lt;user&gt;\\S+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\\[(?P&lt;timestamp_raw&gt;[^\\]]+)\\]&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot;(?P&lt;method&gt;\\S+) &quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot;\\S+ (?P&lt;path&gt;\\S+)\\?&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;req_id=(?P&lt;req_id&gt;[^ ]+)&quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot; (?P&lt;status&gt;\\d+) &quot;), &quot;upsert&quot;)
          - merge_maps(attributes, ExtractPatterns(body, &quot;\&quot; \\d+ (?P&lt;size&gt;\\d+)&quot;), &quot;upsert&quot;)
          - set(attributes[&quot;source.address&quot;], attributes[&quot;client_ip&quot;]) where attributes[&quot;client_ip&quot;] != nil
</code></pre>
<p>Next add a new processor (you will need to download the GeoIP database from MaxMind)</p>
<pre><code class="language-yaml">geoip:
  context: record
  source:
    from: attributes
  providers:
    maxmind:
      database_path: /opt/geoip/GeoLite2-City.mmdb
</code></pre>
<p>And add this to the log pipeline after the parse_nginx</p>
<pre><code class="language-yaml">service:
  pipelines:
    logs/platformlogs:
      receivers: [filelog/platformlogs]
      processors: [transform/parse_nginx, geoip, resourcedetection]
      exporters: [elasticsearch/otel]
</code></pre>
<p>Restart the Otel collector:</p>
<pre><code class="language-bash">sudo ./otelcol --config otel.yml
</code></pre>
<p>Once the data starts flowing we can add a map visualization:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image2.png" alt="" /></p>
<p>Add a layer:</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image4.png" alt="" /></p>
<p>Use ES|QL</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image10.png" alt="" /></p>
<p>Use the following ES|QL</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image13.png" alt="" /></p>
<p>And this should give you a map showing the locations of all your NGINX server requests!</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/image11.png" alt="" /></p>
<p>As you can see, analytics is a breeze with your new Otel data collection pipeline.</p>
<h2>Conclusion: Beyond log aggregation to operational intelligence</h2>
<p>The journey from basic log aggregation to structured, enriched observability represents more than a technical upgrade; it's a shift in how organizations approach system understanding and incident response. By adopting OpenTelemetry for ingestion, implementing intelligent filtering to manage costs, and leveraging LogsDB's storage optimizations, you're not just modernizing your ELK stack; you're building the foundation for proactive system management.</p>
<p>The structured logs, geographic enrichment, and analytical capabilities demonstrated here transform raw log data into actionable intelligence with ES|QL. Instead of reactive grepping through logs during incidents, you now have the infrastructure to identify patterns, track user journeys, and correlate issues across your entire stack before they become critical problems.</p>
<p>But here's the key question: are you prepared to act on these insights? Having rich, structured data is only valuable if your organization can shift from a reactive &quot;find and fix&quot; mentality to a proactive &quot;predict and prevent&quot; approach. The real evolution isn't in your logging stack; it's in your operational culture.</p>
<p>Get started with this today in <a href="https://cloud.elastic.co/serverless-registration?onboarding_token=observability">Elastic Serverless</a>.</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/getting-more-from-your-logs-with-opentelemetry/getting-more-from-your-logs-with-opentelemetry.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Migrating Datadog and Grafana dashboards and alerts to Kibana with the Observability Migration Platform]]></title>
            <link>https://www.elastic.co/observability-labs/blog/migrate-datadog-grafana-dashboards-alerts-to-kibana</link>
            <guid isPermaLink="false">migrate-datadog-grafana-dashboards-alerts-to-kibana</guid>
            <pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to migrate supported Datadog and Grafana dashboards and alerts to Kibana with the Observability Migration Platform.]]></description>
            <content:encoded><![CDATA[<p>The Observability Migration Platform is a CLI-driven workflow that translates supported Grafana and Datadog assets into Kibana-native outputs and produces the evidence needed to review the result. It changes migration from a manual rebuild into a translation-and-verification workflow that gets teams into <a href="https://www.elastic.co/docs/solutions/observability">Elastic Observability</a> faster.</p>
<h2>Migrations covered by the Observability Migration Platform</h2>
<p>The current scope covers Datadog and Grafana. The platform can work from exported assets or live APIs, and it focuses on dashboards and alerting content on the paths it currently covers.</p>
<p>Support is not identical across the two sources. Datadog has end-to-end extraction, validation, compilation, upload, smoke-test, and verification workflows, but it currently covers a narrower slice of widgets and monitors. Grafana coverage is broader. In both cases, the platform provides a practical translation pipeline for the supported paths.</p>
<p>The screenshots below show examples of dashboards after migration.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/migrated-dashboard-1.jpg" alt="Migrated Node Exporter Full dashboard in Kibana, top of page showing CPU, memory, network, and disk panels" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/migrated-dashboard-2.jpg" alt="Migrated Node Exporter Full dashboard in Kibana, scrolled to the Memory Meminfo section showing detailed memory panels" /></p>
<h2>How the Observability Migration Platform works</h2>
<p>At a high level, the workflow has two halves: source-aware translation on the way in and target-aware validation and delivery on the way out. That split matters because Grafana and Datadog differ not only in JSON shape, but also in query languages, panel types, controls, and alerting models.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/overview.png" alt="End-to-end flow of the Observability Migration Platform: extract from Grafana or Datadog, normalize and plan, translate queries, panels, and alerts, emit Kibana-native output, validate against an Elastic target, then compile and upload to Kibana while producing verification and review artifacts" /></p>
<p>A run starts with exported assets or live source APIs. From there, the workflow normalizes source-specific objects, chooses a translation path for each supported dashboard, panel, and alerting artifact, and emits Kibana-native output. This is where most of the source-specific logic lives: translating queries or Datadog formulas, mapping panel semantics, carrying forward controls and links where possible, and deciding when an exact translation is not the right answer.</p>
<p>The second half is target-aware. The emitted output can be validated against an Elastic target, compiled, and uploaded to Kibana through the shared runtime. In the happy path, that yields a working translated dashboard. In rougher cases, validation may show that a panel cannot run safely as emitted. When that happens, the workflow is designed to fail conservatively: it can mark the panel for manual review or replace it with an upload-safe placeholder instead of shipping a broken runtime panel.</p>
<p>Just as important, the outcome is not simply &quot;a dashboard showed up in Kibana.&quot; The workflow also produces reviewer-facing evidence such as a migration report, manifest, verification packets, and rollout plan, so you can see what translated cleanly, what was downgraded or routed to manual review, and what still needs human judgment. Those artifacts are what make the process operationally credible: they give teams something concrete to inspect, compare, and act on.</p>
<h2>Running the migration</h2>
<p>The platform is CLI-driven, and a good fit for migration work that needs to be repeatable, reviewable, and easy to automate. Users can start with a representative slice of dashboards and alerting content from Grafana or Datadog, point the workflow at an Elastic target, and use that first run to understand translation quality, validation results, and how much follow-up review is required.</p>
<p>To run the full path against Elastic, create an <a href="https://www.elastic.co/docs/solutions/observability/get-started">Elastic Observability Serverless</a> project, generate a <a href="https://www.elastic.co/docs/deploy-manage/api-keys/serverless-project-api-keys">Serverless project API key</a>, and point the CLI at your Elasticsearch and Kibana endpoints:</p>
<pre><code class="language-shell">obs-migrate migrate \
  --source grafana \
  --input-mode files \
  --input-dir ./grafana_exports \
  --output-dir ./migration_output \
  --assets all \
  --native-promql \
  --data-view &quot;metrics-*&quot; \
  --validate \
  --es-url &quot;$ELASTICSEARCH_ENDPOINT&quot; \
  --es-api-key &quot;$KEY&quot; \
  --kibana-url &quot;$KIBANA_ENDPOINT&quot; \
  --kibana-api-key &quot;$KEY&quot; \
  --upload
</code></pre>
<p>The run validates the emitted queries against Elastic, compiles the generated dashboards, uploads them to Kibana, and produces the standard migration artifacts for review.</p>
<p>A typical run looks like this:</p>
<ol>
<li>Start with exported assets or live source APIs from Grafana or Datadog.</li>
<li>Choose the asset scope with <code>--assets dashboards</code>, <code>--assets alerts</code>, or <code>--assets all</code>.</li>
<li>Translate the supported dashboards, queries, controls, and alerting artifacts into Kibana-native output.</li>
<li>Validate the emitted content against an Elastic target (if configured), then compile and upload the translated dashboards for dashboard-capable runs.</li>
<li>Review the migration evidence, including <code>migration_report.json</code>, <code>verification_packets.json</code>, <code>run_summary.json</code>, etc., to understand what translated cleanly, where semantic gaps remain, and which dashboards, panels, or alert rules still require human review.</li>
<li>If alert rule creation is enabled, review the migrated rules (which are disabled by default) in Kibana before deciding which ones to enable or redesign.</li>
</ol>
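<p>For example, a narrower first pass might translate and validate dashboards only, skipping the Kibana upload until the output has been reviewed. This is a sketch that simply reuses the flags shown above; exact flag combinations may vary by release:</p>
<pre><code class="language-shell">obs-migrate migrate \
  --source grafana \
  --input-mode files \
  --input-dir ./grafana_exports \
  --output-dir ./dry_run_output \
  --assets dashboards \
  --validate \
  --es-url &quot;$ELASTICSEARCH_ENDPOINT&quot; \
  --es-api-key &quot;$KEY&quot;
</code></pre>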
<h2>What's next</h2>
<p>The platform is still evolving, and will continue to gain depth and self-service capabilities. The biggest open areas are stronger measured source-to-target semantic verification, further coverage for Datadog, deeper coverage for harder query families and non-dashboard surfaces, and cleaner shared runtime contracts across the workflow.</p>
<p>It is also built to grow over time. The source and target boundaries are explicit by design, which gives the platform room to expand coverage and support additional source paths in the future.</p>
<h2>In conclusion</h2>
<p>If you are planning a move into Elastic, a good starting point is to create an <a href="https://www.elastic.co/docs/solutions/observability/get-started">Elastic Observability Serverless</a> project. That gives you the target environment where translated dashboards and alerting content can be validated and reviewed.</p>
<p>To learn more about the migration workflow, talk to your Elastic representative about current access, supported coverage, and how it can help with your migration needs.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/migrate-datadog-grafana-dashboards-alerts-to-kibana/header.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry for PHP: EDOT PHP joins the OpenTelemetry project]]></title>
            <link>https://www.elastic.co/observability-labs/blog/opentelemetry-accepts-elastics-donation-of-edot</link>
            <guid isPermaLink="false">opentelemetry-accepts-elastics-donation-of-edot</guid>
            <pubDate>Mon, 10 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Explore Elastic’s donation of its EDOT PHP to the OpenTelemetry community and discover how it makes OpenTelemetry for PHP simpler and more accessible.]]></description>
            <content:encoded><![CDATA[<p>The OpenTelemetry community has officially accepted Elastic's proposal to contribute the <strong>Elastic Distribution of OpenTelemetry for PHP (EDOT PHP)</strong> — marking an important milestone in bringing first-class observability to one of the web's most widely used languages.</p>
<p>For decades, PHP has powered everything from small business websites to large-scale SaaS platforms. Yet observability in PHP has often required manual setup, compiling custom extensions, or changes to application code — challenges that limited adoption in production environments.
This upcoming donation aims to change that, by making OpenTelemetry for PHP <strong>as easy to deploy as any other runtime</strong>.</p>
<h2>What's coming</h2>
<p>Once the contribution process is complete, EDOT PHP will become part of the OpenTelemetry project — providing a <strong>complete, production-ready distribution</strong> that's optimized for performance, simplicity, and scalability.</p>
<p>EDOT PHP introduces a new approach to PHP observability:</p>
<ul>
<li><strong>Simple installation</strong> - installing OpenTelemetry for PHP will be as straightforward as installing a standard system package. From that point, the agent automatically detects and instruments PHP applications — no code changes, no manual setup.</li>
<li><strong>Automatic agent loading</strong> - works transparently in cloud and container environments without modifying application deployments.</li>
<li><strong>Zero configuration</strong> - ships as a single, self-contained binary; no need to install or compile any external extensions.</li>
<li><strong>Native C++ performance</strong> - a built-in serializer written in C++ reduces telemetry overhead by up to <strong>5×</strong>.</li>
<li><strong>Automatic instrumentation</strong> - instruments popular frameworks and libraries out of the box.</li>
<li><strong>Inferred spans</strong> - reveals the behavior of even uninstrumented code paths, providing full trace coverage.</li>
<li><strong>Automatic root spans</strong> - ensures complete traces, even in legacy or partially instrumented applications.</li>
<li><strong>OpAMP readiness</strong> - while the OpenTelemetry community continues to standardize configuration schemas and management workflows, the implementation in EDOT PHP is fully prepared to support these upcoming specifications — ensuring seamless adoption once the OpAMP ecosystem matures.</li>
<li><strong>Asynchronous backend communication</strong> - telemetry data is exported to the OpenTelemetry Collector or backend <strong>asynchronously</strong>, without blocking the instrumented application.
This ensures that span and metric exports do not add latency to user requests or impact response times, even under heavy load.</li>
</ul>
<p>Together, these features make EDOT PHP the first truly <strong>zero-effort observability solution for PHP</strong> — from local testing to cloud-scale production systems.</p>
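<p>As a rough sketch of the &quot;simple installation&quot; experience on a Debian-based host, installation boils down to grabbing the EDOT PHP package and installing it with the system package manager. The file name below is a placeholder; check the EDOT PHP release artifacts and documentation for the current package names:</p>
<pre><code class="language-bash"># Placeholder package file name -- use the artifact from the EDOT PHP releases page
sudo dpkg -i elastic-otel-php_&lt;version&gt;_amd64.deb
# Restart PHP-FPM (or your web server) so the agent is loaded into new workers
sudo systemctl restart php-fpm
</code></pre>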
<p><img src="https://www.elastic.co/observability-labs/assets/images/opentelemetry-accepts-elastics-donation-of-edot/performance.png" alt="Performance comparision" /></p>
<blockquote>
<p>The native C++ serializer and asynchronous export pipeline in EDOT PHP reduce average request time from <strong>49 ms</strong> to <strong>23 ms</strong>, more than <strong>2× faster</strong> than the pure PHP implementation.</p>
</blockquote>
<h2>Building on the existing foundation</h2>
<p>EDOT PHP doesn't replace the existing OpenTelemetry PHP SDK — it <strong>extends and strengthens it</strong>.
It packages the SDK, automatic instrumentation, and native extension into a single, unified agent package that works seamlessly with existing OpenTelemetry specifications and APIs.</p>
<p>By contributing this work, Elastic helps the OpenTelemetry community accelerate PHP adoption, align implementations across languages, and make distributed tracing truly universal.</p>
<blockquote>
<p>“This isn't a hand-off — it's a collaboration.
We're contributing years of development to help OpenTelemetry for PHP evolve faster, run more efficiently, and reach more users in every environment.”</p>
<ul>
<li><em>Elastic Observability team</em></li>
</ul>
</blockquote>
<h2>Ongoing improvements</h2>
<p>Elastic continues to invest in advancing EDOT PHP ahead of its integration into OpenTelemetry.
The team is currently focused on <strong>reducing resource usage and memory footprint</strong>, particularly in <strong>multi-worker server environments</strong> such as PHP-FPM or Apache prefork.
These optimizations aim to make the agent more predictable and efficient under heavy load — ensuring that telemetry remains lightweight even in large-scale production deployments.</p>
<p>Beyond that, we're exploring further improvements that can enhance both performance and interoperability.
Areas under investigation include smarter coordination in high-concurrency scenarios, better sharing of telemetry resources across workers, and future alignment with additional OpenTelemetry signals such as metrics and logs.</p>
<p>Together, these efforts will help make EDOT PHP not only faster, but also more adaptable and seamlessly integrated into diverse runtime architectures.</p>
<h2>Why it matters</h2>
<p>This contribution is about more than performance — it's about <strong>removing barriers</strong>.
By making OpenTelemetry for PHP installable as a simple system package and automatically loaded into running applications, the project opens observability to every PHP developer, operator, and platform provider.</p>
<p>For the OpenTelemetry ecosystem, it fills one of the last major language gaps, extending visibility to a vast portion of the internet — all under open governance and community collaboration.</p>
<h2>Looking ahead</h2>
<p>In the months ahead, Elastic and the OpenTelemetry PHP SIG will work closely on the technical integration, documentation, and community onboarding process.
Once the transition is complete, developers will gain a fully open, community-driven, and production-ready OpenTelemetry agent that “just works” — without friction, configuration, or code changes.</p>
<p>Together, we're building a future where <strong>observability just works — for every language, every framework, and every environment</strong>.</p>
<p>For more information:</p>
<p><a href="https://www.elastic.co/docs/reference/opentelemetry">EDOT documentation</a>&lt;br /&gt;
<a href="https://www.elastic.co/observability-labs/blog/elastic-managed-otlp-endpoint-for-opentelemetry">Learn about</a> OTLP Endpoint</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/opentelemetry-accepts-elastics-donation-of-edot/otel-php.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Supercharge Your vSphere Monitoring with Enhanced vSphere Integration]]></title>
            <link>https://www.elastic.co/observability-labs/blog/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration</link>
            <guid isPermaLink="false">supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration</guid>
            <pubDate>Wed, 11 Dec 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Supercharge Your vSphere Monitoring with Enhanced vSphere Integration]]></description>
            <content:encoded><![CDATA[<p><a href="https://www.vmware.com/products/cloud-infrastructure/vsphere">vSphere</a> is VMware's cloud computing virtualization platform that provides a powerful suite for managing virtualized resources. It allows organizations to create, manage, and optimize virtual environments, providing advanced capabilities such as high availability, load balancing, and simplified resource allocation. vSphere enables efficient utilization of hardware resources, reducing costs while increasing the flexibility and scalability of IT infrastructure.</p>
<p>With the release of an upgraded <a href="https://www.elastic.co/docs/current/integrations/vsphere">vSphere integration</a> we now support an enhanced set of metrics and datastreams. Package version 1.15.0 onwards introduces new datastreams that significantly improve the collection of performance metrics, providing deeper insights into your vSphere environment.</p>
<p>We have expanded the performance metrics to encompass a broader range of insights across all datastreams, while also introducing new datastreams for clusters, resource pools, and networks. This enhanced integration now includes a total of seven datastreams, featuring critical new metrics such as disk performance, memory utilization, and network status, along with detailed visibility into associated resources like hosts, clusters, and resource pools.</p>
<p>Each datastream also includes detailed alarm information, such as the alarm name, description, status (e.g., critical or warning), and the affected entity’s name. To make the most of these insights, we’ve also introduced prebuilt dashboards, helping teams monitor and troubleshoot their vSphere environments with ease and precision.</p>
<h2>Overview of the Datastreams</h2>
<ul>
<li><strong>Host Datastream:</strong> This datastream monitors the disk performance of the host, including metrics such as disk latency, average read/write bytes, uptime, and status. It also captures network metrics, such as packet information, network bandwidth, and utilization, as well as CPU and memory usage of the host. Additionally, it lists associated datastores, virtual machines, and networks within vSphere.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/hosts.png" alt="Host Datastream" /></p>
<ul>
<li><strong>Virtual Machine Datastream:</strong> This datastream tracks the used and available CPU and memory resources of virtual machines, along with the uptime and status of each VM. It includes information about the host on which the VM is running, as well as detailed snapshot metrics like the number of snapshots, creation dates, and descriptions. Additionally, it provides insights into associated hosts and datastores.</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/virtualmachine.png" alt="Virtual Machine Datastream" /></p>
<ul>
<li>
<p><strong>Datastore Datastream:</strong> This datastream provides information on the total, used, and available capacity of datastores, along with their overall status. It also captures metrics such as the average read/write rate and lists the hosts and virtual machines connected to each datastore.</p>
</li>
<li>
<p><strong>Datastore Cluster:</strong> A datastore cluster in vSphere is a collection of datastores grouped together for efficient storage management. This datastream provides details on the total capacity and free space in the storage pod, along with the list of datastores within the cluster.</p>
</li>
</ul>
<p><img src="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/datastore.png" alt="Datastore Datastream" /></p>
<ul>
<li>
<p><strong>Resource Pool:</strong> Resource pools in vSphere serve as logical abstractions that allow flexible allocation of CPU and memory resources. This datastream captures memory metrics, including swapped, ballooned, and shared memory, as well as CPU metrics like distributed and static CPU entitlement. It also lists the virtual machines associated with each resource pool.</p>
</li>
<li>
<p><strong>Network Datastream:</strong> This datastream captures the overall configuration and status of the network, including network types (e.g., vSS, vDS). It also lists the hosts and virtual machines connected to each network.</p>
</li>
<li>
<p><strong>Cluster Datastream:</strong> A Cluster in vSphere is a collection of ESXi hosts and their associated virtual machines that function as a unified resource pool. Clustering in vSphere allows administrators to manage multiple hosts and resources centrally, providing high availability, load balancing, and scalability to the virtual environment. This datastream includes metrics indicating whether HA or admission control is enabled and lists the hosts, networks, and datastores associated with the cluster.</p>
</li>
</ul>
<h2>Alarms support in vSphere Integration</h2>
<p>Alarms are a vital part of the vSphere integration, providing real-time insights into critical events across your virtual environment. In Elastic’s updated vSphere integration, alarms are now reported for all entities. They include detailed information such as the alarm name, description, severity (e.g., critical or warning), affected entity, and triggered time. These alarms are seamlessly integrated into datastreams, helping administrators and SREs quickly identify and resolve issues like resource shortages or performance bottlenecks.</p>
<h4>Example Alarm</h4>
<pre><code class="language-yaml">&quot;triggered_alarms&quot;: [
  {
    &quot;description&quot;: &quot;Default alarm to monitor host memory usage&quot;,
    &quot;entity_name&quot;: &quot;host_us&quot;,
    &quot;id&quot;: &quot;alarm-4.host-12&quot;,
    &quot;name&quot;: &quot;Host memory usage&quot;,
    &quot;status&quot;: &quot;red&quot;,
    &quot;triggered_time&quot;: &quot;2024-08-28T10:31:26.621Z&quot;
  }
]
</code></pre>
<p>This example highlights a triggered alarm for monitoring host memory usage, indicating a critical status (red) for the host &quot;host_us.&quot; Such alarms empower teams to act swiftly and maintain the stability of their vSphere environment.</p>
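<p>To act on these alarms at scale, you can query them directly. The ES|QL sketch below assumes the host data stream name and flattened alarm field names shown in the example above; verify the exact field names against the integration&apos;s exported fields before relying on it:</p>
<pre><code class="language-esql">FROM metrics-vsphere.host-default
// Field names are assumptions based on the alarm example above
| WHERE vsphere.host.triggered_alarms.status == &quot;red&quot;
| KEEP @timestamp, vsphere.host.triggered_alarms.name, vsphere.host.triggered_alarms.entity_name
| SORT @timestamp DESC
</code></pre>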
<h2>Let’s Try It Out!</h2>
<p>The new <a href="https://www.elastic.co/docs/current/integrations/vsphere">vSphere integration</a> in Elastic Cloud is more than just a monitoring tool; it’s a comprehensive solution that empowers you to manage and optimize your virtual environments effectively. With deeper insights and enhanced data granularity, you can ensure high availability, improved load balancing, and smarter resource allocation. Spin up an Elastic Cloud deployment and start monitoring your vSphere infrastructure.</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/supercharge-your-vsphere-monitoring-with-enhanced-vsphere-integration/title.jpeg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[The next evolution of observability: unifying data with OpenTelemetry and generative AI]]></title>
            <link>https://www.elastic.co/observability-labs/blog/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai</link>
            <guid isPermaLink="false">the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai</guid>
            <pubDate>Wed, 11 Jun 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Generative AI and machine learning are revolutionizing observability, but siloed data hinders their true potential. This article explores how to break down data silos by unifying logs, metrics, and traces with OpenTelemetry, unlocking the full power of GenAI for natural language investigations, automated root cause analysis, and proactive issue resolution.]]></description>
            <content:encoded><![CDATA[<p>The Observability industry today stands at a critical juncture. While our applications generate more telemetry data than ever before, this wealth of information typically exists in siloed tools, separate systems for logs, metrics, and traces. Meanwhile, Generative AI is hurtling toward us like an asteroid about to make a tremendous impact on our industry.</p>
<p>As SREs, we've grown accustomed to jumping between dashboards, log aggregators, and trace visualizers when troubleshooting issues. But what if there was a better way? What if AI could analyze all your observability data holistically, answering complex questions in natural language, and identifying root causes automatically?</p>
<p>This is the next evolution of observability. But to harness this power, we need to rethink how we collect, store, and analyze our telemetry data.</p>
<h2>The problem: siloed data limits AI effectiveness</h2>
<p>Traditional observability setups separate data into distinct types:</p>
<ul>
<li>Metrics: Numeric measurements over time (CPU, memory, request rates)</li>
<li>Logs: Detailed event records with timestamps and context</li>
<li>Traces: Request journeys through distributed systems</li>
<li>Profiles: Code-level execution patterns showing resource consumption and performance bottlenecks at the function/line level</li>
</ul>
<p>This separation made sense historically due to the way the industry evolved. Different data types have traditionally had different cardinality, structure, access patterns and volume characteristics. However, this approach creates significant challenges for AI-powered analysis:</p>
<pre><code class="language-text">Metrics (Prometheus) → &quot;CPU spiked at 09:17:00&quot;
Logs (ELK) → &quot;Exception in checkout service at 09:17:32&quot; 
Traces (Jaeger) → &quot;Slow DB queries in order-service at 09:17:28&quot;
Profiles (Pyroscope) → &quot;calculate_discount() is taking 75% of CPU time&quot;
</code></pre>
<p>When these data sources live in separate systems, AI tools must either:</p>
<ol>
<li>Work with an incomplete picture (seeing only metrics but not the related logs)</li>
<li>Rely on complex, brittle integrations that often introduce timing skew</li>
<li>Force developers to manually correlate information across tools</li>
</ol>
<p>Imagine asking an AI, &quot;Why did checkout latency spike at 09:17?&quot; To answer comprehensively, it needs access to logs (to see the stack trace), traces (to understand the service path), and metrics (to identify resource strain). With siloed tools, the AI either sees only fragments of the story or requires complex ETL jobs that are slower than the incident itself.</p>
<h2>Why traditional machine learning (ML) falls short</h2>
<p>Traditional machine learning for observability typically focuses on anomaly detection within a single data dimension. It can tell you when metrics deviate from normal patterns, but struggles to provide context or root cause.</p>
<p>ML models trained on metrics alone might flag a latency spike, but can't connect it to a recent deployment (found in logs) or identify that it only affects requests to a specific database endpoint (found in traces). They behave like humans with extreme tunnel vision: they see only a fraction of the relevant information, and only through the opinionated view that a specific vendor has chosen to give them.</p>
<p>This limitation becomes particularly problematic in modern microservice architectures where problems frequently cascade across services. Without a unified view, traditional ML can detect symptoms but struggles to identify the underlying cause.</p>
<h2>The solution: unified data with enriched logs</h2>
<p>The solution is conceptually simple but transformative: unify metrics, logs, and traces into a single data store, ideally with enriched logs that contain all signals about a request in a single JSON document. We're about to see a merging of signals.</p>
<p>Think of traditional logs as simple text lines:</p>
<pre><code class="language-text">[2025-05-19 09:17:32] ERROR OrderService - Failed to process checkout for user 12345
</code></pre>
<p>Now imagine an enriched log that contains not just the error message, but also:</p>
<ul>
<li>The complete distributed trace context</li>
<li>Related metrics at that moment</li>
<li>System environment details</li>
<li>Business context (user ID, cart value, etc.)</li>
</ul>
<p>This approach creates a holistic view where every signal about the same event sits side-by-side, perfect for AI analysis.</p>
<h2>How generative AI changes things</h2>
<p>Generative AI differs fundamentally from traditional ML in its ability to:</p>
<ol>
<li>Process unstructured data: Understanding free-form log messages and error text</li>
<li>Maintain context: Connecting related events across time and services</li>
<li>Answer natural language queries: Translating human questions into complex data analysis</li>
<li>Generate explanations: Providing reasoning alongside conclusions</li>
<li>Surface hidden patterns: Discovering correlations and anomalies in log data that would be impractical to find through manual analysis or traditional querying</li>
</ol>
<p>With access to unified observability data, GenAI can analyze complete system behavior patterns and correlate across previously disconnected signals.</p>
<p>For example, when asked &quot;Why is our checkout service slow?&quot; a GenAI model with access to unified data can:</p>
<ul>
<li>Analyze unified enriched logs to identify which specific operations are slow and to find errors or warnings in those components</li>
<li>Check attached metrics to understand resource utilization</li>
<li>Correlate all these signals with deployment events or configuration changes</li>
<li>Present a coherent explanation in natural language with supporting graphs and visualizations</li>
</ul>
<h2>Implementing unified observability with OpenTelemetry</h2>
<p>OpenTelemetry provides the perfect foundation for unified observability with its consistent schema across metrics, logs, and traces. Here's how to implement enriched logs in a Java application:</p>
<pre><code class="language-java">import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.DoubleHistogram;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class OrderProcessor {
    private static final Logger logger = LoggerFactory.getLogger(OrderProcessor.class);
    private final Tracer tracer;
    private final DoubleHistogram cpuUsageHistogram;
    private final OperatingSystemMXBean osBean;

    public OrderProcessor(OpenTelemetry openTelemetry) {
        this.tracer = openTelemetry.getTracer(&quot;order-processor&quot;);
        Meter meter = openTelemetry.getMeter(&quot;order-processor&quot;);
        this.cpuUsageHistogram = meter.histogramBuilder(&quot;system.cpu.load&quot;)
                                      .setDescription(&quot;System CPU load&quot;)
                                      .setUnit(&quot;1&quot;)
                                      .build();
        this.osBean = ManagementFactory.getOperatingSystemMXBean();
    }

    public void processOrder(String orderId, double amount, String userId) {
        Span span = tracer.spanBuilder(&quot;processOrder&quot;).startSpan();
        try (Scope scope = span.makeCurrent()) {
            // Add attributes to the span
            span.setAttribute(&quot;order.id&quot;, orderId);
            span.setAttribute(&quot;order.amount&quot;, amount);
            span.setAttribute(&quot;user.id&quot;, userId);
            // Populate MDC for structured logging
            MDC.put(&quot;trace_id&quot;, span.getSpanContext().getTraceId());
            MDC.put(&quot;span_id&quot;, span.getSpanContext().getSpanId());
            MDC.put(&quot;order_id&quot;, orderId);
            MDC.put(&quot;order_amount&quot;, String.valueOf(amount));
            MDC.put(&quot;user_id&quot;, userId);
            // Record CPU usage metric associated with the current trace context
            double cpuLoad = osBean.getSystemLoadAverage();
            if (cpuLoad &gt;= 0) {
                cpuUsageHistogram.record(cpuLoad);
                MDC.put(&quot;cpu_load&quot;, String.valueOf(cpuLoad));
            }
            // Log a structured message
            logger.info(&quot;Processing order&quot;);
            // Simulate business logic
            // ...
            span.setAttribute(&quot;order.status&quot;, &quot;completed&quot;);
            logger.info(&quot;Order processed successfully&quot;);
        } catch (Exception e) {
            span.recordException(e);
            span.setAttribute(&quot;order.status&quot;, &quot;failed&quot;);
            logger.error(&quot;Order processing failed&quot;, e);
        } finally {
            MDC.clear();
            span.end();
        }
    }
}
</code></pre>
<p>This code demonstrates how to:</p>
<ol>
<li>Create a span for the operation</li>
<li>Add business attributes</li>
<li>Add current CPU usage</li>
<li>Link everything with consistent IDs</li>
<li>Record exceptions and outcomes in the backend system</li>
</ol>
<p>When configured with an appropriate exporter, this creates enriched logs that contain both application events and their complete context.</p>
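<p>To make this concrete, here is a rough sketch of what one such enriched log document might look like once the span attributes and MDC fields from the example above are serialized by a structured log appender and exported. The field names and values below are illustrative only; the exact shape depends on your appender and exporter configuration.</p>
<pre><code class="language-yaml"># Illustrative only: one document carrying the log message, trace context,
# business attributes, and a point-in-time metric side by side.
timestamp: 2025-05-19T09:17:32.481Z
severity: ERROR
message: Order processing failed
trace_id: 4bf92f3577b34da6a3ce929d0e0e4736
span_id: 00f067aa0ba902b7
order_id: ORD-12345
order_amount: 149.99
user_id: 12345
cpu_load: 3.72
service:
  name: order-processor
</code></pre>
<p>Every signal needed to answer &quot;why did this order fail?&quot; now sits in a single document instead of three separate systems.</p>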
<h2>Powerful queries across previously separate data</h2>
<p>If your data has not yet been enriched, there is still hope. First, with GenAI-powered ingestion it is possible to extract key fields that help correlate data, such as session IDs. This enriches your logs so they get the structure they need to behave like other signals. Below we can see Elastic's Auto Import mechanism, which automatically generates ingest pipelines and pulls unstructured information from logs into a structured format that is well suited to analytics.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/image4.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/image2.png" alt="" /></p>
<p>Once you have this data in the same data store, you can perform powerful join queries that were previously impossible. For example, finding slow database queries that affected specific API endpoints:</p>
<pre><code class="language-sql">FROM logs-nginx.access-default 
| LOOKUP JOIN .ds-logs-mysql.slowlog-default-2025.05.01-000002 ON request_id 
| KEEP request_id, mysql.slowlog.query, url.query 
| WHERE mysql.slowlog.query IS NOT NULL
</code></pre>
<p>This query joins web server logs with database slow query logs, allowing you to directly correlate user-facing performance with database operations.</p>
<p>For GenAI interfaces, these complex queries can be generated automatically from natural language questions:</p>
<p>&quot;Show me all checkout failures that coincided with slow database queries&quot;</p>
<p>The AI translates this into appropriate queries across your unified data store, correlating application errors with database performance.</p>
<h2>Real-world applications and use cases</h2>
<h3>Natural language investigation</h3>
<p>Imagine asking your observability system:</p>
<p>&quot;Why did checkout latency spike at 09:17 yesterday?&quot;</p>
<p>A GenAI-powered system with unified data could respond:</p>
<p>&quot;Checkout latency increased by 230% at 09:17:32 following deployment v2.4.1 at 09:15. The root cause appears to be increased MySQL query times in the inventory-service. Specifically, queries to the 'product_availability' table are taking an average of 2300ms compared to the normal 95ms. This coincides with a CPU spike on database host db-03 and 24 'Lock wait timeout' errors in the inventory service logs.&quot;</p>
<p>Here's an example of Claude Desktop connected to <a href="https://github.com/elastic/mcp-server-elasticsearch">Elastic's MCP (Model Context Protocol) Server</a>, which demonstrates how powerful natural language investigations can be. We ask Claude to &quot;analyze my web traffic patterns&quot; and, as you can see, it correctly identifies that the traffic is from our demo environment.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/image3.png" alt="" /></p>
<h3>Unknown problem detection</h3>
<p>GenAI can identify subtle patterns by correlating signals that would be missed in siloed systems. For example, it might notice that a specific customer ID appears in error logs only when a particular network path is taken through your microservices—indicating a data corruption issue affecting only certain user flows.</p>
<h3>Predictive maintenance</h3>
<p>By analyzing the unified historical patterns leading up to previous incidents, GenAI can identify emerging problems before they cause outages:</p>
<p>&quot;Warning: Current load pattern on authentication-service combined with increasing error rates in user-profile-service matches 87% of the signature that preceded the April 3rd outage. Recommend scaling user-profile-service pods immediately.&quot;</p>
<h2>The future: agentic AI for observability</h2>
<p>The next frontier is agentic AI, systems that not only analyze but take action automatically.</p>
<p>These AI agents could:</p>
<ol>
<li>Continuously monitor all observability signals</li>
<li>Autonomously investigate anomalies</li>
<li>Implement fixes for known patterns</li>
<li>Learn from the effectiveness of previous interventions</li>
</ol>
<p>For example, an observability agent might:</p>
<ul>
<li>Detect increased error rates in a service</li>
<li>Analyze logs and traces to identify a memory leak</li>
<li>Correlate with recent code changes</li>
<li>Increase the memory limit temporarily</li>
<li>Create a detailed ticket with the root cause analysis</li>
<li>Monitor the fix effectiveness</li>
</ul>
<p>This is about creating systems that understand your application's behavior patterns deeply enough to maintain them proactively. You can see how this works in Elastic Observability in the screenshot below: at the end of the root cause analysis (RCA), we send an email summary, but this could trigger any action.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/image1.png" alt="" /></p>
<h2>Business outcomes</h2>
<p>Unifying observability data for GenAI analysis delivers concrete benefits:</p>
<ul>
<li>Faster resolution times: Problems that previously required hours of manual correlation can be diagnosed in seconds</li>
<li>Fewer escalations: Junior engineers can leverage AI to investigate complex issues before involving specialists</li>
<li>Improved system reliability: Earlier detection and resolution of emerging issues</li>
<li>Better developer experience: Less time spent context-switching between tools</li>
<li>Enhanced capacity planning: More accurate prediction of resource needs</li>
</ul>
<h2>Implementation steps</h2>
<p>Ready to start your observability transformation? Here's a practical roadmap:</p>
<ol>
<li>Adopt OpenTelemetry: Standardize on OpenTelemetry for all telemetry data collection and use it to generate enriched logs.</li>
<li>Choose a unified storage solution: Select a platform that can efficiently store and query metrics, logs, traces, and enriched logs together (a minimal collector sketch follows this list)</li>
<li>Enrich your telemetry: Update application instrumentation to include relevant context</li>
<li>Create correlation IDs: Ensure every request carries identifiers (such as trace and session IDs) that tie its logs, metrics, and traces together</li>
<li>Implement semantic conventions: Follow consistent naming patterns across your telemetry data</li>
<li>Start with focused use cases: Begin with high-value scenarios like checkout flows or critical APIs</li>
<li>Leverage GenAI tools: Integrate tools that can analyze your unified data and respond to natural language queries</li>
</ol>
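<p>As a rough illustration of steps 1 and 2, the sketch below shows an OpenTelemetry Collector configuration that receives logs, metrics, and traces over OTLP and sends all three signals to a single Elasticsearch backend using the OTel-native mapping. Treat it as a starting point rather than a complete, production-ready configuration: the endpoint and API key are placeholders, and your receivers and processors will differ.</p>
<pre><code class="language-yaml"># Minimal sketch: one collector, one backend for every signal (placeholders marked).
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  elasticsearch/otel:
    endpoint: &lt;ES_ENDPOINT&gt;   # placeholder
    api_key: &lt;ES_API_KEY&gt;     # placeholder
    mapping:
      mode: otel               # keep data in OpenTelemetry-native form

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [elasticsearch/otel]
    metrics:
      receivers: [otlp]
      exporters: [elasticsearch/otel]
    logs:
      receivers: [otlp]
      exporters: [elasticsearch/otel]
</code></pre>
<p>With every pipeline landing in the same store, the join-style queries and natural language investigations described above become possible without any cross-system stitching.</p>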
<p>Remember, AI can only be as smart as the data you feed it. The quality and completeness of your telemetry data will determine the effectiveness of your AI-powered observability.</p>
<h2>Generative AI: an evolutionary catalyst for observability</h2>
<p>The unification of observability data for GenAI analysis represents an evolutionary leap forward comparable to the transition from Internet 1.0 to 2.0. Early adopters will gain a significant competitive advantage through faster problem resolution, improved system reliability, and more efficient operations. GenAI is a huge step toward increasing observability maturity and moving your team to a more proactive stance.</p>
<p>Think of traditional observability as a doctor trying to diagnose a patient while only able to see their heart rate. Unified observability with GenAI is like giving that doctor a complete health picture: vital signs, lab results, medical history, and genetic data, all accessible through natural conversation.</p>
<p>As SREs, we stand at the threshold of a new era in system observability. The asteroid of GenAI isn't a threat to be feared; it's an opportunity to evolve our practices and tools to build more reliable, understandable systems. The question isn't whether this transformation will happen, but who will lead it.</p>
<p>Will you?</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/the-next-evolution-of-observability-unifying-data-with-opentelemetry-and-generative-ai/title.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[From Uptime to Synthetics in Elastic: Your Migration Playbook]]></title>
            <link>https://www.elastic.co/observability-labs/blog/uptime-to-synthetics-guide</link>
            <guid isPermaLink="false">uptime-to-synthetics-guide</guid>
            <pubDate>Thu, 11 Sep 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Effortlessly migrate your existing Uptime TCP, ICMP, and HTTP monitors to Elastic Synthetics with this comprehensive guide, leveraging Private Locations and Synthetics Projects for efficient, future-proof monitoring.]]></description>
            <content:encoded><![CDATA[<p>Have you seen the warning that Uptime is deprecated and want to know how to easily migrate to Synthetics? Then you are in the right place.
Starting with version 8.15.0, uptime checks have been deprecated in favor of synthetic monitoring.</p>
<p>Many users may have a large number of TCP, ICMP, and HTTP monitors and need to migrate them to Synthetics. In this guide, we will explain how to perform this migration easily while ensuring that the result is future-proof and leaves room for more advanced checks such as <a href="https://www.elastic.co/docs/solutions/observability/synthetics/#monitoring-synthetics">Browser monitors</a>.</p>
<p>First, we must consider the number of monitors to migrate; if the number is small, the easiest way would be to do it manually through the <a href="https://www.elastic.co/docs/solutions/observability/synthetics/create-monitors-ui">Synthetics UI</a>. However, in this guide we will assume that we have dozens or hundreds of monitors to migrate, and doing it manually in the Synthetics UI is not an option.</p>
<h1>Private Location</h1>
<p>Traditionally, uptime monitors required a <a href="https://www.elastic.co/docs/reference/beats/heartbeat/">Heartbeat</a> to be deployed in your infrastructure, which indirectly allowed you to monitor endpoints or hosts on your private network. If this is still a requirement, you will need to either configure <a href="https://www.elastic.co/docs/solutions/observability/synthetics/monitor-resources-on-private-networks">Private Location</a> or allow Elastic’s global managed infrastructure to <a href="https://www.elastic.co/docs/solutions/observability/synthetics/monitor-resources-on-private-networks#monitor-via-access-control">access your private endpoints</a> (only on <a href="https://www.elastic.co/docs/deploy-manage/deploy/elastic-cloud/cloud-hosted">ECH</a> &amp; <a href="https://www.elastic.co/docs/deploy-manage/deploy/elastic-cloud/serverless">Serverless</a>).</p>
<p>In this guide, we will use Private Locations, which will allow you to monitor both internal and external resources. More details can be found here: <a href="https://www.elastic.co/docs/solutions/observability/synthetics/monitor-resources-on-private-networks#monitor-via-private-agent">Monitor resources on private networks</a></p>
<h2>Step 1: Set up Fleet Server and Elastic Agent</h2>
<p>Private Locations are simply Elastic Agents enrolled in Fleet and managed through an agent policy. </p>
<p>If you don't have a Fleet Server yet, start <a href="https://www.elastic.co/docs/reference/fleet/fleet-server">setting up a Fleet Server</a>. This step is not necessary if you use ECH, as it comes by default.</p>
<p>Next, you will need to create an Agent Policy. Go to <strong>Observability → Monitors (Synthetics) → Settings (top right) → Private Location → + Create Location</strong></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/uptime-to-synthetics-guide/create-private-location.png" alt="Create Private Location" /></p>
<p>Fill in the fields and create a new policy for this Private Location. It is important to know that a Private Location should be set up against an agent policy that runs on a <strong>single</strong> Elastic Agent.</p>
<h2>Step 2: Deploy the Elastic Agent</h2>
<p>Now we need to deploy the Elastic Agent that will be responsible for running all the monitors. We can use the same host we were using for Heartbeat. There is only one requirement: we must be able to run Docker containers, since to take advantage of all the features of Synthetics, we must use the <code>elastic-agent-complete</code> Docker Image.</p>
<ol>
<li>
<p>Go to <strong>Fleet → Enrollment tokens</strong> and note the enrollment token relevant to the policy you just created for the Private Location. Now go to <strong>Settings</strong> and note the default Fleet server host URL.</p>
</li>
<li>
<p>On the host, run the following commands. For more information on running Elastic Agent with Docker, refer to Run Elastic Agent in a container.</p>
</li>
</ol>
<pre><code class="language-sh">docker run \
  --env FLEET_ENROLL=1 \
  --env FLEET_URL={fleet_server_host_url} \
  --env FLEET_ENROLLMENT_TOKEN={enrollment_token} \
  --cap-add=NET_RAW \
  --cap-add=SETUID \
  --rm docker.elastic.co/elastic-agent/elastic-agent-complete:9.3.2
</code></pre>
<h1>Synthetics Project</h1>
<p>At this point, we already have the location from which our Synthetic monitors will run. Now we need to load our Uptime monitors as Synthetics.</p>
<p>As we mentioned earlier, there are two ways to do this: either manually through the Synthetics UI or through a Synthetics Project.
In our case, since we have so many monitors to migrate and don't want to do it manually, we will use <a href="https://www.elastic.co/docs/solutions/observability/synthetics/create-monitors-with-projects">Synthetics Projects</a>. </p>
<p>The great thing about a Synthetics project is that it offers some backward compatibility with the monitor definitions in <code>heartbeat.yml</code>, and we will be leveraging that.</p>
<h2>What's a Synthetics project?</h2>
<p>A Synthetics project is the most powerful and flexible way to manage synthetic monitors in Elastic, based on the Infrastructure as Code principle and compatible with GitOps flows. Instead of configuring monitors from the interface, you define them as code: .yml files for lightweight monitors and JavaScript or TypeScript scripts for browser-type monitors (journeys).</p>
<p>This approach allows you to structure your monitors in a repository, version them with Git, validate them, and deploy them automatically using CI/CD flows, providing traceability, reviews, and consistent deployments.</p>
<h2>Step 3: Initialize your Synthetics project</h2>
<p>You will no longer need to connect to the hosts where you deployed the Elastic Agent, as the remaining steps can be performed locally as long as you have connectivity to Kibana!</p>
<p>Since Synthetics projects are based on Node.js, make sure you have it <a href="https://nodejs.org/en/download">installed</a>.</p>
<ol>
<li>Install the package:</li>
</ol>
<pre><code class="language-sh">npm install -g @elastic/synthetics
</code></pre>
<ol start="2">
<li>Confirm your system is set up correctly:</li>
</ol>
<pre><code class="language-sh">npx @elastic/synthetics -h
</code></pre>
<ol start="3">
<li>Start by creating your first Synthetics project. Run the command below to create a new Synthetics project named <code>synthetic-project-test</code> in the current directory.</li>
</ol>
<pre><code class="language-sh">npx @elastic/synthetics init synthetic-project-test
</code></pre>
<ol start="4">
<li>
<p>Follow the prompt instructions to configure the default variables for your Synthetics project. Make sure to at least <strong>select your Private Location.</strong> Once that’s done, set the <code>SYNTHETICS_API_KEY</code> environment variable in your terminal, which allows the project to authenticate with Kibana.</p>
<ol>
<li>
<p>To generate an API key, go to Synthetics in Kibana.</p>
</li>
<li>
<p>Click <strong>Settings</strong>.</p>
</li>
<li>
<p>Switch to the <strong>Project API Keys</strong> tab.</p>
</li>
<li>
<p>Click <strong>Generate Project API key</strong>.</p>
</li>
</ol>
</li>
</ol>
<p><img src="https://www.elastic.co/observability-labs/assets/images/uptime-to-synthetics-guide/generate-api-key.png" alt="Generate API Key" /></p>
<p>More details for all the steps can be found here: <a href="https://www.elastic.co/docs/solutions/observability/synthetics/create-monitors-with-projects#synthetics-get-started-project-create-a-synthetics-project">Create monitors with a Synthetics project</a></p>
<h2>Step 4: Add your <code>heartbeat.yml</code> files</h2>
<p>Once the project is initialized, access the folder it has created and take a look at the project structure:</p>
<ul>
<li>
<p><code>journeys</code> is where you’ll add .ts and .js files defining your browser monitors. It currently contains files defining sample monitors.</p>
</li>
<li>
<p><code>lightweight</code> is where we’ll add our heartbeat.yml files defining our lightweight monitors. It currently contains a file defining sample monitors.</p>
</li>
</ul>
<p>Therefore, all we have to do is copy our <code>heartbeat.yml</code> files into this <code>lightweight</code> folder. Before copying <code>heartbeat.yml</code>, keep in mind that we don't need all of its content; we are only interested in the <code>heartbeat.monitors</code> section.<br />
We recommend splitting the file into logical groups. Instead of maintaining a single large YAML file, you could create multiple smaller YAML files, with each file representing either a single check or a group of related checks. This approach simplifies management and improves compatibility with GitOps workflows.<br />
Each YAML file should look like this:</p>
<pre><code>carles@synthetics-migration:synthetic-project-test/lightweight# cat heartbeat.yml

heartbeat.monitors:
- type: icmp
  schedule: '@every 10s'
  hosts: [&quot;localhost&quot;]
  id: my-icmp-service-synth
  name: My ICMP Service - Synthetic
- type: tcp
  schedule: '@every 10s'
  hosts: [&quot;myremotehost:8123&quot;]
  mode: any
  id: my-tcp-service-synth
  name: My TCP Service Synthetic
- type: http
  schedule: '@every 10s'
  urls: [&quot;http://elastic.co&quot;]
  id: my-http-service-synth
  name: My HTTP Service Synthetic
</code></pre>
<p>What we just did is define different ICMP, TCP, and HTTP checks as code.</p>
<p>Now we need to ask the Synthetics project to create the monitors in Kibana based on what we have defined in our YAML files:</p>
<pre><code class="language-sh">npx @elastic/synthetics push --auth $SYNTHETICS_API_KEY --url &lt;kibana-url&gt;
</code></pre>
<p>Unfortunately, we do not support a 1-to-1 mapping of the Heartbeat schema to the lightweight schema, so you may encounter some errors when running this command. One example is the definition of <code>schedule</code>: Heartbeat supports crontab expressions, but Synthetics projects require the <code>@every</code> syntax, as illustrated below.</p>
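<p>For example, a monitor that Heartbeat scheduled with a crontab expression has to have its schedule rewritten before the push will succeed. A rough before-and-after sketch (the monitor id, name, and intervals are illustrative):</p>
<pre><code class="language-yaml"># Heartbeat definition with a crontab expression (rejected by the Synthetics project push):
# - type: http
#   schedule: '*/5 * * * *'
#   urls: [&quot;http://elastic.co&quot;]

# Equivalent Synthetics project definition using the @every syntax:
- type: http
  schedule: '@every 5m'
  urls: [&quot;http://elastic.co&quot;]
  id: my-cron-http-check
  name: My Cron HTTP Check
</code></pre>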
<p>If no syntax errors were found, the command output will show that the monitors have been successfully created in Kibana!</p>
<p>Then, go to <strong>Synthetics</strong> in Kibana. You should see your newly pushed monitors running. You can also go to the Management tab to see the monitors' configuration settings.</p>
]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/uptime-to-synthetics-guide/blog-header.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Visualizing OpenTelemetry Data in Elastic with OpenTelemetry Content Packages]]></title>
            <link>https://www.elastic.co/observability-labs/blog/visualizing-opentelemetry-data-elastic-content-packages</link>
            <guid isPermaLink="false">visualizing-opentelemetry-data-elastic-content-packages</guid>
            <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn and explore how OpenTelemetry Content Packages in Elastic provide instant dashboards, alerts, and SLOs for your telemetry data.]]></description>
            <content:encoded><![CDATA[<p>If you've been in the observability space for the last couple of years, you've seen OpenTelemetry go from &quot;promising standard&quot; to the default choice for collecting metrics, logs, and traces. Elastic has been in that journey from early on — which is why we built the <a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">Elastic Distributions of OpenTelemetry (EDOT)</a>: a hardened, production-ready suite of OTel components including the EDOT Collector and language SDKs, tuned for infrastructure and application monitoring without the typical setup overhead.</p>
<p>EDOT is now generally available. The collector, the SDKs, the whole stack — production-ready, enterprise-supported, no asterisks.</p>
<p>But here's the thing: getting your data into Elastic is only half the job. The harder half, in practice, is what happens after. Someone still has to build the dashboards, write the alert rules, and figure out which SLOs are worth tracking — before any of it is useful.</p>
<p>That gap is what OpenTelemetry Content Packages are designed to close.</p>
<hr />
<h2>What Are OpenTelemetry Content Packages?</h2>
<p>Elastic's traditional Beats-based integrations always bundled data collection and visualizations together — you got curated dashboards and alerts the moment you turned something on. As Elastic moves to an OpenTelemetry-first world, that same philosophy carries over, but the model is cleaner.</p>
<p>OpenTelemetry Content Packs are purely about the observability assets for a given service. No data collection config is bundled in, because in an OTel world, the collector handles that. Each package contains:</p>
<ul>
<li><strong>Dashboards</strong> — curated, pre-built Kibana visualizations tailored to the service being monitored</li>
<li><strong>Alert rules</strong> — pre-configured alerting rules that fire on meaningful thresholds, helping teams minimize Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR)</li>
<li><strong>SLO templates</strong> — ready-made Service Level Objective definitions you can apply immediately to track reliability targets, error budgets, and burn rates</li>
</ul>
<p>More asset types are planned for future packages as the content pack model continues to evolve.</p>
<hr />
<h2>How Does It Work?</h2>
<p>The core idea is simple: as soon as data arrives in Elastic, the right dashboards, alert rules, and SLO templates are ready to use. The content package activates based on the incoming data, regardless of how that data was collected.</p>
<p>One of the most powerful aspects of this system is <strong>automatic installation</strong>. When Elastic detects that data for a particular service has started arriving in Elasticsearch, the corresponding content pack is installed automatically — no manual steps, no hunting through the integrations catalog. By the time you open Kibana, your dashboards are already there waiting for you, your alert rules are ready to be enabled, and your SLO templates are pre-loaded.</p>
<p>To get the data flowing in the first place, we need to configure the collector — a YAML file that defines the building blocks of your telemetry pipeline:</p>
<ul>
<li><strong>Receivers</strong> — define what data to collect and from where. Each service has its own receiver (for example, the MySQL receiver scrapes metrics directly from the database).</li>
<li><strong>Exporters</strong> — define where the collected data is sent. In our case, we use the Elasticsearch exporter, which ships the telemetry data directly into Elasticsearch in OpenTelemetry native format.</li>
<li><strong>Pipelines</strong> — wire the receivers and exporters together, defining the flow of data through the collector.</li>
</ul>
<p>Once this configuration is in place and the collector is running, data starts flowing into Elasticsearch — and the content pack takes it from there.</p>
<h4>Data Sources</h4>
<p>OpenTelemetry data can reach Elastic through any of the following:</p>
<ul>
<li><strong><a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">EDOT Collector</a></strong> — the Elastic Distribution of the OpenTelemetry Collector, embedded in or used alongside the Elastic Agent</li>
<li><strong><a href="https://github.com/open-telemetry/opentelemetry-collector-contrib">Upstream OTel Collector</a></strong> — the standard community OpenTelemetry Collector (Contrib or custom builds)</li>
<li><strong><a href="https://www.elastic.co/docs/reference/opentelemetry/edot-cloud-forwarder">EDOT Cloud Forwarder (ECF)</a></strong> — a serverless OTel Collector that collects telemetry from AWS, GCP, and Azure (VPC Flow Logs, CloudTrail, CloudWatch, and more) and forwards it directly to Elastic Observability, with no infrastructure to manage</li>
</ul>
<p>The content pack doesn't care how the data arrived — only that it's there.</p>
<hr />
<h2>Seeing It in Practice: MySQL Monitoring</h2>
<p>Take a team running MySQL who wants to track query throughput, connection counts, buffer pool utilization, and slow query rates — and get alerted before small problems turn into 2am incidents. Historically, that means hours of dashboard building, custom alert queries, and a lot of guesswork about which metrics actually matter.</p>
<p>With the <strong><a href="https://www.elastic.co/docs/reference/integrations/mysql_otel">MySQL OpenTelemetry Assets Package</a></strong>, that work is already done. Here's how the whole thing comes together.</p>
<h3>Step 1: Get the Data In</h3>
<p>The data pipeline is driven by a collector configuration that defines receivers (where to scrape data from), processors (how to enrich or transform it), and exporters (where to send it — in this case, Elasticsearch).</p>
<p>Regardless of whether you use the <a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">EDOT Collector</a> or the <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib">Upstream OTel Collector</a>, the fundamental configuration structure is the same. The configuration below uses separate receivers for the primary and replica instances, because replication metrics are only available on replicas. Replace the placeholders with your actual endpoints, credentials, and Elasticsearch details.</p>
<pre><code class="language-yaml">receivers:
  mysql/primary:
    endpoint: &lt;MYSQL_PRIMARY_ENDPOINT&gt;
    username: &lt;MYSQL_USER&gt;
    password: &lt;MYSQL_PASSWORD&gt;
    collection_interval: 10s
    statement_events:
      digest_text_limit: 120
      limit: 250
    query_sample_collection:
      max_rows_per_query: 100
    events:
      db.server.query_sample:
        enabled: true
      db.server.top_query:
        enabled: true
    metrics:
      mysql.client.network.io:
        enabled: true
      mysql.connection.errors:
        enabled: true
      mysql.max_used_connections:
        enabled: true
      mysql.query.client.count:
        enabled: true
      mysql.query.count:
        enabled: true
      mysql.query.slow.count:
        enabled: true
      mysql.table.rows:
        enabled: true
      mysql.table.size:
        enabled: true
  # Replica receiver: replication metrics are only reported by replica instances
  mysql/replica:
    endpoint: &lt;MYSQL_REPLICA_ENDPOINT&gt;
    username: &lt;MYSQL_USER&gt;
    password: &lt;MYSQL_PASSWORD&gt;
    collection_interval: 10s

processors:
  resourcedetection:
    detectors: [system, env]

exporters:
  elasticsearch/otel:
    endpoint: &lt;ES_ENDPOINT&gt;
    api_key: &lt;ES_API_KEY&gt;
    mapping:
      mode: otel

service:
  pipelines:
    metrics:
      receivers: [mysql/primary, mysql/replica]
      processors: [resourcedetection]
      exporters: [elasticsearch/otel]
</code></pre>
<p>The <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/mysqlreceiver/README.md#mysql-receiver">MySQL receiver</a> scrapes metrics and events from the database at the configured interval and emits them as OpenTelemetry metrics. These flow through the pipeline and land in Elasticsearch, ready to be visualized.</p>
<h3>Step 2: Open Kibana — Everything's Already There</h3>
<h4>Dashboards</h4>
<p>As soon as the MySQL metrics and events arrive in Elasticsearch, the <a href="https://www.elastic.co/docs/reference/integrations/mysql_otel">MySQL OpenTelemetry Assets Package</a> is automatically installed in the background. By the time you navigate to Kibana, the <a href="https://www.elastic.co/docs/reference/integrations/mysql_otel#screenshots">dashboards</a> are already populated and waiting.</p>
<p>Users immediately get visibility into:</p>
<ul>
<li>Active and max connections</li>
<li>Query throughput — statements executed per second</li>
<li>InnoDB buffer pool hit rate and memory usage</li>
<li>Slow query count and trends</li>
<li>Table lock waits and contention</li>
<li>Bytes sent and received over time</li>
<li>Replication lag (for replicated setups)</li>
</ul>
<p>No manual field mapping. No dashboard building from scratch. Just data in, insights out.</p>
<p>Below are some screenshots of the MySQL OpenTelemetry dashboard in Kibana, showing the out-of-the-box visualizations that are automatically available as soon as your data starts flowing in.</p>
<p>Overview Dashboard
<img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/overview.png" alt="" /></p>
<p>Queries Dashboard
<img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/queries.png" alt="" /></p>
<p>Availability Dashboard
<img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/availability.png" alt="" /></p>
<h4>Alert Rules, Ready to Enable</h4>
<p>The package includes six pre-built <a href="https://www.elastic.co/docs/reference/integrations/mysql_otel#alert-rules">alert rules</a> — covering high connection error rates, slow query spikes, thread saturation, replication lag, buffer pool dirty page ratio, and row lock contention — each with recommended thresholds and severity levels. These are available immediately on install and can be enabled, tuned, and extended directly in Kibana without any custom query authoring. Below is an example of one of the alerts.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/alert1.png" alt="" /></p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/alert2.png" alt="" /></p>
<h4>SLO Templates, Pre-Loaded</h4>
<p>Four <a href="https://www.elastic.co/docs/reference/integrations/mysql_otel#slo-templates">SLO templates</a> are included out of the box, tracking replication lag, connection exhaustion errors, slow query rate, and connected thread count — each with a pre-configured target and 30-day rolling window. Teams can adopt them as-is or tune the thresholds to match their own reliability requirements.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/slo.png" alt="" /></p>
<hr />
<h2>What's Available Today</h2>
<p>The MySQL OpenTelemetry Assets Package is just one example from a growing library of OpenTelemetry Content Packages that Elastic has already built out. Content packs are available for a range of services — and we have also started extending this to the cloud, with initial support for Cloud Service Provider integrations that use the <a href="https://www.elastic.co/docs/reference/opentelemetry/edot-cloud-forwarder">EDOT Cloud Forwarder (ECF)</a> to bring AWS, GCP, and Azure telemetry into Elastic with ready-made dashboards.</p>
<p><img src="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/contentpacks.png" alt="" /></p>
<p>The same pattern holds across all of them — data in, and a complete observability package (dashboards, alert rules, SLO templates) instantly ready — whether you're monitoring a self-managed database or cloud-native services from your preferred cloud service provider.</p>
<h2>Where This Is Going</h2>
<p>The next step worth watching is <strong>OTel Integration Packages</strong>, which will let you push collector configurations directly from the Kibana UI — making the entire setup experience point-and-click, from data collection through to visualization, with no YAML editing required.</p>
<hr />
<h2>Get Started</h2>
<p>Ready to try it? Start with the <a href="https://www.elastic.co/observability-labs/blog/elastic-distributions-opentelemetry">EDOT Collector documentation</a> and explore the growing library of OpenTelemetry content packages in Kibana's Integrations page.</p>]]></content:encoded>
            <category>observability-labs</category>
            <enclosure url="https://www.elastic.co/observability-labs/assets/images/visualizing-opentelemetry-data-elastic-content-packages/otelcp.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>