"The dream of OpenTelemetry is vendor-neutral, standardised observability. The challenge nobody mentions is how you operate hundreds, or thousands, of those collectors in production."
OpenTelemetry has won the hearts of the industry. Adoption is accelerating: the CNCF's 2024 Observability survey found OTel to be the fastest-growing project in the foundation's history, with the OTel Collector registering hundreds of millions of downloads. The proposition is compelling: write instrumentation once, ship it anywhere, avoid lock-in.
But here is what every platform team discovers once they cross into production: the collector sprawl problem. Hundreds of collector instances deployed across regions, Kubernetes namespaces, and bare-metal hosts. Configuration drift creeping in. An upgrade that has to be co-ordinated across a fleet of independent processes. A security patch that someone has to manually roll out to each one. And zero visibility into which collectors are running, healthy, or stuck.
This is the gap between "deploying OpenTelemetry" and "operating OpenTelemetry at scale." With Elastic 9.3, Elastic Agent closes that gap entirely. The Elastic Agent is now built on Elastic's Distribution of the OpenTelemetry Collector (EDOT) and, when managed by Fleet, gives platform teams a single control plane for configuring, updating, and monitoring every OTel collector in their estate — all while remaining compatible with the Beats-based integrations they already rely on.
The Collector Sprawl Problem and Why It Matters
OpenTelemetry's success has created a quiet operational debt for many organisations. Individual teams adopt the collector for their services: logs here, metrics there, a custom pipeline for the new microservice. Without a centralised management layer, each of these collectors becomes an independent snowflake: its own config file, its own upgrade cycle, its own failure domain.
The consequences are predictable. Configuration drift means collectors running different versions of the same pipeline, producing subtly incompatible data. Compliance teams ask "show me all the places data is collected and where it goes", and the honest answer is a spreadsheet that's already out of date.
This isn't a niche problem. A Gartner analysis of enterprise observability programmes consistently identifies operational overhead as the top barrier to expanding OTel adoption beyond initial pilots. The technology works. The tooling to manage it at scale is what's been missing.
How Elastic Agent Became an OTel Collector
To appreciate the significance of this change, it helps to know what Elastic Agent used to be, and what it is now.
Elastic Agent used to act as a supervisor process: before version 9.3, it managed a collection of separate Beats sub-processes (Filebeat, Metricbeat, Winlogbeat and so on), each running its own input/output lifecycle, each with its own memory footprint. The agent coordinated them, but the fundamental model was a collection of discrete daemons running under a parent.
With 9.3, that model has been replaced. Elastic Agent is now itself an instance of the EDOT Collector: Elastic's hardened, production-supported distribution of the upstream OTel Collector. The architectural shift has three important consequences.
First, the process model simplifies dramatically. Instead of a supervisor managing multiple sub-process lifecycles, there is a single EDOT Collector process. This means a smaller memory footprint, fewer things that can fail independently, and fewer processes to observe for health and performance.
Second, Beats functionality is preserved, not discarded. Rather than forcing a breaking migration, Elastic has introduced Beats Receivers: Beats inputs and processors re-packaged as native OTel receiver components. A Filestream input, for example, now runs inside a filebeatreceiver. The same Filebeat configuration YAML you write today is automatically translated into the corresponding EDOT receiver configuration at runtime, so existing integrations, dashboards, and ingest pipelines continue to work without modification.
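As an illustration, a familiar Filestream input can be expressed as a Beats Receiver inside an EDOT Collector configuration. This is a sketch, not the literal output of the runtime translation: the exact generated keys are an internal detail, and the paths, IDs, and endpoint shown here are placeholders.

```yaml
receivers:
  filebeatreceiver:
    filebeat:
      inputs:
        - type: filestream
          id: app-logs             # placeholder input ID
          paths:
            - /var/log/app/*.log   # placeholder path
    output:
      otelconsumer: {}             # hand events to the OTel pipeline

exporters:
  elasticsearch:
    endpoint: https://elasticsearch.example.com:9200  # placeholder

service:
  pipelines:
    logs:
      receivers: [filebeatreceiver]
      exporters: [elasticsearch]
```

The familiar Filebeat input block is simply nested under an OTel receiver and wired into a standard collector pipeline.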
Third, the agent is now a first-class participant in the OTel ecosystem. It speaks OTLP natively, it runs standard OTel receivers, and it can be configured to sit alongside any other OTel-compatible tool in a modern observability pipeline.
Central Management with Fleet: Configuration, Lifecycle, and Visibility
The architectural shift above would be valuable on its own. But it becomes transformative when combined with Elastic Fleet, the centralised management plane for Elastic Agents.
Fleet gives platform and SRE teams a single console from which to manage every Elastic Agent (and by extension, every EDOT Collector instance) in their estate. The capabilities break into three categories: configuration management, lifecycle management, and fleet-wide observability.
Configuration management at scale
With Fleet, you define an Agent Policy — a declarative description of what a collector should do. What data should it collect? Via which receivers? Where should it export? The policy is authored once in Fleet's UI (or via its API), and pushed automatically to every agent enrolled in that policy. Change the policy, and every affected collector receives the update. No SSH. No Ansible playbook to maintain. No configuration drift.
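For teams that automate policy creation rather than using the UI, Fleet exposes the same declarative model over its API. A minimal sketch of a request body for the agent-policy endpoint (POST /api/fleet/agent_policies); the name, namespace, and description are placeholders, and the full schema is in the Fleet API reference:

```json
{
  "name": "edot-collectors-eu-west",
  "namespace": "default",
  "description": "Baseline collector policy for EU hosts",
  "monitoring_enabled": ["logs", "metrics"]
}
```

Once created, the policy becomes the single source of truth for every agent enrolled in it.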
Fleet pushes policies to enrolled agents across any environment. Agents send heartbeat and health data back, giving a live inventory of every collector in the estate.
Lifecycle management: upgrades, enrolment, and remediation
Perhaps the most operationally significant benefit of Fleet management is lifecycle control. With Fleet, upgrading a collector is a policy action: select the target version, select the scope (all agents, a specific policy group, a canary subset), and click. Fleet orchestrates the rolling upgrade, tracking status per agent and surfacing failures immediately.
This changes the security calculus fundamentally. When a vulnerability is disclosed in the OTel Collector binary, patching is a Fleet operation measured in minutes, not a change-management ceremony measured in days across SSH sessions to individual hosts.
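That upgrade action can also be driven through Fleet's API. A sketch of a request body for the bulk-upgrade endpoint (POST /api/fleet/agents/bulk_upgrade), targeting a canary subset of agent IDs (placeholders) and spreading the rollout over an hour; verify field names against the API reference for your release:

```json
{
  "version": "9.3.0",
  "agents": ["agent-id-1", "agent-id-2"],
  "rollout_duration_seconds": 3600
}
```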
Fleet also handles enrolment and de-enrolment. New hosts added to your infrastructure can be auto-enrolled into the appropriate policy based on tags or deployment tooling. Agents on decommissioned hosts can be removed from Fleet's inventory, ensuring your observability map reflects your actual infrastructure.
Fleet-wide observability of your collectors
Every Fleet-managed Elastic Agent ships monitoring telemetry about itself: CPU and memory consumption, event throughput, error rates, pipeline latency. This data flows into Elastic and is surfaced in the Fleet UI, giving you a live dashboard of every collector in your estate, not just the ones you happen to be watching.
For the first time, "how healthy is my observability pipeline?" becomes a question with a real-time, fleet-wide answer. You can identify agents that have stopped sending data, agents consuming unexpectedly high resources, and agents that have fallen behind on queue processing — before those problems surface as gaps in your monitoring data.
In the near future, this capability will extend beyond Fleet-managed agents to standalone agents and to third-party OTel collectors from other vendors. Those collectors can be configured by other means yet still be monitored in Fleet, covering both resource consumption and component pipeline health.
The Hybrid Agent: Beats Data and OTel Data, Simultaneously
One of the most practically significant capabilities introduced in 9.3 is what Elastic calls the Hybrid Agent: an Elastic Agent that can run both Beats-based receivers and native OTel receivers in the same pipeline, at the same time, with no change required for existing installations.
This matters enormously for real-world adoption. Most organisations arriving at OTel in 2025 and 2026 are not starting from a blank slate. They have years of investment in Beats-based integrations: Filebeat-powered log collection, Metricbeat-powered host metrics, bespoke ingest pipelines in Elasticsearch that normalise and enrich that data into ECS (Elastic Common Schema) format. The business value locked in those integrations (the dashboards, the alerts, the correlation logic) is not something they can afford to throw away in order to "go OTel."
The Hybrid Agent solves this by making the two worlds coexist. For example, in a single agent policy you can simultaneously configure:
- A filebeatreceiver collecting application logs in ECS format, routed through your existing ingest pipeline to its existing data stream
- A native OTel filelogreceiver collecting OTel-native telemetry from your new services instrumented with the OTel SDK, stored in OTel-native data streams without touching ingest pipelines
- An OTel hostmetricsreceiver collecting system metrics in semantic convention format alongside your existing Metricbeat-derived system metrics
The two lanes are independent. Beats-receiver data travels through ingest pipelines and lands in ECS-formatted data streams, exactly as it always has. Native OTel data follows OTel semantic conventions and is stored directly in OTel-native data streams, bypassing ingest pipelines. Your existing dashboards and alerts continue to work. Your new OTel-native workloads get the full OTel experience. The same agent, the same Fleet policy, the same management console.
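Conceptually, such a hybrid policy corresponds to a collector configuration with independent lanes. The sketch below is illustrative rather than the literal config Fleet generates; paths, IDs, and the endpoint are placeholders, and the Elasticsearch exporter's handling of ECS versus OTel-native routing is simplified here:

```yaml
receivers:
  filebeatreceiver:            # Beats lane: ECS-formatted logs
    filebeat:
      inputs:
        - type: filestream
          id: legacy-app-logs
          paths:
            - /var/log/legacy/*.log
    output:
      otelconsumer: {}
  filelog:                     # OTel lane: native log collection
    include:
      - /var/log/new-service/*.log
  hostmetrics:                 # OTel lane: semantic-convention metrics
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}

exporters:
  elasticsearch:
    endpoint: https://elasticsearch.example.com:9200  # placeholder

service:
  pipelines:
    logs/beats:
      receivers: [filebeatreceiver]
      exporters: [elasticsearch]
    logs/otel:
      receivers: [filelog]
      exporters: [elasticsearch]
    metrics/otel:
      receivers: [hostmetrics]
      exporters: [elasticsearch]
```

Each lane is a separate pipeline, so the Beats-derived data and the OTel-native data never interfere with one another.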
This co-existence is the practical answer to the question every platform team eventually faces: "We want to adopt OTel properly but we can't break what we already have." The Hybrid Agent lets you migrate incrementally, service by service, on your timeline.
The Integration Catalogue: Turning Configuration into a One-Click Operation
Configuration management at scale is only as good as the configurations themselves. Elastic's integration catalogue — over 500 packages covering everything from NGINX and PostgreSQL to AWS CloudTrail and Kubernetes — extends naturally to the Hybrid Agent model.
From 9.3 onwards, the catalogue includes OTel integration packages alongside the existing Beats-based ones. Each OTel package contains two components:
- An Input package: the configuration for the corresponding OTel receiver (receivers, processors, pipeline wiring), ready to be applied to a Hybrid Agent policy
- A Content package: the assets associated with the application: pre-built dashboards, alerts, index templates, and saved queries, all calibrated for OTel semantic convention data
When an operator adds an OTel integration to an Agent Policy in Fleet, the receiver configuration is pushed to all enrolled agents. When those agents start ingesting data and it arrives in Elasticsearch, the content package assets are automatically installed based on metadata in the data received. The dashboard is ready before you've had time to wonder where it is.
The same policy can hold both OTel integrations and legacy Beats integrations.
A real-world agent policy might simultaneously collect system metrics via the OTel hostmetrics receiver, application logs via filebeat receiver, and APM data via OTLP — all from one policy, all managed from Fleet, all visible in a unified Kibana experience.
A technical walk-through of how this is done for NGINX data collection can be found here for reference. Management of Elastic Agents currently uses the existing Fleet protocols; in the near future this will move to OpAMP, the Open Agent Management Protocol, so that Fleet can manage third-party OTel collectors as well.
For organisations on platforms not yet in Elastic's OS support matrix, 3rd-party OTel Collectors (such as Red Hat's OpenShift-native collector) can send data to Elastic using the OTLP exporter and be observed alongside all other collectors in their fleet.
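A third-party collector forwards its data to Elastic with the standard OTLP exporter from the upstream OTel Collector. A minimal sketch; the Elastic endpoint and the API-key environment variable are placeholders, and the exact OTLP ingest URL and auth header depend on your deployment:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp/elastic:
    endpoint: https://my-deployment.ingest.example.com:443  # placeholder
    headers:
      Authorization: "ApiKey ${env:ELASTIC_API_KEY}"        # placeholder auth

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/elastic]
```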
What This Means in Practice: A Migration Story
Consider a mid-sized platform team operating 200 Linux hosts across three regions, currently running Elastic Agent 8.x with a mix of Filebeat and Metricbeat integrations. Their new services are being instrumented with the OTel SDK and they want to standardise on OTel going forward without disrupting the monitoring coverage they already have.
With a Fleet-managed upgrade to 9.3, their existing agents become Hybrid Agents automatically. Their Filebeat and Metricbeat configurations are internally translated to Beats receiver configurations and continue to run unmodified. Their existing dashboards still populate. Their ingest pipelines still fire. Nothing breaks.
They then add OTel integration packages to their Fleet policies for each new service. The OTel-instrumented microservices start sending OTLP data, received by native OTel receivers in the same agents. OTel-native dashboards appear automatically in Kibana. They now have both data universes in one place, managed from one console, visible in one interface.
Over the following quarters, as Beats-based integrations for their remaining services are superseded by OTel equivalents in the catalogue, they migrate them one by one, updating the Agent Policy in Fleet and watching the transition happen across all 200 hosts simultaneously, without touching a single one directly.
Looking Forward
Elastic has made a clear architectural bet: OpenTelemetry is the future of observability data collection, and the right response to that future is not to build a parallel OTel tool alongside the existing stack — it is to evolve the existing stack into OTel. The Hybrid Agent and EDOT Collector are the result of that bet.
Fleet central management is the operational layer that makes that bet practical at scale. OpenTelemetry gives you standardised, vendor-neutral instrumentation. Fleet gives you the operational control plane to manage those collectors like the production infrastructure they are, not like artisanal YAML files scattered across your estate.
The collector sprawl problem is solvable. The answer is a managed, policy-driven, centrally observable fleet of EDOT Collectors, and in Elastic 9.3, that answer is production-ready today.