Native OpenTelemetry support in Elastic Observability


OpenTelemetry is becoming more than just the open ingestion standard for observability. As one of the major Cloud Native Computing Foundation (CNCF) projects, with as many commits as Kubernetes, it is gaining support from major ISVs and cloud providers. Many global companies in finance, insurance, tech, and other industries are starting to standardize on OpenTelemetry. With OpenTelemetry, DevOps teams have a consistent approach to collecting and ingesting telemetry data, which makes it a de facto standard for observability.

Elastic® is strategically standardizing on OpenTelemetry for the main data collection architecture for observability and security. Additionally, Elastic is making a commitment to help OpenTelemetry become the best de facto data collection infrastructure for the observability ecosystem. Elastic is deepening its relationship with OpenTelemetry beyond the recent contribution of Elastic Common Schema (ECS) to OpenTelemetry (OTel).

Elastic has supported OpenTelemetry natively since version 7.14: it can directly ingest traces, metrics, and logs sent over the OpenTelemetry protocol (OTLP).

otel configuration options

In this blog, we’ll review the current OpenTelemetry support provided by Elastic, which includes the following:

Ingesting OpenTelemetry into Elastic

If you’re interested in seeing how simple it is to ingest OpenTelemetry traces and metrics into Elastic, follow the steps outlined in this blog.

Let’s outline what Elastic provides for ingesting OpenTelemetry data. Here are all your options:

flowchart

Using the OpenTelemetry Collector

When using the OpenTelemetry Collector, which is the most common configuration option, you only need to set two key variables.

The instructions use an opentelemetry-collector configuration specific to Elastic. Essentially, the Elastic values.yaml file in the elastic/opentelemetry-demo repository configures the opentelemetry-collector to point to the Elastic APM Server using two main values:

  • OTEL_EXPORTER_OTLP_ENDPOINT: the address of Elastic’s APM Server
  • OTEL_EXPORTER_OTLP_HEADERS: the Elastic authorization header

These two values can be found in the OpenTelemetry setup instructions under the APM integration instructions (Integrations->APM) in your Elastic Cloud.
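
As a minimal sketch of what those two values look like, the following Python builds them as environment variables. The endpoint URL and token are placeholders, the helper name is hypothetical, and the Authorization=Bearer form assumes your deployment authenticates with an APM secret token; the APM integration page remains the source for the exact values to copy.

```python
from typing import Dict


def elastic_otlp_env(apm_endpoint: str, secret_token: str) -> Dict[str, str]:
    """Build the two OTLP exporter settings as environment variables.

    Hypothetical helper for illustration: both arguments are values you
    copy from the APM integration page of your own deployment.
    """
    if not apm_endpoint.startswith(("http://", "https://")):
        raise ValueError("apm_endpoint should include the URL scheme")
    return {
        # Where OTLP data is sent: the Elastic APM Server.
        "OTEL_EXPORTER_OTLP_ENDPOINT": apm_endpoint,
        # Assumption: secret-token auth, sent as a Bearer authorization header.
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Bearer {secret_token}",
    }


if __name__ == "__main__":
    # Placeholder values, not a real deployment.
    env = elastic_otlp_env("https://example.apm.invalid:443", "REPLACE_ME")
    for name, value in env.items():
        print(f"{name}={value}")
```

Some SDK versions may expect the header value to be URL-encoded (for example, the space after Bearer), so it is worth checking the documentation for the language you are instrumenting.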

Native OpenTelemetry agents embedded in code

If you are thinking of using OpenTelemetry libraries in your code, you can simply point the service to Elastic’s APM Server, because it natively supports the OTLP protocol. No special Elastic conversion is needed.
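
To make "point the service to Elastic’s APM Server" a little more concrete: when an SDK exports over OTLP/HTTP, the OpenTelemetry specification has it append a fixed per-signal path to the base endpoint. The sketch below reproduces that derivation. The function name and example host are hypothetical, and whether you use OTLP/HTTP or OTLP/gRPC depends on your SDK and deployment.

```python
from typing import Dict

# Per-signal paths defined by the OTLP/HTTP exporter specification.
SIGNAL_PATHS = {
    "traces": "v1/traces",
    "metrics": "v1/metrics",
    "logs": "v1/logs",
}


def otlp_http_signal_urls(base_endpoint: str) -> Dict[str, str]:
    """Derive the per-signal OTLP/HTTP URLs from a base endpoint.

    Hypothetical helper: it mirrors what an SDK does internally when only
    OTEL_EXPORTER_OTLP_ENDPOINT is set and the protocol is OTLP/HTTP.
    """
    base = base_endpoint.rstrip("/") + "/"
    return {signal: base + path for signal, path in SIGNAL_PATHS.items()}


if __name__ == "__main__":
    # Placeholder host, not a real deployment.
    for signal, url in otlp_http_signal_urls("https://example.apm.invalid").items():
        print(f"{signal}: {url}")
```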

To demonstrate this effectively and provide some education on how to use OpenTelemetry, we have two applications you can use to learn from:

  • Elastic’s version of OpenTelemetry demo: Like other observability vendors, we have our own forked version of the OpenTelemetry demo.
  • Elastiflix: This demo application is an example to help you learn how to instrument across various languages and telemetry signals.

Check out our blogs on using the Elastiflix application and instrumenting with OpenTelemetry:

We have created YouTube videos on these topics as well:

Given Elastic and OpenTelemetry’s vast user base, these provide a rich source of education for anyone trying to learn the intricacies of instrumenting with OpenTelemetry.

Elastic Agents supporting OpenTelemetry

If you’ve already implemented Elastic APM agents, you can still use them with OpenTelemetry. Elastic APM agents today are able to ship OpenTelemetry spans as part of a trace. This means that if any component in your application emits an OpenTelemetry span, it will be part of the trace the Elastic APM agent captures.

OpenTelemetry logs in Elastic

If you look at the OpenTelemetry documentation, you will see that many language libraries are still in an experimental or not-yet-implemented state. Java is in a stable state, per the documentation. Depending on your service’s language, and your appetite for adventure, there are several options for exporting logs from your services and applications and marrying them together in your observability backend.

In a previous blog, we discussed three different configurations for properly getting logging data into Elastic for Java. The blog explores the current state of the art of OpenTelemetry logging and provides guidance on the available approaches with the following tenets in mind:

  • Correlation of service logs with OTel-generated tracing where applicable
  • Proper capture of exceptions
  • Common context across tracing, metrics, and logging
  • Support for slf4j key-value pairs (“structured logging”)
  • Automatic attachment of metadata carried between services via OTel baggage
  • Use of an Elastic Observability backend
  • Consistent data fidelity in Elastic regardless of the approach taken

Three models, which are covered in the blog, currently exist for getting your application or service logs to Elastic with correlation to OTel tracing and baggage:

  1. Output logs from your service (alongside traces and metrics) using an embedded OpenTelemetry Instrumentation library to Elastic via the OTLP protocol
  2. Write logs from your service to a file scraped by the OpenTelemetry Collector, which then forwards to Elastic via the OTLP protocol
  3. Write logs from your service to a file scraped by Elastic Agent (or Filebeat), which then forwards to Elastic via an Elastic-defined protocol

Note that (1), in contrast to (2) and (3), does not involve writing service logs to a file prior to ingestion into Elastic.
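
For the file-based models, each log line needs to carry the trace context so that the backend can correlate it with tracing data. Below is a minimal sketch of that idea using only Python’s standard logging module. The contextvar is a hypothetical stand-in for the active span (a real service would read the IDs from its OpenTelemetry SDK), the logger name is made up, and the ECS-style field names are an assumption about what your ingest pipeline expects.

```python
import contextvars
import io
import json
import logging
from datetime import datetime, timezone

# Hypothetical stand-in for the active span context. A real service would
# obtain these IDs from its OpenTelemetry SDK instead.
current_trace = contextvars.ContextVar("current_trace", default=None)


class TraceContextFilter(logging.Filter):
    """Copy the active trace and span IDs onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = current_trace.get() or {}
        record.trace_id = ctx.get("trace_id")
        record.span_id = ctx.get("span_id")
        return True


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON line with ECS-style field names."""

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        doc = {
            "@timestamp": ts.isoformat(),
            "log.level": record.levelname.lower(),
            "message": record.getMessage(),
            "trace.id": getattr(record, "trace_id", None),
            "span.id": getattr(record, "span_id", None),
        }
        return json.dumps(doc)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes correlated JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    handler.addFilter(TraceContextFilter())
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for the log file a collector scrapes
    log = build_logger(buffer)
    token = current_trace.set(
        {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}
    )
    log.info("order placed")
    current_trace.reset(token)
    print(buffer.getvalue(), end="")
```

Writing the IDs into the log line itself keeps the correlation intact regardless of which scraper later ships the file.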

OpenTelemetry is Elastic’s preferred schema

Elastic recently contributed the Elastic Common Schema (ECS) to the OpenTelemetry (OTel) project, enabling a unified data specification for security and observability data within the OTel Semantic Conventions framework. 

ECS, an open source specification, was developed with support from the Elastic user community to define a common set of fields to be used when storing event data in Elasticsearch®. ECS helps reduce management and storage costs stemming from data duplication, improving operational efficiency.

Similarly, OTel’s Semantic Conventions (SemConv) also specify common names for various kinds of operations and data. The benefit of using OTel SemConv is in following a common naming scheme that can be standardized across a codebase, libraries, and platforms for OTel users.

The merging of ECS and OTel SemConv will help advance OTel’s adoption and the continued evolution and convergence of observability and security domains.

Elastic Observability APM and machine learning capabilities

All of Elastic Observability’s APM capabilities are available with OTel data (read more on this in our blog, Independence with OpenTelemetry):

  • Service maps
  • Service details (latency, throughput, failed transactions)
  • Dependencies between services
  • Transactions (traces)
  • ML correlations (specifically for latency)
  • Service logs
services

In addition to Elastic’s APM and unified view of the telemetry data, you can use Elastic’s powerful machine learning capabilities to reduce analysis and alerting effort and help reduce MTTR. Here are some of the ML-based AIOps capabilities we have:

  • Anomaly detection: Elastic Observability, when turned on (see documentation), automatically detects anomalies by continuously modeling the normal behavior of your OpenTelemetry data — learning trends, periodicity, and more.
  • Log categorization: Elastic also quickly identifies patterns in your OpenTelemetry log events, so that you can take action sooner.
  • High-latency or erroneous transactions: Elastic Observability’s APM capability helps you discover which attributes are contributing to increased transaction latency and identifies which attributes are most influential in distinguishing between transaction failures and successes. 
  • Log spike detector: Helps identify reasons for increases in OpenTelemetry log rates. It makes it easy to find and investigate causes of unusual spikes by using the analysis workflow view.
  • Log pattern analysis: Helps you find patterns in unstructured log messages and makes it easier to examine your data.

Elastic allows you to migrate to OTel on your schedule

Although OpenTelemetry supports many programming languages, its major functional components (metrics, traces, and logs) are still at various stages of maturity. Applications written in Java, Python, and JavaScript are therefore good choices to migrate first, as their metrics and traces (and, for Java, logs) are stable.

For the other languages that are not yet supported, you can easily instrument those services using Elastic Agents, running your full-stack observability platform in mixed mode (Elastic agents alongside OpenTelemetry agents).

Here is a simple example:

services 2

The above shows a simple variation of our standard Elastic Agent application with one service flipped to OTel: the newsletter-otel service. We can easily convert each of the other services to OTel as needed and as development resources allow.

Hence, with Elastic, you can migrate what you need to OpenTelemetry now and, as specific languages reach a stable state, continue your migration to OpenTelemetry agents.

Integrated Kubernetes and OpenTelemetry views in Elastic

Elastic monitors your Kubernetes cluster using the Elastic Agent, which you can deploy on the Kubernetes cluster where your OpenTelemetry application is running. Hence you can not only use OpenTelemetry for your application, but also have Elastic monitor the corresponding Kubernetes cluster.

There are two configurations for Kubernetes:

1. Simply deploying the Elastic Agent DaemonSet on the Kubernetes cluster. We outline this in the article entitled Managing your Kubernetes cluster with Elastic Observability. This configuration pushes only the Kubernetes metrics and logs to Elastic.

elastic cloud nodes

2. Deploying the Elastic Agent with not only the Kubernetes DaemonSet, but also Elastic’s APM integration, the Defend (Security) integration, and the Network Packet Capture integration to provide more comprehensive Kubernetes cluster observability. We outline this configuration in the article Modern observability and security on Kubernetes with Elastic and OpenTelemetry.

flowchart

Both OpenTelemetry visualization examples use the OpenTelemetry demo. In Elastic, we tie the Kubernetes information to the application so that you can see Kubernetes information from your traces in APM. This provides a more integrated approach to troubleshooting.

pod details
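
This post does not detail how the Kubernetes information gets attached to traces. One common mechanism, assumed here purely for illustration, is to tag telemetry with OpenTelemetry resource attributes such as k8s.pod.name. The sketch below builds an OTEL_RESOURCE_ATTRIBUTES value from pod metadata; the helper name and the sample values are hypothetical.

```python
from typing import Mapping
from urllib.parse import quote


def resource_attributes(attrs: Mapping[str, str]) -> str:
    """Format attributes as a comma-separated key=value list.

    Keys and values are percent-encoded so that commas and equals signs
    inside them cannot be mistaken for separators.
    """
    return ",".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in attrs.items()
    )


if __name__ == "__main__":
    # Sample values only. In a cluster these would typically come from the
    # pod's own metadata rather than being hard-coded.
    value = resource_attributes(
        {
            "service.name": "checkout",
            "k8s.namespace.name": "shop",
            "k8s.pod.name": "checkout-7d9f",
        }
    )
    print(f"OTEL_RESOURCE_ATTRIBUTES={value}")
```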

Summary

In essence, Elastic's commitment goes beyond mere support for OpenTelemetry. We are dedicated to ensuring our customers not only adopt OpenTelemetry but thrive with it. Through our solutions, expertise, and resources, we aim to elevate the observability journey for every business, turning data into actionable insights that drive growth and innovation.

Don’t have an Elastic Cloud account yet? Sign up for Elastic Cloud and try out the instrumentation capabilities that I discussed above. I would be interested in getting your feedback about your experience in gaining visibility into your application stack with Elastic.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.