To keep pace with demand, most online apps and services (for example, mobile applications, web pages, and SaaS) are moving to a distributed, microservice-based architecture on Kubernetes. Once you’ve migrated your app to the cloud, how do you manage and monitor the service’s production health, scale, and availability? OpenTelemetry is quickly becoming the de facto standard for instrumenting Kubernetes applications and collecting their telemetry data.
OpenTelemetry (OTel) is an open source project providing a collection of tools, APIs, and SDKs that can be used to generate, collect, and export telemetry data (metrics, logs, and traces) to understand software performance and behavior. OpenTelemetry recently became a CNCF incubating project and has growing community and vendor support.
While OTel provides a standard way to instrument applications with a standard telemetry format, it doesn’t provide any backend or analytics components. Hence, using OTel libraries for application, infrastructure, and user experience monitoring leaves you free to choose the observability tool that fits best, with no vendor lock-in for application performance monitoring (APM).
Elastic Observability natively supports OpenTelemetry and its OpenTelemetry protocol (OTLP) to ingest traces, metrics, and logs. All of Elastic Observability’s APM capabilities are available with OTel data, including (among others):
- Service maps
- Service details (latency, throughput, failed transactions)
- Dependencies between services
- Transactions (traces)
- ML correlations (specifically for latency)
- Service logs
In addition to Elastic’s APM and a unified view of the telemetry data, you can now use Elastic’s machine learning capabilities to speed up analysis and alerting, helping to reduce MTTR.
Given its open source heritage, Elastic also supports other CNCF-based projects, such as Prometheus, Fluentd, Fluent Bit, Istio, Kubernetes (K8S), and many more.
This blog will show:
- How to get a popular OTel instrumented demo app (HipsterShop) configured to ingest into Elastic Cloud through a few easy steps
- Some of the Elastic APM capabilities and features around OTel data, and what you can do with this data once it’s in Elastic
In follow-up blogs, we will detail how to use Elastic’s machine learning with OTel telemetry data, how to instrument OTel application metrics for specific languages, how we can support Prometheus ingest through the OTel collector, and more. Stay tuned!
Prerequisites and config
If you plan on following this blog, here are some of the components and details we used to set up the configuration:
- Ensure you have an account on Elastic Cloud and a deployed stack (see instructions here).
- We used a variant of the ever-popular HipsterShop demo application. It was originally written by Google to showcase Kubernetes, and a multitude of variants are now available, such as the OpenTelemetry Demo App. To use the app, please go here and follow the instructions to deploy.
- Additionally, we are using an OTel manually instrumented version of the application. No OTel automatic instrumentation was used in this blog configuration.
- Location of our clusters. While we used Google Kubernetes Engine (GKE), you can use any Kubernetes platform of your choice.
- While Elastic can ingest telemetry directly from OTel instrumented services, we will focus on the more traditional deployment, which uses the OpenTelemetry Collector.
- Prometheus and Fluentd/Fluent Bit, traditionally used to pull all Kubernetes data, are not used here, and neither are Kubernetes agents. Follow-up blogs will showcase this.
Here is the configuration we will get set up in this blog:
Setting it all up
Over the next few steps, I’ll walk through:
- Getting an account on Elastic Cloud
- Bringing up a GKE cluster
- Bringing up the application
- Configuring Kubernetes OTel Collector configmap to point to Elastic Cloud
- Using Elastic Observability APM with OTel data for improved visibility
Step 0: Create an account on Elastic Cloud
Follow the instructions to get started on Elastic Cloud.
Step 1: Bring up a K8S cluster
We used Google Kubernetes Engine (GKE), but you can use any Kubernetes platform of your choice.
There are no special requirements for Elastic to collect OpenTelemetry data from a Kubernetes cluster. Any normal Kubernetes cluster on GKE, EKS, or AKS, or any Kubernetes-compliant cluster (self-deployed and managed), works.
Step 2: Load the HipsterShop application on the cluster
Get your application on a Kubernetes cluster in your cloud service of choice or local Kubernetes platform. The application I am using is available here.
Once your application is up on Kubernetes, you will have the following pods (or some variant) running in the default namespace.
kubectl get pods -n default
Output should be similar to the following:
NAME                                     READY   STATUS    RESTARTS   AGE
adservice-f9bf94d56-5kt89                1/1     Running   0          41h
cartservice-54d5955c59-7lrk9             1/1     Running   0          41h
checkoutservice-57b95c78bb-qqcqv         1/1     Running   0          41h
currencyservice-6869479db8-7tsnj         1/1     Running   0          43h
emailservice-7c95b8766b-mp5vn            1/1     Running   0          41h
frontend-5f54bcb7cf-kxwmf                1/1     Running   0          41h
loadgenerator-bfb5944b6-2qhnw            1/1     Running   0          43h
paymentservice-5bc8f549c8-hkxks          1/1     Running   0          40h
productcatalogservice-665f6879d6-kv29f   1/1     Running   0          43h
recommendationservice-89bf4bfc5-ztcrr    1/1     Running   0          41h
redis-cart-5b569cd47-6wt59               1/1     Running   0          43h
shippingservice-768b94fb8d-8hf9c         1/1     Running   0          41h
In this version, we’ve only brought up all the services and the loadgenerator. You’ll notice the OpenTelemetry Collector is not yet brought up. (See next step.)
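To check readiness from a script rather than by eye, the pod listing can be filtered. The following is a minimal sketch and not part of the demo repository: `not_ready_pods` is a hypothetical helper, and it assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# Hypothetical helper: reads the text output of `kubectl get pods` on stdin
# and prints the name of every pod that is not fully ready and Running.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY looks like "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Example usage (requires access to the cluster):
#   kubectl get pods -n default | not_ready_pods
```

An empty result means every listed pod is Running with all of its containers ready. If you only need to block until the pods are ready, `kubectl wait --for=condition=Ready pod --all -n default` is likely a simpler option.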
If you look at the individual service yamls, you will see that each one points to the OpenTelemetry collector on port 4317.
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://otelcollector:4317"
Port 4317 is the default port on which the OpenTelemetry Collector listens for OTLP telemetry over gRPC. All of the services should therefore point to the OTel collector on that port.
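If you have edited the manifests, it can be worth confirming that every service still points at the collector. The sketch below is an illustration under stated assumptions: `check_otlp_endpoints` is a hypothetical helper, and it assumes each manifest places the `value:` line directly after the `- name: OTEL_EXPORTER_OTLP_ENDPOINT` line, as in the snippet above.

```shell
#!/bin/sh
# Hypothetical helper: prints each *.yaml manifest in directory $1 that sets
# OTEL_EXPORTER_OTLP_ENDPOINT to something other than the expected endpoint
# $2 (default: http://otelcollector:4317). Manifests that do not set the
# variable at all are ignored.
# Assumes the "value:" line directly follows the "- name:" line.
check_otlp_endpoints() {
  dir="$1"
  expected="${2:-http://otelcollector:4317}"
  for f in "$dir"/*.yaml; do
    [ -f "$f" ] || continue
    awk -v want="\"$expected\"" '
      /name: OTEL_EXPORTER_OTLP_ENDPOINT/ {
        # Read the following line and check it carries the expected value.
        if ((getline line) <= 0 || index(line, want) == 0) bad = 1
      }
      END { exit (bad ? 1 : 0) }
    ' "$f" || echo "$f"
  done
}

# Example usage, run from the directory holding the service yamls:
#   check_otlp_endpoints .
```

No output means every manifest that sets the variable uses the expected endpoint.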
Step 3: Bring up the OpenTelemetry Collector pointing to Elastic
As you will see in the otelcollector.yaml file in the /deploy-with-collector-k8s directory, two variables need to be set in the configmap section.
exporters:
  otlphttp/elastic:
    endpoint: OTEL_EXPORTER_OTLP_ENDPOINT
    headers:
      Authorization: OTEL_EXPORTER_OTLP_HEADERS
OTEL_EXPORTER_OTLP_ENDPOINT is Elastic’s APM Server.
OTEL_EXPORTER_OTLP_HEADERS provides your authorization.
For more details on the variables, please review Elastic’s documentation on OTel collector configuration.
Where do you get these values?
In the Elastic Observability UI under APM, select +add data and the following screen will show up.
Go under OpenTelemetry:
You will see values for the variables OTEL_EXPORTER_OTLP_ENDPOINT (your Elastic APM Server endpoint) and OTEL_EXPORTER_OTLP_HEADERS (your authorization).
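Once you have both values, they need to replace the two placeholders in the configmap. You can do that in an editor; the sketch below shows one scripted alternative. It is an illustration, not the repository's tooling: `fill_collector_placeholders` is a hypothetical helper, and the endpoint and token in the usage comment are made-up examples, so copy the exact values shown on your own screen.

```shell
#!/bin/sh
# Hypothetical helper: prints a copy of the collector manifest $1 with the two
# placeholders replaced by the endpoint $2 and the Authorization value $3.
# The original file is left untouched.
# Assumes neither value contains "|", "&", or "\" (special to sed here).
fill_collector_placeholders() {
  sed -e "s|endpoint: OTEL_EXPORTER_OTLP_ENDPOINT|endpoint: \"$2\"|" \
      -e "s|Authorization: OTEL_EXPORTER_OTLP_HEADERS|Authorization: \"$3\"|" \
      "$1"
}

# Example usage with made-up values:
#   fill_collector_placeholders otelcollector.yaml \
#     "https://my-deployment.apm.example.com:443" \
#     "Bearer MY_SECRET_TOKEN" > otelcollector.filled.yaml
```

Writing to a separate file keeps the secret out of the checked-in manifest; treat the filled-in copy as sensitive.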
When configuring the OTel Collector with Elastic’s APM Server endpoint, there are two options: gRPC and http.
In the otelcollector.yaml here, the exporters are configured with http.
If you want to send to the APM server over gRPC instead, modify the exporters as follows:
exporters:
  otlp/elastic:
    endpoint: OTEL_EXPORTER_OTLP_ENDPOINT
    headers:
      Authorization: OTEL_EXPORTER_OTLP_HEADERS
Note the change from otlphttp to otlp. Once you make the needed changes as noted above, create the otelcollector:
kubectl create -f otelcollector.yaml
Ensure it's up and running properly.
mycomputer% kubectl get pods | grep otelcollector
otelcollector-5b87f4f484-4wbwn 1/1 Running 0 18d
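One detail is easy to miss when switching between the http and gRPC exporters: a Collector configuration refers to each exporter by name in its service pipelines, so a rename from otlphttp/elastic to otlp/elastic has to be mirrored there. The sketch below writes a minimal configuration of that shape to a file. It is an illustration only, not the contents of the repository's otelcollector.yaml: the helper name, the batch processor, the receiver endpoints, and the set of pipelines are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal Collector config to file $1 using the
# exporter name $2 (for example "otlphttp/elastic" or "otlp/elastic").
# The two placeholders are left untouched so they can be filled in afterwards.
write_collector_config() {
  cat > "$1" <<EOF
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
exporters:
  $2:
    endpoint: OTEL_EXPORTER_OTLP_ENDPOINT
    headers:
      Authorization: OTEL_EXPORTER_OTLP_HEADERS
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [$2]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [$2]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [$2]
EOF
}

# Example usage:
#   write_collector_config collector-config.yaml "otlp/elastic"
```

Passing the exporter name as a single argument keeps the definition and the three pipeline references in sync, which is the point of the sketch.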
Step 4: Open Kibana and use the APM Service Map to view your OTel instrumented Services
In the Elastic Observability UI under APM, select Service Map to see your services.
If you are seeing this, then the OpenTelemetry Collector is sending data into Elastic:
Congratulations, you've instrumented the HipsterShop demo application services for tracing using OpenTelemetry and successfully ingested the telemetry data into Elastic!
How to configure specific environments
Elastic APM allows you to have multiple applications ingested with the ability to filter based on Environment. Hence if you have dev team 1 and dev team 2 both using the UI, you will need to set the environment variable properly.
The environment for this application is set through the deployment.environment resource attribute in the service yamls.
To change it, edit OTEL_RESOURCE_ATTRIBUTES in each of the service yamls in the git repository for this blog's application. For example, change:
- name: OTEL_RESOURCE_ATTRIBUTES
  value: "service.name=recommendationservice,service.version=1.0.0,deployment.environment=MY-DEMO"
to:
- name: OTEL_RESOURCE_ATTRIBUTES
  value: "service.name=recommendationservice,service.version=1.0.0,deployment.environment=XXX"
To do this across all services, run the following:
sed -i 's/MY-DEMO/XXX/g' *.yaml
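A slightly more defensive variant of the same substitution is sketched below. It is not part of the repository: both helper names are hypothetical. It anchors the match on deployment.environment= so other occurrences of the old name are left alone, avoids `sed -i` (whose syntax differs between GNU and BSD sed), and prints the resulting environment per file so the change can be confirmed.

```shell
#!/bin/sh
# Hypothetical helper: replaces deployment.environment=$2 with
# deployment.environment=$3 in every *.yaml file in directory $1.
# Writes through a temporary file instead of using `sed -i`.
# Assumes the environment names contain no "/", "&", or "\".
set_deployment_environment() {
  for f in "$1"/*.yaml; do
    [ -f "$f" ] || continue
    sed "s/deployment\.environment=$2/deployment.environment=$3/g" "$f" > "$f.tmp" &&
      mv "$f.tmp" "$f"
  done
}

# Hypothetical helper: prints "<file>: <environment>" for each manifest in
# directory $1 that sets deployment.environment.
show_deployment_environment() {
  for f in "$1"/*.yaml; do
    [ -f "$f" ] || continue
    env_name=$(sed -n 's/.*deployment\.environment=\([^",]*\).*/\1/p' "$f" | head -n 1)
    if [ -n "$env_name" ]; then
      echo "$f: $env_name"
    fi
  done
}

# Example usage, run from the directory holding the service yamls:
#   set_deployment_environment . MY-DEMO XXX
#   show_deployment_environment .
```

After changing the manifests, the services need to be re-applied to the cluster for the new environment to show up in Elastic.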
Step 5: What can Elastic show me?
Now that the OpenTelemetry data is ingested into Elastic, what can you do?
First, you can view the APM service map (as shown in the previous step) — this will give you a full view of all the services and the transaction flows between services.
Next, you can now check out individual services, and the transactions being collected.
As you can see, the frontend details are listed, including:
- Average service latency
- Main transactions
- Failed transaction rate
Let’s get to the trace. In the transaction tab, you can review all the types of transactions related to the frontend service:
Selecting /cart/checkout transactions, we can see the full trace with all the spans:
Not only can you review the trace, but you can also analyze what is related to higher than normal latency for /cart/checkout.
Elastic uses machine learning to help identify any potential latency issues across the services from the trace. It’s as simple as selecting the Latency Correlations tab and running the correlation.
This shows that transactions from client 10.8.0.16 are potentially having abnormal latency for this transaction.
You can then drill down into logs directly from the trace view and review the logs associated with the trace to help identify and pinpoint potential issues.
Analyze your data with Elastic machine learning (ML)
Once OpenTelemetry metrics are in Elastic, start analyzing your data through Elastic’s ML capabilities.
A great review of these features can be found here: Correlating APM telemetry to determine root causes in transactions.
And there are many more videos and blogs on Elastic’s Blog.
We’ll follow up with additional blogs on leveraging Elastic’s machine learning capabilities for OpenTelemetry data.
I hope you’ve gotten an appreciation for how Elastic Observability can help you ingest and analyze OpenTelemetry data with Elastic’s APM capabilities.
A quick recap of what we covered:
- How to get a popular OTel instrumented demo app (HipsterShop) configured to ingest into Elastic Cloud through a few easy steps
- Some of the Elastic APM capabilities and features around OTel data, and what you can do with this data once it’s in Elastic
Ready to get started? Sign up for Elastic Cloud and try out the features and capabilities I’ve outlined above to get the most value and visibility out of your OpenTelemetry data.