OpenTelemetry adoption is rapidly increasing, and more and more companies rely on OpenTelemetry to collect observability data. While OpenTelemetry offers clear specifications and semantic conventions to guide telemetry data collection, it also introduces significant flexibility. With high flexibility comes high responsibility — many things can go wrong with OTel-based data collection, easily resulting in mediocre or low-quality telemetry. Poor data quality can hinder backend analysis, confuse users, and degrade system performance. To unlock actionable insights from OpenTelemetry data, maintaining high data quality is essential. The Instrumentation Score initiative addresses this challenge by providing a standardized way to measure OpenTelemetry data quality. Although the specification and tooling are still evolving, the underlying concepts are already compelling. In this blog post, I'll share my experience experimenting with the Instrumentation Score concept and demonstrate how to use the Elastic Stack — utilizing ES|QL, Kibana Task Manager, and Dashboards — to build a POC for data quality analysis based on this approach within Elastic Observability.
Instrumentation Score - The Power of Rule-based Data Quality Analysis
When you first hear the term "Instrumentation Score", your initial reaction might be: "OK, there's a single, percentage-like metric that tells me my instrumentation (i.e. OTel data) has a score of 60 out of 100. So what? How does it help me?"
However, the Instrumentation Score is much more than just a single number. Its power lies in the individual rules from which the score is calculated. The rule definitions come with descriptions, rationales, impact levels, and evaluation criteria that turn a bare number into concrete, actionable guidance.
As I explored the Instrumentation Score concepts, I developed the following mental model for deriving actionable insights.
The Score
The score itself is an indicator of the quality of your telemetry data. The lower the number, the more room there is for improving your data quality. As a rule of thumb, if a score falls below 75, you should consider fixing your instrumentation and data collection.
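Conceptually, the score aggregates the binary results of all applicable rules, with more severe rules carrying more weight. The following is a minimal sketch of such a calculation, not the normative definition (the exact weighting scheme is defined by the evolving spec):

\[
\text{score} = 100 \times \frac{\sum_{r \in \text{passed}} w_r}{\sum_{r \in \text{applicable}} w_r}
\]

Here, w_r stands for a weight derived from rule r's impact level. A score of 100 then means every applicable rule passed, and each failing rule pulls the score down in proportion to its weight.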
Breakdown by Instrumentation Score Rules
Exploring the evaluation results of individual Instrumentation Score rules will give you insights into what is wrong with your data quality. In addition, the rules' rationales explain why the violation of a rule is problematic.
As an example, let's take the rule requiring that traces contain no orphan spans:
Description: Traces do not contain orphan spans.

Rationale: Orphaned spans indicate potential issues in tracing instrumentation or data integrity. This can lead to incomplete or misleading trace data, hindering effective troubleshooting and performance analysis.
If your data violates this rule, the rationale immediately explains why that is problematic and where to look first: in this case, at your tracing instrumentation and the integrity of your trace data pipeline.
Breakdown by Services
When you have a large system with hundreds or maybe even thousands of entities (such as services, Kubernetes pods, etc.), a binary signal on all of the data — such as "has a certain rule been passed or not" — is not really actionable. Is the data from all services violating a certain rule, or just a small subset of services?
Breaking down rule evaluation by services (and potentially other entity types) may help you identify where the data quality issues come from. For example, let's assume only one service violates a given rule. In that case, you know exactly which service's instrumentation to inspect, rather than suspecting your entire system.
Once you know which services (or other entities) violate which Instrumentation Score rules, you're very close to actionable insights. However, there are two more things that I found to be extremely useful for data quality analysis when I was experimenting with the Instrumentation Score evaluation: (1) a quantitative indication of the extent, and (2) concrete examples of rule violation occurrences in your data.
Quantifying the Rule Violation Extent
The Instrumentation Score spec already defines an impact level for each rule, indicating how severe a violation of that rule is. What the impact level doesn't tell you is how widespread the violation is in your data: does it affect a handful of documents or nearly all of them?
Hence, having a quantitative indication of the extent of a rule violation per service — e.g. "40% of your traces violate this rule" — makes it much easier to prioritize and scope the fix.
Tangible Examples
Finally, nothing is as meaningful and self-explanatory as tangible, concrete examples from your own data. If the telemetry data of your service violates a rule, a sampled example occurrence (an actual log record or trace from your own system) makes the problem immediately understandable and gives you a concrete starting point for debugging.
Instrumentation Score with Elastic
The Instrumentation Score spec does not prescribe tool usage or implementation details for the calculation of the score and evaluation of the rules. This allows for integrating the Instrumentation Score concept with whatever backend your OpenTelemetry data is being sent to.
With the goal of building a POC for an end-to-end integration of the Instrumentation Score with Elastic Observability, I combined the powerful capabilities of ES|QL with Kibana's task manager and dashboarding features.
Each Instrumentation Score rule can be formulated as an ES|QL query that covers the steps described above:
- rule passed or not
- breakdown by services
- calculation of the extent
- sampling of an example occurrence
Here is an example query for a rule that checks that log records have a severity set:
FROM logs-*.otel-* METADATA _id
| WHERE data_stream.type == "logs"
  AND @timestamp > NOW() - 1 hour
// flag log records that have no severity set
| EVAL no_sev = severity_number IS NULL OR severity_number == 0
| STATS
    logs_wo_severity = COUNT(*) WHERE no_sev,  // violating log records
    example = SAMPLE(_id, 1) WHERE no_sev,     // one concrete example occurrence
    total = COUNT(*)                           // all log records of the service
  BY service.name                              // breakdown by service
| EVAL rule_passed = (logs_wo_severity == 0),
  // cast to double to avoid integer division when computing the ratio
  extent = CASE(total != 0, TO_DOUBLE(logs_wo_severity) / total, 0.0)
| KEEP rule_passed, service.name, example, extent
These rule evaluation queries are wrapped in a Kibana background task, scheduled via Kibana's Task Manager, that periodically executes them and writes the evaluation results into dedicated Elasticsearch indices.
With the results stored in dedicated Elasticsearch indices, we can build Dashboards to visualize the Instrumentation Score insights and allow users to troubleshoot their data quality issues.
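To make this concrete, here is a minimal sketch of the kind of ES|QL query such a dashboard can run on top of the stored results. The index name instrumentation-score-results and the field names rule_passed and service.name below are illustrative assumptions from my POC setup, and the score is computed as a simplified, unweighted variant:

FROM instrumentation-score-results
| WHERE @timestamp > NOW() - 1 hour
// count passed vs. evaluated rules per service
| STATS rules_passed = COUNT(*) WHERE rule_passed,
    rules_total = COUNT(*)
  BY service.name
// unweighted per-service score in percent
| EVAL service_score = ROUND(100.0 * rules_passed / rules_total, 1)
| SORT service_score ASC
| KEEP service.name, service_score

Sorting ascending surfaces the services with the worst scores first, which is usually what you want on a troubleshooting dashboard.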
In this POC, I implemented a subset of the Instrumentation Score rules to prove out the approach.
The Instrumentation Score concept accommodates extension with your own custom rules. I did that in my POC as well, to test some quality rules that are not yet formalized in the Instrumentation Score spec but are important for Elastic Observability to provide maximum value from the OTel data.
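As an illustration of such a custom rule, consider a check that services report a meaningful service.name instead of the SDK's unknown_service fallback. The following is a sketch of how that could look in ES|QL; it is not one of the official spec rules:

FROM logs-*.otel-*, traces-*.otel-* METADATA _id
| WHERE @timestamp > NOW() - 1 hour
// OTel SDKs fall back to "unknown_service:<process>" when service.name is unset
| EVAL unknown = service.name IS NULL OR STARTS_WITH(service.name, "unknown_service")
| STATS violations = COUNT(*) WHERE unknown,
    example = SAMPLE(_id, 1) WHERE unknown,
    total = COUNT(*)
| EVAL rule_passed = violations == 0,
  extent = CASE(total != 0, TO_DOUBLE(violations) / total, 0.0)
| KEEP rule_passed, violations, extent, example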
Applying the Instrumentation Score on the OpenTelemetry Demo
The OpenTelemetry Demo is the most-used environment to play around with and showcase OpenTelemetry capabilities. Initially, I thought the demo would be the worst environment to test my Instrumentation Score implementation. After all, it's the showcase environment for OpenTelemetry, and I expected it to have an Instrumentation Score close to 100. Surprisingly, that wasn't the case.
Let's start with the overview.
The Overview
This dashboard shows an overview of the Instrumentation Score results for the OpenTelemetry Demo environment. The first thing you might notice is the very low overall score, even though the individual service scores are not that bad. How can that be?
The main reason is that Instrumentation Score rules have, by definition, a binary result — passed or failed. So it can happen that each service fails a single, but distinct, rule. In that case, each service's score is not perfect but also not too bad. From the overall perspective, however, many rules have failed (each for a different service), leading to a very low overall score. With 11 services and 11 rules, for example, each service could pass 10 out of 11 rules while every single rule still fails at the overall level.
In the table on the right, we see the results for the individual rules with their description, impact level, and example occurrences. We see that 7 out of 11 implemented rules have failed. Let's pick our favorite example from earlier — the orphan spans rule.
With the dashboard indicating that the orphan spans rule has failed, the next question is where exactly the problem lies.
For further analysis, we have two ways to drill down: (1) into a specific rule to see which services violate a specific rule, or (2) into a specific service to see which rules are violated by that service.
Rule Drilldown
The following dashboard shows a detailed view of the evaluation results for an individual rule. In this case, we selected the orphan spans rule.
In addition to the rule's meta information, such as its description, rationale, and criteria, we see some statistics on the right. For example, we see that 2 services have failed that rule, 16 passed, and for 19 services this rule is not applicable (e.g., because those don't emit tracing data). In the table below, we see which two services are impacted by this rule violation.
With that view, we now know that the rule violation is limited to two specific services. But are those services violating other rules as well?
Service Drilldown
To answer the above question, we can switch to the service drilldown dashboard.
In this dashboard, we see similar information as on the overview dashboard, but filtered down to a single selected service.
As you can see, with the Instrumentation Score concept and a few different breakdown views, we were able to pinpoint data quality issues and identify which services and instrumentations need improvement to fix the issues.
Learnings and Observations
My experimentation with the Instrumentation Score was very insightful and showed me the power of this concept — though it's still in its early phase. It is particularly insightful if the implementation and calculation include breakdowns by meaningful entities, such as services, K8s pods, hosts, etc. With such a breakdown, you can narrow down data quality issues to a manageable scope, instead of having to sift through huge amounts of data and entities.
Furthermore, I realized that having some notion of problem extent (per rule and service), as well as concrete examples, helps make the problem more tangible.
Thinking further about the idea of rule violation extent, I could even imagine it flowing into the score calculation itself, so that the score reflects not only whether a rule failed but also how widespread the failure is.
Conclusion
The Instrumentation Score is a powerful approach to ensuring a high level of data quality with OpenTelemetry. With a proper implementation of the rules and score calculation, users can easily get actionable insights into what they need to fix in their instrumentation and data collection. The Instrumentation Score rules are still at an early stage and are being steadily improved and extended. I'm looking forward to what the community will build in the scope of this project in the future, and I hope to intensify my contributions as well.

Thank you to the maintainers — Antoine Toulme, Daniel Gomez Blanco, Juraci Paixão Kröhling, and Michele Mancioppi — for bringing this great project to life, and to all the contributors for their participation!
