How we fixed head-based sampling in OpenTelemetry

Head-based sampling can break throughput charts without sampling metadata. Learn how OpenTelemetry tracestate probability fields fixed this in Java, JS, and Python.


Head-based sampling in OpenTelemetry is cheap and practical, but it used to create a major analytics problem: sampled traces reduced raw span counts, so backend throughput charts became wrong. The fix was to carry sampling probability in tracestate so a backend can estimate how many original traces each sampled trace represents. This article explains the problem, the spec, and how we implemented the fix in OpenTelemetry Java, JavaScript, and Python.

Why sampling creates a throughput problem

Most production systems sample traces because sending every span is expensive. Two common approaches are:

  • Head-based sampling: decide at trace start whether to keep or drop the trace.
  • Tail-based sampling: decide later, after seeing more or all spans from the trace.

Head-based sampling is fast and low-cost because it decides early. But if your backend only sees 10% of traces, a naive throughput chart built from ingested traces can undercount real traffic by 10x.

In other words, without extra metadata, sampled telemetry loses the context needed to reconstruct volume-based metrics.

The OpenTelemetry spec that solves it

The OpenTelemetry specification defines a way to encode probability sampling information in tracestate:

At a high level, the sampler writes enough information into tracestate for downstream systems to understand the effective sampling probability of a trace. When this metadata is present and propagated correctly, throughput and rate-oriented analytics can stay accurate while still getting the cost benefits of head-based sampling. Elastic Observability supports this spec and behavior out of the box. If you use Elastic's distribution (EDOT) SDKs or correctly configure the upstream OpenTelemetry SDKs as described below, Elastic can estimate the original throughput metrics from sampled data.

For example, a sampled span might carry this entry:

tracestate: ot=th:fd70a4;rv:fe123456789abc

Using the spec rules:

  • th is the rejection threshold (T) with trailing zeros removed.
  • rv is the 56-bit randomness value (R).
  • A participant keeps the span when R >= T.

Here, th:fd70a4 expands to T = 0xfd70a400000000, and rv gives R = 0xfe123456789abc, so the span is kept because R >= T.

In decimal, that is:

  • T = 0xfd70a400000000 = 71,337,018,784,743,424
  • R = 0xfe123456789abc = 71,514,660,082,850,492
  • 2^56 = 72,057,594,037,927,936

Since 71,514,660,082,850,492 >= 71,337,018,784,743,424, this trace is sampled.

The backend can convert T into a representative count (the adjusted count):

probability = (2^56 - T) / 2^56
adjusted_count = 1 / probability = 2^56 / (2^56 - T)

Plugging in the values:

probability = (72,057,594,037,927,936 - 71,337,018,784,743,424) / 72,057,594,037,927,936
            = 720,575,253,184,512 / 72,057,594,037,927,936
            ~= 0.01

adjusted_count = 1 / 0.01 = 100

So in practice this is approximately 1% sampling, and each sampled span represents about 100 original spans.

That means a backend can do weighted calculations, for example:

extrapolated_throughput = sampled_throughput * adjusted_count
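Putting the pieces together, the tracestate parsing and weighting math above can be sketched in a few lines of Python. The helper names here are illustrative, not part of any OpenTelemetry SDK:

```python
# Sketch of the backend-side math from the worked example above.
# parse_ot_tracestate and expand_threshold are illustrative helpers,
# not part of any OpenTelemetry SDK.

def parse_ot_tracestate(value: str) -> dict:
    """Split an 'ot' tracestate value such as 'th:fd70a4;rv:fe123456789abc'."""
    return dict(part.split(":", 1) for part in value.split(";"))

def expand_threshold(th: str) -> int:
    """Restore the trailing zeros that the 'th' encoding removes (14 hex digits)."""
    return int(th.ljust(14, "0"), 16)

fields = parse_ot_tracestate("th:fd70a4;rv:fe123456789abc")
T = expand_threshold(fields["th"])   # 0xfd70a400000000
R = int(fields["rv"], 16)            # 0xfe123456789abc

sampled = R >= T                      # the keep/drop rule from the spec
probability = (2**56 - T) / 2**56     # effective sampling probability (~0.01)
adjusted_count = 2**56 / (2**56 - T)  # spans each sampled span represents (~100)
```

Running this on the example entry yields sampled = True, a probability of roughly 0.01, and an adjusted count of roughly 100, matching the arithmetic above.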

What was missing before

Until recently, OpenTelemetry SDK users had the spec but no out-of-the-box implementation in the major SDKs. As a result, spans received at a backend didn't carry the sampling metadata that would allow the backend to estimate the original trace volume. In practice, this meant teams adopting head-based sampling had limited options:

  1. accept skewed throughput numbers,
  2. build custom sampler logic, or
  3. switch to more complex sampling setups.

For many teams, that made standard head-based sampling much less useful than it should have been.

The fix: implementation across Java, JavaScript, and Python

We implemented the spec-aligned behavior in three SDKs so teams can use standardized sampling metadata instead of custom workarounds.

All three PRs implement the composite/probability sampling behavior, so the root sampling decision is encoded in tracestate and can be preserved across service boundaries.

What changed conceptually

The important shift is not "sample more" or "sample less". It is:

  • keep probabilistic head-based sampling,
  • propagate probability metadata with the trace,
  • let backends compute weighted rates from sampled data.

This keeps ingestion costs manageable and restores correct aggregate analysis.

Implementation walkthrough

The exact API shape differs by language and release, but the rollout pattern is similar:

  1. Use the SDK sampler implementation that supports the probability/composite spec behavior.
  2. Keep W3C Trace Context propagation enabled so tracestate moves across services.
  3. Validate in your backend that throughput/rate charts use weighted interpretation when sampling metadata exists.
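Step 3 boils down to summing adjusted counts instead of raw span counts. A minimal sketch of that weighted interpretation, with a hypothetical record layout:

```python
# Hypothetical span records; in practice the adjusted count is derived
# from the 'th' field each span carries in tracestate.
spans = [
    {"name": "GET /checkout", "adjusted_count": 100},
    {"name": "GET /checkout", "adjusted_count": 100},
    {"name": "GET /search", "adjusted_count": 20},
]

raw_count = len(spans)                                    # naive count: 3
weighted_count = sum(s["adjusted_count"] for s in spans)  # extrapolated: 220
```

A naive chart would report 3 spans here, while the weighted interpretation estimates 220 original spans.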

Java

If you are using the Elastic Distribution of OpenTelemetry Java (EDOT Java), the sampler is already configured to use the probability/composite spec behavior by default. Out of the box, EDOT samples 100% of traces. You can change the sampling rate by setting sampling_rate in central configuration, or via the otel.traces.sampler.arg Java system property / OTEL_TRACES_SAMPLER_ARG environment variable.

For the upstream OTel Java SDK, configure the sampler as follows:

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.extension.incubator.trace.samplers.ComposableSampler;
import io.opentelemetry.sdk.extension.incubator.trace.samplers.CompositeSampler;
import io.opentelemetry.sdk.trace.SdkTracerProvider;

// Use a sampling ratio. For example, 10% sampling:
double ratio = 0.1;

SdkTracerProvider tracerProvider =
    SdkTracerProvider.builder()
        .setSampler(
            CompositeSampler.wrap(
                ComposableSampler.parentThreshold(
                    ComposableSampler.probability(ratio)
                )
            )
        )
        // (other configuration, e.g., span processor, exporter)
        .build();

OpenTelemetry openTelemetry =
    OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        .build();

Tracer tracer = openTelemetry.getTracer("my-instrumentation-library");

// You can now start spans with the configured sampler.
// Example:
tracer.spanBuilder("example-span").startSpan();

The example above shows how to configure head-based probability sampling using the OpenTelemetry Java SDK. Let’s break down the key parts:

  • Importing necessary classes: The imports bring in the required OpenTelemetry APIs and sampler extensions.
  • Setting the sampling ratio: The ratio variable controls the fraction of traces sampled (for example, 0.1 for 10%).
  • Sampler configuration:
    • CompositeSampler and ComposableSampler are used to set up a sampler that follows the OpenTelemetry specification for composite samplers, enabling more accurate probability-based head sampling.
    • ComposableSampler.probability(ratio) specifies that traces are sampled at the configured ratio.
    • ComposableSampler.parentThreshold(...) ensures parent sampling decisions are respected, which keeps trace context consistent across service boundaries.
    • Wrapping this in CompositeSampler.wrap(...) gives you a sampler compliant with the latest spec.
    • In current OTel Java, sampled root spans emit the th value in tracestate, and rv is preserved when it is already present from upstream context.
  • Tracer provider and OpenTelemetry setup:
    • The configured sampler is attached to the SdkTracerProvider which is then built into the OpenTelemetrySdk instance.
  • Using the configured tracer:
    • When you build a span (like tracer.spanBuilder("example-span").startSpan()), the SDK applies your sampling policy as you generate traces.

This pattern ensures that your head-based sampler not only controls costs (by sampling only a percentage of traces) but also carries and respects sampling metadata. This, in turn, enables downstream backends (like Elastic or any OTel-compliant backend) to correctly calculate throughput/volume metrics, accounting for sampling, and provide more accurate operational measurements.

Node.js

If you are using the Elastic Distribution of OpenTelemetry Node.js (EDOT Node.js), the sampler is already configured to use the probability/composite spec behavior by default. Out of the box, EDOT samples 100% of traces. You can change the sampling rate by setting sampling_rate in central configuration or via the OTEL_TRACES_SAMPLER_ARG environment variable.

For the upstream OTel JavaScript SDK, configure the sampler as follows:

const { NodeSDK } = require('@opentelemetry/sdk-node');
const {
  createCompositeSampler,
  createComposableParentThresholdSampler,
  createComposableTraceIDRatioBasedSampler,
} = require('@opentelemetry/sampler-composite');

// Example: sample 10% of new root traces and preserve parent decisions.
const sampler = createCompositeSampler(
  createComposableParentThresholdSampler(
    createComposableTraceIDRatioBasedSampler(0.1)
  )
);

const sdk = new NodeSDK({ sampler });
sdk.start();

This JavaScript snippet demonstrates how to configure head-based probability sampling using the upstream OpenTelemetry JavaScript SDK with the new composite sampler specification.

  • Imports:
    The code imports utility functions—createCompositeSampler, createComposableParentThresholdSampler, and createComposableTraceIDRatioBasedSampler—from the @opentelemetry/sampler-composite extension, which implements the spec-compliant composable sampling logic.

  • Sampler configuration:
    The sampler is constructed to:

    • Use createComposableTraceIDRatioBasedSampler(0.1) to sample 10% of all (root) traces.
    • Wrap this in createComposableParentThresholdSampler, so sampling respects the decision made by any parent span that might come from upstream (preserving distributed trace context).
    • Finally, the whole structure is wrapped in createCompositeSampler, which puts it into the form expected by the OTel SDK.
  • Usage:
    The sampler is passed to the NodeSDK from @opentelemetry/sdk-node at startup. After this, all spans you create in your application will follow this sampling logic.

This approach enables accurate, head-based sampling in OpenTelemetry JavaScript, following the latest OTel specification. It ensures your traces are sampled at the rate you set, while also propagating sampling-related metadata in the tracestate to downstream services and telemetry backends (e.g., Elastic, Jaeger). This is critical for volume adjustment, accurate metrics, and cost control in distributed tracing environments.

Python

If you are using the Elastic Distribution of OpenTelemetry Python (EDOT Python), the sampler is already configured to use the probability/composite spec behavior by default. Out of the box, EDOT samples 100% of traces. You can change the sampling rate by setting sampling_rate in central configuration or via the OTEL_TRACES_SAMPLER_ARG environment variable.

For the upstream OTel Python SDK, register a custom sampler entry point and point OTEL_TRACES_SAMPLER to it:

# pyproject.toml
[project.entry-points.opentelemetry_traces_sampler]
parentbased_composite = "your_package.sampling:ParentBasedCompositeSampler"

# your_package/sampling.py
from __future__ import annotations

from typing import Sequence

from opentelemetry.context import Context
from opentelemetry.sdk.trace._sampling_experimental import (
    composable_parent_threshold,
    composable_traceid_ratio_based,
    composite_sampler,
)
from opentelemetry.sdk.trace.sampling import Sampler, SamplingResult
from opentelemetry.trace import Link, SpanKind, TraceState
from opentelemetry.util.types import Attributes


class ParentBasedCompositeSampler(Sampler):
    # The SDK passes OTEL_TRACES_SAMPLER_ARG as this constructor argument.
    def __init__(self, ratio_str: str | None):
        try:
            ratio = float(ratio_str) if ratio_str else 1.0
        except ValueError:
            ratio = 1.0
        self._delegate = composite_sampler(
            composable_parent_threshold(composable_traceid_ratio_based(ratio))
        )

    def should_sample(
        self,
        parent_context: Context | None,
        trace_id: int,
        name: str,
        kind: SpanKind | None = None,
        attributes: Attributes | None = None,
        links: Sequence[Link] | None = None,
        trace_state: TraceState | None = None,
    ) -> SamplingResult:
        return self._delegate.should_sample(
            parent_context,
            trace_id,
            name,
            kind,
            attributes,
            links,
            trace_state,
        )

    def get_description(self) -> str:
        return self._delegate.get_description()

Then select this sampler via environment variables:

export OTEL_TRACES_SAMPLER=parentbased_composite
export OTEL_TRACES_SAMPLER_ARG=0.10

This Python snippet demonstrates how to configure probability-based head sampling in the OpenTelemetry Python SDK using a custom sampler that supports the tracestate probability propagation spec.

Here's what happens in the example:

  • It imports experimental sampling APIs from opentelemetry.sdk.trace._sampling_experimental to build a composite sampler that encodes the sampling probability in the tracestate of each root span. This supports backend throughput correction.
  • The ParentBasedCompositeSampler class is a wrapper you can plug into the SDK. Its constructor accepts the sampling probability as a string (from OTEL_TRACES_SAMPLER_ARG), defaulting to 1.0 (100% sampling) when the value is missing or invalid.
  • composite_sampler(composable_parent_threshold(composable_traceid_ratio_based(ratio))) builds a sampler that:
    1. Uses the probability to sample root spans.
    2. Propagates the root sampling decision for child spans via parent-based threshold logic.
    3. Embeds and respects the OpenTelemetry probability fields in tracestate.
  • The example then shows the required environment variables to enable this sampling logic:
    export OTEL_TRACES_SAMPLER=parentbased_composite
    export OTEL_TRACES_SAMPLER_ARG=0.10
    
    With these values, you enable parent-based composite sampling with a 10% sampling rate.

This snippet enables standards-compliant probability sampling and propagation in OpenTelemetry Python, so throughput metrics can be accurately estimated by your backend based on the tracestate metadata.

Validation

To validate your setup and ensure accurate throughput metrics when using head-based sampling, follow these steps:

  1. Deploy the SDK in a controlled environment where you can manage both load and sampling rate.
  2. Generate a steady, predictable load with a known throughput.
  3. Set a fixed sampling rate for traces.
  4. Send the resulting telemetry data to Elastic Observability.
  5. Confirm that the reported throughput metrics in Elastic match your expectations.

You can use the following ES|QL commands to compare the observed raw throughput (counted from sampled traces) to the extrapolated throughput available in the derived metrics:

Raw throughput:

FROM traces-* 
| WHERE service.name == "your-service"
| WHERE transaction.name IS NOT NULL
| STATS count_transactions = COUNT(*),
    time_range = DATE_DIFF("minute", MIN(@timestamp), MAX(@timestamp))
| EVAL raw_throughput_per_min = count_transactions::double / time_range

Extrapolated throughput:

FROM metrics-*
| WHERE service.name == "your-service"
| WHERE metricset.name == "service_transaction" AND metricset.interval == "1m"
| STATS count_transactions = COUNT(transaction.duration.summary),
    time_range = DATE_DIFF("minute", MIN(@timestamp), MAX(@timestamp))
| EVAL extrapolated_throughput_per_min = count_transactions::double / time_range

The extrapolated_throughput_per_min should be close to your real throughput rate, while the raw_throughput_per_min should be close to your real throughput multiplied by the configured sampling rate.
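As a back-of-the-envelope sanity check before comparing the two queries, you can compute what each number should roughly be; the load and sampling rate below are illustrative:

```python
# Illustrative numbers: a steady 600 transactions/min load sampled at 10%.
real_throughput_per_min = 600.0
sampling_rate = 0.10

expected_raw = real_throughput_per_min * sampling_rate  # what traces-* should show
adjusted_count = 1 / sampling_rate                      # weight per sampled trace
expected_extrapolated = expected_raw * adjusted_count   # what metrics-* should show
```

With these inputs, the raw query should land near 60 transactions/min and the extrapolated query near the real 600 transactions/min.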

Conclusion

This work turns head-based sampling into a much safer default for teams that need both cost control and reliable operational metrics. You no longer have to choose between affordable trace volume and trustworthy throughput calculations.

Before enabling head-based sampling with probability propagation, review the linked PRs and consult each SDK’s release notes to confirm the minimum version with full support (Java, JavaScript, and Python). Start by enabling sampling-aware configuration in a single service, and once you verify correct tracestate propagation and backend metric accuracy, gradually roll out the change to additional services. To maintain reliability, establish monitoring and automated regression tests for throughput accuracy, so you can spot any unintended metric drift when sampling rates or SDK components are updated.
