---
title: Tail-based sampling
description: Tail-based sampling configuration options. See Tail-based sampling to learn more. Set to true to enable tail based sampling. Disabled by default. (bool)...
url: https://www.elastic.co/docs/solutions/observability/apm/apm-server/tail-based-sampling
products:
  - APM
  - Elastic Observability
applies_to:
  - Elastic Stack: Generally available
---

# Tail-based sampling
<note>
  ![supported deployment methods](https://www.elastic.co/docs/solutions/images/observability-binary-yes-fm-yes.svg) Most options on this page are supported by all APM Server deployment methods when writing to Elasticsearch. If you are using a different [output](https://www.elastic.co/docs/solutions/observability/apm/apm-server/configure-output), tail-based sampling is *not* supported.
</note>

<note>
  Enhanced privileges are required to use tail-based sampling. For more information, refer to [Create a tail-based sampling role](/docs/solutions/observability/apm/create-assign-feature-roles-to-apm-server-users#apm-privileges-tail-based-sampling).
</note>

<note>
  If you're manually configuring `LimitNOFILE` or `LimitNOFILESoft` in systemd and using tail-based sampling, the APM Server process might encounter a `too many open files` error. Refer to [Configuring the `nofile` limit](/docs/solutions/observability/apm/apm-server/systemd#configuring-nofile-limit) for more information.
</note>

<note>
  If you're running the APM Server binary standalone (not using the provided deb, RPM packages, or Docker image), you might need to adjust the `nofile` limit based on your throughput requirements. Refer to [Modify the `nofile` ulimit](/docs/solutions/observability/apm/apm-server/binary#modify-nofile-ulimit) for guidance.
</note>

The following tail-based sampling configuration options are available.
<tab-set>
  <tab-item title="APM Server binary">
    **Example config file:**
    ```yaml
    apm-server:
      sampling:
        tail:
          enabled: true
          interval: 1m
          storage_limit: 0GB
          policies:
            - sample_rate: 1.0
              trace.outcome: failure
            - sample_rate: 0.1
    ```
  </tab-item>

  <tab-item title="Fleet-managed">
    Configure and customize Fleet-managed APM settings directly in Kibana:
    1. In Kibana, find **Fleet** in the main menu or use the [global search field](https://www.elastic.co/docs/explore-analyze/find-and-organize/find-apps-and-objects).
    2. Under the **Agent policies** tab, select the policy you would like to configure.
    3. Find the Elastic APM integration and select **Actions** > **Edit integration**.
    4. Look for these options under **Tail-based sampling**.
  </tab-item>
</tab-set>


## Top-level tail-based sampling settings

See [Tail-based sampling](/docs/solutions/observability/apm/transaction-sampling#apm-tail-based-sampling) to learn more.

### Enable tail-based sampling

Set to `true` to enable tail-based sampling. Disabled by default. (bool)

|                   |                                    |
|-------------------|------------------------------------|
| APM Server binary | `apm-server.sampling.tail.enabled` |
| Fleet-managed     | `Enable tail-based sampling`       |


### Interval

Synchronization interval for multiple APM Servers. Should be on the order of tens of seconds or low minutes. Default: `1m` (1 minute). (duration)

|                   |                                     |
|-------------------|-------------------------------------|
| APM Server binary | `apm-server.sampling.tail.interval` |
| Fleet-managed     | `Interval`                          |


### TTL

Time-to-live (TTL) for trace events stored in the local storage of the APM Server during tail-based sampling. This TTL determines how long trace events are retained in the local storage while waiting for a sampling decision to be made. A greater TTL value increases storage space requirements. The TTL should be at least twice the interval (`apm-server.sampling.tail.interval`).
Default: `30m` (30 minutes). (duration)

|                                                                                     |                                |
|-------------------------------------------------------------------------------------|--------------------------------|
| APM Server binary                                                                   | `apm-server.sampling.tail.ttl` |
| Fleet-managed <applies-to>Elastic Stack: Generally available since 9.1</applies-to> | `TTL`                          |
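
As an illustration of how the two settings relate, the following fragment for the APM Server binary pairs a 2-minute interval with a 10-minute TTL, which satisfies the recommendation that the TTL be at least twice the interval. The values are illustrative assumptions, not tuning advice, and other settings such as `policies` are omitted for brevity.

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      # Illustrative value: synchronization interval for multiple APM Servers.
      interval: 2m
      # Illustrative value: 10m is at least 2 * interval (4m).
      # A larger TTL retains events longer and needs more storage space.
      ttl: 10m
```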


### Policies

Criteria used to match a root transaction to a sample rate.
Policies map trace events to a sample rate. Each policy must specify a sample rate. Trace events are matched to policies in the order specified. All policy conditions must be true for a trace event to match. Each policy list should conclude with a policy that only specifies a sample rate. This final policy is used to catch remaining trace events that don’t match a stricter policy. (`[]policy`)

|                   |                                     |
|-------------------|-------------------------------------|
| APM Server binary | `apm-server.sampling.tail.policies` |
| Fleet-managed     | `Policies`                          |
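
Because policies are evaluated in order, their position in the list matters. The sketch below, for the APM Server binary, shows one possible ordering; the service name `checkout-service` and the sample rates are hypothetical values chosen only for illustration.

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      policies:
        # Evaluated first: keep every trace whose outcome is failure.
        - sample_rate: 1.0
          trace.outcome: failure
        # Evaluated second: keep half of the remaining traces
        # for a hypothetical service.
        - sample_rate: 0.5
          service.name: checkout-service
        # Final catch-all: specifies only a sample rate, so it matches
        # any trace event not matched by a stricter policy above.
        - sample_rate: 0.1
```

With this ordering, a failed trace from `checkout-service` matches the first policy and is sampled at 100%. Moving the `service.name` policy to the top would instead sample that trace at 50%.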


### Discard On Write Failure

Defines the indexing behavior when trace events fail to be written to storage (for example, when the storage limit is reached). When set to `false`, traces bypass sampling and are always indexed, which significantly increases the indexing load. When set to `true`, traces are discarded, causing data loss which can result in broken traces.
Default: `false`. (bool)

|                                                                                     |                                                     |
|-------------------------------------------------------------------------------------|-----------------------------------------------------|
| APM Server binary                                                                   | `apm-server.sampling.tail.discard_on_write_failure` |
| Fleet-managed <applies-to>Elastic Stack: Generally available since 9.1</applies-to> | `Discard On Write Failure`                          |
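
A minimal sketch of opting into discarding for the APM Server binary is shown below. It assumes an APM Server version that supports this setting, and other settings such as `policies` are omitted for brevity.

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      # Discard trace events that cannot be written to local storage
      # instead of indexing them unsampled. This limits the indexing
      # load but can cause data loss and broken traces.
      discard_on_write_failure: true
```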


### Storage limit

The amount of storage space allocated for trace events matching tail sampling policies. Caution: Setting this limit higher than the allowed space may cause APM Server to become unhealthy.
A value of `0GB` (or equivalent) does not set a concrete limit, but rather allows APM Server to align its disk usage with the disk size. In this mode, APM Server uses up to 80% of the disk where the local tail-based sampling database is located and leaves the remaining 20% unused. `0GB` is the recommended value because it automatically scales with the disk size.
If this is not desired, a concrete `GB` value can be set for the maximum amount of disk used for tail-based sampling.
If the configured storage limit is insufficient, APM Server logs "configured limit reached". When the storage limit is reached, events are indexed or discarded based on the [Discard On Write Failure](#sampling-tail-discard-on-write-failure-ref) configuration.
Default: `0GB`. (text)

|                   |                                          |
|-------------------|------------------------------------------|
| APM Server binary | `apm-server.sampling.tail.storage_limit` |
| Fleet-managed     | `Storage limit`                          |
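
If automatic scaling with the disk size is not desired, a fixed limit might look like the fragment below. The `10GB` figure is an arbitrary assumption chosen for illustration; size it for the disk that holds the local tail-based sampling database. Other settings such as `policies` are omitted for brevity.

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      # Illustrative fixed limit. The default, 0GB, sets no concrete
      # limit and lets APM Server use up to 80% of the disk.
      storage_limit: 10GB
```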


## Policy-level tail-based sampling settings

See [Tail-based sampling](/docs/solutions/observability/apm/transaction-sampling#apm-tail-based-sampling) to learn more.

### **`sample_rate`**

The sample rate to apply to trace events matching this policy. Required in each policy.
The sample rate must be greater than or equal to `0` and less than or equal to `1`. For example, a `sample_rate` of `0.01` means that 1% of trace events matching the policy will be sampled. A `sample_rate` of `1` means that 100% of trace events matching the policy will be sampled. (float)

### **`trace.name`**

The trace name for events to match a policy. A match occurs when the configured `trace.name` matches the `transaction.name` of the root transaction of a trace. A root transaction is any transaction without a `parent.id`. (string)

### **`trace.outcome`**

The trace outcome for events to match a policy. A match occurs when the configured `trace.outcome` matches a trace’s `event.outcome` field. Trace outcome can be `success`, `failure`, or `unknown`. (string)

### **`service.name`**

The service name for events to match a policy. (string)

### **`service.environment`**

The service environment for events to match a policy. (string)
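
The policy-level settings can be combined in a single policy, in which case all conditions must be true for a trace event to match. The following sketch for the APM Server binary uses hypothetical values for the service name, environment, and trace name.

```yaml
apm-server:
  sampling:
    tail:
      enabled: true
      policies:
        # Matches only when all four conditions are true.
        # The names below are hypothetical.
        - sample_rate: 1.0
          service.name: checkout-service
          service.environment: production
          trace.name: "POST /orders"
          trace.outcome: failure
        # Catch-all: sample 1% of all remaining trace events.
        - sample_rate: 0.01
```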

## Monitoring tail-based sampling

APM Server produces metrics to monitor the performance and estimate the workload being processed by tail-based sampling. To use these metrics, you need to [enable monitoring for the APM Server](https://www.elastic.co/docs/solutions/observability/apm/apm-server/monitor). The following metrics are produced by the tail-based sampler. Note that the metrics might have a different prefix, for example `beat.stats` for ECH deployments, based on how the APM Server is running.

### `apm-server.sampling.tail.dynamic_service_groups`

This metric tracks the number of dynamic services that the tail-based sampler is tracking per policy. Dynamic services are created for tail-based sampling policies that are defined without a `service.name`.
This is a counter metric, so it should be visualized with `counter_rate`.

### `apm-server.sampling.tail.events.processed`

This metric tracks the total number of events (including both transaction and span) processed by the tail-based sampler.
This is a counter metric, so it should be visualized with `counter_rate`.

### `apm-server.sampling.tail.events.stored`

This metric tracks the total number of events stored by the tail-based sampler in the database. Events are stored when the full trace is not yet available to make the sampling decision. This value is directly proportional to the storage required by the tail-based sampler to function.
This is a counter metric, so it should be visualized with `counter_rate`.

### `apm-server.sampling.tail.events.dropped`

This metric tracks the total number of events dropped by the tail-based sampler. Only the events that are actually dropped by the tail-based sampler are reported as dropped. Additionally, any events that were stored by the processor but never indexed will not be counted by this metric.
This is a counter metric, so it should be visualized with `counter_rate`.

### `apm-server.sampling.tail.events.failed_writes`

This metric tracks the total number of events that failed to be written to the tail-based sampling storage. Failed writes typically occur when the storage limit is reached or when there are issues with the local sampling database.
The value of this metric should be 0 if tail-based sampling is functioning properly. If it is consistently increasing, check for a misconfigured [storage limit](#sampling-tail-storage_limit-ref).
This is a counter metric, so it should be visualized with `counter_rate`.

### `apm-server.sampling.tail.events.sampled`

This metric tracks the total number of events that were sampled (kept) by the tail-based sampler after applying the configured policies and were selected for indexing. This includes all events that belong to traces that matched tail-based sampling policies.
This is a counter metric, so it should be visualized with `counter_rate`.

### `apm-server.sampling.tail.events.head_unsampled`

This metric tracks the total number of events that were already unsampled by head-based sampling before reaching the tail-based sampler. These events are processed by the tail-based sampler but are not stored or indexed because they were already filtered out by head-based sampling decisions.
This is a counter metric, so it should be visualized with `counter_rate`.

### `apm-server.sampling.tail.storage.lsm_size`

This metric tracks the storage size, in bytes, of the log-structured merge trees used by the tail-based sampling database. Starting in version 9.0.0, this metric is effectively equal to the total storage size used by the database. This is the most crucial metric for tracking the storage requirements of the tail-based sampler, especially for big deployments with large distributed traces. Deployments using tail-based sampling extensively should set up alerts and monitoring on this metric.
This metric can also be used to estimate the storage requirements of the tail-based sampler before increasing load, by extrapolating from the current usage. Before doing any estimation, allow the tail-based sampler to run for at least a few TTL cycles, and note that the estimate is only useful for similar load patterns.

### `apm-server.sampling.tail.storage.value_log_size`

This metric tracks the storage size for value log files used by the previous implementation of a tail-based sampler. This metric was deprecated in 9.0.0 and should always report `0`.

## Frequently Asked Questions (FAQ)

<dropdown title="Why doesn't the sampling rate shown in Storage Explorer match the configured tail sampling rate?">
  In APM Server, the tail sampling policy applied to a distributed trace is determined by evaluating the configured policies in order against the root transaction (the transaction without a parent). To learn more about how tail sampling policies are applied, see the examples in [Configure Tail-based sampling](/docs/solutions/observability/apm/transaction-sampling#apm-configure-tail-based-sampling).

  In contrast, the APM UI Storage Explorer calculates the effective average sampling rate for each service using a different method. It considers both head-based and tail-based sampling, but does not account for root transactions. As a result, the sampling rate displayed in Storage Explorer may differ from the configured tail sampling rate, which can give the false impression that tail-based sampling is not functioning correctly.

  For more information, check the related [Kibana issue](https://github.com/elastic/kibana/issues/226600).
</dropdown>

<dropdown title="Why do transactions disappear after enabling tail-based sampling?">
  If a transaction is consistently not sampled after enabling tail-based sampling, verify that your instrumentation is not missing root transactions (transactions without a parent). APM Server makes sampling decisions when a distributed trace ends, which occurs when the root transaction ends. If the root transaction is not received by APM Server, it cannot make a sampling decision and will silently drop all associated trace events.

  This issue often arises when it is assumed that a particular service (for example, service A) always produces the root transaction, but in reality, another service (for example, service B) may precede it. If service B is not instrumented or sends data to a different APM Server cluster, the root transaction will be missing. To resolve this, ensure that all relevant services are instrumented and send data to the same APM Server cluster, or adjust the trace continuation strategy accordingly.

  To identify traces missing a root transaction, run the following ES|QL query during a period when tail-based sampling is disabled. Use a short time range to limit the number of results:
  ```
  FROM "traces-apm-*"
  | STATS total_docs = COUNT(*), total_child_docs = COUNT(parent.id) BY trace.id, transaction.id
  | WHERE total_docs == total_child_docs
  | KEEP trace.id, transaction.id
  ```
</dropdown>

<dropdown title="Why is the configured tail sampling rate ignored and why are traces always sampled, causing unexpected load to Elasticsearch?">
  When the storage limit for tail-based sampling is reached, APM Server will log "configured limit reached" (or "configured storage limit reached" in version 8) as it cannot store new trace events for sampling. By default, traces bypass sampling and are always indexed (sampling rate becomes 100%). This can cause a sudden increase in indexing load, potentially overloading Elasticsearch, as it must process all incoming traces instead of only the sampled subset.

  To mitigate this risk, enable the [`discard_on_write_failure`](#sampling-tail-discard-on-write-failure-ref) setting. When set to `true`, APM Server discards traces that cannot be written due to storage or indexing failures, rather than indexing them all. This helps protect Elasticsearch from excessive load. Note that enabling this option can result in data loss and broken traces, so it should be used with caution and only when system stability is a priority.

  For more information, refer to the [Discard On Write Failure](#sampling-tail-discard-on-write-failure-ref) section.
</dropdown>