
How to apply index lifecycle management policies to APM data in Elasticsearch

In version 7.5, APM Server changed the default ILM policies and added support for custom policies. If you're using APM Server 7.5 or later, some of the settings described in this blog post are outdated. We suggest you refer to the ILM documentation instead.

Index lifecycle management (ILM) became generally available in Elasticsearch 6.7. It enables you to define how your indices should be managed as they age, ranging from hot indices, which are actively updated and queried, to cold indices, which are no longer updated and seldom queried. Index management actions can be defined based on various factors, such as time, shard size, and performance requirements. APM Server added built-in support for ILM in 7.2 and applies ILM by default starting with 7.3.

In this blog post we will examine how ILM is applied to APM data out of the box, discuss how you can customize index lifecycle management according to your needs, and give a brief outlook on what to expect in the future.

ILM out of the box

Starting with 7.3, ILM is enabled by default for installations using the default index configuration, provided that an Elasticsearch cluster with ILM support is configured as output. For explicit configuration, the option apm-server.ilm.enabled can be set accordingly.

It is recommended to run ./apm-server setup before starting the APM Server to ensure everything is properly set up before data ingestion starts. Behind the scenes, APM Server creates everything necessary for ILM to be properly set up. More concretely, APM Server still creates one common APM template, defining all the necessary mappings, but it now also creates one template and one ILM policy per event type (transaction, span, metric, and error). The event type–specific templates take care of wiring the corresponding ILM policy and rollover alias to the event type–specific indices.
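
As a rough illustration of the naming scheme, the sketch below lists the per-event-type names that setup would create for a given version. The pattern apm-[version]-[event type] is inferred from the span example shown in this post, and the helper name ilm_resource_names is hypothetical, not part of APM Server.

```python
from typing import List

# Event types named in this post; each one gets its own template and ILM policy.
EVENT_TYPES = ("transaction", "span", "metric", "error")


def ilm_resource_names(version: str) -> List[str]:
    """Return the assumed per-event-type names, e.g. 'apm-7.3.1-span'.

    The naming pattern is inferred from the span example in this post.
    """
    return [f"apm-{version}-{event}" for event in EVENT_TYPES]


if __name__ == "__main__":
    for name in ilm_resource_names("7.3.1"):
        print(name)
```

In this scheme the same name is used for the event type–specific template, the ILM policy, and the rollover alias.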

Take a look at how the two templates below differ. The first one shows the event type–specific template when ILM is enabled, connecting indices with an ILM policy and a rollover alias, while the second one reflects the setup when ILM is disabled:

Template for spans when ILM is enabled:

{ 
  "apm-7.3.1-span": { 
    "aliases": {}, 
    "index_patterns": [ 
      "apm-7.3.1-span*" 
    ], 
    "mappings": {}, 
    "order": 2, 
    "settings": { 
      "index": { 
        "lifecycle": { 
          "name": "apm-7.3.1-span", 
          "rollover_alias": "apm-7.3.1-span" 
        } 
      } 
    } 
  } 
}

Template for spans when ILM is disabled:

{ 
  "apm-7.3.1-span": { 
    "aliases": {}, 
    "index_patterns": [ 
      "apm-7.3.1-span*" 
    ], 
    "mappings": {}, 
    "order": 2, 
    "settings": {} 
  } 
}

When upgrading from an older version or switching between ILM and regular index management, it is therefore important to set setup.template.enabled and setup.template.overwrite to true during the setup process, allowing the templates to be updated.
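
To check which of the two setups a cluster currently has, you can inspect the settings of the event type–specific template. The following is a minimal sketch that works on a template body already loaded as a Python dict, shaped like the two examples above; the function name lifecycle_settings is hypothetical, and fetching the template from Elasticsearch is left out.

```python
from typing import Any, Dict, Optional


def lifecycle_settings(template: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return the index.lifecycle settings of a template body, or None if ILM is not wired."""
    lifecycle = template.get("settings", {}).get("index", {}).get("lifecycle")
    return lifecycle or None


# Template bodies copied from the two examples in this post.
with_ilm = {
    "aliases": {},
    "index_patterns": ["apm-7.3.1-span*"],
    "mappings": {},
    "order": 2,
    "settings": {
        "index": {
            "lifecycle": {
                "name": "apm-7.3.1-span",
                "rollover_alias": "apm-7.3.1-span",
            }
        }
    },
}
without_ilm = {
    "aliases": {},
    "index_patterns": ["apm-7.3.1-span*"],
    "mappings": {},
    "order": 2,
    "settings": {},
}

if __name__ == "__main__":
    print(lifecycle_settings(with_ilm))
    print(lifecycle_settings(without_ilm))
```

If the result does not match the value of apm-server.ilm.enabled, the template was most likely not overwritten during setup.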

Default ILM policies

Index lifecycle management manages an index based on its defined policy. APM Server defines a longer-running ILM policy for the transaction and metric event types, and a shorter one for the error and span event types. You can investigate the policies in more detail in our ILM default policies documentation.

With the default policies, indices for transaction and metric events can grow to a maximum of 50 GB and contain up to 7 days of information before a new index is created. A prominent use case for these events is to show high-level performance trends over time. After one month the indices are moved to the warm phase, where they are set to read-only.

The span event type holds detailed information about a specific code path as part of a transaction. This information is considered more valuable the more recent it is, and it loses value faster over time than the more aggregated view of a transaction. Similar considerations apply to error events. Once an error is fixed, the information about it loses importance. If an error still exists, we expect it to be monitored repeatedly while transactions and spans are being monitored. Indices for these event types also grow to a maximum size of 50 GB but contain information for a maximum of 1 day. Creating daily indices makes it possible to delete a whole index after a short amount of time without losing data for a large time frame. After 7 days the indices are moved to the warm phase, where they are set to read-only.

Check out the ILM default policy for spans:

{
  "apm-8.0.0-span": {
    "policy": {
      "phases": {
        "hot": {
          "actions": {
            "rollover": {
              "max_age": "1d",
              "max_size": "50gb"
            },
            "set_priority": {
              "priority": 100
            }
          },
          "min_age": "0ms"
        },
        "warm": {
          "actions": {
            "readonly": {},
            "set_priority": {
              "priority": 50
            }
          },
          "min_age": "7d"
        }
      }
    },
    "version": 1
  }
}
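
The rollover action in the hot phase can be read as "start a new index as soon as either limit is reached". The sketch below is a simplified model of that decision, not Elasticsearch's implementation. It assumes that gb means 1024^3 bytes, and the function name should_roll_over is hypothetical.

```python
from datetime import timedelta

GB = 1024 ** 3  # assumption: Elasticsearch size units are binary


def should_roll_over(
    age: timedelta,
    size_bytes: int,
    max_age: timedelta = timedelta(days=1),
    max_size_bytes: int = 50 * GB,
) -> bool:
    """Return True once either rollover condition is met (conditions are OR-ed).

    Defaults mirror the span policy above: max_age 1d, max_size 50gb.
    """
    return age >= max_age or size_bytes >= max_size_bytes


if __name__ == "__main__":
    print(should_roll_over(timedelta(hours=25), 1 * GB))   # older than one day
    print(should_roll_over(timedelta(hours=2), 51 * GB))   # larger than 50 GB
    print(should_roll_over(timedelta(hours=2), 1 * GB))    # neither limit reached
```

ILM evaluates these conditions only periodically, so a real index can grow somewhat beyond either limit before it rolls over.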

To avoid surprises (especially in a minor version upgrade), the current default policies do not delete indices.

The default policies group indices together in a sensible way, suitable for deleting indices as a whole manually. If you want, you can change the default policies to also contain a delete phase, which will completely get rid of old indices after a certain time (see example below).

Customize ILM policies

Where the default policies do not cover a use case, more advanced users can modify them, for example by adding a delete phase. In the following example we walk through how you can add a delete phase to a default policy.

Add delete phase to ILM policy for spans via Kibana

First, ensure your APM Server configuration contains all settings necessary for ILM setup:

apm-server.ilm.enabled: true
setup.template.enabled: true
setup.template.overwrite: true
output.elasticsearch.enabled: true

Then, set up the default templates and policies by running ./apm-server setup --index-management.

Navigate to the Management/Elasticsearch/Index Lifecycle Policies section in Kibana and search for apm-[version]-span.

[Figure: Managing index lifecycle policies from Kibana]

Selecting the policy allows you to edit it, showing all available phases: hot, warm, cold, and delete. Activate the delete phase and choose when the entire index should be deleted. After you save the changes, the policy is updated, and the changes are applied to existing indices when they enter their next phase.

[Figure: Deleting an ILM phase]

That's it! Now all new span indices will be deleted after the specified time.
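
If you prefer to script the change rather than click through Kibana, the same edit can be expressed as a new policy body and sent to the Elasticsearch ILM put policy API (PUT _ilm/policy/[policy name]). The following is a minimal sketch that only builds the body. The helper name with_delete_phase is hypothetical, and the 30d default is an arbitrary example value, not a recommendation.

```python
import copy
import json
from typing import Any, Dict


def with_delete_phase(policy_body: Dict[str, Any], min_age: str = "30d") -> Dict[str, Any]:
    """Return a copy of the policy body with a delete phase added.

    min_age is illustrative; choose a retention that fits your data.
    """
    updated = copy.deepcopy(policy_body)
    updated["policy"]["phases"]["delete"] = {
        "min_age": min_age,
        "actions": {"delete": {}},
    }
    return updated


# Phases copied from the default span policy shown in this post.
span_policy = {
    "policy": {
        "phases": {
            "hot": {
                "min_age": "0ms",
                "actions": {
                    "rollover": {"max_age": "1d", "max_size": "50gb"},
                    "set_priority": {"priority": 100},
                },
            },
            "warm": {
                "min_age": "7d",
                "actions": {"readonly": {}, "set_priority": {"priority": 50}},
            },
        }
    }
}

if __name__ == "__main__":
    # The printed JSON is what would be sent as the request body.
    print(json.dumps(with_delete_phase(span_policy), indent=2))
```

Working on a copy keeps the original policy available for comparison before anything is sent to the cluster.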

Note that changes to a policy will only be applied once index lifecycle management transitions into the next phase. Changes will not be applied to indices that have already reached their last phase. By default, index lifecycle management checks every 10 minutes for indices that meet policy criteria. To control how often the check should occur, the cluster setting indices.lifecycle.poll_interval can be configured.
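
For testing a policy change it can help to shorten the poll interval temporarily. The sketch below builds a request body for the cluster update settings API (PUT _cluster/settings). The value 1m and the choice of a persistent rather than a transient setting are assumptions for illustration, and the helper name poll_interval_settings is hypothetical.

```python
import json
from typing import Any, Dict


def poll_interval_settings(interval: str) -> Dict[str, Any]:
    """Build a cluster settings body that changes how often ILM checks policy criteria."""
    return {"persistent": {"indices.lifecycle.poll_interval": interval}}


if __name__ == "__main__":
    # "1m" is an example value for testing; the default is 10 minutes.
    print(json.dumps(poll_interval_settings("1m"), indent=2))
```

Remember to restore the previous value afterwards, since a very short interval adds load on the cluster.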

ILM support on Elastic Cloud

APM Server on Elastic Cloud also supports ILM. You can make use of the hot-warm template in combination with ILM and configure your policies to move APM data from hot nodes to warm nodes. Check out Implementing a hot-warm-cold architecture with index lifecycle management, which explains in more detail how you can leverage ILM to implement a hot-warm-cold architecture.
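
As a sketch of what such a policy change might look like, the snippet below adds an ILM allocate action to the warm phase of a policy body. It assumes that warm nodes are tagged with the node attribute data: warm. Verify the attribute name and value in your own deployment before using it. The helper name with_warm_allocation is hypothetical.

```python
import copy
import json
from typing import Any, Dict


def with_warm_allocation(
    policy_body: Dict[str, Any],
    attribute: str = "data",  # assumed node attribute name
    value: str = "warm",      # assumed attribute value on warm nodes
) -> Dict[str, Any]:
    """Return a copy of the policy whose warm phase requires allocation to warm nodes.

    Raises KeyError if the policy has no warm phase.
    """
    updated = copy.deepcopy(policy_body)
    warm_actions = updated["policy"]["phases"]["warm"]["actions"]
    warm_actions["allocate"] = {"require": {attribute: value}}
    return updated


# Warm phase copied from the default span policy shown in this post.
policy = {
    "policy": {
        "phases": {
            "warm": {
                "min_age": "7d",
                "actions": {"readonly": {}, "set_priority": {"priority": 50}},
            }
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(with_warm_allocation(policy), indent=2))
```

The resulting body can be applied in the same way as any other policy edit, through Kibana or the ILM put policy API.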

Wrap-up

APM Server leverages Elasticsearch's ILM support to automatically create and manage indices. Default policies currently only contain hot and warm phases, and ensure indices maintain a sensible size and age. More flexibility for supported policies is planned for the future, enabling you to either choose from a predefined set of policies or configure your own policies that get picked up when running the setup via APM Server. Until then, policies can always be manually created or edited and applied to indices, as shown in the example above.

Find a complete walk-through on how to create your own ILM policies, templates, aliases, and indices, and wire them together in our documentation.

Note: If you change the default policies, be aware that policies and templates are versioned and changes must be reapplied when upgrading to another version in the future.

We would love to hear your feedback! Please let us know about your use cases and experience in our APM discuss forum.