August 28, 2015 · Engineering

TTL Documents, Shield and Found

By Christian Strzadala

Elasticsearch allows for many types of document mappings. An interesting one is the _ttl, or Time to Live, mapping.

This mapping allows us to set an expiry time for a document. Once that time has passed, the expired document is deleted.

Note that in the current version of Elasticsearch, _ttl is deprecated. This post will look at the current _ttl mapping and how you can avoid using it.

Let's define the mapping

Let’s say we are creating an index structure for a store called crazy shop that runs regular specials, each ending at a different time.

Here is an example mapping with a _ttl for a document type, in this case called special:

POST /crazy-shop
{
  "mappings": {
    "special": {
      "properties": {
        "message": {
          "type": "string"
        }
      },
      "_ttl": {
        "enabled": true,
        "default": "10s"
      }
    }
  }
}

The _ttl is set to enabled and defaults to 10 seconds.

We can test this out by indexing the following document:

PUT /crazy-shop/special/1
{
    "message" : "An awesome new special"
}
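
The default can also be overridden per document. In versions of Elasticsearch that support _ttl, a ttl value can be passed as a URL parameter at index time — the 30s value and document below are just an illustration:

PUT /crazy-shop/special/2?ttl=30s
{
    "message" : "A longer-lived special"
}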

We can get this document with the following request:

GET /crazy-shop/special/1

which will return:

{
   "_index": "crazy-shop",
   "_type": "special",
   "_id": "1",
   "_version": 1,
   "found": true,
   "_source": {
      "message": "An awesome new special"
   }
}

So after 10 seconds, this document should be deleted, right? Not exactly.

How does expiring a document actually work?

Elasticsearch has a background system process that periodically collects all documents with an expired _ttl and adds them to a bulk delete request. By default, this process runs every 60 seconds.

This means that setting the _ttl for a document to anything under 60 seconds won’t actually delete the document until the next purge run. If needed, the interval can be adjusted through the indices.ttl.interval setting.
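
For example, in versions where this setting is dynamically updatable, the purge interval can be changed through the cluster settings API — the 30s value here is just an illustration:

PUT /_cluster/settings
{
  "transient": {
    "indices.ttl.interval": "30s"
  }
}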

Delete operations are really expensive

Deleting documents doesn’t remove them from the index, but rather marks them as deleted and filters them out at query time.

Large amounts of bulk deletions can leave Lucene index segments holding many deleted documents, and in turn a large amount of data to carry around until segment merges occur and the deleted documents are finally removed from the index.

Refer to https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up#index-segments for more information on Elasticsearch and Lucene indices.

With this in mind, our advice is to avoid using TTL. The expense TTL imposes on Elasticsearch can in most circumstances be avoided by using time-based indices.

The suggested re-implementation is to use a combination of two things:

  • Time-based indices that contain an expiration field for highly granular query filtering, combined with index aliases. Moving to time-based indices significantly reduces the work required to remove documents, and it reduces the memory and CPU overhead of records that are still lying around but marked as deleted.
  • Filtered queries, which are extremely fast in Elasticsearch because filters are cached. With time-based indices, the Lucene segments carry significantly fewer deleted documents, since documents are no longer being bulk deleted. This results in better query performance.

You may think your data model doesn’t allow for time-based indices, or that managing them adds too much overhead. We’ll demonstrate a modified data model below, and there are already great tools available for handling time-based indices.

Example time-based indices

Continuing our crazy shop example, below is an example index template that provides:

  • A wildcard to match the date in the index name, should we decide to create daily indices.
  • Mappings for our fields, including the expiry field, a date type that will be used to filter documents.

An alias is also included so queries can use a constant index name.

PUT /_template/crazy-shop-template
{
  "order": 0,
  "template": "crazy-shop-*",
  "mappings": {
    "special": {
      "properties": {
        "message": {
          "type": "string"
        },
        "expiry": {
          "type": "date",
          "format": "yyyy-MM-dd'T'HHmmssZ"
        }
      }
    }
  },
  "aliases": {
    "crazy-shop": {}
  }
}

We can add some documents using a date in the index name; the template will match, create the new index, and add the documents to it:

PUT /crazy-shop-2015-08-24/special/1
{
    "message" : "An awesome new special",
    "expiry" : "2015-08-24T202000+1000"
}
PUT /crazy-shop-2015-08-24/special/2
{
    "message" : "Another awesome new special",
    "expiry" : "2015-08-24T203000+1000"
}

Next, we can query the alias using a filter on the expiry field with a timestamp, which filters out the documents we no longer need.

POST /crazy-shop/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "expiry": {
            "gte": "2015-08-24T202500+1000"
          }
        }
      }
    }
  }
}
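
Rather than hard-coding a timestamp, Elasticsearch’s date math can be used so the filter always compares against the current time:

POST /crazy-shop/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "expiry": {
            "gte": "now"
          }
        }
      }
    }
  }
}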

We can delete the daily index manually with the following request:

DELETE /crazy-shop-2015-08-24

Or use a tool like Curator to delete indices automatically.
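
As a sketch, a Curator invocation (3.x-style syntax) that deletes crazy-shop indices older than seven days might look something like the following — the host, prefix, and retention period are just illustrations, so check your Curator version’s documentation for the exact flags:

curator --host localhost delete indices --prefix crazy-shop- --timestring '%Y-%m-%d' --older-than 7 --time-unit days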

Again, deleting the entire index is much cheaper than having lots of deleted documents in an index.

When time-based indices may not work

There may be times when time-based indices simply don’t fit your data model.

You might require documents with a TTL to be part of a parent / child relationship, which requires all the documents to be in the same index.

For example, hotel deals that have a parent / child relationship and the deals expire at various intervals.

Deleting the index in this case would remove not only the deals but the hotel documents as well.

In these situations, _ttl mapped documents are a valid choice, but keep in mind that the cluster needs care when large amounts of deleted documents accumulate.

Also remember that _ttl is deprecated and that future versions of Elasticsearch may implement expiry differently. If time-based indices don’t work for you, we recommend avoiding _ttl and having your application handle document TTL itself.
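
One application-side approach, sketched below, is to periodically run a delete-by-query against the expiry field (delete-by-query was part of core Elasticsearch in 1.x and moved to a plugin in 2.0). This carries the same bulk-deletion costs described above, but it puts the schedule under your control:

DELETE /crazy-shop/special/_query
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "expiry": {
            "lt": "now"
          }
        }
      }
    }
  }
}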

How does Shield affect TTL documents?

In the current version of Elasticsearch, if Shield is installed, there is a limitation that affects documents with _ttl.

The issue is that Shield enforces privileges on operations against documents. When Elasticsearch’s background system process gathers documents with an expired _ttl into a bulk delete request, the delete fails because the process doesn’t have the appropriate privileges to delete those documents. The documents are therefore not deleted and remain available in the index.

Note: This does not affect the use of time-based indices, as Elasticsearch does not perform any background system-level deletion of documents or indices in that case.