Engineering

Get a consistent view of your data over time with the Elasticsearch point-in-time reader

TL;DR: We recommend that you use the new point-in-time functionality in Elasticsearch if you can. The scroll API is no longer recommended for deep pagination (even though it still works).

Most data is constantly changing. When querying an index in Elasticsearch, you are essentially searching for data at a given point in time. With an index that is constantly changing — as in most observability and security use cases — two identical queries performed at two different times will return different results because the data has changed in between. So, what do you do if you need to remove that time variable?

The point in time (PIT) reader, which was introduced in Elasticsearch 7.10, gives you the ability to repeatedly query an index as it was at a specific point in time.

At a high level, this sounds like the scroll API, which retrieves the next batch of results for a scrolling search — but there’s a subtle difference that makes it very clear why PIT is an important building block for future stateful queries.

Scroll API: A quick review

The scroll API works like this:

1. A regular search query gets executed with the scroll parameter attached.
2. Each search response contains a _scroll_id that should be used for the next request.
3. After scrolling through all the responses, you can delete that scroll ID to free resources.
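The three steps above look like this as console requests (the index name my-index and the 1m keep-alive are just placeholders):

GET /my-index/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}

# each response contains a _scroll_id; pass it back to fetch the next batch
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "SCROLL_ID_FROM_PREVIOUS_RESPONSE"
}

# once you are done, free the resources
DELETE /_search/scroll
{
  "scroll_id": "SCROLL_ID_FROM_PREVIOUS_RESPONSE"
}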

The data returned is basically frozen at the moment you start the initial search request. Write operations that take place after the scroll was initiated will not be part of the search responses. This applies to delete, index, and update operations. This way, the whole data set is guaranteed to be consistent at a certain point in time.

What happens in the background that requires resources to be freed? A scroll search only takes into account the data that existed the moment the initial scroll search was created. At a lower level, this implies that none of the resources required to return the data from the initial request can be modified or deleted. Segments are kept around even though they might already have been merged away and are no longer needed for the live data set. Keep in mind that other searches, using another scroll ID or no scroll at all, may be running at the same time and looking at different data than the initial scroll search. This results in keeping more data around than just the live data set. More data means more segments, more file handles, and more heap to hold the segments' metadata.

Keeping segments around that are not needed for the live data set also means you need more disk space, as those segments cannot be deleted until the scroll ID is deleted. Internally, this works via reference counting: as long as a component (like a scroll search) holds a reference to the data (for example via an open file handle against an inode), that data is not finally deleted, even though it is no longer part of the live data set. This is also why the scroll ID exists: by passing it as part of the query, you choose which state of the data you want to query.

In order to free resources as soon as possible, we recommend using the clear scroll API. There are also optimizations like sliced scroll available to parallelize data retrieval.
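Sliced scroll splits one scroll into independent slices that can be consumed in parallel, for example by separate worker processes. A minimal sketch, again with a placeholder index name (a second request with "id": 1 would retrieve the other half):

GET /my-index/_search?scroll=1m
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "query": { "match_all": {} }
}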

So what about PIT?

Now that we’ve covered the basics of a scroll search, let’s come back to the most important question: If we have all of this infrastructure in place already, why should we use PIT?

Currently, a scroll search and its context are bound to a single query: you write a query, add a scroll parameter, and the response data for that query will be consistent. However, this isn't always what you need. Sometimes you want to run different queries against the same data set, fixed at one point in time. This is a major difference. One of the first users of PIT is EQL — the Event Query Language, used to query time series data. Let's take a look at this EQL query:

GET /auth-logs/_eql/search
{
  "query": """
  sequence by host.name,source.ip,user.name with maxspan=15s
    [ authentication where event.outcome == "failure" ]
    [ authentication where event.outcome == "failure" ]
    [ authentication where event.outcome == "failure" ]
    [ authentication where event.outcome == "success" ]
  """
}

This searches for three failed logins followed by a successful one, all within 15 seconds. It's a perfect use case for EQL, as more than a single query is needed.

The trick in this case is that the PIT reader is decoupled from a search request. The PIT context gets created by a dedicated action and is therefore available across arbitrary search requests. You create it using the PIT API, and the response includes an id that can then be used in any search request you are about to execute. Let's take a look at what happens in the machine room when the PIT API is called. Basically, this executes a shard operation, which calls SearchService.openReaderContext(). However, this is not called on every copy of every shard in the index, but only on the shard copies that a search request would hit. Let's look at an example — this requires at least two nodes in a cluster:

PUT test?wait_for_active_shards=all
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

POST test/_pit?keep_alive=1m

GET test/_stats?filter_path=**.open_contexts

The last call returns:

{
  "_all" : {
    "primaries" : {
      "search" : {
        "open_contexts" : 2
      }
    },
    "total" : {
      "search" : {
        "open_contexts" : 5
      }
    }
  },
  "indices" : {
    "test" : {
      "primaries" : {
        "search" : {
          "open_contexts" : 2
        }
      },
      "total" : {
        "search" : {
          "open_contexts" : 5
        }
      }
    }
  }
}

As you can see here, the total number of open contexts equals the number of shards in the index, but, as with a regular search, those contexts are spread across primary and replica shard copies.

You can see a PIT query in action with this example:

PUT test/_doc/1
{
  "name" : "Alex"
}

PUT test/_doc/2?refresh
{
  "name" : "David"
}

# note down the id and reuse below
POST test/_pit?keep_alive=1m

DELETE test/_doc/1

# this will return the David doc
GET /_search
{
  "size": 1, 
  "from": 0, 
  "query": {
    "match_all": {}
  },
  "pit": {
	    "id":  "ID_RETURNED_FROM_PIT_REQUEST", 
	    "keep_alive": "1m"
  },
  "sort": [
    {
      "name.keyword": {
        "order": "desc"
      }
    }
  ]
}

# this will return the Alex doc
# because the PIT reader is older than the delete
GET /_search
{
  "size": 1, 
  "query": {
    "match_all": {}
  },
  "pit": {
	    "id":  "ID_RETURNED_FROM_PIT_REQUEST", 
	    "keep_alive": "1m"
  },
  "sort": [
    {
      "name.keyword": {
        "order": "desc"
      }
    }
  ],
  "search_after" : ["David", 1]
}

The snippet above deletes a document after the PIT reader has been created. So whenever you run a search request with that PIT attached, the deleted document will still be part of the result set.

But wait, there’s more! After the PIT infrastructure was added, another improvement landed in the 7.12 release of Elasticsearch. By taking a shard’s context ID into account, Elasticsearch has a mechanism to retry a PIT query on another shard copy if the original one is no longer available. However, this only works if both copies contain exactly the same segments — which is only the case for searchable snapshots or read-only data.

Also, the Elasticsearch clients will feature helpers for PIT, just as they have for scroll searches.

So, should you always use PIT now? Well, the same rules as for a scroll search still apply. If you have a high search load on an ever-changing index, it’s probably not a good idea to create a new PIT per request, as a fair number of resources will need to be kept open. However, you can keep this at bay by having a background process that creates a single PIT ID every few minutes and using that for all of your search requests. That way you keep a consistent view of your data across all your requests, at the expense of not taking the latest data into account. Further improvements are planned, for example using PIT readers in combination with slice queries.
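That rotation pattern boils down to three requests. A minimal sketch, with a placeholder index name and keep-alive: a background job opens a PIT with a generous keep-alive, every search reuses the returned ID, and the job closes the old PIT once a fresh one has been opened:

# background job, every few minutes: open a fresh PIT
POST /my-index/_pit?keep_alive=5m

# every search request reuses the current ID
GET /_search
{
  "query": { "match_all": {} },
  "pit": {
    "id": "CURRENT_PIT_ID",
    "keep_alive": "5m"
  }
}

# background job: close the previous PIT to free its resources
DELETE /_pit
{
  "id": "PREVIOUS_PIT_ID"
}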

Wrapping up

By now you may understand why PIT is such an important building block: running different queries against the same point-in-time data set is important for the consistency of the data you are extracting — be it for an analytics job or for implementing a query language like EQL.

If you have further questions about PIT, please contact us via our Discuss forums or the Elastic Community Slack. And of course, if you want to give PIT a try right now, spin up a cluster on Elastic Cloud.