This Week in Elasticsearch and Apache Lucene - 2019-10-14

Elasticsearch

Docker

We completed work on Docker packaging tests, which help ensure our Docker image is working properly. We also tackled a longstanding issue to add _FILE environment variable support. With this change, users of the Docker image will be able to pass the value of any environment variable as the content of a file, which is especially important for sensitive configuration values.
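As a rough illustration of the _FILE idea, the sketch below resolves every variable named NAME_FILE into a variable NAME whose value is the content of the referenced file. It is not the image's entrypoint script: the function name, the choice to reject a variable that is set both ways, and the stripping of trailing whitespace are all assumptions made for the example.

```python
def resolve_file_variables(env):
    """Return a copy of env where each NAME_FILE entry is replaced by NAME,
    with the value read from the file that NAME_FILE points to.

    Hypothetical helper for illustration only; the real Docker image may
    apply different rules (for example around file permissions).
    """
    resolved = dict(env)
    for name, path in env.items():
        if not name.endswith("_FILE") or name == "_FILE":
            continue
        target = name[: -len("_FILE")]
        if target in env:
            # Assumption: setting both forms is treated as a mistake.
            raise ValueError(f"both {target} and {name} are set")
        with open(path, encoding="utf-8") as handle:
            resolved[target] = handle.read().strip()
        del resolved[name]
    return resolved
```

With a mapping such as {"ELASTIC_PASSWORD_FILE": "/run/secrets/password"}, the result would contain ELASTIC_PASSWORD with the file's content, so the secret never has to appear in the container's environment definition.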

In addition, we opened a couple of PRs on the Docker getting started/install experience.

Enrich

We merged a PR that adds a geo_match enrich policy type (#47243). This allows users to enrich incoming documents based on matching geo_shapes. For example, you could add postal codes to incoming documents based on a set of coordinates. We added the docs for this in #47745.
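To make the postal code example more concrete, here is a minimal sketch that builds the request body for a geo_match policy. The body shape follows the documented enrich policy API as we understand it and may differ between versions; the index name postal_codes and the field names location and postal_code are hypothetical.

```python
import json


def geo_match_policy(source_index, match_field, enrich_fields):
    """Build the body for a geo_match enrich policy.

    source_index, match_field and enrich_fields are supplied by the caller;
    the values used below are made up for illustration.
    """
    return {
        "geo_match": {
            "indices": source_index,
            "match_field": match_field,
            "enrich_fields": list(enrich_fields),
        }
    }


# Hypothetical source index holding postal code areas as geo_shapes.
policy = geo_match_policy("postal_codes", "location", ["location", "postal_code"])
print(json.dumps(policy, indent=2))
```

A body like this would be sent when creating the policy, after which the policy is executed and referenced from an enrich processor in an ingest pipeline.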

The enrich execute policy API synchronizes data from a user-managed index into an internally managed index, which the enrich processor then uses as a data source to add data to new documents being ingested. We added task support to this API so that it can now return a status or a task ID (#47523). Currently it defaults to waiting for completion. Going forward, we will add full support for the wait_for_completion parameter and for cancelling a policy execution. The reason for this change and the coming ones is that policy execution may take a while to complete, since it performs several heavy operations (reindex and force merge). It is easier to integrate with the enrich policy API if it does not wait for completion; the task API can then be used to keep track of the status.

We changed the execute policy runner to retry the force merge operation if the enrich index still has more than one segment afterwards. During policy execution, an enrich index is always force merged to a single segment to make searches from the enrich processor as efficient as possible. The force merge operation is retried up to a configurable number of times (#47178).
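The retry behaviour can be pictured with the small sketch below. It is not the policy runner's code: force_merge and count_segments are hypothetical callbacks standing in for the real operations, max_attempts stands in for the configurable limit, and returning False once the attempts are used up is an assumption, since what the runner does at that point is not described here.

```python
def force_merge_to_single_segment(force_merge, count_segments, max_attempts=3):
    """Call force_merge until count_segments reports at most one segment.

    Returns True if the index reached a single segment within max_attempts,
    False otherwise. Both callbacks are placeholders for illustration.
    """
    for _ in range(max_attempts):
        force_merge()
        if count_segments() <= 1:
            return True
    return False
```

Checking the segment count after each merge, rather than trusting the merge call, is what lets the loop notice when a merge left more than one segment behind.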

Cross-Cluster Replication

We've extended CCR to allow pausing and resuming auto-follow patterns. This is useful, for example, during rolling upgrades of clusters that use a bi-directional cross-cluster replication setup, where index following should be paused during the upgrade.

Sequence-number aware replica allocation

We've made the replica allocator sequence-number aware. Shard allocation will now prefer allocating replicas on nodes where they can perform an operation-based recovery, all powered by peer recovery retention leases. This is a big step in simplifying rolling restarts as well as full-cluster restarts: it reduces both the time the cluster takes to go back to green and the number of operational steps required to perform such a restart. The aim is to remove the need for a synced flush and to allow quicker recoveries even if indexing is ongoing during the rolling restart.

Range Aggregation Optimizations

We're working on an optimization for the Range aggregator.  It's not uncommon for range aggs to be run as a top-level agg against an entire index, e.g. to collect counts for drop-downs in a UI or facets on a website.  This is very expensive though, since we scan all the documents to collect those ranges, even if only a small subset of docs actually land in one of the ranges.

The optimization uses the BKD tree in this scenario to collect the hits instead of normal doc value collection, meaning we can visit just the parts of the tree that intersect/overlap the range instead of all the documents. There are still some problems to fix (notably, it doesn't maintain the monotonically increasing doc ID contract that Lucene expects in places), but so far the latency improvement is 70-90% in the "optimizable" scenarios.
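The difference between the two collection strategies can be sketched with a simplified analogy. The code below is not a BKD tree and not Lucene code: a sorted list with binary search stands in for the tree, and the function names are made up. It only shows why an ordered index lets us touch the boundaries of each range instead of every document. Ranges are treated as including the lower bound and excluding the upper bound.

```python
from bisect import bisect_left


def count_by_scan(values, ranges):
    """Count values per [low, high) range by visiting every value."""
    counts = [0] * len(ranges)
    for value in values:
        for i, (low, high) in enumerate(ranges):
            if low <= value < high:
                counts[i] += 1
    return counts


def count_by_sorted_index(sorted_values, ranges):
    """Count values per [low, high) range with two binary searches per range.

    sorted_values must already be sorted; it plays the role of the index.
    """
    return [
        bisect_left(sorted_values, high) - bisect_left(sorted_values, low)
        for low, high in ranges
    ]
```

Both functions return the same counts, but the second does work proportional to the number of ranges rather than the number of documents, which is the effect the optimization is after when few documents fall inside the ranges.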

We note that a similar optimization could be applied to date_histo/histograms, which are obviously much more popular than ranges, so we'll be investigating that optimization next.

Index Privilege Names

With the 7.5 release we added a new privilege that only allows writing new documents to an index and does not permit overwriting existing docs.

In order to name this privilege well, we're taking the opportunity to create a new set of index privilege names and deprecate the old ones.

The new privilege will be called "create_doc". We haven't worked out the exact set of names that we want for the other privileges, but they will follow a similar pattern, such as "index_doc", "delete_doc", etc.

The main obstacle is how to handle privileges around the update API. The update API can read the current version of the document and process it in a script, which makes it a write API with some read functionality in it. We want to ensure that it's possible to give users access to this API when they need it, but not include it as part of the more commonly used privileges. As always, one of the hardest parts is getting the names right so that you can reasonably infer what access is granted from the privilege's name.

Lucene

Lucene 8.3

There was recent work on an improvement to FSTs that provides faster random access to sub nodes at the cost of higher memory usage. Even though this change provided good performance and little memory increase for text content, it might increase memory usage by up to 4x in the worst-case scenario, and happens to make memory usage of Elasticsearch's _id field (which is binary) about 50% higher. We had reverted the change from 8.2 when we released it and decided to look into improving that worst-case scenario. Unfortunately we haven't made progress as fast as we hoped and we are now discussing what to do for 8.3.

Prevent caching from hurting tail latencies

A community contributor who manages an Elasticsearch cluster noticed that caching filters sometimes hurts tail latencies. For instance, if you frequently combine a selective query with a filter that matches lots of documents such as "host.name:rare_host_name AND @timestamp:[now-48h TO now]" and query execution triggers the caching of the range query, it might make the overall query several times slower. The proposed solution consists of comparing the cost of the top-level query with the cost of the filter that is being considered for caching, and to skip caching if the cost of the filter is more than X times greater than the cost of the top-level query. This should help ensure that caching can't make queries more than X times slower.
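The proposed check boils down to a single comparison, sketched below. The function and parameter names are hypothetical, costs are assumed to be estimates of how many documents each clause would match, and max_cost_ratio plays the role of X.

```python
def should_cache_filter(filter_cost, top_level_cost, max_cost_ratio):
    """Return False when the filter costs more than max_cost_ratio times
    the top-level query, in which case caching it is skipped."""
    return filter_cost <= max_cost_ratio * top_level_cost
```

For the example above, a rare host name might have a cost of around 10 while the 48 hour range matches a million documents; with a ratio of 250, should_cache_filter(1_000_000, 10, 250) is False, so the expensive range filter would not be cached as part of this query.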

Other

 - A nice blog post giving insights into Lucene support for concurrent query execution.

 - A committer opened an issue about K nearest neighbor search and uploaded a proof of concept that implements HNSW (hierarchical navigable small-world) graphs.

 - We identified concurrency issues in SetOnce and proposed a fix.

 - We improved KD trees so that they make better splitting decisions by not assuming that all dimensions are independent.

 - Some work iterating on making top-hits retrieval more efficient when an index is searched concurrently.

Changes

Changes in Elasticsearch

Changes in 8.0:

  • Refactor ESLogMessage to not define fields upfront #46702
  • BREAKING: Remove include_relocations setting #47717
  • Explicit name for doc snippets #46951
  • Remove types from BulkRequest #46983

Changes in 7.5:

  • [Java.time] Support partial parsing #46814
  • SQL: Implement DATEADD function #47747
  • SQL: make date/datetime and interval types compatible in conditional functions #47595
  • Ensure that we don't call listener twice when detecting a partial failures in _search #47694
  • Resolve more Gradle task validation warnings #47825
  • Geo: implement proper handling of out of bounds geo points #47734
  • Geo: Fixes indexing of linestrings that go around the globe #47471
  • Fix --debug-jvm Gradle Arg #47773
  • SQL: use calendar interval of 1y instead of fixed interval for grouping by YEAR and HISTOGRAMs #47558
  • Separate SLM stop/start/status API from ILM #47710
  • Fix cluster alert for watcher/monitoring IndexOutOfBoundsExcep… #45308
  • Introduce simple remote connection strategy #47480
  • Make loadShardSnapshot Exceptions Consistent #47728
  • Convert RunTask to use testclusers, remove ClusterFormationTasks #47572
  • Deprecate include_relocations setting #47443
  • Add 'create_doc' index privilege #45806
  • Add support to retrieve all API keys if user has privilege #47274

Changes in 7.4:

  • Fix dependency info tasks #47857
  • sync before trimUnreferencedReaders to improve index preformance #47790
  • Do not auto-follow closed indices #47721
  • Dangling indices strip aliases #47581
  • Throw error retrieving non-existent SLM policy #47679

Changes in 7.3:

  • Reformats reindex API #47483

Changes in 6.8:

  • GlobalBuildInfo plugin should search packed references for commit IDs #47464
  • Add a verifyVersions to the test FW #47192
  • Watcher - catch uncaught exception. #47680
  • Watcher - workaround for potential deadlock #47603
  • SQL: Allow whitespaces in escape patterns #47577

Changes in Rally

Changes in 1.4.0:

  • Don't attach telemetry devices for Docker #785
  • Attach telemetry device on Docker launch #784