This Week in Elasticsearch and Apache Lucene - 2020-01-31

Elasticsearch

Updating Policies in ILM

One challenge ILM users face is adding a phase to a policy after an index has already executed that policy. For example, adding a delete phase to a policy whose indices had already reached the terminal (“completed”) phase did not actually delete those indices.

To address this frustration, we have opened a PR that changes the behavior of ILM to only execute as far as the last configured phase. This means that any subsequently added phase will be executed when the policy is updated, rather than requiring manual re-adjustment of the index’s lifecycle execution state.
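
As a concrete illustration (not taken from the PR itself), here is a minimal sketch of appending a delete phase to an existing policy via the ILM REST API, using Java 11’s built-in HTTP client. The policy name, phases, and ages are hypothetical:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class AddDeletePhase {
        public static void main(String[] args) throws Exception {
            // Hypothetical policy "logs-policy": same hot phase as before, plus a
            // newly added delete phase. With the behavior change described above,
            // indices that had already completed the old policy should proceed
            // into the new phase once the policy is updated.
            String policy =
                "{\n" +
                "  \"policy\": {\n" +
                "    \"phases\": {\n" +
                "      \"hot\":    { \"actions\": { \"rollover\": { \"max_size\": \"50gb\" } } },\n" +
                "      \"delete\": { \"min_age\": \"30d\", \"actions\": { \"delete\": {} } }\n" +
                "    }\n" +
                "  }\n" +
                "}";

            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_ilm/policy/logs-policy"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(policy))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }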

Keystore

Until now, the keystore has only been protected through obfuscation. It was always our intent to offer real password protection, but grabbing this password at startup from mechanisms as different as the console, systemd, or environment variables in Docker is tricky. We are happy to announce that the password-protected keystore is now here!

We spent the past several months not only working out the scripting code necessary to make all of those cases work, but also adding tons of tests to ensure each and every case is well covered for the initial release of this new feature.

Switch to disallow slow queries

We have started to work on a setting to disallow slow queries. This work helped to identify what should be considered a slow query in Elasticsearch:

  • Queries that need to do linear scans to identify matches (script queries).
  • Queries that have a high up-front cost (fuzzy queries, prefix queries without index_prefixes, wildcard queries, range queries on keyword fields).
  • Queries that may have a high per-document cost (percolate queries).
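
Purely for illustration, here is a minimal sketch of what flipping such a switch could look like, assuming the work is exposed as a dynamic cluster setting. The setting name search.allow_expensive_queries and the use of a transient cluster-settings update are assumptions, not the final API:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class DisallowExpensiveQueries {
        public static void main(String[] args) throws Exception {
            // Assumed setting name; the final name may differ once the work is merged.
            String settings =
                "{ \"transient\": { \"search.allow_expensive_queries\": false } }";

            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_cluster/settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settings))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            // With the switch off, the categories of queries listed above (script,
            // fuzzy, wildcard, range-on-keyword, percolate, ...) would be rejected
            // rather than executed.
            System.out.println(response.statusCode() + " " + response.body());
        }
    }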

We are also working on a change to more aggressively cancel queries that need to spend a lot of time in the terms dictionary (multi-term queries) and the BKD tree (range and geo queries).

Currently we need to wait until the end of these costly operations before exiting a cancelled query; this delay can have an impact on the cluster for very costly queries, such as leading wildcards, where we need to scan the entire terms dictionary.

Snapshot resiliency

Recent improvements in snapshot resiliency have allowed us to remove the restriction that prevented restoring one snapshot while deleting another at the same time. The next step is to allow deleting multiple snapshots at once, as well as running deletes in parallel.

We recently received reports of excessive disk usage by source-only snapshots. The problem is that they create and keep a lot of extra data on disk when taking a snapshot. Removing that data from disk, however, destroys the incrementality of source-only snapshots, rendering the feature broken to some degree. We're looking into possible fixes.

Lucene

Handling of tragic event in IndexWriter

A patch has been proposed to reduce the severity of the failure when an internal thread tries to access an already-closed IndexWriter. Since that could hide other bugs in the future, the patch sparked interesting discussions about the origin of the bug.

Remove/Deprecate SpanQuery

SpanBoostQuery has been removed in the master branch. Its boost was ignored when nested inside other span queries, so it can be replaced by a simple BoostQuery at the top level.
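
For example, a top-level boost on a span query can simply be expressed with a regular BoostQuery. A minimal sketch (field and term names are made up):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BoostQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class SpanBoostExample {
        public static void main(String[] args) {
            // Previously: new SpanBoostQuery(spanNear, 2.0f) -- the boost was
            // ignored anyway when nested inside another span query.
            SpanNearQuery spanNear = new SpanNearQuery.Builder("body", true)
                .addClause(new SpanTermQuery(new Term("body", "apache")))
                .addClause(new SpanTermQuery(new Term("body", "lucene")))
                .setSlop(2)
                .build();

            // Now: wrap the whole span query in a BoostQuery at the top level.
            Query boosted = new BoostQuery(spanNear, 2.0f);
            System.out.println(boosted);
        }
    }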

Terms Dictionary Compression

This work has been merged to the master and 8.x branches.

On the Wikipedia dataset that Lucene uses for benchmarks, this resulted in a 23% reduction of the size of the .tim file and a 2% reduction of the size of the entire index. These numbers will vary depending on which features are used and what data lives in the index.

Here are the highlights of the change:

The Lucene terms dictionary groups terms into blocks by shared prefix, such that each block contains enough terms, and stores the prefix trie in an FST. For each suffix in a block, it needs to store a variety of information:

  • the length of the suffix,
  • the suffix bytes,
  • the document frequency of the term,
  • the total term frequency of the term (only if term frequencies are indexed, i.e. not for keywords or IDs),
  • the offset where postings can be read in the ".doc" file, usually encoded absolutely for the first term of the block and as a delta from the previous term for other terms,
  • the offset where positions can be read in the ".pos" file (only if positions are indexed),
  • the offset where offsets and payloads can be read in the ".pay" file (only if offsets or payloads are indexed),
  • the only matching doc ID when docFreq=1; this optimization avoids having to seek into another file just to read a single document, e.g. for ID fields.

Until now all these numbers were stored explicitly using variable-length integers.

Some significant savings have been achieved by making the following observations:

  • Suffix bytes are often compressible on text fields.
  • ID fields would typically have blocks where all document frequencies are equal to 1, so we don't need to record it explicitly for every term.
  • For ngrams and some schemes that are commonly used for ID fields like auto-increment IDs, Flake IDs or UUIDs, most blocks would have suffix lengths that are the same for all terms, so we don't need to record the length for every term either.
  • When docFreq=1 and the doc ID is "pulsed" into the terms dictionary, no data is written to the ".doc" file, so the offset in the ".doc" file of the next term will be the same.
  • If the ID scheme is monotonically increasing, as with auto-increment or Flake IDs, then consecutively pulsed doc IDs will be close to each other, so we can encode deltas instead of absolute doc IDs (see the sketch after this list).
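
To make the last two observations concrete, here is a small self-contained sketch (not the actual codec code) showing why deltas of monotonically increasing pulsed doc IDs are cheaper to store as variable-length integers than absolute values:

    import java.io.ByteArrayOutputStream;

    public class DeltaVIntSketch {
        // Classic variable-length int: 7 bits per byte, high bit = "more bytes follow".
        static void writeVInt(ByteArrayOutputStream out, int value) {
            while ((value & ~0x7F) != 0) {
                out.write((value & 0x7F) | 0x80);
                value >>>= 7;
            }
            out.write(value);
        }

        public static void main(String[] args) {
            // Pulsed doc IDs of an auto-increment-style ID field: monotonically increasing.
            int[] docIds = {1_000_000, 1_000_001, 1_000_003, 1_000_007, 1_000_012};

            ByteArrayOutputStream absolute = new ByteArrayOutputStream();
            ByteArrayOutputStream deltas = new ByteArrayOutputStream();

            int previous = 0;
            for (int docId : docIds) {
                writeVInt(absolute, docId);          // absolute IDs: ~3 bytes each
                writeVInt(deltas, docId - previous); // deltas: mostly 1 byte each after the first
                previous = docId;
            }

            System.out.println("absolute: " + absolute.size()
                + " bytes, deltas: " + deltas.size() + " bytes");
        }
    }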

Changes

Breaking Changes in Elasticsearch

Breaking Changes in 8.0:

  • Make order setting mandatory for Realm config #51195

Breaking Changes in 7.7:

  • Remove DEBUG-level default logging from actions #51459