10 July 2017

This Week in Elasticsearch and Apache Lucene - 2017-07-10

By Clinton Gormley and Adrien Grand

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Defence against full disks

Elasticsearch already has a first line of defence in place to prevent nodes from running out of disk space. Once a node reaches a certain level of disk utilization, the cluster tries to move shards away from it to prevent catastrophic situations where nodes fail with no space left on their devices. However, moving shards around is not always possible, and even when it is, indexing might fill up disks faster than shards can be relocated.

Starting with Elasticsearch 6.0, nodes will stop accepting write requests to indices that have one or more shards allocated on a node that is running low on disk space. This should provide a safer failure mode for clusters than the current behavior, where disk space may be exhausted before the problem is detected. Users will need to manually set indices back to read/write once they have provisioned more disk space.
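As a rough illustration of that manual step, the sketch below builds the settings request that would lift the write block from one index. It assumes the block is exposed as the `index.blocks.read_only_allow_delete` index setting and that setting it to `null` clears it; check the documentation for your version before relying on it. The function name, host and index name are placeholders.

```python
import json
import urllib.request

# Assumption: name of the index block applied when a node runs low on disk.
BLOCK_SETTING = "index.blocks.read_only_allow_delete"


def build_unblock_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request that resets the block."""
    # Sending null for a setting asks Elasticsearch to restore its default value.
    body = json.dumps({BLOCK_SETTING: None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host and index name; pass the result to urllib.request.urlopen to send it.
request = build_unblock_request("http://localhost:9200", "my-index")
print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

Only send the request after freeing or adding disk space; otherwise the index is likely to be blocked again as soon as the node is checked.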

Sequence Numbers

The last 6.0 blockers for sequence numbers are closed! All PRs related to sequence numbers that were slated for 6.0 are merged. We now have:

  • fast operation-based recovery,
  • a custom translog retention policy to make fast recovery more likely,
  • cleanup of old transaction logs on idle indices, and
  • a primary/replica sync on primary promotions.

We also have the infrastructure we need to start developing the cross data centre replication (xDCR) X-Pack feature. We will continue to use the new infrastructure to tackle more complex correctness problems: the roll back of unneeded operations in replicas and usage of sequence numbers for optimistic locking.
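To make the link between operation-based recovery and translog retention concrete, here is a simplified sketch of the decision a primary could make when a replica comes back. It assumes the replica reports a local checkpoint (the highest sequence number below which it has every operation); `plan_recovery` and its parameters are illustrative names, not Elasticsearch APIs.

```python
from typing import List, Optional, Set


def plan_recovery(
    retained_seq_nos: Set[int],
    primary_max_seq_no: int,
    replica_local_checkpoint: int,
) -> Optional[List[int]]:
    """Return the sequence numbers to replay, or None if files must be copied instead."""
    # Everything above the replica's checkpoint is potentially missing on the replica.
    missing = range(replica_local_checkpoint + 1, primary_max_seq_no + 1)
    # Replaying operations only works if the translog still retains all of them.
    if all(seq_no in retained_seq_nos for seq_no in missing):
        return list(missing)
    return None


retained = set(range(5, 11))           # translog still holds operations 5..10
print(plan_recovery(retained, 10, 6))  # replica is slightly behind: replay 7..10
print(plan_recovery(retained, 10, 2))  # operations 3 and 4 are gone: fall back to file copy
```

The longer the translog retains operations, the more often the first branch applies, which is why the retention policy makes fast recovery more likely.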

Aggregation rewriting

Aggregations now have a rewrite phase, similar to the query rewrite phase. This gives any aggregation the opportunity to rewrite itself into a simpler or more generic form, which increases the chance of the aggregation being cached. The first implementation of this rewrite is in the filter/filters aggregations: we now rewrite their filters, which means we can cache requests that use the filter/filters aggregation (as long as the underlying filter is cacheable).
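The toy model below shows why rewriting helps caching: two requests that express the same filter differently end up with the same cache key once they are rewritten. The rewrite rule (unwrapping a `bool` query that holds a single `filter` clause) and the JSON-based cache key are assumptions made for illustration; they are not the rules Elasticsearch applies.

```python
import json


def rewrite_query(query: dict) -> dict:
    """Toy rule: a bool query wrapping exactly one filter clause becomes that clause."""
    if list(query) == ["bool"]:
        clauses = query["bool"]
        only_filter = list(clauses) == ["filter"] and isinstance(clauses["filter"], list)
        if only_filter and len(clauses["filter"]) == 1:
            return rewrite_query(clauses["filter"][0])
    return query


def rewrite_aggs(aggs: dict) -> dict:
    """Rewrite the query inside every filter aggregation, leaving other aggregations alone."""
    return {
        name: {**agg, "filter": rewrite_query(agg["filter"])} if "filter" in agg else agg
        for name, agg in aggs.items()
    }


def cache_key(aggs: dict) -> str:
    """Hypothetical cache key: the canonical JSON of the rewritten aggregations."""
    return json.dumps(rewrite_aggs(aggs), sort_keys=True)


wrapped = {"errors": {"filter": {"bool": {"filter": [{"term": {"level": "error"}}]}}}}
plain = {"errors": {"filter": {"term": {"level": "error"}}}}
print(cache_key(wrapped) == cache_key(plain))
```

Without the rewrite step the two requests would serialize differently and occupy separate cache entries.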

Java High Level REST Client docs

The docs for the upcoming Java High Level REST Client are in the works. You can get a preview here: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/5.x . The plan is to add the missing search docs and a dedicated page with examples of how to migrate from the Java API to the REST client.

Cluster alerts in monitoring now cacheable

Cluster alerts (which use Watcher internally) were problematic for users with many clusters, as they would run into the soft limit on the number of script compilations per minute (Mustache templates, used by Watcher, are treated as scripts internally). We removed the custom search inputs per cluster, so all the watches in cluster monitoring now use the same searches and it no longer matters how many different clusters are being monitored. As a result, there is no need to tweak the compilations-per-minute setting. This was uncovered by an SDH ticket from a customer who monitored several clusters.

More compact _ids

We had a longstanding issue about storing ids in binary form in the index. Elasticsearch accepts arbitrary strings as document identifiers, and up to now we used UTF-8 encoding when indexing/storing them in the index. However, it is very common for ids to be strings that represent a base64-encoded byte[] (e.g. autogenerated ids) or a number (e.g. auto-increment ids coming from an external database whose content is replicated to Elasticsearch). In those cases, the UTF-8 representation is respectively 33% and 2.4x larger than the original binary representation, so we had room for improvement.

We just changed the internal representation of ids to try to detect when the id might be a base64-encoded byte[] or a stringified number, and use a more efficient representation in those cases. This encoding makes base64-encoded ids about 32% smaller and numeric ids about 2x smaller compared to today, very close to the size of the original binary representation of those ids.
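The sketch below illustrates the general idea of detecting the shape of an id and picking a denser layout. It is not the encoding Elasticsearch uses: the marker bytes, the digit packing scheme and the function names are all assumptions chosen to keep the example short.

```python
import base64
import re

# Hypothetical marker bytes that record which layout was used.
NUMERIC, BASE64, UTF8 = 0x00, 0x01, 0x02

_DIGITS = re.compile(r"[0-9]+")
_BASE64_URL = re.compile(r"[A-Za-z0-9_-]+")


def encode_id(doc_id: str) -> bytes:
    """Encode an id compactly when it looks numeric or base64, else as UTF-8."""
    if _DIGITS.fullmatch(doc_id):
        # Pack two decimal digits per byte; 0xF pads an odd trailing digit.
        nibbles = [int(ch) for ch in doc_id]
        if len(nibbles) % 2:
            nibbles.append(0x0F)
        packed = bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))
        return bytes([NUMERIC]) + packed
    if len(doc_id) % 4 == 0 and _BASE64_URL.fullmatch(doc_id):
        # Four URL-safe base64 characters carry exactly three bytes of payload.
        return bytes([BASE64]) + base64.urlsafe_b64decode(doc_id)
    return bytes([UTF8]) + doc_id.encode("utf-8")


def decode_id(encoded: bytes) -> str:
    """Invert encode_id using the leading marker byte."""
    marker, payload = encoded[0], encoded[1:]
    if marker == NUMERIC:
        nibbles = [n for byte in payload for n in (byte >> 4, byte & 0x0F)]
        return "".join(str(n) for n in nibbles if n != 0x0F)
    if marker == BASE64:
        return base64.urlsafe_b64encode(payload).decode("ascii")
    return payload.decode("utf-8")


for doc_id in ("1234567890", "AVz3kq0kq0kq0kq0kq0k", "user/42"):
    encoded = encode_id(doc_id)
    print(doc_id, len(doc_id.encode("utf-8")), "->", len(encoded), decode_id(encoded) == doc_id)
```

Ids that match neither pattern fall back to UTF-8 and pay one extra byte for the marker, which is the price of being able to decode every id unambiguously in this sketch.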

Beware that these savings will not automatically translate into significant reductions in the size of the index, given that Lucene performs prefix-compression of those ids in the terms dictionary, and LZ4 (or DEFLATE if using index.codec: best_compression) compression of those ids in stored fields. However, more compact ids mean that Lucene has fewer bytes to compare when sorting values at flush time, when merging terms dictionaries at merge time, and when compressing data in general. This change will also allow us to work on interesting follow-ups such as reordering the bytes of autogenerated ids and dropping the sortability of ids (which we do not leverage) in favour of a byte order that makes prefix compression and indexing more efficient.

Changes in 5.x:

Changes in master:

Apache Lucene

Lucene 7.0 - feature freeze

There has been a surge of changes to be made to 7.0, so the feature freeze was delayed until July 10th.

Better version compatibility checks

Now that Lucene stores the version that was used to first create the index, it uses this version to check that you are not attempting to read an index that is too old. This was not possible before, since Lucene only recorded the version that was used to write segments and commit points, so merging could make Lucene think an index was recent when in fact it was not. As of Lucene 8.0, this leniency will be gone.
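The difference between the two checks can be modelled in a few lines. This is a simplified sketch, not Lucene code: `IndexInfo` and both check functions are hypothetical, and it assumes a reader accepts indices from the current and the previous major version only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexInfo:
    created_major: int         # major version that first created the index
    segment_majors: List[int]  # major version that wrote each live segment


def min_supported_major(current_major: int) -> int:
    # Assumption: only the previous major version's indices can still be read.
    return current_major - 1


def lenient_check(index: IndexInfo, current_major: int) -> bool:
    """Old behaviour: only the versions that wrote the segments are inspected."""
    return min(index.segment_majors) >= min_supported_major(current_major)


def strict_check(index: IndexInfo, current_major: int) -> bool:
    """New behaviour: the version that created the index decides."""
    return index.created_major >= min_supported_major(current_major)


# An index created with 6.x whose segments were all rewritten by merges under 7.x.
merged = IndexInfo(created_major=6, segment_majors=[7, 7, 7])
print(lenient_check(merged, current_major=8))  # merging hides the index's real age
print(strict_check(merged, current_major=8))   # the recorded creation version exposes it
```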

Other changes

  • The new concurrent delete/update improvements had a race condition.
  • The addition of hooks to run wildcard terms in phrases proves controversial since it would encourage usage of the dangerous SpanMultiTermQueryWrapper.
  • SpanMultiTermQueryWrapper is a dangerous query which might expand to an arbitrary number of terms. We are looking into adding protection against this.
  • Can we improve IndexOrDocValuesQuery to be more aware of the difference in cost between points and doc values?
  • We keep forgetting that we already have a query that matches documents that have a value for a given doc-value field, so we decided to rename it so that it better reflects what it does.
  • The fact that compiling scripts gets exponentially slower over time triggered a discussion about whether we should cache the result of compilations.
  • Factory methods of doc values queries have been renamed to make sure their name includes "slow".
  • Can we add a sort field that allows sorting child docs by a value of their parent document?
  • Should we use a common interface when multiple analysis factories set the same option?
  • You can now group using the new ValuesSources API.
  • IndexWriter sometimes gets the total number of docs in the index wrong.

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!