22 May 2017

This Week in Elasticsearch and Apache Lucene - 2017-05-22

By Clinton Gormley and Adrien Grand

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Java High Level REST Client

In an arduous and heroic campaign, three brave knights (Sir Luca, Sir Tanguy, and Sir Christoph) have scythed their way through ferocious hordes of aggregations and have finally destroyed the last one standing. Victory! This means that the new REST client (which will replace the transport client) is close to landing. More remains to be done: the branch still has to be merged to master and then backported to 5.x, plus a lot of tests remain to be written, but the battle is close to being won! The REST client covers only the essential APIs for now (ping, info, index, bulk, get, exists, delete, update, search) but this list will be expanded over time.

Routing around degraded nodes

The latency of a search request depends on the slowest or busiest node involved in the search. The search thread pool in master has been switched to a new type (fixed_auto_queue_size), which behaves like the old type by default. It can, however, be configured with a target response time (and min and max queue sizes), in which case queue sizes will be adjusted automatically based on response times. Slower nodes will have shorter queues, and so will reject search requests earlier than other nodes. These rejected requests will be forwarded to other nodes. This is the first experimental step toward being able to work around degraded nodes. The next step is to figure out a way to set the target response time automatically.
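To make the idea concrete, here is a minimal sketch of a queue whose capacity shrinks when responses are slower than a target and grows when they are faster. It is an illustration only, not the Elasticsearch implementation: the class name `AutoQueueSizer`, the `window` and `step` parameters, and the fixed-step adjustment rule are all assumptions made for this example.

```python
class AutoQueueSizer:
    """Hypothetical sketch: adjusts a queue capacity from observed response times."""

    def __init__(self, target_response_time, min_queue_size, max_queue_size,
                 queue_size, window=10, step=50):
        if not min_queue_size <= queue_size <= max_queue_size:
            raise ValueError("queue_size must lie between min and max")
        self.target_response_time = target_response_time
        self.min_queue_size = min_queue_size
        self.max_queue_size = max_queue_size
        self.queue_size = queue_size
        self.window = window  # assumed: number of samples per adjustment
        self.step = step      # assumed: capacity change per adjustment
        self._samples = []

    def record(self, response_time):
        """Record one response time and return the current queue capacity."""
        self._samples.append(response_time)
        if len(self._samples) < self.window:
            return self.queue_size
        average = sum(self._samples) / len(self._samples)
        self._samples.clear()
        if average > self.target_response_time:
            self.queue_size -= self.step  # slow node: shorter queue, earlier rejections
        elif average < self.target_response_time:
            self.queue_size += self.step  # fast node: room for more queued requests
        # Keep the capacity inside the configured bounds.
        self.queue_size = max(self.min_queue_size,
                              min(self.max_queue_size, self.queue_size))
        return self.queue_size


slow_node = AutoQueueSizer(1.0, min_queue_size=100, max_queue_size=1000, queue_size=500)
fast_node = AutoQueueSizer(1.0, min_queue_size=100, max_queue_size=1000, queue_size=500)
for _ in range(100):
    slow_node.record(2.0)  # twice the target response time
    fast_node.record(0.2)  # well under the target
```

After the loop, the slow node's queue has shrunk to its minimum and the fast node's has grown to its maximum, so the slow node would start rejecting requests (and having them forwarded elsewhere) much sooner.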

Changes in 5.4:

Changes in 5.5:

Changes in master:

Coming soon:
  • Range fields can benefit from the same optimization as range queries on point fields, to decide whether to execute the query using the index or doc values.

Apache Lucene

Lucene 6.6

The release process for Lucene 6.6.0 has started. This might be the last 6.x minor release before 7.0.

Indexed geo bounding boxes

There is a patch in progress that adds support for indexing geo bounding boxes using points similarly to how range fields work. These bounding boxes can then be queried at search time and support the same relations as range fields: INTERSECTS, WITHIN, CONTAINS or CROSSES.
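The four relations can be sketched on plain latitude/longitude boxes. This is an illustration rather than the code in the patch: the `Box` type and function names are made up for the example, CROSSES is assumed to mean "intersects the query box without being fully within it", and boxes that wrap around the dateline are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box; does not handle dateline wrapping."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def intersects(indexed: Box, query: Box) -> bool:
    """The indexed box shares at least one point with the query box."""
    return (indexed.min_lat <= query.max_lat and indexed.max_lat >= query.min_lat
            and indexed.min_lon <= query.max_lon and indexed.max_lon >= query.min_lon)


def within(indexed: Box, query: Box) -> bool:
    """The indexed box lies entirely inside the query box."""
    return (indexed.min_lat >= query.min_lat and indexed.max_lat <= query.max_lat
            and indexed.min_lon >= query.min_lon and indexed.max_lon <= query.max_lon)


def contains(indexed: Box, query: Box) -> bool:
    """The indexed box entirely encloses the query box."""
    return within(query, indexed)


def crosses(indexed: Box, query: Box) -> bool:
    """Assumed semantics: intersects the query box but is not within it."""
    return intersects(indexed, query) and not within(indexed, query)


query = Box(40.0, 50.0, 0.0, 10.0)
inside = Box(42.0, 44.0, 2.0, 4.0)      # fully inside the query box
overlapping = Box(45.0, 55.0, 5.0, 15.0)  # sticks out past the query box
enclosing = Box(30.0, 60.0, -10.0, 20.0)  # surrounds the query box
far_away = Box(-10.0, -5.0, 100.0, 110.0)
```

With these boxes, `inside` matches INTERSECTS and WITHIN, `overlapping` matches INTERSECTS and CROSSES, `enclosing` matches CONTAINS (and, under the assumption above, CROSSES), and `far_away` matches nothing.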

Greater accuracy of the length normalization factors

Up to Lucene 6.x, norms had to store a scoring factor that combined the index-time boost with length normalization. However, Lucene 7.0 will no longer support index-time boosts, which gave us an opportunity to store length-normalization factors more accurately. In fact, we no longer store the normalization factor at all but the length itself, which we encode in a single byte while retaining 4 significant bits (and even more for small values). At search time we keep a translation table between the encoded length, which has only 256 possible values, and the length normalization factor. This will give users who search short fields a much better experience, since all lengths up to 40 now get encoded to a different byte, whereas the previous encoding already quantized the length normalization factors for lengths of 3 and 4 to the same byte, something many users have complained about over the years.
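The encoding can be sketched in a few lines. This is a re-implementation written for illustration, not Lucene's code: the function names are invented, the split between exactly-stored small values and float-like larger values is an assumption chosen to match the description above, and the `1/sqrt(length)` factor in the table is just one plausible normalization used as an example.

```python
import math

MAX_INT = 2**31 - 1


def _to_4_significant_bits(value: int) -> int:
    """Float-like code for a non-negative int that keeps its top 4 bits."""
    num_bits = value.bit_length()
    if num_bits < 4:
        return value  # tiny values are stored as-is
    shift = num_bits - 4
    mantissa = (value >> shift) & 0x07  # the leading 1 bit is implicit
    return mantissa | ((shift + 1) << 3)


def _from_4_significant_bits(code: int) -> int:
    """Smallest int that maps to the given code."""
    mantissa = code & 0x07
    shift = (code >> 3) - 1
    if shift < 0:
        return mantissa
    return (mantissa | 0x08) << shift


# Byte values left over once the largest int has been encoded; they are
# spent on storing the smallest lengths exactly (assumption of this sketch).
NUM_FREE_VALUES = 255 - _to_4_significant_bits(MAX_INT)


def encode_length(length: int) -> int:
    """Encode a field length into a single byte value (0..255)."""
    if not 0 <= length <= MAX_INT:
        raise ValueError("length out of range")
    if length < NUM_FREE_VALUES:
        return length
    return NUM_FREE_VALUES + _to_4_significant_bits(length - NUM_FREE_VALUES)


def decode_length(code: int) -> int:
    """Decode a byte value back to the smallest length that produces it."""
    if code < NUM_FREE_VALUES:
        return code
    return NUM_FREE_VALUES + _from_4_significant_bits(code - NUM_FREE_VALUES)


# Search-time translation table: one normalization factor per byte value.
NORM_TABLE = [
    1.0 / math.sqrt(decode_length(code)) if decode_length(code) > 0 else 0.0
    for code in range(256)
]
```

In this sketch, lengths 0 through 40 each get their own byte, and 41 is the first length to share a byte with its neighbor (40). Because only 256 codes exist, the table lookup replaces any per-document floating-point work when scoring.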

PostingsHighlighter removal

Over the last few months, a lot of effort has gone into creating one highlighter to rule them all, called the unified highlighter. In particular, the postings highlighter no longer has any features that the unified highlighter lacks, so we are considering removing the postings highlighter.

Other changes:

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!