January 23, 2017

This Week in Elasticsearch and Apache Lucene - 2017-01-23

By Clinton Gormley and Michael McCandless

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Elasticsearch Core

Adjacency matrix aggregation

We have a new adjacency_matrix aggregation, which lets you analyze the co-occurrence of filters: given a list of terms, for example, it tells you how often each pair of those terms occurs together. It was built to improve the graph functionality, so that users can better explore how different nodes of a graph are connected. For instance, placed under a date_histogram, it could help analyze how fraudulent bank accounts have exchanged money over time. Users are nevertheless likely to find exciting use cases for this aggregation outside the context of graph.
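As a rough illustration of the paragraph above, here is a minimal Python sketch. The first part builds a request body for the aggregation; the index field `accounts`, the aggregation name `interactions`, and the filter names are hypothetical. The second part is a pure-Python stand-in (not Elasticsearch code) that mimics how buckets are keyed: one bucket per filter, plus one per pair of filters joined by `&`, with empty buckets omitted.

```python
from itertools import combinations

# Request body for an adjacency_matrix aggregation.
# "accounts", "interactions" and the filter names are made up for this example.
request_body = {
    "size": 0,
    "aggs": {
        "interactions": {
            "adjacency_matrix": {
                "filters": {
                    "alice": {"term": {"accounts": "alice"}},
                    "bob": {"term": {"accounts": "bob"}},
                    "carol": {"term": {"accounts": "carol"}},
                }
            }
        }
    },
}


def adjacency_counts(docs, filters):
    """Count documents matching each filter and each pair of filters.

    This only imitates the shape of the aggregation's buckets on a toy
    in-memory list; it says nothing about how Elasticsearch computes them.
    """
    # Set of matching document positions for every named filter.
    matches = {
        name: {i for i, doc in enumerate(docs) if predicate(doc)}
        for name, predicate in filters.items()
    }
    counts = {}
    for name in sorted(matches):
        if matches[name]:
            counts[name] = len(matches[name])
    for a, b in combinations(sorted(matches), 2):
        both = matches[a] & matches[b]
        if both:  # pairs that never co-occur produce no bucket
            counts[f"{a}&{b}"] = len(both)
    return counts


# Toy transactions: each document lists the accounts involved.
docs = [
    {"accounts": ["alice", "bob"]},
    {"accounts": ["alice", "carol"]},
    {"accounts": ["alice", "bob"]},
    {"accounts": ["carol"]},
]
filters = {
    name: (lambda doc, n=name: n in doc["accounts"])
    for name in ("alice", "bob", "carol")
}
counts = adjacency_counts(docs, filters)
```

In this toy data, `bob` and `carol` never appear in the same document, so no `bob&carol` entry is produced.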

Improved performance of numeric range queries

When you ask Lucene to filter a query with a numeric range, its first step is to build a bitset marking every document accepted by the range, by visiting the dimensional points (BKD tree). This can result in unexpectedly poor performance when the range accepts many documents while the other parts of the query are restrictive. With this change, quietly pushed this past week for the future Lucene 6.5.0 release, Lucene is now smarter: it checks up front the expected cost of enumerating all hits for the range against the expected cost of the other query clauses. If the range is more costly, it instead uses the other clauses to enumerate candidate hits, and for each candidate it uses doc values, rather than dimensional points, to check whether it falls within the range. For queries that combine restrictive clauses with non-restrictive ranges, this can be an enormous speedup. All that is required is that you index your range fields with both doc values and points, and then express the range at search time with IndexOrDocValuesQuery, which has just graduated to Lucene's core module.
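The cost-based choice described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Lucene's implementation: a sorted list of `(value, doc_id)` pairs stands in for the points index, a plain list indexed by document ID stands in for doc values, and the function name and the "number of matches" cost measure are invented for the illustration.

```python
from bisect import bisect_left, bisect_right


def range_filtered_search(lead_doc_ids, values_by_doc, sorted_points, lo, hi):
    """Intersect a restrictive clause with a numeric range [lo, hi].

    lead_doc_ids  -- doc IDs matched by the other (restrictive) clauses
    values_by_doc -- per-document value, a stand-in for doc values
    sorted_points -- (value, doc_id) pairs sorted by value, a stand-in
                     for the points (BKD tree) index
    Returns the strategy chosen and the matching doc IDs.
    """
    # Estimate how many documents the range matches without visiting them.
    start = bisect_left(sorted_points, (lo, -1))
    end = bisect_right(sorted_points, (hi, float("inf")))
    range_cost = end - start
    lead_cost = len(lead_doc_ids)

    if range_cost <= lead_cost:
        # The range is cheap: enumerate its hits from the index structure.
        range_docs = {doc for _, doc in sorted_points[start:end]}
        return "points", [d for d in lead_doc_ids if d in range_docs]

    # The range is expensive: iterate the few candidates from the other
    # clauses and verify each one against its per-document value.
    return "doc_values", [d for d in lead_doc_ids if lo <= values_by_doc[d] <= hi]


# Toy index: document i holds the value i.
values_by_doc = list(range(1000))
sorted_points = sorted((v, d) for d, v in enumerate(values_by_doc))
lead = [5, 500, 999]

wide = range_filtered_search(lead, values_by_doc, sorted_points, 0, 900)
narrow = range_filtered_search(lead, values_by_doc, sorted_points, 500, 500)
```

With the wide range, the sketch checks just three candidates against their stored values instead of collecting 901 range hits first. With the narrow range, it enumerates the single range hit directly.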

Changes in 5.2: Changes in 5.x: Changes in master: Coming up:

Apache Lucene

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!