This Week in Elasticsearch and Apache Lucene - 2017-04-10

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Batched reduction of search results

Elasticsearch has gained the ability to reduce the top N shard results incrementally. This is the next step towards reducing intermediate memory consumption during search execution that will allow us to relax current limitations related to the number of shards that are involved in the execution of a single search request. Ultimately this will allow search executions against large number of shards or in other words hitting the entire cluster with sustainable memory consumption on the coordinating node.

_field_stats is dead, long live _field_caps

After several PRs, the _field_stats API has been deprecated in favour of a more lightweight API that only exposes unique field names across indices/types, and whether they are searchable and/or aggregatable, called the _field_caps API. One important bit of the _field_stats API was the ability to return the min/max values of fields alongside filtering capabilities so that Kibana could figure out which time-based indices a query would match based on its time filter. This was important as including too many indices in a search request could cause memory issues since Elasticsearch would have to hold all shard responses in memory on the coordinating node. In order to work around this issue, we introduced incremental reduction of shard responses in order to keep memory usage on the coordinating node under control. As a consequence, Kibana should now be able to query all indices all the time, and Elasticsearch will make sure to execute things efficiently. A common misunderstanding is that this change will make queries more costly since they now have to go to all shards. However it was already the case with the _field_stats API. Now, queries that have a time filter that does not match any documents will return instantly, so we do not expect any performance regression.

Non-deterministic distributions in Rally

Rally gained support for modelling non-deterministic distributions of operations. With this change you can model Poisson processes with Rally out of the box. Poisson processes are often used to model arrival processes of independent actors (think coffee shops, telephone hotlines and most importantly for us: Web services) with a defined mean "arrival rate" (i.e. throughput). They play a central role in queuing theory.

Changes in 5.3:

Changes in 5.x:Changes in master:Coming soon:

Apache Lucene

Elasticsearch master is being upgraded to a snapshot of Lucene 7.0

With Lucene 7.0 coming soon, we want to make sure that we are not using Lucene in ways that won't be allowed anymore in Lucene 7. This will also give Lucene 7 more integration tests before it gets released. For the record, you can read about what Lucene 7 will bring at http://blog.mikemccandless.com/2017/03/apache-lucene-70-is-coming-soon.html.

Other changes:

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!