Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Really excellent walk-through by @gregibrown on tuning Elasticsearch query relevance. https://t.co/CURlF0S8tY
— Zachary Tong (@ZacharyTong) March 17, 2017
Changes in 5.3:
- MapperService#parentTypes was rewrapped in an UnmodifiableSet on every cluster state update, eventually resulting in a StackOverflowError.
- Add a bootstrap check to prevent Elasticsearch from starting in production mode when running on an early-access version of Java 9.
- Use ParseFields for parsing aggregations' CommonFields instead of String.
- Fixed a bug when deleting a snapshot while another snapshot was in progress.
- Queries can now be validated on all shards instead of just choosing one at random.
- Centralise XContent rendering of terms aggs for better code sharing.
- Reindex now waits until scroll cleanup has finished before returning.
- Search `took` time should use a relative clock.
- The `clear-cache` API should accept `request` instead of `request_cache`.
- Improved Painless' regex parsing and error messages.
- The plugin CLI now emits useful exit codes on failure.
- Improved exception message when installing an incompatible plugin.
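To illustrate the item about search took time using a relative clock: wall-clock time can jump backwards or forwards when the system clock is adjusted, while `System.nanoTime()` is intended for measuring elapsed time. The following is a minimal sketch of the idea, not the Elasticsearch implementation; the `TookTimer` class and its method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper (not Elasticsearch code): measures elapsed time with a relative clock. */
final class TookTimer {
    private final LongSupplier nanoClock; // source of relative nanoseconds
    private final long startNanos;

    /** The clock is injectable so that tests can supply a fake one. */
    TookTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    /** Starts a timer backed by System.nanoTime(), which is unaffected by wall-clock adjustments. */
    static TookTimer start() {
        return new TookTimer(System::nanoTime);
    }

    /** Elapsed time since construction, in milliseconds. */
    long tookMillis() {
        return TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - startNanos);
    }
}
```

A caller would create the timer when the request arrives and read `tookMillis()` when building the response. The same measurement based on `System.currentTimeMillis()` could report a negative or inflated value if the system clock were adjusted in between.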
Apache Lucene
Lucene 6.5.0 is going out soon
The vote for the release candidate has passed. In particular, this release will bring faster range and geo queries to Elasticsearch, as well as improvements to the way query parsers deal with token graphs (see below). You can read more about the release highlights at https://wiki.apache.org/lucene-java/ReleaseNote65.
Reduce memory usage when parsing complex graphs of tokens
Some analysis components, like shingles with multiple sizes, can generate very dense graphs. This is an issue for query parsing, since we need to enumerate all paths in order to generate a query. Graphs that are too complex will now throw a TooManyClausesException rather than causing out-of-memory errors.
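To get a feel for why dense token graphs are a problem, here is a rough sketch of bounded path enumeration. It is not Lucene's implementation: the `TokenGraphPaths` class, the `TooManyPathsException` stand-in and the adjacency-list representation are all assumptions made for illustration. The graph is assumed to be acyclic, with edges only going from lower to higher state numbers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Lucene code): enumerate paths through a token graph, with a cap. */
final class TokenGraphPaths {

    /** Stand-in for the exception a query parser could raise; the name is an assumption. */
    static final class TooManyPathsException extends RuntimeException {
        TooManyPathsException(int limit) {
            super("token graph has more than " + limit + " paths");
        }
    }

    /** edges.get(s) lists the states reachable from state s by consuming one token. */
    static List<List<Integer>> enumeratePaths(List<List<Integer>> edges, int start, int end, int maxPaths) {
        List<List<Integer>> paths = new ArrayList<>();
        walk(edges, start, end, maxPaths, new ArrayList<>(), paths);
        return paths;
    }

    private static void walk(List<List<Integer>> edges, int state, int end, int maxPaths,
                             List<Integer> current, List<List<Integer>> paths) {
        current.add(state);
        if (state == end) {
            // Fail fast instead of accumulating an unbounded number of paths in memory.
            if (paths.size() >= maxPaths) {
                throw new TooManyPathsException(maxPaths);
            }
            paths.add(new ArrayList<>(current));
        } else {
            for (int next : edges.get(state)) {
                walk(edges, next, end, maxPaths, current, paths);
            }
        }
        current.remove(current.size() - 1); // backtrack
    }

    /** Builds states 0..n where each state links to the next one and the one after it. */
    static List<List<Integer>> denseGraph(int n) {
        List<List<Integer>> edges = new ArrayList<>();
        for (int s = 0; s <= n; s++) {
            List<Integer> next = new ArrayList<>();
            if (s + 1 <= n) next.add(s + 1);
            if (s + 2 <= n) next.add(s + 2);
            edges.add(next);
        }
        return edges;
    }
}
```

In `denseGraph`, every position can be crossed by a one-step or a two-step edge, loosely mimicking shingles of two sizes. The number of paths then grows like the Fibonacci sequence with the length of the input, so a cap on the number of enumerated paths turns a potential out-of-memory condition into a clear exception.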
What's new in Lucene 7?
Mike McCandless wrote a blog post that describes the main changes that are coming in Lucene 7.0.
Other changes:
- Building software is hard. Just like Lucene and Elasticsearch, Java sometimes has to make difficult compromises between making progress and maintaining backward compatibility.
- Could we allow segments in an index to be upgraded without requiring the IndexWriter to be reopened?
- The FindBugs static analysis tool reports warnings that we should perhaps fix.
- The ASCII folding token filter is slow because of an overly large method. There is a simple hack that can fix it, but we would like to apply a proper fix by storing the replacements in a data structure rather than a giant switch/case block.
- Can we make scoring on spans more flexible by exposing the whole tree of matching spans to the score computation?
- CustomSeparatorBreakIterator should allow splitting on strings rather than just single characters.
- GPU acceleration is attracting interest thanks to the Google Summer of Code program.
- Range fields now support IP addresses.
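As a rough illustration of the ASCII folding item above, replacements can live in a lookup table instead of a giant switch/case block, which keeps the hot method small. This is only a sketch of the general idea, not the fix that was discussed for Lucene: the `FoldingTable` class is hypothetical and its table covers just a handful of characters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not Lucene code): table-driven folding of a few non-ASCII characters. */
final class FoldingTable {
    // Tiny illustrative subset; a real table would cover far more characters.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u00e0', "a");  // a with grave
        REPLACEMENTS.put('\u00e9', "e");  // e with acute
        REPLACEMENTS.put('\u00f1', "n");  // n with tilde
        REPLACEMENTS.put('\u00df', "ss"); // sharp s
        REPLACEMENTS.put('\u00c6', "AE"); // AE ligature
    }

    /** Replaces known non-ASCII characters and leaves everything else untouched. */
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c); // fast path: plain ASCII needs no lookup
                continue;
            }
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this layout the per-character method stays short regardless of how many replacements exist, since adding an entry grows the data rather than the code. A production version would more likely use primitive arrays than a boxed `HashMap`, to avoid allocation on lookup.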
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!