This Week in Elasticsearch and Apache Lucene - 2016-11-28
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
How fstrim impacts Elasticsearch performance https://t.co/IymS1ELvur #elasticsearch #TRIM #performance
— elastic (@elastic) November 24, 2016
Elasticsearch Core
Changes in 2.x:
- Install a security manager at startup (even if we override it later) so that the JVM takes the existence of the SM into account when making early decisions.
- Add a HostFailureListener to the transport client to notify client code if a node is disconnected.
- It should be possible to remove a node address from the transport client.
- Whitespace around commas in auto_create_index setting should be ignored.
- Don't blindly allocate arrays with StreamInput#readArraySize - check if the array size appears sane.
- Added BWC layer for exceptions - when speaking to nodes from before a new exception was added, use NotSerializableExceptionWrapper instead.
- OOM exceptions were not causing the node to exit when thrown in the networking layer or in the Lucene layer.
- If an ExecutionException occurred when calling Cache#computeIfAbsent, only the first caller was getting the exception. The others would simply get null.
- Allow the master to consider (with the lowest priority) assigning a primary to a node even though the node reports that the shard is still locked, instead of treating the latter as a hard failure.
- Unicast hosts are now resolved lazily, so that failure to resolve doesn't block Elasticsearch startup, and DNS lookups can be repeated later if the DNS settings have been changed.
- Painless gets a variable dumper/inspector with Debug.explain().
- Term aggs can now be partitioned, allowing expensive aggs to be executed over several queries and later merged client side.
- The tribe node is now able to merge custom cluster state metadata
- Search tasks now provide descriptions of the search via the task management API.
- Now that alias filters are parsed on the coordinating node, the search-shards API should include index info and alias filters.
- Scripts can now access binary doc values.
- Removed onModule support for plugins.
- The Python, JavaScript, and Groovy scripting languages have been removed from master.
- ClusterService method split up into smaller chunks for later refactoring.
- Add exception handling to the rank evaluation framework.
- Phrase and proximity queries on multi-word synonyms will be properly supported via the synonym-graph token filter.
- Better Painless exception when compiling a file script.
- The indices_boost search option should be able to use aliases and wildcards.
- The slowlog should be single-line, not pretty printed.
- Dots in field names should only work for object fields, not nested fields.
- The unified highlighter solves many highlighting woes.
- Better garbage collection when passing very big terms lists.
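
The partitioned terms aggregation mentioned in the list above can be pictured with a small stand-alone sketch. Everything in it is an assumption made for illustration: the class `PartitionedTermsSketch`, its method names, and the hash-modulo rule are hypothetical and do not reflect how Elasticsearch assigns terms to partitions. It only shows the general idea of counting one slice of the term space per request and merging the partial results on the client.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not Elasticsearch code or its hashing scheme.
class PartitionedTermsSketch {

    // Assign a term to one of numPartitions buckets (assumed hash-modulo rule).
    static int partitionOf(String term, int numPartitions) {
        return Math.floorMod(term.hashCode(), numPartitions);
    }

    // Count only the terms that fall into the requested partition,
    // standing in for one request that sees a slice of the term space.
    static Map<String, Long> countPartition(List<String> values, int partition, int numPartitions) {
        Map<String, Long> counts = new TreeMap<>();
        for (String value : values) {
            if (partitionOf(value, numPartitions) == partition) {
                counts.merge(value, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Run every partition in turn and merge the partial results client side.
    static Map<String, Long> countAll(List<String> values, int numPartitions) {
        Map<String, Long> merged = new TreeMap<>();
        for (int p = 0; p < numPartitions; p++) {
            countPartition(values, p, numPartitions)
                .forEach((term, count) -> merged.merge(term, count, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("error", "warn", "info", "error", "info", "error");
        System.out.println(countAll(values, 3));
    }
}
```

Because each term lands in exactly one partition, the merged result is the same no matter how many partitions are used. Only the amount of work per request changes.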
Apache Lucene
- Index sorting is now more efficient when the index is already sorted, which happens if all documents have the same value for the sort field
- Dimensional points will soon require substantially less heap for the in-memory BKD tree index
- Tests uncovered various problems with recent changes to index sorting
- The Terms.intersect API is trappy
- If IndexWriter hits a tragic event when too many merges are running it can lead to deadlock
- Doc values queries should cache their hash code since they are likely to have many terms
- We likely will not switch highlighters to use BytesRef
- UnifiedHighlighter should let you highlight text from other fields too
- Lucene has overly complex specialized packed ints code, but simplifying it is controversial
- The 6.x doc values APIs are very trappy since they have secret thread-local state; this will be fixed in 7.0 with the doc values iterators APIs
- TestJoinUtil hit a strange, reproducible, as yet unexplained test failure
- The changes-to-html generator should not rely on Jira being up
- QueryBuilder now allows subclasses to customize all methods
- BooleanQuery.Builder should not clone the added clauses
- Lucene now provides six axiomatic similarities
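
To make the hash-code caching item in the list above more concrete, here is a minimal sketch of the general technique, assuming an immutable object that holds many terms. The class `ManyTermsKey` is hypothetical and is not a Lucene class. The actual change in Lucene may look quite different.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a query holding many terms; not a Lucene class.
final class ManyTermsKey {
    private final String field;
    private final String[] terms; // defensively copied, never mutated afterwards
    private final int hash;       // computed once, since the contents are immutable

    ManyTermsKey(String field, List<String> terms) {
        this.field = field;
        this.terms = terms.toArray(new String[0]);
        this.hash = 31 * field.hashCode() + Arrays.hashCode(this.terms);
    }

    @Override
    public int hashCode() {
        return hash; // constant time instead of walking every term on each lookup
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ManyTermsKey)) return false;
        ManyTermsKey that = (ManyTermsKey) other;
        // Comparing the cached hashes first rejects most unequal keys cheaply.
        return hash == that.hash && field.equals(that.field) && Arrays.equals(terms, that.terms);
    }

    public static void main(String[] args) {
        ManyTermsKey a = new ManyTermsKey("tag", Arrays.asList("x", "y", "z"));
        ManyTermsKey b = new ManyTermsKey("tag", Arrays.asList("x", "y", "z"));
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Caching is only safe because the terms cannot change after construction. A mutable holder would have to recompute or invalidate the stored value.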
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!