This Week in Elasticsearch and Apache Lucene - 2016-11-14
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
All you need to know about #Elasticsearch 5.0 - Part 1 - Search https://t.co/nPDQk124oR
— Itamar Syn-Hershko (@synhershko) November 3, 2016
Changes in 2.x:
- Reduced memory usage on client nodes by changing ShardActiveResponseHandler from holding on to the entire cluster state to just keeping the cluster state version.
- Binary fields indexed in 1.x indices were not readable in 2.x.
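The memory fix in the first 2.x item follows a general pattern: a long-lived callback should capture only the small piece of state it needs, not a reference to a large object. The Python sketch below illustrates the idea. ClusterState and VersionOnlyHandler are simplified, hypothetical stand-ins, not the real Elasticsearch classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical stand-in: a version number plus a (potentially huge) payload."""
    version: int
    routing_table: dict = field(default_factory=dict)


class VersionOnlyHandler:
    """Hypothetical response handler that remembers only the version it was created at.

    Holding an int instead of the whole ClusterState lets the old state be
    garbage collected while the request is still in flight.
    """

    def __init__(self, state: ClusterState):
        self.expected_version = state.version  # only an int is retained

    def is_stale(self, current: ClusterState) -> bool:
        # In this sketch, a response is treated as stale once the version has moved on.
        return current.version != self.expected_version


old = ClusterState(version=7, routing_table={"index-0": ["shard-0"]})
handler = VersionOnlyHandler(old)
print(handler.is_stale(ClusterState(version=7)))  # False
print(handler.is_stale(ClusterState(version=8)))  # True
```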
Changes in 5.0:
- The vm.max_map_count setting on systemd systems required a reboot to take effect; we now apply it during package installation.
- On Debian, the start-stop daemon was being backgrounded, which caused important exceptions to be swallowed.
- ES_JVM_OPTIONS was being ignored on SysV init systems.
- Under failure conditions, a thread's original context could be left modified when the thread returned to the pool, so the modified context persisted into later uses of that thread.
- The response consumer in the Java REST client is stateful and shouldn't be reused, which we were doing when retrying failed requests.
- 2.x indices with TTL enabled were failing because the TTL query tried to access "now".
- Temporary index-* generational blobs created during snapshotting should be cleaned up, as otherwise they prevent further snapshots.
- Joda-Time has been updated to v2.9.5.
- GET _snapshot/_all was returning duplicate in-progress snapshots.
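The thread-context fix in the 5.0 list follows a familiar pattern: snapshot the context before doing work and restore it in a finally clause, so a failure cannot leak state into the next task that reuses the pooled thread. Below is a Python sketch of that pattern. ThreadContext and stashed() are hypothetical names, not Elasticsearch's Java API.

```python
import threading
from contextlib import contextmanager


class ThreadContext:
    """Hypothetical per-thread header store used to illustrate save/restore."""

    def __init__(self):
        self._local = threading.local()

    @property
    def headers(self) -> dict:
        if not hasattr(self._local, "headers"):
            self._local.headers = {}
        return self._local.headers

    @contextmanager
    def stashed(self):
        # Snapshot the current headers, run the block with a fresh context,
        # and restore the snapshot even if the block raises.
        saved = dict(self.headers)
        self._local.headers = {}
        try:
            yield self.headers
        finally:
            self._local.headers = saved


ctx = ThreadContext()
ctx.headers["user"] = "alice"
try:
    with ctx.stashed() as h:
        h["user"] = "system"
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
print(ctx.headers)  # {'user': 'alice'}
```

Restoring in finally rather than after the happy path is the point: the failure branch is exactly where the original bug let modified context survive.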
Changes in 5.x:
- Painless now supports D to denote decimal constants, and will suggest using a long constant if an integer is too big. It also now supports null-safe dereferences.
- Every cluster state update causes data nodes to check if there are shards that can be deleted. A cache now makes this process much more efficient.
- Task cancellation should wait for all child tasks to receive the cancellation request before returning.
- The simple_query_string query can now run an all-fields query across all fields, instead of using the _all field.
- Alias names now have the same validation as index names (except for the lowercase restriction).
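The alias change in the 5.x list means an alias name goes through the same checks as an index name, minus the lowercase rule. The Python sketch below shows one way to share a single validator between the two. The rules it enforces (forbidden characters, forbidden leading characters, the 255-byte limit) are an assumed, simplified subset and may not match Elasticsearch exactly; validate_name and is_valid_name are hypothetical helpers.

```python
INVALID_CHARS = set('\\/*?"<>| ,#')  # assumed subset of forbidden characters


def validate_name(name: str, is_alias: bool = False) -> None:
    """Hypothetical validator shared by index and alias names; raises ValueError."""
    kind = "alias" if is_alias else "index"
    if not name or name in (".", ".."):
        raise ValueError(f"{kind} name [{name}] must not be empty, '.' or '..'")
    if name[0] in "_-+":
        raise ValueError(f"{kind} name [{name}] must not start with '_', '-' or '+'")
    bad = sorted(INVALID_CHARS.intersection(name))
    if bad:
        raise ValueError(f"{kind} name [{name}] must not contain {bad}")
    if len(name.encode("utf-8")) > 255:
        raise ValueError(f"{kind} name [{name}] must not be longer than 255 bytes")
    # The only rule that differs: index names must be lowercase, alias names need not be.
    if not is_alias and name != name.lower():
        raise ValueError(f"index name [{name}] must be lowercase")


def is_valid_name(name: str, is_alias: bool = False) -> bool:
    try:
        validate_name(name, is_alias)
        return True
    except ValueError:
        return False


print(is_valid_name("Logs-Current", is_alias=True))  # True
print(is_valid_name("Logs-Current"))                # False: index names must be lowercase
```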
Changes in 6.0:
- Index templates can now specify multiple patterns (and the parameter has changed from
- Index and delete operations should not mutate the version and version type of a request; this is the caller's responsibility instead.
- The cluster state is now a truly immutable class.
- The _cat/thread_pool API should return unbounded queues as -1 instead of null.
- Rank evaluation now treats unranked docs as irrelevant, and provides an option to ignore them.
- Added support for range fields and their corresponding queries (similar to geo_shape queries).
- REST URL parsing is undergoing a cleanup with the intention of being able to return 405 method not allowed.
- Closing a shard when the store was not initialised could cause an NPE.
- Painless is getting the Elvis operator.
- The Index*AlreadyExistsExceptions are to be replaced by ResourceAlreadyExistsException.
- DELETE requests should not accept a body, as the clear-scroll API does today.
- Should Painless support binary values?
- Plugins should have an extension point for adding custom cluster state metadata.
- Non-master nodes should not change the cluster state.
- Aliases and wildcards should be accepted in the indices_boost search parameter.
- A few resources missed the migration from
- Index time sorting should allow sorting on multi-valued fields as well
- FST packing is rarely used and adds a lot of complex code, so we removed it
- Multiple pre/post tags and phrase queries do not work together in
- It is time to upgrade ICU from 56.1 to 58.1
- FastVectorHighlighter fails to highlight phrase queries that have stop words that were removed at index time
- The default sorting behaviour for missing values is always confusing
- The changes-to-html ant task annoyingly fails when Apache's JIRA instance is not reachable
- The classic QueryParser should not parse an explicit query string differently depending on the default operator
- JapaneseTokenizerFactory leaks file handles
- Our release smoke testing tool does not work under cygwin on Windows
- Attempting to index a single text field that is too large now causes an IllegalArgumentException instead of a very confusing int overflow exception
- TermAutomatonQuery, a powerful query that generalizes other proximity queries (MultiPhraseQuery), now rewrites to simpler queries when the word graph is trivial
- UnifiedHighlighter now has extension points for custom queries
- SpanNotQuery lets you specify the allowed range of overlap between sub-queries
- Similarities now also explain exactly how they compute inverse document frequency
- The new BooleanSimilarity bypasses all scoring and uses just the query's boost as the hit score
- Fully dense norms and doc values should not waste memory on a fully set bitset
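To make the idf-explanation item concrete: Lucene's BM25Similarity computes idf as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The Python sketch below shows what an explanation that spells out its inputs can look like. The Explanation class, explain_bm25_idf, and the description strings are hypothetical approximations, not Lucene's API or its exact output.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Explanation:
    """Minimal tree of (value, description) pairs, in the spirit of Lucene's explain output."""
    value: float
    description: str
    details: List["Explanation"] = field(default_factory=list)

    def render(self, indent: int = 0) -> str:
        lines = [f"{' ' * indent}{self.value:g} = {self.description}"]
        for detail in self.details:
            lines.append(detail.render(indent + 2))
        return "\n".join(lines)


def explain_bm25_idf(doc_freq: int, doc_count: int) -> Explanation:
    """BM25-style idf, using the log(1 + (N - n + 0.5) / (n + 0.5)) form."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    return Explanation(
        idf,
        "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
        [
            Explanation(doc_freq, "n, number of documents containing term"),
            Explanation(doc_count, "N, total number of documents with field"),
        ],
    )


print(explain_bm25_idf(doc_freq=1, doc_count=3).render())
```

Keeping n and N as child nodes is what makes the explanation "exact": a reader can recompute the idf from the rendered inputs alone.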
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!