This Week in Elasticsearch and Apache Lucene - 2016-11-14
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
All you need to know about #Elasticsearch 5.0 - Part 1 - Search https://t.co/nPDQk124oR
— Itamar Syn-Hershko (@synhershko) November 3, 2016
Elasticsearch Core
Changes in 2.x:
- Reduced memory usage on client nodes by changing ShardActiveResponseHandler from holding on to the entire cluster state to just keeping the cluster state version.
- Binary fields indexed in 1.x indices were not readable in 2.x.
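The memory saving in the ShardActiveResponseHandler change comes from holding a small value instead of a reference to a large object. Here is a minimal sketch of that idea in Python. The class and field names are hypothetical stand-ins rather than the actual Elasticsearch code, and the version comparison is an assumption about how the stored version might be used.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical stand-in for a potentially large cluster state."""
    version: int
    routing_table: dict = field(default_factory=dict)


class ShardActiveHandler:
    """Hypothetical handler that remembers only the state version."""

    def __init__(self, state: ClusterState):
        # Store an int, not a reference to `state`, so the full
        # object can be garbage collected while responses are pending.
        self.state_version = state.version

    def still_current(self, latest: ClusterState) -> bool:
        # Assumption: responses are only acted on if no newer
        # cluster state has been applied in the meantime.
        return latest.version == self.state_version
```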
Changes in 5.0:
- The vm.max_map_count setting on systemd systems required a reboot to take effect. We now apply this setting during package install.
- On Debian, the start-stop daemon was being backgrounded, which caused important exceptions to be swallowed.
- ES_JVM_OPTIONS were being ignored on SysV systems.
- Under failure conditions, a thread's original context could be changed before it returned to the pool, which means the context would persist when the thread is used in the future.
- The response consumer in the Java REST client is stateful and shouldn't be reused, which we were doing when retrying failed requests.
- 2.x indices with TTL enabled were failing because the TTL query tries to access now.
- Temporary index-* generational blobs created during snapshotting should be cleaned up, otherwise they prevent further snapshots.
- Joda time has been updated to v2.9.5
- GET _snapshot/_all was returning duplicate in-progress snapshots.
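The response consumer fix in the list above is an instance of a general rule: a stateful consumer needs to be created fresh for every attempt. The following Python sketch illustrates the rule with hypothetical names (BufferingConsumer, perform_with_retries). It is not the Java REST client's API, only an illustration of why reuse across retries goes wrong.

```python
class BufferingConsumer:
    """Hypothetical stateful consumer that buffers one response."""

    def __init__(self):
        self.chunks = []

    def on_chunk(self, chunk):
        self.chunks.append(chunk)

    def result(self):
        return b"".join(self.chunks)


def perform_with_retries(attempts, consumer_factory):
    """Run each attempt in turn until one succeeds.

    `attempts` is a list of callables that feed chunks into the
    consumer they are given and may raise ConnectionError.
    """
    last_error = None
    for attempt in attempts:
        consumer = consumer_factory()  # fresh state for every attempt
        try:
            attempt(consumer)
            return consumer.result()
        except ConnectionError as exc:
            last_error = exc
    if last_error is None:
        raise ValueError("no attempts given")
    raise last_error


def failing(consumer):
    consumer.on_chunk(b"partial")
    raise ConnectionError("node down")


def succeeding(consumer):
    consumer.on_chunk(b"ok")


# A factory yields a clean consumer per attempt.
clean = perform_with_retries([failing, succeeding], BufferingConsumer)

# Reusing one consumer leaks the failed attempt's partial data.
shared = BufferingConsumer()
dirty = perform_with_retries([failing, succeeding], lambda: shared)
```

With the factory, `clean` holds only the bytes of the successful attempt. With the shared instance, `dirty` also contains the bytes buffered before the first attempt failed.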
Changes in 5.x:
- Painless now supports d or D to denote decimal constants, and will suggest using a long constant if an integer is too big. Also, it now supports null safe dereferences like foo?.bar()?.baz?.
- Every cluster state update causes data nodes to check if there are shards that can be deleted. A cache now makes this process much more efficient.
- Task cancellation should wait for all child tasks to receive the cancellation request before returning.
- The simple query string query can now run across all fields, instead of just the _all field.
- Alias names now have the same validation as index names (except for the lowercase restriction).
- X-Pack:
  - Automatons used for FieldPermissions should be minimised in order to be thread safe.
  - The original ThreadContext should be restored after a preserved context is restored to prevent leaving previous users in the context.
Changes in 6.0:
- Index templates can now specify multiple patterns (and the parameter has changed from template to index_patterns).
- Index and delete operations should not mutate the version and version type of a request. Instead, this is the caller's responsibility.
- The cluster state is now a truly immutable class.
- The cat-thread-pool request should return unbounded queues as -1 instead of null.
- X-Pack:
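To make the index template change in the 6.0 list concrete, here is a small Python sketch that converts a template body from the single template string to the index_patterns list. The pattern names and settings are made up for illustration, and the helper to_index_patterns is hypothetical rather than part of any client library.

```python
import json

# Older style: one pattern under "template" (illustrative values).
legacy_template = {
    "template": "logs-*",
    "settings": {"number_of_shards": 1},
}


def to_index_patterns(body):
    """Return a copy that uses "index_patterns" instead of "template"."""
    converted = {k: v for k, v in body.items() if k != "template"}
    if "template" in body:
        converted["index_patterns"] = [body["template"]]
    return converted


new_template = to_index_patterns(legacy_template)
# With a list, one template can now cover several patterns.
new_template["index_patterns"].append("metrics-*")

print(json.dumps(new_template, indent=2))
```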
Ongoing changes:
- Rank evaluation now treats unranked docs as irrelevant, and provides an option to ignore them.
- Adding support for range fields and the corresponding queries (similar to geoshape queries).
- REST URL parsing is undergoing a cleanup with the intention of being able to return 405 method not allowed.
- Closing a shard when the store was not initialised could cause an NPE.
- Painless is getting the Elvis operator.
- Index*AlreadyExistsExceptions are to be replaced by ResourceAlreadyExistsException.
- DELETE requests should not have a body, which the clear-scroll API accepts today.
- Should Painless support binary values?
- Plugins should have an extension point for adding custom cluster state metadata.
- Non-master nodes should not change the cluster state.
- Aliases and wildcards should be accepted in the indices_boost query.
- X-Pack:
  - Blocking calls should not be made on the cluster state update thread.
  - Watch status should be updatable while watches are executing.
  - Watcher is gaining a JIRA action.
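The rank evaluation item at the top of this list is easiest to see with a precision-at-k calculation. The Python sketch below is an illustration under assumptions, not the actual rank evaluation code: documents without a rating count as irrelevant by default, and an ignore_unlabeled flag (a name chosen for this sketch) drops them from the calculation instead.

```python
def precision_at_k(ranked_ids, ratings, k, ignore_unlabeled=False):
    """Fraction of the top-k hits that are rated relevant.

    `ratings` maps document id to a rating, where > 0 means relevant.
    Documents missing from `ratings` are unrated.
    """
    top = ranked_ids[:k]
    if ignore_unlabeled:
        # Leave unrated documents out of the calculation entirely.
        top = [doc for doc in top if doc in ratings]
    if not top:
        return 0.0
    # Unrated documents default to 0, which means irrelevant.
    relevant = sum(1 for doc in top if ratings.get(doc, 0) > 0)
    return relevant / len(top)


ranked = ["a", "b", "c", "d"]
ratings = {"a": 1, "c": 0}  # "b" and "d" are unrated

strict = precision_at_k(ranked, ratings, k=4)
lenient = precision_at_k(ranked, ratings, k=4, ignore_unlabeled=True)
```

Here `strict` is 0.25, because only "a" is relevant among four hits, while `lenient` is 0.5, because only the two rated hits are counted.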
Apache Lucene
- A few resources missed the migration from people.apache.org to home.apache.org
- Index time sorting should allow sorting on multi-valued fields as well
- FST packing is rarely used and adds a lot of complex code so we removed it
- Multiple pre/post tags and phrase queries do not work together in FastVectorHighlighter
- It is time to upgrade ICU from 56.1 to 58.1
- FastVectorHighlighter fails to highlight phrase queries that have stop words that were removed at index time
- The default sorting behaviour for missing values is always confusing
- Our changes-to-html ant task annoyingly fails when Apache's JIRA instance is not reachable
- The classic QueryParser should not parse an explicit query string differently depending on the default operator
- JapaneseTokenizerFactory leaks file handles
- Our release smoke testing tool does not work under cygwin on Windows
- Attempting to index a too-massive single text field now causes an IllegalArgumentException instead of otherwise very confusing int overflow exceptions
- TermAutomatonQuery, which is a powerful query that generalizes other proximity queries (SpanQuery, PhraseQuery, MultiPhraseQuery), now rewrites to simpler queries when the word-graph is trivial
- UnifiedHighlighter now has extension points for custom queries
- SpanNotQuery lets you specify the allowed range of overlap between sub-queries
- Similarities now also explain exactly how they compute inverse document frequency
- The new BooleanSimilarity bypasses all scoring and uses just the query's boost as the hit score
- Fully dense norms and doc values should not waste memory on a fully set bitset
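The BooleanSimilarity item in the list above can be illustrated with a few lines of Python. This is a sketch under assumptions rather than Lucene's implementation: each matching clause contributes exactly its boost, clause contributions are summed, and term frequency and document length play no part. The function name and the sample terms are hypothetical.

```python
def boolean_score(query_clauses, doc_terms):
    """Sum the boosts of the clauses whose term occurs in the document.

    `query_clauses` is a list of (term, boost) pairs. How often a term
    occurs, and how long the document is, are deliberately ignored.
    """
    present = set(doc_terms)
    return sum(boost for term, boost in query_clauses if term in present)


clauses = [("lucene", 2.0), ("search", 1.0), ("solr", 5.0)]

short_doc = "lucene search".split()
repetitive_doc = "lucene lucene lucene search".split()

short_score = boolean_score(clauses, short_doc)
repetitive_score = boolean_score(clauses, repetitive_doc)
```

Both documents get the same score of 3.0, since repeating a term does not change whether its clause matches.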
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!