This Week in Elasticsearch and Apache Lucene - 2016-11-21
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Every shard deserves a home https://t.co/Iyns3sX7MQ via @elastic #Elasticsearch #DevOps
— Daniel Berman (@proudboffin) November 21, 2016
Changes in 2.x:
- Remove cluster update tasks when a task times out, to prevent a memory leak.
- Added option to disable caching of term queries (disabled by default).
- A date-range aggregation with a bucket script that left "to" unbounded could result in an NPE.
- Update Joda-Time to v2.9.5, which fixes a bug with parsing
- Log messages should be truncated from the end because the beginning of the message contains the most interesting parts.
- Non-dynamic settings should not be resettable with null.
- The default global search timeout was not being respected.
- The engine failed to report when it was being throttled.
- The trace_match behaviour didn't work with only one grok pattern.
- Instructions for disabling deprecation logging were incorrect.
- Emulate Java 8 FilePermissions in Java 9, which has removed pathname canonicalization when constructing FilePermission objects.
- Tribe nodes were prevented from using non-default ports by the security manager.
- Log4J should be shut down when the JVM exits instead of when the node exits to avoid problems in the Tribe node.
- The parent field should not be added to nested documents.
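To illustrate the reasoning behind the log-truncation change, here is a minimal sketch that keeps the beginning of an over-long message and cuts from the end. The function name, the length limit and the "..." marker are illustrative assumptions, not Elasticsearch's actual logging code.

```python
def truncate_message(message: str, max_length: int, marker: str = "...") -> str:
    """Shorten a log message by cutting from the end, keeping its beginning.

    Hypothetical helper: the start of a message usually says what failed,
    so it is the part worth preserving.
    """
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if len(message) <= max_length:
        return message
    if max_length <= len(marker):
        # Not enough room for the marker; return a bare prefix.
        return message[:max_length]
    # Keep the prefix and signal that the tail was dropped.
    return message[: max_length - len(marker)] + marker


long_message = "failed to execute on shard [3]: " + "x" * 1000
shortened = truncate_message(long_message, 60)
# The informative beginning survives; only the noisy tail is cut.
assert shortened.startswith("failed to execute on shard [3]")
```

Truncating from the front instead would keep the repetitive tail and drop the part that identifies the failure.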
Changes in 5.1:
- Deprecated queries against boolean fields from accepting values other than
- Uncommitted mapping updates should not affect existing indices.
- IndexAlreadyExistsException has been replaced with ResourceAlreadyExistsException.
- Painless gains the Elvis operator (?:).
- Functions in Painless now have access to reserved variables like doc, which allows Kibana to use (most) scripts both for scripted fields and in the script query.
- The update to Tika v1.14 allows the ingest mapper attachment to handle docs > 100kB.
- The match_phrase_prefix query didn't work on boosted fields.
- Term queries are no longer cached by default as they are fast already, and queries with many terms can result in other filters never being cached.
- The split ingest processor gains the
- Parsing of the level parameter in index and node stats is now strict.
- Handle DST shifts that happen one hour after midnight.
- Alias filters should be parsed on the coordinating node to allow caching of filters which use now() and to ensure that all shards see the same filter.
- Upgrade to Lucene 6.3.0.
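The Elvis operator returns its left-hand operand unless that operand is null, in which case it evaluates and returns the right-hand operand. The Python sketch below models those semantics; it is not Painless code, and the elvis helper and the lambda used to delay evaluating the fallback are illustrative assumptions.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def elvis(value: Optional[T], fallback: Callable[[], T]) -> T:
    """Model of Painless `value ?: fallback` (hypothetical helper).

    Only a null (None) left-hand side triggers the fallback, and the
    fallback is evaluated lazily, only when it is actually needed.
    """
    return value if value is not None else fallback()


# A missing value falls back; a present value, even 0, is kept.
missing = elvis(None, lambda: 42)
zero = elvis(0, lambda: 42)
```

This differs from Python's own or operator: 0 or 42 evaluates to 42 because 0 is falsy, whereas a null check keeps the 0.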
Changes in 6.0:
- Removed netty3 in favour of netty4.
- Removed store throttling, which Lucene has handled automatically since 2.x.
- Parsing of the metrics parameter in node stats is now strict.
- Enabled 5.x to 6.x backwards compatibility (bwc) tests.
- The sequence IDs branch has been merged into master to allow for easier development.
- The synonym graph token filter will provide correct handling of multi-word synonyms in phrase and contextual queries.
- The cross_fields execution mode of the multi_match query doesn't handle synonyms correctly.
- Allow terms aggregations to be partitioned so that more terms can be retrieved using multiple requests.
- Lazy DNS resolution of unicast hosts would allow Elasticsearch to start before DNS entries are ready and would let changed entries be looked up again, but it has an impact on ping timeouts in Zen discovery.
- Highlighting doesn't work on keyword fields that contain non-ASCII characters.
- Writing UTF-8 to StreamOutput can be done more efficiently using a local buffer.
- The tribe node should be able to merge custom cluster state metadata.
- The master node should be able to retry assigning a primary shard to a node that has the shard store locked during shard state fetching.
- Binary field values should be accessible in scripts.
- Add a size ingest processor to replace the
- Slow application of cluster state changes can hold on to many old cluster states, resulting in an OOM. The whole cluster state is not required, and can be replaced with revision numbers to indicate which states have been applied.
- The new unified highlighter offers more flexible highlighting, but doesn't work with queries that need access to an index reader.
- Remove the deprecated
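Partitioning terms aggregations can be pictured as assigning each term to one of N partitions by hashing it, so that each request only collects the terms in its own partition. The Python sketch below illustrates the idea. The CRC32 hash is a stand-in (not necessarily the hash Elasticsearch uses), the request body assumes the include.partition / include.num_partitions syntax, and the field name "account_id" and aggregation name "by_term" are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List


def partition_of(term: str, num_partitions: int) -> int:
    """Map a term to a partition. CRC32 is a stand-in hash (an assumption)."""
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def terms_in_partition(terms: Iterable[str], partition: int, num_partitions: int) -> List[str]:
    """Terms that a request for `partition` would be responsible for."""
    return [t for t in terms if partition_of(t, num_partitions) == partition]


def partitioned_terms_request(field: str, partition: int, num_partitions: int, size: int) -> Dict:
    """Build one search body; the include/partition syntax is an assumption."""
    return {
        "size": 0,
        "aggs": {
            "by_term": {
                "terms": {
                    "field": field,
                    "include": {"partition": partition, "num_partitions": num_partitions},
                    "size": size,
                }
            }
        },
    }


# One request per partition; together they cover every term exactly once.
NUM_PARTITIONS = 4
requests = [
    partitioned_terms_request("account_id", p, NUM_PARTITIONS, 10000)
    for p in range(NUM_PARTITIONS)
]
```

Because the hash is deterministic, every shard assigns a given term to the same partition, so the per-partition results can simply be concatenated on the client.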
Apache Lucene:
- QueryBuilder should allow subclasses to override createFieldQuery to allow for query parsers that properly handle multi-token synonyms, for example
- SynonymFilter, which has long-standing tricky bugs with multi-token synonyms, may soon be replaced by SynonymGraphFilter, offering a path to fix those bugs
- Index time sorting now supports sorting on multi-valued fields using selectors
- The simplistic in-heap BKD index can be optimized to require less heap
- Codec-level encryption continues iterating, this time getting some technical documentation
- Lucene may soon have an implementation of a logistic regression classifier
- Subclasses of the primary node in NRT segment based replication should have access to the
- ASCIIFoldingFilterFactory.getMultiTermComponent no longer emits the original token even when
- The classic query parser no longer allows
- The changes-to-html generator should not rely on Jira being up
- A possible ICU 58.1 upgrade is on hold because of unexplained ICU assertion test failures
- A write-once attribute analysis chain has some very nice properties but is a massive change
- Can we make it easier to create graph tokenizers and token filters?
- SpanNearQueries miss some hits today, but the fix is surprisingly tricky and only addresses some cases
- Lucene should provide axiomatic similarities, six in all
- The release smoke tester now fails if CHANGES.txt seems to be from the future
- Passage is now public, for better extensibility; its passage relevancy has improved and it should let you set the max passage character length
- AnalyzingInfixSuggester now closes its IndexWriter by default after
- We were missing backwards compatibility coverage for sorted indices
- The document-based suggester sometimes threw a NullPointerException on level 2 ghost fields
- End-of-line characters were causing false failures in our precommit tests
- Doc values queries now use the new iterator API directly
- The RollingBuffer utility class was missing a getter for its internal buffer size
- The new BooleanSimilarity caused problems for tests that expect similarity implementations to be sane
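Several items above (SynonymGraphFilter, graph tokenizers and token filters) revolve around token graphs: each token has a position and a position length, so a single-word term can span the same positions as a multi-word synonym. The Python sketch below enumerates every path through such a graph. The Token class and the "dns" / "domain name system" synonym are illustrative assumptions, not Lucene's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    """Hypothetical token: an arc from `position` to `position + position_length`."""
    term: str
    position: int
    position_length: int = 1


def enumerate_paths(tokens: List[Token]) -> List[str]:
    """Return the phrase spelled by every path from the first node to the last."""
    if not tokens:
        return []
    arcs: Dict[int, List[Token]] = defaultdict(list)
    for tok in tokens:
        arcs[tok.position].append(tok)
    start = min(t.position for t in tokens)
    end = max(t.position + t.position_length for t in tokens)

    paths: List[str] = []

    def walk(node: int, prefix: List[str]) -> None:
        if node == end:
            paths.append(" ".join(prefix))
            return
        # Follow every arc leaving this node; dead ends contribute nothing.
        for tok in arcs.get(node, []):
            walk(tok.position + tok.position_length, prefix + [tok.term])

    walk(start, [])
    return paths


# "dns" spans three positions, in parallel with its multi-word synonym.
DNS_GRAPH = [
    Token("dns", 0, 3),
    Token("domain", 0),
    Token("name", 1),
    Token("system", 2),
    Token("is", 3),
    Token("down", 4),
]
```

Reading the same tokens as a flat sequence, without position lengths, would lose the information that "dns" and "domain name system" are alternatives over the same span, which is what makes phrase matching with multi-word synonyms tricky.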
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!