November 21, 2016

This Week in Elasticsearch and Apache Lucene - 2016-11-21

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Every shard deserves a home https://t.co/Iyns3sX7MQ via @elastic #Elasticsearch #DevOps
— Daniel Berman (@proudboffin) November 21, 2016

Elasticsearch Core

Changes in 2.x:

Remove cluster update tasks when they time out, to prevent a memory leak.
Added an option to disable caching of term queries (the option is off by default).
A date-range aggregation with a bucket script could throw an NPE when from or to was left unbounded.
Updated Joda-Time to v2.9.5, which fixes a bug with parsing ZZZ time zones.

Changes in 5.0:

Log messages should be truncated from the end because the beginning of the message contains the most interesting parts.
Non-dynamic settings should not be resettable with null.
The default global search timeout was not being respected.
The engine failed to report when it was being throttled.
Grok's trace_match behaviour didn't work with only one grok pattern.
Instructions for disabling deprecation logging were incorrect.
Emulate Java 8 FilePermissions in Java 9, which no longer canonicalizes pathnames when constructing FilePermission objects.
Tribe nodes were prevented from using non-default ports by the security manager.
Log4J should be shut down when the JVM exits instead of when the node exits to avoid problems in the Tribe node.
The parent field should not be added to nested documents.
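
Truncating from the end means keeping the head of a log message and dropping its tail, since the head carries the most useful context. A minimal Python sketch of that idea follows; the function name `truncate_log_message`, the ` [...]` marker and the sample message are hypothetical and are not Elasticsearch's actual logging code.

```python
def truncate_log_message(message: str, max_length: int, marker: str = " [...]") -> str:
    """Keep the start of `message` and cut from the end once it exceeds
    `max_length` characters. The marker (a hypothetical choice) is appended
    only when something was actually removed."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if len(message) <= max_length:
        return message
    return message[:max_length] + marker


if __name__ == "__main__":
    # Hypothetical long task description; only its beginning is kept.
    message = "put-mapping [my_index/my_type] " + "x" * 200
    print(truncate_log_message(message, 40))
```

Cutting the tail rather than the head means that the operation name and index, which usually come first, survive even when a very large payload is attached.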

Changes in 5.1:

Queries against boolean fields that use values other than true, false, "true", or "false" are now deprecated.
Uncommitted mapping updates should not affect existing indices.
IndexAlreadyExistsException has been replaced with ResourceAlreadyExistsException.
Painless gains the Elvis ?: operator.
Functions in Painless now have access to reserved variables like doc, which allows Kibana to use (most) scripts both for scripted fields and in the script query.
The update to Tika v1.14 allows the ingest and mapper attachment plugins to handle documents larger than 100kB.
The match_phrase_prefix query didn't work on boosted fields.
Term queries are no longer cached by default as they are fast already, and queries with many terms can result in other filters never being cached.
The split ingest processor gains the ignore_missing parameter.
Parsing of the level parameter in index and node stats is now strict.
Handle DST shifts that happen one hour after midnight.
Alias filters should be parsed on the coordinating node to allow caching of filters which use now() and to ensure that all shards see the same filter.
Upgrade to Lucene 6.3.0.
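
The Elvis operator keeps its left operand when that operand is non-null and otherwise falls back to its right operand, so in Painless `a ?: b` behaves like `a != null ? a : b`. The Python function below only emulates those semantics for illustration; `elvis` is a hypothetical helper, not part of Painless or any Elasticsearch client. Unlike the real operator, a Python function evaluates both arguments eagerly instead of evaluating the right side only when needed.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def elvis(value: Optional[T], default: T) -> T:
    """Emulate Painless `value ?: default`: the left side is kept unless it
    is null (None here). Unlike Python's `value or default`, falsy but
    non-null values such as 0 or "" are kept."""
    return value if value is not None else default


if __name__ == "__main__":
    print(elvis(None, 10))  # null left side, so it falls back to 10
    print(elvis(0, 10))     # 0 is non-null, so it is kept
    print(0 or 10)          # Python's `or` would fall back to 10 here
```

The null check, rather than a truthiness check, is what makes the operator safe for numeric fields whose legitimate value may be 0.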

Changes in 6.0:

Removed netty3 in favour of netty4.
Removed store throttling, which Lucene has handled automatically since Elasticsearch 2.x.
Parsing of the metrics parameter in node stats is now strict.
Enabled 5.x -> 6.x backwards-compatibility (bwc) tests.
The sequence IDs branch has been merged into master to allow for easier development.
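
Strict parsing means an unrecognized value in a parameter such as metrics (or level, mentioned under 5.1) is rejected with an error instead of being silently ignored, so a typo no longer produces a quietly incomplete response. The sketch below contrasts strict and lenient handling of a comma-separated parameter; the function name and the allowed-metrics set are hypothetical and do not reproduce Elasticsearch's real list of node stats metrics.

```python
from typing import FrozenSet, List

# Hypothetical set of accepted values, for illustration only.
ALLOWED_METRICS: FrozenSet[str] = frozenset({"jvm", "os", "process", "fs"})


def parse_metrics(raw: str, allowed: FrozenSet[str], strict: bool = True) -> List[str]:
    """Split a comma-separated parameter and validate each entry.

    strict=True raises on any unrecognized value; strict=False models a
    lenient parser that silently drops unknown values.
    """
    requested = [part.strip() for part in raw.split(",") if part.strip()]
    unknown = [m for m in requested if m not in allowed]
    if unknown and strict:
        raise ValueError(f"unrecognized metrics: {unknown}")
    return [m for m in requested if m in allowed]


if __name__ == "__main__":
    print(parse_metrics("jvm,os", ALLOWED_METRICS))
    print(parse_metrics("jvm,jmv", ALLOWED_METRICS, strict=False))  # typo dropped silently
    try:
        parse_metrics("jvm,jmv", ALLOWED_METRICS)
    except ValueError as err:
        print(err)  # the typo is reported instead of ignored
```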

Ongoing:

The synonym graph token filter will provide correct handling of multi-word synonyms in phrase and contextual queries.
The cross_fields execution of the multi_match query doesn't handle synonyms correctly.
Allow terms aggregations to be partitioned so that more terms can be retrieved across multiple requests.
Lazy DNS resolution of unicast hosts would allow Elasticsearch to start before DNS entries are ready and to look up entries again when they change, but it has an impact on ping timeouts in Zen discovery.
Highlighting doesn't work on keyword fields that contain non-ASCII characters.
Writing UTF8 to StreamOutput can be done more efficiently using a local buffer.
The tribe node should be able to merge custom cluster state metadata.
The master node should be able to retry assigning a primary shard to a node that has the shard store locked during shard state fetching.
Binary field values should be accessible in scripts.
Add a size ingest processor to replace the _size metafield.
Slow application of cluster state changes can hold on to many old cluster states resulting in OOM. The whole cluster state is not required, and can be replaced with revision numbers to indicate which states have been applied.
The new unified highlighter offers more flexible highlighting, but doesn't work with queries that need access to an index reader.
Remove the deprecated groovy, javascript, and python scripting languages.
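
Partitioning a terms aggregation can work by assigning each term to one of N partitions with a deterministic hash and having each request collect only the terms of one partition; because every term maps to exactly one partition, N requests together see every term. A minimal Python sketch of that idea follows. The function names and the use of CRC-32 are assumptions for illustration, and the request syntax the finished feature will expose is not shown.

```python
import zlib
from typing import Dict, Iterable, List


def partition_of(term: str, num_partitions: int) -> int:
    """Assign a term to a partition using a stable hash of its UTF-8 bytes.

    crc32 is used only because it is deterministic across runs (Python's
    built-in hash() of str is salted per process); it is an assumption,
    not the hash Elasticsearch uses.
    """
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def terms_in_partition(terms: Iterable[str], partition: int, num_partitions: int) -> List[str]:
    """Return only the terms that belong to one partition, as one request would."""
    return [t for t in terms if partition_of(t, num_partitions) == partition]


if __name__ == "__main__":
    all_terms = [f"user_{i}" for i in range(10)]
    num_partitions = 3
    pages: Dict[int, List[str]] = {
        p: terms_in_partition(all_terms, p, num_partitions) for p in range(num_partitions)
    }
    for p, page in pages.items():
        print(p, page)
    # Every term lands in exactly one partition, so the requests together cover all terms.
    assert sorted(t for page in pages.values() for t in page) == sorted(all_terms)
```

Hashing keeps each request's memory bounded without any coordination between requests: no request needs to know which terms earlier requests returned.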

Apache Lucene

QueryBuilder should allow subclasses to override createFieldQuery, for example to support query parsers that properly handle multi-token synonyms
SynonymFilter, which has long-standing tricky bugs with multi-token synonyms, may soon be replaced by SynonymGraphFilter offering a path to fix those bugs
Index time sorting now supports sorting on multi-valued fields using selectors
The simplistic in-heap BKD index can be optimized to require less heap
Codec-level encryption continues iterating, this time getting some technical documentation
Lucene may soon have an implementation of a logistic regression classifier
Subclasses of the primary node in NRT segment based replication should have access to the IndexWriter
ASCIIFoldingFilterFactory.getMultiTermComponent no longer emits the original token even when preserveOriginal is true
The classic query parser no longer allows autoGeneratePhraseQueries when splitOnWhitespace is false
The changes-to-html generator should not rely on Jira being up
A possible ICU 58.1 upgrade is on hold because of unexplained ICU assertion test failures
A write-once attribute analysis chain has some very nice properties but is a massive change
Can we make it easier to create graph tokenizers and token filters?
Nested SpanNearQueries miss some hits today, but the fix is surprisingly tricky and only addresses some cases
Lucene should provide axiomatic similarities, six in all
The release smoke tester now fails if CHANGES.txt seems to be from the future
UnifiedHighlighter's Passage is now public, for better extensibility; its passage relevancy has improved and it should let you set the max passage character length
AnalyzingInfixSuggester now closes its IndexWriter by default after build
We were missing backwards compatibility coverage for sorted indices
The document-based suggester sometimes threw a NullPointerException on level 2 ghost fields
End-of-line characters were causing false failures in our precommit tests
Doc values queries now use the new iterator API directly
Lucene's RollingBuffer utility class was missing a getter for its internal buffer size
The new BooleanSimilarity caused problems for tests that expect similarity implementations to be sane
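
A multi-valued field has no single sort key per document, so a selector first reduces each document's values to one key (for example, the minimum or the maximum) and documents are then ordered by that key. Lucene's SortedNumericSelector offers MIN and MAX selection for numeric doc values; the Python below only models the idea and is not Lucene's API (`sort_docs_by_selector` and the sample data are hypothetical).

```python
from typing import Callable, Dict, List, Sequence

# A selector reduces a document's values to a single sort key.
Selector = Callable[[Sequence[int]], int]

SELECTORS: Dict[str, Selector] = {"MIN": min, "MAX": max}


def sort_docs_by_selector(
    docs: Dict[str, Sequence[int]], selector: str, reverse: bool = False
) -> List[str]:
    """Order document ids by the value the selector picks from each multi-valued field.

    Every document is assumed to have at least one value; ties keep insertion
    order because Python's sort is stable.
    """
    pick = SELECTORS[selector]
    return sorted(docs, key=lambda doc_id: pick(docs[doc_id]), reverse=reverse)


if __name__ == "__main__":
    prices = {"doc1": [30, 5], "doc2": [10, 20], "doc3": [15]}
    print(sort_docs_by_selector(prices, "MIN"))  # keys 5, 10, 15
    print(sort_docs_by_selector(prices, "MAX"))  # keys 30, 20, 15 sort as 15, 20, 30
```

The same documents come out in opposite orders depending on the selector, which is why the index sort has to name one explicitly.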

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
