August 29, 2016

This Week in Elasticsearch and Apache Lucene - 2016-08-29


Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Had an awesome time @elastic #meetup presenting zero-downtime re-indexing of #elasticsearch @signalfx - slide deck: https://t.co/AiTdCAoFNW
— Mahdi Ben Hamida (@mahdouch) August 25, 2016

Elasticsearch Core

Changes in 2.x:

Added ref counting to SearchContext to avoid unexpected AlreadyClosedExceptions, which in rare cases could lead to a SIGSEGV on mmap'ed directories.
Non-scoring term queries on the _all field were incorrectly considered equal when nested inside a bool query and the _all field had different per-field boosts.

Changes in master:

Lucene upgraded to 6.2.0.
Jackson upgraded to 2.8.1.
Painless is now the default scripting language. A bug when using break in for loops has also been fixed.
Realtime GET is now handled by triggering an automatic refresh instead of reading from the translog.
Fsync'ing documents is now performed asynchronously, which delivers a 15-30% speedup on slow disks.
If disaster strikes, AlreadyClosedExceptions should not be suppressed.
RAM usage estimation of the LiveVersion map was way too high.
GET requests no longer support the fields parameter: stored_fields returns values from stored fields only, while _source filtering reads values from the _source.
Shards should not be marked as stale just because a node has been shut down. They should only be marked as stale if there is a subsequent write.
Blank field names are no longer accepted.
The _version field should not be indexed.
The stored-fields fetch phase can be skipped entirely by setting stored_fields: _none_, which is especially useful when returning completion suggester results.
keyword fields are now indexed and stored as binary values to avoid an extra UTF8 conversion.
Numbers do not need to be parsed as strings if they are not included in _all.
Agg profiling did not support breadth_first mode correctly.
ShardRouting now includes RecoverySource which characterises the type of recovery that should be performed.
Ingest pipelines should not be invalidated on every cluster state update.
The cluster.routing.allocation.same_shard.host setting had not been migrated to the new settings infrastructure.
The script ingest processor should support params, like all other scripts.
Mapping settings numeric_detection, date_detection, and dynamic_date_formats were not dynamically updatable.
Cluster stats now report whether netty3 or netty4 is being used.
The client-benchmark-noop-api-plugin makes Elasticsearch do nothing, removing noise from client benchmarks.
Avoid initialising the logger prematurely.
Asynchronous methods in the REST client now have Async appended to their names to distinguish them from the synchronous methods.
The index_boost query was not being cached because of nondeterministic hash ordering.
The default cluster settings now accurately reflect which scripting engines are enabled.
Source filtering on docs with source disabled could trigger an NPE.
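As a rough illustration of the two fetch-related changes above (stored_fields versus _source filtering on GET, and stored_fields: _none_ on search), here is a minimal Python sketch that only builds request parameters and bodies; it does not talk to a cluster. It assumes 5.x-style request syntax. The helper function names, the completion field name suggest, and the suggestion name title-suggest are hypothetical placeholders chosen for this example.

```python
import json


def get_document_params(stored_fields=None, source_includes=None, fetch_source=True):
    """Hypothetical helper: query-string parameters for a GET-document request."""
    params = {}
    if stored_fields:
        # stored_fields returns values from stored fields only
        # (the fields must be mapped with store: true to return anything).
        params["stored_fields"] = ",".join(stored_fields)
    if not fetch_source:
        # Turn off _source entirely.
        params["_source"] = "false"
    elif source_includes:
        # _source filtering reads values out of the original JSON _source.
        params["_source"] = ",".join(source_includes)
    return params


def completion_only_search_body(prefix, field="suggest", name="title-suggest"):
    """Hypothetical helper: a search body that only wants completion suggestions."""
    return {
        # Skip the stored-fields fetch phase entirely.
        "stored_fields": "_none_",
        "suggest": {
            name: {
                "prefix": prefix,
                # "suggest" is an assumed field mapped with type "completion".
                "completion": {"field": field},
            }
        },
    }


if __name__ == "__main__":
    print(get_document_params(stored_fields=["tags", "counter"]))
    print(json.dumps(completion_only_search_body("nir"), indent=2))
```

The GET parameters would be appended to a document path such as /index/type/id, and the search body sent to a _search endpoint; keeping them as plain dictionaries makes it easy to see which option controls which fetch path.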

Ongoing changes:

Date range queries should be generated in a way that they can be cached efficiently.
It should be possible to update string mappings on 2.x indices using 5.x syntax.
An action is being added to update-aliases so that deleting an index and adding an index alias happen as a single atomic step.
The Log4j2 PR will be landing soon.
Ingest processors should support ignore_missing as well as ignore_failure.
RankEval requests gain XContent support with roundtrip testing.
Should the query cache keep track of a longer history?
Ingest will gain a JSON processor.
How should dots in field names be supported by ingest?
Should feature usage stats have its own API or be included in node stats?
Geo-points in 5.0 will be backed by LatLon fields, which are much faster.
Macrobenchmarks in Rally are being integrated with CI. Next up, Cloud benchmarking.

Apache Lucene

Lucene 6.2.0 is released
Some dead yet scary code deep inside IndexWriter is now gone
BooleanQuery now optimizes better when a sub-query occurs more than once
The release script helper that polls mirrors was rewritten from Perl to a better programming language, also starting with P, that has batteries included
A few more fun geo3D corner case failures are fixed
More dead code is gone
Tests should use CannedTokenStream instead of their own classes
A rare test bug caused a failure because Lucene now refuses to overwrite a file
InetAddressPoint's javadocs for newSetQuery were confusing
Legacy numeric fields continue to disappear
Sun's JDK bugs became Oracle's
The security test policy needs to allow reading of the line docs file many tests use as a realistic documents source
ToParentBlockJoinCollector should be removed
A new regular expression engine using Memory Occurrence Automata may lead to better regular expression queries, but we are still struggling to understand in which cases

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
