July 26, 2016

This Week in Elasticsearch and Apache Lucene - 2016-07-26

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Wrote a post about combining #elasticsearch RestClient with #jtwig templates for creating and executing queries: https://t.co/LO8fCj7lqk
— Jettro Coenradie (@jettroCoenradie) July 25, 2016

Elasticsearch Core

Changes in 2.x:

S3 repository now supports path style access for virtual hosting of buckets.
S3 and EC2 should allow for different AWS key pairs.
not_analyzed string fields should reject position_increment_gap.

Changes in master:

The new scaled_float field type allows values such as percentages to benefit from the compression techniques used for long fields.
The result cache can be explicitly enabled per-request even when it returns search hits.
Reindex throttling is now disabled with -1 instead of unlimited.
Automatically created indices should honour index.mapper.dynamic.
The analyze API now supports defining custom character filters, token filters, and tokenizers inline.
Indexing into a relocating primary while replicas are recovering will no longer result in document loss.
Elasticsearch should reject dynamic templates with unknown match_mapping_type.
The Java REST client now supports async and blocking requests, and a benchmark compares transport client to HTTP.
Blocking tasks should not be run on the Scheduler thread.
Resetting a recovery respects reference counting and locking, closes streams and removes all files.
The cardinality aggregation now has a fixed default precision which is easier for the user to understand.
The request circuit breaker now takes aggregation buckets into account.
Automatically generated node names now persist after the node restarts.
Scripts used in ingest pipelines preserve the original exception for easier debugging.
Triggering on_failure should halt any further ingest processing.
Analyzer aliases now work correctly, but will be removed for 5.0.
CORS default settings for allow-methods and allow-headers were not being used.
A Netty4 module is available, but depends on as yet unreleased bug fixes in Netty.
Nested queries can be used inside nested aggregations.
Static methods on Store class need to be shard lock aware to avoid race conditions.
All aggs have been moved to use NamedWritable instead of AggregationStreams.
Plugins registering queries should use the SearchPlugin interface.
Mappings introduced a _parent#null field when parent/child was not used.
TCP transports should map their internal exceptions to those defined by the TCPTransport class.
Index, update, and create REST requests return a LOCATION header.
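
To illustrate the scaled_float item above: the idea is to store a decimal value as a long by multiplying it by a fixed scaling factor and rounding. The following is a minimal sketch of that idea only, not Elasticsearch's implementation; the function names and the scaling factor of 100 are assumptions chosen for illustration.

```python
def encode_scaled_float(value: float, scaling_factor: float) -> int:
    """Scale a decimal value and round it to the nearest integer (hypothetical helper)."""
    return round(value * scaling_factor)


def decode_scaled_float(encoded: int, scaling_factor: float) -> float:
    """Recover the (possibly rounded) decimal value from its integer form."""
    return encoded / scaling_factor


# Percentages with two decimal places fit exactly with a factor of 100.
percentages = [12.34, 99.5, 0.07]
encoded = [encode_scaled_float(p, 100) for p in percentages]
decoded = [decode_scaled_float(e, 100) for e in encoded]
```

Small integers like these compress far better than arbitrary floating point bit patterns, at the cost of losing any precision beyond what the scaling factor keeps.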

Ongoing:

The search relevance framework gains a REST interface and support for reciprocal rank and discounted cumulative gain.
A command line tool will let you discard a corrupt transaction log, losing the data it contains, in order to recover the data already in the index.
Histograms and date histograms are being split so that the former can bucket on decimal values.
Elasticsearch should be able to listen to virtual network interfaces.
The write_consistency setting will be replaced with wait_for_active_shards which better describes the intent.
The new completion suggester should return matching documents.
Similarities should be dynamically updatable.
Reindex from remote should support reading from clusters that require authentication.
Rally should decouple job scheduling from execution, which will allow support for multiple load generators.
Configuring network partitions in tests should be easier.
Shard copies should only be marked as stale after an acknowledged write.
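
For readers unfamiliar with the two relevance metrics mentioned in the first item above, here is a small sketch of how reciprocal rank and discounted cumulative gain are commonly computed. It is not the code of the search relevance framework: the function names are hypothetical, and the DCG variant shown uses the plain gain divided by log2(position + 1), which is one of several formulations in use.

```python
import math


def reciprocal_rank(relevant_flags):
    """Return 1 / position of the first relevant hit, or 0.0 if none is relevant."""
    for position, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


def discounted_cumulative_gain(gains):
    """Sum each hit's gain, discounted by log2 of its 1-based position plus one."""
    return sum(
        gain / math.log2(position + 1)
        for position, gain in enumerate(gains, start=1)
    )


# Third hit is the first relevant one.
rr = reciprocal_rank([False, False, True])
# Graded relevance judgments for the top three hits.
dcg = discounted_cumulative_gain([3, 2, 0])
```

Both metrics reward placing relevant documents near the top: reciprocal rank looks only at the first relevant hit, while DCG accumulates graded relevance over the whole result list.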

Apache Lucene

CustomAnalyzer was accidentally switched to use the wrong default attribute factory, but Uwe fixed it
A new DoubleRangeField will index a multi-dimensional range, such as a day range in a calendar, using dimensional points, which RangeFieldQuery can then search for overlapping ranges; IntRangeField, FloatRangeField and LongRangeField are coming next
Cached TermQuery no longer seeks the terms dictionary
Indexing performance tests on the 1.2 billion document New York City taxi rides corpus uncovered performance problems when writing dimensional points to disk using a large indexing buffer
MemoryIndexReader.fields accidentally became 5X slower recently
Dimensional points were failing to enforce maximum per-dimension byte count correctly
The flexible query parser will also be fixed to not pre-split on whitespace, letting the analyzer do that instead
DecimalDigitFilter has problems with digits that are encoded as Unicode supplementary (non-BMP) characters
Ant 1.9.6 somehow breaks Lucene's generated file formats javadocs link
Now that coord is gone, BooleanQuery can be optimized to flatten any nested disjunctions, often created by rewritten queries
Some small javadocs improvements were made to LeafFieldComparator
The obsoleted ScoringWrapperSpans is gone
SpanScorer had poor assert messages
Lucene's builds now run to completion even if some tests fail
JGit is upgraded to version 4.4.1 in Lucene's build scripts
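
The BooleanQuery item above describes flattening nested disjunctions. As a rough sketch of the idea, and not Lucene's actual rewrite code, the example below models a query as either a term string or an `(operator, clauses)` tuple and inlines any OR node found directly inside another OR node. The representation and the function name are assumptions made for illustration, and the sketch ignores boosts and minimum-should-match settings.

```python
def flatten_disjunction(query):
    """Inline OR nodes nested directly inside OR nodes (hypothetical helper)."""
    if isinstance(query, str):
        # A bare term has nothing to flatten.
        return query
    operator, clauses = query
    flat = []
    for clause in clauses:
        clause = flatten_disjunction(clause)
        nested_or = not isinstance(clause, str) and clause[0] == "OR"
        if operator == "OR" and nested_or:
            # Hoist the nested disjunction's clauses into the parent.
            flat.extend(clause[1])
        else:
            flat.append(clause)
    return (operator, flat)


nested = ("OR", ["a", ("OR", ["b", ("OR", ["c", "d"])])])
flattened = flatten_disjunction(nested)
```

A disjunction nested under a conjunction is left alone, since hoisting its clauses would change which documents match.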

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!
