July 11, 2016

This Week in Elasticsearch and Apache Lucene - 2016-07-11

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Just wrote a new blogpost about the new #elasticsearch java RestClient. The post shows how to use the client. https://t.co/5Sf6topxgS
— Jettro Coenradie (@jettroCoenradie) July 8, 2016

Elasticsearch Core

Changes in 2.x:

The query DSL in the percolator should respect the _id field in the same way as in search.
Stale routing data can lead to CRUD requests bouncing between nodes during primary relocation.
Netty3 has been upgraded to 3.10.6.Final.

Changes in master:

Elasticsearch can now reindex from a remote cluster (including clusters of a different major version).
REST responses include "Warning:" headers when a request has used deprecated functionality.
fields has been renamed to stored_fields and will only retrieve values from fields marked as stored. docvalue_fields should be used to retrieve doc values.
All nodes now have persistent IDs which survive restarts, making it easier to track the same node in monitoring tools.
The field-stats API was missing the datatype of each field.
The change to inline reroutes when a node joins the cluster has been reapplied to master with fixes.
Elasticsearch no longer catches Throwable, which can hide important exceptions like OOM and stack overflows. Instead, unrecoverable errors should cause a node to die with dignity.
Log messages for batched cluster state updates were difficult to interpret.
The _size field now has support for doc-values.
The percolator can extract terms from queries used inside function_score. Range queries which use now cannot be used with the percolator.
The Java REST client now provides shorter performRequest() methods for when the body or query string is not required. It also defaults to using http for sniffing, and logs correct URLs even when the path is missing a leading slash.
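
To illustrate the stored_fields / docvalue_fields change listed above, here is a minimal sketch of what a search request body might look like after the rename. The field names `title` and `price` are hypothetical, and the sketch only builds and prints the JSON; it does not talk to a cluster.

```python
import json

# Hypothetical field names, used only for illustration.
search_body = {
    "query": {"match_all": {}},
    # Formerly "fields": only returns values for fields marked as stored.
    "stored_fields": ["title"],
    # Use this instead when the values should be read from doc values.
    "docvalue_fields": ["price"],
}

# Serialize the body as it would be sent to a search endpoint.
request_json = json.dumps(search_body, indent=2)
print(request_json)
```

A request that still uses the old `fields` key would need to be updated to one of these two keys, depending on where the values are expected to come from.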

Ongoing changes:

Sequence IDs should allow fast replica recovery from the translog instead of having to always copy segments.
Work continues on moving aggregations to use NamedWritable instead of their custom AggregationStreams.
Networking code is being refactored to make it possible to move Netty3 to a module, and to add a Netty4 module.
The eradication of Guice from the codebase continues.
CRUD exceptions on the primary need to be replicated for proper sequence ID accounting.
Similarities should be dynamically updatable.
The cardinality agg should use a static precision as it is less confusing to users.
The profile API will also count the number of method calls.
Scaled floats (backed by a long field) will make storing percentages more efficient.
Data loss can occur when indexing during primary relocation while there are ongoing replica recoveries.
Cockroach actions are persistent long-running tasks that can survive cluster restart.
The allocation explain API will also report disk usage.
Script exceptions should not be swallowed by the ingest API.
The query evaluation framework will gain a REST API and other evaluation metrics such as reciprocal rank and discounted cumulative gain.
The FVH should be able to extract terms from nested queries.
The Java REST client will be benchmarked against the transport client.
The REST client needs an async API.
The analysis API is being expanded to accept custom tokenizers, token filters and character filters.
The create-index API should wait for sufficient shards to be assigned before returning.
The elasticsearch-tool command-line tool will allow truncating corrupt transaction logs to recover data that has already been indexed.
Node-left and node-failed messages should be batch processed.
Rally will provide a static benchmark page with annotations, as annotations are not yet supported in Kibana.
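
The scaled floats item above can be pictured with a small sketch: a floating-point value is multiplied by a fixed factor and stored as an integer. The function names, the `scaling_factor` parameter and the round-to-nearest rule are assumptions made for illustration, not the final Elasticsearch implementation.

```python
def encode_scaled(value: float, scaling_factor: float) -> int:
    """Store a float as an integer by scaling and rounding (assumed rule)."""
    return round(value * scaling_factor)


def decode_scaled(stored: int, scaling_factor: float) -> float:
    """Recover the approximate float from the stored integer."""
    return stored / scaling_factor


# A percentage-like value kept with two decimal places of precision.
stored = encode_scaled(0.4237, 100)   # 42
restored = decode_scaled(stored, 100)  # 0.42
print(stored, restored)
```

The trade-off is explicit: precision beyond the scaling factor is discarded, and in exchange the stored integers are small and compress well.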

Apache Lucene

We are removing the archaic queryNorm and coord factors from Lucene's scoring, starting with BooleanQuery, and next with Weight.normalize, since modern scoring models handle term saturation better
Dimensional points get their first index format change, to better compress the on-disk docIDs, with additional improvements to values storage coming soon, giving a 9.7% disk size reduction on our OpenStreetMaps benchmark
The benchmarks module's TestQualityRun uses BM25 scoring
A user thread leads to improved documentation about the maximum length of binary doc values fields
MemoryIndex.toString breaks if payloads are used
IndexWriter.getMaxCompletedSequenceNumber was incorrect after IndexWriter.deleteAll
The new Ukrainian lemmatizer's dictionary contains unique token + lemma pairs
At long last, Lucene's classic query parser no longer pre-splits on whitespace, leaving that decision to the analyzer instead
Our ant build.xml files had redundant failonerror=true
ScandinavianFoldingFilterFactory and ScandinavianNormalizationFilterFactory implement MultiTermAwareComponent for better multi-term query support
Explanation implements equals and hashCode
Explanation.toHtml is gone
Use Collection.isEmpty instead of Collection.size() == 0
Geo3D throws IllegalArgumentException if the points for path segment intersections are ambiguous
DecimalDigitFilter has problems with digits that use Unicode's non-BMP supplemental characters
The new releasedJirasRegex.py helps automate some parts of the lengthy release process
DelegatingWeight delegates all methods to another Weight
Can we improve the default behavior of query parsers and multi-term queries?
Lucene's default MMapDirectory can lead to JVM crashes if the user incorrectly closes an IndexReader that is still in use by another thread
BooleanScorer has high overhead for tiny segments
How should FieldInfo and FieldInfos implement toString?
Adding a single method to Terms to return all terms as a string is too dangerous
Is it Levenstein or Levenshtein?
Something is invoking toString on a Field object unexpectedly
Should RAMDirectory be able to copy from any other Directory on init?
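
One way to see why the query parser change above matters is a toy contrast between pre-splitting on whitespace and handing the whole text to the analyzer. Everything here is hypothetical (the synonym table, the `analyze` function and both parse functions); it is a sketch of the idea, not Lucene's API or implementation.

```python
from typing import List

# Toy multi-word synonym table; the entry is hypothetical.
SYNONYMS = {("wi", "fi"): "wifi"}


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase, tokenize, and merge known two-word synonyms."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def parse_presplit(query: str) -> List[str]:
    """Parser splits on whitespace first, so the analyzer sees one word at a time."""
    return [tok for chunk in query.split() for tok in analyze(chunk)]


def parse_whole(query: str) -> List[str]:
    """Parser hands the whole run of text to the analyzer."""
    return analyze(query)


print(parse_presplit("Wi Fi router"))  # ['wi', 'fi', 'router']
print(parse_whole("Wi Fi router"))     # ['wifi', 'router']
```

When the parser splits first, the analyzer never sees "wi" and "fi" together, so the multi-word synonym cannot match; leaving the decision to the analyzer lets it apply.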

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!
