May 23, 2016

This Week in Elasticsearch and Apache Lucene - 2016-05-23

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Unwittingly using deprecated features in #elasticsearch? Maybe not for much longer

— Chris Earle (@pickypg) May 21, 2016

Elasticsearch Core

Changes in 2.x:

New child types can now be added to existing parent types.
Elasticsearch no longer returns the decoded path with an error.

Changes in master:

The profiler has been refactored to extend profiling beyond just queries.
Improvements to the Painless scripting language keep flowing.
Translog checkpointing and fsyncing have been moved outside the index writer's global lock, allowing for more concurrency.
The ingest node gained a Sort processor.
Added Google Cloud Storage snapshot/restore repository plugin.
The reindex API learned to back off and retry when it encounters search failures, and the default batch size has been increased to 1000.
Delete-by-query is back in core, reimplemented using the reindex infrastructure.
File system I/O statistics are available again on Linux.
Added logging for small but frequent garbage collections, which indicate memory pressure.
Command line settings no longer use the `es.` prefix.
The percolator cache has been removed, saving a large amount of heap. Performance is still better than before thanks to query indexing. Added support for MatchNoDocs query in the percolator.
Terms and significant terms aggs now support string include/exclude for IP addresses and dates.
Registered missing `indices.query.bool.max_clause_count` setting.
Fuzzy, regexp, prefix, and wildcard queries are only supported on text/keyword fields.
Elasticsearch heap size defaults to 256MB min and 2GB max.
Failing shard allocations now give up after 5 attempts, instead of looping forever.
The Debian package no longer tries to create the data dir, as this is already handled by Elasticsearch.
Filter the -server flag from the JVM options file when installing a service on Windows.
Compilation on Java 9 works again.
Fixed time unit rounding for hour, minutes, seconds to cope with DST.
Named queries are now significantly faster than before when used with expensive queries.
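
To make the reindex item above more concrete, here is a minimal sketch of the general "back off and retry" pattern. It is not the reindex implementation: the function name `retry_with_backoff`, the retry count, the delays, and the `flaky_search` stand-in are all assumptions made for illustration.

```python
import time


def retry_with_backoff(operation, max_retries=5, initial_delay=0.5, sleep=time.sleep):
    """Call operation(); on failure wait, double the delay, and try again.

    All parameter names and defaults are hypothetical, chosen for this sketch.
    """
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except RuntimeError:
            if attempt == max_retries:
                raise  # out of retries, surface the failure
            sleep(delay)
            delay *= 2  # exponential backoff


# Hypothetical search call that is rejected twice before succeeding.
calls = {"count": 0}


def flaky_search():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("search rejected")
    return ["hit-1", "hit-2"]


# Record the waits instead of sleeping so the example runs instantly.
waits = []
result = retry_with_backoff(flaky_search, sleep=waits.append)
```

Doubling the delay gives an overloaded cluster progressively more room to recover between attempts, while the retry cap keeps a persistent failure from looping forever.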

Ongoing:

Persistent node IDs will be replaced by node names, which will have to be unique across the cluster.
Creating an index shouldn't turn the cluster red.
Task management should persist the status of long running tasks after they have finished.
Added dedicated masters to the tests - once stable, will add replication tests that use IndexShards without relying on nodes.
Highlighting doesn't play well with GeoPointInBBoxQuery.
Version lookups during indexing are expensive. Is there a low-risk optimisation that could skip the ID lookup if Lucene does the check instead of Elasticsearch?
Use of deprecated features should inform the clients so they can warn.
Adding profile support to aggregations.
Scroll requests can be split for parallel processing.
Multi-shard indices can be merged into a single shard.
Rally will be getting a logging data set, probably from the World Cup, and Elasticsearch will get infrastructure for microbenchmarking.
The Azure repository isn't removing deleted files.
Refactoring of IndicesClusterStateService continues, to make cluster state updates more testable.
Decreasing the delayed allocation timeout can lead to longer delays. Refactoring delayed shard allocation logic to make it simpler to test and maintain.
The cluster allocation explain API will report when allocation is still waiting for the shard state.
Snapshot UUIDs will enable more robust handling of snapshot deletion and partially failed deletes.
The ingest compound processor should be internal only and should report the wrapped processor in error messages.
epoch_millis should support the full range of Longs.
We should use Java's Base64 library instead of our own version.
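
As a rough illustration of the idea behind splitting a scroll for parallel processing, the sketch below partitions document IDs into disjoint slices that independent workers could consume. The hashing scheme, the helper names `slice_of` and `split_into_slices`, and the slice count are assumptions; the digest does not say how Elasticsearch will actually split scrolls.

```python
import zlib


def slice_of(doc_id: str, num_slices: int) -> int:
    """Map a document ID to a slice number using a stable hash (assumed scheme)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_slices


def split_into_slices(doc_ids, num_slices):
    """Group document IDs so that each ID lands in exactly one slice."""
    slices = [[] for _ in range(num_slices)]
    for doc_id in doc_ids:
        slices[slice_of(doc_id, num_slices)].append(doc_id)
    return slices


# Hypothetical document IDs; each slice could be scrolled by its own worker.
doc_ids = [f"doc-{i}" for i in range(100)]
slices = split_into_slices(doc_ids, 4)
```

Because the slice is derived only from the ID, workers need no coordination: together the slices cover every document exactly once.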

Apache Lucene

A 6.0.1 bugfix release is coming soon
Highlighter and geo point queries do not mix
A test bug in MoreLikeThisTest is tricky to fix
An extremely thin slice down from the north pole confuses Geo3d
Geo3d gets a doc-values field to enable sort-by-distance, using the unique "distance from a shape's boundary"
Geo3d now uses Double.POSITIVE_INFINITY instead of Double.MAX_VALUE when a point is outside of the shape
DateRangePrefixTree lets you control the calendar template
Every time we prepare for a release (this time 6.0.1) we find and fix fun bugs in our release scripts
The equals and hashCode methods will become abstract in the Query base class
Our build scripts should use -release instead of -source and -target to ensure full Java 8 binary compatibility even when compiling with Java 9
Lucene will soon support half-float points, using 2 bytes to represent a floating point number
ToParentBlockJoinQuery's explain is lame today and should instead include the explanation from its children
Codec-level encryption continues iterating
Our randomized tests found a failure in heatmap facets
We see a nice performance gain by poaching Solr's ExpandingIntArray for collecting hits
A new lemmatizer appears for Ukrainian
QueryParser should let you sometimes use unescaped internal operator characters
Lucene's highlighter is confused when you try to highlight SynonymQuery
Our Brazilian analyzer has a bug in its stop words file
SlowCompositeReaderWrapper should move out of Lucene's sources
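
For a feel of what the half-float item above implies, the sketch below packs a number into 2 bytes using the IEEE 754 half-precision format available in Python's `struct` module. This only illustrates the size and precision trade-off; how Lucene encodes its half-float points is not described here, and the helper names are hypothetical.

```python
import struct


def to_half_bytes(value: float) -> bytes:
    """Encode a float as 2 big-endian bytes in IEEE 754 half precision."""
    return struct.pack(">e", value)


def from_half_bytes(data: bytes) -> float:
    """Decode 2 bytes of IEEE 754 half precision back into a Python float."""
    return struct.unpack(">e", data)[0]


encoded = to_half_bytes(1.5)        # 1.5 is exactly representable
roundtrip = from_half_bytes(encoded)

# 0.1 is not exactly representable, so the round trip loses precision.
lossy = from_half_bytes(to_half_bytes(0.1))
```

The format spends a quarter of the space of a double, at the cost of roughly three significant decimal digits of precision.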

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
