April 18, 2016

This Week in Elasticsearch and Apache Lucene - 2016-04-18

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

#elasticsearch 5.0 will use the new Lucene 6 points API to index numeric, date and ip fields

— elastic (@elastic) April 15, 2016

Elasticsearch Core

Changes in 2.x:

Fixed a CORS bug in pre-flight requests.
Selecting concrete indices to restore could result in incorrect selection.
Fixed a bug which prevented reallocating shards unless the cluster was green.
Added a circuit breaker to limit the total size of in-flight requests.
The extended stats aggregation could return incorrect results in the presence of missing buckets.

Changes in master:

Upgrade to Lucene 6.0.0 and switch numeric, date and IP fields across to the new point format, including exposing point field stats in the segments API. Elasticsearch now supports IPv6!
JVM options now have their own config file.
The query cache can be disabled with a per-index setting.
Added ignore_unmapped option to parent-child, nested, and geo queries to allow multi-index queries to work on indices which don't have the appropriate mappings. This allowed deprecating the indices query.
The reindex API supports disabled _source gracefully.
Field stats now treats floats and doubles as the same field type.
Bootstrap checks are now triggered when the node is bound to any host other than localhost, and they output all failures at once. Also added bootstrap checks for correctly configured heap size and the max map count for virtual memory.
Several refactorings of mappings to make the code cleaner and easier to follow.
Percolator queries now support position_increment_gap.
EC2 discovery is now tested.
Improved analysis of wildcards and stacked tokens in query strings.
Shard-level bulk action tasks now track their parent tasks correctly.
Tidy-ups to the IndicesService class in preparation for adding deleted index tombstones.
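To illustrate the ignore_unmapped item above, here is a minimal sketch of a search body for a nested query with the option set. The path and field names (comments, comments.author) are hypothetical, and the helper function is only for illustration; the option name itself comes from the change described above.

```python
import json

def nested_query(path, inner_query, ignore_unmapped=True):
    """Build a search body for a nested query.

    With ignore_unmapped enabled, an index that lacks the nested mapping
    is expected to match no documents instead of failing the request.
    """
    return {
        "query": {
            "nested": {
                "path": path,                      # hypothetical nested field
                "query": inner_query,
                "ignore_unmapped": ignore_unmapped,
            }
        }
    }

# Hypothetical field names, for illustration only.
body = nested_query("comments", {"match": {"comments.author": "kimchy"}})
print(json.dumps(body, indent=2))
```

A body like this can be sent to several indices at once, including ones that do not map the nested field.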

Ongoing:

Many PRs to remove PROTOTYPE and use aggregation registry.
Adding tombstones to the cluster state for deleted indices.
The allocation explain API will display shard store info when appropriate.
It should be possible to disable strict JSON quoting for backwards compatibility during 5.x.
The task manager should be able to persist the results of long running tasks like reindex.
EmptyQueryBuilder is being removed.
The Java HTTP client now supports sniffing; connection pooling is in progress.
The .percolator type will be replaced by a percolator field.
Indexed scripts will change to stored scripts and live in the cluster state, instead of in an index. Should there be a soft-limit on how many scripts can be stored?
The DSL for inner hits is being cleaned up, and the top-level inner hits DSL is being removed.
Aggregation names can overwrite other keys like doc_count. Is this a problem?

Apache Lucene

It is nearly impossible to come to agreement on how best to name Lucene's numerous new geospatial search implementations, thus demonstrating yet again the Law of Triviality
Here's a nice 3D visualization of how Lucene's new dimensional points slice up the surface of the earth for fast searching
Lucene's geo benchmark, which tests 61 M points exported from OpenStreetMap, gets new features, including testing distance sort, filters and nearest neighbor performance, an option to pre-build queries, and reporting overall M-hits/sec in addition to QPS
Geo3D continues to move at a fast pace:
- Performance improvements when searching for large polygons
- "Wacko" random polygons cause tricky test failures
- Testing for shape intersections with a polygon is no longer N^2 in the number of polygon vertices
- Tests caught a garden variety attempt to create an illegal shape
- Finding an interior point was a big bottleneck for constructing geo3d polygon queries
- Geo3d tests have the most sophisticated BKD forensics of all our tests, so you can see precisely why a given doc did or did not match
- Convex polygons were being mis-classified as concave resulting in way too many hits in our geo benchmark
- Geo3D needs support for sort-by-distance and nearest neighbor as well
- Tiling the incoming polygon is a costly part of geo3d's polygon query
Geo2D is moving fast as well:
- A nice simple point-in-polygon algorithm from the 70s gives a speedup to LatLonPoint's polygon query
- LatLonPoint gets a fast nearest neighbor search, thanks to the efficient BKD tree, to find the N nearest indexed points to a provided query point
- Speed up LatLonPoint.newDistanceQuery by working with haversinSortKey instead of the full haversin distance
- A spooky test failure turned out to be an innocent test bug
- LatLonPoint's distance query has become so fast that we decided to remove two-phase support, since its overhead is not worth its savings
- The grid we use to speed up LatLonPoint's polygon queries struggled with wee tiny polygons
- Our new, more evil random lat/lon generation uncovered a tricky test failure by creating a "rectangle" that was in fact a line!
- The EarthDebugger gets some improvements such as stating which location you want the earth to rotate to on load, control over the rectangle colors, and some performance improvements
- Better random latitude/longitude generation continues to uncover issues
- We have moved common geo encoding APIs to core so they can be shared across implementations
- A new encoding for GeoPointField will be consistent with LatLonPoint, and use all 64 available bits to minimize quantization error
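The haversinSortKey item above relies on the fact that the inner haversine term grows monotonically with distance, so points can be compared or filtered without computing the final square root and arcsine for each one. The following is a rough sketch of that idea, not Lucene's implementation; the function names, the Earth radius constant and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters

def haversin_sort_key(lat1, lon1, lat2, lon2):
    """Return the haversine term h in [0, 1]; it increases with distance."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    return (math.sin(dphi / 2) ** 2
            + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)

def haversin_meters(sort_key):
    """Convert a sort key to meters (the expensive step: sqrt and asin)."""
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(sort_key)))

query = (51.5074, -0.1278)  # London
points = {
    "paris": (48.8566, 2.3522),
    "berlin": (52.5200, 13.4050),
    "amsterdam": (52.3676, 4.9041),
}

keys = {name: haversin_sort_key(*query, *p) for name, p in points.items()}

# Sorting by key gives the same order as sorting by full distance.
by_key = sorted(points, key=keys.get)
by_distance = sorted(points, key=lambda n: haversin_meters(keys[n]))

# A radius filter converts the radius to a key once, then compares keys only.
radius_key = math.sin(500_000 / (2 * EARTH_RADIUS_M)) ** 2  # 500 km
within = [n for n in by_key if keys[n] <= radius_key]
print(by_key, within)
```

The full distance only needs to be computed for the hits that are actually returned.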
The XML query parser is conflicted about whether lower and upper bounds on ranges are optional
XMLQueryParser's tests now let subclasses pick the analyzer
The legacy spatial module gets faster by switching from FixedBitSet to DocIdSetBuilder, matching how the three new geo implementations work
The DataSplitter in Lucene's classification module should pay attention to classes when splitting
Remove a wrong copy-paste comment in our replicator module
The details included in exception messages are important
NRTCachingDirectory had a sneaky concurrency bug
Our release scripts still struggle with the switch from Subversion to git
Randomized tests uncovered an extremely rare failure when two randomly generated doubles were exactly 2 ulps apart
Highlighting fails to find terms inside the child query of a BlockJoinQuery
Yes, Lucene 7.0 will in fact support all 6.x indices
Math.toRadians is changing its results slightly between Java 1.8 and 1.9, so we are trying to avoid relying on it
Moving ValueSource and FunctionValues to Lucene's core can break inter-module dependencies
ToParentBlockJoinQuery's explain is lame today and should instead include the explanation from its children
Query parsers are confusing when a clause has only stop words
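For the randomized-test item above about two doubles exactly 2 ulps apart, this small sketch shows one way to count how many representable doubles separate two finite values. It is an illustration only and is not taken from Lucene's tests; the helper names are made up.

```python
import struct

def ordered_bits(x):
    """Map a finite double to an integer that increases with its value."""
    # Reinterpret the IEEE 754 bit pattern as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # Negative doubles are stored sign-magnitude, so mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulps_apart(a, b):
    """Number of representable steps between two finite doubles."""
    return abs(ordered_bits(a) - ordered_bits(b))

x = 1.0
y = 1.0 + 2.0 ** -51  # the spacing of doubles at 1.0 is 2**-52, so this is two steps up
print(ulps_apart(x, y))  # 2
```

Values this close are hard to hit by chance, which is why such a failure shows up only rarely in randomized runs.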

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
