May 2, 2016

This Week in Elasticsearch and Apache Lucene - 2016-05-02

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

We “unlocked” some indexing performance in #elasticsearch github.com/elastic/elasticsearch/pull/18060 Coming to 2.4.0 and 5.0.0!

— Jason Tedor (@jasontedor) April 30, 2016

Elasticsearch Core

Changes in 2.x:

Switching from a sliced lock to a keyed lock when preventing concurrent updates to the same document results in a 15-20% throughput improvement on small metric-based documents.
The restore API now supports `_all`.
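
To illustrate the difference between the two locking schemes, here is a minimal Python sketch. It is not the Elasticsearch implementation (which is Java); the class names `SlicedLock` and `KeyedLock` and their `acquire` methods are made up for this example. A sliced lock hashes each document ID onto a fixed pool of locks, so unrelated documents can end up contending for the same slice. A keyed lock keeps one lock per ID currently in use and drops it once nobody holds it.

```python
import threading
from contextlib import contextmanager


class SlicedLock:
    """Fixed pool of locks: different keys may hash to the same slice and contend."""

    def __init__(self, slices=16):
        self._locks = [threading.Lock() for _ in range(slices)]

    @contextmanager
    def acquire(self, key):
        with self._locks[hash(key) % len(self._locks)]:
            yield


class KeyedLock:
    """One lock per key in use: only operations on the same key wait for each other."""

    def __init__(self):
        self._guard = threading.Lock()
        self._entries = {}  # key -> [lock, number of holders and waiters]

    @contextmanager
    def acquire(self, key):
        # Register interest in the key under the guard, then wait on the key's own lock.
        with self._guard:
            entry = self._entries.get(key)
            if entry is None:
                entry = self._entries[key] = [threading.Lock(), 0]
            entry[1] += 1
        entry[0].acquire()
        try:
            yield
        finally:
            entry[0].release()
            with self._guard:
                entry[1] -= 1
                if entry[1] == 0:
                    # Nobody holds or awaits this key any more, so forget it.
                    del self._entries[key]
```

With `KeyedLock`, two threads updating `doc-1` and `doc-2` never block each other, whereas with `SlicedLock` they do whenever both IDs land in the same slice.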

Changes in master:

Deleted indices leave tombstones in the cluster state to prevent them coming back to life when disconnected nodes rejoin.
The field stats API now accepts wildcards for field matching, and returns whether a field is searchable and/or aggregatable.
Completion suggester fields from 2.x will still be usable in 5.x.
The generic thread pool is now bounded, with a max pool size of 128. The queue size remains unbounded.
The cluster allocation explain API now includes info from the shard stores API about why an existing shard copy may or may not be used for recovery.
The `set` ingest processor gained the `override: true|false` parameter allowing it to set a default value only if none already exists.
Ingest gained a `date_index_name` processor which understands date-math patterns.
The top-level inner-hits query syntax has been removed, and bugs in the inline inner-hits syntax have been fixed.
A new MatchNoDocsQuery means that queries that do not match anything will now explain why.
ConstructingObjectParser is like ObjectParser, but supports constructing objects whose constructor arguments are mixed in with other arguments.
Queries on fields marked as index:false now fail.
The analyze API now accepts `filter` and `char_filter`, just like the analyzer settings.
The exists() check for settings now handles multi-value settings correctly.
Azure discovery now has integration tests.
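
As a rough illustration of the new `override` parameter on the `set` processor, the Python sketch below mimics the behaviour described above on a plain dictionary. The function `apply_set` and the field name `env` are hypothetical, and treating a missing or null field as "none already exists" is an assumption of this sketch rather than a statement about the ingest code.

```python
def apply_set(doc, field, value, override=True):
    """Return a copy of doc with field set to value.

    With override=False the value only acts as a default: it is written
    when the field is missing or None and left alone otherwise.
    """
    result = dict(doc)
    if override or result.get(field) is None:
        result[field] = value
    return result


# A processor definition of the shape described above, written as a Python dict.
processor = {"set": {"field": "env", "value": "unknown", "override": False}}

# Existing values survive when override is false; missing ones get the default.
kept = apply_set({"env": "prod"}, **processor["set"])
defaulted = apply_set({"host": "web-1"}, **processor["set"])
```

Here `kept` still has `env` set to `prod`, while `defaulted` gains `env: unknown`.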

Ongoing:

After adding IPv6 support, some work is required to add all features back to ip fields.
Indexing requests will soon be able to wait until their changes are visible.
Rally next steps: adding full text search benchmark, enable benchmarking with plugins.
HTTP compression will be enabled by default, as long as the client requests it.
Nodes will soon have persistent IDs which survive node restarts.
Snapshot restore is getting a Google Cloud repository.
Upgrade AWS SDK and add cloud.aws.s3.throttle_retries setting to avoid socket timeout exceptions when restoring large shards.
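
The "as long as the client requests it" part of the HTTP compression item is ordinary HTTP content negotiation. The sketch below shows the general idea in Python, not how Elasticsearch implements it: the server only compresses when the request's `Accept-Encoding` header lists `gzip` with a non-zero quality value. The function names `accepts_gzip` and `maybe_compress` are made up for this example.

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding header value lists gzip with a quality above zero."""
    for part in accept_encoding.split(","):
        name, _, params = part.strip().partition(";")
        if name.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # unparseable weight: play safe and do not compress
        return quality > 0
    return False


def maybe_compress(body, accept_encoding):
    """Return (payload, extra_headers), compressing only when the client asked for it."""
    if accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}
```

A client that sends no `Accept-Encoding` header (an empty string here) receives the body unchanged.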

Apache Lucene

Lucene's geo benchmark, which tests a 61M-point subset of OpenStreetMap data for shape filtering, distance sorting, and nearest-neighbor search, is now running in Lucene's nightly benchmarks, documenting the impressive gains (for Lucene 6.1.0) of the past few weeks
There's a sudden interest in optimizing how filters collect hits, since this is a hotspot in point queries, leading to reducing per-hit conditionals, using fewer passes in LSBRadixSorter, exploring different growth factors for the array holding all hits so far, giving geo2D points queries their own optimized filter builder, MatchingPoints, and avoiding re-computing the cardinality of a filter's bitset when possible
Query cache improvements: we should automatically warm new segments based on recently cached queries, reduce lock contention, remove a now-unused parameter, and re-use whole bitsets that the filter, such as PointRangeQuery, has already built; MemoryIndex should never even consider caching queries
Should doc values optimize for the sparse use case?
Index-time sorting should be better supported in Lucene's core
Stats for dimensional points fields failed if one or more segments did not index points for the specified field
InetAddressPoint exposes nextUp/nextDown APIs to make it easier to work with exclusive bounds
InetAddressPoint.newPrefixQuery was broken if prefixLength was not a whole number of octets
Geo3d improvements:
- Optimize large polygon searches
- Polygon hole intersections were broken, and we now detect if a hole is (illegally) outside its supposedly containing polygon
- Real-world "dirty" polygons were causing problems
- Geo3D should also offer distance sorting
- Some fun test failures somehow involve Lagrange multipliers
Geo2D (LatLonPoint and GeoPointField) improvements, including major gains for polygon searches:
- Separate doc values from points, so that users can separately choose which function (sorting and/or shape filtering) they need
- Instead of incrementing a counter for every hit, use the grow API
- Points queries get their own optimized filter builder, MatchingPoints
- Use faster orientation methods for polygon relations
- Use a balanced interval tree for faster polygon relations, enabling us to remove LatLonGrid
- Separate latitude/longitude quantization from Morton encoding
- Better random latitude/longitude generation continues to uncover issues
FunctionQuery.explain was not reporting its boost correctly
Our release scripts should use cherry-pick to merge downstream changes
Exotic patterns can cause PatternReplaceCharFilter to work too hard
A spooky span queries test failure remains unexplained
ToParentBlockJoinQuery's explain is lame today and should instead include the explanation from its children
The classic highlighter hits an exception if you use NGramAnalyzer and try to highlight a PhraseQuery
A bug in the latest JDK 9 early access 110 build that broke analyzers-common tests has been fixed
Java 9 and our randomized testing library get angry about leftover static fields in tests
The XML query parser is conflicted about whether lower and upper bounds on ranges are optional
A spooky exception in MoreLikeThisQuery tests was just a test bug
A new LSH (locality sensitive hashing) TokenFilter and query is an alternative to the standard MoreLikeThisQuery
The flexible query parser does not clone its nodes correctly
Building massive boolean queries is slow
XMLQueryParser makes it hard for subclasses to create span queries
Paging is tricky if you use index-time sorting
Improve XMLQueryParser tests
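
The two InetAddressPoint items above (exclusive bounds via nextUp/nextDown, and prefix queries whose length is not a whole number of octets) come down to simple address arithmetic. The sketch below shows that arithmetic with Python's standard `ipaddress` module. It is not Lucene's Java API; the helper names `prefix_range`, `next_up` and `next_down` are invented here to mirror the ideas.

```python
import ipaddress


def prefix_range(address, prefix_length):
    """Inclusive (lowest, highest) addresses sharing the first prefix_length bits.

    The prefix length may be any bit count, not only a multiple of 8.
    """
    network = ipaddress.ip_network(f"{address}/{prefix_length}", strict=False)
    return network.network_address, network.broadcast_address


def next_up(address):
    """The address immediately above the given one."""
    return ipaddress.ip_address(address) + 1


def next_down(address):
    """The address immediately below the given one."""
    return ipaddress.ip_address(address) - 1


# A /20 prefix cuts through the middle of the third octet.
low, high = prefix_range("192.168.1.77", 20)

# An exclusive range (low, high) becomes the inclusive range [next_up(low), next_down(high)].
inclusive_low, inclusive_high = next_up(low), next_down(high)
```

For the `/20` example, `low` is `192.168.0.0` and `high` is `192.168.15.255`, so the equivalent inclusive bounds for an exclusive query are `192.168.0.1` and `192.168.15.254`.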

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
