November 30, 2015

This Week in Elasticsearch and Apache Lucene - 2015-11-30

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

From the Found vault: Understanding the Memory Pressure Indicator https://t.co/rQiXobeeDr #Elasticsearch
— elastic (@elastic) November 27, 2015

Elasticsearch Core

Changes in 2.1:

Users were unable to upgrade indices using field datatypes that have been moved to plugins (_size, murmur3, attachment) as the mapping upgrade checks happened before the plugins registered their datatypes.
All shards logged a nasty (but innocent) warning about a missing translog file when starting 2.1.0.
Similarly, writing to already closed translogs was causing confusing log messages which should have been handled more gracefully.
We were not logging the root cause when failing to change a shard's indexing buffer.
When using field constraints with the field stats API, the response should exclude indices that do not contain the field.
Added sanity checks that the Lucene version that an Elasticsearch Version object declares is consistent with the indices that we use to test backward compatibility.

Changes in 2.x:

The mapper attachments plugin will be available again in Elasticsearch 2.2.
The field stats API should return both the string and numeric values of date fields for consistent sorting.
Computing stats for the completion suggester, which often appeared in hot threads output, is now more efficient.
The delayed allocation time for unassigned shards should be updated on every reroute calculation.
Locking down of JVM permissions continues, now preventing the JVM from spawning new processes on BSD/OSX and on Windows.
Most of the BWC tests have been reenabled, bar one for the FunctionScore query which still depends on Groovy.
Upgraded Lucene 5.4 to a snapshot which includes a fix for the off-by-one issue with sparse doc value fields.

Changes in master:

All query parsers now use ParseField, making it easier to support future deprecations.
Allocation IDs are now persisted in the shard state metadata, and the current allocation IDs will soon be persisted in the index metadata. This will make it possible to choose the most recent shard copy when recovering an index.
Full exception objects are now logged in many places instead of just the output of getMessage.
More tests have been refactored to not rely on Groovy.
The following aggregations have been refactored: geobounds, scripted metric, cardinality, filter, missing, nested and reverse_nested, children, cumulative sum, and geo-centroid, leaving 19 still to do.
Most GeoShape builders are now Writable, with an open PR for the remaining builders.
CIDR expressions were parsed too leniently.
Many, many PRs to improve the Gradle integration.
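
To illustrate the CIDR item above, here is a minimal Python sketch of what strict CIDR parsing can look like. It is not the Elasticsearch implementation, and the exact inputs that Elasticsearch used to accept are not listed here. The helper name `parse_cidr_strict` is hypothetical, and the sketch relies on the standard library `ipaddress` module.

```python
import ipaddress


def parse_cidr_strict(expr: str) -> ipaddress.IPv4Network:
    """Parse an IPv4 CIDR expression, rejecting loosely formed input.

    Hypothetical helper for illustration only.
    """
    address, sep, prefix = expr.partition("/")
    # Require an explicit numeric prefix length such as "/24".
    if not sep or not prefix.isdigit():
        raise ValueError(f"expected <address>/<prefix length>, got {expr!r}")
    # strict=True rejects networks with host bits set, e.g. 192.168.1.5/24.
    # Malformed addresses and out-of-range prefixes also raise ValueError.
    return ipaddress.IPv4Network(expr, strict=True)


if __name__ == "__main__":
    for candidate in ["192.168.1.0/24", "192.168.1.5/24", "192.168.1/24", "192.168.1.0/33"]:
        try:
            print(candidate, "->", parse_cidr_strict(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

A lenient parser might silently mask off host bits or pad missing octets; the strict variant reports those inputs as errors instead.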

Ongoing:

Work has started on the reindex and task manager APIs.
The query profiler is being updated after review.
Splitting the `string` field type into `text` and `keyword` fields.
Adding a mechanism for batching up cluster state updates, and using that in the tribe node, and when processing shard started and shard failed events.
Make the BulkProcessor back off and retry when the bulk queue is full.
Add detailed response support for the _analyze API.
The `fields` option should only load stored fields, not from _source. This also allows it to support wildcards.
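
The BulkProcessor item above describes backing off and retrying when the bulk queue is full. The general pattern can be sketched in Python as follows. This is an illustration of exponential backoff under assumed parameters, not the BulkProcessor code: `QueueFullError`, `backoff_delays`, and `send_with_backoff` are hypothetical names, and the delay values are arbitrary.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class QueueFullError(Exception):
    """Hypothetical stand-in for a 'bulk queue is full' rejection."""


def backoff_delays(initial: float, factor: float, max_retries: int) -> Iterator[float]:
    """Yield max_retries delays, each one `factor` times the previous."""
    delay = initial
    for _ in range(max_retries):
        yield delay
        delay *= factor


def send_with_backoff(
    send: Callable[[], T],
    initial: float = 0.05,
    factor: float = 2.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on QueueFullError wait and retry, up to max_retries times."""
    for delay in backoff_delays(initial, factor, max_retries):
        try:
            return send()
        except QueueFullError:
            sleep(delay)  # give the queue time to drain before retrying
    # Final attempt: if the queue is still full, let the error propagate.
    return send()
```

Passing `sleep` as a parameter keeps the sketch easy to test, since a test can record the requested delays instead of actually waiting.
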

Node ingest:

Individual items in bulk requests can now fail without failing the whole bulk request.
The mutate processor has been split out into a processor per function.
Added a meta processor for meta fields like _index. Turns out meta fields are reserved anyway, and so can be handled by standard processors instead.
Ingest should be able to access and modify list items.
Much discussion on how to handle processors that fail.

Apache Lucene

The 5.4.0 release branch is cut!
Don't use null to represent sorting by relevance!
GeoPointDistanceQuery matches the wrong documents when it has to cross the dateline
FacetsConfig.getDimConfig need not be synchronized since it's using a ConcurrentHashMap under-the-hood
1D dimensional values are now faster than numeric fields for indexing and searching, and smaller in both index size on disk and heap used at search time
If you search on a massive shape, such that it manages to wrap around and span the entire earth, we should rewrite that to just match all documents with the field
The Python script to work around the slow Apache git mirror keeps getting better
Sparsely populated doc values on a large segment tickled an off-by-one bug, discovered by Elasticsearch nightly benchmarks
PhraseQuery does, in fact, allow more than one term at the same position, but it's interpreted differently (conjunction) than MultiPhraseQuery (disjunction)
We can once again sync directory metadata on Java 9, but the OpenJDK issue is still open
IOUtils.fsync should not retry on hitting IOException since that means a great disturbance has occurred
Factor out components of the XML query parser so consumers can inherit from classes without requiring sandbox module code
Optimize stored fields retrieval to avoid skipping over the last field in the document if it's not needed, although this will only help if the last field is more than 16 KB at default settings
A Coverity scan of Lucene found some minor things to fix
JapaneseTokenizer should offer more than two possible tokenizations
Equals methods are tricky
Should we remove Scorer.getChildren?
Allow XMLQueryParser's TestParser to be extended so Solr test cases can subclass
The new sandbox geo APIs struggle with large errors when the shapes are very large
JoinUtil should support joins on numeric doc values fields
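
The PhraseQuery item above contrasts conjunction and disjunction when several terms share one position. The toy Python model below illustrates that difference only; it does not use or reproduce Lucene's code. The `Doc` type and the `matches` function are hypothetical, and the model assumes exact position matching with no slop.

```python
from typing import Dict, List, Set

# A toy "document": position -> set of terms indexed at that position
# (for example, a word and its synonym sharing one position).
Doc = Dict[int, Set[str]]


def matches(doc: Doc, phrase: List[Set[str]], require_all: bool) -> bool:
    """Return True if the phrase matches at some start position.

    phrase[i] is the set of query terms placed at relative position i.
    require_all=True  -> every term in the set must occur there (conjunction).
    require_all=False -> at least one term must occur there (disjunction).
    """

    def slot_ok(terms_at_pos: Set[str], wanted: Set[str]) -> bool:
        if require_all:
            return wanted <= terms_at_pos
        return bool(wanted & terms_at_pos)

    return any(
        all(slot_ok(doc.get(start + i, set()), wanted) for i, wanted in enumerate(phrase))
        for start in doc
    )


if __name__ == "__main__":
    doc = {0: {"quick"}, 1: {"fox"}}
    phrase = [{"quick", "fast"}, {"fox"}]  # two terms at the first position
    print(matches(doc, phrase, require_all=True))   # conjunction
    print(matches(doc, phrase, require_all=False))  # disjunction
```

In this model, a document containing only "quick fox" fails the conjunctive interpretation of the two-term first position but satisfies the disjunctive one, which mirrors the distinction the item draws between the two query types.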

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
