This Week in Elasticsearch and Apache Lucene - 2016-08-22
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
“Less Code, More Nodes, More Features”
Application Scaling with Elasticsearch @ StockTwits | Elastic - https://t.co/MZCrn4OMtF
— Kraut Klíck (@QIMP3G) August 12, 2016
Elasticsearch Core
Changes in 2.x:
- It should be possible to update the include_in_all setting on existing object fields.
- geohash options on geo-point fields are deprecated, as is the optimize_bbox parameter to the geo-point distance query.
- Jackson has been upgraded to v2.8.1.
- Failing to allocate a primary shard 5 times should prevent further automated allocation attempts.
- The default min and max heap sizes are now set to 2GB, which means we can remove this from the bootstrap checks.
- The minimum_master_nodes setting has also been removed from bootstrap checks as it only checked that it had been set, not that it had been set correctly.
- Bootstrap check exceptions no longer print stack traces, which were just obscuring the message of the exception.
- Index names may no longer start with special characters such as -, as these are used in index wildcard matching.
- Index creation requests must use POST, and a type-exists request has changed from
- Reindex should work with the transport client.
- The snapshot-status API now supports
- String fields with position_increment_gap were not being upgraded to
- Plugins should be able to upgrade custom cluster state metadata on startup.
- The routing changes API makes it easier for a node to determine which shard allocation changes have taken place.
- LockObtainFailedException has been renamed to ShardLockObtainFailedException because it is an in-memory lock that has nothing to do with IO.
- Painless will be the new default script language in 5.0
- A big codebase cleanup is under way to reduce the number of packages that we have, and to remove the dependency on Guice.
- SearchContext should use ref counting to prevent accessing an already closed index.
- Response filtering will support exclusions like
- Shards should only be marked as stale when there is a non-replicated write, not when the node shuts down.
- The ingest node should be able to handle dots in field names.
- A post-search hook will allow logging search requests once per request instead of once per shard.
- Should only keyword fields be included in the _all field by default?
- _none_ would skip the stored-fields phase entirely, meaning meta-fields like _source etc. would not be returned.
- The release process for Lucene 6.2.0 will begin shortly
- The surprisingly massive indexing performance drop (annotation AU), unexpectedly caused by an otherwise great change, was due to a pre-existing performance bug in Lucene only uncovered after much hunting
- Lucene's legacy (postings based) numeric implementation has moved to the backwards-codecs module and will soon be removed entirely for 7.0
- A new Lucene test case tests that you can simultaneously close SearcherManager while it's also refreshing, and open a new IndexWriter is closing, while also searching, hopefully without risking SIGSEGV
- Lucene now tries harder in its best effort check to detect when MMapDirectory is being used after being closed, since that can cause a SIGSEGV which terminates the JVM, but its stressful test case will still provoke SIGSEGV, so it has been disabled
- LongRangeField lets you index a range and search by ranges overlapping the indexed ranges
- Lucene tests had gotten too slow recently, especially
- We don't need an exemption in Lucene's tests security policy for loading the Wikipedia test documents
- The flaky MoreLikeThisTest that keeps failing has finally been muzzled
- Another tricky corner case geo3d test failure emerges
- Stemming is tricky and it's hard to make changes without a formal analysis of the impact
- When MultiPhraseQuery has only one clause, the classic highlighter will hit an
- BooleanQuery can optimize rewrite in a few cases
- The APIs to track external data structures along with Lucene's
- Nested span queries somehow broke between 4.10.x and today
- Making delete-by-query work with doc-values queries is horribly complex and it may make more sense to remove doc-values queries instead, though some people disagree
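As a rough illustration of the range-field item in the list above (indexing a range and searching by ranges that overlap it), the overlap test itself can be sketched in a few lines of Python. This is a minimal sketch, not Lucene's Java API: the names overlaps and search and the sample document ids are hypothetical, ranges are assumed to be closed on both ends, and the linear scan merely stands in for whatever index structure Lucene actually uses.

```python
def overlaps(indexed, query):
    """Return True if two closed ranges (lo, hi) share at least one value."""
    lo_a, hi_a = indexed
    lo_b, hi_b = query
    # Two closed ranges overlap exactly when each one starts
    # no later than the other one ends.
    return lo_a <= hi_b and lo_b <= hi_a


def search(indexed_ranges, query):
    """Linear scan: ids of all indexed ranges that overlap the query range."""
    return [doc_id for doc_id, rng in indexed_ranges.items()
            if overlaps(rng, query)]


# Hypothetical documents, each indexed with one (min, max) range.
docs = {"a": (0, 10), "b": (20, 30), "c": (10, 20)}
hits = search(docs, (5, 12))
print(hits)
```

Here the query range (5, 12) overlaps documents a and c but not b, which starts after the query ends.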
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!