This Week in Elasticsearch and Apache Lucene - 2018-03-26
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Highlights
- We will be bumping compilation requirements to JDK 10 for the 6.x and master branches.
- The Persistent Task framework will allow persistent tasks to be disabled when a cluster is formed. This is needed to make sure the cluster stays dormant while you prepare to restore it from a snapshot. Without this, features like machine learning start creating indices, which prevents the restore.
- Boaz continues the effort to simplify the logic of opening the Engine (here and here). The first attempt was very successful but went a step too far. The second attempt is less clean but protects us from some nasty lock-related exceptions.
Changes in 5.6:
- Call addBytesSent with correct number of bytes #29194
Changes in 6.2:
- Harden periodically check to avoid endless flush loop #29125
- X-Pack:
Changes in 6.3:
- REST: Split `RestUpgradeAction` into two actions #29124
- Change BroadcastResponse from ToXContentFragment to ToXContentObject #28878
- Add Z value support to geo_point and geo_shape #25738
- REST high-level client: add force merge API #28896
- Propagate mapping.single_type setting on shrinked index #29202
- RankEvalRequest should implement IndicesRequest #29188
- Add new setting to disable persistent tasks allocations #29137
- REST high-level client: add clear cache API #28866
- Plugins: Fix module name conflict check for meta plugins #29146
- X-Pack:
- Enable security in packaging tests
- SQL: Introduce CSV and TSV tabular output
- Disable security for trial licenses by default
- Watcher: Hide credentials/secret data of integrations in toXContent
- Do not allow registering basic licenses
- Improve license expiration log line
- Provide clearer errors if SAML is not licensed
- X-Pack-Security: Making setup-passwords work with protected keystores
- Remove date from rest resp for non-exp licenses
- Watcher comparisons don’t deal with NaN correctly
- Check cluster health before setup-passwords
- Add beats_system user to security
Changes in 7.0:
- BREAKING: Remove deprecated options for query_string #29203
- BREAKING: Reject updates to the `_default_` mapping. #29165
Lucene 7.3
Fixing the existing shingle filter to support synonyms proved challenging, so Alan investigated building a new shingle filter with a more restrictive set of options (single size, never output unigrams). It targets only index-time synonyms, not search-time synonyms, which are more complex since they may span multiple tokens. Testing found a couple of unexpected issues, but nothing that couldn't be fixed, and this filter is now merged.
This is important to Elasticsearch as it means that we will be able to optionally index shingles into a sub field in the future and then run phrase queries against this field rather than the original field, which will return the same matches with better performance.
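To make the "single size, never output unigrams" restriction concrete, here is a minimal sketch in plain Java. It is not the Lucene filter itself and does not handle synonyms or token positions; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is NOT the Lucene shingle filter.
public class FixedShingleSketch {

    // Emits every run of exactly `size` adjacent tokens, joined by a space.
    // No unigrams and no other shingle sizes are produced.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // prints [please divide, divide this, this sentence]
        System.out.println(shingles(tokens, 2));
    }
}
```

With two-word shingles indexed into a sub field, a two-word phrase such as "divide this" corresponds to a single term in that field, which is the intuition behind the expected performance gain.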
- MoreLikeThis.setMaxDocFreqPct can overflow ints since it does `pct * numDocs / 100`. But this raised another question about whether it should actually use numDocs or maxDoc.
- TestICUTokenizerCJK has a reproducible test failure.
- GeoPolygonFactory still shows issues with coplanar points (Fixed in 6.7, 7.4, 8.0 by Karl)
- LevenshteinAutomata can precompute its number of states and transitions to preallocate the right amount of memory (Fixed in 7.4, 8.0 by Karl Wright and Christian Ziech).
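The integer overflow in the `pct * numDocs / 100` expression from the MoreLikeThis item above can be reproduced with a small standalone sketch. The class and method names below are hypothetical, and widening to `long` is just one possible remedy, not necessarily the change made in Lucene.

```java
// Hypothetical demo class; not Lucene code.
public class MaxDocFreqPctDemo {

    // The expression quoted in the text: the int product can wrap around
    // before the division is applied.
    static int naive(int pct, int numDocs) {
        return pct * numDocs / 100;
    }

    // Widening one operand to long keeps the intermediate product exact.
    // The cast back to int is safe when pct is at most 100.
    static int widened(int pct, int numDocs) {
        return (int) ((long) pct * numDocs / 100);
    }

    public static void main(String[] args) {
        int pct = 30;
        int numDocs = 100_000_000;
        System.out.println("naive:   " + naive(pct, numDocs));   // prints naive:   -12949672
        System.out.println("widened: " + widened(pct, numDocs)); // prints widened: 30000000
    }
}
```

With 100 million documents and a 30% threshold, the int product exceeds `Integer.MAX_VALUE` and wraps to a negative number, so the naive form yields a nonsensical negative limit.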
Discuss
- Can we refactor query caching to allow for trading throughput for latency by caching asynchronously?
- Can we improve cross-field scoring with something like BM25F?