This Week in Elasticsearch and Apache Lucene - 2017-04-18
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Changes in 5.4:
- Search hits and aggregations are now reduced in batches to reduce memory usage on the coordinating node. This allows us to remove the 1,000-shard soft limit.
- The _remote/info API provides information about remote clusters configured for cross cluster search.
- Remote cluster names can now use wildcards, including * to match all configured clusters.
- Fixed the handling of array settings when default settings are also present.
- Remove support for default settings except for path.logs, which will be removed in a later version.
- The path.data environment variable should not be set unless explicitly set by the user.
- A node which detects remnants of the default.path.data bug will refuse to start.
- Warn when not enough master eligible nodes are present.
- Load S3 plugin static settings eagerly so that the secure settings keystore can be closed.
- The _field_stats API has been deprecated in favour of
- There was a race condition when recovering replicas at the same time as relocating the primary.
- Fixed a memory leak when using inner hits inside a nested query by replacing
- Duplicate command line settings are no longer allowed.
- The JNA library is now built by Elastic with native libs for all of the platforms we support.
- Warning headers are now parsed by hand instead of with a slow regex.
- Reject empty document IDs.
- The context suggester now accepts numeric and boolean contexts, not just strings.
- Closing a ReleasableBytesStreamOutput now releases the underlying BigArray so that these streams can use
- The secure settings keystore can now store files, needed for the GCS repository.
- Older nodes should be able to parse TaskInfo from newer nodes, ignoring any new elements.
- The build will fail if code tries to log before logging is configured.
- Hidden files in the plugin directory are no longer ignored.
- Shadow replicas have been removed.
- It is no longer possible to specify custom
- Sufficient translog generations are now preserved to ensure that shards can recover from their local checkpoint.
- Sequence numbers are now used instead of version numbers to identify out-of-order indexing/delete operations during replication and recovery.
- Removed code to support old Lucene versions that didn't write checksums.
- The Java high level REST client is in the final (and longest) stage: learning to parse aggregation responses.
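The batched reduction mentioned in the first item of the list above can be illustrated with a small sketch. This is not the Elasticsearch implementation: the class name BatchedReducer, the batch_size parameter and the use of simple term counts as "shard results" are all assumptions made for illustration. The idea shown is only that the coordinating node merges buffered shard results as soon as a batch fills up, so it never holds more than one batch in memory regardless of the number of shards.

```python
from collections import Counter
from typing import List, Tuple


def reduce_partials(partials: List[Counter]) -> Counter:
    """Merge several partial term-count results into a single result."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # adds counts key by key
    return merged


class BatchedReducer:
    """Hypothetical coordinating-node buffer that reduces in batches."""

    def __init__(self, batch_size: int) -> None:
        if batch_size < 2:
            raise ValueError("batch_size must be at least 2")
        self.batch_size = batch_size
        self.buffer: List[Counter] = []
        self.peak_buffered = 0  # largest number of results held at once

    def consume(self, shard_result: Counter) -> None:
        self.buffer.append(shard_result)
        self.peak_buffered = max(self.peak_buffered, len(self.buffer))
        if len(self.buffer) >= self.batch_size:
            # Collapse the full batch into one partial result and keep
            # only that, freeing the individual shard results.
            self.buffer = [reduce_partials(self.buffer)]

    def finish(self) -> Counter:
        """Final reduce over whatever is still buffered."""
        return reduce_partials(self.buffer)


def simulate(num_shards: int, batch_size: int) -> Tuple[Counter, int]:
    """Feed made-up per-shard results through the reducer."""
    reducer = BatchedReducer(batch_size)
    for shard in range(num_shards):
        reducer.consume(Counter({"error": 1, f"shard-{shard % 3}": 2}))
    return reducer.finish(), reducer.peak_buffered


if __name__ == "__main__":
    result, peak = simulate(num_shards=2000, batch_size=512)
    print(result["error"], peak)
```

Because merging counts is associative, the batched result is the same as reducing all shard results at once; only the peak number of buffered results changes.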
The 6.5.1 release is delayed
A Solr bug is delaying 6.5.1, which triggered a discussion about whether we should still get 6.5.1 out and work on getting 6.5.2 released shortly afterwards, or whether we should wait for the bug to be fixed before building a new release candidate for 6.5.1. The weak consensus seems to be to wait.
Elasticsearch master is now on Lucene 7
Elasticsearch master has been upgraded to a Lucene 7 snapshot so that we can start verifying what impact it has for us, especially in terms of disk footprint and performance given changes around sparse norms and doc values. The nightly benchmarks should pick up this change as of tomorrow.
- TermInSetQuery should expose a way to know which field it runs on.
- HeatMapFacetCounter should skip segments with no values.
- The KNN classifier and More-like-this are moving to BM25 rather than TF-IDF.
- Could we only run precommit on files that need it?
- We should check that close listeners are not registered on closed readers.
- OfflineSorter should not consume exhausted iterators.
- Can we make RAMDirectory faster by removing unnecessary synchronization?
- An issue about exposing how much memory BKDWriter may use quickly turned into making offline sorting faster.
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!