This Week in Elasticsearch and Apache Lucene - 2018-03-19
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Robustness improvements in recovery and replication
Nhat has improved the concurrency layer of Elasticsearch so that it now features a tighter hand-off between recovery and replication. This makes the replica recovery process more efficient as well as more straightforward to reason about.
Triple-quoting Painless scripts in docs
JVM Crash Logs
Jason contributed an improvement to ensure that JVM Crash Logs will now be dumped to the default logging directory. Previously, this log file could have ended up in an unexpected location, hindering supportability.
Archive Unknown Settings
An issue plaguing some users is that invalid or unknown settings present in the cluster state can prevent all updates to cluster settings -- including removing the unknown setting. While we archive these unknown settings during startup today, this situation can arise during a rolling upgrade from 5.x to 6.x. The previous workaround for this issue was to shut down all master nodes together, followed by a restart. As of 6.3.0, any unknown or invalid settings will be automatically archived on an update to cluster settings. Thanks to Jason for this fix.
Changes in 5.6:
- Do not renew sync-id if all shards are sealed #29103
- Avoid class cast exception from index writer #28989
- Make PKI BootstrapCheck work with SecureSettings #3993
Changes in 6.2:
- Archive unknown or invalid settings on updates #28888
- REST api specs : remove unsupported wait_for_merge param #28959
- Watcher: Fix TransformInput toXContent serialization #4061
Changes in 6.3:
- Configure error file for archive packages #29129
- Configure heap dump path for archive packages #29130
- Client: Wrap synchronous exceptions #28919
- Align thread pool info to thread pool configuration #29123
- Fix EsAbortPolicy to conform to API #29075
- Store offsets in index prefix fields when stored in the parent field #29067
- Fix #29057 CWD to ES_HOME does not change drive #29086
- Allow overriding JVM options in Windows service #29044
- Fix Parsing Bug with Update By Query for Stored Scripts #29039
- CLI: Close subcommands in MultiCommand #28954
- Validate regular expressions in dynamic templates. #29013
- Untangle Engine Constructor logic #28245
- Improve error message when installing an offline plugin #28298
- Cleanup exception handling in IOUtils #29069
- Add ingest-attachment support for per document indexed_chars limit #28977
- Protect against NPE in RestNodesAction #29059
- Enforce that java.io.tmpdir exists on startup #28217
- Correct class to name string conversion #28997
- Do not swallow fail to convert exceptions #29043
- Add total hits to the search slow log #29034
- REST : deprecate field_data for Clear Indices Cache API #28943
- Put JVM crash logs in the default log directory #29028
- Log template creation and deletion #29027
- Remove interning from prefix logger #29031
- Reduce heap-memory usage of ingest-geoip plugin #28963
- Stop sourcing scripts during installation/removal #28918
- BREAKING: Create keystore on package install #28928
- Add check when trying to reroute a shard to a non-data discovery node #28886
- Disallow logger methods with Object parameter #28969
- Restore tiebreaker for cross fields query #28935
- Update audit trail filter policy settings #3984
- Audit Logging add realm name alongside principal in authz events #3260
- Watcher: Ensure usage stats work properly in distributed environment #4094
- Fix expiration millis for start_basic #4124
- SQL: Ban PrintWriter#println in CLI #4118
- Add type parameter to start_trial api #4102
- Add api to start basic license #4101
- SQL: Be more careful with break and eof #4092
- Add api to start basic license #4083
Changes in 7.0:
- BREAKING: REST : Clear Indices Cache API remove deprecated url params #29068
- BREAKING: Main response should not have status 503 when okay #29045
We delayed the cut of the first release candidate due to some last-minute bugs. They all look fixed now, so hopefully Alan, our release manager for 7.3, will be able to build a release candidate later this week.
IndexWriter doesn't swallow exceptions anymore
In some cases, IndexWriter could swallow exceptions. This is bad as it might hide some issues from users of the API. Exceptions are now always re-thrown or added as suppressed exceptions to the original exception, and the aborting logic has been simplified. In the context of Elasticsearch, this is especially important in order to make sure that we can die with dignity in case of error. We will need to review lower levels as well, such as Directory and codecs, to make sure that they do not swallow exceptions either.
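The general pattern of re-throwing the first failure and attaching later ones as suppressed exceptions can be sketched with the standard `Throwable.addSuppressed` method. This is a minimal illustration under stated assumptions, not Lucene's actual code: the class `SuppressedExceptionsSketch` and the helper `closeAll` are hypothetical names introduced here.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not taken from Lucene's IndexWriter.
public class SuppressedExceptionsSketch {

    /**
     * Closes every resource. The first failure is re-thrown once all resources
     * have been attempted; later failures are attached to it as suppressed
     * exceptions, so no error is silently dropped.
     */
    static void closeAll(List<? extends Closeable> resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            try {
                resource.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;               // remember the original failure
                } else {
                    first.addSuppressed(e);  // keep follow-up failures visible
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) {
        Closeable ok = () -> {};
        Closeable failsA = () -> { throw new IOException("flush failed"); };
        Closeable failsB = () -> { throw new IOException("rollback failed"); };
        try {
            closeAll(Arrays.asList(failsA, ok, failsB));
        } catch (IOException e) {
            System.out.println("primary: " + e.getMessage());
            for (Throwable suppressed : e.getSuppressed()) {
                System.out.println("suppressed: " + suppressed.getMessage());
            }
        }
    }
}
```

The caller sees the original failure as the thrown exception and can still inspect every follow-up failure through `getSuppressed()`, which is what makes it safe to fail hard on the first error.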
Simon gave IndexWriter the ability to perform soft updates by atomically updating doc-value fields on an existing document and introducing a new document with the same id. This would enable users to implement a soft-delete mechanism by using a doc-value field to mark documents that have been "soft-deleted". This superseded an earlier idea to allow merges to keep deletes around. Compared to regular deletes, soft deletes have the benefit that documents may be un-deleted. They might also be useful in the context of versioning, since you need to remember the version of a document after it has been deleted in order to reject updates to the same id with lower versions.
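To make the idea concrete, here is a toy in-memory model of soft updates and soft deletes. It is only a sketch under stated assumptions and does not use Lucene's API: the class `SoftDeleteSketch`, its `Doc` type, and all method names are hypothetical, and the boolean `softDeleted` flag merely stands in for a doc-value marker field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; it does not use or mirror Lucene's API.
public class SoftDeleteSketch {

    /** Toy document; softDeleted stands in for a doc-value marker field. */
    static final class Doc {
        final String id;
        final long version;
        final String body;
        boolean softDeleted;

        Doc(String id, long version, String body) {
            this.id = id;
            this.version = version;
            this.body = body;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    /** Flags existing docs with this id and adds the new doc as one step. */
    synchronized void softUpdate(Doc newDoc) {
        softDelete(newDoc.id);
        docs.add(newDoc);
    }

    /** Flags documents instead of removing them, so they can be restored. */
    synchronized void softDelete(String id) {
        for (Doc d : docs) {
            if (d.id.equals(id)) d.softDeleted = true;
        }
    }

    /** Clears the flag on the highest-version doc for the id, if any exists. */
    synchronized boolean undeleteLatest(String id) {
        Doc latest = null;
        for (Doc d : docs) {
            if (d.id.equals(id) && (latest == null || d.version > latest.version)) latest = d;
        }
        if (latest == null) return false;
        latest.softDeleted = false;
        return true;
    }

    /** Returns the live (not soft-deleted) document for the id, or null. */
    synchronized Doc live(String id) {
        for (Doc d : docs) {
            if (d.id.equals(id) && !d.softDeleted) return d;
        }
        return null;
    }

    /** Highest version seen for the id, soft-deleted docs included; -1 if none. */
    synchronized long lastVersion(String id) {
        long last = -1;
        for (Doc d : docs) {
            if (d.id.equals(id)) last = Math.max(last, d.version);
        }
        return last;
    }

    /** Applies the update only if it is newer than anything seen for the id. */
    synchronized boolean updateIfNewer(Doc newDoc) {
        if (newDoc.version <= lastVersion(newDoc.id)) return false;
        softUpdate(newDoc);
        return true;
    }

    public static void main(String[] args) {
        SoftDeleteSketch index = new SoftDeleteSketch();
        index.softUpdate(new Doc("1", 1, "first"));
        index.softUpdate(new Doc("1", 2, "second"));
        index.softDelete("1");
        System.out.println("live after delete: " + (index.live("1") != null));
        System.out.println("stale update accepted: " + index.updateIfNewer(new Doc("1", 1, "stale")));
        System.out.println("fresh update accepted: " + index.updateIfNewer(new Doc("1", 3, "fresh")));
    }
}
```

Because soft-deleted documents stay in the store, the version check in `updateIfNewer` still works after a delete, and `undeleteLatest` can bring a document back. A hard delete would lose both properties.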
- Using the center of mass of the shape as a test point can significantly speed up the build of GeoComplexPolygon.
- GeoComplexPolygon fails to validate that a point is within the polygon when the points are parallel to the test point.
- Vector.isNumericallyIdentical is confusing since it actually checks for parallelism.
- The new implementation of intervals, an alternative to span queries, looks like it's almost ready to be pushed to sandbox.
- Should we add case-insensitivity support to regexps?
- Should we make query caching asynchronous to prevent caching from hurting latency too much?
- ReqOptSumScorer, which is used for a mix of SHOULD and MUST/FILTER clauses, should leverage max scores of the sub clauses.
- We had more discussions regarding how to simply incorporate static relevance signals into the final score.
- A recent spike of Windows failures was due to Windows search indexing folders that were used by Jenkins to run tests. This caused errors when attempting to remove files in the tear-down phase since those files appeared to be still open.
- The upcoming release of ICU4J will fix a concurrency bug that could have made Lucene build corrupt indices when using ICUTokenizer.
- CommonGramsFilter should work on top of stacked tokens (synonyms).
- Can we make ShingleFilter correctly deal with index-time synonyms if we reduce its feature set?