19 March 2018

This Week in Elasticsearch and Apache Lucene - 2018-03-19

By Adrien Grand, Tom Callahan

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Robustness improvements in recovery and replication

Nhat has improved the concurrency layer of Elasticsearch, which now features a tighter hand-off between recovery and replication. This makes the replica recovery process more efficient and more straightforward to reason about.

Triple-quoting Painless scripts in docs

Nik contributed a feature that makes scripts in the docs substantially more readable. Rather than appearing on a single line, scripts may now be broken across several lines.

JVM Crash Logs

Jason contributed an improvement to ensure that JVM Crash Logs will now be dumped to the default logging directory. Previously, this log file could have ended up in an unexpected location, hindering supportability.

Archive Unknown Settings

An issue plaguing some users is that invalid or unknown settings present in the cluster state can prevent all updates to cluster settings -- including removing the unknown setting. While we archive these unknown settings during startup today, this situation can arise during a rolling upgrade from 5.x to 6.x. The previous workaround for this issue was to shut down all master nodes together, followed by a restart. As of 6.3.0, any unknown or invalid settings will be automatically archived on an update to cluster settings. Thanks to Jason for this fix.
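To make the archiving behaviour concrete, here is a minimal sketch in plain Java. It is a toy model and not Elasticsearch's settings code: the class name, method name and the example setting `some.removed.setting` are all hypothetical. The only detail taken from Elasticsearch is that archived settings are kept under the `archived.` key prefix.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Elasticsearch's settings implementation.
final class ArchiveUnknownSettingsSketch {
    // Archived settings are kept under this key prefix.
    static final String ARCHIVED_PREFIX = "archived.";

    /**
     * Returns a copy of {@code current} in which every key that is not in
     * {@code knownKeys} (and is not already archived) is moved under the
     * archived prefix instead of blocking the update.
     */
    static Map<String, String> archiveUnknown(Map<String, String> current, Set<String> knownKeys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String key = entry.getKey();
            if (knownKeys.contains(key) || key.startsWith(ARCHIVED_PREFIX)) {
                result.put(key, entry.getValue());
            } else {
                result.put(ARCHIVED_PREFIX + key, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> state = new LinkedHashMap<>();
        state.put("cluster.routing.allocation.enable", "all");
        state.put("some.removed.setting", "42"); // hypothetical leftover from 5.x
        Set<String> known = Collections.singleton("cluster.routing.allocation.enable");
        System.out.println(archiveUnknown(state, known));
    }
}
```

Because the unknown key is moved aside rather than rejected, the update that triggered the archiving can still go through, and the archived entry can be removed afterwards.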

Changes in 5.6:

  • Do not renew sync-id if all shards are sealed #29103
  • Avoid class cast exception from index writer #28989
  • X-Pack:
    • Make PKI BootstrapCheck work with SecureSettings #3993

Changes in 6.2:

  • Archive unknown or invalid settings on updates #28888
  • REST api specs : remove unsupported wait_for_merge param #28959
  • X-Pack:
    • Watcher: Fix TransformInput toXContent serialization #4061

Changes in 6.3:

  • Configure error file for archive packages #29129
  • Configure heap dump path for archive packages #29130
  • Client: Wrap synchronous exceptions #28919
  • Align thread pool info to thread pool configuration #29123
  • Fix EsAbortPolicy to conform to API #29075
  • Store offsets in index prefix fields when stored in the parent field #29067
  • Fix #29057 CWD to ES_HOME does not change drive #29086
  • Allow overriding JVM options in Windows service #29044
  • Fix Parsing Bug with Update By Query for Stored Scripts #29039
  • CLI: Close subcommands in MultiCommand #28954
  • Validate regular expressions in dynamic templates. #29013
  • Untangle Engine Constructor logic #28245
  • Improve error message when installing an offline plugin #28298
  • Cleanup exception handling in IOUtils #29069
  • Add ingest-attachment support for per document indexed_chars limit #28977
  • Protect against NPE in RestNodesAction #29059
  • Enforce that java.io.tmpdir exists on startup #28217
  • Correct class to name string conversion #28997
  • Do not swallow fail to convert exceptions #29043
  • Add total hits to the search slow log #29034
  • REST : deprecate field_data for Clear Indices Cache API #28943
  • Put JVM crash logs in the default log directory #29028
  • Log template creation and deletion #29027
  • Remove interning from prefix logger #29031
  • Reduce heap-memory usage of ingest-geoip plugin #28963
  • Stop sourcing scripts during installation/removal #28918
  • BREAKING: Create keystore on package install #28928
  • Add check when trying to reroute a shard to a non-data discovery node #28886
  • Disallow logger methods with Object parameter #28969
  • Restore tiebreaker for cross fields query #28935
  • X-Pack:
    • Update audit trail filter policy settings #3984
    • Audit Logging add realm name alongside principal in authz events #3260
    • Watcher: Ensure usage stats work properly in distributed environment #4094
    • Fix expiration millis for start_basic #4124
    • SQL: Ban PrintWriter#println in CLI #4118
    • Add type parameter to start_trial api #4102
    • Add api to start basic license #4101
    • SQL: Be more careful with break and eof #4092
    • Add api to start basic license #4083

Changes in 7.0:

  • BREAKING: REST : Clear Indices Cache API remove deprecated url params #29068
  • BREAKING: Main response should not have status 503 when okay #29045

Apache Lucene

Lucene 7.3

We delayed the cut of the first release candidate due to some last-minute bugs. They all look fixed now, so hopefully Alan, our release manager for 7.3, will be able to build a release candidate later this week.

IndexWriter doesn't swallow exceptions anymore

In some cases, IndexWriter could swallow exceptions. This is bad because it can hide problems from users of the API. Exceptions are now always re-thrown or added as suppressed exceptions to the original exception, and the aborting logic has been simplified. In the context of Elasticsearch, this is especially important in order to make sure that we can die with dignity in case of error. We will need to review lower levels as well, such as Directory and codecs, to make sure that they do not swallow exceptions either.
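The general "re-throw, and attach secondary failures as suppressed" pattern can be sketched with standard Java alone. This is not IndexWriter's code: `runOrAbort` and `IOAction` are hypothetical names used for illustration, and the only real API relied on is `Throwable.addSuppressed`.

```java
import java.io.IOException;

// Illustrative sketch only; it does not reproduce Lucene's IndexWriter logic.
final class SuppressedExceptionSketch {
    /** A unit of work that may fail with an I/O error (hypothetical helper type). */
    interface IOAction {
        void run() throws IOException;
    }

    /**
     * Runs {@code work}. If it fails, runs {@code abort} and re-throws the
     * original failure. A failure inside {@code abort} is not lost: it is
     * attached to the original exception as a suppressed exception.
     */
    static void runOrAbort(IOAction work, IOAction abort) throws IOException {
        try {
            work.run();
        } catch (IOException | RuntimeException original) {
            try {
                abort.run();
            } catch (IOException | RuntimeException duringAbort) {
                original.addSuppressed(duringAbort);
            }
            throw original; // never swallowed
        }
    }
}
```

A caller that catches the exception sees the first failure as the primary cause and can inspect `getSuppressed()` for anything that went wrong while aborting.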

Soft updates

Simon gave IndexWriter the ability to perform soft updates by atomically updating doc-value fields on an existing document and introducing a new document with the same id. This enables users to implement a soft-delete mechanism by using a doc-value field to mark documents that have been "soft-deleted". This superseded an earlier idea to allow merges to keep deletes around. Compared to regular deletes, soft deletes have the benefit that documents may be un-deleted. They might also be useful in the context of versioning, since you need to remember the version of a document after it has been deleted in order to make sure to reject updates to the same id with lower versions.
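The idea can be illustrated with a small in-memory model. It deliberately does not use Lucene: `SoftUpdateSketch`, `Doc` and the `softDeleted` flag are hypothetical stand-ins, with the flag playing the role of the doc-value marker field. It is a sketch of the concept, not of the actual IndexWriter API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of soft updates; it does not use or mirror Lucene's API.
final class SoftUpdateSketch {
    static final class Doc {
        final String id;
        final String body;
        boolean softDeleted; // stands in for the doc-value marker field

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    /** Marks existing documents with this id as soft-deleted and adds the new one, as one atomic step. */
    synchronized void softUpdate(String id, String newBody) {
        for (Doc doc : docs) {
            if (doc.id.equals(id)) {
                doc.softDeleted = true;
            }
        }
        docs.add(new Doc(id, newBody));
    }

    /** Bodies of the documents a search would see, in insertion order. */
    synchronized List<String> liveBodies() {
        List<String> live = new ArrayList<>();
        for (Doc doc : docs) {
            if (!doc.softDeleted) {
                live.add(doc.body);
            }
        }
        return live;
    }

    /** Number of documents still retained, including soft-deleted ones. */
    synchronized int retainedCount() {
        return docs.size();
    }
}
```

Since the old document is only marked rather than removed, it stays available for un-deleting or for version checks until something decides to reclaim it.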

Geo

Query processing

Testing - TestIndexWriterWithThreads has wall-clock time dependency.

Analysis