This Week in Elasticsearch and Apache Lucene - 2020-01-10

Elasticsearch

Index and Snapshot Lifecycle Management

We added integration between ILM and SLM by introducing a new action to the ILM delete phase that will wait for a named SLM policy to finish taking a snapshot before the index is deleted. (#50454)

For example:

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "delete": {
        "actions": {
          "wait_for_snapshot" : {
            "policy": "slm-policy-name"
          }
        }
      }
    }
  }
}

As part of our effort to make ILM more resilient (#48183):

  • We refactored execution of the Rollover API (used by both manual rollovers and ILM) to use a single cluster state update instead of 3 chained updates. (#50388)
  • We added the infrastructure to support retrying the execution of async ILM actions and converted the RolloverStep to be (the first) retryable async step. (#50522)
  • We converted InitializePolicyContextStep to be retryable. (#50685)
  • We converted UpdateRolloverLifecycleDateStep to be retryable. (#50702)

We expect these improvements to make a big impact both on our users and our support team as they use and support ILM.
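
To make the idea of a retryable step concrete, here is a minimal Python sketch of the pattern. It is not ILM's Java implementation: the `Step` and `IndexState` classes, the `run_policy_tick` function, and the retry bookkeeping are hypothetical names chosen for illustration. It assumes that a failed step is parked in an error state and re-attempted on a later periodic check only if it is marked retryable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]   # raises an exception on failure
    retryable: bool = False


@dataclass
class IndexState:
    failed_step: Optional[Step] = None
    retry_count: int = 0
    done: bool = False


def execute(step: Step, state: IndexState) -> None:
    """Run a step once; on failure, park the index in an error state."""
    try:
        step.action()
    except Exception:
        state.failed_step = step
    else:
        state.failed_step = None
        state.done = True


def run_policy_tick(state: IndexState) -> None:
    """Periodic check: re-run the failed step only if it is retryable."""
    step = state.failed_step
    if step is not None and step.retryable:
        state.retry_count += 1
        execute(step, state)


# A rollover-like action (hypothetical) that fails twice before succeeding.
attempts = {"n": 0}


def flaky_rollover() -> None:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")


state = IndexState()
rollover = Step("rollover", flaky_rollover, retryable=True)
execute(rollover, state)          # first attempt fails
while state.failed_step is not None:
    run_policy_tick(state)        # later ticks retry automatically

print(state.done, state.retry_count)
```

A step that is not marked retryable stays in the error state in this sketch until someone intervenes, which is the behaviour the retryable steps are meant to avoid for transient failures.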

CONTAINS support for BKD-backed geoshapes is merged

We merged the Elasticsearch PR that adds CONTAINS support for BKD-backed geoshapes. The missing relation was a big blocker for some users wanting to upgrade from the old prefix trees, and this change gets us one step closer to full feature parity.
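
As an illustration, a `geo_shape` query can ask for indexed shapes that contain a given query shape by setting `relation` to `contains`. The Python sketch below only builds the request body and does not contact a cluster. The index name `my-shapes`, the field name `location`, the helper `contains_query`, and the coordinates are placeholder assumptions.

```python
import json


def contains_query(field: str, lon: float, lat: float) -> dict:
    """Build a geo_shape query matching shapes that contain the given point."""
    return {
        "query": {
            "geo_shape": {
                field: {
                    # GeoJSON order is [longitude, latitude]
                    "shape": {"type": "point", "coordinates": [lon, lat]},
                    "relation": "contains",
                }
            }
        }
    }


body = contains_query("location", 13.4, 52.52)
# Body for: GET /my-shapes/_search  (index name is hypothetical)
print(json.dumps(body, indent=2))
```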

The mappings editor

The mappings editor has been a substantial UI project for us. This editor provides users with an expandable tree for navigating and editing the mappings object of an index template, and has lots of explanations and links to documentation. We're on the verge of merging this feature (integrated as part of the Index Templates wizard) and we're all pretty excited.

A few of the recent highlights:

Mappings Editor Screenshot

Cleaning up snapshot repositories

We added a feature to the Snapshot Restore app that allows users to clean up repositories at the click of a button. This deletes unreferenced snapshot data from the repository, freeing up storage space.
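
Elasticsearch also exposes repository cleanup over REST as `POST /_snapshot/<repository>/_cleanup`. The Python sketch below builds that request without sending it; the host `http://localhost:9200`, the repository name `my_repository`, and the helper `cleanup_request` are assumptions for illustration.

```python
from urllib.parse import quote
from urllib.request import Request


def cleanup_request(base_url: str, repository: str) -> Request:
    """Build (but do not send) a repository cleanup request."""
    path = f"/_snapshot/{quote(repository, safe='')}/_cleanup"
    return Request(base_url.rstrip("/") + path, method="POST")


req = cleanup_request("http://localhost:9200", "my_repository")
print(req.get_method(), req.full_url)
# To send it against a real cluster: urllib.request.urlopen(req)
```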

Repository Cleanup Button Screenshot

Mandatory soft-deletes in 8.0

Elasticsearch uses a feature called soft deletes to preserve recent deletions in the Lucene index. This lets each index keep a history of the operations that brought it to its current state. Today we use this history to support both cross-cluster replication and shard recovery.

We are making soft-deletes mandatory in 8.0. This constraint allows us to greatly simplify the logic in the engine, translog, recoveries, and other components. Existing 7.x indices with soft-deletes disabled will automatically start using soft-deletes in 8.x, and no new 8.x indices can be created with soft-deletes disabled. One technical challenge is keeping peer recoveries quick during a rolling upgrade from 7.x without soft-deletes to 8.x with soft-deletes. We are solving this by making sure that all indices in 7.6+ have peer recovery retention leases, including indices without soft-deletes. The remaining tasks for this work are tracked in a meta-issue.
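
To see which 7.x indices would be affected, one can inspect the `index.soft_deletes.enabled` setting. The Python sketch below classifies indices from a settings response. The sample response, the index names, and the helper `soft_delete_status` are hypothetical, and the sketch assumes the nested shape returned by `GET /_all/_settings` with setting values encoded as strings. An index that does not list the setting explicitly is reported as unknown rather than guessed.

```python
def soft_delete_status(settings_response: dict) -> dict:
    """Map each index name to 'enabled', 'disabled', or 'unknown'."""
    status = {}
    for index, body in settings_response.items():
        value = (
            body.get("settings", {})
            .get("index", {})
            .get("soft_deletes", {})
            .get("enabled")
        )
        if value is None:
            status[index] = "unknown"   # setting not listed explicitly
        elif str(value).lower() == "true":
            status[index] = "enabled"
        else:
            status[index] = "disabled"
    return status


# Hypothetical sample resembling a settings response.
sample = {
    "logs-old": {"settings": {"index": {"soft_deletes": {"enabled": "false"}}}},
    "logs-new": {"settings": {"index": {"soft_deletes": {"enabled": "true"}}}},
    "metrics": {"settings": {"index": {"number_of_shards": "1"}}},
}
print(soft_delete_status(sample))
```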

Lucene

Lucene 8.4

Lucene 8.4 has been released! Elasticsearch master and branch 7.x are now on this release.

Speed up merging doc-values term dictionaries

We sped up merging of SORTED/SORTED_SET doc values by specialising the TermsEnum used in this case so that it takes advantage of the current position during lookup.
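
The general idea of exploiting the current position can be sketched with a toy cursor in Python. This is not Lucene's code: the class `SequentialTermsCursor`, its counters, and the notion of a "cheap" step versus a "full" seek are assumptions made purely to illustrate why sequential lookups, as they occur during a merge, can skip the expensive path.

```python
class SequentialTermsCursor:
    """Toy terms cursor over a sorted list of terms addressed by ordinal."""

    def __init__(self, terms):
        self.terms = terms
        self.ord = -1          # current position; -1 means unpositioned
        self.full_seeks = 0    # simulated expensive random-access seeks
        self.cheap_steps = 0   # simulated cheap sequential steps

    def seek_ord(self, target):
        """Position on `target`, preferring a cheap step when possible."""
        if not 0 <= target < len(self.terms):
            raise IndexError(target)
        if target == self.ord:
            pass                          # already positioned, nothing to do
        elif target == self.ord + 1:
            self.cheap_steps += 1         # next term: just advance
            self.ord = target
        else:
            self.full_seeks += 1          # anything else: full seek
            self.ord = target
        return self.terms[self.ord]


cursor = SequentialTermsCursor(["apple", "banana", "cherry", "date"])
visited = [cursor.seek_ord(o) for o in (0, 1, 2, 2, 0, 3)]
print(visited, cursor.cheap_steps, cursor.full_seeks)
```

In this run, only the two out-of-order lookups pay for a full seek; the in-order lookups advance from the current position.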

Other

  • In case you missed it, we wrote a blog post summarizing the highlights of Apache Lucene in 2019.
  • We fixed a bug in the NRTCachingDirectory where files with no flush or merge info were assumed to have length zero. This could lead to the cache directory not honouring the max cache size.
  • We are working on adding the capability to the Matches API to return the actual terms that matched the given query. This will allow users of this API to indicate which specific terms are present or absent in each top-k hit.

Changes

Changes in Elasticsearch

Changes in 8.0:

  • Remove type parameter from PutMappingRequest.buildFromSimplifiedDef() #50844
  • BREAKING: Remove the 'template' field in index templates. #49460
  • Remove type parameter from CIR.mapping(type, object...) #50739
  • BREAKING: Remove the 'local' parameter of /_cat/nodes #50594
  • Always use soft-deletes in InternalEngine #50415

Changes in 7.6:

  • Ensure we emit a warning when using the deprecated 'template' field. #50831
  • Do not force refresh when write indexing buffer #50769
  • [Transform] fail to start/put on missing pipeline #50701
  • ILM action to wait for SLM policy execution #50454
  • Make InitializePolicyContextStep retryable #50685
  • Make Multiplexer inherit filter chains analysis mode #50662
  • Upgrade to Lucene 8.4.0. #50518
  • Sync grok patterns with logstash patterns #50381
  • Guess root cause support unwrap #50525
  • Allow parsing timezone without fully provided time #50178
  • Import replicated closed dangling indices #50649
  • Make the UpdateRolloverLifecycleDateStep retryable #50702
  • Populate User metadata with OpenIDConnect collections #50521
  • Add cliSetup command to test clusters configuration #50414
  • Correctly handle MSM for nested disjunctions #50669
  • Add 'monitor_snapshot' cluster privilege #50489
  • Security should not reload files that haven't changed #50207
  • Deprecate indices without soft-deletes #50502
  • Deleted docs disregarded for if_seq_no check #50526
  • Collect shard sizes for closed indices #50645
  • Make EC2 Discovery Cache Empty Seed Hosts List #50607
  • Guard against null geoBoundingBox #50506
  • Add parameter to make sure that log of updating IndexSetting be more detailed #49969
  • Deprecate the 'local' parameter of /_cat/nodes #50499
  • ILM retryable async action steps #50522
  • Add fuzzy intervals source #49762
  • Don't dump a stacktrace for invalid patterns when executing elasticse… #49744
  • Make EC2 Discovery Plugin Retry Requests #50550
  • Add remote info to the HLRC #49657

Changes in 7.5:

  • Fix upgrade of custom similarity #50851
  • Fix unintended debug logging in subclasses of TransportMasterNodeAction #50839
  • Skip test suite entirely for non-applicable distribution types #50824
  • Foreach processor - fork recursive call #50514
  • Ensure that field collapsing works with field aliases. #50722
  • SQL: Optimisation fixes for conjunction merges #50703
  • Check allocation id when failing shard on recovery #50656

Changes in 6.8:

  • Fix NPE bug inner_hits #50709

Changes in Elasticsearch SQL ODBC Driver

Changes in 6.8:

  • SQLColAttribute: add bwc attributes for 2.x #205

Changes in Rally

Changes in 1.4.0:

  • Pass plugin params for all plugins #861
  • Allow to use the bundled JDK in Elasticsearch #853

Changes in Rally Tracks

  • Add reindex operation #92
  • Update http_logs and geonames target throughput #94