Week in Elasticsearch and Apache Lucene - 2019-04-12

Elasticsearch

Highlighted Project: Snapshot Lifecycle Management (In Progress)

This is a prototype new section in our weekly updates in which we’ll tell a bit more of the story around a feature in development and solicit feedback from our community as detailed below. All of this is subject to substantial change. We won’t necessarily include this section every week.

What is it?

Snapshot Lifecycle Management (SLM) is our next key feature in our Index Lifecycle Management (ILM) work.  For background on ILM, check out a blog post that we published this week. Absent from the list of actions ILM can take on an index is snapshotting -- in fact, snapshotting is not well-suited to an index-level policy as snapshots can span many indices.

Filling this need will be SLM, a separate type of policy that permits scheduling of snapshots entirely inside of Elasticsearch.  Each policy will be able to specify index patterns to include, a schedule, a snapshot repository, a name pattern for the snapshot, and whether or not to include cluster-state items such as the index templates in a cluster.

We also have plans in the future to allow configuring retention for a policy, meaning something like "keep 30 days of snapshots" or "keep the last 10 successful snapshots" so the user doesn't have to manually delete older snapshots.
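To make the two retention examples concrete, here is a minimal Python sketch of how such rules could select snapshots for deletion. It is an illustration only, not the SLM design: the `Snapshot` record, the function names, and the choice to leave failed snapshots untouched under the count-based rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record for illustration; not an Elasticsearch type."""
    name: str
    taken_at: datetime
    successful: bool


def expired_by_age(snapshots: List[Snapshot], now: datetime, max_age: timedelta) -> List[str]:
    """Names of snapshots older than max_age (e.g. "keep 30 days of snapshots")."""
    return [s.name for s in snapshots if now - s.taken_at > max_age]


def expired_by_count(snapshots: List[Snapshot], keep_last: int) -> List[str]:
    """Names of successful snapshots beyond the newest keep_last, newest first.

    Assumption: failed snapshots are ignored by this rule and left in place.
    """
    successful = sorted(
        (s for s in snapshots if s.successful),
        key=lambda s: s.taken_at,
        reverse=True,
    )
    return [s.name for s in successful[keep_last:]]


# Made-up sample data.
now = datetime(2019, 4, 12)
snapshots = [
    Snapshot("snap-1", now - timedelta(days=45), True),
    Snapshot("snap-2", now - timedelta(days=20), False),
    Snapshot("snap-3", now - timedelta(days=10), True),
    Snapshot("snap-4", now - timedelta(days=1), True),
]

too_old = expired_by_age(snapshots, now, timedelta(days=30))
beyond_newest = expired_by_count(snapshots, keep_last=1)
```

With this sample data the age rule selects only `snap-1`, while keeping just the newest successful snapshot selects `snap-3` and `snap-1`.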

Why are we doing this?

ILM currently has many of the same features for managing indices as Elastic Curator, a command-line tool for managing the lifecycle of Elasticsearch indices.  Curator is itself a Python application, and therefore runs externally from an Elasticsearch cluster -- we would like to provide many of its features directly within Elasticsearch so that an external tool is not needed.  Curator provides snapshot capabilities, and therefore SLM closes an important functionality gap here.

Our own Cloud offering also uses snapshots heavily with a custom scheduling mechanism, and we'd like to provide a native feature for use there as well.

Feedback we're looking for

If you have feedback around this feature, please comment on the GitHub SLM Issue. We're particularly interested to see if folks have any special requirements for creating snapshot schedules or defining retention policies.

Highlights

Typeless Blog Post

We recently published a blog post about types removal and how to migrate to typeless APIs.

Watcher UI

We're currently working on an advanced Watcher UI, written in React. We merged the initial PR for the UI and have also been working on form validation. Work is also progressing on the threshold watch edit page and the watch history and detail pages. Here's a screenshot:

[Screenshot: Advanced Watcher UI]

Making upgrades easier

One speed bump users can hit while preparing for an upgrade of Elasticsearch is determining which of their internal applications is sending a query to Elasticsearch that leverages deprecated functionality. To this end, we've started work on enriching deprecation logs by adding an originating ID (X-Opaque-ID) to the log message.
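For this to help, each client application needs to send its own identifier with every request. The Python sketch below builds (but does not send) a search request carrying that header. The cluster URL, index name, and the `billing-service` identifier are placeholders invented for the example, and `build_search_request` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, query: dict, client_id: str) -> urllib.request.Request:
    """Build (but do not send) a search request tagged with an X-Opaque-Id header."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Identifies the calling application; HTTP header names are
            # case-insensitive, so the exact capitalization does not matter.
            "X-Opaque-Id": client_id,
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_search_request(
    "http://localhost:9200", "logs", {"match_all": {}}, "billing-service"
)
```

Passing `request` to `urllib.request.urlopen` would send it to a running cluster. Giving each internal application a distinct identifier is what would let a deprecation log entry be traced back to its sender.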

New JDK Requirements

Elasticsearch 8.0.0 and Lucene 9.0.0 will require JDK 11 at a minimum. We modified the Elasticsearch build to remove support for JDK 8 and bump the minimum JDK to 11. Aside from the productivity benefits this gives developers working in the Elasticsearch codebase (there are new language-level features such as var and APIs such as convenience methods on collections), there is also new functionality available that we can build user-facing features on top of. Running Elasticsearch on a JDK older than version 11 will produce a deprecation warning starting with a future Elasticsearch 7.x release. Starting with Elasticsearch 7, our default download includes a bundled JDK; if you’re using the bundled JDK or our official Docker image you will no longer need to worry about the JDK version.

Snapshot/Restore

We've introduced an eventually consistent mock blob-store to allow us to simulate S3's behaviour in unit tests, and validated that this would indeed have identified a recently-fixed issue to do with the naming of segment files in a snapshot. This relied on a larger refactoring of TransportShardBulkAction to replace blocking operations with futures so we can run single-threaded tests that still do things like dynamic mapping updates.
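As a rough illustration of the idea, the Python sketch below models a blob store whose listings lag behind writes until the store "settles". It is a toy written for this post, not the mock used in the Elasticsearch test suite, and it models only one kind of inconsistency (stale listings); the class and method names are assumptions.

```python
from typing import Dict, List, Set


class EventuallyConsistentBlobStore:
    """Toy in-memory blob store whose listings lag behind writes.

    Illustration only; not the mock used in the Elasticsearch tests.
    """

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # readable immediately after write
        self._listed: Set[str] = set()      # names currently visible to listings

    def write(self, name: str, data: bytes) -> None:
        # The blob can be read right away but does not yet appear in listings.
        self._blobs[name] = data

    def read(self, name: str) -> bytes:
        return self._blobs[name]

    def list_blobs(self) -> List[str]:
        return sorted(self._listed)

    def settle(self) -> None:
        # Simulate the store becoming consistent: listings catch up with writes.
        self._listed = set(self._blobs)


store = EventuallyConsistentBlobStore()
store.write("segments_1", b"data")
stale_listing = store.list_blobs()   # the write is not listed yet
store.settle()
fresh_listing = store.list_blobs()   # now the listing includes the blob
```

Because consistency is reached only when the test calls `settle()`, a single-threaded test can deterministically exercise code paths that would otherwise depend on timing against a real store.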

We also opened a PR to save another unnecessary API call to S3.

Lucene

  • We fixed a bug where calling toString on SegmentInfos could cause a ConcurrentModificationException
  • Discussions on a new Postings format continue
  • We opened an issue to reconsider how we encode postings now that we can skip non-competitive hits. 
  • Should reading docs and freqs be specialized when reading with impacts?
  • Luke is now a Lucene module
  • IndexReaders are getting support for per-reader attributes, which allow controlling lower-level constructs like per-field off-heap term dictionaries.

Changes

Changes in Elasticsearch

Changes in 8.0:

  • Cleanup JVM options after JDK 11 bump #40961
  • BREAKING: Bump the minimum Java version to Java 11 #40754

Changes in 7.1:

  • Use the breadth first collection mode for significant terms aggs. #29042
  • Reindex from remote deprecation of escaped index #41005
  • Add packaging to cluster stats response #41048
  • Deprecate permission over aliases #38059
  • Adjust init map size of user data of index commit #40965
  • Docs: Simplifying setup by using module configuration variant syntax #40879
  • Handle mindocfreq in suggesters #40840

Changes in 6.7:

  • Do not trim unsafe commits when open readonly engine #41041
  • Use setIfSeqNo(…) and setIfPrimaryTerm(…) for updating watch status if all nodes are at least on 6.7.0 #40888
  • Properly handle Monitoring exporters all disabled #40920
  • SQL: Change schema calls to empty set #41034
  • Fix rewrite of inner queries in DisMaxQueryBuilder #40956
  • SQL: Fix catalog filtering in SYS COLUMNS #40583
  • Short-circuit rebalancing when disabled #40966
  • Fix unsafe publication in opt-out query cache #40957
  • SQL: Use ResultSets over exceptions in metadata #40641
  • Be lenient when parsing build flavor and type on the wire #40734
  • Suppress lease background sync failures if stopping #40902

Changes in Elasticsearch Hadoop Plugin

Changes in 7.0:

  • [DOCS] Adds shared attributes #1274

Changes in 2.0:

  • Docs: Clean up for asciidoctor #1275

Changes in Elasticsearch Management UI

Changes in 7.0:

  • Update Delete Remote Cluster API to support multiple comma-delimited clusters. #34595