This Week in Elasticsearch and Apache Lucene - 2020-01-17

Elasticsearch

Check for deprecations when analyzers are built

Until now, Elasticsearch let you create an index that uses an analysis component disallowed for indices created after a given version, and you would not see an error until you tried to index a document. We want to fail early and refuse to create such an index. To do this, we added a step to the analysis module that runs an empty string through every custom analyzer defined on the index. As a result, deprecation warnings and, more importantly, errors from disallowed components are now reported at index creation time.
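The reason an empty string is enough is that a component's version checks only run when the component is actually used. A minimal sketch of that idea, assuming each component is a callable that checks the index-created version when invoked; make_component, create_index, and the version numbers in the usage note are hypothetical and not Elasticsearch's analysis API:

```python
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, int]
# A component takes tokens, the index-created version and a warnings sink.
Component = Callable[[List[str], Version, List[str]], List[str]]


class AnalysisConfigError(Exception):
    """Raised when a component may not be used on an index of this version."""


def make_component(name: str,
                   deprecated_in: Optional[Version] = None,
                   removed_in: Optional[Version] = None) -> Component:
    """Hypothetical factory: the version checks run only when the component is used."""
    def apply(tokens: List[str], index_version: Version, warnings: List[str]) -> List[str]:
        if removed_in is not None and index_version >= removed_in:
            raise AnalysisConfigError(
                f"[{name}] is not allowed on indices created on or after "
                f"{removed_in[0]}.{removed_in[1]}")
        if deprecated_in is not None and index_version >= deprecated_in:
            warnings.append(f"[{name}] is deprecated")
        return tokens  # pass-through: only the checks matter for this sketch
    return apply


def create_index(analyzers: Dict[str, List[Component]], index_version: Version) -> List[str]:
    """Run an empty string through every custom analyzer before accepting the index."""
    warnings: List[str] = []
    for chain in analyzers.values():
        tokens = [""]
        for component in chain:
            tokens = component(tokens, index_version, warnings)
    return warnings
```

With an analyzer that starts with make_component("nGram", deprecated_in=(7, 6), removed_in=(8, 0)), create_index returns a deprecation warning for a hypothetical 7.6 index and raises for an 8.0 index, before any document is indexed.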

New cluster state storage

Work on the new Lucene-based cluster state storage mechanism has been merged in time for the 7.6 feature freeze. We measured its performance: the new storage layer is slightly slower when very little information is written (for example, when only the cluster state version changes), but performs much better when large cluster states are written.

The new storage layer also comes with two new command-line tools that can help resurrect a broken cluster. The elasticsearch-node remove-settings tool removes persistent settings from the on-disk cluster state when those settings are incompatible in a way that prevents the cluster from forming, and therefore prevents them from being updated through the APIs. This helps in situations where settings were not properly validated before being put into the cluster state, as we have seen in a number of issues. We also added a command-line tool to remove broken custom metadata from the cluster state, which helps when a plugin has broken the cluster state in a way that prevents it from being loaded.

ILM & SLM

One limitation of ILM is that actions cannot easily be updated while they are executing. #50820 addresses this by allowing an index's cached phase definition to be refreshed when the new policy does not differ too much from it. This lets users update their rollover criteria, for example, while an index is in the middle of the rollover step.
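A sketch of the refresh decision, assuming "does not differ too much" means the phase still contains the same ordered list of actions and only their settings changed. That rule, and the names Phase, CachedIlmState, step_keys, and maybe_refresh, are assumptions for illustration; the exact check in #50820 may differ:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Phase:
    name: str
    actions: Dict[str, dict]  # action name -> action settings, in execution order


@dataclass
class CachedIlmState:
    phase_name: str
    cached_phase: Phase  # definition captured when the index entered the phase


def step_keys(phase: Phase) -> List[Tuple[str, str]]:
    # Hypothetical: identify each step by (phase, action), ignoring action settings.
    return [(phase.name, action) for action in phase.actions]


def maybe_refresh(state: CachedIlmState, new_phases: Dict[str, Phase]) -> bool:
    """Swap in the new phase definition only if its steps match the cached ones."""
    new_phase = new_phases.get(state.phase_name)
    if new_phase is None or step_keys(new_phase) != step_keys(state.cached_phase):
        return False  # structure changed: keep executing the cached definition
    state.cached_phase = new_phase
    return True
```

Under this assumption, lowering a rollover max_size is picked up immediately, while adding a new action to the same phase leaves the cached definition in place.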

We found a serious bug where a previously restored snapshot caused SLM retention to believe that a restore was permanently in progress. #50868 fixes this behavior.

Fallback to anonymous user

Because of the way we validate Elasticsearch Token Service tokens and API keys, a request carrying an invalid or incorrect token or API key was treated as if it had no authentication information at all. If anonymous access was enabled, such requests fell back to being authenticated as the anonymous user. While this is not a security issue, it is confusing for users, and in general we should treat missing credentials and erroneous credentials differently. It also caused problems for Kibana's token-based authentication providers (that is, everything other than username and password). We made the least intrusive change we could to resolve the issue, and we plan to rewrite the logic that handles our AuthenticationToken in the future.
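The intended distinction between missing and erroneous credentials can be sketched as follows. This is an illustration of the desired behavior only, not Elasticsearch's authentication code; the in-memory token store, ANONYMOUS_USER, and authenticate are hypothetical:

```python
from typing import Dict, Optional

ANONYMOUS_USER = "_anonymous"  # hypothetical principal name


class AuthenticationError(Exception):
    pass


def authenticate(token: Optional[str], tokens: Dict[str, str], anonymous_enabled: bool) -> str:
    """Return the principal for a request carrying an optional bearer token."""
    if token is None:
        # No credentials at all: the only case where anonymous access applies.
        if anonymous_enabled:
            return ANONYMOUS_USER
        raise AuthenticationError("missing authentication credentials")
    user = tokens.get(token)
    if user is None:
        # Credentials were supplied but are wrong: fail rather than fall back.
        raise AuthenticationError("invalid token")
    return user
```

The old, confusing behavior corresponds to collapsing the second branch into the first, so that a bad token ends up returning ANONYMOUS_USER.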

Certutil http command

We merged a new "http" sub-command for elasticsearch-certutil that guides you through generating certificates for the Elasticsearch REST interface and includes instructions for the configuration changes needed in both Elasticsearch and Kibana. We plan to add instructions for other products and clients in the near future.

Rally 1.4.0 released

We released Rally 1.4.0 this week. Rally now supports using the JDK that is bundled with Elasticsearch; enable it by specifying --runtime-jdk="bundled" when running Rally. We also added support for using secure settings for plugins, which are passed through the --plugin-params switch. Rally's documentation has been updated to show how to use this with the repository-s3 and repository-gcs plugins.

Watcher UI enhancements

We recently merged a number of bug fixes and enhancements (#53751, #53719). One of the most valuable is an update to the Watcher Threshold Alert form that lets users specify the URL scheme of a webhook action (#53757). Without this option, users could not use secured URLs (those beginning with "https").


Apache Lucene

Gradle build

A longstanding effort to move the build to Gradle has landed in master. Gradle does not fully replace Ant yet, so you can still build the old way; the plan is to move progressively to a Gradle-only build. Please try it locally and open issues if you find anything odd.

Simplify postings API by removing long[] metadata

We removed duplicated methods in the postings API that stored a term's metadata in two different shapes (long[] and byte[]). As part of this simplification, we also removed the experimental FSTOrd postings format, which allowed looking up terms in the dictionary by ordinal. The FST50 postings format has been preserved, however, so you can still use a terms dictionary that is entirely backed by an FST.
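To make the byte-oriented shape concrete, here is a toy encoder that writes per-term metadata, assumed here to be just one postings file pointer per term, into a byte stream as variable-length longs delta-coded against the previous term. The function names are hypothetical and this is not Lucene's on-disk format; the vLong scheme itself (7 payload bits per byte, high bit set when more bytes follow) is the same idea as Lucene's DataOutput.writeVLong:

```python
from typing import List, Tuple


def write_vlong(out: bytearray, value: int) -> None:
    """7 payload bits per byte; the high bit means 'more bytes follow'."""
    if value < 0:
        raise ValueError("vlong must be non-negative")
    while value & ~0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_vlong(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one vlong starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_term_metadata(file_pointers: List[int]) -> bytes:
    """Hypothetical: delta-encode each term's file pointer against the previous term's."""
    out, previous = bytearray(), 0
    for fp in file_pointers:
        write_vlong(out, fp - previous)
        previous = fp
    return bytes(out)


def decode_term_metadata(buf: bytes, count: int) -> List[int]:
    pointers, pos, previous = [], 0, 0
    for _ in range(count):
        delta, pos = read_vlong(buf, pos)
        previous += delta
        pointers.append(previous)
    return pointers
```

Delta coding keeps most values small, so pointers [0, 100, 300, 70000] fit in 7 bytes instead of 32 bytes of fixed-width longs.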

Build FuzzyQuery automata up-front

Alan merged a change to FuzzyQuery that builds its automata when the query is constructed rather than when it is rewritten. This matches what the other automaton queries already do and ensures that the automata are built only once.
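The construction-time pattern can be sketched as follows, assuming a plain Levenshtein automaton (insertions, deletions, substitutions; no transpositions, unlike Lucene's default FuzzyQuery) simulated as an NFA over (position, edits) states. LevenshteinAutomaton and FuzzyQuerySketch are hypothetical names, and construction is cheap here, whereas building the real DFA is the expensive part that motivates doing it once:

```python
from typing import List, Set, Tuple

State = Tuple[int, int]  # (characters of the term consumed, edits used)


class LevenshteinAutomaton:
    """Hypothetical stand-in: accepts words within max_edits of term."""

    builds = 0  # class-wide counter so the sketch can show how often automata are built

    def __init__(self, term: str, max_edits: int) -> None:
        LevenshteinAutomaton.builds += 1
        self.term = term
        self.max_edits = max_edits

    def _closure(self, states: Set[State]) -> Set[State]:
        # Deleting a term character advances in the term without consuming input.
        result, stack = set(states), list(states)
        while stack:
            i, e = stack.pop()
            nxt = (i + 1, e + 1)
            if i < len(self.term) and e < self.max_edits and nxt not in result:
                result.add(nxt)
                stack.append(nxt)
        return result

    def accepts(self, word: str) -> bool:
        n, k = len(self.term), self.max_edits
        states = self._closure({(0, 0)})
        for c in word:
            step: Set[State] = set()
            for i, e in states:
                if i < n and self.term[i] == c:
                    step.add((i + 1, e))          # match
                if e < k:
                    step.add((i, e + 1))          # insertion of c
                    if i < n:
                        step.add((i + 1, e + 1))  # substitution
            states = self._closure(step)
            if not states:
                return False
        return any(i == n for i, _ in states)


class FuzzyQuerySketch:
    """Hypothetical query: builds one automaton per edit distance, up front."""

    def __init__(self, term: str, max_edits: int = 2) -> None:
        self.automata = [LevenshteinAutomaton(term, d) for d in range(max_edits + 1)]

    def rewrite(self, terms: List[str]) -> List[Tuple[str, int]]:
        # Reuses the prebuilt automata; rewriting again never rebuilds them.
        matches = []
        for t in terms:
            for distance, automaton in enumerate(self.automata):
                if automaton.accepts(t):
                    matches.append((t, distance))
                    break
        return matches
```

Because the automata live on the query object, every rewrite of the same query shares them, which is the property the change guarantees.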

Terms dictionary compression

We are adding a second layer of compression to the default terms dictionary. Currently, term prefixes are stored in an FST that can be loaded off-heap, and suffixes are stored in blocks on disk. The idea is to compress these blocks to reduce their size on disk, using a lightweight method so that accessing the dictionary stays fast.
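A minimal sketch of the idea, assuming a block holds sorted terms that share a prefix: the shared prefix is kept separately (standing in for the FST index), and the concatenated suffixes are compressed. zlib at its fastest level is a stand-in chosen because it ships with Python; it is not necessarily the codec Lucene will use, and TermsBlock, write_block, read_block, and seek_exact are hypothetical names:

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class TermsBlock:
    prefix: bytes              # shared prefix; stands in for what the FST index would hold
    suffix_lengths: List[int]  # kept uncompressed so suffixes can be split after decompression
    compressed_suffixes: bytes


def shared_prefix(sorted_terms: List[bytes]) -> bytes:
    # The common prefix of the smallest and largest term is common to all of them.
    first, last = sorted_terms[0], sorted_terms[-1]
    i = 0
    while i < min(len(first), len(last)) and first[i] == last[i]:
        i += 1
    return first[:i]


def write_block(terms: List[bytes]) -> TermsBlock:
    ordered = sorted(terms)
    prefix = shared_prefix(ordered)
    suffixes = [t[len(prefix):] for t in ordered]
    # level=1 favors speed over ratio, in the spirit of a "lightweight" codec.
    compressed = zlib.compress(b"".join(suffixes), level=1)
    return TermsBlock(prefix, [len(s) for s in suffixes], compressed)


def read_block(block: TermsBlock) -> List[bytes]:
    raw = zlib.decompress(block.compressed_suffixes)
    terms, pos = [], 0
    for length in block.suffix_lengths:
        terms.append(block.prefix + raw[pos:pos + length])
        pos += length
    return terms


def seek_exact(block: TermsBlock, term: bytes) -> bool:
    # Skip decompression entirely when the prefix already rules the term out.
    if not term.startswith(block.prefix):
        return False
    return term in read_block(block)
```

The trade-off the paragraph describes shows up in seek_exact: every lookup that reaches a block pays for decompressing it, which is why the codec has to be cheap.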

Changes

Changes in Elasticsearch

Changes in 8.0:

  • BREAKING: Goodbye and thank you synced flush! #50882
  • Disallow Password Change when authenticated by Token #49694
  • Remove type parameter from PutMappingRequest.buildFromSimplifiedDef() #50844
  • BREAKING: Remove the 'template' field in index templates. #49460

Changes in 7.7:

  • Encrypt generated key with AES #51019
  • Follow symlinks in Docker entrypoint #50927
  • Fail gracefully on invalid token strings #51014
  • Add analysis components and mapping types to the usage API. #51031
  • Add pipeline name to ingest metadata #50467
  • SQL: Extend the optimisations for equalities #50792

Changes in 7.6:

  • Emit warnings when index templates have multiple mappings #50982
  • Report progress of multiple plugin installs #51001
  • Allow installing multiple plugins as a transaction #50924
  • Remove custom metadata tool #50813
  • Adds support for geo-bounds filtering in geogrid aggregations #50002
  • Deprecate and remove camel-case nGram and edgeNGram tokenizers #50862
  • Add "did you mean" to ObjectParser #50938
  • Check for deprecations when analyzers are built #50908
  • Deprecating kibana_user and kibana_dashboard_only_user roles #46456
  • Support Client and RoleMapping in custom Realms #50534
  • Refresh cached phase policy definition if possible on new poli… #50820
  • Deprecate synced flush #50835
  • Move metadata storage to Lucene #50907
  • Add certutil http command #49827
  • Make .async-search-* a restricted namespace #50294
  • Add max_resource_units to enterprise license #50735
  • Increase Size and lower TTL on DLS BitSet Cache #50535
  • Validate field permissions when creating a role #50212
  • Fix memory leak in DLS bitset cache #50635
  • Fix format problem in composite of unmapped #50869
  • Ensure we emit a warning when using the deprecated 'template' field. #50831

Changes in 7.5:

  • Fix caching for PreConfiguredTokenFilter #50912
  • Block too many concurrent mapping updates #51038
  • Auto-format buildSrc #50786
  • Fix SLM check for restore in progress #50868
  • Fix upgrade of custom similarity #50851
  • Fix unintended debug logging in subclasses of TransportMasterNodeAction #50839

Changes in 6.8:

  • Improve warning value extraction performance in Response #50208

Changes in Elasticsearch SQL ODBC Driver

Changes in 6.8:

  • SQLColAttribute: add bwc attributes for 2.x #205

Changes in Rally

Changes in 1.4.1:

  • Fix W0631 #868
  • Fix W0143 #867
  • More resilient node shutdown #860

Changes in 1.4.0:

  • Fix W1201. #869
  • Detach Elasticsearch on startup #859
  • Pass plugin params for all plugins #861