This Week in Elasticsearch and Apache Lucene - 2019-08-02

Elasticsearch

Automatic Search Cancellation

We continued to work on cancelling searches when the connection that made the request gets closed. We are now working on hardening the tests but the main functionality is ready. The Kibana team has been working on making Kibana close the connection when the user navigates away from the page.

Painless Date Examples

We added some date usage examples for Painless. We also fixed a casting issue related to the temporary compatibility class JodaCompatibleZonedDateTime (JCZDT), which we added for the transition from Joda-Time to java.time. With the new automatic casting between JCZDT and ZonedDateTime (ZDT), our example documentation can now refer to ZonedDateTime even though the concrete class is JCZDT.

Geo

With the upcoming release of cartesian shapes, the pieces are in place to begin adding support for geospatial projections. This will enable the Elasticsearch Geo community to index geospatial geometries in their native projection, without the loss of precision that results from having to reproject back to WGS84 lat/lon to index in Lucene. Oil & Gas users, for example, have a lot of data in NAD27 and NAD83 based projections, the academic community has a lot of data in State Plane projections, and NASA and other government agencies use a large variety of spatial projections (Bacon globular being one of my particular favorites). Work has begun to explore the performance tradeoffs of using either Apache SIS or Proj4J (both Apache 2 licensed) to handle the reprojection logic vs. rolling our own projection library (something we could do but probably don't need or want to do).

We merged in the new ShapeFieldMapper, which means indexing arbitrary 2-dimensional geometry has landed! Elasticsearch users can now index arbitrary 2-dimensional (non-geospatial) geometries, making it possible to map out virtual worlds, sporting venues, theme parks, and CAD diagrams. Thanks to the awesome team effort to refactor the existing ShapeBuilders, FieldMappers, and QueryBuilders, this was relatively straightforward to integrate and will be easier to manage. The last piece of the puzzle is to merge the ShapeQueryBuilder PR, which adds the ability to query indexed shapes using the familiar GeoJSON and WKT formats used in geo_shape queries.

Master Resiliency / Slow Cluster State Updates

Based on a number of recent support cases that involved slow cluster state updates, we significantly improved our logging in this area. The added detail makes it possible to identify slow nodes during cluster state publication, as well as slow components within those nodes during cluster state application, without having to raise logging levels.

Peer recovery retention leases

We reinstated the recovery of history from Lucene during peer recovery now that we have all the supporting infrastructure to retain that history using retention leases. Peer recovery will decide whether to perform a file-based recovery or not according to the existence of an appropriate lease (on indices with soft-deletes enabled, which is the default since 7.0). As leases expire after 12 hours, and persist across restarts, we expect clusters to be able to perform operation-based recoveries for replicas that were active at some point in the last 12 hours.

There are still some outstanding tasks in this project. Most notably we now only use the translog for recovering locally (on indices with soft-deletes enabled) and can therefore discard old translog generations much more enthusiastically than we currently do.
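The recovery decision described in this section can be sketched roughly as follows. This is a simplified illustration for indices with soft-deletes enabled, not the actual Elasticsearch code: the RetentionLease class, its fields, and the choose_recovery_mode function are hypothetical names, and only the 12-hour expiry and the "lease exists or not" rule come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Leases expire after 12 hours (figure taken from the text above).
LEASE_EXPIRY_SECONDS = 12 * 60 * 60


@dataclass(frozen=True)
class RetentionLease:
    """Hypothetical model of a peer recovery retention lease."""
    node_id: str
    renewed_at: float  # time of the last renewal, in seconds since the epoch


def choose_recovery_mode(lease: Optional[RetentionLease], now: float) -> str:
    """Pick a recovery mode for a replica of a soft-deletes-enabled index.

    Assumption: an unexpired lease means the history needed by the replica
    is still retained in Lucene, so operations can be replayed.
    """
    if lease is None:
        return "file-based"
    if now - lease.renewed_at > LEASE_EXPIRY_SECONDS:
        return "file-based"
    return "operation-based"
```

In this model, a replica that was active one hour ago gets an operation-based recovery, while one that has been gone for longer than the lease expiry falls back to copying files.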

Cluster Privilege Refactoring

For a long time we've had a plan to make it possible to be more fine-grained with cluster level privileges. For example, the manage privilege allows users to update cluster settings, but that means they are allowed to update all settings - there is no way to restrict a user to specific settings.

Last year, when introducing application privileges, we wanted the kibana reserved user to be able to manage its own privileges, but not the privileges of other applications, so we started down the path of having cluster privileges that could take the request object into account rather than simply authorising on an action name.

With API Keys we wanted to do something similar where any user could be assigned a privilege that allowed them to manage their own API keys, but no one else's. It turns out that the model we had put in place last year didn't quite fit in with what we wanted to do here, so we have been working to refactor the cluster privilege model to support this use case.

We're hopeful that this will make it easier to implement some of the finer grained cluster privileges that customers have been asking for.
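To illustrate the difference between authorising on an action name alone and taking the request into account, here is a minimal sketch. Everything in it is hypothetical: the Request class, the action strings such as "api_key/invalidate", and both privilege functions are made up for illustration and do not reflect the actual security code or real action names.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable


@dataclass(frozen=True)
class Request:
    """Hypothetical request: an action name plus the owner of the targeted API keys."""
    action: str
    owner: str


def action_only_privilege(pattern: str) -> Callable[[str, Request], bool]:
    """Older style: the decision depends on the action name only."""
    def check(user: str, request: Request) -> bool:
        return fnmatchcase(request.action, pattern)
    return check


def manage_own_api_keys(user: str, request: Request) -> bool:
    """Request-aware style: the action must match AND the keys must belong to the caller."""
    return fnmatchcase(request.action, "api_key/*") and request.owner == user
```

With the action-only check, a user who may invalidate API keys may invalidate anyone's keys. The request-aware check lets "alice" invalidate her own keys while rejecting a request that targets keys owned by "bob".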

Proxy PKI in Kibana

We merged most of the necessary PRs to the proxied-pki feature branch, which allows the Kibana team to start working on their changes to support PKI authentication in Kibana.

Index Lifecycle Management

We added a couple of helpful flags for the ILM explain response output: 'only_managed' and 'only_errors'. These limit the response either to indices that currently have a policy, or to indices that have a policy and are currently in the ERROR step. This should help cut down the size of the response that a user has to read when trying to figure out which of their indices is having ILM troubles (https://github.com/elastic/elasticsearch/pull/44777).
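The effect of the two flags can be modelled with a small client-side sketch. The real filtering happens inside Elasticsearch; the filter_explain function and the sample data below are hypothetical and only mirror the semantics described above, assuming each index entry carries a "managed" flag and a "step" name.

```python
from typing import Any, Dict

# Hypothetical, heavily trimmed explain output keyed by index name.
sample = {
    "logs-000001": {"managed": True, "policy": "logs", "step": "ERROR"},
    "logs-000002": {"managed": True, "policy": "logs", "step": "check-rollover-ready"},
    "scratch": {"managed": False},
}


def filter_explain(
    indices: Dict[str, Dict[str, Any]],
    only_managed: bool = False,
    only_errors: bool = False,
) -> Dict[str, Dict[str, Any]]:
    """Keep only the index entries that the given flags would let through."""
    result = {}
    for name, info in indices.items():
        managed = bool(info.get("managed", False))
        in_error = managed and info.get("step") == "ERROR"
        if only_errors and not in_error:
            continue  # only indices with a policy that sit in the ERROR step
        if only_managed and not managed:
            continue  # only indices that currently have a policy
        result[name] = info
    return result
```

On the sample data, only_managed drops the unmanaged "scratch" index, and only_errors leaves just "logs-000001".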

The team had an interesting discussion during our weekly stand-up about supporting indices that use ILM policies with rollover and are also rolled over manually. If a user currently does this, ILM will enter the ERROR state when it goes to automatically roll over the index. We've seen a couple of internal folks hit this as well as customers, so we brainstormed a solution that would allow us to skip the ILM rollover when we detect it has already been done manually outside of ILM (https://github.com/elastic/elasticsearch/issues/44175). This work aims to lower the chance that a user enters the ERROR state, which we would always prefer to avoid.

Snapshot Lifecycle Management

On the SLM side, work continues towards implementing retention for snapshots. The initial snapshot retention predicates have been implemented (https://github.com/elastic/elasticsearch/pull/44926), which means that we now support expiration based on the age of a snapshot, a minimum number of snapshots, a maximum number of snapshots, or any combination of the three. Additionally, performing the actual "delete" of snapshots may take a long time, so we opened a PR to time-bound how long SLM will spend deleting snapshots during retention (https://github.com/elastic/elasticsearch/pull/45065). This is a user-configurable setting and will initially be capped at an hour.
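One way the three retention predicates could combine is sketched below. This is an assumption-laden illustration, not the SLM implementation: the Snapshot class, the snapshots_to_delete function, the parameter names, and the precedence chosen here (the minimum count wins over everything, then the maximum count, then age) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record: a name and a start time in seconds."""
    name: str
    start_time: float


def snapshots_to_delete(
    snapshots: Sequence[Snapshot],
    now: float,
    expire_after: Optional[float] = None,
    min_count: Optional[int] = None,
    max_count: Optional[int] = None,
) -> List[str]:
    """Return the names of snapshots that retention would delete, newest first."""
    newest_first = sorted(snapshots, key=lambda s: s.start_time, reverse=True)
    doomed = []
    for rank, snap in enumerate(newest_first):
        if min_count is not None and rank < min_count:
            continue  # always keep the newest min_count snapshots (assumption)
        if max_count is not None and rank >= max_count:
            doomed.append(snap.name)  # beyond the maximum number to keep
        elif expire_after is not None and now - snap.start_time > expire_after:
            doomed.append(snap.name)  # older than the allowed age
    return doomed
```

For example, with five snapshots taken at times 0 to 40, now=50, expire_after=25 and min_count=3, the three oldest snapshots are past the age limit, but the minimum count protects the newest three, so only the two oldest are selected for deletion.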

Lucene

Matches API

The matches API makes it possible to retrieve the positions where a query matches, on a per-document basis.

It is implemented in two phases: a match phase, where the full query is evaluated to check whether it matches the document, and a second phase that creates an iterator over all the positions that match inside the document for the requested fields. When checking whether a multi-term query, such as an AutomatonQuery or TermInSetQuery, matches, we used to find all matching term iterators up-front in order to return a disjunction over all of them. This can be inefficient if we're only interested in finding out whether anything matched, and are iterating over a different field to retrieve positions.
We improved this by returning immediately when the first matching term is found, and only collecting other matching terms lazily (when we start iterating the positions for that field).
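The lazy approach can be sketched in a few lines. This is a toy model rather than Lucene code: match_multi_term, LazyMatches, and the positions_for callback are hypothetical names, and term positions are plain lists of integers instead of Lucene iterators.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyMatches:
    """Knows that at least one term matched; other terms are examined on demand."""

    def __init__(
        self,
        first_positions: List[int],
        remaining_terms: Iterator[str],
        positions_for: Callable[[str], List[int]],
    ) -> None:
        self._first = list(first_positions)
        self._remaining = remaining_terms
        self._positions_for = positions_for
        self._all: Optional[List[int]] = None  # cache, filled on first use

    def positions(self) -> List[int]:
        """Collect the remaining matching terms only when positions are requested."""
        if self._all is None:
            merged = list(self._first)
            for term in self._remaining:
                merged.extend(self._positions_for(term))
            self._all = sorted(merged)
        return self._all


def match_multi_term(
    terms: Iterable[str], positions_for: Callable[[str], List[int]]
) -> Optional[LazyMatches]:
    """Return as soon as the first matching term is found, or None if nothing matches."""
    it = iter(terms)
    for term in it:
        positions = positions_for(term)
        if positions:
            return LazyMatches(positions, it, positions_for)
    return None
```

A caller that only needs to know whether the document matched checks for a non-None result and never pays for the remaining term lookups. Those lookups happen only if positions() is called.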

Boolean Query

We already support early termination of constant score queries (queries that give the same score to all documents) when the hit count is not requested, but some cases were missing. For instance, a boolean query that is composed only of filters will assign a score of 0 to all documents, but support for early termination in this case was not implemented. We worked on a patch to handle this case efficiently without the need to wrap the boolean query in another high level query.
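The idea can be illustrated with a toy collector. This is only a sketch under simplifying assumptions and not the Lucene patch: BoolQuery, scores_are_constant, and collect_top_docs are hypothetical names, clauses are plain strings, and scoring is left out entirely, so the collector is only meaningful for the constant-score case.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class BoolQuery:
    """Hypothetical boolean query holding clause descriptions as plain strings."""
    must: List[str] = field(default_factory=list)
    should: List[str] = field(default_factory=list)
    filter: List[str] = field(default_factory=list)
    must_not: List[str] = field(default_factory=list)


def scores_are_constant(query: BoolQuery) -> bool:
    """A query made only of filters gives every match the same score (0)."""
    return bool(query.filter) and not query.must and not query.should


def collect_top_docs(
    doc_ids: Iterable[int],
    num_hits: int,
    constant_score: bool,
    track_total_hits: bool,
) -> Tuple[List[int], int]:
    """Return (top doc ids, number of docs visited); scoring is omitted in this sketch."""
    top: List[int] = []
    visited = 0
    for doc_id in doc_ids:
        visited += 1
        if len(top) < num_hits:
            top.append(doc_id)
        # With equal scores the first num_hits docs are already the final top hits,
        # so collection can stop unless an exact hit count was requested.
        if constant_score and not track_total_hits and len(top) >= num_hits:
            break
    return top, visited
```

For a filter-only query over 1000 matching documents with 10 requested hits, this collector visits 10 documents when the hit count is not tracked and all 1000 when it is.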

Other

  • The minimal French stemmer now avoids stemming numbers.
  • The tests of the Korean Dictionary builder have been merged to the main src directory and are now part of the general test suite (ant test).
  • IndexSearcher.termStatistics should not require TermStates but only docFreq and totalTermFreq.

Changes

Changes in Elasticsearch

Changes in 8.0:

  • Add missing dependency on plugin bundle task #45103
  • Rename HLRC 'indexlifecycle' components to 'ilm' #44982
  • BREAKING: Remove client feature tracking #44929
  • RestController should not consume request content #44902

Changes in 7.4:

  • Remove unnecessary plugin application and project configuration #45100
  • Updated slm API spec parameters and URL #44797
  • Use IndicesModule named writables in elasticsearch-shard tool #45036
  • More logging for slow cluster state application #45007
  • Remove task null check in TransportAction #45014
  • Stop Recreating Wrapped Handlers in RestController #44964
  • Improve errors when TLS files cannot be read #44787
  • [GEO] New ShapeFieldMapper for indexing cartesian geometries #44980
  • Adds usage stats for vectors: #44512
  • [ML][Data Frames] unify validation exceptions between PUT/_preview #44983
  • [ML][Data Frame] add support for bucket_selector #44718
  • Remove leniency during replay translog in peer recovery #44989
  • Explicitly fail if a realm only exists in keystore #44471
  • Fix JodaCompatibleZonedDateTime casts in Painless #44874
  • Remove leniency in reset engine from translog #44711
  • Geo: fix geo query decomposition #44924
  • TaskListener#onFailure to accept Exception instead of Throwable #44946

Changes in 7.3:

  • Do not use scroll when finding duplicate API key #45026
  • Sparse role queries can throw an NPE #45053
  • bug fix about elasticsearch.common.settings.Settings.processSetting #44047
  • Fix version logic after 7.3 release (BWC) #45077

Changes in 6.8:

  • Fix early termination of aggregators that run with breadth-first mode #44963