This Week in Elasticsearch and Apache Lucene - 2019-10-04

Elasticsearch

Highlighted Discuss Post

We helped a user debug a performance degradation for queries with range clauses. Elasticsearch uses Lucene’s IndexOrDocValuesQuery to decide whether to execute a query using an index structure (points or terms) or doc values. A characteristic of the user’s data, large arrays on a field, was causing the planner to pick a slower execution strategy. Ignacio provided a workaround that worked for the user (not using doc values for the field) and opened a Lucene issue to address the underlying problem.

Too many translog files

We looked into an issue where a node in a cluster had run out of file descriptors because of a very large number of small translog files. The default retention policy keeps translog generations for up to 12 hours and up to 512MB to power sequence-number-based recoveries, regardless of the number of translog files. We introduced a new limit on the number of retained files so that we can better protect the cluster in these situations. Note that this patch is mainly for 6.x (and for 7.x clusters that have soft-deletes disabled): since version 7.4, the age- and size-based translog retention policies are no longer active due to the use of peer recovery retention leases.
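To make the interaction of the three limits concrete, here is a minimal Python sketch of a retention check that combines age, total size, and file count. It is not the Elasticsearch implementation: the names `TranslogFile` and `retained_generations` are hypothetical, and the file limit of 100 used in the usage example is an illustrative value, not a documented default.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TranslogFile:
    generation: int      # increases with every new translog file
    size_bytes: int
    age_seconds: float


def retained_generations(
    files: List[TranslogFile],
    max_age_seconds: float = 12 * 60 * 60,        # 12 hours, as in the text
    max_size_bytes: int = 512 * 1024 * 1024,      # 512MB, as in the text
    max_files: Optional[int] = None,              # hypothetical new limit
) -> List[TranslogFile]:
    """Walk from newest to oldest and stop at the first limit that is hit."""
    newest_first = sorted(files, key=lambda f: f.generation, reverse=True)
    kept: List[TranslogFile] = []
    kept_bytes = 0
    for f in newest_first:
        if kept:  # the newest generation is always kept
            too_old = f.age_seconds > max_age_seconds
            too_big = kept_bytes + f.size_bytes > max_size_bytes
            too_many = max_files is not None and len(kept) >= max_files
            if too_old or too_big or too_many:
                break
        kept.append(f)
        kept_bytes += f.size_bytes
    return kept


# 1000 tiny, recent files: age and size limits alone retain all of them.
files = [TranslogFile(g, 1024, 10.0 * (1000 - g)) for g in range(1, 1001)]
print(len(retained_generations(files)))
print(len(retained_generations(files, max_files=100)))
```

With only the age and size limits, every one of the small files stays open, which is the situation that exhausted file descriptors. The extra count limit caps the number of retained files no matter how small they are.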

API Keys

We started on a UI for managing API Keys. That highlighted a gap in the GET API (to list existing API keys), which didn’t support listing all keys (even if called by a superuser). We have raised a PR to support retrieving all API keys.

We originally had a requirement that API key names had to be globally unique. This had a few issues:

  1. Individual users could not see each other’s keys, yet we expected them to be able to pick a unique name each time they created one (which would encourage schemes like including your userid and the date in the name).
  2. The unique check introduced a noticeable performance penalty, particularly when creating API keys in bulk.
  3. We could never truly enforce uniqueness because the check-for-existing & create-new key could not be done within a single atomic transaction. 
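The third point is the classic check-then-act race. The following Python sketch replays one possible interleaving deterministically. The `KeyStore` class and its methods are hypothetical stand-ins for the backing index, not the real API key service.

```python
class KeyStore:
    """Toy store with separate 'check' and 'create' steps and no transaction."""

    def __init__(self):
        self.keys = []  # list of (key_id, name) tuples

    def name_exists(self, name):
        return any(existing == name for _, existing in self.keys)

    def create(self, key_id, name):
        self.keys.append((key_id, name))


store = KeyStore()

# Two concurrent requests both run the uniqueness check before either writes.
request_a_may_create = not store.name_exists("ingest-key")
request_b_may_create = not store.name_exists("ingest-key")

if request_a_may_create:
    store.create("id-a", "ingest-key")
if request_b_may_create:
    store.create("id-b", "ingest-key")

# Both checks passed, so the "unique" name now exists twice.
duplicates = [name for _, name in store.keys].count("ingest-key")
print(duplicates)
```

Unless the check and the write happen as one atomic step, some interleaving like this one is always possible, so the check adds cost without giving a real guarantee.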

We agreed to drop this uniqueness requirement from the API and have opened a PR for that. We also started a discussion about whether the upcoming UI should seek to have unique names within a single user’s set of API keys.

Removing types

We continue the work on the removal of types. We already handled several APIs in master (Get and MultiGet, Explain, ValidateQuery, ...) and will continue to pick up work from the meta issue. These are all reasonably small changes, but types are deeply woven into a lot of code in Elasticsearch, so the cleanup is not over.

Point in time searcher

We continued the cleanup of the search context in order to be able to rebuild it on every search phase. Today the search context is built once (on the first search phase) and then saved in a map in memory. This is fragile because the context is entirely mutable and exposes a lot of information that is usable in some phases but not all. For this reason, we opened a PR to make the context immutable where possible and to make it easy to build a new one. This cleanup will allow us to rebuild the context on each phase using the original shard request and passing the needed information directly from the coordinating node. We will now focus on splitting the search context into a reader context that should survive the search phase and a request context that will be re-created on each phase. This is required to implement the ability to reuse the same point in time searcher across multiple queries.

Circuit Breaker improvements

We opened a PR to wrap various usages of Lucene's PriorityQueue with a circuit breaker. This PQ allocates its internal array immediately on instantiation, and if instantiated with a large size (like when people use Integer.MAX_VALUE for agg sizes) it could easily OOM a node because we don't track it in a circuit breaker.
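The general pattern is to account for the estimated memory before the allocation happens, so that an oversized request fails fast instead of exhausting the heap. Here is a minimal Python sketch of that idea. `CircuitBreaker`, `CircuitBreakingError`, `new_bounded_queue`, and the 8-bytes-per-slot estimate are all assumptions made for illustration and do not mirror the Elasticsearch or Lucene classes.

```python
class CircuitBreakingError(MemoryError):
    """Raised when a reservation would exceed the breaker's limit."""


class CircuitBreaker:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate(self, num_bytes, label):
        # Reject the reservation before any memory is actually allocated.
        if self.used_bytes + num_bytes > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {self.used_bytes + num_bytes} bytes, "
                f"limit is {self.limit_bytes}"
            )
        self.used_bytes += num_bytes

    def release(self, num_bytes):
        self.used_bytes -= num_bytes


BYTES_PER_SLOT = 8  # assumed size of one reference in the backing array


def new_bounded_queue(max_size, breaker):
    """Reserve the backing array with the breaker first, then allocate it."""
    estimated_bytes = (max_size + 1) * BYTES_PER_SLOT
    breaker.add_estimate(estimated_bytes, "priority_queue")
    return [None] * (max_size + 1)  # eager allocation, like the real queue


breaker = CircuitBreaker(limit_bytes=1024 * 1024)
small = new_bounded_queue(100, breaker)      # fits within the limit
try:
    new_bounded_queue(2**31 - 1, breaker)    # rejected before allocating
except CircuitBreakingError as e:
    print("rejected:", e)
```

The important detail is the ordering: because the reservation happens before the array is created, the oversized request never allocates anything.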

Enrich

We added monitoring support for Enrich. There is now a collector for Enrich that will add Enrich-specific items to the .monitoring-es index. This information can help with troubleshooting and can be correlated with other recorded metrics. The information recorded mirrors the existing stats and can be found here.

The get policies API should be able to return more information than just configuration, such as the last policy execution and the policy status. This will make it easier for the UI to build a policy overview. To facilitate this, a config namespace was added to the get policies API.

Executing an enrich policy (i.e. copying data from the source index to the .enrich index) can take some time. The policy execute API should support an async model via a wait_for_completion parameter, which will allow the UI to execute a policy in the background without waiting for a response.

API reference reformatting

Work to standardize our API reference material is nearing completion. We finished the Ingest API docs (#47409) and the Profile API docs (#47211), and Deb is wrapping up the Document APIs (#45365). Going forward, we should use the API Ref template when we're adding new API docs.

Elasticsearch reference reorg

You might have noticed that things are moving around within the Elasticsearch reference and that more information is being pulled in from the Stack Overview. For example, we recently moved the following pieces.

Running SLM retention on-demand

The ES team added a new API endpoint this week to run Snapshot Lifecycle Management (SLM) retention immediately, without scheduling it. We submitted a PR that allows users to perform this action through the UI. This may be useful for a manual invocation or a one-off cleanup. It continues the work started a few weeks ago that added the ability to configure retention when creating an SLM policy.

Console improvements

We fixed a 2-year-old bug in Console in which GET requests with bodies were forwarded to Elasticsearch as POST requests. This resulted in requests creating documents instead of returning search results. The solution involved migrating the logic which mediates these requests off the Hapi library.

We also fixed some important UI issues in Console. These included a bug in Safari where Console did not render properly, and a customer-reported bug involving misaligned Console menu controls. Additionally, we continue to work on the Console refactor: this week we got the first iteration of Console autocomplete working without directly consuming the Ace editor.

Apache Lucene

Better heuristic for range queries

The most important optimization we have for range queries consists of deciding whether it's more efficient to run the range query on points or doc values. Points work better when the range is the most selective filter of the query, as they can quickly identify all matching documents, but doc values work better when there is another filter that is more selective. This logic relies on estimating the number of documents matched by a range query.

Until now we approximated the number of matched documents by the estimated number of matched values. However, this can trigger bad decisions if you have multi-valued fields, as the number of matching documents might be over-estimated. Ignacio improved the heuristic by deriving an expected number of matching documents based on the number of matching values as well as the total number of documents and values.
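As an illustration of how such an estimate can be derived, here is a Python sketch. It assumes that values are spread uniformly across documents, so each document holds about N/D values (N total values, D total documents). The probability that none of a document's values fall in the range is then roughly ((N - n) / N)^(N/D), where n is the number of matching values. This formula is our assumption for the example; the text above does not spell out the exact expression used in Lucene, and `estimate_matching_docs` is a hypothetical name.

```python
def estimate_matching_docs(matching_values, total_values, total_docs):
    """Estimate matching documents from matching values (uniform assumption)."""
    if total_docs <= 0 or total_values <= 0 or matching_values <= 0:
        return 0
    if matching_values >= total_values:
        return total_docs                    # everything matches
    if total_values == total_docs:
        return matching_values               # single-valued: one value per doc
    values_per_doc = total_values / total_docs
    # Probability that a document has no value inside the range.
    miss = ((total_values - matching_values) / total_values) ** values_per_doc
    estimate = int(total_docs * (1.0 - miss))
    return max(1, estimate)                  # at least one doc if a value matched


# 100 docs with 100 values each; the range matches 1000 values.
naive = 1000                                 # old approach: docs ~= values
improved = estimate_matching_docs(1000, 10_000, 100)
print(naive, improved)
```

In this example the naive estimate (1000) exceeds the number of documents in the index (100), which is the kind of over-estimation that can push the planner towards doc values even when points would be faster.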

Asynchronous caching

The query cache is getting the ability to cache filters on a different thread. While this doesn't help with query throughput since the same amount of work needs to be done in total, it might help with latency in case not all search threads are already busy.

Searching a single index in parallel 

The way that Lucene can distribute searching a single index across multiple threads in order to improve latency is receiving a lot of attention these days. One thing that makes it challenging is that recent optimizations like block-max WAND rely on information about previously collected documents in order to skip more documents. So if different subsets of the index get collected independently, each thread has to learn on its own the minimum score that a document must have to be competitive. Jim and Atri are looking into sharing information between threads in order to make things more efficient, which is already proving helpful.
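The core idea can be sketched in a few lines of Python. Each thread collects a local top-k for its slice and publishes the lowest score in its full heap as a shared lower bound; other threads can then skip documents that score below that bound. This is a simplified illustration with precomputed scores, not Lucene's collector code, and `SharedMinScore`, `collect_slice`, and `parallel_top_k` are hypothetical names.

```python
import heapq
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedMinScore:
    """Thread-safe, monotonically increasing minimum competitive score."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = float("-inf")

    def get(self):
        return self._value

    def raise_to(self, candidate):
        with self._lock:
            if candidate > self._value:
                self._value = candidate


def collect_slice(scored_docs, k, shared):
    """Collect the local top-k (k >= 1) of one slice, skipping hopeless docs."""
    heap = []      # min-heap of (score, doc_id)
    skipped = 0
    for doc_id, score in scored_docs:
        if score < shared.get():
            skipped += 1   # some thread already holds k better hits
            continue
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif (score, doc_id) > heap[0]:
            heapq.heapreplace(heap, (score, doc_id))
        if len(heap) == k:
            # k hits score at least heap[0][0], so that is a safe global bound.
            shared.raise_to(heap[0][0])
    return heap, skipped


def parallel_top_k(slices, k, workers=4):
    shared = SharedMinScore()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: collect_slice(s, k, shared), slices))
    merged = [hit for heap, _ in results for hit in heap]
    return heapq.nlargest(k, merged)


# 1000 docs with distinct synthetic scores, split into 4 slices.
docs = [(doc_id, float((doc_id * 37) % 1000)) for doc_id in range(1000)]
slices = [docs[i:i + 250] for i in range(0, 1000, 250)]
top = parallel_top_k(slices, k=10)
print(top[0])
```

Skipping is safe because the published bound always comes from a heap that already holds k hits scoring at least that much. In a real search the benefit comes from skipping whole blocks of documents without scoring them, which this sketch does not model.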

Other

  • We are fixing NRTSuggester so that it doesn't miss suggestions. This can happen today because the suggestion collector sometimes needs to ignore suggestions for deduplication purposes, but it doesn't let the suggester know about it, so the suggester might stop collecting before enough suggestions have been collected.
  • A simple short circuit helped improve exact seeks in the terms dictionary significantly.
  • We wonder whether we should encode the type of triangle when indexing shapes in order to speed up query evaluation.
  • We are looking at improving the splitting decisions of our KD tree by taking into account the fact that splitting on one dimension may affect the range of the values of other dimensions.

Changes in Elasticsearch

Changes in 7.5:

  • Allow setting validation against arbitrary types #47264
  • Remove groovy test code from buildSrc #47416
  • Add API to execute SLM retention on-demand #47405
  • Fix highlighting of overlapping terms in the unified highlighter #47227
  • Use optype CREATE for single auto-id index requests #47353
  • Clarify missing java error message #46160
  • Remove empty buildSrc subproject #47415
  • SQL: Implement DATE_PART function #47206
  • Cancel recoveries even if all shards assigned #46520
  • Provide better error when updating geo_shape field mapper settings #47281
  • Testclusters fix bwc #46740
  • Allow optype CREATE for append-only indexing operations #47169
  • Fail allocation of new primaries in empty cluster #43284

Changes in 7.4:

  • Fix dependency resolution conflict in SQL security integration tests #47531
  • Fix Rollover error when alias has closed indices #47148
  • Fix es.http.cname_in_publish_address Deprecation Logging #47451
  • Add client jar for mapper-extras #47430
  • Reset Token position on reuse in predicate_token_filter #47424
  • Fix alias field resolution in match query #47369
  • Fix AD realm additional metadata #47179
  • Omit writing index metadata for non-replicated closed indices on data-only node #47285

Changes in 6.8:

  • Limit number of retaining translog files for peer recovery #47414
  • Use VAULT_TOKEN environment variable if it exists #45525
  • SQL: wrong number of values for columns #42122
  • Add workaround for building docker on debian 8 #47106

Changes in Elasticsearch Hadoop Plugin

Changes in 7.4:

  • [DOCS] Add 7.4.0 release notes #1356

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.5:

  • Fix: don't check logging path if logging disabled #186
  • Support the new SHAPE data type #185
  • Provision and test against remote ES instance #184

Changes in Rally

Changes in 1.4.0:

  • Add support for OSNAME and ARCH variables in dist repo URLs. #781
  • BREAKING: Gather cluster-level metrics in driver #779
  • Expose race-id as command line parameter #778

Changes in Rally Tracks

  • Replace interval parameter with fixed_interval and calendar_interval #85