This Week in Elasticsearch and Apache Lucene - 2018-11-09

Elasticsearch

OpenID Connect Realm

We are beginning work on OpenID Connect, an authentication layer built on top of OAuth2, which is an authorization protocol. We have summarized the implementation plan in a meta issue.

Disabling hit counts by default

We will be disabling hit counts on search requests by default in 7.0. This means that we will not track the total hits on searches by default. Tracking total hits adds considerable overhead to search requests, and if we do not track total hits we can take advantage of some big improvements in Lucene where entire blocks of documents can be skipped if their results are not competitive. In some cases the search can return 1000s of times faster with these optimisations.
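To make the idea of skipping non-competitive blocks concrete, here is a minimal sketch. It is a simplified illustration and not Lucene's implementation: the function name, the block layout and the per-block maximum score are all assumptions made for the example.

```python
import heapq


def top_k_with_block_skipping(blocks, k):
    """Return the k highest scores and the number of documents scored.

    `blocks` is a list of (block_max_score, scores) pairs, where
    block_max_score is an upper bound on every score in that block.
    This layout is hypothetical and only serves the illustration.
    """
    heap = []    # min-heap holding the current top-k scores
    scored = 0   # documents that had to be looked at individually
    for block_max, scores in blocks:
        # Once the heap is full, a block whose best possible score cannot
        # beat the weakest competitive hit is skipped without being read.
        if len(heap) == k and block_max <= heap[0]:
            continue
        for score in scores:
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, score)
            elif score > heap[0]:
                heapq.heapreplace(heap, score)
    return sorted(heap, reverse=True), scored


blocks = [
    (9.0, [9.0, 8.0, 1.0]),
    (2.0, [2.0, 1.5, 0.5]),   # cannot beat the current top 2, so it is skipped
    (10.0, [3.0, 10.0]),
]
top, scored = top_k_with_block_skipping(blocks, k=2)
```

In this toy run only 5 of the 8 documents are scored. The documents in the skipped block are never visited, which is why an exact total hit count is not available when this kind of skipping is in use.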

There are a few important points to note here. Users can still opt to track the total hits and will have two options for this:

  1. Accurately track total hits - This option will provide an accurate count for the total hits, as we have today, but with the trade-off that we will not be able to use any of the above-mentioned optimisations, so the user will not benefit from the search performance boosts.
  2. Track total hits up to a user-defined value - With this option the user will specify the lower bound on the number of hits Elasticsearch should track in the search request. This means that if the user sets this value to 1000, Elasticsearch will accurately track the total hits (and therefore not use any of the performance optimisations) until it has seen 1000 hits (or until the search is complete if there are fewer than 1000 documents that match the query). Once Elasticsearch has seen 1000 hits it will switch to using the performance optimisations, and the total hits returned in the response will be a lower bound on the true total hits.

The exact request and response changes we intend to make are still being worked out, but we have raised a PR that shows one option for how this might work.
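The second option can be illustrated with a small sketch of counting up to a threshold. This is not the Elasticsearch request or response format, which is still undecided. The function name, the parameter name and the returned pair are hypothetical.

```python
def count_hits(matches, track_up_to):
    """Count matching documents exactly until `track_up_to` is reached.

    Returns (count, is_exact). When is_exact is False, the count is only
    a lower bound on the true number of matches.
    """
    count = 0
    for _ in matches:
        count += 1
        if count >= track_up_to:
            # Threshold reached: exact counting stops here. A real engine
            # would keep collecting top hits using the skipping
            # optimisations, which this sketch does not model.
            return count, False
    return count, True


few = count_hits(range(5), track_up_to=1000)      # fewer matches than the threshold
many = count_hits(range(5000), track_up_to=1000)  # more matches than the threshold
```

With 5 matching documents the count is exact. With 5000 matching documents the sketch stops at 1000 and reports that value as a lower bound.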

Closed & frozen indices

We have started to implement our plan for transitioning an open index to closed. The goals of this transition are to ensure that new indexing requests are rejected while the existing data is properly flushed to disk, so that reopening the shards of this replicated closed index will not require a recovery from translog. This API will also be crucial for frozen indices, as frozen indices will rely on proper closing of the index first. Moving an index to frozen will internally consist of three steps:

  1. Moving it to closed using the above functionality.
  2. Marking the index as frozen.
  3. Reopening it again, this time as frozen.
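The three steps can be sketched as a toy state model. Everything here is hypothetical, including the class, the method names and the engine labels. It only mirrors the order of operations described above and does not model what a frozen index does with writes.

```python
class ToyIndex:
    """Toy model of an index moving from open to closed to frozen."""

    def __init__(self):
        self.state = "open"
        self.frozen = False
        self.engine = "regular"   # hypothetical engine label
        self.unflushed = []       # operations not yet flushed to disk
        self.on_disk = []

    def index_doc(self, doc):
        if self.state == "closed":
            raise RuntimeError("closed index rejects indexing requests")
        self.unflushed.append(doc)

    def close(self):
        # Step 1: stop accepting writes and flush existing data, so that
        # reopening does not need to replay anything.
        self.on_disk.extend(self.unflushed)
        self.unflushed.clear()
        self.state = "closed"

    def mark_frozen(self):
        # Step 2: only a closed index can be marked as frozen.
        if self.state != "closed":
            raise RuntimeError("index must be closed first")
        self.frozen = True

    def reopen(self):
        # Step 3: the close/open cycle is where the engine gets swapped.
        self.engine = "frozen" if self.frozen else "regular"
        self.state = "open"


def freeze(index):
    index.close()
    index.mark_frozen()
    index.reopen()


idx = ToyIndex()
idx.index_doc({"id": 1})
freeze(idx)
```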

Going through the close/open cycle allows us to switch out the "engine" implementation, which is the part of the shard that is responsible for executing the actual searches. We have implemented steps 2 and 3 with an API to mark a closed index as frozen and by adding a frozen engine implementation. The frozen engine lazily opens and releases search resources by wrapping searchers in a LazyDirectoryReader, which also allows the underlying index readers to be released and reset after any search phase and before secondary search phases.
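The lazy open and release pattern can be sketched as follows. This is not the LazyDirectoryReader code: the class, its methods and the dictionary standing in for an index reader are assumptions made for illustration. In this sketch "reset" simply means the underlying reader is opened again on the next use.

```python
class LazyReader:
    """Opens an underlying reader on first use and allows releasing it."""

    def __init__(self, open_fn):
        self._open_fn = open_fn   # hypothetical factory for the real reader
        self._delegate = None     # nothing is held until a search needs it
        self.open_count = 0

    @property
    def is_open(self):
        return self._delegate is not None

    def _ensure_open(self):
        if self._delegate is None:
            self._delegate = self._open_fn()
            self.open_count += 1
        return self._delegate

    def search(self, term):
        # The reader is opened (or reopened) only when a search runs.
        return self._ensure_open().get(term, [])

    def release(self):
        # Drop the underlying reader so its resources can be reclaimed.
        self._delegate = None


reader = LazyReader(lambda: {"lucene": [1, 7], "elastic": [2]})
opened_before_search = reader.is_open
first = reader.search("lucene")    # opens the underlying reader
reader.release()                   # e.g. after a search phase completes
second = reader.search("elastic")  # reopens it for a later phase
```

The design choice this illustrates is that an idle frozen index holds no search resources, at the cost of reopening them whenever a search arrives.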

CCR and Networking

Cross-cluster replication will potentially ship large volumes of data all across the globe. To save on network bandwidth, we would like to allow compressing remote-cluster traffic independently of within-cluster traffic. We have introduced a namespaced setting for compression that allows users to configure compression on a per-remote-cluster basis. Tim has also changed the number of threads we use for network IO. The netty-based transport was using 2 * (number of cores) threads both per client and per server profile. This change modifies the netty transport to use 2 * (number of cores) threads for the entire transport, aligning it with how the nio transport already works.
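As a rough worked example of the thread count change, assume that "per client and per server profile" means one pool for the client side plus one pool for each server profile. That reading, the function names and the default of a single server profile are assumptions, not details taken from the change itself.

```python
def netty_threads_before(cores, server_profiles=1):
    # Assumed reading: one pool of 2 * cores threads for the client side
    # plus one pool of 2 * cores threads per server profile.
    return 2 * cores * (1 + server_profiles)


def netty_threads_after(cores):
    # A single pool of 2 * cores threads shared by the entire transport.
    return 2 * cores


before = netty_threads_before(cores=8)
after = netty_threads_after(cores=8)
```

Under these assumptions an 8-core node with one server profile goes from 32 network IO threads to 16.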

Changes

Changes in 5.6:

  • Fix DeleteRequest validation for nullable or empty id/type #35314

Changes in 6.5:

  • SQL: Handle null literal for AND and OR in WHERE #35236
  • Engine.newChangesSnapshot may cause unneeded refreshes if called concurrently #35169
  • watcher: Fix integration tests to ensure correct start/stop of Watcher #35271
  • Scripting: Add back lookup vars in score script #34833
  • Use soft-deleted docs to resolve strategy for engine operation #35230
  • Ignore date ranges containing now when pre-processing a percolator query #35160
  • SQL: Improve CircuitBreaker logic for SqlParser #35300
  • Register Azure max_retries setting #35286
  • SQL: Fix null handling for AND and OR in SELECT #35277
  • SQL: Introduce NotEquals node to simplify expressions #35234
  • Do not alloc full buffer for small change requests #35158
  • SQL: Fix null handling for IN painless script #35124

Changes in 6.6:

  • HLRC: Add InvalidateToken security API #35114
  • Preserve format when aggregation contains unmapped date fields #35254
  • Allow unmapped fields in composite aggregations #35331
  • Remove ALL shard check in CheckShrinkReadyStep #35346
  • [ILM] Check shard and relocation status in AllocationRoutedStep #35316
  • Add a frozen engine implementation #34357
  • Put a fake allocation id on allocate stale primary command #34140
  • Apply ignore_throttled also to concrete indices #35335
  • [CCR] Added HLRC support for pause follow API #35216
  • HLRC: add support for the clear realm cache API #35163
  • Fix UpdateRequest.fromXContent #35257
  • SQL: Upgrade jline to version 3.8.2 #35288
  • SQL: new SQL CLI logo #35261
  • ingest: dot_expander_processor prevent null add/append to source document #35106
  • Prevent throttled indices to be searched through wildcards by default #34354
  • SQL: Introduce Coalesce function #35253
  • [Scripting] Make Max Script Length Setting Dynamic #35184
  • Add dedicated step for checking shrink allocation status #35161
  • [Monitoring] Add cluster metadata to cluster_stats docs (#33860) #34023
  • Upgrade 6.x to lucene-7.6.0-snapshot-f9598f335b #35225
  • Remove Joda usage from ILM #35220
  • HLRC: Add ML API PUT filter #35175
  • Small corrections to HLRC doc for _termvectors #35221
  • Add document _count API support to Rest High Level Client. #34267
  • Adds Index lifecycle feature #35193

Changes in 7.0:

  • Make limit on number of expanded fields configurable #35284
  • BREAKING: Logfile auditing settings remove after deprecation #35205
  • Watcher: Ignore system locale/timezone in croneval CLI tool #33215
  • Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 #35202


Apache Lucene

Lucene 7.6

We are chasing the last blockers for the release.

Reduce reads on sparse doc values

In Lucene 7, sparse doc values use a block encoding that requires reading block headers when advancing to a random document. This issue tries to reduce these reads with an additional data structure that indexes the start of each block and allows jumping forward more efficiently. The patch is targeting 8.0 at the moment, but there are also discussions about backporting it to 7x.
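The difference between reading block headers one by one and jumping through an index of block starts can be sketched like this. It is a simplified model, not the Lucene patch: the tiny block size, the in-memory block layout and all function names are assumptions made for illustration.

```python
BLOCK_SIZE = 4  # hypothetical and tiny, to keep the example readable


def build_blocks(doc_ids):
    """Group sorted doc IDs into non-empty blocks: [(block_id, docs), ...]."""
    blocks = []
    for doc in doc_ids:
        block_id = doc // BLOCK_SIZE
        if not blocks or blocks[-1][0] != block_id:
            blocks.append((block_id, []))
        blocks[-1][1].append(doc)
    return blocks


def build_jump_table(blocks):
    """jump[b] is the position of the first stored block with id >= b."""
    jump, pos = [], 0
    last_id = blocks[-1][0] if blocks else -1
    for b in range(last_id + 1):
        while blocks[pos][0] < b:
            pos += 1
        jump.append(pos)
    return jump


def advance_scan(blocks, target):
    """Locate the target's block by reading headers sequentially."""
    target_block = target // BLOCK_SIZE
    reads = 0
    for pos, (block_id, _) in enumerate(blocks):
        reads += 1
        if block_id >= target_block:
            return pos, reads
    return len(blocks), reads


def advance_jump(blocks, jump, target):
    """Locate the target's block with a single jump table lookup."""
    target_block = target // BLOCK_SIZE
    if target_block >= len(jump):
        return len(blocks), 0
    return jump[target_block], 1  # only the destination header is read


blocks = build_blocks([0, 1, 9, 17, 18, 40, 41])
jump = build_jump_table(blocks)
scan_result = advance_scan(blocks, 40)
jump_result = advance_jump(blocks, jump, 40)
```

Both approaches land on the same block, but the sequential scan reads four headers in this example while the jump table needs one.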

Other:

We added support for a LatLonShapeQuery that allows querying LatLonShape fields by arbitrary lines.

We merged a user patch that fixes a performance bug in field infos merging.