November 9, 2018

This Week in Elasticsearch and Apache Lucene - 2018-11-09

By

Colin Goodheart-Smithe

Elasticsearch

OpenID Connect Realm

We are beginning work on OpenID Connect, an authentication layer built on top of OAuth2, which is an authorization protocol. We have summarized the implementation plan in a meta issue.

Disabling hit counts by default

We will be disabling hit counts on search requests by default in 7.0. This means that we will not track the total hits on searches unless asked to. Tracking total hits adds considerable overhead to search requests, and if we do not track total hits we can take advantage of some big improvements in Lucene where entire blocks of documents can be skipped if their results are not competitive. In some cases the search can return thousands of times faster with these optimisations.

There are a few important points to note here. Users can still opt to track the total hits and will have two options for this:

Accurately track total hits - This option will provide an accurate count of the total hits, as we have today, with the trade-off that we will not be able to use any of the above-mentioned optimisations, so the user will not benefit from the search performance boosts.
Track total hits up to a user-defined value - With this option the user will specify the number of hits up to which Elasticsearch should track accurately in the search request. This means that if the user sets this value to 1000, Elasticsearch will accurately track the total hits (and therefore not use any of the performance optimisations) until it has seen 1000 hits (or until the search is complete if fewer than 1000 documents match the query). Once Elasticsearch has seen 1000 hits it will switch to using the performance optimisations, and the total hits returned in the response will be a lower bound on the true total hits.
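The counting behaviour of the two options can be illustrated with a small Python sketch. This is not Elasticsearch code, and since the request and response format is still being worked out, the names used here (count_hits, TotalHits, value, relation, "eq", "gte") are purely hypothetical. It only shows how a threshold turns an exact count into a lower bound.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TotalHits:
    value: int      # number of hits counted
    relation: str   # "eq" = exact count, "gte" = lower bound (hypothetical labels)


def count_hits(matches: Iterable[int], threshold: Optional[int]) -> TotalHits:
    """Count matching documents, exactly or only up to `threshold`.

    threshold=None models the "accurately track total hits" option.
    """
    count = 0
    for _ in matches:
        count += 1
        if threshold is not None and count >= threshold:
            # From here on a real engine could skip non-competitive blocks,
            # so the count stops being exact and becomes a lower bound.
            return TotalHits(count, "gte")
    # The search completed before reaching the threshold: the count is exact.
    return TotalHits(count, "eq")
```

With a threshold of 1000, a query matching 500 documents reports an exact total of 500, while a query matching 5000 documents reports 1000 as a lower bound.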

The exact request and response changes we intend to make are still being worked out but we have raised a PR that shows one option of how this might work.

Closed & frozen indices

We have started to implement our plan for transitioning an open index to closed. The goals of this transition are to ensure that new indexing requests are rejected while the existing data is properly flushed to disk, so that reopening the shards of this replicated closed index will not require a recovery from the translog. This API will also be crucial for frozen indices, as freezing an index relies on the index being properly closed first. Moving an index to frozen will internally consist of three steps:

1. Moving it to closed using the above functionality.
2. Marking the index as frozen.
3. Reopening it again, this time as frozen.

Going through the close/open cycle allows us to switch out the “engine” implementation, which is the part of the shard that is responsible for executing the actual searches. We have implemented steps 2 and 3 with an API to mark a closed index as frozen and by adding a frozen engine implementation. The frozen engine opens and releases search resources lazily by wrapping searchers in a LazyDirectoryReader, which also allows the underlying index readers to be released after any search phase and reset before secondary search phases.
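The lazy open-and-release idea can be sketched in a few lines of Python. This is an illustration only and not the actual LazyDirectoryReader: the class names LazyReader and InMemoryReader and the methods reset, release and search are hypothetical stand-ins.

```python
class InMemoryReader:
    """Hypothetical stand-in for an expensive index reader."""

    def __init__(self, docs):
        self.docs = docs
        self.closed = False

    def search(self, term):
        # Return the positions of the documents that contain the term.
        return [i for i, doc in enumerate(self.docs) if term in doc]

    def close(self):
        self.closed = True


class LazyReader:
    """Holds the underlying reader only between reset() and release()."""

    def __init__(self, open_fn):
        self._open_fn = open_fn   # callable that opens the real reader
        self._delegate = None     # no resources are held until reset()
        self.open_count = 0       # how often the real reader was opened

    def reset(self):
        # Called before a search phase: (re)open the reader if needed.
        if self._delegate is None:
            self._delegate = self._open_fn()
            self.open_count += 1

    def release(self):
        # Called after a search phase: free the underlying resources.
        if self._delegate is not None:
            self._delegate.close()
            self._delegate = None

    def search(self, term):
        if self._delegate is None:
            raise RuntimeError("reader is released; call reset() first")
        return self._delegate.search(term)
```

In this sketch each search phase calls reset() before searching and release() afterwards, so the wrapped reader only occupies resources while a phase is actually running.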

CCR and Networking

Cross-cluster replication will potentially ship large volumes of data all across the globe. To save on network bandwidth, we would like to allow compressing remote-cluster traffic independently of within-cluster traffic. We have introduced a namespaced setting for compression that allows users to configure compression on a per-remote-cluster basis. Tim has also changed the number of threads we use for network IO. The netty-based transport was using 2 * (number of cores) threads both per client and per server profile. This change makes the netty transport use 2 * (number of cores) threads for the entire transport, aligning it with how the nio transport already works.
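As a rough illustration of how a namespaced, per-remote-cluster setting can override a transport-wide default, consider the Python sketch below. The setting keys used here are assumptions chosen for the example and are not taken from this post; the function compress_for_remote is likewise hypothetical.

```python
from typing import Mapping


def compress_for_remote(settings: Mapping[str, str], alias: str) -> bool:
    """Decide whether traffic to the remote cluster `alias` is compressed.

    Both key names below are assumptions made for this illustration.
    """
    namespaced_key = f"cluster.remote.{alias}.transport.compress"
    global_key = "transport.compress"
    # The per-cluster value wins; otherwise fall back to the global default.
    raw = settings.get(namespaced_key, settings.get(global_key, "false"))
    return raw.strip().lower() == "true"
```

With such a lookup, traffic to a distant cluster can be compressed while traffic within the local cluster, or to a nearby remote cluster, stays uncompressed.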

Changes

Changes in 5.6:

Fix DeleteRequest validation for nullable or empty id/type #35314

Changes in 6.5:

SQL: Handle null literal for AND and OR in WHERE #35236
Engine.newChangesSnapshot may cause unneeded refreshes if called concurrently #35169
watcher: Fix integration tests to ensure correct start/stop of Watcher #35271
Scripting: Add back lookup vars in score script #34833
Use soft-deleted docs to resolve strategy for engine operation #35230
Ignore date ranges containing now when pre-processing a percolator query #35160
SQL: Improve CircuitBreaker logic for SqlParser #35300
Register Azure max_retries setting #35286
SQL: Fix null handling for AND and OR in SELECT #35277
SQL: Introduce NotEquals node to simplify expressions #35234
Do not alloc full buffer for small change requests #35158
SQL: Fix null handling for IN painless script #35124

Changes in 6.6:

HLRC: Add InvalidateToken security API #35114
Preserve format when aggregation contains unmapped date fields #35254
Allow unmapped fields in composite aggregations #35331
Remove ALL shard check in CheckShrinkReadyStep #35346
[ILM] Check shard and relocation status in AllocationRoutedStep #35316
Add a frozen engine implementation #34357
Put a fake allocation id on allocate stale primary command #34140
Apply ignore_throttled also to concrete indices #35335
[CCR] Added HLRC support for pause follow API #35216
HLRC: add support for the clear realm cache API #35163
Fix UpdateRequest.fromXContent #35257
SQL: Upgrade jline to version 3.8.2 #35288
SQL: new SQL CLI logo #35261
ingest: dot_expander_processor prevent null add/append to source document #35106
Prevent throttled indices to be searched through wildcards by default #34354
SQL: Introduce Coalesce function #35253
[Scripting] Make Max Script Length Setting Dynamic #35184
Add dedicated step for checking shrink allocation status #35161
[Monitoring] Add cluster metadata to cluster_stats docs (#33860) #34023
Upgrade 6.x to lucene-7.6.0-snapshot-f9598f335b #35225
Remove Joda usage from ILM #35220
HLRC: Add ML API PUT filter #35175
Small corrections to HLRC doc for _termvectors #35221
Add document _count API support to Rest High Level Client. #34267
Adds Index lifecycle feature #35193

Changes in 7.0:

Make limit on number of expanded fields configurable #35284
BREAKING: Logfile auditing settings remove after deprecation #35205
Watcher: Ignore system locale/timezone in croneval CLI tool #33215
Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 #35202

Apache Lucene

Lucene 7.6

We are chasing the last blockers for the release.

Reduce reads on sparse doc values

In Lucene 7, sparse doc values use a block encoding that requires reading block headers when advancing to a random document. This issue tries to reduce these reads with an additional data structure that indexes the start of each block and allows jumping forward more efficiently. The patch is targeting 8.0 at the moment, but there are also discussions about backporting it to 7.x.
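The effect of such a jump structure can be illustrated with a simplified Python model. This is not Lucene's on-disk format: the class SparseBlocks, the block size, and the header-read counter are assumptions made for the sketch. It compares scanning block headers one by one with looking up the block position in a precomputed table.

```python
class SparseBlocks:
    """Toy model: doc IDs grouped into fixed-range blocks, empty blocks omitted."""

    def __init__(self, doc_ids, block_bits=16):
        self.block_bits = block_bits
        self.blocks = []  # list of (block_id, [doc_ids in that block])
        for doc in sorted(doc_ids):
            block_id = doc >> block_bits
            if not self.blocks or self.blocks[-1][0] != block_id:
                self.blocks.append((block_id, []))
            self.blocks[-1][1].append(doc)
        # Jump table: for every block ID up to the last stored one, the
        # position of the first stored block whose ID is >= that block ID.
        self.jump = []
        pos = 0
        last_id = self.blocks[-1][0] if self.blocks else -1
        for block_id in range(last_id + 1):
            while self.blocks[pos][0] < block_id:
                pos += 1
            self.jump.append(pos)
        self.header_reads = 0

    def _read_header(self, pos):
        self.header_reads += 1  # stands in for a read of one block header
        return self.blocks[pos][0]

    def find_block_linear(self, target):
        """Scan headers from the start until the target's block is reached."""
        target_block = target >> self.block_bits
        for pos in range(len(self.blocks)):
            if self._read_header(pos) >= target_block:
                return pos
        return None

    def find_block_jump(self, target):
        """Use the jump table to land on the right block with one header read."""
        target_block = target >> self.block_bits
        if target_block >= len(self.jump):
            return None
        pos = self.jump[target_block]
        self._read_header(pos)
        return pos
```

In this model the linear scan reads one header per stored block it passes, while the table lookup reads a single header regardless of how far ahead the target document lies.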

Other:

We added support for a LatLonShapeQuery that allows querying LatLonShape fields by arbitrary lines.

We merged a user patch that fixes a performance bug in field infos merging.
