This Week in Elasticsearch and Apache Lucene - 2018-11-16

Elasticsearch

Frozen/Closed Indices

We’ve added an optimization for frozen shards that allows the “can_match” phase to run efficiently without opening the underlying index reader. This is particularly useful for time-based indices, where older indices are typically frozen. Queries with date filters can then skip, in the pre-filter phase, many of the frozen shards that can't match the query. We also added documentation for using frozen indices as well as the relevant _freeze and _unfreeze APIs. If you want to learn more about using frozen indices, the documentation is worth a read.
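As a rough illustration of the idea (not the actual Elasticsearch implementation), the pre-filter can be pictured as an overlap check between the query's date range and a per-shard minimum and maximum timestamp that is available without opening the shard. The `ShardTimeRange` class, its fields, and the shard names below are hypothetical, and the timestamps are small integers for readability.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShardTimeRange:
    """Hypothetical per-shard metadata: smallest and largest timestamp in the shard."""
    shard_id: str
    min_ts: int
    max_ts: int


def can_match(shard: ShardTimeRange, query_from: int, query_to: int) -> bool:
    """Return False only when the shard's range cannot overlap the query range."""
    return shard.max_ts >= query_from and shard.min_ts <= query_to


def prefilter(shards: List[ShardTimeRange], query_from: int, query_to: int) -> List[str]:
    """Keep only the shards that might contain matching documents."""
    return [s.shard_id for s in shards if can_match(s, query_from, query_to)]


# Three time-based shards covering consecutive, non-overlapping ranges.
shards = [
    ShardTimeRange("logs-old", 0, 99),
    ShardTimeRange("logs-mid", 100, 199),
    ShardTimeRange("logs-new", 200, 299),
]

# A date filter covering 150..250 can skip "logs-old" without touching it.
matching = prefilter(shards, 150, 250)
print(matching)
```

Only the shards returned by `prefilter` would need to be opened and searched in this sketch; the rest are skipped using metadata alone.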

We’re continuing to work on the close index API, making core changes to our replication and shard-level permit system to allow a clean transition from an open to a closed index.

Geo in SQL

We’ve been working on changes to make geo in Elasticsearch more robust and have now laid enough groundwork to expose the first GeoSQL functions in the SQL plugin. The first function is ST_WKTToSQL, which converts geo-shapes represented in WKT (Well-Known Text) to GeoShape objects that Elasticsearch understands.

Pre-hashed Password in PUT Users API

We added formal support for the password_hash field in the Put Users API. This came about because of a request for a bulk API for creating users; we discussed the idea and did not feel it was the right direction. However, the request identified a valid concern about the throughput of the users API. We ran a cursory performance test with a naive, single-threaded Python script that creates users. With default options, we were able to create about 8.5 users per second, whereas with client-side hashing and refresh=false, we were able to create about 37 users per second. For an administrator with 5000 users to create, this would cut the overall time from about 10 minutes to about 2.5 minutes.
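The time estimate follows directly from the measured rates. The short calculation below uses only the rounded figures quoted above (8.5 and 37 users per second), so its results are approximate; the function name is ours.

```python
def minutes_to_create(user_count: int, users_per_second: float) -> float:
    """Total time in minutes to create user_count users at a steady rate."""
    return user_count / users_per_second / 60


USERS = 5000

# Default options: server-side hashing, default refresh behaviour.
default_minutes = minutes_to_create(USERS, 8.5)

# Client-side hashing with refresh=false.
prehashed_minutes = minutes_to_create(USERS, 37)

# Ratio of the two rates, i.e. how many times faster the second setup is.
speedup = 37 / 8.5

print(f"default:   {default_minutes:.1f} min")
print(f"prehashed: {prehashed_minutes:.1f} min")
print(f"speedup:   {speedup:.1f}x")
```

With the rounded rates this works out to just under 10 minutes versus a little over 2 minutes, roughly a fourfold speedup.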

Cross-Cluster Replication

We published the CCR getting started guide and overview docs for the CCR beta. With 6.5 out the door, we are now focused on driving CCR to a successful GA. We are moving the auto-follow logic (based on the auto-follow patterns) to long-polling. The first step of this effort is to add a new option to the cluster state API to wait, up to a timeout, for a given metadata version.
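The "wait, up to a timeout, for a given metadata version" pattern can be sketched in-process with a condition variable. This is a toy model of long-polling, not the cluster state API itself; the `VersionedMetadata` class and its method names are hypothetical.

```python
import threading
from typing import Optional


class VersionedMetadata:
    """Toy stand-in for cluster metadata that carries a monotonically increasing version."""

    def __init__(self) -> None:
        self._version = 0
        self._cond = threading.Condition()

    def bump(self) -> int:
        """Simulate a metadata change and wake up any waiting callers."""
        with self._cond:
            self._version += 1
            self._cond.notify_all()
            return self._version

    def wait_for_version(self, min_version: int, timeout: float) -> Optional[int]:
        """Block until the version reaches min_version, or return None on timeout."""
        with self._cond:
            reached = self._cond.wait_for(
                lambda: self._version >= min_version, timeout=timeout
            )
            return self._version if reached else None


meta = VersionedMetadata()

# Nothing changes within the timeout, so the caller gets None and can retry.
timed_out = meta.wait_for_version(1, timeout=0.05)

# A change arrives while the caller is waiting, so the call returns early.
timer = threading.Timer(0.05, meta.bump)
timer.start()
seen = meta.wait_for_version(1, timeout=2.0)
timer.join()

print(timed_out, seen)
```

Compared with polling on a fixed interval, the waiter here is woken as soon as the version changes and otherwise stays idle until the timeout.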

We’ve implemented a fix to address indexing slowdowns we observed in update- or refresh-heavy use cases and will be benchmarking the effects of this fix.

Tasks API + Security

Shortly after the 6.5.0 release, we found a bug: the permissions granted to the kibana_system role prevent full use of the tasks API. Ultimately, this boils down to the fact that we leak the implementation detail that tasks are stored in an index, and the kibana_system role doesn't grant access to this index. For 6.5.1, we added these permissions as a short-term fix. As a temporary workaround, users may run Kibana as a user with the kibana_system role and a role that grants "create_index", "read", and "create" privileges on the .tasks index.
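A minimal sketch of what the workaround role might look like as a role definition body. The index name and the three privileges come from the text above; the role name `kibana_tasks_workaround` is a hypothetical choice, and the body shape assumes the usual `indices` / `names` / `privileges` layout of Elasticsearch role definitions.

```python
import json

# Hypothetical role name; any unused name would do.
ROLE_NAME = "kibana_tasks_workaround"

# Role definition granting the privileges mentioned above on the .tasks index.
role_body = {
    "indices": [
        {
            "names": [".tasks"],
            "privileges": ["create_index", "read", "create"],
        }
    ]
}

# The JSON document that would be sent as the body when creating the role.
role_json = json.dumps(role_body, indent=2)
print(ROLE_NAME)
print(role_json)
```

The Kibana user would then be assigned both kibana_system and this extra role until the cluster is upgraded to a release containing the fix.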

Audit Request ID

We opened a PR that will add a synthesized ID based on a user’s request to each event in the file audit log. This will allow users to correlate events such as “authentication success” and “access granted” for the same request.

Zen2

We are working towards running all existing integration tests and REST test suites using Zen2 and are very close to putting the last missing pieces into place for this.

We added the capability for safely down-scaling the number of master-eligible nodes in a cluster. With this, we can switch a substantial portion of our integration tests over to Zen2. We also introduced a bootstrapping process that will allow Elasticsearch to determine its own initial cluster configuration. At the moment this is controlled by a temporary node setting which works well for our REST tests (where we are sure to start the right number of nodes or die trying). Initial tests with this PR show that we can soon switch our REST tests over to Zen2.

We’ve also completed the most critical aspect of the cluster state persistence layer that will give us much stronger atomicity guarantees.

Finally, we implemented serialization compatibility between Zen1 and Zen2 transport actions, allowing a Zen2 node to join a fully-formed cluster with a Zen1 master and vice versa. Follow-up work will focus on failure conditions and an automated transition from a Zen1 master to a Zen2 master, ultimately enabling a smooth rolling upgrade experience from 6.x to 7.0.

Changes

Changes in 6.5:

  • Grant .tasks access to kibana_system role #35573
  • Handle IndexOrDocValuesQuery in composite aggregation #35392
  • SQL: clear the cursor if nested inner hits are enough to fulfill the query required limits #35398
  • Correct implemented interface of ParsedReverseNested #35455
  • SQL: Fix query translation for scripted queries #35408
  • Upgrade to Joda 2.10.1 #35410

Changes in 6.6:

  • HLRC: migration api - upgrade #34898
  • [RCI] Check blocks while having index shard permit in TransportReplicationAction #35332
  • HLRC: Add parameters to stopRollupJob API #35545
  • Clean up XPackInfoResponse class and related tests #35547
  • Extract RunOnce into a dedicated class #35489
  • Suppress CachedTimeThread in hot threads output #35558
  • HLRC: Adding ML Update Filter API #35522
  • Add Delete Privileges API to HLRC #35454
  • HLRC: add support for get license basic/trial status API #33176
  • Formal support for "password_hash" in Put User #35242
  • Add stop rollup job support to HL REST Client #34702
  • [Rollup] Add wait_for_completion option to StopRollupJob API #34811
  • HLRC: Adding ml get filters api #35502
  • Rest HL client: Add watcher stats API #35185
  • HLRC support for getTask #35166
  • Allow efficient can_match phases on frozen indices #35431
  • [HLRC] Added support for CCR Put Follow API #35409
  • Handle OS pretty name on old OS without OS release #35453
  • Geo: enables coerce support in WKT polygon parser #35414
  • [HLRC] Add GetRollupIndexCaps API #35102
  • Fix the names of CCR stats endpoints in usage API #35438
  • Add a java level freeze/unfreeze API #35353

Changes in 7.0:

  • Replace usages of AtomicBoolean based block of code by the RunOnce class #35553

Apache Lucene

Lucene 7.6

Most blockers for Lucene 7.6 are resolved and we will cut the release branch soon.

Searching for geo points in a polygon got faster

A costly operation when running point-in-polygon queries is checking whether a line segment crosses a rectangle. We added a simple optimization that first checks whether either end of the segment lies inside the rectangle before running more costly computations. This yielded a 20% throughput improvement for one of our benchmark queries, which searches for points in London.
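The shape of this optimization can be sketched as follows. This is an illustrative version in plain 2D coordinates, not Lucene's code: it treats "crosses" as "intersects the rectangle", and all names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def contains(rect: Rect, p: Point) -> bool:
    """Cheap test: is the point inside or on the border of the rectangle?"""
    min_x, min_y, max_x, max_y = rect
    return min_x <= p[0] <= max_x and min_y <= p[1] <= max_y


def orient(p: Point, q: Point, r: Point) -> float:
    """Cross product sign: >0 if r is left of p->q, <0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def in_bbox(a: Point, b: Point, c: Point) -> bool:
    """True if c lies within the bounding box of segment ab (for collinear cases)."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    """Costlier test: do segments ab and cd share at least one point?"""
    d1, d2 = orient(c, d, a), orient(c, d, b)
    d3, d4 = orient(a, b, c), orient(a, b, d)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and in_bbox(c, d, a)) or (d2 == 0 and in_bbox(c, d, b))
            or (d3 == 0 and in_bbox(a, b, c)) or (d4 == 0 and in_bbox(a, b, d)))


def segment_intersects_rect(rect: Rect, a: Point, b: Point) -> bool:
    # Fast path: an endpoint inside the rectangle settles the question immediately.
    if contains(rect, a) or contains(rect, b):
        return True
    # Slow path: both endpoints are outside, so test against each rectangle edge.
    min_x, min_y, max_x, max_y = rect
    corners = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]
    return any(segments_intersect(a, b, corners[i], corners[(i + 1) % 4])
               for i in range(4))


rect = (0.0, 0.0, 10.0, 10.0)
print(segment_intersects_rect(rect, (5.0, 5.0), (20.0, 20.0)))    # endpoint inside
print(segment_intersects_rect(rect, (-5.0, 5.0), (15.0, 5.0)))    # passes through
print(segment_intersects_rect(rect, (-5.0, -5.0), (-1.0, 20.0)))  # stays outside
```

The fast path costs a handful of comparisons, whereas the slow path evaluates several cross products per rectangle edge, which is why answering early pays off when many segments have an endpoint inside the rectangle.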

Other