This Week in Elasticsearch and Apache Lucene - 2019-05-05

Elasticsearch

Nightly Benchmarks

We've created a new benchmarking environment serving default distribution benchmarks and migrated https://elasticsearch-benchmarks.elastic.co. You'll notice that for each of the datasets, you can select either OSS or Default to follow benchmarking results for each distribution of Elasticsearch.

Documentation Improvements

After helping with a couple of support tickets related to configuring synonyms, we've added documentation on how best to use synonyms with token filters, such as word_delimiter, that produce stacked tokens.

We are also reworking the docs for the discovery-ec2 plugin.

Geo_line Aggregation

We recently opened the geo_line PR.  This is an aggregation which consumes a series of points and a sort value (e.g. time) and sorts those points into a linestring.  An example use-case is GPS coordinates logged periodically by taxis, container ships, etc.  These individual points are much more useful when arranged chronologically in a line. Here's what that looks like:

[Figure: Geo Lines]
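To make the idea concrete, here is a minimal sketch in plain Python of what "sorting points into a linestring" means. It is not the aggregation's implementation or request syntax. The `(timestamp, lon, lat)` sample layout and the `points_to_linestring` helper are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

# Each sample is (timestamp, longitude, latitude). This layout is an
# assumption for the sketch, not the aggregation's actual input format.
Sample = Tuple[int, float, float]


def points_to_linestring(samples: List[Sample]) -> Dict[str, object]:
    """Sort samples by timestamp and return a GeoJSON-style LineString."""
    ordered = sorted(samples, key=lambda sample: sample[0])
    return {
        "type": "LineString",
        # GeoJSON orders each coordinate as [longitude, latitude].
        "coordinates": [[lon, lat] for _, lon, lat in ordered],
    }


# GPS fixes from one vehicle, logged out of order.
samples = [
    (1556900120, -73.9857, 40.7484),
    (1556900000, -73.9903, 40.7410),
    (1556900060, -73.9880, 40.7450),
]
line = points_to_linestring(samples)
```

The resulting `line["coordinates"]` follows the vehicle's path in chronological order, regardless of the order in which the points were logged.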

Community PRs

We love it when our community submits pull requests to Elasticsearch! Thank you to all of our contributors, past and present. Here are a few recent community pull requests:

We reviewed a community PR to reject port ranges in discovery.seed_hosts, which were previously accepted but silently ignored by Elasticsearch.
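As an illustration of the behaviour change, the sketch below rejects a port range loudly instead of ignoring it. It is a toy parser written for this post, not the Elasticsearch code. The `parse_seed_host` name is hypothetical, and bracketed IPv6 addresses without a port are deliberately out of scope.

```python
from typing import Tuple


def parse_seed_host(entry: str, default_port: int = 9300) -> Tuple[str, int]:
    """Parse 'host' or 'host:port' and reject 'host:9300-9400' style ranges.

    Hypothetical helper for illustration only. It handles host names and
    IPv4 addresses; IPv6 literals without a port are not covered.
    """
    host, separator, port_text = entry.rpartition(":")
    if not separator:
        # No colon at all, so fall back to the default transport port.
        return entry, default_port
    if "-" in port_text:
        # Fail loudly rather than silently ignoring the range.
        raise ValueError(f"port ranges are not supported: {entry!r}")
    return host, int(port_text)


try:
    parse_seed_host("es-node-1:9300-9400")
    range_rejected = False
except ValueError:
    range_rejected = True
```

With this approach a misconfigured entry surfaces as an error at startup rather than as a node that quietly fails to discover its peers.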

We're reviewing a community PR that adds the index name to cluster block exceptions. These exceptions are triggered, for example, when trying to write to an index with a read-only cluster block, which the system adds automatically when nodes holding this index are running low on disk space.

We're reviewing a community PR that allows running _cluster/reroute commands even after the maximum number of retries for failed shard allocations has been reached.

Data Replication Resiliency

We've updated the resiliency status page, closing off two important issues that were addressed by a multi-year effort, starting with the sequence numbers project in the 5.x and 6.x series, and culminating in the release of 7.0 with the new cluster coordination subsystem:

  • Documents indexed during a network partition cannot be uniquely identified: We have switched optimistic concurrency control (OCC) from the _version field to the new _seq_no (sequence number) and _primary_term (primary term) fields, which do uniquely identify each operation. To be clear, the _version field continues to not uniquely identify a particular version of a document; if you need to do this then you should move to using the _seq_no and _primary_term fields. All internal consumers that are making use of OCC (e.g. reindex, update-by-query, ...) have been switched to these new fields as well.
  • Replicas can fall out of sync when a primary shard fails: After a primary failover, the new primary now realigns the replicas with itself by rolling back the replicas to a safe point in the history and sending them the missing operations.
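The compare-and-set behaviour described in the first item can be sketched with a toy in-memory model. The `ToyShard` class below is hypothetical and leaves out replication entirely. It only shows how a write conditioned on `if_seq_no` and `if_primary_term` succeeds when both match the stored document and fails otherwise.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoredDoc:
    source: dict
    seq_no: int
    primary_term: int


class VersionConflict(Exception):
    """Raised when a conditional write does not match the stored document."""


class ToyShard:
    """In-memory stand-in for a primary shard (hypothetical, illustration only)."""

    def __init__(self, primary_term: int = 1) -> None:
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs: Dict[str, StoredDoc] = {}

    def index(self, doc_id: str, source: dict,
              if_seq_no: Optional[int] = None,
              if_primary_term: Optional[int] = None) -> StoredDoc:
        current = self.docs.get(doc_id)
        conditional = if_seq_no is not None or if_primary_term is not None
        if conditional and (current is None
                            or current.seq_no != if_seq_no
                            or current.primary_term != if_primary_term):
            raise VersionConflict(doc_id)
        # Every accepted operation gets a fresh sequence number.
        stored = StoredDoc(source, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1
        self.docs[doc_id] = stored
        return stored


shard = ToyShard()
first = shard.index("1", {"count": 1})
second = shard.index("1", {"count": 2},
                     if_seq_no=first.seq_no,
                     if_primary_term=first.primary_term)

try:
    # Reusing the coordinates of the first write is now stale.
    shard.index("1", {"count": 3},
                if_seq_no=first.seq_no,
                if_primary_term=first.primary_term)
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True
```

A client that reads a document, remembers its `_seq_no` and `_primary_term`, and sends them back with its update gets the same guarantee: the update applies only if nobody else has changed the document in between.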

We've recently also worked on more extensive testing in these areas. We added tests for our new optimistic concurrency control structures (if_seq_no and if_primary_term), checking linearizability of compare-and-set operations. Based on this work, we learned quite a bit about the guarantees that the system provides under certain failure conditions, and will look at strengthening some of those guarantees. This work also helped in uncovering bugs. We also fixed a problem where a shard that was being closed during a replica rollback was keeping an active index writer around, causing various follow-up checks to fail.

We've added stronger consistency checks to our disruption test suite. Besides the existing checks on the _seq_no and _primary_term fields, the suite now also verifies that the _source and _version fields are fully aligned across all shard copies, so that at the end of each disruption test all shard copies contain exactly the same set of documents.
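The shape of such a check can be sketched as follows. This is not the test suite's code. The `ShardCopy` representation and the `find_divergent_docs` helper are assumptions for illustration, with each shard copy reduced to a mapping from document ID to its `(_seq_no, _primary_term, _version, _source)` values.

```python
from typing import Dict, List, Tuple

# doc id -> (_seq_no, _primary_term, _version, _source as a JSON string).
# This flattened representation is an assumption made for the sketch.
ShardCopy = Dict[str, Tuple[int, int, int, str]]


def find_divergent_docs(copies: List[ShardCopy]) -> List[str]:
    """Return the sorted IDs of documents that are missing or differ in any copy."""
    all_ids = set().union(*(shard_copy.keys() for shard_copy in copies))
    divergent = []
    for doc_id in sorted(all_ids):
        # A missing document shows up as None and therefore counts as a mismatch.
        states = {shard_copy.get(doc_id) for shard_copy in copies}
        if len(states) != 1:
            divergent.append(doc_id)
    return divergent


primary = {
    "a": (0, 1, 1, '{"n": 1}'),
    "b": (1, 1, 1, '{"n": 2}'),
}
replica_in_sync = dict(primary)
replica_stale = {
    "a": (0, 1, 1, '{"n": 1}'),
    "b": (1, 1, 2, '{"n": 2}'),  # same _seq_no, different _version
}
```

Here `find_divergent_docs([primary, replica_in_sync])` returns an empty list, while comparing `primary` with `replica_stale` reports document `b`, which a check on `_seq_no` and `_primary_term` alone would have missed.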

Token Service

Our long-running change to move security tokens into their own index (and out of the main security index) has been merged. This has been a pretty big effort: we want the change to happen automatically when you upgrade the cluster to a new (7.2+) version, but not while the cluster is mixed (when some nodes are on <7.2), and we want it to seamlessly accept any active tokens that were created on the old cluster version and are stored in the main security index.

This is part of our long-running effort to make it easier and safer to back up and restore the security index.

We've made a range of changes to the token service for 7.2, some of which forced us to change the format of the tokens that we provide to clients and how we store them in the index. We are tackling the last part of those changes, which is to change the way we store token strings in the index so that getting read access to the tokens index does not allow you to authenticate with someone else's token. (All previously released versions of ES had this protection, but the way that was implemented didn't fit with the changes we were making in 7.2, so we removed it during this development cycle and are implementing a different solution before feature freeze for 7.2.)
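One common way to get this property is to store only a one-way hash of each token, so that the stored value cannot be replayed as a credential. The sketch below illustrates that general technique. It is an assumption for illustration and not necessarily how the Elasticsearch token service does it. The `token_index`, `issue_token`, and `authenticate` names are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, Optional

# Stand-in for the tokens index: stored hash -> user name (hypothetical).
token_index: Dict[str, str] = {}


def _digest(token: str) -> str:
    """One-way hash of the token string."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token(user: str) -> str:
    """Create a random token, store only its hash, and hand the token to the client."""
    token = secrets.token_urlsafe(32)
    token_index[_digest(token)] = user
    return token


def authenticate(token: str) -> Optional[str]:
    """Return the user for a valid token, or None if the token is unknown."""
    return token_index.get(_digest(token))


client_token = issue_token("alice")
stored_value = next(iter(token_index))  # what a reader of the index would see
```

Someone who can read the index sees only `stored_value`, and presenting that value to `authenticate` fails because it gets hashed again before the lookup.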

Enrich Processor

The Enrich Processor (formerly referred to as the "Lookup Processor") will allow users to define ingest pipelines that enrich the ingested document with data from another index on the cluster (subject to some limitations). There is still a lot of work left to do, but the core pieces are coming together, which include data-local Lucene queries, a policy runner, and the REST APIs. If you'd like to follow our progress or provide feedback, feel free to check out the meta issue.

We've merged the first iteration of the enrich processor to the feature branch. We've also merged the first iteration of the enrich policy runner. The enrich policy runner is the background task that reads the enrich policy and synchronizes the source index to the specialized .enrich index. Under the covers this is implemented with the reindex API and will eventually get support for a cron scheduler. Finally, we added an API to list enrich policies and have started the work on the _execute API to allow a user to manually run a policy.
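The division of labour between the policy runner and the processor can be sketched in a few lines. This is a conceptual model only, not the feature's API: the `run_policy` and `enrich` functions, their parameters, and the sample fields are all hypothetical.

```python
from typing import Dict, Iterable, List


def run_policy(source_docs: Iterable[dict], match_field: str,
               enrich_fields: List[str]) -> Dict[object, dict]:
    """Build a lookup table keyed by match_field, like a tiny '.enrich' index."""
    lookup = {}
    for doc in source_docs:
        if match_field in doc:
            lookup[doc[match_field]] = {
                name: doc[name] for name in enrich_fields if name in doc
            }
    return lookup


def enrich(doc: dict, lookup: Dict[object, dict], field: str,
           target_field: str) -> dict:
    """Return a copy of doc with the matching lookup entry under target_field."""
    enriched = dict(doc)
    match = lookup.get(doc.get(field))
    if match is not None:
        enriched[target_field] = match
    return enriched


# "Source index" holding reference data about users.
users = [
    {"email": "a@example.com", "name": "Ana", "city": "Lisbon"},
    {"email": "b@example.com", "name": "Bo", "city": "Oslo"},
]
lookup = run_policy(users, match_field="email", enrich_fields=["name", "city"])

event = {"message": "login", "user_email": "b@example.com"}
enriched_event = enrich(event, lookup, field="user_email", target_field="user")
```

Running the policy ahead of time keeps the per-document work at ingest down to a single local lookup, which is the reason for synchronizing the source index into a specialized index first.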

Snapshot Lifecycle Management

The first cut of the SLM documentation and an API path change from /ilm/ to /slm/ have now been merged to the feature branch. We've also introduced two new roles: manage_slm and read_slm to allow configuration for more fine-grained permissions. Finally, we've started the work to store the results from SLM's snapshot creation to a dedicated history index. This will allow us to set up alerts and have a history of failed/successful snapshots.

Lucene

Apache Lucene / Solr 8.1

The release branch for Apache Lucene / Solr 8.1 has been cut and the release process has started. We expect the first RC later this week or early next week. For Lucene in particular this will bring:

  • a new BKD tree strategy for segment merging that provides a significant performance boost for high dimensions
  • the new Luke module
  • a new query visitor API that allows traversing a query tree efficiently
  • read-time attributes that allow controlling codec-level functionality on a per-reader basis, for instance to load FSTs per field off-heap
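For readers unfamiliar with the visitor idea behind the query visitor item, here is a small conceptual sketch. Lucene's API is Java and looks different. The `TermQuery`, `BoolQuery`, and `TermCollector` classes below are simplified stand-ins written for this illustration, not ports of the Lucene classes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class TermQuery:
    field_name: str
    text: str


@dataclass
class BoolQuery:
    clauses: List["Query"] = field(default_factory=list)


Query = Union[TermQuery, BoolQuery]


class TermCollector:
    """Walks a query tree once and records every (field, term) leaf it finds."""

    def __init__(self) -> None:
        self.terms: List[Tuple[str, str]] = []

    def visit(self, query: Query) -> None:
        if isinstance(query, TermQuery):
            self.terms.append((query.field_name, query.text))
        elif isinstance(query, BoolQuery):
            # Recurse into compound queries, preserving clause order.
            for clause in query.clauses:
                self.visit(clause)


tree = BoolQuery([
    TermQuery("title", "lucene"),
    BoolQuery([TermQuery("body", "search"), TermQuery("body", "index")]),
])
collector = TermCollector()
collector.visit(tree)
```

After the walk, `collector.terms` holds the three leaf terms in the order they appear in the tree. This kind of single-pass traversal is useful, for example, when a highlighter needs to know which terms a query matches.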

Other

  • We're working on the Luwak codebase with Lucene to prepare the donation of Luwak to Lucene.
  • Can we improve search performance by sorting the segments by an estimated number of hits?
  • Some of our JDK 11 builds are hitting a JVM bug that's fixed in 12 but not in 11.
  • JDK 12 doesn't seem to be bug-free either - Lucene is hitting this bug frequently.
  • We are still discussing how we can slice up segments better for parallel search.
  • One person's bug is another person's feature...
  • Spooky failures are actually bugs sometimes.

Changes

Changes in Elasticsearch

Changes in 8.0:

  • Update TLS ciphers and protocols for JDK 11 #41385
  • BREAKING: Parse empty first line in msearch request body as action metadata #41011
  • Suppress illegal access in plugin install #41620
  • Fix: added missing skip #41492

Changes in 7.2:

  • Improve error message for ln/log with negative results in function score #41609
  • Add details to BulkShardRequest#getDescription() #41711
  • Amend prepareIndexIfNeededThenExecute for security token refresh #41697
  • Implement Bulk Deletes for GCS Repository #41368
  • Security Tokens moved to a new separate index #40742
  • Simplify initialization of max_seq_no of updates #41161
  • Handle WRAP ops during SSL read #41611
  • Upgrade to Netty 4.1.35 #41499
  • Close and acquire commit during reset engine fix #41584

Changes in 7.0:

  • Run packaging tests on RHEL 8 #41662
  • Fix multi-node parsing in voting config exclusions REST API #41588

Changes in 6.8:

  • Fix for full cluster restart tests #41723
  • Drop distinction in entries for keystore #41701
  • Fix Watcher deadlock that can cause inability to index documents. #41418

Changes in 6.7:

  • Bump the bundled JDK to 12.0.1 #41627
  • Change JDK distribution source #41626

Changes in Elasticsearch Management UI

Changes in 7.0:

  • [ILM] Surface shrink action in edit form if it's already been set on the policy #35987

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.2:

  • Add JDBC's protocol tests as integration tests #149

Changes in 6.7:

  • Consider interval's precision. Allow non-aligned period values as interval encoding #148