17 August 2018

This Week in Elasticsearch and Apache Lucene - 2018-08-17

By Adrien Grand, Colin Goodheart-Smithe, Tom Callahan, Jason Tedor, Jay Modi, and Yannick Welsch

Elasticsearch

Cross-cluster replication

We have continued working on adding long polling for global checkpoints, so that a follower cluster does not need to periodically poll the leader to ask whether there are new changes that the follower does not have. To this end, we have extended the global checkpoint component to allow registering listeners that will be notified when the global checkpoint is updated. CCR will use this so that a follower can make a remote request to the leader that attaches a listener to the global checkpoint, and is then called back when the global checkpoint advances.
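
The listener mechanism described above can be sketched roughly as follows. This is a simplified illustration of the pattern, not the actual Elasticsearch API; the class and method names are hypothetical.

```python
class GlobalCheckpointTracker:
    """Toy sketch: tracks a global checkpoint and notifies listeners
    once it advances past the point they are waiting for."""

    def __init__(self):
        self.global_checkpoint = -1
        self.listeners = []  # list of (waiting_for, callback) pairs

    def add_listener(self, waiting_for, callback):
        # If the checkpoint has already advanced far enough, fire immediately
        # rather than registering.
        if self.global_checkpoint >= waiting_for:
            callback(self.global_checkpoint)
        else:
            self.listeners.append((waiting_for, callback))

    def update_global_checkpoint(self, new_checkpoint):
        # Advance the checkpoint and fire every listener it satisfies.
        self.global_checkpoint = new_checkpoint
        ready = [cb for (w, cb) in self.listeners if new_checkpoint >= w]
        self.listeners = [(w, cb) for (w, cb) in self.listeners
                          if new_checkpoint < w]
        for cb in ready:
            cb(new_checkpoint)
```

A follower would register one listener per outstanding long-poll request instead of repeatedly asking the leader whether anything has changed.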

New dense vector field and cosine similarity in the works

We have started working on a new field type that will allow users to store dense vectors in Elasticsearch. Storing dense vectors can be useful for users who want to augment relevance with data from models generated outside of Elasticsearch. This would be paired with a new query that applies cosine similarity scoring on top of this field. It is still very early days for this project; we are mostly exploring what is possible and what role Elasticsearch should play here, so there are not yet any PRs or concrete plans for exactly what will ship and when.
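
For reference, cosine similarity between two dense vectors is just the dot product normalized by the vector magnitudes, a minimal sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
    A value of 1 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Scoring a query would amount to computing this between a query vector and each document's stored vector.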

Structured Audit Logging

Our work on structured audit logging is now ready for review after a tough fight with the tests.

Zen2

Master elections in Zen2 will be significantly faster than in current Elasticsearch releases. Today when a cluster is without a master, each node undertakes a master-eligible node discovery and election process. This is done in a series of rounds, each taking three seconds.

In Zen2 the "pinging" or node discovery phase is decoupled from the election process. Rather than exposing the set of discovered nodes only at the end of a three-second round, we make the current set of discovered nodes available continuously, allowing us to start an election as soon as we discover enough master-eligible nodes. We are also more enthusiastically reciprocal in our discovery of other nodes: if an unknown node pings us then we ping it straight back, possibly learn about new nodes, start pinging those, and so on, which means that nodes typically converge on a complete picture of the cluster very quickly.

The election process in Zen2 is triggered by a single node reaching out and requesting votes from the other master-eligible nodes, rather than waiting for each node to independently decide to join the best master it can find. Doing this correctly is a little tricky: if multiple nodes decide to request votes at around the same time, it is unlikely that any of them will gather enough votes to be elected. As in other well-known consensus algorithms, we probabilistically protect against such split votes by introducing a random wait on each node before it starts an election. We optimistically start with a short wait and adapt by backing off after failed elections. We expect this to result in master elections that take at most a few hundred milliseconds, significantly faster than the three seconds that we have today.

Nodes joining a cluster with an already well-established elected master will also benefit from the new approach. Today a node performs a full three-second pinging round before joining an existing master; in Zen2, the node will join the master immediately upon discovery.
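
The randomized, adaptive wait can be sketched as below. The function name, base delay, and cap are illustrative assumptions, not values from the actual implementation; the point is that the random window grows after each failed election, making a repeated split vote increasingly unlikely.

```python
import random

def election_delay_ms(failed_attempts, base_ms=100, max_ms=10_000):
    """Random wait before requesting votes. The window doubles with each
    failed election (exponential backoff), capped at max_ms, so that
    competing candidates quickly spread out in time."""
    upper = min(base_ms * (2 ** failed_attempts), max_ms)
    return random.uniform(0, upper)
```

A node that loses an election draws a fresh, longer delay; the first node whose delay expires usually gathers a majority before anyone else tries.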

Watcher scalability

We have some users who push hard on the scalability of Watcher. One case that arose recently stems from the fact that when a watch is executed, Watcher performs single-document operations to manage the triggered watches, watch status, and watch history. If you are executing a lot of watches, these single-document operations can pile up. We opened a pull request to switch to using bulk requests to execute these operations. The defaults will still execute single-document operations; for users not pushing on the limits of Watcher, these defaults make sense so that a node outage does not lead to lost watch history, or to watches executing again because a triggered watch was not pruned. Yet the knob is now there for users who need it.
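
The idea of coalescing single-document operations into bulk requests can be sketched generically as a small buffer. This is an illustrative pattern only; the class name and flush semantics are hypothetical, not the actual Watcher implementation.

```python
class BulkBuffer:
    """Buffers individual document operations and hands them off as one
    batch once the buffer fills, instead of issuing one request per op.
    `flush_fn` stands in for sending a bulk request to the cluster."""

    def __init__(self, flush_fn, max_batch=100):
        self.flush_fn = flush_fn
        self.max_batch = max_batch
        self.ops = []

    def add(self, op):
        self.ops.append(op)
        if len(self.ops) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.ops:
            self.flush_fn(self.ops)
            self.ops = []
```

The trade-off the paragraph describes is visible here: buffered operations that have not yet been flushed are lost if the node dies, which is why single-document writes remain the default.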

Fixing role queries that match nested documents

We fixed a bug in the way role queries are applied when nested fields are used. We use the role query to restrict the documents visible to a specific user. In 6.3 we added support for nested documents, but this introduced a bug when the role query is defined in a way that can match nested documents as well as root documents (for instance, a query with only excluded terms). To fix this bug we now restrict the role query to root documents only and add the nested documents in a different clause.
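
The fixed matching logic can be illustrated with a toy filter. This is a conceptual sketch with made-up document shapes, not the actual Lucene query structure: the role query is evaluated against root documents only, and nested documents are pulled in separately via their parent.

```python
def visible_docs(docs, role_matches):
    """Return the documents a user may see. `role_matches` is the role
    query as a predicate; it is applied to root documents only. Nested
    documents are included when their root is authorized, never because
    they happened to match the role query themselves."""
    allowed_roots = {d["id"] for d in docs
                     if d.get("root") and role_matches(d)}
    result = []
    for d in docs:
        if d.get("root"):
            if d["id"] in allowed_roots:
                result.append(d)
        elif d.get("parent") in allowed_roots:  # separate clause for nested docs
            result.append(d)
    return result
```

Under the buggy behavior, a role query of only excluded terms could trivially match nested documents (which lack the excluded fields) and leak them; restricting the predicate to roots closes that hole.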

Changes

Changes in 6.3:

  • Update WordPress plugins links #32194

Changes in 6.4:

  • Guard against null in email admin watches #32923
  • Remove client connections from TcpTransport #31886
  • Mute security-cli tests in FIPS JVM #32812
  • Unmute WildFly tests in FIPS JVM #32814
  • [Kerberos] Add debug log statement for exceptions #32663

Changes in 6.5:

  • Watcher: migrate PagerDuty v1 events API to v2 API #32285
  • Introduce global checkpoint listeners #32696
  • BREAKING: Use generic AcknowledgedResponse instead of extended classes #32859
  • Aggregations/HL Rest client fix: missing scores #32774
  • High Level REST Client: migration get assistance API #32744
  • High Level REST Client: Add Delete License API #32586
  • INGEST: Create Index Before Pipeline Execute #32786
  • Fix NOOP bulk updates #32819

Changes in 7.0:

  • BREAKING: INGEST: Add Configuration Except. Data to Metdata #32322
  • BREAKING: Introduce the dissect library #32297
  • Core: Add java time version of rounding classes #32641


Lucene