This Week in Elasticsearch and Apache Lucene - 2018-06-01

Elasticsearch

Cross Cluster Replication

The primary focus last week was on benchmarking, specifically getting to the point where we can index at full speed into an index in one region while the follower index in another region keeps up. If you remember, in last week's update we reported indexing a 30GB data set that was fully replicated, but the replication took ~11 hours to complete. After increasing the number of workers and the batch size, we can now do a full run in 1h10m, which is roughly the time it takes to index the data itself. From here we will proceed to formalize our benchmarking infrastructure. The plan is to introduce multiple data types, multiple platforms, nightly runs, and more.

On the API side, we have merged a PR to convert an existing index to a follower, and also added support for syncing the mappings between the leader and the follower index. The latter is important for use cases like logging, where new fields can be introduced during indexing. Without syncing those fields, we can't index the new data on the follower.
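
To give a feel for what starting replication looks like from a client, here is a minimal sketch using the low-level Java REST client. Note that CCR's REST surface was still evolving at the time of this update, so the endpoint path, the "leader_cluster"/"leader_index" body keys, and the index names below are illustrative assumptions, not the final API.

    import java.util.Collections;

    import org.apache.http.HttpHost;
    import org.apache.http.entity.ContentType;
    import org.apache.http.nio.entity.NStringEntity;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    public class FollowIndexExample {
        public static void main(String[] args) throws Exception {
            // Point the low-level REST client at the cluster that hosts the follower.
            try (RestClient client = RestClient.builder(
                    new HttpHost("localhost", 9200, "http")).build()) {
                // Hypothetical request: which remote cluster to pull from and
                // which index on it to follow.
                NStringEntity body = new NStringEntity(
                    "{ \"leader_cluster\": \"us-east\", \"leader_index\": \"logs-leader\" }",
                    ContentType.APPLICATION_JSON);
                Response response = client.performRequest(
                    "POST", "/logs-follower/_ccr/follow", Collections.emptyMap(), body);
                System.out.println(response.getStatusLine());
            }
        }
    }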

Improvements in Cross Cluster Search

We have merged a PR that ensures we do not use dedicated master nodes as gateway nodes for the remote cluster. This is an important enhancement, as it ensures we do not stress remote master nodes by making them coordinate the remote part of a cross-cluster search.
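
To make the selection rule concrete, here is a minimal sketch in plain Java of the predicate the change implies. This is a simplification for illustration; the real logic lives in Elasticsearch's remote-cluster connection code and operates on DiscoveryNode. The idea: a remote node may act as a gateway unless it is a dedicated master, i.e. master-eligible while holding neither the data nor the ingest role.

    import java.util.function.Predicate;

    public class GatewayEligibility {

        // Minimal stand-in for org.elasticsearch.cluster.node.DiscoveryNode;
        // only the roles that matter for the rule are modeled here.
        static class Node {
            final boolean master, data, ingest;
            Node(boolean master, boolean data, boolean ingest) {
                this.master = master; this.data = data; this.ingest = ingest;
            }
        }

        // Eligible as a remote gateway unless the node is a dedicated master.
        static final Predicate<Node> ELIGIBLE = n -> !n.master || n.data || n.ingest;

        public static void main(String[] args) {
            System.out.println(ELIGIBLE.test(new Node(true, false, false)));  // false: dedicated master
            System.out.println(ELIGIBLE.test(new Node(true, true, false)));   // true: master + data
            System.out.println(ELIGIBLE.test(new Node(false, true, true)));   // true: data + ingest
        }
    }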

Search Templates now throw 400 (bad request) on a ScriptException

We have merged a PR which changes the status code returned when a ScriptException is thrown from a 500 (Internal Server Error) to a 400 (Bad Request). This means that if the user has a bug in their script, we will report that the request is invalid and that the user should correct it, rather than returning a 500, which would incorrectly indicate a server-side error.
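
As a concrete illustration, here is a sketch using the low-level Java REST client; the index name and the deliberately broken Painless expression are made up for the example. A search whose script fails to compile now surfaces as a client error:

    import java.util.Collections;

    import org.apache.http.HttpHost;
    import org.apache.http.entity.ContentType;
    import org.apache.http.nio.entity.NStringEntity;
    import org.elasticsearch.client.ResponseException;
    import org.elasticsearch.client.RestClient;

    public class ScriptErrorStatusExample {
        public static void main(String[] args) throws Exception {
            try (RestClient client = RestClient.builder(
                    new HttpHost("localhost", 9200, "http")).build()) {
                // "valuee" is a deliberate typo, so the Painless script fails to
                // compile and the server throws a ScriptException.
                NStringEntity body = new NStringEntity(
                    "{ \"query\": { \"script\": { \"script\": \"doc['price'].valuee > 0\" } } }",
                    ContentType.APPLICATION_JSON);
                try {
                    client.performRequest("GET", "/my-index/_search",
                        Collections.emptyMap(), body);
                } catch (ResponseException e) {
                    // Previously 500 (Internal Server Error); now 400 (Bad Request).
                    System.out.println(e.getResponse().getStatusLine().getStatusCode());
                }
            }
        }
    }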

SQL

Work continues on the ODBC driver, which allows BI tools such as Tableau to connect to Elasticsearch. This week we have been working on data conversion: the date, time, and timestamp conversions have been implemented, and the interval and GUID types are handled as well (with proper rejection codes provided).

Changes in 5.6:

  • Fsync state file before exposing it #30929

Changes in 6.3:

  • [DOCS] Reset edit links #30909
  • Do not serialize basic license exp in x-pack info #30848

Changes in 6.4:

  • Harmonize include_defaults tests #30700
  • Refactor Sniffer and make it testable #29638
  • Deprecate accepting malformed requests in stored script API #28939
  • REST high-level client: add synced flush API (2) #30650
  • Reuse expiration date of trial licenses #30950
  • Transport client: Don’t validate node in handshake #30737
  • HLRest: Allow caller to set per request options #30490
  • Deprecates indexing and querying a context completion field without context #30712
  • Cross Cluster Search: do not use dedicated masters as gateways #30926
  • Fix AliasMetaData#fromXContent parsing #30866
  • Add Verify Repository High Level REST API #30934
  • BREAKING: Include size of snapshot in snapshot metadata #29602
  • Add missing_bucket option in the composite agg #29465
  • stable filemode for zip distributions #30854
  • Fix IndexTemplateMetaData parsing from xContent #30917
  • Limit the scope of BouncyCastle dependency #30358
  • Move list tasks API under tasks namespace #30906
  • SQL: Remove the last remaining server dependencies from jdbc #30771
  • Verify signatures on official plugins #30800
  • Fix bad version check writing Repository nodes #30846

Changes in 7.0:

  • Remove version read/write logic in Verify Response #30879
  • BREAKING: Core: Remove RequestBuilder from Action #30966
  • Add “took” timing info to response for _msearch/template API #30961
  • Change ScriptException status to 400 (bad request) #30861
  • BREAKING: Include size of snapshot in snapshot metadata #30890
  • Change BWC version for VerifyRepositoryResponse #30796

Lucene

Soft deletes

Soft deletes were initially implemented on top of the Lucene index, but recent changes have tightened the integration with IndexWriter. We've started discussions about preventing the soft-deletes field from being changed once it has been configured, and about recording the number of soft deletes in segment infos.
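
For readers curious about the API shape, here is a minimal sketch of the IndexWriter integration described above, using IndexWriterConfig#setSoftDeletesField and IndexWriter#softUpdateDocument. The field name "__soft_deletes" is an arbitrary choice for the example; the discussion mentioned above is precisely about locking that choice down for an existing index.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericDocValuesField;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class SoftDeletesExample {
        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory();
            // Tell IndexWriter which docvalues field records soft deletes.
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer())
                .setSoftDeletesField("__soft_deletes");
            try (IndexWriter writer = new IndexWriter(dir, config)) {
                Document doc = new Document();
                doc.add(new StringField("id", "1", Field.Store.YES));
                writer.addDocument(doc);

                // Replace document 1: rather than hard-deleting the old version,
                // softUpdateDocument marks it deleted by writing a docvalues entry
                // into the configured soft-deletes field, keeping it recoverable.
                Document updated = new Document();
                updated.add(new StringField("id", "1", Field.Store.YES));
                writer.softUpdateDocument(new Term("id", "1"), updated,
                    new NumericDocValuesField("__soft_deletes", 1));
                writer.commit();
            }
        }
    }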

Other