This Week in Elasticsearch and Apache Lucene - 2019-02-01 | Elastic Blog

Elasticsearch

TLS

In preparation for enabling TLS for basic licenses in a future release, we have made a breaking change for 7.0 in our heuristics for automatically enabling security on trial licenses. These heuristics were added in 6.3.0 in response to including security with the default distribution. In 6.3+, if xpack.security.transport.ssl.enabled is set to true but xpack.security.enabled is not explicitly set to true, security is turned on because we infer that the user wants to use security. In 7.0+, you need to explicitly set xpack.security.enabled: true when using a trial license.
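The difference between the two behaviors can be sketched as a small decision function. This is a simplified model of the heuristic as described above, not the actual Elasticsearch code; the function name and the plain-dict settings are hypothetical, and it only covers the trial-license case.

```python
def security_enabled_on_trial(settings, major_version):
    """Simplified model of the trial-license heuristic (hypothetical helper).

    `settings` is a plain dict of setting name -> bool.
    """
    explicit = settings.get("xpack.security.enabled")
    if explicit is not None:
        # An explicit setting always wins, in every version.
        return explicit
    if major_version < 7:
        # 6.3+ behavior: infer intent from transport TLS being enabled.
        return settings.get("xpack.security.transport.ssl.enabled", False)
    # 7.0+ behavior: no inference, security stays off unless explicitly enabled.
    return False


tls_only = {"xpack.security.transport.ssl.enabled": True}
explicit = {"xpack.security.transport.ssl.enabled": True,
            "xpack.security.enabled": True}

on_6x = security_enabled_on_trial(tls_only, 6)
on_7x = security_enabled_on_trial(tls_only, 7)
on_7x_explicit = security_enabled_on_trial(explicit, 7)
```

With only transport TLS configured, the model turns security on for 6.x and leaves it off for 7.0; adding the explicit setting turns it on in both.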

In preparation for TLSv1.3 support, we updated the httpclient libraries that we use. While upgrading, we found an upstream bug. We have also opened a PR that will enable TLSv1.3 by default if the JDK supports it.

Packaging

As part of our effort to bundle a JDK with Elasticsearch, we have introduced platform-specific builds for Windows, macOS and Linux, and these platform-specific artifacts now bundle an OpenJDK distribution.

Logging

Elasticsearch 7.0.0 will now emit logs as JSON: we merged #36833 this week. This enables Beats to ingest Elasticsearch logs and Monitoring to show them in a nice UI. For backwards compatibility reasons, Elasticsearch 7.0.0 will emit logs both in the old format and in JSON.
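One practical benefit of JSON logs is that any consumer can parse them line by line without a custom pattern. A minimal sketch, assuming one JSON object per line; the sample line and its field names are made up for illustration and may not match the exact fields Elasticsearch emits.

```python
import json

# Hypothetical log line; the field names are assumptions for illustration only.
SAMPLE_LINE = (
    '{"type": "server", "timestamp": "2019-02-01T10:00:00,000+0000", '
    '"level": "INFO", "component": "o.e.n.Node", '
    '"node.name": "node-1", "message": "started"}'
)


def parse_log_lines(lines):
    """Yield one dict per JSON log line, skipping blank or non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Not JSON, for example a line written in the old plain-text format.
            continue
        if isinstance(event, dict):
            yield event


events = list(parse_log_lines([SAMPLE_LINE, "", "[2019-02-01] plain text line"]))
levels = [event["level"] for event in events]
```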

Sequence numbers

We moved all internal consumers of internal versioning in our codebase to sequence-number based CAS in order to deprecate usage of internal versioning for optimistic concurrency control. We have provided an option to expose sequence numbers and primary terms in search responses when seq_no_primary_term=true is specified on the search request, and added seq# based optimistic concurrency control to update requests. With this, we moved update- and delete-by-query as well as Watcher to use seq# and primary term for optimistic concurrency control.
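The compare-and-set idea behind seq# based optimistic concurrency control can be illustrated with a toy in-memory store. This is a sketch of the concept only, not Elasticsearch code: the class and exception names are hypothetical, a single shard with a fixed primary term is assumed, and the if_seq_no and if_primary_term argument names are chosen to resemble the corresponding request parameters.

```python
class VersionConflict(Exception):
    """Raised when the expected seq# / primary term does not match."""


class TinyStore:
    """Toy single-shard document store (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}          # doc_id -> (source, seq_no, primary_term)
        self._next_seq_no = 0    # every write gets the next sequence number
        self.primary_term = 1    # fixed here; it changes on primary failover

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self._docs.get(doc_id)
        if if_seq_no is not None:
            # Compare-and-set: only write if nobody changed the doc since we read it.
            if current is None or (current[1], current[2]) != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        seq_no = self._next_seq_no
        self._next_seq_no += 1
        self._docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no, self.primary_term


store = TinyStore()
seq_no, term = store.index("1", {"count": 1})

# First conditional write succeeds because the document is unchanged.
second = store.index("1", {"count": 2}, if_seq_no=seq_no, if_primary_term=term)

# Reusing the stale seq# fails, because the write above bumped it.
try:
    store.index("1", {"count": 3}, if_seq_no=seq_no, if_primary_term=term)
    conflict = False
except VersionConflict:
    conflict = True
```

Unlike a per-document version counter, the pair of sequence number and primary term identifies a specific write on the shard, which is what makes the comparison safe across primary failovers.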

TLA+

Our use of TLA+ has made it to the industrial use cases section of Leslie Lamport's official TLA+ website, which mentions the TLA+ model that we developed and that uncovered a bug in our implementation.

Ingest Node

The user_agent processor now has an "ecs" option for 6.7. Setting it changes the output format to conform to the Elastic Common Schema (ECS). This will be the default (and only) option in 7.0. #37727, #37984
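As a rough sketch, a 6.7 pipeline definition that opts in to the new format might be built as follows. The helper name, the source field name "agent" and the description text are hypothetical, and we assume here that the "ecs" option takes a boolean.

```python
import json


def build_user_agent_pipeline(source_field, ecs=True):
    """Build an ingest pipeline body with a single user_agent processor.

    Hypothetical helper; assumes the "ecs" option is a boolean flag.
    """
    return {
        "description": "Parse user agent strings",
        "processors": [
            {"user_agent": {"field": source_field, "ecs": ecs}}
        ],
    }


pipeline = build_user_agent_pipeline("agent")
# JSON body that would be sent when creating the pipeline.
body = json.dumps(pipeline)
```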

Geo

The new geotile_grid agg was merged, and will be available in 7.0! The new tiles are nice for UIs because, as you zoom, they evenly subdivide the grid (one square tile becomes four smaller tiles, aligned on the boundary of the larger tile).
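The subdivision property is easy to see in code. The sketch below assumes the usual Web Mercator zoom/x/y tile addressing used by map tile servers; it is not the aggregation's implementation, and the function names are hypothetical.

```python
import math


def tile_for_point(lat, lon, zoom):
    """Return (zoom, x, y) of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the outer edges (or beyond the projection's latitude range).
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return zoom, x, y


def children(zoom, x, y):
    """Return the four tiles that evenly subdivide a tile at the next zoom level."""
    return [(zoom + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


parent = tile_for_point(52.52, 13.405, 10)
child = tile_for_point(52.52, 13.405, 11)
child_is_aligned = child in children(*parent)
```

Because each tile splits into exactly four aligned children, a point's tile at zoom z+1 is always one of the children of its tile at zoom z, so zooming in never shifts the bucket boundaries.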

We opened a PR that clearly documents the current limitations of the new BKD shapes, namely the lack of support for the CONTAINS relation and for multi-point queries.

Cross Cluster Search

We merged a new execution mode for cross cluster search. This mode minimizes the number of round trips between the clusters by sending a single request per cluster (instead of one per shard in the remote cluster). It will be activated by default in 7.0, but users who want to force the current behavior can opt out by setting the new ccs_minimize_roundtrips parameter to false.
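For illustration, opting out on a single search could look like the request path built below. The helper and the cluster alias and index names are hypothetical; the parameter is assumed to be passed as a query string parameter named ccs_minimize_roundtrips.

```python
from urllib.parse import quote, urlencode


def ccs_search_path(index_expressions, minimize_roundtrips=True):
    """Build a search path for local and remote index expressions.

    Remote indices use the "cluster_alias:index" form. Hypothetical helper.
    """
    target = ",".join(quote(expr, safe=":*-_.") for expr in index_expressions)
    params = {"ccs_minimize_roundtrips": "true" if minimize_roundtrips else "false"}
    return "/" + target + "/_search?" + urlencode(params)


# Force the per-shard behavior for one request.
path = ccs_search_path(["cluster_one:logs-*", "logs-*"], minimize_roundtrips=False)
```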

Vector field

We added two scoring functions that can be used with dense and sparse vector fields.

These functions (cosine similarity and dot product) are available through Painless and provide a simple way to compute the similarity or distance between two vectors.
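For reference, the two measures are defined as follows. This is a plain Python sketch of the math, not the Painless functions themselves; the function names are ours, and the sparse variant assumes a vector is represented as a mapping from dimension to value.

```python
import math


def dot_product(a, b):
    """Dot product of two dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors: dot(a, b) / (|a| * |b|)."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / (norm_a * norm_b)


def sparse_dot_product(a, b):
    """Dot product of two sparse vectors given as {dimension: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(value * b[dim] for dim, value in a.items() if dim in b)


dense = dot_product([1, 2, 3], [4, 5, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
sparse = sparse_dot_product({"1": 0.5, "5": 2.0}, {"5": 3.0, "9": 1.0})
```

The dot product grows with vector magnitude, while cosine similarity depends only on direction, which is why the two can rank the same documents differently.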

Console

We updated the console autocomplete specs to reflect the deprecation of the _xpack namespace in the APIs. We are also working on a big refactor/overhaul of console, moving it to React/EUI and investigating changing the editor from brace to Monaco.

Changes in Elasticsearch

Changes in 7.0:

  • fix a few versionAdded values in ElasticsearchExceptions 37877
  • BREAKING: Use mappings to format doc-value fields by default. 30831
  • BREAKING: Elasticsearch json logging 36833
  • Fail start on obsolete indices documentation 37786
  • Introduce ability to minimize round-trips in CCS 37828
  • BREAKING: Add ECS schema for user-agent ingest processor (#37727) 37984
  • Fix size of rolling-upgrade bootstrap config 38031
  • Deprecate minimum_master_nodes 37868
  • Switch default time format for ingest from Joda to Java for v7 37934
  • Geo: Fix Empty Geometry Collection Handling 37978
  • Enforce cluster UUIDs 37775
  • Restore a noop _all metadata field for 6x indices 37808
  • Add OS/architecture classifier to distributions 37881
  • BREAKING: Remove implicit index monitor privilege 37774
  • Step down as master when configured out of voting configuration 37802
  • Support merge nested Map in list for JIRA configurations 37634
  • deprecate types for watcher 37594

Changes in 6.7:

  • Fix ILM status to allow unknown fields 38043
  • Speed up converting of temporal accessor to zoned date time 37915
  • Update verify repository to allow unknown fields 37619
  • Log document id when MapperParsingException occurs 37800
  • Fix exit code for Security CLI tools 37956
  • HLRC: Fix strict setting exception handling 37247
  • Ingest node - user_agent, move device parsing to an object 38115
  • Soft-deletes policy should always fetch latest leases 37940
  • Correct argument names in update mapping/settings from leader 38063
  • Treat put-mapping calls with _doc as a top-level key as typed calls. 38032
  • Handle scheduler exceptions 38014
  • Fix Painless void return bug 38046
  • Ignore shard started requests when primary term does not match 37899
  • SQL: Implement FIRST/LAST aggregate functions 37936
  • Move update and delete by query to use seq# for optimistic concurrency control 37857
  • [API] spelling: java script (not JavaScript) 37057
  • Introduce ssl settings to reindex from remote 37527
  • Move watcher to use seq# and primary term for concurrency control 37977
  • Reduce flakiness of ccr recovery timeouts test 38035
  • ILM setPriority corrections for a 0 value 38001
  • Skip Shrink when numberOfShards not changed 37953
  • Fix ILM Lifecycle Policy to allow unknown fields 38041
  • Expose retention leases in shard stats 37991
  • Fix limit on retaining sequence number 37992
  • Inject Unfollow before Rollover and Shrink 37625
  • Give precedence to index creation when mixing typed templates with typeless index creation and vice-versa. 37871
  • Streamline S3 Repository- and Client-Settings 37393
  • Fix fetch source option in expand search phase 37908
  • Added ccr to xpack usage infrastructure 37256
  • Get Aliases with wildcard exclusion expression 34230
  • Add Seq# based optimistic concurrency control to UpdateRequest 37872
  • Close Index API should force a flush if a sync is needed 37961
  • Issue deprecation warning if TLSv1.0 is used without explicit config 37788
  • Sync retention leases on expiration 37902
  • Create snapshot role 35820
  • Introduce retention lease syncing 37398

Changes in 6.6:

  • SQL: Added SSL configuration options tests 37875
  • SQL: Skip the nested and object field types in case of an ODBC request 37948

Changes in Elasticsearch Management UI

Changes in 7.0:

  • Update xpack console specs 29506

Changes in 6.7:

  • Put back API integration tests for cross-cluster replication 29494
  • making badges clickable to filter in index management 29635
  • copy edit for set_priority action in ilm UI 29540

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.0:

  • Remove xpack reference 103

Changes in 6.7:

  • Allow disabling the cursor page size limit 105
  • Support for querying current user and catalog 104
  • Close cursor 102
  • DATE type initial support 100
  • CMake configuration for driver definition file 101

Changes in Rally

Changes in 1.0.4:

  • Ignore JSON logs for merge parts analysis 637
  • Correct recorder-based sampling interval 638

Apache Lucene

Boosting Synonyms

We updated the patch so that the synonym boost is also applied to impacts, which improves the precision of the computed maximum score.

Geo Land

We fixed a bug in WITHIN queries for LatLonShape polygons that cross the dateline and added a new LatLonShapePointQuery for searching all indexed LatLonShape fields with a target point or set of points. Both are necessary steps towards feature parity in Elasticsearch with the old PrefixTree indexing approach.

We opened an issue to discuss a new approach for segment merging on BKD trees (with dimension > 1). It uses radix selection at every node instead of sorting every dimension beforehand. Results on the issue look pretty good so far.
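To illustrate why selection can replace sorting, here is a minimal sketch of radix selection: it finds the k-th smallest value (for example the median used as a split point) by narrowing down one byte at a time, without fully ordering the data. This is a generic illustration, not Lucene's implementation; it assumes non-negative integers that fit in a fixed number of bytes.

```python
def radix_select(values, k, num_bytes=4):
    """Return the k-th smallest value (0-based) without sorting.

    Assumes non-negative integers that fit in `num_bytes` bytes.
    """
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    if any(v < 0 or v >> (8 * num_bytes) for v in values):
        raise ValueError("values must be non-negative and fit in num_bytes bytes")

    candidates = list(values)
    # Walk from the most significant byte to the least significant one.
    for byte_index in range(num_bytes - 1, -1, -1):
        shift = 8 * byte_index
        # Histogram of the current byte over the remaining candidates.
        counts = [0] * 256
        for v in candidates:
            counts[(v >> shift) & 0xFF] += 1
        # Find the bucket that contains rank k, adjusting k to be relative to it.
        for bucket, count in enumerate(counts):
            if k < count:
                break
            k -= count
        # Only values in that bucket can still be the answer.
        candidates = [v for v in candidates if (v >> shift) & 0xFF == bucket]
        if len(candidates) == 1:
            return candidates[0]
    # All remaining candidates are equal once every byte has been examined.
    return candidates[0]


data = [170, 45, 75, 90, 802, 24, 2, 66, 45]
median = radix_select(data, len(data) // 2)
```

Each pass only touches the candidates that survived the previous pass, so the work shrinks quickly compared to sorting every value on every dimension.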

Bugs Bugs Bugs

While integrating top-k hits retrieval in Elasticsearch, we found a bug that occurs when a boolean query has sub-clauses that report an infinite max score over the whole range of doc IDs but finite scores when asked about a narrower range of doc IDs. This issue is being addressed in time for the 8.0 release.

Backward compatibility policy

Lucene introduced hard checks so that version N cannot read indices that were created before version N-1. We proposed relaxing this check so that it is still possible to open old indices if their segments are on a supported codec, with restrictions that still need to be defined. We reached agreement that we want to move forward with exploring what it takes to relax this check in the near future.