22 June 2018

This Week in Elasticsearch and Apache Lucene - 2018-06-22

By Tom Callahan, Colin Goodheart-Smithe, Adrien Grand, Jay Modi, Boaz Leskes

Elasticsearch

Highlights

Using dots in aggregation names will be deprecated

We will begin deprecating the use of dots in aggregation names. Dots are used as a separator in buckets_path and in the ordering syntax, and aggregations with dots in their names have caused many bugs. Therefore, we intend to require that all aggregation names be free of dots. As this may affect a large number of users, we plan to deprecate dots in aggregation names in 6.4 but not prohibit them until 8.0, to give users enough time to modify their applications.
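To illustrate why dots are ambiguous, here is a minimal Python sketch of a simplified buckets_path parser. It assumes that `>` separates aggregation names and that the first `.` in the last path element starts a metric name (as in `price_stats.avg`); it ignores the `[key]` bucket syntax and is not Elasticsearch's actual parser. The function name and the aggregation names are hypothetical.

```python
from typing import List, Optional, Tuple

AGG_SEPARATOR = ">"     # separates parent and child aggregation names
METRIC_SEPARATOR = "."  # separates an aggregation name from a metric name


def parse_buckets_path(path: str) -> Tuple[List[str], Optional[str]]:
    """Hypothetical, simplified buckets_path parser (no [key] support).

    Returns the chain of aggregation names and the metric name, if any.
    """
    parts = path.split(AGG_SEPARATOR)
    # Treat the first dot in the final element as the start of the metric name.
    name, sep, metric = parts[-1].partition(METRIC_SEPARATOR)
    return parts[:-1] + [name], (metric if sep else None)


# Dot-free aggregation name: the path resolves as intended.
print(parse_buckets_path("sales_per_month>price_stats.avg"))
# → (['sales_per_month', 'price_stats'], 'avg')

# Aggregation named "price.stats": indistinguishable from aggregation "price"
# with metric "stats.avg".
print(parse_buckets_path("sales_per_month>price.stats.avg"))
# → (['sales_per_month', 'price'], 'stats.avg')
```

Whichever dot a parser treats as the separator, a name such as `price.stats` can be misread, which is one reason requiring dot-free names is simpler than trying to escape them.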

Rollups will use the “missing_bucket” option from the composite aggregation

We opened a PR to make rollups use the new “missing_bucket” option of the composite aggregation. This removes an important limitation where rollups returned bad doc counts when a job covered multiple non-overlapping schemas. One situation where this can occur is when different Beats, each with its own schema, write to the same index. With this enhancement, users should be able to configure a single "combined" job, which is more convenient and returns correct doc counts.
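The doc-count problem can be illustrated with a small Python simulation of composite-aggregation bucketing. By default, the composite aggregation ignores documents that have no value for one of its sources; setting `"missing_bucket": true` on a source puts such documents in a bucket with a `null` key instead. In this sketch the option is applied to all sources at once, the `composite_buckets` helper and field names are hypothetical, and paging and everything else the real aggregation does are left out.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Key = Tuple[Optional[str], ...]


def composite_buckets(docs: List[Dict[str, str]], sources: List[str],
                      missing_bucket: bool) -> Dict[Key, int]:
    """Count documents per combination of source values (simplified sketch)."""
    counts: Counter = Counter()
    for doc in docs:
        key = tuple(doc.get(field) for field in sources)
        if None in key and not missing_bucket:
            # Default behaviour: a document missing any source value is skipped.
            continue
        # With missing_bucket, a missing value becomes a None (null) key part.
        counts[key] += 1
    return dict(counts)


# Two hypothetical Beats with non-overlapping fields writing to one index.
docs = [
    {"host": "web-1", "cpu_state": "busy"},   # metrics-style document
    {"host": "web-1", "cpu_state": "idle"},
    {"host": "web-1", "http_method": "GET"},  # network-style document
]
sources = ["host", "cpu_state", "http_method"]

print(sum(composite_buckets(docs, sources, missing_bucket=False).values()))  # 0
print(sum(composite_buckets(docs, sources, missing_bucket=True).values()))   # 3
```

Without the option, a single combined job over all three fields drops every document, because each one lacks at least one field; with it, every document is counted exactly once.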

Reloadable Secure Settings

We merged and backported the work for reloadable secure settings. This work allows the secure settings stored in the keystore to be re-read on each node. During this reload, any component that has registered to be notified about updates to these settings will be notified. This initial work allows the discovery-ec2, repository-s3, repository-azure, and repository-gcs plugins to update the clients they use that depend on secure settings.
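The register-and-notify pattern described above can be sketched in a few lines of Python. Everything here is hypothetical (the `SecureSettingsRegistry` and `StorageClientHolder` names, the `storage.client.key` setting, and a plain dict standing in for the keystore); it only shows how a reload re-reads the settings and lets registered components rebuild their clients, not how Elasticsearch implements it.

```python
from typing import Callable, Dict, List

Settings = Dict[str, str]


class SecureSettingsRegistry:
    """Hypothetical registry that re-reads secure settings and notifies listeners."""

    def __init__(self, read_keystore: Callable[[], Settings]) -> None:
        self._read_keystore = read_keystore
        self._listeners: List[Callable[[Settings], None]] = []

    def register(self, listener: Callable[[Settings], None]) -> None:
        self._listeners.append(listener)

    def reload(self) -> None:
        settings = self._read_keystore()  # re-read the keystore on this node
        for listener in self._listeners:
            listener(settings)            # notify only registered components


class StorageClientHolder:
    """Hypothetical plugin component whose client depends on a secret."""

    def __init__(self, settings: Settings) -> None:
        self.client_key = settings["storage.client.key"]

    def on_reload(self, settings: Settings) -> None:
        # Rebuild the client with the new credentials instead of restarting the node.
        self.client_key = settings["storage.client.key"]


keystore = {"storage.client.key": "old-secret"}
registry = SecureSettingsRegistry(lambda: dict(keystore))
holder = StorageClientHolder(keystore)
registry.register(holder.on_reload)

keystore["storage.client.key"] = "new-secret"  # keystore updated on disk
print(holder.client_key)  # old-secret
registry.reload()
print(holder.client_key)  # new-secret
```

Components that never register are simply not notified, which matches the opt-in nature of this initial work.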

Build Improvements

We are making good progress pushing our build forward. In particular, we’ve worked through a variety of issues in order to get our build working with Gradle 4.8 and JDK11 in a feature branch.

Changelog

Changes in 5.6:

  • Ensure we don’t use a remote profile if cluster name matches #31331

Changes in 6.3:

  • [DOCS] Omit shard failures assertion for incompatible responses #31430
  • Security: fix joining cluster with production license #31341

Changes in 6.4:

  • In NumberFieldType equals and hashCode, make sure that NumberType is taken into account. #31514
  • Remove QueryCachingPolicy#ALWAYS_CACHE #31451
  • Add Delete Snapshot High Level REST API #31393
  • Preserve response headers on cluster update task #31421
  • Multiplexing token filter #31208
  • Add get stored script and delete stored script to high level REST API #31355
  • Skip get_alias tests for 5.x #31397
  • Avoid sending duplicate remote failed shard requests #31313
  • Fix defaults in GeoShapeFieldMapper output #31302
  • RestAPI: Reject forcemerge requests with a body #30792
  • Use system context for cluster state update tasks #31241
  • REST high-level client: add validate query API #31077
  • Expose lucene’s RemoveDuplicatesTokenFilter #31275
  • Add ingest-attachment support for per document indexed_chars limit #31352

Changes in 7.0:

  • Reload secure settings for plugins #31383
  • lower rollover-info version bound to 6.4 #31414
  • extend is-write-index serialization support to 6.4 #31415
  • Choose JVM options ergonomically #30684
  • BREAKING: Packaging: Remove windows bin files from the tar distribution #30596
  • BREAKING: Percentile/Ranks should return null instead of NaN when empty #30460

Lucene

Lucene 7.4.0

The vote has passed and the release will be announced shortly. Elasticsearch has already been upgraded to this final release.

Spatial code organization

There is an ongoing discussion about whether spatial code should live entirely in the lucene/spatial module, or whether the most commonly used parts should live in lucene/core. The current situation is not ideal, since what we consider the best way to index geo-points is in neither lucene/core nor lucene/spatial, but in lucene/sandbox.

Other