This Week in Elasticsearch and Apache Lucene - 2018-07-07

Elasticsearch

Highlights

We have added documentation for Painless script contexts. It covers each place in the Elasticsearch APIs where a script may be used, as well as the variables available in each of those contexts.

As part of our ingest node work, we have added a "bytes" processor that converts human-readable byte sizes (e.g., 1kb) to raw byte counts (e.g., 1024). The new processor has been merged and is targeted for the next 6.x minor release.
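To make the conversion concrete, here is a minimal Python sketch of the kind of parsing such a processor performs. It is an illustration only, not the processor's actual implementation: the function name `to_bytes`, the accepted unit list, and the restriction to whole numbers are all assumptions made for this example.

```python
import re

# Binary multipliers, so that "1kb" maps to 1024 as in the example above.
# The set of accepted units is an assumption made for this sketch.
_UNITS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
    "gb": 1024 ** 3,
    "tb": 1024 ** 4,
    "pb": 1024 ** 5,
}

# A whole number followed by a unit, with optional surrounding whitespace.
_PATTERN = re.compile(r"^\s*(\d+)\s*([a-z]+)\s*$")


def to_bytes(text):
    """Convert a human-readable size such as '1kb' to a raw byte count."""
    match = _PATTERN.match(text.lower())
    if match is None:
        raise ValueError(f"cannot parse byte size: {text!r}")
    number, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in byte size: {text!r}")
    return int(number) * _UNITS[unit]


print(to_bytes("1kb"))
print(to_bytes("3MB"))
```

The sketch rejects anything it cannot parse rather than guessing, which is usually the safer choice when the result is indexed as a number.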

We have opened a PR that will allow more flexibility in how fields are selected for inclusion in “all” queries. This will remove the current limitation that plugins cannot control whether or not fields are searched in an “all” query.

We have undertaken an effort to improve the testability and test coverage of our cloud platform integrations. Examples include cleaning up some repository-s3 tests, merging AwsS3Service and InternalAwsS3Service into a single S3Service class, and merging AzureStorageService and AzureStorageServiceImpl while cleaning up their tests.

We recently enhanced our support for AWS session tokens by adding support for 3-part credentials. With MFA-secured AWS access, you use your permanent (2-part) credentials plus the MFA code to obtain a different set of temporary (3-part) credentials, which permit access to the desired resources. Today, Elasticsearch can obtain temporary credentials from the EC2 metadata service, but the user cannot supply them directly, as would be needed outside of EC2. In 6.4.0, Elasticsearch gains support for three-part temporary credentials supplied by the user. This means that, via the repository-s3 plugin, it is possible to snapshot to and restore from an MFA-secured S3 bucket from outside of EC2.
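The difference between 2-part and 3-part credentials can be sketched in Python as follows. The `AwsCredentials` class and the `s3_client_secure_settings` helper are hypothetical names invented for this illustration, and the credential values are placeholders. The setting names follow the `s3.client.<name>.*` pattern used by the repository-s3 plugin; treat the exact names as an assumption and check the plugin documentation for your version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AwsCredentials:
    """Permanent credentials have two parts; temporary ones add a session token."""
    access_key: str
    secret_key: str
    session_token: Optional[str] = None  # only present for temporary credentials


def s3_client_secure_settings(creds, client="default"):
    """Map credentials to secure setting names (hypothetical helper)."""
    prefix = f"s3.client.{client}."
    settings = {
        prefix + "access_key": creds.access_key,
        prefix + "secret_key": creds.secret_key,
    }
    # The third part is only emitted when the credentials are temporary.
    if creds.session_token is not None:
        settings[prefix + "session_token"] = creds.session_token
    return settings


# Placeholder values, not real credentials.
permanent = AwsCredentials("PERMANENT_KEY", "PERMANENT_SECRET")
temporary = AwsCredentials("TEMP_KEY", "TEMP_SECRET", "TEMP_SESSION_TOKEN")

print(sorted(s3_client_secure_settings(permanent)))
print(sorted(s3_client_secure_settings(temporary)))
```

Because temporary credentials expire, whatever supplies them to Elasticsearch would need to refresh them before the session ends.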

Changes in 5.6:

  • Propagate mapping.single_type setting on shrinked index #31811

Changes in 6.3:

  • SQL: Allow long literals #31777
  • JDBC: Fix stackoverflow on getObject and timestamp conversion #31735
  • SQL: Fix incorrect message for aliases #31792
  • Watcher: Fix check for currently executed watches #31137

Changes in 6.4:

  • REST high-level client: add get index API #31703
  • Fix handling of points_only with term strategy in geo_shape #31766
  • Watcher: Consolidate setting update registration #31762
  • Fix not waiting for Netty ThreadDeathWatcher in IT #31758
  • Fix not waiting for Netty ThreadDeathWatcher in IT (#31758) #31789
  • Add analyze API to high-level rest client #31577
  • REST high-level client: add cluster get settings API #31706
  • Implemented XContent serialisation for GetIndexResponse #31675
  • Fixture for Minio testing #31688
  • ingest: Introduction of a bytes processor #31733
  • Fix coerce validation_method in GeoBoundingBoxQueryBuilder #31747
  • Add support for AWS session tokens #30414
  • resolveHasher defaults to NOOP #31723
  • Split CircuitBreaker-related tests #31659
  • Add write*Blob option to replace existing blob #31729
  • Watcher: Fix chain input toXcontent serialization #31721
  • Extend allowed characters for grok field names (#21745) (#31653) #31722

Changes in 7.0:

  • Remove support for deprecated StoredScript contexts #31394
  • Account for XContent overhead in in-flight breaker #31613
  • has_parent builder: exception message/param fix #31182

Lucene

Reclaiming deletes through merges

Today, the default merge policy, TieredMergePolicy, exposes an opaque 'reclaimDeletesWeight' parameter to configure how aggressively deletes should be reclaimed. Its value is used in the function that scores merges. Unfortunately, the values of this parameter have no intuitive meaning: all one can say is that larger values reclaim deleted documents more aggressively, at the expense of more I/O. There is a suggestion to replace it with a new 'indexPctDeletedTarget' parameter, which would define the maximum percentage of deleted documents that the index may have and is much easier to reason about.
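A percentage target is easier to reason about because it can be checked directly against index statistics. The Python sketch below illustrates that check only. It is not how TieredMergePolicy scores merges, and the function names and the (max_doc, deleted_docs) segment representation are assumptions made for this example.

```python
def pct_deleted(segments):
    """Percentage of deleted documents across all segments.

    `segments` is a list of (max_doc, deleted_docs) pairs, a simplified
    stand-in for per-segment statistics.
    """
    total_docs = sum(max_doc for max_doc, _ in segments)
    deleted_docs = sum(deleted for _, deleted in segments)
    if total_docs == 0:
        return 0.0
    return 100.0 * deleted_docs / total_docs


def exceeds_target(segments, index_pct_deleted_target):
    """True when the index holds more deletes than the target allows."""
    return pct_deleted(segments) > index_pct_deleted_target


# Two segments: 1000 docs with 100 deleted, 500 docs with 200 deleted.
segments = [(1000, 100), (500, 200)]
print(pct_deleted(segments))
print(exceeds_target(segments, 10))
print(exceeds_target(segments, 33))
```

With a target of this kind, an operator can state the trade-off in plain terms: a lower percentage means less wasted space in the index but more merge I/O.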

Other