This Week in Elasticsearch and Apache Lucene - 2019-10-21

Elasticsearch

API Keys UI

We have merged a new API Keys app into Management > Security in Kibana. This UI allows users with the manage_own_api_key permission to see and invalidate their own API keys (if they have any). It also allows admins with the manage_api_key permission to view and invalidate all users' API keys, and to see the realm and user that created each key.

Painless

We have added a new script context API. This API allows users to list all the script context names that are available in Elasticsearch. Next, the API will be extended to provide more information about the variables and methods available to each context, which we hope to use in a future debugger as well as for documentation generation.

Improving circuit breaker memory accounting

Previously, we merged a PR that reduced the number of doc value iterators generated when aggregating global ordinals from n^2 to 2n, which substantially lowers the memory used while running terms aggregations.
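To give a sense of scale, the short sketch below compares the two growth rates. The text above does not say what n counts, so treating it as the number of segments is an assumption made here for illustration only.

```python
# Illustration only: compare the old (n^2) and new (2n) iterator counts.
# Assumption: n is the number of segments; the text does not define n.

def iterators_before(n: int) -> int:
    """Doc value iterators generated under the old quadratic behaviour."""
    return n * n


def iterators_after(n: int) -> int:
    """Doc value iterators generated after the change."""
    return 2 * n


for n in (5, 20, 50):
    before, after = iterators_before(n), iterators_after(n)
    print(f"n={n}: {before} -> {after} iterators ({before / after:.1f}x fewer)")
```

The ratio between the two counts is n / 2, so the saving grows linearly with n.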

We are always looking for ways to improve our circuit breakers' accounting of memory. Running terms aggregations can accumulate a large number of cached TermsDict objects, each of which contains a BytesRef sized to the largest term in that segment. This memory usage is not currently tracked by the circuit breakers, and can be a non-trivial percentage of memory. Discussion is still ongoing as to whether this PR is the right approach, but we want to track this usage as we believe it's significant.
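A toy model may help show why untracked memory matters. The sketch below is hypothetical and greatly simplified: ToyBreaker, add_estimate, terms_dict_bytes, and all byte figures are invented for illustration and are not Elasticsearch classes or measurements. The point is only that a breaker can reject work based on nothing but the bytes it has been told about.

```python
# Toy model of memory accounting; not Elasticsearch's circuit breaker.
# Every name and number here is a hypothetical chosen for illustration.

class CircuitBreakingError(Exception):
    """Raised when the tracked estimate would exceed the limit."""


class ToyBreaker:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate(self, num_bytes, label):
        """Reserve num_bytes, or raise if that would cross the limit."""
        new_used = self.used_bytes + num_bytes
        if new_used > self.limit_bytes:
            raise CircuitBreakingError(
                f"[{label}] would use {new_used} bytes, limit is {self.limit_bytes}"
            )
        self.used_bytes = new_used


def terms_dict_bytes(cached_objects, largest_term_bytes):
    """One buffer per cached object, each sized to the largest term."""
    return cached_objects * largest_term_bytes


breaker = ToyBreaker(limit_bytes=1_000_000)
breaker.add_estimate(700_000, "aggregation buckets")

# Memory that exists either way; the breaker only sees it if it is reported.
untracked = terms_dict_bytes(cached_objects=2_000, largest_term_bytes=256)

try:
    breaker.add_estimate(untracked, "terms dictionaries")
    tripped = False
except CircuitBreakingError:
    tripped = True
```

In this model the request is rejected only once the cached buffers are reported. If they are never reported, the breaker believes 700,000 bytes are in use while the real figure is above the limit.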

API reference reformatting

We have wrapped up the formatting changes to the SLM APIs, so the work to reformat the API reference docs is now complete.

This means that the REST APIs section of the Elasticsearch reference guide now follows a consistent format throughout, which should help users more easily find the information they need when developing against the APIs.

That said, there will be some ongoing work to streamline the reference content. As noted in #48189, applying the template has highlighted some issues. We have updated the bulk API to address missing information, and we will fill any similar gaps in future changes.

Enrich

We have merged the enrich processor. This new ingest processor enriches documents at ingest time with data from another index. Users define an enrich policy that specifies which field to match on and which information from the source index to add to the incoming document. Example use cases for the enrich processor include:

  • Adding contact information, such as names and company information from known contacts, based on the email address used in a webinar sign-up form.
  • Adding product information to retail orders based on the product IDs in the order.
  • Identifying web services and vendors based on known IP addresses.
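The matching idea behind the first use case can be sketched in a few lines. This is a toy in-memory model, not the enrich processor's implementation or its API: the contact data, the policy keys, and the function names are all hypothetical, and the sketch assumes the incoming document uses the same field name as the source data.

```python
# Toy in-memory model of match-based enrichment; not Elasticsearch code.
# All data, field names, and policy keys below are hypothetical.

def build_lookup(source_docs, match_field):
    """Index the source documents by the value of the match field."""
    return {doc[match_field]: doc for doc in source_docs if match_field in doc}


def enrich(incoming, lookup, policy, target_field):
    """Return a copy of `incoming` with the policy's fields added on a match."""
    enriched = dict(incoming)
    match = lookup.get(incoming.get(policy["match_field"]))
    if match is not None:
        enriched[target_field] = {
            name: match[name] for name in policy["enrich_fields"] if name in match
        }
    return enriched


contacts = [
    {"email": "ada@example.com", "name": "Ada", "company": "Example Corp"},
    {"email": "lin@example.com", "name": "Lin", "company": "Sample Ltd"},
]
policy = {"match_field": "email", "enrich_fields": ["name", "company"]}
lookup = build_lookup(contacts, policy["match_field"])

signup = {"email": "ada@example.com", "webinar": "Intro to ingest"}
enriched_signup = enrich(signup, lookup, policy, target_field="contact")
unknown_signup = enrich({"email": "nobody@example.com"}, lookup, policy, "contact")
```

Here a sign-up with a known email address gains a contact object holding the name and company, while a sign-up with an unknown address passes through unchanged.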

Lucene

Lucene 8.3

We integrated a new snapshot of Lucene 8.3 into Elasticsearch, and it no longer causes regressions. The plan is to build a first release candidate today, so hopefully Lucene 8.3 will be released in the coming week or two.

Faster filtering on shapes

Our construction logic for KD trees assumes that dimensions are independent. However, for some of the data we index using KD trees, this is usually not the case. For instance, the bounding boxes we index tend to be small, so the minimum latitude is usually close to the maximum latitude, and the minimum longitude is usually close to the maximum longitude. We updated the building logic to regularly recompute the actual bounds of values on every dimension, which made filtering on geo shapes significantly faster.
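The effect of correlated dimensions can be seen in a small sketch. This is not Lucene's tree-building code: it is a toy with two dimensions (minimum and maximum latitude), invented data, and hypothetical function names. It contrasts bounds inherited from the parent after a split with bounds recomputed from the values that actually landed in the child.

```python
# Toy comparison of inherited vs. recomputed bounds after one KD split.
# Not Lucene code; data and function names are hypothetical.

def actual_bounds(points):
    """Per-dimension (min, max) computed from the points themselves."""
    dims = len(points[0])
    return [(min(p[d] for p in points), max(p[d] for p in points)) for d in range(dims)]


def split(points, dim):
    """Split the points into two halves around the median of `dim`."""
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def inherited_bounds(parent_bounds, dim, split_value):
    """Child bounds if only the split dimension is narrowed."""
    lo, hi = parent_bounds[dim]
    left, right = list(parent_bounds), list(parent_bounds)
    left[dim] = (lo, split_value)
    right[dim] = (split_value, hi)
    return left, right


# Small boxes: max latitude is always min latitude + 1, so dimensions correlate.
points = [(lat, lat + 1) for lat in range(0, 80, 10)]  # (min_lat, max_lat)

root_bounds = actual_bounds(points)
left, right = split(points, dim=0)
inherited_left, inherited_right = inherited_bounds(root_bounds, 0, right[0][0])
recomputed_left = actual_bounds(left)

print("inherited max_lat bounds for left child: ", inherited_left[1])
print("recomputed max_lat bounds for left child:", recomputed_left[1])
```

With the inherited bounds, a filter on maximum latitude above 50 cannot rule out the left child, because its maximum latitude range still spans the whole data set. With the recomputed bounds the child can be skipped without visiting its values.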

Other

Changes

Changes in Elasticsearch

Changes in 7.6:

  • Scripting: get context names REST API #48026
  • Cache all rest tests tasks so long as they don't use shared clusters #48161

Changes in 7.5:

  • [7.x] Fix build failures when docker-compose is unavailable #48166
  • Fix packaging tests on debian 10 #48138
  • Create test superuser when launching elasticsearch via 'run' task #48072
  • Add SLM support to xpack usage and info APIs #48096
  • Add enrich processor #48039
  • Update forbiddenapis to v2.7 #47969
  • SQL: Implement DATEDIFF function #47920
  • SQL: Fix arg verification for DateAddProcessor #48041
  • Make the run task honor tests.es properties #47860
  • Add cloudId builder to the HLRC #47868
  • Add Pause/Resume Auto-Follower APIs to High Level REST Client #47989
  • [ML][Transforms] signal listener early on task _stop failure #47954
  • Don't apply the plugin's reader wrapper in can_match phase #47816
  • Sequence number based replica allocation #46959
  • Add Pause/Resume Auto Follower APIs #47510
  • Remove uniqueness constraint for API key name and make it optional #47549
  • Add snapshot support to distribution download plugin #47837
  • SQL: Fix Nullability of DATEADD #47921
  • Add builder for distance_feature to QueryBuilders #47846
  • Enable ResolverStyle.STRICT for java formatters #46675
  • Shrink should not touch max_retries #47719

Changes in 7.4:

  • Drop stored scripts with the old style-id #48078
  • [7.5][Transform] prevent assignment if any node is older than 7.4 #48055
  • [ML][Transforms] fix bwc serialization with 7.3 #48021
  • Add build scan tag for all pull request builds #47889
  • [Monitoring] Add new cluster privilege now necessary for the stack monitoring ui #47871
  • Allow truncation of clean translog #47866

Changes in 6.8:

  • Do not prefix already prefixed format with 8 #48139
  • Slow log must use separate underlying logger for each index #47234
  • Fix refresh optimization for realtime get in mixed cluster #48151
  • Fix ILM HLRC Javadoc->Documentation links #48083
  • SQL: Fix issue with negative literals and parentheses #48113
  • Fix Bug in Azure Repo Exception Handling #47968
  • Avoid unneeded refresh with concurrent realtime gets #47895
  • Fix Rollover error when alias has closed indices #47839
  • Upgrade to lucene 7.7.2 #47901
  • Improve translog corruption detection #47873

Changes in Elasticsearch Hadoop Plugin

Changes in 8.0:

  • Fix spark test failures #1369

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.5:

  • update set of advertised supported time functions #190
  • Test: remove doc type #188

Changes in Rally

Changes in 1.4.0:

  • Allow definition of body in restore-snapshot operation #798
  • Add ability to restore from a snapshot #793
  • Remove compatibility layer for old mappings. #790
  • Let the runner determine progress #789