This Week in Elasticsearch and Apache Lucene - 2019-08-09

Elasticsearch

Enrich processor

Deleting an enrich policy that is still referenced by existing pipelines can lead to a nasty situation. Each enrich policy is tightly coupled to its specialized .enrich index, and deleting the policy also cleans up that backing .enrich index. Once the .enrich index is gone, any ingest pipeline that uses that enrichment will fail.

We exposed a new method in the ingest service that takes a pipeline ID and a processor class and checks whether a processor of that class exists in the pipeline. This will allow us to safely delete a policy (coming soon). Processors can also be nested (we call these wrapping processors) to an arbitrary depth, so the method checks nested processors as well. (#45156)
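Because nesting depth is arbitrary, the existence check is naturally recursive. The sketch below illustrates the idea with hypothetical Python stand-ins (`Processor`, `WrappingProcessor`, `CompoundProcessor`, `EnrichProcessor`, `SetProcessor`, `pipeline_has_processor`); these are assumptions for illustration, not Elasticsearch's Java ingest classes or the real method signature.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


# Hypothetical, simplified processor model (not Elasticsearch's real classes).
class Processor:
    pass


@dataclass
class WrappingProcessor(Processor):
    """Wraps exactly one inner processor (e.g. a conditional wrapper)."""
    inner: Processor


@dataclass
class CompoundProcessor(Processor):
    """Holds a list of processors, which may themselves be nested."""
    processors: List[Processor] = field(default_factory=list)


class EnrichProcessor(Processor):
    pass


class SetProcessor(Processor):
    pass


def contains_processor(processor: Processor, clazz: Type[Processor]) -> bool:
    """True if `processor` or anything nested inside it is an instance of `clazz`."""
    if isinstance(processor, clazz):
        return True
    if isinstance(processor, WrappingProcessor):
        return contains_processor(processor.inner, clazz)
    if isinstance(processor, CompoundProcessor):
        return any(contains_processor(p, clazz) for p in processor.processors)
    return False


def pipeline_has_processor(pipelines: Dict[str, Processor], pipeline_id: str,
                           clazz: Type[Processor]) -> bool:
    """Look up a pipeline by ID; an unknown ID counts as 'not used'."""
    root = pipelines.get(pipeline_id)
    return root is not None and contains_processor(root, clazz)
```

With a check like this, a policy delete could refuse to proceed while any pipeline still reports `True` for the enrich processor class.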

We also started the initial documentation for the enrich project. (#45084)

ILM

We merged the guide to using ILM with existing indices.

Hadoop

Today, when reading data with the ES scroll API, if the node handling the scroll is slow, the request times out and es-hadoop retries the same scroll request. This is problematic because scroll requests cannot be retried on other nodes, and it can lead to fewer documents being returned than expected, or to ES returning an error because a scroll ID is accessed multiple times. The fix is to fail the entire scroll on the es-hadoop side and start a new scroll. (#1330)
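The difference between "retry the page" and "restart the scroll" can be sketched as follows. This is a minimal illustration, assuming hypothetical `start_scroll`/`next_page` callables and a hypothetical `ScrollTimeout` exception in place of a real client; it also assumes that partial results from a failed attempt are simply discarded, which is a simplification of what es-hadoop actually does.

```python
from typing import Callable, List, Optional, Tuple


class ScrollTimeout(Exception):
    """Hypothetical error raised when a scroll request times out."""


# A page is (scroll_id, hits); an empty hits list means the scroll is exhausted.
Page = Tuple[Optional[str], List[dict]]


def read_all(start_scroll: Callable[[], Page],
             next_page: Callable[[str], Page],
             max_restarts: int = 3) -> List[dict]:
    """Read every hit; on a timeout, abandon the whole scroll and open a new one.

    The timed-out scroll ID is never re-sent, since re-sending it can skip
    pages or trigger an error for an ID that was already consumed.
    """
    for _ in range(max_restarts + 1):
        hits: List[dict] = []  # drop partial results from a failed attempt
        try:
            scroll_id, page = start_scroll()
            while page:
                hits.extend(page)
                scroll_id, page = next_page(scroll_id)
            return hits
        except ScrollTimeout:
            continue  # start over with a fresh scroll context
    raise RuntimeError(f"scroll failed after {max_restarts} restarts")
```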

ES request details in the UI

Part of the Elasticsearch UI team's mission is to use the UI to educate our users about Elasticsearch's APIs and to support users as they grow from beginners into power users. To further this mission, we're surfacing the Elasticsearch requests that underlie the various create and update actions users can take in our apps. Users can see what the UI is going to do under the hood and copy and paste these requests to execute them manually or incorporate them into their automated processes. This feature also makes it easier for consultants to demonstrate the UI and API to customers in parallel. So far we have added this feature to Index Lifecycles, the Rollup Job Wizard, Remote Clusters, and Cross-Cluster Replication, with more to come.

ES request details

Rollup UI gains clone and start immediately functionality

To help users do what they need to do quickly and accurately, we implemented clone functionality for Rollup Jobs. Kibana users can now also start a Rollup Job immediately when creating it.

Rollup UI

Painless

Users have asked for additional whitelisted utility methods, such as randomUUID. We had long resisted whitelisting this particular method because reads from /dev/random can block. This week, however, we added it, with the expectation that any users who run into hangs can force Java to use /dev/urandom instead.

Named queries

We started rewriting named queries on top of the Lucene Matches API. Today, named queries can be used to check whether a sub-clause matches in the top documents, but we'd like to extend this feature to also report the list of terms that matched. Since the Matches API can retrieve more than a list of terms, we are also making the rewrite extensible so that we can build a number of other features on it, including highlighting and a pluggable interface.

SQL

We are looking into improving extraction from source (and thus relying on doc values even less) and improving how time zones are handled across pagination.

We fixed a bug that prevented the CURRENT_DATE/CURRENT_TIME functions from working with date fields whose mapping uses a custom format. The culprit was the missing "format" parameter in the range query we generate, which made Elasticsearch fall back to the date field's own format when parsing the bounds.
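To make the failure mode concrete, here is a sketch of a range query body that carries its own "format". The helper `current_date_query` is hypothetical and written in Python for illustration; the SQL module builds its queries in Java, so this only mirrors the idea of the fix. It assumes the built-in `strict_date_optional_time` format, which parses ISO 8601 bounds regardless of a custom mapping pattern such as `dd/MM/yyyy`.

```python
from datetime import datetime, timedelta, timezone

# Built-in Elasticsearch date format that parses ISO 8601 strings.
ISO_FORMAT = "strict_date_optional_time"


def _iso_utc(dt: datetime) -> str:
    """Render an aware datetime as ISO 8601 in UTC with millisecond precision."""
    return dt.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def current_date_query(field: str, now: datetime) -> dict:
    """Range query matching `field` values that fall on the current UTC day.

    `now` must be timezone-aware. Sending "format" tells Elasticsearch how to
    parse gte/lt; without it the bounds are parsed with the field's mapped
    format, which breaks when that mapping uses a custom pattern.
    """
    start = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return {"range": {field: {"gte": _iso_utc(start), "lt": _iso_utc(end), "format": ISO_FORMAT}}}
```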

For the ODBC driver, we merged a PR that adds GUI configuration elements for the connection string options not already covered by the DSN editor. Another PR has been opened that introduces CBOR support. It stops short of encoding and decoding parameters and result-set values: it allows building a non-parameterized query and decoding the response object (whether it contains a result set or an error), but does not yet unpack the values inside a result set. The rest of the functionality will come in subsequent PRs, to keep the work in smaller chunks.

Peer recoveries

The feature branches for peer recovery retention leases have been merged into master and 7.x, with follow-ups directly going to those branches.

We added functionality to discard leases for peers where a file-based recovery is preferable to an operation-based recovery, even though an operation-based recovery would technically be feasible. File-based recoveries can be much quicker when there are many operations to transfer, and this is even more true on indices that are not append-only: in the extreme case, if you delete every document in your index while a replica is offline, then that's numDocs operations to replay, whereas a file-based recovery need only copy a trivial amount of data. We've settled on falling back to a file-based recovery when the number of operations to replay exceeds 10% of the index's document count, although that heuristic has yet to be validated.
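The 10% rule above reduces to a one-line comparison. The sketch below is illustrative only (the function name and signature are hypothetical); it uses integer arithmetic so the threshold is exact, and treats "exceeds" as strictly greater than.

```python
def prefer_file_based_recovery(ops_to_replay: int, doc_count: int,
                               threshold_percent: int = 10) -> bool:
    """True when replaying operations would exceed `threshold_percent` of the doc count.

    Integer math (ops * 100 > percent * docs) avoids floating-point edge cases.
    """
    return ops_to_replay * 100 > threshold_percent * doc_count
```

For the delete-everything case, the index's document count is zero afterwards, so any non-zero number of operations to replay tips the decision toward a file-based recovery.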

We fixed an ugly corner case in our replication model. The translog of a replica performing operation-based recovery is kept for two purposes: (1) retain history so that if a replica is promoted it can recover other copies using operation-based recovery; (2) allow us to move to a non-destructive peer recovery model in the future. However, we need to trim the translog before marking a recovering replica as in-sync, as otherwise stale operations above the primary's max sequence number can be resurrected when the replica becomes primary.
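A toy model of the trim step, assuming a translog is just a list of operations tagged with sequence numbers. The `Op` type and `trim_above` helper are hypothetical and bear no relation to the real translog API; the point is only which operations must go before the replica is marked in-sync.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    seq_no: int
    doc_id: str


def trim_above(translog: List[Op], primary_max_seq_no: int) -> List[Op]:
    """Keep only operations the primary also has (seq_no <= its max).

    Anything above the primary's max sequence number is stale; if it stayed
    in the replica's translog, promoting that replica could resurrect it.
    """
    return [op for op in translog if op.seq_no <= primary_max_seq_no]
```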

Reindex resilience

We are unifying the code that retries searches for remote and local reindex, laying the groundwork for making searches within reindex resilient.

Because the new resilient reindex builds on persistent tasks, reindex requests are processed internally in quite a different way from before. Providing backwards compatibility at the API level therefore requires some major refactoring. We extracted the reindexing code that both old- and new-style reindex will use, and improved the new code so that it interacts better with rethrottling.

Networking

We added support for configuring per-socket TCP keepalive options. This builds on JDK 11's ExtendedSocketOptions, which allows per-socket configuration of TCP keepalive on Linux and macOS. The advantage is that these options can now be configured through Elasticsearch settings rather than at the system level. For now they are disabled by default (i.e. they fall back to the OS behavior), but we would like to explore enabling them by default, in particular to enforce keepalive configurations that are better tuned for running ES.
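For readers unfamiliar with per-socket keepalive, the sketch below shows the same OS-level knobs (idle time, probe interval, probe count) that JDK 11 exposes as ExtendedSocketOptions.TCP_KEEPIDLE, TCP_KEEPINTERVAL and TCP_KEEPCOUNT, here set from Python's `socket` module. It is an analogy, not Elasticsearch code; the helper name `enable_keepalive` and the example values are assumptions, and each option is guarded because not every platform defines it.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int, interval_s: int, count: int) -> None:
    """Enable TCP keepalive on one socket and tune it where the OS exposes the options."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):        # Linux: idle seconds before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):     # macOS name for the idle time
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):       # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):         # failed probes before the connection drops
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

Because these settings apply to a single socket, they override the system-wide defaults only for connections that opt in, which is what makes configuring them from application settings attractive.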

We optimized reading Strings from the wire in our transport protocol. Instead of reading byte by byte from the underlying network buffers, which incurs overheads such as extra bounds checks, we now read chunk by chunk.
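The contrast between the two read strategies can be sketched with a hypothetical wire format: a 4-byte big-endian length followed by UTF-8 bytes. This is an assumption chosen for simplicity; Elasticsearch's actual StreamInput string encoding differs. The chunked reader performs one bounds check per chunk rather than per byte, and decodes only after joining the chunks so a multi-byte character split across a chunk boundary is still handled correctly.

```python
import struct
from typing import Tuple


def write_string(s: str) -> bytes:
    """Hypothetical format: 4-byte big-endian byte length, then UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack(">I", len(data)) + data


def read_string_bytewise(buf: bytes, pos: int) -> Tuple[str, int]:
    """Baseline: copy one byte at a time (each index is a separate checked access)."""
    (n,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    out = bytearray()
    for _ in range(n):
        out.append(buf[pos])
        pos += 1
    return out.decode("utf-8"), pos


def read_string_chunked(buf: bytes, pos: int, chunk: int = 1024) -> Tuple[str, int]:
    """Copy up to `chunk` bytes per step after a single up-front bounds check."""
    (n,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    end = pos + n
    if end > len(buf):
        raise EOFError("truncated string")
    parts = []
    while pos < end:
        step = min(chunk, end - pos)
        parts.append(buf[pos:pos + step])
        pos += step
    # Decode once at the end so multi-byte characters may straddle chunk boundaries.
    return b"".join(parts).decode("utf-8"), pos
```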

Dynamic realm configuration

We began work on the ability to configure security realms without requiring a node restart.

SMTP TLS for Watcher

Watcher has long supported custom truststores/CAs for HTTP requests, but had limited options when sending email to TLS-secured mail servers. We opened a PR to add SSL settings for Watcher SMTP.

Changes

Changes in Elasticsearch

Changes in 8.0:

  • Avoid unnecessary eager creation of Gradle tasks #45098

Changes in 7.4:

  • Upgrade to Netty 4.1.38 #45132
  • Only retain reasonable history for peer recoveries #45208
  • [SPATIAL] New ShapeQueryBuilder for querying indexed cartesian geometry #45108
  • removes the CellIdSource abstraction from geo-grid aggs #45307
  • Fix for build runtime classpath instability #45347
  • Remove loop counter from Reserved in Painless AST #45298
  • Add destructive task encapsulating all destructive packaging tasks #45315
  • Change the ingest simulate api to not include dropped documents #44161
  • Allow empty token endpoint for implicit flow #45038
  • Convert vagrant tests to per platform projects #45064
  • Add ingest processor existence helper method #45156
  • BREAKING: [ML-DataFrame] Combine task_state and indexer_state in _stats #45276
  • Include in-progress snapshot for a policy with get SLM policy API #45245
  • [ML][Data Frame] add support for geo_bounds aggregation #44441
  • [ML][Data Frame] Add update transform api endpoint #45154
  • Auto-release of read-only-allow-delete block when disk utilization fa… #42559
  • Decouple Painless AST Lambda generation from the grammar #45111
  • Avoid building docker images when running precommit task #45211
  • Fix Rollup job creation to work with templates #43943
  • Improve slow logging in MasterService #45086
  • Allow pipeline aggs to select specific buckets from multi-bucket aggs #44179
  • Add per-socket keepalive options #44055
  • Trim local translog in peer recovery #44756
  • Add more flexibility to MovingFunction window alignment #44360
  • Add description to force-merge tasks #41365
  • Use the full hash in build info #45163
  • Whitelist randomUUID in Painless #45148
  • Geo: add Geometry-based query builders to QueryBuilders #45058
  • Use index for peer recovery instead of translog #45136

Changes in 7.3:

  • SQL: uniquely named inner_hits sections for each nested field condition #45039
  • Restore DefaultShardOperationFailedException's reason during deserialization #45203
  • Fix clock used in update requests #45262
  • [ML][Data Frames] Fix null aggregation handling in indexer #45061
  • Fix failed dependency resolution with external build-tools users #45107
  • Enable reloading of synonym_graph filters #45135

Changes in 6.8:

  • Add build hash to global build info #45162
  • Improve SCM info in build scans #45264
  • Normalize environment paths #45179
  • Upgrade to JDK 12.0.2 #45172
  • Add OCI annotations and adjust existing annotations #45167
  • Lift build date to global build info #45166

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.4:

  • New "Misc" panel for various config options #164