This Week in Elasticsearch and Apache Lucene - 2019-01-25
Search as you type
We are working on a new field type called search_as_you_type. The goal of this new field is to allow fast completion queries at the cost of a bigger index. This field type builds shingles of multiple sizes at index time and adds a custom field that is filled with all prefixes (edge ngrams) of the various shingles. These additional fields (shingles and prefix field) can then be used to build efficient search-as-you-type queries. Coupled with the ability to not track the total hits of the query, this new field is a good alternative to the completion suggester. Unlike the completion suggester, this new field can be mixed with any other field present in the mapping to add filters or boosts to your suggestion queries, and it can handle infix matching out of the box. Also note that this field's goal is not to provide query suggestions: searches using this field will return the best documents that match the completion query.
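To make the indexing idea concrete, here is a minimal Python sketch of shingle and edge-ngram generation. It is only an illustration of the technique described above, not the actual field implementation: the function names, the whitespace tokenizer, and the shingle sizes (2 and 3) are assumptions made for this example.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Return word n-grams (shingles) of every size in [min_size, max_size]."""
    out = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def edge_ngrams(term, min_len=1):
    """Return all prefixes of a term, shortest first."""
    return [term[:end] for end in range(min_len, len(term) + 1)]


def index_terms(text):
    """Hypothetical helper: terms a search-as-you-type style field might store."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    grams = shingles(tokens)
    prefixes = set()
    for term in tokens + grams:
        prefixes.update(edge_ngrams(term))
    return {"tokens": tokens, "shingles": grams, "prefixes": prefixes}


terms = index_terms("quick brown fox")
print(terms["shingles"])               # ['quick brown', 'brown fox', 'quick brown fox']
print("brown f" in terms["prefixes"])  # True: a partial query starting mid-text still matches
```

Because prefixes are stored for shingles that start at any token, a partially typed query such as "brown f" becomes a plain term lookup, which is where both the speed and the larger index come from.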
We've opened a PR that completes support for authentication in the OIDC realm using the authorization code grant and the implicit flows as described in https://openid.net/specs/openid-connect-core-1_0.html.
Index Lifecycle Management
We added badges to highlight follower indices in the index management UI.
We fixed a bug / blocker that could cause ILM to enter an error state if certain steps are performed while a snapshot is in progress, and the fix has been backported to 6.7 (#37624). This change is large enough that we do not intend to backport it to 6.6 (ILM is beta in 6.6).
We merged the ILM unfollow action (#36970) and the Unfollow injection (#37625) is ready and just waiting to get past the great CI catastrophe of early this week. Once these are finished, we will be done with the CCR/ILM follower integration.
We merged the work to enable ILM to be used for watcher history. We are also working on a PR deprecating the old monitoring setting which will be removed in 8. #37624
Java Time Migration
We added tests for the [SQL protocol], to make sure no changes to the code affecting the protocol go unnoticed. Also, a seemingly easy [bug fix] for how the date intervals are displayed in the CLI turned into a wider change involving the code around CLI formatting.
We added a control parameter [odbc version] that allows relaxing the version check, to better support clients trying out newer drivers with older servers. The handling of the SQL API cursor has been refactored [odbc cursor] so that the cursor can grow to the arbitrary sizes seen in large queries. A minor change [odbc dsn] now provides a better error message when the ODBC native libraries fail the DSN name validation.
We are working on allowing SQL to perform client-side sorting of groups by aggregations.
We've been working on introducing the SQL runtime-only date data type in 2 steps, first renaming the existing date type (mapped to ES date) to [datetime], and then the actual implementation of [date].
We continued to bolster our geoshape benchmarks, and ported the geopoint benchmark over to geoshape. We have a long-term goal to move points over to shapes (since they are both BKD now), so this is a good test to start running so that we can compare performance. Initial tests show that points-on-shapes is about 13% slower at indexing, but more investigation is needed to confirm these results.
We discussed implementing a new geohash agg and deprecating the current one. However, we worked out a way to avoid this and instead refactor the existing agg. This refactoring work has begun (1, 2, 3, 4, 5), and the new quad key type will be following after.
We're continuing work on snapshot resiliency, fixing a race condition between snapshot creation and deletion, extending the test infrastructure for snapshotting while fixing further situations that lead to stuck snapshots and improving the handling of client-level repository settings.
We've tracked down the various places where a TLSv1 connection could be negotiated and have an implementation that enables deprecation logging if a TLSv1 connection is established.
Following the new _ccr/info API, we have finished the "Pause/Resume" UI for a follower index. We can now correctly list all the followers ("active" and "paused") in the table. We can't yet resume a follower index with its previous advanced settings due to https://github.com/elastic/elasticsearch/issues/37740#issuecomment-457257533.
We also added an advanced settings UI for follower indices, which allows for some fine-grained tuning of replication parameters for transfers from leader to follower.
We fixed a bug where a follower cluster might not have the right mapping to index the freshly-fetched operations, and are adding a safeguard to prevent users from making direct mapping updates to follower indices.
We fixed an issue where compression and ping interval settings were not dynamically updatable for remote clusters, which required extending the settings infrastructure to provide the ability to listen to a group of settings. With this new infrastructure, remote connections are closed and reopened when changes to the connection-related settings are made. We are also continuing work on the recovery from remote functionality, adding rate limiting on the leader cluster and integrating all the recovery from remote work behind the simple PUT /follower_index/_ccr/follow command.
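For readers who have not used the follow API, the sketch below builds (but does not send) such a request in Python. The host URL, the remote cluster alias "leader_cluster", and the index names are placeholders assumed for this example; only the endpoint path and the `remote_cluster` / `leader_index` body fields come from the CCR follow API itself.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: a local follower cluster


def build_follow_request(follower_index, remote_cluster, leader_index):
    """Build a PUT /<follower_index>/_ccr/follow request without sending it."""
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    return Request(
        url=f"{ES_URL}/{follower_index}/_ccr/follow",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder names; "leader_cluster" must match a configured remote cluster alias.
request = build_follow_request("follower_index", "leader_cluster", "leader_index")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually issue the call against a running cluster.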
We simplified the exception handling for situations where remote cluster connections are missing or broken, and added an internal hook that is called when a shard follow task is unable to fetch changes from the remote leader shard because the ops are no longer there.
Repurposing a master-eligible or a data node as a coordinating-only node can trigger an issue where it will start reimporting its stale state as dangling indices, even though this node can't allocate any shards. This seems to be a recurring issue for some users, which is why we're putting necessary safeguards into place, implemented through the following PRs:
- In ES 6.7, dangling indices import will be disabled on coordinating-only nodes, with warnings logged. This means that the node will potentially still have old data / cluster states on disk, but won't disrupt the cluster by importing it as dangling.
- In ES 7.0, starting up a non-data node with shard data will fail the node at startup and starting up a coordinating-only node with shard or index (meta-)data will fail as well.
Cleaning up the right data or metadata might not always be easy in case of such a transition. Follow-up work will therefore focus on tooling to help with these and other transitions, e.g., from mixed master/data nodes to dedicated masters.
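The ES 7.0 startup rule described above can be written down as a small decision function. This is a hedged Python sketch of the rule as stated in this post, not the actual node startup code; the function name, parameters, and messages are hypothetical, and other node roles are ignored.

```python
def startup_decision(master_eligible, data_node, has_shard_data, has_index_metadata):
    """Return 'start' or a failure reason, following the ES 7.0 rule in the text."""
    # Any non-data node that still holds shard data refuses to start.
    if not data_node and has_shard_data:
        return "fail: non-data node has shard data on disk"
    # A coordinating-only node must not hold index metadata either.
    coordinating_only = not master_eligible and not data_node
    if coordinating_only and has_index_metadata:
        return "fail: coordinating-only node has index metadata on disk"
    return "start"


# A former data node repurposed as coordinating-only, with stale data left behind:
print(startup_decision(False, False, True, True))
# A dedicated master keeps index metadata but no shard data, so it may start:
print(startup_decision(True, False, False, True))
```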
Both Zen1's and Zen2's cluster coordination layers are based on quorums. As long as two out of your three master-eligible nodes are available, they will be able to elect a master and process cluster state updates. Permanently losing a majority of master-eligible nodes, however, brings the cluster to a halt, and bears the risk of data loss. With only a single node out of the three original ones left, you don't know whether or not that remaining node will have seen the latest cluster state updates, as the two other nodes might have happily formed a cluster and processed cluster state updates before becoming permanently unavailable. While our preferred solution is to fully recreate the cluster from a backup using the snapshot/restore functionality (restoring the cluster to a good last-known state), a method of last resort is to unsafely recover the cluster based on the last remaining master node.
In Zen1, this unsafe recovery mechanism could just be achieved by starting up 1 or 2 more fresh master-eligible nodes, and the old node would happily be elected as a master again, making the cluster available again, with the caveat of silent data loss. In Zen2, we want to be much more explicit about steps which lead to data loss. As a consequence, the previous method of just starting a few fresh master-eligible nodes will not magically heal the cluster under silent data loss. Instead, we will require the cluster administrator to run a command line tool to unsafely (i.e. with risk of data loss) recover the cluster again, by changing the expected election quorum size on the last remaining master to just the node itself, thereby allowing this node to become master again without votes from anyone else (and other fresh master-eligible nodes can then join it). This tool will amply warn about the risk of data loss.
We've implemented the command line tool for unsafely bootstrapping a cluster again after losing a majority of master-eligible nodes, and are working on the corresponding docs.
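The quorum arithmetic behind this is small enough to sketch. The following Python snippet is an illustration only; the function names and node names are made up for this example and do not correspond to the tool or to Elasticsearch internals.

```python
def quorum_size(voting_nodes):
    """A strict majority of the voting configuration."""
    return len(voting_nodes) // 2 + 1


def can_elect_master(voting_nodes, available_nodes):
    """An election succeeds only if a majority of voting nodes is reachable."""
    votes = len(set(voting_nodes) & set(available_nodes))
    return votes >= quorum_size(voting_nodes)


voting = {"m1", "m2", "m3"}
print(can_elect_master(voting, {"m1", "m2"}))  # True: 2 of 3 is a majority
print(can_elect_master(voting, {"m1"}))        # False: the cluster is stuck

# Fresh nodes do not help, because they are not part of the voting configuration:
print(can_elect_master(voting, {"m1", "new1", "new2"}))  # False

# Unsafe bootstrap, conceptually: shrink the voting configuration to the survivor.
print(can_elect_master({"m1"}, {"m1"}))        # True
```

The last line shows why the operation is unsafe: the surviving node wins the election on its own vote, whether or not it saw the latest cluster state.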
We made cluster bootstrapping more flexible by just requiring a majority of nodes matching the initial_master_nodes and adapted auto-bootstrapping during a rolling upgrade so that we don't require configuring minimum_master_nodes during a rolling upgrade from 6.x to 7 anymore. This change will also enable us to now fail an ES 7 node at startup if it has minimum_master_nodes configured in its elasticsearch.yml file, as this setting is no longer needed in an ES 7 cluster.
While Zen2 does not necessarily require the master to be in the current voting configuration (i.e. the set of nodes whose votes matter in a master election), it is easier to understand the overall system if these two coincide. We've adapted the master election and voting reconfiguration logic so as to only bootstrap and elect a node in the current voting configuration as master, and to have the master abdicate to another master-eligible node in case it is reconfigured out of the voting configuration, for example through the use of voting configuration exclusions. We've also added an optimization to Zen2 to prioritize sending cluster state updates to master-eligible nodes first, so that cluster state updates are committed more quickly.
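The publication-ordering optimization can be pictured with a short Python sketch. The `Node` class and `publication_order` function are hypothetical names for this example; the underlying assumption is that a cluster state update is committed once enough master-eligible nodes have acknowledged it, so reaching those nodes first shortens the time to commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    master_eligible: bool


def publication_order(nodes):
    """Master-eligible nodes first; relative order is otherwise preserved."""
    # sorted() is stable and False sorts before True, so the key
    # "not master_eligible" moves master-eligible nodes to the front.
    return sorted(nodes, key=lambda node: not node.master_eligible)


nodes = [
    Node("data-1", False),
    Node("master-1", True),
    Node("data-2", False),
    Node("master-2", True),
]
print([node.name for node in publication_order(nodes)])
# ['master-1', 'master-2', 'data-1', 'data-2']
```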
Changes in Elasticsearch
Changes in 7.0:
- Add tool elasticsearch-node unsafe-bootstrap 37696
- Deprecate xpack.watcher.history.cleaner_service.enabled 37782
- Bubble exceptions up in ClusterApplierService 37729
- Use m_m_nodes from Zen1 master for Zen2 bootstrap 37701
- BREAKING: Remove index audit output type 37707
- saturation in script_score 37766
- Use ILM for Watcher history deletion 37443
- Streamline skip_unavailable handling 37672
- Fix edge case in PutMappingRequestTests 37665
- Remove kibana_user and kibana_dashboard_only_user index privileges 37441
- Fail start of non-data node if node has data 37347
- Bootstrap a Zen2 cluster once quorum is discovered 37463
- Upgrade to lucene-8.0.0-snapshot-83f9835. 37668
- Add support for merging multiple search responses into one 37566
- BREAKING: Remove Watcher Account "unsecure" settings 36736
Changes in 6.7:
- Deprecate HLRC EmptyResponse used by security 37540
- Analysis: Deprecate standard html analyzer in 6.x 37292
- Deprecate types in create index requests. 37134
- Do not allow put mapping on follower 37675
- Expose minimum_master_nodes in cluster state 37811
- Optimize warning header de-duplication 37725
- SQL: Fix BasicFormatter NPE 37804
- Add built-in user and role for code plugin 37030
- SQL: Introduce SQL DATE data type 37693
- Set acking timeout to 0 on dynamic mapping update 31140
- Deprecate index audit output type 37671
- Update authenticate to allow unknown fields 37713
- Deprecate types in get field mapping API 37667
- Make prepare engine step of recovery source non-blocking 37573
- Make sure PutMappingRequest accepts content types other than JSON. 37720
- Retry ILM steps that fail due to SnapshotInProgressException 37624
- Ensure either success or failure path for SearchOperationListener is called 37467
- SyncedFlushService.getShardRoutingTable() should use metadata to check for index existence 37691
- Expose sequence number and primary terms in search responses 37639
- Remove warn-date from warning headers 37622
- SQL: Return Intervals in SQL format for CLI 37602
- Un-assign persistent tasks as nodes exit the cluster 37656
- BREAKING: Follow stats api should return a 404 when requesting stats for a non existing index 37220
- Ensure that max seq # is equal to the global checkpoint when creating ReadOnlyEngines 37426
- Removes awaits fix as the fix is in. 37676
- Add note about how the body is referenced 33935
- Fix a test failure in CompositeRolesStoreTests 37661
- Fix Race in Concurrent Snapshot Delete and Create 37612
- Handle requiring versioned java home during execution 37599
- Permission for restricted indices 37577
- Watcher notification settings Upgrade checks 36907
- Update jdk used by the docker builds 37621
Changes in 6.6:
- Tests: disable testRandomGeoCollectionQuery on tiny polygons 37579
- Ensure changes requests return the latest mapping version 37633
- Always return metadata version if metadata is requested 37674
- Fix potential NPE in UsersTool 37660
- Do not set fatal exception when shard follow task is stopped. 37603
Changes in 6.5:
- SQL: Fix issue with complex expression as args of PERCENTILE/_RANK 37102
- Use explicit version for build-tools in example plugin integ tests 37792
Changes in Elasticsearch Management UI
Changes in 6.7:
- Update console ilm ccr 29183
- adding set priority action support to ILM UI 29205
- add follower badge to index management 29177
- Localize strings in Rollup and ILM apps. 29034
- Small improvements to add cluster page 29142
- Adding frozen indices support index management 28855
- i18n Index Management translate missing labels 28816
Changes in Elasticsearch SQL ODBC Driver
Changes in 6.7:
- SQLFetchScroll support 98
- Integration testing 96
- Add SQLEndTran API function 94
- Change type name from DATE to DATETIME 93
Changes in Rally
Changes in 1.0.3:
- Improve error message on missing repo directory 630
Lucene 7.7 and 8.0
We are proposing to release 7.7 and 8.0 in a row, starting on the week of February 4th. Elasticsearch 6.7 will be on Lucene 7.7 and Elasticsearch 7.0 will be on Lucene 8.0.
Backward compatibility policy
Lucene introduced hard checks that version N cannot read indices that have been created before version N-1. We're proposing to relax this check so that it is still possible to open old indices if their segments are on a supported codec, with restrictions that still need to be defined.
Jump tables for doc values
Sparse doc values used to be only able to advance by 65536 documents at a time, the size of a block. Jump tables were added so that one can advance directly to the right block. Blocks that are recorded as a bit set also have an additional rank index to make jumps within a block faster.
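The idea can be illustrated with a toy Python model. This is a simplified sketch, not Lucene's on-disk format: the class and attribute names are invented for this example, blocks are plain sorted lists, and the rank index within bit-set blocks is not modeled.

```python
from bisect import bisect_left

BLOCK_SIZE = 65536  # documents per block, as in the text


class SparseDocIds:
    """Toy model of a sparse doc ID set with a per-block jump table."""

    def __init__(self, doc_ids):
        # Stored blocks: (block_number, sorted doc IDs). Empty blocks are skipped.
        self.blocks = []
        for doc in sorted(doc_ids):
            block = doc // BLOCK_SIZE
            if not self.blocks or self.blocks[-1][0] != block:
                self.blocks.append((block, []))
            self.blocks[-1][1].append(doc)
        # Jump table: for each block number up to the last stored one, the
        # position in self.blocks of the first stored block at or after it.
        last_block = self.blocks[-1][0] if self.blocks else -1
        self.jump = []
        pos = 0
        for block in range(last_block + 1):
            while self.blocks[pos][0] < block:
                pos += 1
            self.jump.append(pos)

    def advance(self, target):
        """Return the first doc ID >= target, or None if there is none."""
        block = target // BLOCK_SIZE
        if block >= len(self.jump):
            return None
        pos = self.jump[block]  # direct jump instead of walking block by block
        while pos < len(self.blocks):
            docs = self.blocks[pos][1]
            i = bisect_left(docs, target)
            if i < len(docs):
                return docs[i]
            pos += 1
        return None


ids = SparseDocIds([3, 70000, 400000])
print(ids.advance(200000))  # 400000, found via one table lookup
```

Without the table, reaching the block that holds document 400000 would mean stepping over every stored block in between; with it, the lookup is a single array access.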
We made a proposal to support boosting query-time synonyms, which helps with terms that share some meaning even though they are not exactly synonyms. It works by scaling the term frequency for a synonym by its boost, similarly to how BM25F scores across multiple fields. This feature had been requested multiple times in the past.
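As a rough illustration of the scaling idea, the Python sketch below folds boosted synonym frequencies into one pseudo-frequency before applying the usual BM25 term-frequency saturation. The exact formula in the proposal is not given in this post, so treat the combination step, the function names, and the example boosts as assumptions; `k1` and `b` use the common BM25 defaults.

```python
def boosted_synonym_freq(term_freqs, boosts):
    """Combine per-term frequencies into one pseudo-frequency, scaled by boost."""
    return sum(boosts[term] * freq for term, freq in term_freqs.items())


def bm25_tf_component(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """The term-frequency part of BM25 (IDF left out for brevity)."""
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)


# A document that contains "car" twice and "automobile" once; the query term is
# "car" and "automobile" is treated as a weaker synonym (boost 0.5, hypothetical).
freq = boosted_synonym_freq({"car": 2, "automobile": 1}, {"car": 1.0, "automobile": 0.5})
print(freq)  # 2.5

with_synonym = bm25_tf_component(freq, doc_len=100, avg_doc_len=100)
without_synonym = bm25_tf_component(2, doc_len=100, avg_doc_len=100)
print(with_synonym > without_synonym)  # True: the synonym adds a partial contribution
```

Since the boost is applied before saturation, a weak synonym nudges the score up without counting as a full extra occurrence of the query term.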
- We are working on removing deprecations that we introduced in Lucene 8 and before from Lucene 9.
- We found a bug in the way that WANDScorer scales scores.
- We added support for intervals on multi-term queries such as wildcards.
- There are active discussions regarding how we could better support a disk-based terms index.
- We fixed a bug in Polygon2D#relateTriangle that made it think the polygon was disjoint from the triangle when it was actually fully within the triangle.
- We fixed WITHIN and DISJOINT when multi-polygons are indexed. It used to return false positives in case one polygon would be within/disjoint but another polygon would not.
- We're working on CONTAINS for BKD-backed shapes.