This Week in Elasticsearch and Apache Lucene - 2019-04-26
Elasticsearch
Watcher UI
A PR in code review will add support for threshold alert actions. Another PR has been merged that updates the threshold alert visualization with a chart from the elastic-charts library, and one more adds support for three new comparison types when building a threshold alert. We have also refined the UI of the watch history detail panel and added 403 and 404 error feedback.
Enrich Processor
We have continued to work on the enrich processor. The groundwork that allows the enrich processor to access a locally allocated shard has been merged into the feature branch. A new PR has been opened that adds an enrich processor to decorate documents based on exact lookups. This PR also adds a special field mapper that allows the values being decorated to be retrieved quickly.
We have some changes in the works to add get and delete enrich policy APIs and we have completed the put policy API. We also simplified the EnrichStore (internal helper used to store and access enrich policies), so that it can be used more easily in the various places where it is needed.
Consistent Settings
We are working on a feature that will allow us to ensure certain secure (keystore) settings are consistent across the whole cluster.
API Key privileges
There are now some additional cluster privileges that make API keys useful for users who do not have superuser/manage_security privileges in the cluster.
Block-Max Score
We have disabled the max-score optimization on queries that contain a mandatory scoring clause with an unbounded max score. Lucene 8 has the ability to skip blocks of non-competitive documents. However, some queries don't track their maximum score (script_score, span, ...), so they always return Float.POSITIVE_INFINITY as their maximum score. This can slow down some boolean queries if other clauses have bounded max scores.
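To see why an unbounded clause defeats block skipping, here is a minimal sketch in Python. It is a toy model, not Lucene's code: the function name `can_skip_block` and the sample scores are hypothetical, and it assumes the upper bound for a conjunction is the sum of the per-clause maximum scores.

```python
import math


def can_skip_block(clause_max_scores, min_competitive_score):
    """Toy model: a block can be skipped only if the best score any
    document in it could reach is below the current competitive threshold.

    Assumption: the upper bound for a conjunction of mandatory scoring
    clauses is the sum of the per-clause maximum scores.
    """
    upper_bound = sum(clause_max_scores)
    return upper_bound < min_competitive_score


# Both clauses report a bounded maximum: the block can be skipped.
print(can_skip_block([1.2, 0.8], min_competitive_score=3.0))

# One clause reports positive infinity (no tracked maximum):
# the upper bound is infinite, so the block can never be skipped.
print(can_skip_block([1.2, math.inf], min_competitive_score=3.0))
```

In this model a single clause with an infinite maximum makes every block look competitive, so the bookkeeping for the bounded clauses never pays off.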
Field Capabilities
The Field Capabilities API will now report fields that are missing in some indices in an "unmapped" section.
Following this change we have enhanced the index resolution in SQL to use field_caps not just for merging but also for individual table discovery (used inside metadata).
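The idea of reporting fields that are missing in some indices can be illustrated with a small Python sketch. This is a simplified model with hypothetical names (`field_caps_with_unmapped`, the sample index names) and does not reproduce the actual response format of the Field Capabilities API.

```python
def field_caps_with_unmapped(mappings):
    """Toy model of merging per-index mappings.

    mappings: dict of index name -> dict of field name -> field type.
    Returns: dict of field name -> dict of type -> list of indices,
    where indices that lack the field are listed under "unmapped".
    """
    all_fields = set()
    for fields in mappings.values():
        all_fields.update(fields)

    report = {}
    for field in sorted(all_fields):
        by_type = {}
        for index in sorted(mappings):
            field_type = mappings[index].get(field, "unmapped")
            by_type.setdefault(field_type, []).append(index)
        report[field] = by_type
    return report


# Hypothetical indices: "bytes" exists in only one of them.
caps = field_caps_with_unmapped({
    "logs-2019-03": {"host": "keyword", "bytes": "long"},
    "logs-2019-04": {"host": "keyword"},
})
print(caps["bytes"])
```

A consumer such as SQL index resolution can then tell a field that is absent from an index apart from a field that has conflicting types across indices.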
Querying frozen indices in SQL
SQL now supports frozen indices. In SQL you can now indicate that you want index resolution to include frozen indices in two ways:
- Through a dedicated FROZEN grammar extension (e.g. SELECT field FROM FROZEN index)
- Through a configuration parameter on the drivers: index.include.frozen: true
Conditional logic in SQL SELECTs
We finished the implementation of CASE. CASE is a powerful ANSI SQL expression that implements the IF ... THEN ... ELSE logic of programming languages. It can be used in the SELECT, WHERE, GROUP BY, ORDER BY and HAVING clauses. Here is an example:
SELECT count(*) AS count,
  CASE WHEN NVL(languages, 0) = 0 THEN 'zero'
       WHEN languages = 1 THEN 'one'
       WHEN languages = 2 THEN 'bilingual'
       WHEN languages = 3 THEN 'trilingual'
       ELSE 'multilingual'
  END AS lang_skills
FROM test_emp
GROUP BY lang_skills ORDER BY 2;
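For readers more used to procedural code, the same branching and grouping can be sketched in Python. The sample values in `rows` are made up for illustration and are not taken from the test_emp data set.

```python
from collections import Counter


def lang_skills(languages):
    """Mirror of the CASE expression above as an if/elif chain."""
    # NVL(languages, 0): treat a missing value as 0.
    n = 0 if languages is None else languages
    if n == 0:
        return "zero"
    elif n == 1:
        return "one"
    elif n == 2:
        return "bilingual"
    elif n == 3:
        return "trilingual"
    else:
        return "multilingual"


# Hypothetical "languages" column values, including a NULL (None).
rows = [None, 0, 1, 2, 2, 3, 5]

# GROUP BY lang_skills with count(*), then order by the label.
counts = Counter(lang_skills(value) for value in rows)
for label in sorted(counts):
    print(counts[label], label)
```

The first matching branch wins, just as in CASE, and the final `else` plays the role of the ELSE clause.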
Snapshot resiliency
The snapshot repository plugin testing for Azure is now on par with the testing we have for S3 and GCS. We have worked on adding third-party tests for Azure, i.e., CI jobs that run some of our snapshot/restore related tests against the actual Azure service instead of just mocks as we do in local test runs. This also completes the work to add CI capabilities that run basic snapshot/restore tests against the real infrastructure of the three major Cloud providers.
Building on the recent work that allows snapshot repositories to implement deletes more efficiently using bulk operations, we are switching the GCS repository to this new feature. It makes use of GCS's capability to batch deletes, which will significantly speed up snapshot deletions on a GCS repository. The same functionality was recently added for S3 as well.
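The benefit of bulk deletes comes from issuing one request per batch of blobs rather than one request per blob. The following Python sketch shows the general pattern only: `delete_blobs`, the `delete_batch` callback and the batch size of 100 are hypothetical and do not reflect the actual repository plugin code or any cloud SDK.

```python
def batched(names, batch_size):
    """Yield consecutive slices of at most batch_size names."""
    for start in range(0, len(names), batch_size):
        yield names[start:start + batch_size]


def delete_blobs(names, delete_batch, batch_size=100):
    """Delete blobs in batches and return the number of requests issued.

    delete_batch is a caller-supplied function standing in for a
    storage service's batch-delete call (an assumption of this sketch).
    """
    requests = 0
    for batch in batched(list(names), batch_size):
        delete_batch(batch)
        requests += 1
    return requests


# Record what would be sent instead of calling a real service.
sent = []
blob_names = ["blob-%d" % i for i in range(250)]
request_count = delete_blobs(blob_names, sent.append)
print(request_count, [len(batch) for batch in sent])
```

With 250 blobs and a batch size of 100, this issues three requests instead of 250 individual deletes.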
Cluster coordination
We worked with an external contributor to fix the sample configuration files in our docker-compose docs, which were missing a vital discovery.seed_hosts setting for all nodes. As a result, the cluster formed fine the first time but then failed to restart properly half of the time.
We have also adapted the warning message, with detailed info, that we log periodically when a node cannot discover or elect a master. The message now filters master-ineligible nodes out of the cluster state, as these nodes only add noise and confusion to the log message.
Lucene
Backward offsets
Users are complaining that Lucene's rejection of backward offsets, which started in 7.0, breaks their use case: they can no longer index compound terms as synonyms of their sub-terms while preserving offsets. This is unfortunate, but the ongoing discussion suggests that Lucene will keep rejecting backward offsets, since doing so makes it possible to implement some algorithms without backtracking and to encode offsets more efficiently.
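As a rough illustration of what rejecting backward offsets means, the Python sketch below validates a token stream the way an indexer might. It is a simplified model, not Lucene's implementation: `check_offsets` and the sample tokens are hypothetical, and it assumes the rule is that a token's start offset may not be smaller than the previous token's start offset.

```python
def check_offsets(tokens):
    """Validate (term, start_offset, end_offset) tuples in emission order.

    Raises ValueError if an offset is malformed or if a start offset
    goes backwards relative to the previous token.
    """
    last_start = 0
    for term, start, end in tokens:
        if start < 0 or end < start:
            raise ValueError("invalid offsets for %r: %d-%d" % (term, start, end))
        if start < last_start:
            raise ValueError(
                "offsets must not go backwards for %r: start=%d, last start=%d"
                % (term, start, last_start)
            )
        last_start = start


# Accepted: the compound token shares the start offset of its first part.
check_offsets([("wifi", 0, 5), ("wi", 0, 2), ("fi", 3, 5)])

# Rejected: the compound token is emitted after its parts, so its
# start offset (0) is behind the previous token's start offset (3).
try:
    check_offsets([("wi", 0, 2), ("fi", 3, 5), ("wifi", 0, 5)])
except ValueError as error:
    print("rejected:", error)
```

Because start offsets never decrease in an accepted stream, a consumer can process tokens in a single forward pass, which is the kind of no-backtracking guarantee the discussion refers to.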
Performance regressions with Java 11
Since Lucene master now requires Java 11, the benchmarks were upgraded to Java 11 as well, which triggered a slight slowdown. We suspect this is due to the change of the default garbage collector from ParallelGC to G1. The benchmarks will temporarily force the ParallelGC collector to determine whether the garbage collector is actually the cause.
Other
- Work is ongoing to align the Luwak codebase with Lucene to prepare the donation of Luwak to Lucene.
- We have javadocs for analysis components, but we generally don't document the names of these components, making them hard to use.
- An Elasticsearch user reported that MinHashFilter generates illegal unicode.
- Corner cases in the Tessellator were fixed.
- How should Lucene map segments to threads when configured to parallelize query execution?
Changes
Changes in Elasticsearch
Changes in 8.0:
- Make 0 as invalid value for min_children in has_child query #41347
Changes in 7.1:
- Disable max score optimization for queries with unbounded max scores #41361
- Deprecate support for first line empty in msearch API #41442
- Improve accuracy for Geo Centroid Aggregation #41033
- Disallow null/empty or duplicate composite sources #41359
- Peer recovery should not indefinitely retry on mapping error #41099
- SSLDriver can transition to CLOSED in handshake #41458
- Introduce aliases version #41397
- fix #35262 define deprecations of API's as a whole and urls #39063
- SQL: Implement IIF(<cond>, <result1>, <result2>) #41420
- SQL: Use field caps inside DESCRIBE TABLE as well #41377
- SQL: Implement CASE... WHEN... THEN... ELSE... END #41349
- Add ignore_above in ICUCollationKeywordFieldMapper #40414
- Move keystore-cli to its own tools project #40787
- Omit non-masters in ClusterFormationFailureHelper #41344
- Handle unmapped fields in _field_caps API #34071
Changes in 6.7:
- Fix Has Privilege API check on restricted indices #41226
- Fix role mapping DN field wildcards for users with NULL DNs #41343
- Reduce security permissions in CCR plugin #41391
- SQL: Fix bug with optimization of null related conditionals #41355
Changes in Elasticsearch Management UI
Changes in 6.7:
- [CCR] Retrieve paused state of follower index from ES instead of depending upon the client to provide it #35342
- [CCR] Allow user to use CCR when security is not enabled. #35333
Changes in Rally Tracks
- Update target throughput #74