This Week in Elasticsearch and Apache Lucene - 2019-05-10
Queryable Object Fields
We completed some benchmarks for embedded_json fields. In summary, the results look quite good: performance is close to the baseline of mapping each individual field separately. Now that we have a better idea of the performance, this concludes the work on the new field, and the branch should be merged into master soon.
We released Rally 1.1.0! This release contains quite a few enhancements, including support for ES 7.x as a metrics store and updates to support the 7.x APIs, plus some bug fixes and documentation for the ccr-stats telemetry device.
We also worked on making Rally safer by opening a PR that makes Rally fail fast if a user attempts to override track variables that don't exist. The existing behavior can be particularly dangerous, e.g. when a user mistypes `bulk_size` and specifies `--track-params="bulk-size:1000"`: the misspelled parameter is accepted without error but has no effect, so Rally ends up benchmarking the wrong scenario.
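A minimal sketch of the fail-fast idea follows. The function names, the simplified comma-separated `key:value` parser, and the way the track's declared variables are passed in are all assumptions for illustration, not Rally's actual code:

```python
# Hypothetical sketch: fail fast on --track-params keys the track does not declare.
# Function names and data shapes are illustrative, not Rally's implementation.


def parse_track_params(raw: str) -> dict:
    """Parse a comma-separated 'key:value' string such as 'bulk_size:1000,clients:8'."""
    params = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"malformed track parameter: {pair!r}")
        params[key.strip()] = value.strip()
    return params


def validate_track_params(declared: set, overrides: dict) -> None:
    """Raise if any override names a variable that the track never uses."""
    unknown = sorted(set(overrides) - declared)
    if unknown:
        raise ValueError(
            "unknown track parameters: " + ", ".join(unknown)
            + " (declared: " + ", ".join(sorted(declared)) + ")"
        )


if __name__ == "__main__":
    declared = {"bulk_size", "clients"}
    validate_track_params(declared, parse_track_params("bulk_size:1000"))  # accepted
    try:
        # The typo from the text: "bulk-size" instead of "bulk_size".
        validate_track_params(declared, parse_track_params("bulk-size:1000"))
    except ValueError as err:
        print(err)  # unknown track parameters: bulk-size (declared: bulk_size, clients)
```

Rejecting unknown keys up front turns a silent misconfiguration into an immediate error, before any benchmark time is spent.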
OpenID Connect Realm
The configuration guide is ready, and the Kibana PR that adds an OpenID Connect authentication provider to Kibana is under review, so all the pieces for OpenID Connect support in our stack are falling into place!
We are working on improving the Upgrade Assistant and the Deprecation Info API to include guidance for migrating from Joda-Time to java.time date patterns.
Say no to downgrades
Downgrading an Elasticsearch node to an earlier version is unsupported, because we do not make any attempt to guarantee that a node can read any of the on-disk data and metadata written by a future version. Yet today we do not actively prevent downgrades, and sometimes users will attempt to roll back a failed upgrade with an in-place downgrade and get into an unrecoverable state. David is adding checks that will prevent in-place downgrades, with an escape hatch that requires manual intervention.
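The core of such a check can be sketched as a version comparison at startup. Everything here is hypothetical (the function names, plain dotted version strings, and an `allow_downgrade` flag standing in for the manual escape hatch); it only illustrates comparing the version recorded on disk with the running node's version:

```python
# Hypothetical sketch of an in-place downgrade check; names and the
# allow_downgrade flag are illustrative, not Elasticsearch's implementation.


def parse_version(v: str) -> tuple:
    """Turn '7.1.0' into (7, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def check_not_downgrade(node_version: str, on_disk_version: str,
                        allow_downgrade: bool = False) -> None:
    """Refuse to start if the on-disk data was written by a newer version."""
    if parse_version(on_disk_version) > parse_version(node_version):
        if allow_downgrade:
            # Escape hatch: an operator explicitly accepted the risk.
            return
        raise RuntimeError(
            f"cannot start version {node_version} on data written by {on_disk_version}"
        )
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating "7.10.0" as older than "7.9.0".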
We are renaming the top-level reindex parameter `size`, which limits the maximum number of documents processed, to `max_docs`, since `size` caused confusion (scroll/batch size vs. hit size).
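A minimal sketch of a `_reindex` request body, written as a Python dict, shows the two meanings side by side once the rename lands. It assumes `max_docs` is accepted at the top level as described, and that `source.size` keeps its meaning as the number of documents fetched per scroll batch; the helper function itself is hypothetical:

```python
import json


def reindex_body(src_index: str, dest_index: str, max_docs: int,
                 batch_size: int = 1000) -> dict:
    """Build a _reindex body (hypothetical helper) that separates the two 'size' meanings."""
    return {
        # Upper bound on the total number of documents processed
        # (previously the top-level "size" parameter).
        "max_docs": max_docs,
        # Per-batch scroll size; this "size" is unaffected by the rename.
        "source": {"index": src_index, "size": batch_size},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    print(json.dumps(reindex_body("logs-old", "logs-new", max_docs=10000), indent=2))
```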
We opened a PR to merge the geosql branch into master! We spent the week on GeoSQL finishing touches: we merged the PR that switches the server side of GeoSQL from ShapeBuilder to the libs/geo classes, and we added the ST_Z function. We also fixed the reported display length for geo types, improved error messages, and documented the current limitations of geo in SQL (notably, geoshapes don't have doc values yet, geopoints are indexed with a potential loss of precision, and altitude is accepted but not indexed).
We addressed some backward compatibility issues. While these issues are technically outside our supported compatibility range, the fixes were relatively straightforward, and we want our users to have the best possible experience (#1287, #1284, #1283).
Snapshot Lifecycle Management
We finished the PR to persist SLM history. This will allow users to view the history of snapshots that have been taken and help them troubleshoot failures. It also opens the door to creating watches that notify someone if a snapshot fails. (#41707)
- LUCENE-7673 Much faster merging of BKD trees with high numbers of dimensions like shapes. Points and ranges benefit a bit from it too but to a much lesser extent.
- LUCENE-3041 A query visiting API.
- LUCENE-8652 Synonyms can be assigned different weights.
- LUCENE-8671 Easier configuration of whether to load the terms index on-heap or off-heap.
- LUCENE-8735 makes FilterDirectory propagate pending deletions. We currently have a workaround for this in Elasticsearch that we could remove in the future, as we did on master.
- LUCENE-8688 is a bug that makes `_forcemerge` merge too much when `max_num_segments` is not 1.
Java 11 slowdowns
As mentioned in the update from two weeks ago, we noticed slowdowns after the switch to Java 11, and our gut feeling was that they were caused by the change of the default garbage collector. A contributor fixed the nightly benchmarks to explicitly set the garbage collector back to the one used previously instead of relying on the default, which resolved most of the slowdowns.
- A bug was reported that appears to be a bug in IBM J9 rather than in Lucene.
- We are working on improving how units of work are computed when IndexSearcher is configured to search with multiple threads.
- We are exploring switching IntArrayDocIdSet#advance from binary search to exponential search, which is expected to perform better in worst-case scenarios.
- IndexWriter#deleteAll has concurrency issues because it tries to reset the schema. Note: Elasticsearch doesn't use this API.
- A contributor is exploring encoding FST arcs densely so that looking up a label would consist of an array lookup rather than a binary search.
- The community is suggesting adding a new rescorer implementation that is based on a collector rather than a query.
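To illustrate the IntArrayDocIdSet#advance idea from the list above, here is a minimal Python sketch of exponential (galloping) search over a sorted array of doc IDs, returning the position of the first doc ID at or above a target. It is not the Lucene patch (which is Java); the function name is hypothetical, and returning `len(docs)` stands in for Lucene's `NO_MORE_DOCS`:

```python
from bisect import bisect_left


def advance(docs: list, start: int, target: int) -> int:
    """Index of the first doc ID >= target at or after position start (len(docs) if none).

    Exponential search: probe start+1, start+2, start+4, ... until a doc ID >= target
    (or the end) is found, then binary-search only the bracketed range.
    """
    n = len(docs)
    if start >= n or docs[start] >= target:
        return start
    lo = start      # invariant: docs[lo] < target
    bound = 1
    while start + bound < n and docs[start + bound] < target:
        lo = start + bound
        bound *= 2
    hi = min(start + bound, n)  # docs[hi] >= target, or hi == n
    # The answer lies in (lo, hi]; bisect_left returns hi if nothing in [lo+1, hi) qualifies.
    return bisect_left(docs, target, lo + 1, hi)
```

A binary search over the whole remaining array costs O(log n) probes even when the target is only a few entries ahead, whereas this approach costs O(log d), where d is the distance from the current position to the result, which suits the common pattern of short forward jumps.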
Changes in Elasticsearch
Changes in 8.0:
- Throw exception if legacy interval cannot be parsed in DateIntervalWrapper #41972
- BREAKING: Removes typed endpoint from search and related APIs #41640
- Update TLS ciphers and protocols for JDK 11 #41808
- BREAKING: Cleanup versioned deprecations in analysis #41560
Changes in 7.2:
- Add painless string split function (splitOnToken) #39772
- SQL: Add support for FROZEN indices #41558
- fix unlikely bug that can prevent Watcher from restarting #42030
- Fix node close stopwatch usage #41918
- Limit max direct memory size to half of heap size #42006
- Fix IAE on cross_fields query introduced in 7.0.1 #41938
- Add HTML strip processor #41888
- Remove manual parsing of JVM options #41962
- Fix assertion error when caching the result of a search in a read-only index #41900
- Cleanup exceptions thrown during RollupSearch #41272
- Update lintian overrides #41561
- Add gradle plugin for downloading jdk #41461
- Make ISO8601 date parser accept timezone when time does not have seconds #41896
- Allow IDEA test runner to control number of test iterations #41653
- Always set terminated_early if terminate_after is set in the search request #40839
- Upgrade SDK and test discovery-ec2 credential providers #41732
- Allow unknown task time in QueueResizingEsTPE #41810
- BREAKING: Reject port ranges in discovery.seed_hosts #41404
- Update error message for allowed characters in aggregation names #41573
- Force selection of calendar or fixed intervals in date histo agg #33727
- Skip explain phase when only suggestions are requested #41739
- Add index name to cluster block exception #41489
- StackOverflowError when calling BulkRequest#add #41672
- Noop peer recoveries on closed index #41400
- Cut over ClusterSearchShardsGroup to Writeable #41788
- Adapt low-level REST client to java 8 #41537
Changes in 7.1:
- Fix fractional seconds for strict_date_optional_time #41871
Changes in 7.0:
- Handle serialization exceptions during publication #41781
Changes in 6.8:
- fix org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests #41777
- Allow reindexing into write alias #41677
- SQL: Remove CircuitBreaker from parser #41835
- Remove Harmful Exists Check from BlobStoreFormat #41898
- Add tasks to build Docker build context artifacts #41819
- Log warning when unlicensed realms are skipped #41778
- Enforce that Maven/Ivy repositories use https #41812
Changes in Elasticsearch Management UI
Changes in 7.2:
- Reenable Rollup Jobs API test that was failing due to interval change in ES #36310
Changes in 6.8:
- Update license management copy related to security #36204
Changes in Elasticsearch SQL ODBC Driver
Changes in 7.2:
- Fix integration testing app to run on non-Win platforms. Update ES setup sequence #153