This Week in Elasticsearch and Apache Lucene - 2019-05-10

Elasticsearch Highlights

Queryable Object Fields

We completed some benchmarks for embedded_json fields. In summary, the results look quite good: performance is not too different from the baseline of mapping each individual field separately. This should conclude the work on this new field, and the branch should be merged into master soon now that we have a better idea of the performance.

Performance

We released Rally 1.1.0! This release contains quite a few enhancements, including support for ES 7.x as a metrics store, updates to support 7.x APIs, some bug fixes, and docs for the ccr-stats telemetry device.

We also worked on making Rally safer by opening a PR that makes Rally fail fast if a user attempts to override track variables that don't exist. The existing behavior can be particularly dangerous: for example, if a user means bulk_size but, due to a typo, specifies --track-params="bulk-size:1000", the typo goes unnoticed and the user ends up benchmarking the wrong scenario.
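The fail-fast idea can be sketched as follows. This is a minimal illustration, not Rally's actual code: the function name check_track_params and the set of known parameters are hypothetical, and we assume the set of parameters a track references is available up front.

```python
def check_track_params(user_params, known_params):
    """Raise if the user overrides a parameter that the track never references.

    Both arguments are hypothetical stand-ins: user_params is the parsed
    --track-params mapping, known_params the names the track actually uses.
    """
    unknown = sorted(set(user_params) - set(known_params))
    if unknown:
        raise ValueError(
            "unknown track parameter(s): " + ", ".join(unknown)
            + "; known parameters: " + ", ".join(sorted(known_params))
        )


# Hypothetical track that only understands these two parameters.
known = {"bulk_size", "number_of_replicas"}

check_track_params({"bulk_size": 1000}, known)      # valid override, no error

try:
    check_track_params({"bulk-size": 1000}, known)  # typo: hyphen instead of underscore
except ValueError as err:
    print(err)
```

Failing before the benchmark starts is the point of the design: a mistyped override is reported immediately instead of producing results for a scenario nobody intended to measure.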

OpenID Connect Realm

The configuration guide doc is ready and the Kibana PR that adds an OpenID Connect authentication provider is undergoing review, so all the pieces for OpenID Connect support in our stack are falling into place!

Java Time

We are working on improving the Upgrade Assistant and Deprecation Info API to include guidance for migrating from Joda to java.time patterns.

Say no to downgrades

Downgrading an Elasticsearch node to an earlier version is unsupported, because we do not make any attempt to guarantee that a node can read any of the on-disk data and metadata written by a future version. Yet today we do not actively prevent downgrades, and sometimes users will attempt to roll back a failed upgrade with an in-place downgrade and get into an unrecoverable state. David is adding checks that will prevent in-place downgrades, with an escape hatch that requires manual intervention.
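As a rough illustration of such a check (not the actual Elasticsearch implementation), the sketch below compares the version recorded with the on-disk data against the version of the starting node. The function names and the allow_downgrade flag are hypothetical; the flag merely stands in for whatever manual escape hatch is provided.

```python
def parse_version(text):
    """Turn a dotted version such as '7.1.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_no_downgrade(on_disk_version, node_version, allow_downgrade=False):
    """Refuse to start when the data was written by a newer version.

    allow_downgrade is a hypothetical stand-in for the manual escape hatch.
    """
    if parse_version(on_disk_version) > parse_version(node_version):
        if not allow_downgrade:
            raise RuntimeError(
                "cannot downgrade a node from version " + on_disk_version
                + " to version " + node_version
            )


check_no_downgrade("7.0.1", "7.1.0")    # upgrade: allowed
check_no_downgrade("7.1.0", "7.1.0")    # same version: allowed

try:
    check_no_downgrade("7.1.0", "7.0.1")  # in-place downgrade: rejected
except RuntimeError as err:
    print(err)
```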

Reindex

We are renaming the top-level reindex parameter size for limiting the maximum number of documents processed to max_docs, since size caused confusion (scroll/batch size vs hit size).
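To make the distinction concrete, here is a small sketch that builds a reindex request body under the new naming. It assumes the rename lands as described above; the helper reindex_body and the index names are made up for illustration. The size inside source is the scroll batch size, while the top-level max_docs caps the total number of documents processed.

```python
import json


def reindex_body(source_index, dest_index, max_docs=None, batch_size=None):
    """Build a reindex request body (hypothetical helper, for illustration only)."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    if batch_size is not None:
        # Scroll batch size: how many documents are fetched per batch.
        body["source"]["size"] = batch_size
    if max_docs is not None:
        # Upper bound on the total number of documents processed
        # (the parameter that used to be the top-level "size").
        body["max_docs"] = max_docs
    return body


print(json.dumps(reindex_body("logs-old", "logs-new", max_docs=1000, batch_size=500), indent=2))
```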

Frozen SQL

We worked on a PR to support frozen indices in SQL. This work revealed a small bug in Elasticsearch that we fixed, and we adapted the code in ODBC to accommodate the SQL work with frozen indices.

GeoSQL

We opened a PR to merge the geosql branch into master! We spent the week on GeoSQL finishing touches: we merged the PR that switches the server side of GeoSQL from ShapeBuilder to libs/geo classes and added the ST_Z function. We also fixed the reported display length for geo types, improved error messages, and documented the current limitations for geo in SQL (notably, geoshapes don't have doc values yet, geopoints are indexed with a potential loss of precision, and altitude is accepted but not indexed).

Hadoop

We addressed some backward compatibility issues. While these issues are technically beyond our supported compatibility, the fixes were relatively straightforward and we want our users to have the best possible experience. (#1287) (#1284) (#1283)

Snapshot Lifecycle Management

We finished up the PR to persist SLM history. This will allow users to view the history of snapshots that have taken place and help troubleshoot failures. It also opens the door to creating watches that notify someone if a snapshot fails. (#41707)

Watcher UI

We are working on bug fixes that will get us over the finish line on the Angular to React migration.

Apache Lucene

Lucene 8.1

A second release candidate of Lucene 8.1 is under vote. It includes several changes we are interested in, notably:
  •  LUCENE-7673 Much faster merging of BKD trees with high numbers of dimensions like shapes. Points and ranges benefit a bit from it too but to a much lesser extent.
  •  LUCENE-3041 A query visiting API.
  •  LUCENE-8652 Synonyms can be assigned different weights.
  •  LUCENE-8671 Easier configuration of whether to load the terms index on- or off-heap.


Lucene 7.7.2

We plan to start working on releasing 7.7.2 once 8.1 is out. It already has 2 bug fixes that we are interested in for Elasticsearch:
  •   LUCENE-8735 makes FilterDirectory propagate pending deletions. We currently have a workaround for it in Elasticsearch that we could remove in the future, like we did on master.
  •   LUCENE-8688 is a bug that makes `_forcemerge` merge too much when `max_num_segments` is not 1.

Java 11 slowdowns

If you remember the update from 2 weeks ago, we got curious about slowdowns after we switched to Java 11, and our gut feeling was that they were due to the change of garbage collector. A contributor fixed nightly benchmarks to set the garbage collector back to the one that was used previously instead of relying on defaults, which resolved most slowdowns.

Other

  •  A bug was reported that appears to be a bug in IBM J9.
  •  We are working on improving the way that working units are computed when IndexSearcher is configured to search with multiple threads.
  •  We are exploring switching IntArrayDocIdSet#advance from binary search to exponential search, which is expected to perform better in worst-case scenarios.
  •  IndexWriter#deleteAll has concurrency issues due to the fact that it tries to reset the schema. Note: Elasticsearch doesn't use this API.
  •  A contributor is exploring encoding FST arcs densely so that looking up a label would consist of an array lookup rather than a binary search.
  •  The community is suggesting adding a new rescorer implementation that is based on a collector rather than a query.
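
To illustrate the exponential search idea mentioned above for IntArrayDocIdSet#advance, here is a minimal sketch over a sorted list of doc IDs. It is not Lucene's code: the function advance and its signature are assumptions made for this example. The search gallops forward from the current position in doubling steps, then runs a binary search only inside the last step, so the cost depends on how far ahead the target is rather than on the size of the whole array.

```python
from bisect import bisect_left


def advance(docs, start, target):
    """Return the index of the first entry >= target at or after start.

    docs is a sorted list of doc IDs; returns len(docs) if no entry qualifies.
    """
    n = len(docs)
    if start >= n:
        return n
    if docs[start] >= target:
        return start
    # Gallop: double the step until we overshoot the target or the array end.
    bound = 1
    while start + bound < n and docs[start + bound] < target:
        bound *= 2
    # docs[start + bound // 2] is known to be < target, so the answer lies
    # in the half-open window right after it.
    lo = start + bound // 2 + 1
    hi = min(start + bound, n)
    return bisect_left(docs, target, lo, hi)


docs = [1, 3, 5, 7, 9, 11, 13]
print(advance(docs, 0, 7))    # index of doc 7
print(advance(docs, 2, 6))    # first doc >= 6 starting from index 2
print(advance(docs, 0, 100))  # no such doc: returns len(docs)
```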

Changes in Elasticsearch

Changes in 8.0:

  • Throw exception if legacy interval cannot be parsed in DateIntervalWrapper #41972
  • BREAKING: Removes typed endpoint from search and related APIs #41640
  • Update TLS ciphers and protocols for JDK 11 #41808
  • BREAKING: Cleanup versioned deprecations in analysis #41560

Changes in 7.2:

  • Add painless string split function (splitOnToken) #39772
  • SQL: Add support for FROZEN indices #41558
  • fix unlikely bug that can prevent Watcher from restarting #42030
  • Fix node close stopwatch usage #41918
  • Limit max direct memory size to half of heap size #42006
  • Fix IAE on cross_fields query introduced in 7.0.1 #41938
  • Add HTML strip processor #41888
  • Remove manual parsing of JVM options #41962
  • Fix assertion error when caching the result of a search in a read-only index #41900
  • Cleanup exceptions thrown during RollupSearch #41272
  • Update lintian overrides #41561
  • Add gradle plugin for downloading jdk #41461
  • Make ISO8601 date parser accept timezone when time does not have seconds #41896
  • Allow IDEA test runner to control number of test iterations #41653
  • Always set terminated_early if terminate_after is set in the search request #40839
  • Upgrade SDK and test discovery-ec2 credential providers #41732
  • Allow unknown task time in QueueResizingEsTPE #41810
  • BREAKING: Reject port ranges in discovery.seed_hosts #41404
  • Update error message for allowed characters in aggregation names #41573
  • Force selection of calendar or fixed intervals in date histo agg #33727
  • Skip explain phase when only suggestions are requested #41739
  • Add index name to cluster block exception #41489
  • StackOverflowError when calling BulkRequest#add #41672
  • Noop peer recoveries on closed index #41400
  • Cut over ClusterSearchShardsGroup to Writeable #41788
  • Adapt low-level REST client to java 8 #41537

Changes in 7.1:

  • Fix fractional seconds for strict_date_optional_time #41871

Changes in 7.0:

  • Handle serialization exceptions during publication #41781

Changes in 6.8:

  • fix org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests #41777
  • Allow reindexing into write alias #41677
  • SQL: Remove CircuitBreaker from parser #41835
  • Remove Harmful Exists Check from BlobStoreFormat #41898
  • Add tasks to build Docker build context artifacts #41819
  • Log warning when unlicensed realms are skipped #41778
  • Enforce that Maven/Ivy repositories use https #41812

Changes in Elasticsearch Management UI

Changes in 7.2:

  • Reenable Rollup Jobs API test that was failing due to interval change in ES #36310

Changes in 6.8:

  • Update license management copy related to security #36204

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.2:

  • Fix integration testing app to run on non-Win platforms. Update ES setup sequence #153