15 February 2019

This Week in Elasticsearch and Apache Lucene - 2019-02-15

By Daniel Mitterdorfer, Jake Landis, Jay Modi, Bill McConaghy, Adrien Grand, Paul Sanwald, Zachary Tong, Jim Ferenczi, Yannick Welsch

Elasticsearch Highlights

API Keys

We added the ability to get the enabled status for both the token service and API key service to the xpack usage API.
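As a rough sketch of how these new fields might be consumed, the Python snippet below pulls the two flags out of a parsed xpack usage response. The field paths (security.token_service.enabled and security.api_key_service.enabled) and the sample payload are assumptions made for illustration; check the output of GET /_xpack/usage on your own cluster before relying on them.

```python
import json

# Illustrative payload only: the exact shape of the usage response is an
# assumption here, not copied from a real cluster.
SAMPLE_USAGE = json.loads("""
{
  "security": {
    "token_service": {"enabled": true},
    "api_key_service": {"enabled": false}
  }
}
""")


def service_status(usage):
    """Return the enabled flags for the token and API key services.

    Missing sections are reported as None instead of raising, since a
    response from an older version may not include these fields.
    """
    security = usage.get("security", {})
    return {
        name: security.get(name, {}).get("enabled")
        for name in ("token_service", "api_key_service")
    }


print(service_status(SAMPLE_USAGE))
```

Returning None for a missing section keeps the helper usable against clusters that predate the change.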

Zen2

Zen2 is ready for release and has met all goals for 7.0. Follow-up work focuses on extending the documentation, clean-ups, benchmarks, and promotional and educational material. We are investigating how Zen2 behaves in large clusters up to 100 nodes. This work is based on the Stack/Terraform project, which has recently seen a lot of use for Cross Cluster Replication as well. Scalability tests focus on master elections, general cluster formation and cluster state publishing performance. Typical setups (3 dedicated master nodes) show sub-second elections, and even larger setups (e.g. 51 master-eligible nodes, which is definitely not a setup we would recommend) still show that elections are possible, even though they're slower. Additional testing for typical setups (e.g. 3 dedicated master nodes and 100 data nodes) as well as large cluster states is in the works.

Build

Deprecations

We deprecated support for using the low-level REST client on JDK 7 in 6.7.0 and removed it in 7.0.0. JDK 7 has been EOL since April 2015, and JDK 8 was marked as EOL in January 2019. Removing support for JDK 7 enables us to use modern language features when developing and maintaining the client.

Ingest node

We added support for the ecs flag in the user_agent processor in 7.0. This flag allows the user_agent processor to emit Elastic Common Schema compatible output. The flag was introduced in 6.7 and was originally scheduled to be removed in 7.0. However, to help Beats consumers with the migration from 6.7 to 7.0, the flag is preserved in 7.0, where it defaults to true (use ECS output). The flag will be removed in 8.0.
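For illustration, here is a minimal Python sketch that builds an ingest pipeline body using the user_agent processor with the ecs flag set explicitly. The helper name, the source field name "agent" and the description text are hypothetical; only the user_agent processor and its ecs option come from the text above.

```python
import json


def user_agent_pipeline(field="agent", ecs=True):
    """Build an ingest pipeline body with a single user_agent processor.

    `field` is a hypothetical source field name. `ecs` mirrors the flag
    described above and is passed explicitly so the choice stays visible
    even though 7.0 defaults it to true.
    """
    return {
        "description": "Parse user agent strings (illustrative)",
        "processors": [
            {"user_agent": {"field": field, "ecs": ecs}}
        ],
    }


# The body could be sent with PUT _ingest/pipeline/<name>; here it is only printed.
print(json.dumps(user_agent_pipeline(), indent=2))
```

Setting the flag explicitly means a pipeline definition behaves the same on 6.7 and 7.0, whatever the default is.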

Geo Performance

We have been working on a new radix partitioning for BKD segment merging, which gave great improvements to indexing throughput and temp disk usage. We have continued to optimize the algorithm by re-using objects when working on-heap, netting an additional 12-17% improvement in indexing throughput.

We merged the geopoints-as-geoshape benchmark into Rally.

We have been working on creating new nightly geo_shape benchmarks specifically for comparing the performance of different shape implementations. Lucene's nightly geo benchmarks use a corpus of 60.8 million points from the Planet OSM data set to measure index and query performance. While it's good practice to measure incremental performance improvements as code is committed or changed, comparing the performance of LatLonPoint / Geo3D and LatLonShape on the same graph is misleading. It is an apples-to-oranges comparison, since LatLonShape is a higher-dimension index targeted at indexing both shapes and points.

What we really need are performance comparisons between the different shape indexing and search implementations (BKD shapes, vs PrefixTree approaches: geohash, quadtree, composite). Preliminary results for 60M shapes are extremely promising:

  • BKD shapes are 45454.55 X more accurate (0.11 cm vs 50 m)
  • BKD shape index has 2.4 X smaller index (9.94 GB vs 24.13 GB)
  • BKD shape indexes 2.4 X faster (1hr 35 mins vs 3hrs 48 mins)
  • BKD shape queries are 50% faster (6.85 QPS vs 4.5 QPS)
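The ratios quoted above can be reproduced from the raw figures. The short Python check below only restates the numbers from the list in common units; the variable names are ours.

```python
# Figures quoted in the list above, converted to common units.
accuracy_ratio = (50 * 100) / 0.11          # 50 m expressed in cm, vs 0.11 cm
size_ratio = 24.13 / 9.94                   # index size in GB
time_ratio = (3 * 60 + 48) / (1 * 60 + 35)  # indexing time in minutes
qps_gain = (6.85 - 4.5) / 4.5               # relative increase in QPS

print(round(accuracy_ratio, 2))
print(round(size_ratio, 2))
print(round(time_ratio, 2))
print(round(qps_gain * 100, 1))
```

On these figures the query speed-up works out to roughly 52%, consistent with the rounded 50% in the list.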

Types Removal

We adjusted the deprecation behavior in 6.7 to be more lenient. It will be very common to set include_type_name=true in 6.7 so we've decided to emit a deprecation warning only if include_type_name is not set at all. The warning serves as an important note to users that the request and response format of these APIs will change in a breaking way in 7.0.
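The lenient rule can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not Elasticsearch's actual implementation; should_warn and the request_params dict are hypothetical names.

```python
def should_warn(request_params):
    """Model of the lenient rule: warn only when include_type_name is absent.

    `request_params` is a hypothetical dict of query-string parameters.
    Any explicit value, true or false, suppresses the warning.
    """
    return "include_type_name" not in request_params


for params in ({}, {"include_type_name": "true"}, {"include_type_name": "false"}):
    print(params, should_warn(params))
```

Under this rule, a client that sets the parameter explicitly in 6.7 has already acknowledged the upcoming change, so only requests that omit it are flagged.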

Watcher UI

We fixed a lot of bugs in the Watcher UI on master this week: disabling unsupported actions for threshold alerts, fixing watch deletion, and adding action types.

Changes

Changes in Elasticsearch

Changes in 8.0:

  • Enable JsonThrowablePatternConverterTests with more debug 38721

Changes in 7.1:

  • Add enabled status for token and api key service 38687
  • _cat/indices with Security, hide names when wildcard 38824
  • SQL: Implement :: cast operator 38774
  • Fix failing bwc test against 6.3 38770
  • Clean up ShardSearchLocalRequest 38574
  • mute Failing tests related to logging and joda-java migration 38704
  • Mute RetentionLeastIT.testRetentionLeasesSyncOnRecovery on 7x 38597

Changes in 7.0:

  • [Monitoring] Remove include_type_name parameter from GET _template request 38818
  • Fix line separators in JSON logging tests 38771
  • Make 7.0 like 6.7 user agent ecs 38757
  • [Monitoring] Remove _type usages in _search requests 38819
  • Format Watcher.status.lastChecked and lastMetCondition 38626
  • Mute failing test 20_mix_typless_typefull 38781
  • Enable IndexActionTests and WatcherIndexingListenerTests 38738
  • Tie break on cluster alias when merging shard search failures 38715
  • Restore date aggregation performance in UTC case 38221
  • Move testToUtc test to DateFormattersTests 38610
  • Reject setting index.optimize_auto_generated_id after version 7.0.0 28895
  • BREAKING: Drop support for the low-level REST client on JDK 7 38540
  • Mute failing ApiKeyIntegTests 38614
  • Mute RetentionLeastIT.testRetentionLeasesSyncOnRecovery on 7.0 38600

Changes in 6.7:

  • Remove immediate operation retry after mapping update 38873
  • Filter out upgraded version index settings when starting index following 38838
  • Look up connection using the right cluster alias when releasing contexts 38570
  • Handle the fact that ShardStats instance may have no commit or seqno stats 38782
  • Fix PreConfiguredTokenFilters getSynonymFilter() implementations 38858
  • Ensure that maxConcurrentShardRequests is never defaulted to 0 38734
  • [TEST] address testCollectNodes rare failure 38559
  • Fix LocalIndexFollowingIT#testRemoveRemoteConnection() test 38709
  • Only issue a deprecation warning if include_type_name is not set. 38825
  • Copy retention leases when trim unsafe commits 37995
  • Fix PreConfiguredTokenFilters getSynonymFilter() implementations 38839
  • Suppress error message when /proc/sys/vm/max_map_count is not exists. 35933
  • Add dedicated retention lease exceptions 38754
  • Introduce retention lease actions 38756
  • Enable removal of retention leases 38751
  • Prefix java formatter patterns with '8' 38712
  • Mute RetentionLeastIT#testRetentionLeaseSyncedOnRemove 38765
  • Specialize pre-closing checks for engine implementations 38702
  • Concurrent file chunk fetching for CCR restore 38495
  • Prevent possible assertion failure in IndicesQueryCache.close 38714
  • Deprecate the low-level REST client on JDK 7 38542
  • Mute IndexFollowingIT.testIndexFallBehind 38618
  • Mute RemoveCorruptedShardDataCommandIT 38616
  • Mute RetentionLeastIT.testRetentionLeasesSyncOnRecovery on 6.7 (#38597) 38601

Changes in 6.6:

  • Skip BWC tests in checkPart1 and checkPart2 38730
  • Use consistent view of realms for authentication 38815
  • Only flush Watcher's bulk processor if Watcher is enabled 38803
  • Fix synchronization in LocalCheckpointTracker#contains 38755
  • Enhance parsing of StatusCode in SAML Responses 38628
  • Fix GeoHash PrefixTree BWC 38584
  • SQL: Relax StackOverflow circuit breaker for constants 38572
  • SQL: Prevent grouping over grouping functions 38649
  • Enable Dockerfile from artifacts.elastic.co 38552

Changes in 6.5:

  • SQL: fall back to using the field name for column label 38842

Changes in Elasticsearch Management UI

Changes in 7.1:

  • refactoring license management server routes 30845
  • refactoring server routes in index management to use common code 30299
  • [Watcher] Add Index, HipChat, PagerDuty, and Jira Action types on the client 30043

Changes in 7.0:

  • Fix width of Watcher table. 30311
  • Fix ILM dependency upon xpack.index_management.enabled setting. 30592
  • [CCR] Put back API integration test for follower indices 30260
  • Rollup job Functional UI test 30280

Changes in 6.7:

  • disabling actions for threshold alerts that have default actionType 31129
  • [CCR] Fix plugin order collision 30596

Changes in 6.6:

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.1:

  • ODBC Installer API for system info DSN access 110

Changes in 7.0:

  • Integration testing: uninstall any old driver before attempting a new install 114
  • Integration testing: print installer log on failure 113
  • Integration tests fixes for current ES 112

Changes in 6.6:

  • Init all record fields 111

Changes in Rally

Changes in 1.0.4:

  • Add node_name in node-stats docs for ... 646
  • Allow collection of jvm gc section in node-stats telemetry device 644

Changes in Rally Tracks

  • Adds a track that indices geopoints as geoshapes (backport to 6) 65
  • Adds a track that indices geopoints as geoshapes 62

Apache Lucene

Lucene 7.7

Lucene 7.7 was released on February 11th, and Elasticsearch 6.7 now depends on this new version.

Lucene 8.0

We built a first release candidate of 8.0. Unfortunately, a Solr bug was discovered at about the same time, which will likely require us to do a 7.7.1 release and postpone 8.0 a bit.

Other