21 December 2018

This Week in Elasticsearch and Apache Lucene - 2018-12-21

By Jake Landis, Jay Modi, Yannick Welsch, Zachary Tong, Bill McConaghy, Tom Callahan, Adrien Grand, Jason Tedor

Holiday Hiatus

A large portion of our engineering team will be spending time with family over the upcoming holiday season. We expect the coming two weeks to be relatively quiet in terms of updates, so we will skip this blog during that time. Expect to see our next update on or around January 11. Whatever your plans are this holiday season, we wish you the very best.

Elasticsearch Update

Highlights

BKD-backed Geo Shapes

BKD-backed geoshapes have been merged! They will be the new default, and users do not need to configure any of the extra settings associated with quadtrees (precision, etc.). Configuring the old settings will re-enable the old quadcells.

We have been sifting through our backlog of geo issues, and stumbled on an issue where the default quadcells took 4.5 hours indexing a small collection of 10m precision shapes (and ultimately crashed from an error). We re-ran the test on the new BKD geoshapes, and the time came down to ~1s with no crash :)

For anyone playing along at home, that's four orders of magnitude, or 16,199 times faster. Yes, really!
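For readers who want to check the arithmetic, here is a minimal sketch. It assumes the run times were exactly 4.5 hours and 1 second; the figures above are approximate, so treat the result as a rough estimate.

```python
import math

# Assumed inputs: roughly 4.5 hours before, roughly 1 second after.
old_seconds = 4.5 * 60 * 60   # 16200.0
new_seconds = 1.0

ratio = old_seconds / new_seconds          # how many times as fast
times_faster = ratio - 1                   # "N times faster" read as ratio minus one
orders_of_magnitude = math.log10(ratio)    # a little over 4

print(f"{ratio:.0f}x as fast, {times_faster:.0f} times faster, "
      f"~{orders_of_magnitude:.1f} orders of magnitude")
```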

Infinite DNS caching, begone!

We implemented a change to override the JVM DNS cache policy. Per JVM defaults, when a security manager is present, positive hostname lookups will be cached for the life of the JVM. This can be a poor user experience, with users watching a DNS change propagate through their entire infrastructure with the exception of Elasticsearch. This behavior impacts cluster discovery, cross-cluster replication and cross-cluster search, reindex from remote, snapshot repositories, webhooks in Watcher, external authentication mechanisms, and the Elastic Stack Monitoring Service.

The JVM caches lookups this way to guard against DNS cache poisoning attacks. Yet the modern world already has a defense against such attacks: TLS. With proper certificate validation, TLS neuters the attack even if a resolver falls prey to DNS cache poisoning. With Elasticsearch 6.6.0 we will introduce two new system properties for overriding the default JVM DNS cache policy. These properties will ship configured in the default jvm.options file to set the positive DNS cache to sixty seconds, and the negative DNS cache to ten seconds. Note that these properties will not show up automatically for users who upgrade from a previous version; they need to add the properties themselves to override the JVM's default behavior.
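To make the difference between infinite and time-limited caching concrete, here is a toy model of a lookup cache with separate positive and negative TTLs. It is a conceptual sketch only, not the JVM's or Elasticsearch's implementation; the class name `TtlDnsCache`, the injected `resolve` and `clock` callables, and the host name are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlDnsCache:
    """Toy resolver cache with separate TTLs for successful and failed lookups."""

    def __init__(self, resolve: Callable[[str], str],
                 positive_ttl: float = 60.0, negative_ttl: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve            # hypothetical resolver; raises LookupError on failure
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        # host -> (expiry time, address, or None for a cached failure)
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def lookup(self, host: str) -> str:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[0]:
            if entry[1] is None:
                raise LookupError(f"cached failure for {host}")
            return entry[1]
        try:
            address = self._resolve(host)
        except LookupError:
            self._entries[host] = (now + self._negative_ttl, None)
            raise
        self._entries[host] = (now + self._positive_ttl, address)
        return address


# Simulate a DNS change with a fake clock and a fake record table.
records = {"es-node.example": "10.0.0.1"}
fake_now = [0.0]
cache = TtlDnsCache(lambda host: records[host], clock=lambda: fake_now[0])

first = cache.lookup("es-node.example")    # resolved and cached
records["es-node.example"] = "10.0.0.2"    # the DNS record changes
fake_now[0] = 30.0
stale = cache.lookup("es-node.example")    # still the old address
fake_now[0] = 61.0
fresh = cache.lookup("es-node.example")    # TTL expired, new address
print(first, stale, fresh)
```

Passing `positive_ttl=float("inf")` models the cache-forever behavior described above: the entry never expires, so the changed record is never picked up.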

Cross-Cluster Replication

Our cross-cluster replication UI is merged! We are shipping not one, but two apps: first, a UI to configure remote clusters; second, a UI to configure auto-follow patterns. Recent work on the UI includes adding a detail panel to the auto-follow patterns table. In addition to giving the user more info about the auto-follow pattern, this panel also provides the user with a convenient link to view the follower indices in Index Management. We also added the UI for deleting auto-follow patterns and implemented numerous UX improvements, for example validating that an auto-follow pattern name doesn't collide with an existing one when creating a new auto-follow pattern.

In addition to the UI work, we added a rolling upgrade test for CCR, which uncovered two issues: a translog serialization bug, and the fact that the CCR stats API didn't filter by follower indices. We also added more information to the CCR stats endpoints about the remote clusters being auto-followed, such as the time since metadata was last checked and the corresponding metadata version.

We fixed two problems uncovered by the CCR UI team. The first one was a stack overflow error when removing remote cluster connections while auto-followers were still actively using that remote cluster. We worked on making CCR resilient against missing remote cluster connections: shard follow tasks and auto-followers will now retry instead of failing with an unhandled error. The second one was an issue in the remote cluster infrastructure where the remote cluster info endpoint would fail if any seed for any remote cluster failed to resolve. We changed the remote connection info to display the configured seed nodes, avoiding a hostname resolution.

We implemented session management on the leader side for the recovery from remote feature, which will be used to keep open a context of the files to sync during the file transfer. We also prototyped a modification to the remote client that allows a direct connection to a specific remote node in order to avoid extra hops during file-based recovery.

Typeless or bust!

We have been working through more type deprecations. This week we deprecated the use of _type as a field name in queries, deprecated types inside reindex, the document create endpoint, and the get_source and exists_source APIs, and implemented a cleanup so that search-related endpoints don't throw duplicate deprecations (due to internal optimizations). The next batch of deprecations will tackle the index creation and mappings APIs, which also happen to be the last of the deprecations.

SQL

We added now functions, moved SQL from TimeZone to ZoneId, and fixed the translation of LIKE/RLIKE statements. We also worked on adding a new 32-bit ODBC driver artifact to our release system, and concluded the work on interval support for parameters and result sets in ODBC. We marked SQL as experimental for 6.5 and beta for 6.6, and fixed a variety of bugs.

Zen2

Zen2 will ship as the default in our 7.0-alpha-2 release. We’ve also merged the documentation PR for Zen2. Many thanks to everyone who commented and iterated with us on the docs. If you want to learn how to use Zen2 in the (about-to-be-released) 7.0-alpha-2 release, take a peek at the updated documentation!

SSL Settings

The configuration for TLS/SSL within Elasticsearch has had some form of fallback since the first official release. For instance, when configuring an LDAP realm, the value of ssl.cipher_suites would default to the value of xpack.ssl.cipher_suites. Unfortunately, this behavior is often more confusing than helpful, as a change in a fallback setting will cascade to other components. In 6.7.0, we'll deprecate this fallback of SSL configuration and remove it for 7.0. In 7.0, users will need to configure each SSL configuration explicitly.
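The cascading effect can be illustrated with a small sketch. It is a simplified model, not Elasticsearch's settings code: `REALM_KEY` is a hypothetical, shortened stand-in for the realm-level setting, and returning `None` for an unset value is an assumption made here to represent "fall back to the built-in default".

```python
from typing import Dict, List, Optional

Settings = Dict[str, List[str]]

# Hypothetical, shortened key for the realm-level setting (an assumption for
# illustration; the real key lives under the realm's own configuration prefix).
REALM_KEY = "ldap_realm.ssl.cipher_suites"
GLOBAL_KEY = "xpack.ssl.cipher_suites"


def with_fallback(settings: Settings) -> Optional[List[str]]:
    """Fallback style: an unset realm value silently inherits the global one."""
    return settings.get(REALM_KEY, settings.get(GLOBAL_KEY))


def explicit_only(settings: Settings) -> Optional[List[str]]:
    """Explicit style: only the realm's own value counts (None = not set here)."""
    return settings.get(REALM_KEY)


settings: Settings = {GLOBAL_KEY: ["TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"]}
before = with_fallback(settings)

# Changing the global setting also changes the realm's effective value.
settings[GLOBAL_KEY] = ["TLS_RSA_WITH_AES_128_CBC_SHA"]
after = with_fallback(settings)

print(before, after, explicit_only(settings))
```

With the fallback, editing one setting changes the behavior of a component that never mentioned it. With explicit configuration, each component's value is visible where it is used.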

Ingest Node

This week we had a discussion about shipping the ingest-geoip and ingest-user-agent plugins by default. Several Filebeat modules rely on the geoip and user_agent ingest processors being available, and users have a poor experience managing these plugins. Given the importance and prevalence of these use cases, we are choosing to package these plugins as modules with all distributions.

In addition, we fixed or added support for newer ingest node features, including: support for default pipelines when indexing through an alias (#36231), support for default pipelines when doing bulk upserts (#36618), and support for the drop processor via an on_failure fork of a pipeline (#36686).

Performance Regression

This week our nightly benchmarks detected a major regression in indexing throughput. The regression is due to our ongoing work to enable indexing nanosecond timestamps, which is predicated on migrating from Joda time to the Java 8 java.time API. Joda time makes it possible to define multiple parsers for a given date time formatter, but the Java 8 java.time API does not. To handle this, our first implementation tried one format after another, catching exceptions along the way until a parser succeeded. Throwing exceptions is expensive, though, and the cost was large enough to cause this regression. We’ve partially reverted to the old behavior while we continue to work on a long-term fix.
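The try-each-format pattern described above looks roughly like the sketch below. This is an illustration in Python, with `datetime.strptime` standing in for the Java formatters; the function name `parse_first_match` and the format list are hypothetical, and none of this is Elasticsearch code.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple


def parse_first_match(text: str, formats: Sequence[str]) -> Tuple[datetime, int]:
    """Try each format in order; return the parsed value and the number of
    failed attempts (each failure costs one raised and caught exception)."""
    failures = 0
    last_error: Optional[ValueError] = None
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt), failures
        except ValueError as err:
            failures += 1
            last_error = err
    raise ValueError(f"no format matched {text!r}") from last_error


# Hypothetical format list: most specific first, as a multi-format mapping might declare.
FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")

print(parse_first_match("2018-12-21", FORMATS))
# → (datetime.datetime(2018, 12, 21, 0, 0), 2)
```

Every value that matches only the last format pays for two exceptions before it parses. Multiplied across every date field of every indexed document, that overhead becomes visible in throughput.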

Changes

Changes in Elasticsearch

Changes in 7.0:

  • Deprecate the document create endpoint. 36863
  • Avoid duplicate types deprecation messages in search-related APIs. 36802
  • Fix cluster state persistence for single-node discovery 36825
  • Add script filter to intervals 36776
  • [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default geo_shape indexing approach 36751
  • Deprecate types in index API 36575
  • Allow word_delimiter_graph_filter to not adjust internal offsets 36699
  • Add typeless endpoints for get_source and exists_source 36426

Changes in 6.7:

  • Enable IPv6 URIs in reindex from remote 36874
  • SQL: Make sure now() always uses milliseconds precision 36877
  • Use index-prefix fields for terms of length min_chars - 1 36703
  • Core: Deprecate negative epoch timestamps 36793
  • SQL: Enhance Verifier to prevent aggregate or grouping functions from 36799
  • Add X-Forwarded-For to the logfile audit 36427

Changes in 6.6:

  • SQL: Fix bug regarding histograms usage in scripting 36866
  • [CCR] Report error if auto follower tries auto follow a leader index with soft deletes disabled 36886
  • Remote cluster license checker and no license info. 36837
  • Remove indexing_complete when removing policy 36620
  • SQL: protocol returns ISO 8601 String formatted dates instead of Long for JDBC/ODBC requests 36800
  • [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default geo_shape indexing approach 36855
  • QueryRescorer should keep the window size when rewriting 36836
  • Core: Revert back to joda's multi date formatters 36814
  • [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default geo_shape indexing approach 36743
  • Use SearchRequest copy constructor in ExpandSearchPhase 36772
  • [GEO] Fork Lucene's LatLonShape Classes to local lucene package 36794
  • BREAKING: Enhance Invalidate Token API 35388
  • Invalidate Token API enhancements - HLRC 36362
  • Make the ingest-geoip databases even lazier to load 36679
  • ingest: fix on_failure with Drop processor 36686
  • ingest: support default pipelines + bulk upserts 36618
  • [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default geo_shape indexing approach 35320
  • Core: Deprecate use of scientific notation in epoch time parsing 36691
  • Deprecation check for classic similarity 36577
  • Deprecation check for renamed bulk threadpool settings 36662
  • Add raw sort values to SearchSortValues transport serialization 36617
  • Fix rollup search statistics 36674
  • Watcher deprecate notification service settings 36403
  • Rename seq# powered optimistic concurrency control parameters to ifSeqNo/ifPrimaryTerm 36757
  • SQL: Extend the ODBC metric by differentiating between 32 and 64bit platforms 36753
  • Expose Sequence Number based Optimistic Concurrency Control in the rest layer 36721
  • Deprecation check for discovery configuration 36666
  • Fix duplicate phrase in shrink/split error message 36734
  • [Painless] Add boxed type to boxed type casts for method/return 36571
  • Do not resolve addresses in remote connection info 36671
  • Deprecation check for audit log prefix settings 36661
  • SQL: Fix translation of LIKE/RLIKE keywords 36672
  • Add doc's sequence number + primary term to GetResult and use it for updates 36680
  • [CCR] Add time since last auto follow fetch to auto follow stats 36542
  • Watcher accounts constructed lazily 36656
  • SQL: Concat should be always not nullable 36601
  • SQL: Scripting support for casting functions CAST and CONVERT 36640
  • Add seq no powered optimistic locking support to the index and delete transport actions 36619

Changes in 6.5:

  • SQL: Fix issue with always false filter involving functions 36830
  • Bump 6.5 branch to version 6.5.5 36857
  • Fix Rollup's metadata parser 36791
  • Synchronize WriteReplicaResult callbacks 36770
  • SQL: Fix wrong appliance of StackOverflow limit for IN 36724
  • SQL: Fix issue with complex HAVING and GROUP BY ordinal 36594

Changes in Elasticsearch Management UI

Changes in 6.6:

  • Remove errant slash from Index Management detail panel. 27605
  • [CCR] Refine warning copy in Remote Clusters. 27620
  • [CCR] Remote Clusters and Cross-cluster Replication apps 26777
  • [Rollups] Fix rollup job wizard bug: coerce histogram interval to Number for validation 27413
  • adding new specs for security endpoints without _xpack prefix 27057
  • Fix search profiler 27326
  • adding loading spinner for index management table 27204

Changes in Elasticsearch SQL ODBC Driver

Changes in 6.6:

  • Fix clean target by removing MSI files and leave .gitignore file intact 83
  • Interval type support 80
  • ODBC request mode only added for catalog queries. New client_id request param. 81
  • Server version check 84

Lucene Update

Lucene 7.6

Lucene 7.6 was released and we upgraded Elasticsearch 6.x to this new release so that it will be used in the upcoming Elasticsearch 6.6. This Lucene release most notably features a fix for a potential corruption when index sorting is enabled, and selective indexing for points: the ability to only index a subset of dimensions.

Faster, smaller geo shapes

LatLonShape tessellates shapes into a triangular mesh. One interesting question is how to index those triangles. Until now the approach was to index the minimum bounding rectangle of those triangles as a 4D point and then to additionally record the 6 coordinates of the triangle. Then searching triangles is a two-step process that first uses the index to efficiently find all minimum bounding rectangles that intersect with the query, and then checks whether the triangle actually matches. This encoding was redundant due to the fact that 4 out of the 6 coordinates of each triangle are also coordinates of the minimum bounding rectangle of the triangle. And there are only 6 ways that triangles can share coordinates with their minimum bounding rectangle (MBR):

Either two opposite vertices of the MBR are also vertices of the triangle (triangles 1 and 2), or one vertex of the MBR is also a vertex of the triangle and the two other vertices of the triangle are on the opposite edges of the MBR (triangles 3, 4, 5 and 6). So we changed the encoding so that instead of recording 6 coordinates for the triangle, we now record the two coordinates that are not shared with the MBR of the triangle and a number that identifies how to reconstruct the triangle from the MBR and those two coordinates. Nightly benchmarks reported 37% faster indexing and 40% less disk usage when indexing 61M points from OpenStreetMap.
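The idea of storing the MBR, two leftover coordinates, and a small reconstruction code can be sketched as follows. This is a simplified illustration and not Lucene's actual encoding: it makes no attempt to reduce to the six cases described above. Instead it preserves the original vertex order, so its code takes one of 36 possible values. The names `encode`, `decode`, and `_min_max_slots` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]
# (min_x, min_y, max_x, max_y, spare_x, spare_y, code)
Encoded = Tuple[float, float, float, float, float, float, int]


def _min_max_slots(values: List[float]) -> Tuple[int, int]:
    """Indices of the vertices supplying the minimum and maximum (always distinct)."""
    lo = min(range(3), key=lambda i: values[i])
    hi = max((i for i in range(3) if i != lo), key=lambda i: values[i])
    return lo, hi


def encode(tri: Triangle) -> Encoded:
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    x_lo, x_hi = _min_max_slots(xs)
    y_lo, y_hi = _min_max_slots(ys)
    x_mid = 3 - x_lo - x_hi   # the vertex whose x is not an MBR bound
    y_mid = 3 - y_lo - y_hi   # the vertex whose y is not an MBR bound
    # Pack the four slot indices (each 0..2) into one small integer.
    code = ((x_lo * 3 + x_hi) * 3 + y_lo) * 3 + y_hi
    return (xs[x_lo], ys[y_lo], xs[x_hi], ys[y_hi], xs[x_mid], ys[y_mid], code)


def decode(enc: Encoded) -> Triangle:
    min_x, min_y, max_x, max_y, spare_x, spare_y, code = enc
    code, y_hi = divmod(code, 3)
    code, y_lo = divmod(code, 3)
    x_lo, x_hi = divmod(code, 3)
    xs = [0.0] * 3
    ys = [0.0] * 3
    xs[x_lo], xs[x_hi], xs[3 - x_lo - x_hi] = min_x, max_x, spare_x
    ys[y_lo], ys[y_hi], ys[3 - y_lo - y_hi] = min_y, max_y, spare_y
    return ((xs[0], ys[0]), (xs[1], ys[1]), (xs[2], ys[2]))


triangle: Triangle = ((0.0, 0.0), (4.0, 1.0), (2.0, 3.0))
encoded = encode(triangle)
print(encoded)                       # MBR, two spare coordinates, and the code
print(decode(encoded) == triangle)   # True: the round trip is lossless
```

The first four values are exactly what the index already stores for the bounding rectangle, so only the two spare coordinates and the code are additional data, rather than all six triangle coordinates.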

Other