This Week In Elasticsearch, August 16

Highlights

Automatic cancellation of search requests

We worked on adding support for cancelling requests to the low-level REST client, which unblocks the security manager testing issue we encountered on the search task cancellation PR. We suspect a bug in the HTTP client that is exposed by the way we reuse the same HTTP request instance across retries. We're waiting for feedback from the Apache async HTTP client developers: either they will fix the bug, or we will need to adapt the way we make requests in the low-level client (which would complicate our own code a bit).
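To make the two ideas above concrete (caller-side cancellation, and building a new request object for every retry instead of reusing one), here is a minimal sketch using Python's asyncio. It is an analogy only, not the Java low-level REST client: the `Request` class, `perform_request`, and the node names are hypothetical.

```python
import asyncio


class Request:
    """Hypothetical stand-in for an HTTP request object."""

    def __init__(self, method, path):
        self.method = method
        self.path = path


async def perform_request(send, nodes, method, path):
    """Try each node in turn, building a new Request for every attempt."""
    last_error = ConnectionError("no nodes to try")
    for node in nodes:
        request = Request(method, path)  # never reused across retries
        try:
            return await send(node, request)
        except ConnectionError as error:
            last_error = error  # fall through to the next node
    raise last_error


async def cancel_demo():
    attempts = []

    async def slow_send(node, request):
        attempts.append(node)
        await asyncio.sleep(60)  # simulates a long-running search

    task = asyncio.ensure_future(
        perform_request(slow_send, ["node-1", "node-2"], "GET", "/_search")
    )
    await asyncio.sleep(0)  # let the first attempt start
    task.cancel()  # caller-side cancellation
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled", attempts
    return "completed", attempts


outcome, attempts = asyncio.run(cancel_demo())
```

Cancellation is not treated as a connection failure here, so it stops the retry loop instead of moving on to the next node.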

SQL

We fixed an issue regarding the deprecation of the generic “interval” parameter in date histograms. We also fixed a bug that caused any date comparison (>, <, >=, !=, ...) against constants generated by CURRENT_* functions to fail if the date field involved had a format other than the default.
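For context on the date comparison fix: a range query accepts a `format` parameter that tells Elasticsearch how to parse the bounds, independently of the format in the field's mapping. Below is a rough sketch of such a query body, built in Python. The helper, the field name, and the choice of `strict_date_time` are assumptions for illustration, not the exact output of the SQL query translator.

```python
from datetime import datetime, timezone


def date_range_query(field, op, value):
    """Build a range query whose bound carries an explicit format.

    Hypothetical helper, not part of Elasticsearch SQL. `op` is one of
    gt, gte, lt, lte and `value` must be a timezone-aware datetime.
    """
    if op not in ("gt", "gte", "lt", "lte"):
        raise ValueError("unsupported operator: " + op)
    utc_value = value.astimezone(timezone.utc)
    return {
        "range": {
            field: {
                op: utc_value.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
                # Assumed format name; it has to match the string above.
                "format": "strict_date_time",
            }
        }
    }


now = datetime(2019, 8, 16, 12, 0, 0, tzinfo=timezone.utc)
query = date_range_query("release_date", "gt", now)
```

Because the bound states its own format, the comparison no longer depends on the date field using the default format.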

We have continued the work on CBOR support, bringing in the functionality to unpack the result sets and “forward” them to the application. As part of this work, we optimized the data access pattern for applications that query data cell by cell (like Excel), i.e. not “in bulk” with bounded arrays. With CBOR, Elasticsearch uses indefinite-length arrays (intended initially for streaming, though not exclusively), which required some further changes in order to avoid double-parsing where possible.
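As background on indefinite-length arrays: in CBOR they start with the byte 0x9F and end with a "break" byte 0xFF, so a reader cannot know the element count up front and has to walk the items one by one. The sketch below is a deliberately tiny Python decoder that shows this single-pass style. It is not the ODBC driver's code, and it only handles small unsigned integers and short text strings.

```python
def decode_indefinite_array(data):
    """Yield the items of a CBOR indefinite-length array one at a time.

    Minimal sketch: supports unsigned integers below 256 and text
    strings shorter than 24 bytes; anything else raises ValueError.
    """
    if not data or data[0] != 0x9F:
        raise ValueError("not an indefinite-length array")
    pos = 1
    while True:
        if pos >= len(data):
            raise ValueError("missing break byte")
        head = data[pos]
        pos += 1
        if head == 0xFF:  # break: end of the array
            return
        major, info = head >> 5, head & 0x1F
        if major == 0 and info < 24:  # integer stored in the head byte
            yield info
        elif major == 0 and info == 24:  # integer stored in the next byte
            yield data[pos]
            pos += 1
        elif major == 3 and info < 24:  # short UTF-8 text string
            yield data[pos:pos + info].decode("utf-8")
            pos += info
        else:
            raise ValueError("unsupported item head: 0x%02x" % head)


payload = bytes([0x9F, 0x01, 0x18, 0x2A, 0x62, 0x65, 0x73, 0xFF])
items = list(decode_indefinite_array(payload))
```

Yielding items as they are decoded means a consumer can forward each value straight away, without a first pass to count elements.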

We looked into improving the way SQL does pagination across both scroll and agg requests and raised a PR for it.

Cross Cluster Search (CCS) Docs

We have updated the cross-cluster search docs, adding new diagrams for “How CCS Works”, reorganizing the headings, and moving the docs to the top-level navigation.

TLS Configuration Changes

We have raised changes that will detect and warn when a server SSL context (like the REST HTTP server) is created with a meaningless combination of verification_mode: none and client_authentication enabled. In this case the server would request a client certificate, but never validate it. That doesn't violate any specs, but is a sign of misconfiguration that could cause administrators to think they have a level of security that they don't. We expect to make this an error in 8.0.

We have been working on detecting other potential misconfigurations for server SSL contexts. We plan to make it an error to configure SSL on the HTTP or transport services without providing a certificate and key (this is already an error for HTTP, but not for transport).

We also intend to make it an error to configure SSL for HTTP or transport without explicitly enabling or disabling it. We have seen cases where an administrator set up a keystore and truststore but forgot to turn on xpack.security.http.ssl.enabled (or the transport equivalent), and it wasn't immediately obvious why TLS wasn't working. It will still be possible to enable or disable SSL at will, but if any SSL settings are configured, SSL must be explicitly enabled or disabled.
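The checks described in this section could be sketched roughly as follows. This is an illustrative Python function over a flat settings dictionary, not the actual Elasticsearch implementation. The split between warnings and errors mirrors the plans stated above and is an assumption about the final behaviour.

```python
def check_server_ssl(prefix, settings):
    """Return (warnings, errors) for one server SSL context.

    `prefix` is e.g. "xpack.security.http.ssl"; `settings` maps full
    setting names to values. Hypothetical helper for illustration.
    """
    ssl = {
        key[len(prefix) + 1:]: value
        for key, value in settings.items()
        if key.startswith(prefix + ".")
    }
    warnings, errors = [], []
    if not ssl:
        return warnings, errors  # nothing configured, nothing to check

    if "enabled" not in ssl:
        errors.append(prefix + " is configured but 'enabled' is not set")

    if ssl.get("enabled") is True:
        has_key_material = "keystore.path" in ssl or (
            "certificate" in ssl and "key" in ssl
        )
        if not has_key_material:
            errors.append(prefix + " is enabled without a certificate and key")
        if (
            ssl.get("verification_mode") == "none"
            and ssl.get("client_authentication") in ("required", "optional")
        ):
            warnings.append(
                prefix + " requests client certificates but never validates them"
            )
    return warnings, errors


forgot_enabled = {
    "xpack.security.http.ssl.keystore.path": "http.p12",
    "xpack.security.http.ssl.truststore.path": "ca.p12",
}
unchecked_client_cert = {
    "xpack.security.http.ssl.enabled": True,
    "xpack.security.http.ssl.keystore.path": "http.p12",
    "xpack.security.http.ssl.verification_mode": "none",
    "xpack.security.http.ssl.client_authentication": "required",
}
```

Calling `check_server_ssl("xpack.security.http.ssl", forgot_enabled)` reports the missing `enabled` setting, which is the keystore-without-enabled case described above.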

Refresh Interval for security

Occasionally we run into cases where a cluster has a template that sets a high refresh interval and matches "*", so it ends up being applied to the security indices as well.

This could cause issues because some security APIs do an internal wait-for refresh, which could cause the API to take over 30 seconds to complete, at which point Kibana would time out.

Understandably, the user would not realise that their Kibana login was failing because their template had an index pattern of "*". We had a discussion about what we could do to detect the problem and either fix it or at least make it obvious to the cluster admin. After a bit of back and forth we realised that the solution was very simple: we just need to set an explicit refresh interval when we create the security indices.
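The reason the fix works is that settings given explicitly when an index is created take precedence over settings inherited from matching templates. Here is a simplified Python model of that resolution order. The template contents, the `1s` value, and the helper itself are assumptions for illustration and do not reproduce Elasticsearch's actual code.

```python
from fnmatch import fnmatchcase


def resolve_settings(index_name, templates, request_settings):
    """Merge matching template settings, then apply explicit settings."""
    resolved = {}
    # Templates with a higher "order" override those with a lower one.
    for template in sorted(templates, key=lambda t: t.get("order", 0)):
        if any(fnmatchcase(index_name, p) for p in template["index_patterns"]):
            resolved.update(template["settings"])
    # Settings from the create-index request always win.
    resolved.update(request_settings)
    return resolved


catch_all = [
    {
        "index_patterns": ["*"],
        "order": 0,
        "settings": {"index.refresh_interval": "60s"},
    }
]

inherited = resolve_settings(".security-7", catch_all, {})
explicit = resolve_settings(
    ".security-7", catch_all, {"index.refresh_interval": "1s"}  # assumed value
)
```

Without an explicit value the security index inherits the 60 second interval from the catch-all template; with one, the template no longer affects it.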

Hadoop

We fixed an integration issue with Cloudera's fork of Spark 2.2+. Spark is elasticsearch-hadoop's most popular integration, and Cloudera is the most popular commercial vendor of Hadoop/Spark. Cloudera's fork introduced a non-passive (backward-incompatible) method signature that differs from the upstream version of Spark. Technically, this fork is branded as Cloudera's CDS (an enhanced Cloudera Spark). For Cloudera consumers on CDH5, CDS is Cloudera's blessed way to use a modern version of Spark.

Snapshot Lifecycle Management (SLM)

We added some additional information when retrieving SLM policies. SLM currently takes snapshots in an asynchronous manner; when SLM kicks off a snapshot, we don’t wait to see whether the snapshot is currently running, or if it immediately failed, etc. With #45245, the response when retrieving SLM policies also contains information about currently running snapshots. This includes the name, state, UUID, and start time, as well as any failure information. Since this information is read from the cluster state, there is no external call to a snapshot repository that could cause users to incur transfer costs.

We also merged #45362, which added a new API (_slm/stats) to retrieve statistics about SLM’s snapshot successes and failures, as well as information about retention runs and time spent deleting snapshots. This information is presented globally as well as on a per-policy basis.
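As an illustration of how a client might use the extra information returned with SLM policies, the sketch below picks out the policies that currently have a snapshot running. The response shape, the `in_progress` key, and all sample values are assumptions modelled on the description above, so check them against a real response before relying on them.

```python
def running_snapshots(policies_response):
    """Map policy id to a one-line summary of its running snapshot."""
    summary = {}
    for policy_id, body in policies_response.items():
        in_progress = body.get("in_progress")  # assumed key name
        if in_progress:
            summary[policy_id] = "{name} [{state}] uuid={uuid}".format(
                **in_progress
            )
    return summary


# Hypothetical payload; every value here is made up for the example.
sample_response = {
    "nightly": {
        "in_progress": {
            "name": "nightly-snap-2019.08.16",
            "state": "STARTED",
            "uuid": "abc123",
            "start_time": "2019-08-16T01:30:00.000Z",
        }
    },
    "weekly": {},
}

summary = running_snapshots(sample_response)
```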

Geo

We refactored all of the Geometry classes from lat,lon to x,y in master. We decided to make the change for a few reasons:

  • The current order and naming differ from most of the other libraries and standards that we are using in Elasticsearch, including JTS, WKT, GeoJSON, and the naming convention used in geo SQL functions.
  • We are in the process of extending the use of Geometry classes in the context of XYShapes and different projections, which makes generic x and y more appropriate than lat and lon.
  • The hierarchy is based on Geometry (not Geography) in the first place, so member naming based on lat and lon doesn't seem to be consistent with the class naming.

It is a breaking Java change, but mainly for JDBC geo-sql users. Given that it is a beta feature, and JDBC requires a recompilation anyway, the impact should be relatively minimal.
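To show why x,y ordering lines up with the formats listed above, here is a small Python sketch. WKT writes a point as "POINT (x y)" and GeoJSON stores coordinates as [x, y], which for geographic data means longitude first. The `Point` type and helper functions are hypothetical and unrelated to the actual Java Geometry classes.

```python
from collections import namedtuple

# Generic x/y naming also works for non-geographic (projected) shapes.
Point = namedtuple("Point", ["x", "y"])


def from_lat_lon(lat, lon):
    """Convert the old lat,lon argument order to x,y (x = lon, y = lat)."""
    return Point(x=lon, y=lat)


def to_wkt(point):
    """WKT puts x before y."""
    return "POINT ({} {})".format(point.x, point.y)


def to_geojson(point):
    """GeoJSON positions are [x, y], i.e. [longitude, latitude]."""
    return {"type": "Point", "coordinates": [point.x, point.y]}


amsterdam = from_lat_lon(52.37, 4.89)
wkt = to_wkt(amsterdam)
geojson = to_geojson(amsterdam)
```

With x,y as the internal order, both serializers pass the fields through unchanged, and only callers that still think in lat,lon need the conversion helper.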

We have merged support for geo_bounds on shapes in the feature branch, and are now working on the geo_grid aggregation for shapes. The circle ingest processor is also close to completion.

Changes in Elasticsearch

Changes in 7.4:
  • Catch AllocatedTask registration failures #45300
  • Update the schema for the REST API specification #42346
  • Add build operating system as build scan tag #45558
  • Remove leftover debugging logic #45556
  • Fix remote cache misses for test tasks #45521
  • Watcher reporting: add email warning if CSV attachment contains values that may be interpreted as formulas #44460
  • Add mapper-extras and the RankFeatureQuery in the hlrc #43713
  • Fix bats distro test to build distributions #45529
  • Restore check part 1 and 2 order mistakenly reversed in [#45098] #45522
  • Fix remote cache misses for checkstyle tasks #45512
  • Fix issues with serializing BulkByScrollResponse #45357
  • Add support for cancelling async requests in low-level REST client #45379
  • Add SSL/TLS settings for watcher email #45272
  • Fix bug in copying bytes for socket write #45463
  • Fix location of reaper jar on Windows #45464
  • Retrieve processors instead of checking existence #45354
  • Create index with typeless mapping #45120
  • Geo: Change order of parameter in Geometries to lon, lat #45332
  • Set start of the week to Monday for root locale #43652

Changes in 7.3:
  • Use bundled JDK in Sys V init #45593
  • Fix watcher HttpClient URL creation #45207
  • Fix a bug where mappings are dropped from rollover requests. #45411
  • Make sure to validate the type before attempting to merge a new mapping. #45157
  • [ML-DataFrame] fix starting a batch data frame after stopping at runtime #45340

Changes in 7.2:
  • SQL: adds format parameter to range queries for constant date comparisons #45326

Changes in 6.8:
  • Read git revision directly from repository #45547
  • Prevent Leaking Search Tasks on Exceptions in FetchSearchPhase and DfsQueryPhase #45500
  • Allow _update on write alias #45318

Changes in 8.0:
  • Fix build cache misses caused by embedded reaper jar #45404

Changes in Elasticsearch Hadoop Plugin

Changes in 7.4:
  • Do not retry scroll requests in NetworkClient. #1330

Changes in 6.8:
  • Fix for Cloudera CDS 2.2.0+ Spark credentialsRequired method #1332

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.4:
  • Introduce CBOR format support for REST payloads #169

Changes in Rally

Changes in 1.3.0:
  • BREAKING: Remove lap feature and all references. #739