This Week in Elasticsearch and Apache Lucene - 2019-12-06

Elasticsearch Highlights

Rally

This week we merged elastic/rally#830, which changes how Rally manages Elasticsearch clusters. Until now, Rally could only spin up uniform clusters, i.e. clusters in which every node is configured identically, which hinders automating more complex benchmark setups. With this PR, users can spin up arbitrarily complex cluster architectures (hot-warm, hot-warm-cold, with or without dedicated master nodes, etc.) for any Elasticsearch version between 2.0 and today’s latest commit on master. The new functionality does not rely on Rally’s internal actor system, which will eventually be replaced to reduce complexity and improve stability. It is already available in the development version of Rally and will be the highlight of the upcoming 1.4.0 release. More examples are available in the documentation.

Snapshots

We merged a significant change to how we handle repository metadata. The cluster state now keeps track of the blob in which the latest valid repository metadata can be found, which lets us work around S3's eventually consistent nature. With this change, eventual consistency will no longer cause us to accidentally overwrite, and in some cases corrupt, the repository metadata.

One follow-up change remains in this effort: it will use the new information in the cluster state to make a number of repository operations more efficient. The gains come from two sources. First, we no longer have to list the repository contents to find the latest version of its metadata. Second, we can remove a number of fallbacks that worked around issues which are now resolved automatically by keeping a pointer to the latest metadata in the cluster state.
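The pointer idea can be illustrated with a toy model. This is only a sketch and not the Elasticsearch implementation: the `StaleListingStore` class, the `index-N` blob naming used here, and the helper functions are all hypothetical, chosen to show why a stale listing can lead a writer to overwrite existing metadata while a tracked generation cannot.

```python
class StaleListingStore:
    """Toy blob store whose listing may lag behind writes (hypothetical)."""

    def __init__(self):
        self.blobs = {}       # every blob that has been written
        self.visible = set()  # blob names that list() currently returns

    def write(self, name, data):
        # The blob is stored, but the listing does not show it until sync().
        self.blobs[name] = data

    def sync(self):
        # Simulates the listing eventually catching up with the writes.
        self.visible = set(self.blobs)

    def list(self):
        return sorted(self.visible)


def latest_generation_by_listing(store):
    """Find the newest metadata generation by listing the store (may be stale)."""
    generations = [
        int(name.split("-")[1])
        for name in store.list()
        if name.startswith("index-")
    ]
    return max(generations, default=-1)


def write_next_generation(store, current_generation, data):
    """Write metadata as the generation following current_generation."""
    next_generation = current_generation + 1
    store.write(f"index-{next_generation}", data)
    return next_generation


# The dict below stands in for the cluster state holding the pointer.
store = StaleListingStore()
cluster_state = {"generation": -1}

cluster_state["generation"] = write_next_generation(
    store, cluster_state["generation"], "metadata v0")
store.sync()  # the listing now shows index-0
cluster_state["generation"] = write_next_generation(
    store, cluster_state["generation"], "metadata v1")
# No sync yet: the listing still reports generation 0, the pointer says 1.

stale = latest_generation_by_listing(store)
print("listing says:", stale, "/ pointer says:", cluster_state["generation"])
```

A writer that trusted the stale listing would pick generation 1 again and overwrite `index-1`, whereas a writer that reads the tracked generation moves on to `index-2` without listing anything.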

Analytics

A new histogram field type was merged on master and 7.x at the end of last week! This field type will allow us to do things like store Prometheus histograms.
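As a rough illustration of what storing a Prometheus histogram could involve, the sketch below converts cumulative Prometheus-style buckets into a document body with paired `values` and `counts` arrays. It assumes the new field accepts that pre-aggregated shape. The function name, the choice of each bucket's upper bound as its representative value, and the decision to skip the `+Inf` bucket are illustrative assumptions, not something this update specifies.

```python
import math


def prometheus_to_histogram_doc(cumulative_buckets):
    """Convert (upper_bound, cumulative_count) pairs into values/counts arrays.

    Prometheus buckets are cumulative, so each per-bucket count is the
    difference between neighbouring cumulative counts.
    """
    values, counts = [], []
    previous = 0
    for upper_bound, cumulative in sorted(cumulative_buckets):
        count = cumulative - previous
        previous = cumulative
        if math.isinf(upper_bound):
            # Assumption: the +Inf bucket has no finite representative value,
            # so this sketch simply leaves it out.
            continue
        values.append(upper_bound)
        counts.append(count)
    return {"values": values, "counts": counts}


buckets = [(0.1, 2), (0.5, 5), (1.0, 9), (math.inf, 10)]
print(prometheus_to_histogram_doc(buckets))
```

Note that skipping the `+Inf` bucket drops the observations above the last finite bound; a real ingestion path would need to decide how to represent them.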

Geo

After a couple of weeks spent getting the EdgeTree into a state where we can properly compare its performance with that of the TriangleTree, most benchmarks suggest that our aggregations would benefit from being backed by the TriangleTree instead of the GeometryTree. This goes against the consensus hypothesis we held when we first started deciding which tree to implement. Thanks to our determination to check whether that consensus was misguided, we had the luxury of actually testing both strategies against one another.

To help demonstrate which types of queries (in this case, tile queries) favor each tree, we visualized a precision=9 tiling of Switzerland and mapped the cost of each query. One major failure case of the EdgeTree is queries for tiles that lie just outside (and east of) the shape, but within the shape's bounding box. Such queries result in the EdgeTree traversing O(n) edges. The EdgeTree tends to be slightly faster overall for most queries, but when it is slower than the TriangleTree it is much slower, which might outweigh its other advantages.

Monitoring

We are discussing the migration strategy for 7.x to 8.0 with respect to internal collectors and Metricbeat collectors. We decided to decouple the shape changes of the monitoring documents from the upgrade to reduce the number of moving parts and to allow monitoring to continue to work across versions. We also decided that Metricbeat collection should not be a requirement prior to upgrade; the internal collectors will need some adjustments to ensure a smoother upgrade path.

Snapshot Lifecycle Management in Cloud

We merged a PR that adds Cloud support to SLM. These changes include adequate protections for Cloud-managed policies.

TLS

We have opened a PR for a new “elasticsearch-certutil http” utility.

This sub-command provides a guided process for creating SSL certs for the Elasticsearch HTTP (REST) interface.

As always, It Depends, but most customers are best served by using different certificates for transport and http, because the needs & usage of those interfaces are different.

For transport you want to use certificates to lock down your cluster. Typically that means running a custom Certificate Authority for the cluster so that the nodes trust one another, and no one else.

However, for HTTP you generally want to support access from a variety of clients, in a variety of languages that have their own built-in trusted CAs (or that use the operating system’s CA list). Ideally you want to use a corporate CA, or if you don’t have one, a single CA for all of your ES clusters so that your clients can be configured once and then be able to connect to all of your clusters. You also need a copy of that CA in formats that are suitable for each client (for most clients that’s PEM, but JVM-based clients will typically find PKCS#12 more helpful), and instructions for how to configure those clients.
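To make the client side concrete, here is a minimal sketch of how a Python client might trust a CA distributed in PEM format, using only the standard library. The function name, file path, and host name are hypothetical. When no CA file is given, the context falls back to the operating system's trusted CA list.

```python
import ssl


def build_http_tls_context(ca_pem_path=None):
    """Build a TLS context for talking to the Elasticsearch HTTP interface.

    ca_pem_path: path to a PEM file holding the CA certificate (hypothetical
    location). With None, the default system trust store is used instead.
    """
    # create_default_context enables certificate and host name verification.
    return ssl.create_default_context(cafile=ca_pem_path)


context = build_http_tls_context()

# Example use with a CA file and an HTTPS request (hypothetical path and host):
#   import urllib.request
#   context = build_http_tls_context("/etc/elasticsearch/certs/http-ca.pem")
#   urllib.request.urlopen("https://es.example.internal:9200", context=context)
```

Because the same CA file works for every cluster signed by that CA, a client configured this way once can connect to all of them.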

Over time, we will evolve our instructions to recommend using “certutil cert” for transport-level certificates, and “certutil http” for HTTP-level certificates.

Apache Lucene Highlights

Lucene 8.4: A branch should be cut later this week, and the first RC will be built shortly afterwards.

Changes in Elasticsearch

Changes in 8.0:

  • Silence lint warnings in server project - part 2 #49728
  • Migrate some of the Docker tests from old repository #49079
  • Add healthchecks to distro docker-compose.yml #49710

Changes in 7.6:

  • Consistent case in CLI option descriptions #49635
  • Fix task input for docker build #49814
  • Use Cluster State to Track Repository Generation #49729
  • Add reusable HistogramValue object #49799
  • Fix invalid break iterator highlighting on keyword field #49566
  • Fixes a bug in interval filter serialization #49793
  • [Transform] automatic deletion of old checkpoints #49496
  • Scripting: add available languages & contexts API #49652
  • Replicate write actions before fsyncing them #49746

Changes in 7.5:

  • Fix external integ test zip dep to expect a zip #49813
  • Extend systemd timeout during startup #49784
  • [Transform] Fix possible audit logging disappearance after rolling upgrade #49731
  • SQL: fix LOCATE function optional parameter handling #49666
  • SQL: fix NULL handling for FLOOR and CEIL functions #49644
  • SQL: handle NULL arithmetic operations with INTERVALs #49633

Changes in 6.8:

  • Support es7 node http publish_address format #49279

Changes in Rally

Changes in 1.4.0:

  • Expose API for cluster settings #831
  • Manage Elasticsearch nodes with dedicated subcommands #830
  • Only keep the most recent build log #832