This Week in Elasticsearch and Apache Lucene - 2019-01-11


Freeze action added to Index Lifecycle Management

A new freeze action has been added to Index Lifecycle Management, which allows users to freeze an index in the cold phase of the lifecycle.
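As a rough illustration, a policy that uses the new action could be expressed as the request body below, built here in Python. The policy name in the comment and the `min_age` value are assumptions chosen for the example, not details taken from the change itself.

```python
import json

# Hypothetical lifecycle policy: freeze the index once it reaches the cold phase.
# The 30d min_age is an illustrative value only.
policy = {
    "policy": {
        "phases": {
            "cold": {
                "min_age": "30d",
                "actions": {
                    "freeze": {}  # the freeze action takes no options in this sketch
                },
            }
        }
    }
}

# This body would be sent with a request along the lines of:
#   PUT _ilm/policy/my_policy   (policy name is made up)
print(json.dumps(policy, indent=2))
```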

Deprecation Info API work complete for 6.x

We have completed all the deprecation info work for 6.x. Users can now call an API that checks whether they are using deprecated settings, mappings, etc. in 6.x that would prevent them from successfully upgrading to 7.0. This API will be used by the migration assistant in Kibana to help users with the upgrade to 7.0.

Users will still need to check the deprecation logs to ensure that their client applications are not using deprecated features in their requests.

Enabling Nanosecond Timestamps

Before Christmas we spotted a significant reduction in indexing throughput when multiple date parsers are used. We now have an immediate fix, plus a further improvement that makes Java time even faster for this scenario in microbenchmarks.

A new store type for accessing index files

Elasticsearch has an expert setting that controls how we read index files. Currently the default is to use mmapfs, which reads the files using memory-mapping. We recently found that mmapfs does not perform well when updates occur all over a large index (in the TB range).

We have now merged a new hybridfs store type which picks the best method of accessing the index files based on the Lucene file type and resulting access pattern. This store type is available from 6.7.0 onwards and will also be the default in Elasticsearch 7.0.0.

The benefit is dependent on the workload and the index size. Workloads with random accesses (bulk updates and queries) and indices that are large compared to the available page cache benefit most from hybridfs.

hybridfs will not be beneficial in every use case and may reduce indexing throughput for update workloads and some queries when the index size is "small" (compared to the available page cache). We are looking into further possibilities to improve the situation for smaller indices as well.

Cross Cluster Replication Follower Index UI

We made a number of additions to the CCR UI to allow users to manage follower indices, including adding a table and detail view for displaying follower indices, adding UI actions to pause, resume and unfollow a follower index, adding support on the Kibana server for fetching and creating follower indices, and adding the ability to configure advanced settings when creating a follower index.

Speeding up shard peer recoveries

We are working to speed up peer recoveries. The current recovery implementation sends one file chunk at a time, waiting for acknowledgment of the previous chunk before sending the next one. The new approach allows sending N chunks in parallel to saturate a network pipe more efficiently, which can halve recovery times when using TLS and can have a significant impact even when using plain connections.
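The idea of keeping several chunks in flight can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Elasticsearch implementation: `send_chunk`, `fake_send`, the chunk size and the in-flight limit are all hypothetical names and values.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: bytes, chunk_size: int):
    """Yield (offset, chunk) pairs covering the whole payload."""
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]


def send_file(data: bytes, send_chunk, chunk_size: int = 4, max_in_flight: int = 3):
    """Send chunks with up to max_in_flight requests outstanding at once.

    send_chunk(offset, chunk) is a hypothetical blocking call that returns
    once the receiver has acknowledged the chunk.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(send_chunk, offset, chunk)
                   for offset, chunk in split_into_chunks(data, chunk_size)]
        # Wait for every acknowledgment; a failed send re-raises here.
        return [future.result() for future in futures]


# Fake receiver that writes each chunk at its offset, so arrival order does not matter.
received = bytearray(10)
lock = threading.Lock()


def fake_send(offset, chunk):
    time.sleep(0.01)  # simulated network round trip
    with lock:
        received[offset:offset + len(chunk)] = chunk
    return offset


acks = send_file(b"0123456789", fake_send)
```

Because each chunk carries its own offset, the receiver can apply chunks in whatever order they arrive, which is what makes overlapping the round trips safe in this sketch.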

Closed replicated indices

We have added unique IDs to cluster blocks to power the 2-phase-commit style close index API. With this in place, we have now merged the new close index API into master and are currently backporting it to 6.7, where it will be used to provide a clean transition for indices to be frozen.

Faster Top Hits retrieval

There is a new option in the search request to limit the number of total hits that should be tracked. Instead of a boolean true/false it is now possible to set a numeric value in the track_total_hits option that limits the total hits tracked during the request. This option can be used to implement pagination on requests even when the total number of hits is not accurate.
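A minimal sketch of a paginated search body that caps hit counting might look like this in Python. The helper name, the `title` field, the page size and the threshold of 1000 are assumptions made for the example; only the `track_total_hits` option itself comes from the change described above.

```python
import json


def search_body(query, page, page_size, track_total_hits=1000):
    """Build a search request body for one page of results.

    track_total_hits may be True, False, or an integer upper bound on
    how many hits are counted for the request.
    """
    return {
        "query": query,
        "from": page * page_size,  # zero-based page index
        "size": page_size,
        "track_total_hits": track_total_hits,
    }


# Hypothetical query against a made-up "title" field, third page of ten results.
body = search_body({"match": {"title": "lucene"}}, page=2, page_size=10)
print(json.dumps(body, indent=2))
```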

Reindex from remote and SSL with Security

Currently, reindex doesn't know anything about the certificates that are used by security, and trust can only be configured using JVM-wide system properties. We have been working on a solution to this and have opened the first PR, which creates a new library for common SSL configuration and loading of keys and certificates. This will be followed up by work that adds support for defining custom SSL configuration specifically for reindex.

OpenID Connect Support

We are working on creating an OpenID Connect realm. The basic realm infrastructure and necessary endpoints are up for review.


Changes in Elasticsearch

Changes in 7.0:

Lucene 8

Awesome news: the 8x branch has been cut in preparation for the next major Lucene/Solr 8.0 release. The next step is to remove all deprecations from master and ensure that we have viable alternatives for them.


Work is progressing well on adding Contains support for BKD-backed geoshapes, and there is now a patch in review. There is also an effort to decrease I/O pressure when merging BKD segments. We noticed that merging large segments for high-dimensional points (like LatLonShape) caused a lot of I/O. Ignacio has changed the strategy used to merge segments, and it appears to improve disk space usage and to significantly improve indexing throughput, especially for high dimensions.

Interval Queries

We have been working on improving scoring for the IntervalQuery. Currently this query uses the same term weighting as SpanQuery. To improve on the scoring offered by similar functionality such as SpanQueries, Alan is experimenting with a sloppy interval frequency and a saturation function to see if more useful scores can be extracted. We hope to open an issue about this in the next couple of days.

PMC extension

Nick Knize has accepted his invitation to join the Lucene PMC in recognition of his efforts driving Geo forward. Congratulations Nick!