Elasticsearch 5.4.0 released

Today we are pleased to announce the release of Elasticsearch 5.4.0, based on Lucene 6.5.0. This is the latest stable release.

You can read about all the changes in the release notes, but a few of them are worth highlighting here.

Batched Reduction of Search Results

When you perform a search request, the node which receives the request becomes the coordinating node. It is in charge of forwarding the shard-level requests to the appropriate data nodes, collecting the results, and merging them into a single result set. The memory use on the coordinating node grows with the number of shards involved. Previously, we had added a 1,000 shard soft limit to try to prevent coordinating nodes from using too much memory.

That said, it is quite easy to reach the 1,000 shard limit, especially with the recent release of Cross Cluster Search. As of 5.4.0, Top-N search results and aggregations are reduced in batches of 512, which puts an upper limit on the amount of memory used on the coordinating node. This has allowed us to set the shard soft limit to unlimited by default.
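To make the idea concrete, here is a minimal sketch of batched reduction for Top-N hits. It is an illustration only, not the Elasticsearch implementation: the function names, the (score, doc_id) hit format, and the buffering strategy are assumptions made for this example.

```python
import heapq


def reduce_top_n(partials, n):
    """Merge several lists of (score, doc_id) hits into one top-n list, best first."""
    return heapq.nlargest(n, (hit for part in partials for hit in part))


def batched_top_n(shard_results, n, batch_size=512):
    """Reduce per-shard hit lists incrementally.

    At most `batch_size` partial results are buffered at any time. When the
    buffer fills up, it is collapsed into a single partial top-n list.
    """
    buffer = []
    for shard_hits in shard_results:
        buffer.append(shard_hits)
        if len(buffer) >= batch_size:
            # Collapse the whole buffer into one partial result.
            buffer = [reduce_top_n(buffer, n)]
    # Final reduction over whatever is left in the buffer.
    return reduce_top_n(buffer, n)
```

Because the top N of a whole set can always be computed from the top N of any subset plus the remaining items, the batched result is identical to reducing everything at once, while the memory held by the coordinating step no longer grows with the number of shards.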

Accurate Proximity Queries for Multi-Word Synonyms

There has been much work recently on improving Lucene’s handling of graph token streams, where analysis of text, either from a document during indexing, or a query during searching, produces multiple overlapping paths or interpretations for the tokens. Multi-word synonyms do this and have long been buggy when used with proximity queries.

Thanks to the recent addition of the synonym_graph token filter, as well as improvements to Lucene’s query parsers to translate the token graph into separate queries, such analysis chains are finally handled correctly at search time. Since 5.2, we’ve also added the word_delimiter_graph token filter, and graph-enabled the shingles, cjk, ngram, and common_grams token filters, and the kuromoji_tokenizer. There is also the flatten_graph token filter, which needs to be used as the final token filter at index time to convert a graph into a form which can be indexed.
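As a rough sketch of how this might be wired up, the snippet below builds an index body that applies a synonym_graph filter at search time only, plus a phrase query against it. The filter, analyzer, type, and field names (my_synonyms, my_search_analyzer, doc, title) and the synonym rule are hypothetical, chosen only for illustration.

```python
import json


def build_index_body(synonyms):
    """Return index settings and mappings using synonym_graph at search time."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; the type is the new graph-aware filter.
                    "my_synonyms": {"type": "synonym_graph", "synonyms": list(synonyms)}
                },
                "analyzer": {
                    "my_index_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    },
                    "my_search_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_synonyms"],
                    },
                },
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    "title": {
                        "type": "text",
                        "analyzer": "my_index_analyzer",
                        "search_analyzer": "my_search_analyzer",
                    }
                }
            }
        },
    }


index_body = build_index_body(["ny, new york"])
# A proximity (phrase) query that exercises the multi-word synonym.
phrase_query = {"query": {"match_phrase": {"title": "ny city"}}}
print(json.dumps(index_body, indent=2))
```

Keeping the synonyms in the search analyzer only means the index itself never has to store a token graph.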

Optimized Query Execution

This release ships with a number of query optimizations. The commonly used range query needs to look at every document in the index, and so can become a bottleneck in query execution. The new range query will automatically choose the more efficient of two query modes, based on the other queries in the search request. See Better Query Planning for Range Queries in Elasticsearch for more info.
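The following toy sketch shows the general idea of picking an execution mode for a range clause based on the cost of the other clause in the request. It is not Lucene's code: the ToyIndex class, the "verify" and "iterate" mode names, and the simple cost rule are all assumptions made for this example.

```python
import bisect


class ToyIndex:
    """A tiny in-memory index over docs shaped like {"tag": str, "price": number}."""

    def __init__(self, docs):
        self.docs = docs
        # Sorted (price, doc_id) pairs stand in for a range-friendly index structure.
        self.by_price = sorted((fields["price"], doc_id) for doc_id, fields in docs.items())
        self.prices = [price for price, _ in self.by_price]
        self.by_tag = {}
        for doc_id, fields in docs.items():
            self.by_tag.setdefault(fields["tag"], set()).add(doc_id)

    def search(self, tag, lo, hi):
        """Return (mode, hits) for: tag == `tag` AND lo <= price <= hi."""
        start = bisect.bisect_left(self.prices, lo)
        end = bisect.bisect_right(self.prices, hi)
        range_cost = end - start  # cheap estimate of how many docs the range matches
        candidates = self.by_tag.get(tag, set())
        if len(candidates) < range_cost:
            # The other clause is more selective: check the range per candidate doc.
            mode = "verify"
            hits = {d for d in candidates if lo <= self.docs[d]["price"] <= hi}
        else:
            # The range is more selective: walk its matches and intersect.
            mode = "iterate"
            hits = {doc_id for _, doc_id in self.by_price[start:end]} & candidates
        return mode, hits
```

Both modes return the same hits; the only difference is how much work is done to find them.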

On top of that, some nested queries have received a speed boost as we are being cleverer about which filters need to be applied to a particular nested query. For instance, if the field being queried only exists in nested documents, then we no longer need a filter to exclude the parent document.

And finally, large terms queries had become slower to parse because of the keyword normalizers added in 5.2. In this release, only fields that have a custom normalizer are normalized.

Other Notable Changes

Some other changes worth mentioning are:

  • We’ve tweaked the default Netty receive predictor size to 64 kB to balance throughput with garbage collection and heap allocation.
  • Date-range queries in the percolator can now use the value "now", which is calculated at execution time.
  • The unified highlighter gained support for fragment_length.
  • We’re slowly migrating sensitive settings (like S3 and EC2 passwords) to the secure settings keystore, instead of storing them in the plain text elasticsearch.yml file.
  • The new single-node discovery type disables bootstrap checks, which makes it easier for Docker users to run tests against Elasticsearch with the TransportClient.
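For two of the items above, request bodies might look roughly like the sketch below. The field names (created, body), the date math expression, and the fragment length value are hypothetical and only serve as an illustration.

```python
import json

# A percolator query document whose date range is relative to "now".
# Assumes a mapping where the "query" field has the percolator type.
percolator_doc = {
    "query": {
        "range": {
            "created": {"gte": "now-1d"}
        }
    }
}

# A search request asking the unified highlighter for shorter fragments.
highlight_request = {
    "query": {"match": {"body": "elasticsearch"}},
    "highlight": {
        "fields": {
            "body": {"type": "unified", "fragment_length": 100}
        }
    },
}

print(json.dumps(percolator_doc))
print(json.dumps(highlight_request))
```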

Conclusion

Please download Elasticsearch 5.4.0, try it out, and let us know what you think on Twitter (@elastic) or in our forum. You can report any problems on the GitHub issues page.