Today we are pleased to announce the release of Elasticsearch 5.4.0, based on Lucene 6.5.0. This is the latest stable release.
You can read about all the changes in the release notes linked above, but there are a few changes which are worth highlighting below.
When you perform a search request, the node which receives the request becomes the coordinating node, which is in charge of forwarding the shard-level requests to the appropriate data nodes, collecting the results, and merging them into a single result set. Memory use on the coordinating node grows with the number of shards involved. Previously, we added a soft limit of 1,000 shards per search request to try to prevent coordinating nodes from using too much memory.
That said, it is quite easy to reach the 1,000 shard limit, especially with the recent release of Cross Cluster Search. As of 5.4.0, top-N search results and aggregations are reduced in batches of 512, which puts an upper bound on the amount of memory used on the coordinating node and has allowed us to make the shard soft limit unlimited by default.
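If coordinating-node memory is a concern, the reduce batch size can also be tuned per request. A minimal sketch, assuming the `batched_reduce_size` query-string parameter introduced alongside this change (the index and aggregation names here are hypothetical):

```
GET /logs-*/_search?batched_reduce_size=256
{
  "size": 10,
  "query": { "match_all": {} },
  "aggs": {
    "by_status": {
      "terms": { "field": "status" }
    }
  }
}
```

Lowering the batch size trades a little extra reduce work for a smaller memory footprint while partial results are buffered on the coordinating node.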
There has been much work recently on improving Lucene’s handling of graph token streams, where analysis of text, either from a document during indexing, or a query during searching, produces multiple overlapping paths or interpretations for the tokens. Multi-word synonyms do this and have long been buggy when used with proximity queries.
Thanks to the recent addition of the synonym_graph token filter, as well as improvements to Lucene's query parsers to translate the token graph into separate queries, such analysis chains are finally handled correctly at search time. Since 5.2, we've also added the word_delimiter_graph token filter, and graph-enabled the common_grams token filter and the kuromoji_tokenizer. There is also the flatten_graph token filter, which needs to be used as the final token filter at index time to convert a graph into a form which can be indexed.
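To make this concrete, here is a sketch of an index that applies synonym_graph at search time only, which is the usual way to get correct multi-word synonym handling (the index name, analyzer names, and synonym list are all hypothetical):

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym_graph",
          "synonyms": [ "nyc, new york city" ]
        }
      },
      "analyzer": {
        "my_search_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_synonyms" ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "body": {
          "type": "text",
          "analyzer": "standard",
          "search_analyzer": "my_search_analyzer"
        }
      }
    }
  }
}
```

With this setup, a phrase query for "new york city" on the body field also matches documents containing "nyc". If you instead want graph-producing filters in the index-time chain, the flatten token filter mentioned above would go last in that chain.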
This release ships with a number of query optimizations. The commonly used range query can match a large proportion of the documents in an index, and so can become a bottleneck in query execution. The new range query implementation automatically chooses the more efficient of two execution modes, based on the other queries in the search request. See Better Query Planning for Range Queries in Elasticsearch for more info.
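Nothing needs to change in your requests to benefit. For illustration, a request like the following, where a broad range clause is combined with a selective term clause, is the kind of query the new planning helps (index and field names are hypothetical):

```
GET /logs/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "status": "error" } },
        { "range": { "@timestamp": { "gte": "now-7d" } } }
      ]
    }
  }
}
```

Here the range clause alone might match most of the index, so it is cheaper to verify the range per document that matches the selective term clause than to enumerate every document in the range.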
On top of that, some nested queries have received a speed boost, as we are being cleverer about which filters need to be applied to a particular nested query. For instance, if the field being queried only exists in nested documents, then we no longer need a filter to exclude the parent documents.
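As a concrete example of the kind of query affected, here is a sketch of a nested mapping and a matching query (the index, type, and field names are hypothetical):

```
PUT /products
{
  "mappings": {
    "product": {
      "properties": {
        "variants": {
          "type": "nested",
          "properties": {
            "color": { "type": "keyword" },
            "price": { "type": "double" }
          }
        }
      }
    }
  }
}

GET /products/_search
{
  "query": {
    "nested": {
      "path": "variants",
      "query": {
        "bool": {
          "filter": [
            { "term":  { "variants.color": "red" } },
            { "range": { "variants.price": { "lte": 20 } } }
          ]
        }
      }
    }
  }
}
```

Since variants.color and variants.price exist only inside the nested documents, the inner query cannot match a parent document, so the extra parent-exclusion filter can be skipped.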
And finally, large terms queries were slower to parse because of the keyword normalizers added in 5.2. In this release, only fields that have a custom normalizer are normalized.
Some other changes worth mentioning are:
- We’ve tweaked the default Netty receive predictor size to 64 kB to balance throughput with garbage collection and heap allocation.
- Date-range queries in the percolator can now use now, which will be calculated at execution time.
- The unified highlighter gained support for
- We're slowly migrating sensitive settings (like S3 and EC2 passwords) to the secure settings keystore, instead of storing them in plain text.
- The single-node discovery type disables bootstrap checks, which makes it easier for Docker users to run tests against Elasticsearch with the TransportClient.
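For reference, enabling single-node discovery is a one-line setting. A minimal sketch, assuming the discovery.type setting name used by this feature (it can equally be passed on the command line or as an environment variable to the official Docker image):

```
# elasticsearch.yml — run as a standalone node; bootstrap checks
# are not enforced, so development-friendly defaults are accepted
discovery.type: single-node
```

This is intended for testing and development; a node configured this way will not form a cluster with other nodes.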