November 14, 2017

Elasticsearch 6.0.0 GA released

We are proud to announce the release of Elasticsearch 6.0.0 GA, based on Lucene 7.0.1. It brings 2236 pull requests by 333 committers since the release of Elasticsearch 5.0.0.

A big thank you to all the Elastic Pioneers who tested early versions and opened bug reports, and so helped to make this release as good as it is.

Making Upgrades Easier

No Downtime Upgrades

It’s no fun having to do a full cluster restart when upgrading to a new major version, so this time we’ve made it better. You can now do a rolling upgrade (without any cluster downtime) from the latest Elasticsearch 5.x (currently 5.6.3) to Elasticsearch 6.x. There are exceptions, most notably if you use X-Pack Security without SSL/TLS enabled. X-Pack Security in 6.0 requires TLS between nodes, and if you aren’t already using it, the only way to enable it is a full cluster restart, which you can do either in 5.x or as part of your upgrade to 6.0. Make sure to read the Stack upgrade docs before beginning the upgrade process.
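The upgrade rules described above can be captured in a small decision helper. This is a minimal sketch: the function names and return strings are hypothetical, and treating any 5.6.x release as "the latest 5.x" is our assumption. The Stack upgrade docs remain the authority on what a given cluster needs.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a version string such as '5.6.3' into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def upgrade_path(current: str, security_enabled: bool, node_tls_enabled: bool) -> str:
    """Hypothetical helper: how to move a cluster running `current` to 6.x."""
    major, minor, _ = parse_version(current)
    if (major, minor) != (5, 6):
        # Assumption: only the 5.6 line supports a rolling upgrade to 6.x.
        return "full-cluster-restart"
    if security_enabled and not node_tls_enabled:
        # Turning on node-to-node TLS requires a full cluster restart.
        return "full-cluster-restart"
    return "rolling-upgrade"
```

For example, `upgrade_path("5.6.3", security_enabled=True, node_tls_enabled=False)` returns `"full-cluster-restart"`, matching the X-Pack Security exception.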

Search Across Multiple Elasticsearch Clusters

As with previous major version upgrades, Elasticsearch 6.0 will be able to read indices created in 5.x, but not those created in 2.x. However, instead of needing to reindex all of your old indices, you can choose to leave them in a 5.x cluster and to use Cross Cluster Search to search across both your 6.x and 5.x clusters at the same time.
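With Cross Cluster Search, a remote index is addressed by prefixing its name with the alias of the remote cluster, e.g. `cluster_5x:logs-2016-*` (the alias `cluster_5x` is hypothetical and would be defined in the remote cluster settings). The sketch below is a simplified model, not Elasticsearch code. It splits such an index expression into the parts searched locally and the parts sent to each remote cluster.

```python
from collections import defaultdict
from typing import Dict, List

LOCAL = ""  # key used for indices on the cluster that receives the request


def group_by_cluster(index_expression: str) -> Dict[str, List[str]]:
    """Split a comma-separated index expression into per-cluster index lists.

    Remote indices are written as '<cluster_alias>:<index>'; anything without
    a prefix is searched on the local cluster.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for part in index_expression.split(","):
        part = part.strip()
        if not part:
            continue
        alias, sep, index = part.partition(":")
        if sep:
            groups[alias].append(index)
        else:
            groups[LOCAL].append(part)
    return dict(groups)
```

A single request for `logs-2017-*,cluster_5x:logs-2016-*` thus covers new indices in the 6.x cluster and old ones left in the 5.x cluster.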

Migration Assistant

The Kibana X-Pack plugin provides a simple UI to help you reindex old indices and upgrade your Kibana, Security, and Watcher indices for 6.0. The Cluster Checkup helper runs a series of checks on your existing cluster to help you correct any issues before upgrading. You should also consult your deprecation logs to ensure that you are not using features that have been removed in 6.0.

Resilience and Efficiency

Faster Restarts and Recoveries with Sequence IDs

One of the biggest features in the 6.0 release is sequence IDs, which allow operations-based shard recovery. Previously, if a node disconnected from the cluster because of a network problem or a node restart, each shard on the node had to be resynced by comparing its segment files with those of the primary shard and copying over any segments that differed. This could be a long and costly process that made even rolling restarts of nodes very slow. With sequence IDs, each shard can replay just the operations it is missing, making recovery much more efficient.
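The idea behind operations-based recovery can be shown with a toy model. Every write gets a sequence number, and each shard copy tracks a local checkpoint: the highest sequence number below which it has applied every operation. This minimal sketch ignores primary terms, the global checkpoint, translog retention, and the fallback to file-based recovery when the needed operations are no longer available. `ShardCopy` and `ops_based_recovery` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Operation = Tuple[int, str]  # (sequence number, description of the write)


@dataclass
class ShardCopy:
    # Applied operations keyed by sequence number.
    ops: Dict[int, str] = field(default_factory=dict)

    def local_checkpoint(self) -> int:
        """Highest seq_no such that all lower ones are applied (-1 if none)."""
        checkpoint = -1
        while checkpoint + 1 in self.ops:
            checkpoint += 1
        return checkpoint

    def apply(self, seq_no: int, op: str) -> None:
        self.ops[seq_no] = op


def ops_based_recovery(primary: ShardCopy, replica: ShardCopy) -> List[Operation]:
    """Replay only the operations the replica is missing above its checkpoint."""
    start = replica.local_checkpoint() + 1
    missing = [
        (seq_no, primary.ops[seq_no])
        for seq_no in sorted(primary.ops)
        if seq_no >= start and seq_no not in replica.ops
    ]
    for seq_no, op in missing:
        replica.apply(seq_no, op)
    return missing
```

A replica that dropped out after operation 5 of 10 receives only operations 6 to 9, rather than a file-by-file comparison of every segment.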

Major Improvements for Sparsely Populated Fields

Doc values provide a fast columnar data store; they are part of the magic that makes aggregations so fast in Elasticsearch. Previously, a storage slot was reserved for every field in every document. If many fields occurred in only a few documents, this could waste a huge amount of disk space. Now you pay only for what you use: dense fields use the same amount of space as before, but sparse fields are significantly smaller. Not only does this reduce disk usage, it also reduces merge times and improves query throughput, as the file system cache can be better utilised.
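The difference between the old and new layouts can be sketched as a dense column with one slot per document versus a sparse column that stores only the documents that actually have a value. This is a conceptual illustration only. Lucene's real on-disk encoding is far more compact, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

SparseColumn = Tuple[List[int], List[int]]  # (sorted doc IDs, values)


def dense_column(values: Dict[int, int], num_docs: int) -> List[Optional[int]]:
    """One slot per document, whether or not the document has the field."""
    return [values.get(doc_id) for doc_id in range(num_docs)]


def sparse_column(values: Dict[int, int]) -> SparseColumn:
    """Only documents that have a value: sorted doc IDs plus their values."""
    doc_ids = sorted(values)
    return doc_ids, [values[doc_id] for doc_id in doc_ids]


def lookup_sparse(column: SparseColumn, doc_id: int) -> Optional[int]:
    """Binary-search the doc ID list; documents without the field give None."""
    doc_ids, vals = column
    i = bisect_left(doc_ids, doc_id)
    if i < len(doc_ids) and doc_ids[i] == doc_id:
        return vals[i]
    return None
```

For a field present in 2 of 100,000 documents, the dense column holds 100,000 slots while the sparse one holds 2 entries, and both return the same value for any document.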

Faster Query Times with Sorted Indices

Imagine that you have a large, search-heavy index. Searches should be super fast, but a significant part of every search request is sorting the results in order to return just the top 10 hits. With index sorting, you can pay the price of sorting at index time (30-40% of indexing throughput) instead of at search time. That way, a search can terminate as soon as it has gathered enough hits.
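In Elasticsearch, the index-time order is set with the `index.sort.field` and `index.sort.order` index settings. The sketch below models only the effect on one sorted segment: without index sorting every document must be visited to find the top k by price, but when documents are already stored in price order, the search can stop after k documents. The function names and the `(doc_id, price)` tuples are hypothetical.

```python
import heapq
from typing import List, Tuple

Doc = Tuple[int, float]  # (doc_id, price)


def top_k_unsorted(docs: List[Doc], k: int) -> Tuple[List[Doc], int]:
    """Without index sorting: every document must be visited."""
    best = heapq.nsmallest(k, docs, key=lambda d: d[1])
    return best, len(docs)


def top_k_presorted(docs: List[Doc], k: int) -> Tuple[List[Doc], int]:
    """Documents stored in ascending price order: stop after k of them."""
    best: List[Doc] = []
    visited = 0
    for doc in docs:
        visited += 1
        best.append(doc)
        if len(best) == k:
            break  # no later document can sort before these k
    return best, visited
```

Both functions return the same hits, but the presorted version visits 10 documents instead of all of them. The cost of keeping documents in that order is paid during indexing.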

To take advantage of this, your documents need to be sorted at index time in the same order as will be used for your primary sort criterion at search time, e.g. by price or timestamp. This means that it won’t work well where your primary sort is on the relevance _score. It also isn’t suitable for searches with aggregations, as aggregations have to examine all documents regardless and can’t terminate early.

Index sorting has another, less obvious benefit. Sorting on low-cardinality fields that are commonly used as filters, such as age, gender, or is_published, can make searches more efficient, because all potentially matching documents are grouped together.
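A simple way to see the grouping effect is to count how many separate runs of matching documents a filter produces in doc-ID order. This sketch only counts runs. It makes no claim about how Lucene actually skips non-matching documents, and `count_runs` is a hypothetical helper.

```python
from typing import Iterable


def count_runs(matches: Iterable[bool]) -> int:
    """Count contiguous blocks of matching documents in doc-ID order."""
    runs = 0
    previous = False
    for is_match in matches:
        if is_match and not previous:
            runs += 1  # a new block of matches starts here
        previous = is_match
    return runs
```

If every third document is published, an `is_published` filter over 30 unsorted documents hits 10 scattered runs. After sorting on `is_published`, the same 10 documents form a single contiguous run.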

Search Scalability

Searches across many shards have been made more scalable by adding:

A fast pre-check phase which can immediately exclude any shards that can’t possibly match the query.
Batched reduction of results to reduce memory usage on the coordinating node.
Limits to the number of shards which are searched in parallel, so that a single query cannot dominate the cluster.
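Two of these mechanisms can be modelled in a few lines. In the 6.x search API they correspond to request parameters such as `pre_filter_shard_size`, `batched_reduce_size`, and `max_concurrent_shard_requests`. The sketch below covers only the first two and is a simplified model, not Elasticsearch's implementation. Its pre-check assumes each shard is summarised by the minimum and maximum of a timestamp field, which is one kind of information such a check can use. The batched reduce merges per-shard top-k scores in fixed-size batches, so the coordinating node never buffers more than one batch of shard results at a time. All names are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ShardStats:
    shard_id: int
    min_ts: int  # smallest timestamp stored in the shard
    max_ts: int  # largest timestamp stored in the shard


def can_match(shard: ShardStats, query_from: int, query_to: int) -> bool:
    """Pre-check: skip a shard whose timestamp range misses the query range."""
    return shard.max_ts >= query_from and shard.min_ts <= query_to


def batched_reduce(
    shard_hits: List[List[float]], k: int, batch_size: int
) -> Tuple[List[float], int]:
    """Merge per-shard top-k scores, buffering at most batch_size shard results.

    Returns the global top-k scores and the peak number of buffered results.
    """
    reduced: List[float] = []  # running top-k after each partial reduce
    buffer: List[List[float]] = []
    peak = 0
    for hits in shard_hits:
        buffer.append(hits)
        peak = max(peak, len(buffer))
        if len(buffer) == batch_size:
            reduced = heapq.nlargest(k, reduced + [s for h in buffer for s in h])
            buffer.clear()
    # Final reduce over whatever is left in the last, partial batch.
    reduced = heapq.nlargest(k, reduced + [s for h in buffer for s in h])
    return reduced, peak
```

Because each partial reduce keeps only the running top k, the final result is the same as merging all shard results at once, while memory use is bounded by the batch size.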

Distributed Watch Execution

X-Pack Watcher used to execute all of its watches on the master node, which limited the number of watches that could be run and added stress to the master node. Distributed watch execution moves watch execution to the nodes that hold the shards of the watcher index, so that your watches can scale with your cluster.
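The principle of running each watch where its document lives can be sketched by routing watch IDs to shards and shards to nodes. This is a simplified illustration: CRC32 stands in for the murmur3-based routing Elasticsearch actually uses, and replica copies and shard relocation are ignored. `shard_for` and `assign_watches` are hypothetical names.

```python
import zlib
from typing import Dict, List


def shard_for(watch_id: str, num_shards: int) -> int:
    """Route a watch document to a shard.

    Stand-in hash: Elasticsearch uses murmur3 on the routing value, not CRC32.
    """
    return zlib.crc32(watch_id.encode("utf-8")) % num_shards


def assign_watches(watch_ids: List[str], shard_to_node: Dict[int, str]) -> Dict[str, List[str]]:
    """Group watches by the node holding the shard each watch is stored in.

    Assumes shard IDs are 0..n-1, one entry per shard in shard_to_node.
    """
    plan: Dict[str, List[str]] = {node: [] for node in sorted(set(shard_to_node.values()))}
    for watch_id in watch_ids:
        node = shard_to_node[shard_for(watch_id, len(shard_to_node))]
        plan[node].append(watch_id)
    return plan
```

Adding nodes that hold shards of the watcher index spreads the watches across more machines, instead of concentrating them all on the master.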

Search and Indexing

The biggest adjustment that needs to be made in order to migrate to 6.0 is the requirement that indices have only a single mapping type. This is part of the process to remove mapping types altogether. Multi-type indices created in 5.x will continue to function as before, but new indices may only have a single mapping type. More details about why and how we are removing mapping types can be found in Removal of Types.
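One approach described in the Removal of Types documentation is to fold several types into one and add a custom field that records which former type a document belonged to. The sketch below applies that idea to mapping definitions held as Python dicts. `merge_to_single_type` and the default field name `type` are hypothetical. Within a single 5.x index, same-named fields in different types already had to share a mapping, so the conflict check matters mainly when merging types that came from different indices.

```python
from typing import Any, Dict

Mapping = Dict[str, Any]


def merge_to_single_type(type_mappings: Dict[str, Mapping], type_field: str = "type") -> Mapping:
    """Fold several mapping types into a single set of properties.

    A keyword field (named by type_field) records the former type, so queries
    can still filter on it after the migration.
    """
    merged: Dict[str, Mapping] = {type_field: {"type": "keyword"}}
    for type_name, mapping in type_mappings.items():
        for field_name, definition in mapping.get("properties", {}).items():
            existing = merged.get(field_name)
            if existing is not None and existing != definition:
                raise ValueError(
                    f"field {field_name!r} in type {type_name!r} conflicts with {existing}"
                )
            merged[field_name] = definition
    return {"properties": merged}
```

When reindexing, each document would then also carry its former type name in that field, e.g. `"type": "tweet"`.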

We’ve also added some new features:

The significant_text aggregation is like significant_terms, but works on text fields by re-analysing the _source instead of using masses of heap space for fielddata.
The new ip_range field type allows you to index ranges of IPv4 and IPv6 addresses.
The new icu_collation_keyword field type provides support for language-specific sort orders.
The _all field has been removed in favour of searching all fields by default in the query_string and simple_query_string queries. This has resulted in significant disk space savings in many out-of-the-box situations. This behaviour is configurable: a list of default fields can be provided per index.
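To make the ip_range field type concrete, here is a small model of what it stores and how an address query matches against it. An ip_range value can be given in CIDR notation. The `from-to` string form used below is a hypothetical convenience for this sketch, not Elasticsearch's syntax, and the helpers are not Elasticsearch code.

```python
import ipaddress
from typing import List, Tuple, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
IPRange = Tuple[IPAddress, IPAddress]


def parse_ip_range(value: str) -> IPRange:
    """Accept CIDR notation, or a hypothetical 'from-to' pair."""
    if "/" in value:
        network = ipaddress.ip_network(value, strict=False)
        return network[0], network[-1]
    low, high = (ipaddress.ip_address(part.strip()) for part in value.split("-"))
    return low, high


def ranges_containing(ip: str, ranges: List[str]) -> List[str]:
    """Return the stored ranges (IPv4 or IPv6) that contain the given address."""
    address = ipaddress.ip_address(ip)
    hits = []
    for raw in ranges:
        low, high = parse_ip_range(raw)
        # Addresses of different IP versions cannot be compared, so check first.
        if low.version == address.version and low <= address <= high:
            hits.append(raw)
    return hits
```

A lookup for `192.168.4.7` against stored ranges `192.168.0.0/16` and `2001:db8::/32` matches only the first, since the two ranges use different address families.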

Security Security Security

There are two important changes coming in X-Pack Security. The first is that we no longer use changeme as a default password as this leaves the forgetful user without security. Instead, we provide a tool to generate and set strong passwords for reserved users the first time the cluster is started.
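The idea of replacing a shared default password with strong, per-user random passwords can be illustrated with Python's `secrets` module. This is not how the X-Pack tool is implemented. The alphabet and length are arbitrary choices for the sketch, and the reserved user names are sample input only.

```python
import secrets
import string
from typing import Dict, Iterable

ALPHABET = string.ascii_letters + string.digits


def generate_passwords(users: Iterable[str], length: int = 20) -> Dict[str, str]:
    """Create an independent, cryptographically random password for each user."""
    return {user: "".join(secrets.choice(ALPHABET) for _ in range(length)) for user in users}
```

Calling `generate_passwords(["elastic", "kibana", "logstash_system"])` yields a different 20-character password for each user, so no two accounts share a guessable secret.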

The second change is that TLS/SSL between nodes is required when security is enabled. Besides encrypting node-to-node communication, this lets us identify the nodes that are allowed to join the cluster: only nodes that possess a trusted certificate can connect. Rest assured, we provide a simple command-line tool called certgen to help you generate certificates easily.
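The admission rule described above, where only peers presenting a certificate signed by a trusted CA can connect, is ordinary mutual TLS. The sketch below shows the accepting side of such a connection with Python's standard `ssl` module. Elasticsearch itself is configured through its `xpack.security.transport.ssl.*` settings rather than code like this. The function name and file parameters are hypothetical.

```python
import ssl
from typing import Optional


def node_transport_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side TLS context that only accepts peers with a trusted certificate.

    With no ca_file, nothing is trusted, so every peer is rejected.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)  # loads no system CAs by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # peers must present a certificate
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # the CA that signed every node cert
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # this node's identity
    return ctx
```

Starting from an empty trust store rather than the system defaults means that only the cluster's own CA, such as one produced with certgen, decides which nodes may join.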

Conclusion

Please download Elasticsearch 6.0.0, try it out, and let us know what you think on Twitter (@elastic) or in our forum. You can report any problems on the GitHub issues page.