With 2236 pull requests by 333 committers added since the release of Elasticsearch 5.0.0, we are proud to announce the release of Elasticsearch 6.0.0 GA, based on Lucene 7.0.1.
A big thank you to all the Elastic Pioneers who tested early versions and opened bug reports, and so helped to make this release as good as it is.
It’s no fun having to do a full cluster restart when upgrading to a new major version, so this time we’ve made it better: you can now do a rolling upgrade (without any cluster downtime) from the latest Elasticsearch 5.x (currently 5.6.3) to Elasticsearch 6.x. There are exceptions to this, most notably if you use X-Pack Security without SSL/TLS enabled. X-Pack Security in 6.0 requires TLS between nodes, and the only way to enable it if you aren’t already using it is to do a full cluster restart, which you can choose to do either on 5.x or as part of your upgrade to 6.0. Make sure to read the Stack upgrade docs before beginning the upgrade process.
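The rolling upgrade itself follows the usual per-node pattern; a sketch of the shard-allocation steps (this is an abbreviation — see the upgrade docs for the full procedure):

```
PUT _cluster/settings
{
  "transient": { "cluster.routing.allocation.enable": "none" }
}

# ... stop one node, upgrade it, restart it, wait for it to rejoin ...

PUT _cluster/settings
{
  "transient": { "cluster.routing.allocation.enable": "all" }
}
```

Disabling allocation before stopping each node stops the cluster from needlessly rebalancing shards while the node is offline; re-enable it and wait for green before moving to the next node.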
As with previous major version upgrades, Elasticsearch 6.0 will be able to read indices created in 5.x, but not those created in 2.x. However, instead of needing to reindex all of your old indices, you can choose to leave them in a 5.x cluster and to use Cross Cluster Search to search across both your 6.x and 5.x clusters at the same time.
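For example, a 6.x cluster can register the 5.x cluster as a remote and query both in a single request; a sketch (the remote alias old_cluster, the seed address, and the index names are illustrative):

```
PUT _cluster/settings
{
  "persistent": {
    "search.remote.old_cluster.seeds": ["old-node-1:9300"]
  }
}

GET new_index,old_cluster:legacy_index/_search
{
  "query": { "match_all": {} }
}
```

Remote indices are addressed with the `cluster_alias:index` syntax, so the old data stays searchable without being reindexed.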
The Kibana X-Pack plugin provides a simple UI to help you to reindex old indices, as well as to upgrade your Kibana, Security, and Watcher indices for 6.0. The Cluster Checkup helper runs a series of checks on your existing cluster to help you correct any issues before upgrade. You should also consult your deprecation logs to ensure that you are not using features that have been removed in 6.0.
One of the biggest features in the 6.0 release is sequence IDs, which allow for operations-based shard recovery. Previously, if a node disconnected from the cluster because of a network problem or a node restart, each shard on the node would have to be resynced by comparing segment files with the primary shard and copying over any segments that were different. This could be a long and costly process that made even rolling restarts of nodes very slow. With sequence IDs, each shard can replay just the operations missing from that shard, making the recovery process much more efficient.
Doc values provide a fast columnar data store: part of the magic that makes aggregations so fast in Elasticsearch. Previously, a storage slot was reserved for every field in every document. If many fields occurred in only a few documents, this could result in a huge waste of disk space. Now, you pay for what you use: dense fields use the same amount of space as before, but sparse fields are significantly smaller. Not only does this reduce disk space usage, it also reduces merge times and improves query throughput, as the file system cache can be better utilised.
Imagine that you have a large, search-heavy index. Searches should be super-fast, but a significant part of every search request is spent sorting the results into the correct order to return just the top 10 hits. With index sorting, you can pay the price of sorting at index time (a cost of roughly 30-40% of indexing throughput) instead of at search time. That way, a search can terminate as soon as it has gathered sufficient hits.
To take advantage of this, your documents need to be sorted at index time in the same order as will be used for your primary sort criterion at search time, e.g. by price or timestamp. This means that it won’t work well where your primary sort is on the relevance _score. It also isn’t suitable for searches with aggregations, as aggregations have to examine all documents regardless and can’t terminate early.
However, there is another non-obvious benefit of index sorting. Sorting on low-cardinality fields such as is_published, which are commonly used as filters, can result in more efficient searches as all potentially matching documents are grouped together.
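Index sorting is configured when the index is created; a sketch, assuming a timestamp-sorted index (the index name, type name doc, and field are illustrative):

```
PUT events
{
  "settings": {
    "index.sort.field": "timestamp",
    "index.sort.order": "desc"
  },
  "mappings": {
    "doc": {
      "properties": {
        "timestamp": { "type": "date" }
      }
    }
  }
}
```

With this setting, a search sorted by `timestamp` descending can stop collecting as soon as it has enough hits, since documents are already stored in that order.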
Searches across many shards have been made more scalable by adding:
- A fast pre-check phase which can immediately exclude any shards that can’t possibly match the query.
- Batched reduction of results to reduce memory usage on the coordinating node.
- Limits to the number of shards which are searched in parallel, so that a single query cannot dominate the cluster.
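Some of these behaviours are exposed as request parameters on the search API; a sketch (parameter names and availability vary across 6.x point releases, so check the docs for your version):

```
GET logs-*/_search?batched_reduce_size=256&max_concurrent_shard_requests=5
{
  "query": {
    "range": { "@timestamp": { "gte": "now-1d" } }
  }
}
```

Here `batched_reduce_size` controls how many shard results are reduced at a time on the coordinating node, and `max_concurrent_shard_requests` caps how many shards the request hits in parallel.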
X-Pack Watcher used to execute all of its watches on the master node, which limited the number of watches that could be run and added stress to the master node. Distributed watch execution moves watch execution to the nodes that hold the shards of the watcher index, so that your watches can scale with your cluster.
The biggest adjustment that needs to be made in order to migrate to 6.0 is the requirement that indices have only a single mapping type. This is part of the process to remove mapping types altogether. Multi-type indices created in 5.x will continue to function as before, but new indices may only have a single mapping type. More details about why and how we are removing mapping types can be found in Removal of Types.
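In practice this means giving each new index exactly one type; a sketch (the index name, type name doc, and fields are illustrative):

```
PUT products
{
  "mappings": {
    "doc": {
      "properties": {
        "name":  { "type": "text" },
        "price": { "type": "scaled_float", "scaling_factor": 100 }
      }
    }
  }
}
```

Attempting to add a second type to this index would be rejected, whereas a multi-type index created on 5.x continues to work after the upgrade.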
We’ve also added some new features:
- The significant_text aggregation, which is like significant_terms but works on text fields by re-analysing the _source instead of using masses of heap space for fielddata.
- The ip_range field type, which allows you to index ranges of IPv4 and IPv6 addresses.
- The icu_collation_keyword field type, which provides support for language-specific sort orders.
The _all field has been removed in favour of searching all fields by default in query_string and simple_query_string queries. This has resulted in significant disk space savings in many out-of-the-box situations. This behaviour is configurable: a list of default fields can be provided per index.
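As an illustration of the significant_text aggregation, a request might look like this (the news index and body field are examples):

```
GET news/_search
{
  "size": 0,
  "query": { "match": { "body": "elasticsearch" } },
  "aggs": {
    "keywords": {
      "significant_text": { "field": "body" }
    }
  }
}
```

Because the aggregation re-analyses the matching documents’ _source on the fly, it can surface statistically unusual terms from free text without loading fielddata into heap.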
There are two important changes coming in X-Pack Security. The first is that we no longer use changeme as a default password, as this leaves the forgetful user without security. Instead, we provide a tool to generate and set strong passwords for the reserved users the first time the cluster is started.
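In 6.0 that tool ships with X-Pack; a sketch of how it is invoked (the exact path varies between releases, so check your installation):

```
bin/x-pack/setup-passwords auto         # generate random strong passwords for the reserved users
bin/x-pack/setup-passwords interactive  # or choose each password yourself
```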
The second change is that TLS/SSL between nodes is required when security is enabled. With this change, besides encrypting node-to-node communication, we can identify nodes which are allowed to join the cluster by virtue of them possessing a trusted certificate. Rest assured, we provide you with a simple command line tool called certgen to help you generate certificates easily.
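Once you have certificates, node-to-node TLS is enabled in elasticsearch.yml; a sketch, assuming PEM files like those certgen produces (file names and paths are examples):

```
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: certs/node-1.key
xpack.security.transport.ssl.certificate: certs/node-1.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/ca.crt" ]
```

Each node presents its own certificate and trusts the shared CA, which is how the cluster decides which nodes are allowed to join.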