Elasticsearch 1.6.0 released
Today, we are pleased to announce the release of Elasticsearch 1.6.0, based on Lucene 4.10.4. This is the latest stable version of Elasticsearch and is packed with awesome new features.
You can read the full changes list and download Elasticsearch 1.6.0 here.
Prior to this release, restarting a node for maintenance or a rolling upgrade would, in many cases, require re-copying all of the data for every shard on the node, whether needed or not. The new synced flush feature ensures that, for synced-flushed indices, the existing data can be re-used, allowing the cluster to go green much more quickly.
Here’s how it worked before this change: When an existing replica shard recovers from its primary after a node restart, the first step is to compare the segments in the primary to the segments in the replica, and to copy over any segments that are different. The problem is that segment merges on the primary and the replica are independent and so the segments on each shard may be completely different, even though they contain the same data.
With the new synced-flush feature, a `sync_id` is written to the primary and replica shards to confirm that the contents of the shards are identical, meaning that recovery can completely skip the step of comparing segments. This greatly increases the speed of recovery.
A synced flush occurs automatically on any idle index: one that has not received any index, update, or delete requests during the previous 5 minutes. This is especially useful for the logging use case: yesterday's index will be automatically synced 5 minutes after indexing stops.
If you need to restart a node or the whole cluster and don’t want to wait for syncing to happen automatically, you can:
- stop indexing (and wait for ongoing requests to stop)
- disable shard allocation
- issue a synced-flush request
- restart the node
- re-enable shard allocation
- wait until the cluster state is green
- resume indexing
NOTE: The "disable allocation" step is essential. Without it, Elasticsearch will immediately try to reallocate the shards on the restarting node to a different node, which requires copying all of the shard data to the new node.
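The steps above can be sketched with curl (assuming a node reachable on `localhost:9200`; the allocation setting and the `_flush/synced` endpoint are as documented for this release):

```shell
# 1. Disable shard allocation so shards are not rebalanced away
#    from the node while it is down.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# 2. Issue a synced flush so every idle shard gets a sync_id.
curl -XPOST 'localhost:9200/_flush/synced'

# ... restart the node ...

# 3. Re-enable shard allocation.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'

# 4. Wait for the cluster to go green, then resume indexing.
curl 'localhost:9200/_cluster/health?wait_for_status=green'
```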
Users with many nodes and indices may have noticed how shard recovery after a full cluster restart can appear to stall for long periods, while nothing seems to happen. During these stalls, even lightweight actions like updating a cluster setting can return an exception or take a long time to take effect. A symptom of this issue is a growing pending-tasks queue.
The cause of these delays is the shard allocation process: it reaches out to all data nodes to find out which ones have copies of the shards that need to be allocated. Data nodes with many shards and slow disks can take a long time to respond, especially when ongoing shard recoveries are already using a lot of I/O. Up until this version, the request for shard info was synchronous: cluster state updates blocked while waiting for the information required to continue with the allocation process.
The change in #11262 makes this request for information asynchronous. Cluster state updates are no longer blocked by this task, meaning that pending tasks can be processed much more quickly, and the cluster can be more responsive to changes. The number of ongoing shard info requests is reported as the `number_of_in_flight_fetch` key in the cluster-health API.
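To see this counter in practice, a quick look at cluster health (host is an assumption) might be:

```shell
# The response includes number_of_in_flight_fetch, the count of
# outstanding shard-info requests during allocation.
curl 'localhost:9200/_cluster/health?pretty'
```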
Additionally, if a shard fails to recover for whatever reason, the cluster will avoid trying to allocate the shard to the same node until the shard has been successfully recovered elsewhere.
Elasticsearch returns all the information that your application may need to make decisions. For instance, a search request will return the `_source` field for every hit. But sometimes you just don't need all of this information, and transferring the extra data across a slow network can be a major source of latency.
Users have asked for special settings to disable this search metadata, and for yet other settings to control the response format of other APIs. The change in #10980 adds the ability to filter any JSON response body down to just the elements that you need, using the `filter_path` query-string parameter.
For instance, if all you want from a search request is the total number of hits and each element in the `hits` array, you could specify the following:
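A request along these lines (host and index are assumptions) would do it:

```shell
# Keep only the total hit count and the hits array; everything else
# (took, _shards, timed_out, ...) is stripped from the response.
curl 'localhost:9200/_search?pretty&filter_path=hits.total,hits.hits'
```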
To retrieve just the `http_address` for each node from the nodes-info API, use a wildcard (`*`) to match the node names:
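For example (host is an assumption):

```shell
# nodes.*.http_address keeps just the http_address of every node,
# whatever the node names are.
curl 'localhost:9200/_nodes?pretty&filter_path=nodes.*.http_address'
```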
`*` acts as a wildcard for a single step in the JSON hierarchy, while a double wildcard (`**`) will match across multiple levels. Multiple filters can be specified by separating the patterns with commas. See Response filtering for more information.
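As a sketch of the double wildcard (the cluster-state path here follows the response-filtering docs; host is an assumption):

```shell
# ** matches any number of levels, so this keeps every shard "state"
# field nested under routing_table.indices, however deep it is.
curl 'localhost:9200/_cluster/state?pretty&filter_path=routing_table.indices.**.state'
```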
This release includes a change to tighten up the security around the shared file system repositories used by snapshot-restore. Currently, users of Elasticsearch can write a `.snapshot` file to any directory that is writeable by the Elasticsearch process. The change in #11284 makes it mandatory to specify which directories may be used for the repository. Appropriate directories should be specified in the `config/elasticsearch.yml` config file, under the `path.repo` setting.
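A minimal `config/elasticsearch.yml` fragment might look like this (the paths are examples only):

```yaml
# Only directories listed under path.repo may be registered as
# shared file system repositories.
path.repo: ["/mount/backups", "/mount/long_term_backups"]
```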
A properly configured Elasticsearch instance is not susceptible to this security issue:
- Run Elasticsearch as the `elasticsearch` user, not as `root`.
- Ensure that the `elasticsearch` user only has write permissions on the `data` directory and whichever directory should be used for the shared file system repository.
- Use a firewall, proxy, or Shield to prevent snapshot API calls from some or all users.
We have been assigned CVE-2015-4165 for this issue.
Elasticsearch version 2.0 and above will depend on Lucene 5, and will no longer be able to read indices containing segments written by Lucene 3 (versions of Elasticsearch before 0.90). These “ancient indices” need to be upgraded to Lucene 4 and marked as 2.0-compatible, otherwise you will not be able to migrate to Elasticsearch 2.0.
The upgrade API can already be used to upgrade all the segments in an index to the latest Lucene format, to take advantage of performance improvements and bug fixes. Now it will also write a setting to mark ancient indices as 2.0-compatible. As a bonus, the `upgrade_only_ancient_segments` option will upgrade only the Lucene 3 segments, reducing the work required before migrating.
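For example, to upgrade just the ancient segments of an old index (the index name is an example, and this assumes the REST form of the option is `only_ancient_segments`):

```shell
# Rewrites only segments still in the Lucene 3 format;
# newer segments are left untouched.
curl -XPOST 'localhost:9200/old_logs/_upgrade?only_ancient_segments=true'
```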
Kibana users have found highlighting in Elasticsearch problematic for two reasons:
- Specifying field names with a wildcard returns fields which are not appropriate for highlighting (e.g. dates and numeric fields).
- Some old indices contain very large terms (> 32kB) which cause highlighting to fail. In more recent versions, these large terms are rejected at index time.
The change in #11364 fixes both of these problems: a wildcarded field name now matches only string fields, and exceptions from too-long terms are ignored.
Fast garbage collections are essential for node stability and performance. Allowing even a few bytes of the heap to be swapped out to disk can have a huge impact on garbage collection, and should be avoided at all costs.
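One way to guarantee this on Linux is to lock the heap into RAM with the `bootstrap.mlockall` setting in `config/elasticsearch.yml` (shown as a sketch; the process also needs permission to lock memory, e.g. via `ulimit -l unlimited`):

```yaml
# Lock the JVM heap into physical memory so it can never be swapped out.
bootstrap.mlockall: true
```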
Scripts can be specified inline in a request, indexed in the special `.scripts` index, or stored in a file in the `config/` directory. Previously, you had to choose between enabling or disabling inline and indexed scripts together, even if you were able to protect the `.scripts` index with a proxy or with Shield.
With the fine-grained script settings added in #10116, you can now enable or disable inline, indexed, and file scripts independently, and per-language. Also, you can, for example, allow scripts in the search API but disable them in the update API.
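As a sketch of what such a configuration could look like in `config/elasticsearch.yml` (the setting names follow the fine-grained scripting settings in this release; treat the exact keys as assumptions):

```yaml
script.inline: off    # no inline scripts in requests
script.indexed: on    # allow scripts stored in the .scripts index
script.file: on       # allow file-based scripts
script.update: off    # but never run scripts in the update API
```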