29 June 2015 News

Elasticsearch 2.0.0.beta1 coming soon!

By Clinton Gormley

Update November 2, 2015: The real deal is here! Learn more about Elasticsearch 2.0 GA.

We are gearing up to release Elasticsearch 2.0.0.beta1, which takes advantage of all the improvements available in Lucene 5.2.1. This forthcoming release will deliver a few awesome user-facing features, such as:

Pipeline Aggregations

The ability to run aggregations such as derivatives, moving averages, and series arithmetic on the results of other aggregations. This functionality was always doable on the client side, but pushing the computation into Elasticsearch makes it easier to build more powerful analytic queries while considerably simplifying client code. It opens up the potential for predictive analytics and anomaly detection.
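As a rough sketch of what this could look like, the request body below nests a derivative pipeline aggregation under a date histogram. The field names (date, price) and the aggregation names are made up for illustration, and the syntax reflects the pipeline aggregation format as we understand it for 2.0, so check the reference documentation for the release you run. The small helper shows the kind of client-side calculation that the derivative aggregation replaces.

```python
import json

# Hypothetical request body: monthly sales totals plus their month-over-month change.
# Field names ("date", "price") and aggregation names are illustrative assumptions.
pipeline_body = {
    "size": 0,
    "aggs": {
        "sales_per_month": {
            "date_histogram": {"field": "date", "interval": "month"},
            "aggs": {
                "sales": {"sum": {"field": "price"}},
                # Pipeline aggregation: reads the output of the sibling "sales" metric
                "sales_deriv": {"derivative": {"buckets_path": "sales"}},
            },
        }
    },
}


def client_side_derivative(values):
    """First difference between consecutive bucket values, computed on the client.

    The first bucket has no predecessor, so its derivative is None.
    """
    if not values:
        return []
    return [None] + [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    print(json.dumps(pipeline_body, indent=2))
    print(client_side_derivative([10, 15, 12]))
```

With the pipeline aggregation in place, the client only has to read sales_deriv from each bucket instead of post-processing the bucket values itself.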

Query/Filter merging

Filters are no more. All filter clauses have now become query clauses. When used in query context, they affect relevance scoring; when used in filter context, they simply exclude documents which don’t match, just like filters do today. This restructuring means that query execution can be automatically optimized to run in the most efficient order possible. For instance, slow queries like phrase and geo queries first execute a fast approximate phase, then trim the results with a slower exact phase. In filter context, frequently used clauses will be cached automatically whenever it makes sense to do so.
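To make the two contexts concrete, here is a minimal sketch of a search body that uses the same kind of clause in both. It assumes the bool query accepts a filter clause alongside must, which is our understanding of the 2.0 syntax; the field names and values are invented for the example. The helper is purely illustrative and just separates the clauses by the context they run in.

```python
import json

# Hypothetical search body; field names and values are illustrative assumptions.
merged_body = {
    "query": {
        "bool": {
            # Query context: these clauses contribute to the relevance score
            "must": [{"match": {"title": "elasticsearch release"}}],
            # Filter context: these clauses only include or exclude documents
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"publish_date": {"gte": "2015-01-01"}}},
            ],
        }
    }
}


def clauses_by_context(body):
    """Group the top-level bool clauses by the context they execute in."""
    bool_query = body["query"]["bool"]
    return {
        "query": bool_query.get("must", []),
        "filter": bool_query.get("filter", []),
    }


if __name__ == "__main__":
    print(json.dumps(merged_body, indent=2))
    contexts = clauses_by_context(merged_body)
    print(len(contexts["query"]), "scoring clause(s),",
          len(contexts["filter"]), "filtering clause(s)")
```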

Configurable store compression

The index.codec setting allows you to choose between LZ4 compression for speed (default) and DEFLATE for reduced index size (best_compression). This is particularly useful for logging, where old indices can be switched to best_compression before being optimized.
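A minimal sketch of the settings fragment involved is shown below. The helper name is hypothetical, and we assume the two accepted values are default and best_compression. How the fragment is applied (at index creation, or to an existing index) is not covered here and should be checked against the documentation.

```python
import json

# Assumed set of accepted values for index.codec.
KNOWN_CODECS = ("default", "best_compression")


def codec_settings(codec="best_compression"):
    """Build a settings fragment selecting the stored-field compression codec.

    "default" favours speed (LZ4); "best_compression" favours index size (DEFLATE).
    """
    if codec not in KNOWN_CODECS:
        raise ValueError("unknown codec: %r" % (codec,))
    return {"index": {"codec": codec}}


if __name__ == "__main__":
    print(json.dumps(codec_settings(), indent=2))
```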

Blog posts about the above topics will follow shortly.

Performance and resilience

This headline list of features seems pretty short to warrant a major new release. The reason is that most of the changes in 2.0 are internal features that are not immediately visible to the user.

The themes of this new major version are performance, stability, solidity, predictability, and ease-of-use:

  • Things should work the way you expect them to work, without surprises.
  • Elasticsearch should provide meaningful feedback if you do something wrong.
  • You shouldn’t need to fiddle with low-level settings when Elasticsearch can make better decisions on your behalf.
  • And above all, your data should be safe.

These goals are by no means fully met, and there are still many improvements to come, but we have already made huge progress, with over 500 new commits in the 2.x branch, including changes such as:

  • Use on-disk doc values by default instead of in-memory fielddata, to reduce heap usage and increase scalability.
  • Reduce heap memory usage during segment merging.
  • Improved compression of norms, previously a big user of heap space.
  • Make writes durable by default by fsyncing the transaction log after every request.
  • All file changes are atomic — no more partially written files.
  • Auto-throttling of merges.
  • Faster phrase and span queries.
  • Compressed bitsets for more efficient filter caching.
  • Cluster state diffs for lighter cluster state updates.
  • Structured readable JSON exceptions.
  • More fine-grained Lucene memory reporting.
  • Bind only to localhost by default, to prevent a dev node joining another cluster unintentionally.
  • Parent/child rewritten to take advantage of optimal query execution.
  • Run with minimal permissions under the Java Security Manager.
  • All core plugins have been moved to the main elasticsearch repository and will be released in sync with each version of Elasticsearch.

Things to know before upgrading

Major version upgrades give us the opportunity to clean out the cruft. As much as possible, we have tried to provide an easy, backwards-compatible upgrade path for each of these changes. However, there are two changes in particular that may require action from you before you are able to upgrade to Elasticsearch 2.0.

The first is to do with field and type mappings. The mapping APIs, today, are too lenient. We rely on users reading up on best practices instead of providing built-in protection. In 2.0, mappings will be stricter and safer, but some changes will not be backward compatible. You can read more about this subject in The Great Mapping Refactoring.

The second change is for our users who have been with us since Elasticsearch 0.20 or before — versions which used Lucene 3.x. Elasticsearch 2.x is based on Lucene 5 and ships with support for reading indices written in Lucene 4.x but not Lucene 3.x.

If you have indices created by Elasticsearch 0.20 or before, you will not be able to start an Elasticsearch 2.x cluster. You will either need to delete these old indices or to upgrade them with the upgrade API in Elasticsearch 1.6.0 or above.

The upgrade API performs two jobs:

  • Rewrites any segments that use an older Lucene format into the latest format.
  • Adds a setting to mark the index as readable by Elasticsearch 2.x.

While it is a good idea to upgrade all segments to the latest version, you can opt to do the minimum work necessary before upgrading, rewriting only the Lucene 3.x segments, by specifying the only_ancient_segments parameter.
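As an illustration, the sketch below builds the HTTP request for such an upgrade call without sending it. It assumes the upgrade API is exposed as a POST to the _upgrade endpoint, optionally scoped to one index, and that only_ancient_segments is passed as a query string parameter. The helper name, the default base URL, and the example index name are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_upgrade_request(base_url="http://localhost:9200", index=None,
                          only_ancient_segments=True):
    """Build (but do not send) a POST request for the upgrade API.

    index=None targets all indices; otherwise only the named index.
    """
    path = "/_upgrade" if index is None else "/%s/_upgrade" % index
    url = base_url.rstrip("/") + path
    if only_ancient_segments:
        # Rewrite only the segments written in the oldest (Lucene 3.x) format
        url += "?" + urlencode({"only_ancient_segments": "true"})
    return Request(url, method="POST")


if __name__ == "__main__":
    request = build_upgrade_request()
    print(request.get_method(), request.full_url)
```

To run the upgrade, the request object can be passed to urllib.request.urlopen against a node running Elasticsearch 1.6.0 or above.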

Elasticsearch Migration Plugin

We have released the Elasticsearch Migration Plugin to help you check whether you need to upgrade your indices, or take any other action, before migrating to Elasticsearch 2.0. Please download and run it on your cluster before upgrading.

First install the plugin:

./bin/plugin -i elastic/elasticsearch-migration

The plugin can be installed on a live cluster; there is no need to restart the node.

Then open the plugin in your browser by following this link:

http://localhost:9200/_plugin/migration 

(Change localhost:9200 to the hostname and port of the node where the plugin is installed.)

If you find any bugs in the Migration plugin, or have suggestions to improve it, please open an issue on the GitHub issue tracker.