Today we are excited to announce the release of Elasticsearch 5.0.0-beta1 based on Lucene 6.2.0. This is the sixth in a series of pre-5.0.0 releases designed to let you test out your application with the features and changes coming in 5.0.0, and to give us feedback about any problems that you encounter.
Open a bug report today and become an Elastic Pioneer.
IMPORTANT: This is a beta release and is intended for testing purposes only. Indices created in this version will not be compatible with Elasticsearch 5.0.0 GA.
DO NOT DEPLOY IN PRODUCTION
- Download Elasticsearch 5.0.0-beta1
- Elasticsearch 5.0.0-beta1 release notes
- Elasticsearch 5.0.0-alpha5 release notes
- Elasticsearch 5.0.0-alpha4 release notes
- Elasticsearch 5.0.0-alpha3 release notes
- Elasticsearch 5.0.0-alpha2 release notes
- Elasticsearch 5.0.0-alpha1 release notes
- Elasticsearch 5.0 breaking changes
Over 300 enhancements and bug fixes have been added since 5.0.0-alpha5 (all of which you can read about in the release notes linked above), but there are three changes in this release that deserve special mention below: huge improvements to indexing performance, switching geo_point fields to Lucene’s LatLonPoint, and making Painless the new default scripting language.
The Elasticsearch Migration Helper is a site plugin designed to help you to prepare for your migration from Elasticsearch 2.3.x/2.4.x to Elasticsearch 5.0. It comes with three tools:
- Cluster Checkup
- Runs a series of checks on your cluster, nodes, and indices and alerts you to any known problems that need to be rectified before upgrading.
- Reindex Helper
- Indices created before v2.0.0 need to be reindexed before they can be used in Elasticsearch 5.x. The reindex helper upgrades old indices at the click of a button.
- Deprecation Logging
- Elasticsearch comes with a deprecation logger which will log a message whenever deprecated functionality is used. This tool enables or disables deprecation logging on your cluster.
Instructions for installing the Elasticsearch Migration Helper.
This release includes a number of changes which have increased indexing performance by 80% in our append-only two-node benchmarks. The first change benefits the append-only use case where document IDs are auto-generated by Elasticsearch. Because we know that a document with the same ID does not already exist, Elasticsearch can skip the version check and add the document directly. We first tried to enable this optimization two years ago, but back then it resulted in duplicate documents being added during shard relocation. The shard relocation and handover process has since evolved enough that we can ensure that duplicate documents are not added.
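To make the append-only shortcut concrete, here is a toy in-memory model in Python. It is only a sketch of the idea described above, not Elasticsearch's engine code: the `ToyShard` class, its `version_lookups` counter, and the use of `uuid4` for ID generation are all assumptions made for illustration.

```python
import uuid


class ToyShard:
    """Toy in-memory shard (hypothetical; not Elasticsearch's real engine)."""

    def __init__(self):
        self.docs = {}            # doc ID -> (version, source)
        self.version_lookups = 0  # how many times we paid for an ID lookup

    def index(self, source, doc_id=None):
        if doc_id is None:
            # Auto-generated ID: no document with this ID can exist yet,
            # so skip the version check and add the document directly.
            doc_id = uuid.uuid4().hex
            self.docs[doc_id] = (1, source)
            return doc_id, 1
        # Caller-supplied ID: the current version must be looked up first.
        self.version_lookups += 1
        current_version = self.docs.get(doc_id, (0, None))[0]
        self.docs[doc_id] = (current_version + 1, source)
        return doc_id, current_version + 1


shard = ToyShard()
auto_id, auto_version = shard.index({"message": "log line"})
_, v1 = shard.index({"user": "kim"}, doc_id="user-1")
_, v2 = shard.index({"user": "kim", "age": 30}, doc_id="user-1")
print(shard.version_lookups)
```

Only the two writes with a caller-supplied ID pay for a lookup; the write with an auto-generated ID goes straight in.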
In 2.0, we added the guarantee that the transaction log would be fsync’ed to disk before a write is acknowledged to the user. The fsync was a synchronous call which effectively blocked indexing progress until the call returned. This release changes the fsync call to be asynchronous so that indexing and document replication can continue during fsync, yet it maintains the same guarantees as before. This is a big win for users with spinning disks for whom fsync is a slow operation.
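One way to picture this change is a log whose writers receive an acknowledgement handle immediately, while the handle is only completed once an fsync has covered their write. The Python sketch below is a simplified, single-threaded model under that assumption. `ToyTranslog` and its methods are hypothetical names, and in a real system `sync()` would run on a separate thread while indexing continues.

```python
import os
import tempfile
from concurrent.futures import Future


class ToyTranslog:
    """Toy append-only log (hypothetical; not the Elasticsearch translog)."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self._pending = []  # acknowledgements for writes not yet fsynced

    def append(self, record: bytes) -> Future:
        # Write the record and return at once; the caller is NOT blocked
        # on the disk, but the write is not acknowledged yet either.
        self._file.write(record + b"\n")
        ack = Future()
        self._pending.append(ack)
        return ack

    def sync(self) -> int:
        # One fsync covers every write appended so far; only afterwards
        # are those writes acknowledged, so durability is preserved.
        waiting, self._pending = self._pending, []
        self._file.flush()
        os.fsync(self._file.fileno())
        for ack in waiting:
            ack.set_result(True)
        return len(waiting)

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    log = ToyTranslog(os.path.join(tmp, "translog.bin"))
    acks = [log.append(f"doc-{i}".encode()) for i in range(3)]
    acked_before_sync = [ack.done() for ack in acks]
    synced = log.sync()
    acked_after_sync = [ack.done() for ack in acks]
    log.close()
```

In this model the three appends proceed without waiting on the disk, and a single fsync then acknowledges all of them together.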
Search in Elasticsearch is near real-time, meaning that a new segment must be written before the documents it contains become visible to search. Real-time GET (retrieving a document by ID) was implemented by maintaining an in-memory list of the documents that have been written to the transaction log but not yet written to a segment, along with their offsets in the translog. This added a lot of overhead and complexity for a relatively infrequent use case, since most of the documents you GET are already in a segment. Instead, we now maintain just a list of document IDs without translog offsets. If a recently written document is requested, Elasticsearch performs a refresh and returns the document from the new Lucene segment. Removing the offsets from memory frees up more space in the indexing buffer and greatly reduces the amount of young garbage that has to be collected. This does mean that frequent updates to the same document (e.g. a counter, which is not a recommended use of Elasticsearch) will be slower.
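The new real-time GET behaviour can be modelled in a few lines of Python. This is a toy sketch under simplifying assumptions, not the actual engine: `ToyEngine`, its dictionaries, and the `refresh_count` counter are hypothetical names used only to show when a GET triggers a refresh.

```python
class ToyEngine:
    """Toy model of refresh-on-GET (hypothetical; not Elasticsearch code)."""

    def __init__(self):
        self.segments = {}            # documents visible to search and GET
        self.buffer = {}              # indexed but not yet refreshed
        self.unrefreshed_ids = set()  # IDs only, no translog offsets
        self.refresh_count = 0

    def index(self, doc_id, source):
        self.buffer[doc_id] = source
        self.unrefreshed_ids.add(doc_id)

    def refresh(self):
        # Make everything in the buffer visible as a "new segment".
        self.segments.update(self.buffer)
        self.buffer.clear()
        self.unrefreshed_ids.clear()
        self.refresh_count += 1

    def get(self, doc_id):
        # Refresh only if this document was written since the last refresh.
        if doc_id in self.unrefreshed_ids:
            self.refresh()
        return self.segments.get(doc_id)


engine = ToyEngine()
engine.index("1", {"count": 1})
engine.refresh()                   # regular periodic refresh
first = engine.get("1")            # already in a segment: no extra refresh
refreshes_after_first_get = engine.refresh_count
engine.index("1", {"count": 2})
second = engine.get("1")           # recently written: forces a refresh
```

The second GET is the expensive case, which is why a tight loop of update-then-read on the same document becomes slower under this design.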
Elasticsearch 2.3 already brought significant improvements to geo-point search. In this release, the implementation of geo_point fields has been switched from GeoPoint to LatLonPoint with doc values. This change uses a bit more disk space but doubles the speed of geo-distance queries, as can be seen in Lucene’s geo benchmarks.
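For readers who want to try the new implementation, the sketch below builds a geo_point mapping and a geo-distance query as Python dictionaries and prints them as JSON request bodies. The type name `place`, the field name `location`, the coordinates, and the distance are illustrative assumptions; the bodies would be sent to an index of your choosing.

```python
import json

# Mapping body for index creation; "place" and "location" are
# hypothetical names chosen for this example.
mapping = {
    "mappings": {
        "place": {
            "properties": {
                "location": {"type": "geo_point"}
            }
        }
    }
}

# Search body: match documents within 10km of the given point.
query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "10km",
                    "location": {"lat": 52.37, "lon": 4.89},
                }
            }
        }
    }
}

mapping_json = json.dumps(mapping, indent=2)
query_json = json.dumps(query, indent=2)
print(mapping_json)
print(query_json)
```

Nothing in the request syntax changes with the switch to LatLonPoint; the difference is in how the field is indexed and searched internally.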
Painless is now the default scripting language. The legacy default is set by script.legacy.default_lang, which defaults to Groovy.
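As an illustration, a request that names Painless explicitly might carry a body like the one built below. This is a sketch rather than a canonical example: the `views` field and the `count` parameter are hypothetical, and the `inline` key reflects the 5.0 request format as we understand it, so check the scripting documentation for your exact version.

```python
import json

# Body for an update-by-script request. "views" and "count" are
# hypothetical names; "inline" is assumed to be the 5.0 key for the
# script source.
update_body = {
    "script": {
        "lang": "painless",
        "inline": "ctx._source.views += params.count",
        "params": {"count": 1},
    }
}

update_json = json.dumps(update_body, indent=2)
print(update_json)
```

Passing values through `params` instead of concatenating them into the script text keeps the script source constant across requests.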
- Elasticsearch now uses Log4j2 for logging, which exposes new log management options.
- Deprecation logging is now enabled by default, as the logs are limited by size.
- The update-aliases action now supports deleting an index and adding an alias as a single step, allowing an existing index to be replaced with a newer index+alias atomically.
- Min and max heap sizes now default to 2GB.
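The atomic index replacement mentioned in the list above can be sketched as a single update-aliases request body. The index name `logs` and the replacement index `logs_v2` are hypothetical; the snippet only builds and prints the JSON that would be posted to the `_aliases` endpoint.

```python
import json

# One request, two actions: point the alias "logs" at the new index and
# delete the old index that used to carry that name. Both names are
# hypothetical.
alias_actions = {
    "actions": [
        {"add": {"index": "logs_v2", "alias": "logs"}},
        {"remove_index": {"index": "logs"}},
    ]
}

alias_json = json.dumps(alias_actions, indent=2)
print(alias_json)
```

Because both actions travel in one request, clients querying `logs` should never observe a moment where the name resolves to nothing.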
Also take a look at the release announcements for Elasticsearch 5.0.0-alpha5, Elasticsearch 5.0.0-alpha4, Elasticsearch 5.0.0-alpha3, Elasticsearch 5.0.0-alpha2, and Elasticsearch 5.0.0-alpha1 to read about features like:
- Sysadmin-friendly index creation
- Java REST client
- Rollover indexing
- Wait for refresh
- Ingest Node
- Painless Scripting
- Instant Aggregations
- Text/Keyword fields replacing String
- Completion Suggester v2
- Settings Validation
- Safety in Production
- Resiliency Improvements
- Percolate Query
- Deleted Index Tombstones
- Dots in field names
- Cluster allocation explain API