UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.
Upgrading a system is one thing. Usually it involves taking a backup, doing the upgrade and then verifying everything is OK. Upgrading a system that your software depends on can be quite a different experience. In particular when the task is long overdue.
I have written this guide to help you get an overview of which changes to expect (to your system) when getting ready for a new Elasticsearch version. I cannot guarantee it’s entirely complete, for that you will have to consult the release notes, but it should provide a good start for most scenarios. As always, no amount of planning can make up for not testing before going into production.
At the time of this writing, there are 109 different versions of Elasticsearch published. This might seem daunting, but a lot of them are compatible with one another. Elasticsearch uses a versioning scheme with three numbers, A.B.C. A denotes the major version, B denotes the feature version of that major version and finally C denotes the bug fix version of the feature version.
The idea behind this versioning scheme is that upgrading to the latest bug fix version of a given feature version should be safe for most users. A breaking change in a bug fix release usually will have a straightforward workaround. Also, within one major version, Elasticsearch does its best to ensure that different versions are capable of communicating, making it possible to upgrade without stopping the cluster.
It might seem pedantic to say this, but people work at different levels of abstraction, and most people who use old Elasticsearch versions have had to prioritize other things for a while.
Most people connect to Elasticsearch over HTTP - or the REST interface, as it is often referred to - but if your client is written in Java or another language running on the Java Virtual Machine, you may very well use the transport protocol. There are also people connecting with Thrift or even Memcached.
One of the big advantages of using HTTP is the technology independence of the protocol. What this means when upgrading to a newer Elasticsearch version is that your old client will still be able to connect to the new Elasticsearch version. That said, there might still be differences in URLs, the query DSL and other features requiring changes to your client, but you’re safe to move on to the next section.
If you’re using the Java transport, you should upgrade the client library to match the new Elasticsearch version. Even if it’s just a minor bug fix, some of the bugs could be on the client side. For larger version upgrades there is also a risk that the old client will refuse to connect to a newer cluster.
Most of the time, upgrading the client is as simple as changing a jar dependency, but sometimes refactoring will be required. Usually the refactoring is really straightforward, like updating package names in your imports.
For other connection methods like Thrift and Memcached you will have to consult the documentation of the plugin enabling that connection.
After making sure your client is able to connect to the new Elasticsearch version, it is time to verify that your queries are compatible. Dig through the source of your client and create an example query in DSL syntax for every query type you have. If your client already implements the queries with the DSL, this will mostly be cut and paste. The transport client, on the other hand, uses a query builder whose syntax is rather different from the query DSL. The solution is to call the toString() method, as in the example below:
System.out.println(org.elasticsearch.index.query.QueryBuilders.matchQuery("myField", "Hello SearchString!").toString());
Having these example queries is beneficial for two reasons: first, it provides good documentation for the features in Elasticsearch that you rely on and second, it makes it easy to test them against the new version.
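Running the snippet above prints the query in DSL syntax; for this match query the output looks something like the following (the exact structure can vary between versions):

```json
{
  "match" : {
    "myField" : {
      "query" : "Hello SearchString!",
      "type" : "boolean"
    }
  }
}
```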
Compile a list of installed plugins in your cluster and check if they also require upgrading. If that is the case, then you should also check their release notes for breaking changes.
bin/plugin.sh --list
As can be seen in the list below, there were a great many breaking changes in the 0.90 branch. Pre-0.90 was probably no better, but that is out of scope for this article. The good news is that starting with version 1.0, breaking changes are mostly reserved for the larger releases.
This list is based on the release notes included with each release, but to keep things shorter, I’ve tried my best to only include the issues that might require changes to your application. Potential breaking changes in the operational aspect, like changes to the default shard allocation algorithm, have been left out. In other words, use this list to plan the upgrade of your client to a new Elasticsearch version. Don’t use it as an excuse for not testing your entire solution with the new version before rolling out into production.
In the 0.90 branch, breaking changes big and small are spread across the versions and there is little notion of stability between bug fixes, even if the really big changes were postponed until 1.0.
- `minimum_should_match` applied to wrong query in
- MatchQueryParser doesn’t allow field boosting on query when included in a _GET request #3024
- Make GetField behavior more consistent for multivalued fields. #3015
- Add a `minimum_should_match` parameter when common terms query has only high frequent terms #3188
- Java Client: Renamed
- Java API: Remove RestActions#splitXXX(String) methods #3680
- Flush API: Removed the refresh flag #3689
- Optimize API: Removed the refresh flag #3690
- Handling of the `_parent` field: rejecting documents without the parent field set, as well as prohibiting adding a parent mapping at runtime #3849
- Reject indexing requests which specify a parent, if no parent type is defined #3848
- Completion Suggester: Reject non-integer weights on indexing to prevent rounding #3977
- Remove Index Reader warmer introduced in 0.90.6 as it is not a good default behavior for all use cases. This will be reimplemented as an opt in feature. #4078 & #4079
- Java client: `ElasticSearchIllegalArgumentException` on similar errors #4199
- Stats/Infos API: JvmStats now have standard names for gc and memory pools #4661
- Cluster Stats API: Expose min/max file descriptors #4681
Starting with the 1.0 branch Elasticsearch had a new emphasis on stability. All known breaking changes are introduced in the 1.0.0 release.
- Stats and info APIs have been changed to be more RESTful; an important change for monitoring and operations, but probably not relevant for your client.
- The indices API has been cleaned up; an important change if your client creates new indexes or makes changes to mappings or warmers.
- Wrapping documents in an object to specify type is disabled by default.
- Count, delete-by-query and validate-query requests require the query to be wrapped in a `query` parameter, just like the search request. Check if you use any of these requests.
- The `filter` parameter in search requests has been renamed to `post_filter`. The old version still works, but it was changed for good reasons. See Optimizing Elasticsearch Searches for more info.
- The `multi_field` mapping type has been replaced by a `fields` parameter on other types, as explained in the docs.
- The `pattern` analyzers have been changed to use an empty stopwords list by default. (Previously English stopwords.)
- All dates without years use 1970 as default.
- The default unit in geo queries has been changed to meters (previously miles).
- The `edit_distance` parameters are replaced by the single `fuzziness` parameter.
- The `ignore_missing` parameter has been replaced by several new parameters.
- Deleting an index requires a name or a pattern.
- The return value `ok` is removed.
- Return values `exists` are all changed to `found`.
- Field values, in response to the fields parameter, are now always returned as arrays.
- The `fields` parameter no longer supports the `_source.field` syntax; use source filtering instead.
- The `text` query is replaced by the `match` query.
- The `field` query is replaced by the `query_string` query.
- The function score query replaces the `_boost` field for document boost.
- The `copy_to` parameter replaces the `path` parameter in mappings.
- Percolation no longer uses a dedicated `_percolator` index; read more here if you use percolation.
- Query/Get/Update APIs: Allow to control where single fields should be extracted from (source, stored fields or fielddata) #4492
- Query API: Removed
- NodesInfo API: Using plugins instead of singular plugin #5072
- Mapping API: Binary fields are no longer stored by default, because their data is already available in the `_source`.
- Aggregations: aggregation names can now only contain alpha-numeric, hyphen (“-”) and underscore (“_”) characters, due to the enhancement which allows sub-aggregation sorting #5253
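To illustrate two of the changes above, here is roughly what a 1.0-style search request body looks like, with the query wrapped in the `query` parameter and a `post_filter` instead of the old `filter` (the field names are made up for illustration):

```json
{
  "query" : {
    "match" : { "title" : "elasticsearch" }
  },
  "post_filter" : {
    "term" : { "status" : "published" }
  }
}
```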
No known breaking changes for the fix versions up to and including version 1.0.3.
No known breaking changes up to and including bug fix release 1.2.4.
- If using the Java transport, your client will also have to run on a Java 7 compatible JVM #5421
- Scripting: Disable dynamic scripting by default #5943
- Snapshot/Restore API: Added `PARTIAL` snapshot status #5792
- Gateways: Removed deprecated gateway functionality (in favor of snapshot/restore) #5520
- Versioning: Version types `EXTERNAL_GTE` test for version equality in read operation & disallow them in the Update API #5929
- Versioning: A Get request with a version set always validates for equality #5663
- Versioning: Calling the Update API using the `EXTERNAL_GTE` version type throws a validation error #5661
- Aggregations: Changed response structure of percentile aggregations #5870
- Cluster State API: Remove index template filtering #4954
- Nodes Stats API: Add human readable JVM start_time and process refresh_interval #5280
- Java API: Unified IndicesOptions constants to explain intentions #6068
The 1.3 branch mostly follows a similar pattern, but there is an exception in the 1.3.3 release. There is a good reason for the exception though. The change removes an unintended feature that allowed users to specify experimental postings formats, with the risk of data loss on a future upgrade. There probably aren’t many people who have used this feature, as it was not documented, but those who did need to reindex their data.
- Analysis: Improvements to `StemmerTokenFilter`; stemmers named `dutch_kp` are affected. #6452
- Thread pool rejection status code is changed from 503 to 429. This might be relevant if your client uses circuit breakers #6629 #6627.
- `action.wait_on_mapping_change` has been removed #6648
- Security: Disable JSONP by default #6795
Other than 1.3.3 there are no known breaking changes up to and including version 1.3.5.
The 1.4.0 release is one of the biggest releases, but still not close to as big a change as 1.0.
- Percolation queries can only refer to fields that already exist in the mappings.
- Aliases with filters can only refer to fields that exist in the mappings.
- Read operations are still served by default, even if there is no master. Writes are still not allowed. Halting read operations when there is no master can be enabled in the configuration.
- The MVEL scripting language has been replaced by Groovy and is only available as a plugin.
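If you want reads to halt when the cluster has no master, the configuration change mentioned above looks roughly like this (assuming the `discovery.zen.no_master_block` setting available in 1.4):

```yaml
# elasticsearch.yml
# "write" rejects only write operations when no master is elected;
# "all" also rejects read operations.
discovery.zen.no_master_block: all
```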
This list might seem long if you’re planning to upgrade across many versions, but chances are that a lot of the issues will not affect your client, and in many cases it’s not hard to make your client handle both your current version and the new one. Having a staging or test cluster is the way to go in order to play it safe.
If, on the other hand, you are upgrading to a version incompatible with the previous one and you need to avoid downtime during the actual upgrade of the Elasticsearch cluster, I recommend this article. The approach is to start by encapsulating access to Elasticsearch in your system behind an interface common to both the implementation targeting the old version and the implementation for the new version. The trick is then to make the Elasticsearch cluster, and the version of that cluster, a runtime configuration. In many ways this is the silver bullet of upgrade strategies, both in terms of capabilities and implementation cost.
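The pattern described above can be sketched like this. All class and configuration names here are illustrative, not taken from any real client library; the real implementations would wrap the old and new Elasticsearch clients respectively.

```java
import java.util.Map;

// Common interface both implementations share.
interface SearchGateway {
    String search(String queryDsl);
}

// Implementation targeting the old cluster/version.
class LegacySearchGateway implements SearchGateway {
    public String search(String queryDsl) {
        // Would delegate to the 0.90.x client here.
        return "legacy:" + queryDsl;
    }
}

// Implementation targeting the new cluster/version.
class CurrentSearchGateway implements SearchGateway {
    public String search(String queryDsl) {
        // Would delegate to the 1.x client here.
        return "current:" + queryDsl;
    }
}

class GatewayFactory {
    // Select the implementation from runtime configuration,
    // e.g. a properties map loaded at startup.
    public static SearchGateway fromConfig(Map<String, String> config) {
        if ("1.x".equals(config.get("es.version"))) {
            return new CurrentSearchGateway();
        }
        return new LegacySearchGateway();
    }
}
```

Flipping `es.version` in the configuration then switches the whole system between clusters without a redeploy.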