Upgrading the Elastic Stack: Planning for success

"Upgrade" can be a four-letter word for admins, so at Elastic, we try to make the upgrade process as simple as possible. Why? Because we pack a ton of goodness into each release, but you can only take advantage of that goodness by being on the latest version of the Elastic Stack. This is also why we make the latest version available on Elastic Cloud the same day that we release.

Why upgrade?

Upgrades are important for several reasons, like features, fixes, and enhancements. Every new version (both major and minor) of the Elastic Stack offers users a chance to get more from their deployment. As an example, here are some of the reasons why you'd want to upgrade to Elastic Stack 7.x from 6.x.

Performance improvements

Version 7.0 brought several performance improvements: faster top-k retrieval, new ranking methods, adaptive replica selection, fewer round trips in cross-cluster search, and more. The list keeps growing with every minor release (7.8 is the latest at the time of writing).

If you'd like to learn more about how we test our performance improvements, start with Rally, the benchmarking tool we use for Elasticsearch and Lucene. The Rally announcement blog is a good primer on how the benchmarks are run, which datasets are used, and how you can run Rally yourself.

New features

The Elastic Stack 7.0 release blog also describes the new features in that release. Blog posts on elastic.co are organized into categories, and the release category is the best place to track new and updated features.

Keeping up with bugs, security fixes, and EOL

Bugs happen. Our huge community of users helps us find them, and we fix them. Keeping up to date means getting those fixes.

Staying up to date also means you have the latest fixes for any security issues. You can report new security issues and review existing ones on our security issues page, where you can also subscribe to an announcement RSS feed. For Elastic Cloud users, Elastic applies security patches as they become available.

Finally, keeping up with the latest version means never having to worry about running an unsupported version. The Elastic product end of life dates page details the maintenance and support policies, platform support, and support SLAs. Review these policies carefully to make sure your production environment stays supported.

Things to think about while planning an upgrade

Planning and testing are two of the most important things to consider when you are deploying any software. Upgrading the version of the Elastic Stack is no different.

The upgrade process is detailed in the documentation; the sections below add suggestions from our engineering, services, support, and customer success teams on how to approach it.

Security

Security is always being improved, so it's important to know what security you currently have implemented, as well as what changes to expect with a new version.

Sticking with a 7.x upgrade as the example: Elasticsearch versions prior to 6.8 did not require TLS when using Elastic Stack security. So if you are running version 6.7 or earlier with Elastic Stack security, you need to enable TLS before upgrading. That may mean involving your company's certificate management team, if you have one. If you are planning this upgrade, these resources will help you get ready:

  1. Getting started with Elasticsearch security (technical blog)
  2. Configuring SSL, TLS, and HTTPS to secure Elasticsearch, Kibana, Beats, and Logstash (technical blog)
  3. Fundamentals of Securing Elasticsearch (free training)
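Before upgrading, it can help to check your node configuration programmatically. The Python sketch below assumes you have already parsed each node's elasticsearch.yml into a flat dictionary of dotted keys (the parsing step is not shown). It flags the transport TLS setting that a 7.x cluster typically insists on when security is enabled, and also flags HTTP-layer TLS as advisable; that "required"/"recommended" split is our own illustration, not an official classification.

```python
from typing import List, Mapping, Tuple, Union

Setting = Union[bool, str]


def _is_true(value: Setting) -> bool:
    # elasticsearch.yml values may arrive as YAML booleans or as strings.
    return str(value).strip().lower() == "true"


def tls_gaps(settings: Mapping[str, Setting]) -> List[Tuple[str, str]]:
    """Return (setting, severity) pairs to review before moving to 7.x.

    Assumes `settings` is a flat dict of dotted elasticsearch.yml keys.
    A missing key is treated as disabled, which may not match the default
    for your version and license.
    """
    if not _is_true(settings.get("xpack.security.enabled", False)):
        return []  # security is off, so the TLS requirement does not apply
    gaps: List[Tuple[str, str]] = []
    # Internode (transport) TLS: the part that must be in place first.
    if not _is_true(settings.get("xpack.security.transport.ssl.enabled", False)):
        gaps.append(("xpack.security.transport.ssl.enabled", "required"))
    # HTTP-layer TLS protects clients such as Kibana and Beats (our own "advisable" label).
    if not _is_true(settings.get("xpack.security.http.ssl.enabled", False)):
        gaps.append(("xpack.security.http.ssl.enabled", "recommended"))
    return gaps
```

Run it against every node's settings, not just one: a single node without transport TLS is enough to block the cluster.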

Inventory

List all of your system's integrations (both inputs and outputs) so that you can confirm you have the right versions of client and integration libraries, and so that you can act if any of those integrations also need to be upgraded. As an example, a logging system may have integrations similar to these:

  1. Syslog feeds
  2. Log files
  3. Container logs
  4. Asset details
  5. A script providing hostname enrichment data
  6. Ticketing data
  7. Outbound PagerDuty alerts

For each of the above, you should identify the integration method, fields used, breaking changes, upgrades needed, and verification steps. Be sure to include any intermediary connections, such as message queues.
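One lightweight way to keep this inventory is as structured records rather than a free-form document. The sketch below is a hypothetical Python layout: the Integration class and its field names simply mirror the items listed above and are not part of any Elastic tool. The incomplete helper reports entries that still have blanks, so nothing slips through before testing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Integration:
    """One row of an upgrade inventory (hypothetical field names)."""
    name: str
    method: str = ""                                   # how data flows in or out
    fields_used: List[str] = field(default_factory=list)
    breaking_changes: str = ""                         # notes from the breaking changes docs
    upgrades_needed: str = ""                          # e.g. a client library version bump
    verification: str = ""                             # how you will confirm it still works
    via: List[str] = field(default_factory=list)       # intermediaries such as message queues


def incomplete(inventory: List[Integration]) -> List[str]:
    """Names of integrations whose required inventory fields are still blank.

    `via` is optional, since not every integration has an intermediary.
    """
    missing: List[str] = []
    for item in inventory:
        if not (item.method and item.fields_used and item.breaking_changes
                and item.upgrades_needed and item.verification):
            missing.append(item.name)
    return missing
```

Writing "none" explicitly (for example, upgrades_needed="none") keeps a deliberate answer distinct from a missing one.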

Test plan

Run your tests in a development environment first, so that you know you are ready for production and have already practiced the steps. When you write your test plan, take the following into consideration:

  1. Enrichment information
  2. Machine learning jobs
  3. Inbound sample data
  4. Live data
  5. Performance
  6. Outbound integrations
  7. Dashboards
  8. Alerts

Breaking changes

New versions can contain breaking changes that may impact you. We don't want you to be caught off guard, so the breaking changes are always published in the documentation. The docs will remind you to check the breaking changes for all of the products that you are using — this is very important. Refer back to your inventory list and make sure that you cover everything in that list.

This note appears in the documentation, but is worth repeating here: Make sure you check the breaking changes for each point release up to the desired new version.
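Working out which releases' breaking changes to read is simple but easy to get wrong when you cross a major version. The sketch below takes a list of release versions that you supply yourself (for example, copied from the release notes index; nothing is fetched here) and returns every release after your current version, up to and including the target.

```python
from typing import Iterable, List, Tuple


def _parse(version: str) -> Tuple[int, ...]:
    # "6.8" -> (6, 8, 0), so "7.0" and "7.0.0" compare as equal.
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def releases_to_review(current: str, target: str, releases: Iterable[str]) -> List[str]:
    """Releases after `current`, up to and including `target`, oldest first."""
    low, high = _parse(current), _parse(target)
    chosen = [r for r in releases if low < _parse(r) <= high]
    return sorted(chosen, key=_parse)
```

For a move from 6.7 to 7.8, given the full list of minors, the result runs from 6.8 through every 7.x minor up to 7.8.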

Deprecation

In addition to dealing with breaking changes, you should look at your deprecation logs to make the next upgrade easier. Details on enabling and using the deprecation logs are in the documentation.
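Alongside the deprecation logs, Elasticsearch 6.x and 7.x also provide a deprecation info API, GET /_migration/deprecations, which reports cluster, node, and index settings that need attention before the next major version. The sketch below fetches that report and keeps only the issues marked critical. It assumes an unsecured endpoint reachable over plain HTTP; add authentication and TLS handling to match your cluster.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_deprecations(base_url: str) -> Dict[str, Any]:
    """GET /_migration/deprecations (no auth handling; add your own)."""
    url = f"{base_url.rstrip('/')}/_migration/deprecations"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def critical_issues(report: Dict[str, Any]) -> List[str]:
    """Flatten a deprecation info response into 'scope: message' strings,
    keeping only issues whose level is 'critical'."""
    found: List[str] = []
    for scope, issues in report.items():
        # index_settings maps index name -> list of issues; other scopes are lists.
        if isinstance(issues, dict):
            pairs = [(f"{scope}[{name}]", item)
                     for name, items in issues.items() for item in items]
        else:
            pairs = [(scope, item) for item in issues]
        for where, item in pairs:
            if item.get("level") == "critical":
                found.append(f"{where}: {item.get('message', '')}")
    return found
```

Usage might look like critical_issues(fetch_deprecations("http://localhost:9200")), where the URL is a placeholder for your own cluster.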

Enable monitoring

Monitoring is important at all times for the sake of cluster health, but monitoring data is also very important for planning the upgrade. Before the upgrade, it helps you avoid running out of resources; afterward, it helps you detect issues. For more information, see the monitoring documentation for your current version. Keep in mind that a rolling upgrade takes one node at a time out of the cluster, so the remaining nodes need enough headroom to absorb its load.
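The headroom point can be made concrete with a back-of-the-envelope estimate. If your n nodes are roughly equal in size and each carries some load (a fraction of capacity taken from monitoring), then with one node out the remaining n - 1 nodes carry about the sum of those loads divided by (n - 1). The sketch below assumes load redistributes evenly, which real shard placement only approximates, and uses a hypothetical 75% ceiling that you should replace with your own threshold.

```python
from typing import Sequence


def load_with_one_node_out(node_loads: Sequence[float]) -> float:
    """Estimated per-node load while one node is offline.

    node_loads: current load of each node as a fraction of its capacity
    (for example, CPU utilization from monitoring). Assumes equally sized
    nodes and that the missing node's share spreads evenly over the rest.
    """
    if len(node_loads) < 2:
        raise ValueError("a rolling upgrade needs at least two nodes")
    return sum(node_loads) / (len(node_loads) - 1)


def has_headroom(node_loads: Sequence[float], ceiling: float = 0.75) -> bool:
    """True if the estimate stays at or under `ceiling` (hypothetical default)."""
    return load_with_one_node_out(node_loads) <= ceiling
```

For example, three nodes each at 50% leave the two remaining nodes at about 75% during each restart.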

Health of the existing system

Evaluate an unhealthy system carefully before upgrading. If the system is already at the limit of its resources (CPU, memory, disk, and so on), address that first, because resource demand temporarily increases during the upgrade. The same goes for any issues you may have been putting off: a moderate issue can become a showstopper in the middle of an upgrade.
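A quick pre-upgrade gate is the cluster health API, GET /_cluster/health. The sketch below takes its parsed JSON response and lists anything worth resolving before you begin: a status other than green, or shards that are unassigned, relocating, or initializing. Treating all of these as blockers is a conservative choice on our part, not an official rule, and CPU, memory, and disk pressure still need to be checked in your monitoring data.

```python
from typing import Any, Dict, List

# Shard counters from the cluster health response that should be zero
# before you take the first node out of the cluster.
SHARD_COUNTERS = ("unassigned_shards", "relocating_shards", "initializing_shards")


def health_blockers(health: Dict[str, Any]) -> List[str]:
    """List reasons the cluster does not look ready for a rolling upgrade."""
    problems: List[str] = []
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')}")
    for key in SHARD_COUNTERS:
        count = health.get(key, 0)
        if count:
            problems.append(f"{key}={count}")
    return problems
```

An empty list means the health response raises no objections; it does not by itself mean the cluster has enough spare capacity.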

Implementation options

Do it yourself

After creating your inventory and test plan, reviewing the breaking changes and your deprecation logs, enabling monitoring, and checking the health of your existing system, you can follow along with the upgrade documentation and video. Our community is also very helpful. When you start a conversation at https://discuss.elastic.co, you are connecting with Elastic engineers, other customers, and thousands of people just like you. Ask questions, get advice, and give advice.

Plan your implementation with the experts

Elastic Consulting offers service packages to assist you in both the planning and implementation of your upgrade. With these packages, you have access to Elastic experts who can help you make sure the upgrade process goes smoothly. To learn more about these packages, see the Migration package or contact Elastic Consulting directly.

In conclusion ... upgrading is important!

You really do want performance improvements, new features, and bug and security fixes. Planning your upgrade is also important. Your data is valuable, so take your time and do it right the first time.

Finally, YOU CAN DO IT! It's totally possible to upgrade on your own. But if you need help, we're here for you, whether it's our team of experts or our community of thousands of people just like you.

If you want to learn more best practices for upgrading, be sure to watch our Expert tips for upgrading the ELK Stack webinar.