
How to upgrade Elastic App Search

We highly recommend that all App Search users keep their deployments up to date with the latest available version to have access to new features, security updates, and performance improvements. This guide is designed to help customers through the upgrade process, to minimize the impact of an upgrade on production environments, and to ensure data safety during an upgrade. Finally, the guide helps App Search users troubleshoot any issues that may occur during an upgrade.

Pssst... Just looking for instructions to upgrade from 7.5 to 7.6? Read the guide!

Before you begin

Before you upgrade an App Search cluster to a newer version, take a few steps to keep your data safe and increase the chance of a successful upgrade:

  1. Before you upgrade production servers, test the upgrades in a development environment to familiarize yourself with the process.
    • Using an Elasticsearch snapshot created from a production deployment may be the best option to completely test the upgrade process without risking your service availability or data consistency.
  2. Stop writing to your Elasticsearch cluster:
    • If you are running App Search 7.6 or later, enable read-only mode to guarantee a consistent snapshot of your data.
    • For versions before 7.6, manually stop all write/indexing operations to your cluster at the source.
  3. Back up your data with Elasticsearch snapshots. To roll back to an earlier version of App Search, you must have a backup of your data stored in Elasticsearch.
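The backup in step 3 comes down to two Elasticsearch snapshot API calls: registering a snapshot repository and creating a snapshot. The Python sketch below only builds those requests as data and renders them as curl commands, so you can review them before running anything. The repository name, filesystem path, snapshot name, and the `.app-search-*` index pattern are assumptions for illustration; check your actual App Search index names with `GET _cat/indices` before relying on the pattern. On Elastic Cloud a snapshot repository is usually already configured, so only the second call would apply.

```python
import json
from typing import List, Optional, Tuple

# One Elasticsearch REST call: (HTTP method, path, JSON body or None).
Request = Tuple[str, str, Optional[dict]]


def backup_requests(repo: str, location: str, snapshot: str,
                    indices: str = ".app-search-*") -> List[Request]:
    """Build the calls that register a filesystem repository and take a snapshot."""
    return [
        # An "fs" repository only works if `location` is listed in path.repo on every node.
        ("PUT", f"_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location}}),
        # wait_for_completion=true makes the request block until the snapshot finishes.
        # The index pattern is an assumption; verify it against GET _cat/indices.
        ("PUT", f"_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": indices, "include_global_state": False}),
    ]


def to_curl(req: Request, host: str = "http://localhost:9200") -> str:
    """Render a request as a curl command so it can be reviewed before it is run."""
    method, path, body = req
    cmd = f"curl -X {method} '{host}/{path}'"
    if body is not None:
        cmd += f" -H 'Content-Type: application/json' -d '{json.dumps(body)}'"
    return cmd


if __name__ == "__main__":
    for req in backup_requests("app_search_backup", "/mnt/es-backups", "pre-upgrade"):
        print(to_curl(req))
```

Limiting the snapshot to the App Search indexes keeps a later restore simple; if you prefer a full-cluster backup, drop the `indices` field.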

Upgrade process overview

How you upgrade App Search depends on several factors:

  • The mode of deployment, either self-managed or Elastic Cloud
  • Your ability to create Elasticsearch snapshots and recover from them
  • Available resources: can you spin up a new Elasticsearch cluster and/or new App Search instances while the old infrastructure is running?

The sections below describe several upgrade paths. First, here is an overview of how the App Search upgrade process works, to help you choose between them:

  • App Search uses Elasticsearch as its only data store. The Elasticsearch cluster is used for control layer data (engines, settings, etc.), documents, search indexes, API logs, and analytics events.
  • Every time a new App Search instance starts, it checks every index in the Elasticsearch cluster to see whether it needs to be upgraded to a newer version. If an index's structure has changed between versions, App Search reindexes the data stored in that index.
  • All upgrades are performed automatically and nondestructively: existing indexes are never deleted, and the upgrade only creates new indexes containing a copy of the data.
  • If there are multiple App Search instances starting against a single Elasticsearch cluster, they will use a distributed locking mechanism to coordinate the upgrade process and ensure it is done in a consistent way. We create an installation lock record in Elasticsearch to do this.
  • Until the upgrade process is complete, the new version of App Search will not begin to serve API requests.
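To make the coordination rules above concrete, here is a toy, in-memory model in Python. It is not App Search's implementation: `ToyCluster`, `run_upgrade`, and the `-vN` index naming are hypothetical. It only illustrates the properties described above: a single lock holder performs the migration, outdated indexes are copied rather than modified, and the original indexes are left in place.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for the Elasticsearch cluster behind App Search."""
    indexes: Dict[str, List[dict]] = field(default_factory=dict)  # physical index -> documents
    active: Dict[str, int] = field(default_factory=dict)          # logical index -> schema version in use
    lock_owner: Optional[str] = None                              # models the installation lock record

    def try_lock(self, owner: str) -> bool:
        # Create-if-absent: the first instance to write the lock record wins.
        if self.lock_owner is None:
            self.lock_owner = owner
            return True
        return self.lock_owner == owner

    def unlock(self, owner: str) -> None:
        if self.lock_owner == owner:
            self.lock_owner = None


def run_upgrade(cluster: ToyCluster, owner: str, target: int) -> bool:
    """Bring every logical index to `target`; return False if another instance holds the lock."""
    if not cluster.try_lock(owner):
        return False  # someone else is migrating; this instance must not serve requests yet
    try:
        for name, version in list(cluster.active.items()):
            if version >= target:
                continue  # already up to date, nothing to reindex
            old, new = f"{name}-v{version}", f"{name}-v{target}"
            # Nondestructive: copy documents into a new index and leave the old one untouched.
            cluster.indexes[new] = [dict(doc) for doc in cluster.indexes[old]]
            cluster.active[name] = target
        return True
    finally:
        cluster.unlock(owner)
```

In the model, an instance that finds the lock taken simply returns `False`; per the description above, real instances instead hold off serving API requests until the upgrade completes.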

Given this upgrade process, there are three options for upgrading an App Search deployment, depending on whether you can tolerate App Search downtime during the upgrade and whether you can change App Search client configuration afterwards:

  1. Simple in-place upgrade (with downtime, no client-side changes) – replace a set of App Search instances with a new set running a newer version. In other words, you shut down the instances, upgrade the packages, then start them back up and let them migrate the data if needed.
  2. Snapshot-based upgrades (no downtime, client-side changes needed) – create a snapshot of an Elasticsearch cluster used by App Search, restore the snapshot to a new Elasticsearch cluster, and then start App Search on the new cluster.
  3. In-place upgrades in read-only mode (write downtime, no client-side changes) – switch a cluster to read-only mode, start new App Search instances to perform the upgrade, shut down old instances, and remove read-only mode.

Unfortunately, we cannot prescribe the best upgrade path for your specific situation. It depends on your requirements for App Search service availability, the capabilities of the platform you use to manage the deployment, and your available resources. See the details below to better understand the pros and cons of each upgrade path.

In-place upgrade with downtime

If you are able to handle downtime of the App Search service, the easiest upgrade method is an in-place upgrade. We recommend this method for customers who can schedule a maintenance window for their service and for non-mission-critical applications built on App Search.

This upgrade method has a few characteristics that make it uniquely suitable for many deployments:

  • No need for new infrastructure – everything is done in-place and you do not need to provision any new instances of Elasticsearch or App Search.
  • No need to change client configurations – all of your API clients can use the same API endpoints before and after the upgrade since new App Search instances will simply replace the existing ones.

The disadvantages of this method:

  • Downtime required – you have to shut down all of your App Search instances to perform the upgrade and your service will not be available until the upgrade is complete. With proper planning, the downtime period can be reduced significantly, but downtime is still unavoidable.
  • Harder to roll back – if your upgrade fails for any reason, we do not recommend rolling back to the older version of App Search due to potential issues with the partially migrated dataset in Elasticsearch. You would have to restore from a backup to get back to your original state, which would prolong the downtime.

Here are the simple steps you need to take:

  1. Stop ALL of your App Search instances.
  2. Back up your data from Elasticsearch, using snapshots.
  3. Upgrade App Search packages on your servers, or change your container image tags to point to the latest version if you use Docker or Kubernetes.
  4. Start the new version of App Search. The new instances coordinate and perform the upgrade, then begin accepting your API traffic.
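Step 4 can take a while if indexes need to be reindexed, since the new instances do not serve API requests until the upgrade is complete. A small polling helper, sketched below in Python, is one way to wait for that in a deployment script. The probe URL, the port 3002, and the idea that an HTTP 200 means the instance has finished upgrading are all assumptions to adapt to your setup; the polling logic takes the probe as a parameter so it can be tested without a running instance.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:3002/") -> bool:
    """Return True if the instance answers HTTP 200 (assumed to mean 'upgrade finished')."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status == 200


def wait_until_serving(probe: Callable[[], bool],
                       timeout_s: float = 1800.0,
                       interval_s: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses; return whether it succeeded."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused, timeouts, and HTTP errors (urllib's HTTPError is an
            # OSError subclass) are expected while the upgrade is still running.
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In a deploy script you might call `wait_until_serving(http_probe)` after starting the instances and stop the rollout to inspect the logs if it returns `False`.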

Snapshot-based upgrades

For situations where App Search downtime is not acceptable or in cases where you want to ensure a rollback is possible – no matter what happens during an upgrade – the safest way to perform the upgrade is through snapshot-based cloning of a deployment.

This process is more involved, but it guarantees data consistency and allows you to migrate without any Search API downtime. Here are the advantages of this method:

  • No Search API downtime required – both the old App Search deployment and the new one are able to handle search traffic throughout the upgrade process, meaning your clients should not notice the migration at all.
  • Easy to roll back – if you notice any issues with the new deployment (B) of App Search, you can retry the migration as many times as you need to, since your original deployment (A) is still functional.

The method has a few important disadvantages:

  • The need for additional infrastructure – you need to provision a new Elasticsearch cluster and deploy a set of new App Search instances during the upgrade. This requires some coordination and additional compute resources during the upgrade process.
    • For some deployments it may be possible to scale down Elasticsearch and App Search clusters before using the freed-up hardware to provision new instances, but the details of that operation are beyond the scope of this guide.
  • The need to change the client configuration – your API clients need to be switched to the new deployment endpoint after you perform the upgrade. 
    • Please note: You could proxy your traffic through a load balancer (ELB, ALB, etc.) or a CDN to keep your API endpoint stable while you replace the App Search cluster behind the proxy.

Here are the steps you need to take:

  1. Stop writes into your App Search deployment (A):
    1. Starting with 7.6.0, you can use the App Search read-only feature to block writes to the deployment and ensure a consistent migration.
    2. For all versions below 7.6.0, you should either stop the writes manually (disable your indexing jobs, etc.) or use the Elasticsearch API to put a write lock on all App Search indexes.
  2. Create a backup of your Elasticsearch cluster (A) using snapshots.
  3. Create a new Elasticsearch cluster (B) and restore data from the latest snapshot.
  4. Deploy a new set of App Search instances (B) using the new Elasticsearch cluster (B) as the data store.
  5. At this point you should have two separate App Search deployments serving the same data. Spot-check the new deployment (B) to make sure it looks correct.
  6. Switch API traffic from the original App Search deployment (A) to the new one (B).
  7. Shut down the old App Search and Elasticsearch clusters.
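Steps 1 to 3 map onto a handful of Elasticsearch API calls sent to two different clusters. The Python sketch below builds them and groups them by target cluster. It assumes a pre-7.6.0 deployment that uses the `index.blocks.write` index setting as the write lock (skip that call if you use read-only mode instead), a snapshot repository named `shared_repo` (hypothetical) that is registered on both clusters, ideally as read-only on B, and the `.app-search-*` index pattern, which you should verify against your own cluster.

```python
from typing import Dict, List, Optional, Tuple

# One Elasticsearch REST call: (HTTP method, path, JSON body or None).
Request = Tuple[str, str, Optional[dict]]


def write_block(pattern: str, blocked: bool) -> Request:
    """Set or lift the index-level write block on every index matching `pattern`."""
    return ("PUT", f"{pattern}/_settings", {"index.blocks.write": blocked})


def snapshot_upgrade_plan(repo: str, snapshot: str,
                          pattern: str = ".app-search-*") -> Dict[str, List[Request]]:
    """Group the calls for steps 1-3 by the cluster they must be sent to."""
    body = {"indices": pattern, "include_global_state": False}
    return {
        "cluster_a": [
            write_block(pattern, True),  # step 1: pre-7.6.0 write lock on A
            ("PUT", f"_snapshot/{repo}/{snapshot}?wait_for_completion=true", body),  # step 2
        ],
        "cluster_b": [
            ("POST", f"_snapshot/{repo}/{snapshot}/_restore?wait_for_completion=true",
             dict(body)),  # step 3: restore into the new cluster
            # The block set on A comes back with the other index settings, so lift it
            # on B before starting the new App Search instances in step 4.
            write_block(pattern, False),
        ],
    }


if __name__ == "__main__":
    for cluster, calls in snapshot_upgrade_plan("shared_repo", "pre-upgrade").items():
        for method, path, payload in calls:
            print(cluster, method, path, payload)
```

Lifting the block on cluster B matters because index settings, including `index.blocks.write`, are restored along with the data, and the new App Search instances need to write to the cluster during and after the upgrade. Cluster A can stay blocked until you shut it down.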

In-place upgrade with read-only mode (7.6 and later)

Finally, for situations where you do not have the ability to provision new infrastructure during an upgrade, but scheduling App Search downtime is not an option, starting with App Search 7.6.1 you can perform an in-place upgrade using App Search’s read-only mode.

This method has the following advantages:

  • No Search API downtime required – both old and new App Search instances are able to handle search traffic throughout the upgrade process, meaning clients should be unaffected by the migration.
  • No need to change client configurations – all of your API clients can use the same API endpoints before and after the upgrade, since new App Search instances will simply replace the existing ones.
  • No need for new infrastructure – everything can be done in-place and you do not need to provision any new instances of Elasticsearch or App Search.

The only comparative disadvantage of this upgrade method is that it makes it harder to roll back if your upgrade fails for any reason. If you experience any issues during the upgrade, we do not recommend enabling writes on the older version of App Search due to potential issues with the partially migrated dataset in Elasticsearch. You would have to restore from a backup to return to your original state, which may require downtime.

Here are the steps you need to take:

  1. Stop writes to your App Search deployment:
    1. If possible, stop your indexing requests at the source (disable indexing jobs, etc.).
    2. Enable read-only mode on your App Search cluster.
  2. Back up your data from Elasticsearch (using snapshots).
  3. Start a new set of App Search instances running the latest version of the product:
    1. You can do a rolling upgrade – stopping, upgrading, and starting one instance at a time.
    2. Or you could provision a full new fleet of App Search instances.
  4. New App Search instances will perform an upgrade and then become available to serve your Search API traffic.
  5. Once new App Search instances are running, stop your old App Search instances.
  6. Finally, remove read-only mode from the cluster and re-enable your indexing jobs. You may need to retry any indexing requests that may have failed during the upgrade.
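If you script the rolling variant of step 3, the main thing to get right is ordering: read-only mode goes on before the snapshot, instances are upgraded one at a time, and writes are re-enabled only after every instance is back. The Python sketch below encodes that ordering with hypothetical callables (`set_read_only`, `take_snapshot`, `upgrade_host`, `is_serving`) that you would wire to your own tooling; it does not assume any particular App Search API for toggling read-only mode.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(hosts: Iterable[str],
                    set_read_only: Callable[[bool], None],
                    take_snapshot: Callable[[], None],
                    upgrade_host: Callable[[str], None],
                    is_serving: Callable[[str], bool]) -> List[str]:
    """Run the read-only in-place upgrade one instance at a time; return upgraded hosts."""
    upgraded: List[str] = []
    set_read_only(True)   # step 1: block writes for the whole deployment
    take_snapshot()       # step 2: back up while the data cannot change
    for host in hosts:
        # Step 3 (rolling): stop, upgrade, and start this instance. Stopping the old
        # process here also covers step 5 for this host.
        upgrade_host(host)
        if not is_serving(host):  # step 4: move on only once this instance serves traffic
            raise RuntimeError(f"{host} did not come back; read-only mode left enabled")
        upgraded.append(host)
    set_read_only(False)  # step 6: re-enable writes after every instance is upgraded
    return upgraded
```

If an instance fails to come back, the sketch raises and leaves read-only mode enabled, in line with the advice above not to re-enable writes on the older version after a partial migration.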

Upgrades on Elastic Cloud

When deploying App Search on Elastic Cloud, you have access to two separate upgrade paths depending on your tolerance for App Search downtime:

  1. The default upgrade path (via the Cloud Console) is performed in place via a full cluster restart, meaning App Search will be unavailable for the duration of the upgrade. Future versions of App Search will allow this path to perform in-place upgrades via read-only mode, but as of 7.6.0 downtime is required.
  2. If downtime is unacceptable for your use case, we recommend a snapshot-based upgrade process, which involves taking a snapshot of your deployment (ideally in read-only mode) and then creating a new App Search deployment from the snapshot using a newer App Search version. Please note: The snapshot-based upgrade process requires client-side changes to switch users to another deployment, but guarantees availability of the service during the upgrade process. If you want to keep your client configuration stable, you can proxy your API traffic through a load balancer (ELB, ALB, etc.) or a CDN endpoint.

Please note: If you are using Elastic Cloud or ECE versions 2.4.0 to 2.4.4, there is a known problem with App Search upgrades from versions 7.5 to 7.6 (rolling upgrades could lead to limited data loss if you are actively modifying new engines during the upgrade process). As a workaround, before upgrading the 7.5 App Search instances to 7.6, you can stop the traffic towards those instances by clicking the Stop routing button in the UI. Once all 7.5 instances are no longer routing traffic, it is safe to perform the upgrade by clicking the Upgrade button.

Troubleshooting App Search upgrades

As with any software upgrade, there is always a chance that things will not go according to plan, so plan and prepare for the possibility of your App Search upgrade failing. Review the comparison of upgrade methods above and make sure you are comfortable with the risks of your preferred method before you start the upgrade process.

If an upgrade fails, the information below should help you identify the cause and retry the upgrade if needed. App Search should be able to recover from many potential failures, so you can usually retry a failed upgrade.

If your App Search upgrade fails, please do the following:

  1. Check the App Search application logs of the new deployment – log/app-search.log or /var/log/app-search/app-search.log – to see the details of what happened.
  2. If the upgrade failed due to an issue with Elasticsearch that you can fix, do that and then re-attempt the upgrade by starting a new App Search instance again.
  3. If the process consistently fails, roll back using a method specific to your preferred upgrade path.
  4. Capture the app-search.log file and file a support case with Elastic if you need further help troubleshooting the upgrade process.
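When reviewing logs in step 1, it can help to pull out the suspicious lines first. The Python sketch below does that with a simple keyword match and keeps a few lines of context after each hit, such as a stack trace. The keywords (`ERROR`, `FATAL`, `Error`, `Exception`) are an assumption about the log format, not a documented App Search convention, so attach the full `app-search.log` to any support case rather than only the filtered output.

```python
import re
from pathlib import Path
from typing import List

# Assumption: failures contain one of these keywords; adjust to what your log shows.
FAILURE = re.compile(r"ERROR|FATAL|Error|Exception")


def failure_lines(text: str, context: int = 2) -> List[str]:
    """Return each matching line plus up to `context` lines after it."""
    lines = text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if FAILURE.search(line):
            keep.update(range(i, min(i + context + 1, len(lines))))
    return [lines[i] for i in sorted(keep)]


def scan_log(path: str = "/var/log/app-search/app-search.log") -> List[str]:
    """Read an App Search log file (pass log/app-search.log if that is yours) and filter it."""
    return failure_lines(Path(path).read_text(errors="replace"))
```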