Reindex from a remote cluster

You can use reindex from remote to migrate indices from your old cluster to a new 7.4.2 cluster. This enables you to move to 7.4.2 from a pre-6.8 cluster without interrupting service.

Elasticsearch provides backwards compatibility support that enables indices from the previous major version to be upgraded to the current major version. Skipping a major version means that you must resolve any backward compatibility issues yourself.

If you use machine learning features and you’re migrating indices from a 6.5 or earlier cluster, the job and datafeed configuration information is not stored in an index, so you must recreate your machine learning jobs in the new cluster. If you are migrating from a 6.6 or later cluster, it is a good idea to temporarily halt the tasks associated with your machine learning jobs and datafeeds. This prevents inconsistencies between machine learning indices that are reindexed at slightly different times. To do so, either use the set upgrade mode API, or stop all datafeeds and close all machine learning jobs.
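
As a rough sketch, the two options for halting machine learning tasks could look like the requests below. The paths assume the _ml endpoint prefix used in 7.x; a 6.x cluster may expose these APIs under a different prefix, so check the API reference for the version you are running before relying on them.

    # Option 1: put machine learning into upgrade mode
    POST _ml/set_upgrade_mode?enabled=true

    # Option 2: stop every datafeed, then close every job
    POST _ml/datafeeds/_all/_stop
    POST _ml/anomaly_detectors/_all/_close

If you choose upgrade mode, the same API with enabled=false turns it off again once the migration is finished.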

To migrate your indices:

  1. Set up a new 7.4.2 cluster and add the existing cluster to the reindex.remote.whitelist in elasticsearch.yml.

    reindex.remote.whitelist: oldhost:9200

    The new cluster doesn’t have to start fully scaled out. As you migrate indices and shift the load to the new cluster, you can add nodes to the new cluster and remove nodes from the old one.

  2. For each index that you need to migrate to the new cluster:

    1. Create an index with the appropriate mappings and settings. Set the refresh_interval to -1 and set number_of_replicas to 0 for faster reindexing.
    2. Use the reindex API to pull documents from the remote index into the new 7.4.2 index:

      POST _reindex
      {
        "source": {
          "remote": {
            "host": "http://oldhost:9200",
            "username": "user",
            "password": "pass"
          },
          "index": "source",
          "query": {
            "match": {
              "test": "data"
            }
          }
        },
        "dest": {
          "index": "dest"
        }
      }

      If you run the reindex job in the background by setting wait_for_completion to false, the reindex request returns a task_id you can use to monitor progress of the reindex job with the task API: GET _tasks/TASK_ID.

    3. When the reindex job completes, set the refresh_interval and number_of_replicas to the desired values (the default settings are 1s and 1).
    4. Once reindexing is complete and the status of the new index is green, you can delete the old index.
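
The per-index steps above can be sketched as the following sequence of requests, run against the new cluster. This is an illustration under assumptions rather than a prescribed procedure: the index names source and dest and the host oldhost:9200 are taken from the reindex example, TASK_ID is a placeholder for the value returned by the reindex request, mappings are omitted, and the timeout of 60s is an arbitrary choice. Setting refresh_interval to null resets it to its default; substitute explicit values if your index needs something else.

    # 1. Create the destination index with refresh disabled and no replicas
    PUT dest
    {
      "settings": {
        "index": {
          "refresh_interval": "-1",
          "number_of_replicas": 0
        }
      }
    }

    # 2. Start the reindex in the background, then poll the returned task
    POST _reindex?wait_for_completion=false
    {
      "source": {
        "remote": {
          "host": "http://oldhost:9200"
        },
        "index": "source"
      },
      "dest": {
        "index": "dest"
      }
    }

    GET _tasks/TASK_ID

    # 3. Restore the refresh interval and add a replica
    PUT dest/_settings
    {
      "index": {
        "refresh_interval": null,
        "number_of_replicas": 1
      }
    }

    # 4. Wait until the new index reports green before deleting the old one
    GET _cluster/health/dest?wait_for_status=green&timeout=60s

The health request returns when the index reaches green or the timeout expires, so check the status and timed_out fields in the response before removing the old index.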