
Announcing Curator 5

I am excited to announce the release of Curator 5! Let's dive right in with the changes, shall we?

Breaking Changes

There are really only two breaking changes to be aware of:

  1. Curator 5 only works with Elasticsearch 5.x versions.
  2. A tiny API change. If you only ever use Curator as a command-line tool, you won't even know this change is there.

So, why does Curator 5 only work with Elasticsearch 5.x? For one thing, backward compatibility is hard. For another, a new feature sometimes doesn't work with an older version of Elasticsearch, yet it still appears in the documentation, even with a big warning that says, "this feature doesn't work with version X." As a result, new users can have a bad experience: they struggle for hours to make something work, then ask for help in the forums only to learn that the feature will not work for them because of a version mismatch. To save everyone time and aggravation, Curator is moving onto the unified release schedule (though it's still a few versions behind).

What's the same?

One nice thing is that the configuration format is unchanged. You can use curator and curator_cli exactly as before, without changing any configuration. The move from Curator 3 to Curator 4 was jarring for many users, but those improvements proved effective, and there has been no need to change the configuration syntax since. Instant upgrade win!

What's new?

Now for the exciting part: new features!

Reindex

Perhaps the biggest new feature in Curator 5 is the reindex action, which puts the extremely powerful Elasticsearch Reindex API at your disposal. At its simplest, it copies the documents of one index into another:

actions:
  1:
    description: "Reindex index1 into index2"
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          index: index1
        dest:
          index: index2
    filters:
    - filtertype: none

This is just an example of a simple, local reindex with manually selected indices. "But," I hear you say, "this is Curator! It should support filtered index selection!" And you're absolutely right. That's accomplished like this, with the REINDEX_SELECTION placeholder:

actions:
  1:
    description: >-
      Reindex all daily logstash indices from March 2017 into logstash-2017.03
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          index: REINDEX_SELECTION
        dest:
          index: logstash-2017.03
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-2017.03.

This example will reindex all of the daily Logstash indices from March 2017 into a single monthly index.
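This post doesn't show Curator's internals, but the idea behind REINDEX_SELECTION can be illustrated with a minimal Python sketch. The helper names (`filter_by_prefix`, `resolve_selection`) are hypothetical; the only thing relied on from Elasticsearch is that the Reindex API accepts a list of indices in `source.index`. Note how the trailing dot in the prefix keeps the monthly target index out of its own selection.

```python
import copy

PLACEHOLDER = "REINDEX_SELECTION"


def filter_by_prefix(indices, prefix):
    """Rough stand-in for a `filtertype: pattern`, `kind: prefix` filter."""
    return sorted(name for name in indices if name.startswith(prefix))


def resolve_selection(request_body, selected):
    """Return a copy of request_body with the placeholder swapped for the selection.

    Hypothetical helper for illustration, not Curator's actual code.
    """
    body = copy.deepcopy(request_body)  # leave the caller's dict untouched
    if body["source"]["index"] == PLACEHOLDER:
        # The Reindex API accepts a list of source indices.
        body["source"]["index"] = list(selected)
    return body


if __name__ == "__main__":
    existing = [
        "logstash-2017.02.28",
        "logstash-2017.03.01",
        "logstash-2017.03.02",
        "logstash-2017.03",  # the monthly target: no trailing dot, so no match
    ]
    request_body = {
        "source": {"index": PLACEHOLDER},
        "dest": {"index": "logstash-2017.03"},
    }
    selected = filter_by_prefix(existing, "logstash-2017.03.")
    print(resolve_selection(request_body, selected))
```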

You can add all kinds of extra processing to these reindexing operations. Curator should support every possible configuration with one exception: manual slicing (likely a rare need, since automatic slicing is available). Want to reindex through an ingest pipeline? No problem. Reindex from remote? Oh, that's the best part!
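For the ingest pipeline case, the Reindex API takes a `pipeline` field inside `dest`. The Python sketch below builds that request body; `my-pipeline` is a hypothetical pipeline name that would already have to exist in the cluster, and `reindex_body` is an illustrative helper, not part of Curator.

```python
import json


def reindex_body(source_index, dest_index, pipeline=None):
    """Build a Reindex API request body, optionally routed through a pipeline."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    if pipeline is not None:
        # dest.pipeline sends every reindexed document through the ingest pipeline
        body["dest"]["pipeline"] = pipeline
    return body


if __name__ == "__main__":
    print(json.dumps(reindex_body("index1", "index2", "my-pipeline"), indent=2))
```

In a Curator action file, this should correspond to adding `pipeline: my-pipeline` under `dest` in the `request_body` of the first example.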

actions:
  1:
    description: >-
      Reindex remote index1 to local index1
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          remote:
            host: http://otherhost:9200
            username: myuser
            password: mypass
          index: index1
        dest:
          index: index1
    filters:
    - filtertype: none

This example pulls index1 from http://otherhost:9200 into the local cluster, using the provided credentials. For this to work, the local node must have been started with the following setting in its elasticsearch.yml file. The setting cannot be applied dynamically: if it was not present when the Elasticsearch node started, you must add it and then restart the node.

reindex.remote.whitelist: remote_host_or_IP1:9200, remote_host_or_IP2:9200

You must whitelist remote nodes before you can reindex from them. In this case, remote_host_or_IP1 would likely be the same IP address or host name as otherhost in the reindex request_body. After the task finishes, Curator checks for the presence of the dest index. If the task completed successfully but that index is not found, Curator logs an error suggesting that the whitelist is probably not set up properly.
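Because a whitelist mismatch only shows up after the task runs, it can help to sanity-check the remote host against the setting beforehand. Here is a rough Python sketch, assuming that whitelist entries are comma-separated `host:port` patterns (Elasticsearch also accepts wildcards such as `127.0.10.*:9200`) and that the remote `host` includes scheme, host and port, as the Reindex API expects. The `fnmatch` matching is an approximation, not Elasticsearch's own logic, and the function names are hypothetical.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def parse_whitelist(setting):
    """Split a reindex.remote.whitelist value into host:port patterns."""
    return [entry.strip() for entry in setting.split(",") if entry.strip()]


def is_whitelisted(remote_url, setting):
    """Approximate check that remote_url's host:port matches a whitelist entry."""
    parsed = urlparse(remote_url)
    if parsed.port is None:
        raise ValueError(f"{remote_url!r} must include a port")
    target = f"{parsed.hostname}:{parsed.port}"
    return any(fnmatch(target, pattern) for pattern in parse_whitelist(setting))


if __name__ == "__main__":
    setting = "otherhost:9200, remote_host_or_IP2:9200"
    print(is_whitelisted("http://otherhost:9200", setting))
```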

"But," I hear you say again, "what if I want to use Curator's index filters to select indices on the remote side?" I saw you coming:

actions:
  1:
    description: >-
      Reindex all remote daily logstash indices from March 2017 into local index
      logstash-2017.03
    action: reindex
    options:
      wait_interval: 9
      max_wait: -1
      request_body:
        source:
          remote:
            host: http://otherhost:9200
            username: myuser
            password: mypass
          index: REINDEX_SELECTION
        dest:
          index: logstash-2017.03
      remote_filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-2017.03.
    filters:
    - filtertype: none

This example will reindex all of the daily Logstash indices from March 2017 from otherhost into a single monthly index on the local cluster.

Generally speaking, Curator should be able to perform a remote reindex from any version of Elasticsearch from 1.4 onward. Strictly speaking, the Elasticsearch Reindex API can reindex from even older clusters, but Curator cannot facilitate this because it depends on changes released in Elasticsearch 1.4.

However, there is a known bug that prevents Elasticsearch 5.3.0 from reindexing from remote clusters older than 2.0. The fix will be available in Elasticsearch 5.3.1, and earlier 5.x versions do not suffer from this bug.
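These version rules are easy to trip over when planning a migration, so here is a small Python sketch that encodes exactly the rules stated above and nothing more. The function names and the plain `major.minor.patch` parsing are assumptions for illustration; this is not necessarily how Curator itself checks versions.

```python
def parse_version(text):
    """Turn '5.3.0' into (5, 3, 0)."""
    return tuple(int(part) for part in text.split(".")[:3])


def remote_reindex_supported(local, remote):
    """Return (ok, reason) for a Curator-driven reindex from remote.

    Rules taken from this post:
    * Curator 5 only works with Elasticsearch 5.x locally.
    * Curator needs the remote cluster to be 1.4 or newer.
    * Elasticsearch 5.3.0 cannot reindex from remotes older than 2.0.
    """
    local_v, remote_v = parse_version(local), parse_version(remote)
    if local_v[0] != 5:
        return False, "Curator 5 only works with Elasticsearch 5.x"
    if remote_v < (1, 4):
        return False, "Curator requires the remote cluster to be 1.4 or newer"
    if local_v == (5, 3, 0) and remote_v < (2, 0):
        return False, "Elasticsearch 5.3.0 bug: use 5.3.1 or newer"
    return True, "ok"


if __name__ == "__main__":
    print(remote_reindex_supported("5.3.0", "1.7.5"))
```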

There is extensive documentation on what can go in a request_body, with even more examples than these.

Rollover

Lots of you have been asking for this feature, and here it is!

action: rollover
description: >-
  Rollover the index associated with alias 'name', which should be in the
  form of prefix-000001 (or similar), or prefix-YYYY.MM.DD-1.
options:
  name: aliasname
  conditions:
    max_age: 1d
    max_docs: 1000000
  extra_settings:
    index.number_of_shards: 3
    index.number_of_replicas: 1
  timeout_override:
  continue_if_exception: False
  disable_action: False

The available conditions are described in the Elasticsearch Rollover API documentation.
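As a rough illustration of the rollover semantics in the Elasticsearch documentation (the index rolls over when any one condition is met, and a name ending in `-<number>` gets that number incremented and zero-padded to six digits), here is a Python sketch. The function names are hypothetical, Elasticsearch evaluates the conditions itself, and this sketch does not re-resolve date math in names like prefix-YYYY.MM.DD-1.

```python
import re
from datetime import datetime, timedelta


def conditions_met(created, doc_count, now, max_age=None, max_docs=None):
    """Return the names of satisfied conditions; any one triggers a rollover."""
    met = []
    if max_age is not None and now - created >= max_age:
        met.append("max_age")
    if max_docs is not None and doc_count >= max_docs:
        met.append("max_docs")
    return met


def next_index_name(current):
    """prefix-000001 -> prefix-000002, zero-padded to six digits."""
    match = re.match(r"^(.*-)(\d+)$", current)
    if not match:
        raise ValueError(f"{current!r} does not end in -<number>")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1:06d}"


if __name__ == "__main__":
    created = datetime(2017, 4, 6)
    now = datetime(2017, 4, 7)
    # Mirrors the example above: max_age: 1d, max_docs: 1000000
    met = conditions_met(created, 250000, now,
                         max_age=timedelta(days=1), max_docs=1000000)
    if met:
        print("roll over because of", met, "->", next_index_name("prefix-000001"))
```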

Read more in the Curator documentation.

Date Math in create_index

Many users were eager to create indices with Curator, but could not create indices with a future timestamp in the index name. Now they can, using date math in the index name. Credit for this actually goes to the Elasticsearch team.

action: create_index
description: "Create index as named"
options:
  name: '<logstash-{now/d+1d}>'
  # ... 

For example, if today's date were 2017-04-07, the name <logstash-{now/d}> would create an index named logstash-2017.04.07. To create tomorrow's index, you would use the name <logstash-{now/d+1d}>, which adds one day and creates an index named logstash-2017.04.08. For many more configuration options, read the Elasticsearch date math documentation.
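To make the example concrete, here is a deliberately simplified Python resolver for names like `<logstash-{now/d+1d}>`. It is a sketch under stated assumptions, not Elasticsearch's parser: it only understands `now`, rounding to the day (`/d`) and whole-day offsets, always renders the date with the default `YYYY.MM.dd` format, and takes the current time as an argument (Elasticsearch resolves `now` in UTC). `resolve_date_math` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta

_EXPR = re.compile(r"^<(?P<prefix>[^{]*)\{now(?P<math>[^}]*)\}(?P<suffix>[^>]*)>$")
_OP = re.compile(r"/d|[+-]\d+d")  # only day rounding and day offsets


def resolve_date_math(name, now):
    match = _EXPR.match(name)
    if not match:
        return name  # not a date math expression: use the name as-is
    math = match.group("math")
    ops = _OP.findall(math)
    if "".join(ops) != math:
        raise ValueError(f"unsupported date math: {math!r}")
    moment = now
    for op in ops:  # apply operations left to right
        if op == "/d":
            moment = moment.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            moment += timedelta(days=int(op[:-1]))
    return f"{match.group('prefix')}{moment:%Y.%m.%d}{match.group('suffix')}"


if __name__ == "__main__":
    now = datetime(2017, 4, 7, 15, 30)
    print(resolve_date_math("<logstash-{now/d}>", now))      # logstash-2017.04.07
    print(resolve_date_math("<logstash-{now/d+1d}>", now))   # logstash-2017.04.08
```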

Unset Shard Routing Allocation

action: allocation
description: "Apply shard allocation filtering rules to the specified indices"
options:
  key: tag
  value:
  allocation_type: require
filters:
- filtertype: ...

Leaving value unset or empty unsets a previously set value for that key and allocation_type.
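Under the hood, shard allocation filtering uses the index settings named `index.routing.allocation.<type>.<key>`, and Elasticsearch documents JSON null as the way to reset an index setting to its default. The Python sketch below builds such a settings body and maps an empty value to null. The helper is hypothetical, and whether Curator itself sends null or an empty string for an empty `value` is an assumption not confirmed here.

```python
import json


def allocation_settings(key, value, allocation_type="require"):
    """Build an index settings body for shard allocation filtering.

    An empty or missing value becomes None (JSON null), i.e. "reset this setting".
    """
    if allocation_type not in ("require", "include", "exclude"):
        raise ValueError(f"unknown allocation_type: {allocation_type!r}")
    setting = f"index.routing.allocation.{allocation_type}.{key}"
    return {setting: value if value else None}


if __name__ == "__main__":
    print(json.dumps(allocation_settings("tag", "hot")))
    print(json.dumps(allocation_settings("tag", None)))  # the "unset" case
```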

Period Filter

This has been a long-requested feature. Now you can select blocks of whole units of time.

 - filtertype: period
   source: name
   range_from: -1
   range_to: -1
   timestring: '%Y.%m.%d'
   unit: weeks
   week_starts_on: sunday

With range_from and range_to, you can select multiple hours, days, weeks, months, or years. Negative numbers indicate the past, and positive numbers indicate the future. In the above example, setting both to -1 selects only the last whole week, counting Sunday as the first day of the week. If today is 2017-04-07, week 0 is the current week, which started on 2017-04-02. This means that -1 selects the week starting on 2017-03-26 and ending on 2017-04-01.
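The week arithmetic in that example can be checked with a short Python sketch. The function names are hypothetical, it handles only `unit: weeks`, and it assumes index names embed a date in the `%Y.%m.%d` layout used above; the real period filter supports more units and options.

```python
import re
from datetime import date, datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]  # same order as date.weekday()


def week_start(day, week_starts_on="sunday"):
    """First day of the week that contains `day`."""
    offset = (day.weekday() - WEEKDAYS.index(week_starts_on)) % 7
    return day - timedelta(days=offset)


def week_period(today, range_from, range_to, week_starts_on="sunday"):
    """Inclusive (first_day, last_day) covering whole weeks range_from..range_to.

    Week 0 contains `today`; negative numbers are in the past.
    """
    current = week_start(today, week_starts_on)
    first_day = current + timedelta(weeks=range_from)
    last_day = current + timedelta(weeks=range_to + 1) - timedelta(days=1)
    return first_day, last_day


def in_period(index_name, first_day, last_day):
    """True if the YYYY.MM.DD date embedded in index_name falls in the period."""
    match = re.search(r"\d{4}\.\d{2}\.\d{2}", index_name)
    if not match:
        return False
    stamp = datetime.strptime(match.group(), "%Y.%m.%d").date()
    return first_day <= stamp <= last_day


if __name__ == "__main__":
    first, last = week_period(date(2017, 4, 7), -1, -1)
    print(first, last)  # 2017-03-26 2017-04-01
    for name in ["logstash-2017.03.25", "logstash-2017.03.26",
                 "logstash-2017.04.01", "logstash-2017.04.02"]:
        print(name, in_period(name, first, last))
```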

There's too much to cover here, so be sure to read the documentation for this filter.

Dedicated internal wait_for_completion functionality

In previous versions of Curator, users had to set very high client connection timeout values for long-running actions, like snapshots. Curator even tried to compensate by automatically increasing those values for the long-running actions: allocation, cluster_routing, forceMerge, reindex, replicas, restore, and snapshot.

With the exception of forceMerge, these actions no longer hold a client connection open while waiting for the cluster to send a completion message. Instead, they poll periodically to check for completion:

action: snapshot
description: Snapshot selected indices to 'repository' 
options:
  repository:
  # ...
  wait_for_completion: True
  max_wait: 3600
  wait_interval: 10
  # ...
filters:
- filtertype: ...

You still use the wait_for_completion setting, but now together with max_wait and wait_interval. A max_wait of -1 means to wait forever for the operation to complete; otherwise, it is the number of seconds Curator will wait for completion before giving up. The wait_interval defines how often, in seconds, Curator checks whether the task is complete. Curator does not check whether wait_interval is less than the timeout value you specify in the curator.yml client configuration file, so don't set it too high.
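The interplay of max_wait and wait_interval boils down to a simple polling loop. Here is a minimal Python sketch of that loop; `poll_until_complete` is a hypothetical name, and the `is_complete` callable stands in for whatever status check is appropriate (for example, asking Elasticsearch about a task or snapshot), which this sketch does not implement. `sleep` and `clock` are injectable so the loop can be exercised without real waiting.

```python
import time


def poll_until_complete(is_complete, wait_interval, max_wait,
                        sleep=time.sleep, clock=time.monotonic):
    """Call is_complete() every wait_interval seconds until it returns True.

    max_wait == -1 means wait forever; otherwise stop after max_wait seconds.
    Returns True if the operation completed, False if max_wait ran out first.
    """
    start = clock()
    while True:
        if is_complete():
            return True
        if max_wait != -1 and clock() - start >= max_wait:
            return False
        sleep(wait_interval)


if __name__ == "__main__":
    checks = iter([False, False, True])  # pretend it finishes on the 3rd check
    done = poll_until_complete(lambda: next(checks), wait_interval=0.01, max_wait=3600)
    print("completed" if done else "gave up")
```

Because each check is a short, separate request, no single client connection has to stay open for the whole duration of the operation.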

So why doesn't this work with forceMerge? From the Elasticsearch documentation:

This call will block until the merge is complete. If the http connection is lost, the request will continue in the background, and any new requests will block until the previous force merge is complete.

So for forceMerge, be aware that timeouts can still occur, and set timeout_override accordingly.

Conclusion

This is a feature-laden release for Curator, and I'm excited to bring it to you. As always, if you run into a problem, help is available at https://discuss.elastic.co.

Happy Curating!