I am very pleased to announce the release of Elasticsearch Curator 4.0! So much has changed since version 3. This is a major change in how Curator works. I’ve listened to a lot of feedback, and incorporated many suggestions. I think you’ll find the results compelling. Before we go on, you need to know about the breaking changes from the previous version.
Due to the sweeping nature of the changes, this is the first version of Curator which is not fully backward compatible with older versions of Elasticsearch. Curator 4 only supports Elasticsearch versions 2.x and the 5.0 pre-releases. It is anticipated that Curator 4 will continue to support Elasticsearch 5.0 releases, though a special Curator 5 may be released which will take advantage of new features set to be released in Elasticsearch 5.
The API is completely different. If you were using the 3.x API, you will perhaps want to stick with that until you’ve tested the new API out. The documentation for the new API is still at http://curator.readthedocs.io
The command line structure is completely different. The new command line only has these few flags:
Dates are all converted to epoch time. Conversions no longer try to pad a full time unit. Either an age is older or younger than the reference epoch time, or it isn’t.
More on these changes in a bit!
How is it different?
Curator 3 and each of its predecessors were designed to be run from cron, so that periodic maintenance could be performed easily. All of the other features added to Curator since the very beginning (which was only index deletion) have been bolted on, resulting in a very complex command-line structure. This was still navigable, but not what I would have called ideal. One of the most requested features was snapshot restore. A look at the configuration flags revealed that 9+ additional flags would have been required to accommodate only most of the options available.
Another frequent request was atomic add and remove alias actions. I puzzled over how to do that with the command-line structure for a long time and realized that it would have resulted in huge, complicated and hard to read command lines. It was time to rethink Curator configuration.
The solution? Configuration files.
Configure all the things!
One of the design decisions for Curator 4 was to use YAML configuration files, two of them to be precise: one for the client configuration (and logging options), and one for the actions to be performed. Having a default client configuration means that multiple, different action configuration files do not each need to repeat the client information. If you store the client configuration file as $HOME/.curator/curator.yml, then you won’t even have to reference it at the command-line!
The action file allows for filter stacking and command chaining.
If you used Curator before version 4, then you know that Curator had a limited number of ways you could combine filters before performing the desired action. Generally, that was limited to regular expression filtering combined with age-based filtering. With Curator 4, you can combine multiple filters together–as many as you like–to restrict which indices to act on. How might this help you?
Let’s say you want to delete Logstash named indices in excess of 30G of total space consumed. This might represent 30 days worth of data with your normal logging. What if some event caused a torrent of log lines to be produced? You might accidentally delete weeks worth of logs. With filter stacking, you could first filter by pattern, to only count Logstash indices. The next filter would be disk space, 30G worth, sorting by age. The third filter, however, is the magic one: Only delete indices older than 30 days. The total stack would mean, “delete Logstash indices in excess of 30G of storage, but only if they’re also older than 30 days.” Neat, eh?
This is what the action file might look like:
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices. Find which to delete by first limiting the list to
      logstash- prefixed indices. Next filter by space, to those indices in
      excess of 20g of usage. Then further filter those to prevent deletion
      of anything less than 30 days old.
    options:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: space
      disk_space: 20
      use_age: True
      source: creation_date
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 30
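To make the stacking order concrete, here is a minimal Python sketch of how the three filters narrow the list one after another. It is not Curator's code or API: the `Index` class, the helper functions, and the sample indices are all hypothetical, and the space filter is a simplified model that counts the newest indices first.

```python
from dataclasses import dataclass

DAY = 24 * 3600  # seconds in a day


@dataclass
class Index:
    name: str
    size_gb: float
    created: int  # creation time in epoch seconds


def by_prefix(indices, prefix):
    """Keep only indices whose name starts with the prefix."""
    return [i for i in indices if i.name.startswith(prefix)]


def by_space(indices, limit_gb):
    """Keep the indices that push cumulative usage past the limit, newest counted first."""
    kept, total = [], 0.0
    for idx in sorted(indices, key=lambda i: i.created, reverse=True):
        total += idx.size_gb
        if total > limit_gb:
            kept.append(idx)
    return kept


def older_than(indices, now, days):
    """Keep only indices created before the cutoff."""
    cutoff = now - days * DAY
    return [i for i in indices if i.created < cutoff]


now = 1_467_000_000  # arbitrary reference time for the example
indices = [
    Index("logstash-new", 15.0, now - 1 * DAY),
    Index("logstash-burst", 15.0, now - 2 * DAY),
    Index("logstash-old", 5.0, now - 40 * DAY),
    Index("other-old", 50.0, now - 90 * DAY),
]

space_only = by_space(by_prefix(indices, "logstash-"), 20)
stacked = older_than(space_only, now, 30)

space_only_names = [i.name for i in space_only]
to_delete = [i.name for i in stacked]
```

In this made-up data set, the space filter alone would select the two-day-old burst index for deletion, while the stacked age filter leaves only the 40-day-old index.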
Command chaining means that you don’t have to execute a different Curator command for each action you want to perform. You can use the YAML action file to have multiple commands, one after the other, in the same file. It is a configurable option to have execution halt if an action fails with an exception, or continue even if there is an exception.
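The halt-or-continue behavior can be pictured with a small Python sketch. This is an illustration only, not how Curator is implemented: the `run_actions` function and the `run` callable on each action are hypothetical, while `continue_if_exception` and `disable_action` are the option names used in the action file.

```python
def run_actions(actions):
    """Run numbered actions in order and record what happened to each."""
    results = []
    for number in sorted(actions):
        spec = actions[number]
        options = spec.get("options", {})
        if options.get("disable_action", False):
            results.append((number, "skipped"))
            continue
        try:
            spec["run"]()  # hypothetical callable standing in for the real action
            results.append((number, "ok"))
        except Exception:
            results.append((number, "failed"))
            if not options.get("continue_if_exception", False):
                break  # halt the chain at the first failure that is not tolerated
    return results


def fail():
    raise RuntimeError("simulated failure")


chain = {
    1: {"run": fail, "options": {"continue_if_exception": True}},
    2: {"run": lambda: None, "options": {}},
    3: {"run": fail, "options": {"continue_if_exception": False}},
    4: {"run": lambda: None, "options": {}},
}
results = run_actions(chain)
```

Here the first failure is tolerated, the third halts the chain, and the fourth action never runs.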
There are some new tools in the Curator stable:
One that should almost be considered new since it’s so improved over previous versions is Alias, which now supports simultaneous, atomic add & remove.
Optimize has been renamed to forceMerge, in accordance with Elastic’s API changes.
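For the atomic Alias behavior mentioned above, it helps to know that the Elasticsearch aliases API accepts a list of add and remove actions in a single request, which is what makes the swap atomic. The sketch below only builds such a request body in Python; the `alias_update_body` helper and the index and alias names are hypothetical, and it does not show how Curator itself constructs the call.

```python
def alias_update_body(alias, add=(), remove=()):
    """Build one request body that removes and adds indices for an alias in a single call."""
    actions = [{"remove": {"index": name, "alias": alias}} for name in remove]
    actions += [{"add": {"index": name, "alias": alias}} for name in add]
    return {"actions": actions}


body = alias_update_body(
    "last_week",
    add=["logstash-2016.06.20"],
    remove=["logstash-2016.06.13"],
)
```

Because both actions travel in one body, there is no moment at which the alias points at neither index or at both.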
Well, mostly just improved filters. The space filter can now also take age into account: rather than filtering exclusively by space, you can have it sort by age as an extra step inside the space filter itself (not as a stacked filter). Why might this be important? So you delete the oldest indices first, of course!
Speaking of deleting the oldest indices first, filtering by age now offers 3 different ways to determine index age:
* name (which is what all previous versions of Curator used) requires a time or date as part of the index name
* creation_date derives the age from the time that Elasticsearch created the index, as stored in the index metadata
* field_stats calculates the age from the greatest and least values in a specified field. For Curator 4, since these are age calculations, the field must be mapped as a date.
Also, with regards to age, Curator now converts the name-derived timestamps to epoch time for comparisons, since field_stats are already in epoch time. This is important, as it means that comparisons do not follow the conventions used in Curator 3. If a timestamp is older than a date, it’s older. If it’s younger, it’s younger. Curator no longer tries to calculate and compensate for a full unit count. Test with the --dry-run flag before using this to ensure you don’t delete something you want kept.
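As a rough illustration of a name-derived age comparison in epoch time, consider the Python sketch below. It is a simplified model rather than Curator's implementation: the function names are hypothetical, the date pattern is assumed to be `YYYY.MM.DD`, timestamps are assumed to be UTC, and the strict less-than comparison is an assumption about the boundary case.

```python
import re
from datetime import datetime, timezone

DAY = 24 * 3600  # seconds in a day


def name_to_epoch(index_name):
    """Extract a YYYY.MM.DD date from an index name and return it as epoch seconds (UTC)."""
    match = re.search(r"\d{4}\.\d{2}\.\d{2}", index_name)
    if match is None:
        raise ValueError(f"no date found in {index_name!r}")
    parsed = datetime.strptime(match.group(), "%Y.%m.%d")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def is_older(index_name, now, days):
    """True only if the index timestamp falls before the cutoff; no padding to a full unit."""
    return name_to_epoch(index_name) < now - days * DAY


# Reference time for the example: midnight UTC on 1 July 2016
now = int(datetime(2016, 7, 1, tzinfo=timezone.utc).timestamp())

older = is_older("logstash-2016.05.31", now, 30)
younger = is_older("logstash-2016.06.02", now, 30)
```

The comparison is a plain subtraction against the reference time, which is why a dry run is worth the few seconds it takes.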
Also, since all time calculations are relative to epoch time, and are therefore in seconds, time units have been revamped as multiples of seconds:
if unit == 'seconds':
    multiplier = 1
elif unit == 'minutes':
    multiplier = 60
elif unit == 'hours':
    multiplier = 3600
elif unit == 'days':
    multiplier = 3600*24
elif unit == 'weeks':
    multiplier = 3600*24*7
elif unit == 'months':
    multiplier = 3600*24*30
elif unit == 'years':
    multiplier = 3600*24*365
This means you can use seconds, minutes, hours, days, weeks, months, or even years as valid units. Just remember that Curator 4 doesn’t care that February only has 28 days. If you use months, it is counting 30 days worth of seconds.
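One consequence of these fixed multipliers is worth working through: twelve "months" add up to 360 days, which is five days short of one "year". The Python sketch below restates the multipliers as a lookup table to show the arithmetic; the `MULTIPLIERS` table and the `age_in_seconds` helper are illustrative names, not part of Curator.

```python
MULTIPLIERS = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 3600 * 24,
    "weeks": 3600 * 24 * 7,
    "months": 3600 * 24 * 30,
    "years": 3600 * 24 * 365,
}


def age_in_seconds(unit, unit_count):
    """Convert a unit and count into a number of seconds."""
    return MULTIPLIERS[unit] * unit_count


one_month = age_in_seconds("months", 1)
twelve_months = age_in_seconds("months", 12)
one_year = age_in_seconds("years", 1)

# Difference between one "year" and twelve "months", expressed in days
gap_days = (one_year - twelve_months) // MULTIPLIERS["days"]
```

If that five-day gap matters for your retention policy, expressing the age in days is the safer choice.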
Installing and Upgrading
The instructions for installing Curator 4 are at https://www.elastic.co/guide/en/elasticsearch/client/curator/4.0/installation.html.
If using pip, it’s as simple as
pip install -U elasticsearch-curator
If using deb packages, I highly recommend you uninstall any older versions (including the 4.0 pre-releases), and then follow the installation procedure for 4.0.
What else is new?
There’s too much for me to describe in a single blog post. I’ll continue to write about the new changes in Curator 4 over the coming days. In the meantime, please read the release notes and the online documentation for more information.