IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Resolve lifecycle policy execution errors

When ILM executes a lifecycle policy, errors can occur while it performs the index operations required for a step. When this happens, ILM moves the index to an ERROR step. If ILM cannot resolve the error automatically, execution halts until you resolve the underlying issue with the policy, index, or cluster.

For example, you might have a shrink-index policy that shrinks an index to four shards once it is at least five days old:

PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}

Nothing prevents you from applying the shrink-index policy to a new index that has only two shards:

PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}

After five days, ILM attempts to shrink my-index-000001 from two shards to four shards. Because the shrink action cannot increase the number of shards, this operation fails and ILM moves my-index-000001 to the ERROR step.
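
The shard-count mismatch above can be caught before a policy is attached to an index. The following is a minimal sketch, not part of ILM itself: `can_shrink` is a hypothetical helper, and the divisibility check assumes the shrink index API's rule that the target shard count must be a factor of the source shard count.

```python
def can_shrink(source_shards: int, target_shards: int) -> bool:
    """Return True if an index with source_shards primaries could be shrunk to target_shards."""
    if source_shards < 1 or target_shards < 1:
        return False
    # A shrink can never increase the number of shards.
    if target_shards > source_shards:
        return False
    # Assumption (shrink index API rule): the target must be a factor of the source.
    return source_shards % target_shards == 0


# The scenario from the text: a two-shard index and a policy that shrinks to four shards.
print(can_shrink(2, 4))  # False
# Shrinking the same index to a single shard is allowed.
print(can_shrink(2, 1))  # True
```

A check like this could run wherever index settings and policy definitions are reviewed, so the mismatch is found before the index reaches the warm phase.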

You can use the ILM Explain API to get information about what went wrong:

GET /my-index-000001/_ilm/explain

The request returns the following information:

{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "policy" : "shrink-index",                
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d",                            
      "phase" : "warm",                         
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",                      
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",                         
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",                 
      "step_info" : {
        "type" : "illegal_argument_exception",  
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {                  
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}

The key fields in this response are:

	`policy`: The policy being used to manage the index: `shrink-index`
	`age`: The index age: 5.1 days
	`phase`: The phase the index is currently in: `warm`
	`action`: The current action: `shrink`
	`step`: The step the index is currently in: `ERROR`
	`failed_step`: The step that failed to execute: `shrink`
	`step_info`: The type of error and a description of that error
	`phase_definition`: The definition of the current phase from the `shrink-index` policy
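
When many indices are managed, these fields can be read programmatically. The sketch below is illustrative only: it parses a trimmed copy of the explain response shown above and reports each index that sits in the ERROR step. `find_failed_indices` is a hypothetical helper, and the sample JSON is embedded as a string rather than fetched from a cluster.

```python
import json

# Trimmed copy of the explain response from the example above.
EXPLAIN_RESPONSE = """
{
  "indices": {
    "my-index-000001": {
      "index": "my-index-000001",
      "managed": true,
      "policy": "shrink-index",
      "phase": "warm",
      "action": "shrink",
      "step": "ERROR",
      "failed_step": "shrink",
      "step_info": {
        "type": "illegal_argument_exception",
        "reason": "the number of target shards [4] must be less that the number of source shards [2]"
      }
    }
  }
}
"""


def find_failed_indices(explain: dict) -> list:
    """Collect policy, failed step, and error details for every index in the ERROR step."""
    failures = []
    for name, info in explain.get("indices", {}).items():
        if info.get("step") != "ERROR":
            continue
        step_info = info.get("step_info", {})
        failures.append({
            "index": name,
            "policy": info.get("policy"),
            "failed_step": info.get("failed_step"),
            "error_type": step_info.get("type"),
            "reason": step_info.get("reason"),
        })
    return failures


for failure in find_failed_indices(json.loads(EXPLAIN_RESPONSE)):
    print(f"{failure['index']}: step '{failure['failed_step']}' failed: {failure['reason']}")
```

The `.get` calls with defaults keep the helper tolerant of indices that are unmanaged or that have no `step_info`.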

To resolve this, you could update the policy to shrink the index to a single shard after five days:

PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}

Retrying failed lifecycle policy steps

Once you fix the problem that put an index in the ERROR step, you might need to explicitly tell ILM to retry the step:

POST /my-index-000001/_ilm/retry

ILM subsequently attempts to re-run the step that failed. You can use the ILM Explain API to monitor the progress.
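
The retry request can also be issued from a script. This is a rough sketch using only Python's standard library: `build_retry_request` is a hypothetical helper, the base URL `http://localhost:9200` is an assumption about where the cluster listens, and a secured cluster would also need credentials, which are omitted here.

```python
import urllib.request


def build_retry_request(index: str, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) the POST request that asks ILM to retry the failed step."""
    url = f"{base_url.rstrip('/')}/{index}/_ilm/retry"
    return urllib.request.Request(url, method="POST")


request = build_retry_request("my-index-000001")
print(request.get_method(), request.full_url)

# To send the request to a running cluster, pass it to urlopen:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes the URL easy to inspect before anything reaches the cluster.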
