Resolve lifecycle policy execution errors
When ILM executes a lifecycle policy, errors can occur
while it performs the index operations required for a step.
When this happens, ILM moves the index to an ERROR step.
If ILM cannot resolve the error automatically, execution is halted
until you resolve the underlying issues with the policy, index, or cluster.
For example, you might have a shrink-index policy that shrinks an index to four shards once it
is at least five days old:
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
There is nothing that prevents you from applying the shrink-index policy to a new
index that has only two shards:
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
After five days, ILM attempts to shrink my-index-000001 from two shards to four shards.
Because the shrink action cannot increase the number of shards, this operation fails
and ILM moves my-index-000001 to the ERROR step.
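A mismatch like this can be caught before the policy is applied. The following is a minimal Python sketch, not part of ILM itself. The helper name `can_shrink` is hypothetical, and the sketch assumes the shrink rule that the target shard count must not exceed the source shard count and must be a factor of it:

```python
def can_shrink(source_shards: int, target_shards: int) -> bool:
    """Return True if a shrink from source_shards to target_shards looks valid.

    Assumption: the target must be at least 1, must not exceed the source
    shard count, and must divide the source shard count evenly.
    """
    if source_shards < 1 or target_shards < 1:
        return False
    if target_shards > source_shards:
        # Shrink cannot increase the number of shards.
        return False
    return source_shards % target_shards == 0


if __name__ == "__main__":
    # The failing case from the example: two shards cannot shrink to four.
    print(can_shrink(2, 4))
    # Shrinking two shards to a single shard is allowed.
    print(can_shrink(2, 1))
```

Running such a check against the index settings and the policy's `number_of_shards` value is one way to spot the problem before the index reaches the warm phase.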
You can use the ILM Explain API to get information about what went wrong:
GET /my-index-000001/_ilm/explain
This request returns the following information:
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,
      "policy" : "shrink-index",
      "lifecycle_date_millis" : 1541717265865,
      "age" : "5.1d",
      "phase" : "warm",
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",
      "step_info" : {
        "type" : "illegal_argument_exception",
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
In this response:
policy: The policy being used to manage the index.
age: The index age, 5.1 days.
phase: The phase the index is currently in.
action: The current action.
step: The step the index is currently in.
failed_step: The step that failed to execute.
step_info: The type of error and a description of that error.
phase_definition: The definition of the current phase.
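When an explain request covers many indices, it can help to pick out only those stuck in the ERROR step. The following is a rough Python sketch that works on an already parsed explain response. The function name `find_failed_indices` is hypothetical, and the sketch assumes the response has the shape shown above:

```python
from typing import Dict, List


def find_failed_indices(explain_response: Dict) -> List[Dict]:
    """Collect managed indices whose current step is ERROR.

    Assumes the response layout from the example: a top-level "indices"
    object keyed by index name, with "step", "failed_step", and "step_info".
    """
    failures = []
    for name, info in explain_response.get("indices", {}).items():
        if info.get("managed") and info.get("step") == "ERROR":
            step_info = info.get("step_info", {})
            failures.append({
                "index": name,
                "policy": info.get("policy"),
                "failed_step": info.get("failed_step"),
                "error_type": step_info.get("type"),
                "reason": step_info.get("reason"),
            })
    return failures
```

For the response in the example, the result would contain a single entry for my-index-000001 with a failed step of shrink and the illegal_argument_exception reason.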
To resolve this, you could update the policy to shrink the index to a single shard after five days:
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
Retrying failed lifecycle policy steps
Once you fix the problem that put an index in the ERROR step,
you might need to explicitly tell ILM to retry the step:
POST /my-index-000001/_ilm/retry
ILM subsequently attempts to re-run the step that failed. You can use the ILM Explain API to monitor the progress.
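Monitoring after a retry can be scripted as a simple polling loop. The following Python sketch is illustrative only. `wait_for_retry` is a hypothetical helper, and `fetch_explain` is a hypothetical callable you supply that takes an index name and returns the parsed explain response for it (for example, by wrapping an HTTP GET to `/<index>/_ilm/explain`):

```python
import time
from typing import Callable, Dict


def wait_for_retry(fetch_explain: Callable[[str], Dict], index: str,
                   attempts: int = 5, delay_seconds: float = 1.0) -> bool:
    """Poll the explain output until the index leaves the ERROR step.

    Returns True once the reported step is no longer ERROR, or False if the
    index is still in ERROR after the given number of attempts. Raises
    KeyError if the response does not contain the index.
    """
    for attempt in range(attempts):
        response = fetch_explain(index)
        info = response["indices"][index]
        if info.get("step") != "ERROR":
            return True
        if attempt < attempts - 1:
            # Wait before asking again; no sleep after the final attempt.
            time.sleep(delay_seconds)
    return False
```

If the function returns False, the index has returned to the ERROR step or never left it, and the `step_info` in the explain response should describe what still needs fixing.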