As described in Analyzing the past and present, Elastic Stack machine learning features can calculate baselines of normal behavior then extrapolate anomalous events. These baselines are accomplished by generating models of your data.
To ensure resilience in the event of a system failure, snapshots of the machine learning model for each anomaly detection job are saved to an internal index within the Elasticsearch cluster. By default, snapshots are captured approximately every 3 to 4 hours and retained for one day (twenty-four hours). The amount of time necessary to save these snapshots is proportional to the size of the model in memory.
You can use the update anomaly detection jobs API to change
the interval (
background_persist_interval) and retention
model_snapshot_retention_days) of these snapshots.
There are also situations where you might want to revert to using a specific model snapshot. The machine learning features react quickly to anomalous input and new behaviors in data. Highly anomalous input increases the variance in the models and machine learning analytics must determine whether it is a new step-change in behavior or a one-off event. In the case where you know this anomalous input is a one-off, it might be appropriate to reset the model state to a time before this event. For example, you might consider reverting to a saved snapshot after Black Friday or a critical system failure. If you know about such events in advance, you can use calendars and scheduled events to avoid impacting your model.