When indexing time series data like logs or metrics, you can’t write to a single index indefinitely. To meet your indexing and search performance requirements and manage resource usage, you write to an index until some threshold is met and then create a new index and start writing to it instead. Using rolling indices enables you to:
- Optimize the active index for high ingest rates on high-performance hot nodes.
- Optimize for search performance on warm nodes.
- Shift older, less frequently accessed data to less expensive cold nodes.
- Delete data according to your retention policies by removing entire indices.
We recommend using data streams to manage time series data. Data streams automatically track the write index while keeping configuration to a minimum.
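Each data stream requires an index template that contains a name or wildcard pattern for the data stream and the mappings and settings applied to each backing index when it is created.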
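As a rough illustration, the request body for such a template could be assembled as below. This is a minimal sketch, not a complete configuration: the pattern `my-app-logs-*` and the policy name `my-app-logs-policy` are hypothetical placeholders, and the body is only built and printed, not sent to a cluster.

```python
import json


def build_data_stream_template(pattern: str, ilm_policy: str) -> dict:
    """Build a request body for PUT _index_template/<template-name>."""
    return {
        # Data streams whose names match this pattern use the template.
        "index_patterns": [pattern],
        # An empty object marks this as a template for data streams.
        "data_stream": {},
        "template": {
            # Applied to every backing index when it is created.
            "settings": {"index.lifecycle.name": ilm_policy},
            "mappings": {
                "properties": {
                    # Every document in a data stream carries a timestamp.
                    "@timestamp": {"type": "date"},
                }
            },
        },
    }


# Hypothetical names, chosen only for this example.
template_body = build_data_stream_template("my-app-logs-*", "my-app-logs-policy")
print(json.dumps(template_body, indent=2))
```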
Data streams are designed for append-only data, where the data stream name can be used as the target of operations (read, write, rollover, shrink, and so on). If your use case requires data to be updated in place, you can instead manage your time series data using index aliases. This approach involves a few more configuration steps and concepts:
- An index template that specifies the settings for each new index in the series. You optimize this configuration for ingestion, typically using as many shards as you have hot nodes.
- An index alias that references the entire set of indices.
- A single index designated as the write index. This is the active index that handles all write requests. On each rollover, the new index becomes the write index.
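The three pieces above can be sketched as two request bodies: one for the index template and one for the first index in the series. This is a hedged example under stated assumptions: the alias `my-logs`, the pattern `my-logs-*`, the policy name `my-logs-policy`, and the hot node count are all hypothetical, and nothing is sent to a cluster.

```python
import json


def build_alias_template(pattern: str, alias: str, ilm_policy: str, hot_nodes: int) -> dict:
    """Build a request body for PUT _index_template/<template-name>."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                # One primary shard per hot node, following the guidance above.
                "number_of_shards": hot_nodes,
                "index.lifecycle.name": ilm_policy,
                # Tells ILM which alias to move on rollover.
                "index.lifecycle.rollover_alias": alias,
            }
        },
    }


def build_bootstrap_index(alias: str) -> dict:
    """Build a request body for creating the first index, e.g. PUT my-logs-000001."""
    # The alias covers the whole series; this index starts as its write index.
    return {"aliases": {alias: {"is_write_index": True}}}


# Hypothetical values, chosen only for this example.
alias_template = build_alias_template("my-logs-*", "my-logs", "my-logs-policy", hot_nodes=3)
bootstrap_body = build_bootstrap_index("my-logs")
print(json.dumps(alias_template, indent=2))
print(json.dumps(bootstrap_body, indent=2))
```

Writes then go to the alias rather than to a concrete index name, so clients do not need to change when the write index moves.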
Index lifecycle management (ILM) enables you to automatically roll over to a new index based on conditions like the index size, document count, or age. When a rollover is triggered, a new index is created, the write alias is updated to point to the new index, and all subsequent updates are written to the new index.
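To make the rollover conditions concrete, the sketch below pairs an ILM policy body with a small local model of the decision. The thresholds (50 GB per primary shard, 100 million documents, 30 days) and the index name `my-logs-000001` are assumptions for illustration. `IndexStats`, `should_roll_over`, and `next_index_name` are hypothetical helpers that only imitate the behavior described above; they are not part of any Elasticsearch client.

```python
from dataclasses import dataclass

# Request body for PUT _ilm/policy/<policy-name>; threshold values are assumed.
ROLLOVER_POLICY = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_primary_shard_size": "50gb",
                        "max_docs": 100_000_000,
                        "max_age": "30d",
                    }
                }
            }
        }
    }
}


@dataclass
class IndexStats:
    primary_shard_size_gb: float
    doc_count: int
    age_days: float


def should_roll_over(stats: IndexStats, max_shard_gb: float = 50,
                     max_docs: int = 100_000_000, max_age_days: float = 30) -> bool:
    """Rollover is triggered as soon as any one maximum is reached."""
    return (
        stats.primary_shard_size_gb >= max_shard_gb
        or stats.doc_count >= max_docs
        or stats.age_days >= max_age_days
    )


def next_index_name(current: str) -> str:
    """Increment the zero-padded numeric suffix, e.g. -000001 to -000002."""
    prefix, _, counter = current.rpartition("-")
    return f"{prefix}-{int(counter) + 1:06d}"


write_index = "my-logs-000001"
stats = IndexStats(primary_shard_size_gb=51.2, doc_count=40_000_000, age_days=6)
if should_roll_over(stats):
    # The new index takes over as the write index.
    write_index = next_index_name(write_index)
print(write_index)  # my-logs-000002
```

Here only the size threshold is exceeded, which is enough to trigger the rollover.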
Rolling over to a new index based on size, document count, or age is preferable to time-based rollovers, such as creating a new index every day regardless of how much data it holds. Rolling over at an arbitrary time often results in many small indices, which can have a negative impact on performance and resource usage.
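A back-of-the-envelope calculation shows the difference. The figures below (2 GB of ingest per day, 90 days of retention, a 50 GB rollover threshold) are made-up assumptions, and the helper functions are hypothetical.

```python
import math


def index_count_time_based(retention_days: int, rollover_every_days: int = 1) -> int:
    """Number of indices retained when rolling over on a fixed schedule."""
    return math.ceil(retention_days / rollover_every_days)


def index_count_size_based(daily_ingest_gb: float, retention_days: int,
                           max_index_size_gb: float) -> int:
    """Number of indices retained when rolling over at a size threshold."""
    total_gb = daily_ingest_gb * retention_days
    return math.ceil(total_gb / max_index_size_gb)


# Assumed workload: 2 GB/day kept for 90 days.
daily_indices = index_count_time_based(retention_days=90)
sized_indices = index_count_size_based(daily_ingest_gb=2, retention_days=90,
                                       max_index_size_gb=50)
print(daily_indices)  # 90 indices of about 2 GB each
print(sized_indices)  # 4 indices of up to 50 GB each
```

Under these assumptions, daily rollover keeps 90 small indices around, while a size threshold keeps 4 larger ones for the same data.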