29 March 2018 News

Data Rollups in Elasticsearch: You Know, for Saving Space

By Tony Sleva

This post is part of the Elastic{ON} 2018 blog series where we recap specific demos and related deep-dive sessions from the conference. From machine learning forecasting to APM to security analytics with Mr. Robot — check out the list at the bottom of this post. 

Rolling up data has always stirred debate here at Elastic. In many time-series storage and visualization systems, rollups are defined by the reports and dashboards — the UI — they’ll be powering. This sort of pre-definition means you’re stuck with the same report even if you need to change it. This is limiting, and we hate limits. We like unlimits. So with that in mind, we rolled up our sleeves and added rollup functionality to Elasticsearch — with an Elastic twist.

Over the years, we’ve spent a lot of time optimizing and improving Elasticsearch so you don't have to sacrifice search speed for data volume. There is a lot of value to keeping the original, full, raw data: you can slice it and dice it however you want, without compromise.

But while we made Elasticsearch faster and more scalable, the fact remained that rollups are a legitimate requirement for some use cases. High-level dashboards may need decades' worth of data, but likely not decades' worth of fine-grained data that takes up considerable disk space. And realistically speaking, IT budgets will never scale as well as Elasticsearch clusters, and tape storage will always be less expensive than SSD.

With our new rollup functionality, you pick the fields you want rolled up, and a new index is created with just the rolled-up data. This new rollup index then lives side by side with the index that it’s being rolled up from. Since a rollup index is just like any other index, only a lot smaller, you already know how to query and aggregate the data inside. Making things even easier, the Rollup API can also search live and rollup data at the same time, returning data from both indices in a single response. Elasticsearch learned some new tricks so you don’t have to.
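
To make the idea concrete, here is a minimal Python sketch of what a rollup does conceptually: raw documents are grouped into time buckets, and only summary metrics are kept per bucket. This is not the Rollup API and not how Elasticsearch implements it. The function name `rollup_hourly` and the sample fields (`timestamp`, `node`, `temperature`) are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rollup_hourly(docs, time_field, group_field, metric_field):
    """Summarize raw docs into one record per (hour, group) bucket.

    Illustrative only: a stand-in for the idea of a rollup,
    not Elasticsearch's actual implementation.
    """
    buckets = defaultdict(
        lambda: {"min": float("inf"), "max": float("-inf"), "sum": 0.0, "count": 0}
    )
    for doc in docs:
        # Truncate the timestamp to the start of its hour.
        ts = datetime.fromisoformat(doc[time_field])
        hour = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        stats = buckets[(hour, doc[group_field])]
        value = doc[metric_field]
        stats["min"] = min(stats["min"], value)
        stats["max"] = max(stats["max"], value)
        stats["sum"] += value
        stats["count"] += 1
    # One small summary record per bucket replaces many raw documents.
    return [
        {time_field: hour, group_field: group, **stats}
        for (hour, group), stats in sorted(buckets.items())
    ]


# Hypothetical raw sensor readings.
raw_docs = [
    {"timestamp": "2018-03-29T10:05:00", "node": "a", "temperature": 20.0},
    {"timestamp": "2018-03-29T10:45:00", "node": "a", "temperature": 24.0},
    {"timestamp": "2018-03-29T11:10:00", "node": "a", "temperature": 22.0},
    {"timestamp": "2018-03-29T10:30:00", "node": "b", "temperature": 18.0},
]

rolled = rollup_hourly(raw_docs, "timestamp", "node", "temperature")
for record in rolled:
    print(record)
```

Four raw readings collapse into three hourly records here. Keeping `sum` and `count` rather than a precomputed average means an average can still be derived later, even after several buckets are combined.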

Then, when your raw data is retired to a roll of tape in a climate-controlled facility, your rollup index can live on with your other production indices. Your dashboards won’t even notice your old index moved out. And if you realize you made a mistake in your rollup, just pull that old index back into production, run a new rollup job, and you’re ready to rock. Or, more likely, you’re ready to report.

Want to learn more about rollups as well as other features on the Elasticsearch horizon? Check out the "What's Evolving in Elasticsearch" talk from Elastic{ON} 2018.

See what else we covered during the conference in these recaps: