Recap: Elasticsearch Machine Learning Forecasting on Time Series Data

Editor's Note (August 3, 2021): This post uses deprecated features. Please reference the map custom regions with reverse geocoding documentation for current instructions.

This post is part of the Elastic{ON} 2018 blog series where we recap specific demos and related deep-dive sessions from the conference. From machine learning forecasting to APM to security analytics with Mr. Robot — check out the full list below.

Over the last year, machine learning for the Elastic Stack has grown from performing automated anomaly detection on Elasticsearch time series data to forecasting events. Steve Dodson, team lead for machine learning at Elastic, demoed the latest features at Elastic{ON} 2018.

Using a New York City taxi data set (a collection of trip records complete with pick-up and drop-off location, total distance, cost, etc.), Dodson first created a machine learning (ML) job to analyze data stored in Elasticsearch and identify any anomalies in taxi trips. And voilà! One appeared: low trip volume on March 14, the day of a major snowstorm in New York.

Dodson also touched on features that let ML jobs treat scheduled events, such as daylight saving time changes or Black Friday, differently from ordinary days, so that they do not trigger irrelevant alerts.

But the main highlight in his demo was forecasting — using Elastic machine learning to learn from the past behavior of your Elasticsearch data and forecast what might unfold. Using the same data set, Dodson demoed the forecasting feature.

“There’s 30 million records running,” Dodson explained. “We are aggregating the results, pushing that through the machine learning components, we’re calculating the probability of the current behavior based on what we’ve seen historically and updating the models as the data is being streamed through.”
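To make the loop Dodson describes a little more concrete, here is a toy sketch in Python: score each new bucket against the history seen so far, then update the model with it. This is not how Elastic machine learning models data. The single running Gaussian, the `RunningGaussian` class, and the `score_stream` function are all assumptions made up for illustration, and the real models are considerably more sophisticated.

```python
import math


class RunningGaussian:
    """Toy online model (hypothetical): running mean and variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def tail_probability(self, x):
        """Two-sided probability of seeing a value at least as far from the mean as x."""
        if self.n < 2:
            return 1.0  # not enough history to call anything unusual
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 1.0 if x == self.mean else 0.0
        z = abs(x - self.mean) / std
        return math.erfc(z / math.sqrt(2))


def score_stream(bucket_counts):
    """Score each bucket against what came before it, then update the model."""
    model = RunningGaussian()
    probabilities = []
    for count in bucket_counts:
        probabilities.append(model.tail_probability(count))
        model.update(count)
    return probabilities


if __name__ == "__main__":
    # Made-up trip counts per bucket; the last one mimics a snowstorm-style drop.
    counts = [100, 102, 98, 101, 99, 100, 20]
    for count, p in zip(counts, score_stream(counts)):
        print(f"count={count:4d}  probability={p:.3g}")
```

A very low probability on the final bucket is the kind of signal that would surface as an anomaly, while the earlier buckets score as unremarkable.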

The result: a visual in the Kibana UI projecting taxi trip times over the next two weeks with the appropriate error margins. (Dodson also breaks down a classic New York taxi journey from Die Hard with a Vengeance. Did Samuel L. Jackson and Bruce Willis really need to tailgate an ambulance to get to Wall Street in time? Watch the demo to find out.)

While taxi route planning was one way to showcase ML for the Elastic Stack during the keynote, these features open up new worlds of opportunities for automating and optimizing the operations space. When will a disk start running out of space in one of your data centers? Will a certain KPI start trending up and to the right? Or down and to the left? What capacity will you need in a couple of weeks based on the current behavior?
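As a back-of-the-envelope illustration of the disk space question, the sketch below fits a straight line to recent usage and extrapolates to the point where the disk would be full. It is a deliberately naive stand-in, not the forecasting feature itself: it assumes linear growth and produces no error margins. The function names and the sample numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_full(days, used_gb, capacity_gb):
    """Estimate days remaining after the last sample until usage reaches capacity.

    Returns None when usage is flat or shrinking, since the line never hits capacity.
    """
    slope, intercept = fit_line(days, used_gb)
    if slope <= 0:
        return None
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - days[-1])


if __name__ == "__main__":
    # Made-up samples: day index and gigabytes used on a 200 GB disk.
    days = [0, 1, 2, 3, 4]
    used_gb = [100, 110, 120, 130, 140]
    print(days_until_full(days, used_gb, capacity_gb=200))
```

Real usage is rarely this tidy, which is where a model that learns seasonality and reports error margins earns its keep.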

In a dedicated Elastic{ON} session, Dodson and ML Tech Lead Sophie Chang take a closer look at how the operational logging use case with ML can unfold (plus, it incorporates Elastic APM). And for those who are interested in a deeper dive, this session from Hendrik Muhs and Tom Veasey explores the modeling techniques and math powering it all.

See what else we covered during the conference in these recaps: