Elasticsearch cybersecurity at the home of the world’s fastest supercomputer

This post is a recap of a community talk given at a recent Elastic{ON} Tour event. Interested in seeing more talks like this? Check out the conference archive or find out when the Elastic{ON} Tour is coming to a city near you.

The fastest supercomputer in the world, Summit, is housed at Oak Ridge National Laboratory (ORNL) within the U.S. Department of Energy (DOE). Researchers at ORNL rely on Summit, which is capable of delivering 200 petaflops of computing power, to facilitate scientific discovery across a range of fields, such as materials science, neutron science, energy, national security, and high-performance computing.

To safeguard the digital information generated by all of this research, ORNL’s cybersecurity team uses Elasticsearch and the Elastic Stack for security information and event management (SIEM). But it wasn’t always that way. At Elastic{ON} Tour in Washington, D.C., Larry Nichols, Cyber Security Engineer & SIEM Admin at ORNL, explained why ORNL transitioned from Splunk to Elasticsearch to improve their ability to manage security for roughly 20,000 endpoints through log monitoring and anomaly detection at scale.

For six years, ORNL’s cybersecurity team used Splunk as their SIEM. But as ORNL’s needs evolved, requiring them to ingest more and more data and run queries on massive indices of over 30 billion documents, they ran into challenges that led them to explore other solutions. Their Splunk license, which was based on the amount of data they ingested each day, curbed the lab’s ability to add more data. Speed was another barrier. A key goal at ORNL is to facilitate research, yet some searches on their Splunk clusters were taking as long as 15 minutes, diverting valuable time away from data analysis. “Splunk was taking more time than we wanted to spend,” Nichols said. “[T]ime that the analysts could be analyzing data instead of waiting to collect it.”

The decision to switch to Elasticsearch made sense for ORNL: they were no longer limited in the amount of data they could ingest, and searches that once took minutes were down to just seconds. “Elastic obviously was appealing because we had the hardware and the resources to expand out, we just didn’t want to spend more money on the ingestion costs,” said Nichols, adding that “[t]he speed was obviously significantly better with Elasticsearch.”

With Elasticsearch, ORNL was able to deploy a SIEM that increased speed and security. Today, the lab’s production architecture runs 25 Elasticsearch nodes, all within Docker, across 25 virtual machines. The system ingests over two billion documents each day, roughly 1.5 terabytes (TB) of data, and retains 180 days’ worth of data (over 300 billion documents) across 10 hot nodes and 7 warm nodes. The cluster also runs three machine learning nodes, three master nodes, and two coordinating nodes, with a total disk usage of about 120 TB. Their development architecture is almost identical, although it retains fewer logs (about 30 days’ worth). ORNL has a third cluster devoted to research and testing. This cluster, which ingests about 1.5 TB of data per day, runs six Elasticsearch nodes on physical servers. Their final cluster, a single node, is used to monitor the other three clusters.
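As a quick sanity check on the production figures above, the short Python sketch below tallies the node roles and works out what the quoted ingest rate implies over the retention window. The variable and function names are illustrative only, and the sketch assumes decimal terabytes (1 TB = 10^12 bytes), which the talk recap does not specify.

```python
# Figures quoted in the recap of the talk. Names are illustrative, and
# decimal terabytes (1 TB = 10**12 bytes) are an assumption.
NODE_ROLES = {
    "hot": 10,
    "warm": 7,
    "machine_learning": 3,
    "master": 3,
    "coordinating": 2,
}
DOCS_PER_DAY = 2_000_000_000   # "over two billion documents each day"
BYTES_PER_DAY = 1.5e12         # "roughly 1.5 terabytes"
RETENTION_DAYS = 180           # "180 days' worth of data"


def summarize():
    """Derive totals implied by the quoted production figures."""
    return {
        # Every role count added together should match the 25 nodes quoted.
        "total_nodes": sum(NODE_ROLES.values()),
        # Lower bound on retained documents, since the daily rate is "over" 2 billion.
        "retained_docs": DOCS_PER_DAY * RETENTION_DAYS,
        # Rough average size of one ingested document, in bytes.
        "avg_doc_bytes": BYTES_PER_DAY / DOCS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,}")
```

The role counts add up to the 25 nodes quoted, and two billion documents a day over 180 days comes to 360 billion, which is consistent with "over 300 billion documents." The average works out to roughly 750 bytes per document under the assumptions stated.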

In addition to Elasticsearch, other parts of the Elastic Stack have enhanced the lab’s ability to identify and address security issues. With Kibana, the team can see at a glance what’s going on with their users in general and focus on specific users as needed. Graph enables them to visualize relationships in data drawn from different indices. If a machine is known to be infected, they can use these relationships to quickly identify any other machines that have interacted with it, then contain and resolve the issue. More recently, the team has begun using Canvas to create dynamic, infographic-style dashboards that give their management high-level views of activity at the laboratory.
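The infected-machine lookup described above is, at its core, a neighbor query on a graph of host-to-host connections. The Python sketch below illustrates that idea on a small in-memory list of connection records. It is not the Graph API or ORNL's actual workflow, and the host names and function names are hypothetical.

```python
from collections import defaultdict


def build_contact_map(connections):
    """Map each host to the set of hosts it has exchanged traffic with.

    `connections` is an iterable of (source_host, destination_host) pairs.
    Direction is ignored, because either side of a connection to an
    infected machine is worth reviewing.
    """
    contacts = defaultdict(set)
    for src, dst in connections:
        if src == dst:
            continue  # skip self-connections
        contacts[src].add(dst)
        contacts[dst].add(src)
    return contacts


def hosts_to_check(infected, connections):
    """Return, in sorted order, the hosts that interacted with `infected`."""
    return sorted(build_contact_map(connections).get(infected, set()))


if __name__ == "__main__":
    # Hypothetical connection records, for illustration only.
    sample = [
        ("ws-014", "fileserver-2"),
        ("ws-014", "ws-101"),
        ("ws-207", "ws-014"),
        ("ws-301", "fileserver-2"),
    ]
    print(hosts_to_check("ws-014", sample))
    # prints ['fileserver-2', 'ws-101', 'ws-207']
```

In a real deployment the connection records would come from indexed log data rather than a hard-coded list. The sketch only shows the shape of the question being asked.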

To learn more about how ORNL is using Elasticsearch to power research and manage security across the world’s fastest supercomputer, watch Nichols’s full session at Elastic{ON} Tour.
