Infrastructure Monitoring at King Abdullah University of Science and Technology

King Abdullah University of Science and Technology is a graduate research university located in Thuwal, Saudi Arabia. Founded in 2009, it provides research and graduate training programs and is the kingdom’s first mixed-gender university.

We met recently with Stanislav Flachs, Manager of the Systems, Storage, Automation & Workflows teams, and Gianluca Castellani, lead for the Automation & Workflows team. The two analyze time-consuming tasks and processes to find ways of improving them, and both actively manage the university's Elasticsearch clusters. Flachs and Castellani started using the Elastic Stack so they could connect their data with the visualization and alerting systems they needed.

Prior to Elastic, what infrastructure issues were the university encountering?

Gianluca Castellani: In the past, we were using different tools to monitor our infrastructure, and one issue always persisted: these systems were not meant to work together. Our previous solutions were scattered and inflexible. We always had to come up with creative ways of connecting the data with the visualization and the alerting we needed. It is crucial for us to have visibility into our systems, and not having a holistic tool to monitor and analyze our operational data was cumbersome. If you don’t monitor properly, that can have catastrophic consequences for your systems. We didn’t have a tool to look efficiently at what we were doing.

Why did you choose the Elastic Stack to help solve your infrastructure issues?

Stanislav Flachs: We considered a few options, looking at both commercial and open-source solutions. We finally decided to go for the Elastic Stack, partially because of the licensing model. This was not the only reason, however. The community spirit so characteristic of Elastic really attracted us. We wanted to be part of it, and it simply resonated with us. We are all programmers, and we are keen to contribute and work on this. Also, our researchers were already familiar with the solution. Thinking further ahead about providing resources internally to our research and business communities, it made more sense to use something they had already adopted. The learning curve with the Elastic Stack was also quite smooth, making adoption easier.

Castellani: We embraced Elasticsearch for many reasons, such as its scalability, easy setup and management, large user community, flexible and robust data ingestion, and intuitive data visualization.

How has the university benefited from the Elastic Stack?

Flachs: We are using Elastic to store heterogeneous data and feed other tools already used in the organization. We store end-user communication by writing ad hoc Elasticsearch plugins for Slack and regular emails. Then we can provide dashboards, reports and statistics to upper management so they have KPIs about our operational data and our systems. The Elastic Stack is at the heart of our IT infrastructure monitoring, and it is a key tool for our daily activities. We can check alerts, CPU and memory utilization, workflow characterization, and much more. This improves our capacity planning, so we can now better plan our infrastructure upgrades, and ultimately we are saving time and money.
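To make the idea of storing Slack messages and emails in one place more concrete, here is a minimal sketch of how such heterogeneous records might be normalized into JSON documents and packed into the newline-delimited body that the Elasticsearch bulk API accepts. It is not the university's actual plugin code: the function names, the field names, the index name user-communication and the shape of the Slack event are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def email_to_doc(raw):
    """Turn a raw single-part email into a flat JSON-ready dict (hypothetical schema)."""
    msg = message_from_string(raw)
    return {
        "source": "email",
        "author": msg["From"],
        "subject": msg["Subject"],
        "@timestamp": parsedate_to_datetime(msg["Date"]).isoformat(),
        "text": msg.get_payload().strip(),
    }


def slack_to_doc(event):
    """Turn a Slack-style message event into the same schema.

    Assumes the event is a dict with 'user', 'text' and an epoch-seconds 'ts' string.
    """
    sent = datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc)
    return {
        "source": "slack",
        "author": event["user"],
        "subject": None,
        "@timestamp": sent.isoformat(),
        "text": event["text"],
    }


def to_bulk_body(docs, index):
    """Build a bulk request body: one action line, then one document line, per record."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    raw_email = (
        "From: researcher@example.org\n"
        "Subject: Disk quota\n"
        "Date: Mon, 01 Jan 2018 10:00:00 +0000\n"
        "\n"
        "Please raise my quota.\n"
    )
    slack_event = {"user": "U123", "text": "Is the cluster down?", "ts": "1514800800.000100"}
    docs = [email_to_doc(raw_email), slack_to_doc(slack_event)]
    print(to_bulk_body(docs, "user-communication"), end="")
```

A body built this way could then be sent to a cluster's _bulk endpoint with the application/x-ndjson content type. Sharing one schema across both sources is the design choice that lets a single dashboard report on all end-user communication.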

What are the next steps with the Elastic Stack?

Castellani: Since we started using it, the Elastic Stack has proven to be an easy-to-use and flexible platform that can host any type of data while gathering everything in one place, and it accommodates multiple use cases. We are now working on implementing the security features to manage access to resources at a more granular level. We are also planning to use machine learning to analyze logs from different sources and find anomalies in our systems. Our goal is to detect intrusions and anticipate hardware failures.

Today, more than a year after our first cluster, we are still discovering new functionality, creating new applications and getting new insights from our data. To conclude, we simply want to say that Elasticsearch has changed the way we approach data. The old saying ‘When you have a hammer, everything looks like a nail’ has become ‘When you have Elasticsearch, everything looks like JSON.’
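As a rough illustration of what finding anomalies in monitoring data can mean in practice, the sketch below flags samples that deviate sharply from the recent past using a rolling z-score. This is a deliberately simple stand-in, not the Elastic Stack's machine learning feature and not the team's method; the function name, window size, threshold and sample values are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return (index, value) pairs whose z-score against the previous
    `window` samples exceeds `threshold` in absolute value."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against; skip it.
            continue
        z = (samples[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, samples[i]))
    return anomalies


if __name__ == "__main__":
    # Made-up CPU utilization percentages with one obvious spike.
    cpu = [21.0, 23.0, 22.0, 24.0, 22.0, 23.0, 95.0, 22.0]
    print(find_anomalies(cpu))
```

With the made-up series above, only the 95.0 spike at index 6 is reported. Note that the sample after the spike is not flagged, because the spike itself inflates the standard deviation of the following window; production-grade detectors handle this kind of effect far more carefully.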

Stanislav Flachs

Stanislav Flachs is a battle-tested veteran of IT operations, passionate about DevOps culture and technological innovation. In his role as Systems & Storage Manager at KAUST, Stanislav is responsible for implementing and operating compute and storage solutions that scale and adapt to evolving user needs. When he is not behind the keyboard, you’ll find Stanislav underwater exploring the blue depths of the Red Sea.

Gianluca Castellani

Gianluca Castellani is a Research Computing “jack of all trades, master of many”. His main area of interest is making large computing systems robust, resilient and efficient using testing, monitoring and automation. His true passions are creating solutions and solving extrinsic problems based on the needs of his colleagues and himself.