Monitoring applications and infrastructure across countries and organizations presents a number of challenges. As the company grew, it deployed a complex array of monitoring tools and integrations, but this created many potential points of failure. Small issues could sometimes crash the entire system.
Incident reporting was also inconsistent. In some cases, the IT team only became aware of a problem when an employee reported it. Team members often worked overtime to fix issues, which prevented them from focusing on work that added value to the business.
One IT Manager at the company, responsible for the health of its information systems, was determined to improve the monitoring process as well as the collaboration between the AIOps and DevOps teams. In addition to accelerating incident resolution, he wanted to reduce the pressure on his team caused by a steady influx of incident tickets. "Our first goal was to improve observability by finding new ways to identify anomalies and take action to correct them," says the IT Manager. "Another was to reduce manual work, using automated processes to solve certain problems, which would reduce the burden on the DevOps team as well as lower costs."