Security Events Logging at Bell Canada

This post is a recap of a community talk given at a recent Elastic{ON} Tour event. Interested in seeing more talks like this? Check out the Elastic{ON} Tour page to see when a stop is coming to a city near you.

Bell Canada, one of Canada’s largest telecommunications companies, offers mobile phone, television, internet, and landline services to large corporations, small and medium-sized businesses, and individuals across the country. Bell Canada’s security operations center (SOC) covers every Bell office and business unit from coast to coast, and it relies on logs to detect cybersecurity threats.

Sylvain Proulx, Bell Canada’s Senior Security Manager, says the business units that deliver services to customers, such as Bell TV, Bell Internet, Bell Media, and Bell Mobility, all use different technologies and applications, so the logs they collect are diverse and often in uncommon formats. Logs come from routers, firewalls, web servers, operating systems, applications, and many other devices, some of which get ‘chatty’ and generate a lot of data.

The SOC had handled log and event correlation, incident response, and reporting with an ArcSight Security Information and Event Management (SIEM) solution alone. But as log volume grew, normalizing many new types of logs from a variety of devices bogged down the system. The SIEM also offered only rule-based detection, with no machine learning, so it produced a high ratio of false positives and put analysts at risk of alert fatigue.

Proulx said they had hit the limits of their SIEM. They found no single vendor solution that would let them ingest more data faster, build threat detection models, and normalize many new types of logs while retaining ownership of their data. So the SOC set about augmenting their ArcSight SIEM with tools like the Elastic Stack, which could absorb high log volumes and traffic spikes automatically and produce meaningful security data without overwhelming analysts.

Bell Canada collects data from bare-metal servers, virtual machines, and, increasingly, container infrastructure running Docker and Kubernetes. They needed a log shipper that was simple, lightweight, and easy to automate, so they turned to Beats. They use Filebeat and Winlogbeat to ship logs because both are easy to configure, test, and deploy. They can also keep their configurations under version control, and no data is lost during a network outage.
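
The "no data lost during an outage" property comes from at-least-once delivery: a shipper only records progress after the receiving end acknowledges a batch. The following is a minimal, hypothetical Python sketch of that idea, not Filebeat’s implementation. The class name, the `send` callback, and the in-memory `offset` are all assumptions; Filebeat persists its read offsets in a registry file on disk.

```python
from typing import Callable, List


class AtLeastOnceShipper:
    """Hypothetical sketch: advance the saved offset only after an ack.

    If sending fails (for example, during a network outage), the offset is
    left unchanged, so the same lines are retried on the next call instead
    of being lost. Retries can produce duplicates, never gaps.
    """

    def __init__(self, send: Callable[[List[str]], None]):
        self.send = send  # assumed to raise ConnectionError on failure
        self.offset = 0   # index of the first line not yet acknowledged

    def ship(self, lines: List[str], batch_size: int = 2) -> None:
        while self.offset < len(lines):
            batch = lines[self.offset:self.offset + batch_size]
            try:
                self.send(batch)
            except ConnectionError:
                return  # keep the offset; these lines are retried later
            self.offset += len(batch)  # acknowledged, so record progress
```

The trade-off is that a batch which reached the receiver but whose acknowledgement was lost will be sent twice, which is why downstream systems in such pipelines usually tolerate duplicates.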

Once the shipped data is queued in Kafka, the SOC must parse and normalize logs from all their various formats before it can perform security analysis. Running Logstash instances on OpenShift lets them scale quickly and automatically during traffic spikes without dropping logs, and it consumes fewer resources than running multiple virtual machines. Another advantage of running Logstash in a container is that they can easily test it with RSpec before moving it to production.
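
To illustrate what "normalizing" means here, the following is a minimal Python sketch, not Bell Canada’s Logstash configuration. It maps two hypothetical formats, a syslog-style firewall line and a JSON application log, onto one shared set of field names. The input formats and the field names (`@timestamp`, `source.ip`, `event.action`, and so on) are assumptions chosen for illustration; in the real pipeline this is the kind of work Logstash filters such as `grok` and `json` perform.

```python
import json
import re
from typing import Optional

# Hypothetical firewall line format, e.g.:
# "2018-10-02T14:03:11Z fw01 DENY src=10.0.0.5 dst=192.0.2.10"
FIREWALL_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) (?P<action>[A-Z]+) "
    r"src=(?P<src>\S+) dst=(?P<dst>\S+)$"
)


def normalize(raw: str) -> Optional[dict]:
    """Map a raw log line onto a common, hypothetical field schema.

    Returns None for unrecognized lines so the caller can route them
    aside for review instead of silently dropping them.
    """
    # JSON application logs: rename their (assumed) keys to the schema.
    if raw.startswith("{"):
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            return None
        return {
            "@timestamp": doc.get("time"),
            "host.name": doc.get("hostname"),
            "event.action": doc.get("event"),
            "source.ip": doc.get("client_ip"),
            "destination.ip": None,
        }

    # Firewall lines: extract the same fields with a regular expression.
    m = FIREWALL_RE.match(raw)
    if m:
        return {
            "@timestamp": m["ts"],
            "host.name": m["host"],
            "event.action": m["action"].lower(),
            "source.ip": m["src"],
            "destination.ip": m["dst"],
        }
    return None
```

Once both sources share field names, a single query such as "all events from `source.ip` 10.0.0.5" works across firewall and application logs alike, which is the point of normalizing before analysis.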

Once the logs are normalized, the SOC stores them in Elasticsearch. Bell Canada’s previous solution could not keep up with growing log volumes without losing logs. Elasticsearch lets the SOC scale quickly and horizontally, which makes their job a lot easier.

On the day events are logged, the SOC searches the data with multiple queries and processes, which puts a heavy load on the cluster, so they have implemented a hot-warm architecture with automated deployment of new nodes. The more powerful hot nodes ingest data and serve searches constantly; once logs lose their immediate value, they are moved to warm nodes for aggregation and lighter analysis. “If you lose a node in Elasticsearch, you can still keep working. Not a problem. You can fix it later,” says Mathew Vandystadt, Bell Canada’s Security Specialist Software Engineer.
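
A common way to implement a hot-warm split in Elasticsearch is shard allocation filtering: tag each node with a custom attribute and require each index to live on nodes carrying a given value. The following minimal Python sketch decides, by index age, which tier each index belongs on and builds the matching setting. It is an illustration under stated assumptions, not Bell Canada’s automation: the `logs-YYYY.MM.DD` index naming, the `box_type` attribute name, and the one-day hot window are all hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumptions (hypothetical, for illustration only):
# - daily indices are named "logs-YYYY.MM.DD";
# - each data node carries a custom attribute "box_type" set to "hot" or "warm".
HOT_DAYS = 1  # keep only the current day's index on hot nodes


def index_date(name: str) -> date:
    """Parse the date suffix of a daily index name such as 'logs-2018.10.02'."""
    year, month, day = name.rsplit("-", 1)[1].split(".")
    return date(int(year), int(month), int(day))


def tier_settings(indices: list[str], today: date) -> dict[str, dict[str, str]]:
    """Return the allocation setting each index should carry on `today`.

    Indices younger than HOT_DAYS stay on hot nodes; older ones are required
    to move to nodes whose box_type attribute is "warm".
    """
    plan = {}
    for name in indices:
        age = today - index_date(name)
        tier = "hot" if age < timedelta(days=HOT_DAYS) else "warm"
        plan[name] = {"index.routing.allocation.require.box_type": tier}
    return plan
```

Each settings dictionary could be applied with Elasticsearch’s update index settings API; when the `require` filter changes, the cluster relocates that index’s shards to matching nodes. Later Elasticsearch releases can automate this kind of age-based move with index lifecycle management.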

Securing data is a top priority for the SOC. Role-based access control (RBAC) is a must, but it can be painful to manage. With the security features of the Elastic Stack, the SOC controls who has access to the data, encrypts data in transit, and manages RBAC easily. These features also integrate with their existing LDAP directory, so they don’t need to spend extra time redefining group roles and can focus on their security mandate.
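
The benefit of tying RBAC to LDAP is that permissions follow existing directory groups instead of being redefined per user. The following minimal Python sketch models only that resolution logic: groups map to roles, and roles grant privileges on index patterns. The group DNs, role names, index pattern, and privileges are all hypothetical; in Elasticsearch itself this is configured declaratively through role mappings and role definitions, not application code.

```python
from __future__ import annotations

from fnmatch import fnmatch

# Hypothetical LDAP group DNs mapped to hypothetical role names.
GROUP_TO_ROLES = {
    "cn=soc-analysts,ou=groups,dc=example,dc=com": {"soc_read"},
    "cn=soc-engineers,ou=groups,dc=example,dc=com": {"soc_read", "soc_write"},
}

# Hypothetical roles: index pattern -> privileges that role grants on it.
ROLE_PRIVILEGES = {
    "soc_read": {"logs-*": {"read"}},
    "soc_write": {"logs-*": {"write"}},
}


def roles_for(ldap_groups: list[str]) -> set[str]:
    """Union of the roles granted by every LDAP group the user belongs to."""
    roles: set[str] = set()
    for group in ldap_groups:
        roles |= GROUP_TO_ROLES.get(group, set())
    return roles


def is_allowed(ldap_groups: list[str], index: str, privilege: str) -> bool:
    """True if any of the user's roles grants `privilege` on `index`."""
    for role in roles_for(ldap_groups):
        for pattern, privs in ROLE_PRIVILEGES.get(role, {}).items():
            if fnmatch(index, pattern) and privilege in privs:
                return True
    return False
```

Because access is derived from group membership, moving an analyst between LDAP groups changes their permissions without touching any role definition.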

Once their data is where they want it, SOC analysts can use it to find security incidents, and good visualization is key. Kibana’s straightforward interface means their busy specialists don’t have to spend extra time learning new query languages.

“[Kibana] works great in our use cases. It’s simple. We don’t have to do a lot of training and we get a nice visualization,” says Mathew Vandystadt, Security Specialist Software Engineer at Bell Canada.

Bell Canada found a flexible alerting solution for rule-based detection, but their analysts also needed smarter detection that could apply different algorithms, so the team developed in-house machine learning with open source ML libraries. On OpenShift, they can easily spin up Python containers connected to Kafka or Elasticsearch, so all the data is accessible to these models. They then use their ArcSight SIEM for event aggregation and correlation to raise the ratio of true positives to false positives, sparing analysts from a flood of alerts.
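
The talk does not describe the team’s models, so the following is only a deliberately simple stand-in showing the shape of non-rule-based detection: a rolling z-score over per-interval event counts (for example, failed logins per minute). The class name, window size, and threshold are hypothetical, and this technique is swapped in for illustration; it is not Bell Canada’s algorithm.

```python
from collections import deque
from statistics import mean, pstdev


class ZScoreDetector:
    """Flag counts that deviate sharply from a rolling baseline.

    Hypothetical illustration only: keeps the last `window` observations
    and flags a new value whose z-score against them exceeds `threshold`.
    """

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, count: float) -> bool:
        """Record `count` and return True if it looks anomalous."""
        anomalous = False
        if len(self.history) >= 2:  # need a baseline before judging
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(count - mu) / sigma > self.threshold
            else:
                # Perfectly flat baseline: any change at all is unusual.
                anomalous = count != mu
        self.history.append(count)
        return anomalous
```

Unlike a fixed rule such as "more than 50 failures per minute", the baseline adapts to each source’s normal volume. In a pipeline like the one described, a container would feed it counts consumed from Kafka; that wiring is omitted here.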

They built this whole pipeline from different vendors’ software, which was possible because Elastic integrates simply with open security protocols. In the future, the SOC plans to merge a Cyber Threat Intelligence platform into their new security architecture. “Having that infrastructure right there allows us to do more than we were a year and a half ago with only the ArcSight solution,” says Proulx.

Ready to learn more about the team’s tools and pipeline? Watch this October 2018 Elastic{ON} Tour presentation to discover how they accomplish their security mission, including how Bell Canada handles long-term log retention and fast forensic data retrieval without using public clouds.

Bell Canada is using Elastic to drive improved security analysis in their SOC.