Diving In The Deep End: Logging and Metrics at DigitalOcean

From server health checks to network monitoring to customer activity events, logs are everywhere at DigitalOcean. In a single day, we collect more than a terabyte of real-time log data across our entire operations infrastructure. Buried in that non-stop stream of data is everything we need to know to keep DigitalOcean's cloud services up and running. This talk covers how we collect, parse, route, store, and make this data available to operations staff and engineers while keeping things simple enough for a small team to manage.

Brian Knox

Brian is currently a self-described gnome fighter/illusionist at DigitalOcean; others refer to him as the Tech Lead for the Metrics and Logging team. He is a software engineer with expertise in logging, metrics, distributed messaging, and cloud operations, as well as an open source contributor to both the Rsyslog logging daemon and the ZeroMQ distributed messaging protocol. He got his start with Linux 20 years ago and has worked in a variety of roles, including network engineering, systems administration, web application development, and systems programming.