Lyft's Wild Ride from Amazon ES to Self-Managed Elasticsearch

From October 2016 to May 2017, Lyft was Amazon's biggest hosted Elasticsearch customer and most frequent support headache. After quickly scaling to the maximum supported size — and then beyond — Lyft found themselves keeping one big support ticket open to address daily (and sometimes hourly) incidents. It was clearly time for a change. After two intense weeks of migration, Lyft became self-hosted, and both companies breathed a sigh of relief.

How did Lyft get there from here? What are the pros and cons of cloud versus self-hosted Elasticsearch deployments? And what has Lyft learned from almost a year of operating their own cluster? Michael Goldsby of Lyft's observability team talks you through the joys and sorrows of scaling a logging cluster from 300,000 to 3 million events per minute.

Michael Goldsby

Software Engineer, Lyft

Michael Goldsby is an infrastructure engineer at Lyft, where he keeps the logging pipeline happy and healthy. Previously he worked at logging-as-a-service startup Loggly, where he learned the ins and outs of distributed data pipelines and logging in Elasticsearch ("you want to do what with our search engine?"). In his free time he is a private pilot and amateur bartender (but not at the same time).