The Challenge

How do you provide insight into an email platform and ensure its reliability?

The Solution

By using the ELK Stack to track millions of actionable events every day

Case study highlights

Deliver high performance

  • Increase query speed by 50x
  • Accelerate indexing from hours to real-time
  • Maintain high availability

Provide actionable insights

  • Consolidate logs from various system components
  • Incorporate advanced search features
  • Deliver results and insights, fast

Tracking 80 million events per day

Mailgun is an email platform that lets developers build applications that send, receive, and track email without the effort and expense of setting up a mail server.

As part of processing customer email, Mailgun logs and stores every event that happens to every email: when incoming or outgoing email is delivered or rejected, when a message is opened, or when a link in an email is clicked. If something goes wrong, like an email bounce, a Mailgun customer can look at error codes and other metadata in the log files to diagnose what happened.

"This adds up to 80 million events per day," explains Ralph Meijer, software developer at Mailgun. "Those events need to be available to our customers, along with the ability to easily search through them.

"Mailgun already had a system in place before I joined the company, making email logs available, but it was a custom-made solution that was very hard to search through," he adds. "We needed to replace it. That is why we turned to Elasticsearch."

Streamlining log analysis with the ELK Stack

Today Mailgun uses Elasticsearch, along with Logstash and Kibana, to analyze billions of events per month. Mailgun chose Logstash to collect, parse, and manage logs, which are then stored and made searchable in Elasticsearch. Kibana provides the data visualization engine on the front end. Together, these technologies make up the ELK Stack.

"Searchable logs enable our customers to always know what is happening to their email," says Meijer. "Before Elasticsearch, we would have to go to each individual component of the system to get the log files. It was really a pain. Now it is very easy to see everything in Elasticsearch. The tool chain of Elasticsearch, Logstash and Kibana is ideal for application logging.

"Once we deployed Elasticsearch, we received many positive reports from our customers, because they finally have an interface into their logs. For many of our customers, this is a critical feature."

The Mailgun control panel now offers an easy-to-use Logs tab that lists log events, retrieved from Elasticsearch, in reverse chronological order. Customers can filter the view by domain and severity. They can analyze how and when recipients open their emails. They can track unsubscribes, complaints, and mailing list issues. They can also search for log entries with a particular tag, keyword, or user-defined variable.
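For readers curious how such a Logs tab maps onto Elasticsearch, here is a minimal sketch of the kind of Query DSL request body it could send. It is an illustration, not Mailgun's actual implementation: the field names (`domain`, `severity`, `tags`, `message`, `timestamp`) and the helper `build_log_query` are hypothetical. Exact-match conditions go into a `bool` filter, a free-text keyword becomes a full-text `match` query, and results are sorted newest first.

```python
from typing import Any, Optional


def build_log_query(
    domain: str,
    severity: Optional[str] = None,
    tag: Optional[str] = None,
    keyword: Optional[str] = None,
    size: int = 25,
) -> dict[str, Any]:
    """Build an Elasticsearch Query DSL body for one page of log events.

    Hypothetical helper: the field names below are assumptions, since
    Mailgun's real index mapping is not described in this case study.
    """
    # Exact-match conditions go in a bool "filter" clause: they narrow the
    # result set without affecting relevance scoring.
    filters: list[dict[str, Any]] = [{"term": {"domain": domain}}]
    if severity is not None:
        filters.append({"term": {"severity": severity}})
    if tag is not None:
        filters.append({"term": {"tags": tag}})

    bool_query: dict[str, Any] = {"filter": filters}
    # A free-text keyword becomes a full-text "match" query against the
    # (assumed) analyzed message field.
    if keyword is not None:
        bool_query["must"] = [{"match": {"message": keyword}}]

    return {
        "query": {"bool": bool_query},
        # Newest events first, mirroring the reverse chronological listing.
        "sort": [{"timestamp": {"order": "desc"}}],
        "size": size,
    }
```

For example, `build_log_query("example.com", severity="error", keyword="bounced")` produces a body that could be sent to an index's `_search` endpoint. Keeping the exact-match conditions in `filter` rather than `must` lets Elasticsearch cache them and skip scoring, which suits structured fields like domain and severity.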

In addition to the control panel, Mailgun provides an API so customers can retrieve all of their events into their own systems. For example, one customer uses the API to extract all of its log messages from Mailgun and stores them in its own Elasticsearch cluster for richer searches with Kibana.
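A customer copying events into its own cluster, as described above, would typically load them through Elasticsearch's `_bulk` endpoint, which accepts newline-delimited JSON: an action line followed by the document on the next line. The sketch below only builds that request body. The event shape (an `id` field and other keys) and the index name are assumptions for illustration, and how the events are fetched from Mailgun's API is left out.

```python
import json
from typing import Any, Iterable


def to_bulk_ndjson(events: Iterable[dict[str, Any]], index: str) -> str:
    """Serialize events into an Elasticsearch _bulk request body (NDJSON).

    Each event becomes two lines: an "index" action naming the target
    index, then the event document itself. Every line ends with a newline,
    as the bulk format requires.
    """
    lines: list[str] = []
    for event in events:
        action: dict[str, Any] = {"index": {"_index": index}}
        # Assumption: events carry a unique "id". Reusing it as the document
        # _id means re-importing the same event overwrites it instead of
        # creating a duplicate.
        if "id" in event:
            action["index"]["_id"] = event["id"]
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "".join(line + "\n" for line in lines)
```

The resulting string can be POSTed to the cluster's `_bulk` endpoint with the `application/x-ndjson` content type. Batching many events per request is what makes bulk loading far cheaper than indexing events one at a time.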

"In the previous system, we had an API that allowed customers to download all the logs and do the digging on their own," Meijer recalls. "We did not have full text search in the previous system either, and there was no way to search individual fields. Users had to page through their logs. Now, with Elasticsearch, users can search in particular fields, and do full text searches, all within the control panel. Elasticsearch makes this fast and painless."

Improving query speed by 50x

Mailgun serves about 25 queries per second against 6 terabytes of log data.

"We had trouble scaling our in-house solution because we needed to provide our users with such a large number of log events," Meijer explains. "Searches were very slow. It would take more than 10 seconds to see the first page of log events, even for a simple query. Now, Elasticsearch response time is less than a second. Average request time is 0.2 seconds."

Indexing performance is another key advantage for Mailgun. Previously, it could take hours for new logs to become accessible. Elasticsearch indexes new logs in near real time, making them available for querying almost immediately.

Delivering scalability and reliability

Mailgun finds Elasticsearch easy to use, scalable, and highly reliable. "Elasticsearch is a natural fit for Mailgun," Meijer says. "It is very easy to get going with Elasticsearch and Logstash. As a programmer I really love the Elasticsearch API, because it gives us control over the internal workings. We can tweak Elasticsearch for our needs."

Scaling Elasticsearch is also straightforward, a significant advantage over the previous system, which Mailgun had to configure manually.

"Elasticsearch allows us to scale up simply by adding more nodes, which is tremendously helpful," Meijer says. "Now I can just fire up a new cloud box, point to it, and that's it. The new node is part of the cluster and I never have to worry about it again."

The same distributed architecture that makes Elasticsearch easy to scale also underpins its reliability.

"In addition to being slow, the old system would experience downtime on a regular basis," Meijer recalls. "Our customers were not very happy with that. But now Elasticsearch provides high reliability.

"A couple days ago we had an internal issue," he adds. "Two of the data nodes could not reach the rest of the cluster. The Elasticsearch cluster simply transferred the load to the other boxes. Once the other two came back up again, Elasticsearch rebalanced the load. No customers even noticed that this happened."

How Mailgun benefits from Elasticsearch

Faster performance

Elasticsearch reduced query response time from more than 10 seconds to an average of 0.2 seconds, and cut the time for new logs to become searchable from hours to near real time.

High reliability

Elasticsearch automatically redistributes load within the cluster when nodes become unreachable, so customers do not notice the outage.

Customer satisfaction

By powering an easy-to-use control panel with full-text search and search on specific fields, Elasticsearch has given Mailgun customers a much better user experience.

Easy scalability

Elasticsearch lets Mailgun scale simply by adding more nodes as needed.