Avoiding Traps, Enjoying the Cheese as Mouseflow Uses Elasticsearch
In 2015, the number of worldwide Internet users will surpass 3 billion and include nearly 40% of the world’s population (source: eMarketer). In analytics, the key to success is having data and knowing what to do with it. Our product, Mouseflow, is a website analytics tool that replays visitor behaviour (session replay) and generates heat maps showing where visitors click, move their mouse, scroll, interact, and are physically located. We collect more data from more clients than ever before and scalability is one of our top priorities.
Tackling Scalability Issues as New Accounts Stream In
In the beginning, we used a sharded SQL-based database cluster for storage. This served us well because our clients were mainly small and medium businesses. After a short time, bloggers wrote reviews and our clients started to refer others. This resulted in a stream of new accounts, some of them enterprise clients tracking millions of events, which stretched the limits of our platform. With message queues growing and data that wasn’t easy to sort or filter, it became clear that it was time for a change.
We searched for a distributed and redundant database system known for performance and found Elasticsearch. We installed it and started directing copies of inbound events to the cluster. After a few days, we wrote more complex queries and refined our views even further. It was clear that we were onto something: common queries like “find all the sessions that came from facebook.com, used an iPhone, and abandoned checkout” (which previously took a long time unless we had an index for that exact search) now took less than 1 second to execute. Our free-text search operations were also, naturally, vastly improved.
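To make the example query concrete, here is a minimal sketch of how such a search could be expressed as an Elasticsearch request body. It is an illustration, not our production code: the field names (`referrer_domain`, `device`, `abandoned_step`) are hypothetical, they are assumed to be mapped as exact-match fields, and the body uses the current `bool`/`filter` query DSL rather than the syntax available when we first migrated.

```python
import json


def build_session_query(referrer_domain, device, abandoned_step):
    """Build a request body that combines three exact-match filters.

    All field names are hypothetical placeholders for illustration.
    """
    return {
        "query": {
            "bool": {
                # Every clause in "filter" must match; filters do not
                # contribute to relevance scoring.
                "filter": [
                    {"term": {"referrer_domain": referrer_domain}},
                    {"term": {"device": device}},
                    {"term": {"abandoned_step": abandoned_step}},
                ]
            }
        }
    }


if __name__ == "__main__":
    query = build_session_query("facebook.com", "iPhone", "checkout")
    print(json.dumps(query, indent=2))
```

Because each condition is a separate filter clause, adding or removing a criterion does not require a dedicated index for that exact combination, which is what made this kind of ad hoc search slow in our SQL setup.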
Clearly Onto Something After Installing Elasticsearch
After further use, Elasticsearch started to show its numerous benefits:
First, Elasticsearch is distributed and redundant, meaning our platform is more reliable and data is safer. We have other data stores used for storage and backup but, since uptime is so important, having redundancy in the search layer is key.
Second, Elasticsearch is based on Lucene, which makes free-text search operations much faster. This benefits clients who search for multiple terms we cannot predict in advance and expect results in milliseconds.
Third, Elasticsearch has built-in support for aggregations, which means we can retrieve data already grouped and summarised (for example, as counts per category) instead of as raw records. Most of our reports use this.
Figure 1: The power of filtering with Elasticsearch Aggregations
And, finally, it has a scan and scroll API, which lets us stream data where needed to minimise load times (used in our heat maps, see image below) and deliver a faster platform and a better overall user experience.
Figure 2: Heat map
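The aggregation and scrolling points above can be sketched as follows. This is a hedged illustration under stated assumptions: the `country` field and the `sessions` index used in the usage note are hypothetical, and `client` stands for any object whose `search` and `scroll` methods are shaped like those of the elasticsearch-py client (check the signatures for the client version you use).

```python
def build_country_aggregation(field="country", top_n=10):
    """Request body that counts sessions per value of `field`.

    The default field name is a hypothetical placeholder.
    """
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "aggs": {
            "sessions_by_" + field: {"terms": {"field": field, "size": top_n}}
        },
    }


def stream_hits(client, index, body, page_size=500, keep_alive="1m"):
    """Yield hits one page at a time using the scroll pattern.

    `client` is assumed to expose `search` and `scroll` methods that
    return dicts containing "_scroll_id" and "hits" -> "hits".
    """
    response = client.search(
        index=index, body=body, scroll=keep_alive, size=page_size
    )
    # An empty page means the result set has been exhausted.
    while response["hits"]["hits"]:
        for hit in response["hits"]["hits"]:
            yield hit
        response = client.scroll(
            scroll_id=response["_scroll_id"], scroll=keep_alive
        )
```

A caller could then write `for hit in stream_hits(client, "sessions", {"query": {"match_all": {}}})` and begin processing the first page while later pages are still being fetched, instead of waiting for one very large response.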
These benefits free up developer resources, giving us the flexibility to focus on our product instead of operational tasks. In the last two weeks, we launched The New Mouseflow (our fully revamped user interface with tons of new features) and onboarded over 12,000 private beta users. This would not have been possible without tackling our own scalability issues, which were solved in large part by moving to Elasticsearch. If your project has similar requirements, we highly recommend taking Elasticsearch for a test drive.
Lasse Schou is the CEO of Mouseflow, a Denmark-based SaaS tool for performing web analytics and real-time user studies on websites. Lasse has been working with tech start-ups since 2002 and started Mouseflow in 2010, when he saw a need for visualizing online user behavior. Mouseflow now serves over 45,000 customers in 160 countries and uses Elasticsearch to deliver big-data analysis in real time.