27 October 2014 Engineering

The People Behind the Projects: DNA

By Suyog Rao

The People Behind The Projects is a new series of first-person blog posts from the individuals who contribute to the fantastic, elastic world of Elasticsearch. We’re a close-knit family at Elasticsearch. The goal of this series is to erode the boundaries between the so-called personal and the professional. We want to share our stories with you. So in this series, Elasticsearchers talk about where they’re from, how they first started engaging with Elasticsearch, and what motivates them.

We hope you enjoy this first post in our series by Logstash Engineering Team Lead Suyog Rao.

I was exposed to the world of engineering at a very early age. My father was a mechanical engineer and always had a methodical approach to solving issues. If a toilet at home wasn’t working, he would try to evaluate the root cause of the problem rather than searching for a quick fix. He always had a calm and logical response: “This doesn’t work, let’s figure out why.” As a twelve-year-old, I found this practical approach pretty impressive.

At university, I worked on a bioinformatics research project where we tried to figure out the causal relationship between certain patterns of DNA and diseases in humans. If we clustered those DNA patterns together, we could identify which diseases each pattern correlated with. It was in this lab that my appreciation for computer science and its application to solving problems in the real world began.

I was first introduced to the ELK stack when I was at Loggly. When I started exploring the different versions of the Elasticsearch code, I realized that it hadn’t changed a lot since 2010, when Shay Banon wrote the first version. Unlike other software systems, there hasn’t been a drastic architectural change and things aren’t constantly being written and rewritten. Elasticsearch had this foresight: “Data will grow exponentially. Machines can be rented. Therefore, let us build a search mechanism that can scale.” The elastic part of Elasticsearch, that is, its ability to scale, is incredible. Many people are able to create search algorithms, but this elasticity gives the company a major competitive advantage.

If such foresight is part of the DNA of the company, I wanted to be there. So here I am.

One of my favorite features of Elasticsearch is called “allocation awareness.” We created this when we realized that “not all data are created equal.” It allows you to assign indices that need more computing power to a higher-end machine, and older, less frequently accessed data to a low-end machine. We still have access to the old data, but it’s of fading relevance.
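To make the idea concrete, here is a minimal sketch of how indices might be pinned to different classes of machine. It assumes the nodes have been tagged with a custom attribute, here called box_type, which is a hypothetical name, as are the index names and the high_end and low_end values. The Elasticsearch documentation describes this kind of per-index pinning under shard allocation filtering, using the index.routing.allocation.require.* settings. The code only builds the requests and prints them; it does not talk to a cluster.

```python
import json


def allocation_filter_request(index_name, attribute, value):
    """Build the path and JSON body of an index settings update that
    requires the index's shards to sit on nodes tagged attribute=value."""
    path = f"/{index_name}/_settings"
    body = {f"index.routing.allocation.require.{attribute}": value}
    return path, body


# Hypothetical index names and a hypothetical "box_type" node attribute.
requests_to_send = [
    # Recent, heavily queried data goes to the stronger machines.
    allocation_filter_request("logs-2014.10", "box_type", "high_end"),
    # Older data of fading relevance goes to the cheaper machines.
    allocation_filter_request("logs-2013.01", "box_type", "low_end"),
]

for path, body in requests_to_send:
    # Each pair would be sent as an HTTP PUT to the cluster.
    print("PUT", path, json.dumps(body))
```

The old data stays searchable in this setup; it simply lives on hardware that matches how often it is used.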

One of my recent memories of helping a customer involves a mobile analytics company that uses Elasticsearch to gather data about in-app experience, user session information, and API usage. They were migrating from a NoSQL product to Elasticsearch to provide these real-time analytics and had gone live the previous week. When I received the support call, one of the nodes in their cluster was experiencing high CPU load and very high memory usage. Using Marvel, we narrowed the problem to a special use case of theirs: they were doing frequent updates to documents using scripts, which was putting a heavy load on the primary shards. Once I realized this was the root cause of the problem, I wrote a script to balance the primary shards equally across their four machines. So, instead of letting one machine’s CPU run at 80 percent, we could have the CPUs on the other machines share the workload. The new script allowed us to distribute jobs evenly across all the machines. And now that their urgent request has been addressed, we are taking corrective measures to make sure that this code is scripted in for other similar use cases. I love the quick feedback loop from customers to products, which benefits everyone. It’s all very gratifying.
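For readers curious about what balancing primary shards could look like, here is a rough sketch of the planning step only. It is not the script from the support case. It assumes a simple greedy approach and uses hypothetical node names and shard numbers. It works on a plain dictionary, and turning the plan into real relocations (for example, through the cluster reroute API) is left out.

```python
def plan_primary_moves(primaries_by_node):
    """Greedy plan: keep moving one primary shard from the most loaded
    node to the least loaded node until the counts differ by at most one."""
    # Copy the input so the caller's data is not modified.
    placement = {node: list(shards) for node, shards in primaries_by_node.items()}
    moves = []
    while True:
        busiest = max(placement, key=lambda n: len(placement[n]))
        idlest = min(placement, key=lambda n: len(placement[n]))
        if len(placement[busiest]) - len(placement[idlest]) <= 1:
            break  # already as even as it can get
        shard = placement[busiest].pop()
        placement[idlest].append(shard)
        moves.append((shard, busiest, idlest))
    return moves, placement


# Hypothetical starting point: most primaries sit on one of four machines.
before = {
    "node-1": [0, 1, 2, 3, 4, 5],
    "node-2": [6],
    "node-3": [7],
    "node-4": [],
}

moves, after = plan_primary_moves(before)
for shard, source, target in moves:
    print(f"move primary shard {shard}: {source} -> {target}")
print({node: len(shards) for node, shards in after.items()})
```

With eight primaries on four machines, the plan ends with two primaries per machine, so the scripted updates no longer pile up on a single node.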