10 November 2014 Engineering

The People Behind the Projects: The Swiss Army Knife

By Aaron Mildenstein

The People Behind The Projects is a new series of first-person blog posts from the individuals who contribute to the fantastic, elastic world of Elasticsearch. We’re a close-knit family at Elasticsearch. The goal of this series is to erode the boundaries between the so-called personal and the professional. We want to share our stories with you. So in this series, Elasticsearchers talk about where they’re from, how they first started engaging with Elasticsearch, and what motivates them.

I encountered my first computer in elementary school in Provo, Utah. It was the early 1980s, and the future was in the school library. I learned a tiny bit of BASIC and loaded games from a cassette drive onto a Commodore PET computer. A few years later, it was the Commodore VIC-20. When I started the sixth grade, I transferred to a school funded by the World Institute for Computer Aided Training (WICAT) where various subjects were taught or were enhanced by software running on computer terminals connected to a mini-computer. It was there that I received my first programming lessons in UCSD Pascal. Though I did not yet know it, by then, the die was cast.

It wasn’t all by design, though. I wanted stuff and we weren’t well-off. So I would go to thrift stores, find broken electronics, take them apart, and fix them. I fixed my first television at the age of twelve. The repairman had examined the TV and said, “Well, X, Y, and Z have to be replaced.” So I looked up the parts, went to an electronics store to buy them, replaced a couple of parts, soldered some new parts, turned on the television, and it came on. I learned to gain understanding and knowledge by taking things apart and, where possible, repairing and reassembling them. My dad taught me to do the same with cars. When we needed a car, he would go buy a car that was in disrepair and we would fix it so that we could have a car for less money.

My own journey with computer science continued on to college, but I didn’t complete a degree. I met the girl who became my wife, and we got married soon after. I started working and never ended up going back to school. After saving up some money, I quit my job and became immersed in Linux for three months. That’s all I did, all day, every day. From there, I branched out to employment as a software tester and then moved to systems administration. That’s when I started scripting and learning Perl and Python.

A few years ago, I was working at Alcatel-Lucent where we were trying to create a centralized logging platform to better monitor the conditions of our servers and applications. We investigated Splunk, but the sticker shock was supreme, because we were generating an enormous volume of logs per day. We tried to make our own centralized logging system, and that was too painful, slow, and buggy. We gave up on that project, and our logging needs went unmet for a few more years. While vacationing in the summer of 2011, I heard mention of Logstash in a user-group email, and within weeks, I was testing Logstash and Elasticsearch at work. I loved how helpful everyone was in the IRC chat rooms. The company gave me the go-ahead to architect a four-node cluster of Elasticsearch, Logstash, and version one of Kibana, making me one of the first enterprise users of the ELK stack who worked on tracking problems with logs. During those days, I wrote a script called Logstash Index Cleaner to help delete old indexes. This script has evolved through many iterations and is now the Elasticsearch Curator project, for managing time-series indices.
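The core idea behind a cleanup script like that can be sketched in a few lines of Python. This is only an illustration, not the actual Logstash Index Cleaner or Curator code: the function name `expired_indices` and its parameters are hypothetical, and it assumes daily indices that follow Logstash's default `logstash-YYYY.MM.DD` naming. It only selects index names; sending the delete requests to the cluster would be a separate step.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days, prefix="logstash-"):
    """Return the names of daily indices older than keep_days, oldest first.

    Hypothetical helper for illustration; assumes names look like
    'logstash-2014.11.10' (prefix followed by a YYYY.MM.DD date).
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of the time-series indices we manage
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append((day, name))
    # Sort by date so the oldest indices come first.
    return [name for _, name in sorted(expired)]


names = [
    "logstash-2014.11.01",
    "logstash-2014.11.09",
    "logstash-2014.10.15",
    "kibana-int",
]
old = expired_indices(names, today=date(2014, 11, 10), keep_days=7)
print(old)
```

With a seven-day retention window, only the indices dated before the cutoff are selected; anything that does not match the naming pattern is skipped rather than guessed at.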

When Logstash was formally incorporated into Elasticsearch, I also joined the company. It’s exhilarating to be in an environment where somebody who didn’t graduate from college but who has the passion to learn this information can debate and contribute to the development of the product alongside team members with PhDs. The focus here is a desire for the technology to be furthered, a desire to make open source as a model succeed.

Recently, I have been supporting one of the leading telecom companies in their development of a log management system that captures and summarizes over a terabyte of logs a day. They have over 45 nodes in production for a Security Event and Threat Analysis (SETA) reporting tool that tracks the security of all customer-facing devices. The tool ensures that the data that the company’s customers share remains confidential. We are currently in the design phase, so I have been able to share my knowledge of cluster design: how to make the cluster more robust, how to scale the system, what the ideal number of master nodes is, how to distribute shards appropriately, how to speed up the ingestion rate, and how best to search for X using aggregations.

It’s helpful to talk about these design nuances because Elasticsearch isn’t prescriptive. It does not try to limit what a person can do. There isn’t just one way to do things. You can use it as just a plain search engine. You can aggregate data based on geographic location. You can analyze time-series data. It’s like a Swiss army knife. A restaurant app lets you search for restaurants “near me,” and that’s Elasticsearch. A business lets its customers search for a product, and that’s Elasticsearch. You go to Loggly, which stores log data and conducts time-series analyses, and that’s Elasticsearch. Because there are so many ways to configure and use the open Elasticsearch architecture, we are discovering that having a conversation in the early stages of a customer deployment allows for a smoother production lifecycle.