Logstash User Survey Results

In case you hadn't heard, we recently asked all of our community members to tell us more about how they use Logstash, including their praise and pain points. We ran the survey for about four weeks and had just shy of 250 respondents! We have been poring over the results and have already learned a lot that will prove extremely valuable as we continue to improve Logstash. And today, we're very excited to share the results with you.

In this blog post, we'll describe the survey methodology, share our summary of the results, and provide links to the raw results in case anyone is interested in doing more analysis. (And if you do, make sure to let us know!) 

All charts in this blog are based on survey data analysis and visualization using the Elastic stack. Check out the "Raw Result Data" section at the end of this blog for more information on how to set that up yourself.

Response Highlights

For those of you who can't wait to find out the juicy bits, here are a few highlights:

  • A large number of respondents run Logstash on premises (78%).
  • The most popular operating system among respondents is Linux (94%), followed by Windows (15%).
  • Top Linux distributions in use by the survey respondents are Ubuntu (23%), CentOS (18%), RHEL (12%), and Debian (10%).
  • There is significant use of log forwarders among the respondents, with Logstash Forwarder at 32%, Logstash Agent at 38%, and some usage of log-courier and Beaver.
  • 40% of respondents run Logstash without the use of a message broker.
  • 53% of respondents are manually managing Logstash instances. The rest rely on Puppet, Ansible, Chef, and SaltStack, in that order.
  • 52% of respondents do not currently monitor Logstash health. The rest use custom scripts and third-party tools (mostly Nagios).

Note: All percentage calculations are based on the total number of survey respondents. Since every question allowed multiple selections, percentages do not add up to 100% (a respondent who runs Logstash on both Linux and Windows, for example, is counted in both percentages).

Want to read about the rest of the results? Interested in seeing them in charts? Read on!

Survey Methodology

We ran this survey for about four weeks. It was promoted on the logstash-users mailing list, the Elastic company blog, the Elastic Twitter account, and at the Elastic{ON} conference. In all, we had 242 respondents to the survey, with the majority coming in after the first official blog post and tweet.

The survey consisted of 17 questions, which combined multi-select and write-in answers (see the survey in "preview mode" here). Although all of the questions were optional, most respondents completed almost all of the questions and provided ample write-in feedback. The questions ranged from use case information and environment details to desired future product functionality.

The survey was anonymous, unless the participants chose to share their contact information with our Developer Relations team. In addition, by taking the survey, participants agreed to share the anonymized results of the survey with the community.

Result Details

Without further ado, here are the questions we asked and the answers you gave us.

What are your Logstash use cases?

94% of respondents use Logstash for a log aggregation use case based on Elasticsearch. In addition, 22% also send logs to other systems, and 16% aggregate other time-series data, such as Twitter and call detail records. And the most-interesting-use-case award goes to the author of the following write-in answer: "Aggregate file system contents for incident response and forensic analysis. Also treating disassembled code instructions as 'events' for static software analysis." Cool!

In what environment(s) are you running Logstash?

74% of respondents are running Logstash in production, 57% in development, and 16% are just trying it out.

What platform do you deploy on?

78% of respondents deploy Logstash on premises, 22% in AWS, and 2% in Azure. Other public cloud providers include Rackspace, Google Compute Engine, and DigitalOcean. OpenStack got some mentions among private cloud technologies.

What operating system do you use?

Linux leads as the operating system of choice for Logstash deployments with 93% of respondents. 16% of respondents are running Logstash on Windows and 7% on Mac OS X. 

What Linux distribution do you use?

Top Linux distributions in use by the survey respondents are Ubuntu (23%), CentOS (18%), RHEL (12%), and Debian (10%).

What is the total throughput of your Logstash infrastructure?

The majority of respondents (55%) reported deployments with <10K events/sec. However, a significant number are running at 50K events/sec, 100K events/sec, 500K events/sec, and even >1M events/sec.

How many Logstash servers are you running in Production?

Most respondents are running between 1 and 10 Logstash servers, with some outliers in the teens and beyond. A couple of users reported several hundred Logstash servers, but we were not sure if those users counted Logstash Agent deployments on endpoints.

What method do you use to get data into Logstash?

Methods for getting data into Logstash vary, but the various approaches for shipping logs from endpoints (Logstash Forwarder, Logstash Agent, nxlog, log-courier, Beaver) lead when added together. Next come the Logstash inputs, in this order: file, syslog, TCP, UDP, Log4j, Twitter, Kafka, and S3.

What queueing software do you use with Logstash?

40% of respondents do not use queueing software to buffer data between different stages of Logstash. Out of the rest, 39% use Redis, 10% use RabbitMQ, and 9% use Kafka. Other mentions include Amazon SQS and ZeroMQ.

How are you managing Logstash instances?

53% of respondents are managing Logstash instances manually. The rest are using Puppet, Ansible, Chef, and SaltStack, in that order. A few respondents also mentioned using home-grown configuration management tools.

How are you monitoring the health of Logstash today?

52% of respondents do not currently monitor Logstash health. The rest use custom scripts and third-party tools, mostly Nagios.

What additional reliability guarantees do you need?

The majority of the respondents who answered this question are interested in all of the additional reliability guarantees we listed, and they suggested some additional ones in the write-in responses. This is great! We're thrilled to see Logstash evolve into a mission-critical part of your infrastructure. The good news is that data reliability is a big theme in our 2.x roadmap.

What additional performance improvements would you like to see?

Similarly, many of you are interested in additional performance improvements, with faster event processing, lower resource usage, faster startup time, and improved Elasticsearch insert performance at the top of the list. This aligns with the high-throughput deployments some of you reported in the throughput question above.

Are there additional Logstash plugins you would like to see added?

Many respondents skipped this question, but of the 72 who answered, almost half are interested in an HDFS plugin. Cassandra, MongoDB, and Oracle also ranked high, and other write-in answers indicated additional interest in better integration with other SQL and NoSQL datastores.

Some of you mentioned that you intend to contribute plugins to the Logstash ecosystem, and we are excited to work with you to integrate your contributions! We greatly improved the plugin contribution process in Logstash 1.5 and provided more detailed documentation on how to submit contributions.

What Logstash health metrics would you like to see accessible via the API?

An overwhelming majority of respondents are interested in Logstash health metrics via an API. All of our suggestions ranked high, and many of you had other great ideas!

  • average processing time
  • back pressure metrics and alerts
  • buffer fill ratios
  • number of events in pipeline
  • resource utilization, such as memory and CPU
  • total data transfer and data transfer rate
  • per-plugin health metrics

Would you be interested in participating in meetups?

35% of those who responded to this question asked to be contacted by us about speaking at meetups. You will hear from one of our Developer Relations folks soon!

Survey Locations

Based on IP information, the survey enjoyed participation from all parts of the globe, with the highest concentration of responses originating in the San Francisco Bay Area, the Midwestern United States, and Western Europe.

Any other comments?

We were delighted to get your general comments at the end of the survey. Many of you simply said "thank you" and "great job", which was very, very much appreciated by the Logstash development team!

We love all of your comments, so it was really hard to select only a few to share in this blog. You can read the rest of the comments in the raw data file we provide (see below).

"Have been using ELK stack for ~6 months and feeding many logs and other data into it. This is some seriously useful stuff and we are finding out things we've never seen before. Thank you!"

"Keep up the good work and don't keep the community in the dark! Get some more .eu employees which can answer the mailinglist and on IRC."

"Love using the ELK stack. When asked this week if it could replace Splunk, i said no. And i don't really care, because you shouldn't try to replace it. That said, the killer feature in Splunk is that i can define fields in the search mode AFTER the logs are indexed. I feel that also makes it terribly slow. Keep doing what you're doing…"

"Your team deserves a large number of internet high fives."

"I seriously need some more logstash stickers.  ;-)"

Who doesn't?!? For those of you that gave us your contact information, we'll be reaching out soon to share stickers and other goodies. Thank you, all!

Conclusions

We will continue to pore over the results in search of data that will help us develop a better product for you. However, some conclusions are clear:

  • We need to invest in better forwarders for Logstash
  • We should continue to make Logstash easier to manage
  • We need to provide a way for users to monitor the health of Logstash

The good news is that many of these themes are already part of our roadmap, which you can read about on the Logstash Roadmap page. Stay tuned for more news and announcements about future developments as we work through this feedback.

Raw Result Data

You can download the result data here. As promised, all personally identifiable information has been removed from these files.

We also provide data transformation scripts and configuration files used to load this data into Elasticsearch for analysis. These files were used with Logstash 1.5 RC2, Elasticsearch 1.5, Shield 1.2, and a fork of Kibana 4.0.2 enhanced with percentage data labels (we are tracking an issue to implement similar functionality in the product in the future).

The results pack consists of the following files:

  • LogstashUserSurvey.html - Copy of the survey questions in HTML.
  • data.csv - CSV file based on the export from SurveyMonkey, with one row per respondent.
  • transpose.py - Python script that transforms data.csv into a click stream representation, with one row per survey answer (a minimal sketch of this kind of reshaping appears after this list).
  • data_events.csv - Sample output of transpose.py run on the provided data.csv. This format is a good fit for the population-wide analysis used to summarize most of the results above.
  • csv_events.conf - Logstash configuration used to load data_events.csv into Elasticsearch.
  • data_users.csv - Slightly cleaned-up version of data.csv. This format is a good fit for entity-centric analysis, which was used for the timeline and geographic distribution charts shown above.
  • csv_users.conf - Logstash configuration used to load data_users.csv into Elasticsearch.
  • export_logstash_survey.json - Kibana 4 saved searches, visualizations, dashboards in JSON format.
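If you want a feel for the wide-to-long reshaping that transpose.py performs before downloading the pack, here is a minimal sketch in Python. It is not the actual script: the RespondentID column name and the assumption that each answer option lives in its own column, left non-empty when selected, are illustrative guesses about the SurveyMonkey export layout.

    # transpose_sketch.py - illustrative only; the real transpose.py in the results pack may differ.
    # Assumes a "wide" data.csv with one row per respondent and one column per answer option
    # that is non-empty when the option was selected (hypothetical layout).
    import csv
    import sys

    def transpose(in_path, out_path):
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            reader = csv.DictReader(src)
            writer = csv.writer(dst)
            writer.writerow(["respondent_id", "question_answer", "value"])
            for row in reader:
                respondent = row.get("RespondentID", "")
                for column, value in row.items():
                    if column == "RespondentID" or not value:
                        continue  # skip the id column and unselected options
                    writer.writerow([respondent, column, value])  # one output row per selected answer

    if __name__ == "__main__":
        transpose(sys.argv[1], sys.argv[2])  # e.g. python transpose_sketch.py data.csv data_events.csv

With the data in this one-row-per-answer shape, the percentages quoted above are simply the count of rows for each answer divided by the total number of respondents, which is easy to aggregate and chart in Kibana.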

Big thanks to all of you who responded and gave us your feedback! If you'd like to be notified the next time we run a user survey on Logstash, or on any of Elastic's products, you can sign up here.