23 March 2015

This Week in Logstash: Notes from the Engineering All-Hands

By Suyog Rao

Welcome to our new blog series, This Week in Logstash! In these posts, we'll share the latest happenings in the world of Logstash and its ecosystem. Let's kick off the series with a special this week: the Engineering All-Hands Edition. After wrapping up our first user conference two weeks ago, all of our engineers convened in our brand new office in Mountain View for a week of discussions and fun activities. We had a packed agenda covering development activities across products, engineering culture, and community engagement, plus unconference-style break-out sessions for discussing each project in detail.

We even had an engineering-wide hackathon!

At the Logstash break-out session, we discussed the following things:

  • Plugins ecosystem
  • Pipeline semantics
  • Testing infrastructure
  • Dev workflow/GitHub labeling
  • Persistent Queues
  • Documentation Improvements

I'd like to share notes from a few of these discussions here:

Plugins ecosystem

One of the major goals of Logstash 1.5 was to separate plugins into individual entities outside of the core. Plugins today exist as separate repositories in the GitHub logstash-plugins organization. It is our hope that these changes will allow for more community involvement. To this end, we discussed how we can make it easier for developers to create new plugins and contribute to existing ones. We are looking into providing per-repository commit rights to original authors and contributors. We would love for developers to submit their plugins to logstash-plugins, and we discussed the infrastructure we can provide to support this move. Continuous integration, automatic generation of documentation, and discoverability are some of the benefits of hosting your plugins in the logstash-plugins organization. Watch for a blog post coming soon that covers all these details.

Pipeline semantics

In this session, we discussed providing clean pipeline semantics for plugin developers. With plugins separated from the main core of Logstash, we can provide a clean, documented API to the different pipeline stages that is easy to understand and develop for. Our intention is to make it easy to write plugins in any language that runs on the JVM (such as Java, Clojure, or Scala). We want to standardize how the pipeline behaves on errors and non-recoverable conditions across the different stages. We discussed providing an abstraction layer over pipeline internals such as threads, intra-stage queues, and dead-letter queues, which plugin developers can safely use. We also brainstormed the idea of rewriting the LogStash::Event object in Java to allow for efficient serialization across JVM languages.
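To make the idea of standardized error behavior a little more concrete, here is a minimal, language-neutral sketch written in Python for brevity. It is not Logstash code or a proposed API: the names `run_filter_stage`, `parse_status`, and `dead_letters` are hypothetical, and the only assumption is that a stage routes events it cannot process to a dead-letter queue instead of halting the pipeline.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

Event = Dict[str, object]


def run_filter_stage(
    events: Iterable[Event],
    filter_fn: Callable[[Event], Event],
    dead_letters: Deque[Tuple[Event, str]],
) -> List[Event]:
    """Apply filter_fn to each event; park failures instead of crashing.

    Hypothetical helper: every stage handles errors the same way, so a
    plugin author only writes filter_fn and never touches the queues.
    """
    passed: List[Event] = []
    for event in events:
        try:
            passed.append(filter_fn(event))
        except Exception as exc:
            # Keep the original event and the reason for later inspection.
            dead_letters.append((event, repr(exc)))
    return passed


def parse_status(event: Event) -> Event:
    """Example filter: convert the 'status' field to an integer."""
    out = dict(event)
    out["status"] = int(str(event["status"]))  # raises ValueError on bad input
    return out


dead_letters: Deque[Tuple[Event, str]] = deque()
events = [{"status": "200"}, {"status": "oops"}, {"status": "404"}]
passed = run_filter_stage(events, parse_status, dead_letters)
```

In this sketch the malformed event ends up in `dead_letters` together with the error that caused it, while the two well-formed events continue to the next stage.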

Testing infrastructure

At Elastic, providing quality software is in the DNA of our engineers. We invest a lot of effort in testing infrastructure across our projects. We discussed how we can bolster Logstash testing to support plugins and add more integration testing. We discussed using Docker to bring up external services like Redis and Elasticsearch, which would make setup and teardown easy during integration testing. We also discussed RSpec best practices, randomized testing, adding code coverage reports to our Jenkins runs, and so on. We are already tracking Logstash throughput performance across releases using the ELK stack, and we would like to expose that to our community. Long term, we would like to expose infrastructure that lets developers test their patches against the performance test suite to make sure they don't introduce regressions!
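For readers unfamiliar with randomized testing, the following is a minimal sketch of the general technique, written in Python for brevity rather than in RSpec. It is not taken from the Logstash test suite: `random_event` and `check_json_round_trip` are hypothetical names, and the JSON round trip simply stands in for whatever property a real test would check. The key point is that the seed is printed, so a failing run can be replayed exactly.

```python
import json
import random


def random_event(rng: random.Random) -> dict:
    """Build a small random event from the given generator (hypothetical shape)."""
    return {
        "message": "".join(rng.choice("abcdef ") for _ in range(rng.randint(0, 20))),
        "bytes": rng.randint(0, 10_000),
        "tags": [rng.choice(["web", "db", "auth"]) for _ in range(rng.randint(0, 3))],
    }


def check_json_round_trip(seed: int, runs: int = 100) -> None:
    """Property: serializing an event to JSON and back leaves it unchanged."""
    rng = random.Random(seed)  # same seed, same sequence of events
    for _ in range(runs):
        event = random_event(rng)
        assert json.loads(json.dumps(event)) == event, f"failed with seed {seed}"


# Pick a fresh seed per run, but log it so a failure is reproducible.
seed = random.SystemRandom().randrange(2**32)
print(f"randomized test seed: {seed}")
check_json_round_trip(seed)
```

If a run fails, calling `check_json_round_trip` again with the printed seed regenerates the same inputs.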

[Photo: whiteboarding our testing improvements]

Dev workflow

In this session we discussed our development workflow — our intention is to document a lightweight process for everyone to use. We want a consistent GitHub experience across all our open source projects, so we are changing Logstash issue and pull request labeling to match Elasticsearch and Kibana labeling, with all the same colors too :). We want to make it easier to navigate through our GitHub issues and pull requests.

All our work at Elastic is peer reviewed, and pull requests (PRs) are no exception. We discussed how we can streamline PR reviews and provide quick feedback, and we discussed when a PR is ready to be merged. As a strict policy: no tests, no merge :) For complex features, we require two developers to review the code. This has the side benefit of helping developers who are new to an area ramp up on the code internals. Every member of our team enjoys the weekly "tree-age" (a play on the word triage) and merge party, so we will continue to spend our Fridays on IRC triaging issues and pull requests. Please join us!

Documentation improvements

Improving documentation is an ongoing theme in all our releases. We discussed documenting the internal architecture in detail, along with scaling Logstash deployments and production best practices. We want to bring back the Logstash cookbook, which was previously maintained by Jordan Sissel. We also want to focus on use-case-driven documentation that answers questions like "How do I ship Apache web access logs using syslog to Elasticsearch?"

More to come

It was wonderful to host all our colleagues from different parts of the world in sunny California! If you attended our conference, I hope you had a chance to say hi to a few of them. We love interacting with our users and hearing from you.

We have exciting plans for Logstash — stay tuned for weekly updates as we turn these ideas into reality.