October 11, 2016

Logstash Lines: Getting ready for 5.0.0-rc1

By Suyog Rao

Welcome back to The Logstash Lines! In these weekly posts, we'll share the latest happenings in the world of Logstash and its ecosystem.

Last week was a slow one, with people recovering from our engineering all-hands in Prague, but this update spans the last 3 weeks:

5.0.0-RC1 fixes:

  • Windows: Fixed signal HUP warnings on startup (Issue 5239).
  • Windows: Enabled logging on Windows by fixing log4j config location (Issue 5971).
  • Configuration errors are now logged to file, not printed to stderr (Issue 5975).
  • Elasticsearch Output: Fixed passwords being partially visible in Logstash logs when the connection to Elasticsearch fails (Issue 482).

Settings Improvements (5.0.0)

  • All setting validations are now deferred until after all setting sources have been processed (flags, logstash.yml, etc.).
  • New setting type Bytes for human-writable byte values like '30mb'.
  • Fixed a bug where Logstash always checked the default directory for write permissions, even when that directory was overridden (Issue 6004).
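
To make the first two items concrete, here is a minimal sketch in Python (not Logstash's actual implementation). It parses a human-writable value like '30mb' and only validates once every source has been merged. The `Settings` class, the `example.max_size` name, and the choice of binary multiples (1 kb = 1024 bytes) are assumptions made for illustration.

```python
import re

# Assumption: binary multiples (1 kb = 1024 bytes). The post does not say
# which convention the real Bytes setting type uses.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
_PATTERN = re.compile(r"^\s*(\d+)\s*([kmgt]?b)\s*$", re.IGNORECASE)


def parse_bytes(text):
    """Convert a human-writable size such as '30mb' into a number of bytes."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"invalid byte value: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


class Settings:
    """Hypothetical settings holder that validates only after all sources are merged."""

    def __init__(self, defaults):
        self._raw = dict(defaults)

    def merge(self, source):
        # No validation here: a later source may still override a bad value.
        self._raw.update(source)

    def validate(self, validators):
        # Validation runs once, against the final merged values.
        return {name: check(self._raw[name]) for name, check in validators.items()}


# A bad default is harmless as long as a later source overrides it.
settings = Settings({"example.max_size": "bogus"})  # hypothetical setting name
settings.merge({"example.max_size": "30mb"})        # e.g. from a settings file
print(settings.validate({"example.max_size": parse_bytes}))
```

Validating eagerly in `merge` would reject the placeholder default before the override arrives, which is the kind of ordering problem that deferring validation avoids.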

Persistent Queues

We've opened a PR to merge the persistent queue feature branch into master and 5.x. It is not feature complete yet, but it's pretty close. Also, long-lived feature branches are no good. The goal of this iteration is to:

  • Prevent in-flight data loss when the application or machine crashes by aiming for at-least-once delivery semantics. Note that at this point there is no change to per-event processing/delivery failures, which are handled on a best-effort basis at the plugin level and can result in dropped events. This will be addressed by the future Dead Letter Queue feature, see #5283.
  • Provide backpressure handling within Logstash using a variable-length persistent queue, so you don't have to use Redis or Kafka just to handle event surges.
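
The two goals above can be illustrated with a toy file-backed queue in Python. This is a sketch under stated assumptions, not the design in the PR: the `PersistentQueue` and `QueueFull` names, the file layout, and the `max_bytes` limit are all hypothetical. Events are synced to disk before they are accepted, events that were read but never acknowledged are delivered again after a restart (at-least-once), and a full queue raises an error so the producer has to slow down (backpressure).

```python
import json
import os


class QueueFull(Exception):
    """Raised when the queue is at capacity, so the producer has to slow down."""


class PersistentQueue:
    """Toy file-backed queue: events are persisted before they are handed out."""

    def __init__(self, directory, max_bytes):
        os.makedirs(directory, exist_ok=True)
        self._data_path = os.path.join(directory, "events.log")
        self._ack_path = os.path.join(directory, "acked.count")
        self._max_bytes = max_bytes
        # Recovery: reload what is on disk, then skip events already acknowledged.
        self._events = []
        if os.path.exists(self._data_path):
            with open(self._data_path, encoding="utf-8") as f:
                self._events = [json.loads(line) for line in f if line.strip()]
        self._acked = 0
        if os.path.exists(self._ack_path):
            with open(self._ack_path, encoding="utf-8") as f:
                self._acked = int(f.read().strip() or 0)
        self._next = self._acked  # unacknowledged events are delivered again

    def push(self, event):
        line = json.dumps(event) + "\n"
        size = os.path.getsize(self._data_path) if os.path.exists(self._data_path) else 0
        # This sketch never reclaims space for acknowledged events; a real queue would.
        if size + len(line.encode("utf-8")) > self._max_bytes:
            raise QueueFull("queue at capacity")  # backpressure signal
        with open(self._data_path, "a", encoding="utf-8") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # make the event durable before accepting it
        self._events.append(event)

    def read(self):
        """Return the next undelivered event, or None if there is none."""
        if self._next >= len(self._events):
            return None
        event = self._events[self._next]
        self._next += 1
        return event

    def ack(self):
        """Mark the oldest unacknowledged event as fully processed."""
        if self._acked >= self._next:
            raise RuntimeError("nothing to acknowledge")
        self._acked += 1
        with open(self._ack_path, "w", encoding="utf-8") as f:
            f.write(str(self._acked))
            f.flush()
            os.fsync(f.fileno())
```

If the process dies after `read()` but before `ack()`, reopening the same directory hands the event out again. That is why the guarantee is at-least-once rather than exactly-once, and why downstream consumers may see duplicates.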

All existing unit tests are passing, and we added file-based integration tests last week.

Open items:

  • Do we enable persistent queues by default, or expose them as experimental behind a feature flag?
  • Run performance benchmarks for common configurations.
  • Add queue recovery tests.
  • Add configurable thresholds for maximum queue capacity, expressed as a byte-size limit or as a disk percentage.

Integration Test Framework

We now have an end-to-end integration test framework in core that uses a Logstash binary to test against real services like Kafka, Elasticsearch, and Filebeat. Tests run locally and do not need a VM. It is super easy to configure: it uses Travis-style files to set up services, and tests are written using RSpec. We already have a few integration tests built with this framework running on Travis, and they will soon be added to the logstash-ci Jenkins platform. With this change, we have integration tests running in every plugin repo and in core to validate code changes.