10 May 2018

Logstash Lines: Google BigQuery output now uses Streaming API

By Monica Sarbu

Welcome to Logstash Lines! With this weekly series, we're keeping you up to date with what's new in Logstash, including the latest commits and releases.

Did you know that Logstash 6.2 is already available? Try it and let us know what you think.

Logstash Output to Google BigQuery Now Uses the Streaming API (4.0.0)

Our friends at Google have made a massive improvement to the Google BigQuery output plugin. It has been refactored to use the BigQuery Streaming API instead of file-based batching and uploading. The Streaming API has several benefits:

  • incoming data can be analyzed and queried in real time
  • the plugin is more resilient across start/stop/restart, because work is no longer batch-oriented
  • overall performance and support from the underlying client library are better

This is a breaking change. Updating to 4.0.0 requires an IAM JSON credentials file instead of the deprecated P12 files. Applications using Application Default Credentials (ADC) will continue to work.

Please note that, with the switch to the Streaming API, streaming inserts incur a cost.
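The post doesn't show the plugin's internals. As a rough illustration of what "streaming" means here, rows go to BigQuery in small insert requests rather than being staged in files and loaded later. The sketch below only builds the JSON body of a BigQuery `tabledata.insertAll` request (a `rows` list of `{"insertId": ..., "json": ...}` entries). The function name `build_insert_all_body` is hypothetical, and authentication and the HTTP call are deliberately left out.

```python
import json
import uuid


def build_insert_all_body(events, skip_invalid_rows=False):
    """Build a BigQuery tabledata.insertAll request body from a list of dicts.

    Hypothetical helper for illustration only: each event becomes one row,
    and a random insertId lets BigQuery de-duplicate retried rows on a
    best-effort basis.
    """
    return {
        "skipInvalidRows": skip_invalid_rows,
        "rows": [
            {"insertId": str(uuid.uuid4()), "json": event}
            for event in events
        ],
    }


if __name__ == "__main__":
    events = [
        {"message": "hello", "host": "web-1"},
        {"message": "world", "host": "web-2"},
    ]
    print(json.dumps(build_insert_all_body(events), indent=2))
```

Each such request is a billed streaming insert, which is where the cost mentioned above comes from.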

PQ Robustness Improvements

We’ve been busy improving the reliability of serialization in Logstash’s persistent queue (PQ). Some data types could not be cleanly serialized and deserialized, which led to bugs that we’ve resolved over the past few releases. To resolve these issues cleanly, we will need to update the PQ file format version in Logstash 6.3.0. Queues created in 6.3.0+ will therefore not be backwards compatible with older versions, so a downgrade would require draining the queue first.

Additionally, some older PQs may need to be drained on the old version before upgrading. When 6.3.0 starts after an upgrade, Logstash will detect this situation. It will then alert you to either revert and drain the queue, or delete the queue, before Logstash can run. You can track our progress on these issues in logstash#9494.
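To make the upgrade and downgrade rules concrete, here is a minimal sketch of a format-version check on startup. It assumes a hypothetical on-disk layout where each queue page begins with a single version byte. It also assumes, more strictly than the post (which says only *some* older queues need it), that every older page must be drained. The names `SUPPORTED_VERSION`, `QueueUpgradeError`, and `check_page_version` are illustrative, not Logstash's actual PQ code.

```python
# Hypothetical: the real PQ page format and version numbers are not described
# in this post; a single leading version byte is assumed for illustration.
SUPPORTED_VERSION = 2


class QueueUpgradeError(Exception):
    """Raised when an on-disk queue page cannot be read by this version."""


def check_page_version(header: bytes) -> int:
    """Return the page version if readable, otherwise raise with guidance."""
    if not header:
        raise ValueError("empty page header")
    version = header[0]  # indexing bytes yields an int
    if version < SUPPORTED_VERSION:
        # Older format: drain it on the old release, or delete the queue.
        raise QueueUpgradeError(
            f"queue page uses format v{version}; revert to the previous "
            "release and drain the queue, or delete the queue"
        )
    if version > SUPPORTED_VERSION:
        # Newer format: the downgrade case, which is why a drain is needed first.
        raise QueueUpgradeError(
            f"queue page uses format v{version}, newer than supported "
            f"v{SUPPORTED_VERSION}; drain the queue before downgrading"
        )
    return version


if __name__ == "__main__":
    print(check_page_version(bytes([SUPPORTED_VERSION])))
```

Failing fast with an actionable message is safer than trying to read pages in a format the process doesn't understand.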

Queue Batching and Timeouts

For a number of inputs, specifically HTTP, it can be desirable to write to the queue atomically with a timeout: if the timeout is exceeded, the write fails and nothing is written. It also makes sense for this to be a write of a batch rather than a single event, since many plugins are inherently batch-oriented. We’ve just landed a PQ-based implementation of this in logstash#9498. We still need to implement it for memory-based queues before we merge the feature branch. The meta issue tracking the full scope of work is logstash#9398.
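The semantics described above (a whole batch is written or nothing is, within a timeout) can be sketched with a bounded in-memory queue guarded by a condition variable. This is a hypothetical illustration of the contract, not the PQ implementation from logstash#9498 or the planned memory-queue version. The class `BoundedBatchQueue` and its methods are invented names.

```python
import threading
import time
from collections import deque


class BoundedBatchQueue:
    """Hypothetical in-memory queue with all-or-nothing batch writes."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = deque()
        self._cond = threading.Condition()

    def write_batch(self, events, timeout):
        """Write every event or none; return True on success, False on timeout."""
        if len(events) > self._capacity:
            raise ValueError("batch larger than queue capacity")
        deadline = time.monotonic() + timeout
        with self._cond:
            # Wait until the entire batch fits, never writing a partial batch.
            while self._capacity - len(self._items) < len(events):
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False  # timed out: nothing was written
                self._cond.wait(remaining)
            self._items.extend(events)
            self._cond.notify_all()  # wake any waiting readers
            return True

    def read_batch(self, max_events):
        """Remove and return up to max_events events."""
        with self._cond:
            count = min(max_events, len(self._items))
            batch = [self._items.popleft() for _ in range(count)]
            self._cond.notify_all()  # wake writers waiting for free space
            return batch
```

An HTTP input built on this contract could map a `False` return to an error response, so the client knows none of its batch was accepted and can retry.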

Field Reference Regularity Improvements

Logstash’s field reference syntax has always been a bit weird. We’ve opened logstash#9531 to discuss the way forward. We plan to fix this behavior, but because the fix is a little dangerous (it could process data differently than users expect), it will be an opt-in feature in 6.x. In 7.0 we will change the default behavior and remove the legacy parsing.
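For readers unfamiliar with the syntax, Logstash refers to a top-level field by its bare name (`message`) and to a nested field as a chain of bracketed segments (`[http][request][method]`). The final rules are still being discussed in logstash#9531. The sketch below is a hypothetical *strict* grammar for illustration only, not the parser that will ship: it accepts a bare name or one or more `[segment]` groups, and rejects anything mixing the two.

```python
import re
from typing import List

# Hypothetical strict grammar: a bare name containing no brackets, or one or
# more [segment] groups with nothing between or around them.
_BARE = re.compile(r"[^\[\]]+")
_SEGMENT = re.compile(r"\[([^\[\]]+)\]")


def parse_field_reference(ref: str) -> List[str]:
    """Split a field reference into its path segments, or raise ValueError."""
    if _BARE.fullmatch(ref):
        return [ref]
    segments = []
    pos = 0
    while pos < len(ref):
        match = _SEGMENT.match(ref, pos)  # must match exactly at pos
        if match is None:
            raise ValueError(f"invalid field reference: {ref!r}")
        segments.append(match.group(1))
        pos = match.end()
    if not segments:
        raise ValueError("empty field reference")
    return segments


if __name__ == "__main__":
    print(parse_field_reference("[http][request][method]"))
```

Under a grammar like this, ambiguous inputs such as `a[b]` or `[a]b` fail loudly instead of being silently interpreted one way or another.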

Other Logstash changes

Repository: elastic/logstash

Documentation

Changes in master:

  • [DOCS] Fixes links to built-in users #9518
  • [DOCS] Removes X-Pack release notes and breaking changes #9509
  • Give an example of a single line Hash. #9505
  • [DOCS] Enables editing links for X-Pack pages #9500

Repositories under elastic/logstash-plugins

logstash-codec-netflow - 3.13.2

  • Fix incorrect definitions of IE 231 and IE 232

logstash-filter-mutate - 3.3.2

  • Fix: when converting to `float` and `float_eu`, explicitly support the same range of inputs as their integer counterparts. This eliminates a regression introduced in 3.3.1 in which support for non-string inputs was inadvertently removed.
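As an illustration of that fix, `float_eu` parses strings that use `.` as the thousands separator and `,` as the decimal mark (for example `"1.234,5"`). The point of the change is that numeric inputs must be accepted too. The Python function below is a hypothetical sketch of that behavior, not the plugin's Ruby code. The name `to_float_eu` is invented, and edge cases the real plugin may handle differently are not covered.

```python
def to_float_eu(value):
    """Hypothetical sketch of a float_eu conversion.

    Strings use '.' as the thousands separator and ',' as the decimal mark.
    Non-string numbers are converted directly instead of being rejected,
    which is the kind of input the 3.3.1 regression broke.
    """
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        normalized = value.strip().replace(".", "").replace(",", ".")
        return float(normalized)
    raise TypeError(f"cannot convert {type(value).__name__} to float")


if __name__ == "__main__":
    print(to_float_eu("1.234,5"), to_float_eu(3), to_float_eu(2.5))
```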

logstash-input-file - 4.1.2

  • Fix `require winhelper` error on Windows #184
  • Fix: when no delimiter is found in a chunk, the chunk is reread and no forward progress is made in the file #185
  • Fix JAR_VERSION read problem that prevented Logstash from starting #180
  • Fix sincedb write error when using /dev/null, which repeatedly caused a plugin restart #182
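The forward-progress bug in #185 is a classic pitfall in chunked line reading. A common way to avoid it is to always consume each chunk, carry any trailing partial line over in a buffer, and emit only lines that end with a delimiter. The Python sketch below shows that technique in general. It is not the file input's Ruby implementation, and the names `LineBuffer` and `read_lines` are hypothetical.

```python
import io


class LineBuffer:
    """Accumulates chunks and returns only complete, delimiter-terminated lines."""

    def __init__(self, delimiter="\n"):
        self._delimiter = delimiter
        self._pending = ""  # partial line carried over between chunks

    def feed(self, chunk):
        self._pending += chunk
        # The last element is whatever follows the final delimiter (possibly "").
        *lines, self._pending = self._pending.split(self._delimiter)
        return lines


def read_lines(stream, chunk_size=32768):
    """Yield complete lines from a text stream, reading fixed-size chunks.

    Every iteration consumes a new chunk, so the read position always moves
    forward, even when a chunk contains no delimiter at all.
    """
    buffer = LineBuffer()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield from buffer.feed(chunk)


if __name__ == "__main__":
    data = io.StringIO("ab\ncdef\ng")
    # prints ['ab', 'cdef']; the trailing "g" waits for its delimiter
    print(list(read_lines(data, chunk_size=2)))
```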

logstash-input-s3 - 3.3.3

  • Symbolize hash keys for additional_settings hash #148

logstash-output-elasticsearch - 9.1.2

  • No user-facing changes; removed an unnecessary test dependency.

logstash-output-google_bigquery - 4.0.1

  • Documentation cleanup

logstash-output-s3 - 4.1.2

  • Symbolize hash keys for additional_settings hash #179

Repository: elastic/logstash-docs

Changes in versioned_plugin_docs:

  • Added link to v4.0.0 google_bigquery documentation #538
  • Fix broken build #537
  • Fix broken links #536
  • Add branch attribute to resolve doc paths #535