09 November 2015 Engineering

Introducing the de_dot filter

By Aaron Mildenstein

Hello, fellow Logstashers!

We’ve been pleased with the community response to our recent Logstash 2.0 release.  As this was a major release, there were some breaking changes, especially in conjunction with the Elasticsearch 2.0 release.  One of these changes is that Elasticsearch 2.0 does not support field names containing a . (dot character).  A few Logstash plugins, like the metrics and elapsed filters, had to be changed to stop using dotted fields in order to remain compatible.  Unfortunately, many users have no control over the sources of their fields, and this has resulted in a poor experience for anyone whose events contain dotted fields.  To address this issue, we created the de_dot filter.  The plugin documentation covers it in more detail.

You can install the de_dot filter using the plugin command:

bin/plugin install logstash-filter-de_dot

The de_dot filter will replace dots with underscores by default.  It’s as simple as that!  If I had a dotted field called baked.apple.pie, the de_dot filter would change the field name to baked_apple_pie.  You can also choose the separator with the separator configuration option.  A separator does not have to be a single character, either.  You can make a separator anything you want, except a dot.  Please remember to change your queries and filters in Kibana to match whatever you change in the de_dot filter!
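As a minimal sketch of the separator option described above, the filter block below swaps the default underscore for a double underscore.  The "__" value is only an illustration, not a recommendation; any string without a dot should work the same way.

```
filter {
  de_dot {
    # Illustrative separator; the default is a single underscore.
    # With this setting, baked.apple.pie should become baked__apple__pie.
    separator => "__"
  }
}
```

Leaving separator out entirely gives the default behavior, where baked.apple.pie becomes baked_apple_pie.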

We also added an option to translate dots into nested fields.  If you set nested => true in your configuration, Logstash will ignore the separator and attempt to convert dotted fields into nested fields.  For example, if I had a dotted field called top.level1.level2, it would become [top][level1][level2].  The choice is yours.
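A configuration for the nested behavior could look like the sketch below.  It uses only the nested option named above and leaves everything else at its defaults.

```
filter {
  de_dot {
    # Convert dotted names into nested fields instead of renaming them.
    # Any separator setting is ignored when nested is true.
    # Example from the text: top.level1.level2 -> [top][level1][level2]
    nested => true
  }
}
```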

We’ve provided a fields configuration option to allow you to specify which fields need to be processed.  If no fields are provided, Logstash will check all of your fields for dots.  The de-dot process can be a performance hit to your pipeline, so specifying fields up front saves Logstash from having to check every field, which can improve performance.  Please note that you won’t need to use a conditional to check for dots in field names, as the plugin does this for you.  If you are in a situation where you do not know all possible field names, it may be helpful to tag events which may have dotted fields, and put the de_dot filter within a conditional that checks for that tag.
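The two approaches above might be sketched as follows.  The field names are the examples used earlier in this post, and the tag name "dotted" is hypothetical; use whatever tag your own pipeline applies to events that may carry dotted fields.

```
filter {
  # Approach 1: name the dotted fields up front so that
  # Logstash does not have to inspect every field on every event.
  de_dot {
    fields => [ "baked.apple.pie", "top.level1.level2" ]
  }
}
```

```
filter {
  # Approach 2: field names are not known in advance, so only
  # run de_dot on events carrying a tag added earlier in the
  # pipeline.  "dotted" is a made-up tag name for this example.
  if "dotted" in [tags] {
    de_dot { }
  }
}
```

You would normally pick one of these rather than running both.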

The fields option also allows you to de_dot nested fields.  The current release of the de_dot filter will only check the top level of an event for dotted fields.  If you have a dotted field in a nested object, the de_dot filter cannot find it without help.  In this case you must use the fields directive to specify a nested field.  This can be done by using the field reference syntax: fields => [ "[top][level.1]", "[top][level.2]" ] will find the nested dotted fields level.1 and level.2.  You can use either the separator or nested options to act on these fields.
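Putting the field reference syntax from the paragraph above into a complete filter block gives something like this sketch.  It relies on the default separator, so the expected result in the comment is an assumption based on the renaming behavior described earlier rather than captured output.

```
filter {
  de_dot {
    # Point de_dot at dotted keys that live inside the [top] object.
    # With the default separator, [top][level.1] should end up
    # as [top][level_1], and likewise for level.2.
    fields => [ "[top][level.1]", "[top][level.2]" ]
  }
}
```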

We hope that the de_dot filter will help ease your transition to Logstash 2.0 and Elasticsearch 2.0.  If you need help configuring the de_dot filter, please visit our discussion forum at https://discuss.elastic.co/c/logstash.  If you have suggestions or find an issue, please submit a ticket at https://github.com/logstash-plugins/logstash-filter-de_dot.

Happy Logstashing!