24 August 2017 Engineering

Ingest Node to Logstash Config Converter

By Armin Braun

Motivation

Whether you want to send your data to more targets than just Elasticsearch, configure a more complex set of transformations than ingest nodes allow for, or use Logstash's new persistent queue feature to make your data ingestion pipeline more resilient, Logstash 6.0's new Ingest to Logstash config migration tool has you covered. It automatically converts ingest pipeline configurations into equivalent Logstash filter configurations and thus removes a large part of the manual effort currently required when migrating from ingest node to Logstash.

Transforming a simple ingest configuration like the one shown below into a Logstash configuration is relatively easy to do by hand.

{
  "description": "Pipeline to parse Apache logs",
  "processors": [
    {
      "append": {
        "field" : "client",
        "value": ["host1", "host2"]
      }
    }
  ]
}

becomes:

filter {
   mutate {
      add_field => {
         "client" => [
            "host1",
            "host2"
         ]
      }
   }
}
output {
   elasticsearch {
      hosts => "localhost"
   }
}
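The mapping in this simple case can be sketched in a few lines of JavaScript. This is a minimal illustration and not the converter's actual source: the function name `appendToMutate` is hypothetical, the formatting is simplified, and it assumes that `JSON.stringify` quoting is acceptable for the string values involved.

```javascript
// Hypothetical sketch: renders an ingest "append" processor as a Logstash
// mutate/add_field filter block. Not taken from the real converter.
function appendToMutate(processor) {
  const { field, value } = processor.append;
  // The ingest "append" processor accepts a single value or an array.
  const values = Array.isArray(value) ? value : [value];
  // Assumption: JSON-style double quoting is good enough for these values.
  const rendered = values.map((v) => JSON.stringify(v)).join(", ");
  return [
    "mutate {",
    "   add_field => {",
    `      ${JSON.stringify(field)} => [${rendered}]`,
    "   }",
    "}",
  ].join("\n");
}

const processor = { append: { field: "client", value: ["host1", "host2"] } };
console.log(appendToMutate(processor));
```

The sketch only covers a single processor; wrapping the result in a `filter` section and adding an output is left out for brevity.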

Converting a complex case like the following example, though, is a much more daunting task.

{
  "description": "Pipeline to parse Apache logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{COMBINEDAPACHELOG}"],
        "on_failure" : [
          {
            "set" : {
              "field" : "error",
              "value" : "field does not exist"
            }
          },
          {
            "convert": {
              "field" : "client.ip",
              "type": "integer"
            }
          }
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": [
          "dd/MMM/YYYY:HH:mm:ss Z"
        ],
        "locale": "en"
      }
    },
    {
      "geoip": {
        "field": "clientip",
        "target_field": "geo"
      }
    }
  ]
}

It requires correctly converting field names like client.ip into Logstash bracket notation [client][ip], knowing the syntactic details of the Grok filter in Logstash and so on.
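To make the field name conversion concrete, here is a minimal JavaScript sketch. The helper name `toLogstashField` is hypothetical and the code is not taken from the tool itself; it assumes that every dot in an ingest field name separates nesting levels and that top-level fields can stay unbracketed.

```javascript
// Hypothetical helper (not the tool's actual code): converts an ingest-style
// dotted field path into Logstash bracket notation.
function toLogstashField(ingestField) {
  const parts = ingestField.split(".");
  // A top-level field such as "message" needs no brackets.
  if (parts.length === 1) {
    return ingestField;
  }
  // Each path segment becomes one bracketed element.
  return parts.map((part) => "[" + part + "]").join("");
}

console.log(toLogstashField("client.ip")); // [client][ip]
console.log(toLogstashField("message"));   // message
```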

Using the tool, though, we easily arrive at:

filter {
   grok {
      match => {
         "message" => "%{COMBINEDAPACHELOG}"
      }
   }
   if "_grokparsefailure" in [tags] {
      mutate {
         add_field => {
            "error" => "field does not exist"
         }
      }
      mutate {
         convert => {
            "[client][ip]" => "integer"
         }
      }
   }
   date {
      match => [
         "timestamp",
         "dd/MMM/YYYY:HH:mm:ss Z"
      ]
      target => "@timestamp"
      locale => "en"
   }
   geoip {
      source => "clientip"
      target => "geo"
   }
}
output {
   elasticsearch {
      hosts => "localhost"
   }
}
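Note how the `on_failure` handlers of the grok processor ended up inside a conditional on the `_grokparsefailure` tag, which the Logstash grok filter adds to an event by default when no pattern matches. A rough JavaScript sketch of that wrapping step might look like the following; `wrapOnFailure` is a hypothetical name, and the function assumes the handler blocks have already been converted to Logstash filter text.

```javascript
// Hypothetical sketch: wraps already-converted filter blocks in a conditional
// that only runs when grok tagged the event as a parse failure.
function wrapOnFailure(filterBlocks) {
  // Indent every line of a block by three spaces, matching the samples above.
  const indent = (text) =>
    text.split("\n").map((line) => "   " + line).join("\n");
  return [
    'if "_grokparsefailure" in [tags] {',
    ...filterBlocks.map(indent),
    "}",
  ].join("\n");
}

const handler = [
  "mutate {",
  "   add_field => {",
  '      "error" => "field does not exist"',
  "   }",
  "}",
].join("\n");
console.log(wrapOnFailure([handler]));
```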

Imagine going through all these steps by hand, potentially for an ingest node config written by someone else a long time ago, and without much prior Logstash experience.

This tool enables you to do just that in a matter of minutes instead of hours, allowing you to quickly prototype Logstash in your environment without the need for a significant time investment.

Usage

The tool is implemented entirely in JavaScript and executed via Java's Nashorn scripting engine, making it at least as portable as Logstash itself.

You can run it by using the wrapper script found at `bin/ingest-convert.sh`.

For example:

➜ cat /tmp/ingest.json 
{
  "description": "Pipeline to parse Apache logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{COMBINEDAPACHELOG}"],
        "on_failure" : [
          {
            "set" : {
              "field" : "error",
              "value" : "field does not exist"
            }
          },
          {
            "convert": {
              "field" : "client.ip",
              "type": "integer"
            }
          }
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": [
          "dd/MMM/YYYY:HH:mm:ss Z"
        ],
        "locale": "en"
      }
    },
    {
      "geoip": {
        "field": "clientip",
        "target_field": "geo"
      }
    }
  ]
}
➜ bin/ingest-convert.sh --input=file:///tmp/ingest.json --output=file:///tmp/ingest.cfg
➜ cat /tmp/ingest.cfg
filter {
   grok {
      match => {
         "message" => "%{COMBINEDAPACHELOG}"
      }
   }
   if "_grokparsefailure" in [tags] {
      mutate {
         add_field => {
            "error" => "field does not exist"
         }
      }
      mutate {
         convert => {
            "[client][ip]" => "integer"
         }
      }
   }
   date {
      match => [
         "timestamp",
         "dd/MMM/YYYY:HH:mm:ss Z"
      ]
      target => "@timestamp"
      locale => "en"
   }
   geoip {
      source => "clientip"
      target => "geo"
   }
}
output {
   elasticsearch {
      hosts => "localhost"
   }
}

That's all there is to it. One thing to note is that we automatically added an output that assumes a local Elasticsearch host at the default address as the output of the filter. You may have to adjust this section of the generated configuration as well as configure an appropriate Logstash input before deploying the configuration.

If you want to get your configuration up and running on Logstash for the first time even quicker, we've got you covered as well. Simply append the flag --append-stdio when invoking the tool and it will generate a config that works out of the box, reading input from standard input and printing output to standard output.

➜  bin/ingest-convert.sh --input=file:///tmp/ingest.json --output=file:///tmp/ingest-stdout.cfg --append-stdio 
➜  cat /tmp/ingest-stdout.cfg 
input {
   stdin {
   }
}
filter {
   grok {
      match => {
         "message" => "%{COMBINEDAPACHELOG}"
      }
   }
   if "_grokparsefailure" in [tags] {
      mutate {
         add_field => {
            "error" => "field does not exist"
         }
      }
      mutate {
         convert => {
            "[client][ip]" => "integer"
         }
      }
   }
   date {
      match => [
         "timestamp",
         "dd/MMM/YYYY:HH:mm:ss Z"
      ]
      target => "@timestamp"
      locale => "en"
   }
   geoip {
      source => "clientip"
      target => "geo"
   }
}
output {
   stdout {
      codec => "rubydebug"
   }
}

Outlook

Since this tool is written completely in JavaScript, it can be integrated into Logstash's upcoming user interface and an online version, giving you easier interfaces for converting and editing Logstash configurations. Last but not least, trying this feature (or Logstash 6.0 in general) allows you to become an Elastic Pioneer by giving us feedback, for an opportunity to win some cool swag.