Accessing Event Data and Fields in the Configurationedit

The logstash agent is a processing pipeline with 3 stages: inputs → filters → outputs. Inputs generate events, filters modify them, outputs ship them elsewhere.

All events have properties. For example, an apache access log would have things like status code (200, 404), request path ("/", "index.html"), HTTP verb (GET, POST), client IP address, etc. Logstash calls these properties "fields."

Some of the configuration options in Logstash require the existence of fields in order to function. Because inputs generate events, there are no fields to evaluate within the input block—​they do not exist yet!

Because of their dependency on events and fields, the following configuration options will only work within filter and output blocks.

Field references, sprintf format and conditionals, described below, will not work in an input block.

Field Referencesedit

It is often useful to be able to refer to a field by name. To do this, you can use the Logstash field reference syntax.

The basic syntax to access a field is [fieldname]. If you are referring to a top-level field, you can omit the [] and simply use fieldname. To refer to a nested field, you specify the full path to that field: [top-level field][nested field].

For example, the following event has five top-level fields (agent, ip, request, response, ua) and three nested fields (status, bytes, os).

{
  "agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
  "ip": "192.168.24.44",
  "request": "/index.html"
  "response": {
    "status": 200,
    "bytes": 52353
  },
  "ua": {
    "os": "Windows 7"
  }
}

To reference the os field, you specify [ua][os]. To reference a top-level field such as request, you can simply specify the field name.

For more detailed information, see Field References Deep Dive.

sprintf formatedit

The field reference format is also used in what Logstash calls sprintf format. This format enables you to refer to field values from within other strings. For example, the statsd output has an increment setting that enables you to keep a count of apache logs by status code:

output {
  statsd {
    increment => "apache.%{[response][status]}"
  }
}

Similarly, you can convert the timestamp in the @timestamp field into a string. Instead of specifying a field name inside the curly braces, use the +FORMAT syntax where FORMAT is a time format.

For example, if you want to use the file output to write to logs based on the event’s date and hour and the type field:

output {
  file {
    path => "/var/log/%{type}.%{+yyyy.MM.dd.HH}"
  }
}

Conditionalsedit

Sometimes you only want to filter or output an event under certain conditions. For that, you can use a conditional.

Conditionals in Logstash look and act the same way they do in programming languages. Conditionals support if, else if and else statements and can be nested.

The conditional syntax is:

if EXPRESSION {
  ...
} else if EXPRESSION {
  ...
} else {
  ...
}

What’s an expression? Comparison tests, boolean logic, and so on!

You can use the following comparison operators:

  • equality: ==, !=, <, >, <=, >=
  • regexp: =~, !~ (checks a pattern on the right against a string value on the left)
  • inclusion: in, not in

The supported boolean operators are:

  • and, or, nand, xor

The supported unary operators are:

  • !

Expressions can be long and complex. Expressions can contain other expressions, you can negate expressions with !, and you can group them with parentheses (...).

For example, the following conditional uses the mutate filter to remove the field secret if the field action has a value of login:

filter {
  if [action] == "login" {
    mutate { remove_field => "secret" }
  }
}

You can specify multiple expressions in a single condition:

output {
  # Send production errors to pagerduty
  if [loglevel] == "ERROR" and [deployment] == "production" {
    pagerduty {
    ...
    }
  }
}

You can use the in operator to test whether a field contains a specific string, key, or list element. Note that the semantic meaning of in can vary, based on the target type. For example, when applied to a string. in means "is a substring of". When applied to a collection type, in means "collection contains the exact value".

filter {
  if [foo] in [foobar] {
    mutate { add_tag => "field in field" }
  }
  if [foo] in "foo" {
    mutate { add_tag => "field in string" }
  }
  if "hello" in [greeting] {
    mutate { add_tag => "string in field" }
  }
  if [foo] in ["hello", "world", "foo"] {
    mutate { add_tag => "field in list" }
  }
  if [missing] in [alsomissing] {
    mutate { add_tag => "shouldnotexist" }
  }
  if !("foo" in ["hello", "world"]) {
    mutate { add_tag => "shouldexist" }
  }
}

You use the not in conditional the same way. For example, you could use not in to only route events to Elasticsearch when grok is successful:

output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch { ... }
  }
}

You can check for the existence of a specific field, but there’s currently no way to differentiate between a field that doesn’t exist versus a field that’s simply false. The expression if [foo] returns false when:

  • [foo] doesn’t exist in the event,
  • [foo] exists in the event, but is false, or
  • [foo] exists in the event, but is null

For more complex examples, see Using Conditionals.

The @metadata fieldedit

In Logstash 1.5 and later, there is a special field called @metadata. The contents of @metadata will not be part of any of your events at output time, which makes it great to use for conditionals, or extending and building event fields with field reference and sprintf formatting.

The following configuration file will yield events from STDIN. Whatever is typed will become the message field in the event. The mutate events in the filter block will add a few fields, some nested in the @metadata field.

input { stdin { } }

filter {
  mutate { add_field => { "show" => "This data will be in the output" } }
  mutate { add_field => { "[@metadata][test]" => "Hello" } }
  mutate { add_field => { "[@metadata][no_show]" => "This data will not be in the output" } }
}

output {
  if [@metadata][test] == "Hello" {
    stdout { codec => rubydebug }
  }
}

Let’s see what comes out:

$ bin/logstash -f ../test.conf
Pipeline main started
asdf
{
    "@timestamp" => 2016-06-30T02:42:51.496Z,
      "@version" => "1",
          "host" => "example.com",
          "show" => "This data will be in the output",
       "message" => "asdf"
}

The "asdf" typed in became the message field contents, and the conditional successfully evaluated the contents of the test field nested within the @metadata field. But the output did not show a field called @metadata, or its contents.

The rubydebug codec allows you to reveal the contents of the @metadata field if you add a config flag, metadata => true:

    stdout { codec => rubydebug { metadata => true } }

Let’s see what the output looks like with this change:

$ bin/logstash -f ../test.conf
Pipeline main started
asdf
{
    "@timestamp" => 2016-06-30T02:46:48.565Z,
     "@metadata" => {
           "test" => "Hello",
        "no_show" => "This data will not be in the output"
    },
      "@version" => "1",
          "host" => "example.com",
          "show" => "This data will be in the output",
       "message" => "asdf"
}

Now you can see the @metadata field and its sub-fields.

Only the rubydebug codec allows you to show the contents of the @metadata field.

Make use of the @metadata field any time you need a temporary field but do not want it to be in the final output.

Perhaps one of the most common use cases for this new field is with the date filter and having a temporary timestamp.

This configuration file has been simplified, but uses the timestamp format common to Apache and Nginx web servers. In the past, you’d have to delete the timestamp field yourself, after using it to overwrite the @timestamp field. With the @metadata field, this is no longer necessary:

input { stdin { } }

filter {
  grok { match => [ "message", "%{HTTPDATE:[@metadata][timestamp]}" ] }
  date { match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}

output {
  stdout { codec => rubydebug }
}

Notice that this configuration puts the extracted date into the [@metadata][timestamp] field in the grok filter. Let’s feed this configuration a sample date string and see what comes out:

$ bin/logstash -f ../test.conf
Pipeline main started
02/Mar/2014:15:36:43 +0100
{
    "@timestamp" => 2014-03-02T14:36:43.000Z,
      "@version" => "1",
          "host" => "example.com",
       "message" => "02/Mar/2014:15:36:43 +0100"
}

That’s it! No extra fields in the output, and a cleaner config file because you do not have to delete a "timestamp" field after conversion in the date filter.

Another use case is the CouchDB Changes input plugin (See https://github.com/logstash-plugins/logstash-input-couchdb_changes). This plugin automatically captures CouchDB document field metadata into the @metadata field within the input plugin itself. When the events pass through to be indexed by Elasticsearch, the Elasticsearch output plugin allows you to specify the action (delete, update, insert, etc.) and the document_id, like this:

output {
  elasticsearch {
    action => "%{[@metadata][action]}"
    document_id => "%{[@metadata][_id]}"
    hosts => ["example.com"]
    index => "index_name"
    protocol => "http"
  }
}