16 February 2013 Engineering

Logging Elasticsearch Events with Logstash (and Elasticsearch)

By Zachary Tong

WARNING: This article contains outdated information. We no longer recommend taking its advice.

A popular use of Elasticsearch is to index and store log files, often through a Logstash server. Logstash takes log files, parses them into fields and stores them in Elasticsearch for easy searching/sorting/faceting. A popular front-end for Logstash is Kibana, which lets you easily dig through your mountains of log data.

Hmm…doesn't Elasticsearch log a variety of events itself? Could you configure Elasticsearch to dump its logs to Logstash…which would then insert them back into Elasticsearch for searching? The answer is yes, and it is surprisingly easy.

At first glance, it seems like the sort of thing that will tear a hole in the fabric of space and time. Or at the very least destroy your cluster.

The fact is, this could be a downright bad idea. But hey, it's possible, simple and fun! Perhaps someone will find this useful and applicable for their setup.

Configuring Elasticsearch

The first step is to configure Elasticsearch so that logs can be piped into Logstash. There are several ways to do this in Log4J, but the method chosen was to pipe the logs to Logstash over a socket.

The advantage of this method is that you can have several ES nodes all sending data over a socket to a single Logstash server. If you chose something else, like monitoring files, Logstash would need access to the log files on every node.

There are two modifications that you must make to the default `logging.yml` file (the final logging.yml can be found in this gist). First, tell ES that we are adding a new output logger by appending a "socketappender" label to the `rootLogger` line:

rootLogger: INFO, console, file, socketappender

Next, place this configuration code at the bottom of your file:

socketappender:
   type: org.apache.log4j.net.SocketAppender
   port: 9500
   remoteHost: localhost
   layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

This config block says that the label "socketappender" refers to an `org.apache.log4j.net.SocketAppender`, which connects to `remoteHost` on `port` 9500. You can, of course, change the host and port to suit your fancy. The layout section is copied from the other appenders in the file, where it controls how the log output is formatted. Note that Log4J's SocketAppender ships serialized logging events rather than formatted text, so the layout likely has no effect on what Logstash receives.

Save the file and restart Elasticsearch, and you're done with configuring ES. Simple!

Configuring Logstash

I'm going to assume that you have Logstash installed and know how to use it. What we need to do is tell Logstash to act as a Log4J server, so that it can accept socket connections from Elasticsearch. Create a new configuration file and place this inside:

input {
   log4j {
      mode => "server"
      port => 9500
      type => eslogs
   }
}
output {
   stdout { debug => true debug_format => "json"}
   elasticsearch {
      type => eslogs
      cluster => "elasticsearch"
   }
}

This config file makes Logstash act as a Log4J server on port 9500. Any log events accepted over the Log4J server will be given the "eslogs" type. The output section is fairly typical of an ES-Logstash setup; adjust as necessary to connect to your ES cluster. Make sure the "type" matches whatever was set in the input. Fire up Logstash with this configuration and you are all done!
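Since the SocketAppender has to connect to whatever is listening on port 9500, it can help to confirm that Logstash is actually accepting connections there. The following is a minimal Python sketch of such a check, not part of the original setup; the helper name `port_is_open` is hypothetical, and the host and port simply mirror the values used in the configs above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only proves that *something* is listening,
    not that the listener is Logstash's log4j input.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `port_is_open("localhost", 9500)` on the Logstash machine should return `True` once Logstash is up. Running the same check from each ES node against the Logstash host is a quick way to rule out firewall problems.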

Conclusion

If everything is set up correctly, ES will now be piping log data into Logstash, which will then pipe it back into your ES cluster. If you have Kibana installed, you can easily start searching your ES log data.
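If you would rather query the logs directly instead of going through Kibana, a search request along the following lines may work. This is only a sketch under several assumptions that are not in the original article: that Logstash writes to its default daily `logstash-*` indices, that the documents end up with the `eslogs` type, and that events carry an `@timestamp` field to sort on. The function name `build_eslogs_search` is hypothetical, and it only builds the request rather than sending it.

```python
import json


def build_eslogs_search(query_text, size=10,
                        base_url="http://localhost:9200",
                        index_pattern="logstash-*",
                        doc_type="eslogs"):
    """Return (url, body) for a search over the indexed Elasticsearch logs.

    Assumptions: index_pattern, doc_type and the @timestamp sort field
    depend on your Logstash version and output settings; adjust them.
    """
    url = "{}/{}/{}/_search".format(base_url, index_pattern, doc_type)
    body = {
        # query_string lets you type Lucene-style queries such as "WARN".
        "query": {"query_string": {"query": query_text}},
        # Newest events first.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": size,
    }
    return url, json.dumps(body)
```

The returned URL and JSON body can be passed to any HTTP client, for example as the target and request body of a `curl` call against your cluster.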

The usefulness of this entire setup is debatable. On one hand, it gives you the power and flexibility of ES for digging into your ES logs, probably after a crisis has been resolved and you are looking for the root cause. It centralizes and collects all the logs of your cluster into one location for debugging, instead of hunting through a dozen individual log files and trying to piece together the sequence of events.

On the other hand, if nodes are dropping in and out of your cluster due to problems, their log data is by default not going to be available…so this setup would not be overly useful during the actual crisis itself. It also adds more overhead to your cluster in terms of indexing, which could be the last thing you want if your cluster is melting down. It could also generate a significant amount of traffic depending on your logging level (e.g. you probably don't want DEBUG enabled on all your nodes).

Regardless, it's a fun little experiment that makes you appreciate the power and flexibility of all the tools described above.