UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.
Packetbeat is the tool that lets you ship your network data into Elasticsearch so you can visualize it in Kibana.
How Does It Work?
Packetbeat taps into the network stack on the machine it's running on, analyzes all the traffic for known protocols and extracts useful information from each packet, before shipping that information into Elasticsearch. Currently it supports HTTP, PostgreSQL, MySQL, Redis and Thrift.
Packetbeat is a competitor to Logstash, you say? Well, yes and no. There is some overlap between Packetbeat and Logstash, they both feed data into Elasticsearch, but their use cases are still slightly different. Packetbeat focuses on network traffic while Logstash focuses on logs. Both are capable of extracting metrics and Packetbeat also has a sibling called Filebeat that can consume log files. Filebeat can either feed logs directly into Elasticsearch or forward them to Logstash for centralised preprocessing.
Installing Packetbeat is very straightforward. Download the package matching your platform from https://www.elastic.co/downloads/beats/packetbeat and install it with your package system, or simply extract the archive if you opted for one of the generic packages.
Once Packetbeat is installed, it is time to edit packetbeat.yml. The Debian and RPM packages install this file directly to /etc/packetbeat/packetbeat.yml, but the generic packages just leave it in the directory you extracted it to.
There are two noteworthy properties in this section.

The name attribute uniquely identifies the server Packetbeat is monitoring, and it defaults to the host name. You can safely leave it blank or give it a name that makes sense to you.

The tags attribute allows you to specify tags that will be applied to all data originating from this server. Use it for whatever makes sense to you, but tags like test, staging, production, database or application are not uncommon. In the end, tags are useful when you want to filter data from different groups of servers.
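Put together, a minimal sketch of this part of packetbeat.yml might look like the following. The section name shipper and the sample values are assumptions based on the configuration format of the time, not taken from this article:

```yaml
# Hypothetical example; adapt the values to your own servers.
shipper:
  # Uniquely identifies this server; defaults to the host name if left blank.
  name: "web-01"
  # Applied to all data originating from this server.
  tags: ["production", "application"]
```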
A short section, but no less important. This is where you specify how Packetbeat will integrate with the networking stack on the server. At minimum, you should specify which devices to monitor. On my Mac, I used something as simple as this:
interfaces:
  device: en0
The device names are platform specific, and you should choose what matches your platform. Run ifconfig in your terminal for a list of devices. On Macs, the Wi-Fi and Ethernet interfaces are usually named en0 or en1, while on Linux names like eth0 and eth1 are common. There is also a special device name, any, which makes Packetbeat listen on all available devices.
Packetbeat supports a few more options in this section, each with different trade-offs that are out of scope for this article, but before deploying into production, have a look at the interfaces section in the guide.
Packetbeat currently supports monitoring HTTP, PostgreSQL, MySQL, Redis and Thrift. For all protocols there is one mandatory parameter, ports. For demo purposes, declaring the ports your protocols run on is usually sufficient, but for the HTTP protocol in particular there are a few things to be aware of.
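For example, a protocols section declaring just the ports for a few of the supported protocols could look like this sketch (the exact layout is an assumption based on the configuration format of the time):

```yaml
# Hypothetical example: tell Packetbeat which ports each protocol runs on.
protocols:
  http:
    ports: [80, 8080]
  mysql:
    ports: [3306]
  redis:
    ports: [6379]
```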
The hide_keywords parameter allows you to specify form fields in GET or POST requests that should not be indexed. This is important since some fields may contain sensitive data, like passwords in clear text.
A very common setup is to have a standard HTTP server, like Apache or Nginx, in front of your application server and let it handle things like SSL termination. One disadvantage of this is that it hides the client IP that the request originated from. Luckily, Apache, Nginx and most other servers you would want to use in such a scenario support adding the client IP in a header on each request. Packetbeat can then be configured with the real_ip_header parameter to extract this information.
Terminating SSL before it reaches the application server is also the only way to let Packetbeat monitor services provided over HTTPS.
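A hypothetical HTTP section combining these options might look like the sketch below. The header name X-Forwarded-For is just a common choice, not something prescribed by the article; use whatever header your front-end server actually sets:

```yaml
# Hypothetical example of HTTP-specific options.
protocols:
  http:
    ports: [80, 8080]
    # Form fields that should never be indexed.
    hide_keywords: ["password", "passwd"]
    # Header the front-end proxy uses to pass on the original client IP.
    real_ip_header: "X-Forwarded-For"
```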
Like Logstash, Packetbeat supports multiple outputs and most frequently people use the Elasticsearch output. In this example I've defined the necessary parameters for connecting to a cluster on Found. Specifically this includes:
output:
  elasticsearch:
    enabled: true
    host: 20ea7fisk409df2c3b523a2b131d.eu-west-1.aws.found.io
    port: 9243
    save_topology: true
    protocol: https
    username: readwrite
    password: <password>
This section is optional and disabled by default, but on Linux systems it allows monitoring traffic between processes on the same host. It will also populate the client_proc fields. All that is required is to give the process a name and a grep filter so that Packetbeat can identify the process id.
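As a sketch, a procs section monitoring a MySQL server might look like this (the field names process and cmdline_grep are assumptions based on the configuration format of the time):

```yaml
# Hypothetical example: correlate traffic with a local mysqld process.
procs:
  enabled: true
  monitored:
      # The name reported in the client_proc fields.
    - process: mysqld
      # Pattern used to find the process id in the process command line.
      cmdline_grep: mysqld
```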
The complete reference for packetbeat.yml is available in The Guide.
The Debian and RPM packages include launch scripts, so you can launch Packetbeat like any other service in your distribution. For the generic packages you might have to specify the location of packetbeat.yml, like this:
sudo ./packetbeat -c ./packetbeat.yml -v
The -v flag (verbose) is also useful the first time you run Packetbeat after a change to packetbeat.yml. Privileged access (sudo) is required for sniffing packets.
Visualizing The Data
Packetbeat feeds data into Elasticsearch in a format that is timestamped and suitable for visualizing with Kibana. You can use either Kibana 3 or 4.
Packetbeat does not come with any Kibana dashboards out of the box, but in all likelihood you will want to build your own anyway: a dashboard targeting your data and the questions you have. If you have used Kibana previously, all you really need to know to get started is the index pattern to use, which is: "[packetbeat-]YYYY.MM.DD". If you've never used Kibana before, I recommend An Introduction to the ELK stack. There is also a live demo of Kibana 4 with data fed by Packetbeat.