13 July 2016 Engineering

Using Beats with Amazon AWS

By Dara Gies

Ever-increasing quantities and varieties of data are being created and captured. Application, system and user behavior logs, network packets, sensor metrics... you name it. The availability of these data drives a diverse and vast set of use cases ranging from intrusion detection to user behavior analysis, network monitoring, machine monitoring, and remote sensor monitoring, just to name a few.

Beats are open source, purpose-built, lightweight, and efficient agents that acquire and feed data natively to Elasticsearch. Optionally, Beats can feed data to Logstash for further refinement before being forwarded to Elasticsearch.

As part of the Elastic Stack, Beats is a telemetry architecture, enabling the capture and transmission of measurements and other data from remote sources to Elasticsearch for analysis, aggregation, and search. Anywhere there is change, such as production and consumption, growth and demise, acceleration and deceleration, data has significance. GPS-driven tractors capture soil depths, soil fertility, and ground temperatures, all of which influence agricultural strategy. Ocean buoys record wave amplitudes that may help to warn of impending catastrophes or inform of welcome beach conditions. Automobiles capture thousands of metrics ranging from tire pressure to average speed to fuel consumption that facilitate an understanding of a vehicle's abilities and failures, as well as the owner's driving behaviors. Network sensors capture machine communication messages that help with understanding usage patterns that drive security and infrastructure policies.

In the previous article, Running Elasticsearch on AWS, we configured a three-node Elasticsearch cluster on AWS EC2. It was fairly straightforward and didn't take much time or effort. Here we will expand on that example and configure Beats to feed the Elasticsearch cluster running on AWS EC2 instances. This will demonstrate how simple it is to configure a centralized machine resource monitoring solution.

First, in Part I, we'll provide an overview of Beats and review the available Beats and the environments in which they run. Then, in Part II, we'll walk through a step-by-step example of installing and configuring Beats and verifying their output.

Part I - Beats Overview

Beats are purpose-built lightweight data shippers, or agents, that run on remote machines and feed Elasticsearch instances. Beats make it easy to get data into Elasticsearch. Beats are available for a number of operating systems, including Debian, Red Hat and other Linux distributions, and Mac OS X.

There are several available Beats including Filebeat, Metricbeat, Packetbeat, Winlogbeat and Topbeat. Each Beat has a specific purpose or multiple purposes that are logically related, allowing each Beat to focus on its specific task and do it well.

Filebeat tails logs and can ship data to Logstash for further refinement, or directly to Elasticsearch for analysis and search. Filebeat can be installed on any machine that has applications that generate log data, such as a database or application server.

Metricbeat (Alpha) captures operating system metrics as well as metrics from services such as Apache web server and Redis. Metricbeat does everything Topbeat does and more: it captures operating system metrics such as per-process CPU, memory, and storage use, along with data from common applications such as web servers. Metricbeat will replace Topbeat, so you might consider using Metricbeat for new development.

Packetbeat captures web, database, and other network protocols, enabling Kibana real-time analytics. Packetbeat is extensible, enabling the addition of new protocols, metrics and analytics.

Winlogbeat captures Windows event log system, application and security data, enabling monitoring of Windows machines.

Topbeat captures per-process memory, CPU and disk usage statistics. Topbeat can be installed across a set of machines, with its data indexed into a central Elasticsearch cluster, enabling centralized machine resource monitoring. Topbeat is being replaced by Metricbeat, currently in Alpha, and the change is discussed in this article.

If you have metric data not addressed by available Beats, there's also Libbeat, a framework for developing new Beats.

The Beats Community is growing and developers are contributing everything from pingbeat, for capturing ping roundtrips, to redisbeat, for Redis monitoring, and to hsbeat, for capturing Java HotSpot VM performance counters.

Part II - Configuring Beats

In this example we will install Topbeat and Packetbeat on a development machine, in this case a Macbook Pro, and feed an Elasticsearch cluster running on AWS. Beats can be installed on any machine running a supported operating system that has network connectivity with the Elasticsearch cluster.

Configure Topbeat

Download Topbeat to a directory on the machine that will be running the Beats.

Download and extract the Topbeat archive using your preferred method, for example:

curl -L -O https://download.elastic.co/beats/topbeat/topbeat-1.2.3-darwin.tgz
tar xzvf topbeat-1.2.3-darwin.tgz

get_topbeat.png

This will create a folder named topbeat-1.2.3-darwin that contains three files: topbeat, topbeat.yml, and topbeat.template.json.
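The download step follows the same naming pattern for every Beat, so it can be sketched with the Beat name, version, and platform as parameters. This is only an illustration: beat_archive and beat_url are hypothetical helper names, the values are the ones used in this article, and the commands are printed rather than executed so that nothing is downloaded.

```shell
# Hypothetical helpers: build the archive name and download URL for a Beat.
# Arguments: beat name, version, platform (values below come from this article).
beat_archive() {
  printf '%s-%s-%s.tgz' "$1" "$2" "$3"
}

beat_url() {
  printf 'https://download.elastic.co/beats/%s/%s' "$1" "$(beat_archive "$1" "$2" "$3")"
}

ARCHIVE="$(beat_archive topbeat 1.2.3 darwin)"
URL="$(beat_url topbeat 1.2.3 darwin)"

# Print the commands instead of running them, so the sketch has no side effects.
echo "curl -L -O ${URL}"
echo "tar xzvf ${ARCHIVE}"
```

Calling the helpers with packetbeat as the first argument yields the corresponding Packetbeat archive name and URL.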

Locate and open topbeat.yml in your preferred editor. There is a hosts property that tells Topbeat where Elasticsearch is located. Locate the hosts parameter and update it to refer to the AWS public hostname, for example:

topbeat_hosts.png
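As a text version of the edit, the sketch below prints what the relevant part of topbeat.yml might look like afterwards. It assumes the Beats 1.x layout, in which hosts sits under output and elasticsearch, and it assumes Elasticsearch is listening on its default HTTP port, 9200. The hostname is the example one from this article, and render_output_section is a hypothetical helper, not part of Topbeat.

```shell
# Example hostname from this article; replace it with your own EC2 public hostname.
ES_HOST="ec2-50-17-114-78.compute-1.amazonaws.com"

# Hypothetical helper: print the output section for a given Elasticsearch host.
# Assumes the Beats 1.x configuration layout and the default port 9200.
render_output_section() {
  cat <<EOF
output:
  elasticsearch:
    hosts: ["$1:9200"]
EOF
}

render_output_section "$ES_HOST"
```

The printed fragment is meant for comparison with your own file; the rest of topbeat.yml can stay as shipped.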

Next, find the template property, which specifies the index template containing the field mappings required by Topbeat. Topbeat will load the index template file topbeat.template.json into Elasticsearch. Uncomment the template property along with the name and path properties that immediately follow.

topbeat_template.png
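Once uncommented, the template settings might look like the fragment printed by this sketch. It assumes the template block is nested under the elasticsearch output, as in the Beats 1.x sample configuration, and that the template file sits next to the binary. render_template_section is a hypothetical helper used only to print the fragment.

```shell
# Hypothetical helper: print the uncommented template block for a given Beat.
# The four-space indent assumes the block is nested under output -> elasticsearch.
render_template_section() {
  cat <<EOF
    template:
      name: "$1"
      path: "$1.template.json"
EOF
}

render_template_section topbeat
```

YAML is indentation-sensitive, so when removing the comment markers by hand, keep name and path indented one level deeper than template.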

Then, run Topbeat with the command:

sudo ./topbeat -e -c topbeat.yml

To verify documents are being indexed, we'll use curl to make a search request:

curl -XGET 'http://ec2-50-17-114-78.compute-1.amazonaws.com:9200/topbeat-2016.07.07/_search?pretty'

topbeat_request.png

The curl request has several components. The first is the AWS EC2 public hostname "ec2-50-17-114-78.compute-1.amazonaws.com", which should be changed to your EC2 instance's public hostname. The second component, "topbeat-2016.07.07", is the index; its name ends with the date on which the documents were indexed, so adjust it to match the day you run Topbeat. The last component is the search request "_search" with an optional "pretty" argument to format the JSON response.
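Because the index name changes every day, it can help to build the request URL from its parts. The sketch below is a minimal illustration: search_url is a hypothetical helper, the hostname is the example one from this article, and using the current UTC date for today's index is an assumption about how the daily index is named. The curl command is printed rather than executed.

```shell
# Example hostname from this article; replace it with your own EC2 public hostname.
ES_HOST="ec2-50-17-114-78.compute-1.amazonaws.com"

# Hypothetical helper: build the search URL for a Beat's daily index.
# Arguments: beat name, date in YYYY.MM.DD form.
search_url() {
  printf 'http://%s:9200/%s-%s/_search?pretty' "$ES_HOST" "$1" "$2"
}

# Assumption: the daily index is named after the current UTC date.
TODAY="$(date -u +%Y.%m.%d)"

# Print the request for today's index instead of sending it.
echo "curl -XGET '$(search_url topbeat "$TODAY")'"
```

Passing 2016.07.07 as the date reproduces the request shown above.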

If you want to dive deeper into Topbeat, a comprehensive Topbeat Reference is available and it covers topics such as configuration and security.

Configure Packetbeat

Download Packetbeat to a directory on the machine that will be running the Beats. Download and extract the Packetbeat archive using your preferred method, for example:

curl -L -O https://download.elastic.co/beats/packetbeat/packetbeat-1.2.3-darwin.tgz
tar xzvf packetbeat-1.2.3-darwin.tgz

get_packetbeat.png

This will create a folder named packetbeat-1.2.3-darwin that contains three files: packetbeat, packetbeat.yml, and packetbeat.template.json.

Locate and open packetbeat.yml in your preferred editor. There is a hosts property that tells Packetbeat where Elasticsearch is located. Locate the hosts parameter and update it to refer to the AWS public hostname, for example:

packetbeat_hosts.png

Next, find the template property, which specifies the index template containing the field mappings required by Packetbeat. Packetbeat will load the index template file packetbeat.template.json into Elasticsearch. Uncomment the template property along with the name and path properties that immediately follow.

packetbeat_template.png

Then, run Packetbeat with the command:

sudo ./packetbeat -e -c packetbeat.yml

To verify documents are being indexed, we'll use curl to make a search request:

curl -XGET 'http://ec2-50-17-114-78.compute-1.amazonaws.com:9200/packetbeat-2016.07.07/_search?pretty'

packetbeat_request.png
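With both Beats running, a quick way to confirm that each one is indexing is to ask Elasticsearch for a document count across all of that Beat's daily indices. The sketch below builds such requests using the _count endpoint and a wildcard index pattern. count_url is a hypothetical helper, the hostname is the example one from this article, and the commands are printed rather than executed.

```shell
# Example hostname from this article; replace it with your own EC2 public hostname.
ES_HOST="ec2-50-17-114-78.compute-1.amazonaws.com"

# Hypothetical helper: build a _count URL covering every daily index of a Beat.
count_url() {
  printf 'http://%s:9200/%s-*/_count?pretty' "$ES_HOST" "$1"
}

# Print one count request per Beat configured in this example.
for beat in topbeat packetbeat; do
  echo "curl -XGET '$(count_url "$beat")'"
done
```

A count that grows between two runs of the same request suggests the Beat is still shipping data.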

If you want to dive deeper into Packetbeat, a comprehensive Packetbeat Reference is available and it covers topics such as configuration and security.

Part III - Summary

Installing, configuring, and running Beats is straightforward. Beats address a broad set of use cases, and they remove much of the work of acquiring data and feeding it to Elasticsearch.

Beats can feed any network-accessible Elasticsearch cluster, whether it resides on Amazon AWS EC2, Microsoft Azure, dedicated hardware, or Elastic Cloud.

In the next installment of the AWS blog series, we'll cover installing Kibana on AWS EC2 and visualizing the Beats data indexed in this example.