Monitor Amazon Web Services (AWS) with Amazon Data Firehose

Monitor Amazon Web Services (AWS) with Amazon Data Firehoseedit

Amazon Data Firehose is a popular service that allows you to send your service logs and monitoring metrics to Elastic in minutes without a single line of code and without building or managing your own data ingestion and delivery infrastructure.

What you’ll learnedit

In this tutorial, you’ll learn how to:

Install AWS integration in Kibana
Create a delivery stream in Amazon Data Firehose
Specify the destination settings for your Firehose stream
Send data to the Firehose delivery stream

Before you beginedit

Create a deployment in AWS regions (including gov cloud) using our hosted Elasticsearch Service on Elastic Cloud. The deployment includes an Elasticsearch cluster for storing and searching your data, and Kibana for visualizing and managing your data.

Step 1: Install AWS integration in Kibanaedit

Install AWS integrations to load index templates, ingest pipelines, and dashboards into Kibana. In Kibana, navigate to Management > Integrations in the sidebar. Find the AWS Integration by browsing the catalog.
Navigate to the Settings tab and click Install AWS assets. Confirm by clicking Install AWS in the popup.
Install Amazon Data Firehose integration assets in Kibana.

Step 2: Create a delivery stream in Amazon Data Firehoseedit

Go to the AWS console and navigate to Amazon Data Firehose.
Click Create Firehose stream and choose the source and destination of your Firehose stream. Unless you are streaming data from Kinesis Data Streams, set source to Direct PUT and destination to Elastic.
Provide a meaningful Firehose stream name that will allow you to identify this delivery stream later.

For advanced use cases, source records can be transformed by invoking a custom Lambda function. When using Elastic integrations, this should not be required.

Step 3: Specify the destination settings for your Firehose streamedit

From the Destination settings panel, specify the following settings:
- Elastic endpoint URL: Enter the Elastic endpoint URL of your Elasticsearch cluster. To find the Elasticsearch endpoint, go to the Elastic Cloud console and select Connection details. Here is an example of how it looks like: https://my-deployment.es.us-east-1.aws.elastic-cloud.com.
- API key: Enter the encoded Elastic API key. To create an API key, go to the Elastic Cloud console, select Connection details and click Create and manage API keys. If you are using an API key with Restrict privileges, make sure to review the Indices privileges to provide at least "auto_configure" & "write" permissions for the indices you will be using with this delivery stream.
- Content encoding: For a better network efficiency, leave content encoding set to GZIP.
- Retry duration: Determines how long Firehose continues retrying the request in the event of an error. A duration of 60-300s should be suitable for most use cases.
- Parameters:
  - es_datastream_name: This parameter is optional and can be used to set which data stream documents will be stored. If this parameter is not specified, data is sent to the logs-awsfirehose-default data stream by default.
  - include_cw_extracted_fields: This parameter is optional and can be set when using a CloudWatch logs subscription filter as the Firehose data source. When set to true, extracted fields generated by the filter pattern in the subscription filter will be collected. Setting this parameter can add many fields into each record and may significantly increase data volume in Elasticsearch. As such, use of this parameter should be carefully considered and used only when the extracted fields are required for specific filtering and/or aggregation.
  - set_es_document_id: This parameter is optional and can be set to allow Elasticsearch to assign each document a random ID or use a calculated unique ID for each document. Default is true. When set to false, a random ID will be used for each document which will help indexing performance.
    
    In the Backup settings panel, it is recommended to configure S3 backup for failed records. It’s then possible to configure workflows to automatically retry failed records, for example by using Elastic Serverless Forwarder.

Step 4: Send data to the Firehose delivery streamedit

You can configure a variety of log sources to send data to Firehose streams directly for example VPC flow logs. Some services don’t support publishing logs directly to Firehose but they do support publishing logs to CloudWatch logs, such as CloudTrail and Lambda. Refer to the AWS documentation for more information.

For example, a typical workflow for sending CloudTrail logs to Firehose would be the following:

Publish CloudTrail logs to a Cloudwatch log group. Refer to the AWS documentation about publishing CloudTrail logs.
Create a subscription filter in the CloudWatch log group to the Firehose stream. Refer to the AWS documentation about using subscription filters.

We also added support for sending CloudWatch monitoring metrics to Elastic using Firehose. For example, you can configure metrics ingestion by creating a metric stream through CloudWatch. You can select an existing Firehose stream by choosing the option Custom setup with Firehose. For more information, refer to the AWS documentation about the custom setup with Firehose.

For more information on Amazon Data Firehose, you can also check the Amazon Data Firehose Integrations documentation.

« Monitor Amazon Web Services (AWS) with Beats Monitor Virtual Private Cloud (VPC) Flow Logs »