IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

« AWS CloudWatch input Azure eventhub input »

› › ›

AWS S3 input

edit

IMPORTANT: This documentation is no longer updated. Refer to Elastic's version policy and the latest documentation.

AWS S3 input

edit

Use the aws-s3 input to retrieve logs from S3 objects that are pointed by messages from specific SQS queues. This input can, for example, be used to receive S3 server access logs to monitor detailed records for the requests that are made to a bucket.

When processing a S3 object which pointed by a SQS message, if half of the set visibility timeout passed and the processing is still ongoing, then the visibility timeout of that SQS message will be reset to make sure the message does not go back to the queue in the middle of the processing. If there are errors happening during the processing of the S3 object, then the process will be stopped and the SQS message will be returned back to the queue.

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.ap-southeast-1.amazonaws.com/1234/test-s3-queue
  credential_profile_name: elastic-beats
  expand_event_list_from_field: Records

The aws-s3 input supports the following configuration options plus the Common options described later.

`api_timeout`

edit

The maximum duration of the AWS API call. If it exceeds the timeout, the AWS API call will be interrupted. The default AWS API call timeout for a message is 120 seconds. The maximum is half of the visibility timeout value.

`buffer_size`

edit

The size in bytes of the buffer that each harvester uses when fetching a file. This only applies to non-JSON logs. The default is 16 KiB.

`content_type`

edit

A standard MIME type describing the format of the object data. This can be set to override the MIME type that was given to the object when it was uploaded. For example: application/json.

`encoding`

edit

The file encoding to use for reading data that contains international characters. This only applies to non-JSON logs. See encoding.

`expand_event_list_from_field`

edit

If the fileset using this input expects to receive multiple messages bundled under a specific field then the config option expand_event_list_from_field value can be assigned the name of the field. This setting will be able to split the messages under the group value into separate events. For example, CloudTrail logs are in JSON format and events are found under the JSON object "Records".

{
    "Records": [
        {
            "eventVersion": "1.07",
            "eventTime": "2019-11-14T00:51:00Z",
            "awsRegion": "us-east-1",
            "eventID": "EXAMPLE8-9621-4d00-b913-beca2EXAMPLE",
        },
        {
            "eventVersion": "1.07",
            "eventTime": "2019-11-14T00:52:00Z",
            "awsRegion": "us-east-1",
            "eventID": "EXAMPLEc-28be-486c-8928-49ce6EXAMPLE",
        }
    ]
}

Note: When expand_event_list_from_field parameter is given in the config, aws-s3 input will assume the logs are in JSON format and decode them as JSON. Content type will not be checked. If a file has "application/json" content-type, expand_event_list_from_field becomes required to read the JSON file.

`file_selectors`

edit

If the SQS queue will have events that correspond to files that Filebeat shouldn’t process file_selectors can be used to limit the files that are downloaded. This is a list of selectors which are made up of regex and expand_event_list_from_field options. The regex should match the S3 object key in the SQS message, and the optional expand_event_list_from_field is the same as the global setting. If file_selectors is given, then any global expand_event_list_from_field value is ignored in favor of the ones specified in the file_selectors. Regex syntax is the same as the Go language. Files that don’t match one of the regexes won’t be processed. content_type, parsers, include_s3_metadata,max_bytes, buffer_size, and encoding may also be set for each file selector.

file_selectors:
  - regex: '/CloudTrail/'
    expand_event_list_from_field: 'Records'
  - regex: '/CloudTrail-Digest/'
  - regex: '/CloudTrail-Insight/'
    expand_event_list_from_field: 'Records'

`fips_enabled`

edit

Enabling this option changes the service name from s3 to s3-fips for connecting to the correct service endpoint. For example: s3-fips.us-gov-east-1.amazonaws.com.

`include_s3_metadata`

edit

This input can include S3 object metadata in the generated events for use in follow-on processing. You must specify the list of keys to include. By default none are included. If the key exists in the S3 response then it will be included in the event as aws.s3.metadata.<key> where the key name as been normalized to all lowercase.

include_s3_metadata:
  - last-modified
  - x-amz-version-id

`max_bytes`

edit

The maximum number of bytes that a single log message can have. All bytes after max_bytes are discarded and not sent. This setting is especially useful for multiline log messages, which can get large. This only applies to non-JSON logs. The default is 10 MiB.

`max_number_of_messages`

edit

The maximum number of messages to return. Amazon SQS never returns more messages than this value (however, fewer messages might be returned). Valid values: 1 to 10. Default: 5.

`parsers`

edit

This functionality is in beta and is subject to change. The design and code is less mature than official GA features and is being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.

This option expects a list of parsers that non-JSON logs go through.

Available parsers:

multiline

In this example, Filebeat is reading multiline messages that consist of XML that start with the <Event> tag.

filebeat.inputs:
- type: aws-s3
  ...
  parsers:
    - multiline:
        pattern: "^<Event"
        negate:  true
        match:   after

See the available parser settings in detail below.

`multiline`

edit

Options that control how Filebeat deals with log messages that span multiple lines. See Multiline messages for more information about configuring multiline options.

`queue_url`

edit

URL of the AWS SQS queue that messages will be received from. Required.

`visibility_timeout`

edit

The duration that the received messages are hidden from subsequent retrieve requests after being retrieved by a ReceiveMessage request. This value needs to be a lot bigger than Filebeat collection frequency so if it took too long to read the S3 log, this SQS message will not be reprocessed. The default visibility timeout for a message is 300 seconds. The maximum is 12 hours.

`aws credentials`

edit

In order to make AWS API calls, aws-s3 input requires AWS credentials. Please see AWS credentials options for more details.

AWS Permissions

edit

Specific AWS permissions are required for IAM user to access SQS and S3:

s3:GetObject
sqs:ReceiveMessage
sqs:ChangeMessageVisibility
sqs:DeleteMessage

S3 and SQS setup

edit

Enable bucket notification: any new object creation in S3 bucket will also create a notification through SQS. Please see create-sqs-queue-for-notification for more details.

Parallel Processing

edit

Multiple Filebeat instances can read from the same SQS queues at the same time. To horizontally scale processing when there are large amounts of log data flowing into an S3 bucket, you can run multiple Filebeat instances that read from the same SQS queues at the same time. No additional configuration is required.

Using SQS ensures that each message in the queue is processed only once even when multiple Filebeat instances are running in parallel. To prevent Filebeat from receiving and processing the message more than once, set the visibility timeout.

The visibility timeout begins when SQS returns a message to Filebeat. During this time, Filebeat processes and deletes the message. However, if Filebeat fails before deleting the message and your system doesn’t call the DeleteMessage action for that message before the visibility timeout expires, the message becomes visible to other Filebeat instances, and the message is received again. By default, the visibility timeout is set to 5 minutes for aws-s3 input in Filebeat. 5 minutes is sufficient time for Filebeat to read SQS messages and process related s3 log files.

Metrics

edit

This input exposes metrics under the HTTP monitoring endpoint. These metrics are exposed under the /dataset path. They can be used to observe the activity of the input.

Metric	Description
`sqs_messages_received_total`	Number of SQS messages received (not necessarily processed fully).
`sqs_visibility_timeout_extensions_total`	Number of SQS visibility timeout extensions.
`sqs_messages_inflight_gauge`	Number of SQS messages inflight (gauge).
`sqs_messages_returned_total`	Number of SQS message returned to queue (happens on errors implicitly after visibility timeout passes).
`sqs_messages_deleted_total`	Number of SQS messages deleted.
`sqs_message_processing_time`	Histogram of the elapsed SQS processing times in nanoseconds (time of receipt to time of delete/return).
`s3_objects_requested_total`	Number of S3 objects downloaded.
`s3_bytes_processed_total`	Number of S3 bytes processed.
`s3_events_created_total`	Number of events created from processing S3 data.
`s3_objects_inflight_gauge`	Number of S3 objects inflight (gauge).
`s3_object_processing_time`	Histogram of the elapsed S3 object processing times in nanoseconds (start of download to completion of parsing).

Common options

edit

The following configuration options are supported by all inputs.

`enabled`

edit

Use the enabled option to enable and disable inputs. By default, enabled is set to true.

`tags`

edit

A list of tags that Filebeat includes in the tags field of each published event. Tags make it easy to select specific events in Kibana or apply conditional filtering in Logstash. These tags will be appended to the list of tags specified in the general configuration.

Example:

filebeat.inputs:
- type: aws-s3
  . . .
  tags: ["json"]

`fields`

edit

Optional fields that you can specify to add additional information to the output. For example, you might add fields that you can use for filtering log data. Fields can be scalar values, arrays, dictionaries, or any nested combination of these. By default, the fields that you specify here will be grouped under a fields sub-dictionary in the output document. To store the custom fields as top-level fields, set the fields_under_root option to true. If a duplicate field is declared in the general configuration, then its value will be overwritten by the value declared here.

filebeat.inputs:
- type: aws-s3
  . . .
  fields:
    app_id: query_engine_12

`fields_under_root`

edit

If this option is set to true, the custom fields are stored as top-level fields in the output document instead of being grouped under a fields sub-dictionary. If the custom field names conflict with other field names added by Filebeat, then the custom fields overwrite the other fields.

`processors`

edit

A list of processors to apply to the input data.

See Processors for information about specifying processors in your config.

`pipeline`

edit

The Ingest Node pipeline ID to set for the events generated by this input.

The pipeline ID can also be configured in the Elasticsearch output, but this option usually results in simpler configuration files. If the pipeline is configured both in the input and output, the option from the input is used.

`keep_null`

edit

If this option is set to true, fields with null values will be published in the output document. By default, keep_null is set to false.

`index`

edit

If present, this formatted string overrides the index for events from this input (for elasticsearch outputs), or sets the raw_index field of the event’s metadata (for other outputs). This string can only refer to the agent name and version and the event timestamp; for access to dynamic fields, use output.elasticsearch.index or a processor.

Example value: "%{[agent.name]}-myindex-%{+yyyy.MM.dd}" might expand to "filebeat-myindex-2019.11.01".

`publisher_pipeline.disable_host`

edit

By default, all events contain host.name. This option can be set to true to disable the addition of this field to all events. The default value is false.

AWS Credentials Configuration

edit

To configure AWS credentials, either put the credentials into the Filebeat configuration, or use a shared credentials file, as shown in the following examples.

Configuration parameters

edit

access_key_id: first part of access key.
secret_access_key: second part of access key.
session_token: required when using temporary security credentials.
credential_profile_name: profile name in shared credentials file.
shared_credential_file: directory of the shared credentials file.
role_arn: AWS IAM Role to assume.
endpoint: URL of the entry point for an AWS web service. Most AWS services offer a regional endpoint that can be used to make requests. The general syntax of a regional endpoint is protocol://service-code.region-code.endpoint-code. Some services, such as IAM, do not support regions. The endpoints for these services do not include a region. In aws module, endpoint config is to set the endpoint-code part, such as amazonaws.com, amazonaws.com.cn, c2s.ic.gov, sc2s.sgov.gov.

Supported Formats

edit

The examples in this section refer to Metricbeat, but the credential options for authentication with AWS are the same no matter which Beat is being used.

Use access_key_id, secret_access_key, and/or session_token

Users can either put the credentials into the Metricbeat module configuration or use environment variable AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and/or AWS_SESSION_TOKEN instead.

If running on Docker, these environment variables should be added as a part of the docker command. For example, with Metricbeat:

$ docker run -e AWS_ACCESS_KEY_ID=abcd -e AWS_SECRET_ACCESS_KEY=abcd -d --name=metricbeat --user=root --volume="$(pwd)/metricbeat.aws.yml:/usr/share/metricbeat/metricbeat.yml:ro" docker.elastic.co/beats/metricbeat:7.11.1 metricbeat -e -E cloud.auth=elastic:1234 -E cloud.id=test-aws:1234

Sample metricbeat.aws.yml looks like:

metricbeat.modules:
- module: aws
  period: 5m
  access_key_id: ${AWS_ACCESS_KEY_ID}
  secret_access_key: ${AWS_SECRET_ACCESS_KEY}
  session_token: ${AWS_SESSION_TOKEN}
  metricsets:
    - ec2

Environment variables can also be added through a file. For example:

$ cat env.list
AWS_ACCESS_KEY_ID=abcd
AWS_SECRET_ACCESS_KEY=abcd

$ docker run --env-file env.list -d --name=metricbeat --user=root --volume="$(pwd)/metricbeat.aws.yml:/usr/share/metricbeat/metricbeat.yml:ro" docker.elastic.co/beats/metricbeat:7.11.1 metricbeat -e -E cloud.auth=elastic:1234 -E cloud.id=test-aws:1234

Use credential_profile_name and/or shared_credential_file

If access_key_id, secret_access_key and role_arn are all not given, then filebeat will check for credential_profile_name. If you use different credentials for different tools or applications, you can use profiles to configure multiple access keys in the same configuration file. If there is no credential_profile_name given, the default profile will be used.

shared_credential_file is optional to specify the directory of your shared credentials file. If it’s empty, the default directory will be used. In Windows, shared credentials file is at C:\Users\<yourUserName>\.aws\credentials. For Linux, macOS or Unix, the file is located at ~/.aws/credentials. When running as a service, the home path depends on the user that manages the service, so the shared_credential_file parameter can be used to avoid ambiguity. Please see Create Shared Credentials File for more details.

Use role_arn

role_arn is used to specify which AWS IAM role to assume for generating temporary credentials. If role_arn is given, filebeat will check if access keys are given. If not, filebeat will check for credential profile name. If neither is given, default credential profile will be used. Please make sure credentials are given under either a credential profile or access keys.

If running on Docker, the credential file needs to be provided via a volume mount. For example, with Metricbeat:

docker run -d --name=metricbeat --user=root --volume="$(pwd)/metricbeat.aws.yml:/usr/share/metricbeat/metricbeat.yml:ro" --volume="/Users/foo/.aws/credentials:/usr/share/metricbeat/credentials:ro" docker.elastic.co/beats/metricbeat:7.11.1 metricbeat -e -E cloud.auth=elastic:1234 -E cloud.id=test-aws:1234

Sample metricbeat.aws.yml looks like:

metricbeat.modules:
- module: aws
  period: 5m
  credential_profile_name: elastic-beats
  shared_credential_file: /usr/share/metricbeat/credentials
  metricsets:
    - ec2

Use AWS credentials in Filebeat configuration

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123/test-queue
  access_key_id: '<access_key_id>'
  secret_access_key: '<secret_access_key>'
  session_token: '<session_token>'

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123/test-queue
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'

Use IAM role ARN

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123/test-queue
  role_arn: arn:aws:iam::123456789012:role/test-mb

Use shared AWS credentials file

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123/test-queue
  credential_profile_name: test-fb

AWS Credentials Types

edit

There are two different types of AWS credentials can be used: access keys and temporary security credentials.

Access keys

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the two parts of access keys. They are long-term credentials for an IAM user or the AWS account root user. Please see AWS Access Keys and Secret Access Keys for more details.

IAM role ARN

An IAM role is an IAM identity that you can create in your account that has specific permissions that determine what the identity can and cannot do in AWS. A role does not have standard long-term credentials such as a password or access keys associated with it. Instead, when you assume a role, it provides you with temporary security credentials for your role session. IAM role Amazon Resource Name (ARN) can be used to specify which AWS IAM role to assume to generate temporary credentials. Please see AssumeRole API documentation for more details.

Here are the steps to set up IAM role using AWS CLI for Metricbeat. Please replace 123456789012 with your own account ID.

Step 1. Create example-policy.json file to include all permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "sqs:ReceiveMessage"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "sqs:ChangeMessageVisibility",
            "Resource": "arn:aws:sqs:us-east-1:123456789012:test-fb-ks"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "sqs:DeleteMessage",
            "Resource": "arn:aws:sqs:us-east-1:123456789012:test-fb-ks"
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole",
                "sqs:ListQueues",
                "tag:GetResources",
                "ec2:DescribeInstances",
                "cloudwatch:GetMetricData",
                "ec2:DescribeRegions",
                "iam:ListAccountAliases",
                "sts:GetCallerIdentity",
                "cloudwatch:ListMetrics"
            ],
            "Resource": "*"
        }
    ]
}

Step 2. Create IAM policy using the aws iam create-policy command:

$ aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json

Step 3. Create the JSON file example-role-trust-policy.json that defines the trust relationship of the IAM role

{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
        "Action": "sts:AssumeRole"
    }
}

Step 4. Create the IAM role and attach the policy:

$ aws iam create-role --role-name example-role --assume-role-policy-document file://example-role-trust-policy.json
$ aws iam attach-role-policy --role-name example-role --policy-arn "arn:aws:iam::123456789012:policy/example-policy"

After these steps are done, IAM role ARN can be used for authentication in Metricbeat aws module.

Temporary security credentials

Temporary security credentials has a limited lifetime and consists of an access key ID, a secret access key, and a security token which typically returned from GetSessionToken. MFA-enabled IAM users would need to submit an MFA code while calling GetSessionToken. default_region identifies the AWS Region whose servers you want to send your first API request to by default. This is typically the Region closest to you, but it can be any Region. Please see Temporary Security Credentials for more details. sts get-session-token AWS CLI can be used to generate temporary credentials. For example. with MFA-enabled:

aws> sts get-session-token --serial-number arn:aws:iam::1234:mfa/your-email@example.com --token-code 456789 --duration-seconds 129600

Because temporary security credentials are short term, after they expire, the user needs to generate new ones and modify the aws.yml config file with the new credentials. Unless live reloading feature is enabled for Metricbeat, the user needs to manually restart Metricbeat after updating the config file in order to continue collecting Cloudwatch metrics. This will cause data loss if the config file is not updated with new credentials before the old ones expire. For Metricbeat, we recommend users to use access keys in config file to enable aws module making AWS api calls without have to generate new temporary credentials and update the config frequently.

IAM policy is an entity that defines permissions to an object within your AWS environment. Specific permissions needs to be added into the IAM user’s policy to authorize Metricbeat to collect AWS monitoring metrics. Please see documentation under each metricset for required permissions.

« AWS CloudWatch input Azure eventhub input »