Filebeat

The filebeat section specifies a list of prospectors that Filebeat uses to locate and process log files. Each prospector item begins with a dash (-) and specifies prospector-specific configuration options, including the list of paths that are crawled to locate log files.

Here is a sample configuration:

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # Each - is a prospector. Below are the prospector specific configurations
    -
      # Paths that should be crawled and fetched. Glob based paths.
      # For each file found under this path, a harvester is started.
      paths:
        - "/var/log/apache/httpd-*.log"
      # Type to be published in the 'type' field. For Elasticsearch output,
      # the type defines the document type these entries should be stored
      # in. Default: log
      document_type: apache
    -
      paths:
        - /var/log/messages
        - "/var/log/*.log"

Options

paths

A list of glob-based paths that should be crawled and fetched. Filebeat starts a harvester for each file that it finds under the specified paths. You can specify one path per line. Each line begins with a dash (-).

input_type

One of the following input types:

  • log: Reads every line of the log file (default)
  • stdin: Reads the standard in

The value that you specify here is used as the input_type for each event published to Logstash and Elasticsearch.

exclude_lines

A list of regular expressions to match the lines that you want Filebeat to exclude. Filebeat drops any lines that match a regular expression in the list. By default, no lines are dropped.

If multiline is also specified, each multiline message is combined into a single line before the lines are filtered by exclude_lines.

The following example configures Filebeat to drop any lines that start with "DBG".

exclude_lines: ["^DBG"]

include_lines

A list of regular expressions to match the lines that you want Filebeat to include. Filebeat exports only the lines that match a regular expression in the list. By default, all lines are exported.

If multiline is also specified, each multiline message is combined into a single line before the lines are filtered by include_lines.

The following example configures Filebeat to export any lines that start with "ERR" or "WARN":

include_lines: ["^ERR", "^WARN"]

If both include_lines and exclude_lines are defined, Filebeat executes include_lines first and then executes exclude_lines. So, for example, to export all Apache log lines except the debugging messages (DBGs), you can use:

 include_lines: ["apache"]
 exclude_lines: ["^DBG"]

exclude_files

A list of regular expressions to match the files that you want Filebeat to ignore. By default no files are excluded.

The following example configures Filebeat to ignore all the files that have a gz extension:

  exclude_files: [".gz$"]

fields

Optional fields that you can specify to add additional information to the output. For example, you might add fields that you can use for filtering log data. Fields can be scalar values, arrays, dictionaries, or any nested combination of these. All scalar values will be interpreted as strings. By default, the fields that you specify here will be grouped under a fields sub-dictionary in the output document. To store the custom fields as top-level fields, set the fields_under_root option to true.

fields:
    level: debug
    review: 1

fields_under_root

If this option is set to true, the custom fields are stored as top-level fields in the output document instead of being grouped under a fields sub-dictionary. If the custom field names conflict with other field names added by Filebeat, the custom fields overwrite the other fields.

ignore_older

If this option is specified, Filebeat ignores any files that were modified before the specified timespan. You can use time strings like 2h (2 hours) and 5m (5 minutes). The default is 24h.

scan_frequency

How often the prospector checks for new files in the paths that are specified for harvesting. For example, if you specify a glob like /var/log/*, the directory is scanned for files using the frequency specified by scan_frequency. Specify 1s to scan the directory as frequently as possible without causing Filebeat to scan too frequently. The default setting is 10s.

document_type

The event type to use for published lines read by harvesters. For Elasticsearch output, the value that you specify here is used to set the type field in the output document. The default value is log.

harvester_buffer_size

The size in bytes of the buffer that each harvester uses when fetching a file. The default is 16384.

max_bytes

The maximum number of bytes that a single log message can have. All bytes after max_bytes are discarded and not sent. This setting is especially useful for multiline log messages, which can get large. The default is 10MB (10485760).

multiline

Options that control how Filebeat deals with log messages that span multiple lines. Multiline messages are common in files that contain Java stack traces.

The following example shows how to configure Filebeat to handle a multiline message where the first line of the message begins with a bracket ([).

multiline:
    pattern: ^\[
    negate: true
    match: after

Filebeat takes all the lines that do not start with [ and combines them with the previous line that does. For example, you could use this configuration to join the following lines of a multiline message into a single event:

[beat-logstash-some-name-832-2015.11.28] IndexNotFoundException[no such index]
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:566)
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:133)
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:77)
    at org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.checkBlock(TransportDeleteIndexAction.java:75)

You specify the following settings under multiline to control how Filebeat combines the lines in the message:

pattern
Specifies the regular expression pattern to match.
negate
Defines whether the pattern is negated. The default is false.
match

Specifies how Filebeat combines matching lines into an event. The settings are after or before. The behavior of these settings depends on what you specify for negate:

Setting for negate Setting for match Result

false

after

Consecutive lines that match the pattern are appended to the previous line that doesn’t match.

false

before

Consecutive lines that match the pattern are prepended to the next line that doesn’t match.

true

after

Consecutive lines that don’t match the pattern are appended to the previous line that does match.

true

before

Consecutive lines that don’t match the pattern are prepended to the next line that does match.

The after setting is equivalent to previous in Logstash, and before is equivalent to next.

max_lines
The maximum number of lines that can be combined into one event. If the multiline message contains more than max_lines, any additional lines are discarded. The default is 500.
timeout
After the specified timeout, Filebeat sends the multiline event even if no new pattern is found to start a new event. The default is 5s.

Here’s an example configuration that shows the regular expression for a slightly more complex example:

multiline:
    pattern: "^[[:space:]]+(at|...)|^Caused by:"
    negate: false
    match: after

In this example, the pattern matches the following lines:

  • a line that begins with spaces followed by the word at or ...
  • a line that begins with the words Caused by:

You could use this configuration to join the following lines from a Java stack trace into a single event:

Exception in thread "main" java.lang.IllegalStateException: A book has a null property
       at com.example.myproject.Author.getBookIds(Author.java:38)
       at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
Caused by: java.lang.NullPointerException
       at com.example.myproject.Book.getId(Book.java:22)
       at com.example.myproject.Author.getBookIds(Author.java:35)
       ... 1 more

tail_files

If this option is set to true, Filebeat starts reading new files at the end of each file instead of the beginning. When this option is used in combination with log rotation, it’s possible that the first log entries in a new file might be skipped. The default setting is false.

You can use this setting to avoid indexing old log lines when you run Filebeat on a set of log files for the first time. After the first run, we recommend disabling this option, or you risk losing lines during file rotation.

backoff

The backoff options specify how aggressively Filebeat crawls new files for updates. You can use the default values in most cases.

The backoff option defines how long Filebeat waits before checking a file again after EOF is reached. The default is 1s, which means the file is checked every second if new lines were added. This enables near real-time crawling. Every time a new line appears in the file, the backoff value is reset to the initial value. The default is 1s.

max_backoff

The maximum time for Filebeat to wait before checking a file again after EOF is reached. After having backed off multiple times from checking the file, the wait time will never exceed max_backoff regardless of what is specified for backoff_factor. Because it takes a maximum of 10s to read a new line, specifying 10s for max_backoff means that, at the worst, a new line could be added to the log file if Filebeat has backed off multiple times. The default is 10s.

backoff_factor

This option specifies how fast the waiting time is increased. The bigger the backoff factor, the faster the max_backoff value is reached. The backoff factor increments exponentially. The minimum value allowed is 1. If this value is set to 1, the backoff algorithm is disabled, and the backoff value is used for waiting for new lines. The backoff value will be multiplied each time with the backoff_factor until max_backoff is reached. The default is 2.

force_close_files

By default, Filebeat keeps the files that it’s reading open until the timespan specified by ignore_older has elapsed. This behaviour can cause issues when a file is removed. On Windows, the file cannot be fully removed until Filebeat closes the file. In addition no new file with the same name can be created during this time.

You can force Filebeat to close the file as soon as the file name changes by setting the force_close_files option to true. The default is false. Turning on this option can lead to loss of data on rotated files in case not all lines were read from the rotated file.

spool_size

The event count spool threshold. This setting forces a network flush if the number of events in the spooler exceeds the specified value.

filebeat:
  spool_size: 2048

publish_async

If enabled, the publisher pipeline in filebeat operates in async mode preparing a new batch of lines while waiting for ACK. This option can improve load-balancing throughput at the cost of increased memory usage. The default value is false.

idle_timeout

A duration string that specifies how often the spooler is flushed. After the idle_timeout is reached, the spooler is flushed even if the spool_size has not been reached.

filebeat:
  idle_timeout: 5s

registry_file

The name of the registry file. By default, the registry file is put in the current working directory. If the working directory changes for subsequent runs of Filebeat, indexing starts from the beginning again.

filebeat:
  registry_file: .filebeat

config_dir

The full path to the directory that contains additional prospector configuration files. Each configuration file must end with .yml. Each config file must also specify the full Filebeat config hierarchy even though only the prospector part of the file is processed. All global options, such as spool_size, are ignored.

The config_dir option MUST point to a directory other than the directory where the main Filebeat config file resides.

filebeat:
  config_dir: path/to/configs

encoding

The file encoding to use for reading files that contain international characters. See the encoding names recommended by the W3C for use in HTML5.

Here are some sample encodings from W3C recommendation:

  • plain, latin1, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030, gbk, hz-gb-2312,
  • euc-kr, euc-jp, iso-2022-jp, shift-jis, and so on

The plain encoding is special, because it does not validate or transform any input.