Common Expression Language input
This functionality is in technical preview and may be changed or removed in a future release. Elastic will apply best effort to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
Use the cel input to read messages from a file path or HTTP API with a variety of payloads using the Common Expression Language (CEL) and the mito CEL extension libraries.
CEL is a non-Turing complete language that can perform evaluation of expressions in inputs, which can include file and API endpoints using the mito extension library.
The cel input periodically runs a CEL program that is given an execution environment that may be configured by the user, and publishes the set of events that result from the program evaluation.
Optionally the CEL program may return cursor states that will be provided to the next execution of the CEL program.
The cursor states may be used to control the behaviour of the program.
This input supports:
- Auth
  - Basic
  - OAuth2
- Retrieval at a configurable interval
- Pagination
- Retries
- Rate limiting
- Proxying
Example configurations:
filebeat.inputs:
# Fetch your public IP every minute.
- type: cel
  interval: 1m
  resource.url: https://api.ipify.org/?format=json
  program: |
    bytes(get(state.url).Body).as(body, {
        "events": [body.decode_json()]
    })

filebeat.inputs:
- type: cel
  resource.url: http://localhost:9200/_search
  state:
    scroll: 5m
  program: |
    (
        !has(state.cursor) || !has(state.cursor.scroll_id) ?
            post(state.url+"?scroll=5m", "", "")
        :
            post(
                state.url+"/scroll?"+{"scroll_id": [state.cursor.scroll_id]}.format_query(),
                "application/json",
                {"scroll": state.scroll}.encode_json()
            )
    ).as(resp, bytes(resp.Body).decode_json().as(body, {
        "events": body.hits.hits,
        "cursor": {"scroll_id": body._scroll_id},
    }))
Execution
The execution environment provided for the input includes the functions, macros and global variables provided by the mito and ext.Strings libraries.
A single JSON object is provided as an input accessible through a state variable. state contains a string url field and may contain arbitrary other fields configured via the input’s state configuration. If the CEL program saves cursor states between executions of the program, the configured state.cursor value will be replaced by the saved cursor prior to execution.
On start the state will be something like this:
{
    "url": <resource address>,
    "cursor": { ... },
    ...
}
The state.url field will be present and may be an HTTP end-point or a file path. It is the responsibility of the CEL program to handle removing the scheme from a file URL if it is present. The state.url field may be mutated during execution of the program, but the mutated state will not be persisted between restarts. The state.url field must be present in the returned value to ensure that it is available in the next evaluation unless the program has the resource address hard-coded in or it is available from the cursor.
Additional fields may be present at the root of the object and if the program tolerates it, the cursor value may be absent. Only the cursor is persisted over restarts, but all fields in state are retained between iterations of the processing loop except for the produced events array, see below.
If the cursor is present the program should perform and process requests based on its value. If cursor is not present the program must have alternative logic to determine what requests to make.
After completion of a program’s execution it should return a single object with a structure looking like this:
{
    "events": [
        {...},
        ...
    ],
    "cursor": [
        {...},
        ...
    ],
    "url": <resource address>,
    "status_code": <HTTP request status code if a network request>,
    "header": <HTTP response headers if a network request>,
    "rate_limit": <HTTP rate limit map if required by API>,
    "want_more": false
}
The evaluation is repeated with the new state, after removing the events field, if the want_more field is present and true, and a non-empty events array is returned.
The status_code, header and rate_limit values may be omitted if the program is not interacting with an HTTP API end-point and so will not be needed to contribute to program control.
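As an illustration of the returned object, the following sketch pages through a hypothetical API until it receives an empty page. The endpoint https://api.example.com/items, its page query parameter, the items array in the response body and the state.page field are all assumptions made for this example; they are not part of the input. The int conversions are there in case numeric state values reach the program as floating point numbers.

```yaml
filebeat.inputs:
- type: cel
  interval: 1m
  # Hypothetical endpoint that returns {"items": [{...}, ...]} for each page.
  resource.url: https://api.example.com/items
  state:
    # Hypothetical field holding the next page number to request.
    page: 1
  program: |
    get(state.url + "?" + {"page": [string(int(state.page))]}.format_query()).as(resp,
      bytes(resp.Body).decode_json().as(body, {
        "events": body.items,
        "url": state.url,
        "page": size(body.items) > 0 ? int(state.page) + 1 : 1,
        "want_more": size(body.items) > 0
      }))
```

The returned object keeps url so that it is available to the next evaluation, and sets want_more to true only while events are being returned. When an empty page arrives, page is reset to 1 and evaluation stops until the next interval.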
Debug state logging
The CEL input will log the complete state after evaluation when logging at the DEBUG level. This will include any sensitive or secret information kept in the state object, and so DEBUG level logging should not be used in production when sensitive information is retained in the state object.
CEL extension libraries
As noted above the cel input provides functions, macros and global variables to extend the language.
- File: the file extension is initialized with MIME handlers for "application/gzip", "application/x-ndjson" and "application/zip".
- Limit: the rate limit extension is initialized with Okta (as "okta") and the Draft Rate Limit (as "draft") policies.
- MIME: the MIME extension is initialized with MIME handlers for "application/gzip", "application/x-ndjson" and "application/zip".
- Regexp: the regular expression extension is initialized with the patterns specified in the user input configuration via the regexp field.
In addition to the extensions provided in the packages listed above, a global variable useragent is also provided which gives the user CEL program access to the filebeat user-agent string.
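One possible use of the useragent variable is to pass it explicitly as a request header. The sketch below assumes that the get_request and do_request functions of the mito HTTP library are available in the execution environment; they are not described on this page, so treat their use here as an assumption and check the mito documentation before relying on it.

```yaml
filebeat.inputs:
- type: cel
  interval: 1m
  resource.url: https://api.ipify.org/?format=json
  program: |
    // get_request and do_request are assumed to come from the mito HTTP library.
    get_request(state.url).with({
      "Header": {"User-Agent": [useragent]}
    }).do_request().as(resp, {
      "events": [bytes(resp.Body).decode_json()]
    })
```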
Additionally, it supports authentication via Basic auth, HTTP Headers or oauth2.
Example configurations with authentication:
filebeat.inputs:
- type: cel
  resource.url: http://localhost
  request.transforms:
    - set:
        target: header.Authorization
        value: 'Basic aGVsbG86d29ybGQ='

filebeat.inputs:
- type: cel
  auth.oauth2:
    client.id: 12345678901234567890abcdef
    client.secret: abcdef12345678901234567890
    token_url: http://localhost/oauth2/token
  resource.url: http://localhost

filebeat.inputs:
- type: cel
  auth.oauth2:
    client.id: 12345678901234567890abcdef
    client.secret: abcdef12345678901234567890
    token_url: http://localhost/oauth2/token
    user: user@domain.tld
    password: P@$$W0₹D
  resource.url: http://localhost
Input state
The cel input keeps a runtime state between requests. This state can be accessed by the CEL program and may contain arbitrary objects.
The state must contain a url string and may contain any object the user wishes to store in it.
All objects are stored at runtime, except cursor, which has values that are persisted between restarts.
Configuration options
The cel input supports the following configuration options plus the Common options described later.
interval
Duration between repeated requests. It may make additional pagination requests in response to the initial request if pagination is enabled. Default: 60s.
program
The CEL program that is executed each polling period. This field is required.
state
state is an optional object that is passed to the CEL program on the first execution. It is available to the executing program as the state variable. It is made available to subsequent executions of the program during the life of the input as the returned value of the previous execution, but with the state.events field removed. Except for the state.cursor field, state does not persist over restarts.
state.cursor
The cursor is an object available as state.cursor where arbitrary values may be stored. Cursor state is kept between input restarts and updated after each event of a request has been published. When a cursor is used the CEL program must either create a cursor state for each event that is returned by the program, or a single cursor that reflects the cursor for completion of the full set of events.
filebeat.inputs:
# Fetch your public IP every minute and note when the last request was made.
- type: cel
  interval: 1m
  resource.url: https://api.ipify.org/?format=json
  program: |
    bytes(get(state.url).Body).as(body, {
        "events": [body.decode_json().with({
            "last_requested_at": has(state.cursor) && has(state.cursor.last_requested_at) ?
                state.cursor.last_requested_at
            :
                now
        })],
        "cursor": {"last_requested_at": now}
    })
regexp
A set of named regular expressions that may be used during a CEL program’s execution using the regexp extension library. The syntax used for the regular expressions is RE2.
filebeat.inputs:
- type: cel
  # Define two regular expressions, 'products' and 'solutions' for use during CEL execution.
  regexp:
    products: '(?i)(Elasticsearch|Beats|Logstash|Kibana)'
    solutions: '(?i)(Search|Observability|Security)'
auth.basic.enabled
When set to false, disables the basic auth configuration. Default: true.
Basic auth settings are disabled if either enabled is set to false or the auth.basic section is missing.
auth.basic.user
The user to authenticate with.
auth.basic.password
The password to use.
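A minimal sketch of a basic auth configuration using the options above. The user and password values are placeholders, and the program is the public IP example from earlier on this page.

```yaml
filebeat.inputs:
- type: cel
  interval: 1m
  resource.url: http://localhost
  auth.basic:
    user: user@domain.tld
    password: example-password  # placeholder value
  program: |
    bytes(get(state.url).Body).as(body, {
      "events": [body.decode_json()]
    })
```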
auth.oauth2.enabled
When set to false, disables the oauth2 configuration. Default: true.
OAuth2 settings are disabled if either enabled is set to false or the auth.oauth2 section is missing.
auth.oauth2.provider
Used to configure supported oauth2 providers. Each supported provider will require specific settings. It is not set by default. Supported providers are: azure, google.
auth.oauth2.client.id
The client ID used as part of the authentication flow. It is always required except if using google as provider. Required for providers: default, azure.
auth.oauth2.client.secret
The client secret used as part of the authentication flow. It is always required except if using google as provider. Required for providers: default, azure.
auth.oauth2.user
The user used as part of the authentication flow. It is required for authentication with the password grant type. It is only available for provider default.
auth.oauth2.password
The password used as part of the authentication flow. It is required for authentication with the password grant type. It is only available for provider default.
user and password are required for grant_type password. If user and password are not used, the input automatically uses the token_url and the client credential method.
auth.oauth2.scopes
A list of scopes that will be requested during the oauth2 flow. It is optional for all providers.
auth.oauth2.token_url
The endpoint that will be used to generate the tokens during the oauth2 flow. It is required if no provider is specified.
For azure provider either token_url or azure.tenant_id is required.
auth.oauth2.endpoint_params
Set of values that will be sent on each request to the token_url. Each param key can have multiple values. Can be set for all providers except google.
- type: cel
  auth.oauth2:
    endpoint_params:
      Param1:
        - ValueA
        - ValueB
      Param2:
        - Value
auth.oauth2.azure.tenant_id
Used for authentication when using azure provider. Since it is used in the process to generate the token_url, it can’t be used in combination with it. It is not required.
For information about where to find it, you can refer to https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal.
auth.oauth2.azure.resource
The accessed WebAPI resource when using azure provider. It is not required.
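A sketch of an OAuth2 configuration for the azure provider, built only from the options described above. The tenant ID and resource values are placeholders; token_url is omitted because it cannot be combined with azure.tenant_id.

```yaml
filebeat.inputs:
- type: cel
  resource.url: http://localhost
  auth.oauth2:
    provider: azure
    client.id: 12345678901234567890abcdef
    client.secret: abcdef12345678901234567890
    azure.tenant_id: 00000000-0000-0000-0000-000000000000  # placeholder
    azure.resource: https://api.example.com  # hypothetical, optional
  program: |
    bytes(get(state.url).Body).as(body, {
      "events": [body.decode_json()]
    })
```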
auth.oauth2.google.credentials_file
The credentials file for Google.
Only one of the credentials settings can be set at once. If none is provided, loading default credentials from the environment will be attempted via ADC. For more information about how to provide Google credentials, please refer to https://cloud.google.com/docs/authentication.
auth.oauth2.google.credentials_json
Your credentials information as raw JSON.
Only one of the credentials settings can be set at once. If none is provided, loading default credentials from the environment will be attempted via ADC. For more information about how to provide Google credentials, please refer to https://cloud.google.com/docs/authentication.
auth.oauth2.google.jwt_file
The JWT Account Key file for Google.
Only one of the credentials settings can be set at once. If none is provided, loading default credentials from the environment will be attempted via ADC. For more information about how to provide Google credentials, please refer to https://cloud.google.com/docs/authentication.
auth.oauth2.google.jwt_json
The JWT Account Key file as raw JSON.
Only one of the credentials settings can be set at once. If none is provided, loading default credentials from the environment will be attempted via ADC. For more information about how to provide Google credentials, please refer to https://cloud.google.com/docs/authentication.
auth.oauth2.google.delegated_account
Email of the delegated account used to create the credentials (usually an admin). Used in combination with auth.oauth2.google.jwt_file or auth.oauth2.google.jwt_json.
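A sketch of a google provider configuration that uses a JWT account key file together with a delegated account. The file path and the account email are placeholders. Only one of the four credentials settings is set, as required above.

```yaml
filebeat.inputs:
- type: cel
  resource.url: http://localhost
  auth.oauth2:
    provider: google
    google.jwt_file: /path/to/jwt-account-key.json  # placeholder path
    google.delegated_account: admin@domain.tld      # placeholder account
  program: |
    bytes(get(state.url).Body).as(body, {
      "events": [body.decode_json()]
    })
```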
resource.url
The URL of the HTTP API. Required.
resource.timeout
Duration before declaring that the HTTP client connection has timed out. Valid time units are ns, us, ms, s, m, h. Default: 30s.
resource.ssl
This specifies SSL/TLS configuration. If the ssl section is missing, the host’s CAs are used for HTTPS connections. See SSL for more information.
resource.keep_alive.disable
This specifies whether to disable keep-alives for HTTP end-points. Default: true.
resource.keep_alive.max_idle_connections
The maximum number of idle connections across all hosts. Zero means no limit. Default: 0.
resource.keep_alive.max_idle_connections_per_host
The maximum idle connections to keep per-host. If zero, defaults to two. Default: 0.
resource.keep_alive.idle_connection_timeout
The maximum amount of time an idle connection will remain idle before closing itself. Valid time units are ns, us, ms, s, m, h. Zero means no limit. Default: 0s.
resource.retry.max_attempts
The maximum number of retries for the HTTP client. Default: 5.
resource.retry.wait_min
The minimum time to wait before a retry is attempted. Default: 1s.
resource.retry.wait_max
The maximum time to wait before a retry is attempted. Default: 60s.
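The timeout, keep-alive and retry options can be combined in one input. The values in this sketch are illustrative only and are not recommendations.

```yaml
filebeat.inputs:
- type: cel
  interval: 1m
  resource.url: https://api.ipify.org/?format=json
  # Give up on a request after 10 seconds.
  resource.timeout: 10s
  # Enable keep-alives and limit how long idle connections are held.
  resource.keep_alive.disable: false
  resource.keep_alive.max_idle_connections_per_host: 2
  resource.keep_alive.idle_connection_timeout: 30s
  # Retry up to 3 times, waiting between 2s and 30s before each retry.
  resource.retry.max_attempts: 3
  resource.retry.wait_min: 2s
  resource.retry.wait_max: 30s
  program: |
    bytes(get(state.url).Body).as(body, {
      "events": [body.decode_json()]
    })
```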
resource.redirect.forward_headers
When set to true request headers are forwarded in case of a redirect. Default: false.
resource.redirect.headers_ban_list
When redirect.forward_headers is set to true, all headers except the ones defined in this list will be forwarded. Default: [].
resource.redirect.max_redirects
The maximum number of redirects to follow for a request. Default: 10.
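A sketch of the redirect options used together. It forwards request headers on redirect but keeps the Authorization header from being forwarded. The header name and the redirect limit are example choices.

```yaml
filebeat.inputs:
- type: cel
  resource.url: http://localhost
  resource.redirect.forward_headers: true
  # Headers listed here are not forwarded when a redirect is followed.
  resource.redirect.headers_ban_list: ["Authorization"]
  resource.redirect.max_redirects: 3
  program: |
    bytes(get(state.url).Body).as(body, {
      "events": [body.decode_json()]
    })
```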
resource.rate_limit.limit
The value of the response that specifies the maximum overall resource request rate.
resource.rate_limit.burst
The maximum burst size. Burst is the maximum number of resource requests that can be made above the overall rate limit.
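A sketch of a static rate limit configuration. This page does not state the unit of limit; the example assumes it is a number of requests per second, so verify this before using it.

```yaml
filebeat.inputs:
- type: cel
  resource.url: http://localhost
  # Assumed to mean requests per second; the unit is not given on this page.
  resource.rate_limit.limit: 5
  # Allow up to 10 requests above the overall rate.
  resource.rate_limit.burst: 10
  program: |
    bytes(get(state.url).Body).as(body, {
      "events": [body.decode_json()]
    })
```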
resource.tracer.filename
It is possible to log HTTP requests and responses in a CEL program to a local file-system for debugging configurations. This option is enabled by setting the resource.tracer.filename value. Additional options are available to tune log rotation behavior.
Enabling this option compromises security and should only be used for debugging.
resource.tracer.maxsize
This value sets the maximum size, in megabytes, the log file will reach before it is rotated. By default logs are allowed to reach 1MB before rotation.
resource.tracer.maxage
This specifies the number of days to retain rotated log files. If it is not set, log files are retained indefinitely.
resource.tracer.maxbackups
The number of old logs to retain. If it is not set all old logs are retained subject to the resource.tracer.maxage setting.
resource.tracer.localtime
Whether to use the host’s local time rather than UTC for timestamping rotated log file names.
resource.tracer.compress
This determines whether rotated logs should be gzip compressed.
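A sketch of request tracing with log rotation for a debugging session. The file path is hypothetical and the rotation values are arbitrary. As noted above, tracing should not be left enabled outside of debugging.

```yaml
filebeat.inputs:
- type: cel
  resource.url: http://localhost
  # Hypothetical path; setting a filename enables tracing.
  resource.tracer.filename: /tmp/cel-http-trace.ndjson
  resource.tracer.maxsize: 5      # rotate after 5 megabytes
  resource.tracer.maxage: 7       # keep rotated files for 7 days
  resource.tracer.maxbackups: 3   # keep at most 3 rotated files
  resource.tracer.localtime: false
  resource.tracer.compress: true
  program: |
    bytes(get(state.url).Body).as(body, {
      "events": [body.decode_json()]
    })
```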
Common options
The following configuration options are supported by all inputs.
enabled
Use the enabled option to enable and disable inputs. By default, enabled is set to true.
tags
A list of tags that Filebeat includes in the tags field of each published event. Tags make it easy to select specific events in Kibana or apply conditional filtering in Logstash. These tags will be appended to the list of tags specified in the general configuration.
Example:
filebeat.inputs:
- type: cel
  . . .
  tags: ["json"]
fields
Optional fields that you can specify to add additional information to the output. For example, you might add fields that you can use for filtering log data. Fields can be scalar values, arrays, dictionaries, or any nested combination of these. By default, the fields that you specify here will be grouped under a fields sub-dictionary in the output document. To store the custom fields as top-level fields, set the fields_under_root option to true. If a duplicate field is declared in the general configuration, then its value will be overwritten by the value declared here.
filebeat.inputs:
- type: cel
  . . .
  fields:
    app_id: query_engine_12
fields_under_root
If this option is set to true, the custom fields are stored as top-level fields in the output document instead of being grouped under a fields sub-dictionary. If the custom field names conflict with other field names added by Filebeat, then the custom fields overwrite the other fields.
processors
A list of processors to apply to the input data.
See Processors for information about specifying processors in your config.
pipeline
The ingest pipeline ID to set for the events generated by this input.
The pipeline ID can also be configured in the Elasticsearch output, but this option usually results in simpler configuration files. If the pipeline is configured both in the input and output, the option from the input is used.
keep_null
If this option is set to true, fields with null values will be published in the output document. By default, keep_null is set to false.
index
If present, this formatted string overrides the index for events from this input (for elasticsearch outputs), or sets the raw_index field of the event’s metadata (for other outputs). This string can only refer to the agent name and version and the event timestamp; for access to dynamic fields, use output.elasticsearch.index or a processor.
Example value: "%{[agent.name]}-myindex-%{+yyyy.MM.dd}" might expand to "filebeat-myindex-2019.11.01".
publisher_pipeline.disable_host
By default, all events contain host.name. This option can be set to true to disable the addition of this field to all events. The default value is false.
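To show how the common options fit alongside the cel options, here is a sketch that sets several of them on one input. The pipeline ID is hypothetical; the tags, fields and index values reuse the examples given above.

```yaml
filebeat.inputs:
- type: cel
  enabled: true
  interval: 1m
  resource.url: https://api.ipify.org/?format=json
  program: |
    bytes(get(state.url).Body).as(body, {
      "events": [body.decode_json()]
    })
  tags: ["json"]
  fields:
    app_id: query_engine_12
  # Store app_id at the top level of the event instead of under fields.
  fields_under_root: true
  # Publish fields that have null values.
  keep_null: true
  # Hypothetical ingest pipeline ID.
  pipeline: my-ingest-pipeline
  index: "%{[agent.name]}-myindex-%{+yyyy.MM.dd}"
  # Do not add host.name to events from this input.
  publisher_pipeline.disable_host: true
```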