Common Expression Language: Improve data collection in Elastic with CEL input

Elastic Agent integrations allow users to ingest data into Elasticsearch from a wide range of sources. They combine collection logic, ingest pipelines, dashboards, and other artifacts into a package that can be installed and managed from the Kibana web interface.

Integrations configure Filebeat inputs to do the data collection. To collect data from HTTP APIs, we’ve often used the HTTP JSON input. However, even basic listing APIs can differ greatly in the details, and the HTTP JSON input's model of YAML-configured transformations can make it awkward and sometimes impossible to express the required collection logic.

The Common Expression Language (CEL) input was introduced to allow more flexible interaction with HTTP APIs. CEL is a language designed to be embedded in applications that require a fast, safe, and extensible way to express conditions and data transformations. The CEL input lets an integration builder write one expression that can read settings, keep track of its own state, make requests, process responses, and ultimately return events ready to ingest.

In this article, we’ll look at how CEL differs from other programming languages, how we’ve extended it for the CEL input, and the flexibility and power that gives you to express your data collection logic.

CEL and how it works in the input

CEL is an expression language. It has no statements. When you write CEL, you don’t tell it what to do by writing statements; you tell it what value to produce by writing an expression. Every CEL expression produces a value, and smaller expressions can be combined into a larger expression that produces a result according to more complex rules. Later, we’ll see how to use expressions for things that would be written with statements in other languages.

CEL is intentionally not Turing complete: it doesn’t allow unbounded loops. Later, we’ll see how you can process lists and maps using macros, but by avoiding unbounded loops, the language guarantees predictable and limited execution time for individual expressions.

The CEL input is configured with a CEL program (an expression) and some initial state. The state is provided as input to the program, and the program is evaluated to produce an output state. If the output state includes a list of events, those are removed and published. The rest of the output state is used as the input for the next evaluation. If the output state includes one or more events and the flag want_more: true, the next evaluation is performed immediately; otherwise, the input sleeps for the rest of the configured interval before continuing. Here’s a simplified diagram of the input’s control flow:

[Figure: Common Expression Language (CEL) input control flow]

The output of each evaluation will be passed forward as the input to the next evaluation, for as long as the input runs. Output data under the key "cursor" will be persisted to disk and reloaded after the input is restarted, but the rest of the state will not be preserved across restarts.
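The loop described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as this article describes it, not the input’s actual implementation: run_input, evaluate, publish, and the sample program are hypothetical names, and a real input would sleep until the next interval rather than count waits.

```python
def run_input(evaluate, state, publish, max_evaluations):
    """Model the evaluation loop: evaluate, publish events, carry state forward.

    evaluate: stands in for the CEL program (state dict -> output dict).
    publish:  stands in for the publisher (called once per event).
    Returns (final_state, persisted_cursor, number_of_interval_waits).
    """
    persisted_cursor = None
    waits = 0
    for _ in range(max_evaluations):
        output = dict(evaluate(state))
        # Events are removed from the output state and published.
        events = output.pop("events", [])
        if isinstance(events, dict):  # a single event map is also accepted
            events = [events]
        for event in events:
            publish(event)
        # Only the value under "cursor" would survive a restart.
        if "cursor" in output:
            persisted_cursor = output["cursor"]
        # Immediate re-evaluation needs both events and want_more: true.
        again = bool(events) and output.get("want_more") is True
        state = output
        if not again:
            waits += 1  # the real input would sleep here
    return state, persisted_cursor, waits


def program(state):
    """Hypothetical program: emit one event per page, stop asking after page 3."""
    page = state.get("cursor", {}).get("page", 1)
    return {
        **state,
        "events": [{"page": page}],
        "cursor": {"page": page + 1},
        "want_more": page < 3,
    }


published = []
final_state, cursor, waits = run_input(program, {}, published.append, 4)
```

In this run, the first two evaluations ask for more and are followed immediately by the next one; the third and fourth do not, so each is followed by a wait.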

The CEL language itself has limited functionality and avoids side effects, but it is extensible. The cel-go implementation adds some functionality, such as optional syntax and types. The Mito library builds on cel-go and adds more functionality, including the ability to make HTTP requests. The CEL input uses Mito’s version of CEL.

Working with Mito

To build or debug an integration using the CEL input, the most important thing to understand is what output state your CEL program will produce for a given input state. During development, it can be cumbersome to have your CEL program run by the input, surrounded by the full Elastic stack. One way to achieve a faster feedback loop is to use Mito’s command-line tool, which will let you run a CEL program directly and see the output it produces for a given input.

Mito is written in Go and can be installed as follows:

When you run a CEL program with Mito, you typically give it two files: a JSON file with the initial input state, and another file with the source code of your CEL program:

To make copying and pasting easier, the examples in this article are written as single commands that have the shell create temporary files on the fly, by wrapping the content of each file in <(echo '...content...'). In your own development, working with actual files will be easier.

Fetching issues data from GitHub

The following example includes a full CEL program that will fetch data about issues from the GitHub API. Its initial input state has a URL for the API endpoint, and some information about how it should handle pagination. The CEL program uses the data in the input state to generate a request. It will decode the response, produce events from it, and return them as part of its output state.

Its first evaluation produces the following output:

The events will be removed, and when run in the CEL input, they’ll be published for ingestion. The rest of the output will be provided to the next CEL program evaluation as its input state.

To understand how that CEL program works, we’ll look at some smaller CEL examples and discuss more details of how the CEL input operates.

CEL basics

In the CEL language, there are no statements; there are only expressions. Every successful CEL expression evaluates to a final value. Here’s one of the smallest CEL expressions you can write, along with its output:

Many simple expressions are intuitive. Mathematical operations are only supported on values of the same type (for example, int with int), so convert types as you need (here from int to double):

There are no variables in the CEL language, but an expression can be given a name and used in a larger expression with the help of Mito’s as macro. In this example, the expression (1 + 1) evaluates to the value 2, and .as(n, ...) gives that value the name n for use in the expression "one plus one is "+string(n):

It's also possible to accumulate information in a map and use it later in the expression, as demonstrated here using with:

Look at that example again. Notice that the nested part, ({ "data": data, "size": size(data), }), gives us the shape of the final value. It’s a map with the keys "data" and "size". The values for those keys depend on data, which is defined by the outer part of the expression. Reading CEL expressions from the inside out can help to quickly see what they’ll return.

CEL has no control flow statements, such as if, but conditional branching can be done with the ternary operator:

Unbounded loops and recursion are not supported, as CEL is not a Turing complete language. That makes execution time predictable and proportional to the size of the input data and the expression complexity.

Although unbounded loops are not possible in individual CEL expressions, you can process lists and maps using macros like map:

In this section, we’ve covered:

Strings, numbers, lists, and maps.
String concatenation.
Mathematical operations.
Type casting.
Conditionals.
Naming sub-expressions.
Processing collections.

Next, we’ll look at how to make HTTP requests.

Requests

Mito extends CEL with the ability to make HTTP requests:

Requests can be explicitly constructed before they’re executed. That makes it possible to use different HTTP methods and to add headers and a body.

In this example, we build a URL with the help of format_query, add a header to the request, and parse the response body with decode_json. When given the -log_requests option, Mito will log detailed information in JSON format about each request and response.
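For readers more at home in a general-purpose language, the same three steps (build a query string, attach a header, decode a JSON body) look roughly like this with Python’s standard library. This is an analogy rather than Mito’s API: the endpoint, header value, and sample body are made up for illustration, and no request is actually sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, params, headers):
    """Build (but do not send) a GET request with a query string and headers."""
    url = base_url + "?" + urlencode(params)
    return Request(url, headers=headers, method="GET")


req = build_request(
    "https://api.example.com/items",  # hypothetical endpoint
    {"per_page": 2, "page": 1},
    {"Accept": "application/json"},
)

# Decoding the response body is a separate, explicit step.
sample_body = b'[{"id": 1}, {"id": 2}]'  # made-up response bytes
items = json.loads(sample_body)
```

As in the CEL version, the request exists as a value before anything is sent, which is what makes it possible to adjust the method, headers, or body first.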

Managing state and evaluations

Now that we’ve covered how to make requests and the CEL basics required to produce our desired output state, let’s take a closer look at what we should put into the output state and how that lets us direct later processing.

An integration’s CEL program needs to make sure its output state is suitable for use as the input of the next evaluation. Configuration sets the initial state, and that should be repeated in the output with any appropriate changes. An easy way to do that is to use state.with({ ... }) to repeat the state map with some overrides. A common pattern for small programs is to wrap the whole program in state.with(), so that state propagation doesn’t have to be repeated in each branch that generates output data (for example, the success and error branches).
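The effect of state.with({ ... }) can be pictured as a map merge in which the overrides win. The Python below is a model of that idea, not Mito’s implementation: with_overrides and the sample values are illustrative, and the merge is assumed to apply to top-level keys only.

```python
def with_overrides(state, overrides):
    """Return a copy of state with keys from overrides added or replaced."""
    merged = dict(state)      # leave the input state untouched
    merged.update(overrides)  # assumption: top-level keys only, overrides win
    return merged


state = {"url": "https://api.example.com/items", "per_page": 2}
output = with_overrides(state, {"events": [{"id": 1}], "want_more": False})
```

Because the settings are copied forward automatically, each branch of a program only has to list what changes.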

When there are state values that are initialized by an evaluation rather than hard-coded in the initial input state, the program will need to check for an existing value before setting the initial one. That’s something that the support for optional syntax and types can help with. Placing a question mark before the field name when accessing a map key makes the access optional: it may or may not resolve to a value, but further optional accesses are possible, and it’s easy to supply a default if no value is present:

In that example, the counter value read from state is cast to int because all numbers are serialized in the state as floating point numbers, in keeping with conventions established by JSON and JavaScript’s Number type. Note also that "want_more": true is honored here by Mito, but when run in the CEL input, the evaluation would only be repeated if the output also contains events.
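The same two ideas, an optional lookup with a default and a cast back to int, can be sketched in Python. The helper names here are hypothetical, and parse_int=float is used only to mimic a state store that hands every number back as floating point.

```python
import json


def optional_get(value, *keys, default=None):
    """Follow keys through nested dicts, returning default if any is missing."""
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


def next_counter(state):
    # Cast to int because the stored number may come back as a float.
    return int(optional_get(state, "counter", default=0)) + 1


first = next_counter({})  # no counter yet, so the default applies

# Mimic state in which every number is stored as floating point.
reloaded = json.loads('{"counter": 1}', parse_int=float)
second = next_counter(reloaded)

# Chained optional access: "cursor" exists but "page" does not.
page = optional_get({"cursor": {}}, "cursor", "page", default=1)
```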

It’s a requirement of CEL programs run by the CEL input that they return an "events" key in their output map. Its value can be a list of event maps, an empty list, or a single event map. The single event case is usually used for errors. The event will be published by the input, but its value will also be logged, and if it sets an error.message value, that will be used to update the integration’s Fleet health status. If your program produces a single non-error event, it’s best to wrap it in a list.
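The three accepted shapes of "events" can be summarized with a small Python sketch. It models the rules stated above rather than the input’s real code; classify_events, error_message, and the sample error text are illustrative.

```python
def classify_events(output):
    """Return (events, is_single) for an evaluation's output state.

    Raises KeyError if "events" is missing, mirroring the requirement
    that the key always be present.
    """
    events = output["events"]
    if isinstance(events, dict):
        return [events], True  # single event map, usually an error report
    if isinstance(events, list):
        return list(events), False
    raise TypeError("events must be a list of maps or a single map")


def error_message(event):
    """Extract error.message from an event, or None if it is not set."""
    error = event.get("error")
    return error.get("message") if isinstance(error, dict) else None


ok = classify_events({"events": [{"id": 1}, {"id": 2}]})
empty = classify_events({"events": []})
failed = classify_events({"events": {"error": {"message": "GET failed: 500"}}})
```

Wrapping a single non-error event in a list keeps it on the first path, so it is not treated like an error report.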

Take another look at the output of our GitHub issues program from earlier:

The program effectively managed its state by:

Repeating initial state values in url, per_page, and max_pages.
Adding state that should be persisted across restarts in cursor.page.
Returning events ready to publish in the events list.
Requesting immediate re-evaluation with want_more: true.
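Those four responsibilities can be seen together in a Python model of a single evaluation. It is a sketch under stated assumptions, not a translation of the CEL program: fetch is a hypothetical stand-in for the HTTP call, the event shape is invented, and the stopping rule (continue while items came back and pages remain) is an assumption.

```python
def evaluate(state, fetch):
    """One evaluation: fetch the current page and return the next state.

    fetch(url, page, per_page) is a hypothetical stand-in for the HTTP call
    and must return a list of decoded items.
    """
    page = int(state.get("cursor", {}).get("page", 1))
    items = fetch(state["url"], page, int(state["per_page"]))
    return {
        **state,                       # repeat the initial settings
        "cursor": {"page": page + 1},  # the part persisted across restarts
        "events": [{"message": item} for item in items],
        # Assumed stopping rule: continue while data came back and pages remain.
        "want_more": bool(items) and page < int(state["max_pages"]),
    }


def fake_fetch(url, page, per_page):
    """Serve slices of a made-up data set instead of calling an API."""
    data = ["a", "b", "c"]
    start = (page - 1) * per_page
    return data[start:start + per_page]


state = {"url": "https://api.example.com/issues", "per_page": 2, "max_pages": 2}
first = evaluate(state, fake_fetch)
# The events are removed before the output becomes the next input.
carried = {k: v for k, v in first.items() if k != "events"}
second = evaluate(carried, fake_fetch)
```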

Now that you understand optional access and state management, as well as CEL basics and HTTP requests, the full GitHub issues program should be readable. Try running it with Mito and experimenting with some changes.

Review and resources

In this article, we looked at what the CEL language is and how it has been extended in the Mito library for use in the CEL input. We saw the flexibility of CEL in an example program that fetches issues information from the GitHub API, and went through all the details necessary to understand that program, covering access to settings in the initial state, interaction with HTTP APIs, returning events to be ingested, and managing the state for later program executions.

To learn more and build integrations using the CEL input, there are a number of resources worth exploring:

And perhaps the most valuable resource for building integrations with the CEL input is the CEL code of existing Elastic integrations, which can be found on GitHub:

cel.yml.hbs files in the Elastic integrations repository - GitHub
