Grok processor
The Grok processor parses unstructured log messages and extracts structured fields from them. It matches each log message against a set of predefined patterns, which makes it flexible enough to parse a wide variety of log formats.
You can provide multiple patterns to the Grok processor. The processor tries to match each log message against the patterns in the order they are provided. The first pattern that matches extracts the fields, and the remaining patterns are skipped. If no pattern matches, the Grok processor fails and you can troubleshoot the issue. Refer to Generate patterns for more information.
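For example, given a stream that mixes two hypothetical log formats, you could provide one pattern per format (the field names here are illustrative, not prescribed by the processor):

```
%{IP:client.ip} %{WORD:http.request.method} %{URIPATHPARAM:url.path}
%{TIMESTAMP_ISO8601:@timestamp} %{LOGLEVEL:log.level} %{GREEDYDATA:message}
```

A message such as `127.0.0.1 GET /index.html` matches the first pattern, so the second is never tried; a message starting with an ISO 8601 timestamp falls through to the second.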
List the most common patterns first and the more specific ones later. This reduces the number of matching attempts the Grok processor has to make and improves the performance of the pipeline.
This functionality uses the Elasticsearch Grok pipeline processor. Refer to Grok processor in the Elasticsearch docs for more information.
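For reference, this is a minimal sketch of the underlying Elasticsearch grok processor configuration, assuming the raw log line lives in a `message` field (the pattern and field names are illustrative):

```json
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{IP:client.ip} %{WORD:http.request.method} %{URIPATHPARAM:url.path}"
        ]
      }
    }
  ]
}
```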
In addition to the predefined patterns, you can define your own pattern definitions by expanding the Optional fields section and then use them in the Grok processor.
Custom patterns are defined in the following format:

```json
{
  "MY_DATE": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}"
}
```
where `MY_DATE` is the name of the pattern. The pattern can then be used in the processor as `%{MY_DATE:date}`.
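Put together, here is a sketch of how a custom definition and its usage map onto the underlying Elasticsearch grok processor configuration. The `MY_DATE` name comes from the example above; the surrounding fields are illustrative:

```json
{
  "grok": {
    "field": "message",
    "patterns": ["%{MY_DATE:date} %{GREEDYDATA:message}"],
    "pattern_definitions": {
      "MY_DATE": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}"
    }
  }
}
```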
Generate patterns
Pattern generation requires an LLM connector to be configured. Instead of writing Grok patterns by hand, you can use the Generate Patterns button to generate them for you.
Accept a pattern by clicking the plus icon next to it. This adds the pattern to the list of patterns used by the Grok processor.
Under the hood, the 100 samples on the right-hand side are grouped into categories of similar messages. For each category, a Grok pattern is generated by sending a few samples to the LLM. Matching patterns are then shown in the UI.
Pattern generation can incur additional costs, depending on the LLM connector you use. A single iteration typically uses between 1,000 and 5,000 tokens, depending on the number of identified categories and the length of the messages.