Luca Wintergerst

Streams Processing: Stop Fighting with Grok. Parse Your Logs in Streams.

Learn how Streams Processing works under the hood and how to use it to build, test, and deploy parsing logic on live data quickly.

With Streams, Elastic's new AI capability in 9.2, we make parsing your logs so simple that it's no longer a concern. Logs are generally messy: lots of fields, some understood, some unknown. You constantly have to keep up with their semantics and pattern-match to parse them properly. In some cases, even fields you know carry different values or semantics. For instance, timestamp is the ingest time, not the event time, or you can't filter by log.level or user.id because they're buried inside the message field. As a result, your dashboards are flat and not useful.

Fixing this used to mean leaving Kibana, learning Grok syntax, manually editing ingest pipeline JSON or a complicated Logstash config, and hoping you didn't break parsing for everything else.

We built Streams to fix this, and much more. It's your one place for data processing, built right into Kibana, that lets you build, test, and deploy parsing logic on live data in seconds. It turns a high-risk backend task into a fast, predictable, interactive UI workflow. You can use AI to generate Grok rules automatically from a sample of your logs, or build them easily in the UI. Let's walk through an example.

A Quick Walkthrough

Let's fix a common "unstructured" log right now.

  1. Start in Discover. You find a log that isn't structured. The @timestamp is wrong, and fields like log.level aren't being extracted, so your histograms are just a single-color bar.

  2. Inspect the log. Open the document flyout (the "Inspect a single log event" view). You'll see a button: "Parse content in Streams" (or "Edit processing in Streams"). Click it.

  3. Go to Processing. This takes you directly to the Streams processing tab, pre-loaded with sample documents from that data stream. Click "Create your first step."

  4. Generate a Pattern. The processor defaults to Grok, but you don't have to write a pattern yourself. Just click the "Generate Pattern" button. Streams analyzes 100 sample documents from your stream and suggests a Grok pattern for you. By default, this uses the Elastic Managed LLM, but you can configure your own.

  5. Accept and Simulate. Click "Accept." Instantly, the UI runs a simulation across all 100 sample documents. You can make changes to the pattern or adjust field names, and the simulation re-runs with every keystroke.

When you're happy, you save it. Your new logs will now be parsed correctly.
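
For reference, the step Streams generates is built on standard Grok. As a rough illustration (not the exact output you'll get), a plain-text line such as 2025-01-15T10:42:00Z INFO user login succeeded could be handled by a grok processor along these lines:

```json
{
  "grok": {
    "field": "message",
    "patterns": [
      "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log.level} %{GREEDYDATA:message}"
    ]
  }
}
```

From there you can rename fields or map the extracted timestamp onto @timestamp, and the simulation re-runs as you adjust things.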

Powerful Features for Messy, Real-World Logs

That's the simple case. But real-world data is rarely that clean. Here are the features built to handle the complexity.

The Interactive Grok UI

When you use the Grok processor, the UI gives you a visual indication of what your pattern is extracting. You can see which parts of the message field are being mapped to which new field names. This immediate feedback means you're not just guessing. Autocompletion of Grok patterns and instant pattern validation are also part of it.

The Diff Viewer

How do you know exactly what changed? Expand any row in the simulation table. You'll get a diff view showing precisely which fields were added, removed, or modified for that specific document. No more guesswork.
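
For example, a raw document that enters the simulation as

```json
{"message": "2025-01-15T10:42:00Z WARN disk usage at 91%"}
```

might show up in the diff with the fields your processor added (values here are made up for illustration):

```json
{
  "message": "disk usage at 91%",
  "log.level": "WARN",
  "timestamp": "2025-01-15T10:42:00Z"
}
```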

End-to-End Simulation and Detecting Failures

This is the most critical part. Streams doesn't just simulate the processor; it simulates the entire indexing process. If you try to map a non-timestamp string (like the message field) directly to the @timestamp field, the simulation will show a failure. It detects the mapping conflict before you save it and before it can create a data-mapping conflict in your cluster. This safety net is what lets you move fast.

Conditional Processing

What if one data stream contains a large variety of logs? You can't use one Grok pattern for all of them.

Streams has conditional processing built for this. The UI lets you build "if-then" logic. The UI shows you exactly what percentage of your sample documents are skipped or processed by your conditions. Right now, the UI supports up to 3 levels of nesting, and we plan to add a YAML mode in the future for more complex logic.
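
Streams generates the actual processors for you, but conceptually a branch such as "if service.name is billing-service, apply this pattern" maps to the if clause that Elasticsearch ingest processors support, written in Painless. A minimal sketch (the service name and pattern are hypothetical):

```json
{
  "grok": {
    "if": "ctx.service?.name == 'billing-service'",
    "field": "message",
    "patterns": [
      "\\[%{LOGLEVEL:log.level}\\] %{GREEDYDATA:message}"
    ]
  }
}
```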

Changing Your Test Data (Document Samples)

A random 100-document sample isn't always helpful, especially in a massive, mixed stream from Kubernetes or a central message broker.

You can change the document sample to test your changes on a more specific set of logs. You can either provide documents manually (copy-paste) or, more powerfully, specify a KQL query such as service.name : "data_processing" to fetch 100 matching sample documents for the simulation. Now you can build and test a processor on the exact logs you care about.

How Processing Works Under the Hood

There's no magic. In simple terms, it's a UI that makes our existing best practices more accessible. As of version 9.2, Streams runs exclusively on Elasticsearch ingest pipelines. (We have plans to offer more than that, so stay tuned.)

When you save your changes, Streams appends processing steps by:

  1. Locating the most specific @custom ingest pipeline for your data stream.
  2. Adding a single pipeline processor to it.
  3. Pointing that processor at a new, dedicated pipeline named <stream-name>@stream.processing, which contains the Grok, conditional, and other logic you built in the UI.

You can even see this for yourself by going to the Advanced tab in your Stream and clicking the pipeline name.
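
For illustration only (the exact pipeline names depend on your stream), if your stream were called logs-myapp-default, its @custom pipeline would end up containing a single hand-off that looks roughly like this:

```json
{
  "processors": [
    {
      "pipeline": {
        "name": "logs-myapp-default@stream.processing"
      }
    }
  ]
}
```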

Processing in OTel, Elastic Agent, Logstash, or Streams? What to Use?

This is a fair question. You have lots of ways to parse data.

  • Best: Structured logging at the source. If you control the app writing the logs, make it log JSON in a structured format of your choice (see the sketch after this list). This will always be the best way to do logging, but it's not always possible.
  • Good, when it exists: Elastic Agent + Integrations. If there is an existing integration for collecting and parsing your data, Streams won't do it any better. Use it!
  • Good for tech-savvy users: OTel at the edge. Use OTel (with OTTL) to set yourself up for the future.
  • The easy catch-all: Streams. Especially when an integration primarily just ships the data into Elastic, Streams can add a lot of value. The Kubernetes Logs integration is a good example: an integration is used, but most logs aren't parsed automatically because they can come from a wide variety of pods.
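
To make the first option concrete: "structured at the source" simply means emitting one JSON object per line with the fields already split out. A minimal, made-up example using ECS field names:

```json
{"@timestamp": "2025-01-15T10:42:00.000Z", "log.level": "INFO", "service.name": "checkout", "user.id": "42", "message": "user login succeeded"}
```

A log like this needs no Grok at all; the fields can be indexed as they are.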

Think of Streams as your universal "catch-all" for everything that arrives unstructured. It's perfect for data from sources you don't control, for legacy systems, or for when you just need to fix a parsing error right now without a full application redeploy.

A quick note on schemas: Streams can handle both ECS (Elastic Common Schema) and OTel (OpenTelemetry) data. By default, it assumes your target schema is ECS. However, Streams will automatically detect and adapt to the OTel schema if your Stream's name contains the word “otel”, or if you're using the special Logs Stream (currently in tech preview). You get the same visual parsing workflow regardless of the schema.
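
As a rough illustration of the schema difference (field names simplified, not the exact stored documents), an ECS-shaped log event looks like

```json
{"@timestamp": "2025-01-15T10:42:00.000Z", "log.level": "WARN", "message": "disk usage at 91%"}
```

while the OTel log data model carries the same information under names like severity_text and body:

```json
{"severity_text": "WARN", "body": "disk usage at 91%"}
```

The parsing workflow in Streams is the same either way; only the target field names differ.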

All processing changes can also be made using a Kibana API. Note that the API is still in tech preview while we mature some of the functionality.

Summary

Parsing logs shouldn't be a tedious, high-stakes, backend-only task. Streams moves the entire workflow from a complex, error-prone approach to an interactive UI right where you already are. You can now build, test, and deploy parsing logic with instant, safe feedback. This means you can stop fighting your logs and finally start using them. The next time you see a messy log, don't ignore it. Click "Parse in Streams" and fix it in 60 seconds.

Check out more log analytics articles in Elastic Observability Labs.

Try out Elastic. Sign up for a trial at Elastic Cloud.
