The daily loop
An alert fires. You open it. You read through the details. You gather context from the surrounding activity. You check for related signals across your environment. You decide what it means and what to do next. Sometimes you escalate. Sometimes you close it and move on.
You do this dozens of times a day. The steps are almost always the same. The data you need is already in your SIEM. The actions you take are predictable. But the work is still manual.
This is the kind of work that automation should handle. Not because it's hard, but because it's repetitive, and every minute spent on repetitive manual triage is a minute not spent on the alerts that actually need a human.
Elastic Workflows brings that automation into the SIEM itself. No separate tool. No integration to build. Your detection rule fires, and a workflow runs, with direct access to your alerts, cases, and security data.
This blog post walks through building a security playbook with Workflows, step by step. We'll start simple and build up to a workflow that runs when an alert fires, checks threat intel, gathers context, creates cases, notifies the team, and brings in AI when the investigation calls for it.
If you're new to Workflows, the introductory technical deep dive blog and video cover the core concepts. This post focuses on applying them in a security context.
Quick orientation
Workflows are YAML definitions that run inside Kibana. You define what should happen, and the platform handles execution. At a high level, a workflow is composed of three main parts: triggers (when it runs), steps (what it does), and data flow (how information moves between steps).
Triggers decide when the workflow runs. An alert trigger runs on a detection. A scheduled trigger runs on a cadence. A manual trigger runs on demand. A workflow can have more than one.
Steps define what the workflow does. They run in order and can use outputs from earlier steps. They can query data in Elasticsearch, update alerts and cases in Kibana, and call external systems like sending a Slack message or scanning a hash on VirusTotal. They can also apply logic such as conditionals or loops, and use AI for tasks like summarizing text, prompting an LLM, or invoking agents when deeper reasoning is needed.
This is the toolkit. With these primitives, you can build workflows that take a signal, gather context, and drive a response.
Building a security playbook
We'll build an alert triage workflow incrementally. Each section adds a capability, and by the end, you'll have a working playbook that handles the full triage loop.
Start with the trigger
Security workflows start with an event. It could be an alert, a case update, a user action, or a scheduled check. The workflow takes that signal, gathers context, and decides what to do next.
We’ll start with alert triage: it’s the most common path, and it shows the full loop end to end.
Here’s a minimal workflow with an alert trigger:
name: Alert Triage Playbook
description: Enriches alerts, checks threat intel, creates a case, and notifies the team.
enabled: true
tags:
  - security
  - triage
triggers:
  - type: alert
steps:
  # we'll build these out
The alert trigger connects this workflow to detection rules. You link a specific rule to this workflow from the rule's Actions settings in Kibana. When the rule fires, the workflow runs and receives the full alert context through the event variable. That includes event.alerts (the alert documents), event.rule (the rule metadata), and every field on the alert.
From here, you start adding steps.
Check threat intel
The first real step: take the file hash from the alert and check it against VirusTotal. Workflows have a built-in VirusTotal connector, so you don't need to construct HTTP requests or manage API keys in your YAML. Connector credentials, like VirusTotal API keys or Slack tokens, are configured once under Stack Management > Connectors:
- name: check_virustotal
  type: virustotal.scanFileHash
  connector-id: "my-virustotal"
  with:
    hash: "{{ event.alerts[0].file.hash.sha256 }}"
  on-failure:
    retry:
      max-attempts: 2
      delay: 3s
    continue: true
Every step in a workflow follows a simple, consistent structure. It starts with a name, which gives the step a clear identity, and a type, which defines the action being performed. In this case, the step calls the VirusTotal file hash scan capability. Because this is a connector-backed action, it also includes a connector-id, which tells the workflow which configured integration to use, including its credentials.
The with block is where you pass inputs into the step. Each step type defines the parameters it accepts. Here, you provide the file hash to scan. Rather than hardcoding values, workflows use a built-in templating engine powered by LiquidJS. The {{ }} syntax lets you reference data from the execution context, so the hash is pulled directly from the alert that triggered the workflow.
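LiquidJS also ships standard filters that come in handy inside `with` blocks. As a sketch (the step itself is hypothetical, but `default` and `downcase` are standard LiquidJS filters), you could fall back to the MD5 hash when the SHA-256 is missing and normalize casing before the lookup:

```yaml
# Illustrative variant of the lookup step.
# `default:` substitutes a fallback value when the field is absent;
# `downcase` lowercases the hash before it reaches the connector.
- name: check_virustotal_safe
  type: virustotal.scanFileHash
  connector-id: "my-virustotal"
  with:
    hash: "{{ event.alerts[0].file.hash.sha256 | default: event.alerts[0].file.hash.md5 | downcase }}"
```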
Finally, the on-failure block defines how the step behaves if something goes wrong. In this case, it retries twice with a short delay and continues execution even if the lookup fails. This is important in production workflows, where a transient external API issue should not block the entire triage process.
Gather context with ES|QL
Next, query for related alerts on the same host. ES|QL runs directly against your security indices, so there's no API bridging or credential management:
- name: related_alerts
  type: elasticsearch.esql.query
  with:
    query: |
      FROM .alerts-security*
      | WHERE host.name == "{{ event.alerts[0].host.name }}"
      | WHERE @timestamp > NOW() - 24 hours
      | STATS
          alert_count = COUNT(*),
          rules_triggered = VALUES(kibana.alert.rule.name),
          users_involved = VALUES(user.name)
    format: json
This tells you whether the host has been generating other alerts, which rules triggered, and which users were involved. That context is included in the case description and informs the severity assessment later.
The same approach works for any enrichment that touches data in Elasticsearch: looking up a user's first-seen date, checking how many times a hash has appeared in your logs, or pulling the process tree from endpoint data. If the data is in your cluster, ES|QL can get it.
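For example, a first-seen lookup for the user on the alert might look like the following sketch; the step name and the `logs-*` index pattern are assumptions, so adjust them to your own data streams:

```yaml
# Hypothetical enrichment: when was this user first observed?
- name: user_first_seen
  type: elasticsearch.esql.query
  with:
    query: |
      FROM logs-*
      | WHERE user.name == "{{ event.alerts[0].user.name }}"
      | STATS first_seen = MIN(@timestamp)
    format: json
```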
Branch on findings
Now the workflow needs to decide what to do. If VirusTotal flagged the file as malicious, create a case and respond. If not, close the alert as a false positive:
- name: check_malicious
  type: if
  condition: steps.check_virustotal.output.stats.malicious > 5
  steps:
    # true positive path: steps below
  else:
    - name: close_false_positive
      type: kibana.SetAlertsStatus
      with:
        status: closed
        reason: false_positive
        signal_ids:
          - "{{ event.alerts[0]._id }}"
The if step evaluates a condition and runs different steps depending on the result. The false positive path closes the alert in a single step. The true positive path continues below.
Create a case
When the alert is confirmed malicious, open a case with context from previous steps:
- name: create_case
  type: kibana.createCase
  with:
    title: "Malware Detected: {{ event.alerts[0].file.hash.sha256 }}"
    description: |
      Confirmed malicious file detected on {{ event.alerts[0].host.name }}.
      **Detection:** {{ event.rule.name }}
      **User:** {{ event.alerts[0].user.name }}
      **VirusTotal:** {{ steps.check_virustotal.output.stats.malicious }} engines flagged this file
      **Related alerts (24h):** {{ steps.related_alerts.output.values[0][0] }} alerts from {{ steps.related_alerts.output.values[0][1] | size }} rules
    owner: securitySolution
    severity: high
    tags:
      - automation
      - malware
    settings:
      syncAlerts: false
    connector:
      id: none
      name: none
      type: ".none"
      fields: null
Liquid templating pulls data from the alert (event), from the VirusTotal results (steps.check_virustotal.output), and from the ES|QL query (steps.related_alerts.output). Every field from every previous step is available to every subsequent step.
Notify the team
Send a Slack message so the team knows a confirmed case is open:
- name: notify_team
  type: slack
  connector-id: "security-alerts"
  with:
    message: |
      Malware confirmed on {{ event.alerts[0].host.name }}.
      VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} detections.
      Case created: {{ steps.create_case.output.id }}
Slack is one option. Jira, ServiceNow, PagerDuty, Microsoft Teams, email, and Opsgenie are all supported as connector steps.
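Swapping channels is mostly a matter of changing the step type and connector ID. As an illustrative sketch only (the exact Jira step type and field names may differ; check the connector reference in Kibana before using this):

```yaml
# Hypothetical Jira variant of the notification step.
# "soc-jira" is an assumed connector ID configured in Stack Management.
- name: open_ticket
  type: jira
  connector-id: "soc-jira"
  with:
    summary: "Malware confirmed on {{ event.alerts[0].host.name }}"
    description: "See case {{ steps.create_case.output.id }} in Kibana for full context."
```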
The complete workflow
Here's the full workflow assembled:
name: Alert Triage Playbook
description: Enriches alerts, checks threat intel, creates a case, and notifies the team.
enabled: true
tags:
  - security
  - triage
triggers:
  - type: alert
steps:
  - name: check_virustotal
    type: virustotal.scanFileHash
    connector-id: "my-virustotal"
    with:
      hash: "{{ event.alerts[0].file.hash.sha256 }}"
    on-failure:
      retry:
        max-attempts: 2
        delay: 3s
      continue: true
  - name: related_alerts
    type: elasticsearch.esql.query
    with:
      query: |
        FROM .alerts-security*
        | WHERE host.name == "{{ event.alerts[0].host.name }}"
        | WHERE @timestamp > NOW() - 24 hours
        | STATS
            alert_count = COUNT(*),
            rules_triggered = VALUES(kibana.alert.rule.name),
            users_involved = VALUES(user.name)
      format: json
  - name: check_malicious
    type: if
    condition: steps.check_virustotal.output.stats.malicious > 5
    steps:
      - name: create_case
        type: kibana.createCase
        with:
          title: "Malware Detected: {{ event.alerts[0].file.hash.sha256 }}"
          description: |
            Confirmed malicious file detected on {{ event.alerts[0].host.name }}.
            **Detection:** {{ event.rule.name }}
            **User:** {{ event.alerts[0].user.name }}
            **VirusTotal:** {{ steps.check_virustotal.output.stats.malicious }} engines flagged this file
            **Related alerts (24h):** {{ steps.related_alerts.output.values[0][0] }} alerts from {{ steps.related_alerts.output.values[0][1] | size }} rules
          owner: securitySolution
          severity: high
          tags:
            - automation
            - malware
          settings:
            syncAlerts: false
          connector:
            id: none
            name: none
            type: ".none"
            fields: null
      - name: notify_team
        type: slack
        connector-id: "security-alerts"
        with:
          message: |
            Malware confirmed on {{ event.alerts[0].host.name }}.
            VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} detections.
            Case created: {{ steps.create_case.output.id }}
    else:
      - name: close_false_positive
        type: kibana.SetAlertsStatus
        with:
          status: closed
          reason: false_positive
          signal_ids:
            - "{{ event.alerts[0]._id }}"
That's the triage loop, automated. Alert fires, threat intel checked, context gathered, decision made, case created, team notified. Every execution is logged and auditable.
This is a starting point. The traditional-triage.yaml example in the Elastic Workflows library on GitHub goes further: it isolates the host, looks up the on-call analyst, creates a dedicated Slack channel, assigns the case, and posts a rich incident summary. Same patterns, more steps.
Adding AI to the playbook
The workflow above handles a defined path. If the hash is malicious, do X; otherwise, do Y. That covers a lot of triage work. But not every alert fits a clean branching condition, and not every case description should be a list of raw fields.
Workflows include AI steps that handle the parts where structured logic runs out. There are three, and they work together.
Classify: let AI drive the branching
Instead of branching on a VirusTotal score threshold, use ai.classify to categorize the alert. It considers the full alert context, not just a single number:
- name: classify_alert
  type: ai.classify
  with:
    input: "${{ event }}"
    categories:
      - malware
      - phishing
      - lateral_movement
      - data_exfiltration
      - false_positive
    instructions: |
      Classify this security alert based on the alert details,
      rule name, and affected entities.
    includeRationale: true
The output is structured: steps.classify_alert.output.category returns a single string like "malware" or "false_positive". That drives the if condition directly. The rationale explains why, and you can include it in the case for audit purposes.
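Wired into the branch, the classification replaces the score threshold. A sketch reusing the false-positive path from earlier (the string-comparison condition syntax is an assumption; verify it against the workflow schema in your Kibana editor):

```yaml
# Hypothetical routing step: close false positives, continue otherwise.
- name: route_on_classification
  type: if
  condition: steps.classify_alert.output.category == "false_positive"
  steps:
    - name: close_false_positive
      type: kibana.SetAlertsStatus
      with:
        status: closed
        reason: false_positive
        signal_ids:
          - "{{ event.alerts[0]._id }}"
  else:
    # confirmed categories continue to case creation and response
```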
Summarize: write case descriptions that adapt
Rather than templating raw field values into a case description, use ai.summarize to generate a readable overview. Run it once before case creation for the initial description, and once after the agent investigation to update the description with the full picture:
- name: initial_summary
  type: ai.summarize
  with:
    input: "${{ event }}"
    instructions: |
      Write a one-paragraph overview of this security alert.
      State what was detected, on which host, by which user, and the severity.
      Do not include recommendations. Just the facts.
    maxLength: 300
The summary adapts to whatever fields are present on the alert, so you don't need to account for every possible field combination in your Liquid templates. Use steps.initial_summary.output.content in the case description and the Slack notification.
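For example, the case-creation step from earlier could swap its hand-templated description for the generated overview. A trimmed sketch (fields omitted for brevity; the title template is illustrative):

```yaml
# Hypothetical variant of create_case using the AI-generated summary.
- name: create_case
  type: kibana.createCase
  with:
    title: "{{ steps.classify_alert.output.category }} on {{ event.alerts[0].host.name }}"
    description: "{{ steps.initial_summary.output.content }}"
    owner: securitySolution
    severity: high
```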
Agent: investigate what the playbook can't
The ai.agent step invokes an Agent Builder agent. Unlike classify and summarize, an agent has access to tools. It can query your indices, check threat intel, correlate signals across data sources, and reason about what it finds:
- name: escalate_to_agent
  type: ai.agent
  agent-id: "security-agent"
  create-conversation: true
  with:
    message: |
      Investigate this alert. Search for related activity on this host,
      check for persistence mechanisms and lateral movement,
      and determine the full scope of the incident.
      Alert: {{ event | json }}
      Classification: {{ steps.classify_alert.output.category }}
      VirusTotal: {{ steps.check_virustotal.output | json }}
      Related alerts: {{ steps.related_alerts.output | json }}
  timeout: 10m
The agent processes the input, calls whatever tools it needs, and returns its findings. The workflow waits, then continues with the next steps: adding the investigation to the case, notifying the team, and updating the case description with a concise summary of what the agent found.
Setting create-conversation: true persists the conversation, so the workflow can fetch the agent's reasoning trail and add it to the case as a structured comment with clickable links to each query it ran. And the analyst gets a direct link to pick up the conversation with the agent if they want to dig deeper.
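Attaching the findings to the case can use the kibana.addCaseComment step. A sketch, assuming the agent's reply is exposed on the step output (the output field name and parameter names are illustrative, not a documented contract):

```yaml
# Hypothetical step: record the investigation as a case comment.
- name: add_investigation_comment
  type: kibana.addCaseComment
  with:
    caseId: "{{ steps.create_case.output.id }}"
    comment: |
      **Agent investigation**
      {{ steps.escalate_to_agent.output.message }}
```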
Putting it together
In the full version of this workflow, the three AI steps work in sequence:
- Classify the alert to drive the triage decision
- Summarize the alert for the initial case description and Slack notification
- Agent investigates the full scope: persistence, lateral movement, IOCs, affected systems
- Summarize again, this time distilling the agent's findings into a concise, updated case description
The case starts with a clean factual overview and evolves into a comprehensive summary as the investigation completes. The agent's full analysis and reasoning trail live as case comments for analysts who want the details.
The complete workflow, including the AI investigation pipeline with reasoning trails, clickable Discover links, and follow-up Slack notifications, is available in the Elastic Workflows library on GitHub.
Workflows as agent tools
The integration between Workflows and Agent Builder works in both directions. Workflows can call agents (as shown above). And agents can call workflows.
When you expose a workflow as a tool in Agent Builder, an agent can invoke it during a conversation. The agent decides what needs to happen, and the workflow handles the execution reliably and repeatably.
This is the pattern demonstrated in the Chrysalis APT blog post: a two-step workflow hands the entire Attack Discovery to an agent, and the agent calls workflow-backed tools to verify malware hashes, search logs, check the on-call schedule, create a case, and spin up a Slack channel. The workflow is the trigger and the safety net. The agent is the brain.
Agents reason. Workflows execute. Together they cover the full range from judgment to action.
Open by design
Not every team starts from zero. Some already have automation running in Tines, Splunk SOAR, Palo Alto XSOAR, or another platform. Workflows don't ask you to replace any of your existing tools.
The idea is straightforward: use Workflows for the parts of your automation that are native to Elastic. Alert triage, enrichment from your own indices, case management, and alert status updates. These touch your Elastic data directly, and a native workflow will always be simpler and faster than an external tool making API calls back into Elastic.
For everything else, connectors bridge the gap. We have native connectors for Tines, Resilient, Swimlane, TheHive, D3 Security, Torq, and XSOAR. A workflow can kick off a Tines story, push an incident to Resilient, or trigger any external system via HTTP. Your existing tools handle cross-platform orchestration. Workflows handle what's native. As the capability grows, you can consolidate at your own pace. Nobody's forcing a migration.
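The generic HTTP path might look like the following sketch; the step type name, URL, and parameters are illustrative assumptions rather than a documented API, so consult the Workflows step reference before adapting it:

```yaml
# Hypothetical generic webhook call to an external orchestrator.
- name: trigger_external
  type: http
  with:
    method: POST
    url: "https://example.com/hooks/triage"
    body: |
      { "case_id": "{{ steps.create_case.output.id }}" }
```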
What's here and what's next
Workflows is available today. Here's what you can build right now:
- Alert triggers connect workflows to detection and alerting rules
- Case and alert management through named Kibana steps (kibana.createCase, kibana.SetAlertsStatus, kibana.addCaseComment, and more)
- Direct data access via Elasticsearch search and ES|QL
- 39 workflow-compatible connectors covering threat intel (VirusTotal, AbuseIPDB, GreyNoise, Shodan, URLVoid, AlienVault OTX), ticketing (Jira, ServiceNow), communication (Slack, Teams, PagerDuty, email), SOAR platforms (Tines, Resilient, Swimlane, TheHive, and others), and AI providers
- AI steps for classification, summarization, prompts, and Agent Builder invocation of Elastic agents and skills
- YAML authoring with autocomplete, validation, and step testing in Kibana
- 50+ example workflows on GitHub, including security-specific templates for detection, enrichment, and response
What's coming:
- Visual workflow builder for drag-and-drop authoring
- In-product template library to browse and install workflows directly in Kibana
- Human-in-the-loop approvals that pause workflows for human input via Slack, email, or the Kibana UI
- Natural language authoring where AI helps translate intent into working workflows
Today, authoring is YAML-based. If you've written detection rules or configured CI/CD pipelines, the learning curve is gentle. The editor has built-in autocomplete, validation, and step testing, and the example library gives you templates to start from. A visual builder is coming to make this accessible to a wider audience.
Get started
Elastic Workflows is available now. To start building:
- Start an Elastic Cloud trial or enable Workflows in your existing deployment under Stack Management > Advanced Settings
- Explore the Workflows documentation
- Browse the Elastic Workflow Library on GitHub for security templates you can adapt
- Read the introductory technical deep dive for core concepts
- See the Chrysalis APT blog for a complete Attack Discovery + Workflows + Agent Builder walkthrough
Start with the workflow that would save you the most time tomorrow.
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.
