Security
Financial Services

How Visa's first Elastic Workflow cut alert triage from 10–20 minutes to seconds with a controlled, human-on-the-loop AI step

At a glance

  • Min → sec
    Per-fire triage time on the mainframe default-account detection
  • 4
    Automated pipeline stages: detection, enrichment, AI validation, webhook delivery
  • 5 min
    Schedule cadence, with a 10-minute lookback window
  • 0
    Analyst pivots before the alert reaches IR
  • 1st
    Production Workflows pipeline at Visa, designed as a reusable pattern across other detections

As part of a broader modernization of its security operations from a legacy SIEM to Elastic, Visa's cybersecurity engineering team built its first agentic AI workflow in the SOC: a four-stage pipeline that uses a constrained, human-on-the-loop AI step to produce IR-ready cases. For a high-stakes mainframe detection that previously required a manual second query, triage time dropped from 10–20 minutes to seconds, and the same controlled, auditable pattern is now reusable across other detections.

Summary

Visa is migrating from a legacy SIEM to Elastic, and during that migration the cybersecurity engineering team built its first Elastic Workflows pipeline as a proof of concept. The chosen detection, a high-stakes mainframe identity detection, previously required analysts to pivot from the alert into a follow-up search to identify the responsible user, with results that varied significantly by analyst experience with mainframe logs. The team chained two Elasticsearch Query Language (ES|QL) queries, added a constrained AI step that produces a structured summary for the IR team, and used a webhook to deliver the case directly into their IR ticketing system. When the alert fires, triage now completes in seconds instead of 10 to 20 minutes, and the same four-stage pattern is now ready to apply across other detections.

When the alert is just the start of the work

Visa secures one of the largest payments environments in the world. As part of a broader modernization from a legacy SIEM to Elastic, the cybersecurity engineering team set out to build something Visa didn’t have yet at scale: a controlled, defensible way to use AI inside security operations. The bar was specific. Any AI step had to be auditable, narrow in scope, and verifiably anchored in data the team controlled. The first place that bar got tested was a high-stakes mainframe identity detection where every fire cost the incident response team 10 to 20 minutes of manual context-gathering before investigation could even begin.

When that detection fired in the legacy SIEM, the incident response team had to log in, run a second search, and work out which user was behind the activity. The detection identified the event. The analyst still had to identify the person.

"The alert would happen. The IR team would get the alert, and they would have to log into the legacy system and run a second search. Go through the data. Make sure that they find the correct terminal for the user. But we don’t know who actually did it. So you have to run the second search, and then figure out who was the last person to be on that terminal, and then they go to the mainframe team. All of that was just sort of time-consuming."

– Visa cybersecurity engineering team

The team's framing was simple: If every actionable alert needs a second query before the IR team can act, the pipeline is incomplete. Move the second query upstream, into the pipeline itself, and the IR team gets a case that is already enriched, summarized, and ready to hand off to the mainframe team for review.

How Visa got here: A migration with room to be creative

Visa's broader project is the migration of detection logic from a legacy SIEM to Elastic. That kind of migration tends to be heads-down rule conversion. The team made deliberate room for engineers to try new platform capabilities alongside the conversion work.

"Now, with Elastic, we learned that we can go even further. The technical limit, I don’t even consider before. Since we saw what the tool can offer, a person can be creative and come up with ideas, how to improve our processes, how we can be more efficient."

– Visa cybersecurity engineering team

A cybersecurity engineer on the team took Elastic Workflows training, picked a detection they already knew well, and worked with their Elastic solutions architect to build the first version. The Elastic team generated representative test data from samples the engineer provided, built a starter workflow from scratch, and handed it back as a template the engineer could extend.

What existing approaches could not offer was native automation that lived inside the same security platform as the data. At Visa, SOAR-style orchestration is owned by a separate automation team and runs on a separate platform from the detections themselves. With Elastic Workflows, the entire pipeline runs in one place: detection, enrichment, validation, and delivery, all native, all in one YAML file the engineer can read and edit in one screen.

Workflows was also the right starting point for a specific reason. Visa needed complete control over what the AI step received, what it produced, and how every decision could be verified. For a regulated financial services environment, that auditability is the prerequisite for trusting AI in security operations. Workflows gave the team a pattern they could fully quantify and defend, with each step visible in a single YAML file, before extending into other agentic capabilities on Elastic's roadmap.

Before: Alert, then second search, then handoff

In the previous workflow, the sequence was alert, then investigate. The detection fired. The IR team logged into the legacy SIEM. They ran the follow-up search to map the event back to a terminal and a user. Mainframe logs are not a forgiving environment for that work.

Three things made the before-state costly:

  • The second search was unavoidable. Only one person can be assigned to a terminal at a time, but terminal values are reused. To identify the user behind a flagged event, an analyst had to query for the closest prior login to that terminal, on that mainframe partition, around the time of the event.
  • Analyst skill mattered a great deal. Experienced analysts who knew the query language and the mainframe event codes could do it quickly. Newer analysts, or analysts less familiar with mainframe logs, took noticeably longer. The result was variance in both time and quality.
  • Mainframe event codes are gnarly. There is no single path to map an event to an outcome. Analysts often had to interpret system codes whose meaning depended on context, which slowed the work and made the result harder to standardize.

Both time and quality were being lost to the second search. The detection had identified the event; the analyst was still doing the investigation that should have been part of the detection's output.

Architecture: A 4-stage pipeline running on a 5-minute schedule

The new pipeline runs as four chained stages inside Elastic Workflows:

  1. Detection (primary ES|QL query): On a five-minute timer with a ten-minute lookback window, the workflow runs a fast, narrow ES|QL query against mainframe logs, looking for the targeted identity activity. The detection is intentionally precise; the team does not expect more than one or two hits per run, and historically the alert fires only a few times per year.
  2. Enrichment (secondary ES|QL query): For each event returned, the workflow runs a follow-up ES|QL query that takes the terminal value and the LPAR (mainframe partition) from the primary event and then searches for the closest prior login to that terminal. Because terminals are reused but only one user is assigned at a time, the closest prior login identifies the user behind the activity. Verification is quick because the same queries that produced the result are right there for the engineer to inspect.
  3. AI validation: The enriched event is passed to a large language model (LLM)-backed step. The model receives a constrained prompt: the assumption that only one person can use a terminal at a time, the primary alert details, and the result of the secondary query. Its job is to verify that the data supports the conclusion and to produce a structured summary that includes the user’s name and ID, what the original alert detected, and why this person is the identified actor. The model does not decide whether to escalate; the IR team does.
  4. Delivery (webhook): The summary is delivered via webhook to the IR ticketing system. By the time the ticket is created, the IR engineer is reading a case, not assembling one.
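The four stages above can be sketched as a simple chain. This is a minimal Python illustration of the control flow only, not the Workflows YAML the team wrote; the `Case` type, the `run_once` function, and the stand-in stage functions and field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Case:
    event: dict              # primary alert details
    login: Optional[dict]    # closest prior login, if one was found
    summary: dict            # structured output of the validation step


def run_once(
    detect: Callable[[], Iterable[dict]],
    enrich: Callable[[dict], Optional[dict]],
    validate: Callable[[dict, Optional[dict]], dict],
    deliver: Callable[[Case], None],
) -> List[Case]:
    """Run the four stages for one scheduled tick and return the cases sent."""
    cases = []
    for event in detect():                # 1. detection (primary query)
        login = enrich(event)             # 2. enrichment (secondary query)
        summary = validate(event, login)  # 3. AI validation: verify, summarize
        case = Case(event, login, summary)
        deliver(case)                     # 4. delivery (webhook)
        cases.append(case)
    return cases


# Stand-in stages with made-up data, only to show the flow.
sent = []
cases = run_once(
    detect=lambda: [{"terminal": "T042", "lpar": "LPAR1"}],
    enrich=lambda event: {"user_id": "u123", "terminal": event["terminal"]},
    validate=lambda event, login: {"user_id": login["user_id"], "supported": True},
    deliver=sent.append,
)
print(len(cases), sent[0].summary["user_id"])  # 1 u123
```

Note that no stage in the chain makes an escalation decision; the sketch ends at delivery, which matches the division of labor described above.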

The architectural choice that matters most here is that the entire pipeline runs natively inside the same platform as the security data. There is no SOAR sitting on the side, no separate orchestration system to maintain, and no glue between the detection and the workflow.

Technical highlights

  • ES|QL primary query on a five-minute timer with a ten-minute lookback
  • ES|QL secondary query takes terminal and LPAR from the primary event and finds the closest prior login
  • Detection chosen for the high analyst cost of the manual second-query step it previously required
  • LLM-backed validation step receives a constrained, structured prompt, not free-form analysis
  • Context windowing: only the last 5–15 minutes of relevant records are passed to the model, reducing tokens sent and tightening the model's working context
  • Model is intentionally not named; Visa is moving between providers and the platform is model-agnostic
  • Workflow is a single YAML file authored and tested in one Kibana screen, like a code editor
  • Webhook delivers the structured summary directly into the IR ticketing system
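A ten-minute lookback on a five-minute schedule means consecutive runs query overlapping time ranges. The Python sketch below works through that arithmetic. The source does not say whether or how Visa handles repeat hits, so the `unseen` helper and its `event_id` key are purely hypothetical, shown as one possible way to avoid processing the same event twice.

```python
from datetime import datetime, timedelta, timezone

SCHEDULE = timedelta(minutes=5)   # cadence from the text
LOOKBACK = timedelta(minutes=10)  # lookback from the text


def query_window(run_time):
    """Return the (start, end) range a run at run_time would search."""
    return run_time - LOOKBACK, run_time


def overlap(first_run, second_run):
    """Length of time covered by both runs' query windows."""
    s1, e1 = query_window(first_run)
    s2, e2 = query_window(second_run)
    return max(timedelta(0), min(e1, e2) - max(s1, s2))


def unseen(events, seen_ids):
    """Drop events handled by an earlier run (hypothetical event_id key)."""
    fresh = [e for e in events if e["event_id"] not in seen_ids]
    seen_ids.update(e["event_id"] for e in fresh)
    return fresh


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary example time
print(overlap(t0, t0 + SCHEDULE))  # 0:05:00

seen = set()
print(len(unseen([{"event_id": "a"}], seen)))                     # 1
print(len(unseen([{"event_id": "a"}, {"event_id": "b"}], seen)))  # 1
```

The overlap gives each event two chances to be picked up, which is a common reason to choose a lookback longer than the schedule interval.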

The capabilities

Chained ES|QL detection and enrichment in a single workflow

Treating detection and enrichment as one chained flow, rather than two analyst-mediated steps, is what made the rest of the pipeline possible. ES|QL's piped model lets the team express the flow declaratively: The primary query identifies the flagged event, and the secondary query layers in the terminal-to-user mapping that determines who likely did it. There is no analyst console session in between, and there is no second tool to log into. The structure of the workflow now matches the structure of the analyst's actual question ("who did this?") rather than stopping at "did something happen?"
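The selection logic behind the secondary query can be illustrated in a few lines. The real enrichment is an ES|QL query; this Python sketch only mirrors the rule described in the text (same terminal, same LPAR, latest login at or before the event) over in-memory records. The field names and sample data are assumptions for illustration.

```python
from datetime import datetime
from typing import List, Optional


def closest_prior_login(event: dict, logins: List[dict]) -> Optional[dict]:
    """Return the latest login on the event's terminal and LPAR at or before the event."""
    candidates = [
        login for login in logins
        if login["terminal"] == event["terminal"]
        and login["lpar"] == event["lpar"]
        and login["timestamp"] <= event["timestamp"]
    ]
    # max() with default=None returns None when no prior login matches.
    return max(candidates, key=lambda login: login["timestamp"], default=None)


# Made-up records: terminal T042 is reused by several users over the day.
logins = [
    {"user": "alice", "terminal": "T042", "lpar": "LPAR1", "timestamp": datetime(2024, 1, 1, 9, 0)},
    {"user": "bob", "terminal": "T042", "lpar": "LPAR1", "timestamp": datetime(2024, 1, 1, 11, 30)},
    {"user": "carol", "terminal": "T042", "lpar": "LPAR2", "timestamp": datetime(2024, 1, 1, 11, 45)},
    {"user": "dave", "terminal": "T042", "lpar": "LPAR1", "timestamp": datetime(2024, 1, 1, 12, 10)},
]
event = {"terminal": "T042", "lpar": "LPAR1", "timestamp": datetime(2024, 1, 1, 12, 0)}

print(closest_prior_login(event, logins)["user"])  # bob
```

Here carol is excluded because she is on a different partition, and dave because his login comes after the event.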

A constrained AI step with deliberately narrow scope

The AI step is doing one specific job: confirming that the enrichment data supports the conclusion and producing a structured summary the IR team can read in a few seconds. It does not generate detection logic. It does not decide whether to escalate. It is told the assumption that anchors the analysis (only one person can use a terminal at a time) and is given the closest prior login to evaluate against that assumption.

"Before, the IR team had to log in to another tool and run other searches. Now they just get the alert with all of the data available. It's all summarized for them in a very neat way."

– Visa cybersecurity engineering team

That narrowness is the point. The team's choice was not "let the model investigate" but "let the model verify and summarize what the queries already returned." Every AI decision in the pipeline evaluates data the team can see against criteria the team controls. That is what makes the validation auditable rather than opaque.
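One way to picture a constrained step is a prompt assembled only from the stated assumption and the two query results, paired with a strict check on the shape of the reply. The Python sketch below is an assumption-laden illustration: the prompt wording, the key names in `REQUIRED_KEYS`, and both function names are hypothetical, and no model is actually called.

```python
import json

# Hypothetical output schema; the text says the summary covers the user's
# name and ID, what the alert detected, and why this person is the actor.
REQUIRED_KEYS = {"user_name", "user_id", "alert_detected", "reasoning", "supported"}

ASSUMPTION = "Only one person can use a terminal at a time."


def build_prompt(event: dict, login: dict) -> str:
    """Assemble a prompt from the assumption and the two query results only."""
    return "\n".join([
        "Verify and summarize. Do not decide whether to escalate.",
        f"Assumption: {ASSUMPTION}",
        f"Primary alert: {json.dumps(event, sort_keys=True)}",
        f"Closest prior login: {json.dumps(login, sort_keys=True)}",
        "Reply with one JSON object with exactly these keys: "
        + ", ".join(sorted(REQUIRED_KEYS)),
    ])


def parse_summary(raw: str) -> dict:
    """Accept the model's reply only if it has exactly the expected keys."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        raise ValueError("summary does not match the expected structure")
    return data


prompt = build_prompt({"terminal": "T042"}, {"user_id": "u123"})
print(ASSUMPTION in prompt)  # True
```

Rejecting any reply that does not match the expected keys keeps the model's output checkable against the same data the engineer can inspect.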

Token-efficient context windowing

The model never sees more data than it needs to. The pipeline filters the records passed in to only those relevant to the time window of the event, the last 5 to 15 minutes, matching how the detection is framed. The result is a smaller context window, lower tokens-per-decision, and tighter reasoning, because the model is not trying to interpret records that have nothing to do with the question.
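The windowing idea reduces to a time filter applied before anything reaches the model. This Python sketch assumes records carry a `timestamp` field and uses a configurable window; the function name, field names, and sample data are illustrative, and the 5 and 15 minute values simply reflect the range mentioned above.

```python
from datetime import datetime, timedelta
from typing import List


def window_records(records: List[dict], event_time: datetime, minutes: int = 15) -> List[dict]:
    """Keep only records from the `minutes` leading up to the event."""
    start = event_time - timedelta(minutes=minutes)
    return [r for r in records if start <= r["timestamp"] <= event_time]


# Made-up records around an event at 12:00.
records = [
    {"id": 1, "timestamp": datetime(2024, 1, 1, 11, 40)},
    {"id": 2, "timestamp": datetime(2024, 1, 1, 11, 50)},
    {"id": 3, "timestamp": datetime(2024, 1, 1, 11, 58)},
    {"id": 4, "timestamp": datetime(2024, 1, 1, 12, 3)},
]
event_time = datetime(2024, 1, 1, 12, 0)

print([r["id"] for r in window_records(records, event_time)])             # [2, 3]
print([r["id"] for r in window_records(records, event_time, minutes=5)])  # [3]
```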

Native automation inside the security platform

The whole pipeline lives in Elastic, alongside the data. There is no separate orchestration platform to maintain, no brittle integration to keep alive during an incident, and no second team to coordinate the workflow with. Workflows authored in YAML can call out to other systems via HTTP, so the platform’s reach extends without giving up the native home. For Visa specifically, this is consequential. A separate automation team currently owns SOAR-style orchestration on a different platform from the security data. That team has already requested API access to Elastic, and the cybersecurity engineering team is open to using Workflows to absorb response work that currently sits on the other tool. The case for consolidating onto a single native automation surface is no longer theoretical inside Visa.
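The HTTP handoff to a ticketing system can be pictured as a single JSON POST. The Python sketch below builds such a request with the standard library without sending it; the URL, the payload fields, and the function name are placeholders, and the actual Workflows step and Visa's ticketing API are not described in the source.

```python
import json
import urllib.request


def build_webhook_request(url: str, summary: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the structured summary."""
    body = json.dumps(summary).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and payload, for illustration only.
request = build_webhook_request(
    "https://ticketing.example.invalid/api/cases",
    {"user_id": "u123", "reasoning": "closest prior login on the terminal"},
)
print(request.get_method(), request.full_url)
```

Sending would be a call to `urllib.request.urlopen(request)`, omitted here so the example runs without a network connection.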

Standardization where analyst expertise varied

The most visible result is speed, but the operationally important result is consistency. Mainframe log interpretation was previously one of the most variable parts of triage at Visa: an experienced analyst could finish the second-query work quickly; a newer analyst, or one less familiar with mainframe event codes, took meaningfully longer and produced more variable results.

"More experienced analysts who were proficient in the legacy query language could find the answer quickly. Anyone newer, that's where it would pose difficulty. Generally they're not as familiar with the mainframe logs. They're just different. That's where you get differences in quality, and that's where I thought the AI part would be really valuable. It could clarify things quickly for them."

– Visa cybersecurity engineering team

By pushing the assembly and the summary into the pipeline, the part of the workflow that depended on analyst proficiency in legacy query languages and mainframe event codes is no longer the analyst's job. The IR engineer reads a structured summary and exercises judgment on whether to act.

The operating model in practice

The detection itself fires rarely, historically only a few times per year. That is intentional. The team chose it precisely because it is a low-volume, high-stakes alert where the per-fire cost of the manual second-query step was clearly worth eliminating. The bigger story is that this is Visa's first production workflow of its kind, and the four-stage pattern it established is now ready to apply across other detections.

"What used to require 10 to 20 minutes of analyst effort per fire now completes in seconds. The IR team is no longer receiving a raw signal that requires investigation to begin. They are receiving a fully contextualized alert that is ready for decision-making."

– Visa cybersecurity engineering team

What an IR engineer at Visa now sees when this alert fires: a ticket with the original event, the identified user (by name and ID), the reasoning, and the link back to the underlying data. The pipeline did the second query, the LLM verified and summarized, and the webhook created the ticket. The engineer’s first action is judgment, not query construction.

The human is still the decision-maker. The pipeline does the assembly. The AI does the verification and summarization. The engineer does what only the engineer can do: judge, verify, and escalate. The work that gets pushed upstream is the work that was repeatable. The work that stays with the engineer is the work that requires judgment.

Before and after


| | Before | After |
| --- | --- | --- |
| Alert management | Raw event requiring a follow-up search to identify the user | Enriched, AI-summarized ticket arrives in IR ready for decision |
| Operational effort | 10–20 minutes per fire on the second search and mainframe-log interpretation | Seconds, end to end |
| Investigation flow | Alert, then second query in a separate tool | Detection, enrichment, validation, and delivery in a single chained workflow |
| Analyst skill dependency | Experienced analysts could move fast; newer analysts slowed by legacy query language and mainframe event codes | Structured summary removes the proficiency dependency on legacy query languages |
| Tooling boundaries | Alert in legacy SIEM, follow-up in legacy SIEM, ticket in a separate orchestration tool | Detection, enrichment, validation, and delivery all native to Elastic, with webhook handoff to the IR ticketing system |
| Analyst role | Manual context assembly, second-search execution, mainframe-log interpretation | Reading the structured case, judging the result, escalating |

What the 4-stage pattern teaches

The reason this is Visa's first Workflows project but not its last is that the four-stage shape is portable. The specific detection is unusual; the operational pattern around it is not. Anywhere a detection requires a known, repeatable follow-up search before an analyst can act, the same four stages apply.


"It's a known part of their runbook. The pipeline can just do those first steps and eliminate them from what the IR team has to investigate."

– Visa cybersecurity engineering team

Three principles transferred from this build apply to any detection where the second query is the bottleneck. Automate the known parts of the runbook into the pipeline itself, because if the follow-up always runs the same way, it is not analyst work. Give the AI step a job narrow enough to be auditable: a constrained prompt, a structured input, a summary the engineer can verify against the data. Window the context, so the model receives only what is relevant to the time of the event rather than the full data the queries returned. The first principle is the most generalizable. The other two are what keep the AI defensible.

What comes next

The four-stage pattern (detection, enrichment, validation, delivery) is portable. The team has already identified other detections in the migration scope that would benefit from the same chained, AI-summarized, webhook-delivered approach. The team is actively evaluating Attack Discovery, which it describes as the capability that drew Visa to Elastic in the first place. The sequencing was deliberate: Workflows gave Visa a pattern it could fully quantify and verify before extending into capabilities with more correlation logic running internally, which is the right path for a regulated financial services SOC. The team is in parallel building an AI agent for threat hunting and is scoping how Workflows can absorb response work currently handled by a separate automation group. Keeping up with Elastic’s release cadence is part of the work now, too.

"You guys keep releasing more and more new things, and you’re distracting me from all of this."

– Visa cybersecurity engineering team

"Attack Discovery is probably on top of our list. The reason we fell in love with Elastic wasn’t the workflows, it was the Attack Discovery piece. We’re trying to market this tool as an AI-first SIEM, and to evangelize Elastic to many groups within cyber at Visa."

– Visa cybersecurity engineering team

Visa is one of the largest payments networks in the world, and the scale of its security operations reflects that. Your organization may not be migrating at Visa's scale today, but the same principles apply whether you are converting your first detection or rebuilding a full SOC: If the alert requires a second query before action can be taken, the pipeline is incomplete, and the four-stage pattern applies at any scale.

See how Elastic Workflows lets you move investigation upstream into your detection pipeline.