<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Security Labs - Articles by Mike Paquette</title>
        <link>https://www.elastic.co/cn/security-labs</link>
        <description>Trusted security news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Fri, 08 May 2026 00:07:10 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Security Labs - Articles by Mike Paquette</title>
            <url>https://www.elastic.co/cn/security-labs/assets/security-labs-thumbnail.png</url>
            <link>https://www.elastic.co/cn/security-labs</link>
        </image>
        <copyright>© 2026 Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[AI-generated hunting leads: The hunt starts before you ask the question]]></title>
            <link>https://www.elastic.co/cn/security-labs/proactive-threat-hunting-ai-generated-leads</link>
            <guid>proactive-threat-hunting-ai-generated-leads</guid>
            <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Introducing AI-generated hunting leads, proactive, environment-aware threat hypotheses powered by Elastic Entity analytics and integrated AI reasoning.]]></description>
            <content:encoded><![CDATA[<p>Threat hunting has always been a human art: a practitioner staring at logs, forming a hypothesis, and patiently chasing it down. What if the hardest part of the hunt (knowing where to look) could be done for you, automatically, in milliseconds, and tuned specifically to your environment? This is where AI-generated hunting leads come in, allowing you to shift from reactive alerting to proactive defense with entity-centric, risk-based threat hunting tailored specifically to your environment's unique behavioral patterns.</p>
<h2>The hunting gap no one talks about</h2>
<p>Ask any threat hunter what slows them down, and you'll hear the same answer: it’s not the querying or the pivoting; it's the blank page, the moment before a hypothesis is formed. Most analysts know that somewhere in their telemetry there are patterns that signal compromise, lateral movement, or abuse; they just don’t know where to start.</p>
<p>Modern AI agents have a discovery problem: they’re brilliant at answering questions but useless if you don’t know what to ask. This &quot;curiosity gap&quot; traps security teams in a cycle of reactive hunting. Whether it’s waiting for a vendor threat intelligence report to drop, an alert to scream, or a CISO to grill the team during a QBR, the damage is often already done. While analysts wait for a hypothesis, AI-powered adversaries are moving at machine speed—widening an already dangerous window of opportunity.</p>
<p>The industry has tried to close this gap through detection rules, threat intel feeds, and user and entity behavior analytics (UEBA) scoring. These are necessary, but they're static frames applied to a dynamic reality. A UEBA anomaly tells you something is unusual. It doesn't tell you why it matters in your environment today.</p>
<h3>The core problem</h3>
<p>Detection rules tell you precisely what to look for. Threat intel tells you what others have found. Neither tells you what your environment is uniquely at risk for right now, because neither actually knows your environment.</p>
<h2>Building the foundation: The entity store</h2>
<p>Solving the hunting gap required us to first solve a data problem. Hunting leads are only as effective as their context, and in security, context is the sum of everything true about an entity over time.</p>
<p>We built the Elastic entity store as a purpose-built ontology for exactly this. Unlike Elastic Common Schema (ECS), which captures the state of a field at event time, the entity store is a longitudinal record: a living profile of every user, host, and service in your environment. It tracks four dimensions that matter for security reasoning:</p>
<pre><code class="language-javascript">// Entity Store Schema — Core Characteristics (illustrative)
const entity = {
  attributes: {                           // Who/what the entity IS
    mfa_enabled: false,                   // From AWS integration
    privileged_groups: ["Domain Admins"], // From AD
    asset_criticality: "high",
  },
  lifecycle: {                            // Temporal facts
    first_seen: "2024-09-14T08:22:00Z",
    last_active: "2025-03-31T23:47:00Z",
    dormancy_detected: true,              // Inactive 47 days, now active
  },
  behavior: {                             // Anomalous signals (rolling window)
    brute_force_victim: true,
    unusual_login_hours: true,
    new_geo_access: "DE",                 // First access from Germany
  },
  risk: {                                 // Scored risk aggregation
    calculated_level: "Critical",
    score: 94.2,
  },
};
</code></pre>
<p>This schema isn’t just storage; it's also a reasoning substrate. Each field represents a signal that, in combination with others, tells a coherent story about an entity's current threat posture. A user who was dormant for 47 days, is now active outside business hours, logged in from a new country, and doesn't have multifactor authentication (MFA) is not just risky in isolation; that combination is a hunting lead.</p>
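<p>To make that concrete, here is a minimal sketch of how individually weak signals combine into a lead. The field names mirror the schema sketch above; both they and the three-signal threshold are illustrative assumptions, not Elastic's actual implementation:</p>

```javascript
// Illustrative only: field names follow the schema sketch above and
// are assumptions, not Elastic's real API.
function buildHuntingLead(entity) {
  const reasons = [];
  if (entity.lifecycle.dormancy_detected) reasons.push("dormant account reactivated");
  if (entity.behavior.unusual_login_hours) reasons.push("active outside business hours");
  if (entity.behavior.new_geo_access) reasons.push(`first login from ${entity.behavior.new_geo_access}`);
  if (!entity.attributes.mfa_enabled) reasons.push("MFA not enabled");

  // Any single signal is routine noise; the combination is the hypothesis.
  if (reasons.length < 3) return null;
  return {
    entityRisk: entity.risk.calculated_level,
    hypothesis: `Possible account takeover: ${reasons.join(", ")}`,
  };
}
```

<p>The point of the sketch is that the lead is generated from the entity's accumulated state, not from any single event.</p>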
<h2>Reasoning over entity data</h2>
<p>With the entity store providing rich context, we built <a href="https://www.elastic.co/cn/docs/solutions/security/advanced-entity-analytics">Entity analytics AI-hunting leads</a>. These reasoning modules traverse entity profiles and correlate data across users and hosts to surface patterns that human analysts would find meaningful, if they had the time to look everywhere at once.</p>
<p>These AI-generated hunting leads are automatically surfaced on our Entity analytics home page, ingesting the entity store state to identify combinations that constitute a threat hypothesis. This isn't a simple rule match; it’s a narrative hypothesis grounded in your actual environment.</p>
<p><img src="https://www.elastic.co/cn/security-labs/assets/images/proactive-threat-hunting-ai-generated-leads/image1.png" alt="Threat gap" title="Three-step diagram, showing entity store snapshots, cross-entity correlation, and hypothesis generation used to identify potential threat scenarios in an environment." /></p>
<h2>What makes this different</h2>
<p><img src="https://www.elastic.co/cn/security-labs/assets/images/proactive-threat-hunting-ai-generated-leads/image2.png" alt="Alert dashboard" title="Dashboard titled Entity Analytics, showing high-severity alert cards for DataStore, Slack, and Zoom; entity risk level counts; recent anomaly scores with user listings; and a threat summary panel analyzing an alert volume spike for the DataStore service." /></p>
<p>Proactive hunting assistance tools are emerging across the industry. Many are useful, but they share a fundamental constraint: they reason over reported threat intelligence plus event telemetry, not over accumulated entity knowledge.</p>
<p>The difference matters. A query-based approach can find events that match a pattern. An entity-aware approach can find entities that, given everything we know about them, are likely to be involved in something worth investigating. That's a fundamentally richer signal source, and it's one that gets sharper over time as entity history accumulates, supplying the context that retrohunt features in modern security information and event management (SIEM) tools have been missing.</p>
<h2>Hunting as a continuous discipline</h2>
<p>The promise of proactive security is stopping attackers before they reach their objective. Traditionally, the barrier has been analyst capacity. With the rise of AI-driven attacks, this is becoming an impossible task for humans alone.</p>
<p>Entity analytics AI-generated hunting leads don't replace hunters; they multiply them. A senior analyst no longer spends hours figuring out where to look. Instead, they start their shift with a prioritized set of hypotheses that the AI-generated hunting leads already curated. Their time is preserved for what only humans can do: validation, decision, escalation, and response.</p>
<h2>What's next</h2>
<p>Entity analytics AI-generated hunting leads are the first production expression of a broader capability roadmap: an Elastic Security that doesn't wait for you to ask a question. As Entity analytics matures with expanded entity types, such as AI agents, the reasoning surface expands accordingly.</p>
<hr />
<p><em>Entity analytics is available in Elastic Security. <a href="https://www.elastic.co/cn/docs/solutions/security/advanced-entity-analytics/overview">Learn more about advanced entity analytics, AI-generated hunting leads, and how the entity store governs user entities.</a></em></p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/cn/security-labs/assets/images/proactive-threat-hunting-ai-generated-leads/proactive-threat-hunting-ai-generated-leads.webp" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Your UEBA is lying to you: Why entity record quality decides everything]]></title>
            <link>https://www.elastic.co/cn/security-labs/ueba-entity-record-quality-analytics</link>
            <guid>ueba-entity-record-quality-analytics</guid>
            <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Most entity analytics systems are confidently wrong. They track users who do not exist, generate risk scores built on noise, and call it behavioral analytics. Learn why the entity records you don't create matter as much as the ones you do and how a confidence-tiered model changes the game.]]></description>
            <content:encoded><![CDATA[<p>There's an uncomfortable truth in security analytics that nobody talks about at conferences: The quality of your detections, alerts, and investigations is only as good as the entity records that represent the users, hosts, and services in your environment.</p>
<p>Not the machine learning models. Not the anomaly detection algorithms. Not the risk scoring engine. The <em>entities</em>: the foundational records that teach your system who and what it is protecting.</p>
<p>Get the entities wrong, and everything downstream is contaminated, including AI. Your baselines are fiction. Your risk scores are noise. Your analysts are chasing ghosts. And the worst part? Most user and entity behavior analytics (UEBA) implementations get the entities wrong from day one. This blog explains why the entities you <em>don't</em> create matter and how a confidence-tiered model helps.</p>
<p>A note on terminology before we go further: in this piece we’ll distinguish between the real-world thing — a person, a host, a service — and the entity record Elastic Security creates to represent it. The argument that follows is about which records are worth creating, not about which things exist. We’ll use “entity record” when the distinction matters.</p>
<h2>One username, hundreds of identities</h2>
<p>Consider the simplest possible approach to creating a user entity record: Take a <code>user.name</code> field from an event log and call it an entity. A username like <code>deploy</code> appears in your telemetry, so you create a <code>deploy</code> entity record and start building a behavioral baseline.</p>
<p>The problem is immediate and severe. That <code>deploy</code> username might exist on 200 servers. It might be used by 12 engineers and a continuous integration and continuous deployment (CI/CD) pipeline. The behavioral baseline you're building is a smoothie blended from hundreds of different machines, used by different people, for completely different purposes. The system is treating a string match as an identity, barely one step above random.</p>
<p>When this entity record inevitably generates elevated risk scores, analysts investigate, only to discover that the &quot;anomaly&quot; was just a different engineer using the same shared account in a slightly different way. Multiply this across every common username in a large environment and you've built a system that generates investigative busywork at industrial scale.</p>
<p><img src="https://www.elastic.co/cn/security-labs/assets/images/ueba-entity-record-quality-analytics/risk-table.png" alt="" /></p>
<h2>The opposite mistake: An empty dashboard</h2>
<p><img src="https://www.elastic.co/cn/security-labs/assets/images/ueba-entity-record-quality-analytics/risk-dashboard.png" alt="" /></p>
<p>Some security vendors recognize the problem and swing to the other extreme. They decide (correctly, in principle) that only identity-provider-backed entity records are trustworthy enough for behavioral analytics. If you can't tie an entity record to an authoritative account in a directory like Okta, Entra ID, or Active Directory, don't create one at all.</p>
<p>The engineering reasoning is defensible. The product experience is catastrophic.</p>
<p>A customer rolls out endpoint agents across their fleet, opens the security analytics dashboard, and sees nothing. Zero entities. The feature looks broken. The SOC analyst has no idea why no users are showing up, and worse, has no visibility into the activity happening on those endpoints right now. A purist entity data model has produced a blind spot. </p>
<h2>The missing middle: Host-scoped identity</h2>
<p>Most vendors building UEBA have historically picked between two unsatisfying defaults: bare usernames that blend everyone into noise, or IdP-only entities that leave most deployments with nothing. But without explicit governance over which signals come from which source, the noise still leaks through. What's missing is a middle layer: entities derived from endpoint telemetry that are tightly scoped enough to be meaningful but carefully governed enough to avoid the noise problem.</p>
<p>The instinct might be to simply pair a username with a host and call it an entity record. But without guardrails, this just moves the noise problem down one level. <code>deploy</code> on <code>prod-web-03</code> is still five engineers' blended activity. <code>root</code> on a shared bastion host is still everyone and no one. You've reduced the blast radius from &quot;all servers&quot; to &quot;one server,&quot; but the behavioral baseline is still a fiction if the underlying account is shared, automated, or observed only through a failed brute-force attempt.</p>
<p>The real question isn't whether to create local host-scoped entity records. It's <em>which</em> host-scoped entity records are worth creating and with what governance over how they participate in risk scoring and identity resolution downstream.</p>
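<p>A minimal sketch of that creation decision, with an evidence threshold and field names that are assumptions rather than Elastic's actual rules:</p>

```javascript
// Illustrative only: gate host-scoped entity creation on evidence,
// not on a username merely appearing in a log.
function shouldCreateHostScopedEntity(observations) {
  // Require at least one successful logon; a failed brute-force
  // attempt alone should never mint an entity record.
  const successfulLogons = observations.filter(
    (o) => o.event === "logon" && o.outcome === "success"
  );
  return successfulLogons.length >= 1;
}
```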
<h2>The Elastic approach: Two kinds of entities, governed differently</h2>
<p>One answer is to recognize that not all record sources are equal and to build that distinction into the architecture itself, governing how each record is created, enriched, scored, and resolved.</p>
<p>This is the approach Elastic Security takes. Instead of treating all entity records as interchangeable, the system draws a clear line between <strong>identity-provider-backed entities</strong> and <strong>endpoint-observed local host entities</strong> and governs each category differently under the hood.</p>
<p><strong>Identity-provider-backed entities</strong> are created only by authoritative identity systems: Okta, Entra ID, Google Workspace, Active Directory, and other identity-provider namespaces across identity and access management (IAM), cloud platforms, software as a service (SaaS), and privileged access management (PAM) systems. These entity records represent verified accounts in systems that own and manage those accounts. They get the full analytical treatment: rich behavioral baselines, cross-platform enrichment from 120+ security integrations, and full participation in person-level risk scoring.</p>
<p><img src="https://www.elastic.co/cn/security-labs/assets/images/ueba-entity-record-quality-analytics/entity-panel.png" alt="" /></p>
<p><strong>Endpoint-observed entities</strong> are created from endpoint telemetry: Your endpoint detection and response (EDR) agent observes <code>jdoe</code> active on a specific host and creates a host-scoped entity record tied to that local machine. These entity records are real and useful, but the system knows they carry less identity certainty, and it governs them accordingly. </p>
<p>Analysts don't need to think about any of this machinery. What they see is an entity store that's populated from day one with more accurate user entity records. The governance happens in the architecture so it doesn't have to happen in the analyst's head.</p>
<p><img src="https://www.elastic.co/cn/security-labs/assets/images/ueba-entity-record-quality-analytics/entities.png" alt="" /></p>
<p>(An endpoint-observed entity for sarah.chen active on her corporate MacBook. The entity name (sarah.chen@CORP-MAC-SC-2024) uses the human-readable hostname for readability. The entity ID (user:sarah.chen@b92f1e3a-7d4c-4a8b-9f2e-1c3d5e7f9012@local) uses the machine's hardware UUID instead of its hostname — this is intentional. Hostnames change when a device is renamed or reimaged; the hardware UUID is permanent. The @local suffix identifies this as an endpoint-observed entity: it represents sarah.chen's activity on this specific machine, not sarah.chen as a verified identity across your organization.)</p>
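<p>The naming scheme described in the caption above can be sketched as follows (illustrative; the real implementation may differ in details):</p>

```javascript
// Display name: human-readable, but may change if the host is renamed.
function endpointEntityName(userName, hostName) {
  return `${userName}@${hostName}`;
}

// Stable ID: keyed on the hardware UUID, which survives renames and
// reimages; the @local suffix marks an endpoint-observed entity.
function endpointEntityId(userName, hardwareUuid) {
  return `user:${userName}@${hardwareUuid}@local`;
}
```

<p>Keeping the display name and the key separate means a reimaged or renamed laptop does not fragment the entity's history.</p>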
<h2>From fragmented accounts to a unified user group</h2>
<p>Even when you get entity record creation right, you’re still left with a fragmentation problem no amount of per-entity discipline can solve alone.</p>
<p>Consider John Doe in a large enterprise. He has an Okta account for SaaS access, an Entra ID account tied to his corporate laptop, and an Active Directory account for on-prem systems. Each is a legitimate, authoritative entity record by every standard described above, and yet they’re three separate records for the same human being, each with its own risk history, each generating signals in isolation.</p>
<p>When John’s Entra account shows a lateral movement indicator the same day his Okta account flags a suspicious login from an anomalous location, those signals may never connect. They exist as separate entities, investigated in isolation, by analysts who have no automated way to know they belong to the same person. The cross-surface campaign that entity analytics is supposed to catch hides in plain sight between identity providers.</p>
<p>Elastic Security solves this through automatic entity resolution: consolidating a user’s fragmented digital footprint across Okta, Entra ID, Active Directory, and more into a single unified identity. John Doe becomes a first-class citizen in the entity store: one primary record, all associated accounts grouped together, one aggregated risk score that reflects everything happening across every identity surface he touches. Resolution runs continuously as new integrations come online, without manual curation.</p>
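<p>Conceptually, the resolution step looks something like this sketch. Matching here is by normalized email for simplicity, and the max-score aggregation is an assumption; the actual resolution logic and scoring are richer:</p>

```javascript
// Illustrative only: group per-provider account records under one person.
function resolveIdentities(accounts) {
  const people = new Map();
  for (const acct of accounts) {
    const key = acct.email.toLowerCase(); // naive person key for the sketch
    if (!people.has(key)) people.set(key, { accounts: [], riskScore: 0 });
    const person = people.get(key);
    person.accounts.push(`${acct.provider}:${acct.id}`);
    // One aggregated score across every identity surface (assumed: max).
    person.riskScore = Math.max(person.riskScore, acct.riskScore);
  }
  return people;
}
```

<p>The payoff is in the aggregation: a lateral movement signal on one account and a suspicious login on another now raise the same person's score instead of two disconnected ones.</p>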
<p><img src="https://www.elastic.co/cn/security-labs/assets/images/ueba-entity-record-quality-analytics/entity-sources.png" alt="" /></p>
<p>In the image above, we've created one entity record per user account and grouped them together, so when an analyst reviews the record, all related identities appear in one place.</p>
<h2>What this changes for analysts</h2>
<p>For the security operations center (SOC) analyst working a 2 a.m. alert surfaced by their Elastic agentic workflow, these two architectural decisions (confidence-tiered entity governance and unified identity resolution) fundamentally change the investigation.</p>
<p>Instead of opening four separate entity cards for the same person, they open one. Cross-provider risk signals are aggregated into a single score, and attack narratives that span identity systems are visible as narratives, not as disconnected data points buried in separate views.</p>
<p>Compare this to investigating a bare <code>deploy</code> entity record with a risk score of 70 that’s actually an artifact of 12 people’s blended activity across 200 servers. One investigation leads somewhere. The other erodes trust in the entire capability, which is the real cost of noisy entity records: not just the false positive itself, but the analyst who learns to ignore entity risk scores entirely.</p>
<h2>The entity records you don't create</h2>
<p>Perhaps the most underappreciated aspect of entity analytics design is restraint. Every entity record you create has a cost: Compute for baselining, storage for history, analyst attention when it generates alerts, and potential for noise propagation into the broader analytical model.</p>
<p>The discipline to <em>not</em> create an entity record (to require a minimum evidence threshold) is what separates an entity analytics system that gets more useful over time from one that slowly drowns its operators in noise.</p>
<p>In practice, this means maintaining an exclusion list for common service and shared accounts: <em>root</em>, <em>jenkins</em>, <em>deploy</em>, <em>postgres</em>, and others like them. These accounts exist on hundreds of machines, are used by automated processes and multiple humans interchangeably, and would produce baselines that mean nothing. Elastic Security ships a default list covering the most common offenders.</p>
<p>Today the list operates at the username level (<em>root</em> is excluded uniformly, regardless of which host it appears on) and is fixed. Future iterations will make it configurable, letting teams add their own environment-specific service accounts and, eventually, specify compound patterns that combine username and host. That would allow excluding <em>root</em> globally on shared infrastructure while still creating a host-scoped entity record when the account appears on a personal workstation, where the activity is attributable to a specific person. The architecture is already built for this: because endpoint-observed entities are keyed as <code>{user.name}@{host}</code>, compound rules are a coherent extension, not a redesign.</p>
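<p>The compound rules described above could take a shape like this sketch (the rule format and patterns are assumptions, not the shipped implementation):</p>

```javascript
// Illustrative only: a rule may exclude a username everywhere, or only
// on hosts matching a pattern (e.g., shared infrastructure).
const exclusionRules = [
  { user: "root", hostPattern: /^(bastion|prod)-/ }, // root on shared infra only
  { user: "jenkins" },                               // jenkins everywhere
];

function isExcluded(userName, hostName) {
  return exclusionRules.some(
    (r) =>
      r.user === userName &&
      (r.hostPattern === undefined || r.hostPattern.test(hostName))
  );
}
```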
<p>The higher-fidelity approach we've described directly mitigates the broader noise problem. When entity records are created from any username string in a log, your ML models end up baselining service accounts, typo'd logins, and shared kiosk accounts as if they were people — a 2 a.m. backup job looks anomalous against a human baseline, and a one-event typo'd login never accumulates enough data to baseline at all. When entity records are anchored to authoritative identity sources and resolved across their various login forms, the model learns patterns for actual humans doing actual work. Anomalies become meaningful because the baseline is meaningful.</p>
<p>The best entity analytics isn't the one that creates the most entities. It's the one that creates the <em>right</em> entities, governs them by what it actually knows about their source and scope, and builds its analytical investment proportionally. Everything else (risk scoring, behavioral baselines, entity resolution, anomaly detection, and AI skills) is downstream of that foundational decision. Get the entity records right, and the analytics follow. Get them wrong, and no amount of machine learning or AI can save you.</p>
<p><em>Entity analytics is available in Elastic Security. <a href="https://www.elastic.co/cn/docs/solutions/security/advanced-entity-analytics/entity-store">Learn more about advanced entity analytics and how the entity store governs user entities.</a></em></p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/cn/security-labs/assets/images/ueba-entity-record-quality-analytics/header.png" length="0" type="image/png"/>
        </item>
    </channel>
</rss>