Exploiting the Windows Kernel via Malicious IPv6 Packets (CVE-2024-38063)

Since the latest Windows patch dropped on the 13th of August, I’ve been deep in the weeds of tcpip.sys (the kernel driver responsible for handling TCP/IP packets).
A vulnerability with a 9.8 CVSS score in the most easily reachable part of the Windows kernel was something I simply couldn’t pass up on.
I’ve never really looked at IPv6 before (or the drivers responsible for parsing it), so I knew attempting to reverse engineer this vulnerability was going to be extremely challenging, but a good learning experience.

For the most part, tcpip.sys is largely undocumented. I was able to find a few exploit write-ups for older bugs: here, here, and here, but little else.
When the top search result for my English Google search is written in Chinese, I immediately know I’m way out of my depth and in for a bad time, but learn we must. Despite Google Translate doing a mediocre job, the post provided some incredibly detailed insight into how IPv6 fragmentation works, and gave me a good head start.

Later on, while Googling some function names, I came across another analysis of the same 2021 vulnerability, written by Axel Souchet (AKA 0vercl0k), which went even deeper into the internals of tcpip.sys and gave me enough information to define several undocumented structures.

Usually, even just reverse engineering the patch to figure out which code change corresponds to the vulnerability can take days or even weeks, but in this case it was instant.
It was so easy, in fact, that multiple people on social media told me I was wrong and that the bug was somewhere else. Did I actually listen to them and then waste an entire day reversing the wrong driver? We may never know.

There was exactly one change made in the entire driver file, which, as it turns out, actually was the bug after all.


A bindiff overview of tcpip.sys before and after installing the patch.

Only a single function in the whole driver has been modified. Normally, I’d spend an entire day going through 20+ different function changes just to figure out which one I should be looking at, but not this time.


Ipv6pProcessOptions() before the patch.


Ipv6pProcessOptions() after the patch.

Not only was it just a single function that was modified, but a single line of code.

The extremely long-named Feature_2660322619__private_IsEnabledDeviceUsage_3() function is something Microsoft commonly adds to enable partial patch rollbacks.
The call checks for the presence of a global flag, or registry setting, which, if set, will cause the function to return false, resulting in the original code being executed instead of the patched version.

The reason Microsoft does this is that security patches sometimes unintentionally break things, so this setting allows an administrator to unpatch a single vulnerability without uninstalling the entire monthly patch rollup and drastically weakening their system security.

Taking this into account, it’s clear that all this patch does is replace a call to IppSendErrorList() with IppSendError(), giving us a clue that the issue involves some kind of list. Easiest patch diff ever (or so I thought).

Reverse engineering the patch to find the altered code is only half the challenge (or in this case, less than 0.1%).
The rest of the process consists of reverse engineering enough of the codebase to understand what’s even happening, figuring out what kind of vulnerability was patched, how to craft a request to reach the target code, and what state results in an exploitable condition.

The first half is easy enough. The change is in Ipv6pProcessOptions(), which tells us it’s IPv6 and involves processing options. So, a quick look at the RFC tells us exactly what an IPv6 option is and where we can find one.


The destination options header layout from Wikipedia.

Okay, cool. What we’re looking for appears to be the destination options header, which sits directly after the main IPv6 header.
Let’s use the Python library ‘scapy’ to craft a test IPv6 packet.

Note: To mitigate DDoS attacks using spoofed IP addresses, Windows restricts the ability to construct raw IP packets. As a result, I opted to use Linux to develop my proof-of-concept. Whilst Linux does allow users to construct and send raw layer 2 and layer 3 packets, it requires the Python script to be run as root.

import sys
import struct
from scapy.all import *


def send_ipv6_option_packet(dest_ip):
    ethernet_header = Ether()
    ip_header = IPv6(dst=dest_ip)
    options_header = IPv6ExtHdrDestOpt()
    # Send the raw frame (requires root)
    sendp(ethernet_header / ip_header / options_header)


if len(sys.argv) < 2:
    print('Usage: python3 script.py <target_ipv6_address>')
    sys.exit(-1)

send_ipv6_option_packet(sys.argv[1])

After setting a breakpoint on tcpip!Ipv6pProcessOptions, then running the script, it was clear that all that was needed to reach the vulnerable function was sending an IPv6 packet with an empty options structure.
I then tried adding some invalid options to the structure to see if I could reach the call to IppSendErrorList().

A brief code review indicated that almost any invalid option formatting could trigger the call to IppSendErrorList.
So, I decided on using the Jumbo Packet option with an invalid length (less than 65535 bytes).

options_header = IPv6ExtHdrDestOpt(options=[Jumbo(jumboplen=0x1337)])

So, what does IppSendErrorList() actually do? Well, the code is pretty simple.


The entire IppSendErrorList function.

The code iterates a linked list and calls IppSendError() on every item in the list.
Again, the stars have aligned and things have been easy so far.
If IppSendErrorList just calls IppSendError for each item in a list, and the patch replaces the call to IppSendErrorList with IppSendError, then the problem occurs when IppSendError is called on a list item other than the first.
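To make the patch semantics concrete, here’s a minimal Python model of the two behaviors — my own sketch based on the logic described above, not the driver’s actual code; the names simply mirror the tcpip.sys functions:

```python
STATUS_DATA_NOT_ACCEPTED = 0xC000021B

class Packet:
    """Minimal stand-in for the driver's packet_t linked-list node."""
    def __init__(self):
        self.next = None
        self.status = None

def ipp_send_error(packet):
    # Stand-in for IppSendError(): mark the packet as rejected.
    packet.status = STATUS_DATA_NOT_ACCEPTED

def ipp_send_error_list(head):
    # Pre-patch behavior: walk the whole list, flagging every entry.
    pkt = head
    while pkt is not None:
        ipp_send_error(pkt)
        pkt = pkt.next

# Pre-patch path: all three packets in the list get flagged.
a, b, c = Packet(), Packet(), Packet()
a.next, b.next = b, c
ipp_send_error_list(a)
flagged_pre = [p.status is not None for p in (a, b, c)]

# Patched path (assuming the patch simply swaps the call):
# IppSendError() only ever sees the head of the list.
x, y = Packet(), Packet()
x.next = y
ipp_send_error(x)
flagged_post = [p.status is not None for p in (x, y)]
```

The patched path never touches anything past the list head, which is exactly the state the pre-patch code mishandles.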

So, what is this a list of, and how do we make one?

This is where things went from obvious to abnormally difficult, though I think a large part of this was due to one of my two available brain cells being preoccupied with fighting a nasty covid infection.
I lost a couple of days to understanding parts of the code, falling asleep, then forgetting what it was I had figured out.
The whole process required over a week of reverse engineering parts of tcpip.sys to figure out what was going on, but Axel’s blog post was extremely helpful.

By looking at the functions and structures Axel reverse engineered, and which other functions they’re passed to, it’s clear that the one and only argument passed to Ipv6pProcessOptions() is the same packet_t structure defined in the article.
Essentially, the pointer passed to Ipv6pProcessOptions, and iterated by IppSendErrorList, is a linked list of packets.

So, I set a breakpoint on Ipv6pProcessOptions() and inspected the list.


The list->Next entry is NULL.

Every time my breakpoint was hit, the list contained only one packet.
I spent way longer than I’d like to admit trying to figure out why, and how to get my list to actually be a list.
My first thought was IPv6 fragmentation: IPv6 allows senders to split large packets into separate smaller packets, which it would make sense to keep together in a list.

After extensive reverse engineering, I confirmed my assumptions were correct, though the fragment list isn’t related to the one we’re dealing with here.

I actually ended up finding the answer completely by accident.
Occasionally, the list would populate, but the reason was unclear. After much going round in circles, I realized that when my kernel breakpoint is triggered, it pauses the entire kernel, causing the network adapter to accumulate packets. When the kernel resumes, these packets are passed down the stack to tcpip.sys in a nice neat list. This only happened if the packets were sent while the kernel was paused, but not processed before the next breakpoint was hit.

This behavior is likely a performance optimization: at low throughput, the kernel processes packets individually, but at higher volumes, packets are organized into lists and processed in batches.
Most likely, lists are separated based on factors such as protocol and source address to speed up processing, so our list should contain only IPv6 packets we sent.

Now that we know packets are coalesced into lists during high throughput, it’s clear what the best option would be.
Our DoS PoC is ironically going to have to use DoS to trigger the DoS condition.
If we flood the system with bursts of IPv6 packets, we should be able to get a nice big list passed to IppSendErrorList().

At first, no matter how many packets I sent, I could still only get the list to be n > 1 if I paused the kernel.
But… since we’re using Python (painfully slow), in a VM (doubly painfully slow), we’re probably going to need to tweak some settings.
In order to counteract the VM-ception happening on my attack system, I decided to simply reconfigure the target VM to use only a single CPU core.


Nice! The packet list is now a list containing lots of entries!

So, it turns out a VM inside a VM isn’t the best option for DoS, who’d have thought? But we made it work in the end.
Now, we just need to figure out what IppSendError() does and which part of it the problem lies in.

After some intensive reverse engineering, it became much clearer what IppSendError does.
Under regular circumstances, it simply disables the packet by setting net_buffer_list->Status to 0xC000021B (STATUS_DATA_NOT_ACCEPTED). Then, it transmits an ICMP error containing information about the erroneous packet back to the sender.


Two relevant parts of IppSendError.

My first port of call was to see if there were any functions in tcpip.sys which ignore the net_buffer_list->Status value.
This could result in the driver processing packets that are in an undefined or unexpected state, hopefully leading to an exploitable condition.


The main loop responsible for processing packets.

Since the loop responsible for calling all the parsing functions is wrapped in an error check (meaning we can’t go anywhere once the error code is set), I figured this was the wrong rabbit hole to go down.
Instead, I decided to go back to IppSendError and see if there are any code paths that modify the packet state prior to setting the error code, which could lead to a race condition.

After even more reverse engineering, I found the following code near the very bottom of IppSendError.


A code path in IppSendError which sets the packet_size to zero.

When IppSendErrorList, and thus IppSendError, is called with the argument always_send_icmp set to true, it appears to attempt to send the ICMP error for every packet in the list.

Then, for reasons probably known only to god, it reaches a block of code where the packet->packet_size field gets set to zero.

In order to set always_send_icmp to true, all we need to do is cause a specific error during options header processing by setting the ‘Option Type’ value to any number larger than 0x80.

def build_malicious_option(next_header, header_length, option_type, option_length):
    dest_options_header = 60  # protocol number for the destination options header

    options_header = struct.pack('BBBB', next_header, header_length, option_type, option_length) + b'1337'
    return Ether(dst=mac_addr) / IPv6(dst=ip_addr, nh=dest_options_header) / Raw(options_header)


packet = build_malicious_option(next_header=59, header_length=0, option_type=0x81, option_length=0)
sendp(packet)

But shouldn’t setting the packet_size to zero break the parser?


A snippet from the main loop responsible for processing packets.

The packet handler simply calls a VTable function based on the packet->next_header value, which remains unchanged from when it was set during pre-parsing. This allows packet processing to continue and even gives us control over which processing occurs.

Because the packet->next_header value is obtained from the ‘Next Header’ field of the IPv6 packet, we can set it to any valid IPv6 header value, and the loop will call the corresponding parser. This gives us plenty of potential attack surface.


The IPv6 packet format.

All that’s left to do is find a reachable part of the IPv6 parser that does something silly with the packet_size field.

The first place I decided to look was the IPv6 fragment parser, because that’s where the previous CVE-2021-24086 vulnerability was, so it seemed like a good place to find more wacky code.


Eh… it’s so close, yet so far.

We do have a vulnerability here, but it’s not an RCE.

Essentially, on most CPUs, registers are circular. If you increment a register past its maximum possible value, it wraps back around to zero.
Similarly, if you decrement it below its lowest possible value, it wraps around to the highest possible value.
These are called integer overflows and integer underflows, respectively. This behavior is slightly different for signed integers, but we aren’t dealing with those here.
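The wraparound is easy to model in Python by masking results to 16 bits:

```python
# Model 16-bit register arithmetic by masking results to 16 bits (like AX).
MASK16 = 0xFFFF

def add16(a, b):
    return (a + b) & MASK16

def sub16(a, b):
    return (a - b) & MASK16

overflow = add16(0xFFFF, 1)   # wraps past the maximum back to 0
underflow = sub16(0, 1)       # wraps below zero up to 0xFFFF (65535)
```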

The first line, fragment_size = LOWORD(packet->packet_size) - 0x30, consists of the following ASM code:


The ASM code calculating the fragment size.

AX is the low 16 bits of the EAX register. Although the EAX register is 32 bits, AX operates as if it were its own 16-bit register, so any overflows or underflows are confined to AX and won’t affect the rest of the EAX register.
This is incredibly convenient, because an underflow on the full EAX register would result in a value of around 4 billion, which would lead to an attempt to allocate 4GB of memory, which would likely fail.

Since the value of packet->packet_size is zero, this code sets AX to zero, then subtracts 0x30 from it.

Under normal conditions, the packet header is 0x30 bytes, so packet_size - 0x30 is the size of the fragment data.

In our case, packet->packet_size is 0, so subtracting even 1 from it will cause the register to wrap around to the maximum possible 16-bit integer value (0xFFFF).
Since we’re subtracting 0x30, the value of AX will underflow and become MAX_VALUE - 0x2F, or 0xFFD0, which is 65,488.

Unfortunately, since the same calculation is used for both the memory allocation and the data copy, we don’t get a buffer overflow.
I believe RtlCopyMdlToBuffer() also performs bounds checking on the source buffer, so we don’t even get an out-of-bounds read either.
However, we don’t come away completely empty-handed.

Because ExAllocatePoolWithTagPriority() doesn’t zero the allocated memory, and RtlCopyMdlToBuffer() only copies the actual amount of data available, we get around 65kb of uninitialized kernel memory.
Since memory addresses are recycled after deallocation, the buffer is likely to be filled with whatever was previously stored at the address prior to reallocation.
If we can use fragmentation to construct a packet which gets sent back to us, such as an ICMP Echo request, we could potentially leak random kernel memory, leading to an ASLR bypass.

On top of that, the code also sets reassembly->fragment_size to the underflowed 16-bit integer (65,488), so we now have two separate variables that we could potentially use to cause a buffer overflow.

Unfortunately (or fortunately, as it probably saved me a lot of time), someone beat me to the punch.
Before I could find somewhere to use one of the underflowed integers to trigger a buffer overflow, @ynwarcs figured out the answer and published a PoC. This solved the final piece of my puzzle.

The answer (or at least one of them) is Ipv6pReassemblyTimeout().
Whilst we can’t cause an overflow in the initial fragment handling, we apparently can during cleanup.

IPv6 fragments will stick around in memory until one of three conditions occurs:

  1. We mess up our fragmentation badly enough that the system tells us it’s time to stop.
  2. We send a fragment with the ‘More’ field set to 0, which indicates this is the last fragment, and the system will begin reassembly.
  3. We don’t send the last fragment before the timeout period (60 seconds) expires, and the system drops the fragments.

Ipv6pReassemblyTimeout() gets called under condition 3, so let’s examine how this can be exploited.


This is exactly what we need!

Previously, our issue was that the code used the exact same calculation for both the memory allocation and the copy operation.
This code, on the other hand, doesn’t. Let’s take a deeper look at the ASM to see how it’s exploitable.


The assembly code responsible for calculating the allocation size.

As you can see here, the first part of the calculation (fragment_list->net_buffer_length + reassembly->packet_length + 8) is done using the 16-bit DX register.

If you’ll remember from earlier, we underflowed reassembly->packet_length to 0xFFD0. So the DX register, after adding the 8 bytes, holds 0xFFD8.
If fragment_list->net_buffer_length is larger than 0x27 (39 bytes), DX will overflow and wrap back past zero.

fragment_list->net_buffer_length should be around 0x30 bytes, so it’ll result in overflowing the DX register to 8.
After the 0x28 bytes is added, we’ll get a memory allocation of just 48 bytes.
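The arithmetic is easy to verify by replaying it with 16-bit masking (the 0x30-byte net_buffer_length is an assumption about a typical fragment, per above):

```python
# Replay the allocation-size arithmetic with 16-bit masking.
MASK16 = 0xFFFF

packet_length = (0 - 0x30) & MASK16        # underflowed earlier to 0xFFD0
net_buffer_length = 0x30                   # assumed typical fragment size

dx = (net_buffer_length + packet_length + 8) & MASK16  # wraps to 8
alloc_size = dx + 0x28                     # 0x30 = 48-byte allocation
copy_size = packet_length                  # memmove still copies 65,488 bytes
```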

Since the subsequent memmove() call just uses the unadulterated reassembly->packet_length value for the size, it’ll result in 65,488 bytes being copied from reassembly->payload into a 48-byte buffer.
A great added bonus is that most of the copied data comes from the fragment payload, which we control and can be arbitrary data of any format, so we get a nice, fairly controllable kernel-pool-based buffer overflow.

I had wanted to publish a DoS proof-of-concept, but it’s proven extremely difficult to trigger the bug reliably, making it impractical for widespread use.
Whilst I was able to get my PoC working using the final piece from ynwarcs, it requires the target system to be deliberately throttled to account for the low throughput capabilities of Python.

I do have a feeling there are probably better and more consistent ways to ensure packet coalescing takes place, possibly by sending specially crafted packets designed to hang up the parser even at low volume…
But I’m not sure how much more time I want to spend on this. I’ve already learned a lot, gotten my PoC working, and written a cool article. So, I think it’s time to call it a day (technically, several weeks, actually), and get back to work.

Anyway, I hope you enjoyed the article and learned something from my research!
