PXE Dust: Finding a Vulnerability in Windows Servers Deployment Services

March 6, 2019

Research By: Omer Gull

 

Introduction

Many large organizations use Windows Deployment Services (WDS) to install customized operating systems on new machines in the network. By its nature, a WDS server is usually accessible to anyone connected via a LAN port, and it provides the relevant software: it determines the operating system, as well as the accompanying programs and services, for each new network element.

With this level of accessibility, it is natural to ask what the ramifications would be if a malicious actor were able to breach this server and modify it, controlling the content of every new computer and equipping it with their own malware.

In this report, we discuss our research on the WDS infrastructure and our attempts to exploit its vulnerabilities.

 

Windows Deployment Services

WDS is Microsoft’s solution for network-based installation of Windows operating systems. WDS uses disk imaging, in particular the Windows Imaging Format (WIM), and is included as a Server Role in all 32-bit and 64-bit versions of Windows Server since 2008.

WDS is a complex system and we are far from understanding its entire operation. For the scope of this research, though, we analyzed how WDS behaved during a new installation as this pre-authentication negotiation appears to be a promising attack vector.

Before delivering the full-blown Windows image, WDS must provide some network booting strategy. For that purpose it uses a PXE (Preboot eXecution Environment) server. PXE is a standard created by Intel that establishes a common set of pre-boot services within the boot firmware, with the end goal of enabling a client to perform a network boot and receive a Network Boot Program (NBP) from a network boot server. To transfer an NBP, a PXE server uses the Trivial File Transfer Protocol (TFTP).

TFTP is a simple file transfer protocol implemented on top of UDP/IP, using the well-known port number 69. Due to its simple design, TFTP can be easily implemented by small-footprint code. It is therefore the protocol of choice for the initial stages of any network booting strategy such as BOOTP, BSDP and PXE. However, TFTP lacks most of the advanced features offered by more robust file transfer protocols. For example, it cannot list, delete, or rename files or directories, and it has no user authentication. The protocol only supports reading and writing. Today, TFTP is rarely used for Internet transfers and its main use is within LANs. All of this makes it a promising protocol for our purposes.
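To make the wire format concrete, here is a minimal sketch of how a TFTP read request (RRQ) is laid out according to RFC 1350, with optional name/value pairs appended as described in RFC 2347. It is not taken from our fuzzer; the function names and the file name boot/example.bin are hypothetical and only serve as an illustration.

```python
import struct

OP_RRQ = 1  # opcode of a read request (RFC 1350)


def build_rrq(filename, mode="octet", options=None):
    """Encode an RRQ: 2-byte opcode, then NUL-terminated ASCII strings."""
    packet = struct.pack("!H", OP_RRQ)
    packet += filename.encode("ascii") + b"\x00"
    packet += mode.encode("ascii") + b"\x00"
    # RFC 2347 options are appended as NUL-terminated name/value pairs.
    for name, value in (options or {}).items():
        packet += name.encode("ascii") + b"\x00"
        packet += str(value).encode("ascii") + b"\x00"
    return packet


def parse_rrq(packet):
    """Decode an RRQ back into (opcode, filename, mode, options)."""
    (opcode,) = struct.unpack("!H", packet[:2])
    # Every string ends with NUL, so the last split element is empty.
    fields = [f.decode("ascii") for f in packet[2:].split(b"\x00")[:-1]]
    filename, mode = fields[0], fields[1]
    options = dict(zip(fields[2::2], fields[3::2]))
    return opcode, filename, mode, options


if __name__ == "__main__":
    # Hypothetical file name, used only to show the encoding.
    rrq = build_rrq("boot/example.bin", options={"blksize": 1024})
    print(rrq)
    print(parse_rrq(rrq))
```

Note that nothing in the request identifies or authenticates the sender, which is what makes this surface reachable before any login takes place.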

 

Fuzzing

Our first task, then, was to create a dumb TFTP fuzzer using Boofuzz.

Boofuzz, the successor to the Sulley fuzzing framework, offers an easy setup with great documentation. Thanks to its simple syntax, translating the relevant RFCs into a fuzzing script was fairly straightforward.

Tftp.py defines the protocol semantics and fields to be fuzzed. As this fuzzer is very generic, this code could be re-targeted to fuzz any other TFTP implementation, and we encourage other researchers to do so.

While the fuzzer was running, we manually reversed the server implemented in wdstftp.dll, as these two methodologies often complement each other.

 

wdstftp.dll

When we reversed the DLL, we started by reviewing the file read mechanisms implemented in CTftp::ParseRequest. As mentioned earlier, the protocol does not implement many complex features, so this surface is a reasonable place to start.

TFTP Read Requests (RRQ) are handled in CTftpPacket::ParseRequest. After the server validates that the requested file exists within the PXE base directory, the file is read into a CTptReadFile::CacheBlock structure.

Figure 1: CacheBlock Struct

 

These CacheBlocks are managed in a linked list.

The ReadFile calls are performed asynchronously by the server, with CTptReadFile::_IOCompletionCallback assigned as their completion callback.

Figure 2: ReadFile Callback

 

 

So far so good. However, we then noticed a rather strange behavior. It seems that the CacheBlocks linked list is limited in size to... two nodes?

Figure 3: maxCacheBlocks = 2

 

In the figure below, we can see that when the number of cache blocks exceeds 2, the tail is deleted.

Figure 4: free tail

 

With our new-found knowledge of the TFTP protocol, and by using the blksize and windowsize options, we should be able to craft a request that populates more than two cache blocks before the server receives an acknowledgement packet. If all goes well and the timing is right, a CacheBlock is freed before it is used by the callback function CTptReadFile::_IOCompletionCallback.
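The idea can be illustrated with a toy model. The sketch below is not the logic of wdstftp.dll; it only mimics the two observations above (a list capped at two blocks with the tail freed on overflow, and completion callbacks that run later) under the simplifying assumption that each packet of the window queues one read before any callback fires. All class and function names are hypothetical.

```python
from collections import deque

MAX_CACHE_BLOCKS = 2  # the limit observed in Figure 3


class CacheBlock:
    def __init__(self, block_no):
        self.block_no = block_no
        self.freed = False  # stands in for the native free of the block


class ReadFileModel:
    """Toy model: newest block at the head, oldest at the tail."""

    def __init__(self):
        self.blocks = deque()
        self.pending = []  # blocks whose completion callback has not run yet

    def issue_read(self, block_no):
        block = CacheBlock(block_no)
        self.blocks.appendleft(block)
        self.pending.append(block)  # the callback will receive this block
        if len(self.blocks) > MAX_CACHE_BLOCKS:
            tail = self.blocks.pop()
            tail.freed = True  # freed while a callback may still reference it
        return block

    def run_callbacks(self):
        """Return the block numbers a callback would touch after the free."""
        stale = [b.block_no for b in self.pending if b.freed]
        self.pending.clear()
        return stale


def simulate(windowsize):
    model = ReadFileModel()
    for block_no in range(1, windowsize + 1):
        model.issue_read(block_no)
    return model.run_callbacks()


if __name__ == "__main__":
    print(simulate(2))  # no block is evicted before its callback runs
    print(simulate(4))  # the two oldest blocks are freed before their callbacks
```

In the model, any window larger than the cache limit leaves callbacks holding references to freed blocks, which corresponds to the stale pointer we were hoping to observe on the real server.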

Ready, set (page heap on), go.

Figure 5: Crash in windbg

 

We can see that RAX now points to freed memory(!), effectively giving us a Use-After-Free vulnerability in Windows Deployment Services.

This bug seems fairly trivial, so why did our fuzzer miss it?

The answer is that Boofuzz, like the Sulley framework it is derived from, fuzzes one field at a time, and our requested reads were never this long. This is a good lesson for future fuzzers: on the one hand, you want your packets to be valid enough that they won’t be rejected by the parser. On the other hand, if you make them too valid, you’ll miss potential bugs.
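The limitation is easy to see in a small sketch. The code below is not Boofuzz code; it is a hypothetical illustration that compares test cases where exactly one field deviates from its default with test cases where fields are mutated in combination. The default values, the mutation values and the needs_both predicate (standing in for "a request that changes both options at once") are assumptions chosen for the example.

```python
import itertools

# Hypothetical defaults and mutation values, for illustration only.
DEFAULT = {"filename": "a.bin", "blksize": "512", "windowsize": "1"}
MUTATIONS = {
    "filename": ["A" * 300],
    "blksize": ["65464"],
    "windowsize": ["64"],
}


def single_field_cases(default, mutations):
    """One field is mutated per test case; all others keep their defaults."""
    for field, values in mutations.items():
        for value in values:
            case = dict(default)
            case[field] = value
            yield case


def combined_cases(default, mutations):
    """Every combination of default and mutated values across all fields."""
    fields = list(mutations)
    choices = [[default[f]] + mutations[f] for f in fields]
    for combo in itertools.product(*choices):
        yield dict(zip(fields, combo))


def needs_both(case):
    """Stand-in for a bug that requires two non-default options together."""
    return case["blksize"] != "512" and case["windowsize"] != "1"


if __name__ == "__main__":
    single = list(single_field_cases(DEFAULT, MUTATIONS))
    combined = list(combined_cases(DEFAULT, MUTATIONS))
    print(len(single), any(needs_both(c) for c in single))
    print(len(combined), any(needs_both(c) for c in combined))
```

The combined strategy reaches the interesting request at the cost of a test-case count that grows multiplicatively with the number of fields, which is why single-field mutation is the common default.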

 

Exploitation Methods

As this is a remotely triggerable, unauthenticated bug in a highly privileged Windows Server service, we can definitely mark it as a critical vulnerability.

Usually, when exploiting a Use-After-Free (UAF) bug, we try to allocate a different object (of similar size) or a similar object in a different state and cause some sort of confusion between them.

Looking for allocation primitives that will allow us to allocate an object that will fit in the freed area, we examined the process heap.

It turns out that WDS actually uses several heaps, and our heap was shared between wdstftp.dll, wdssrv.dll and wdsmc.dll.

While wdstftp.dll allowed for some quite flexible allocations, such as the one in TFTP Error packets, their contents were always ASCII that was converted to Unicode.

Figure 6: Unicode POC

 

This is a nice primitive for a POC, but forging a functional payload is going to take a bit more as we’ll need to de-reference some pointers.
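A short sketch shows why the Unicode conversion gets in the way. It assumes that the conversion widens each ASCII byte to a 16-bit little-endian code unit, as is conventional on Windows, and that pointers are 64 bits wide; both are assumptions for the illustration, and the helper names are hypothetical.

```python
import struct


def widen_ascii(payload):
    """Model the conversion: each ASCII byte becomes a UTF-16LE code unit."""
    return payload.decode("ascii").encode("utf-16-le")


def as_pointers(buf):
    """Interpret the buffer as consecutive little-endian 64-bit values."""
    usable = len(buf) - len(buf) % 8
    values = struct.unpack("<%dQ" % (usable // 8), buf[:usable])
    return [hex(v) for v in values]


if __name__ == "__main__":
    widened = widen_ascii(b"ABCDEFGH")
    print(widened)               # every second byte is 0x00
    print(as_pointers(widened))  # what the freed fields would hold
```

Because every second byte is forced to zero, any 64-bit value planted this way has the shape 0x00XX00XX00XX00XX, so we cannot freely choose an address for the server to dereference.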

The next candidate for an allocation primitive was wdssrv.dll, which exposes an RPC interface that provides the ability to remotely invoke services provided by the WDS server.

The protocol’s binary nature and generous documentation seemed promising.

Looking for any attacker controllable allocations, we reached CRpcHandler::OnRecvRequest.

As its name suggests, this RPC handler does the initial parsing of RPC requests before inserting them into a queue to be handled by future workers. Sadly, these workers will not share our heap so we are limited to the handler itself.

To reuse the freed memory, our allocation must land in the same heap bucket (size 0x5c-0x78).

The only allocation in the handler whose size we can somewhat control is made in CMemoryBuffer::Initialize.

Figure 7: RPC allocation primitive

The following script allows us to perform a successful allocation that fits in our target bucket.

Figure 8: RPC POC

 

However, part of the struct, most importantly the CacheBlock's callbackCtx pointers, was apparently left untouched (and uninitialized).

Figure 9: Memory layout using RPC POC

 

If we enlarge our RPC payload, the allocation lands in the wrong bucket due to size calculations performed by CMemoryBuffer::Initialize.

Our next thought was to see if we could escalate things using the CacheBlock fields we are able to alter.

Unfortunately, further reversing revealed that CTftpReader doesn’t really use them. Too bad…

For our final attempt in exploiting this bug, we tried a different approach. We tried to repurpose the bug and use it to gain a significant information disclosure from the server. In a standard scenario, the IOCompletionCallback is called on a CacheBlock that was filled with the file’s content, and sends its content back to us. By confusing the server and planting a “fresh” CacheBlock that wasn’t yet filled with the file’s content, we hoped that the server would send us the uninitialized memory of our new CacheBlock, thus causing a major information leak.

Despite numerous attempts to win this race, we always received partial file content rather than uninitialized memory. We suspect that a busy server handling a large number of file read/write requests would read our TFTP image more slowly, thus improving our chances of winning this race condition.

 

Conclusion

WDS is a popular Windows Server service that is widely used for distributing installation images. Its underlying PXE server had a critical, remotely triggerable use-after-free bug that could potentially be exploited by an unauthenticated attacker.

We responsibly disclosed the bug to Microsoft, who assigned it CVE-2018-8476 with critical severity and described it as potentially leading to code execution on all Windows Server versions since 2008 SP2.

Due to time constraints, we have not continued to pursue the exploitation of the bug, though both Check Point Research and Microsoft believe that it is likely exploitable.

The Check Point IPS blade provides protections against this threat:
Microsoft Windows Deployment Services TFTP Server Code Execution (CVE-2018-8476)