CATEGORIES

Game On – Finding vulnerabilities in Valve’s “Steam Sockets”

December 10, 2020

Research by: Eyal Itkin

Overview

The beautiful thing about video games is that there’s something for everyone. You can play as a 19-year-old Canadian redhead trying to climb a difficult mountain; or as an insurance inspector sent to decipher the fate of a doomed merchant ship; or as an amnesiac detective juggling a murder case, a workers’ strike and a toxic ex; or as a hunter-gatherer navigating a post-apocalyptic Earth infested with killer dinosaur robots — the possibilities are truly endless, and when this world of opportunity meets our humble profession of vulnerability research, it’s a recipe for excitement. It’s not every day we get to look behind the curtain and see how the experience gets made, and how it can be broken.

This is why our attention was caught by two excellent presentations — Jack Baker’s Finding and Exploiting Bugs in Multiplayer Game Engines (DEF CON 28) and Amat Cama’s  Remote exploitation of the Valve Source game engine (Insomnihack 2017).  These talks both provide that exact glimpse into the inner workings of game engines and the way that faults in their implementation can be found and exploited. They also serve as prior art and as the inspiration for the research we present here, where we turned our eyes to a major networking library that underlies a sizable chunk of online gaming – Valve’s Game Networking Sockets.

During our research we found several vulnerabilities in the implementation of the Game Networking Sockets library, which enable a variety of possible attacks. For example, when playing against an online opponent, an attacker can remotely crash the opponent’s game client to force a win; under some conditions, they can even perform a “nuclear rage quit” and crash the Valve game server, making sure that no one gets to play. When playing a game developed by 3rd-party developers, one can do even better and remotely take over the game server to execute arbitrary code. Once in control of the server, the same vulnerability could be used again, this time to take over all of the connected players.

In this research we found 4 different vulnerabilities (CVE-2020-6016 through CVE-2020-6019). While we could drill down into the deep technical details of each, we find it would be more educational to focus on the most interesting one — CVE-2020-6016.  The exploitation phase of this vulnerability was not a standard affair and required us to refresh our knowledge of esoteric subjects such as the finer details of the C++ standard and the implementation of the GNU C Compiler (GCC). At one crucial moment, when the attack plan seemed lost, we were able to ride in on a clever hack used by C++ in order to enable a more ergonomic use of iterators.

Below we present the details of how we discovered and exploited CVE-2020-6016.

Introducing Valve’s Game Networking Sockets

Valve’s Game Networking Sockets (GNS), also known as “Steam Sockets”, is the core networking library used in a wide variety of games — including Valve’s own titles (such as CS:GO, Dota2, Team Fortress 2, …) and several third-party titles (such as Bungie’s Destiny 2). The library was open-sourced in 2018, and a year later added to the Steamworks SDK, thus making Valve’s latency-reducing, DoS-protecting network relay infrastructure available for any third-party game developers that care to use it. Epic Games, who you may have heard of due to their 2017 sandbox survival game Fortnite (or their 1992 platform game Jill of the Jungle), have a detailed guide for developers on how to use Steam Sockets in conjunction with Epic’s Unreal game engine.

The library supports communication both in peer-to-peer (P2P) mode (based on WebRTC, a web framework for real-time communication) and in a centralized client-server mode. Once the encrypted connection is established, parties can exchange messages — either short-burst, “unreliable” messages that are sent and forgotten, or more elaborate “reliable” messages with a mechanism in place to detect and correct for lost messages, for a price of increased overhead (similarly to UDP vs. TCP). By “messages” we don’t mean just winners gloating and losers accusing winners of cheating, but also information needed for the infrastructure to function such as statistics and ping measurements.

To establish such a secure communication channel, parties use Valve’s proprietary handshake protocol:

Figure 1: Schematic overview of GNS handshake protocol.

The messages in the handshake protocol make use of Google’s protobuf library, thus greatly reducing the risk of any parse-related vulnerability when handling these complex messages.

Similarly to TLS, during this protocol both the client and the server verify each other’s identity by providing signed certificates, and both parties announce the cryptographic schemes they support. Again similarly to TLS, GNS uses asymmetric cryptography to enable the two parties to negotiate a shared symmetric encryption key; but unlike TLS, GNS doesn’t support a myriad different crypto algorithms, and instead specifically uses Elliptic-Curve Diffie-Hellman Key Exchange to negotiate a shared secret, from which each client can derive a shared AES-256 key for encrypting messages.

(If you’re staring at the above paragraph and asking yourself “but why is this so complicated? Isn’t one scheme and one key enough?” rest easy knowing that yes, this is pretty standard; the short of it is that asymmetric cryptography has nicer security features, but symmetric cryptography has better run-time performance, so this hybrid is used to combine the advantages of both.)

Once the negotiation is complete and both parties have agreed on a shared key, the key is used to encrypt all further communication and the channel is ready for use.

Fragmentation, Reassembly and Negative Offsets

Recall that earlier we mentioned that GNS supports both “reliable” messages with an acknowledgement mechanism, more akin to TCP streams, and “unreliable” messages which clients just send and hope for the best, which are more like UDP datagrams. Having to handle a Maximal-Transmission-Unit (MTU) of 1300 bytes per message sent over the wire, each logical message is split up into several reliable / unreliable segments. These segments will eventually be reassembled by the library into a complete “logical” message that will be passed on to the game engine itself.

When we saw this fragmentation reassembly mechanism for the first time, we knew this is the part of the program we would focus our research on. Reassembling multiple fragments, each with its own length and offset, is a notoriously difficult task, and various attempts at solution implementations have led to multiple high impact vulnerabilities over the years. Even in our previous research on Apache Guacamole one of the discovered vulnerabilities, CVE-2020-9498, involved a dangling pointer in the reassembly process of RDP channel messages.

When skimming through the code, we found the following interesting mismatch. The segment offset is initially read into an unsigned variable as can be seen in the figure below:

Figure 2: Parsing the segment offset as an unsigned 32-bit value.

However, the same nOffset variable is later passed on to SNP_ReceiveUnreliableSegment, which treats it as a signed 32-bit value:

Figure 3: Reassembly function treating the segment offset as a signed value.

If we’re sending a bunch of fragments to a GNS server for reassembly, it only makes sense that we get to tell the server which fragment goes where. So far, so good; but if we pick a large enough value for that “where” (nOffset), when parsed by CSteamNetworkConnectionBase, the value will silently overflow and be interpreted as a negative number. The fragment will then be “reassembled” straight into the server’s process memory, well before the buffer assigned to our message begins.

We figured that it shouldn’t be so straightforward, and that the server probably applies some sanity checks to incoming segments. To better understand which checks apply and how to bypass them, we looked into the members of the segment structure to understand what information is sent over to the server. They are as follows:

  • nMsgNum – Logical message number to be reassembled.
  • nOffset – The offset of the current segment in the fragmented message.
  • cbSegmentSize – The size (in bytes) of the current segment.
  • bLastSegmentInMessage – Is this the last segment?

The first two fields create a unique “key” for each segment, and this key is used for storing the segments in a dedicated hash table. The segments are accumulated in this hash table until it is decided that all of the pieces are in place and the message can be reassembled.

Figure 4: Retrieving a new / used data struct for the key associated with the current segment.

Once the key is checked to be unique, the data struct is initialized, and the content of the segment is copied into it.

Figure 5: The segment’s data struct is populated with the segment’s content and attributes.

After each segment is properly added to the hash table, a list of segments for the current message number is fetched from the table and scanned to check for missing segments (referred to as “gaps”).

Figure 6: A do-while loop that checks for gaps in the list of seen segments.

So, as it turns out, the sanity checks for incoming segments are mainly there to ensure that a message isn’t assembled from an incomplete list of segments. The attack plan seemed straightforward:

  1. Send a segment of a given length (0x400 bytes), a negative offset (-0x180) and marked as the last segment of the message.
  2. The calculated cbMessageSize for that message will be 0x400 – 0x180 = 0x280.
  3. All of the checks will pass, and later on our segment will be copied to offset -0x180 into a heap buffer of size 0x280, thus achieving a Heap-Based Buffer Underflow.

But, champagne glasses in hand and kazoos in mouth, we were forced to cancel our celebrations when we ran into the following piece of code:

Figure 7: The key used for querying the hash table uses an m_nOffset of 0.

When scanning the hash table for segments to reassemble, the code isn’t looking for segments with a negative offset. The table is queried starting with offset 0 and going up. For whoever wrote the code, this was just the natural thing to do, but as for us, our plan had just run head-first into a stone wall.

Back at the drawing board, we tried to see how the attack might be salvaged. We noted that while the strategy failed, it still managed to induce the program into an undefined state: the reassembly verification loop is going to execute a do-while loop on an empty iterator, looking for a phantom segment that the code can’t find. It will keep treating the output of table queries as valid segment data, even when a query finally returns an end() element. We started thinking how this unusual situation could be used to our advantage.

A dive into C++ iterators and the end() element

Iterators were not a part of the original language design of C++ and were added via the Standard Template Library (STL). Since there is no built-in integration with language keywords, in order to make the notation for invoking iterators at least somewhat ergonomic, C++ iterators use the language’s operator overloading feature as a bootstrap. Specifically, by implementing comparison and increment operators, iterator logic can be invoked using a fairly idiomatic for loop. See for example this basic C++ code that prints the elements of a vector:

#include<iostream>
#include<iterator> // for iterators
#include<vector> // for vectors
using namespace std;

int main()
{
    vector<int> ar = { 1, 2, 3, 4, 5 };

    // Declaring iterator to a vector
    vector<int>::iterator ptr;     

    // Displaying vector elements using begin() and end()
    cout << "The vector elements are : ";
    for (ptr = ar.begin(); ptr < ar.end(); ptr++)
        cout << *ptr << " ";

    return 0;    
}

Compare to equivalent Rust, which has a for keyword that can interface with iterators directly (similarly to Python):

fn main() {
    let v : Vec<u32> = (1..=5).collect();
    print!("The vector elements are: ");
    for num in v {
            print!("{} ", num);
    }
}

The C++ example above is taken from here. To make the magic work, behind the scenes C++ is leveraging pointer comparisons. When the C++ reference says that the end() element represents the “logical upper bound of the vector”, this isn’t a metaphor or a figure of speech; according to cppreference std::end() literally “… returns an iterator to the end (i.e. the element after the last element) of the given container c or array array.” This definition is even illustrated in the reference, using the following Figure:

Figure 8: std::end() pointing after the last element, as taken from cppreference.

So, while the end() element serves as a special bookmark, under the hood it isn’t implemented as a magic constant value, a field in a larger struct, or anything of the sort. It is an actual pointer to the end of our container, or “past the last element” to be precise; an off-by-one error given corporeal form.

This may make a certain amount of visual sense when iterating over vectors, but the visual intuition breaks down when iterating over more complex data structures. Behind the scenes, std::map as used in the GNS code is implemented using a balanced (Red-Black), sorted, binary tree. The tree stores key-value pairs, which are the respectable key and value that were inserted into the hash table. While the “end” of a vector might make some visual sense as seen above, the correct placement for the “end” of a tree is less of a technical question and more of a zen koan. We examined GCC’s implementation and found that it uses the balanced tree’s root (i.e. its median value) as the highest-valued pointer, past which one can find the end() of the tree — and far be it from us to start an argument about that.

Shaping the memory and retrying our original attack, now using end()

After this short lesson in C++ internals, it is time to get back to our exploit attempt. Having already chosen GCC instead of Visual Studio as our compiler, and for the sake of simplicity, from now on we will assume that our target game server is a 64-bit Linux server.

As the end() element isn’t really a valid element with memory that was allocated for it, when referencing the fields inside of it, we actually reference nearby memory:

Figure 9: The struct fields near the hash map, will be “used” as end()’s fields.

After a short debugging session, we came up with the following memory layout:

Figure 10: Memory layout when looking through std::end() as a key-value segment pair.

By consulting this layout, we were able to obtain a list of requirements for our desired end() “element” to have the appearance of a valid segment to the untrained eye of a naive for loop:

  • m_nMsgNum – needs to be unique. Represented by the number of pairs in the hash map.
  • m_nOffset – needs to be a small negative number (-0x180). Represented by the lower 4 bytes of the reliable stream position.
  • m_cbSegSize – needs to be a decent value, bigger than the offset (0x400). Represented by the lower 4 bytes of the highest seen message number.
  • m_bLast – needs to be “true” (i.e. not 0x0). Represented by the top 4 bytes of the highest seen message number.

The toughest part is turning m_nOffset into a negative value. This part requires us to send almost 4GB of reliable data, until the stream position will be at 0xFFFFFE80, which represents the value -0x180 in 2’s complement.

After a long exploitation session, which started with each (failed) attempt taking us 80 minutes, we gradually improved the send rate and fixed errors as they turned up. By the end, we were able to cut down the attack’s running time to fewer than 15 minutes and reliably craft the fake segment with the desired values, thus triggering the Heap-Based Buffer Underflow:

==10656==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000004218 at pc 0x7f73b639f733 bp 0x7f73b10fb6a0 sp 0x7f73b10fae48
READ of size 1024 at 0x629000004218 thread T1
    #0 0x7f73b639f732  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x79732)
    #1 0x7f73b5fd4983 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
    #2 0x7f73b5fd4983 in SteamNetworkingSocketsLib::CSteamNetworkConnectionBase::SNP_ReceiveUnreliableSegment(long long, int, void const*, int, bool, long long) ../src/steamnetworkingsockets/clientlib/steamnetworkingsockets_snp.cpp:2477
    ...

Surprise, we even free() our invalid segment

To our surprise, it turned out that the code flow performs one more step after reassembling the user message: it erases the now unneeded segments.

Figure 11: Erasing the used segments from the hash table after the reassembly is over.

As we recall, at this point itMsgStart is our crafted end() element, which isn’t a valid hash table entry. Still, this do-while loop will happily erase our non-existing element from the table, effectively free()ing the memory of m_mapUnreliableSegments “back” to the heap.

In the standard case, when using glibc’s heap implementation, we could also craft the memory value stored right before this field. This important hash table will now be stored in the heap, allowing us to fully control it by sending a reliable segment that will use the same heap allocation. Once in control of the hash table’s tree, achieving a Write-What-Where exploit primitive is purely technical.

Sadly for us, in Valve’s case, this invalid free() operation will just cause the game server to abort. At least in the game that we’ve tested (CS:GO), the heap implementation that is used is that of gperftools, Google’s old TCMalloc. This implementation will correctly catch the invalid call to free(), detect that our buffer isn’t originally a heap buffer, and crash the program.

Disclosure Timeline

  • 02 September 2020 – Vulnerabilities were disclosed to Valve.
  • 02 September 2020 – Valve responded and requested more information.
  • 03 September 2020 – Valve acknowledged the vulnerabilities, and notified relevant partners about it.
  • 03 September 2020 – Valve patched the vulnerabilities as part of version v1.2.0.
  • 04 September 2020 – Valve sent us the patches and asked for us to verify them.
  • 04 September 2020 – We approved Valve’s fixes.
  • 17 September 2020 – Binary updates were shipped to Valve’s game clients and servers.
  • 10 December 2020 – Full public disclosure, some partners have yet to patch their games.

Important: We encourage all gamers of 3rd party games (non-Valve games) to check that their game clients received an update after September 4th 2020, as this is the date in which the library was patched by Valve.

Conclusion

There is that special moment in vulnerability research when there’s, so to speak, “blood in the water”. The researcher grins wryly and mumbles to themselves, “if this code were secure, that part wouldn’t be in here”. From there, achieving code execution is an exercise of tedious algebra; pointers are lined up and contorted inputs are crafted. More often than not, the program yields — eventually. The code can get lucky once via some happy accident, such as checking only for segments with positive offsets when collecting segments, but it can only get lucky so many times. Eventually something’s gotta give.

In this research, we were able to find four new vulnerabilities in Valve’s game networking library by taking advantage of two of C++’s language quirks — silent conversion between signed and unsigned integers, and under-the-hood implementation of the “iteration stop” element as a pointer. The first quirk has by now become a fact of life that we’ve all silently grown resigned to, but the second is alarming and brings fresh to mind the phrase “what could possibly go wrong”.

With the constant deluge of new vulnerability disclosures, it seems that insecure code is the rule, rather than the exception. Even so, a huge difference is made by vendors and how seriously they treat these issues. We would like to express our deep appreciation for Valve, who replied to us in detail and produced a working patch within 2 days.

POPULAR POSTS

BLOGS AND PUBLICATIONS

  • Check Point Research Publications
  • Global Cyber Attack Reports
  • Threat Research
February 17, 2020

“The Turkish Rat” Evolved Adwind in a Massive Ongoing Phishing Campaign

  • Check Point Research Publications
August 11, 2017

“The Next WannaCry” Vulnerability is Here

  • Check Point Research Publications
January 11, 2018

‘RubyMiner’ Cryptominer Affects 30% of WW Networks