Reuven Plevinsky and Tal Vainshtein
Following the recent hype over the TCP networking vulnerabilities found by Netflix in Linux and FreeBSD, for which Check Point quickly responded and provided protection, we have researched several implementations of Selective ACK mechanism. One of these implementations was OpenBSD.
OpenBSD is a free, open-source operating system based on the Berkeley Software Distribution (BSD), which emphasizes strong security. Similar to Linux, it is a UNIX-like OS. The main difference between them is that Linux was developed by Linus Torvalds as a clone of the Unix system, while the BSD family of Unix systems is derived from the original Unix operating system developed at the AT&T Bell Laboratories.
What is SACK?
SACK stands for Selective Acknowledgment, a feature introduced nearly two decades ago to help TCP performance when retransmitting packets. According to RFC2018 – TCP Selective Acknowledgment Options:
TCP may experience poor performance when multiple packets are lost from one window of data. With the limited information available from cumulative acknowledgments, a TCP sender can only learn about a single lost packet per round trip time. An aggressive sender could choose to retransmit packets early, but such retransmitted segments may have already been successfully received.
A Selective Acknowledgment (SACK) mechanism, combined with a selective repeat retransmission policy, can help to overcome these limitations. The receiving TCP sends back SACK packets to the sender informing the sender of data that has been received. The sender can then retransmit only the missing data segments.
SACK option contains start sequence and end sequence that represent a range of data (a hole) that wasn’t received by the other host.
Implementation of SACK in OpenBSD
The received SACK ranges are kept in a sorted linked list in the TCP state. For inbound packets, the TCP option Selective ACK is handled by the function tcp_sack_option(). The function verifies the received SACK range with the existing TCP state, traverses over the SACK range list to find the correct place for the new range, and adds the entry in a way that the list remains sorted.
The entries of the list are allocated from a pool, which is limited to 32K entries.
The entries are removed from the list in the following cases:
- when the connection is closed
- when an ACK is received with a higher sequence
- from 2msl timer which is set only in TCPS_FIN_WAIT_1 state and above
- and from the TCP retransmit timer whose interval can be up to 64 seconds
Snip of the function tcp_sack_option() that adds a new element to the middle of the sorted SACK ranges list.
We found that according to the implementation, the SACK holes sorted list is bounded in the TCP established state of the connection by (1) the size of the pool (up to 32K entries), and (2) by the TCP retransmit timer (whose interval could be up to 64 seconds). This means that an attacker could manipulate the connection’s window scaling and RTT, forcing the victim to send a large amount of not-ACKed data and increase its retransmission timeout. This in turn enables the attacker to send a large number of SACKs. As the sorted list of SACK holes becomes larger, inserting additional elements becomes more expensive, resulting in higher and higher CPU consumption that may eventually lead to a denial of service. (The complexity of inserting a new element to a sorted list of n entries so it remains sorted is O(n). The complexity of inserting n elements to the list in a sorted way is O(n^2)).
We suggested limiting the SACK holes list to the number of elements that would commonly be encountered in real-life traffic. The owners of OpenBSD were responsive and committed a fix for the issue shortly after we contacted them. The fix limits the number of SACK holes to 128.
The Check Point Firewall protects against this threat [sk156192].