Reserved Handshakes

While re-implementing the full-security handshake in the rewrite of libFenrir, I came across the old problem of stateful vs stateless handshakes, and what it means for (D)DOS attacks.

After thinking about it for a while, let me introduce you to a slight modification of the full-security handshake that gives us the best of both the stateful and the stateless worlds.

We will go through various designs and quickly analyze existing solutions, from TCP to minimaLT.

Handshakes

We want a connection. Let’s leave aside the “difficult” crypto stuff, and let’s just concentrate on the current solutions.

TCP

The famous 3-way handshake. SYN – SYN/ACK – ACK. Completely stateful.

The server needs to track the IP:PORT combination to differentiate each client. This is true not only for the connection, but also for the initial packets exchanged while establishing the connection.

We cannot allow a connection before the full 3-way handshake is done, otherwise we risk allocating too many resources, and a malicious client might fill our RAM/CPU way too easily. To avoid this problem, we need to make sure that when we receive an ACK, the client has already sent a SYN and we have actually responded with a SYN/ACK.

But how to do that? Simple: we keep a table of every combination of SYN/ACK, IP and port that we see. Then we can tell whether an ACK was preceded by a SYN. This way we are a bit more sure that the client is not making us allocate resources that will go unused.

Another parameter checked on the server side during the handshake is the sequence number (SEQ).
The server has to check that the ACK it receives acknowledges the sequence number it sent in the SYN/ACK, plus one.
Historically, TCP implementations started handshakes with predictable SEQ numbers, so an attacker could spoof a handshake even if he did not actually receive the SYN/ACK packet. Today the implementations are careful to choose random parameters to avoid connection spoofing.
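
To make the cost of this bookkeeping concrete, here is a minimal sketch of such a table. It is a toy model, not real TCP: the class and method names are made up for illustration, and the only things it keeps from the description above are the IP:PORT key and the random sequence number check.

```python
import secrets


class SynTable:
    """Toy model of the state a stateful server keeps per half-open handshake."""

    def __init__(self):
        # (client_ip, client_port) -> sequence number we sent in the SYN/ACK
        self.pending = {}

    def on_syn(self, ip, port):
        isn = secrets.randbits(32)      # random, so it cannot be guessed by a spoofer
        self.pending[(ip, port)] = isn  # state is allocated before the client proves anything
        return isn                      # would be carried in the SYN/ACK

    def on_ack(self, ip, port, ack_number):
        isn = self.pending.get((ip, port))
        if isn is None or ack_number != (isn + 1) % 2**32:
            return False                # no matching SYN, or wrong sequence number
        del self.pending[(ip, port)]    # handshake done, the connection can be allocated
        return True
```

Note that `on_syn` grows the table for every SYN it sees, whether or not an ACK ever follows.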

TCP + Syncookie

The TCP approach sounds nice, until you realize that an attacker can just send a huge load of SYNs and the server will have to keep updating and growing the tracking table we discussed before. We solved a small part of the problem, but not all of it.

So protocols like SCTP build a cookie into the handshake itself, and TCP implementations offer the same idea as an optional mechanism called syncookies.

The idea is simple. Instead of answering with a SYN/ACK and then updating its SYN table, the server sends a slightly bigger SYN/ACK containing information that proves that the client sent a particular SYN just before. Such information usually encodes the SYN data plus a timestamp and a cryptographic signature, to make sure that everything actually comes from the server.

TCP syncookies try to encode all of this in the SEQ field, a 32-bit value.

There are several ways to implement syncookies, but it all boils down to that.
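
As a rough illustration of the general idea, here is a sketch of a cookie that carries the SYN data and a timestamp, authenticated with an HMAC. This is not the 32-bit TCP encoding, which has far less room. The secret, the 5-second lifetime, the field layout and the function names are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets
import struct
import time

SERVER_SECRET = secrets.token_bytes(32)  # hypothetical per-server key
COOKIE_LIFETIME = 5                      # seconds, an assumed value


def make_cookie(ip, port, client_isn, now=None):
    """Build the cookie sent in the reply; nothing is stored on the server."""
    ts = int(time.time() if now is None else now)
    body = struct.pack("!IQ", client_isn, ts)           # SYN data + timestamp, 12 bytes
    msg = f"{ip}:{port}".encode() + body                # bind the cookie to the client address
    tag = hmac.new(SERVER_SECRET, msg, hashlib.sha256).digest()[:16]
    return body + tag


def check_cookie(cookie, ip, port, now=None):
    """Return the client's sequence number if the cookie is ours and fresh, else None."""
    if len(cookie) != 12 + 16:
        return None
    body, tag = cookie[:12], cookie[12:]
    msg = f"{ip}:{port}".encode() + body
    expected = hmac.new(SERVER_SECRET, msg, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        return None                                     # forged or modified
    client_isn, ts = struct.unpack("!IQ", body)
    now = int(time.time() if now is None else now)
    if not 0 <= now - ts <= COOKIE_LIFETIME:
        return None                                     # expired
    return client_isn
```

The server only needs its secret to validate the second packet, but the same valid cookie can be presented any number of times until it expires.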

Depending on the information included in the cookies, the handshake can be more or less resilient to DDoSes.

We have thus removed the state from the first exchange of packets.

MinimaLT: proof of work

minimaLT uses different techniques to avoid keeping state while establishing the connection. One of these is a proof of work in the form of a puzzle.

The puzzle is cheap to solve, but not free. That is not so nice for embedded or resource-constrained devices, although the difficulty can easily be tuned.

When we receive the first packet we send back the puzzle, and the second packet we receive must contain a proper solution.
The server will decrypt a part of the second packet that contains the answer. If the given answer and the decrypted answer match, we can allocate the connection.

Each puzzle must be generated based on the IP of the server and client, otherwise a botnet might cache the solutions to the puzzles.
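
To show what "a puzzle bound to both IPs" can look like, here is a generic hashcash-style sketch. It is not minimaLT's actual construction: the hash-based puzzle, the difficulty value, the `epoch` parameter (standing in for the validity timeframe) and all names are assumptions for illustration.

```python
import hashlib
import hmac
import itertools
import secrets

SERVER_SECRET = secrets.token_bytes(32)  # hypothetical per-server key
DIFFICULTY_BITS = 12                     # assumed tuning knob: about 2**12 hashes on average


def make_challenge(client_ip, server_ip, epoch):
    # Derived from both addresses and a coarse time epoch; nothing is stored.
    msg = f"{client_ip}|{server_ip}|{epoch}".encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).digest()


def is_solution(challenge, nonce):
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    # Valid when the top DIFFICULTY_BITS bits of the digest are zero.
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0


def solve(challenge):
    # Client side: brute force, the "work" in proof of work.
    return next(n for n in itertools.count() if is_solution(challenge, n))


def verify(client_ip, server_ip, epoch, nonce):
    # Server side: recompute the challenge, then check a single hash.
    return is_solution(make_challenge(client_ip, server_ip, epoch), nonce)
```

Because the challenge depends on the client address, a solution found by one bot is useless to the others. Raising `DIFFICULTY_BITS` by one doubles the expected client work while the server still checks one hash.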

…But I could not understand whether minimaLT's “tunnel ID” is included in the requests (it does not seem so, based on the paper), nor what protects against a client solving one puzzle and then repeating the answer with different tunnel IDs, thus establishing multiple connections while solving only one puzzle. There is a timeframe for the validity of the puzzle, but that is present in a syncookie, too.

Essentially, this solution seems like a syncookie that requires the client to do some computation.

Reserved Handshakes

In Fenrir we use syncookies in the “Full-security” handshake.
But this was only for the first packet exchange. After receiving the second packet, and exchanging keys, we kept a state to remember the exchanged key.

The rationale for doing this was that the next packets were… packed with information, and keeping a few bytes off the network might have helped.

We are moving to a second syncookie exchange during the key-exchange, but with additional security against DDoSes.

The only unresolved problem until now is that once a client has a syncookie (or has solved a puzzle), it can send the same answer again and again to start as many connections as possible within a short timeframe (usually 2-5 seconds).
This might not seem much, but botnets are big, and attackers will take every advantage they can get.

The obvious solution is to reserve a connection ID during the handshake, but only for a small timeframe. But if we do that we would be keeping state, and fall back into the same problem that TCP had at the beginning! Is there a way to avoid allocating more resources per connection?

It turns out there is. And it is pretty efficient, too.

We will reserve a connection ID for every handshake that reaches the key exchange phase, and put it in the syncookie. The problem is, how do we prevent another connection from taking the same connection ID?
We simply keep a lone unsigned integer as the next free connection ID. When we need a new ID we increase it and use the old value in the packet. The next request will get a different ID.

The syncookie is encrypted and authenticated, so the clients can not modify it.
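
Here is a minimal sketch of the counter and of a cookie carrying the reserved ID. It is not libFenrir code: the class, the method names and the 4-byte layout are invented for the example, and the cookie is only authenticated with an HMAC (Python's standard library has no authenticated encryption), while the real cookie is also encrypted and carries more than the ID.

```python
import hashlib
import hmac
import secrets
import struct


class HandshakeServer:
    def __init__(self):
        self.key = secrets.token_bytes(32)  # hypothetical cookie key
        self.next_id = 0                    # the only handshake state: one integer

    def reserve_id(self):
        conn_id = self.next_id
        self.next_id = (self.next_id + 1) % 2**32  # wraps like an unsigned 32-bit integer
        return conn_id

    def make_cookie(self):
        body = struct.pack("!I", self.reserve_id())
        tag = hmac.new(self.key, body, hashlib.sha256).digest()
        return body + tag                   # the client just echoes this back

    def open_cookie(self, cookie):
        """Return the reserved connection ID, or None if the cookie was modified."""
        if len(cookie) != 4 + 32:
            return None
        body, tag = cookie[:4], cookie[4:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None
        return struct.unpack("!I", body)[0]
```

However many handshakes are in flight, the server's memory use stays at that one integer plus the key.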

Conclusion

Now the syncookie is tied to a specific connection ID. If you send it again you will effectively try to establish an already-established connection, and that will not be accepted.
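
The replay check itself can be sketched in a few lines. This assumes the connection ID has already been extracted from a verified cookie. The names are hypothetical, and cookie expiry and the freeing of IDs when a connection closes are left out.

```python
class ConnectionTable:
    def __init__(self):
        self.established = set()   # state kept only for connections that really exist

    def accept(self, conn_id):
        if conn_id in self.established:
            return False           # replayed cookie: this ID is already in use
        self.established.add(conn_id)
        return True
```

The first valid cookie for an ID creates the connection; every later copy of it hits an existing entry and is dropped without allocating anything.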

Note that we also managed to keep the handshake completely independent from information like IP address and UDP port.

Although this solution sounds really simple, the only reason it has not been done before is the same reason other protocols could not have a lot of Fenrir's features: backward compatibility.

TCP syncookies are an example of this. The implementation might have been more robust if there had been space to include the IP:Port tuple, but there was no space in the SEQ field. We might have had a solution with the [TCPCT](https://tools.ietf.org/html/rfc6013) extension, but it was implemented in the Linux kernel and then removed for complexity and performance reasons.

The difference between the TCPCT syncookie and Fenrir is mainly that we also exchange keys, so we include the (encrypted) key in the syncookie, to avoid storing any state at all. TCPCT also did not address concerns like amplification attacks, since the SYN/ACK answer with the syncookie seems to be up to twice the size of the client SYN.

Tips & Tricks

Using just a single integer to track the connection ID means that as long as we don’t handle more than 2^32 handshakes within a timeout (2-5 seconds), we will never hand out the same ID twice.
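
To put a number on that bound, this small calculation shows the handshake rate needed to wrap a 32-bit counter within one timeout. The function name is made up, and the 2 and 5 second values are the ones mentioned above.

```python
ID_SPACE = 2**32  # number of distinct values of a 32-bit unsigned connection ID


def max_handshake_rate(timeout_seconds):
    """Handshakes per second needed to wrap the counter within one timeout."""
    return ID_SPACE // timeout_seconds


for timeout in (2, 5):
    print(f"{timeout}s timeout: {max_handshake_rate(timeout):,} handshakes/s")
```

That is roughly 2.1 billion handshakes per second for a 2-second timeout and about 859 million per second for a 5-second one.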

We could also use an RNG that generates a non-repeating sequence of numbers. Nothing different.
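
One way to get such a sequence, as a sketch: a full-period linear congruential generator visits every value below a power-of-two modulus exactly once before repeating. The choice of an LCG and of these constants (the well-known Numerical Recipes pair) is an assumption of this example. An LCG is predictable, so it only illustrates the non-repeating property.

```python
import itertools


def lcg_ids(seed, modulus=2**32, a=1664525, c=1013904223):
    """Yield connection IDs; full period because c is odd and a - 1 is divisible by 4."""
    x = seed % modulus
    while True:
        yield x
        x = (a * x + c) % modulus


# Take the first few IDs from the generator.
first_ids = list(itertools.islice(lcg_ids(seed=42), 3))
```

Just like the plain counter, the only state is the last value handed out.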

None of these solutions lets you pack the connection IDs together, although there are different ways, like tracking blocks of connection ids so that you can consider them “free” after a certain time. That, however, is out of scope for now and the explanation probably does not fit in two lines of a blog post.

-Luker