Transport, encryption, authentication protocol
This project is maintained by Luca Fulchir
While re-implementing the full-security handshake in the rewrite of libFenrir, I came across the old problem of stateful vs stateless handshakes, and what it means for (D)DoS attacks.
After thinking about it for a while, let me introduce you to a slight modification of the full-security handshake that gets the best of both worlds: stateful and stateless handshakes.
We will go through various designs and quickly analyze the existing solutions, from TCP to minimaLT.
We want a connection. Let’s leave aside the “difficult” crypto stuff, and let’s just concentrate on the current solutions.
The famous 3-way handshake: SYN, SYN/ACK, ACK. Completely stateful.
The server needs to track the IP:PORT combination to differentiate each client.
This is true not only for the connection, but also for the initial packets exchanged while establishing the connection.
We cannot allow a connection without having done the full 3-way handshake, otherwise we risk allocating too many resources, and a malicious client might fill our RAM/CPU way too easily. To avoid this problem, we need to make sure that when we receive an ACK, the client has already sent a SYN and we actually responded with a SYN/ACK.
But how to do that? Simple: we keep a table of every combination of SYN/ACK, IP and port that we see. Then we can decide whether that ACK was preceded by a SYN.
This way we are a bit more sure that the client is not making us allocate resources that will go unused.
Another parameter checked on the server side during the handshake is the sequence number (SEQ).
The server has to check that the ACK it receives follows on from the sequence number it sent in the SYN/ACK (the ACK must acknowledge that number plus one).
Historically, TCP implementations started handshakes with predictable SEQ numbers, so an attacker could spoof a handshake without ever receiving the SYN/ACK packet.
Today implementations are careful to choose random parameters to avoid connection spoofing.
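To make the stateful approach concrete, here is a minimal sketch of the server-side bookkeeping described above. It is a toy model, not real TCP: the class and method names are made up for illustration, and packets are reduced to function arguments.

```python
import secrets

class StatefulListener:
    """Toy model of the server side of a TCP-like 3-way handshake (hypothetical names)."""

    def __init__(self):
        # (client_ip, client_port) -> sequence number we sent in the SYN/ACK
        self.half_open = {}
        self.established = set()

    def on_syn(self, ip, port):
        isn = secrets.randbits(32)         # random, so an off-path attacker cannot guess it
        self.half_open[(ip, port)] = isn   # state is stored before the client has proven anything
        return isn                         # this value travels back in the SYN/ACK

    def on_ack(self, ip, port, ack_number):
        isn = self.half_open.get((ip, port))
        if isn is None or ack_number != (isn + 1) % 2**32:
            return False                   # no matching SYN, or wrong sequence number
        del self.half_open[(ip, port)]
        self.established.add((ip, port))
        return True

server = StatefulListener()
isn = server.on_syn("198.51.100.7", 40000)
print(server.on_ack("198.51.100.7", 40000, (isn + 1) % 2**32))
```

Note that `half_open` gains one entry for every SYN received, whether or not the sender ever completes the handshake.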
The TCP approach sounds nice, until you realize that you can just send a huge load of SYNs and the server will have to keep updating and growing the tracking table we discussed before. We solved a small part of the problem, but not all of it.
So some protocols like SCTP introduced the idea of syncookies, which has been ported into TCP as an option.
The idea is simple. Instead of answering with a SYN/ACK and then updating its SYN table, the server sends a slightly bigger SYN/ACK, which contains the information that proves that the client sent a particular SYN just before.
Such information usually encodes the SYN data plus a timestamp and a cryptographic signature to make sure that everything actually comes from the server.
TCP syncookies try to encode all of this in the SEQ field, a 32-bit value.
There are more ways to implement syncookies, but it all boils down to that.
Depending on the information included in the cookie, the handshake can be more or less resilient to DDoS attacks.
We have thus removed the state from the first exchange of packets.
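As a rough illustration of the idea, the sketch below builds a cookie from a timestamp and a keyed hash (HMAC) over the data the client sent. This is not the 32-bit encoding that TCP syncookies use; the secret, the lifetime and the field layout are all assumptions chosen for readability.

```python
import hashlib
import hmac
import secrets
import struct
import time

SERVER_SECRET = secrets.token_bytes(32)  # hypothetical; known only to the server
COOKIE_LIFETIME = 5                      # seconds; illustrative value

def make_cookie(client_ip, client_port, client_isn, now=None):
    """Timestamp + MAC over the SYN data; nothing is stored on the server."""
    ts = int(time.time() if now is None else now)
    msg = f"{client_ip}|{client_port}|{client_isn}|{ts}".encode()
    tag = hmac.new(SERVER_SECRET, msg, hashlib.sha256).digest()[:16]
    return struct.pack("!Q", ts) + tag

def check_cookie(cookie, client_ip, client_port, client_isn, now=None):
    """Recompute the MAC from the returning packet and compare."""
    if len(cookie) != 24:
        return False
    now = int(time.time() if now is None else now)
    (ts,) = struct.unpack("!Q", cookie[:8])
    if not 0 <= now - ts <= COOKIE_LIFETIME:
        return False                     # expired, or timestamp from the future
    msg = f"{client_ip}|{client_port}|{client_isn}|{ts}".encode()
    expected = hmac.new(SERVER_SECRET, msg, hashlib.sha256).digest()[:16]
    return hmac.compare_digest(cookie[8:], expected)
```

The server only needs `SERVER_SECRET` to validate any number of pending handshakes, which is the whole point: the per-client state lives in the packet.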
minimaLT uses different techniques to avoid keeping state while establishing the connection. One of these is a proof of work in the form of a puzzle.
The puzzle is cheap to solve, but not free. That is not so nice for embedded or resource-constrained devices, but the puzzle is not that hard either, and its difficulty can be easily tuned.
When we receive the first packet we send back the puzzle, and the second packet we receive must contain a proper solution.
The server will decrypt a part of the second packet that contains the answer. If the given answer and the decrypted answer match, we can allocate the connection.
Each puzzle must be generated based on the IP of the server and client, otherwise a botnet might cache the solutions to the puzzles.
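To show why binding the puzzle to both addresses matters, here is a hashcash-style stand-in. It is not minimaLT's actual construction (which, as described above, carries an encrypted answer); the secret, the difficulty and the `epoch` parameter are assumptions for the sake of the example.

```python
import hashlib
import hmac
import itertools
import secrets

PUZZLE_SECRET = secrets.token_bytes(32)  # hypothetical server-side secret
DIFFICULTY_BITS = 12                     # tunable; illustrative value

def make_challenge(client_ip, server_ip, epoch):
    """Challenge derived from both addresses, so a solution is useless elsewhere."""
    msg = f"{client_ip}|{server_ip}|{epoch}".encode()
    return hmac.new(PUZZLE_SECRET, msg, hashlib.sha256).digest()

def is_solution(challenge, nonce):
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    # Valid when the top DIFFICULTY_BITS bits of the hash are all zero.
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0

def solve(challenge):
    """Client side: brute force, about 2**DIFFICULTY_BITS hashes on average."""
    return next(n for n in itertools.count() if is_solution(challenge, n))

def verify(client_ip, server_ip, epoch, nonce):
    """Server side: rebuild the challenge from the packet, constant work to check."""
    return is_solution(make_challenge(client_ip, server_ip, epoch), nonce)
```

Because the challenge is recomputed from the addresses and the epoch, the server stores nothing per client, and a solution found by one bot does not carry over to a challenge issued for another address.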
…But I could not work out whether minimaLT's “tunnel ID” is included in the requests (it does not seem so, based on the paper), nor what protects against a client solving one puzzle and then repeating the answer with a different tunnel ID, thus establishing multiple connections while solving only one puzzle. There is a timeframe for the validity of the puzzle, but that is present in a syncookie, too.
Essentially, this solution seems like a syncookie that requires the client to do some computation.
In Fenrir we use syncookies in the “Full-security” handshake.
But this was only for the first packet exchange. After receiving the second packet and exchanging keys, we kept state to remember the exchanged key.
The rationale for doing this was that the next packets were… packed with information, and keeping a few bytes off the network might have helped.
We are moving to a second syncookie exchange during the key-exchange, but with additional security against DDoSes.
The only unresolved problem until now is that once a client has a syncookie (or has solved a puzzle) it can send the same answer again and again to start as many connections as possible in a short timeframe (usually 2-5 seconds).
This might not seem much, but botnets are big, and attackers will take every advantage they can get.
The obvious solution is to reserve a connection ID during the handshake, but only for a small timeframe. But if we do that we would be keeping state, and fall into the same problem that TCP had at the beginning! Is there a way to avoid allocating more resources per connection?
It turns out there is. And it is pretty efficient, too.
We will reserve a connection ID for every handshake that reaches the key exchange phase, and put it in the syncookie. The problem is, how do we prevent another connection from taking the same connection ID?
We simply keep a lone unsigned integer as the next free connection ID. When we need a new ID we increment it and use the old value in the packet. The next request will get a different ID.
The syncookie is encrypted and authenticated, so the clients can not modify it.
Now the syncookie is tied to a specific connection ID. If you send it again you will effectively try to establish an already-established connection, and that will not be accepted.
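A minimal sketch of this mechanism could look as follows. It is not libFenrir code: the names are hypothetical, the cookie is only authenticated with an HMAC (the real cookie is also encrypted, which the Python standard library cannot do on its own), and the timeout is left out.

```python
import hashlib
import hmac
import secrets
import struct

COOKIE_KEY = secrets.token_bytes(32)   # hypothetical server-side secret

class Handshaker:
    def __init__(self):
        self.next_conn_id = 0          # the only handshake state: one integer
        self.established = set()       # state exists only for real connections

    def issue_cookie(self, session_key):
        conn_id = self.next_conn_id    # use the old value, then increment
        self.next_conn_id = (self.next_conn_id + 1) % 2**32
        body = struct.pack("!I", conn_id) + session_key
        tag = hmac.new(COOKIE_KEY, body, hashlib.sha256).digest()
        return body + tag              # a real cookie would also encrypt the body

    def accept(self, cookie):
        body, tag = cookie[:-32], cookie[-32:]
        expected = hmac.new(COOKIE_KEY, body, hashlib.sha256).digest()
        if len(body) < 4 or not hmac.compare_digest(tag, expected):
            return None                # forged or modified cookie
        (conn_id,) = struct.unpack("!I", body[:4])
        if conn_id in self.established:
            return None                # replay: this connection already exists
        self.established.add(conn_id)
        return conn_id, body[4:]

h = Handshaker()
cookie = h.issue_cookie(b"k" * 32)
print(h.accept(cookie) is not None)    # first use establishes the connection
print(h.accept(cookie) is None)        # sending it again is rejected
```

Issuing a cookie costs one integer increment, and the replay check reuses the table of established connections that the server has to keep anyway.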
Note that we also managed to keep the handshake completely independent from information like IP address and UDP port.
Although this solution sounds really simple, the only reason it has not been done before is the same one that kept a lot of Fenrir features out of existing protocols: backward compatibility.
TCP syncookies are an example of this. The implementation might have been more robust if there had been space to include the IP:Port tuple, but there was no space in the SEQ field. We might have had a solution with the [TCPCT](https://tools.ietf.org/html/rfc6013) extension, but it was implemented and then removed from the Linux kernel for complexity and performance reasons.
The difference between the TCPCT syncookie and Fenrir is mainly that we also exchange keys, so we include the (encrypted) key in the syncookie, to avoid storing any state at all. TCPCT also did not address concerns like amplification attacks, since the SYN/ACK answer with the syncookie seems to be up to twice the size of the client SYN.
Using just a single integer to track the connection ID means that as long as we don’t handle more than 2^32 handshakes within one timeout (2-5 seconds) we will never repeat the same ID.
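To get a feeling for that bound, this small calculation shows the handshake rate an attacker would have to sustain to wrap a 32-bit counter within one timeout. The function name is made up for the example.

```python
def handshakes_per_second_to_wrap(timeout_s, id_bits=32):
    """Rate needed to exhaust the whole ID space within one timeout."""
    return 2**id_bits / timeout_s

for timeout_s in (2, 5):
    rate = handshakes_per_second_to_wrap(timeout_s)
    print(f"timeout {timeout_s}s: {rate:,.0f} handshakes per second")
```

That is roughly 2.1 billion handshakes per second for a 2-second timeout and about 860 million for a 5-second one.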
We could also use an RNG to generate a non-repeating sequence of numbers. Nothing different.
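One way to get such a sequence, sketched here as an assumption rather than as what Fenrir does, is a full-period linear congruential generator: with a power-of-two modulus, an odd increment and a multiplier congruent to 1 modulo 4, it visits every value exactly once before repeating. The constants below are the well-known Numerical Recipes ones.

```python
def id_sequence(seed, bits=32, a=1664525, c=1013904223):
    """Yield 2**bits distinct IDs, then start over (full-period LCG)."""
    mask = (1 << bits) - 1
    x = seed & mask
    while True:
        yield x
        x = (a * x + c) & mask   # modulo 2**bits

ids = id_sequence(seed=12345)
print([next(ids) for _ in range(3)])
```

An LCG is predictable, so this only illustrates the non-repeating property; it still needs just one integer of state, exactly like the counter.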
None of these solutions lets you pack the connection IDs together, although there are different ways to do that, like tracking blocks of connection IDs so that you can consider them “free” after a certain time. That, however, is out of scope for now and the explanation probably does not fit in two lines of a blog post.
-Luker