% !TeX root = Fenrir.tex \xkcdchapter[0.6]{The Fenrir Project}{standards}{The good thing about standards is that\\there are many to choose from...} \label{Fenrir Project} \lettrine{S}{ince} the current protocol stacks limit the developers in the application development, and there is not a sensible authentication protocol that supports authentication and authorization, this chapter is dedicated to the definition of a new one that encompasses as many as the possible requirements \ref{Requirements} as possible. \section{The OSI Stack} The first problem is to understand where we want to put our protocol in the OSI stack. As we have seen in \ref{OSI stack}, an incorrect placement in the stack can result in overly complex layering that push the burden of making everything work on the developer. As a remainder, the OSI model is composed of 7 layers: \begin{center} \begin{tabularx}{0.4\textwidth}{| c | X | X |} \hline & Layer & Example \\ \hline \hline 7&Application & \\ \hline 6&Presentation & JSON\\ \hline 5&Session & RPC\\ \hline 4&Transport & TCP/UDP\\ \hline 3&Network & IP\\ \hline 2&Data Link & Ethernet/MAC \\ \hline 1&Physical & 100BASE-T\\ \hline \end{tabularx} \end{center} The TCP/IP model however does not follow the model strictly, as the layers are not always independent one of another. If we want to portray what actually happens in a common web connection, we end up with something like this, where multiple layers limit the upper layers capabilities, and many features (like session management) are reimplemented multiple times, as seen in Fig. \ref{App stack} \begin{figure}[h] \begin{center} \begin{tabularx}{0.4\textwidth}{| X | X | X |} \hline \multicolumn{3}{|c|}{Application \{7\}}\\ \hline OAuth \{5\}& cookie \{5\} & HTML \{6\}\\ \hline \multicolumn{3}{|c|}{HTTP \{6\}} \\ \hline \multicolumn{3}{|c|}{TLS \{4,5\}} \\ \hline \multicolumn{3}{|c|}{TCP \{4,5\}} \\ \hline \multicolumn{3}{|c|}{IP \{3\}}\\ \hline \multicolumn{3}{|c|}{Ethernet \{2\}}\\ \hline \end{tabularx} \caption{Possible application stack with possible OSI Layers} \label{App stack} \end{center} \end{figure} The reason for which OAuth, cookies and HTML are on the same layer is that they all need direct access to the HTTP headers, and are all dependent one from the other. The application then has to keep all their interaction together. Keeping in mind what we are trying to avoid, let's analyse our choices: \subsection{High level protocol} By "high level" we mean anything above HTTP. If we try to fix the situation with another high level protocol (like OpenID-Connect is trying to do) we gain ease of implementation due to the abstractions of the lower level protocols and their security properties, but we are also limited by them. Efficiency is also greatly impacted, and we might have to rely on other protocols to avoid the limitations of the protocol stack we choose (like OpenID-Connect has to rely on webfinger to avoid OAuth non-interoperability). This means that our quest for simplicity will lead to a contradiction, as the more protocols need to be used, the more the attack surface increments, while we need to handle all the protocol interactions and limitations. As stated, this is the road chosen by the OAuth and OpenID-Connect authors, so there is little to gain from choosing it again. \subsection{Low level protocol} Since we can not go much high in the OSI model, we need to understand how low we should go, and what will change for our protocol. Going lower or at the IP layer will break the very first requirement, the compatibility with the existing infrastructure. The same could be said for working at the TCP/UDP layer, as too many firewalls would require reconfiguration. Fortunately in this case we could think about a transitional period where the protocol is tunnelled via UDP, and when enough support has been gained, we might switch directly above the IP layer. We have upper and lower bounds, let's analyse the case-by-case problems: \begin{itemize} \item \textit{rewrite the HTTP layer}: we are still fairly high in the protocol stack, still limited by TCP, we still have duplicated session identifications (TCP,TLS), but we should be able to implement the federated support. However we require a new TCP port to be assigned, to not interfere with existing protocols, but making us interfere with strict firewalls. \item \textit{rewrite the TLS layer}: same as above, but now we also have to handle \textbf{secrecy} and \textbf{authenticity}. We still require a new port assignment. \item \textit{rewrite the TCP/UDP layer}: we get complete liberty of communication features, and might be able to implement our full list of requirements \ref{Requirements}. We do not require new port assignments, and if we use UDP tunnelling we still retain compatibility with the existing firewalls. This is the road chosen by QUIC and minimaLT. \end{itemize} At first glance this is much more complex, as we need to reimplement everything from TCP up to OAuth in a single solution, but we can gain a lot of features from experimental protocols, and add the federation and authorization support, which is found virtually nowhere. The overall stack becomes much shorter, and there can be less feature duplication (like the case of session identificators). To summarize what we can achive here, we can gain in: \begin{itemize} \item \textbf{Robustness}: we can design against amplification attacks, and avoid design problems like cleartext TCP RST which will drop any connection. \item \textbf{Efficiency}: less layers mean less encapsulation, and less headers before the user data. \item \textbf{Federation}: we can finally design the protocol so that authentication on multiple domains is the same, by including domain discovery techniques. \item \textbf{transport flexibility}: \textbf{multiplexing} support, and choosing every stream transport features (\textbf{reliability, ordered delivery} etc..)will increase application features while simplifying the application development. \item \textbf{multihoming/mobility}: finally design a protocol whose connection status is not dependent from layer 3 (IP) data. \item \textbf{datagram}: handling message begin/end despite of packet data fragmentation will further simplify user data management. \end{itemize} This seems obviously more work, but the overall amount of code used for the whole protocol stack will be much less, thus reducing the attack surface. ~\\ Since we are talking about a new, experimental protocol, the obvious choice should be to write a new, lower-layer protocol. To avoid the SCTP/DCCP mistakes, the protocol will need to be able to work both on top of UDP (for bypassing firewall and NAT problems) and directly on top of IP (for efficiency) seamlessly, so we should also take into account a transitional phase between UDP based and IP-based transport. Not only the attack surface will be reduced, especially after the code base stabilizes, but there will be no need to analyse the interaction between multiple protocols, thus simplifying the development and analysis phase. By not having to rely on old technology we will be able to fully control the security properties of the system for the first time in the development of a protocol. It will seem that we are creating an overly complex protocol, but when comparing the result with the number of protocols we are aiming at replacing (TCP/UDP-(D)TLS-OAuth and more) the complexity of our solution will obviously be inferior to the total complexity of the possible interaction of existing solution, not to mention the lack of analysis of the security properties of the interaction of the different possible user configurations. \section{Fenrir outline} \begin{figure}[h] \centering \includegraphics[width=0.5\textwidth]{images/Fenrir_logo.png} \caption{Fenrir Logo} \label{fig:Fenrir_Logo} \end{figure} \subsection{Federated Authentication} The main feature of our protocol is the federated authentication. This means that we will need some form of interaction between the server of multiple independent domains, with each trusting only their users for authentication. Therefore, each user will be identified by its user and domain, in email-like format. For this, we use the distinction provided by Kerberos for our system, and we divide the players into three (plus one): \index{Client Manager}\index{Authentication Server}\index{Federation} \begin{itemize} \item \textbf{Authentication Server}: in short: \textbf{AS}. Handles authentication and authorization for its domain. \item \textbf{Service}: the service the client wants to use. Be it the mail service or a web service. \item \textbf{Client}: the user program that connects to a \textit{Service} \item \textbf{Client Manager}: the program that manages authentication data for the user. \end{itemize} \subsection{Decoupling authentication from application}\index{Authentication!Decoupling} The first two distinctions are fairly straightforward: the authentication server will handle authentication, so that the service can be designed without having access -for example- to the password database or to the user login. This is an important distinction, as the applications are vulnerable to bugs and have a much higher attack surface. By decoupling the two, the user and password databases should be better protected, as the only application that has access to them is the one especially designed to protect them. The distinction between \textit{Client} and \textit{Client Manager} has the same purpose. Current applications usually save login information in cleartext, or in a poorly-obfuscated manner (base64 and alike). For the same reasons as before, we want to decouple application authentication from the application itself. This will permit the system-wide usage of strong authentication methods like security tokens or smart cards, provide better support for authentication algorithms, instead of having clients rely on old methods like the deprecated SSLv3. Over time this will provide better security for the next legacy applications, as the security of the application will be upgradable regardless of the application itself. Decoupling authentication from the application will have one more interesting outcome: as the \textit{Client Manager} will handle both authorization and authentication, it will be able to limit the application scope, so that the user will be able to limit the applications it does not trust, or that only need to check for the existence of an account. This means that both the \textit{Authentication Server} and \textit{Client Manager} will be the primary subject of the attacks towards Fenrir, but this also means that the attack surface will be much lower, and the efforts can be concentrated in a single software. As popular software is migrating towards the web, this situation is increasingly common anyway, since the web browsers need to track each and every user password. since users never care enough about security the password database will be as good as in clear text. Moreover the attack surface of a browser is huge, especially thanks to its plugin system. \subsection{The authentication protocol} Federated authentication algorithms are nothing new. As per our earlier distinction, we will focus on a kerberos-like infrastructure. Due to the previously introduced decoupling, our protocol needs some way to convey the various interacting players that a user has a certain authentication and authorization. This is done through the use of a token, so that login and passwords will not reach the \textit{Client} or the \textit{Service}, and are used as little as possible. One characteristic of many authentication algorithms is the usage of timestamps to protect the authorization tokens or general messages. While this provides safety by putting an expiration time on the usage of said tokens, it also means that applications, servers and authentication servers must have at least a loosely-synchronized clock. Although nowadays clock synchronization seems easy and widespread, it is not yet at a state were we can safely assume that the clocks have little discrepancy. Embedded devices are still produced without a clock source, so each time they are booted the clock is reset to 1970. The most famous clock synchronization protocol (NTP) is used almost always in clear text, and basing our clock on the attacker's response is not wise. Requiring all clocks in the world to be synchronized between a couple of minutes from each other, even for devices that do not have stable clock sources is in our opinion wrong. Therefore Fenrir will \textit{not} use timestamps. This will mean that occasionally additional round trips will be needed to check the validity of the data, but this also means that tokens can be simplified, as they do not need signatures anymore, and a token revocation will be effective immediately. This choice is further supported by the existence of the \textbf{Online Certificate Status Protocol} \cite{OCSP}. X.509 certificates are essentially proof of authentications granted for a fixed time (mostly one year). For a number of years there was no way to revoke a certificate in a useful way, as the CRL (Certificate Revocation Lists) were updated very slowly. Once such CRLs became not only too slow, but also too big, OCSP was introduced to check in real-time if the certificate had been revoked or not. This alone proves that basing our protocols on time differentials alone is not sufficient. As the figure \ref{fig:FederationOutline} describes it, the protocol will relay heavily on the authentication servers, which will act as a trusted third party. The image is only an outline, and assumes a shared token between client@example.com and the authentication server on example.com \label{FederationExample} \begin{figure}[t] \centering \begin{framed} \centering \begin{tikzpicture}[node distance=4cm,>=stealth] \node[label=above:{Auth.Srv example.com}] (AS1) {\includegraphics[width=2cm,keepaspectratio]{images/auth_server.png}}; \node[label=above:{Auth.Srv domain.com}, right= 3.5 cm of AS1] (AS2) {\includegraphics[width=2cm,keepaspectratio]{images/auth_server.png}}; \node[below of=AS2,left=2.5cm of AS2, label=below:{Client example.com}] (C) {\includegraphics[width=2cm,keepaspectratio]{images/computer.png}}; \node[below of=AS2,right=1.5cm of AS2, label=below:{Service domain.com}] (S) {\includegraphics[width=2cm,keepaspectratio]{images/server.png}}; \draw[<-,thick] (AS2.180) -- node[below]{$1: auth, use ``service''$} (C.90); \draw[<->,thick] (AS1.30) -- node[below]{$2: check account$} (AS2.150); \draw[->,thick] (AS2.340) -- node[right]{$3: new user: id$} (S.60); \draw[<-,thick] (AS2.300) -- node[below]{$4: ok, keys$} (S.120); \draw[->,thick] (AS2.250) -- node[below]{$5: ok: ip, keys$} (C.20); \draw[<->,thick] (C.340) -- node[below]{$6: communication$} (S.200); \end{tikzpicture} \end{framed} \caption{Fenrir overview: client@example.com connects to ``service'' in domain.com} \label{fig:FederationOutline} \end{figure} The service will not receive the login data, and in fact will only get a connection id, cryptographic keys and an internal user id. As confirmation, the client will receive the ip, connection id and keys to connect to the service, so that no other handshakes are needed. Moreover, since the client receives a notification only when the connection has been confirmed between the authentication server and service, we avoid non intuitive errors like ``authenticated but not connected'', typical of federated protocols like Kerberos or OAuth, were the authentication server is too detached from the service. The authentication server will receive a lot of trust from the user, as it needs to control its authentication and authorization data, but it will not be able to impersonate the user on different domain's services, as the service and the user will also share a secret key. There will still be room for impersonating a client on the same domain, although that is technically unavoidable with any design due to the service and the authentication server belonging to the same organization (authentication and logs can always be forged by administrators that control the services). ~\\ We will now follow a detailed bottom-up approach at designing the protocol. \section{Transport} \subsection{Layer 4: UDP tunnel}\index{Transport UDP} The SCTP protocol history tells us that introducing new functionalities while being incompatible with existing network infrastructure is dooming ourselves to fail. However it also tell us that a UDP-based protocol can move up to a standalone one, given enough traction. SCTP evolved maybe too quickly from udp based to standalone, firewalls worldwide did not update, and very few applications ended up using it. As our initial requirement is the compatibility with existing infrastructure, our protocol will be based on top of IP, but will include an optional UDP lightweight tunnel in its main components so that existing infrastructure will not have problems using it Using UDP as a lightweight tunnel permits us to use a single socket for transmission or reception of every connection, without having the kernel track for us every connection. Firewalls will permit UDP connections, as the DNS system is based on that, and NATs will continue working as always. Using UDP permits us to handle everything in user space, and the only thing that a non-UDP connection will need from the kernel is simply that it forwards the packet as-is to the right application. \subsection{Layer 4.5: Fenrir} \subsubsection{Connection} The very first thing we need is something to identify the connection. Older protocols use the tuple of source IP, destination IP, source and destination PORT, but this is unnecessary and spans multiple protocols which should be independent. The solution is simply to use a single identifier as the connection id. This alone grants independence from the IP layer, enabling us to support multihoming and mobile clients, and all encryption data will be based on this id. Once we identified the connection we need to check whether it is a legitimate packet. The first thing to do is to check the packet legitimacy by checking a cryptographic header (HMAC like), then decrypt the packet and access its contents. The last two steps can be condensed into one if we use AEAD ciphers (Authenticaed Encryption -- Additional Data). The algorithms used will be decided during handshake: this might make us use one full RTT, but unlike minimaLT, we won't be tied to a single algorithm, so that if problems are found in the future, we can simply shift to a new algorithm instead of throwing away the whole protocol. With this setup, all data out of the connection id will be encrypted and authenticated. Regarding the encryption, 4 methods are available: \begin{itemize} \item \textbf{MAC then Encrypt} : First authenticate the cleartext data, then encrypt everything. Used in TLS. \item \textbf{Encrypt AND MAC} : authenticate the cleartext, encrypt the cleartext and transmit both the authenticated and encrypted payload in cleartext. Used in SSH. \item \textbf{Encrypt then MAC} : Encrypt the cleartext, then authenticate the encrypted cleartext. Used in IPSEC. \item \textbf{Authenticated Encryption}: new algorithms can authenticate data while encrypting it, so that no additional headers are required. \end{itemize} These various methods were analyzed in 2008 \cite{Bellare:2008:AER:1410264.1410269}, and in 2009 even the ISO/IEC committee proposed a standard \cite{ISOIEC19772} where the recommended mode of encryption is \textbf{Encrypt-then-MAC}. Authenticated encryption algorithms were born later but are regarded to be as secure as the ISO proposed method. A short summary of the analysis mentioned earlier: \begin{itemize} \item \textbf{MAC then Encrypt} : Only the cleartext is authenticated. If the cipher is malleable, the attacker can change both the MAC and the cleartext. This is what happened in WEP, where the CRC was used as a MAC \footnote{\href{http://www.netstumbler.org/unix-linux/chopchop-experimental-wep-attacks-t12489.html}{ChopChop attack: http://www.netstumbler.org/unix-linux/chopchop-experimental-wep-attacks-t12489.html}} \item \textbf{Encrypt AND MAC} : The MAC is not secret, thus can reveal some information on the cleartext (especially in context were data is mostly static). It does not grant ciphertext integrity. \item \textbf{Encrypt then MAC} : ciphertext integrity is granted, so cleartext integrity is granted by composition. Authenticity can be verified before decryption. \end{itemize} Due to these reasons, the only modes available in Fenrir will be AEAD and encrypt-then-MAC. An example of a famous AEAD cipher is OCB3 \footnote{\href{http://www.cs.ucdavis.edu/~rogaway/ocb/}{OCB Homapage: papers and implemetation: http://www.cs.ucdavis.edu/~rogaway/ocb/}}, which is now an IETF draft. Recently the CAESAR \cite{CAESAR} competition was created to determine a standard from the various AEAD ciphers, the winner is expected to be announced somewhere towards the end of 2017 / beginning of 2018. Due to the inclusion of the AEAD ciphers, and the fact that different authentication headers can have different lengths, the actual length (or existence) of the authentication header is tied to the connection and decided during the handshake, and thus will not be specified in the packet. To help in debugging and for specific applications, cleartext data transmission should be supported (although always authenticated). While authentication is always needed, encrypting public data might be a lot of work for resource constraint devices, so it should be possible to explicitly disable it. So far the Fenrir packet we have built can be seen in figure \ref{fig:FenrirPacket1}. The connection id is the only part in clear text. \begin{figure}[t] \begin{center} \rowcolors{1}{green}{green} \begin{tabularx}{0.7\textwidth}{| l || X | X | X | X |} \hline \hline \hiderowcolors bytes &0&1&2&3\\ \hline \hline 0-3 & \multicolumn{4}{c|}{Connection id} \\ \hline & \multicolumn{4}{c|}{Encrypted data} \\ \hline & \multicolumn{4}{c|}{Authentication header} \\ \hline \hline \end{tabularx} \caption{The Fenrir packet so far} \label{fig:FenrirPacket1} \end{center} \end{figure} \subsubsection{Multiplexing multiple data streams}\index{stream}\label{multiplexing} After granting \textit{secrecy} and \textit{authenticity}, we want to provide an easy way to multiplex the user and protocol data. Not providing a multiplexing service will bring developers to recreate protocols like FTP, using multiple connections, which are hard to track for firewalls and is more complex to maintain from the programmer point of view. This probably is one of the big factors that contributed to make developers port all their applications on the web, as all those details are now handled by the web server. Afterwards, to avoid the limitations of the HTTP protocol, websockets and webrtc standards were created, so that applications could ``downgrade'' the HTTP connection to a raw socket, while increasing application complexity again. SCTP was the first protocol to try to fix these problems, followed by SPDY (HTTP2 \cite{HTTP2}) and QUIC \cite{QUIC:rfc} implemented support for \textbf{multiple streams}, so that the programmer can use one stream per resource, thus parallelizing network flow. As of today however the only protocol able to combine multiplexing with multiple stream characteristics (reliable, unreliable delivery) is the SCTP protocol. To implement multiplexing, we need an header similar to the SCTP one, but it can be much simpler: we only need a stream id, the data length, a counter to be able to reorder different fragments. We can further improve on the idea by adding flags to signal whether this is the first fragment, end of fragment, or a full user message. Fenrir will thus be able to handle both \textit{bytestream} transmission and \textbf{datagram} with message boundaries, relieving the user of the need to parse and handle message lengths or escape sequences to handle the begin and end of the user messages. Characteristics like the reliability of the stream do not change once the stream is set up, so we should not reserve space in the header for this, but instead signal this in the control messages used to add or remove streams. SCTP has a lot of different headers, each having its own structure and purpose, but that increases the overall parsing complexity. Instead in Fenrir even the control messages will be carried in a normal (reliable) stream, so we can simplify the parsing and waste less bytes on the header. The resulting packet is drawn in figure \ref{fig:FenrirPacket2}. \begin{figure}[t] \begin{center} \rowcolors{1}{green}{green} \begin{tabularx}{0.75\textwidth}{| l || X | X | X | X |} \hline \hline \hiderowcolors bytes &0&1&2&3\\ \hline \hline 0-3 & \multicolumn{4}{c|}{Connection id} \\ \hline & \multicolumn{4}{c|}{Encryption Header} \\ \hline \showrowcolors 4-X & Rnd Length & \multicolumn{3}{c|}{Randomness} \\ \hline (X+1)-(X+4)& \multicolumn{2}{c|}{Stream id} & \multicolumn{2}{c|}{Data length} \\ \hline (X+5)-(X+8) & flags & \multicolumn{3}{c|}{Stream counter} \\ \hline \hiderowcolors & \multicolumn{4}{c|}{Authentication header} \\ \hline \hline \end{tabularx} \caption{The packet so far. Green color highlights the encrypted data} \label{fig:FenrirPacket2} \end{center} \end{figure} Since we have a full byte for flags, we can allocate the bits as follows: \begin{itemize} \item \textbf{0-1}: start/end of user message. if none are present, it's an intermediate fragment. \item \textbf{2-7}: used as high-order bits of the next ``stream counter'' field, thus bringing the counter up to $2^{30}$ bits. \end{itemize} Having a bigger stream counter will help in the case of high-latency-high-bandwidth links (like satellite links), were the time spent waiting for the ACK of the first packet might be as much as the time spent filling the sliding window. \subsection{Randomization}\index{Side channel} Some of the latest attacks on TLS and other protocols are side-channel based (Lucky13 \cite{TLS:Lucky13}, CRIME, BREACH), for example on the size of the transmitted data, or on the amount of time necessary to (de)encrypt data. Although these attacks are more of an issue for implementations than for protocols, Fenrir should try to limit the amount of information gathered from such attacks. To make data-size based attacks more difficult, packets contain a random amount of data, where the actual maximum amount should be user-defined, as we can interfere too much with the application flow. As can be seen on figure \ref{fig:FenrirPacket2}, the first field we encounter in the encrypted section of the packet is a one-byte counter indicating how many random bytes were introduced in the next field, named "randomness". The randomness is put at the beginning of the packet for two reasons: \begin{itemize} \item randomizing even the packet structure: if the attacker is not able to know where a certain field will be located in the packet, it will have less information even if new attacks are discovered on the ciphers used. \item stressing the need for implementations to use this field to randomly limit the packet size. \end{itemize} If we used only the padding we would do nothing to the packet structure, and we would not randomize the user data placement in the various packets of the connection. Not randomizing the data placement is what let analysis like CRIME to know exactly which packet to target. For example, when transmitting a password over HTTP, we might already know that due to the headers and page content that the password will be transmitted in the $n^{th}$ packet of the connection. While it does not seem much, this is exactly the basis of the recent BYCYCLE \cite{TLS:BYCICLE} attack. Fenrir also uses some header fields (stream id, counter) that can (and should) be randomized in order to increase entropy of the encrypted data and to give attackers less details on the content of the cleartext, so that known-plaintexts attacks get more difficult. \subsection{Reliability mechanisms} Since Fenrir is designed to work on top of the IP layer, we need to reimplement retransmission and control flow mechanisms. The ACKs are designed based on the SCTP ones, where a single packet can ACK multiple segments of various streams (SACK). This means that instead of just reporting the last received byte, we report the stream id along with the first and last byte of ack'd data. This allows us to acknowledge even non-sequential data, so that retransmission is more efficient, just like SCTP. QUIC introduces a \textbf{Forward Error Correction}\index{Forward Error Correction} mechanism for transmitted data, which works a lot like RAID 5: every two packets a third packet is sent which is the XOR of the previous two packets. This is a novel way to handle error correction in network transmission protocols, but it is not flexible enough, as the correction does not spawn more than two consecutive packets, and can't be tuned to the average channel error rate. To fix this, Fenrir will use the RaptorQ \cite{rfc6330,libRaptorQ} error correction mechanism, a highly configurable FEC code where an infinite number of error recovery packets can be generated from a user-defined set of packets. Although the cpu and memory requirements can rise significantly with the size of the set of packets that we want to protect (quadratic memory, cubic cpu time), applying error correction to a set of a thousand packets is still relatively inexpensive. Forward error correction will be activated only if requested, and per-stream, so that different reliability statuses can be achieved, depending on what the programmer needs. \section{The Handshake}\index{Handshake} This part is crucial in the protocol design, as it needs to handle connection setup, negotiation of protocol features, cryptography details and more, but in the less amount of packets, size and round trips as possible, to improve efficiency. Since this is a security protocol, we need to pay attention to where the trust of the system comes from. Other problems we will face in this phase are user authentication, authorization, DDoS or amplification attacks prevention. Current protocol stacks perform poorly against DDoS protection thanks to the lower TCP layer. Efficiency is also low, as the system has to perform multiple handshakes (TCP-TLS-OAuth). Since we collapse multiple layers in one, Fenrir will not suffer from layered protocol handshakes. A new feature in this area, introduced by both QUIC and minimaLT, is the support for multiple handshake types, depending on the information available to the client. The most secure handshakes can take up to 3 full round trips, but minimaLT and QUIC can go down to a 0-round-trip connection. Unfortunately the handshake length and the overall security of the handshake seem to be inversely correlated, but some concessions can be made in terms of security if the applications need a very low connection setup time. The next sections are dedicated to the security and design of the handshake. For now, we only need a way to quickly distinguish an handshake packet from an established connection. For this, we reserve the connection id ``0'' to connection setup. In this case we leave the packet completely in clear text. The stream identifier will be random, and will help us track the different handshakes. \subsection{Flexibility} Many attacks on TLS concentrated on its complexity, on the interaction between different handshakes and ciphers of different TLS versions. Thanks to this, new experiments like CurveCP\footnote{\href{http://curvecp.org}{CurveCP: http://curvecp.org/}} and its successor, minimaLT \cite{minimaLT} have tried to throw away all flexibility and concentrate on a single encryption algorithm. This made the protocols much more efficient and simple, but a single failure in the cipher or protocol will mean that these protocols will have to be thrown away. Since TLS can still be used today exactly thanks to its flexibility in both specification and cipher choice, Fenrir will feature multiple handshakes and use multiple ciphers. Moreover, there currently exists no stable, proven algorithm that can withstand quantum computer attacks, so fixing ourselves on one or on a single set of choices might significantly lower the protocol life. \subsection{Trust model}\label{trust_model}\index{Trust Model} Every cryptographic system is based on the distribution of symmetric or asymmetric encryption keys. Distribution of keys means that we will have to trust someone, as that someone will be able to forge and distribute keys that will completely undermine any system. The most widely spread approaches to this are the PGP \textit{web of trust} and the \textit{public key infrastructure} used with X.509 certificates by TLS and others. The problem with the web of trust, is that trust is transitive: if I trust one key, then I'm implicitly trusting all the keys signed by that key. This means that compromising a single key can lead the whole web of trust to crumble. It also means that to know if a key is trusted, we need to traverse the whole web of trust. This is not efficient, it vastly increases the attack surface and it also means that the validity of a signature depends on which web of trust we are using (that is: there is no global state). This approach has only been used by relatively small communities who can manually check the validity of the certificates. The public key infrastructure works by trusting one or more central authorities that have the explicit task of verifying whether a certificate is owned by its rightful owner. The problem with this approach is the reliability of said certificate authorities. In the past big organizations like Comodo had security breaches\footnote{\href{https://blog.mozilla.org/security/2011/03/25/comodo-certificate-issue-follow-up/}{Comodo security breach: https://blog.mozilla.org/security/2011/03/25/comodo-certificate-issue-follow-up/}}, or had lax security checks\footnote{\href{https://www-secure.symantec.com/connect/sites/default/files/Test_Certificates_Incident_Final_Report_10_13_2015v3b.pdf}{Symantec's fake certficate for google.com:\\ https://www-secure.symantec.com/connect/sites/default/files/Test\_Certificates\_Incident\_Final\_Report\_10\_13\_2015v3b.pdf}}, so that anyone could get a certificate even when they were not supposed to get one. Since the common computer holds more than a hundred different authorities and since these authorities are located all over the word, even in countries with oppressive regimes, this model should be seen as lacking in trust. Even without going so far as considering state-level attackers, companies such as Nokia\footnote{\href{http://www.zdnet.com/article/nokia-hijacks-mobile-browser-traffic-decrypts-https-data/}{Nokia Series 40 MITM: http://www.zdnet.com/article/nokia-hijacks-mobile-browser-traffic-decrypts-https-data/}} and Amazon\footnote{\href{http://www.zdnet.com/article/amazons-kindle-fire-silk-browser-has-serious-security-concerns/}{Amazon Kindle MITM: http://www.zdnet.com/article/amazons-kindle-fire-silk-browser-has-serious-security-concerns/}} include new certificate authorities on some of their devices, then redirect all HTTPS traffic to their own servers, essentially performing a man in the middle attack. Whatever the choice, a public key needs to be transmitted with a format. Historically this format has been X.509. This is a big standard, dating back to 1988, that received multiple corrections which made it very difficult to parse, as shown by the sizes of current parsers: the polished and simplified parser of libreSSL is currently more than 10.000 lines of C code, while the OpenSSL and GnuTLS versions are 25k and 35k lines respectively. This has been a real problem that has been exploited multiple times in both the basic libraries (*SSL) and the programs that use these libraries. There does not appear to be a solution for all the technical and political problems behind this, and as such Fenrir tries not to choose any of the two models. The web of trust is hard to keep secure and is slower, while the public key infrastructure based on current certificate authorities is safe for the average user, but as good as broken for a lot of governments. The only choice Fenrir tries to enforce thus is to stop using the difficult X.509 certificate format, and fall back to handling raw public/private keys. Still, we need a way to get the public keys safely. The public key is the minimum and most important piece of information we need to safeguard, but we might want to protect other data, such as the IP address resolving to the authentication server. Fenrir will not force a single method to get the necessary information, as new and more secure systems might come out in the future, or some other trust model might be required. \subsubsection{Connection information needed} To setup a connection with TLS first we need to resolve the ip address of the server, then we connect to a specific port (e.g.:80/tcp for web) on the given ip. We then get the certificate, check it against out trust model and server fqdn, finally we can connect. This flow is very basic, but we can start by noting a couple of things: \begin{itemize} \item the fqdn resolvers almost never use DNSSEC, and in fact the ip address is often intercepted and modified to support transparent proxies. Even if we used DNSSEC, we would be leaking information such as whom we are connecting to. \item the 80/tcp port identifies the service we are using. While the web is now ubiquitous, we are still leaking some of the protocol identification. \item the X.509 certificate does not protect ip address(es), ports, and leaks the service fqdn (this is needed in TLS to support different certificates for the same server). \end{itemize} Although difficult to exploit, we are leaking a lot of information, and we have to leak it due to the legacy of the internet development. Efficiency also suffers, adding multiple fqdn-to-ip resolutions to the DNS systems (for example for load balancing) made it necessary for the DNS packets first to increase from a maxium size of 512 bytes to as much as the packet provides, and then to even switch to DNS over TCP to support even bigger records. Such records are needed for load balancing, failover configurations, and integrating other information such as text, binary, encryption eys or signatures into the DNS system. Thanks to a clean design, we can now hide more information and increase flexibility and efficiency. What Fenrir requires is a directory service that provides the required information to the client. Such informations are: \begin{itemize} \item UDP port for UDP tunnelling \item IP addresses (v4, v6), with load-balancing and failover support. \item public keys \end{itemize} \subsubsection{DNSSEC resolution}\index{DNSSEC} Different directory services can be used in Fenrir, and the protocol itself should not standardize on a single one. DNSSEC has been chosen to its widespread usage and similar objective. Other systems, like LDAP would require a certificate authorities trust model, include a much more complex, slow system. DNSSEC has been designed to provide secure(but unencrypted) fqdn-to-ip resolution, although it greatly increases the size of each record, since every record needs cryptographic signatures. We can find records for everything we need here: \begin{itemize} \item IP address(es): A, AAAA records \item UDP ports, IPs, failover support: SRV records \item public keys: specific formats (IPSEC, SSH...) or TXT records \end{itemize} As we are not using a common format, we can encode everything we need in a single TXT record, thus saving bandwidth, packets and round trips. This will need to be encoded as most of this data is binary, so the \textbf{Z85} encoding (which is similar to \textbf{base585}) will be used as the encoding standard. Base64 is more inefficient, and Z85 is designed to be safe from common escape sequences (quotes, double quotes, backslash) used in XML and other formats, so we can maintain the encoding format even when experimenting with other directory services. We want to provide information on the authentication servers and even provide load balancing and fallback features. The format of the encoded message is thus: \label{DNSSEC} \begin{itemize} \item \textbf{2 bytes}: udp port (port 0 mean to use raw IP) \item \textbf{1 byte}: number of address structures, formed as: \begin{itemize} \item \textbf{1 byte}: bitfield for type and address priorities \begin{itemize} \item \textbf{0-1}: 00: ipv4 01: ipv6. \item \textbf{2-4}: priority group (for fallback) \item \textbf{5.7}: preference priority in the same group \end{itemize} \item \textbf{X bytes}: ip address. size dependent on previous ip type \end{itemize} \item \textbf{Y * 2 bytes}: list (plus length) of supported authentication algorithms \item \textbf{1 byte}: number of public keys (in order of preference). \begin{itemize} \item \textbf{2 bytes}: public key type\label{DNSKEYID} \item \textbf{2 bytes}: public key id \item \textbf{Z bytes}: public key. size/format dependent on type. \end{itemize} \end{itemize} This format is versatile and can be adapted to the situation, since it has all the information needed. It is encoded from binary so we can not be more efficient than this in the DNS system, and we have all the features that would otherwise require multiple DNSSEC records and signatures. Since the published information regards only the authentication server, and the services ips will be given from the A.S., a further round of load balancing and failover can be applied directly inside the A.S. This can be coupled with geo-ip optimizations or further customizations as needed, without affecting the standards. To better support the handshake algorithms, we publish a list of such algorithms in the DNS (variable $Y$ above). We now gained a complete decoupling between service and standard udp ports, and thus both our authentication server and services can be at any port the administrator wants. This will help in setting up multiple services and domains in the same machine, and avoiding firewall problems. When using elliptic curve cryptography public keys have a much shorter size than RSA, so a single UDP packet should be able to carry all the necessary information. \textbf{Warning:} Efficiently packing all this information still results in an answer packet that is much bigger than the request. As we can not put other restraints on the DNS system, this will result in the DNS system being used for amplification attacks. Thanks to the structure of the DNSSEC system, however, this is already happening, and multiple servers are already available\footnote{\href{http://dnscurve.org/amplification.html}{DNSSEC amplification: http://dnscurve.org/amplification.html}} that provide an amplification of even a hundred times the original packet. Therefore this should not further increase any threat (as it is already present) and might even push for a resolution of the DNSSEC amplification problem (either switching to TCP or dropping the DNS system). \subsection{Perfect Forward Secrecy} \label{pfs}\index{Perfect Forward Secrecy} \textbf{PSF} is a property of key exchange handshakes that grants that if a whole connection is intercepted today, and the private keys are leaked tomorrow, the eavesdropped connection will not suffer in security. This is usually implemented through the generation and usage of \textbf{ephemeral}\index{Ephemeral keys} asymmetric keys, which are used only for one connection and then are thrown away. These ephemeral keys are signed by the main public key, so that the receiving end can be sure of the authenticity. This way the main public key is not used to encrypt any data. The actual scope of the ephemeral keys is not strictly a single connection. Depending on the trust model and synchronization mechanism, an ephemeral key can span multiple connections during multiple hours; the important part is that this key is easy to replace and is in fact regenerated constantly. The actual regeneration timeout is dependent on the amount of security we want to grant to the whole system. Longer timeouts will obviously increase the attacker window. PFS is currently supported in SSH, TLS (optional, per-connection), CurveCP and minimaLT (timeout-based, multiple connections) and QUIC. QUIC actually uses an even weaker form of PFS, where the initial key exchange is not covered by forward secrecy, but the client can then request an other key exchange to force the property. This choice was made to lower the amount of round trip necessary to the connection setup, sacrificing security for efficiency. In Fenrir PFS is regarded as fundamental, and as such it is always used, even when it means losing in terms of efficiency. \subsubsection{minimaLT approach}\index{Perfect Forward Secrecy!minimaLT}\index{minimaLT!Perfect Forward Secrecy}\label{minimaLTpfs} MinimaLT \cite{minimaLT} introduces a novel way to handle PFS. Instead of having one long-term public/private key, it only uses ephemeral keys, that must be changed at specified intervals. Every key will be published directly in some directory service (DNS). As the key is short lived, minimaLT can avoid one full round trip to setup ephemeral key exchanges. This means that the DNS server and the minimaLT server must synchronize, and need coherent clocks to publish keys at the right time. The longer we keep the same ephemeral key, the lower the security, but higher efficiency is gained through DNS caching. Theoretically this might put more work on the DNS servers, but in practice changing ephemeral keys every half a day, or even just once a day could be considered enough for applications that are not security-critical. Even for intensive usage, changing keys every couple of hours would still provide decent security and caching without adding much overhead to the DNS system. The practical problem, however, relies in the DNS registrars. Registrars that provide high configurability are rare, to the point where very few records are available to the end user (even TXT records can be restricted). Some services allow only a couple of unpaid modifications per day, further limiting the usability of the system. Interactions with the DNS service and user is also not standard, so it can hardly be scripted in a general way, and minimaLT requires keys change at exactly the same moment in both server and DNS system, so even clocks must be synchronized and be reliable. All of these problems do not arise the moment we want to setup our own DNS system, but requiring custom modifications to a service (DNS) that is not always under the direct control of the application owner might be excessive. The biggest drawback, still, is the need to perfectly synchronize the DNS key publishing and service key rotation. In Fenrir this is solved by giving each key an ID, as seen in section \ref{DNSSEC}. \subsection{Handshake}\index{Handshake}\label{Handshake} In Fenrir 3 different handshakes will be available: \begin{itemize} \item \textbf{Full-Security}: includes syncookie-like handshake and a TLS-like handshake. It requires 3 round trips but mimics from the most tested algorithms (TLS). \item \textbf{Stateful Exchange}: 2 round trips (avoids the syncookie), but needs to store a state and thus should be disabled during DoSes. \item \textbf{Directory synchronized}: like minimaLT we publish the ephemeral keys directly on the directory service. Only 1 round trip. \end{itemize} Perfect forward secrecy will be forced in all handshakes, Although the scope of the PFS (connection or timeout) can change. MinimaLT and QUIC both support 0-round-trip connection setup. Although it sounds exciting, this is explicitly avoided in Fenrir as we are not able to grant that a packet has not been spoofed. Although an attacker that can intercept all network packets can keep a working connection even when spoofing the ip address, we should not put all devices in the position to exploit ip address spoofing even when they do not have the capability of intercepting the required network packets. A bigger problem relies in accepting data (aka: service commands) on the first packet: it can create big amplification attacks, for example when requesting a file for an other ip address. Our service will fill the congestion window for a non-confirmed ip address, resulting in a huge amplification attack. Since we are designing from scratch a protocol, we should learn from the TLS system, and try to hide all information possible, even including the service type and fqdn that the client is trying to access. This information is always in plaintext in TLS, as the tcp port tells us which service is being accessed, and the fqdn of the service is in cleartext in the certificate. This is done to let the server and client support SNI (Server Name Identification) \cite{TLS:SNI} which lets servers use multiple certificate on the same TLS socket. Thanks to the format specified in \ref{DNSKEYID}, we can support multiple public keys at the same time. The keys are only related to the authentication server, which means that an eavesdropping attacker will not be able to detect which service or domain the client wants to connect to, since multiple domains can be handled by the same authentication server. The key id specified in the format will also let us handle any delays between the publishing of the new key and the deletion the old one. We will now look more in-depth to the 3 possible handshakes: \subsubsection{Full-Security}\index{Handshake!Full-Security} This is the most secure handshake, designed for security at the expense of efficiency. The round trips for this handshake are: \begin{itemize} \item \textbf{RT n.1}: syncookie exchange to avoid DoS, supported algorithms. \item \textbf{RT n.2}: exchange of supported algorithms, ephemeral keys and key exchange. \item \textbf{RT n.3}: authentication and authorization \end{itemize} \begin{figure}[h] \centering \begin{sequencediagram} \newthread{c}{Client}{} \newinst[8]{s}{Server}{} \begin{call}{c}{supported alg., nonce(timestamp)}{s}{selected alg., syncookie, $sign_1$} \end{call} \postlevel \begin{call}{c}{syncookie, $sign_1$, ephemeral key}{s}{ephemeral key, $sign_2$} \end{call} \postlevel \begin{call}{c}{auth, connection info}{s}{success/reject} \end{call} \end{sequencediagram} \caption{Full-Security handshake sequence diagram} \end{figure} This key exchange looks a lot like the classic TLS one, and the client needs a full 3 round trips to complete the authentication. It is also the slowest method, thus the need for more efficient handshakes. All the responses from the server are signed, and the signature includes the last client's packet data. This is done to avoid man in the middle attacks. The initial syncookie exchange includes a server timestamp. This is done to avoid replay attacks that skip the first round trip. As the timestamp is generated and checked only by the server, there is no requirement on clock synchronization. Perfect forward Secrecy is enforced at round trip n.2. After this round trip the server creates a state that includes only the exchanged keys. No public keys or other information needs to be kept in the server between the second and third round trip. \subsubsection{Stateful Exchange}\index{Handshake!Stateful Exchange} When more efficiency is needed, this handshake can be used. Although it takes one less round trip, the server now needs to store a state right after the first received packet. This method should therefore be available only when few connections per second are created, and should be automatically disabled when under heavy load for concerns of a possible memory DoS. \begin{itemize} \item \textbf{RT n.1}: algorithm support, ephemeral key from server. \item \textbf{RT n.2}: client ephemeral key, authentication. \end{itemize} \begin{figure}[h] \centering \begin{sequencediagram} \newthread{c}{Client}{} \newinst[8]{s}{Server}{} \begin{call}{c}{supported alg., nonce}{s}{selected alg., ephemeral key, $sign_1$} \end{call} \postlevel \begin{call}{c}{ephemeral key, Encrypt(auth, connection info)}{s}{success/reject} \end{call} \end{sequencediagram} \caption{Stateful handshake sequence diagram} \end{figure} We removed the initial syncookie exchange from the \textit{Full-Security} handshake to have quicker connection setup, but this exposes us to a memory DoS. The memory DoS problem arises from the need to store the server's ephemeral private key in memory during the handshake. If a new ephemeral private key is generated for every connection, DoSes are easy to cause. However a solution is to simply keep using the same ephemeral key for more connections, so that only a few keys need to be kept in memory (due to key rotation). This solution breaks the one-key-per-connection rule, but if the key is changed quickly enough it should not be a problem, as long as the key exchange itself is not broken by reusing the same key. Both QUIC and minimaLT have handshakes based on a similar concept of shaing the forward secrecy shared connections. It should also be noted that the amount of information carried by each packet is now increased, so there is a risk that multiple packets are needed to transfer the information. This can be a problem in the case of packet loss, but it does not count as a round trip as both the client and server can send the packets without waiting for the ACK. \subsubsection{Directory-Sinchronized}\index{Handshake!Directory-Sinchronized} This is basically the minimaLT approach, as seen in Section \ref{minimaLTpfs}, revisited. The only round trip required in this case comprehends the client's ephemeral key, and user authentication, and the server only answers with a "success" or "rejected" response. \begin{figure}[h] \centering \begin{sequencediagram} \newthread{c}{Client}{} \newinst[8]{s}{Server}{} \begin{call}{c}{ephemeral key, Encrypt(auth, connection info, nonce)}{s}{success/reject} \end{call} \end{sequencediagram} \caption{Stateful handshake sequence diagram} \end{figure} The problem of this solution remains in the synchronization between the directory service and the authentication server. However the Fenrir approach is much different from minimaLT, in that we only need to synchronize the authentication server keys, and not all the services. Moreover, each public key of the authentication server has a random identifier attached, so that multiple keys can coexists, making key rotation much smoother. This will let us avoid to rely on timeouts and clocks of other machines. Obviously this handshake is the most open to syn-flooding attacks, as the client can create a connection with only one packet. As for the previous handshake, this one should be immediately dropped once the number of new connections per second increases over a certain threshold, and when the number of stale connections increases too much. We are trading the \textbf{robustness} of the handshake for speed, and with one one round trip there is no way to know whether the initial packet was forged or not. \subsubsection{Downgrade protection}\index{Handshake!Downgrade attack protection} We outlined multiple handshakes, but we can not just let them be used independently. One of the (many) problems found in the TLS protocol was its frailty from downgrade attacks \cite{TLS_POODLE}. While the TLS protocol kept being extended to counter new problems and add features, there was no security mechanism to stop the attacker from performing a downgrade attack. A downgrade attack is successful when both the client and server support new protocol versions or key exchanges, but the attacker can force on of the two to use an older (broken) version. In TLS the mechanism was regulated by a timeout. If the attacker could simply drop the initial handshake, the client would fall back to the older and broken SSLv3. This is obviously a grave error probably made while trying to introduce flexibility after having though about security. The solution in the Fenrir protocol will thus be to list all the supported encryption primitives, handshakes and protocol versions during any handshake, and signing everything from both the client and server, so that no such attacks can be performed. Learning from the TLS protocol, the server will be the one deciding which algorithm should be used, as the client application and devices are rarely updated constantly by its users and even by its producers (e.g.: android versions on phones, small IoT devices, cheap devices in general). Later, on chapter \ref{Formal Verification}, we will purposely include broken handshakes in the formal proof to simulate what would happen when new security discoveries might prove an algorithm of a whole handshake to be broken. As long as either the client or the authentication server disable a broken algorithm or key exchange, none of the parties will be allowed to use it. \subsubsection{Spoof/Amplification protection} Until now the TCP 3-way handshake, and SCTP syncookie provided a sufficient proof that the other party has control of the IP it is using. This proof is not without problems: an attacker that controls the network can spoof any IP he wants, and remain undetected. Such an attacker however is not the most common type, as it requires control of ISP networks, or it can only spoof a very limited set of IPs. Every connection made with the Fenrir protocol will require at least one round-trip. While this is already forced in the design of the handshakes, there are other attack vectors that must be considered: session resumption and IP change. In TLS the session resumption authentication was an extensions to the protocol, that required a new secret to be shared between the client and the server, and this secret identifies the TLS session. IP change in TLS means to drop the session on one IP and move it to the new one, resuming the session there. In Fenrir the shared session secret is simply the encryption keys. Having the keys alone is proof enough of being the owner of the session. No other secret or handshake is needed. Changing IP without confirming the source with an other round-trip can lead to a bad amplification attack: imagine a client that starts downloading a big file, which then pretends to change IP. The destination will now be flooded with a big amplification attack that even hides the attacker source. To avoid this, the server will accept packets from all IPs, but when a new IP is to be used, the client will have to send a dedicated ``new-IP" packet from that IP, and the server will respond with a 256bit random syncookie. Only when the client returns the cookie, the new IP will be approved. The syncookie exchange will still happen inside the encrypted connection, with the same keys as before, to avoid setting up a new authentication (and possible attack) vector. The "new-ip" packet from the client will be required to be a full MTU, to avoid having even a small amplification attack of few bytes. \section{Attacks} \label{Attacks} We have a general overview of the protocol. Before going into more details about the authentication, we should try to understand the possible weak points of our model so far. We will concentrate on the overall trust model and try to understand where the weak points are and how to remove them. The model on which Fenrir is based on is that the client, authentication server and services of its domain are trusted sources. This does not mean that other authentication servers are fully trusted, so login information must not be leaked to other domains. We still rely on a \textit{trusted third party} for authentication and authorization, so extra care should be taken in securing this component. \label{Shared secrets} \subsubsection{Rogue AS} A malicious authentication server can obviously impersonate all its users into its services. That is because both service and authentication server are controlled (by definition) by the same organization: logs can be forged, user data can be edited. But this is true for any system, so there is no point in trying to counter this. However, we can imagine a scenario were a previously trusted authentication server becomes a \textbf{rogue AS} after an attacker manages to compromise the operative system of said AS. In this case our model does not prevent the AS to impersonate its user, for both its own services and other domains' services. While we can not do much to avoid the AS impersonating a client in new services, we can stop the AS from impersonating the client into services where the client has already registered, thus protecting the user data and privacy. The idea is to share a secret between the client and the service. Such a secret will stop a compromised AS from accessing user information. \subsubsection{Trusting trust} A second and more troublesome source of attacks that is rarely considered is the compromise of our trust base, the \textbf{directory service}. In the web of trust model, this can be the compromise of a trusted key, in the public key infrastructure used in X.509 is the compromise of one of the many certificate authorities. This last scenario has already happened (see \ref{trust_model}) multiple times. DNSSEC in Fenrir was not chosen only due to its widespread usage (and testing), but also due to the inherent publicity of the system. Forging an answer and changing the publicized keys will affect lots of people and should be easier to check, as you only need a script constantly checking the public information against the one given by the user. Although results can be forged for specific users, changes will be much easier to spot relative to forged certificates. Personal account changes will be easier to spot as all logging must pass through the AS, which tracks both logins, devices id and can thus apply further heuristics (like country of origin) to all services, independently from domain or type of service. By introducing the very same solution as for the rogue AS problem, we can use a secret between the client and the authentication server. In the case of a compromised trust system, existing clients will not be able to connect to the fake authentication servers and will thus remain secure. The drawback of such an approach is that when an authentication server is actually compromised, the shared secret will need to be regenerated again, and all the users will be unable to log in until they explicitly confirm that they want to reset the shared secret. This will have the added benefit of forcing the domain administrators to inform the users of security breaches. \subsubsection{Improvements} Our model had two single points of failure: the authentication server and the trust model. The attacker now does not gain any advantage by breaking into the trust model, as it needs the shared secret between the client and the authentication server. Just compromising the authentication server will not be enough to grant access to the user data inside the remaining services, as there is an ulterior shared secret between the client and the service. We have thus eliminated the only two single points of failure introduced by the federated, trusted-third party model, making our model at least as secure as the old client-server one. \begin{figure}[t] \centering \begin{framed} \centering \begin{tikzpicture}[node distance=4cm,>=stealth] \node[label=above:{}] (AS1) {}; \node[label=above:{Auth.Srv domain.com}, right= 3.5 cm of AS1] (AS2) {\includegraphics[width=2cm,keepaspectratio]{images/auth_server.png}}; \node[below of=AS2,left=2.5cm of AS2, label=below:{Client example.com}] (C) {\includegraphics[width=2cm,keepaspectratio]{images/computer.png}}; \node[below of=AS2,right=1.5cm of AS2, label=below:{Service domain.com}] (S) {\includegraphics[width=2cm,keepaspectratio]{images/server.png}}; \draw[<-,thick] (AS2.180) -- node[below]{$1: token \oplus secret_1, use ``service''$} (C.90); \draw[->,thick] (AS2.340) -- node[right]{$2: new user: id$} (S.60); \draw[<-,thick] (AS2.300) -- node[below]{$3: ok, keys\oplus secret_2$} (S.120); \draw[->,thick] (AS2.250) -- node[below]{$4: ok: ip, keys\oplus secret_2$} (C.20); \draw[<->,thick] (C.340) -- node[below]{$5: encrypt(keys)$} (S.200); \end{tikzpicture} \end{framed} \caption{Overview: shared secret($secret_2$) and backup secret ($secret_1$) usage} \end{figure} \section{Credentials and Authorization} \label{FederationAlgorithm} \label{Credentials}\index{Authentication} Authentication and authorization are the first things that we will need to keep in mind after setting up a secure channel. We will now introduce a new authorization technique to help the user limit account access to applications and whole devices. Finally we will implement the shared secrets introduced in the previous section, decreasing the tactical value of an hacked authentication server, that will no longer be able to impersonate the client into existing services. \subsection{Token}\index{Token} One of the easiest ways to break into any account is to try to guess commonly used passwords. Thanks to the design of Fenrir, instead of continuously using weak passwords we can use hard-to-guess \textbf{tokens}. After the initial username/password login, tokens will be used for everything, from application authentication to user authorization. This design will also limit the reuse of password by the users, which means that they might be willing to use more difficult passwords, as they need the passwords \textit{only once}. It should be noted that the connection is either anonymous or tied to an authenticated user. This means that the token will only be used during authentication. Although this seems obvious, it is a big difference from other protocols like Kerberos or OAuth, were the developer is provided a token that represents its authentication and has to transmit the token at every request. The Kerberos/OAuth token approach is more flexible as it can span multiple protocols and scopes, but it decreases security as the developer has to pay attention not to leak the secret token. Developers would have to pay much more attention to security and information leaks, while the vast majority can barely be trusted with writing a stable application. By tying the token to the connection we simplify application security, but we now need to handle all possible communication characteristics, which we already discussed in section \ref{multiplexing}. \subsection{Authentication}\index{Authentication} Fenrir is a token-based authentication protocol, so the main identifier for a user will be its token. To avoid enumeration attacks, a token should be a 128 bit string of random content. But the first time the client still needs to pass through 'normal' password authentication. This poses multiple problems, such as: \begin{itemize} \item the need to store the password safely in the database \item the need to somehow limit bruteforce attacks \item the need to pay particular attention to the information leaked in this phase. \end{itemize} Although some of these problems are not strictly part of the protocol definition, they are still common attack vectors, thus we should address them. The first problem can be easily solved by using the \textbf{Argon2} algorithm (variant Argon2i) recently approved by the \textit{Password Hashing Competition} \cite{PHC} to store the passwords. Argon2 takes measures to force a certain amount of time and memory usage to compute a salted hash, rendering bruteforce less effective even in the case of a leakage of the user database. Since the authentication is token-based, the only use of the username/password pair should be to get an identification token. By adhering to this model we make sure that the password is used only once in the whole device lifetime. As the user/password combination is used sparingly (once per each device), the authentication server can severely limit the login attempts, further limiting online attacks. As a remainder, although the combination of the previous two paragraphs will make password guessing a lot slower and harder, it will still not protect users from targeted attacks in the case of common passwords (targeted dictionary attacks). A recent attack on the TLS protocol, called \textit{BICYCLE} \cite{TLS:BYCICLE}, has been able to deduce the length of parts of the encrypted text when a stream cipher is used in TLS. If the attack is focused on password length, the attacker can greatly limit the search space for the password, increasing the efficiency of bruteforce attacks. While this is an attack that must be targeted towards specific users, and can only be used in the specific context of a new device registration, we can completely avoid these problems by hashing the provided user password with SHA3 and only transmitting the hash, which is now constant-length. This will not increase the password strength, but will merely make all password lengths equal. We might want to do the same for usernames, but that would hide the domain part of the username, and would make the federated identity impossible. One of the limitations of current protocols is that we are forced to authenticate at the beginning of the connection, during the handshake. Although this mode must obviously be supported, if we limit the login to the handshake we might force the developer to make another connection and drop the previous one (e.g.: before logging in into a website). While this is not terribly complicated, it still is something that is not usually done. Due to this TLS authentication is not used and multiple protocols stacked on top of it are used to reimplement the authentication. To avoid duplication of features and more stacking of protocols, Fenrir should provide the ability to drop unauthenticated streams and switch to an authenticated connection. \subsection{Device identification} A novelty introduced by OAuth is the authentication of applications. This means that the application binary has to provide a sort of username and password to the service before being recognized, and before providing user login information. The application credentials are therefore stored statically inside the application binary. Although some applications might try to obfuscate such data, most applications will simply store the data in plaintext. Although such an application authentication is inherently flawed and with few use cases (so few that OAuth2 introduced mechanisms where this was not needed), it does not mean that the concept is useless. In Fenrir we shift the application authentication to become a \textit{device} identification. This means that the user can now control which and how many devices are connected to its account. The main purpose of this device identification in fact is not to limit the number of user devices, but only to identify different devices and to give a scope to each device. This will let the user know if his credentials have been leaked/stolen, as there will be new devices connected to the account. This will also let the user limit the device authorization so that a lost device can be easily blocked. To implement this, each device will have a 128 bit random identifier that will be transmitted along with the user authentication to the authentication server. \subsubsection{User tracking} It should be noted that the device identification is a great tool to track the user. By combining the current features the authentication server is now able to see exactly which device is accessing which accounts, and can thus form a detailed picture of the user habits. This will always be a problem for protocols based on a trusted third party, and as the name itself implies, we need to find a third party that provides an authentication server that we can trust. The problem is not to be taken lightly, as user profiling can have serious privacy implications. Such a problem however already exists in OAuth, where the login provider can understand where each user is loggin into every time the user uses the "login with \{Facebook,Google, etc.\}" functionality. Device identification is already present in OAuth, but hidden, in the form of browser identification. Since OAuth identifies even the application using it is possible to now only know the domain accessed, but also to distinguish the application that the user is accessing. Finally, since the "login with" functionality in OAuth needs to be strictly tied with the authentication provider parameters, the only choices the user is usually between big names such as Facebook or Google, thus concentrating the amount of identification on few providers. Unlike OAuth, Fenrir design makes these problems explicit, and thanks to the federated identity the user will be able to choose the authentication provider that grants more privacy to its users. \subsection{Single sign-on}\index{Single sign-on} Single sign on mechanisms provide the user with the ability to login once and be automatically authenticated into different services of a single vendor. While it looks like a very useful feature, it is not very flexible, since you also need to log out of every service before logging in with a different user. Since we want to provide the user with the ability to handle multiple accounts, we can not guess what account the user will want to use on an application. As an example, imagine wanting to login with one account into Google+, but looking at an other account email in gmail. This is currently not possible unless we first logout completely with the first account. Fenrir will thus not support single-sign-on. The whole purpose of single-sign-on is to make login easier and automatic. Thanks to the token usage, our model already needs only user confirmation of the right username to use on a particular application. Simplifying it further would limit the user's ability to login with different accounts on services owned by the same entity. \section{Authorization Lattice} \label{AuthorizationLattice}\index{Authorization!Lattice}\index{Lattice} Aside from authentication, one feature many protocols try to include is a way to limit the scope of the user authentication, that is: \textbf{authorization}. Some protocols like Kerberos have a binary authorization (either authorized or not), so the system can not limit the account scope for a particular session. The first thing that needs to be done is to somehow list the possible privileges, then we can tie a token to a specific privilege, and thus limit the application capabilities. This by itself is already included in OAuth, but there is no standard way to manage privileges and the application has to restrain itself, which is not really safe if we do not trust the application. Fenrir will therefore include a standard way to list authorizations, and have multiple levels of enforcement. By taking advantage of the 3rd-party model, the authentication server will associate a maximum privilege to a token. That token can then be further limited in privilege by the client manager during authorization. Since the final application does not interact with tokens or authentication in any way, it is forced to use the limited level of privilege that it has been provided with, thus ensuring our account safety with untrusted applications. A workflow of the privilege checking is in Figure \ref{fig:token workflow} This model however requires different privileges to be ranked one relative to the other. In mathematical term, we are looking at a \textbf{complete lattice}, as it is the most basic structure that grants our requirements. \begin{figure}[t] \centering \begin{tikzpicture}[-,>=stealth',auto,node distance=1cm] \tikzstyle{every state}=[fill=none,draw=none,text=black] \node[state] (B) {Bottom}; \node[state] (R) [above left of=B,node distance=2cm] {Read}; \node[state] (M) [above left of=R,node distance=2cm] {Modify}; \node[state] (A) [above right of=R, node distance=2cm] {Add}; \node[state] (W) [above of=R, node distance=3cm] {Write}; \node[state] (T) [above right of=W, node distance=2cm] {Top}; \node[state] (I) [right of=A, node distance=2cm] {Account info}; \path (B) edge node {} (R) (R) edge node {} (A) (R) edge node {} (M) (A) edge node {} (W) (M) edge node {} (W) (W) edge node {} (T) (B) edge node {} (I) (I) edge node {} (T); \end{tikzpicture} \caption{Example of a complete lattice of privileges} \label{example of a lattice} \end{figure} \begin{figure}[t] \centering \begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=6cm, thick,main node/.style={draw,font=\sffamily\Large\bfseries}] \node[main node] (AS1) {AS: example.com}; \node[main node] (C) [below of=AS1] {Client c@example.com}; \node[main node] (AS2) [right of=AS1] {AS: domain.com}; \node[main node] (APP) [below of=AS2] {Application}; \path[every node/.style={font=\sffamily\small}] (C) edge [bend right] node[left] {1: get token} (AS1) (AS1) edge [bend right] node[below left] {2: token:$P_1$ privilege} (C) (C) edge [bend right] node[right] {3: token:$P_2$ privilege} (AS2) (AS2) edge [bend right] node[above] {4: token + $P_2$, ok?} (AS1) (AS1) edge [bend right] node[above] {5:OK\!} (AS2) (AS2) edge [] node[above left ] {6: keys} (C) (C) edge [] node[above] {keys} (APP); \end{tikzpicture} \caption{Token check workflow: $P_{1,2}$ privileges, $P_1 >= P_2$} \label{fig:token workflow} \end{figure} A complete lattice, as seen in Figure \ref{example of a lattice}, is a partially ordered set in which all subsets have both a supremum (top) and an infimum (bottom). Thanks to each pair of privilege having an upper element that has both privileges and a lower level with the least privilege in common it will be easy to quickly verify whether a token that was tied to a privilege can be used with the privilege that the client manager is asking, or can be further limited if the client manager decides so. We now have a hierarchical definition of privileges, which can be enforced independently of the application, is tied directly into the protocol and is in complete control of the user. The only downside is that as each lattice of privileges is specific to an application, we will need to transfer the lattice between the service, the authentication service and the client manager in order to let the user work with lattices of every service. However this is needed only on the first connection (the lattice can be cached), and is not needed for anonymous connections as there is no user to authorize. We should still be wary of rogue services building pointlessly big lattices, so there should be a limit to the size of the lattice to avoid DoSing authentication servers and client managers. The chosen limit for now is 64 nodes, with labes of at most 25 bytes. This lets us use an easy format, and limit the total size to 2.5Kb, or about 2 ethernet packets. \section{Advanced Authentication}\index{Authentication!Advanced} The authentication we provided is still kind of basic, and we can be much more flexible and secure. We will now implement the shared secret techniques introduced in \ref{Shared secrets}. \subsection{Direct Authentication}\index{Authentication!Direct}\label{shared secrets} One problem of the Fenrir system is that we are putting a lot of trust in the authentication server. While this lets us create a lot of interesting new scenarios, it creates a very big single point of failure. As time passes the authentication server implementations will become more and more resilient to attacks, but the risk of a compromised authentication server is too great. To solve this problem we can introduce an automatic, lightweight second layer of authentication that does not require user intervention. We want \textbf{direct authentication} between the client and the service. The idea is that the clients will share a secret key directly with the service on which the clients logs into (non-anonymously). If we can manage this, then the compromise of a whole authentication server will let the AS impersonate the client in services where the client has not yet logged in (as it can just pretend to be a new client), but the existing user accounts an data will be safe, as the hacked authentication server will not be able to guess the shared key between client and service. Aside from the shared key generation, the authentication flow in fig. \ref{fig:FederationOutline} does not need any change. The key generated by the service will be XORed with the shared secret, and sent back to the authentication server to be forwarded to the client. \subsubsection{Storage implications} Implementing direct authentication poses a new problem: the users' accounts are meant to be accessible from multiple devices concurrently, so we need to share the shared secret between all the client managers. To do this however we need to synchronize the secrets between the clients. An easy way of doing this would bw to keep the secrets on the authentication server. Obviously enough, the secrets can not be in clear text, as we are storing them on the entity from which we need to protect them. The easiest way is to store the shared secret with the hash of an ulterior user password. A rogue authentication server will not be able to MITM the connection, even by manipulating the shared secrets. This however brings the total user password to two: \begin{itemize} \item The \textit{account} password, used by the user to identify itself with the authentication server \item The \textit{safety} password, used to make the shared secret unusable by the authentication server. \end{itemize} Both passwords however can be easily derived from one user provided password and a key derivation function. The algorithm Argon2 \cite{PHC} can be used in such a way, providing the username as a salt. Usernames are unique by definition, but Argon2 has a salt of strictly 16 character, too little for many usernames. To avoid this problem, the SHA3 hash of the username will be used as hash. \subsubsection{Trusting trust with direct authentication} Once the client and service have a shared key, it should become impossible even for the authentication server to break the encryption between the two, as even man in the middle attacks by the rogue authentication service will be stopped. However, when first sharing the key the authentication server can create a fake service, let the client exchange keys with him, effectively use a man-in-the-middle attack. The only solution to avoid this problem is to publicize the public key of the services in a way similar to what we have done with DNSSEC. The service will then sign the generated data that will be relied to the client via the authentication service, thus assuring the client of the impossibility of man-in-the-middle attacks. This will however break the previously stated feature that the identity of the accessed service will be kept secret. The reason is that DNSSEC queries are signed, but still in cleartext. If we assume this to not be a problem, or in the future shift to a safer directory service, we still hide the service identity. Still, we will never be able to hide IP addresses, which will likely be associated with a service, thus indirectly revealing the service identification. Due to this reasons protecting the service identity is not something that we should strive for, as it can be inferred in many ways. So we should safely implement this second public key check. \subsection{Backup trust} We now want to implement a similar mechanism between the authentication server and the client manager. Such a thing would make the compromise of the directory service close to useless. Since we derive the trust of our system from the directory service, we are again removing an other point of failure. Since the authentication in Fenrir happens after the initial key exchange, we can not apply the shared secret to the connection key, but we can apply it to the token. We will not be able to protect the username, but we can secure the token and the connection after the token has been sent. By using the previously introduced key derivation function, we can generate an additional random key that we will use to XOR the token. We now need to update the connection keys. The quickest way is to use the same shared secret with which we XORed the token to XOR the old encryption keys. The connection can now only be maintained if both parties had the same secret, and we did not increase or modify the content of the previously discussed handshakes. \subsection{Authentication summary} Thanks to the \textit{direct authentication}, we have broken away from relying only on the authentication server. Our model is now a hybrid between a third-party security model and a purely client-server one. To successfully compromise an account now an attacker needs to compromise the service it wants to access; compromising the Authentication server will not help. The only other single point of failure was the trust system, but that has been removed as well with the \textit{backup trust}. This means that we have fully removed the single points of failure introduced by the federated model, and brought the attention of the attacker back to the service itself, which does not have the login information. The authentication server can still impersonate the users on services where the users have not been registered yet. The usefulness of such an attack is however much more limited. \subsection{Putting it all together: an authentication flow example}\index{Authentication!Example} \begin{figure}[t] \centering \begin{framed} \centering \begin{tikzpicture}[node distance=4cm,>=stealth] \node[label=above:{Auth.Srv example.com}] (AS1) {\includegraphics[width=2cm,keepaspectratio]{images/auth_server.png}}; \node[label=above:{Auth.Srv domain.com}, right= 6 cm of AS1] (AS2) {\includegraphics[width=2cm,keepaspectratio]{images/auth_server.png}}; \node[below of=AS2,left=2.5cm of AS2, label=below:{client@example.com}] (C) {\includegraphics[width=2cm,keepaspectratio]{images/computer.png}}; \node[left of=C,label=above:{DNSSEC}] (DNSSEC) {\includegraphics[width=2cm,keepaspectratio]{images/pubkey_db.png}}; \node[below of=AS2,right=1cm of AS2, label=below:{Service domain.com}] (S) {\includegraphics[width=2cm,keepaspectratio]{images/server.png}}; \node[below of=C,label=below:{Application}] (APP) {\includegraphics[width=2cm,keepaspectratio]{images/applications_utilities.png}}; \draw[<->,thick] (DNSSEC.340) -- node[above]{$0: pubkeys$} (C.200); \draw[->,thick] (AS2.180) -- node[below]{$1: Priv: P, token \oplus secret_1, use ``service''$} (C.90); \draw[<->,thick] (AS2.160) -- node[above]{$2: priv.\ P,\ token, on ``service". OK?$} (AS1.20); \draw[->,thick] (AS2.340) -- node[right]{$3: new user: id$} (S.60); \draw[<-,thick] (AS2.300) -- node[below]{$4: ok, keys\oplus secret_2$} (S.120); \draw[->,thick] (AS2.250) -- node[below]{$5: ok: IP, keys\oplus secret_2$} (C.20); \draw[<->,thick] (C.270) -- node[below left]{$6: keys,\ privilege: P$} (APP); \draw[<->,thick] (APP.20) -- node[below right]{$7: encrypt(keys)$} (S.250); \end{tikzpicture} \end{framed} \caption{Full overview. and interactions between systems in a federated example: token, privileges, shared secrets and public keys interaction} \end{figure} We have initially defined our protocol, presenting existing solutions, their defects, then slowly evolved it around the good and bad choices made by current protocols, while introducing a couple of innovations, like the authorization lattice presented in Section \ref{AuthorizationLattice}. A formal, verifiable model will be presented in Chapter \ref{Formal Verification}, we now want to look at the big picture, at a working example of all the features we have introduced so far. In our examples, we will have a client, \textbf{client@example.com}, connecting to the a service of the domain \textbf{domain.com} DNSSEC will be used as the source of our trust. Although this system has many problem (no privacy, caching delays, political problems), it is the only source of trust currently available and largely compatible with today's infrastructure. The client manager will be set up with an username and a password. The password will be used to derive the shared secret \text{backup trust}. this will happen only once per account in the lifetime of the client manager. The client manager now gets the public key of the authentication server of example.com. The client manager now has to connect to its AS. This connection is persistent and will be used to synchronize the tokens and shared secrets. While authenticating, the client will XOR its token with the \textit{backup trust} secret. If the authentication is successful, the encryption key is updated, and XORed with the \textit{backup trust} secret. The client manager will now synchronize the tokens with the authentication server. The user will now request to authenticate into a specific service in \textit{domain.com}. The client manager will now get the DNSSEC key for the \textit{domain.com} domain, and connect to that authentication server. While authenticating it will provide the token for the service it wants to access. The token will be verified between the two authentication services. In parallel a DNSSEC request will be made for the service's public key. If successful, the authentication server of \textit{domain.com} will contact the required service, informing it that a new user has connected. If that user has a shared secret with the service, the service will reply with a new key, XORed with the shared secret. If the user does not have a shared secret, the first two messages between the client and service will be a key exchange, as seen in section \ref{Handshake}, which will generate the session key and the shared secret.