% !TeX root = Fenrir.tex

\xkcdchapter{Analysis of the problem}{problems}{This is how I explain computer problems to my cat.\\ My cat usually seems happier than me.}
\label{Analysis}


\lettrine{W}{hat} we need is a federated authentication protocol, but before we start analysing the current solutions we need to look at the big picture from the developer's point of view. When designing an application, what are we looking for at the various layers?

This chapter is dedicated to enumerating and describing the ideal characteristics of any protocol. At the end of this dissertation we will get back to the list of requirements presented here to understand how many of them we met and why we could not meet the others.



\section{Requirements and terminology}
\index{Requirements}

The very first requirement we want to achieve is \textbf{compatibility} with the existing infrastructure. Designing new protocols is not useful if we need the whole internet infrastructure to change before we can use our solution.

\subsection{Transport Requirements}

Looking at the problem from the developer's point of view, the first thing we need is to be able to transfer data. Even here we can find some different and seemingly contradictory requirements:

\begin{itemize}
	\item \textbf{reliability}: the ability to get the full data even in the case of packet loss
	\item \textbf{unreliability}: the service does not react to packet loss
	\item \textbf{ordered delivery}: the data is received in the same order as it is sent
	\item \textbf{unordered delivery}: the data order is not important
	\item \textbf{datagram delivery}: the user provides data in chunks, and only full chunks will be delivered to the receiving application
	\item \textbf{bytestream delivery}: the application receives as many bytes of data as the protocol is able to provide, without distinguishing the beginning and end of user messages
\end{itemize}

It is easy to see that each requirement has an incompatible counterpart, but all the other combinations are actual use cases of existing applications, with the exception of the unreliable-unordered-bytestream combination, whose only use case is the transmission of data composed purely of independent bytes (something difficult to imagine).

The datagram delivery requirement is intended in the general sense. It usually means that a message is limited to one packet, but what we mean here is merely that the protocol keeps track of the beginning and end of the user message. This means that the user application will receive a complete message, independently of how many packets that message was fragmented into.
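
As a minimal sketch of the difference, written with plain sockets (just one of the ways to obtain the two behaviours; addresses and payloads are illustrative):

\begin{verbatim}
import socket

# Datagram delivery preserves message boundaries, bytestream delivery
# does not. Everything below runs locally and is purely illustrative.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))            # let the OS pick a free port
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

sender.sendto(b"first message", receiver.getsockname())
sender.sendto(b"second message", receiver.getsockname())

# Each recvfrom() returns exactly one complete user message
# (datagram delivery), regardless of how the sender paced its calls.
print(receiver.recvfrom(4096)[0])          # b'first message'
print(receiver.recvfrom(4096)[0])          # b'second message'

# On a SOCK_STREAM (TCP) socket the same two sends could come back from
# a single recv() as b'first messagesecond message': the application has
# to mark the message boundaries itself (bytestream delivery).
\end{verbatim}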

The unreliability requirement could be split into two subrequirements:
\begin{itemize}
	\item \textbf{hard unreliability}: there is no assurance at all on the delivery of the data, much like UDP.
	\item \textbf{soft unreliability}: although there are no assurances, techniques are implemented to recover from data loss. 
\end{itemize}

A very simple example of soft unreliability is to send each packet twice: the network might drop one copy, but the second might get through.
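
A naive sketch of this idea over UDP (the function name and destination address are made up for illustration):

\begin{verbatim}
import socket

def send_soft_unreliable(sock, data, addr, copies=2):
    # Naive soft unreliability: transmit every datagram `copies` times.
    # No guarantee is given, but if each copy is lost independently with
    # probability p, at least one survives with probability 1 - p**copies.
    # The receiver must tolerate (or de-duplicate) repeated datagrams.
    for _ in range(copies):
        sock.sendto(data, addr)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_soft_unreliable(sock, b"telemetry sample #42", ("192.0.2.1", 9000))
\end{verbatim}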

Other features usually expected of a transport protocol are:
\begin{itemize}
	\item \textbf{multiplexing}: the ability to support multiple data streams on the same connection in parallel
	\item \textbf{multihoming}: the ability to take advantage of multiple IPs/interfaces
	\item \textbf{mobility}: the ability to maintain an active connection even if one of the endpoints changes IP
	\item \textbf{efficiency}: the ratio of user data to actual transmitted data
\end{itemize}

Multiplexing and multihoming are advanced features that are beginning to be incorporated in the latest versions of existing protocols (HTTP/2, MPTCP), and they bring significant improvements both to application development and to the usage of the existing infrastructure.

\subsection{Security}



Security-wise, the assurances that are most commonly sought are:
\begin{itemize}
	\item \textbf{secrecy}: encrypting data so that unauthorized parties are unable to understand the contents
	\item \textbf{authenticity}: the assurance that the data was sent by an authorized party and has not been tampered with
	\item \textbf{authentication}: the process of proving that a user is who they claim to be
	\item \textbf{authorization}: the privileges granted to the authenticated user
	\item \textbf{federated identity}: linking one user to a domain, so that other domains can verify the association
	\item \textbf{robustness}: the ability to withstand attacks
\end{itemize}

The usual ways to implement the above are encryption (for secrecy), HMACs (for authenticity) and user/password pairs (for authentication).
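
As an example of the second mechanism, authenticity via an HMAC boils down to a shared secret and a keyed hash; a minimal sketch follows (key management, which is the hard part in practice, is glossed over):

\begin{verbatim}
import hmac, hashlib, os

key = os.urandom(32)                       # shared secret (illustrative)
message = b"transfer 10 EUR to account 42"

# Sender: compute a tag over the message with the shared key.
tag = hmac.new(key, message, hashlib.sha256).digest()

# Receiver: recompute the tag and compare in constant time; any change
# to `message` (or a wrong key) makes the verification fail.
expected = hmac.new(key, message, hashlib.sha256).digest()
assert hmac.compare_digest(tag, expected)
\end{verbatim}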

Authorization is not used very much today as there seems to be no standard way to handle it.

Federated identity includes Kerberos-like protocols, where the user is identified by a ``user@domain'' string, and OAuth-like protocols, where the username carries no information on its associated domain, but the application itself provides a list of trusted domains.

\textit{Robustness} is a general term we use to describe how the protocol will withstand attacks
such as amplification attacks or DoSes. DoS attacks flood the victim with multiple connection requests, so the protocol must withstand many such attacks without storing too much data or requiring too many computations to distinguish a legitimate connection from an empty attempt. Amplification attacks are what happens when a spoofed request reaches the server: if the server sends back much more data than what it received, the attacker can use this service to flood a third party, without consuming its own bandwidth and while hiding its presence.
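
As an illustrative figure, a spoofed 60-byte request that elicits a 3000-byte reply gives the attacker an amplification factor of $3000 / 60 = 50$: every byte of traffic the attacker spends is turned into fifty bytes aimed at the victim.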

\subsection{General requirements}

Not all requirements fit in the above lists, or can be put in both:

\begin{itemize}
\item \textbf{flexibility}: how much (or how little) the selected protocol will limit the developer.
\item \textbf{interoperability}: the ability to interact between different implementations or technologies.
\end{itemize}

A flexible protocol will obviously have more use cases and receive more development, but a loose specification might introduce security holes and hinder interoperability.

Interoperability is the ability of different implementations of the same protocol to interact with each other, regardless of the differences in the applications that use it; this is a core requirement for a federated protocol.








\xkcdchapter[0.36]{Existing Solutions}{workaround}{Just 'cause you can doesn't mean...}
\label{Existing Solutions}

\lettrine{W}{e} will now list the most common protocols used in an application's stack.
To better understand the current situation, we will try to highlight the problems and limitations that arise from choosing one solution instead of another.

\section{Authentication and Authorization}

The following protocols handle user authentication, but have limited to no support for authorization. Even support for federation is limited, but looking at these
protocols will reveal which design choices led to which limitations, so they are still worth mentioning.

\subsection{Kerberos}

One of the first standard \textbf{authentication} protocols based on the idea of a federation of servers is \textbf{Kerberos}. Its original design is based on symmetric encryption, though there are variants that use asymmetric private/public key encryption.

The biggest drawback of this protocol is its requirement that all clocks be synchronized to within a couple of minutes at most relative to the authentication
server. While the difference in timezones can be overcome by using times based on UTC, computers over the internet are not always synchronized, current
time synchronization techniques are largely \textit{not} authenticated, and there are a lot of embedded devices whose clock is reset to 1970 at boot. On top
of this, not all computers synchronize their clocks, or they do so a little at a time, so reaching the correct time can take a long while.

Since Kerberos is purely an authentication protocol, it does not handle the user data connection to the service. This means that the user will get a token
and that token will have to be used in all communications between the user and the services; it is the user's job to protect their data.

There is no support for authorization, and although the authentication exchange is done on top of a reliable connection, nothing prevents the user from using
the token in different connection types (TCP, UDP, TLS...). The downside is that the handling and security of the token is in the hands of the user.

A token represents the user's authenticity, so losing one will permit an attacker to impersonate the victim for the duration of the connection. Even after the
expiration of said token, the session remains valid, as there is no ``kick out'' mechanism.

Overall, we find \textbf{authentication} and \textbf{federation} support, but the clock synchronization requirement has stopped this protocol from being used
extensively.


\subsection{OpenID}

This is a federated authentication solution that is heavily based on the web.

Each user registers with an identity provider, and is granted a URI as an identifier. When the user wants to connect to a service, they are redirected to their
identity provider, which (after logging the user in, if necessary) redirects the user back to the service.
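
As a rough sketch of the flow (endpoint and parameter names are purely illustrative, not taken from the specification):

\begin{enumerate}
\item the user enters their identifier, e.g. \texttt{alice.provider.example}, on the service's login page;
\item the service discovers the identity provider behind that URI and redirects the browser to it, passing a \texttt{return\_to} address;
\item the provider authenticates the user with whatever method it prefers (password, cookie, token...);
\item the provider redirects the browser back to the \texttt{return\_to} address, together with a signed assertion about the user's identity;
\item the service verifies the assertion with the provider and considers the user logged in.
\end{enumerate}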

The whole flow is based on HTTP redirects, so any client needs an HTTP client\&server implementation. The protocol does not handle authorization, and the
authentication method is not specified, so it can be anything from the classical user/password pair to more complex token-based logins.

One of the drawbacks of this protocol is its reliance on a third party: while it means that the user does not need multiple usernames and passwords, it also
means that the third party is able to authenticate everywhere as the user. Having to trust the third party completely, especially after the Snowden leaks, is
not a pleasant thing.\\
This is sometimes referred to as the ``trust problem'', since in today's internet anyone can set up an authority, without having to provide any proof of
reliability or non-malevolence.


Since it is built on top of HTTP(S), it only works on top of the TCP-TLS-HTTP stack; we can use it to provide both \textbf{authentication} and \textbf{federation}
support.


\subsection{OAuth}\index{OAuth}

This \st{protocol} framework was born in 2007, since OpenID was not gaining much usage. Like OpenID, it is based on top of HTTP, and its flow is based on
redirects between services to exchange tokens and grant user authentication. \textbf{OAuth} \cite{OAuth:Online} tries to introduce authorization but it is
very rarely used, as it was not introduced in the first drafts, and lots of documentation seems to confuse authentication and authorization.

Although OAuth is widely used, its history is a bit troublesome: OAuthv1 is a protocol with sometimes loose definitions, implementations from
different domains can differ significantly, and the web pages used by the specifications are not standard, so two implementations will never be
automatically interoperable, and thus it is impossible to create a federation.\\
OAuth2 did not fix any of this, and the specification became so loose that it was demoted to ``framework''; its main developer quit while asking
for his name to be taken off the final RFC.

Both versions 1 and 2 introduce an application authentication (sometimes improperly referred to as ``authorization''), which is completely insecure, as each
application must have a static username and password (unrelated to the user credentials) that have to be saved permanently in the application binary.\\
Due to the current usage of personal devices, reverse engineering these credentials is trivial, and in fact nowadays every service lets anyone
register new applications, and there seem to be no restrictions based on the application credentials.

OAuth somehow solves the OpenID ``trust problem'' without really solving it: since each service now needs a dedicated implementation of the
authentication protocol/framework, each service provider is forced to include only the implementations for the OAuth providers it trusts.

This, however, does not mean that an OAuth provider cannot impersonate its users; since the only providers usually included are the likes of
Facebook, Twitter, GitHub and so on, the possible damage from unreliable OAuth providers is limited, but the problems raised by Snowden remain unsolved.

As with OpenID, OAuth only works on top of the TCP-TLS-HTTP stack. Since it is based on tokens, nothing stops them from being used in other contexts as long
as the token is transmitted, although this use case is rare, as all new applications today tend to be REST-based.
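
As an illustration of that last point (the URL and the token are made up), once a bearer token has been obtained it can simply accompany any HTTP request:

\begin{verbatim}
import urllib.request

token = "an-opaque-or-JWT-access-token"    # obtained via the OAuth flow
req = urllib.request.Request(
    "https://api.example.com/v1/profile",  # illustrative REST endpoint
    headers={"Authorization": "Bearer " + token},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read())
\end{verbatim}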

\subsection{OpenID Connect}

This is a new protocol \cite{OpenID-Connect} currently undergoing standardization, based on top of OAuth 2; it has been
developed by many big names, such as Microsoft, Google and PayPal.

This protocol tries to fix shortcomings of both OAuth and OpenID by:
\begin{itemize}
\item standardizing the authentication flow and various parameters
\item migrating from implementation-defined data formats to standardized JSON messages (an illustrative ID token payload is sketched after this list)
\item creating an (optional) discovery protocol for a better interaction between authentication providers (using webfinger)
\item including (optional) dynamic client registration
\end{itemize}
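
To give a feeling of the second point, the claims carried by an ID token are plain JSON; a minimal, purely illustrative payload (all values are made up) could look like this:

\begin{verbatim}
{
  "iss": "https://provider.example",
  "sub": "248289761001",
  "aud": "my-client-id",
  "exp": 1430003600,
  "iat": 1430000000,
  "nonce": "n-0S6_WzA2Mj"
}
\end{verbatim}

Here \texttt{iss} identifies the issuing provider, \texttt{sub} the user at that provider, \texttt{aud} the service the token is meant for, \texttt{exp}/\texttt{iat} the expiration and issuance timestamps, and \texttt{nonce} a client-chosen value used against replay.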

The main selling points of OpenID-Connect are interoperability and the usage of the webfinger protocol to provide account information to third parties in a
standard way.\\
This protocol also tries to integrate with the layers around it: it uses webfinger for user data discovery and the lower HTTP protocol status codes for HTTP-based
logout, and it can even list the encryption and authentication algorithms supported by the lower-layer TLS protocol. It also includes information on the type of
login, so that services can impose limitations on the available actions, such as disabling payment services when the user authenticated with a low-assurance
method (a cookie instead of a private certificate or two-factor authentication).

As the protocol is currently under standardization (as of April 2015), there is a chance things might change. The specification is rather big and includes
many optional parts, so it is difficult to say whether this protocol will solve the problems of both the OpenID and OAuth protocols.



\section{Authentication and Secrecy}

The previous protocols grant user authentication, but the secrecy of the connection is left to other layers. For a complete evaluation we should consider
which protocols can be used underneath the previous authentication protocols, and whether any insecurity arises from the interaction of the two.

\subsection{(D)TLS}

TLS is the successor of SSL, which has been recently deprecated\footnote{\url{https://tools.ietf.org/html/rfc7568}}. DTLS is the variant that works on top of
UDP: it treats protocol (handshake) data delivery as ordered and reliable, but user data as unreliable and unordered.

This protocol provides authentication, but has not been included in the authentication list as its authentication is rarely used, since it must be performed
during the handshake, before the connection takes place.

In the last few years, TLS has been subject to a lot of attacks from different angles, from the renegotiation flaw\cite{rfc5746} (2009) to the exploitation of
CBC weaknesses (the \textbf{BEAST} attack, 2011) or of statistical data from compressed connections (\textbf{CRIME}, 2012).

Transport-wise, TLS requires reliable delivery, so TCP is needed. This is a requirement for all the above authentication protocols, but Kerberos and
OAuth are token based, so as long as the token is transmitted on a secure connection, DTLS could be used for the application data, with the
caveat that if the token packet gets lost, the application connection won't be usable, which is why this solution is not used often.

In short, TLS provides \textbf{authentication} (with limitations), \textbf{secrecy} and \textbf{authenticity} of the data (provided new guidelines are followed
to avoid the latest attacks), and provides only a \textbf{stream}-based data transfer.

Although TLS is a reliable protocol, this is not seen as a restriction, as the authentication protocols layered above it cannot handle packet drops. However this
ends up limiting more complex applications, as more connections are needed to obtain different transport capabilities, and authentication must be handled
in an application-specific manner, thus giving more work to the developers.


\subsection{QUIC}\index{QUIC}


\textbf{QUIC} is an experimental protocol by Google, born in 2013 and based on UDP. It reimplements flow control and other mechanisms to provide a reliable
connection, borrows concepts like \textbf{multiple streams} from SCTP and includes TLS-like encryption in the protocol handshake.

As of today there is no library implementing the QUIC protocol, as the only implementation can be found inside the Chromium browser. The only documentation
available, aside from Chromium's code, can be found in two descriptive documents\cite{QUIC:reasons}\cite{QUIC:crypto}, but the information is far from an RFC.

The selling points of this protocol are the integration of TLS in the handshake, multiple ways to reduce the number of RTTs needed for connection setup
(based on information cached from previous connections), the ability to handle \textbf{mobile} clients, and the handling of both reliable and unreliable connections.


In short, QUIC can be used for \textbf{secrecy} and \textbf{authentication}, and for both \textbf{reliable} and \textbf{unreliable} connections.
The transport method is purely \textbf{stream}-based, but we can have \textbf{multiple streams}. As with TLS, the authentication must be performed at the
beginning of the connection.

\subsection{minimaLT}

As the successor to \textbf{CurveCP}, \textbf{minimaLT} is a transport protocol written from scratch and based on elliptic curve cryptography.

It only includes reliable data transfer, but has an RPC-like mechanism to mimic \textbf{multi-stream} data transfer.\\
One of the main selling points of the protocol, as for QUIC, is its efficiency in creating new connections, ranging from 2 RTTs down to 0 RTTs.

One of the novelties introduced by the protocol, aside from relying strictly on elliptic curve cryptography, is the synchronization with DNS servers to
publish new public keys every few minutes, thus avoiding a double public key exchange to implement ephemeral key agreement.

The protocol tries to avoid the ``old'' syn-ack mechanism, and instead relies on encryption and challenge-response mechanisms to set up the connection.

The protocol does not have an RFC yet, but the main characteristics are collected in a public document\cite{minimaLT} and there is a working implementation
in EthOS.

Overall it is a novel \textbf{reliable} protocol, it provides \textbf{multistream} \textbf{datagram} data transfer, and it easily supports client \textbf{mobility}.
Although it seems robust against DoS attacks, the 0-RTT mechanism might be exploited to create amplification attacks.


\section{Transport Level}

Finally, a quick word on the transport limitations caused by using the authentication protocols presented above.

\subsection{TCP}
This is a limit for OAuth, OpenID and OpenID-Connect: there is a single connection and no multiplexing of commands, so the programmer has to
handle multiple connections and security tokens by himself in order to achieve network parallelism or to use an unreliable data transfer.

TCP was not designed with current network conditions in mind, so it had to be extended in order to support things like satellite links (the window scaling option,
which extends the 16-bit window field), multipath (currently experimental support from Apple \& Linux), syncookies (to mitigate DoS) and many others.
While the protocol is extensible, having many options will slowly eat away the Maximum Segment Size and increase the disparity of supported options
between the various operating systems.

\subsection{UDP}
UDP is the protocol that does little more than nothing. It is hard to use for authenticated connections, as the authentication packets can be lost just like any
other packet, and the user has to watch the maximum datagram size (which can change during transmission as the path MTU changes). Only DTLS, minimaLT and QUIC are
based on top of UDP, but the first is very rarely used, and the last two are experimental protocols, so unreliable secure connections are unused or rarely
standardized, as the programmer ends up having to do everything by himself anyway.

\subsection{SCTP/DCCP}

These two protocols could be considered the evolution of TCP and UDP, but they never gained much attention, as internet border routers have no
support for NATing these protocols and SCTP was never implemented in Microsoft Windows.

Aside from the firewall issues, SCTP handles everything we might need in a transport protocol, except for its security. The format is somewhat verbose
and the headers are perhaps more complex than needed. Separately securing every stream used in the connection can be quite error-prone,
so these protocols never gained much traction outside of internal, already secured environments.