% !TeX root = Fenrir.tex

\xkcdchapter{Analysis of the problem}{problems}{This is how I explain computer problems to my cat.\\ My cat usually seems happier than me.}
\label{Analysis}

\lettrine{A}{lmost} all applications nowadays require an internet connection, and the privacy and security of those applications are becoming more important every day.\\
All of this security is based on complex cryptographic and authentication algorithms, but only a small fraction of developers actually study these algorithms.
Fortunately, there are projects like (open/libre)ssl whose sole purpose is to implement these algorithms correctly and to provide programmers with easy interfaces
to the security software, so that developers can concentrate on the actual application.

However, these projects are neither interchangeable nor equivalent, so programmers still need to choose one based on their needs. This also means that many programmers simply use the most common solution and adapt their program to it, instead of doing proper research.

What we want is a federated, flexible authentication and authorization protocol that does not limit the programmer's choices regarding data transport features,
so that developers can use security as the base of their application, instead of designing the application around the limitations of a
particular solution.

\section{Requirements and terminology}
\index{Requirements}

The protocols we will consider are all above layer 3 (IP) and below layer 7 (Application).

To better understand what we are looking for and to keep track of the various capabilities of each authentication protocol, we can define a set of general
characteristics:

\begin{itemize}
\item \textbf{Efficiency}: the ability to work with the least amount of information and/or messages on the wire.
\item \textbf{Flexibility}: how much the selected protocol limits the developer's choices.
\item \textbf{Interoperability}: the ability of different implementations or technologies to interact.
\end{itemize}

More specifically, in security protocols we will find:
\begin{itemize}
\item \textbf{Robustness}: the ability to resist attacks (DoS, amplification, etc.)
\item \textbf{Secrecy}: the ability to keep data secret from attackers
\item \textbf{Authenticity}: the ability to distinguish forged data from genuine data
\item \textbf{Authentication}: proof that a user really is who they claim to be
\item \textbf{Authorization}: the privilege level granted to a user
\item \textbf{Federation}: the ability of different domains to collaborate
\end{itemize}

Since one of our aims is flexibility, we also need to track some features of the transport protocols that are used underneath the security protocols:
\begin{itemize}
\item \textbf{Reliability}: ability to react to data loss
\item \textbf{Best-effort}: opposite of a reliable protocol
\item \textbf{Multiplexing}: ability to handle multiple data streams without setting up multiple connections
\item \textbf{Datagram transmission}: ability to handle the user message as a unit, instead of just a stream
\item \textbf{Stream transmission}: the transmission unit is the byte; the user must mark the beginning and end of messages
\item \textbf{Multihoming}: ability to handle devices with multiple addresses
\item \textbf{Mobility}: ability to handle devices which change address.
\end{itemize}


\xkcdchapter[0.36]{Existing Solutions}{workaround}{Just 'cause you can doesn't mean...}

\lettrine{T}{he} previous problems are not new at all, and programmers have tried multiple times to write solutions that interact with existing
technologies. Due to the ISO/OSI layered model, however, these solutions have always tried to solve one specific problem, without looking at the big picture.

To better understand the current situation, and some of the problems that arise from choosing one solution instead of another, we will list the most
widespread protocols with a short description, highlighting their problems and limitations.

\section{Authentication and Authorization}

The following protocols handle user authentication, but have limited to no support for authorization. Even their support for federation is limited, but looking at these
protocols will reveal which design choices led to which limitations, so they are still worth mentioning.

\subsection{Kerberos}

One of the first standard \textbf{authentication} protocols based on the idea of a federation of servers is \textbf{Kerberos}. Its original design is based on symmetric encryption, though there are variants that use asymmetric private/public key encryption.

The biggest drawback of this protocol is its requirement that all clocks be synchronized to within a couple of minutes, at most, of the authentication
server. While timezone differences can be overcome by using UTC, computers on the internet are not always synchronized: current
time synchronization techniques are largely \textit{not} authenticated, and many embedded devices reset their clock to 1970 at boot. On top
of this, not all computers synchronize their clocks, or they do so only a little at a time, so it can take a long time to reach the correct time.
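
As a minimal illustration of why this matters (a conceptual sketch, not the Kerberos wire protocol: the function name and the five-minute window are assumptions matching common deployments), a server-side check on an authenticator's timestamp behaves as follows:

\begin{verbatim}
import time

MAX_SKEW = 5 * 60   # seconds; a typical default in Kerberos deployments

def authenticator_acceptable(client_timestamp, server_now):
    """Accept the authenticator only if the client clock is within the skew."""
    return abs(server_now - client_timestamp) <= MAX_SKEW

# A device whose clock was reset to 1970 at boot is always rejected:
print(authenticator_acceptable(0.0, time.time()))                # False
print(authenticator_acceptable(time.time() - 90, time.time()))   # True
\end{verbatim}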

Since Kerberos is purely an authentication protocol, it does not handle the user's data connection to the service. This means that the user is given a token,
and that token has to be used in all communications between the user and the services; it is the user's job to protect that data.

There is no support for authorization, and although the authentication is performed on top of a reliable connection, nothing prevents the user from using
the token over different connection types (TCP, UDP, TLS...). The downside is that handling and securing the token is left entirely in the hands of the user.

A token represents the user's authenticity, so losing one allows an attacker to impersonate the victim for the duration of the connection. Even after the
token expires, the session remains valid, as there is no ``kick out'' mechanism.

Overall, we find \textbf{authentication} and \textbf{federation} support, but the clock synchronization requirement has kept this protocol from being used
extensively.


\subsection{OpenID}

This is a federated authentication solution that is heavily web-based.

Each user registers with an identity provider and is granted a URI as an identifier. When the user wants to connect to a service, they are redirected to their
identity provider, which (after a login, if one is needed) redirects the user back to the service.

The whole flow is based on HTTP redirects, so any client needs an HTTP client and server implementation. The protocol does not handle authorization, and the
authentication method is not specified, so it can be anything from the classical username/password to more complex token-based logins.
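
The first leg of the flow is just a browser redirect; below is a simplified sketch of the URL a relying party would build (the endpoints, the user identifier and the exact parameter set are illustrative assumptions, and real requests carry more fields):

\begin{verbatim}
from urllib.parse import urlencode

provider_endpoint = "https://openid.example.org/auth"    # the user's identity provider
params = {
    "openid.ns":         "http://specs.openid.net/auth/2.0",
    "openid.mode":       "checkid_setup",
    "openid.claimed_id": "https://alice.example.org/",    # the user's URI identifier
    "openid.return_to":  "https://service.example.com/openid/return",
}
redirect_url = provider_endpoint + "?" + urlencode(params)
# The service answers the browser with an HTTP 302 pointing to redirect_url.
\end{verbatim}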

One of the drawbacks of this protocol is its reliance on a third party: while it means that the user does not need multiple usernames and passwords, it also
means that the third party is able to authenticate everywhere as the user. Having to trust the third party completely, especially after the Snowden leaks, is
not a pleasant prospect.\\
This is sometimes referred to as the ``trust problem'', since on today's internet anyone can set up an authority without having to provide any proof of
reliability or non-malevolence.


Since it is built on top of HTTP(S), OpenID only works on top of the TCP-TLS-HTTP pile; we can use it to provide both \textbf{authentication} and \textbf{federation}
support.


\subsection{OAuth}\index{OAuth}

This \st{protocol} framework was born in 2007, since OpenID was not gaining much traction. Like OpenID, it is built on top of HTTP, and its flow is based on
redirects between services to exchange tokens and grant user authentication. \textbf{OAuth}\cite{OAuth:Online} tries to introduce authorization, but this feature is
very rarely used, as it was not present in the first drafts, and a lot of documentation seems to confuse authentication and authorization.
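
As an illustration of the token-based flow, the sketch below shows the second leg of an OAuth2 authorization-code exchange: after the browser redirect returns a one-time code, the service trades it for an access token. The endpoint, credentials and URLs are illustrative assumptions, not any specific provider's API:

\begin{verbatim}
import json
import urllib.parse
import urllib.request

token_endpoint = "https://provider.example.com/oauth2/token"
form = urllib.parse.urlencode({
    "grant_type":    "authorization_code",
    "code":          "CODE_FROM_REDIRECT",
    "redirect_uri":  "https://service.example.com/callback",
    "client_id":     "my-application",    # static application credentials,
    "client_secret": "app-secret",        # shipped inside the client binary
}).encode()

request = urllib.request.Request(token_endpoint, data=form)   # POST
with urllib.request.urlopen(request) as response:
    token = json.load(response)   # e.g. {"access_token": "...", "token_type": "bearer"}
\end{verbatim}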

Although OAuth is widely used, its history is a bit troublesome: OAuthv1 is a protocol with sometimes loose definitions, implementations in
different domains can differ significantly, and the web pages used by the specification are not standardized, so two implementations will never be
automatically interoperable, which makes it impossible to create a federation.\\
OAuth2 did not fix any of this, and the specification became so loose that it was demoted to a ``framework''; its main developer quit while asking
for his name to be removed from the final RFC.

Both versions 1 and 2 introduce application authentication (sometimes improperly referred to as ``authorization''), which is completely insecure, as each
application must have a static username and password (unrelated to the user credentials) that have to be stored permanently in the application binary.\\
Given the current prevalence of personal devices, reverse engineering these credentials is trivial; in fact, nowadays every service lets anyone
register new applications, and there seem to be no restrictions based on the application credentials.

OAuth somehow works around the OpenID ``trust problem'' without really solving it: since each service now needs a dedicated implementation of the
authentication protocol/framework, each service provider is forced to include only the implementations for the OAuth providers it trusts.

This, however, does not mean that an OAuth provider cannot impersonate its users; since the providers included are usually the likes of
Facebook, Twitter, GitHub and so on, this limits the possible damage from unreliable OAuth providers, but it does not solve the problems raised by the Snowden leaks.

As with OpenID, OAuth only works on top of the TCP-TLS-HTTP pile. Since it is token based, nothing stops the token from being used in other contexts as long
as it is transmitted, although this use case is rare, as new applications today tend to be REST-based.

\subsection{OpenID Connect}

This is a new protocol\footnote{\url{http://openid.net/connect/}} currently undergoing standardization, built on top of OAuth 2 and
developed by many big names such as Microsoft, Google, PayPal and many others.

This protocol tries to fix shortcomings of both OAuth and OpenID by:
\begin{itemize}
\item standardizing the authentication flow and various parameters
\item migrating from implementation-defined data formats to standardized JSON messages
\item creating an (optional) discovery protocol for better interaction between authentication providers (using WebFinger)
\item including (optional) dynamic client registration
\end{itemize}

The main selling points of OpenID Connect are interoperability and the use of the WebFinger protocol to provide account information to third parties in a
standard way.\\
The protocol also tries to integrate with other layers: WebFinger is used for user data discovery, the status codes of the lower HTTP layer are used for
HTTP-based logout, and the messages can even list the encryption and authentication algorithms supported by the lower-layer TLS protocol. Information on the
type of login used is also included, so that services can limit the user's actions, for example by disabling payment services when the user authenticated
with a weak method (such as a cookie instead of a private certificate or two-factor authentication).
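
As a sketch of the discovery step (the account, domain and the choice of the first returned link are illustrative assumptions; the \texttt{rel} value is the one defined by OpenID Connect Discovery on top of WebFinger, RFC 7033), a client locates the user's provider as follows:

\begin{verbatim}
import json
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({
    "resource": "acct:alice@example.com",
    "rel":      "http://openid.net/specs/connect/1.0/issuer",
})
url = "https://example.com/.well-known/webfinger?" + query

with urllib.request.urlopen(url) as response:
    jrd = json.load(response)          # a JSON descriptor listing the user's provider
issuer = jrd["links"][0]["href"]       # e.g. "https://provider.example.com"
\end{verbatim}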

As the protocol is currently undergoing standardization (as of April 2015), there is a chance things might change. The specification is rather big and includes
many optional parts, so it is difficult to say whether this protocol will solve the problems of both OpenID and OAuth.



\section{Authentication and Secrecy}

The previous protocols grant user authentication, but the secrecy of the connection is left to other layers. For a complete evaluation we should consider
which protocols can be used underneath the previous authentication protocols, and whether any insecurity arises from the interaction of the two.

\subsection{(D)TLS}

TLS is the successor of SSL, which has recently been deprecated\footnote{\url{https://tools.ietf.org/html/rfc7568}}. DTLS is the variant that works on top of
UDP; it treats protocol handshake data as ordered and reliable, but user data as unreliable and unordered.

This protocol provides authentication, but it has not been included in the authentication list above, since its authentication is rarely used, as it must be
performed during the handshake, before the application connection takes place.

In recent years, TLS has been subject to many attacks from different angles, from key renegotiation\cite{rfc5746} (2009) to exploiting
CBC weaknesses (\textbf{BEAST}, 2011) or the information leaked by compression (\textbf{CRIME}, 2012).

Transport-wise, TLS requires reliable delivery, so TCP is needed. This is a requirement for all the above authentication protocols, but Kerberos and
OAuth are token based, so as long as the token is transmitted over a secure connection, DTLS could be used for the application data, with the
caveat that if the packet carrying the token is lost, the application connection will not be usable; this is why this solution is not often used.

In short, TLS provides \textbf{authentication} (with limitations), \textbf{secrecy} and \textbf{authenticity} of the data (provided new guidelines are followed
to avoid the latest attacks), and it provides only \textbf{stream}-based data transfer.
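
From the application's point of view, the whole pile boils down to a single encrypted byte stream; a minimal sketch using Python's standard \texttt{ssl} module (host, port and request are illustrative):

\begin{verbatim}
import socket
import ssl

context = ssl.create_default_context()        # certificate validation enabled
with socket.create_connection(("example.com", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="example.com") as tls:
        tls.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
        chunk = tls.recv(4096)   # a slice of the stream: message boundaries
                                 # must be reconstructed by the application
\end{verbatim}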

Although TLS requires a reliable transport, this is not seen as a restriction, since the authentication protocols layered above it cannot handle packet drops
anyway. However, this ends up limiting more complex applications: more connections are needed to obtain different transport capabilities, and authentication
across them must be handled in an application-specific manner, giving developers more work.


\subsection{QUIC}\index{QUIC}


\textbf{QUIC} is an experimental protocol by Google, born in 2013 and based on UDP. It reimplements flow control and other mechanisms to provide a reliable
connection, borrows concepts like \textbf{multiple streams} from SCTP, and includes TLS-like encryption in the protocol handshake.

As of today there is no standalone library implementing the QUIC protocol, as the only implementation can be found inside the Chromium browser. The only documentation
available, aside from Chromium's code, consists of two descriptive documents\cite{QUIC:reasons}\cite{QUIC:crypto}, but the information is far from RFC quality.

The selling points of this protocol are the integration of TLS in the handshake, multiple ways to reduce the number of RTTs needed for connection setup
(based on information cached from previous connections), the ability to handle \textbf{mobile} clients, and support for both reliable and unreliable connections.


In short, QUIC can be used for \textbf{secrecy} and \textbf{authentication}, for both \textbf{reliable} and \textbf{unreliable} connections.
The transport is purely \textbf{stream}-based, but \textbf{multiple streams} are available. As with TLS, authentication must be performed at the
beginning of the connection.

\subsection{minimaLT}

As the successor to \textbf{CurveCP}, \textbf{minimaLT} is a transport protocol written from scratch and based on elliptic curve cryptography.

It only includes reliable data transfer, but it has an RPC-like mechanism to mimic \textbf{multi-stream} data transfer.\\
One of the main selling points of the protocol, as with QUIC, is its efficiency in creating new connections, ranging from 2 RTTs down to 0 RTTs.

One of the novelties introduced by the protocol, aside from relying strictly on elliptic curve cryptography, is the synchronization with DNS servers to
publish new public keys every few minutes, thus removing an extra public key exchange when implementing ephemeral key agreement.
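
A conceptual sketch of this idea follows (this is not minimaLT's wire format: X25519 via the third-party Python \texttt{cryptography} package stands in for minimaLT's actual primitives, and the DNS publication is only simulated). Because the client already holds the server's short-lived public key, it can derive the shared secret and encrypt data in its very first packet:

\begin{verbatim}
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Server side: rotate this key every few minutes and publish the public half
# (in minimaLT the publication happens through DNS).
server_ephemeral = X25519PrivateKey.generate()
published_server_pub = server_ephemeral.public_key()

# Client side: no extra round trip is needed to learn the server's key.
client_ephemeral = X25519PrivateKey.generate()
client_secret = client_ephemeral.exchange(published_server_pub)

# Server side, upon receiving the client's public key in the first packet:
server_secret = server_ephemeral.exchange(client_ephemeral.public_key())
assert client_secret == server_secret   # both ends now share a symmetric key
\end{verbatim}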

The protocol tries to avoid the ``old'' SYN-ACK mechanism, and instead relies on encryption and challenge-response mechanisms to set up the connection.

The protocol does not have an RFC yet, but the main characteristics are collected in a public document\cite{minimaLT} and there is a working implementation
in EthOS.

Overall it is a novel \textbf{reliable} protocol that provides \textbf{multistream} \textbf{datagram} data transfer and easily supports client \textbf{mobility}.
Although it seems robust against DoS attacks, the 0-RTT mechanism might be exploited to create amplification attacks.


\section{Transport Level}

Finally, a quick word on the transport limitations caused by using the authentication protocols presented above.

\subsection{TCP}
TCP is a limiting factor for OAuth, OpenID and OpenID Connect: there is a single connection and no multiplexing of commands, so the programmer has to
handle multiple connections and security tokens by hand in order to achieve network parallelism or to use an unreliable data transfer.

TCP was not designed with current network conditions in mind, so it had to be extended to support things like satellite links (the window scale option,
which enlarges the original 16-bit maximum window), multipath (currently with experimental support from Apple \& Linux), SYN cookies (to avoid DoS) and many others.
While the protocol is extensible, having many options slowly eats away at the Maximum Segment Size and increases the disparity of supported options
between the various operating systems.

\subsection{UDP}
This protocol does little more than nothing: it is hard to use for authenticated connections, as the authentication can be lost just like any other packet, and
the user has to watch the MSS (which can change during transmission). Only DTLS, minimaLT and QUIC are based on top of UDP, but the first is very rarely
used and the last two are experimental protocols, so unreliable secure connections remain unused or rarely standardized, as the programmer ends up having
to do everything by himself anyway.

\subsection{SCTP/DCCP}

These two protocols could be considered the evolution of TCP and UDP, but they never gained much attention, as internet border routers have no
support for NATing these protocols and SCTP was never implemented in Microsoft Windows.

Aside from the firewall issues, SCTP handles everything we might need in a transport protocol, except for its security. The format is somewhat verbose
and the headers are perhaps more complex than needed. Separately securing every stream used in the connection can be error-prone,
so these protocols never gained much traction outside of internal, already-secured environments.

\chapter{Protocol Summary}

\lettrine{A}{t} the current time the most stable and standard protocol pile to use is TCP-TLS-HTTP-OAuth2, but a lot of useful features, like federation support and
interoperability, are lost with this solution.

Unless we want to authenticate over an insecure connection and handle that connection's security by hand, which is not something many developers
are able to do correctly, we are limited to a single non-multiplexed secure connection, and it must be over HTTP.

There are interesting solutions like QUIC, but their experimental, non-standardized status has kept authentication protocol developers away,
so only highly security-conscious developers might be able to implement such a solution, at the expense of portability.

A lot of efficiency is lost due to multiple handshakes (TCP, TLS and OAuth each have their own), since no protocol can derive any properties
(especially security properties) from the lower-level protocols. This leads to longer connection times and an increased attack surface for the
various services.
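
As a purely illustrative count (the exact numbers depend on protocol versions, session resumption and the number of redirects), the round trips spent before the first byte of authenticated application data add up roughly as:
\[
\mathrm{RTT_{total}} \;\approx\; \underbrace{1}_{\mbox{\scriptsize TCP}} + \underbrace{2}_{\mbox{\scriptsize TLS handshake}} + \underbrace{n \geq 2}_{\mbox{\scriptsize HTTP requests for OAuth}} \;\geq\; 5
\]
and every OAuth request that targets a different host pays the TCP and TLS cost again.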

\section{The solutions}

\subsection{High level protocol}

If we try to fix the situation with another high-level protocol (as OpenID Connect is trying to do), we gain ease of implementation thanks to the
abstractions of the lower-level protocols and their security properties, but we are also limited by them. Efficiency is also greatly impacted, and we
might have to rely on yet more protocols to avoid the limitations of the protocol pile we choose (as OpenID Connect has to rely on WebFinger to work around OAuth's
non-interoperability).

This means that our quest for simplicity leads to a contradiction: as more protocols are used, the attack surface increases, and we need to
handle all of their interactions and limitations.

As stated, this is the road chosen by the OAuth and OpenID Connect authors, so there is little to gain from choosing it again.

\subsection{Low level protocol}

At first glance this is much more complex, as we need to reimplement everything from TCP up to OAuth in a single solution, but we can borrow many
features from experimental protocols and add federation and authorization support, which is found virtually nowhere else, so we gain in:

\begin{itemize}
\item \textbf{Efficiency}: handshake data can be protected from TCP resets and can include authentication data; there is no need for multiple handshakes or multiple
chain-of-trust checks.
\item \textbf{Federation}: we can finally design the protocol so that authentication on multiple domains is the same, by including domain discovery techniques.
\item \textbf{Authorization}: we can design the system so the user can force an application to a lower level of authorization if the application is not trusted.
\item \textbf{Additional features}:
\begin{itemize}
\item \textbf{transport flexibility}: multistream support, with transport features chosen per stream, will increase application capabilities while simplifying the applications themselves
\item \textbf{multihoming}: finally design a protocol whose connection status does not depend on layer 3 (IP) data.
\item \textbf{multicast}: including this will greatly simplify application development and content delivery
\item \textbf{datagram}: handling message boundaries regardless of how the data is split into packets will simplify user data management
\item \textbf{uniformity}: transport and authentication are fused together, and the application is decoupled from user authentication, simplifying and securing
existing solutions
\end{itemize}
\end{itemize}

This is obviously more work, but the overall amount of code needed for the whole protocol pile will be much smaller, thus reducing the attack surface.

~\\

As we are talking about a new, experimental protocol anyway, the obvious choice should be this one. To avoid the SCTP/DCCP mistakes, the protocol will
need to work seamlessly both on top of UDP (to bypass firewall and NAT problems) and directly on top of IP (for efficiency), so we should also take
into account a transitional phase between UDP-based and IP-based transport.

Again, the attack surface will be reduced, especially once the code base stabilizes, and there will be no need to analyze the interactions between multiple protocols,
thus simplifying the development phase.