A side-channel attack is any attack based on information gained from the implementation of a computer system, rather than from weaknesses in the implemented algorithm itself (e.g. cryptanalysis and software bugs). Surprisingly detailed sensitive information is leaked by a number of high-profile, top-of-the-line web applications in healthcare, taxation, investment and web search, despite HTTPS protection.
Compression side-channel attacks, such as CRIME and BREACH, can recover some data by knowing only the size of the compressed output. To understand how these attacks work, we need a fair understanding of compression algorithms.
How compression works
Dictionary-based compression algorithms such as DEFLATE store a repeated phrase only once. If a string of characters reappears later in the text, the first occurrence is stored in full and each later occurrence is encoded as a short reference back to it. Text that contains many repetitions therefore compresses very efficiently, and the output is smaller. This characteristic can be used in a compression side-channel attack.
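The effect is easy to observe with a real compressor. The following is a minimal sketch using Python's standard zlib module, which implements DEFLATE. The sample inputs and the helper name compressed_len are illustrative assumptions and are not taken from any of the attacks discussed here.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Return the size in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


# 256 bytes in which one 8-byte phrase is repeated 32 times.
repetitive = b"abcdefgh" * 32

# 256 bytes in which every byte value occurs exactly once,
# so there is no repeated phrase to refer back to.
distinct = bytes(range(256))

print(len(repetitive), "bytes compress to", compressed_len(repetitive))
print(len(distinct), "bytes compress to", compressed_len(distinct))
```

Both inputs have the same length, but the repetitive one should compress to a small fraction of its size while the other barely shrinks, if at all.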
If a text contains both a secret and a user-controlled part, the compressed length reveals whether the two overlap: when the user-controlled part repeats part of the secret, the repetition is replaced by a reference and the result is shorter. An attacker who can observe the size of the compressed page, and who controls some of the data on that page, can therefore recover the secret by guessing. This is the basis for the CRIME and BREACH attacks, where CRIME depends on compression on the transport layer and BREACH depends on HTTP compression.
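As a rough illustration of this length oracle, the sketch below compresses a page that reflects user input next to a secret and compares two guesses of equal length. The page template, the token value and the function name response_length are hypothetical, and the example compares whole-token guesses so that the size difference is large enough to see clearly.

```python
import zlib

# Hypothetical secret embedded in every response (for illustration only).
SECRET = "csrf_token=9f86d081884c7d65"


def response_length(user_input: str) -> int:
    """Build a page that reflects user_input next to the secret and return
    its compressed size, which is what an eavesdropper could observe."""
    body = (
        "<html><body>"
        f"<p>You searched for: {user_input}</p>"
        f"<input type='hidden' value='{SECRET}'>"
        "</body></html>"
    )
    return len(zlib.compress(body.encode("utf-8"), 9))


# Two guesses of equal length: the first equals the secret,
# the second shares only the "csrf_token=" prefix with it.
right = response_length("csrf_token=9f86d081884c7d65")
wrong = response_length("csrf_token=3a52b1e07f6c9d48")

print("matching guess:    ", right, "bytes")
print("non-matching guess:", wrong, "bytes")
```

The uncompressed pages are exactly the same length, so any difference in the printed sizes comes from the compressor finding (or not finding) the guess repeated in the secret.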
In September 2012, security researchers Thai Duong and Juliano Rizzo announced CRIME, a compression side-channel attack against HTTPS. The attack uses an information leak in the compression ratio of TLS requests as a side channel to recover the contents of requests made by the client to the server. This, in turn, allows an attacker to obtain the user’s login cookie, hijack the user’s session and impersonate them on high-value destinations such as banks or e-commerce sites.
The demonstration showed how an attacker might execute this attack to recover the headers of an HTTP request. Since HTTP headers contain cookies, and cookies are the primary vehicle for web application authentication (after login), this is a significant attack. Disabling TLS/SSL-level compression, which was already little-used and in fact disabled in most browsers, completely mitigates the attack. Google and Mozilla have developed patches to defend against the CRIME attack.
While CRIME was mitigated by disabling TLS compression, BREACH attacks HTTP responses. These are compressed using HTTP compression, which is far more widely used than TLS-level compression. The BREACH attack is an offshoot of CRIME. Presented at Black Hat USA 2013 by researchers Angelo Prado, Neal Harris and Yoel Gluck, BREACH enables an attacker to read parts of encrypted messages over the Web by injecting plaintext into an HTTPS request and measuring the resulting changes in compressed size.
BREACH is a category of vulnerabilities and not a specific instance affecting a specific piece of software. To be vulnerable, a web application must be served from a server that uses HTTP-level compression, reflect user input in HTTP response bodies, and reflect a secret (such as a CSRF token) in HTTP response bodies. It is important to note that the attack is agnostic to the version of TLS/SSL, and does not require TLS-layer compression.
“Even if TLS-level compression is disabled, it is very common to use gzip at the HTTP level. Furthermore, it is very common that secrets (such as CSRF tokens) and user input are included in the same HTTP response, and therefore (very likely) in the same compression context,” the researchers wrote. “This allows essentially the same attack demonstrated by [Thai] Duong and [Juliano] Rizzo, but without relying on TLS-level compression.”
Prado, Harris and Gluck said at Black Hat that several ingredients make up the attack: compression such as gzip, a stable webpage, the ability to measure the victim’s traffic (usually via a man-in-the-middle attack), a CSRF token or some other secret in the response body, an attacker-supplied guess, and a bootstrapping sequence.
“It is common for Web applications to reflect user input, such as URL parameters, in HTTP response bodies,” the paper said. “Since DEFLATE (the basis for gzip) takes advantage of repeated strings to shrink the compression payload, an attacker can use the reflected URL parameter to guess the secret one character at a time.”
The BREACH vulnerability can be exploited with just a few thousand requests, and the attack can be executed in under a minute. The number of requests required depends on the size of the secret. The power of the attack comes from the fact that it allows guessing a secret one character at a time. With each correct guess, the response compresses slightly better, indicating to the attacker that they are getting closer. “This provides an oracle that an attacker can exploit to recover the first character of [the token],” the researchers said. “Then, the attacker proceeds in the same manner to recover [the token] byte-by-byte.” Within 30 seconds during their demo, they had recovered the 30-character token from the encrypted traffic, and they could do so with 95 percent accuracy.
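The character-by-character loop can be sketched as follows. This is not the researchers' tool: the page template, the token, and the names lz77_cost, oracle and recover are all hypothetical, and the size oracle is a deliberately simplified model (one unit per literal character, two units per back-reference) rather than real gzip. DEFLATE also applies Huffman coding after the back-reference step, which can hide one-character differences, so a practical attack needs extra measures that this sketch leaves out.

```python
import string

SECRET = "7c1e9a04"  # hypothetical 8-character hex token


def lz77_cost(data: str, min_match: int = 3) -> int:
    """Idealized size model: 1 unit per literal, 2 units per back-reference.
    This is a stand-in for DEFLATE, not an implementation of it."""
    cost, i = 0, 0
    while i < len(data):
        longest = 0
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            longest = max(longest, k)
        if longest >= min_match:
            cost, i = cost + 2, i + longest  # emit one back-reference
        else:
            cost, i = cost + 1, i + 1  # emit one literal
    return cost


def oracle(user_input: str) -> int:
    """Modelled size of a page that holds the secret and reflects user_input."""
    page = (
        f"<form><input name='csrf' value='{SECRET}'></form>"
        f"<p>Results for {user_input}</p>"
    )
    return lz77_cost(page)


def recover(size_of, prefix: str, alphabet: str, length: int) -> str:
    """Extend the known part one character at a time, keeping the guess
    that produces the smallest response."""
    known = ""
    for _ in range(length):
        sizes = {c: size_of(prefix + known + c) for c in alphabet}
        smallest = min(sizes.values())
        winners = [c for c, size in sizes.items() if size == smallest]
        if len(winners) != 1:
            raise RuntimeError(f"ambiguous after {known!r}: {winners}")
        known += winners[0]
    return known


alphabet = string.digits + "abcdef"
print(recover(oracle, "value='", alphabet, len(SECRET)))
```

The prefix "value='" plays the role of the bootstrapping sequence: it is text the attacker already knows to precede the secret, so the first guessed character extends an existing match. With a 16-character alphabet and an 8-character token, the loop issues 128 oracle queries instead of the 16^8 needed to guess the whole token at once.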
A number of mitigations were suggested by CERT and the researchers behind the attack, some of which could protect only individual Web pages rather than an entire application. The mitigations include disabling HTTP compression, separation of secrets from user input, randomization of secrets in client requests, masking of secrets by XORing with a random secret per request, protecting pages from CSRF attacks, and obfuscating the length of Web responses with random bytes of information.
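Of these, masking a secret by XORing it with a fresh random value per request might look like the sketch below. The function names and the wire format (the random pad followed by the masked token, hex-encoded) are assumptions chosen for illustration and are not prescribed by CERT or the researchers.

```python
import secrets


def mask_token(token: bytes) -> str:
    """Return a per-response encoding of token: a fresh random pad
    followed by token XOR pad, hex-encoded."""
    pad = secrets.token_bytes(len(token))
    masked = bytes(t ^ p for t, p in zip(token, pad))
    return (pad + masked).hex()


def unmask_token(wire: str) -> bytes:
    """Recover the original token from a value produced by mask_token."""
    raw = bytes.fromhex(wire)
    half = len(raw) // 2
    pad, masked = raw[:half], raw[half:]
    return bytes(m ^ p for m, p in zip(masked, pad))


token = b"0123456789abcdef"  # hypothetical CSRF token
first = mask_token(token)
second = mask_token(token)

print(first)
print(second)
print(unmask_token(first) == token, unmask_token(second) == token)
```

Because the bytes placed in each response change every time, a guess that matched part of the masked value in one response gives no advantage in the next, while the server can still recover and verify the underlying token.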