Death to the filters - how to validate JSON correctly

JSON is a pretty popular data-interchange format. It's supported across programming languages, fits naturally into Javascript environment, it's human-readable and it's much lighter than XML. No wonder the format is widely adopted in many web applications.
Disclaimer: This post will probably be nothing new to security experts. It's meant for developers, as I've seen vulnerable applications using the bad practices described here.

Is JSON secure?

Well, it has some quirks (it had even more of them in the past) but mostly it's good enough. Provided two things:

you don't use eval!
you are sure that's JSON (not everything that's a valid Javascript is JSON, it's only a subset)

Eval() is, as usual, asking yourself for trouble. Unfortunately, it's common - it's even mentioned in JSON's own documentation as a recommended practice:

To convert a JSON text into an object, you can use the eval() function. eval() invokes the JavaScript compiler. Since JSON is a proper subset of JavaScript, the compiler will correctly parse the text and produce an object structure. The text must be wrapped in parens to avoid tripping on an ambiguity in JavaScript's syntax.
var myObject = eval('(' + myJSONtext + ')');
The eval function is very fast. However, it can compile and execute any JavaScript program, so there can be security issues. The use of eval is indicated when the source is trusted and competent. It is much safer to use a JSON parser. In web applications over XMLHttpRequest, communication is permitted only to the same origin* that provide that page, so it is trusted. But it might not be competent. If the server is not rigorous in its JSON encoding, or if it does not scrupulously validate all of its inputs, then it could deliver invalid JSON text that could be carrying dangerous script. The eval function would execute the script, unleashing its malice.

* docs are outdated, you can do cross-origin request now (CORS)

Example - no filters

So, when using eval() it's crucial that the input is in fact valid JSON that has not been tampered with. Consider the following situation:

Our application uses JSON to store various user preferences (e.g. background color settings). Server does not care about the JSON, it's only relevant to client-side scripts. The data flow looks like this:

User fills out the 'change preferences' form
Javascript code creates JSON out of this and POSTs it to the server
Server stores JSON in the database and retrieves it in future requests. The resulting code looks like this: preferences_json = '{"background":"red"}';
Javascript needs to get preferences, so it uses:

var prefs = eval("(" + preferences_json + ")");
document.body.style.backgroundColor = prefs.background;
...

XSSing this is simple - escape from single quotes:

//  payload:    '; alert(1); //
// this will execute in the preferences_json line like this:
preferences_json = ''; alert(1); //';

So the server needs to either deny ' in input or escape the output (' becomes \'). Are we secure now? No!

Filtering

Even with proper escaping, we do not have to escape the string at all for XSS. We can do the code execution in eval() call:

// payload:   alert(1),{"background":"red"}
// This will become:
eval("("+'alert(1),{"background":"red"}'+")");
// and alert will execute.

One very bad practice I saw is disallowing certain characters in JSON input when storing it. For example, you could disallow "(" and ")" so that function can't be executed. But there are ways to execute functions without "()" - see for example this wonderful vector for IE. In fact, it's even possible to execute JS in this context if you're only allowing a-z A-Z 0-9 { } - : , . " (most of these characters are required for JSON to work anyway). Another example - allow \ and attacker can use obfuscation (e.g. \u0020 ) and embed pretty much any character.

If you validate JSON by looking at characters only and disallowing / allowing them - you're toasted. You've just opened a bag of tricks for attackers to use (and new techniques requiring fewer characters are still in development). Don't!

Try for yourself!

Think you can execute arbitrary code in this victim page? Yes, you can!

Source code for examples is, as always, on GitHub. And remember:

If you accept JSON input, make sure it's a legit JSON input (JSON is a subset of Javascript!). In PHP you can use json_decode()to validate server-side.
Validating JSON using character whitelist is a dead end. Don't do it. There are really tricky vectors around.
Don't use eval! There's JSON.parse() built into newer browsers, for older - use this.

the world. according to koto

Wednesday, August 24, 2011