19.09.2017 Views

the-web-application-hackers-handbook


Chapter 12: Attacking Users: Cross-Site Scripting

if user data is inserted into a quoted JavaScript string in an event handler, any quotation marks or backslashes in user input should be properly escaped with backslashes, and the HTML encoding should include the & and ; characters to prevent an attacker from performing his own HTML encoding.
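The two steps described above can be sketched as follows. This is a minimal illustration, not a complete encoder: the class and method names (`EventHandlerEncoder`, `escapeJsString`, `htmlEncode`, `encodeForEventHandler`) are hypothetical, and the sketch handles only the characters the passage names plus the usual HTML metacharacters. Other characters that matter inside a JavaScript string, such as line terminators, would need further handling.

```java
final class EventHandlerEncoder {

    // Step 1: backslash-escape quotation marks and backslashes so the value
    // cannot terminate the quoted JavaScript string it is placed in.
    static String escapeJsString(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '\'' || c == '"') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Step 2: HTML-encode the result using numeric entities. The & and ;
    // characters are included so that attacker-supplied entities such as
    // &apos; are not decoded into a quotation mark by the browser.
    static String htmlEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7f || c == '"' || c == '\'' || c == '&'
                    || c == ';' || c == '<' || c == '>') {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Assumed usage: the value is destined for a quoted JavaScript string
    // inside an HTML event-handler attribute, for example onclick="f('...')".
    static String encodeForEventHandler(String userInput) {
        return htmlEncode(escapeJsString(userInput));
    }
}
```

With this sketch, the input `&apos;;alert(1)//` becomes `&#38;apos&#59;&#59;alert(1)//`. The browser decodes that back to the literal text `&apos;;alert(1)//` inside the JavaScript string, so no quotation mark is ever introduced.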

ASP.NET applications can use the Server.HTMLEncode API to sanitize common malicious characters within a user-controllable string before it is copied into the server's response. This API converts the characters ", &, <, and > into their corresponding HTML entities and also converts any character above 0x7f using the numeric form of encoding.

The Java platform has no equivalent built-in API; however, it is easy to construct your own equivalent method using just the numeric form of encoding. For example:

    public static String HTMLEncode(String s)
    {
        // Encode ", &, <, > and any character above 0x7f as numeric entities
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            if (c > 0x7f || c == '"' || c == '&' || c == '<' || c == '>')
                out.append("&#" + (int) c + ";");
            else
                out.append(c);
        }
        return out.toString();
    }

A common mistake developers make is to HTML-encode only the characters that immediately appear to be of use to an attacker in the specific context. For example, if an item is being inserted into a double-quoted string, the application might encode only the " character. If the item is being inserted unquoted into a tag, it might encode only the > character. This approach considerably increases the risk of bypasses being found. As you have seen, an attacker can often exploit browsers' tolerance of invalid HTML and JavaScript to change context or inject code in unexpected ways. Furthermore, it is often possible to span an attack across multiple controllable fields, exploiting the different filtering being employed in each one. A far more robust approach is to always HTML-encode every character that may be of potential use to an attacker, regardless of the context where it is being inserted. To provide the highest possible level of assurance, developers may elect to HTML-encode every nonalphanumeric character, including whitespace. This approach normally imposes no measurable overhead on the application and presents a severe obstacle to any kind of filter bypass attack.
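The strictest option described above, encoding every nonalphanumeric character, might look like the following sketch. The class and method names (`StrictHtmlEncoder`, `encodeNonAlphanumeric`) are hypothetical, and "alphanumeric" is assumed here to mean ASCII letters and digits only. The loop iterates by code point, an assumption made so that characters outside the Basic Multilingual Plane produce a single numeric entity.

```java
final class StrictHtmlEncoder {

    // Encodes every character except ASCII letters and digits as a decimal
    // numeric entity, regardless of the context the value will appear in.
    static String encodeNonAlphanumeric(String s) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            boolean alphanumeric = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (alphanumeric) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by one or two chars depending on the code point width
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, `encodeNonAlphanumeric("a b")` returns `a&#32;b`, and `encodeNonAlphanumeric("<x>")` returns `&#60;x&#62;`. Because the same rule applies to every field, an attacker gains nothing from differences in filtering between insertion points.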

The reason for combining input validation and output sanitization is that this involves two layers of defenses, either one of which provides some protection if the other one fails. As you have seen, many filters that perform input and
