21.03.2013 Views

Problem - Kevin Tafuro

Discussion

RFC 1738 defines the syntax for URLs. Section 2.2 of that document also defines the rules for encoding characters in a URL. While some characters must always be encoded, any character may be encoded. Essentially, this means that before you do anything with a URL, whether you need to parse the URL into pieces (i.e., username, password, host, and so on), match portions of the URL against a whitelist or blacklist, or something else entirely, you need to decode it.

The problem is that you must make certain that you never decode a URL that has already been decoded; otherwise, you will be vulnerable to double-encoding attacks. Suppose that the URL contains the sequence “%25%34%31”. Decoded once, the result is “%41” because “%25” is the encoding for the percent symbol, “%34” is the encoding for the digit 4, and “%31” is the encoding for the digit 1. Decoded twice, the result is “A”.
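
The double-decoding example above can be reproduced with a small single-pass decoder. The following is only a sketch under stated assumptions: decode_once( ) and hex_value( ) are hypothetical names chosen for this illustration, not functions from the recipe, and the caller is assumed to supply an output buffer at least as large as the input plus one byte.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hexadecimal digit to its numeric value. The caller must
 * already have checked the character with isxdigit(). */
static int hex_value(unsigned char ch) {
  if (ch >= '0' && ch <= '9') return ch - '0';
  return toupper(ch) - 'A' + 10;
}

/* Hypothetical helper: decode the %XX escapes in `in` exactly once.
 * `out` must have room for strlen(in) + 1 bytes; decoding never makes
 * the data longer. Returns the number of decoded bytes, not counting
 * the terminating zero byte that is always appended. */
size_t decode_once(const char *in, char *out) {
  const unsigned char *c = (const unsigned char *)in;
  size_t n = 0;

  while (*c) {
    if (c[0] == '%' && isxdigit(c[1]) && isxdigit(c[2])) {
      out[n++] = (char)(hex_value(c[1]) * 16 + hex_value(c[2]));
      c += 3;
    } else {
      out[n++] = (char)*c++;  /* copy everything else unchanged */
    }
  }
  out[n] = '\0';
  return n;
}
```

Calling decode_once( ) on “%25%34%31” yields the three bytes “%41”; feeding that result back into decode_once( ) yields “A”. Any check performed on the first result (for example, a search for the letter “A”) is therefore bypassed if some later layer decodes the string again.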

At first glance, this may seem harmless, but what if you were to decode repeatedly until there were no more escaped characters? Certain sequences of characters would then be impossible to represent: a literal “%41”, for example, could never survive decoding, because it would always be reduced to “A”. The purpose of encoding in the first place is to allow the use of characters that have special meaning or that cannot be represented visually.

Another potential problem with encoding, limited primarily to C and C++, is that a NULL terminator (a zero byte, encoded as “%00”) can be encoded anywhere in the URL. There are several approaches to dealing with this problem. One is to treat the decoded string as a binary array rather than a C-style string; another is to use the SafeStr library described in Recipe 3.4 because it gives no special significance to any one character.
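
To illustrate the binary-array approach, the sketch below compares decoded data against an expected value using an explicit length instead of relying on the terminator. The name bytes_equal( ) is hypothetical and is not part of the recipe or of SafeStr; it assumes the caller already knows how many bytes the decoded data contains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-aware comparison. `data` holds `len` decoded
 * bytes and may contain zero bytes; `expected` is an ordinary C string.
 * Returns 1 only if the lengths agree and every byte matches. */
int bytes_equal(const char *data, size_t len, const char *expected) {
  size_t expected_len = strlen(expected);

  return len == expected_len && memcmp(data, expected, len) == 0;
}
```

With the 11 decoded bytes “/admin”, a zero byte, and “.jpg”, strcmp( ) against “/admin” reports a match because it stops at the zero byte, whereas bytes_equal( ) rejects the data because the lengths differ.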

You can use the following spc_decode_url( ) function to decode a URL. It returns a dynamically allocated copy of the URL in decoded form. The result will be NULL-terminated, so it may be treated as a C-style string, but it may contain embedded NULLs as well. You can determine whether it contains embedded NULLs by comparing the number of bytes spc_decode_url( ) indicates that it returns with the result of calling strlen( ) on the decoded URL. If the URL contains embedded NULLs, the result from strlen( ) will be less than the number of bytes indicated by spc_decode_url( ).
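
The length comparison described above can be wrapped in a small helper. This is a minimal sketch: has_embedded_nul( ) is a hypothetical name, and it assumes that `nbytes` is the byte count reported by the decoder, excluding the final terminator, and that the buffer really is terminated after `nbytes` bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check. `decoded` is the terminated buffer produced by a
 * URL decoder and `nbytes` is the length the decoder reported, not
 * counting the terminator. strlen() stops at the first zero byte, so a
 * shorter result means a zero byte is embedded in the data. */
int has_embedded_nul(const char *decoded, size_t nbytes) {
  return strlen(decoded) < nbytes;
}
```

A caller that intends to pass the decoded URL to ordinary string functions might reject the input outright whenever this check returns nonzero.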

#include
#include
#include

#define SPC_BASE16_TO_10(x) (((x) >= '0' && (x)
