14.01.2013 Views

Mastering Regular Expressions - Table of Contents


Matching an Email Address


I'd like to finish with a lengthy example that brings to bear many of the regex techniques seen in these last few chapters, as well as some extremely valuable lessons about building up a complex regular expression using variables. Verifying the correct syntax of an Internet email address is a common need, but unfortunately, because of the standard's complexity,* it is quite difficult to do simply. In fact, it is impossible with a regular expression, because address comments may be nested to any depth. (Yes, email addresses can have comments: comments are anything between parentheses.) If you're willing to compromise, such as allowing only one level of nesting in comments (suitable for any address I've ever seen), you can take a stab at it. Let's try.
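To make the one-level compromise concrete, here is a minimal sketch of a comment pattern that tolerates exactly one level of nesting. The names `$ctext`, `$inner`, and `$comment` are chosen for this illustration, and the definition of comment text is a simplification (any character other than a parenthesis or backslash, or a backslash-escaped character), not the specification's full wording.

```perl
use strict;
use warnings;

# Comment text (simplified assumption): anything except parentheses and
# backslash, or a backslash followed by any single character.
my $ctext = qr/[^()\\]|\\./s;

# Innermost comment: parentheses around comment text, no further nesting.
my $inner = qr/\((?:$ctext)*\)/;

# Outer comment: comment text and complete inner comments, in any order.
my $comment = qr/\((?:$ctext|$inner)*\)/;

for my $s ('(plain)', '(one (nested) level)', '(two (levels (deep)))') {
    my $verdict = $s =~ /\A$comment\z/ ? 'matches' : 'does not match';
    printf "%-24s %s\n", $s, $verdict;
}
```

The third sample is rejected because `$inner` has no way to contain another parenthesized group. Each extra level of nesting would need one more hand-written layer, which is why unbounded nesting is out of reach for this approach.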

Still, it's not for the faint of heart. In fact, the regex we'll come up with is 4,724 bytes long! At first, you might think something as simple as \w+\@[.\w]+ could work, but the reality is much more complex. Something like

    Jeffy

is perfectly valid as far as the specification is concerned.** So, what constitutes a lexically valid address? Table 7-11 lists a lexical specification for an Internet email address in a hybrid BNF/regex notation that should be mostly self-explanatory. In addition, comments (item 22) and whitespace (spaces and tabs) are allowed between most items. Our task, which we choose to accept, is to convert it to a regex as best we can. It will require every ounce of technique we can muster, but it is possible.***
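A quick experiment shows why the simple pattern falls short. The sketch below anchors it with ^ and $ and runs it against a few made-up addresses (all of them hypothetical, chosen only for illustration). It turns away forms the standard permits, such as a quoted local part or a trailing comment, while accepting a domain made of nothing but dots.

```perl
use strict;
use warnings;

# The simple pattern from the text, anchored at both ends.
my $naive = qr/^\w+\@[.\w]+$/;

# Hypothetical sample addresses for illustration only.
my @samples = (
    'user@example.com',              # ordinary address
    '"some user"@example.com',       # quoted local part
    'user@example.com (a comment)',  # trailing comment
    'user@...',                      # not a real domain
);

for my $addr (@samples) {
    my $verdict = $addr =~ $naive ? 'accepted' : 'rejected';
    printf "%-30s %s\n", $addr, $verdict;
}
```

The pattern is therefore both too strict and too loose, which is why the rest of the discussion builds the regex from the lexical specification instead.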

Levels of interpretation

When building a regex using variables, you must take extra care to understand the quoting, interpolating, and escaping that go on. With ^\w+\@[.\w]+$ as an example, you might naively render that as:

    $username = "\w+";
    $hostname = "\w+(\.\w+)+";
    $email = "^$username\@$hostname$";
    ... m/$email/o ...
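Printing the intermediate strings shows what goes wrong with this rendering. The following sketch assumes a reasonably modern Perl; the variable names `$naive_user` and `$naive_host` are hypothetical, and the single-quoted version is only one possible repair, not necessarily the approach the text goes on to take. Double-quoted strings process backslash escapes before the regex engine ever sees the text, so \w and \. lose their backslashes.

```perl
use strict;
use warnings;

# Naive version: double quotes consume the backslashes.
my ($naive_user, $naive_host);
{
    no warnings;  # silences "Unrecognized escape \w passed through"
    $naive_user = "\w+";
    $naive_host = "\w+(\.\w+)+";
}
print "naive username: $naive_user\n";  # w+
print "naive hostname: $naive_host\n";  # w+(.w+)+

# One possible repair: single quotes keep backslashes as written, and the
# trailing $ anchor stays a literal character in the string.
my $username = '\w+';
my $hostname = '\w+(\.\w+)+';
my $email    = '^' . $username . '\@' . $hostname . '$';
print "email regex:    $email\n";

for my $addr ('user@example.com', 'user@example') {
    my $verdict = $addr =~ m/$email/ ? 'matches' : 'does not match';
    printf "%-18s %s\n", $addr, $verdict;
}
```

The string passes through two levels of interpretation: first as a string literal, then as a regex. Each backslash has to survive the first level in order to mean anything at the second.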
