
C# 4 and .NET 4


Regular Expressions

The following table lists some of the main special characters or escape sequences that you can use. It is not comprehensive, but a fuller list is available in the MSDN documentation.

| Symbol | Meaning | Example | Matches |
|---|---|---|---|
| ^ | Beginning of input text | ^B | B, but only if it is the first character in the text |
| $ | End of input text | X$ | X, but only if it is the last character in the text |
| . | Any single character except the newline character (\n) | i.ation | isation, ization |
| * | Preceding character may be repeated zero or more times | ra*t | rt, rat, raat, raaat, and so on |
| + | Preceding character may be repeated one or more times | ra+t | rat, raat, raaat, and so on, but not rt |
| ? | Preceding character may occur zero or one time | ra?t | rt and rat only |
| \s | Any whitespace character | \sa | [space]a, \ta, \na (\t and \n have the same meanings as in C#) |
| \S | Any character that isn’t whitespace | \SF | aF, rF, cF, but not \tF |
| \b | Word boundary | ion\b | Any word ending in ion |
| \B | Any position that isn’t a word boundary | \BX\B | Any X in the middle of a word |
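The table entries can be checked directly with the Regex class in System.Text.RegularExpressions. The following is a minimal sketch, assuming a console application; the class name TableExamples and its Matches helper are hypothetical. The quantifier patterns are wrapped in ^ and $ so that the whole input must match, not just part of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableExamples
{
    // Hypothetical helper: true if pattern matches anywhere in input.
    // Note that Regex.IsMatch takes the input first and the pattern second.
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Demo()
    {
        // ^ and $ anchor the match to the start and end of the text.
        Console.WriteLine(Matches("^B", "Bob"));          // True
        Console.WriteLine(Matches("^B", "aBc"));          // False
        Console.WriteLine(Matches("X$", "taX"));          // True

        // . matches any single character except \n.
        Console.WriteLine(Matches("i.ation", "ization")); // True

        // Quantifiers, anchored with ^...$ so the whole input must match.
        Console.WriteLine(Matches("^ra*t$", "rt"));       // True
        Console.WriteLine(Matches("^ra+t$", "rt"));       // False
        Console.WriteLine(Matches("^ra?t$", "raat"));     // False

        // \s and \S: whitespace and non-whitespace ("x\ta" contains a tab).
        Console.WriteLine(Matches(@"\sa", "x\ta"));       // True
        Console.WriteLine(Matches(@"\SF", "\tF"));        // False

        // \b and \B: word boundary and non-boundary.
        Console.WriteLine(Matches(@"ion\b", "nation states")); // True
        Console.WriteLine(Matches(@"\BX\B", "aXb"));      // True
        Console.WriteLine(Matches(@"\BX\B", "X marks"));  // False
    }
}
```

Calling TableExamples.Demo() from Main prints True or False for each check, as noted in the comments.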

If you want to search for one of the metacharacters, you can do so by escaping the corresponding character with a backslash. For example, . (a single period) means any single character other than the newline character, whereas \. means a literal period.
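A short sketch of escaping in practice, assuming the same System.Text.RegularExpressions setup; EscapeExamples and ContainsLiteral are hypothetical names. Note that the backslash is also an escape character in ordinary C# string literals, so the pattern \. is written either as "\\." or as the verbatim string @"\." in source code. Regex.Escape performs the escaping for you when the text to search for is not known in advance.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeExamples
{
    // Hypothetical helper: true if input contains literal as plain text,
    // with any metacharacters in literal escaped first.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    public static void Demo()
    {
        // Unescaped "." matches any character except \n, so "3.14" also finds "3x14".
        Console.WriteLine(Regex.IsMatch("3x14", "3.14"));     // True

        // "\." matches only a literal period (verbatim string: one backslash).
        Console.WriteLine(Regex.IsMatch("3x14", @"3\.14"));   // False

        // Same pattern as a regular string literal: the backslash is doubled.
        Console.WriteLine(Regex.IsMatch("3.14", "3\\.14"));   // True

        // Regex.Escape turns "3.14" into the pattern 3\.14.
        Console.WriteLine(Regex.Escape("3.14"));              // 3\.14
        Console.WriteLine(ContainsLiteral("3x14", "3.14"));   // False
    }
}
```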

You can request a match that contains alternative characters by enclosing them in square brackets. For example, [1c] means one character that can be either 1 or c. If you wanted to search for any occurrence of the words map or man, you would use the sequence ma[np]. (Inside square brackets, | is an ordinary character rather than an alternation operator, so [1|c] would also match a literal |.) Within the square brackets, you can also indicate a range, for example [a-z] to indicate any single lowercase letter, [A-E] to indicate any uppercase letter between A and E (including the letters A and E themselves), or [0-9] to represent a single digit. If you want to search for an integer (that is, a sequence that contains only the characters 0 through 9), you could write [0-9]+.

The + character indicates that there must be at least one such digit, but there may be more than one, so this would match 9, 83, 854, and so on.
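The bracket expressions and the [0-9]+ pattern can be tried out with Regex.Matches, which returns every match in the input. In this sketch the CharacterClassExamples class and its FindAll helper are hypothetical; the last two checks show the difference between [1|c] and [1c].

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CharacterClassExamples
{
    // Hypothetical helper: returns every substring of input that matches pattern, in order.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Demo()
    {
        // ma[np] matches "man" or "map" (exactly one character from the set).
        Console.WriteLine(string.Join(",", FindAll("man, map, mat", "ma[np]")));  // man,map

        // [A-E] is a range: any single uppercase letter from A to E inclusive.
        Console.WriteLine(string.Join(",", FindAll("ABEFZ", "[A-E]")));           // A,B,E

        // [0-9]+ matches runs of one or more digits.
        Console.WriteLine(string.Join(",", FindAll("9 apples, 83 pears, 854 plums", "[0-9]+"))); // 9,83,854

        // Inside brackets | is an ordinary character, so [1|c] also matches "|".
        Console.WriteLine(Regex.IsMatch("|", "[1|c]"));  // True
        Console.WriteLine(Regex.IsMatch("|", "[1c]"));   // False
    }
}
```

Because [0-9]+ is greedy, each run of digits comes back as a single match rather than one match per digit.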

Displaying Results

In this section, you code the RegularExpressionsPlayaround example to get a feel for how regular expressions work.

The core of the example is a method called WriteMatches(), which writes out all the matches from a MatchCollection in a more detailed format. For each match, it displays the index at which the match was found in the input string, the string of the match, and a slightly longer string, which consists of the match plus up to ten surrounding characters from the input text: up to five characters before the match and up to five afterward. (There are fewer than five characters if the match occurred within five characters of the beginning or end of the input text.) In other words, a match on the word messaging that occurs near the end of the input text quoted earlier would display "and messaging of d" (five characters before and after the match),

