15.02.2015 Views

C# 4 and .NET 4


Chapter 9: Strings and Regular Expressions

where the match was found. Running this code results in three matches. The following table details some of the members of the RegexOptions enumeration.

| Member name | Description |
| --- | --- |
| CultureInvariant | Specifies that the culture of the string is ignored. |
| ExplicitCapture | Modifies the way the match is collected by making sure that valid captures are the ones that are explicitly named. |
| IgnoreCase | Ignores the case of the input string. |
| IgnorePatternWhitespace | Removes unescaped whitespace from the pattern and enables comments that are specified with the pound or hash sign (#). |
| Multiline | Changes the characters ^ and $ so that they are applied to the beginning and end of each line and not just to the beginning and end of the entire string. |
| RightToLeft | Causes the input string to be read from right to left instead of the default left to right (ideal for some Asian and other languages that are read in this direction). |
| Singleline | Specifies a single-line mode where the meaning of the dot (.) is changed to match every character, including the newline character. |
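The effect of some of these options is easiest to see side by side. The following is a minimal sketch, not code from the chapter: the class name `RegexOptionsSketch`, the helper `CountMatches`, and the sample text are all hypothetical. It counts matches for the same pattern with and without Multiline and Singleline.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsSketch
{
    // Hypothetical helper: counts the matches of a pattern under the given options.
    public static int CountMatches(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        // Hypothetical three-line sample text (not the chapter's myText).
        const string text = "Net one\nnet two\nNET three";

        // Without Multiline, ^ anchors only at the start of the whole string.
        Console.WriteLine(CountMatches(text, "^net", RegexOptions.IgnoreCase)); // 1

        // With Multiline, ^ also anchors at the start of every line.
        Console.WriteLine(CountMatches(text, "^net",
            RegexOptions.IgnoreCase | RegexOptions.Multiline)); // 3

        // By default the dot does not match a newline; Singleline changes that.
        Console.WriteLine(CountMatches(text, "one.net", RegexOptions.None));       // 0
        Console.WriteLine(CountMatches(text, "one.net", RegexOptions.Singleline)); // 1
    }
}
```

Options are combined with the bitwise OR operator (|), as in the second call.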

So far, nothing is new from the preceding example apart from some .NET base classes. However, the power of regular expressions really comes from that pattern string. The reason is that the pattern string does not have to contain only plain text. As hinted earlier, it can also contain what are known as meta-characters, which are special characters that give commands, as well as escape sequences, which work in much the same way as C# escape sequences. They are characters preceded by a backslash (\) and have special meanings.

For example, suppose that you wanted to find words beginning with n. You could use the escape sequence \b, which indicates a word boundary (a word boundary is just a point where an alphanumeric character precedes or follows a whitespace character or punctuation symbol). You would write this:

const string pattern = @"\bn";
MatchCollection myMatches = Regex.Matches(myText, pattern,
    RegexOptions.IgnoreCase |
    RegexOptions.ExplicitCapture);

Notice the @ character in front of the string. You want the \b to be passed to the .NET regular expressions engine at runtime; you don't want the backslash intercepted by a well-meaning C# compiler that thinks it's an escape sequence intended for itself!

If you want to find words ending with the sequence ion, you write this:

const string pattern = @"ion\b";
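Both points, the verbatim prefix and the word-boundary anchors, can be checked with a short sketch. The class name `WordBoundarySketch`, the helper `MatchIndexes`, and the sample text are hypothetical and stand in for the chapter's own example, which is not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordBoundarySketch
{
    // Hypothetical helper: returns the start index of every match, ignoring case.
    public static List<int> MatchIndexes(string text, string pattern)
    {
        var indexes = new List<int>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
        {
            indexes.Add(m.Index);
        }
        return indexes;
    }

    public static void Main()
    {
        // In a regular C# string literal, \b is the backspace character,
        // so the regex engine never sees a backslash.
        Console.WriteLine("\bn".Length);      // 2 (backspace + n)
        Console.WriteLine(@"\bn".Length);     // 3 (backslash + b + n)
        Console.WriteLine(@"\bn" == "\\bn");  // True

        // Hypothetical sample text.
        const string text = "A nation needs no application";

        // Words beginning with n start at indexes 2, 9, and 15.
        Console.WriteLine(string.Join(", ", MatchIndexes(text, @"\bn")));   // 2, 9, 15

        // Words ending with ion: the matches start at indexes 5 and 26.
        Console.WriteLine(string.Join(", ", MatchIndexes(text, @"ion\b"))); // 5, 26
    }
}
```

Note that each match covers only the characters named in the pattern (the n, or the ion), not the whole word.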

If you want to find all words beginning with the letter a and ending with the sequence ion (which has as its only match the word application in the example), you will have to put a bit more thought into your code. You clearly need a pattern that begins with \ba and ends with ion\b, but what goes in the middle? You need to somehow tell the regular expressions engine that between the a and the ion there can be any number of characters as long as none of them are whitespace. In fact, the correct pattern looks like this:

const string pattern = @"\ba\S*ion\b";

Eventually you will get used to seeing weird sequences of characters like this when working with regular expressions. It actually works quite logically. The escape sequence \S indicates any character that is not a whitespace character. The * is called a quantifier. It means that the preceding character can be repeated any number of times, including zero times. The sequence \S* means any number of characters as long as they are not whitespace characters. The preceding pattern will, therefore, match any single word that begins with a and ends with ion.
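As a rough illustration of the pattern at work, the following sketch collects every match from a sample sentence. The class name `QuantifierSketch`, the helper `FindWords`, and the sentence are hypothetical; only the pattern itself comes from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuantifierSketch
{
    // Hypothetical helper: returns every word that begins with a and ends with ion.
    public static List<string> FindWords(string text)
    {
        const string pattern = @"\ba\S*ion\b";
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        // Hypothetical sample sentence (not the chapter's myText).
        const string text = "Any application draws attention to an option";

        // "Any" and "an" lack the ion ending; "option" does not begin with a.
        Console.WriteLine(string.Join(", ", FindWords(text)));
        // prints: application, attention
    }
}
```

Because \S matches any non-whitespace character, not only letters, a token such as a-ion would also match, and because * allows zero repetitions, so would aion.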
