C# 4 and .NET 4

Chapter 9: Strings and Regular Expressions

Microsoft ported it onto Windows, where up until now it has been used mostly with scripting languages. Today, regular expressions are supported by a number of .NET classes in the namespace System.Text.RegularExpressions. You can also find regular expressions in various parts of the .NET Framework; for instance, they are used within the ASP.NET validation server controls.

If you are not familiar with the regular expressions language, this section introduces both regular expressions and their related .NET classes. If you are already familiar with regular expressions, you will probably want to just skim through this section to pick out the references to the .NET base classes. You might like to know that the .NET regular expression engine is designed to be mostly compatible with Perl 5 regular expressions, although it has a few extra features.

Introduction to Regular Expressions

The regular expressions language is designed specifically for string processing. It contains two features:

➤ A set of escape codes for identifying specific types of characters. You will be familiar with the use of the * character to represent any substring in DOS expressions. (For example, the DOS command Dir Re* lists the files with names beginning with Re.) Regular expressions use many sequences like this to represent items such as any one character, a word break, one optional character, and so on.
➤ A system for grouping parts of substrings and intermediate results during a search operation.
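To make these two features a little more concrete, here is a minimal sketch using the Regex class from System.Text.RegularExpressions. The class RegexFeatureDemo and its methods are hypothetical names chosen for illustration. The first two methods rely on special characters (. for any one character, ? for an optional character), and the third uses parentheses to group and capture parts of a match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the .NET Framework.
public static class RegexFeatureDemo
{
    // Special characters used below:
    //   ^ and $  anchor the pattern to the start and end of the input
    //   u?       makes the "u" optional, so "color" and "colour" both match
    public static bool IsColorSpelling(string word)
    {
        return Regex.IsMatch(word, @"^colou?r$");
    }

    // .  matches any one character (except \n by default),
    // so "cat" and "cut" match but "cart" does not.
    public static bool IsCThenAnyCharThenT(string word)
    {
        return Regex.IsMatch(word, @"^c.t$");
    }

    // Grouping: each pair of parentheses captures part of the match,
    // so the pieces can be read back after the search has run.
    public static string ToSurnameFirst(string fullName)
    {
        Match m = Regex.Match(fullName, @"^(\w+)\s+(\w+)$");
        if (!m.Success)
        {
            return fullName;   // leave input unchanged if it is not "First Last"
        }
        return m.Groups[2].Value + ", " + m.Groups[1].Value;
    }
}
```

For example, ToSurnameFirst("Ada Lovelace") returns "Lovelace, Ada", because group 1 captured "Ada" and group 2 captured "Lovelace".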

With regular expressions, you can perform quite sophisticated and high-level operations on strings. For example, you can:

➤ Identify (and perhaps either flag or remove) all repeated words in a string (for example, “The computer books books” to “The computer books”)
➤ Convert all words to title case (for example, “this is a Title” to “This Is A Title”)
➤ Convert all words longer than three characters to title case (for example, “this is a Title” to “This is a Title”)
➤ Ensure that sentences are properly capitalized
➤ Separate the various elements of a URI (for example, given http://www.wrox.com, extract the protocol, computer name, file name, and so on)
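The following sketch shows one way each task in this list might be handled with a single Regex call. RegexTaskSketches and its methods are hypothetical, and the patterns are simplified assumptions (for example, SplitUri only handles URIs of the form scheme://host/path, and RemoveRepeatedWords is case-sensitive), so treat them as a starting point rather than production code.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; patterns are illustrative assumptions.
public static class RegexTaskSketches
{
    // (\w+) captures a word; \1 is a backreference to that same word,
    // so "books books" is replaced by the single captured "books".
    public static string RemoveRepeatedWords(string input)
    {
        return Regex.Replace(input, @"\b(\w+)\s+\1\b", "$1");
    }

    // Capitalizes the first letter of every word at least minLength
    // characters long: minLength = 1 gives "This Is A Title",
    // minLength = 4 gives "This is a Title".
    public static string CapitalizeWords(string input, int minLength)
    {
        string pattern = @"\b\w{" + minLength + ",}";
        return Regex.Replace(input, pattern,
            m => char.ToUpperInvariant(m.Value[0]) + m.Value.Substring(1));
    }

    // Uppercases a lowercase letter at the start of the text or after
    // ".", "!" or "?" followed by whitespace.
    public static string CapitalizeSentences(string input)
    {
        return Regex.Replace(input, @"(^|[.!?]\s+)([a-z])",
            m => m.Groups[1].Value + m.Groups[2].Value.ToUpperInvariant());
    }

    // Splits a simple URI into protocol, host and path using named groups.
    // Returns null if the input does not look like scheme://host[/path].
    public static string[] SplitUri(string uri)
    {
        Match m = Regex.Match(uri, @"^(?<protocol>\w+)://(?<host>[^/]+)(?<path>/.*)?$");
        if (!m.Success)
        {
            return null;
        }
        // An optional group that did not participate yields an empty string.
        return new string[]
        {
            m.Groups["protocol"].Value,
            m.Groups["host"].Value,
            m.Groups["path"].Value
        };
    }
}
```

For example, RemoveRepeatedWords("The computer books books") returns "The computer books", and SplitUri("http://www.wrox.com") returns "http", "www.wrox.com" and an empty path. The lambdas passed to Regex.Replace are converted to MatchEvaluator delegates, which let you compute each replacement in C# code instead of using a fixed replacement string.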

Of course, all these tasks can be performed in C# using the various methods on System.String and System.Text.StringBuilder. However, in some cases, this would require writing a fair amount of C# code. If you use regular expressions, this code can often be compressed to just a couple of lines. Essentially, you instantiate a System.Text.RegularExpressions.Regex object (or, even simpler, invoke a static method of the Regex class), pass it the string to be processed together with a regular expression (a string containing the instructions in the regular expressions language), and you're done.
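As a rough illustration of the two approaches just described, this sketch runs the same search once through a Regex instance and once through the static Regex.Matches method. The class RegexCallStyles and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class contrasting instance and static usage of Regex.
public static class RegexCallStyles
{
    // \b\w+\b matches each run of word characters between word boundaries.
    private const string WordPattern = @"\b\w+\b";

    // Style 1: instantiate a Regex object once and reuse it.
    private static readonly Regex WordRegex = new Regex(WordPattern);

    public static int CountWordsWithInstance(string text)
    {
        return WordRegex.Matches(text).Count;
    }

    // Style 2: call a static Regex method, passing the pattern each time.
    public static int CountWordsWithStatic(string text)
    {
        return Regex.Matches(text, WordPattern).Count;
    }
}
```

Both methods return 3 for "The computer books". An instance is convenient when you apply one pattern many times or want to set RegexOptions once; the static methods also keep a small internal cache of recently used patterns, whose size is controlled by Regex.CacheSize.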

A regular expression string looks at first sight rather like an ordinary string, but it is interspersed with escape sequences and other characters that have a special meaning. For example, the sequence \b indicates the beginning or end of a word (a word boundary), so if you wanted to indicate you were looking for the characters th at the beginning of a word, you would search for the regular expression \bth (that is, the sequence word boundary-t-h). If you wanted to search for all occurrences of th at the end of a word, you would write th\b (the sequence t-h-word boundary). However, regular expressions are much more sophisticated than that and include, for example, facilities to store portions of text that are found in a search operation. This section merely scratches the surface of the power of regular expressions.
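The difference between \bth and th\b can be seen with a short sketch. Here \w* (zero or more word characters) is added after or before th purely so that each match covers the whole word rather than just the two letters; WordBoundaryDemo and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo contrasting \bth (start of word) with th\b (end of word).
public static class WordBoundaryDemo
{
    // Collects the text of every match of pattern in text.
    public static List<string> FindWords(string text, string pattern)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            words.Add(m.Value);
        }
        return words;
    }

    // word boundary, t, h, then the rest of the word
    public static List<string> WordsStartingWithTh(string text)
    {
        return FindWords(text, @"\bth\w*");
    }

    // the start of the word, t, h, then a word boundary
    public static List<string> WordsEndingWithTh(string text)
    {
        return FindWords(text, @"\w*th\b");
    }
}
```

Given "the path through thick growth", WordsStartingWithTh returns the, through and thick, while WordsEndingWithTh returns path and growth.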

For more on regular expressions, please review the book Beginning Regular Expressions, Wiley Publishing, 2005 (ISBN 978-0-7645-7489-4).
