15.04.2018 Views

programming-for-dummies


Chapter 3: String Searching

In This Chapter

Searching text sequentially

Searching by using regular expressions

Searching strings phonetically

Searching for data is one of the most common functions in writing a computer program. Most searching algorithms focus on searching a list of values, such as numbers or names. However, there’s another specialized type of searching, which involves searching text.

Searching text poses unique problems. Although you can treat text as one long list of characters, you aren’t necessarily searching for a discrete value, like the number 21 or the last name Smith. Instead, you may need to search a long list of text for a specific word or phrase, such as ant or cat food. Not only do you need to find a specific word or phrase, but you also may need to find that same word or phrase multiple times. Because of these differences, computer scientists have created a variety of searching algorithms specifically tailored for searching text.
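To make the "find the same phrase multiple times" requirement concrete, here is a minimal Python sketch. The helper name `find_all` and the sample sentence are made up for illustration; the sketch leans on Python's built-in `str.find` rather than on any particular algorithm from this chapter.

```python
def find_all(text: str, phrase: str) -> list[int]:
    """Return the starting index of every occurrence of phrase in text."""
    if not phrase:
        return []  # an empty phrase has no meaningful position
    positions = []
    start = text.find(phrase)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit so overlapping matches are kept.
        start = text.find(phrase, start + 1)
    return positions


# Hypothetical sample text, used only for this illustration.
text = "The cat food sat by the cat; the ant ignored the cat food."
cat_food_hits = find_all(text, "cat food")  # [4, 49]
ant_hits = find_all(text, "ant")            # [33]
```

Unlike a lookup in a list of values, the result is a list of positions, because the same phrase can occur more than once in the text.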

Computers recognize and manipulate only numbers, so every computer represents characters as a universally recognized numeric code. Two common numeric codes are the American Standard Code for Information Interchange (ASCII) and Unicode. Standard ASCII defines 128 codes (8-bit extensions bring the total to 256) that represent mostly Western characters, whereas Unicode contains more than 100,000 codes that represent languages as diverse as Arabic, Chinese, and Cyrillic. When searching for text, computers actually search for numeric codes that represent specific text, so text searching is ultimately about number searching.
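A short Python sketch can show what "text searching is ultimately about number searching" means in practice. The helper name `to_codes` is hypothetical; `ord` is Python's built-in function that returns the Unicode code point of a character, and for plain English letters that value is the same as the ASCII code.

```python
def to_codes(text: str) -> list[int]:
    """Translate each character into its numeric (Unicode) code."""
    return [ord(ch) for ch in text]


cat_codes = to_codes("cat")        # [99, 97, 116]
mixed_codes = to_codes("Ж中")      # [1046, 20013], Cyrillic and Chinese characters

# Comparing two pieces of text is really comparing two lists of numbers.
same_word = to_codes("cat") == [99, 97, 116]
different_case = to_codes("Cat") == to_codes("cat")  # False: 'C' is 67, 'c' is 99
```

The last line also hints at why a search for cat does not automatically find Cat: uppercase and lowercase letters have different codes.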

One of the most popular uses for text searching algorithms involves a field called bioinformatics, which combines molecular biology with computer programming. The basic idea is to use long text strings, such as gcacgtaag, to represent a DNA structure and then search for a specific string within that DNA structure (such as cgt) to look for matches that could indicate how a particular drug could interact with the DNA of a virus to neutralize it.
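The DNA example above can be sketched as a simple sequential scan in Python: try every starting position in the long string and compare characters one at a time. The function name `sequential_search` is an assumption made for this illustration, and the code is a plain brute-force sketch, not an optimized bioinformatics tool.

```python
def sequential_search(sequence: str, pattern: str) -> list[int]:
    """Return every position where pattern occurs in sequence."""
    if not pattern:
        return []
    matches = []
    last_start = len(sequence) - len(pattern)
    for start in range(last_start + 1):
        offset = 0
        # Compare character by character until a mismatch or a full match.
        while offset < len(pattern) and sequence[start + offset] == pattern[offset]:
            offset += 1
        if offset == len(pattern):
            matches.append(start)
    return matches


dna = "gcacgtaag"
cgt_positions = sequential_search(dna, "cgt")  # [3]
a_positions = sequential_search(dna, "a")      # [2, 6, 7]
```

Positions are counted from 0, so cgt is found starting at the fourth character of gcacgtaag.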
