15.04.2018 Views

programming-for-dummies



Sequential Text Search

The Rabin-Karp algorithm

Although much faster than a brute <strong>for</strong>ce search, the Boyer-Moore algorithm<br />

still searches one character at a time. If you’re searching <strong>for</strong> a text string,<br />

you can speed up the search by examining blocks of text rather than individual<br />

characters.<br />

For example, if you're searching for the string GAG, you could examine three characters at a time rather than examining a single character three times. To make searching blocks of characters faster, two computer scientists (Michael O. Rabin and Richard M. Karp) created the Rabin-Karp algorithm.

This algorithm uses a hash function to convert a block of characters into a numeric value. Instead of examining individual characters, the Rabin-Karp algorithm first uses its hash function to convert the original search string into a numeric value. For example, a hash function might convert the three-character search string GAG into a numeric value such as 3957.
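The text doesn't say which hash function to use. As a minimal sketch, assuming a polynomial hash that treats each character code as a digit in base 256 and reduces the result modulo a large prime (both constants, `BASE` and `MODULUS`, are illustrative assumptions, as is the name `block_hash`), a block of characters can be turned into a single number like this. It won't produce 3957 for GAG; that value is only the text's illustration.

```python
BASE = 256              # assumed: each character code is one "digit" in base 256
MODULUS = 1_000_000_007  # assumed: a large prime keeps hash values bounded

def block_hash(block: str) -> int:
    """Convert a block of characters into one number (polynomial hash)."""
    value = 0
    for ch in block:
        # Shift the previous digits left by one position and add the new character.
        value = (value * BASE + ord(ch)) % MODULUS
    return value

print(block_hash("GAG"))  # 71*256**2 + 65*256 + 71 = 4669767
```

Because `ord("G")` is 71 and `ord("A")` is 65, the hash of GAG is 71 × 65536 + 65 × 256 + 71, which is still below the modulus.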

After converting the search string into a numeric value, the Rabin-Karp algorithm repeatedly examines blocks of characters that are the same length as the search string (such as three characters) and uses its hash function to convert each block of text into a numeric value. Now, instead of searching for matching characters, the Rabin-Karp algorithm searches just for matching hash values, as shown in Figure 3-3.

Figure 3-3: The Rabin-Karp algorithm searches for hash values. [Figure: Searching for the string GAG]
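The search described above can be sketched as follows. This simplified version, assuming the same illustrative base-256 polynomial hash (the function names and the example text "GATAGAGT" are hypothetical), rehashes every block from scratch and compares only hash values, exactly as the paragraph describes. Because it never compares characters, it would report a false match if two different blocks happened to share a hash value.

```python
BASE = 256               # assumed hash parameters (illustrative only)
MODULUS = 1_000_000_007

def block_hash(block: str) -> int:
    """Polynomial hash of a block of characters."""
    value = 0
    for ch in block:
        value = (value * BASE + ord(ch)) % MODULUS
    return value

def hash_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every block whose hash equals the pattern's hash."""
    m = len(pattern)
    target = block_hash(pattern)        # hash the search string once
    positions = []
    for i in range(len(text) - m + 1):  # slide over every block of m characters
        if block_hash(text[i:i + m]) == target:
            positions.append(i)
    return positions

print(hash_search("GATAGAGT", "GAG"))  # [4]
```

Rehashing each block costs time proportional to the block length, which is why real implementations update the hash incrementally instead.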

The key to the Rabin-Karp algorithm is the speed and method of its hash function. If the hash function can create values quickly and ensure that different strings rarely create the same hash value, this algorithm can run quickly.
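Rabin-Karp is usually paired with a rolling hash, which computes each block's hash from the previous block's hash in constant time instead of rehashing every character. A minimal sketch, assuming a base-256 polynomial hash modulo a large prime (the constants and the name `window_hashes` are illustrative assumptions):

```python
BASE = 256               # assumed hash parameters (illustrative only)
MODULUS = 1_000_000_007

def window_hashes(text: str, size: int) -> list[int]:
    """Hash every block of `size` characters, updating the hash in O(1) per step."""
    if size <= 0 or size > len(text):
        return []
    high = pow(BASE, size - 1, MODULUS)  # weight of the leftmost character
    h = 0
    for ch in text[:size]:               # hash the first block directly
        h = (h * BASE + ord(ch)) % MODULUS
    hashes = [h]
    for i in range(size, len(text)):
        # Remove the leftmost character, shift one position, add the new character.
        h = ((h - ord(text[i - size]) * high) * BASE + ord(text[i])) % MODULUS
        hashes.append(h)
    return hashes

print(window_hashes("GATAG", 3))  # hashes of GAT, ATA, TAG
```

Python's `%` always returns a non-negative result for a positive modulus, so subtracting the leftmost character never produces a negative hash.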

If the hash function calculates hash values more slowly than the computer can examine characters individually, this algorithm may run slower than another algorithm, such as the Boyer-Moore algorithm. Also, if the hash function calculates identical hash values for two different strings, matching hash values alone isn't accurate enough, because the algorithm can report a match where the text doesn't actually contain the search string. That's why practical implementations compare the actual characters whenever two hash values match.
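To see why identical hash values for different strings cause trouble, the sketch below uses a deliberately weak, hypothetical hash that just adds up character codes, so GAG, AGG, and GGA all hash to 207. The `verify` flag (also hypothetical) shows how comparing the actual characters after a hash match filters out the false matches.

```python
def weak_hash(block: str) -> int:
    """Deliberately poor hash (hypothetical): the sum of the character codes."""
    return sum(ord(ch) for ch in block)

def search(text: str, pattern: str, verify: bool) -> list[int]:
    """Find blocks whose hash matches; optionally confirm each match by characters."""
    m = len(pattern)
    target = weak_hash(pattern)
    matches = []
    for i in range(len(text) - m + 1):
        block = text[i:i + m]
        if weak_hash(block) == target:
            # Hash values agree; without verification, a collision counts as a match.
            if not verify or block == pattern:
                matches.append(i)
    return matches

print(search("AGGAGAG", "GAG", verify=False))  # [0, 1, 2, 4]  (AGG and GGA are false matches)
print(search("AGGAGAG", "GAG", verify=True))   # [2, 4]
```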
