23.11.2014 Views

Data Structures and Algorithms in Java[1].pdf - Fulvio Frisone


12.2.2 The Boyer-Moore Algorithm

At first, we might feel that it is always necessary to examine every character in T in order to locate a pattern P as a substring. But this is not always the case, for the Boyer-Moore (BM) pattern matching algorithm, which we study in this section, can sometimes avoid comparisons between P and a sizable fraction of the characters in T. The only caveat is that, whereas the brute-force algorithm can work even with a potentially unbounded alphabet, the BM algorithm assumes the alphabet is of fixed, finite size. It works fastest when the alphabet is moderately sized and the pattern is relatively long. Thus, the BM algorithm is well suited to searching for words in documents. In this section, we describe a simplified version of the original algorithm by Boyer and Moore.

The main idea of the BM algorithm is to improve the running time of the brute-force algorithm by adding two potentially time-saving heuristics. Roughly stated, these heuristics are as follows:

Looking-Glass Heuristic: When testing a possible placement of P against T, begin the comparisons from the end of P and move backward to the front of P.

Character-Jump Heuristic: During the testing of a possible placement of P against T, a mismatch of text character T[i] = c with the corresponding pattern character P[j] is handled as follows. If c is not contained anywhere in P, then shift P completely past T[i] (for it cannot match any character in P). Otherwise, shift P until an occurrence of character c in P gets aligned with T[i].
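
The two heuristics can be combined in a short Java sketch. This is one possible reading of the simplified algorithm, not necessarily the version given later in the text. The class name BoyerMooreSketch, the method find, and the map last (which records the index of the last occurrence of each character in P) are assumptions introduced here for illustration. A hash map stands in for a table indexed over a fixed alphabet, and the sketch aligns T[i] with the last occurrence of c in P.

```java
import java.util.HashMap;
import java.util.Map;

final class BoyerMooreSketch {
    /** Returns the lowest index at which pattern begins in text, or -1 if absent. */
    static int find(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0; // the empty pattern trivially matches at index 0
        }

        // Assumed helper table: last.get(c) is the index of the last
        // occurrence of c in the pattern; absent characters are treated as -1.
        Map<Character, Integer> last = new HashMap<>();
        for (int k = 0; k < m; k++) {
            last.put(pattern.charAt(k), k);
        }

        int i = m - 1; // current index into the text
        int j = m - 1; // current index into the pattern
        while (i < n) {
            if (text.charAt(i) == pattern.charAt(j)) {
                // Looking-glass heuristic: compare from the end of P backward.
                if (j == 0) {
                    return i; // every pattern character matched
                }
                i--;
                j--;
            } else {
                // Character-jump heuristic: shift P so that the last occurrence
                // of text.charAt(i) in P lines up with T[i], or shift P
                // completely past T[i] if the character does not occur in P.
                int l = last.getOrDefault(text.charAt(i), -1);
                i += m - Math.min(j, 1 + l);
                j = m - 1; // restart the comparison at the end of P
            }
        }
        return -1; // ran off the end of the text without a full match
    }
}
```

For instance, BoyerMooreSketch.find("abacaabadcabacabaabb", "abacab") returns 10. Along the way, the mismatch on the text character d, which does not occur in the pattern, moves the pattern completely past that position in a single step.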
